# Welcome to your first python notebook (at least in Comparative Biomechanics)! 

This is the top cell of the notebook. Cells are confined sets of information that can be run together. You can think of them like paragraphs of a coding paper.

Cells can be used for mark-up (notes) or code. If you click a cell to select it and then hit enter (or double-click the cell), you can edit the contents of the cell. This is called "edit mode". To exit "edit mode", hit the escape key, which puts you in "command mode." With the cell selected in command mode, you can hit the "m" key to convert the cell to mark-up. Hitting the "y" key converts the cell to code. Both mark-up and code cells can be run by pressing "shift" + "enter". This will either format the text for a mark-up cell or run the code for a code cell. Try converting this cell back and forth from mark-up to code, and running the cell in both conditions. What happened?

Good python notebooks have ample mark-up cells to help orient the reader of the code. Often you include a short description of the notebook at the top. 
Here we'll be learning how to load in data from a .csv file, define variables, perform arithmetic on arrays of data, and produce a graph.

.

.

.

.

.

.

.

## Step 1: import your packages
There are thousands of pre-coded functions in python available for you to use. They are most often bundled with similar functions into "packages." In order to work with these functions in your notebook, you need to import the package, which loads the functions into your workspace. In this way you can tailor your environment to the functions you want to use, which saves time and computer memory.
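
As a rough sketch of what importing a package looks like in practice, the snippet below imports numpy under a short alias and calls one of its functions. The numbers in `values` are made up purely for illustration.

```python
import numpy as np  # "np" is the conventional short alias for numpy

# Four made-up measurements, for illustration only
values = np.array([1.0, 2.0, 3.0, 4.0])

# Functions from the package are called as <alias>.<function name>
mean_value = np.mean(values)

# Arithmetic on a numpy array is applied to every element at once
doubled = values * 2

print(mean_value)
print(doubled)
```

If the import line is left out, python will not recognize the name `np` and the cell will stop with an error.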

In [None]:
# In python code cells, commented lines start with a "#" and are passed over when the cell is run. 
# You can manually comment a line by adding a #, or you can comment multiple lines by selecting them and hitting:
#      MAC: "command" + "/"
#      Windows: "control" + "/"
# Repeating those keystrokes will uncomment the lines. 
# This can be helpful when troubleshooting your code. If you think some lines aren't working, you can comment them out and try to run the cell again.



# Now, let's import some packages...

# the package numpy allows you to work with vectors and arrays of data. 
# Say you want to multiply all 100 data points by 2: you could just do data*2, instead of running through a data list and multiplying each element one at a time
import numpy as np # when we want to use a numpy function we'll call it by saying np.<function name>

# pandas is a package that allows for storing data in a special dataframe storage. 
# pandas dataframes allow you to have a heading for each column, among other awesome features.
import pandas as pd

# the package matplotlib has a lot of functionality similar to matlab. here we'll be using the plotting functions
import matplotlib.pyplot as plt # when we want to use a matplotlib.pyplot function we'll call it by saying plt.<function name>

# the package os allows us to work with the operating system. for example, if we want to change a file name, move files, read or write a file
import os

# the line below controls the "backend" of the matplotlib functions. For our purposes, this informs whether the graphs you plot will stay
# in the notebook or pop out in a new window.
%matplotlib auto

# this is how you can print things to your notebook cell output. it's a great way of checking that your code ran correctly
print('we have imported all the packages!') 


# RUN THIS CELL ("SHIFT" + "ENTER") to import your packages

## Step 2: import your data
We saved your data as a .csv file. Let's import it now
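
Before pointing pandas at the real file, here is a small sketch of what reading a .csv produces. It uses a tiny csv held in memory instead of a file on disk; the column names and values are hypothetical and are not taken from your data file.

```python
import io
import pandas as pd

# A tiny made-up csv held in memory; the column names and values are hypothetical
csv_text = "Section,Mass (g),Width (mm)\nA,1.5,20.1\nA,2.0,22.4\nB,3.2,27.9\n"

# read_csv accepts a file path, or a file-like object such as this one
toy = pd.read_csv(io.StringIO(csv_text))

print(toy.shape)            # (number of rows, number of columns)

first_row = toy.iloc[0]     # the first row (indexing starts at 0)
mass_col = toy["Mass (g)"]  # one column, selected by its header

print(first_row)
print(mass_col)
```

The first line of the csv becomes the column headers, and every later line becomes one row of the dataframe.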

In [None]:
# below we will specify a variable with the pathway to our data.csv file. Pathways are formatted as: Harddrive:\Folder\Folder\Folder\File.extension
# On Windows, the slashes are backslashes as shown. On Macs, the slashes are forward slashes "/". 
# Things become tricky if any of the folder or file names have spaces. Try to avoid this!


# This python notebook is likely in the same folder as your CrabScaling.csv file. 
# Therefore, instead of typing out the whole long pathway, we can use the "." shortcut. 
# "." stands for the current working directory. You can see it by looking in the left panel. 
# Therefore, you could probably use "./CrabScaling.csv" (python accepts forward slashes in pathways on both Windows and Mac)
csv_path = "type_something_here.csv"


# use pandas package to read in the .csv
# this creates a dataframe, which is a way of organizing different types of data. 
# Say you have a dataset of individual names, ages, and heights. You would need to store strings, integers, and numbers with decimal places (floats).
# A dataframe can do this. Plus it has lots of special powers (inserting new data, sorting, applying functions)
data = pd.read_csv(csv_path);

print('read in the data .csv')

In [None]:
# To access the 4th row of the dataframe, you would type data.iloc[3]. 
# BUT WAIT!!! Why 3 and not 4? In python, indexing starts at 0. So the 1st row is index 0, the 2nd row is index 1, etc.
# This may seem unintuitive at first but is actually much more elegant. I won't go into the reasons here, but you will need to know that indexing starts at 0
print(data.iloc[0])
# The print output shows all of the column headers and values for the 1st row of data (index 0)


In [None]:
# for accessing the data column named X you would type data.X or data["X"]. Let's print the column for "Section":
print(data.Section)
print(data["Section"])
# it will list the row numbers and values. 
# It also lists what type of data it is (dtype), which could be integers (ints), numbers with decimal places (floats), strings (objects), among others


## Step 3: work with your data
Let's plot what we got, make it pretty, and manipulate it

In [None]:
# We're interested in how crab carapace width and length scale with body mass. Well, let's plot it to find out!

# pandas dataframes work nicely with matplotlib to make plotting commands very intuitive.
plt.close('all') # when you plot something, a plot window will pop open until you close it. let's close any/all open plot windows.

# plot a scatter plot of width vs. mass
ax = data.plot(kind='scatter',x='Mass (g)',y='Width (mm)', color='black',label = "Width")

# plot a scatter plot of length vs. mass but use the same window/axes as defined when we plotted width
data.plot(kind='scatter',x='Mass (g)',y='Length (mm)', color='green', ax = ax, label = "Length")

# since we're plotting length and width on the same axes, we need to label the y-axis as just millimeters
plt.ylabel('millimeters') 

# it's best practice to include 0 for your x- and y-axes
ax.set_ylim(ymin=0)
ax.set_xlim(xmin=0)

# don't forget a title!
plt.title("type your title here")

plt.show() # show the plot

In [None]:
# Say we wanted to plot a linear regression of width vs. mass. 
# We can calculate the best fit line for data using the polyfit function in numpy. 
# It minimizes the squared vertical distances of the datapoints from the line (least squares), outputting the coefficients of the best-fit polynomial of the degree specified.
# degree = 1 --> y = a*x + b = linear regression
# degree = 2 --> y = a*x^2 + b*x + c

d = np.polyfit( data["Mass (g)"], data["Width (mm)"], 1) # This finds the best fit of width vs. mass with degree 1
f = np.poly1d(d) # the function f now takes an input x and outputs y using the coefficients found and stored in d
print(d)
print(f)

# Let's now plot the line on our plot using the matplotlib function "plot"
# We'll input the 
    # x-values, 
    # y-values (generated using the f function of our x-value), and 
    # how we'd like to plot: '-' = with lines connecting points, 'k' = black
plt.plot(data["Mass (g)"], f(data["Mass (g)"]), '-k')
# Not sure what I'm doing here? Read this: https://matplotlib.org/stable/tutorials/introductory/pyplot.html

# Let's also put the linear regression info on the plot, so we know the slope and intercept values
plt.text(1, np.median(data["Width (mm)"]), '%0.2f * x + %0.2f'%(d[0],d[1]));
# Not sure what I'm doing here? Watch this: https://realpython.com/lessons/formatting/

.

.

.

## I've walked you through the basics, now it's your turn to work through some things on your own! 
## If (when) you get stuck, google your question -- it's what all coders do! The website stackoverflow is very helpful.
.

.

.


#### Let's plot all data that you did not collect in black, and your group's data in red.

First, we need to find just your group's data and the class's data.

You can create a subset of a dataframe based on the values in a given column using: data[data.Section=="A"].
This would create a dataframe of just data collected in the A section of BIO391. 

The square brackets are used to show that you are indexing (pulling info) from a variable, in this case the dataframe "data". Remember from above that data.Section pulls out all the values in the "Section" column.

The "==" is an example of a conditional statement. We only want rows of data where the "Section" value is the same as "A". 
Other conditionals include ">" (greater than), "<" (less than), ">=" (greater than or equal to), "<=" (less than or equal to), and "!=" (not equal to).

You can index a subset of a dataframe based on more than one conditional statement using the logical operators "&" (and) or "|" (or). 
Say you wanted all rows with values greater than 2 but less than 7. This is the same as saying all values greater than 2 AND less than 7, and could be coded as (data.value > 2) & (data.value < 7). Say you wanted all values greater than 2, but not equal to 5: (data.value > 2) & (data.value != 5). 

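This also works for conditions in different columns. For example, rows where column value1 is greater than 2 or column value2 is equal to 3: (data.value1 > 2) | (data.value2 == 3). Note that you must use the "|" symbol here; the python keyword "or" raises an error when used with dataframe columns.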
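
To make the indexing rules above concrete, here is a minimal sketch on a small made-up dataframe. The dataframe `toy` and its columns ("Section", "Group", "value") are hypothetical and only for illustration; your real data will have different columns and values. The "~" (not) operator in the last example is not covered above, but it is a standard pandas way to flip a condition.

```python
import pandas as pd

# Made-up dataframe for illustration; these column names and values are hypothetical
toy = pd.DataFrame({
    "Section": ["A", "A", "B", "B", "A"],
    "Group": [1, 2, 1, 2, 1],
    "value": [1, 3, 5, 6, 8],
})

# One condition: rows where Section is "A"
section_a = toy[toy.Section == "A"]

# Two conditions joined with & (and); each condition needs its own parentheses
between = toy[(toy.value > 2) & (toy.value < 7)]

# Two conditions in different columns joined with | (or)
either = toy[(toy.Section == "B") | (toy.Group == 1)]

# ~ flips a condition: every row EXCEPT those in Section A, Group 1
not_a1 = toy[~((toy.Section == "A") & (toy.Group == 1))]

print(section_a)
print(between)
print(either)
print(not_a1)
```

Each subset is itself a dataframe, so it keeps the original column headers and row numbers and can be plotted the same way as the full dataframe.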


In [None]:
# Based on the above info, create new subsets of the data dataframe.

your_section = "" # What section number are you in?
your_group = "" # What group number were you in?

group_data = "type your code here" # data from just your group
class_data = "type your code here" # data from the class, not including your group's data

In [None]:
# Now, based on the example plotting code above, plot the length vs. mass and width vs. mass data for the class (in black) and your group (in a color of your choosing)

plt.close('all') 


# PUT YOUR CODE HERE












plt.show() # show the plot

# make sure that your graph has correct axes, labels, and a title. You'll be submitting a photo of your graph for your Synchronous Assignment

#### Let's find the scaling relationship between length and width vs. mass

Remember that scaling relationships are power laws: y = a * x^b, where the exponent b is the scaling constant

To find the scaling constant we take the log of each side of the equation. 

The log of a number is equivalent to finding the exponent needed to get that number from 10 or another "base".
Sometimes the number e (2.718...) is used as the base instead; this is called the "natural log" or "ln".

For the base of 10, 

    10^0 = 1 --> log(1) = 0
    10^1 = 10 --> log(10) = 1
    10^2 = 100 --> log(100) = 2
    etc.
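
The base-10 examples above can be checked with numpy's `log10` function. This is just a quick sketch on three hand-picked numbers, not part of the crab dataset.

```python
import numpy as np

# Hand-picked powers of ten, for illustration only
powers_of_ten = np.array([1.0, 10.0, 100.0])

# np.log10 takes the base-10 log of every element in the array
logs = np.log10(powers_of_ten)

print(logs)
```

Because `np.log10` works on a whole array at once, it can also be applied to an entire dataframe column.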
    

Log functions have special properties, including 

    log(a*b) = log(a) + log(b) 
    log(a/b) = log(a) - log(b)
    log(a^b) = b*log(a)

Applying this to our equation above y = a * x^b:

    log(y) = log(a) + log(x^b)
    log(y) = log(a) + b*log(x)
    
Therefore, to find the scaling coefficient, b, we can take the log of both x and y then find the linear regression of log(y) vs. log(x). b will be the slope of the linear regression.
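
As a sanity check of the derivation above, here is a minimal sketch using synthetic data that follows y = a * x^b exactly. The values `a_true` and `b_true` are made up for illustration and have nothing to do with real crabs. Real measurements are noisy, so the fit on real data will only approximate the underlying relationship.

```python
import numpy as np

# Synthetic data that follows y = a * x^b exactly; a_true and b_true are made-up values
a_true, b_true = 3.0, 0.33
x = np.array([1.0, 2.0, 5.0, 10.0, 20.0, 50.0])
y = a_true * x**b_true

# Linear regression of log(y) vs. log(x): the slope is b and the intercept is log(a)
slope, intercept = np.polyfit(np.log10(x), np.log10(y), 1)

b_est = slope
a_est = 10**intercept  # undo the base-10 log to recover a

print(b_est, a_est)
```

The slope of the log-log regression comes back as the exponent we started with, which is why the same approach can be used to estimate b from measured data.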

In [None]:
# First let's create new columns in our dataframe for the log values

data["log(Mass)"] = [] # google how to do this!
data["log(Width)"] = []
data["log(Length)"] = []


In [None]:
# Now plot log(Width) and log(Length) vs. log(Mass)







In [None]:
# Find the linear regression for the above plots. Add them to the plots






# make sure that your graph has correct axes, labels, and an informative/specific title. 
# There should be at least one color on your graph--have fun, but be practical (yellow is hard to see, data of all one type should be the same color, etc.)
# The linear regressions should be labeled on the graph.
# You'll be submitting a photo of your graph for your Synchronous Assignment

## If you are a BME student, complete the section below

In [None]:
# Make two graphs, one for Width and the other for Length, plotting and finding the scaling coefficients for male and female crabs separately.
# In other words, males and females should be plotted on the same graph with different colors or marker types (dots: ".", circles: "o", stars: "*", etc.)









# make sure that your graphs have correct axes, labels, and an informative/specific title. 
# There should be at least one customization--a color or marker type
# The linear regressions should be labeled on the graph.
# You'll be submitting photos of your graphs for your Synchronous Assignment