<a href="https://colab.research.google.com/github/pyclub-cu/classes/blob/master/Week_6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Using pandas to work with data**

# This week's goals:


- Learn about CTDs

- What are python packages? (review)

- Open some data using the python "pandas" package

- Find maximum and minimum

- Find mean

- Make a new column (maybe better in next lesson where Laura calculates density)

- Review best coding practices

In [2]:
from IPython.display import Image
from IPython.core.display import HTML 
Image(url= "https://media.giphy.com/media/QYwMxfDpoH3VBfPEET/giphy.gif")

#Ice breaker

What show is this gif from? \
What does this seal below have in common with an Argo float?

In [2]:
Image(url= "https://static.skepticalscience.com/pics/Weddell_Seal_DanCosta.jpg")

#CTDs!! What are they?
Arguably the most important instrument package in oceanography. CTD stands for: 
- **C**onductivity (as in electrical conductivity... which we use to measure salinity! Salts dissolve into charged ions, so the saltier the water, the better it conducts electricity)
- **T**emperature
- **D**epth (as calculated from measurements by a pressure sensor! Pressure increases by approximately 10 decibars for every 10 meters you go down from the surface, so if you have a pressure measurement, you have a depth)
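
To make the pressure-to-depth idea concrete, here is a minimal sketch that uses only the rule of thumb above (about 1 decibar per meter). The function name `approx_depth_m` and the sample pressures are made up for illustration; real CTD processing uses a more careful conversion.

```python
# Rule of thumb from the text: roughly 10 decibars per 10 meters,
# i.e. about 1 decibar per meter. This is an approximation only.
DBAR_PER_METER = 1.0


def approx_depth_m(pressure_dbar):
    """Estimate depth in meters from a pressure reading in decibars."""
    return pressure_dbar / DBAR_PER_METER


# Made-up pressure readings, just to show the function in use
for pressure in [10.0, 250.0, 1000.0]:
    print(pressure, "dbar is roughly", approx_depth_m(pressure), "m deep")
```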

As a professor in our department likes to say, CTDs are the bread and butter of oceanography.

This is because **temperature and salinity are a fundamental way to understand what's happening in the ocean**. Where are water masses coming and going? How are they being mixed together? What kind of microorganisms can live here? How much CO2 can this water hold?

Even ocean modellers who never set foot on a research vessel rely on CTD measurements to run and check their models.

*Today we'll look at CTD data taken by a seal in Antarctica!* 



Before we work with CTD data though...

#How do we import data files into python?

One very useful tool for doing so is "pandas".

In [3]:
Image(url= "https://media.giphy.com/media/z6xE1olZ5YP4I/giphy.gif")

Pandas is a python package. As we learned last week, a package is a library of functions bundled together to help with a specific aspect of data analysis - a toolbox of sorts. The pandas package is a toolbox for viewing and performing calculations on tabular data.

Tabular data is just data that comes in rows and columns (think Microsoft Excel). This is how instruments output the data they have acquired, so as an oceanographer, you will be working with tabular data often! You'll see it come in file formats such as *.csv* and *.txt*. 
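
If "delimited" is a new word for you, this small sketch may help. It uses plain python (no packages) and two made-up rows of numbers: one where the values are separated by commas and one where they are separated by spaces. The variable names are just for illustration.

```python
# Made-up rows holding the same three values: depth, temperature, salinity
comma_row = "5.0,11.83,33.30"
space_row = "5.0   11.83   33.30"

# A delimiter is the character that separates one value from the next
comma_fields = comma_row.split(",")  # split wherever there is a comma
space_fields = space_row.split()     # no argument: split on any run of whitespace

print(comma_fields)
print(space_fields)
```

Both rows end up as the same list of three values. The only difference is which delimiter we had to split on.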

##Step 1: Look at our data

This is CTD data collected by a seal in Antarctica! 
https://raw.githubusercontent.com/pyclub-cu/classes/master/data/ct4-9908-04_ODV.csv

What kind of data do we have? \
How is this data delimited?

##Step 2: Import the pandas package so we can use it to open our data in python

We import packages into our python workspace with the following syntax:


In [None]:
# general form (replace package_name with the name of the package you want):
# import package_name

Using this syntax, import the pandas package below

In [11]:
#your code here

Woohoo! Now we can use any and all of the functions that make up the pandas library : )



##Step 3: Use the pandas "read_csv" function to import and view our .csv file

In python, we call functions like this:

In [None]:
pandas.read_csv('filename.csv')  # whatever goes inside a function's parentheses is its input
# here, the input is the name of the tabular data file we want to look at

Following the syntax above, try importing our CTD data file:


https://raw.githubusercontent.com/pyclub-cu/classes/master/data/ct4-9908-04_ODV.csv

Make sure the filename is a string, i.e. in 'quotes'

In [None]:
#your code here

We did it! But... looks a little weird, huh. All smooshed together. How can we fix this?

The "read_csv" function can take more inputs than just the file name, including things that tell it how the data file is formatted. For a full list of possible inputs into a function, type it out followed by a question mark.

Execute the cell below. What do you see?

In [15]:
pandas.read_csv?

To better read in our data file, we are going to tell the function two things:

- The "header line" is the 2nd line of the file
- The data are delimited by white space 

Note the extra function inputs in the cell below and execute!

In [31]:
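# sep=r'\s+' means "split on whitespace"; it does the job of delim_whitespace=True, which newer pandas versions deprecate
pandas.read_csv('https://raw.githubusercontent.com/pyclub-cu/classes/master/data/ct4-9908-04_ODV.csv', header=1, sep=r'\s+')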

Unnamed: 0,Cruise,Station,Type,mon/day/yr,hh:mm,Longitude,Latitude,Depth,QF,Temperature,QF.1,Salinity,QF.2
0,ct4-9908-04,1,C,06/11/2004,08:42,-122.899,37.203,5.0,0,11.8270,0,33.2968,0.0
1,ct4-9908-04,1,C,06/11/2004,08:42,-122.899,37.203,6.0,0,11.7647,0,33.3088,0.0
2,ct4-9908-04,1,C,06/11/2004,08:42,-122.899,37.203,7.0,0,11.7024,0,33.3208,0.0
3,ct4-9908-04,1,C,06/11/2004,08:42,-122.899,37.203,8.0,0,11.6401,0,33.3329,0.0
4,ct4-9908-04,1,C,06/11/2004,08:42,-122.899,37.203,9.0,0,11.5778,0,33.3449,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
111574,ct4-9908-04,260,C,09/25/2004,06:44,179.470,43.975,65.0,0,5.7521,0,1.0000,
111575,ct4-9908-04,260,C,09/25/2004,06:44,179.470,43.975,66.0,0,5.6333,0,1.0000,
111576,ct4-9908-04,260,C,09/25/2004,06:44,179.470,43.975,67.0,0,5.5146,0,1.0000,
111577,ct4-9908-04,260,C,09/25/2004,06:44,179.470,43.975,68.0,0,5.3959,0,1.0000,


Phew, looks much better : ) This table is a pandas "dataframe". Dataframes are python objects, just like strings and integers are objects.
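
If you want to see what the header and whitespace settings do on something tiny, here is a sketch with a made-up three-column "file" stored in a string. The names `toy_text` and `toy` are invented for this example, and `sep=r"\s+"` is used here as the way to say "split on whitespace".

```python
import io

import pandas

# Made-up file contents: a title line, then the header line, then the data
toy_text = """toy CTD file
Depth Temperature Salinity
5.0   11.83       33.30
6.0   11.76       33.31
"""

# header=1: the column names are on the 2nd line (python counts from 0)
# sep=r"\s+": values are separated by one or more whitespace characters
toy = pandas.read_csv(io.StringIO(toy_text), header=1, sep=r"\s+")

print(toy)
print(type(toy))  # shows that the table is a pandas DataFrame object
```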

#Now... let's play with the data!

Just like other variables we've worked with, we want to give this data frame a name. Let's call it seal_data.

In [32]:
seal_data = pandas.read_csv('https://raw.githubusercontent.com/pyclub-cu/classes/master/data/ct4-9908-04_ODV.csv', header=1, sep=r'\s+')  # sep=r'\s+' splits on whitespace

Let's say we only want to focus on one of the variables for now - salinity. How do we do that?

There are two ways to select a single column (variable) from a dataframe.

In [36]:
seal_data.Salinity #using dot syntax

0         33.2968
1         33.3088
2         33.3208
3         33.3329
4         33.3449
           ...   
111574     1.0000
111575     1.0000
111576     1.0000
111577     1.0000
111578     1.0000
Name: Salinity, Length: 111579, dtype: float64

In [35]:
seal_data['Salinity'] #using brackets 

0         33.2968
1         33.3088
2         33.3208
3         33.3329
4         33.3449
           ...   
111574     1.0000
111575     1.0000
111576     1.0000
111577     1.0000
111578     1.0000
Name: Salinity, Length: 111579, dtype: float64
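
Both ways return the same thing, but they are not always interchangeable. The sketch below uses a small made-up dataframe (the name `toy` and its values are invented) to show that dot syntax only works when the column name is a valid python name. A column like `QF.2` needs brackets.

```python
import pandas

# Made-up dataframe with two columns, one of which has a dot in its name
toy = pandas.DataFrame({
    "Salinity": [33.30, 33.31, 33.32],
    "QF.2": [0.0, 0.0, None],
})

# Dot syntax and bracket syntax give back the same column
same = toy.Salinity.equals(toy["Salinity"])
print(same)

# toy.QF.2 would be a syntax error because of the dot in the column name,
# so bracket syntax is the way to get this column
flags = toy["QF.2"]
print(flags)
```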

Try using either the dot or bracket syntax to extract Temperature!

In [None]:
#your code here

Notice that depth, temperature, and salinity columns are followed by columns called "QF".

What could that be... (hint: think back to Spencer's lesson!)

Let's remove the bad salinity data!

In [51]:
# keep salinity only where the QF.2 flag is present; every other row becomes NaN
salinity = seal_data.Salinity.where(seal_data['QF.2'].notna())
salinity

0         33.2968
1         33.3088
2         33.3208
3         33.3329
4         33.3449
           ...   
111574        NaN
111575        NaN
111576        NaN
111577        NaN
111578        NaN
Name: Salinity, Length: 111579, dtype: float64
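
What happened there? Here is the same idea in slow motion on a made-up three-row dataframe (the names `toy`, `has_flag`, and `cleaned` are invented for this sketch). `notna()` builds a True/False column, and `where()` keeps values where it is True and fills in NaN where it is False.

```python
import pandas

# Made-up data: the last row has a suspicious salinity and a missing flag
toy = pandas.DataFrame({
    "Salinity": [33.30, 33.31, 1.0],
    "QF.2": [0.0, 0.0, None],
})

# True where a quality flag exists, False where it is missing
has_flag = toy["QF.2"].notna()
print(has_flag)

# Keep salinity where has_flag is True; replace it with NaN everywhere else
cleaned = toy["Salinity"].where(has_flag)
print(cleaned)
```

Notice that the cleaned column still has three rows. The bad value was not deleted, it was replaced with NaN ("not a number").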

What is the minimum, maximum, and mean salinity that this seal has measured?

In [59]:
print('The minimum salinity is ' + str(salinity.min()))
print('The maximum salinity is ' + str(salinity.max()))
print('The mean salinity is ' + str(salinity.mean()))

The minimum salinity is 31.9908
The maximum salinity is 34.2432
The mean salinity is 33.76876016047914
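
You might wonder what the NaN values do to these numbers. By default, pandas skips them when it calculates the minimum, maximum, and mean. Here is a small sketch with made-up values (the name `toy_salinity` is invented) that also shows one way to round the printed result.

```python
import pandas

# Made-up salinity values with one missing (NaN) entry
toy_salinity = pandas.Series([33.0, 34.0, float("nan"), 35.0])

# NaN is ignored, so these use only the three real values
toy_min = toy_salinity.min()
toy_max = toy_salinity.max()
toy_mean = toy_salinity.mean()

print('The minimum salinity is ' + str(toy_min))
print('The maximum salinity is ' + str(toy_max))

# An f-string lets us choose how many decimal places to print
print(f'The mean salinity is {toy_mean:.2f}')
```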


#Let's take a breather. Any questions so far? : )

In [60]:
Image(url='https://cdn.the-scientist.com/assets/articleNo/32598/iImg/6278/e58dd2a0-02b2-4052-9508-4a0145c6f7a4-notebook1.jpg')

Now, try finding the minimum and maximum temperatures yourself. Remember to first create your temperature object, the same way we created the salinity object!

In [None]:
#your code here

How about the deepest depth this seal has dived to?

In [None]:
#your code here