<a href="https://colab.research.google.com/github/pyclub-cu/classes/blob/master/Week_6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Week 6: Using Pandas with real live data!**

Last week we learned about Python packages, and specifically the pandas package. This week we'll continue working with pandas to open and analyze some oceanographic data.

**Learning objectives**
* Learn about CTD data (10 mins)
* Review packages and Pandas (xx mins)
* Open a data file using Pandas (xx mins)
* Calculate statistics on data (xx mins)

### **Icebreaker!**

![Sokka breaking ice](https://media.giphy.com/media/QYwMxfDpoH3VBfPEET/giphy.gif)

> **Question: What does this seal below have in common with an Argo float?** \
Think back to Week 4... what are Argo floats and what do they do?
>
> <img src="https://static.skepticalscience.com/pics/Weddell_Seal_DanCosta.jpg" width="420" height="300" />



# CTDs!! What are they?
Arguably the most important instrument package in oceanography. CTD stands for: 
- **C**onductivity (as in electrical conductivity... which we use to measure salinity! Dissolved salts split into charged ions, so the saltier the water, the better it conducts electricity)
- **T**emperature 
- **D**epth (as calculated from measurements by a pressure sensor! Pressure increases by approximately 10 decibars for every 10 meters you go down from the surface, so if you have a pressure measurement, you have a depth)
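
The pressure-to-depth rule of thumb can be sketched in a couple of lines. This uses only the rough "1 decibar is about 1 meter" approximation from the list above, not the more careful conversion used for real data processing. The function name `approx_depth_m` is made up for this illustration.

```python
def approx_depth_m(pressure_dbar):
    """Rough depth in meters from pressure in decibars (1 dbar is about 1 m)."""
    return pressure_dbar * 1.0

# Try a few pressures measured relative to the sea surface
for pressure in [10, 100, 1000]:
    print(pressure, 'dbar is roughly', approx_depth_m(pressure), 'm deep')
```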

Seal CTD            |  CTD-Rosette | Argo Float CTD
:-------------------------:|:-------------------------:|:-------------------------:
<img src="https://static.skepticalscience.com/pics/Weddell_Seal_DanCosta.jpg" width="200" height="150" />  |  <img src="https://southernoceanscience.files.wordpress.com/2016/04/img_9608.jpg" width="200" height="150" /> |  <img src="https://www.mbari.org/wp-content/uploads/2020/10/soccom-float-carry-640.jpg" width="200" height="150" />


**Temperature and salinity are fundamental to understanding what's happening in the ocean**. How does ocean water move around the Earth? What kind of organisms can live here? How much CO2 can this water hold? What does the ocean do with excess heat from a warming planet?


*Today we'll use Pandas to look at CTD data taken by a seal in Antarctica! Let's review what we learned last week about packages & Pandas.* 



# Review: Packages and Pandas
* Python packages are sets of commands packaged together to help with a specific aspect of data analysis 
  * Think of them like toolboxes

<img src='https://drive.google.com/uc?id=1QH1Jt2iG0ZiBAH99FQSlPm1weSv7St27' width="520" height="300" />


* The pandas package is a toolbox for viewing and performing calculations on data in tables
  
  <img src='https://drive.google.com/uc?id=1ABTetjG6IPdyGKcS-n0OIVRejY6YVffR' width="520" height="300" />

  > **Remember!** Data types in Python are called `objects`. Tables that we work with in pandas are objects called `dataframes`.







Let's quickly revisit the example from last week:

First, how do we `import` pandas?


In [14]:
#Let's all type it together


In [15]:
ocean_basins = ['Arctic', 'Atlantic', 'Indian', 'Pacific', 'Southern'] #What kind of object is this?
avg_salinity = [32, 35, 34.5, 35, 34.7] #What kind of object is this?
avg_temp = [-1.8, 14, 22, 20, 4] #What kind of object is this?

avg_data = {'avg_salinity': avg_salinity, #What kind of object is this?
        'avg_temp': avg_temp}


df = pd.DataFrame(data=avg_data, index=ocean_basins) #What kind of object is this?

In [None]:
df
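
If you're ever unsure what kind of object you are holding, Python's built-in `type()` function will tell you. Here is a small sketch that repeats a shortened version of the example above and asks for the type of a few of the objects:

```python
import pandas as pd

ocean_basins = ['Arctic', 'Atlantic', 'Indian', 'Pacific', 'Southern']
avg_salinity = [32, 35, 34.5, 35, 34.7]
avg_data = {'avg_salinity': avg_salinity}
df = pd.DataFrame(data=avg_data, index=ocean_basins)

# type() reports what kind of object each variable is
print(type(ocean_basins))
print(type(avg_data))
print(type(df))
```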

### Any questions about pandas and dataframes before we continue?


  <img src='https://media.giphy.com/media/z6xE1olZ5YP4I/giphy.gif' width="300" height="200" />


# Import a data file (.csv, .ascii, .txt, etc.) using pandas

We created the dataframe above using lists of data we typed out. But how do we import data from outside of Python, such as a file from a CTD?

## Step 1: Look at our data

Collected by our cute friend in Antarctica!

<img src="https://static.skepticalscience.com/pics/Weddell_Seal_DanCosta.jpg" width="200" height="130" />

Click the link and take a look. \
https://raw.githubusercontent.com/pyclub-cu/classes/master/data/ct4-9908-04_ODV_trimmed.csv

What kind of data do we have? \
How is this data *delimited*?
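
If "delimited" is a new word: the delimiter is whatever character separates one value from the next on a line. As a quick sketch, here are two made-up lines (not copied from the seal file) that hold the same three values, one separated by commas and one by white space:

```python
# Two made-up lines holding the same three values
comma_line = '5,11.827,33.2968'
space_line = '5   11.827   33.2968'

print(comma_line.split(','))  # split the line wherever there is a comma
print(space_line.split())     # split the line on any run of white space
```

Both lines break up into the same three pieces. The only difference is which delimiter we had to tell Python to look for.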

## Step 2: Import the pandas package so we can use it to open our data in Python

>**Reminder:** `import` *nameofpackage* `as` *nickname* 


In [None]:
#your code here

## Step 3: Use pandas' `.read_csv()` command to import and view our *.csv* file

This is like opening a file in excel so that you can work with the data inside!

In [None]:
pd.read_csv('path/filename.csv') #we put inputs to the command inside of the parentheses
#here, the input is the name and path of the file we want to open

Following the syntax above, try importing our CTD data file:


https://raw.githubusercontent.com/pyclub-cu/classes/master/data/ct4-9908-04_ODV_trimmed.csv

Make sure the filename is a string, i.e., in single (') or double (") quotation marks

In [18]:
#your code here
pd.read_csv('https://raw.githubusercontent.com/pyclub-cu/classes/master/data/ct4-9908-04_ODV_trimmed.csv')

Unnamed: 0,// created: 08-Apr-2018 09:14:31,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6
0,mon/day/yr,hh:mm,Longitude,Latitude,Depth,Temperature,Salinity
1,6/11/2004,8:42,-122.899,37.203,5,11.827,33.2968
2,6/11/2004,8:42,-122.899,37.203,6,11.7647,33.3088
3,6/11/2004,8:42,-122.899,37.203,7,11.7024,33.3208
4,6/11/2004,8:42,-122.899,37.203,8,11.6401,33.3329
...,...,...,...,...,...,...,...
111575,9/25/2004,6:44,179.47,43.975,65,5.7521,
111576,9/25/2004,6:44,179.47,43.975,66,5.6333,
111577,9/25/2004,6:44,179.47,43.975,67,5.5146,
111578,9/25/2004,6:44,179.47,43.975,68,5.3959,


We did it! But... looks a little weird, huh. All smooshed together. How can we fix this?

The "read_csv" function can take more inputs than just the file name, including things that tell it how the data file is formatted. For a full list of possible inputs into a function, type it out followed by a question mark.

Execute the cell below. What do you see?

In [None]:
pd.read_csv?
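
The `?` shortcut only works in notebooks (Colab, Jupyter, IPython). As a sketch of an alternative that also works in plain Python, the built-in `help()` function prints a function's documentation, and `inspect.signature` lists just the names of its inputs. This assumes pandas was imported with the nickname `pd`.

```python
import inspect
import pandas as pd

# Collect the names of all the inputs that read_csv accepts
input_names = list(inspect.signature(pd.read_csv).parameters)
print(input_names[:5])

# Check whether a particular input exists
print('header' in input_names)

# help(pd.read_csv) would print the full documentation (it is long!)
```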

To better read in our data file, we are going to tell the function two things:

- The "header line" is the 2nd line of the file
- The data are delimited by white space 

Note the extra function inputs in the cell below and execute!

In [None]:
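#header = 1: the column names are on the 2nd line of the file (Python counts from 0)
#delim_whitespace=True: the columns are separated by white space
pd.read_csv('https://raw.githubusercontent.com/pyclub-cu/classes/master/data/ct4-9908-04_ODV.csv', 
            header = 1, delim_whitespace=True)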

Unnamed: 0,Cruise,Station,Type,mon/day/yr,hh:mm,Longitude,Latitude,Depth,QF,Temperature,QF.1,Salinity,QF.2
0,ct4-9908-04,1,C,06/11/2004,08:42,-122.899,37.203,5.0,0,11.8270,0,33.2968,0.0
1,ct4-9908-04,1,C,06/11/2004,08:42,-122.899,37.203,6.0,0,11.7647,0,33.3088,0.0
2,ct4-9908-04,1,C,06/11/2004,08:42,-122.899,37.203,7.0,0,11.7024,0,33.3208,0.0
3,ct4-9908-04,1,C,06/11/2004,08:42,-122.899,37.203,8.0,0,11.6401,0,33.3329,0.0
4,ct4-9908-04,1,C,06/11/2004,08:42,-122.899,37.203,9.0,0,11.5778,0,33.3449,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
111574,ct4-9908-04,260,C,09/25/2004,06:44,179.470,43.975,65.0,0,5.7521,0,1.0000,
111575,ct4-9908-04,260,C,09/25/2004,06:44,179.470,43.975,66.0,0,5.6333,0,1.0000,
111576,ct4-9908-04,260,C,09/25/2004,06:44,179.470,43.975,67.0,0,5.5146,0,1.0000,
111577,ct4-9908-04,260,C,09/25/2004,06:44,179.470,43.975,68.0,0,5.3959,0,1.0000,


Phew, looks much better : ) This table is a pandas "dataframe". Dataframes are Python objects, just like strings and integers are objects.
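
To see what those two extra inputs do on their own, here is a small sketch that reads a few made-up lines of text laid out like our file (a comment line, then the column names, then the data). The sample text and the name `sample` are invented for illustration. The sketch uses `sep=r'\s+'`, which recent pandas versions recommend in place of `delim_whitespace=True`; both mean "split the columns on white space".

```python
import io
import pandas as pd

# Made-up text: a comment line, then column names, then two rows of data
sample_text = """// made-up sample
Depth Temperature Salinity
5     11.8   33.3
6     11.7   33.4
"""

sample = pd.read_csv(io.StringIO(sample_text),
                     header=1,    # column names are on the 2nd line (counting starts at 0)
                     sep=r'\s+')  # columns are separated by runs of white space
print(sample)
```

`io.StringIO` just lets us hand `read_csv` a string as if it were a file, so the example runs without downloading anything.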

# Now... let's play with the data!

Just like other variables we've worked with, we want to give this data frame a name. Let's call it seal_data.

In [None]:
seal_data = pd.read_csv('https://raw.githubusercontent.com/pyclub-cu/classes/master/data/ct4-9908-04_ODV.csv', 
                        header = 1, delim_whitespace=True)
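
Once a dataframe has a name, a few quick commands help you get a feel for it. The sketch below uses a tiny made-up dataframe called `toy` as a stand-in, so the numbers are not real seal data; the same commands should work on `seal_data`.

```python
import pandas as pd

# Made-up stand-in for seal_data with just a few rows
toy = pd.DataFrame({'Depth': [5, 6, 7, 8],
                    'Temperature': [11.8, 11.7, 11.6, 11.5],
                    'Salinity': [33.3, 33.4, 33.5, 33.6]})

print(toy.shape)          # (number of rows, number of columns)
print(list(toy.columns))  # the column names
print(toy.head(2))        # the first 2 rows
```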

Let's say we only want to focus on one of the variables for now: salinity. How do we do that?

There are two ways to index dataframe variables.

In [None]:
seal_data.Salinity #using dot syntax

0         33.2968
1         33.3088
2         33.3208
3         33.3329
4         33.3449
           ...   
111574     1.0000
111575     1.0000
111576     1.0000
111577     1.0000
111578     1.0000
Name: Salinity, Length: 111579, dtype: float64

In [None]:
seal_data['Salinity'] #using brackets 

0         33.2968
1         33.3088
2         33.3208
3         33.3329
4         33.3449
           ...   
111574     1.0000
111575     1.0000
111576     1.0000
111577     1.0000
111578     1.0000
Name: Salinity, Length: 111579, dtype: float64
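
The two syntaxes give the same result, with one practical difference: the dot syntax only works when the column name could also be a valid Python name. A column like `QF.2` (which has a dot in it) needs brackets. Here is a sketch using a tiny made-up dataframe that borrows two column names from our file:

```python
import pandas as pd

# Made-up rows using two of the column names from our file
toy = pd.DataFrame({'Salinity': [33.3, 33.4, 1.0],
                    'QF.2': [0.0, 0.0, None]})

# Both syntaxes return the same column
print(toy.Salinity.equals(toy['Salinity']))

# A name containing a dot (or a space or slash) only works with brackets
print(toy['QF.2'])
```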

Try using either the dot or bracket syntax to extract Temperature!

In [None]:
#your code here
seal_data.Temperature

0         11.8270
1         11.7647
2         11.7024
3         11.6401
4         11.5778
           ...   
111574     5.7521
111575     5.6333
111576     5.5146
111577     5.3959
111578     5.2772
Name: Temperature, Length: 111579, dtype: float64

Notice that depth, temperature, and salinity columns are followed by columns called "QF".

What could that be... (hint: think back to Spencer's lesson!)

*** Skip QF and just remove salinity data fresher than some threshold (like Spencer's lesson with temp)

Also remove extraneous columns from data file

Let's remove the bad salinity data!

In [None]:
#keep salinity only where the quality flag (QF.2) is not missing; everything else becomes NaN
salinity = seal_data['Salinity'].where(seal_data['QF.2'].notna())
salinity
                                  

0         33.2968
1         33.3088
2         33.3208
3         33.3329
4         33.3449
           ...   
111574        NaN
111575        NaN
111576        NaN
111577        NaN
111578        NaN
Name: Salinity, Length: 111579, dtype: float64
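
What did `.where()` just do? It keeps a value wherever the condition is `True` and swaps in `NaN` ("not a number", pandas' marker for missing data) wherever it is `False`. Here is a sketch of the two steps on a tiny made-up dataframe, where the last salinity value is bad and has no quality flag:

```python
import pandas as pd

# Made-up values: the last salinity is bad and has no quality flag
toy = pd.DataFrame({'Salinity': [33.3, 33.4, 1.0],
                    'QF.2': [0.0, 0.0, None]})

has_flag = toy['QF.2'].notna()           # True where a flag exists
print(has_flag)

clean = toy['Salinity'].where(has_flag)  # keep values where True, NaN elsewhere
print(clean)
```

Notice that the column keeps its original length. The bad value is masked out rather than deleted, so the rows still line up with the rest of the dataframe.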

What are the minimum, maximum, and mean salinity that this seal has measured?

Work with temperature instead

In [None]:
salinity.min() 

31.9908

In [None]:
print('The minimum salinity is ' + str(salinity.min()))
print('The maximum salinity is ' + str(salinity.max()))
print('The mean salinity is ' + str(salinity.mean()))

The minimum salinity is 31.9908
The maximum salinity is 34.2432
The mean salinity is 33.76876016047914
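
Notice that the `NaN` values we created earlier did not break the calculations. By default, pandas skips missing values in `.min()`, `.max()` and `.mean()`. Here is a sketch with a few made-up salinity values (one of them missing) that also shows `round()` for tidier printing and `.count()` for the number of non-missing values:

```python
import pandas as pd

# Made-up salinity values with one missing (NaN) entry
toy_salinity = pd.Series([33.0, 34.0, None, 35.0])

print('The minimum salinity is ' + str(toy_salinity.min()))
print('The mean salinity is ' + str(round(toy_salinity.mean(), 2)))
print('Number of non-missing values: ' + str(toy_salinity.count()))
```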


# Let's take a breather. Any questions so far? : )

In [None]:
from IPython.display import Image #needed to display an image from a URL
Image(url='https://cdn.the-scientist.com/assets/articleNo/32598/iImg/6278/e58dd2a0-02b2-4052-9508-4a0145c6f7a4-notebook1.jpg')

Now, try finding the minimum and maximum temperatures yourself. Remember to first create your temperature object, the same way we created the salinity object!

In [None]:
#your code here

How about the deepest depth this seal has dived? 

In [None]:
#your code here