<a href="https://colab.research.google.com/github/pyclub-cu/classes/blob/master/Week_6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Week 6: Using Pandas with real live data!**

Last week we learned about Python packages, and specifically, the Pandas package. This week we'll continue working with Pandas to open and analyze some oceanographic data.

**Learning objectives**
* Learn about CTD data (10 mins)
* Review packages and Pandas (xx mins)
* Open a data file using Pandas (xx mins)
* Calculate statistics on data (xx mins)

### **Icebreaker!**

![Sokka breaking ice](https://media.giphy.com/media/QYwMxfDpoH3VBfPEET/giphy.gif)

> **Question: What does this seal below have in common with an Argo float?** \
Think back to Week 4... what are Argo floats and what do they do?
>
> <img src="https://static.skepticalscience.com/pics/Weddell_Seal_DanCosta.jpg" width="420" height="300" />



# CTDs!! What are they?
Arguably the most important instrument package in oceanography. CTD stands for: 
- **C**onductivity (electrical conductivity, which we use to measure salinity! Salts dissolve in seawater as charged ions, and the more ions there are, the better the water conducts electricity)
- **T**emperature 
- **D**epth (calculated from a pressure sensor's measurements! Pressure increases by approximately 10 decibars for every 10 meters you go down from the surface, so if you have a pressure measurement, you can calculate depth)
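The "10 decibars every 10 meters" rule comes from hydrostatic pressure, *p = ρgh*. Here is a rough sketch that **assumes** a constant seawater density of 1025 kg/m³. Real seawater density changes with temperature, salinity, and pressure, so careful conversions use more detailed formulas. The function names are made up for this example.

```python
# Rough hydrostatic pressure <-> depth conversion (a sketch, not a standard formula).
# Assumption: seawater density is constant at 1025 kg/m^3.
RHO_SEAWATER = 1025.0   # kg/m^3 (assumed typical value)
G = 9.81                # m/s^2, gravitational acceleration
PA_PER_DBAR = 10_000.0  # 1 decibar = 10,000 pascals


def pressure_dbar_from_depth(depth_m):
    """Water pressure (not counting the atmosphere) at depth_m meters, in decibars."""
    return RHO_SEAWATER * G * depth_m / PA_PER_DBAR


def depth_from_pressure_dbar(pressure_dbar):
    """Invert the formula above: depth in meters from pressure in decibars."""
    return pressure_dbar * PA_PER_DBAR / (RHO_SEAWATER * G)


for depth in [10, 100, 1000]:
    print(f'{depth} m  ->  about {pressure_dbar_from_depth(depth):.1f} dbar')
```

Each 10 m adds roughly 10 dbar. That is why a pressure reading in decibars is close to the depth in meters.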

Seal CTD            |  CTD-Rosette | Argo Float CTD
:-------------------------:|:-------------------------:|:-------------------------:
<img src="https://static.skepticalscience.com/pics/Weddell_Seal_DanCosta.jpg" width="200" height="150" />  |  <img src="https://southernoceanscience.files.wordpress.com/2016/04/img_9608.jpg" width="200" height="150" /> |  <img src="https://www.mbari.org/wp-content/uploads/2020/10/soccom-float-carry-640.jpg" width="200" height="150" />


**Temperature and salinity are fundamental to understanding what's happening in the ocean**. How does ocean water move around the Earth? What kinds of organisms can live there? How much CO₂ can this water hold? What does the ocean do with excess heat from a warming planet?


*Today we'll use Pandas to look at CTD data collected by a seal! Let's review what we learned last week about packages & Pandas.* 



# Review: Packages and Pandas
* Python packages are sets of commands packaged together to help with a specific aspect of data analysis 
  * Think of them like toolboxes

<img src='https://drive.google.com/uc?id=1QH1Jt2iG0ZiBAH99FQSlPm1weSv7St27' width="520" height="300" />


* The pandas package is a toolbox for viewing and performing calculations on data in tables
  
  <img src='https://drive.google.com/uc?id=1ABTetjG6IPdyGKcS-n0OIVRejY6YVffR' width="520" height="300" />

  > **Remember!** Everything in Python is an `object`, and every object has a type. Tables that we work with in pandas are objects of type `DataFrame` (we'll usually just call them dataframes).
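Python's built-in `type()` function tells you what kind of object you are holding. Here is a quick sketch using a cut-down version of the ocean basin table from last week:

```python
import pandas as pd

# A small table like last week's example (only two basins, for brevity)
df = pd.DataFrame({'avg_salinity': [32, 35], 'avg_temp': [-1.8, 14]},
                  index=['Arctic', 'Atlantic'])

print(type(df))                  # the whole table is a DataFrame
print(type(df['avg_salinity']))  # a single column is a Series
```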







Let's quickly revisit the example from last week:

First, how do we `import` pandas?


In [14]:
#Let's all type it together


In [15]:
ocean_basins = ['Arctic', 'Atlantic', 'Indian', 'Pacific', 'Southern'] #What kind of object is this?
avg_salinity = [32, 35, 34.5, 35, 34.7] #What kind of object is this?
avg_temp = [-1.8, 14, 22, 20, 4] #What kind of object is this?

avg_data = {'avg_salinity': avg_salinity, #What kind of object is this?
        'avg_temp': avg_temp}


df = pd.DataFrame(data=avg_data, index=ocean_basins) #What kind of object is this?

In [None]:
df

### Any questions about pandas and dataframes before we continue?


  <img src='https://media.giphy.com/media/z6xE1olZ5YP4I/giphy.gif' width="300" height="200" />


# Import a data file (.csv, .ascii, .txt, etc.) using pandas

We created the dataframe above using lists we typed out. But how do we import data from outside of Python, such as a file from a CTD?

## Step 1: Look at our data

Collected by a seal like our cute friend!

<img src="https://static.skepticalscience.com/pics/Weddell_Seal_DanCosta.jpg" width="200" height="130" />

Click the link and take a look. \
https://raw.githubusercontent.com/pyclub-cu/classes/master/data/ct4-9908-04_ODV_trimmed.csv

What kind of data do we have? \
How is this data *delimited*?
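The delimiter is the character that separates one column from the next. `pd.read_csv()` expects commas by default. For any other delimiter you pass the `sep` argument. The two tiny tables below are hypothetical, typed in as strings with `io.StringIO` so the example runs without downloading anything:

```python
import io
import pandas as pd

# The same small table written two ways (hypothetical values)
comma_text = "Depth,Temperature\n5,11.827\n6,11.7647\n"
tab_text = "Depth\tTemperature\n5\t11.827\n6\t11.7647\n"

comma_df = pd.read_csv(io.StringIO(comma_text))         # comma is the default
tab_df = pd.read_csv(io.StringIO(tab_text), sep='\t')  # tell pandas about tabs

print(comma_df.equals(tab_df))  # both tables are identical

# Forgetting sep on tab-delimited text lumps everything into one column
wrong = pd.read_csv(io.StringIO(tab_text))
print(wrong.columns.tolist())
```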

## Step 2: Import the pandas package so we can use it to open our data in Python

>**Reminder:** `import` *nameofpackage* `as` *nickname* 
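The same pattern works for any installed package. For example, NumPy is a package for numerical arrays, nicknamed `np` by convention. Here it averages the basin salinities from the review example:

```python
# import <package> as <nickname>: here NumPy, nicknamed np by convention
import numpy as np

basin_salinity = [32, 35, 34.5, 35, 34.7]  # the avg_salinity list from the review
print(np.mean(basin_salinity))  # the nickname np stands in for numpy
```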


In [None]:
#your code here

## Step 3: Use pandas' `read_csv()` function to import and view our *.csv* file

This is like opening a file in Excel so that you can work with the data inside!


In [None]:
nameofdataframe = pd.read_csv('path/filename.csv') 


Following the syntax above, try to import our CTD data file: 


https://raw.githubusercontent.com/pyclub-cu/classes/master/data/ct4-9908-04_ODV_trimmed.csv

Make sure the filename is a string, i.e., in single (') or double (") quotation marks.

In [18]:
#your code here
seal_data = pd.read_csv('https://raw.githubusercontent.com/pyclub-cu/classes/master/data/ct4-9908-04_ODV_trimmed.csv')
seal_data

| | // created: 08-Apr-2018 09:14:31 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 |
|---|---|---|---|---|---|---|---|
| 0 | mon/day/yr | hh:mm | Longitude | Latitude | Depth | Temperature | Salinity |
| 1 | 6/11/2004 | 8:42 | -122.899 | 37.203 | 5 | 11.827 | 33.2968 |
| 2 | 6/11/2004 | 8:42 | -122.899 | 37.203 | 6 | 11.7647 | 33.3088 |
| 3 | 6/11/2004 | 8:42 | -122.899 | 37.203 | 7 | 11.7024 | 33.3208 |
| 4 | 6/11/2004 | 8:42 | -122.899 | 37.203 | 8 | 11.6401 | 33.3329 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 111575 | 9/25/2004 | 6:44 | 179.47 | 43.975 | 65 | 5.7521 | NaN |
| 111576 | 9/25/2004 | 6:44 | 179.47 | 43.975 | 66 | 5.6333 | NaN |
| 111577 | 9/25/2004 | 6:44 | 179.47 | 43.975 | 67 | 5.5146 | NaN |
| 111578 | 9/25/2004 | 6:44 | 179.47 | 43.975 | 68 | 5.3959 | NaN |


We did it! But what's up with the header line?

The `read_csv()` function can take more inputs (called *arguments*) than just the file name, including ones that tell it how the data file is formatted. To see the full list of arguments a function accepts, type its name followed by a question mark.

Execute the cell below. What do you see?

In [21]:
pd.read_csv?
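The `?` shortcut is a Jupyter/IPython feature, and it works in Colab too. In a plain Python script or terminal, the built-in `help()` function shows the same documentation:

```python
import pandas as pd

# Prints the documentation for read_csv; works in any Python session
help(pd.read_csv)
```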

To read our data file correctly, we'll tell the function that `header=1`. In plain English, that means the column names are on the 2nd line of the file. (Python counts from 0, so line 1 is the second line.)
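Here is a way to see exactly what `header` changes, with no download needed. The sample text below is **reconstructed from the output shown above**. It assumes the real file starts with a one-line comment, then the column names, then the data. Only two data rows are included.

```python
import io
import pandas as pd

# A few lines shaped like the top of the seal file (reconstructed, not the full file)
sample = """// created: 08-Apr-2018 09:14:31,,,,,,
mon/day/yr,hh:mm,Longitude,Latitude,Depth,Temperature,Salinity
6/11/2004,8:42,-122.899,37.203,5,11.827,33.2968
6/11/2004,8:42,-122.899,37.203,6,11.7647,33.3088
"""

# header=0 (the default): the comment line becomes the column names
wrong = pd.read_csv(io.StringIO(sample))
print(list(wrong.columns))

# header=1: use the 2nd line for column names; the line above it is dropped
right = pd.read_csv(io.StringIO(sample), header=1)
print(list(right.columns))

# skiprows=1 gives the same result: skip the first line, then read the header
also = pd.read_csv(io.StringIO(sample), skiprows=1)
print(right.equals(also))
```

With the default, the real column names end up as row 0 of the data. That is why every column came out as text (or "Unnamed") in the first attempt.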


In [35]:
seal_data = pd.read_csv('https://raw.githubusercontent.com/pyclub-cu/classes/master/data/ct4-9908-04_ODV_trimmed.csv', 
                header = 1)
seal_data

| | mon/day/yr | hh:mm | Longitude | Latitude | Depth | Temperature | Salinity |
|---|---|---|---|---|---|---|---|
| 0 | 6/11/2004 | 8:42 | -122.899 | 37.203 | 5 | 11.8270 | 33.2968 |
| 1 | 6/11/2004 | 8:42 | -122.899 | 37.203 | 6 | 11.7647 | 33.3088 |
| 2 | 6/11/2004 | 8:42 | -122.899 | 37.203 | 7 | 11.7024 | 33.3208 |
| 3 | 6/11/2004 | 8:42 | -122.899 | 37.203 | 8 | 11.6401 | 33.3329 |
| 4 | 6/11/2004 | 8:42 | -122.899 | 37.203 | 9 | 11.5778 | 33.3449 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 111574 | 9/25/2004 | 6:44 | 179.470 | 43.975 | 65 | 5.7521 | NaN |
| 111575 | 9/25/2004 | 6:44 | 179.470 | 43.975 | 66 | 5.6333 | NaN |
| 111576 | 9/25/2004 | 6:44 | 179.470 | 43.975 | 67 | 5.5146 | NaN |
| 111577 | 9/25/2004 | 6:44 | 179.470 | 43.975 | 68 | 5.3959 | NaN |




### Let's take a breather. Any questions so far? : ) 
  <img src='https://cdn.the-scientist.com/assets/articleNo/32598/iImg/6278/e58dd2a0-02b2-4052-9508-4a0145c6f7a4-notebook1.jpg' width="520" height="300" />


# Time for data analysis!



Let's focus on only one of the variables for now: salinity. How do we do that? 

There are two ways to *index* (select) a column of a dataframe.

> **Remember**! *Indexing* a data object means to retrieve a specific chunk of data. In Battleship, you *index* the playing board by saying "B6", which means row B, column 6.  
<img src='https://www.videoamusement.com/wp-content/uploads/2019/01/Giant-Battleship-Game-for-rent.jpg' width="320" height="300" />
 

In [39]:
salinity = seal_data.Salinity  # using dot syntax
salinity  # display the new data array

0         33.2968
1         33.3088
2         33.3208
3         33.3329
4         33.3449
           ...   
111574        NaN
111575        NaN
111576        NaN
111577        NaN
111578        NaN
Name: Salinity, Length: 111579, dtype: float64

In [43]:
salinity = seal_data['Salinity'] #using brackets 

> **Question:** What is the data frame `object` and what is the data array `object` here? 

> **Try it!** Use either the dot or bracket syntax to extract Temperature. Don't forget to assign your data array a name!

In [41]:
#your code here
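When should you pick one syntax over the other? Dot syntax only works when the column name is a valid Python name. Our date column, `mon/day/yr`, contains slashes, so it needs brackets. The tiny dataframe below is a hypothetical stand-in for `seal_data`, using values from its first rows:

```python
import pandas as pd

# A tiny stand-in for seal_data (values copied from its first rows)
df = pd.DataFrame({
    'mon/day/yr': ['6/11/2004', '6/11/2004'],
    'Depth': [5, 6],
    'Salinity': [33.2968, 33.3088],
})

# Dot and bracket syntax return the same column
a = df.Salinity
b = df['Salinity']
print(a.equals(b))

# 'mon/day/yr' is not a valid Python name, so only brackets work here
dates = df['mon/day/yr']
print(dates)
```

Brackets are also the safer choice when a column name matches a dataframe method, such as a column called `min` or `count`. In that case, dot syntax gives you the method, not the column.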


### Great! Now we have our salinity and temperature data arrays.
What is the minimum, maximum, and mean salinity that this seal has measured?

In [44]:
salinity.min()  # try .max() and .mean() too

31.9908

In [45]:
print('The minimum salinity is ' + str(salinity.min()))
print('The maximum salinity is ' + str(salinity.max()))
print('The mean salinity is ' + str(salinity.mean()))

The minimum salinity is 31.9908
The maximum salinity is 34.2432
The mean salinity is 33.76876016047914
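That long decimal on the mean is hard to read. One alternative to `+ str(...)` is an f-string, which drops values straight into text and can round them for display. The three values below are hypothetical stand-ins, not the seal's full record:

```python
import pandas as pd

salinity = pd.Series([33.2968, 33.3088, 33.3208], name='Salinity')  # hypothetical sample

# f-strings insert values directly; ':.2f' rounds to 2 decimal places for display
print(f'The minimum salinity is {salinity.min():.2f}')
print(f'The maximum salinity is {salinity.max():.2f}')
print(f'The mean salinity is {salinity.mean():.2f}')
```

The rounding only affects what is printed. The number stored in `salinity` keeps its full precision.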


Do these values seem reasonable? Do we need to apply *quality control* (Week 4)?
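Keep one thing in mind when judging these numbers: the last rows of `Salinity` are `NaN` (missing). By default, pandas skips `NaN` in `.min()`, `.max()`, and `.mean()`, so it is worth checking how many values were actually used. Here is a small sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical salinity values; the NaN mimics the missing values at the end of the seal file
salinity = pd.Series([33.2968, 33.3088, 33.3208, np.nan], name='Salinity')

print(salinity.min(), salinity.max(), salinity.mean())  # NaN is skipped by default
print(salinity.count())       # number of non-missing values -> 3
print(salinity.isna().sum())  # number of missing values -> 1
print(salinity.describe())    # count, mean, std, min, quartiles, max in one go
```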

> **Try it!** Find the minimum, maximum, and mean temperature that this seal has measured. 

In [None]:
#your code here

>**Extra credit:** How about the maximum depth this seal has dived to?

In [None]:
#your code here