<a href="https://colab.research.google.com/github/pyclub-cu/classes/blob/master/Week_5_Salinity_Pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Week 5: Ocean Salinity, Packages and Pandas**

**Learning Goals**
- Learn about ocean salinity (10 min)
- What are Python packages? (5 min)
- `import` essential packages (5 min)
- Let's play with `pandas`! (20 min)

### **Icebreaker!**
> **How many oceans are there?** 🤔...

##### Sort of a trick question...

![Five Oceans](https://wakeuptodigital.files.wordpress.com/2019/02/oceans.jpg)

> In the field of oceanography there are five major ocean basins on this planet: Arctic, Atlantic, Indian, Pacific and Southern Ocean. 
>
> Oceanographers go even further and study the different kinds of "*water masses*" in the basins, and their influence on the Earth's system.

> **We generally define water masses by the water's temperature and salinity profile.** (To be continued in Week 8...)

# **Recap of Logical Statements**



In [None]:
#Boolean 
indian_ocean = 30 #˚C
atlantic_ocean = 28 #˚C

indian_ocean == atlantic_ocean

In [None]:
# in statement
oceans = ['Arctic','Atlantic','Pacific','Indian']

In [None]:
#check to see if the Southern Ocean is included in our `oceans` variable
'Southern' in oceans

In [None]:
# if statement

temp1 = 30 #˚C
temp2 = 0 #˚C

if (temp1 != temp2):
  print('temp1 does not equal temp2')

> **We measure ocean temperature and depth with Argo floats. What else do you think an Argo float measures in the water?**


# **Ocean Salinity**

### Salt vs Salinity

> **Salts** are compounds like sodium chloride, magnesium sulfate, potassium nitrate, and sodium bicarbonate which dissolve into ions.

> **Salinity** is the amount of dissolved salt in the water. It is measured as a mass fraction: the ratio of the mass of dissolved salts (g) to the mass of seawater (kg). Salinity is often reported in "practical salinity units" (psu), a conductivity-based scale whose values are numerically close to $\frac{g}{kg}$.
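
As a rough sketch of the mass-fraction idea, the snippet below computes salinity in g/kg from a mass of dissolved salt and a mass of seawater. The function name `salinity_g_per_kg` and the sample masses are illustrative assumptions, not measurements.

```python
def salinity_g_per_kg(salt_mass_g, seawater_mass_kg):
    """Return salinity as grams of dissolved salt per kilogram of seawater."""
    if seawater_mass_kg <= 0:
        raise ValueError("seawater mass must be positive")
    return salt_mass_g / seawater_mass_kg

# Hypothetical sample: 70 g of dissolved salt in 2 kg of seawater
sample_salinity = salinity_g_per_kg(70, 2)
print(sample_salinity)
```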

<img width=800 src="https://smap.jpl.nasa.gov/system/news_items/main_images/1265_SMAP_salinity.jpg">

> **Based on the figure above...**
>
> **1) Which ocean basin is the saltiest/freshest?**
>
> **2) What is the average salinity of the ocean?**

# **Python Packages**

[Python Package Index (PyPI)](https://pypi.org/)

<img src='https://drive.google.com/uc?id=1nNDhReHl4IfoqLcrZR-40I2sVdZNqxvY' width="520" height="300" />

* A module is a file containing Python code.
* A package is a directory that holds sub-packages and modules. A regular package contains an `__init__.py` file; a module does not need one.
* Packages are a way of structuring Python's module namespace by using "dotted module names" (e.g. `A.B`, as in `matplotlib.pyplot`).
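
To make the module/package distinction concrete, here is a small sketch using the standard library's `json` package (chosen only for illustration). The helper `is_package` is a hypothetical name; it relies on the fact that packages expose a `__path__` attribute while plain modules do not.

```python
import json           # a package: a directory containing __init__.py
import json.decoder   # a module inside that package, reached by a dotted name

def is_package(module):
    """Packages expose a __path__ attribute; plain modules do not."""
    return hasattr(module, "__path__")

print(is_package(json))          # the json package
print(is_package(json.decoder))  # a single module file
```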

#### **Essential Python Packages:**

- `numpy`
- `pandas`
- `matplotlib`

> **How do we get packages into our notebooks?**
>
> We `import` them!

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline 
#so that you can see the plot load right under the cell you're running

# **Pandas**

<img width="400" src='https://miro.medium.com/max/1400/1*KdxlBR9P3mDp9JZ_URMdYQ.jpeg'>

>No but seriously, `pandas` is a Python library for efficient, high-performance analysis of _tabular_ data (i.e. Excel-sheet-like data).

### **There are two main data structures in pandas**

> **Data Series** (`pd.Series`): 1-dimensional array of values with an index
>
> **Data Frame** (`pd.DataFrame`): 2-dimensional array of values with a row and a column index

<img width="500" src='https://miro.medium.com/max/1400/1*o5c599ueURBTZWDGmx1SiA.png'>
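
A minimal sketch of the difference between the two structures, assuming `pandas` is installed. The variable names `salinity` and `table` are illustrative; the values are a two-basin subset of the numbers used later in this notebook.

```python
import pandas as pd

# A Series: one column of values plus an index
salinity = pd.Series([32, 35], index=['Arctic', 'Atlantic'], name='avg_salinity')

# A DataFrame: several columns sharing the same row index
table = pd.DataFrame({'avg_salinity': [32, 35], 'avg_temp': [-1.8, 14]},
                     index=['Arctic', 'Atlantic'])

print(salinity.ndim, salinity.shape)   # dimensions and shape of the Series
print(table.ndim, table.shape)         # dimensions and shape of the DataFrame
print(type(table['avg_salinity']))     # each column of a DataFrame is a Series
```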

> Any time you need more information on a package or function, add `?` after its name.

In [None]:
pd?

#### **Example of a pandas Data Series:**

In [None]:
ocean_basins = ['Arctic', 'Atlantic', 'Indian', 'Pacific', 'Southern']
avg_salinity = [32, 35, 34.5, 35, 34.7]
ds = pd.Series(data=avg_salinity, index=ocean_basins, name="Ocean basins' average salinities")

In [None]:
ds

In [None]:
# To figure out the index do:
ds.index

**Indexing**

> We can get values back out using the index via the `.loc` attribute

In [None]:
ds.loc['Southern']

> Or by raw position using `.iloc`

In [None]:
ds.iloc[4]

> We can pass a list or array to `loc` to get multiple rows back:

In [None]:
ds.loc[['Arctic', 'Atlantic']]

> And we can even use slice notation

In [None]:
ds.loc['Arctic':'Indian']

In [None]:
ds.iloc[:3]
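
One detail worth checking for yourself: a label slice with `.loc` includes its end label, while a position slice with `.iloc` stops before its end position, like ordinary Python slicing. The sketch below rebuilds the same Series so it runs on its own; `by_label` and `by_position` are illustrative names.

```python
import pandas as pd

ocean_basins = ['Arctic', 'Atlantic', 'Indian', 'Pacific', 'Southern']
avg_salinity = [32, 35, 34.5, 35, 34.7]
ds = pd.Series(data=avg_salinity, index=ocean_basins)

by_label = ds.loc['Arctic':'Indian']  # label slice: includes 'Indian'
by_position = ds.iloc[0:2]            # position slice: stops before position 2

print(list(by_label.index))
print(list(by_position.index))
```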

> **Print the following statements using your choice of indexing/slicing.**

In [None]:
#This might be a little tricky, so here is an example.
print('My research area takes place in the ocean basin that has an average salinity of ' 
      + str(ds.loc['Southern']) 
      + ' psu. That is, the ' 
      + ds.index[-1] 
      + ' ocean.')

In [None]:
# What is the saltiest ocean basin?
print('<Your answer here>')

In [None]:
# Which ocean basins have the same average salinity?
print('<Your answer here>')

In [None]:
# Which ocean basin would you like to explore?
print('<Your answer here>')

#### **Example of a pandas Data Frame:**

In [None]:
#first create a dictionary
ocean_basins = ['Arctic', 'Atlantic', 'Indian', 'Pacific', 'Southern']
avg_salinity = [32, 35, 34.5, 35, 34.7]
avg_temp = [-1.8, 14, 22, 20, 4]

avg_data = {'avg_salinity': avg_salinity,
        'avg_temp': avg_temp}


df = pd.DataFrame(data=avg_data, index=ocean_basins)

In [None]:
df

In [None]:
df.info()

> You can use many statistical functions on both Series and DataFrames.

In [None]:
df.min()

In [None]:
df.max()

In [None]:
df.mean()
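
As a small sketch of what these calls return: on a DataFrame, a statistic such as `mean()` gives one value per column (as a Series), while on a single column it gives one number. The data are the same as above; `column_means` and `salinity_mean` are illustrative names.

```python
import pandas as pd

ocean_basins = ['Arctic', 'Atlantic', 'Indian', 'Pacific', 'Southern']
df = pd.DataFrame({'avg_salinity': [32, 35, 34.5, 35, 34.7],
                   'avg_temp': [-1.8, 14, 22, 20, 4]},
                  index=ocean_basins)

column_means = df.mean()                   # a Series indexed by column name
salinity_mean = df['avg_salinity'].mean()  # a single number

print(column_means)
print(salinity_mean)
```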

> Or, if you want all the basic stats, you can call `describe()`


In [None]:
df.describe()

> We can get a single column as a Series using python's getitem syntax on the DataFrame object.

In [None]:
df['avg_salinity']

> or using attribute syntax.

In [None]:
df.avg_salinity
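
Attribute syntax is convenient, but it only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method. The sketch below uses a hypothetical DataFrame `df_demo` with a column named `count` to show the clash; bracket syntax is the safer default.

```python
import pandas as pd

# Hypothetical column names chosen to show where attribute access breaks down
df_demo = pd.DataFrame({'avg_temp': [-1.8, 14], 'count': [3, 5]},
                       index=['Arctic', 'Atlantic'])

print(type(df_demo.avg_temp))   # works: 'avg_temp' does not clash with anything
print(type(df_demo['count']))   # brackets always return the column
print(callable(df_demo.count))  # attribute access finds the DataFrame.count method instead
```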

> **Find the following in this dataframe:**

In [None]:
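# Which ocean basin has the coldest average temperature? What is that temperature?
# idxmin() returns the index label of the minimum; min() returns the value itself
df.avg_temp.idxmin(), df.avg_temp.min()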

In [None]:
df.avg_salinity.plot(kind='bar')

In [None]:
df.plot(kind='bar')

**Learning Goals**
- Learn about ocean salinity 
- What are Python packages?
- `import` essential packages 
- Let's play with `pandas`!