# <center> MAS 332L Lab 3: Visualizing Salinity Data in the Ocean
    

### Learning Objectives

You will practice skills in:

* Reading in data
    
* Filtering data and calculating basic summary statistics
    
* Plotting

Hopefully, you've noticed a pattern developing in how we process data in Python. 

In general, we do the following:

1. Upload and read in the data file

2. Clean up the data. This can involve removing NaNs, pulling out certain dates, removing excess columns, converting time metadata to timesteps, etc. 

3. Manipulate the data to find the parameters of interest, such as averages, ranges, etc.

4. Display the data

We will be doing all of that today.
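If it helps to see the four steps in one place, here is a minimal sketch on a tiny made-up table. The `salinity1_avg` column name matches the one used later in this lab; the `sample` column, the `raw_text` variable, and all of the values are invented for illustration and are not taken from the assignment data.

```python
import io
import pandas as pd

# 1. Read in the data (a made-up CSV held in a string instead of a file)
raw_text = """sample,salinity1_avg
1,30.0
2,31.0
3,
4,32.0
"""
df = pd.read_csv(io.StringIO(raw_text))

# 2. Clean the data: drop the row where the salinity value is missing (NaN)
clean = df.dropna(subset=["salinity1_avg"])

# 3. Calculate a parameter of interest
average = clean["salinity1_avg"].mean()

# 4. Display the result
print("Average salinity (psu):", round(average, 1))
```

The real lab follows the same order, but cleans the data with a quality-control flag rather than by dropping NaNs.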

First, upload the data included as part of this assignment. Filter out bad sensor data, as indicated by the quality control data, using a mask. Find the average, min, max, and standard deviation for each station in each time period. Then, make a box-and-whisker plot of the data for better visualization. I have given you much of the required code in an example for one dataset (Dauphin Island Winter); you will repeat the code for the remaining three datasets. 

## Before starting:

#### Assignment Details

We are asking you to submit both a copy of the notebook and images of the boxplot you will generate. 

#### Commenting Code

As we progress through our Python assignments, it will become increasingly important to comment your code. Comments let you revisit your code later and make it easier for someone else to look it over and help. 

#### Please acknowledge that you understand the instructions by copying and pasting each of the following into the next cells.

#I understand how to save my progress and reopen the notebook.

#I understand that I am being asked to save and submit copies of my notebook as well as my boxplot.

#I understand that I need to comment my code. 

### Let's get going!

Before we get started, we need to import the packages we require to retrieve, manipulate, and plot the data. 

In [None]:
#Import the NumPy package and call it 'np'
import numpy as np

#Import Pandas and alias it as 'pd'
import pandas as pd

#import matplotlib's pyplot module, call it plt
import matplotlib.pyplot as plt

Now, let's read in the data so Python knows what we're trying to work with.

I've shown the example for the Dauphin Island Winter data. 

In [None]:
#define a filename for the data files to more easily read in 
DIWinter = "Dauphin_Island_hyd january.csv"

#create a separate data frame for each
#this allows us to read in the data and begin calculations
df_DIW=pd.read_csv(DIWinter, sep=',', engine='python')

In [None]:
#take a peek at the data
df_DIW.head()

---

You will notice that there are several columns of data that we just do not need to worry about. Those include data on the water temperature, dissolved oxygen, depth, and turbidity.


Today, all we care about is the salinity data. Those data are in the `salinity1_avg` column. These data are recorded in psu units.

The next column, `salinity1Flag`, gives metadata on the salinity sensor. Long term monitoring projects nearly always have a way to tell if the data collected is trustworthy. Sensors often fail in the field, and we need a way to determine which data should be disregarded. For these data, a flag value of 3 indicates the sensor is working correctly; anything less is questionable at best.

So, in order to filter out any bad data, we will create a mask so we can keep only the salinity data that we can trust (those with a flag of 3). 
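Here is a small sketch of how a mask works before we apply one to the real data. The numbers below are made up, and the names `salinity`, `flag`, `mask`, and `trusted` are just for this example.

```python
import pandas as pd

# Made-up readings and flags, not values from the lab files
salinity = pd.Series([28.5, 29.1, 12.0, 30.2])
flag = pd.Series([3, 3, 1, 3])

# Comparing a Series to a number gives one True/False value per row
mask = flag == 3
print(mask.tolist())     # [True, True, False, True]

# Indexing with the mask keeps only the rows where the mask is True
trusted = salinity[mask]
print(trusted.tolist())  # [28.5, 29.1, 30.2]
```

The third reading has a flag of 1, so it is dropped; the other three are kept.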

In [None]:
#let's start by getting the salinity and flag data
#salinity first
DIW_salinity = df_DIW['salinity1_avg']
DIW_salinity.head() #take a peek

In [None]:
#now get the DIW flag data
DIW_flag = df_DIW['salinity1Flag']
DIW_flag.head() #take a peek

Before we can find the parameters of interest, we need to create a mask and filter out the bad data. Look back at your other assignments if this next section is confusing. 

In [None]:
#masks drop rows marked False and keep rows marked True
#I want to keep any salinity data with a flag of 3
#Anything else should be disposed of
DIW_mask = DIW_flag == 3 
DIW_mask.head() #take a peek

In [None]:
#now apply the mask to the salinity data
#define the remaining data as an array with the name DIW_sal
DIW_sal = DIW_salinity[DIW_mask]
DIW_sal #take a peek 

Okay! Now we have salinity data for Dauphin Island during the winter month in the variable `DIW_sal`. 

You should now repeat the above code, filling in fields as necessary. I've started the next one, which I call DIS (Dauphin Island Summer). You will also need to pull out the salinity data for Perdido Key Winter and Summer. For each dataset, read in the data file, pull out the salinity and salinity flag columns, and filter out the bad sensor data. 

**In the end, you should have arrays for both stations in both seasons.**
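Once you have written the steps out by hand, you may notice that they are identical for every dataset. One optional way to avoid the repetition is to wrap them in a function. This is only a sketch: the function name `load_trusted_salinity` and its `good_flag` parameter are my own invention, and the "file" below is a made-up stand-in so the example runs without the assignment data.

```python
import io
import pandas as pd

def load_trusted_salinity(source, good_flag=3):
    """Read a CSV and return only the salinity values whose flag equals good_flag."""
    df = pd.read_csv(source)
    mask = df["salinity1Flag"] == good_flag
    return df.loc[mask, "salinity1_avg"]

# Stand-in for a real data file (made-up values); pd.read_csv also accepts a file name
fake_file = io.StringIO("salinity1_avg,salinity1Flag\n25.0,3\n26.5,2\n27.0,3\n")
example_sal = load_trusted_salinity(fake_file)
print(example_sal.tolist())  # [25.0, 27.0]
```

With real data you would pass the file name instead, for example the `DIWinter` variable defined earlier.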

---

In [None]:
#this is just the start
#fill in the appropriate file/variable names
#and continue from here to applying the mask to the data, as I did above

#define a filename for the data files to more easily read in 
DISummer = ".csv"

#create a separate data frame for each
#this allows us to read in the data and begin calculations
=pd.read_csv(DISummer, sep=',', engine='python')

*Hint: if you keep getting errors, double check your variable and file names.* 

In [None]:
#make sure to do Perdido Key for both seasons as well!

In [None]:
#define a filename for the data files to more easily read in 
PKSummer = "perdido_key_hyd august.csv"

#create a separate data frame for each
#this allows us to read in the data and begin calculations
df_PKS=pd.read_csv(PKSummer, sep=',', engine='python')
#let's start by getting the salinity and flag data
#salinity first
PKS_salinity = df_PKS['salinity1_avg']
PKS_flag = df_PKS['salinity1Flag']
PKS_mask = PKS_flag == 3 
PKS_sal = PKS_salinity[PKS_mask]

In [None]:
#define a filename for the data files to more easily read in 
PKWinter = "perdido_key_hyd january.csv"

#create a separate data frame for each
#this allows us to read in the data and begin calculations
df_PKW=pd.read_csv(PKWinter, sep=',', engine='python')
#let's start by getting the salinity and flag data
#salinity first
PKW_salinity = df_PKW['salinity1_avg']
PKW_flag = df_PKW['salinity1Flag']
PKW_mask = PKW_flag == 3 
PKW_sal = PKW_salinity[PKW_mask]

If you've done your coding correctly, you should now have four arrays of varying size. They should be named something along the lines of `DIW_sal`, `DIS_sal`, `PKW_sal`, and `PKS_sal`. The names of your arrays may be different. 

*Make sure to check the names of any variables in the code.*

We now need to calculate the parameters described in your lab handout! The syntax has been demonstrated for Dauphin Island in the winter; you should now fill in for the remaining data. 
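One detail worth knowing before you calculate standard deviations: by default, the pandas `.std()` method returns the sample standard deviation (it divides by n - 1), while NumPy's `np.std` returns the population standard deviation (it divides by n). The short sketch below uses made-up values to show the difference; check your lab handout if it specifies which one to report.

```python
import numpy as np
import pandas as pd

values = pd.Series([30.0, 32.0, 34.0])  # made-up salinities

sample_std = values.std()                   # pandas default: divides by n - 1
population_std = np.std(values.to_numpy())  # NumPy default: divides by n

print(round(sample_std, 3), round(population_std, 3))  # 2.0 1.633
```

The code in this lab uses the pandas method, so the values you get are sample standard deviations.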

In [None]:
DIW_avg= DIW_sal.mean() #define and calculate the average
DIW_min= DIW_sal.min() #define and calculate the minimum
DIW_max= DIW_sal.max() #define and calculate the maximum
DIW_stdev= DIW_sal.std() #define and calculate the standard deviation
print("The parameters for Dauphin Island in the winter are, in the order of (average, min, max, stdev)") #print statement
(round(DIW_avg,1), round(DIW_min,1), round(DIW_max,1), round(DIW_stdev,1)) #display the values, rounded so they are easier to read

In [None]:
DIS_avg= DIS_sal.mean() #define and calculate the average
DIS_min= DIS_sal.min() #define and calculate the minimum
DIS_max= DIS_sal.max() #define and calculate the maximum
DIS_stdev= DIS_sal.std() #define and calculate the standard deviation
print("The parameters are, in the order of (average, min, max, stdev)") #print statement
(round(DIS_avg,1), round(DIS_min,1), round(DIS_max,1), round(DIS_stdev,1)) #printing values

In [None]:
PKW_avg= PKW_sal.mean() #define and calculate the average
PKW_min= PKW_sal.min() #define and calculate the minimum
PKW_max= PKW_sal.max() #define and calculate the maximum
PKW_stdev= PKW_sal.std() #define and calculate the standard deviation
print("The parameters are, in the order of (average, min, max, stdev)") #print statement
(round(PKW_avg,1), round(PKW_min,1), round(PKW_max,1), round(PKW_stdev,1)) #printing values

In [None]:
PKS_avg= PKS_sal.mean() #define and calculate the average
PKS_min= PKS_sal.min() #define and calculate the minimum
PKS_max= PKS_sal.max() #define and calculate the maximum
PKS_stdev= PKS_sal.std() #define and calculate the standard deviation
print("The parameters are, in the order of (average, min, max, stdev)") #print statement
(round(PKS_avg,1), round(PKS_min,1), round(PKS_max,1), round(PKS_stdev,1)) #printing values

### Boxplots

Let's try to make a boxplot again. This time, we're going to try to plot both sites and seasons all together. 

To do so, we will make a figure with multiple `subplots`. This allows us to make a separate plot for each, with their own titles, data, etc., within a single larger figure. You will save this figure and upload it to Canvas with this Python assignment. 
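If the `axs[row, column]` indexing is new to you, this minimal sketch shows how a 2x2 grid is laid out and how a figure gets saved. The data are made-up numbers, and the file name `example_grid.png` and the temporary folder are placeholders of my own, not part of the assignment.

```python
import os
import tempfile
import matplotlib.pyplot as plt

# A 2x2 grid: axs is a 2-D array of axes, indexed as axs[row, column]
fig, axs = plt.subplots(2, 2, sharey=True, figsize=(6, 6))
print(axs.shape)  # (2, 2)

# Made-up numbers, only to show where each panel lands
axs[0, 0].boxplot([1, 2, 3, 4, 5])
axs[0, 0].set_title("row 0, column 0 (upper left)")
axs[1, 1].boxplot([2, 3, 4, 5, 6])
axs[1, 1].set_title("row 1, column 1 (lower right)")

# Save to a temporary folder; the file name here is a placeholder
out_path = os.path.join(tempfile.gettempdir(), "example_grid.png")
fig.savefig(out_path)
plt.close(fig)
```

The two panels that were never plotted stay empty, which is a quick way to confirm which index goes where.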

#### Read through the code. Check that your variable names match the ones in the code. Add a file name so the figure is saved.

In [None]:
#make a figure with multiple axes and subplots
#we need four total subplots
#plotting them in a (2,2) or 2x2 grid is best visually
#sharey means that the plots share y-axes, which reduces clutter in the plots
#defining the figure size as (10,10) makes it large enough to see
fig, axs = plt.subplots(2,2, sharey= True, figsize = (10,10))

#Let's plot the Dauphin Island summer salinities first
#puts in location (0,0) 
#this is the upper left corner
axs[0,0].boxplot(DIS_sal) #makes the boxplot
axs[0,0].set(ylabel="Salinity (psu)") #put a label on the common y-axis
axs[0,0].set_title('Dauphin Island Summer') #sets the title
axs[0,0].set_xlabel("")

#Now Dauphin Island winter
#I want this to line up horizontally with the other Dauphin Island Data
#this (0,1) is the upper right corner
axs[0,1].boxplot(DIW_sal) #makes the boxplot
axs[0,1].set_title('Dauphin Island Winter') #sets the title

#Now Perdido Key Summer
#I want this to line up vertically with the other Summer data
#(1,0) is the bottom left corner
axs[1,0].boxplot(PKS_sal) #makes the boxplot
axs[1,0].set(ylabel="Salinity (psu)") #put a label on the common y-axis
axs[1,0].set_title('Perdido Key Summer') #sets the title

#Now Perdido Key Winter
#I want this to line up vertically with the other Winter data
#do this one mostly alone
axs[1,1].boxplot(PKW_sal) #make the boxplot
axs[1,1].set_title(' ') #set the title


#add a file name to the line of code here to save the figure
plt.savefig(" .png")

---

### You're done!! 

Make sure you've saved two copies of your notebook (HTML and IPYNB) as well as your boxplot.