# <center> Lab 7: Plotting and Analyzing CTD Data </center>

---
Welcome to the Python component of our boat trip! We are going to actually analyze our own data! 

We will go through some more plotting and manipulation of the data we actually collected.

Just as before, we would like you to download your work as both an html and ipynb file and **save a figure of your choice** to Canvas.

---
#### Please acknowledge that you understand the instructions by copying and pasting each of the following into the next cells.

#I understand how to save my progress and reopen the notebook.

#I understand that I am being asked to save and submit copies of my notebook as well as a figure of my choice for the assignment.

#I understand that I need to comment my code.


###### A note about copying and pasting code in this assignment:
Like your last assignment, you are provided code in markdown cells highlighted in gray like this. Copy and paste the given code, modifying where asked, into the coding cell below. Add more cells as necessary.

There are also several times in this assignment where we ask you to rerun some code. You will have to copy and paste the correct code from earlier in the assignment into a new cell and modify variable names or add a section as asked.

We tried to make this as clear as possible. Code will be provided in blocks of gray text. Code tasks and questions are numbered. Answer the questions as comments or markdown cells. Each numbered task will be graded for completion and accuracy. Remember to add comments!

You have all the information you should need, but remember you can always ask questions!

## Let's get started!

First, import a couple of packages that we will use throughout this assignment.

#Import Pandas and alias it as 'pd'
#for manipulating tables and timeseries
import pandas as pd

#Import MatPlotLib and alias it as 'plt'
#for making plots/graphs/figures
import matplotlib.pyplot as plt

#Import Numpy and alias it as 'np'
#for the arrays
import numpy as np

#### Read in data

1. Start by reading in the data files.
We'll help by loading Station 1.
Keeping the variable names consistent with our recommendations will help later in the assignment.

`filename1 = "station1_CTD3_092422" 
df_1 = pd.read_excel(filename1) `

2. Peek at the data.

`df_1.head()`
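If you want to see what a peek tells you before opening the real files, here is a minimal sketch on a made-up table. The values are invented for illustration, and only the column names `Depth`, `TempC`, and `Salinity` are borrowed from this assignment; your CTD file will have more rows and columns.

```python
import pandas as pd

# Made-up values for illustration only; this is NOT real CTD data
demo = pd.DataFrame({
    "Depth": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0],
    "TempC": [18.2, 18.1, 17.6, 16.9, 16.4, 16.3],
    "Salinity": [28.1, 28.4, 29.7, 31.2, 31.8, 31.9],
})

first_rows = demo.head()            # head() returns the first five rows by default
column_names = list(demo.columns)   # the column headings, useful for the table in Part 8
n_rows, n_cols = demo.shape         # (number of rows, number of columns)

print(first_rows)
print(column_names)
print(n_rows, n_cols)
```

Checking `.columns` and `.shape` alongside `.head()` is a quick way to confirm the file loaded the way you expected.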

Now it's your turn! 

3. Repeat what we just did for Station 2. Read in the second data file, take a peek at the data, and then answer the questions from Exercise 1.

#### Answer some Qs about the data.

4. What does CTD stand for?


5. How does the CTD calculate depth?


6. How does it measure salinity?


7. What is the importance of O2 measurements? What are some possible sources and sinks of oxygen in the water column?


Look at the headings of the columns. Let's make a table in a markdown cell showing the abbreviation (in the raw data), measured variable, and unit for *any five variables*. See the syntax for a table in the next cell.


| Heading1 | Heading2 | Heading3 |
| :-: | :-: | :-: |
| cellinfo1 | cellinfo2 | cellinfo3 |

This cell contains the markdown code. Your headings should be: Abbreviation (in the raw data), Measured Variable, and Unit.

`| Heading1 | Heading2 | Heading3 | #You can have as many rows as you'd like, but each must be bookended by vertical lines
| :-: | :-: | :-: | #This line aligns the cell text in the middle
| cellinfo1 | cellinfo2 | cellinfo3 | #Repeat this line for as many additional rows as you'd like`

8. Make the table below. Keep the cell in markdown mode.

---
#### Vertical Profiles

Let's start with the temperature at Station 1. There's only 1 plot, so we will leave the subplot argument blank *for now*. 

9. Make the plot

`fig1, (ax1) = plt.subplots() # There's only 1 plot, so subplot argument is empty
ax1.plot(df_1["TempC"],df_1["Depth"])
ax1.set_xlabel('Temperature (C)') #draw x label
ax1.xaxis.set_label_position('top') # this moves the label to the top
ax1.xaxis.set_ticks_position('top') # this moves the ticks to the top
ax1.set_ylabel('Depth (m)')# Draw y label
ax1.set_ylim(ax1.get_ylim()[::-1]) #this reverses the yaxis (i.e. deep at the bottom)`
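As a side note, matplotlib also has a shortcut for flipping the depth axis. The sketch below uses a made-up temperature profile (not your data) and `invert_yaxis()` in place of the `set_ylim(...[::-1])` line; the names `demo`, `fig_demo`, and `ax_demo` are just for this example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up profile: temperature cooling steadily with depth (not real cast data)
depth = np.linspace(0, 10, 21)
demo = pd.DataFrame({"Depth": depth, "TempC": 18 - 0.3 * depth})

fig_demo, ax_demo = plt.subplots()
ax_demo.plot(demo["TempC"], demo["Depth"])
ax_demo.set_xlabel("Temperature (C)")
ax_demo.set_ylabel("Depth (m)")
ax_demo.invert_yaxis()  # flips the y-axis so deep water is at the bottom

# After inverting, the first y-limit is the larger (deeper) value
y_first, y_second = ax_demo.get_ylim()
print(y_first > y_second)
```

Both approaches toggle the axis, so running either one twice on the same axis flips it back.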

#### Add another CTD profile to the plot

Let's add another variable to the graph to see two CTD profiles plotted together.

We can plot a second variable with temperature by creating another axis using
`ax2 = ax1.twiny()`. Note: there are no arguments for this line.

By default, ax1 will be on the bottom, and ax2 will be on the top. Because we want temperature displayed on the top, we will call it ax2. We no longer need the command moving the x-axis and tick marks for temperature to the top of the graph.

10. Add a plot of DO on ax1. Move the temperature plot to ax2. 

*Hint:* Amend the code from Part 9: remove the lines that move the ax1 label and ticks to the top, and change the `ax1.plot()` and `ax1.set_xlabel()` lines so that ax1 plots DO. Then add `ax2.plot()` and `ax2.set_xlabel()` (you will need to fill in the appropriate arguments) underneath the `ax2 = ax1.twiny()` line to plot temperature on the top.

So, your code should look *something* like:

`fig1, ax1 = plt.subplots()
ax1.plot()
... #fill in the rest of the ax1 lines
ax2 = ax1.twiny() #fill in the other two lines.`
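To see how `twiny()` behaves on its own, here is a small sketch with made-up numbers. The axis names `ax_bottom` and `ax_top` and both profiles are invented for this example, so you will still need to adapt the idea to `df_1` and your own column names.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up profiles for illustration only
depth = np.linspace(0, 10, 21)
temp = 18 - 0.3 * depth      # invented temperature values
oxygen = 8 - 0.4 * depth     # invented oxygen values

fig_demo, ax_bottom = plt.subplots()
ax_bottom.plot(oxygen, depth)
ax_bottom.set_xlabel("Dissolved oxygen (made-up units)")
ax_bottom.set_ylabel("Depth (m)")
ax_bottom.invert_yaxis()     # deep at the bottom

# twiny() makes a second axis that shares the y-axis; its x-axis sits on top
ax_top = ax_bottom.twiny()
ax_top.plot(temp, depth)
ax_top.set_xlabel("Temperature (C)")

print(ax_bottom.xaxis.get_label_position())
print(ax_top.xaxis.get_label_position())
```

Because the two axes share one y-axis, the depth axis only needs to be flipped once.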


Ok this looks pretty good except we can't tell which line is plotting which variable. 
Let's change the colors for each variable and axis, so we can easily read the plot. 
Let's take a look at the base colors available for matplotlib.

| Syntax | Color |
| :-: | :-: |
| 'b' | blue |
| 'g' | green |
| 'r' | red |
| 'c' | cyan |
| 'm' | magenta |
| 'y' | yellow |
| 'k' | black |
| 'w' | white |

There are a lot more colors available, but these are your basic choices. 


Now we can use these commands to change the colors of the plot on ax1. 

11. Add these lines of code in the appropriate location in your code from Part 10. Make sure to change the argument to an actual color, not 'p' or blue. 

`
ax1.xaxis.label.set_color('p') #Set X-axis label color to color 'p'
ax1.tick_params(axis='x', colors='p') #Set X-axis tick color to color 'p'
ax1.spines['top'].set_color('p') #Set top spine color to color 'p' `

Let's start with DO.
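If you are not sure whether a color string is valid, matplotlib can check it for you. The sketch below uses `is_color_like` from `matplotlib.colors` and then applies one color to a throwaway plot; the data and the choice of magenta are arbitrary and only for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import is_color_like, to_rgba

# 'p' is only a placeholder in the template, not a matplotlib color
print(is_color_like("p"))
print(is_color_like("m"))

# Throwaway plot with the line, label, ticks, and spine all in one color
fig_demo, ax_demo = plt.subplots()
ax_demo.plot([1, 2, 3], [1, 2, 3], color="m")
ax_demo.set_xlabel("Some variable")
ax_demo.xaxis.label.set_color("m")            # x-axis label color
ax_demo.tick_params(axis="x", colors="m")     # x-axis tick and tick-label color
ax_demo.spines["bottom"].set_color("m")       # spine on the side where this x-axis is drawn

label_rgba = to_rgba(ax_demo.xaxis.label.get_color())
```

Matching the line color to its axis color is what lets a reader pair each profile with the right scale.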

#### Change the ax2 color

Recreate the plot above, but make the below changes:

12. Change the x-axis ticks, x-axis label, x-axis spine, and plotted line for temperature on ax2 so that they are all the same color, using code similar to Part 11. 

---
#### Vertical Profile Subplots

Instead of graphing the CTD profiles for variables on the same graph, let's try doing subplots. Sometimes a single graph for multiple variables is too messy, and having subplots can display the information more clearly.

We'll start with our original plot for temp, then add a plot next to it for salinity.

I'll get y'all started on the first two variables. 

13. Copy and paste the code below, making sure to check that the appropriate arguments are in the right spots. 
Note:
`plt.subplots()` now returns `axs` instead of `ax1`. This allows us to add as many subplots as we want in a grid. 

The `plt.subplots()` arguments are `(1, 2, sharey=True, squeeze=False)`. 
The `(1,2)` creates a 1x2 grid of subplots. We will change this again later to add more.
`sharey=True` allows them to share y-axes with the same scaling and scale labels.
`squeeze=False` keeps `axs` as a 2D array even when the grid has only one row, so that `axs[0,0]` and `axs[0,1]` work. Without it, a 1x2 grid returns a 1D array and `axs[0,0]` raises an IndexError.
    
`fig1, axs = plt.subplots(1,2,sharey=True,squeeze=False)
axs[0,0].plot(df_1["TempC"],df_1["Depth"]) # plot temperature first. (0,0) is the first grid location.
axs[0,0].set_xlabel('Temperature (°C)') # x axis
axs[0,0].xaxis.set_label_position('top') 
axs[0,0].xaxis.set_ticks_position('top') 
axs[0,0].yaxis.set_inverted(True) # deep at the bottom; safe to repeat on a shared y-axis
axs[0,0].set_ylabel('Depth (m)') # y axis
axs[0,1].plot(df_1["Salinity"],df_1["Depth"], 'r') # plot salinity second. (0,1) is to the right of temperature.
axs[0,1].set_xlabel('Salinity (psu)') # x axis
axs[0,1].xaxis.set_label_position('top') 
axs[0,1].xaxis.set_ticks_position('top') 
axs[0,1].set_ylabel('Depth (m)') # y axis
fig1.suptitle("Station 1, March 18, 2023") #gives a grand title that covers all subplots `
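How `axs` is indexed depends on the shape of the grid, so it can help to check `axs.shape` before writing `axs[row, col]`. The sketch below makes three empty figures just to compare shapes; the variable names are only for this example.

```python
import matplotlib.pyplot as plt

# A single row of subplots is "squeezed" into a 1D array by default
fig_a, axs_a = plt.subplots(1, 2, sharey=True)
shape_default = axs_a.shape   # one index: axs_a[0], axs_a[1]

# squeeze=False always returns a 2D array, even for a single row
fig_b, axs_b = plt.subplots(1, 2, sharey=True, squeeze=False)
shape_2d = axs_b.shape        # two indices: axs_b[0, 0], axs_b[0, 1]

# With more than one row and column, the array is 2D either way
fig_c, axs_c = plt.subplots(2, 2, sharey=True)
shape_grid = axs_c.shape

print(shape_default, shape_2d, shape_grid)
```

If you ever get an `IndexError` mentioning too many indices, printing `axs.shape` is a good first check.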

How does the title look to you? 
We need to move it in order to read it! 

14. Add x and y coordinates to move the grand title up using:

`fig1.suptitle("Station 1, March 18, 2023", x= , y= ) `

*Try your own values for x and y until you find ones that make sense.*
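For reference, `x` and `y` in `suptitle` are in figure coordinates, where 0 is the left/bottom edge of the figure and 1 is the right/top edge (the defaults are `x=0.5`, `y=0.98`). The sketch below uses an empty demo figure, and the `y` value is just one example to start from, not the "right" answer.

```python
import matplotlib.pyplot as plt

fig_demo, axs_demo = plt.subplots(1, 2, sharey=True)
for ax in axs_demo:
    # Labels and ticks on top, as in the assignment plots
    ax.xaxis.set_label_position("top")
    ax.xaxis.set_ticks_position("top")
    ax.set_xlabel("x label")

# x=0.5 centers the title; y above the default 0.98 pushes it higher (example value)
title = fig_demo.suptitle("Demo title", x=0.5, y=1.05)
title_x, title_y = title.get_position()
print(title_x, title_y)
```

If a title placed above `y=1` gets cut off in a saved image, passing `bbox_inches="tight"` to `savefig` is one thing to try.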

Let's also get rid of the ylabel 'Depth' for the 2nd plot since we don't really need it. 

15. Add 2 more subplots of variables of your choosing. Make each variable a different color or style of line. Refer to the table above Part 11 for the syntax. 

Put the new plots next to the first two. 

*Hint:* you will need to change the `plt.subplots()` arguments to expand the grid to accommodate 2 more subplots. They can all still share a y-axis. Amend the `axs[x,y]` lines of code as well to include the third and fourth plot. 


#### Answer the Questions Below

Look at the variables you plotted. 

16. Describe the characteristics of each vertical profile. What, if any, are the differences between the surface, middle, and bottom water?

17. How many layers of water masses are indicated from the vertical profiles? 

18. Under what conditions would you expect to see the values for a given variable to be the same at the surface and bottom? If surface and bottom values were the same for salinity, would you expect them to be the same for other variables too?

19. What variable do you think is driving the differences between the water column layers?


---
#### Add Plots from Station 2

Great! Now we've visualized the data from Station 1. Next, let's add a second row of subplots for Station 2. I will walk you through it for these, so don't start panicking just yet. Let's just think for a second. 

One of the things that can take the longest in replotting data is the data manipulation, such as filtering the data and whatnot. Did we do any data manipulations so far for this exercise? No! This is good; it shouldn't be too hard then to replot. Most of what we've done so far today has been reformatting. Essentially, that last chunk of code for Part 15 needs to be copied and modified for these data. But we need Python to have the data first, so let's start there. 

20. Read in the data from Station 2 and take a peek at the data. Look back at Parts 1&2 if you need help. **Make sure to rename the filename and particularly the dataframe `df_1`.**

Copy and paste your code from Part 15 *twice* in the cell below with some space between the two. Let's refer to these as Sections A and B. We're going to make major changes in Section B while keeping Section A mostly intact and run it all at once. Each step is still numbered, but do not run the cell until you are ready. 

21. Let's start at the top of Section A, at the first line. It should start with `fig1, axs = plt.subplots(1,4,sharey=True)`. We need to modify the `plt.subplots()` arguments to have a 2x4 grid. Modify appropriately.


22. Now let's look at the several lines for `axs[0,0]` in Section A. We had to format the y-axis correctly for the rest of the graphs here. Move down to Section B. Modify each of the `axs[0,0]` arguments in Section B to be for the first column of row 2. Then, starting at the first line, go line by line and change the arguments when needed. *Hint:* after modifying the location for each of the six lines of code, you should only need to change the arguments in one.


23. Great! We've now modified the code for Row 2, Column 1 of Station 2 data. We have three more to modify! Start by modifying the positions `[x,y]` for each subplot in Section B. Then go back through and change the arguments when appropriate. *Hint:* again, we only need to change the arguments in one line of code per subplot.

Okay, so now we should have modified each subplot's code for Section B. Make sure to change the `suptitle` to something that makes sense too! Then answer the questions below.
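If it helps to picture how `axs[row, col]` maps onto a grid with one row per station, here is a small sketch on made-up data. It uses a loop instead of copied sections, which is a different approach from the one this assignment asks for, and the tables `station_a` and `station_b` are invented stand-ins for your dataframes. The grid is 2x2 rather than 2x4 to keep it short.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-ins for two station tables (not real CTD data)
depth = np.linspace(0, 10, 21)
station_a = pd.DataFrame({"Depth": depth, "TempC": 18 - 0.3 * depth, "Salinity": 28 + 0.4 * depth})
station_b = pd.DataFrame({"Depth": depth, "TempC": 17 - 0.2 * depth, "Salinity": 20 + 1.0 * depth})

stations = [station_a, station_b]   # one grid row per station
variables = ["TempC", "Salinity"]   # one grid column per variable

fig_demo, axs_demo = plt.subplots(len(stations), len(variables), sharey=True, squeeze=False)
for row, table in enumerate(stations):
    for col, name in enumerate(variables):
        axs_demo[row, col].plot(table[name], table["Depth"])
        axs_demo[row, col].set_xlabel(name)
    axs_demo[row, 0].set_ylabel("Depth (m)")  # label depth on the first column only

# All subplots share one y-axis, so flip it once (invert_yaxis toggles each time it runs)
axs_demo[0, 0].invert_yaxis()
```

The first index picks the row (station) and the second picks the column (variable), which is the same pattern you are editing by hand in Sections A and B.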

#### Compare Data Between Stations

24. Let's look at the plots for each variable. Identify which variables show different patterns (e.g., salinity is high at the surface and decreases with depth for Station 1 vs. salinity is low at the surface and increases with depth for Station 2) or very different scales (e.g., a salinity of 5 vs. a salinity of 35). 

25. Speculate on what is driving the difference between the stations. Consider the important variables and processes that might be influencing them. Is one closer to a freshwater source? Is one more likely to have more nutrients available? Answers will be graded on how well reasoned and explained your responses are. 

---
#### Compare to this time last year

Let's compare the data from this year to the same time last year. Dr. Sprinkle's class took a Lenny CTD cast from a similar location on March 19, 2022. I'd like us to plot the exact same variables as above, but for this date. This process is nearly the same as what we did in Parts 20-23.

26. Read in the data from Dr. Sprinkle's class and take a peek at the data. Look back at Parts 1&2 (or 20) if you need help. **Remember to rename the filename and dataframe `df_1` in particular.**

Copy and paste your code modified for Parts 21-23 *once* in the cell below. Then copy and paste Section B again below it (so that you now have one cell with Section A, Section B, and Section B again). Let's refer to this second copy of Section B as Section C. We're going to make major changes in Section C while keeping Sections A & B mostly intact and run it all at once. Each step is still numbered, but do not run the code in the cell below until you are ready. 

27. Let's start at the top of Section A, at the first line again. It should start with `fig1, axs = plt.subplots(2,4,sharey=True)`. We need to modify the `plt.subplots()` arguments to have a 3x4 grid now. Modify appropriately.


28. Now let's move down to Section C. Modify each of the `axs[1,0]` arguments in Section C to be for the first column of row 3. Then, starting at the first line of Section C, go line by line and change the arguments when needed. *Hint:* after modifying the subplot location for each of the six lines of code, you should only need to change the arguments in one.


29. Great! We've now modified the code for Row 3, Column 1 with the 2022 data. We have three more to modify! Start by modifying the positions `[x,y]` for each subplot in Section C. Then go back through and change the arguments when appropriate. *Hint:* again, we only need to change the arguments in one line of code per subplot.

Okay, so now we should have modified each subplot's code for Section C. Make sure to change the `suptitle` to something that makes sense too! Then answer the questions below.

#### Last Round of Questions:

30. Take a second to admire your plots. Now, compare Station 2 to Station 1 and then compare 2022 to 2023 for Station 1. Which data are more different from 2023 Station 1: Station 2 or Year 2022?


31. Speculate on what is driving the change. Did something happen around this time last year? Does it make sense to you?

#### That's all!!!

Many of the above answers depend heavily on which variables you chose to define/plot, so please keep that in mind as you're looking over your grades.