#  <font color='orange'>Tutorial 1 - Introduction to Python</font>

#### <font color='skyblue'>12.860 Climate Variability and Diagnostics (Fall 2021)</font>

##### *This tutorial is based on the "Introduction to Matlab Tutorial" written by Svenja Ryan for 12.860, Fall 2019*

In this tutorial, we will go through some basics of working with Python in Jupyter Notebooks. Before starting, make sure the following files have been downloaded from Canvas:

1. <font color='magenta'>example_ctd.mat</font> (placed in the <font color='magenta'>"Tutorial01_Intro"</font> folder)
2. <font color='magenta'>cvd_utils.py</font> (placed in the <font color='magenta'>"CVD_Tutorials"</font> folder)

More instructions regarding the folders are given in the <font color='green'>**Working Directory**</font> section below. Also make sure that you are running this notebook within the "cvd-12860" Python environment, which has all the needed modules.

Throughout this tutorial, key tasks are indicated by lines that start with two exclamation marks. See below for an example:

<font color='orange'> **\!! Q0** - *Look for tasks like these! There are 9 in this tutorial.*</font>


Note that this tutorial is optional, but for future tutorials, we will check that these tasks have been completed for marking.

## <font color='green'>Working in Jupyter Notebooks</font>

Jupyter Notebooks provide an interactive way to write and evaluate blocks of both code and markdown text, known as cells.

To evaluate a cell, first select the cell by clicking on it (the selection is indicated by a blue vertical bar to the left). Press Ctrl-Enter or the "&#9658;" button in the upper menu to evaluate the selected cell(s). The result will be printed right beneath the cell. Try it out below:

In [None]:
# Type anything between the quotations and evaluate the cell.
print("Write anything here!") 

## <font color='green'>Importing Modules</font>

Many of the functions we use in Climate Data Analysis are stored in user-developed packages and modules. To access these functions, the modules must be installed (usually handled through anaconda), then imported. Generally, import statements are written at the top of the script.

See below for some examples of how modules are imported.

In [None]:
# Evaluate the cell below to import the modules
import os
import numpy as np                   # Import a module under a short name (np.mean rather than numpy.mean)
import matplotlib.pyplot as plt      # Import a submodule from within a module. This one is for plotting
from scipy.io import loadmat,savemat # Import specific functions from a module. Be aware that this
# overwrites any existing functions in your current workspace that have the same name. These functions
# will be used to read/write matfiles for this exercise

# Add the cvd_utils module (this should be located in the CVD_Tutorials directory).
import sys
sys.path.append("../") # This assumes that your .ipynb file is located in CVD_Tutorials/Tutorial01_Intro/
# If you prefer to place cvd_utils elsewhere, swap "../" with its current path.
import cvd_utils as cvd
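
The warning in the comments above about functions with the same name can be illustrated with a small sketch. It uses only the standard-library `math` module, and the user-defined `sqrt` function is a made-up example, not part of this tutorial's code:

```python
# Hypothetical user-defined function that happens to be called "sqrt"
def sqrt(value):
    return "my own sqrt"

print(sqrt(16))        # Calls the user-defined function

from math import sqrt  # Rebinds the name "sqrt" to math.sqrt
print(sqrt(16))        # Now calls math.sqrt

import math            # Importing the whole module keeps names separate
print(math.sqrt(16))   # Always unambiguous
```

Importing the module itself (or a short alias such as `np`) avoids this kind of silent replacement, which is why the tutorial mostly uses that style.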

## <font color='green'>The Working Directory</font>

Once you start a project you will be dealing with a number of data files and python scripts.
These scripts contain a list of instructions that you want Python to perform. 

It is important to keep all your files organized. One way is to keep all your files in a single directory for each lab.

1. Make a folder called <font color='magenta'>CVD_Tutorials</font>. Place <font color='magenta'>"cvd_utils.py"</font> in this folder.
2. Within <font color='magenta'>CVD_Tutorials</font>, create another folder <font color='magenta'>Tutorial01_Intro</font>.
3. For this lab, you need the dataset *example_ctd.mat* and the script *transect.py*. Put them in the <font color='magenta'>Tutorial01_Intro</font> folder.
4. Make sure you are in the right working directory (the folder created in step 2). You can check your current working directory with
<font color='blue'>os.getcwd()</font> and change to a target directory using <font color='blue'>os.chdir(directory_name)</font>.

In [None]:
# Check the current working directory
os.getcwd()

When loading and saving datasets/figures (ex. with <font color='blue'>loadmat</font>), Python checks the current working directory.
If the dataset is not contained in that directory, then the path to that dataset must be specified. One way to be explicit about this is to save a path to the working directory to a variable. You can then refer to this variable when you are loading datasets.
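
A minimal sketch of this idea is shown here. The variable names `data_directory` and `datafile` are illustrative assumptions, and the example only builds the path without loading anything:

```python
import os

# Assumption for illustration: the dataset sits in the current working directory
data_directory = os.getcwd()

# os.path.join inserts the correct path separator for your operating system
datafile = os.path.join(data_directory, "example_ctd.mat")
print(datafile)
```

The resulting string can then be passed to a loading function in place of the bare file name.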

In [None]:
# Try to set your current working directory (hint: you can use os.getcwd(), or explicitly type out the path)
working_directory = "" # Type your code here! 
os.chdir(working_directory)

## <font color='green'> Using Help</font>

For documentation or further information on specific functions, you can type <font color='blue'>help(function_name)</font>.
Remember that the call to a function must be preceded by the name of the imported module (ex. <font color='blue'>help(os.getcwd) </font>)

In [None]:
# Evaluate the cell below to preview the help for the mean function from numpy.
help(np.mean)

Oftentimes, more detailed documentation with examples is available on the corresponding webpage for the module. For example, here is a link to the online documentation for [np.mean](https://numpy.org/doc/stable/reference/generated/numpy.mean.html?highlight=numpy%20mean#numpy.mean).

## <font color='green'>Using Python as a calculator</font>

Simple mathematical operations can be performed with Python and NumPy.


In [None]:
# Try to evaluate the cells below to see what happens:
1+1

In [None]:
2**3+10*(5-3)-1/2 # Note: To exponentiate in python, you can use a**b or np.power(a,b)

In [None]:
(np.sqrt(9)**2/9)

In [None]:
2*np.pi*10

Note that none of these outputs are saved unless they are assigned to a variable.
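
As a short sketch of the difference, the calculation below is stored in a variable so that it can be reused later (the names `radius` and `circumference` are chosen only for this example):

```python
import numpy as np

radius = 10
circumference = 2 * np.pi * radius  # The result is kept in "circumference"
print(circumference)
print(circumference / 2)            # The stored value can be reused
```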

## <font color='green'> Types of variables</font>

A **vector** is a list of numbers. A **matrix** is a multidimensional table of numbers. In Python, the NumPy package is often used to handle operations involving vectors and matrices. In NumPy, both vectors and matrices are handled as a basic type known as an **array**$^{1}$. Climate data is often stored in arrays due to its multi-dimensional nature (for example, we might have temperature measurements over \[longitude x latitude x depth x time\], which is a 4-D array).

![Caption goes here](climate-data-diagram.png "Climate Data Example")
 <font size="1.5">**Figure 1.** Climate data often has multiple dimensions, such as x (longitude), y (latitude) and t (time) (source:[geohackweek](https://geohackweek.github.io/nDarrays/01-introduction/)).</font>
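
The sketch below shows how the number of dimensions appears in NumPy. The array sizes are arbitrary, and `toy_field` is a hypothetical stand-in for a \[longitude x latitude x depth x time\] dataset filled with zeros:

```python
import numpy as np

vector = np.array([1.0, 2.0, 3.0])           # 1-D array
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])  # 2-D array
toy_field = np.zeros((4, 3, 2, 12))          # 4-D array (made-up sizes)

# .ndim gives the number of dimensions, .shape the length of each one
print(vector.ndim, vector.shape)
print(matrix.ndim, matrix.shape)
print(toy_field.ndim, toy_field.shape)
```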


**Strings** are lists of characters which are enclosed by either double (") or single (') quotes. In Python, you can check the type of any single variable using <font color=blue>type(variable_name)</font>. To check the type of the elements within a NumPy array, use <font color=blue> array_name.dtype </font>

Generally, variables can be given any name you want, but certain names are reserved keywords in Python (ex. lambda), and trying to assign a value to one of them results in a syntax error. To see the full list of keywords, run <font color=blue>import keyword; keyword.kwlist</font>. Names of built-in functions (ex. sum, max) are not keywords, so Python lets you overwrite them without complaint; avoid these too. You can list them with <font color=blue>\_\_builtins\_\_.\_\_dict\_\_.keys()</font>.

 <font size="1.5">[1]: This is unlike Matlab, where the basic type is a 2-D matrix (for example, a vector with 12 entries in Matlab is of size \[12 x 1\], but has shape (12,) in NumPy). A more detailed discussion of these differences can be found at [this link](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html).</font>
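
If you are unsure whether a name is safe to use, the standard-library `keyword` and `builtins` modules offer a quick check. This is a small sketch; the name "temperature" is just an example:

```python
import builtins
import keyword

print(keyword.iskeyword("lambda"))       # Reserved keyword: cannot be a variable name
print(keyword.iskeyword("temperature"))  # Not reserved: fine to use
print(keyword.iskeyword("sum"))          # Not a keyword...
print(hasattr(builtins, "sum"))          # ...but it is a built-in function name
```

Assigning to a built-in name such as `sum` is allowed, but it hides the original function for the rest of the session.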

In [None]:
# Try to evaluate the cells below to see what happens:
C = 10
C,type(C)

In [None]:
cat = 20.3
cat,type(cat)

In [None]:
filename = "data.mat"
filename,type(filename)

You can perform mathematical operations with your variables, and if desired, save into a new variable:

In [None]:
D = C*cat
D, type(D)

Variables don't have to be single numbers; they can also be vectors (1-D arrays). For example, the monthly average temperature in your office in 2016:

In [None]:
# Use np.array() to initialize a numpy vector or array
temperature = np.array([16, 16.5, 17.8, 18.3, 20, 20.4, 21.1, 21.1, 20.3, 19.5, 18.2, 17.5]) 

temperature

The operation above creates an array of shape (12,). You can always check the shape of a NumPy array with <font color='blue'>array_name.shape</font> or <font color='blue'>np.shape(array_name)</font>.

In [None]:
# Print the size of each array
# Use str() to convert a variable to a string. Strings can be concatenated by adding them together.
print("The size of 'temperature' is " + str(temperature.shape)) 
print("The size of 'temperature' is " + str(np.shape(temperature)))

Variables can also be matrices (e.g. temperature each month in your office during 2016, 2017, and 2018):

In [None]:
temp2016_2018= np.array([
    [16.5000, 17.5, 18.3, 18.8, 20.5, 20.4, 21.6, 21.1, 20.8, 19.5, 18.2, 18.1], # 1st row is 2016
    [16.5, 17.0, 18.8, 18.3, 20.5, 20.9, 22.1, 22.1, 20.3, 20.0, 18.7, 17.6],    # 2nd row is 2017
    [16, 16.5, 17.8, 18.3, 20, 20.4, 21.1, 21.1, 20.3, 19.5, 18.2, 17.1]         # 3rd row is 2018
    ])

print("The size of 'temp2016_2018' is " + str(temp2016_2018.shape)) # Check the shape ([year x month])
print("The type of 'temp2016_2018' is " + str(temp2016_2018.dtype)) # Check the type of the numpy array with array_name.dtype

## <font color='green'> How to manipulate and work with variables</font>

To display the temperature for November 2016:


In [None]:
temp2016_2018[0,10] # Remember python uses zero indexing!

<font color='orange'> **!! Q1 -** *In the cell below, please write code to display the temperature for March 2017*</font>

In [None]:
## Write your code here


You can do arithmetic on vectors as well. If the office thermometer is biased and shows temperatures that are too cold by $0.5^{\circ}C$, you can add an offset to all the temperatures in the array:

In [None]:
temp2016_2018+0.5 # Note that this modification is not saved unless you assign it to a variable, or modify it in place (ex. temp2016_2018 += 0.5)
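
The difference between creating a new array and modifying one in place can be sketched with a small made-up array (`readings` and `corrected` are names used only for this example):

```python
import numpy as np

readings = np.array([16.0, 16.5, 17.8])

corrected = readings + 0.5  # Creates a new array; "readings" is untouched
print(readings)
print(corrected)

readings += 0.5             # Modifies "readings" in place
print(readings)
```

Note that the in-place form works here because the array already holds floating-point values.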

You can select all the records in a given row or column by using a colon. (ex. <font color=green>variable_name[0,:]</font> grabs all entries in the first row.)

<font color='orange'> **!! Q2 -** *In the cells below, please write code to display:*</font>
1. <font color='orange'>All temperatures in 2016</font>
2. <font color='orange'>All July temperatures</font>

In [None]:
# Write code for displaying all temperatures in 2016


In [None]:
# Write code for displaying all July temperatures


To get subsets of the data, you use the colon. For example, the first six months of 2017 are retrieved as follows:

In [None]:
# Note that when using indexing in Python, the end number of the slice (6 in this case) is NOT included.
temp2016_2018[1,0:6] 

<font color='orange'>**!! Q3** - Please write code to display temperatures for March and April of 2016 and 2017 </font>

In [None]:
# Enter code here to display temperatures for March and April of 2016 and 2017

Some other indexing tricks in NumPy are briefly described below...
- array_name[-1] retrieves the last element along the specified dimension. -2 retrieves the second last, and so on...
- array_name[1:] Not including an *upper bound* to the slice retrieves all values from the specified lower bound to (and including) the last element.
- array_name[:10] Not including the *lower bound* to the slice retrieves all values up to (but not including) the specified upper bound.

A more comprehensive discussion is found in NumPy's [Advanced Indexing Guide](https://numpy.org/doc/stable/reference/arrays.indexing.html).
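
The indexing tricks listed above can be tried on a small made-up array (the array `values` is an assumption for illustration and is unrelated to the temperature data):

```python
import numpy as np

values = np.arange(10, 22)  # 12 integers: 10, 11, ..., 21

last_value = values[-1]     # Last element
second_last = values[-2]    # Second-to-last element
from_second = values[1:]    # Everything from index 1 through the last element
first_ten = values[:10]     # Indices 0 through 9 (index 10 is NOT included)

print(last_value, second_last)
print(from_second)
print(first_ten)
```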

## <font color='green'>Using Functions</font> 
Packages such as NumPy include many pre-defined functions useful for data analysis. One example is the <font color='blue'>*np.mean*</font> function.

<font color='orange'>**!! Q4** - Evaluate the following cells and briefly describe the different outcomes.</font>

In [None]:
# Evaluate and describe the outcome.
print(np.mean(temp2016_2018)) # print() is needed because a cell only displays its last expression (here, the text below)
"""
Write your (brief) description here:

"""

In [None]:
print(np.mean(temp2016_2018,axis=0))
"""
Write your (brief) description here:

"""

In [None]:
print(np.mean(temp2016_2018,axis=1))
"""
Write your (brief) description here:

"""

<font color='orange'>**!! Q5** - Write code to find the mean Jan-Feb-Mar Temperature for 2018.</font>

In [None]:
#  Write code to find the mean Jan-Feb-Mar Temperature for 2018


Also take a look at other functions (ex. <font color=blue>np.max, np.min, np.std</font>). Remember, you can always use the <font color=blue>help</font> command to learn how to use a certain function.
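
As a starting point, here is a brief sketch of these functions on a small made-up array (`samples` is not part of the tutorial data). Like `np.mean`, they accept an optional `axis` argument:

```python
import numpy as np

samples = np.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]])

overall_max = np.max(samples)      # Single largest value in the array
column_min = np.min(samples, axis=0)
row_std = np.std(samples, axis=1)  # Population standard deviation by default (ddof=0)

print(overall_max)
print(column_min)
print(row_std)
```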

## <font color='green'>First Data Example: Working with CTD Casts</font> 
We will use the example Conductivity-Temperature-Depth (CTD) dataset which includes 95 CTD casts. While this is a simple dataset, there are some challenges to it:
1. As the ocean is not flat, each cast usually has a different length (i.e., a different number of data points). Thus it is not straightforward to put all profiles into a matrix.
2. The data points might not be at exactly the same depth levels for each profile.
    
Load the dataset <font color='magenta'>example_ctd.mat</font>. In Python, matfiles can be loaded as a <font color="green">dictionary</font>, which is a data structure in Python where each entry is retrieved using a "key".
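
For readers new to dictionaries, a minimal sketch of key-based lookup follows. The keys and values in `toy_data` are made up for illustration and are not the contents of the data file:

```python
import numpy as np

# Hypothetical dictionary: each key maps to an array
toy_data = {
    "T": np.array([20.0, 18.5, 15.2]),
    "S": np.array([35.1, 35.0, 34.8]),
}

print(list(toy_data.keys()))           # List the available keys
surface_temperature = toy_data["T"][0] # Look up an entry by key, then index into it
print(surface_temperature)
print("P" in toy_data)                 # Check whether a key exists before using it
```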


In [None]:
# Load the CTD data in, and check the keys
loaded_data = loadmat("example_ctd.mat")

# You can retrieve and print the keys using "dictionary_name.keys()"
print(loaded_data.keys()) 

In [None]:
# Retrieve selected arrays using the keys above.
P = loaded_data['P'] # Pressure
S = loaded_data['S'] # Salinity
T = loaded_data['T'] # Temperature
x = loaded_data['x'] # Locations of cast

print("The size of P is " + str(P.shape))

In [None]:
# Since the default data type in Matlab is a 2-D array, we can squeeze the data to reduce the extra dimension
P = P.squeeze() # Automatically drops singleton dimensions from a numpy array
S = S.squeeze()
T = T.squeeze()
x = x.squeeze()

print("The new size of P is " + str(P.shape))

We now have 3 arrays which represent pressure (P), salinity (S) and temperature (T). Each array contains our 95 casts and allows them to have different lengths.
You can access data for a specific cast using indexing:

In [None]:
# Retrieve pressure data for casts 1 and 30
cast01_pressure  = P[0]
cast30_pressure  = P[29]

# Note the different number of pressure levels between the two casts
print("The size of cast01_pressure is " + str(cast01_pressure.shape))
print("The size of cast30_pressure is " + str(cast30_pressure.shape))
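
One way a single array can hold casts of different lengths is as a NumPy array of type `object`, where each element is itself an array. The sketch below builds such a container by hand, assuming this is how the casts are stored after loading; `toy_casts` and its values are invented for illustration:

```python
import numpy as np

# Hypothetical container with two casts of different lengths
toy_casts = np.empty(2, dtype=object)
toy_casts[0] = np.array([0.0, 10.0, 20.0, 30.0])
toy_casts[1] = np.array([0.0, 10.0])

print(toy_casts.shape)  # Number of casts
print(toy_casts.dtype)  # Each element is a generic Python object

# Loop over the casts to find the number of levels in each one
cast_lengths = [cast.size for cast in toy_casts]
print(cast_lengths)
```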

## <font color='green'>Plotting CTD Profiles</font> 

Now let's do some plotting. In Python, a popular package for visualizing data is known as [*matplotlib*](https://matplotlib.org/stable/index.html). Specifically, the functions in [matplotlib.pyplot](https://matplotlib.org/stable/tutorials/introductory/pyplot.html) provide the ability to visualize data using syntax similar to MatLab.

<font color='orange'> **!! Q6** - Below is an example plot of temperatures for profiles 1 and 30. Pick out two different temperature or salinity profiles and plot them, and/or try to plot them all with a loop. For all graphs, please make sure to add the appropriate axis labels and title. </font>
    
Additional Tips
- Try to adjust/experiment with different plotting options such as color, linewidth, and linestyle (see the [ax.plot documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.plot.html) for different parameters)
- If you dislike the default look of matplotlib style plots, you can quickly change to various preset styles using <font color=blue>plt.style.use(style_name)</font>$^2$. Documentation on the available styles can be found [here](https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html)
- Other common axes adjustments not included here include setting the x and y limits (<font color=blue>ax.set_xlim, ax.set_ylim</font>) and the tick locations (<font color=blue>ax.set_xticks, ax.set_yticks</font>).

<font size="1.5">[2]: Note that "plt.style.use" modifies all future plots within the workspace. To switch back to default, run plt.style.use('default')</font>

In [None]:
# Example plotting code
plt.style.use('default') # Adjust style (optional)

# Creates a "figure" and "axes" object.
fig,ax = plt.subplots(1,1) 

# Plot the selected data on the axes object
ax.plot(T[0],P[0],color='b',linestyle='-',linewidth=1.5,label="Profile 1") 
ax.plot(T[29],P[29],color='r',linestyle='--',linewidth=2,label="Profile 30")

# Label the axis
ax.set_xlabel(r"Temperature [$\degree$C]") # Raw string (r"...") so the backslash is passed to matplotlib unchanged
ax.set_ylabel("Depth [m]")
ax.set_title("ADD TITLE")

# Reverse y-axis, so that the surface is on top
ax.invert_yaxis()

# Additional commands
ax.legend()          # Add legend
ax.grid(True,ls=':') # Add grid

# Save the figure if desired. By default, plt.savefig saves the currently active figure
working_directory = os.getcwd()                                          # Get current working directory
figname = os.path.join(working_directory, "Tutorial01_CTD_Profile.png")  # Set figure name (os.path.join adds the path separator)
plt.savefig(figname)

In [None]:
# Paste the code from the cell above, and try modifying it here.


<font color='orange'>**!! Q7** - What can the temperature profiles tell us? Do you think this data has been collected from the high or low latitudes?</font>

In [None]:
"""
Type your answer here:


"""

## <font color='green'>T-S Diagram</font> 

In oceanography, CTD data is often plotted in a T-S diagram, which allows us to identify water masses. 

For this, salinity is commonly plotted on the x-axis and temperature on the y-axis.

<font color='orange'>**!! Q8** - Try plotting a T-S diagram for all profiles of the CTD data. Hint: plot one data point for each CTD cast and depth level.</font>


In [None]:
# Try plotting a T-S diagram for the CTD data. Hint: plot one data point for each CTD cast and depth level. You can try ax.plot() or ax.scatter()

fig,ax = plt.subplots(1,1) # Initialize figure...

# Write the rest of the code below:

### <font color='green'>Hydrographic Section</font>
Another common way to view CTD data is to plot all stations as a hydrographic section, which represents a slice through the ocean.
For this, some data manipulation has to be done, since plotting routines generally require a 2-D matrix to produce such surface plots
(see the challenges of working with CTD data listed in the section above).

If you are already advanced in Python, try to do the following steps yourself:
1. Interpolate each profile on the same depth levels (try [numpy.interp](https://numpy.org/doc/stable/reference/generated/numpy.interp.html) or scipy's [interp1d](https://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html))
2. Create an empty matrix. It has to have the dimensions [# of profiles x length of longest cast].
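
To give a feel for these steps, here is a minimal sketch on two made-up casts using `np.interp`. The values, the common levels, and all variable names are assumptions for illustration; this is not the approach used inside `cvd_utils`:

```python
import numpy as np

# Hypothetical casts with different lengths and depth levels
toy_pressures = [np.array([0.0, 10.0, 20.0, 30.0]), np.array([0.0, 15.0, 25.0])]
toy_temps = [np.array([20.0, 18.0, 15.0, 12.0]), np.array([19.0, 16.0, 14.0])]

# Common levels shared by all profiles (chosen arbitrarily here)
common_levels = np.arange(0.0, 31.0, 10.0)  # 0, 10, 20, 30

# Pre-allocate a matrix filled with NaN: [# of profiles x # of levels]
section = np.full((len(toy_pressures), common_levels.size), np.nan)

for i, (pressure, temp) in enumerate(zip(toy_pressures, toy_temps)):
    # Linear interpolation; levels outside the range of a cast stay NaN
    section[i, :] = np.interp(common_levels, pressure, temp,
                              left=np.nan, right=np.nan)

print(section)
```

Filling with NaN rather than zeros keeps levels below the deepest measurement of a cast from being mistaken for real data.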

If you don't feel comfortable doing this, you can use the provided <font color='blue'>cvd.transect()</font> function, which is based on a similar Matlab function from Chad A. Greene's [CDT toolbox](http://www.chadagreene.com/CDT/transect_documentation.html). This function saves you all the work and allows you to plot a fancy section fairly easily ;-).

Explore different options that are described on the link. Make it YOUR section, and be sure to label the axes of the graph and add a colorbar. 

<font color='orange'>**!! Q9** - Following the instructions above, plot a hydrographic section using either cvd.transect or your own code. Describe below what you see in the graph. </font>

In [None]:
#Write your code here 



In [None]:
"""
Describe your result here.

"""