<b><font size=6 color=mediumblue>Introduction to Python Language and Jupyter Notebooks</font></b>

## What is Jupyter Notebook?
You are currently reading a Jupyter notebook. A notebook is a document format specific to Jupyter, much as a .docx file is specific to Word and a Google Doc is specific to Google Drive. You can open and read notebook files with other programs, but they work best in Jupyter, just as .docx files work best in Word and Google Docs work best in Google Drive.

A Jupyter notebook has a couple of different cell types to write in. This cell is a Markdown cell. The cell below is a code cell, the default cell type. You can change the cell type in Jupyter Notebook using the toolbar. There are also Raw NBConvert cells, but you don't need to worry about those. To edit a cell, double-click on it. When you are done editing, press Shift+Return on your keyboard to run the cell. Try this with the code cell below:

In [None]:
x = 3 + 7
print(x)

If you want to make a new cell, there are a few options. You can click the **+** button in the top left corner of this screen (under File). This makes a new code cell appear below the current working cell. If you want a cell to appear above, you can click out of the current cell in a grey margin (the highlight color will turn blue instead of green) and press A on your keyboard. To make a cell below, press B. You can also click "Insert" on the menu bar and choose either "Insert Cell Above" or "Insert Cell Below".

You can copy and paste text and code just like you would in a Word document or Google Doc. You can also copy and paste entire cells, using the Edit dropdown menu or the shortcut buttons in the tool bar.

There are a few other buttons of importance in the Menu: File -> Save and Checkpoint, Kernel -> Restart, and File -> Close and Halt. 
1. Save and Checkpoint is pretty straightforward. Click whenever you need to make sure your data is saved! 
2. Kernel -> Restart will clear the memory of everything you've run so far. This is most useful if you've been testing out new code and want to run the whole script through to make sure it works. 
3. File -> Close and Halt should be used to exit each notebook. **Make sure you click Close and Halt whenever you are done with a notebook.** Otherwise, the scripts will keep running in the background, which may take up a lot of memory on your computer!

## What is Python?
Python is a programming language. It is often described as having a "simple, easy-to-learn syntax". This basically means that Python code reads logically to humans: we write it much like we would talk or form a logical argument. But before we can write in Python, there is some basic vocabulary to learn.

## Packages and Modules
Python is a language, and inside every language there are different types of words. In English, we have parts of speech like nouns, verbs, adjectives, etc. In Python, we have *packages* like NumPy, Xarray, Matplotlib, etc. Each package has a specific purpose:
* NumPy is all about arrays (think matrices and lists of numbers) and math operations on those arrays.
* Xarray is a powerful Python library designed for working with multi-dimensional labeled datasets, often used in fields such as climate science, oceanography, and remote sensing. It provides a high-level interface for manipulating and analyzing datasets that can be thought of as extensions of NumPy arrays. Xarray is particularly useful for geospatial data because it supports labeled axes (dimensions), coordinates, and metadata, making it easier to work with datasets that vary across time, space, and other dimensions.
* Matplotlib is a plotting package. In the import cell below, we don't import all of Matplotlib; we import only one *module* from it, called pyplot.

Python doesn't automatically open all packages when you start a notebook or script. Instead, you need to tell Python which packages you want. You do this with the import statement. You can also give each package a nickname, like "np" for NumPy. The cell below imports the three packages described above, each with its usual nickname.

In [None]:
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
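
As a quick check of how a nickname is used after importing, here is a minimal sketch with NumPy. The array `temps` and its values are made up for illustration and are not part of the course data.

```python
import numpy as np

# a small made-up array of temperatures (degrees C); values are illustrative only
temps = np.array([12.5, 14.0, 15.5, 13.0])

# anything inside a package is reached through its nickname: np.<name>
mean_temp = np.mean(temps)
print(mean_temp)
```

The same pattern applies to the other nicknames: `xr.` for Xarray and `plt.` for Matplotlib's pyplot module.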

## Basics of netCDF file manipulation
To handle more complex 2D and 3D datasets, we use packages such as Xarray.

In [None]:
ds = xr.open_dataset('WOA23.nc')
print(ds)

**Array Type:** The very first thing shown in the printout above is 'xarray.Dataset'. There are two main types of arrays that xarray can handle: DataArrays and Datasets. A Dataset is a collection of DataArrays. Imagine that a DataArray is a Rubik's cube. It is a 3D (or 2D or 4D or however shaped) array of data. A Dataset would therefore be a collection of Rubik's cubes. Imagine you ordered a shipment of 30 Rubik's cubes. Each individual cube is a DataArray, and the box that all the cubes were shipped in is the Dataset. We can select one DataArray using its variable name. Some data selection methods only work on DataArrays, which is why it is important to distinguish between DataArrays and Datasets.

**Dimensions:** The dimensions are the directions that the file has, or the names of the axes. There are three dimensions listed in this file: depth, lat, and lon.

**Coordinates:** Coordinates are labels for each step of a dimension. This particular file has one coordinate for each of its three dimensions: lat, lon, and depth.

**Data Variables:** These are the variables of interest. This is the actual data in your file.

**Indexes:** Indexes are not always in an Xarray Dataset. Dimensions and coords are more useful in Xarray than the indexes, but you might find Datasets that include them.

**Attributes:** Finally, Xarray has more metadata stored in the Attributes. Some of this information can be helpful, such as the lat and lon corners. Knowing the shape and map projection is important for properly formatting the data for visualization.
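
To make the pieces described above concrete, here is a minimal sketch that builds a tiny Dataset by hand. Everything in it (the names `demo_ds`, `temperature`, `oxygen`, the grid, and the values) is made up for illustration and has nothing to do with the WOA file.

```python
import numpy as np
import xarray as xr

# a made-up grid: 3 latitudes x 2 longitudes
lat = [-10.0, 0.0, 10.0]
lon = [-150.0, -140.0]

# two DataArrays ("cubes") on the same grid; the values are placeholders
temperature = xr.DataArray(np.zeros((3, 2)), coords={"lat": lat, "lon": lon}, dims=("lat", "lon"))
oxygen_demo = xr.DataArray(np.ones((3, 2)), coords={"lat": lat, "lon": lon}, dims=("lat", "lon"))

# the Dataset is the "box" holding both DataArrays, each under a variable name
demo_ds = xr.Dataset(
    {"temperature": temperature, "oxygen": oxygen_demo},
    attrs={"title": "hand-made demo dataset"},
)

# selecting by variable name hands back a single DataArray
one_cube = demo_ds["oxygen"]

print(type(demo_ds).__name__, type(one_cube).__name__)
print(list(demo_ds.dims))       # names of the axes
print(list(demo_ds.data_vars))  # names of the variables in the box
print(demo_ds.attrs["title"])   # metadata stored in the attributes
```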

If you only want to see the attributes, the coordinates, or the dims of an Xarray Dataset or DataArray, you can ask for that information specifically.

In [None]:
# here, we call the dimensions using the key word "dims"
ds.dims

In [None]:
# here, we call the coordinates using the key word "coords"
ds.coords

In [None]:
# here, we use the key word "attrs". Can you guess what this key word calls?
ds.attrs

In [None]:
# here, we call the data variables using the key word "data_vars"
# (ds.var is something else: a method that computes the variance)
ds.data_vars

### Selecting and indexing data
Remember earlier when we said that DataArrays and Datasets are different? Indexing and selecting data is one of the places that distinguish DataArrays and Datasets. Right now, we have a Dataset. This is a collection of DataArrays. We can select one of these DataArrays from the whole Dataset using the variable name.

In [None]:
# get a DataArray from our Dataset
o_an = ds['o_an']
print(o_an)

The biggest difference between `o_an` and `ds` is that `o_an` is only one variable from the Dataset, but `ds` has all the data variables. Therefore, `o_an` is a DataArray.

xarray has a relatively straightforward way of allowing us to pick out the data that we want based on the coordinates of our data.

In [None]:
# positional indexing of dimensions using integers
# translation: use a number to pick a position along each axis (: means "take everything")
o_an[0,:,:]

In [None]:
# dimension name indexing using integers
o_an.isel(depth=0)

In [None]:
o_an.sel(depth=0)
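
The difference between `isel` and `sel` is easier to see when the position and the label are not the same number. Here is a minimal sketch on a made-up DataArray; the name `demo`, the depth levels, and the values are all invented for illustration.

```python
import numpy as np
import xarray as xr

# made-up array: 4 depth levels (m) x 3 latitudes; not real WOA values
demo = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    coords={"depth": [0.0, 5.0, 10.0, 15.0], "lat": [-10.0, 0.0, 10.0]},
    dims=("depth", "lat"),
)

by_position = demo.isel(depth=2)  # the third depth level, whatever its label is
by_label = demo.sel(depth=10.0)   # the level whose depth coordinate equals 10 m

# both lines above pick the same level, found in two different ways
print(float(by_position.depth), float(by_label.depth))
print(by_position.values)
```

In this sketch, `isel(depth=0)` and `sel(depth=0)` would also return the same slice, but only because the first depth level happens to be labeled 0.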

For example, if I wanted to plot a depth vs. latitude section of the annual mean oxygen along 140˚W, the code to do that would look like this:

In [None]:
# method='nearest' tells xarray to choose the longitude in the dataset
# that is nearest to the value I provided
# WEST IS NEGATIVE!!!
oxygen = ds['o_an'].sel(lon=-140, method='nearest')
oxygen
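
To see what `method='nearest'` actually does, here is a minimal sketch on a made-up longitude axis whose grid points do not include -140 exactly. The name `demo` and the grid values are assumptions for illustration only.

```python
import numpy as np
import xarray as xr

# made-up longitudes; note that -140 itself is not one of the grid points
lon = [-150.5, -145.5, -140.5, -135.5]
demo = xr.DataArray(np.zeros(len(lon)), coords={"lon": lon}, dims="lon")

# without method='nearest', asking for a label that is not on the grid fails
try:
    demo.sel(lon=-140)
except KeyError:
    print("no grid point at exactly -140")

# with method='nearest', xarray picks the closest grid point instead
picked = demo.sel(lon=-140, method="nearest")
print(float(picked.lon))
```

It is worth printing the selected coordinate like this, so you know which longitude you actually ended up with.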

## Data visualization
Now that we know how to select the data of interest, let's work on visualizing our data. Here, we will use the Matplotlib commands.

There are a couple of different ways to make plots, which I'll go through below:
1. Option 1: The simplest way is to use xarray's built-in plotting method:

In [None]:
oxygen.plot(yincrease=False, vmin=0,vmax=400,cmap=plt.cm.plasma,figsize=(8,5))
#vmin and vmax change the colorbar min and max limits!
print('min:', oxygen.min().values, 'max: ', oxygen.max().values)
#cmap is the plot colormap

In [None]:
o_an.sel(depth=0).plot(vmin=0,vmax=400,cmap=plt.cm.plasma,figsize=(8,5))

2. Option 2: Some finer control over plots. Here's another way to make the same plot as above, but now you can add your own title, choose your own colormap, etc. This is what I'd prefer you do, and it is what I use in the examples that follow. The code for this option appears in the next section. If you want to choose a different colormap, you can find all the standard Matplotlib colormaps and their names [here](https://matplotlib.org/stable/tutorials/colors/colormaps.html).

### Saving figures
You'll need to save your figures and put them in your final lab report. We'll continue with the example above, but this time save the figure we created. The code will save a figure called figure1.pdf in your current working directory. If you can't remember what your current working directory is, open a new cell below, type `pwd`, and run it with Shift+Return.

In [None]:
fs=14 #fontsize
#set up the plot
fig,ax=plt.subplots()
fig.set_size_inches(8,5)
#make the actual plot
cs=ax.pcolormesh(ds.lat,-ds.depth,oxygen,vmin=0,vmax=400, cmap=plt.cm.viridis)
#add the colorbar
cbar=plt.colorbar(cs)
#add plot labels
plt.ylabel('Depth (m)',fontsize=fs)
plt.xlabel('Latitude',fontsize=fs)
plt.title('Oxygen at 140˚W',fontsize=fs)
cbar.set_label(r'O$_{2}$ ($\mu$mol kg$^{-1}$)',rotation=270,labelpad=25,fontsize=fs)

#SAVE THE FIGURE
#uncomment the line below (delete the leading #) to save a figure
#called figure1.pdf in your current working directory
#plt.savefig('figure1.pdf')
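
If you want to confirm that a figure really was written to disk, here is a minimal, self-contained sketch. The plotted data, the file name `figure1_demo.pdf`, and the temporary folder are assumptions made for this demo. In the lab you would pass a plain file name such as 'figure1.pdf' so the file lands in your working directory.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# made-up data standing in for the oxygen section
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(np.arange(5), np.arange(5) ** 2)
ax.set_xlabel("x (made-up)")
ax.set_ylabel("y (made-up)")

# save into a temporary folder (demo only) and close the figure afterwards
out_path = os.path.join(tempfile.mkdtemp(), "figure1_demo.pdf")
fig.savefig(out_path, bbox_inches="tight")  # 'tight' trims extra white margins
plt.close(fig)

print(os.getcwd())               # Python's way of asking for the working directory
print(os.path.exists(out_path))  # True if the file was written
```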