Intro to Scientific Programming, developed by Lily N. Zhang (lilynzhang.com/teaching/)

## Lesson Three

# Plotting

### Review

Last time, we learned about arrays:
* What they are: n-dimensional structures that can store data of any type (e.g., integers, strings)
* How we use them: indexing, computing means, statistics across an axis, plotting

### Lesson Objectives

Today, we will learn:
* How to download and process data arrays
* More on plotting

We will learn this through two examples, since the best way to learn is by doing!

## Basic Concepts

First, let's create a fake dataset in NumPy to demonstrate some basic concepts:

In [2]:
import numpy as np
x = np.linspace(1, 10, 10) # np.linspace(x1, x2, n) creates n evenly spaced points from x1 to x2, endpoints included
y = x * 2 # multiplying by a scalar scales every element -- be careful: multiplying two arrays with * is also element-wise, not matrix multiplication

Now, we're going to import the `matplotlib` library and use its `pyplot` module:
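
The plotting cell itself is not shown here, so the following is only a minimal sketch of what a first plot might look like. It recreates the `x` and `y` arrays from above so that it runs on its own; the label and title strings are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Recreate the fake dataset from above so this block runs on its own
x = np.linspace(1, 10, 10)
y = x * 2

plt.figure()
plt.plot(x, y)        # draw y against x as a line
plt.xlabel("x")       # placeholder axis labels and title
plt.ylabel("y")
plt.title("y = 2x")
# In a notebook the figure renders automatically; in a script, call plt.show()
```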

We can also change the x- and y-ticks, the tick labels, the font size, the limits and scale of the axes, the figure size, and more, or even add an additional y-axis! Later on, when we start looking at a 2-D dataset, we'll be adding a colorbar with its own limits, ticks, and labels.

## CO2 Example: Plotting 1-D Arrays

### Data Download

We're going to download the .txt file that contains the Mauna Loa annual mean CO2 measurements from https://gml.noaa.gov/ccgg/trends/data.html and save it in our working directory (whatever folder we're working out of). We can read in .txt files using the `.loadtxt()` function in NumPy:
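
As a rough sketch of the loading step: the block below writes a tiny stand-in file and reads it back with `np.loadtxt`. The file name `co2_sample.txt`, the header layout, and the numbers are all assumptions made for illustration (they are not real measurements); with the real download you would pass its path instead.

```python
import os
import tempfile
import numpy as np

# Stand-in for the downloaded file: placeholder numbers, NOT real measurements.
# Assumed layout: '#' header lines, then one row per year (year, annual mean).
sample_text = """# year   mean
2000   370.0
2001   371.5
2002   373.0
"""
path = os.path.join(tempfile.mkdtemp(), "co2_sample.txt")  # hypothetical file name
with open(path, "w") as f:
    f.write(sample_text)

# np.loadtxt skips lines starting with '#' by default and returns a 2-D float array
data = np.loadtxt(path)
print(data.shape)  # (number of rows, number of columns)
```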

We know that the year is the first column and the corresponding annual mean CO2 is the second column, so let's extract them:
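
A minimal sketch of the column extraction, using a small placeholder array in place of the loaded file (the values are made up, and the names `year` and `co2` are just illustrative choices):

```python
import numpy as np

# Placeholder array standing in for the output of np.loadtxt (not real measurements)
data = np.array([[2000.0, 370.0],
                 [2001.0, 371.5],
                 [2002.0, 373.0]])

year = data[:, 0]  # every row, first column
co2 = data[:, 1]   # every row, second column
print(year, co2)
```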

### Line Plots using `.plot(x1,y1)`

### Modifying Line Properties

From the matplotlib documentation, we learn that we can specify properties such as `linestyle`, `linewidth`, `marker`, and `color`:
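
For instance, the four properties named above can be passed as keyword arguments to `.plot`. This is a sketch with made-up values standing in for the year and CO2 arrays, and the particular style choices are arbitrary:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder values standing in for the year and CO2 arrays (not real measurements)
year = np.arange(2000, 2010)
co2 = 370.0 + 2.0 * (year - 2000)

plt.figure()
lines = plt.plot(
    year, co2,
    linestyle="--",    # dashed line
    linewidth=2,       # line thickness in points
    marker="o",        # circle at each data point
    color="tab:red",   # any matplotlib color name or hex string works
)
```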

### Axes Properties
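
The cells for this section are not shown, so here is one possible sketch of the axes adjustments mentioned earlier (limits, ticks, font sizes, figure size, and a second y-axis). The data, labels, and numbers are placeholders chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder values standing in for the year and CO2 arrays (not real measurements)
year = np.arange(2000, 2010)
co2 = 370.0 + 2.0 * (year - 2000)

fig, ax = plt.subplots(figsize=(8, 4))        # figure size in inches
ax.plot(year, co2)
ax.set_ylim(360, 400)                         # y-axis limits
ax.set_xticks([2000, 2003, 2006, 2009])       # tick positions
ax.set_xlabel("Year", fontsize=14)            # label text and font size
ax.set_ylabel("CO2 (ppm)", fontsize=14)
ax.tick_params(labelsize=12)                  # tick label font size

ax2 = ax.twinx()                              # additional y-axis sharing the same x-axis
ax2.plot(year, co2 - co2[0], color="tab:orange")
ax2.set_ylabel("Change since first year (ppm)", fontsize=14)
```

A logarithmic axis can be requested with `ax.set_yscale("log")`, although it is not useful for this particular dataset.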

### Scatter Plots using `.scatter(x1,y1)`

Using similar syntax, we can call the `.scatter` function:
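
A small sketch with made-up noisy samples (the random data, marker size, and transparency are arbitrary choices for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)                       # fixed seed so the figure is repeatable
x1 = np.linspace(0, 10, 200)
y1 = 2 * x1 + rng.normal(scale=3.0, size=x1.size)    # made-up noisy samples

plt.figure()
points = plt.scatter(x1, y1, s=10, alpha=0.6)        # s = marker size, alpha = transparency
plt.xlabel("x")
plt.ylabel("y")
```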

Scatter plots work well when you have many samples and the data are noisy. We can also adjust the axes and add labels to this graph, just as in the previous examples.

## CONUS Precipitation Example: Plotting 2-D and 3-D Arrays, Geospatial Data

### NetCDF Files

Larger datasets (e.g., from satellites) are typically stored as **NetCDF files**. This format allows the creator to save n-dimensional data arrays and include additional information (metadata) about the content. Let's read a NetCDF file that contains monthly average precipitation over the contiguous United States (CONUS) for 70 years.

This tells us that there are four variables: lat, lon, time, and precip (which has dimensions of time x lat x lon). We are going to extract each of them using the `.variables` attribute:

Looks like there are some placeholders for missing values that are causing issues. We're going to replace these missing values with `nan`s:
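
A minimal sketch of that replacement on a tiny made-up array. The placeholder value `-9999.0` is hypothetical; the real one should be read from the file's metadata.

```python
import numpy as np

fill_value = -9999.0   # hypothetical placeholder; check the file's metadata for the real one
precip = np.array([[1.0, fill_value],
                   [3.0, 5.0]])

print(precip.mean())                     # badly skewed by the placeholder
precip[precip == fill_value] = np.nan    # mark missing cells as not-a-number
print(np.nanmean(precip))                # 3.0, the mean of the three valid values
```

If the reading library already returned a NumPy masked array, `precip.filled(np.nan)` may be a simpler route to the same result.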

So we have an 846 x 120 x 300 (time x lat x lon) array called precip. We also have 1-D arrays that tell us the time, lat, and lon values for each index. One easy first step is to average across time to get the average precipitation over the entire time period for each grid point:
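
The averaging step can be sketched on a much smaller made-up array with the same (time, lat, lon) ordering. `np.nanmean` is used here on the assumption that missing values were replaced with `nan`s, which a plain `np.mean` would propagate.

```python
import numpy as np

# Small made-up array in (time, lat, lon) order, standing in for the 846 x 120 x 300 data
precip = np.arange(24, dtype=float).reshape(4, 2, 3)
precip[0, 0, 0] = np.nan                  # pretend one value is missing

time_mean = np.nanmean(precip, axis=0)    # collapse axis 0 (time), ignoring nans
print(time_mean.shape)                    # (2, 3): one value per grid point
```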

Something a little more difficult, but potentially more useful, is to separate the data by month. To do that, we first look at where the time array starts and ends:

What do these time stamps mean? To understand the units, we have to check the NetCDF file's metadata, where we find that the units of time are "hours since 1/1/1700." What??

To get our time indices into a human-readable format, we're going to use the `datetime` module:
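
One way to do the conversion, sketched with a hypothetical helper `hours_to_datetime` and a few made-up hour values: add the elapsed hours, as a `timedelta`, to the reference date from the units string.

```python
from datetime import datetime, timedelta

origin = datetime(1700, 1, 1)   # reference date taken from the time units

def hours_to_datetime(hours):
    """Convert 'hours since 1/1/1700' to a datetime (helper name is hypothetical)."""
    return origin + timedelta(hours=float(hours))

sample_hours = [0, 744, 1416]   # made-up values: 0, 31, and 59 days after the origin
dates = [hours_to_datetime(h) for h in sample_hours]
print(dates[1])                 # 1700-02-01 00:00:00
```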

Perfect! Now, let's eliminate the last 6 months so we have a complete set of years:

Now precip is 840 x 120 x 300! Nice. Finally, we want to get this into year x month x lat x lon format, which we do by noting that 840 months = 70 years x 12 months per year. We can use NumPy's reshape function to separate them:
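
Here is a sketch of the trim-and-reshape steps on a tiny made-up array (30 months on a 2 x 3 grid instead of 846 months on a 120 x 300 grid). The final check illustrates how a (year, month) pair maps back to the original time index.

```python
import numpy as np

n_lat, n_lon = 2, 3                                     # tiny grid instead of 120 x 300
precip = np.arange(30 * n_lat * n_lon, dtype=float).reshape(30, n_lat, n_lon)

precip = precip[:-6]                                    # drop the last 6 months -> 24 months
n_years = precip.shape[0] // 12                         # 2 complete years
precip_ym = precip.reshape(n_years, 12, n_lat, n_lon)   # (year, month, lat, lon)

# Year index 1, month index 4 (May) is the same slice as time index 1*12 + 4 = 16
print(np.array_equal(precip_ym[1, 4], precip[16]))      # True
```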

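Because NumPy fills reshaped arrays in row-major order (the last index varies fastest), consecutive months stay together only if the first index is the number of years and the second is the month, i.e., a shape of 70 x 12 x 120 x 300.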

Finally, to get the mean for each month, we can take the average across the year dimension (axis 0):
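
Sketched on a small made-up (year, month, lat, lon) array, the monthly mean collapses the year axis and leaves one map per calendar month. The names `precip_ym`, `monthly_mean`, and `may_mean` are illustrative only.

```python
import numpy as np

# Made-up array in (year, month, lat, lon) order: 2 years on a 2 x 3 grid
precip_ym = np.arange(2 * 12 * 2 * 3, dtype=float).reshape(2, 12, 2, 3)

monthly_mean = np.nanmean(precip_ym, axis=0)   # average over years (axis 0)
print(monthly_mean.shape)                      # (12, 2, 3): month x lat x lon

may_mean = monthly_mean[4]                     # index 4 is May (January is index 0)
```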

### Contour Plots

Contour plots allow us to plot 2-D data (e.g., over lat and lon coordinates). In this case, the color will correspond to the value of precip at each grid point. Let's try a plot for May:
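
The plotting cell is not shown, so the following is only a sketch of a filled contour plot, assuming matplotlib's `contourf`. The grid and the field are made up to stand in for lat, lon, and the May mean precipitation.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up grid and field standing in for lon, lat, and the May mean precipitation
lon = np.linspace(-125, -65, 60)
lat = np.linspace(25, 50, 30)
lon2d, lat2d = np.meshgrid(lon, lat)
may_mean = np.sin(np.radians(lat2d) * 4) + np.cos(np.radians(lon2d) * 3)

fig, ax = plt.subplots()
# With 1-D lon and lat, the data must have shape (len(lat), len(lon))
filled = ax.contourf(lon, lat, may_mean, levels=10)
cbar = fig.colorbar(filled, ax=ax)             # color scale for the plotted values
cbar.set_label("precip (placeholder units)")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("May (made-up data)")
```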

Wow! There's a lot that we can do with these contour plots, but for now, let's experiment with different colormaps for the colorbar: https://matplotlib.org/stable/users/explain/colors/colormaps.html
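
As a starting point for that experiment, the sketch below draws the same made-up field with three built-in matplotlib colormaps via the `cmap` argument. The field and the choice of colormaps are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up field on a lat/lon grid (not real precipitation)
lon = np.linspace(-125, -65, 60)
lat = np.linspace(25, 50, 30)
lon2d, lat2d = np.meshgrid(lon, lat)
field = np.sin(np.radians(lat2d) * 4) + np.cos(np.radians(lon2d) * 3)

cmap_names = ["viridis", "Blues", "YlGnBu"]    # three built-in colormaps
fig, axes = plt.subplots(1, len(cmap_names), figsize=(12, 3))
for ax, name in zip(axes, cmap_names):
    filled = ax.contourf(lon, lat, field, cmap=name)
    fig.colorbar(filled, ax=ax)                # each panel gets its own colorbar
    ax.set_title(name)
```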