# Problem Set #5

In this problem set, we will dig a little deeper into a data type that is commonly used in climate science. We will use packages and functions we learned in previous problem sets to do simple analyses on climate-related data.

## Learning objectives: 
1. Explain data structures commonly used in climate science
2. Accumulate and apply skills from previous problem sets

**Total points for this problem set: 21 pts**
*   Example codes executed: 5 pts
*   Correct answers to problems: 11 pts
*   Comments added to responses: 5 pts

**Please do not forget to add comments (with the # sign) next to your code for all of the problems to explain what you are doing.**

## #1. How is climate data usually stored?

Climate data is often stored in netCDF (.nc) format. Although we didn't go into detail about netCDF files, we already used this type of file in Problem Set #3. netCDF files are commonly used in climatology, meteorology, and oceanography. They store each data value together with its coordinates, such as the time and the location on Earth that the value refers to.
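To make that structure concrete, here is a minimal sketch that builds a tiny dataset in memory with the same kind of layout: one variable indexed by time, latitude, and longitude. The grid, the dates, and the temperature values are all made up for illustration; they are not the NCEP data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical coordinates: 3 days on a coarse 2 x 4 grid (not the real NCEP grid)
time = pd.date_range("2010-01-01", periods=3, freq="D")
lat = np.array([45.0, 42.5])
lon = np.array([280.0, 282.5, 285.0, 287.5])

# Made-up temperatures in Kelvin, one value per (time, lat, lon) cell
values = 270.0 + np.arange(3 * 2 * 4, dtype=float).reshape(3, 2, 4)

# A Dataset bundles the variable, its dimension names, its coordinates, and its metadata
toy = xr.Dataset(
    data_vars={"air": (("time", "lat", "lon"), values, {"units": "degK"})},
    coords={"time": time, "lat": lat, "lon": lon},
)

print(toy.air.dims)   # names of the dimensions, in order
print(toy.air.shape)  # number of points along each dimension

# Selecting the first time step leaves a 2-D (lat, lon) map
first_day = toy.air.isel(time=0)
print(first_day.shape)
```

Every value in `toy.air` can be looked up by position (`isel`) or by coordinate label (`sel`), which is what makes this layout convenient for maps and time series alike.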

Below I'm going to read in data from a "[reanalysis](https://climatedataguide.ucar.edu/climate-data/atmospheric-reanalysis-overview-comparison-tables)" dataset. Essentially reanalysis uses a climate model to fill in the spatial and temporal gaps between observations. So it's not quite observations, it's not quite model output, it's something different. But it's often used in climate analysis because it is heavily vetted, is nearly observational, and is a powerful tool for climate research. So the data we're going to use comes from the [NCEP/NCAR Reanalysis](https://psl.noaa.gov/data/gridded/data.ncep.reanalysis.html).

We will be using NCEP/NCAR Reanalysis data to assess heat waves. Let's read in the NCEP daily average surface air temperature for 2010 and map the first day's values:

In [None]:
# Let's reimport the packages we were using in the previous problem set
import cartopy
import cartopy.crs as ccrs
import netCDF4

In [None]:
# We're going to use xarray to do this analysis
import xarray as xr
# Download the data file with curl instead of wget
!curl -O https://psl.noaa.gov/thredds/fileServer/Datasets/ncep.reanalysis.dailyavgs/surface/air.sig995.2010.nc

f = xr.open_dataset('air.sig995.2010.nc')
f # print the metadata

Take a minute with what just printed to the screen. Click on some of the icons on the right of what printed. We got the coordinate info -- what latitude and longitude and timestep the data refer to and we've got the data itself, called "air", which has dimensions of time, lat, and lon. This is surface air temperature, or SAT. If you click on the little "print-like" icon, you can see more information about it. What are the units? Hmm, we'll probably want those in something we know better, like Celsius.
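While you are looking at the coordinate info, it is worth checking which longitude convention the file uses. Some gridded datasets store longitude as -180 to 180, others as 0 to 360 degrees east. The sketch below assumes a hypothetical 2.5-degree grid running from 0 to 357.5 (compare it with the `lon` values printed above) and shows why a negative longitude has to be converted before looking up the nearest grid point. The helper names `to_0_360` and `nearest_index` are made up for this example.

```python
import numpy as np

# Hypothetical 2.5-degree longitude axis running 0 to 357.5 degrees east
grid_lon = np.arange(0.0, 360.0, 2.5)

def to_0_360(lon_deg):
    """Map a longitude in degrees east from -180..180 onto 0..360."""
    return lon_deg % 360.0

def nearest_index(grid, value):
    """Return the index of the grid point closest to value."""
    return int(np.argmin(np.abs(grid - value)))

hanover_lon = -72.29

# Convert first, then search: this lands on the grid point nearest Hanover
converted = to_0_360(hanover_lon)
idx = nearest_index(grid_lon, converted)
print(converted, grid_lon[idx])

# Searching with the raw negative value lands on the prime meridian instead
naive_idx = nearest_index(grid_lon, hanover_lon)
print(grid_lon[naive_idx])
```

On a 0 to 360 grid, the closest value to any negative longitude is 0, so an unconverted lookup silently returns a point on the other side of the Atlantic.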

Great. Let's take a quick look at it:

In [None]:
# We can extract certain aspects from this dataset
air = f.air

# Let's see what that looks like
air

In [None]:
# We want to view the temperature in Celsius, not Kelvin
# So, to convert, we will subtract 273.15 from the Kelvin temperature
air_c = air[:] - 273.15

Let's see what this looks like:

In [None]:
air_c

### Problem #1 

Describe how the `air_c` object is organized, or what it looks like, in your own words. 

In [None]:
# ENTER RESPONSE HERE

## #2. Visualizing climate data

We have seen many spatial plots during our lectures and have even plotted simple ones in the previous problem set. Let's start plotting our own complex spatial plots using the data we just downloaded.

In [None]:
# Let's view the first day's air temperature as a simple image
import matplotlib.pyplot as plt # Import matplotlib
fig = plt.figure(figsize=(12, 8))
lat = f.lat # latitude coordinate
lon = f.lon # longitude coordinate

# Pass the colormap by name; index 0 along time is the first day
plt.imshow(air_c[0, :, :], cmap='jet')

Hmm, we can vaguely see the outlines of continents. Also, there's no colorbar to explain the magnitude of the air temperature. Let's try something different.
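For reference, a colorbar can also be attached directly in matplotlib. This is a minimal sketch on a made-up temperature field (warm at the equator, cold at the poles) on a hypothetical 73 x 144 grid; it is not the reanalysis data, and the axis labels are placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up temperature field (deg C) on a hypothetical latitude-longitude grid
lat = np.linspace(90.0, -90.0, 73)
lon = np.linspace(0.0, 357.5, 144)
field = 30.0 * np.cos(np.deg2rad(lat))[:, np.newaxis] * np.ones((1, lon.size)) - 10.0

fig, ax = plt.subplots(figsize=(8, 4))

# extent labels the axes in degrees instead of array indices: [left, right, bottom, top]
im = ax.imshow(field, cmap="jet", extent=[lon.min(), lon.max(), lat.min(), lat.max()])

# The colorbar ties each color to a temperature value
cbar = fig.colorbar(im, ax=ax, label="Temperature (deg C)")
ax.set_xlabel("Longitude (deg E)")
ax.set_ylabel("Latitude (deg N)")
```

The `im` object returned by `imshow` is what the colorbar reads its color scale from, which is why it is stored in a variable rather than discarded.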

In [None]:
# Let's view the daily air temperature as a spatial plot
import matplotlib.pyplot as plt # Import matplotlib
import numpy as np
import cartopy.crs as ccrs
fig = plt.figure(figsize=(12, 8))
lat = f.lat # latitude coordinate
lon = f.lon # longitude coordinate

# Draw the globe as seen from above North America
ax = plt.axes(projection=ccrs.Orthographic(central_longitude=-100, central_latitude=45))
# transform tells cartopy that the data are on a regular latitude-longitude grid
air_c[0, :, :].plot.imshow(ax=ax, transform=ccrs.PlateCarree(), cmap='jet')
ax.coastlines()
ax.gridlines()

# Add a dot showing the location of Hanover:
hlat = 43.7; hlon = -72.29
ax.scatter(hlon, hlat, transform=ccrs.PlateCarree(), c='black', marker='*', s=1000)
plt.show()

### Problem #2

2A - Plot the same orthographic projection of the air temperature as above. This time, add a dot showing the location of your favorite city in the United States. 

In [None]:
# ENTER CODE HERE

2B - Interpret the surface air temperature of your favorite city using the plot. 

In [None]:
# ENTER RESPONSE HERE

## #3. Data Analysis

Ok, so we've got absolute surface air temperatures plotted for January 1st. 
Nothing too special there. It would be great if we could see these same data against some kind of reference point, so that we know whether the temperature on this day was higher than, lower than, or consistent with the average temperature during that month. 

What we want is called an anomaly: a value expressed as a departure from an average. Typically, we compute it relative to a long-term average, like all of the January 1sts from the previous 30 or so years. But here we've only got one year of data. So let's get some context by assessing January 1 relative to the January average. First, we need to calculate the monthly average for these data:

In [None]:
# First, we're going to use the power of xarray to group the data by month and then
# Take the average across all of those days falling into that group.
air_clim = air_c.groupby('time.month').mean(dim='time') 
air_clim.sizes

Ok, so that looks like it worked. We now have an array called `air_clim` that is the monthly average surface air temperature. Let's see what that looks like as a time series at a random place, like near Hanover: 

In [None]:
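# This file stores longitude as 0 to 360 degrees east, so convert -72.29 with % 360
air_clim.sel(lon=-72.29 % 360, lat=43.7, method='nearest').plot() # plot the monthly averages at the grid point nearest Hanover
air_clim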

Ok, seems like monthly data. We have Hanover warming to a peak in July (month 7) and cooling into December. Passes the smell test! 
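If you want to convince yourself of what `groupby('time.month').mean(dim='time')` does, it helps to try it on a series where the answer is obvious. In this minimal sketch, the made-up "temperature" on each day is simply the day of the month, so the January mean must be 16 (the average of 1 through 31) and the February mean must be 14.5 (the average of 1 through 28).

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up daily series for January and February 2010: the value is the day of the month
time = pd.date_range("2010-01-01", "2010-02-28", freq="D")
values = np.asarray(time.day, dtype=float)
toy = xr.DataArray(values, coords={"time": time}, dims="time", name="air_c")

# Same pattern as above: group the days by calendar month, then average each group
toy_clim = toy.groupby("time.month").mean(dim="time")

print(toy_clim.month.values)  # the group labels become a new "month" coordinate
print(toy_clim.values)        # one average per month
```

Note that the `time` dimension is gone from the result and a `month` dimension has taken its place, which is the same change you see in `air_clim.sizes`.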

Now we need to estimate how each day in each month departs from that monthly mean. We do that by subtracting out the corresponding monthly average from each day:

In [None]:
air_anom = air_c.groupby('time.month') - air_clim
air_anom.sizes

Ok, `air_anom` has the anomalies for all of the longitude and latitude coordinates. We want to take a look at a time series of these anomalies at Hanover.

Hanover is at a longitude of -72.29 and a latitude of 43.7. Because this file stores longitude as 0 to 360 degrees east, we convert -72.29 with `% 360` before selecting the nearest grid point.

In [None]:
# Plot the daily anomalies at the grid point nearest Hanover
air_anom.sel(lon=-72.29 % 360, lat=43.7, method='nearest').plot() 

Cool! Looks like we have each day's air temperature as a departure from the monthly average. Not fully a climatology, but it gives us some reference for whether or not a day is warmer or cooler than expected. 
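One quick sanity check on anomalies computed this way: within each month they should average to zero, because the monthly mean is exactly what was subtracted. Here is a minimal sketch on a made-up three-month series (a smooth seasonal curve plus random noise, not real temperatures).

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up daily series for January through March 2010
time = pd.date_range("2010-01-01", "2010-03-31", freq="D")
rng = np.random.default_rng(0)
doy = np.asarray(time.dayofyear, dtype=float)
values = 10.0 * np.sin(2.0 * np.pi * doy / 365.0) + rng.normal(0.0, 2.0, time.size)
toy = xr.DataArray(values, coords={"time": time}, dims="time", name="air_c")

# Monthly means, then each day's departure from its own month's mean
toy_clim = toy.groupby("time.month").mean(dim="time")
toy_anom = toy.groupby("time.month") - toy_clim

# Average the anomalies month by month; each result should be numerically zero
months = toy_anom["time"].dt.month.values
monthly_means = [toy_anom.values[months == m].mean() for m in (1, 2, 3)]
print(np.allclose(monthly_means, 0.0))
```

The anomaly array keeps the full daily `time` axis; only the values change, which matches what `air_anom.sizes` reports.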

Let's look at that Orthographic projection of surface air temperatures for January 1 again, but this time as our anomaly:

In [None]:
# Let's view the anomaly as a spatial plot

fig = plt.figure(figsize=(12, 8))
lat = f.lat # latitude coordinate
lon = f.lon # longitude coordinate

ax = plt.axes(projection=ccrs.Orthographic(central_longitude=-100, central_latitude=45))
# Index 0 along time is January 1; pass the colormap by name
air_anom[0, :, :].plot.imshow(ax=ax, transform=ccrs.PlateCarree(), cmap='jet')
ax.coastlines()
ax.gridlines()

# Add a dot showing the location of Hanover:
hlat = 43.7; hlon = -72.29
ax.scatter(hlon, hlat, transform=ccrs.PlateCarree(), c='black', marker='*', s=1000)
plt.show()

### Problem #3

3A - Plot a line graph of the temperature anomaly for Russia (45N, 80E).

In [None]:
# ENTER CODE HERE

3B - Plot, using an orthographic projection like the one above, the temperature anomaly for Russia (45N, 80E) in August. 

Hint #1: First, you can use the `.loc` indexer on the anomaly data to select August.
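As a generic illustration of label-based selection, here is a minimal sketch on a made-up one-dimensional daily series (the values are just a day counter, and the month chosen is March, not the one you need). It shows `.loc` with a slice of date strings, the equivalent `.sel` call, and the dictionary form of `.loc`, which is handy when an array has several dimensions and you only want to slice along `time`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up daily series for 2010: the value is the number of days since January 1
time = pd.date_range("2010-01-01", "2010-12-31", freq="D")
toy = xr.DataArray(np.arange(time.size, dtype=float), coords={"time": time}, dims="time")

# .loc selects by coordinate label; a slice of date strings includes both endpoints
march = toy.loc["2010-03-01":"2010-03-31"]

# .sel does the same thing with the dimension named explicitly
same = toy.sel(time=slice("2010-03-01", "2010-03-31"))

# The dictionary form of .loc names the dimension too
also_same = toy.loc[dict(time=slice("2010-03-01", "2010-03-31"))]

print(march.sizes["time"])
```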

Please comment on each line of code you write, explaining its purpose, even if the code is copied.


In [None]:
# ENTER CODE HERE

3C - Explain what you see.

In [None]:
# ENTER RESPONSE HERE

### Problem #4

4A - Use the `!curl -O` command we used earlier to download the air temperature data, but this time download the monthly surface pressure data from NCEP. 
The data is at this link: `https://downloads.psl.noaa.gov/Datasets/ncep.reanalysis2/Monthlies/surface/mslp.mon.mean.nc`

In [None]:
# ENTER CODE HERE

In [None]:
# View the data

4B - This is monthly surface pressure data from 1979 to the present. What are the units?

In [None]:
# ENTER RESPONSE HERE

4C - Calculate each month as an anomaly from all other months in the dataset.

In [None]:
# ENTER CODE HERE

4D - Plot the surface pressure over Russia for July 2010, the month preceding August 2010, using the same orthographic projection. 

In [None]:
# ENTER CODE HERE

4E - Interpret your plot.

In [None]:
# ENTER RESPONSE HERE

### Who did you work on this problem set with? What was the nature of that collaboration?

# Process Log (to complete):

How did you approach these problems?

Did you run into any errors or confusing results? What were they?

What changes did you make to fix or improve your code?

What was the most important thing you learned from this exercise?