# Chapter 7 - Example: Atmospheric Data 
### Analyze monthly wind data for a selected region

In this chapter, we use the ERA-5 reanalysis, an atmospheric/climate dataset, to analyze changes in wind vectors at 10 m. We characterize their variability over a given region, plot the field, and calculate linear trends.

The [ERA-5 (ECMWF)](https://registry.opendata.aws/ecmwf-era5/) reanalysis incorporates satellite and in-situ data, and its output includes ocean, land, and atmospheric variables. Therefore, this script can be easily adapted to other variables.

In [1]:
import warnings
warnings.simplefilter('ignore') 

import numpy as np
import pandas as pd
import xarray as xr
from calendar import month_abbr # sequence of abbreviated month names, indexed by month number
from calendar import monthrange # returns (weekday of the first day, number of days) for a given year and month
import matplotlib.pyplot as plt 
import hvplot.pandas
import hvplot.xarray
import fsspec
import s3fs
import dask
from dask.distributed import performance_report, Client, progress
import os # library to interact with the operating system


***
## For this example we select a region, and also a specific month and a range of years to analyze

In [None]:
# Select the region by defining latitude and longitude ranges.
# ERA-5 data has a 1/4 degree (0.25 degrees) resolution.
latr = [39, 40] # latitude range [lat1, lat2]. Keep lat1 < lat2: to keep the code simple, this is not checked below
lonr = [-125, -123] # longitude range [lon1, lon2] in the -180:180 convention. Keep lon1 < lon2
# time selection
mon = 5 # month to analyze
iyr = 2000 # initial year of the analysis (inclusive)
fyr = 2021 # final year of the analysis (inclusive)

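The longitude range above uses the -180:180 convention, while the data selection further down shifts it by 360, which suggests the dataset stores longitudes as 0:360. A minimal sketch of converting between the two conventions with the modulo operator follows. The helper names `to_0_360` and `to_180` are hypothetical and not part of any library.

```python
def to_0_360(lon):
    """Map a longitude from the -180:180 convention to 0:360 (hypothetical helper)."""
    return lon % 360


def to_180(lon):
    """Map a longitude from the 0:360 convention back to -180:180 (hypothetical helper)."""
    return (lon + 180) % 360 - 180


lonr = [-125, -123]                      # same region as above
lonr_360 = [to_0_360(v) for v in lonr]   # values to use when slicing a 0:360 grid
lonr_back = [to_180(v) for v in lonr_360]

print(lonr_360)
print(lonr_back)
```

Unlike simply adding 360, the modulo form also leaves positive (eastern) longitudes unchanged. A region that crosses the prime meridian would still need to be split into two slices.
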

***
## Acquire data from the AWS cloud

In this case, files are stored differently from the SST data. ERA5 data is stored in monthly files (of daily data) organized in yearly folders, so the monthly files have to be accessed individually.

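As a rough illustration of this layout, the sketch below builds the address of one monthly file and the first and last dates of that month using only the standard library. The base URL and variable name are taken from the code in this chapter. The helpers `era5_zarr_url` and `month_bounds` are hypothetical names introduced here for clarity.

```python
from calendar import monthrange

# Base address of the zarr store, as used in this chapter
BASE = 'https://era5-pds.s3.us-east-1.amazonaws.com/zarr'


def era5_zarr_url(year, month, variable):
    """Build the URL of one monthly file (hypothetical helper)."""
    return f'{BASE}/{year}/{month:02d}/data/{variable}.zarr'


def month_bounds(year, month):
    """Return the first and last day of a month as 'YYYY-MM-DD' strings (hypothetical helper)."""
    last_day = monthrange(year, month)[1]  # second element is the number of days
    return f'{year}-{month:02d}-01', f'{year}-{month:02d}-{last_day}'


url = era5_zarr_url(2000, 5, 'northward_wind_at_10_metres')
first, last = month_bounds(2000, 5)
print(url)
print(first, last)
```

Looping over a range of years then amounts to calling these helpers once per year and opening each URL in turn.
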
In [None]:
tdt = list() # initialize a list to store the time index

# v meridional component
print('Acquiring meridional wind v10m')
for iy, y in enumerate(range(iyr, fyr+1)): # for loop over the selected years
    file_location = 'https://era5-pds.s3.us-east-1.amazonaws.com/zarr/'+str(y)+'/'+str(mon).zfill(2)+'/data/northward_wind_at_10_metres.zarr'
    # the URL includes: bucket name (era5-pds), year y (converted to a string), month mon, and the variable name with the extension .zarr
    ds = xr.open_zarr(file_location,consolidated=True) # open access to data

    # generate time frame to obtain the whole month data (first to last day of selected month)
    dte1 = str(y)+'-'+str(mon).zfill(2)+'-01'
    dte2 = str(y)+'-'+str(mon).zfill(2)+'-'+str(monthrange(y, mon)[1]) # monthrange provides the length of the month
    # select data region and time - meridional wind
    vds = ds['northward_wind_at_10_metres'].sel(time0 = slice(dte1,dte2),
                                            lat  = slice(latr[1],latr[0]), # larger latitude first: the slice runs from north to south
                                            lon  = slice(lonr[0]%360,lonr[1]%360) # convert -180:180 longitudes to 0:360
                                           ).mean(axis=0).load() # calculate the monthly mean before downloading it
    if iy==0: # if the first year, create an array to store data
        v10_dt = np.full((len(range(iyr, fyr+1)),vds.shape[0],vds.shape[1]), np.nan) # create an array of the size [years,lat,lon]
    v10_dt[iy,:,:] = vds.data # store selected data per year
    
# u component
print('Acquiring zonal wind u10m')
for iy, y in enumerate(range(iyr, fyr+1)):
    file_location = 'https://era5-pds.s3.us-east-1.amazonaws.com/zarr/'+str(y)+'/'+str(mon).zfill(2)+'/data/eastward_wind_at_10_metres.zarr'
    # note that each variable has a distinctive file name
    ds = xr.open_zarr(file_location,consolidated=True)

    dte1 = str(y)+'-'+str(mon).zfill(2)+'-01'
    dte2 = str(y)+'-'+str(mon).zfill(2)+'-'+str(monthrange(y, mon)[1])
    uds = ds['eastward_wind_at_10_metres'].sel(time0 = slice(dte1,dte2),
                                            lat  = slice(latr[1],latr[0]), # larger latitude first: the slice runs from north to south
                                            lon  = slice(lonr[0]%360,lonr[1]%360) # convert -180:180 longitudes to 0:360
                                           ).mean(axis=0).load()
    if iy==0: 
        u10_dt = np.full((len(range(iyr, fyr+1)),uds.shape[0],uds.shape[1]), np.nan)
    u10_dt[iy,:,:] = uds.data 
    
    # append month-year time to the list
    tdt.append(str(y)+'-'+str(mon).zfill(2)+'-01') # add first day of month
    


In [None]:
# Build a Dataset (rather than a single DataArray) from the selected data, since the wind vector has 2 components
mw10 = xr.Dataset(data_vars=dict(u10m=(['time','lat','lon'],u10_dt),
                                 v10m=(['time','lat','lon'],v10_dt), ),
                    coords=dict(time=tdt,lat=vds.lat.values, lon=(vds.lon.values+180)%360-180),attrs=vds.attrs) # longitudes back to -180:180
# Add a wind speed variable
mw10['wsp10m'] = np.sqrt(mw10.u10m**2+mw10.v10m**2) # calculate wind speed
os.makedirs('./data', exist_ok=True) # make sure the output folder exists
mw10.to_netcdf('./data/ERA5_wind10m_mon'+str(mon).zfill(2)+'.nc') # save the file for future use, so we don't have to get the data again
mw10 # taking a peek


In [None]:
mw10 = xr.open_dataset('./data/ERA5_wind10m_mon05.nc')
mw10.close()
mw10

***
## Plotting the data

As before, there is a simple way to plot the data for quick inspection, and also a way to make the plot ready for sharing or publication.

In [None]:
# simple plot of data, using the matplotlib function quiver to plot vectors
x,y = np.meshgrid(mw10.lon,mw10.lat) # generate a lon/lat grid to plot the vectors
plt.quiver(x, y, mw10.u10m[0,:,:], mw10.v10m[0,:,:]) 
plt.show()

In [None]:
# Now a more detailed plot
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter
import cartopy.feature as cfeature
import cartopy.crs as ccrs
from calendar import month_abbr

# Select a region of our data, giving it a margin
margin = 0.5 # extra space for the plot
region = np.array([[latr[0]-margin,latr[1]+margin],[lonr[0]-margin,lonr[1]+margin]]) # numpy array that specifies the lat/lon boundaries of our selected region

# Create and set the figure context
fig = plt.figure(figsize=(8,5)) # create a figure object, and assign it a variable name fig
ax = plt.axes(projection=ccrs.PlateCarree()) # projection type - this one is easy to use
ax.coastlines(resolution='50m',linewidth=2,color='black') 
ax.add_feature(cfeature.LAND, color='grey', alpha=0.3)
ax.set_extent([region[1,0],region[1,1],region[0,0],region[0,1]],crs=ccrs.PlateCarree()) 
ax.set_xticks([*np.arange(region[1,0],region[1,1]+1,1)], crs=ccrs.PlateCarree()) # customize ticks and labels to longitude
ax.set_yticks([*np.arange(region[0,0],region[0,1]+1,1)], crs=ccrs.PlateCarree()) # customize ticks and labels to latitude
ax.xaxis.set_major_formatter(LongitudeFormatter(zero_direction_label=True))
ax.yaxis.set_major_formatter(LatitudeFormatter())

# Plot average wind for the selected month, color is the wind speed
plt.quiver(x, y, mw10.u10m.mean(axis=0), mw10.v10m.mean(axis=0),mw10.wsp10m.mean(axis=0), cmap='jet')
cbar=plt.colorbar()
cbar.set_label('m/s') # color bar label
plt.title('Wind for '+month_abbr[mon]+' ('+str(iyr)+'-'+str(fyr)+')')
#fig.savefig('filename') # save your figure using the .savefig method. Python infers the format from the filename extension.
plt.show()

*** 
## To analyze the data in time, we select only one point in space. 
But if you want to analyze the entire field, you can:
- Average spatially using `.mean(axis=(1,2))` on the variables
- Repeat the analysis for each point (using a `for` loop)
- Or even better: use `xarray` methods to apply a function to the array

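The first and last options above can be sketched with NumPy alone. The example below uses a small synthetic `[year, lat, lon]` array with a known trend, not the ERA-5 data, and fits a straight line at every grid point in a single call to `np.polyfit`. The grid size, noise level, and slope are assumptions chosen only for illustration. xarray also offers a `polyfit` method on its data structures for a similar purpose.

```python
import numpy as np

# Synthetic field with a known trend (illustrative values, not ERA-5 data)
years = np.arange(2000, 2022)
nlat, nlon = 5, 9
true_slope = 0.05  # m/s per year
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.01, size=(years.size, nlat, nlon))
field = true_slope * (years - years[0])[:, None, None] + noise

# Option 1: average spatially to get a single time series
series = field.mean(axis=(1, 2))

# Option 3: fit a line at every grid point at once.
# np.polyfit accepts a 2-D y of shape (n_times, n_points).
x = years - years.mean()                     # centering keeps the fit well conditioned
flat = field.reshape(years.size, -1)         # [year, lat*lon]
coefs = np.polyfit(x, flat, deg=1)           # row 0: slopes, row 1: intercepts
slope_map = coefs[0].reshape(nlat, nlon)     # back to [lat, lon]

print(series.shape, slope_map.shape)
print(slope_map.mean())
```

The resulting `slope_map` has one trend value per grid point and could be drawn on a map in the same way as the mean wind field.
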
In [None]:
print('Latitude values: ', mw10.lat.values)
print('Longitude values: ',mw10.lon.values)

In [None]:
# select a point from the range of latitude and longitude values above
slat = 39 # selected latitude
slon = -124 # selected longitude

In [None]:
# Select data for a specific location, and do a simple plot of each variable
plt.figure(figsize=(12,8))

# meridional wind change
plt.subplot(2,2,1)
plt.plot(range(iyr,fyr+1),mw10.v10m.sel(lat=slat,lon=slon), 'bd-',zorder=2)
plt.axhline(y=0,c='k', alpha=0.4)
plt.ylabel('Wind velocity (m/s)')
plt.title('Meridional wind (v), Lat='+str(slat)+', Lon='+str(slon))
plt.grid(zorder=0)

# zonal wind change
plt.subplot(2,2,2)
plt.plot(range(iyr,fyr+1),mw10.u10m.sel(lat=slat,lon=slon), 'go-',zorder=2)
plt.axhline(y=0,c='k', alpha=0.4)
plt.ylabel('Wind velocity (m/s)')
plt.title('Zonal wind (u), Lat='+str(slat)+', Lon='+str(slon))
plt.grid(zorder=0)

# wind speed change
plt.subplot(2,2,3)
plt.plot(range(iyr,fyr+1), mw10.wsp10m.sel(lat=slat,lon=slon), 's-',c='darkorange',zorder=2)
plt.axhline(y=0,c='k', alpha=0.4)
plt.ylabel('Wind speed (m/s)')
plt.title('Wind speed, Lat='+str(slat)+', Lon='+str(slon))
plt.grid(zorder=0)

plt.tight_layout()
plt.show()

***
## Now, let's calculate the temporal trend on one of the wind variables, using a first degree linear regression 

In [None]:
# libraries for statistics and machine learning functions
from sklearn.preprocessing import PolynomialFeatures
import statsmodels.api as sm

var='wsp10m' # select a variable from our Dataset

x = np.array([*range(iyr,fyr+1)]).reshape(-1,1) # we generate an array of years, and turn it into a column vector by using .reshape(-1,1)
y = mw10[var].sel(lat=slat,lon=slon).values.reshape(-1,1) # selected variable at the selected point

polf = PolynomialFeatures(1) # linear regression (order=1)
xp = polf.fit_transform(x) # generate an array with a constant (intercept) column and the years
mods = sm.OLS(y,xp).fit() # calculate regression model, stored in mods

print(mods.summary()) # each value of the model can also be accessed individually

# this summary shows different metrics and significance levels along with the equation variables and constants. 
# for more details see the resources section below

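To make the role of the constant column more concrete, the sketch below repeats the same kind of fit with NumPy only, on a synthetic series rather than the ERA-5 data. The design matrix has a column of ones followed by the years, which is the layout `PolynomialFeatures(1)` produces. The series, noise level, and slope are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic yearly series with a known trend (illustrative values, not ERA-5 data)
years = np.arange(2000, 2022, dtype=float)
rng = np.random.default_rng(1)
y = 6.0 + 0.05 * (years - 2000) + rng.normal(0.0, 0.1, years.size)

# Design matrix: a constant column for the intercept, then the predictor
X = np.column_stack([np.ones_like(years), years])

# Ordinary least squares: find beta that minimizes ||X @ beta - y||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

# Coefficient of determination of the fitted line
fitted = X @ beta
r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

print('slope (per year):', slope)
print('intercept:', intercept)
print('R^2:', r2)
```

In the statsmodels result, the corresponding values are available as `mods.params`, `mods.rsquared`, and `mods.pvalues`.
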
***
# Resources
**Data**
- AWS [ERA-5 (ECMWF)](https://registry.opendata.aws/ecmwf-era5/) reanalysis data.
This page also has links to other tutorials that use other libraries.
- [List of data available](https://github.com/planet-os/notebooks/blob/master/aws/era5-pds.md) on ERA5 and details on how the files are organized.
- Google Earth Engine ERA-5 data. [[Monthly]](https://developers.google.com/earth-engine/datasets/catalog/ECMWF_ERA5_MONTHLY#bands) [[Daily]](https://developers.google.com/earth-engine/datasets/catalog/ECMWF_ERA5_DAILY).

**More on the libraries:**
- [xarray apply](https://www.programcreek.com/python/example/123575/xarray.apply_ufunc) Examples on how to apply a function to an xarray structure
- [scikit-learn (sklearn)](https://scikit-learn.org/stable/) a library for machine learning functions
- [statsmodels](https://www.statsmodels.org/stable/user-guide.html) a library to calculate statistical models.


