### Modules used in this notebook
xarray, cfgrib, matplotlib

# Climate: C.02 - Extracting and averaging

Weather and climate data are large and can have many dimensions; for example, climate model data generally have the dimensions [time, latitude, longitude]. For this reason file types like .csv and .dat are not suitable, and other formats are used instead. The most common of these are .pp, .netcdf and .grib.

To read these files you will need some particular Python libraries. There are multiple options (e.g. `xarray`, `cfpython`), but for this example `cfgrib` and `eccodes` are needed to read GRIB files.

> Q1: What is the GRIB format? https://en.wikipedia.org/wiki/GRIB

After reading this you should be happy with how the file type differs from the type of data files you could load into software like Excel.

Once the libraries are installed you will need to load in some data.
For the purposes of this example, the data we are using has been downloaded in advance from the Climate Data Store: https://cds.climate.copernicus.eu/#!/home

We will be using data from the ERA5 reanalysis today. For information on what a reanalysis is in broad terms see this page: https://climate.copernicus.eu/climate-reanalysis.

Take this opportunity to look through the extra documentation we provided for more information on reanalysis and other types of weather and climate data: https://research.reading.ac.uk/met-energy/wp-content/uploads/sites/53/2021/09/energymet_education_videos_links_checked.pdf

If there is a particular type of data you are interested in please ask the helpers in the session.

Please explore the climate data store website in your free time. There are good examples of how to download the data to your machine of choice using the 'cdsapi'.
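
As a rough illustration of what a `cdsapi` download can look like, the sketch below builds a request for the two ERA5 100 m wind components and wraps the download in a function. The helper names `build_era5_request` and `download_era5` are hypothetical, and the request keys are assumptions based on the pattern shown on the Climate Data Store download forms. They may differ from the current API, so copy the exact request from the website before relying on it. The download itself needs a CDS account and a configured API key, so it is not called here.

```python
def build_era5_request(year, month):
    """Build an illustrative (assumed) request for ERA5 100 m wind components."""
    return {
        "product_type": "reanalysis",
        "variable": ["100m_u_component_of_wind", "100m_v_component_of_wind"],
        "year": str(year),
        "month": f"{month:02d}",
        "day": [f"{day:02d}" for day in range(1, 32)],
        "time": [f"{hour:02d}:00" for hour in range(24)],
        "format": "grib",  # assumption: the key name may differ between API versions
    }


def download_era5(target, year=2019, month=3):
    """Download one month of data to `target` (requires CDS credentials)."""
    import cdsapi  # imported here so the request builder works without it

    client = cdsapi.Client()
    client.retrieve(
        "reanalysis-era5-single-levels",  # assumed dataset name, check on the CDS
        build_era5_request(year, month),
        target,
    )


request = build_era5_request(2019, 3)
print(request["year"], request["month"], len(request["time"]))
```

Calling `download_era5('my_file.grib')` would then fetch the file, assuming the request matches what the Climate Data Store currently expects.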

# Opening the file with xarray
`xarray` is a powerful open-source library designed to access and manipulate multi-dimensional data. With the `cfgrib` engine, [developed by ECMWF](https://github.com/ecmwf/cfgrib), we can access GRIB data using the `ecCodes` library that was previously installed.

> Q2: What is the structure of an `xarray` dataset? https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataset


Run the code below to import the xarray library and open the dataset.
The file naming convention here tells us some information (e.g. that the data is from ERA5 and probably from March 2019), but all of this can be checked once the data is opened.


In [None]:
import xarray as xr
# Open the GRIB file, naming the cfgrib engine explicitly
d = xr.open_dataset('../data/era5-u100_v100_201903.grib', engine='cfgrib')
d
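
If you want to practise checking a dataset's contents without the GRIB file, the following sketch builds a small synthetic dataset with the same kind of structure (dimensions, coordinates and data variables) and inspects it. The name `demo`, the grid values and the constant wind values are all made up for illustration and are not ERA5 data.

```python
import numpy as np
import xarray as xr

# Made-up coordinates: 4 hourly time steps on a tiny 0.25 degree grid
time = np.arange("2019-03-01T00", "2019-03-01T04", dtype="datetime64[h]")
lat = np.array([57.0, 56.75, 56.5])
lon = np.array([23.5, 23.75, 24.0, 24.25])
shape = (time.size, lat.size, lon.size)

demo = xr.Dataset(
    data_vars={
        "u100": (("time", "latitude", "longitude"), np.full(shape, 3.0)),
        "v100": (("time", "latitude", "longitude"), np.full(shape, 4.0)),
    },
    coords={"time": time, "latitude": lat, "longitude": lon},
    attrs={"source": "synthetic demo, not ERA5"},
)

print(dict(demo.sizes))         # length of each dimension
print(list(demo.data_vars))     # names of the fields in the dataset
print(demo["time"].values[0])   # first time stamp
print(demo["time"].values[-1])  # last time stamp
```

The same attributes (`d.sizes`, `d.data_vars`, `d['time']`) can be used on the opened dataset `d` to confirm the period and variables suggested by the file name.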

## Calculating wind speed
`u100` and `v100` are respectively the west-east (zonal) and south-north (meridional) components of the wind at 100 m.

The wind speed is the magnitude of this vector, calculated using the Pythagorean theorem:

<div>
<img src="https://disc.gsfc.nasa.gov/media/image/07af14c37a0a44e482feea5975e1731f/windspeed-diagram.png" width="500"/>
</div>

Run the code below to calculate the wind speed using xarray, and then to redisplay the dataset to see the new field within it.

In [None]:
d['ws100'] = (d['u100']**2 + d['v100']**2)**(1/2)
d
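
To sanity-check the calculation, here is a minimal sketch on made-up component values where the answer is easy to work out by hand. It assumes only `numpy` and `xarray`. The arrays `u` and `v` are invented for illustration, and `np.hypot` is shown as an equivalent alternative to the squared-sum form.

```python
import numpy as np
import xarray as xr

# Made-up wind components in m/s; note that the sign gives the direction
u = xr.DataArray(np.array([3.0, 0.0, -6.0]), dims="time", name="u100")
v = xr.DataArray(np.array([4.0, -2.0, 8.0]), dims="time", name="v100")

# Same formula as in the cell above
ws = (u**2 + v**2) ** (1 / 2)

# Equivalent calculation using numpy's hypotenuse function
ws_hypot = np.hypot(u, v)

print(ws.values)
print(ws_hypot.values)
```

Because the components are squared, the wind speed is never negative, whichever way the wind is blowing.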


> Q3: Can you adapt the code above to also add '100m wind speed cubed' as a variable in the dataset?

## Extracting a time-series from a location of interest

We may want to calculate a time series of wind or solar power at a particular location. To do this we need some knowledge of the area covered by the data file (see above).

Run the following lines of code for examples of how to extract a time-series of data by selecting the nearest grid point to a location of interest, and plotting this out.

Note we are using our new 100m wind speed field created in the previous example.


> Q4: Can you adapt the code below to extract some data from an operational wind farm location?

In [None]:
sel_lat = 56.84
sel_lon = 23.88

single_nearest = d.sel(latitude = sel_lat, longitude = sel_lon, method = 'nearest')
print(single_nearest)
single_nearest['ws100'].plot()
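
With `method = 'nearest'` the data comes from the closest grid point, not from the exact location requested, so it is worth checking which grid point was chosen. The sketch below shows this on a synthetic field, assuming a regular 0.25 degree grid with latitudes stored from north to south. The grid and the zero-valued field are made up for illustration.

```python
import numpy as np
import xarray as xr

# Synthetic 0.25 degree grid (assumed spacing), latitude descending
lat = np.linspace(58.0, 56.0, 9)
lon = np.linspace(23.0, 25.0, 9)
field = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    coords={"latitude": lat, "longitude": lon},
    dims=("latitude", "longitude"),
    name="ws100",
)

sel_lat = 56.84
sel_lon = 23.88
point = field.sel(latitude=sel_lat, longitude=sel_lon, method="nearest")

# Coordinates of the grid point that was actually selected
grid_lat = float(point["latitude"])
grid_lon = float(point["longitude"])
print("requested:", sel_lat, sel_lon)
print("selected: ", grid_lat, grid_lon)
```

On the real dataset, `single_nearest['latitude']` and `single_nearest['longitude']` give the same information.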




## Box average

Another task we may wish to do is extract some data averaged over a region (this may be of interest if you want to create an average energy demand over the region from temperature data).

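Run the code example below to see how you can pass Python `slice` objects to xarray's `sel` method to subset the data and then take the mean over the area. Note that the latitude slice runs from `max_lat` to `min_lat`, because the latitudes in this file are stored from north to south.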


> Q5: Can you adapt this code to find the maximum wind speed over the box?

In [None]:
min_lat = 37.8
max_lat = 43
min_lon = -8
max_lon = -1.8

box_average = d.sel(
    latitude = slice(max_lat, min_lat), 
    longitude = slice(min_lon, max_lon)).mean(dim = ['latitude', 'longitude'])
box_average['ws100'].plot()
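
The order of the slice bounds matters. The sketch below, on a synthetic field with made-up values, illustrates that slicing a descending latitude coordinate from low to high returns an empty selection. The 0.25 degree grid and the constant value of 7.0 are assumptions for illustration only.

```python
import numpy as np
import xarray as xr

# Synthetic grid: latitude descending (north to south), longitude ascending
lat = np.linspace(45.0, 37.0, 33)
lon = np.linspace(-10.0, 0.0, 41)
field = xr.DataArray(
    np.full((lat.size, lon.size), 7.0),
    coords={"latitude": lat, "longitude": lon},
    dims=("latitude", "longitude"),
    name="ws100",
)

# Bounds given in the same order as the stored coordinate
box = field.sel(latitude=slice(43, 37.8), longitude=slice(-8, -1.8))

# Bounds given low-to-high on a descending coordinate select nothing
wrong_order = field.sel(latitude=slice(37.8, 43), longitude=slice(-8, -1.8))

box_mean = float(box.mean(dim=["latitude", "longitude"]))
print(dict(box.sizes), box_mean)
print(dict(wrong_order.sizes))
```

If a box average unexpectedly comes back as all NaN or the plot is blank, checking the sizes of the selection like this is a quick first step.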