# NetCDF

Explore the usage of NetCDF files with xarray.

We are going to look at some example output from CMIP6. See https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html#3-accessing-model-output for guidance on where to find CMIP6 data and how to access it. Go to the [DKRZ node](https://esgf-data.dkrz.de/search/cmip6-dkrz/) to search for specific datasets.

For testing, we want to analyze data from the NCAR CESM model, using the following search settings:
* MIP Era: CMIP6
* Activity: CMIP
* Institution ID: NCAR
* Nominal resolution: 100 km
* CF_standard_name: air_temperature, specific_humidity

All other fields are left blank.

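The same search can in principle be expressed as a query URL instead of clicking through the web form. The sketch below only builds the URL; it assumes that the DKRZ node serves the standard ESGF search REST API at `https://esgf-data.dkrz.de/esg-search/search` and that the facet names (`mip_era`, `activity_id`, `institution_id`, `nominal_resolution`, `cf_standard_name`) match the labels in the form. The helper `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the DKRZ index node serves the standard ESGF search REST API at this path
SEARCH_ENDPOINT = "https://esgf-data.dkrz.de/esg-search/search"


def build_search_url(facets: dict) -> str:
    """Build an ESGF dataset search URL from a dict of facet names to value(s)."""
    params = {"type": "Dataset", "format": "application/solr+json", "limit": "10"}
    params.update(facets)
    # doseq=True expands a list value into repeated key=value pairs (one per value)
    return SEARCH_ENDPOINT + "?" + urlencode(params, doseq=True)


query_url = build_search_url({
    "mip_era": "CMIP6",
    "activity_id": "CMIP",
    "institution_id": "NCAR",
    "nominal_resolution": "100 km",
    "cf_standard_name": ["air_temperature", "specific_humidity"],
})
print(query_url)
```

Repeated values of one facet (here the two standard names) are assumed to be treated as alternatives by the search service. If the node answers in Solr JSON as requested, `requests.get(query_url).json()["response"]["docs"]` should hold one dict per matching dataset.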
This gives a rather extensive list. More or less arbitrarily, I clicked on "Thredds catalogue" for one of the entries and got a link to the file [ta_day_CESM2_amip_r2i1p1f1_gn_20100101-20150101.nc](http://esgf-data.ucar.edu/thredds/catalog/esgcet/913/CMIP6.CMIP.NCAR.CESM2.amip.r2i1p1f1.day.ta.gn.v20190220.html?dataset=CMIP6.CMIP.NCAR.CESM2.amip.r2i1p1f1.day.ta.gn.v20190220.ta_day_CESM2_amip_r2i1p1f1_gn_20100101-20150101.nc). This link points to an HTML catalog page rather than to the file itself, and I couldn't figure out how to download the file from there. So I clicked on "wget script" instead and copied the file URL from that script.

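The file name itself encodes most of the search facets. CMIP6 file names follow the template `<variable_id>_<table_id>_<source_id>_<experiment_id>_<member_id>_<grid_label>[_<time_range>].nc`, so a small parser can recover them. This is a sketch: `parse_cmip6_filename` and `Cmip6FileName` are hypothetical helpers, and it assumes that no field contains an underscore.

```python
from typing import NamedTuple, Optional


class Cmip6FileName(NamedTuple):
    variable_id: str
    table_id: str
    source_id: str
    experiment_id: str
    member_id: str
    grid_label: str
    time_range: Optional[str]


def parse_cmip6_filename(name: str) -> Cmip6FileName:
    """Split a CMIP6 file name into its components (assumes no '_' inside a field)."""
    if not name.endswith(".nc"):
        raise ValueError(f"not a netCDF file name: {name}")
    parts = name[: -len(".nc")].split("_")
    if len(parts) == 6:  # time-invariant files (e.g. fx tables) carry no time range
        parts.append(None)
    if len(parts) != 7:
        raise ValueError(f"unexpected number of fields in {name}")
    return Cmip6FileName(*parts)


info = parse_cmip6_filename("ta_day_CESM2_amip_r2i1p1f1_gn_20100101-20150101.nc")
print(info)
start, end = info.time_range.split("-")
print(f"covers {start} to {end}")
```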
In [None]:
from os import access, R_OK
from os.path import isfile
import requests as r
import xarray as xr
from pprint import pprint

In [None]:
# does not work (HTML catalog page, not the file itself): url = "http://esgf-data.ucar.edu/thredds/catalog/esgcet/913/CMIP6.CMIP.NCAR.CESM2.amip.r2i1p1f1.day.ta.gn.v20190220.html?dataset=CMIP6.CMIP.NCAR.CESM2.amip.r2i1p1f1.day.ta.gn.v20190220.ta_day_CESM2_amip_r2i1p1f1_gn_20100101-20150101.nc"
url = "http://esgf-data.ucar.edu/thredds/fileServer/esg_dataroot/CMIP6/CMIP/NCAR/CESM2/amip/r2i1p1f1/day/ta/gn/v20190220/ta_day_CESM2_amip_r2i1p1f1_gn_20100101-20150101.nc"

# downloading with wget failed: !wget {url}
# opening the dataset directly from the web with xarray also failed: ds = xr.load_dataset(url)

# downloading via requests seems to work (takes quite a while, though)
filename = "/data/mschul98/esdp1/" + url.rsplit('/', 1)[1]

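# check if the file already exists; if not, download it from the ESGF server
if isfile(filename) and access(filename, R_OK):
    print(f"File {filename} already available.")
else:
    # stream the response so the large file is not held in memory all at once
    with r.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()  # fail loudly on HTTP errors instead of saving an error page
        with open(filename, "wb") as f:
            for chunk in response.iter_content(chunk_size=1 << 20):
                f.write(chunk)
    print(f"Downloaded file written to {filename}.")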

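The existence check above only tests that the file is readable, so an interrupted download would go unnoticed. ESGF wget scripts typically list a checksum (usually SHA256) next to each file URL; a sketch for comparing against it follows. The helpers `sha256sum` and `verify_download` are hypothetical, and the expected digest has to be copied from the wget script by hand.

```python
import hashlib


def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256sum(path) == expected_sha256.strip().lower()
```

Usage would look like `verify_download(filename, "<checksum from the wget script>")`; if it returns `False`, delete the file and download it again.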
In [None]:
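# now open the dataset in xarray and print a summary
# (not wrapped in a `with` block: the cells below still read data from `ds`,
#  so it has to stay open; call ds.close() once you are done with it)
ds = xr.open_dataset(filename)
print(ds)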

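`xr.open_dataset` reads lazily: data variables are only loaded from disk when they are accessed. Before pulling a large variable into memory it can help to check its size with `nbytes` and to select just the part you need. The sketch below uses a tiny synthetic stand-in (`ds_demo`, made-up values) with the same dimension names as `ta` (`time`, `plev`, `lat`, `lon`); the chosen levels are an assumption, with `plev` in Pa.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the CESM2 file: same dimension names, tiny made-up data
time = pd.date_range("2010-01-01", periods=5, freq="D")
plev = np.array([100000.0, 85000.0, 50000.0])  # pressure levels in Pa
lat = np.linspace(-90.0, 90.0, 7)
lon = np.arange(0.0, 360.0, 60.0)
ta = xr.DataArray(
    np.full((time.size, plev.size, lat.size, lon.size), 280.0, dtype="float32"),
    coords={"time": time, "plev": plev, "lat": lat, "lon": lon},
    dims=("time", "plev", "lat", "lon"),
    attrs={"standard_name": "air_temperature", "units": "K"},
)
ds_demo = xr.Dataset({"ta": ta})

# memory the variable would need once loaded
print(f"ta needs {ds_demo['ta'].nbytes / 1e6:.3f} MB in memory")

# third day (by position), on the level closest to 840 hPa
ta_850 = ds_demo["ta"].isel(time=2).sel(plev=84000.0, method="nearest")
print(ta_850.dims, float(ta_850["plev"]))
```

On the real dataset the same pattern (`ds["ta"].isel(...).sel(...)`) reads only the selected slice once `.values` or `.load()` is called.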
In [None]:
# pretty print the global attributes
pprint(ds.attrs)

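`ds.attrs` is a plain Python dict, so individual global attributes can be picked out directly, with `.get()` guarding against attributes a file does not carry. A sketch on a hypothetical stand-in dataset with a few CMIP6-style global attributes (values taken from the file name above):

```python
import xarray as xr

# Hypothetical stand-in: a dataset carrying a few CMIP6-style global attributes
ds_demo = xr.Dataset(attrs={
    "source_id": "CESM2",
    "experiment_id": "amip",
    "variant_label": "r2i1p1f1",
    "frequency": "day",
})

# look up selected attributes, marking missing ones instead of raising KeyError
keys = ["source_id", "experiment_id", "variant_label", "frequency", "grid_label"]
summary = {key: ds_demo.attrs.get(key, "<missing>") for key in keys}
for key, value in summary.items():
    print(f"{key:>14}: {value}")
```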
In [None]:
# pretty print the variable attributes
pprint(ds["ta"].attrs)

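The variable attributes are useful for more than display, e.g. to check units before converting them. A sketch, assuming `ta` is stored in K (as CF `air_temperature` usually is); `kelvin_to_celsius` is a hypothetical helper and the array is a made-up stand-in for `ds["ta"]`.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for ds["ta"]: a few temperatures with CF-style attributes
ta = xr.DataArray(
    np.array([273.15, 288.15, 300.0]),
    dims="x",
    attrs={"standard_name": "air_temperature", "units": "K"},
)


def kelvin_to_celsius(da: xr.DataArray) -> xr.DataArray:
    """Convert a temperature DataArray from K to degC, checking and updating the units attribute."""
    if da.attrs.get("units") != "K":
        raise ValueError(f"expected units 'K', got {da.attrs.get('units')!r}")
    out = da - 273.15
    # whether arithmetic keeps attrs depends on xarray's keep_attrs option, so set them explicitly
    out.attrs = {**da.attrs, "units": "degC"}
    return out


ta_c = kelvin_to_celsius(ta)
print(ta_c.values, ta_c.attrs["units"])
```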
In [None]:
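# cut out the continental US (bounding box [west, south, east, north] from the http://bboxfinder.com tool)
lon_min, lat_min, lon_max, lat_max = [-130.429688, 23.563987, -60.292969, 50.958427]
# the model longitudes run from 0 to 360, so shift negative (western) longitudes by 360
lon_min, lon_max = [x + 360. if x < 0. else x for x in (lon_min, lon_max)]
ds_us = ds.sel(lon=slice(lon_min, lon_max), lat=slice(lat_min, lat_max))
print(ds_us)
# print lon and lat values
print(f"longitudes: {ds_us['lon'].values}\nlatitudes: {ds_us['lat'].values}")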

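An alternative to shifting the bounding box is to convert the dataset's longitudes from 0..360 to -180..180 once, so bounding boxes from tools like bboxfinder can be used unchanged. A sketch on a hypothetical stand-in grid (`ds_demo`, zeros as data), assuming ascending latitudes as the slice above does:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in grid with 0..360 longitudes and ascending latitudes
lon = np.arange(0.0, 360.0, 10.0)
lat = np.arange(-85.0, 90.0, 10.0)
ds_demo = xr.Dataset(
    {"ta": (("lat", "lon"), np.zeros((lat.size, lon.size)))},
    coords={"lat": lat, "lon": lon},
)

# map longitudes from [0, 360) to [-180, 180) and re-sort, so lon stays monotonic for slicing
ds_180 = ds_demo.assign_coords(lon=((ds_demo["lon"] + 180.0) % 360.0) - 180.0).sortby("lon")

# the bboxfinder box can now be used as is: [west, south, east, north]
west, south, east, north = [-130.429688, 23.563987, -60.292969, 50.958427]
ds_us_demo = ds_180.sel(lon=slice(west, east), lat=slice(south, north))
print(ds_us_demo["lon"].values, ds_us_demo["lat"].values)
```

The conversion costs a reordering of the data, but it also makes regions that cross the 0° meridian (e.g. Europe) selectable with a single slice, which is not possible on a 0..360 grid.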
In [None]:
# save the result as a new netcdf file
outfile = "/data/mschul98/esdp1/CESM_ta_us-only.nc"
# mode, format and engine are optional; engine='h5netcdf' requires the h5netcdf package
ds_us.to_netcdf(outfile, mode="w", format="NETCDF4", engine="h5netcdf")
!ls -l {outfile}