# 01 Open NetCDF file (ERA5 reanalysis)

In this notebook, I will quickly demonstrate how we can:

- open a NetCDF file (ERA5 reanalysis)
- export values from the NetCDF file to a CSV file

There are several Python libraries for opening and processing NetCDF files, for example:
- netCDF4 (covered in another video)
- xarray (used in this notebook)

For more details on these libraries, see:
- xarray: http://xarray.pydata.org/en/stable/
- netCDF4: https://unidata.github.io/netcdf4-python/netCDF4/index.html

The data used in this notebook can be downloaded here:
https://drive.google.com/open?id=18sG05s6kOE3N5ta1ChcheLxVQlUQw1Iq

Acknowledgement:
https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5

In [1]:
# first import libraries
import xarray as xr
import numpy as np
import pandas as pd
print('all libraries are loaded')

all libraries are loaded


In [2]:
# specify the input folder (where the data is) and the output folder
path_in = "data/"
path_out = "./"
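Concatenating strings such as `path_in + 'ERA5_reanalysis.nc'` only works while each folder string ends with `/`. A slightly more robust option, sketched here with hypothetical names `data_dir` and `out_dir` so the variables above are left untouched, is `pathlib.Path`, whose `/` operator inserts the separator for you. Both `xr.open_dataset` and `DataFrame.to_csv` also accept path-like objects.

```python
from pathlib import Path

# Hypothetical names: same folders as path_in / path_out, expressed as Path objects
data_dir = Path("data")
out_dir = Path(".")

# "/" joins path components, so a missing trailing slash cannot break the path
nc_file = data_dir / "ERA5_reanalysis.nc"
csv_file = out_dir / "temperature_xarray.csv"

print(nc_file.as_posix())   # data/ERA5_reanalysis.nc
print(csv_file.as_posix())  # temperature_xarray.csv
```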

In [3]:
# open the NetCDF file; the result is an xarray Dataset (named df here, but it is not a pandas DataFrame)
df = xr.open_dataset(path_in + 'ERA5_reanalysis.nc')

In [4]:
# print a summary: dimensions, coordinates, data variables and global attributes
print(df)

<xarray.Dataset>
Dimensions:    (latitude: 721, level: 10, longitude: 1440, time: 2)
Coordinates:
  * longitude  (longitude) float32 0.0 0.25 0.5 0.75 ... 359.25 359.5 359.75
  * latitude   (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
  * level      (level) int32 1 7 50 150 250 450 650 800 900 1000
  * time       (time) datetime64[ns] 2020-03-10 2020-03-10T06:00:00
Data variables:
    r          (time, level, latitude, longitude) float32 ...
    t          (time, level, latitude, longitude) float32 ...
    u          (time, level, latitude, longitude) float32 ...
    v          (time, level, latitude, longitude) float32 ...
Attributes:
    Conventions:  CF-1.6
    history:      2020-03-20 00:22:45 GMT by grib_to_netcdf-2.16.0: /opt/ecmw...
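If you want to follow along before downloading the file, a downsized stand-in can be built directly in xarray. This is a sketch only: it reuses the coordinate and variable names from the printout above but assumes a coarse 2.5° grid and fills the variables with random numbers, so it is not ERA5 data. It also shows that every entry under `Data variables` is a DataArray sharing the Dataset's coordinates.

```python
import numpy as np
import pandas as pd
import xarray as xr

# HYPOTHETICAL stand-in: same names as the ERA5 file, coarse 2.5-degree grid,
# random values (NOT real ERA5 data)
rng = np.random.default_rng(0)
lon = np.arange(0, 360, 2.5, dtype=np.float32)      # 144 points, 0.0 ... 357.5
lat = np.linspace(90, -90, 73, dtype=np.float32)    # 73 points, north to south
level = np.array([1, 7, 50, 150, 250, 450, 650, 800, 900, 1000], dtype=np.int32)
time = pd.to_datetime(["2020-03-10T00:00", "2020-03-10T06:00"])

dims = ("time", "level", "latitude", "longitude")
shape = (time.size, level.size, lat.size, lon.size)
ds = xr.Dataset(
    data_vars={name: (dims, rng.random(shape, dtype=np.float32))
               for name in ["r", "t", "u", "v"]},
    coords={"time": time, "level": level, "latitude": lat, "longitude": lon},
)

# Each data variable is a DataArray that shares the Dataset's coordinates
for name, da in ds.data_vars.items():
    print(name, da.dims, da.shape)
```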


In [5]:
print(df.coords)

Coordinates:
  * longitude  (longitude) float32 0.0 0.25 0.5 0.75 ... 359.25 359.5 359.75
  * latitude   (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
  * level      (level) int32 1 7 50 150 250 450 650 800 900 1000
  * time       (time) datetime64[ns] 2020-03-10 2020-03-10T06:00:00


In [6]:
print(df.dims)

Frozen(SortedKeysDict({'longitude': 1440, 'latitude': 721, 'level': 10, 'time': 2}))
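The `Frozen(SortedKeysDict(...))` form above comes from an older xarray release. In recent releases `Dataset.dims` is moving towards returning only the dimension names (using it as a mapping emits a `FutureWarning`), and `Dataset.sizes` is the suggested way to get the name-to-length mapping. A minimal sketch on a tiny synthetic Dataset (zeros, not ERA5 data):

```python
import numpy as np
import xarray as xr

# Tiny synthetic Dataset (zeros, not ERA5 data) with the same dimension names as the file
ds = xr.Dataset(
    {"t": (("time", "level", "latitude", "longitude"),
           np.zeros((2, 10, 3, 4), dtype=np.float32))}
)

# Mapping from dimension name to length
sizes = dict(ds.sizes)
print(sizes)

# Length of a single dimension
print(ds.sizes["level"])  # 10
```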


In [7]:
# print the global attributes of the NetCDF file
print(df.attrs)

{'Conventions': 'CF-1.6', 'history': '2020-03-20 00:22:45 GMT by grib_to_netcdf-2.16.0: /opt/ecmwf/eccodes/bin/grib_to_netcdf -S param -o /cache/data2/adaptor.mars.internal-1584663762.9685905-23199-34-0e48a986-23ab-4f27-a5c4-2f3a006314b0.nc /cache/tmp/0e48a986-23ab-4f27-a5c4-2f3a006314b0-adaptor.mars.internal-1584663762.9691193-23199-7-tmp.grib'}


In [8]:
# extract the temperature variable as an xarray DataArray (keeps coordinates and attributes)
temp = df['t']

In [9]:
# print the temperature variable
print(temp)

<xarray.DataArray 't' (time: 2, level: 10, latitude: 721, longitude: 1440)>
[20764800 values with dtype=float32]
Coordinates:
  * longitude  (longitude) float32 0.0 0.25 0.5 0.75 ... 359.25 359.5 359.75
  * latitude   (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
  * level      (level) int32 1 7 50 150 250 450 650 800 900 1000
  * time       (time) datetime64[ns] 2020-03-10 2020-03-10T06:00:00
Attributes:
    units:          K
    long_name:      Temperature
    standard_name:  air_temperature


In [10]:
# extract only the temperature values as a NumPy array (coordinates and attributes are dropped)
temp_ = df.t.values
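The two cells above return different kinds of object: `df['t']` is an xarray DataArray that still carries its dimension names, coordinates and attributes (such as `units: K`), whereas `df.t.values` is a bare NumPy array indexed by position only. The sketch below illustrates the difference on a small hand-made DataArray with hypothetical names `demo_da` and `demo_values`; the values are placeholders, not ERA5 data.

```python
import numpy as np
import xarray as xr

# Small hand-made DataArray (placeholder values, not ERA5 data) with the same attributes as df['t']
demo_da = xr.DataArray(
    np.full((2, 3), 250.0, dtype=np.float32),
    dims=("latitude", "longitude"),
    coords={"latitude": [90.0, 89.75], "longitude": [0.0, 0.25, 0.5]},
    attrs={"units": "K", "long_name": "Temperature", "standard_name": "air_temperature"},
    name="t",
)

demo_values = demo_da.values  # plain NumPy array: labels and attributes are dropped

print(type(demo_da).__name__, type(demo_values).__name__)  # DataArray ndarray
print(demo_da.attrs["units"])                              # K
print(demo_da.dims, demo_values.shape)                     # ('latitude', 'longitude') (2, 3)
```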

In [11]:
# print the raw values of temp_
print(temp_)

[[[[256.88995 256.88995 256.88995 ... 256.88995 256.88995 256.88995]
   [256.34045 256.34045 256.34045 ... 256.34045 256.34045 256.34045]
   [255.949   255.949   255.949   ... 255.94711 255.94711 255.949  ]
   ...
   [258.68533 258.6872  258.6872  ... 258.68155 258.68344 258.68533]
   [259.0711  259.0711  259.0711  ... 259.06924 259.06924 259.0711 ]
   [259.4042  259.4042  259.4042  ... 259.4042  259.4042  259.4042 ]]

  [[230.19673 230.19673 230.19673 ... 230.19673 230.19673 230.19673]
   [230.03677 230.0349  230.033   ... 230.04054 230.03865 230.03677]
   [229.82411 229.82224 229.82036 ... 229.83165 229.82976 229.826  ]
   ...
   [230.7632  230.76508 230.76508 ... 230.75943 230.76132 230.76132]
   [230.88176 230.88176 230.88176 ... 230.87988 230.88176 230.88176]
   [230.98526 230.98526 230.98526 ... 230.98526 230.98526 230.98526]]

  [[193.39758 193.39758 193.39758 ... 193.39758 193.39758 193.39758]
   [193.60648 193.60648 193.60648 ... 193.60648 193.60648 193.60648]
   [193.80972 19

In [12]:
# print the shape of the temperature array: (time, level, latitude, longitude)
print(temp_.shape)

(2, 10, 721, 1440)
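The last two sizes follow from the 0.25° grid in the coordinate printout: 721 latitudes from 90 to -90 with both poles included, and 1440 longitudes from 0 to 359.75 (360 would duplicate 0). The same spacing converts an array index into a coordinate, which is useful for checking where a positional index such as `[90, 500]` points. The helpers below are hypothetical and assume the north-to-south latitude ordering shown above.

```python
# Grid spacing of this ERA5 file, read off the coordinate printout (degrees)
RES = 0.25

def lat_from_index(i, res=RES):
    """Hypothetical helper: latitude of row i; latitudes run north to south from 90."""
    return 90.0 - i * res

def lon_from_index(j, res=RES):
    """Hypothetical helper: longitude of column j; longitudes run eastward from 0."""
    return j * res

n_lat = int(180 / RES) + 1  # both poles included
n_lon = int(360 / RES)      # 360 would repeat 0, so it is excluded

print(n_lat, n_lon)                             # 721 1440
print(lat_from_index(90), lon_from_index(500))  # 67.5 125.0
```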


In [13]:
# select the temperature profile at one grid point: first time step, all vertical levels,
# latitude index 90 (67.5°N) and longitude index 500 (125.0°E)
new_temp = temp_[0,:,90,500]

In [14]:
# print the temperature profile (K), one value per vertical level
print(new_temp)

[262.94788 231.595   193.74573 202.00362 205.45695 220.11903 235.73337
 243.7786  248.20111 251.30254]
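Positional indexing such as `temp_[0,:,90,500]` relies on knowing which index maps to which coordinate. xarray can make the same selection by position with `isel` or by coordinate value with `sel`; on the real data that would look like `df['t'].isel(time=0, latitude=90, longitude=500)` or `df['t'].sel(latitude=67.5, longitude=125.0, method='nearest').isel(time=0)`. The sketch below checks, on a synthetic field with the same 0.25° grid (each cell holds its own flat index, not a temperature), that both routes pick the same point.

```python
import numpy as np
import xarray as xr

# Synthetic field (NOT ERA5 data) on the same 0.25-degree grid; each value equals its flat index
lat = 90.0 - 0.25 * np.arange(721)
lon = 0.25 * np.arange(1440)
field = xr.DataArray(
    np.arange(721 * 1440, dtype=np.float64).reshape(721, 1440),
    dims=("latitude", "longitude"),
    coords={"latitude": lat, "longitude": lon},
)

# Positional selection, like temp_[..., 90, 500]
by_index = field.isel(latitude=90, longitude=500)

# Label-based selection of the same point; "nearest" guards against float round-off
by_label = field.sel(latitude=67.5, longitude=125.0, method="nearest")

print(float(by_index), float(by_label))  # 130100.0 130100.0
```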


In [15]:
print(np.round(new_temp,2))

[262.95 231.6  193.75 202.   205.46 220.12 235.73 243.78 248.2  251.3 ]


In [16]:
print(new_temp.shape)

(10,)


The temperature array has shape `(10,)`: one value for each of the 10 vertical levels (`level` = 1, 7, 50, ..., 1000, which for ERA5 pressure-level data are pressure levels in hPa).

# Now we will export this temperature profile to a CSV file

In [17]:
# row index, one per vertical level (0 to 9)
ind = np.arange(0, len(new_temp))
# make a Python dictionary of columns
our_dictionary = {'ind': ind, 'temp': np.round(new_temp, 2)}
df_out = pd.DataFrame(our_dictionary, columns=['ind', 'temp'])
# export to CSV (to_csv returns None when given a path, so there is nothing to assign)
df_out.to_csv(path_out + 'temperature_xarray.csv', index=False, header=True)
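An alternative, assuming you would rather label rows by pressure level than by a 0 to 9 counter: keep the profile as an xarray DataArray so the `level` coordinate travels with the values, and let `to_series()` turn it into the index of a pandas Series. The values below are the rounded profile printed earlier, and `io.StringIO` stands in for the output file so the sketch runs on its own; on the real data the starting point would be something like `df['t'].isel(time=0, latitude=90, longitude=500).round(2)`.

```python
import io

import xarray as xr

# Rounded temperature profile (K) printed above, with the file's level coordinate
levels = [1, 7, 50, 150, 250, 450, 650, 800, 900, 1000]
values = [262.95, 231.60, 193.75, 202.00, 205.46, 220.12, 235.73, 243.78, 248.20, 251.30]
profile = xr.DataArray(values, dims="level", coords={"level": levels}, name="temp")

# to_series() turns the coordinate into the index and the name into the column header
series = profile.to_series()

buffer = io.StringIO()  # stands in for path_out + 'temperature_xarray.csv'
series.to_csv(buffer, header=True)
print(buffer.getvalue().splitlines()[:3])  # ['level,temp', '1,262.95', '7,231.6']
```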

# Now we will read the CSV file back in

In [18]:
# read the CSV file into a pandas DataFrame
df_csv = pd.read_csv(path_out + 'temperature_xarray.csv')

In [19]:
print(df_csv)

   ind    temp
0    0  262.95
1    1  231.60
2    2  193.75
3    3  202.00
4    4  205.46
5    5  220.12
6    6  235.73
7    7  243.78
8    8  248.20
9    9  251.30
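Reading the file back is also a good moment to check that nothing was lost. Because CSV stores numbers as text, comparing with `np.allclose` rather than `==` is the safer test, especially when the source array was float32 as it is here. A self-contained sketch of that check, using the rounded profile printed earlier and an in-memory buffer with hypothetical names `check_out` and `check_in` in place of the file on disk:

```python
import io

import numpy as np
import pandas as pd

# Rounded temperature profile (K) printed above
temp_rounded = np.array([262.95, 231.60, 193.75, 202.00, 205.46,
                         220.12, 235.73, 243.78, 248.20, 251.30])
check_out = pd.DataFrame({"ind": np.arange(temp_rounded.size), "temp": temp_rounded})

# Write to an in-memory buffer instead of a file, then read it back
buffer = io.StringIO()
check_out.to_csv(buffer, index=False)
buffer.seek(0)
check_in = pd.read_csv(buffer)

# Same columns, and values equal up to floating-point tolerance
print(list(check_in.columns))                                  # ['ind', 'temp']
print(np.allclose(check_in["temp"].to_numpy(), temp_rounded))  # True
```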


# All done!

- Please feel free to let me know if there is any analysis you would like me to do or explain
- Please subscribe to my YouTube channel too
- Thank you very much