# Structure of a netCDF file

### netCDF file

netCDF (Network Common Data Form) is a file format for storing multidimensional array data. The diagram below shows an example of a file with temperature and precipitation data.

![](http://xarray.pydata.org/en/stable/_images/dataset-diagram.png)

A netCDF file includes: 
* **data variables**: temperature, precipitation
* **coordinates**: time, latitude, longitude
* **attributes**: information about how the data was collected, the institutions involved, etc.
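
To make the three parts concrete, here is a minimal sketch that builds a tiny dataset in memory instead of reading a file. All values, the grid sizes, and the `institution` attribute are made up for illustration; the name `toy` is hypothetical and is not used elsewhere in this notebook.

```python
import numpy as np
import xarray as xr

# Hypothetical coordinate values, chosen only for illustration
time = np.array(["2020-01-01", "2020-02-01", "2020-03-01"], dtype="datetime64[ns]")
latitude = np.array([10.5, 11.5], dtype="float32")
longitude = np.array([-60.5, -59.5, -58.5, -57.5], dtype="float32")

# Made-up data values with shape (time, latitude, longitude) = (3, 2, 4)
rng = np.random.default_rng(seed=0)
temperature = 15 + 8 * rng.random((3, 2, 4))
precipitation = 10 * rng.random((3, 2, 4))

dims = ("time", "latitude", "longitude")
toy = xr.Dataset(
    # data variables: each one is (dimension names, values, attributes)
    data_vars={
        "temperature": (dims, temperature, {"units": "C"}),
        "precipitation": (dims, precipitation, {"units": "mm"}),
    },
    # coordinates: the values along each dimension
    coords={"time": time, "latitude": latitude, "longitude": longitude},
    # attributes: metadata about the whole dataset
    attrs={"institution": "Example Institute (hypothetical)"},
)

print(list(toy.data_vars))  # names of the data variables
print(list(toy.coords))     # names of the coordinates
print(toy.attrs)            # dataset-level attributes
```

Printing `toy` on its own shows the same three groups (data variables, coordinates, attributes) that appear when a real netCDF file is opened.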

In [9]:
# Import the xarray library, which is used to read and analyze netCDF files
import xarray as xr
# Import os to work with file paths and the working directory
import os

In [13]:
#Change to your data directory
dataPath = '/Users/Brownscholar/Desktop/2020_BridgeUP_Internships/Data'
os.chdir(dataPath)
os.getcwd()


'/Users/brownscholar/Desktop/2020_BridgeUP_Internships/Data'

### Import netCDF file

`xr.open_dataset(file_name)`: opens a netCDF file and returns its contents as an xarray `Dataset`

In [15]:
# Load the sea surface temperature dataset
fileName = 'HadISST_sst.nc.gz'
data = xr.open_dataset(fileName)
data

In [17]:
# Switch to the plain-text display style, which matches what you will see when running scripts from Sublime
xr.set_options(display_style="text")
data

### Extract sea surface temperature (SST)

This is a lot of information! We are only interested in one variable within the dataset (`sst`), so let's extract that.

We need to give Python a path to this variable, much like the path to your Data folder, but the parts are separated with `.` instead of `/`.

In [16]:
# Extract the sst variable
data.sst
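
As a small aside, the same variable can also be reached with dictionary-style square brackets, which is the notation used further down for `data.sst["time"]`. The sketch below uses a tiny stand-in dataset with made-up values (the name `toy` is hypothetical) so that it runs without the HadISST file.

```python
import numpy as np
import xarray as xr

# Tiny stand-in dataset with one variable named "sst" (values are made up)
toy = xr.Dataset(
    {"sst": (("latitude", "longitude"), np.array([[20.1, 20.4], [19.8, 19.9]]))},
    coords={"latitude": [0.5, -0.5], "longitude": [-0.5, 0.5]},
)

by_attribute = toy.sst   # dot notation
by_key = toy["sst"]      # dictionary-style notation

# Both notations return the same DataArray
print(by_attribute.identical(by_key))
```

Dot notation is shorter to type, while brackets also work when a variable name contains characters that are not valid in a Python name.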

Your SST data is stored as a `DataArray`, which is xarray's implementation of a labeled, multi-dimensional array.
It has several key properties:

| Attribute | Description                                                                         |
| --------- | ----------------------------------------------------------------------------------- |
| `data`    | the array's values                                                                  |
| `dims`    | dimension names for each axis, for example (`x`, `y`, `z`) or (`lat`, `lon`, `time`) |
| `coords`  | coordinate values along each dimension                                              |
| `attrs`   | relevant attributes/metadata (for example: units, research institution)             |
| `name`    | name of the array                                                                   |
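
The properties in the table can be tried out on a small `DataArray` built by hand. This is only a sketch: the values and coordinates are made up, and the array is much smaller than the real SST data.

```python
import numpy as np
import xarray as xr

# Hand-built 2 x 3 array standing in for SST (values are made up)
sst = xr.DataArray(
    data=np.array([[20.1, 20.4, 20.9], [19.8, 19.9, 20.2]]),
    dims=("latitude", "longitude"),
    coords={"latitude": [0.5, -0.5], "longitude": [-1.5, -0.5, 0.5]},
    attrs={"units": "C"},
    name="sst",
)

print(sst.data)          # the array's values as a NumPy array
print(sst.dims)          # dimension names, one per axis
print(list(sst.coords))  # names of the coordinates
print(sst.attrs)         # metadata dictionary
print(sst.name)          # name of the array
```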


In [6]:
# Extract dimensions
data.sst.dims

('time', 'latitude', 'longitude')

In [7]:
# Extract the coordinates (the values along each dimension)
data.sst.coords

Coordinates:
  * time       (time) datetime64[ns] 1870-01-16T11:59:59.505615234 ... 2020-0...
  * latitude   (latitude) float32 89.5 88.5 87.5 86.5 ... -87.5 -88.5 -89.5
  * longitude  (longitude) float32 -179.5 -178.5 -177.5 ... 177.5 178.5 179.5

In [8]:
# You can also extract a particular coordinate such as latitude
data.sst.latitude
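
A coordinate extracted this way is itself a `DataArray`. If plain numbers are more convenient, the `.values` property returns them as a NumPy array. The sketch below shows this on a small made-up array rather than the real SST data.

```python
import numpy as np
import xarray as xr

# Small stand-in array; only the coordinates matter for this example
sst = xr.DataArray(
    np.zeros((3, 2)),
    dims=("latitude", "longitude"),
    coords={"latitude": [1.5, 0.5, -0.5], "longitude": [-0.5, 0.5]},
    name="sst",
)

lat = sst.latitude        # still a labeled DataArray
lat_values = lat.values   # plain NumPy array of the latitude values

print(type(lat).__name__)
print(type(lat_values).__name__)
print(lat_values)
```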

In [11]:
# Extract attributes
data.sst.attrs

{'standard_name': 'sea_surface_temperature',
 'long_name': 'sst',
 'units': 'C',
 'cell_methods': 'time: lat: lon: mean'}

In [None]:
# Extract the units of the SST variable


## Practice

In your data-processing script (`process_SST_data.py`):
1. Load the netCDF file `HadISST_sst.nc` using xarray.
1. Decide which values will be useful in our analysis, and save them as separate variables so you can call on them easily.

Use the space below to test out your code.

In [21]:
fileName = 'HadISST_sst.nc.gz'
data = xr.open_dataset(fileName)
data

In [22]:
data.sst

In [24]:
time = data.sst["time"]
print(time)

<xarray.DataArray 'time' (time: 1811)>
array(['1870-01-16T11:59:59.505615234', '1870-02-14T23:59:59.340820312',
       '1870-03-16T11:59:59.340820312', ..., '2020-09-16T12:00:00.000000000',
       '2020-10-16T12:00:00.000000000', '2020-11-16T12:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 1870-01-16T11:59:59.505615234 ... 2020-11-...
Attributes:
    long_name:      Time
    standard_name:  time


In [26]:
latitude = data.sst["latitude"]
print(latitude)

<xarray.DataArray 'latitude' (latitude: 180)>
array([ 89.5,  88.5,  87.5,  86.5,  85.5,  84.5,  83.5,  82.5,  81.5,  80.5,
        79.5,  78.5,  77.5,  76.5,  75.5,  74.5,  73.5,  72.5,  71.5,  70.5,
        69.5,  68.5,  67.5,  66.5,  65.5,  64.5,  63.5,  62.5,  61.5,  60.5,
        59.5,  58.5,  57.5,  56.5,  55.5,  54.5,  53.5,  52.5,  51.5,  50.5,
        49.5,  48.5,  47.5,  46.5,  45.5,  44.5,  43.5,  42.5,  41.5,  40.5,
        39.5,  38.5,  37.5,  36.5,  35.5,  34.5,  33.5,  32.5,  31.5,  30.5,
        29.5,  28.5,  27.5,  26.5,  25.5,  24.5,  23.5,  22.5,  21.5,  20.5,
        19.5,  18.5,  17.5,  16.5,  15.5,  14.5,  13.5,  12.5,  11.5,  10.5,
         9.5,   8.5,   7.5,   6.5,   5.5,   4.5,   3.5,   2.5,   1.5,   0.5,
        -0.5,  -1.5,  -2.5,  -3.5,  -4.5,  -5.5,  -6.5,  -7.5,  -8.5,  -9.5,
       -10.5, -11.5, -12.5, -13.5, -14.5, -15.5, -16.5, -17.5, -18.5, -19.5,
       -20.5, -21.5, -22.5, -23.5, -24.5, -25.5, -26.5, -27.5, -28.5, -29.5,
       -30.5, -31.5, -32.5, -3

In [27]:
longitude = data.sst["longitude"]
print(longitude)

<xarray.DataArray 'longitude' (longitude: 360)>
array([-179.5, -178.5, -177.5, ...,  177.5,  178.5,  179.5], dtype=float32)
Coordinates:
  * longitude  (longitude) float32 -179.5 -178.5 -177.5 ... 177.5 178.5 179.5
Attributes:
    units:          degrees_east
    long_name:      Longitude
    standard_name:  longitude


### Cheatsheet

`xr.open_dataset(file_name)`: imports a netCDF file

`dataset.data_variable`: extracts a data variable such as temperature or precipitation

`dataset.data_variable.dims`: names of the dimensions of the data variable, such as time, latitude, ...

`dataset.data_variable.coords`: values of the dimensions of the data variable

`dataset.data_variable.time`: extracts a particular coordinate (here, time)

`dataset.data_variable.attrs`: attributes of the data variable, like its name, units, ...