# Read and Write a netCDF File
## Overview of Xarray Data Structures
Imagine you have satellite remote sensing observations of the same region over a period of time. This might consist of multiple bands such as red, green, blue and near infrared (NIR), where each spatial pixel has an observation at each time step. Each individual band therefore forms a 3-dimensional data set, consisting of two spatial dimensions and a time dimension. The bands share the same spatial grid and the same time steps, so their data sets line up along every dimension.

Xarray has two related data structures that are ideal for representing this type of remotely sensed data. The first is the *data array* (`xarray.DataArray`), which represents the data for an individual band. The second is the *dataset* (`xarray.Dataset`), which is built by combining the data arrays of all the bands. Continuing the example above, the data array for each band has a single variable (the value of the band), three dimensions (time, x and y) and three sets of coordinates, one for each dimension. The dataset then holds one variable per band, all sharing those dimensions and coordinates.
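As a minimal sketch of these two structures, the example below builds two synthetic bands as data arrays and combines them into a dataset. The grid size, coordinate values, band names and random values are all assumptions chosen for illustration; they are not taken from the data used later in this notebook.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical grid: 3 monthly time steps, 4 rows (y) and 5 columns (x).
time = pd.date_range('2020-01-01', periods=3, freq='MS')
y = np.arange(4) * 10.0
x = np.arange(5) * 10.0

rng = np.random.default_rng(seed=0)


def make_band(name):
    """Return a synthetic 3-dimensional data array for one band."""
    return xr.DataArray(
        rng.random((len(time), len(y), len(x))),
        dims=('time', 'y', 'x'),
        coords={'time': time, 'y': y, 'x': x},
        name=name,
    )


# One data array per band.
red_da = make_band('red')
nir_da = make_band('nir')

# The dataset holds one variable per band on the shared coordinates.
example_ds = xr.Dataset({'red': red_da, 'nir': nir_da})

print(red_da.dims)
print(list(example_ds.data_vars))
```

Each data array carries its own dimensions and coordinates, while the dataset stores the coordinates once and shares them between the bands.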

### Further Reading
* [Data Structures](https://docs.xarray.dev/en/stable/user-guide/data-structures.html) from the Xarray documentation.

## Setup
It is conventional to give the xarray library the abbreviation `xr`.

In [1]:
import xarray as xr

## Read a netCDF File
netCDF is the recommended format for storing Xarray data structures, although it is also possible to build Xarray data arrays and datasets from collections of GeoTIFF files. It is important to distinguish between data arrays and datasets when opening a netCDF file. If you are unsure which of these you have, use `xr.open_dataset()`: a file that holds a single data array simply opens as a dataset with one variable.

It is conventional to call a generic dataset `ds` and a generic data array `da`. In practice, more descriptive names should be used. Using the suffix `_ds` or `_da` can be helpful as a reminder about the type of data structure that a variable contains.

In [2]:
ds = xr.open_dataset('../Data/netCDF/ds_BM_NP.nc')
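The difference between opening a file as a dataset and as a data array can be sketched with a small synthetic file. The array contents, the variable name `red` and the temporary file name are assumptions made for illustration, and writing the file requires a netCDF backend (such as `netCDF4` or `scipy`) to be installed.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr

# Hypothetical single-band data array named 'red'.
red_da = xr.DataArray(np.zeros((2, 3)), dims=('y', 'x'), name='red')

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'red_da.nc'
    red_da.to_netcdf(path)

    # open_dataset() returns a Dataset whether the file holds one
    # data variable or many.
    opened_ds = xr.open_dataset(path)
    ds_type = type(opened_ds).__name__
    var_names = list(opened_ds.data_vars)
    opened_ds.close()

    # open_dataarray() expects the file to hold exactly one data variable.
    opened_da = xr.open_dataarray(path)
    da_type = type(opened_da).__name__
    opened_da.close()

print(ds_type, var_names)
print(da_type)
```

Closing each object before the temporary directory is removed releases the underlying file handle.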

## Inspect the Dataset
Type the name of the dataset and run the cell to get a listing of its contents and properties.

This dataset contains Sentinel-2 (A and B) data covering a region of the Blue Mountains National Park in NSW, Australia. It was extracted from the [Digital Earth Australia](https://www.dea.ga.gov.au/) data cube and has two spatial dimensions, x and y, plus a time dimension. The spatial dimensions use the EPSG:3308 coordinate reference system (CRS) and time is measured in months. Click the icons at the end of the rows to view more information about the coordinates and the variables.

In [3]:
ds
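The same information can also be read programmatically. The sketch below uses a small synthetic dataset so that it runs on its own; the variable name, coordinate values and the `crs` attribute are hypothetical and do not describe the contents of the file opened above.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset with one band on a 2 x 3 x 4 grid.
example_ds = xr.Dataset(
    {'red': (('time', 'y', 'x'), np.zeros((2, 3, 4)))},
    coords={
        'time': [0, 1],
        'y': [0.0, 1.0, 2.0],
        'x': [0.0, 1.0, 2.0, 3.0],
    },
    attrs={'crs': 'EPSG:3308'},  # hypothetical attribute for illustration
)

print(dict(example_ds.sizes))      # length of each dimension
print(list(example_ds.coords))     # coordinate names
print(list(example_ds.data_vars))  # data variable names
print(example_ds.attrs)            # dataset-level metadata
```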

### Byte size of the dataset
The `.nbytes` attribute of the dataset contains the size of the dataset in bytes. This can be converted to kilobytes, megabytes and so on by dividing by the appropriate power of 1024.

In [4]:
print(f'Bytes: {ds.nbytes}')
print(f'Kilobytes: {ds.nbytes / 1024}')
print(f'Megabytes: {ds.nbytes / 1024**2}')
print(f'Gigabytes: {ds.nbytes / 1024**3}')

Bytes: 19587252
Kilobytes: 19128.17578125
Megabytes: 18.679859161376953
Gigabytes: 0.01824204996228218
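The repeated division can be wrapped in a small helper that picks the largest suitable unit. `format_bytes` is a hypothetical function written for this example, not part of Xarray, and it labels the 1024-based units with their binary names (KiB, MiB and so on).

```python
def format_bytes(n_bytes):
    """Return a byte count as a string in the largest suitable 1024-based unit."""
    units = ['B', 'KiB', 'MiB', 'GiB', 'TiB']
    size = float(n_bytes)
    for unit in units:
        # Stop once the value drops below 1024 or the units run out.
        if size < 1024 or unit == units[-1]:
            return f'{size:.2f} {unit}'
        size /= 1024


print(format_bytes(19587252))  # → 18.68 MiB
```

With a dataset in memory, the call would be `format_bytes(ds.nbytes)`.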


## Write a netCDF File
Call `.to_netcdf()` on the dataset to write it to a file. The target directory must already exist; otherwise an error is raised.

In [5]:
ds.to_netcdf('../Output/ds_write_test.nc')
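One way to avoid the missing-directory error is to create the directory before writing. The sketch below is one possible approach using `pathlib`; it writes a small synthetic dataset into a temporary location so that it runs on its own, and both the dataset and the paths are assumptions made for illustration. A netCDF backend (such as `netCDF4` or `scipy`) must be installed.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr

# Hypothetical dataset standing in for the one read earlier.
example_ds = xr.Dataset({'red': (('y', 'x'), np.ones((2, 3)))})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / 'Output' / 'ds_write_test.nc'

    # Create the output directory (and any missing parents) if needed.
    out_path.parent.mkdir(parents=True, exist_ok=True)

    example_ds.to_netcdf(out_path)
    file_written = out_path.exists()

print(file_written)
```

With `exist_ok=True`, the `mkdir` call is safe to run even when the directory is already present.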