# Nov. 1 | Exploring netCDFs: The earth scientist's favorite file format
![netCDF image](http://desktop.arcgis.com/en/arcmap/10.3/manage-data/netcdf/GUID-D872A4C3-749E-4159-A6C0-FB6D3B47C5D8-web.gif)

What are netCDF files? The acronym stands for Network Common Data Form. It's a way of formatting data that makes it easy for scientists to share and read data across different computers, operating systems, and software, without running into issues or struggling to understand someone else's work.

netCDF is what we call an array-oriented format. Data is stored in arrays, which are like grids, and a value can be accessed by selecting the appropriate row and column. Here's an example of a 2D array:
![2-D array](https://www.dyclassroom.com/image/topic/c/2d-array/2d-array.jpg)

With netCDF files, our rows, columns, and other indices are called dimensions, and they can take values such as latitude, longitude and time. <img src="https://simulatingcomplexity.files.wordpress.com/2014/11/netcdf-file-structure.png" width="400">
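
To make the idea of dimensions concrete, here is a small sketch using NumPy. The sizes and the name `temps` are invented for illustration and are not taken from any real dataset; the only point is that each dimension becomes one axis of the array.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
n_time, n_lat, n_lon = 2, 3, 4

# A 3D array whose axes play the role of (time, lat, lon)
temps = np.arange(n_time * n_lat * n_lon).reshape(n_time, n_lat, n_lon)

print(temps.shape)     # (2, 3, 4): one length per dimension
print(temps[0, 1, 2])  # 6: the value at time index 0, lat index 1, lon index 2
```

Just as a 2D array needs a row and a column to pick out a value, this 3D array needs one index per dimension.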

Let's try to explore this file format with an actual file. Make sure you have the file sea_surface_temp.nc in your GitHub repository. This is a dataset of sea-surface temperatures, collected for the Intergovernmental Panel on Climate Change. Our first step is to import netCDF4.Dataset, one of the main tools we use for viewing netCDF files.
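
If you're not sure the file is in the right place, a quick check like the one below can save you from a FileNotFoundError when you try to open it. This is only a sketch: it assumes the notebook and the data file live in the same folder, and the names `data_file` and `nearby` are made up for this example.

```python
from pathlib import Path

data_file = Path('sea_surface_temp.nc')  # assumed to sit next to this notebook

if data_file.exists():
    print("Found", data_file.resolve())
else:
    # List any netCDF files that are in the current folder instead
    nearby = sorted(p.name for p in Path('.').glob('*.nc'))
    print("Could not find", data_file, "in", Path.cwd())
    print("netCDF files in this folder:", nearby)
```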

In [1]:
from netCDF4 import Dataset #import Dataset from the netCDF4 package
dataset = Dataset('sea_surface_temp.nc', 'r')  #This opens the dataset as a read-only file (so no changes can be made)
## The two netCDF extensions used are .nc and .cdf

In [None]:
print(dataset) #What output do you see when you run this command?

Note that we've now created an object, called dataset, that we can use to access different aspects of the file. We'll use the dot notation (i.e. dataset.blahblahblah) to access different parts of the data.

Let's find out more about this dataset. We'll look at the "metadata," which is basically data about the data. Scientists use this to explain how the data was acquired or made, how old it is, who to contact with questions etc.
First, we'll look at the dataset's "global attributes," which can be accessed by calling ncattrs (shorthand for **n**et**C**DF **attr**ibutes).

In [None]:
dataset.ncattrs()


To look at one of these, type in the name of the dataset variable, and add a period (.) and the name of the attribute you want to look at.

In [None]:
print(dataset.title)
print(dataset.contact)
print(dataset.experiment_id)

You can access the dimensions of the dataset by calling **dataset.dimensions**. Notice that the output is a dictionary. We can see the "keys," or dimension names, with **dataset.dimensions.keys()**

In [None]:
print(dataset.dimensions)
print(dataset.dimensions.keys())

If you want to see a specific dimension, you can do so by adding brackets and the dimension name in quotes, e.g. **dataset.dimensions['lat']**

In [None]:
print(dataset.dimensions['time'])

Now that you know the dimensions of this file, try to draw a sketch, like the images at the start of this Jupyter notebook, that shows the possible dimensions and how they relate to each other. Don't worry about "bnds" for now.

We can also access the variables of our dataset by typing **dataset.variables**

In [None]:
print(dataset.variables, "\n \n")  #"\n" creates a new empty line so you can separate your output

print(dataset.variables.keys())

These variables have a lot more information, right? Let's look at just one variable: tos. Inspect it by typing **dataset.variables['tos']**

In [None]:
dataset.variables['tos']

How many different attributes can you identify? (standard_name, long_name, cell_methods, \_FillValue, missing_value, original_name, original_units, history, current shape). Look at the second line. It gives the name of the variable, and it also lists three names in parentheses after it. What do you think those names signify?

We can access any one of these attributes by calling it directly. Just add a period at the end of your call to a variable and add in the attribute name.

In [None]:
print(dataset.variables['tos'].units)
print(dataset.variables['tos'].history)

To see all of the attributes of this variable, we can write the following in Python:

In [None]:
for attr in dataset.variables['tos'].ncattrs(): #ncattrs() lists the names of this variable's attributes
    print(attr)
    print(getattr(dataset.variables['tos'], attr))  #getattr is a function that takes an object and an attribute name and returns that attribute's value
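
If getattr is new to you, this stand-alone sketch shows that it does the same job as dot notation, except that the attribute name is given as a string. The `Station` class and its values are invented for illustration and have nothing to do with netCDF4.

```python
class Station:
    """A made-up object with two attributes."""
    def __init__(self):
        self.units = 'K'
        self.long_name = 'Sea Surface Temperature'

station = Station()

# Dot notation and getattr return the same value
print(station.units)
print(getattr(station, 'units'))

# getattr is handy in a loop, where the name is only known as a string
for attr in ['units', 'long_name']:
    print(attr, '=', getattr(station, attr))
```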

You may be wondering: Where's the actual data? So far, we've learned what variables and dimensions are in this dataset, but we haven't actually seen any numbers or values.

Let's look at the latitude and longitude values. To do so, you'll call on a variable (e.g. dataset.variables['tos'], as above), but you'll add [:] after it to tell the computer that you want to see the values as a numpy array.

In [None]:
print("Latitude: ", dataset.variables['lat'][:], "\n") #print the latitude values, and then add a line break to distinguish from longitude
print("Longitude: ", dataset.variables['lon'][:])

What about the sea-surface temperature values (i.e., the actual "data")?

In [None]:
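print(dataset.variables['tos'])  #the variable's metadata
print(dataset.variables['tos'][:].shape)  #the length of the data array along each dimension
dataset.variables['tos'][:,50,:]  #all values along the first and third dimensions, at index 50 of the second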
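
The last line above uses slicing to pull out part of the array. Here is a sketch of the same idea on a small made-up array. It assumes that the axes are ordered time, lat, lon and that cells with no data (such as land) are stored as a fill value; neither is confirmed for this file. The sizes, the name `tos_like`, and the fill value 1e20 are invented. netCDF4 typically returns a NumPy masked array, which is what `np.ma.masked_values` imitates here.

```python
import numpy as np

fill = 1e20  # hypothetical fill value marking cells with no data

# Made-up data with shape (time=2, lat=3, lon=4)
raw = np.arange(24, dtype=float).reshape(2, 3, 4)
raw[:, 0, 0] = fill  # pretend one grid cell is always missing

tos_like = np.ma.masked_values(raw, fill)  # hide the fill values behind a mask

row = tos_like[:, 1, :]  # every time step, lat index 1, every longitude
print(row.shape)         # (2, 4): the lat axis is gone

first = tos_like[0]      # the full lat/lon grid at the first time step
print(first.shape)       # (3, 4)
print(first.mean())      # masked cells are left out of the average
```

Using a single index removes that axis from the result, while a colon keeps it.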

## 👉netCDF file cheat sheet👈
[This tutorial](http://www.ceda.ac.uk/static/media/uploads/ncas-reading-2015/10_read_netcdf_python.pdf) was written in Python 2.7, so the print command is slightly different, but it's a helpful read to understand how these files work.

Additionally:
1. Import the tools to open a dataset: **from netCDF4 import Dataset**
2. Open a dataset: **dataset = Dataset('filename.nc')**
3. View the dataset's attributes: **dataset.ncattrs()**
4. Access a specific attribute: **dataset.attribute_name**
5. View the dataset's dimensions: **dataset.dimensions**
6. View a specific dimension: **dataset.dimensions[ 'name of dimension' ]**
7. View the dataset's variables: **dataset.variables**
8. View a specific variable: **dataset.variables[ 'name of variable' ]**
9. See a variable's values: **dataset.variables[ 'name of variable' ][ : ]**