# What is netCDF?

netCDF stands for Network Common Data Form. It is a set of software libraries and a data format for creating, sharing, and accessing array-oriented data, and it is used mostly for scientific data. A netCDF file stores both the data and its metadata in the same file. The notebook below shows basic netCDF usage from Python.

The code below reads a CSV file that holds the results of a survey conducted to measure the correlation between shoe size and height across individuals. The CSV file is available at: https://github.com/shraddhasubhedar/netCDF-Tutorial/blob/master/Survey%20for%20Data%20Collection-3.csv

This tutorial is publicly available on GitHub: https://github.com/shraddhasubhedar/netCDF-Tutorial

In [1]:
#Importing packages (only the ones this notebook actually uses)
import pandas as pd
from netCDF4 import Dataset

In [2]:
#Read the data 
my_data = pd.read_csv("Survey for Data Collection-3.csv")

In [3]:
#Print some rows of data
my_data.head()

                     Timestamp             Full Name  Shoe size (US size)  Height (in inches)
0  2016/09/27 9:08:03 am GMT-4     Shraddha Subhedar                  7.5                67.0
1  2016/09/27 9:18:01 am GMT-4        Vijay Gentyala                 11.5                73.0
2  2016/09/27 9:22:45 am GMT-4        Aayush Dwivedi                 10.5                71.5
3  2016/09/27 9:23:29 am GMT-4  Sidharth Prabhakaran                 11.0                71.0
4  2016/09/27 9:25:24 am GMT-4     Shrey Shrivastava                 11.0                74.0


In [4]:
#Creating netCDF4 file
my_data_netcdf = Dataset('dataset1.nc', 'w', format='NETCDF4')

In [5]:
#Printing the structure of file
print(my_data_netcdf)

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): 
    variables(dimensions): 
    groups: 



In [6]:
#Create empty lists, one per CSV column
v1 = []
v2 = []
v3 = []
v4 = []

In [7]:
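#Read the CSV file and fill one list per column
with open('Survey for Data Collection-3.csv', 'r') as csv_file:
    f = csv_file.readlines()

for line in f[1:]:  #skip the header row
    fields = line.strip().split(',')
    v4.append(fields[0])         #Timestamp
    v1.append(fields[1])         #Full Name
    v2.append(float(fields[2]))  #Shoe size (US size)
    v3.append(float(fields[3]))  #Height (in inches)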
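
Splitting each line on commas works only while no field contains a comma. The sketch below shows how the standard csv module could handle quoted fields instead. It reads from an in-memory string rather than the survey file; the first two data rows are copied from the output above, while the third row (with the name "Doe, Jane") is made up purely to illustrate a quoted comma. The list names are illustrative as well.

```python
import csv
import io

# Two rows copied from the survey output above, plus one hypothetical row
# whose name contains a comma inside quotes.
sample = io.StringIO(
    "Timestamp,Full Name,Shoe size (US size),Height (in inches)\n"
    "2016/09/27 9:08:03 am GMT-4,Shraddha Subhedar,7.5,67.0\n"
    "2016/09/27 9:18:01 am GMT-4,Vijay Gentyala,11.5,73.0\n"
    '2016/09/27 9:20:00 am GMT-4,"Doe, Jane",8.0,65.0\n'
)

timestamps, names, shoe_sizes, heights = [], [], [], []

reader = csv.reader(sample)
next(reader)  # skip the header row
for timestamp_text, name, shoe_text, height_text in reader:
    timestamps.append(timestamp_text)
    names.append(name)
    shoe_sizes.append(float(shoe_text))
    heights.append(float(height_text))

print(names)
print(heights)
```

With a plain split(','), the hypothetical third row would break into five fields and the float conversion would fail; csv.reader keeps the quoted name as one field.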

In [8]:
#Metadata
my_data_netcdf.description = 'Results from survey collected via sending Google forms'
my_data_netcdf.location = 'The responders belonged to India and USA'
my_data_netcdf.source = 'Responses from https://goo.gl/forms/wjDJ57L2S9D1Rqe02'

In [9]:
#Creating dimensions (a size of 0 or None makes the dimension unlimited)
my_data_netcdf.createDimension('Timestamp', 0)
my_data_netcdf.createDimension('Full Name', 0)
my_data_netcdf.createDimension('Shoe Size', len(v2))
my_data_netcdf.createDimension('Height',len(v3))

<class 'netCDF4._netCDF4.Dimension'>: name = 'Height', size = 164

In [10]:
#Creating variables
timestamp = my_data_netcdf.createVariable('Timestamp',str,('Timestamp',))
fullname = my_data_netcdf.createVariable('Full Name', str,('Full Name',))
shoesize = my_data_netcdf.createVariable('Shoe Size', 'f4', ('Shoe Size',))
height = my_data_netcdf.createVariable('Height', 'f4', ('Height',))

In [11]:
#Adding attributes
height.units = 'inches'
shoesize.units = 'US size'
timestamp.units = 'YYYY/MM/DD HH:MM:SS am/pm timezone(GMT)'

fullname.long_name = 'First and Last name of the person answering the survey'
timestamp.long_name = 'Date and Time when the data was received'
height.long_name = 'Height of the responder in inches'
shoesize.long_name = 'Shoe size of the responder according to US shoe size'

timestamp.calendar = 'gregorian'
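
The units attribute above only documents the timestamp layout; the values themselves are stored as plain strings. As a rough sketch of how such a string could be turned into a timezone-aware datetime with the standard library, the helper below (parse_survey_timestamp, a hypothetical name) assumes the zone token is always GMT followed by an optional whole-hour offset, as in the rows shown earlier.

```python
from datetime import datetime, timedelta, timezone


def parse_survey_timestamp(text):
    """Parse e.g. '2016/09/27 9:08:03 am GMT-4' into an aware datetime."""
    # Split off the trailing zone token, e.g. 'GMT-4'.
    stamp, zone = text.rsplit(" ", 1)
    if not zone.startswith("GMT"):
        raise ValueError(f"unexpected zone token: {zone!r}")
    # Assumption: whole-hour offsets only ('GMT-4', 'GMT+5'); bare 'GMT' is 0.
    offset_hours = int(zone[3:] or 0)
    # %I is the 12-hour clock and %p the am/pm marker.
    naive = datetime.strptime(stamp, "%Y/%m/%d %I:%M:%S %p")
    return naive.replace(tzinfo=timezone(timedelta(hours=offset_hours)))


parsed = parse_survey_timestamp("2016/09/27 9:08:03 am GMT-4")
print(parsed.isoformat())  # 2016-09-27T09:08:03-04:00
```

Offsets with minutes would need extra handling that this sketch does not attempt.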

In [12]:
#Writing data
for i in range(len(v1)):
    fullname[i] = v1[i]
for i in range(len(v4)):
    timestamp[i] = v4[i]
shoesize[:] = v2[:]
height[:] = v3[:]

In [13]:
#Close the file opened for writing so that all data is flushed to disk
my_data_netcdf.close()
#Reading the netCDF file
my_data_output = Dataset('dataset1.nc', 'r')
print(my_data_output)

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    description: Results from survey collected via sending Google forms
    location: The responders belonged to India and USA
    source: Responses from https://goo.gl/forms/wjDJ57L2S9D1Rqe02
    dimensions(sizes): Timestamp(164), Full Name(164), Shoe Size(164), Height(164)
    variables(dimensions): <class 'str'> Timestamp(Timestamp), <class 'str'> Full Name(Full Name), float32 Shoe Size(Shoe Size), float32 Height(Height)
    groups: 



In [14]:
#Printing columnwise data
shoesize = my_data_output.variables['Shoe Size']
shoesize_chunk = shoesize[:]
print(shoesize_chunk)

height = my_data_output.variables['Height']
height_chunk = height[:]
print(height_chunk)

name = my_data_output.variables['Full Name']
name_chunk = name[:]
print(name_chunk)

time = my_data_output.variables['Timestamp']
timestamp_chunk = time[:]
print(timestamp_chunk)


[  7.5  11.5  10.5  11.   11.    5.    8.   11.   10.   12.   12.   10.5
   8.    5.    8.    8.    7.5   7.   13.    7.    7.   12.   12.   10.
   5.    7.    7.    8.5  10.   10.   11.5   8.   11.   11.    7.5  10.
   7.   11.    6.5   7.   11.    8.   10.    7.    6.   10.    7.    7.
   6.5   8.    6.    7.5   7.   10.5   7.    7.    6.    7.5   5.    7.
   6.5   7.5   8.    5.5   6.    8.5   9.    7.    8.    7.    9.    8.
   8.   10.    8.    7.   10.   11.5   9.    8.    8.    8.   10.   10.
   8.    9.    7.    7.5   8.    7.5   6.    5.    7.    7.    7.    9.
   6.    9.    5.    6.    5.    8.    9.5   7.   10.5   7.    8.    7.
   7.5   9.   12.    7.    7.   10.    8.    6.    7.    6.    7.    8.
  11.    7.    8.    8.    6.    8.   12.    8.    8.    8.    9.   13.
  10.    5.5  10.    5.    5.    7.    8.    9.    7.5   7.5  10.    8.
   8.5  10.   11.    5.    7.   10.5   7.    9.    7.    5.   15.    9.
  11.    7.   11.    7.    8.    7.    9.    9. ]
[ 67.        

In [15]:
#Printing data as a dataframe
df = pd.DataFrame(timestamp_chunk,columns = ['Timestamp'])
df['Full Name'] = name_chunk[:]
df['Shoe Size'] = shoesize_chunk[:]
df['Height'] = height_chunk[:]

df

                     Timestamp             Full Name  Shoe Size     Height
0  2016/09/27 9:08:03 am GMT-4     Shraddha Subhedar        7.5  67.000000
1  2016/09/27 9:18:01 am GMT-4        Vijay Gentyala       11.5  73.000000
2  2016/09/27 9:22:45 am GMT-4        Aayush Dwivedi       10.5  71.500000
3  2016/09/27 9:23:29 am GMT-4  Sidharth Prabhakaran       11.0  71.000000
4  2016/09/27 9:25:24 am GMT-4     Shrey Shrivastava       11.0  74.000000
5  2016/09/27 9:31:46 am GMT-4         Shruti Bathia        5.0  60.000000
6  2016/09/27 9:33:38 am GMT-4       Pankaj subhedar        8.0  68.000000
7  2016/09/27 9:33:51 am GMT-4           Corey Byrne       11.0  71.000000
8  2016/09/27 9:33:51 am GMT-4                 Mitul       10.0  71.000000
9  2016/09/27 9:34:25 am GMT-4           Jim Boulter       12.0  72.000000


In [16]:
#Closing the netCDF file
my_data_output.close()

Running the command ncdump dataset1.nc in a terminal prints the entire contents of the netCDF file created above. The terminal output is available at: https://github.com/shraddhasubhedar/netCDF-Tutorial/blob/master/Output
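
For readers who prefer to stay inside the notebook, the ncdump call can also be launched from Python. The sketch below assumes the ncdump tool is installed and on the PATH; the helper name ncdump_command is hypothetical. It uses the -h option, which limits the output to the header (dimensions, variables, and attributes) instead of dumping every value.

```python
import os
import shutil
import subprocess


def ncdump_command(path, header_only=True):
    """Build the ncdump argument list; -h limits output to the header."""
    command = ["ncdump"]
    if header_only:
        command.append("-h")
    command.append(path)
    return command


command = ncdump_command("dataset1.nc")
print(" ".join(command))

# Run the tool only if it is installed and the file exists.
if shutil.which("ncdump") and os.path.exists("dataset1.nc"):
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    print(result.stdout)
```

Passing header_only=False reproduces the full dump described above.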