# Python on the VDI: Part IV
## NetCDF Data


**The following will go through how to:**
- Use Jupyter (IPython) notebooks with NetCDF data within the VDI.


<br>


### Launch the Jupyter Notebook application

Load the required Python modules, then launch Jupyter:

    $ jupyter notebook
   
<div class="alert alert-info">
<b>NOTE: </b> For more information on setting up Jupyter on the VDI, see <b>Python on the VDI: Part III</b>.
</div>

<br>


## Find some NetCDF data

In this example, we will use a file from the Geoscience Australia Geophysics National Coverages Collection:

    /g/data1/rr2/National_Coverages/magmap_v6_2015_VRTP/magmap_v6_2015_VRTP.nc
    

We will compare direct access on the filesystem with remote access through NCI's THREDDS/OPeNDAP service. Timings (from the `%%time` cell magic) are included to illustrate when it pays to run your analysis directly on the filesystem.
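The `%%time` magic only exists inside IPython. To make the same comparison from a plain Python script, a rough stand-in is sketched below. `time_block` is a hypothetical helper, not part of IPython: it uses `os.times()` for user/system CPU time and `timeit.default_timer` for wall time, so its numbers should be broadly comparable to, but not identical with, what `%%time` reports.

```python
import os
import timeit
from contextlib import contextmanager


@contextmanager
def time_block(label):
    """Hypothetical helper: report CPU and wall time for a block, like %%time."""
    result = {}
    cpu_start = os.times()               # (user, system, ...) CPU seconds so far
    wall_start = timeit.default_timer()  # best available wall-clock timer
    try:
        yield result
    finally:
        cpu_end = os.times()
        result['user'] = cpu_end[0] - cpu_start[0]
        result['sys'] = cpu_end[1] - cpu_start[1]
        result['wall'] = timeit.default_timer() - wall_start
        print('{}: CPU user {:.3f} s, sys {:.3f} s, total {:.3f} s; wall {:.3f} s'.format(
            label, result['user'], result['sys'],
            result['user'] + result['sys'], result['wall']))
```

To use it, put the statements you want to time inside a `with time_block('some label'):` block.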

#### Local path on /g/data

In [1]:

```python
path = '/g/data1/rr2/National_Coverages/magmap_v6_2015_VRTP/magmap_v6_2015_VRTP.nc'
```

#### OPeNDAP Data URL

For more information on where to find OPeNDAP URLs, see:
<a href="https://nbviewer.jupyter.org/github/nci/Notebooks/blob/master/Using_Thredds/THREDDS_DataAccess.ipynb">THREDDS Data Server: Data Access</a>



In [2]:

```python
url = 'http://dapds00.nci.org.au/thredds/dodsC/rr2/National_Coverages/magmap_v6_2015_VRTP/magmap_v6_2015_VRTP.nc'
```
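Comparing the local path with the OPeNDAP URL for this file, the URL is the same path with the `/g/data1/` prefix swapped for the THREDDS `dodsC` endpoint. The sketch below encodes that observation. `gdata_to_opendap` is a hypothetical helper, and the assumption that other `/g/data1/` paths map the same way (or are published on THREDDS at all) is not established by this notebook, so check the THREDDS catalogue before relying on it.

```python
LOCAL_PREFIX = '/g/data1/'
OPENDAP_PREFIX = 'http://dapds00.nci.org.au/thredds/dodsC/'


def gdata_to_opendap(local_path):
    """Hypothetical helper: build an OPeNDAP URL from a /g/data1 path.

    Assumes the prefix mapping seen in this example holds for the dataset.
    """
    if not local_path.startswith(LOCAL_PREFIX):
        raise ValueError('not a /g/data1 path: {}'.format(local_path))
    # Keep everything after /g/data1/ and attach it to the dodsC endpoint.
    return OPENDAP_PREFIX + local_path[len(LOCAL_PREFIX):]
```

For the magnetic anomaly file used here, `gdata_to_opendap(path)` returns the same string as `url`.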

## Open files

In [3]:

```python
from netCDF4 import Dataset
```

In [4]:

```python
%%time

f1 = Dataset(path)
```

```
CPU times: user 2 ms, sys: 4 ms, total: 6 ms
Wall time: 91 ms
```


In [5]:

```python
%%time

f2 = Dataset(url)
```

```
CPU times: user 10 ms, sys: 10 ms, total: 20 ms
Wall time: 116 ms
```
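Because the same file is reachable both ways, a notebook that should also run away from NCI could choose its source at run time: read from the filesystem when the path is visible, otherwise fall back to OPeNDAP. `choose_source` is a hypothetical helper sketching that idea; it only checks that the local path exists and does not test whether the remote URL is reachable.

```python
import os


def choose_source(local_path, remote_url):
    """Hypothetical helper: prefer direct filesystem access, else OPeNDAP."""
    if os.path.exists(local_path):
        return local_path  # on the VDI: direct access via /g/data
    return remote_url      # elsewhere: remote access via THREDDS/OPeNDAP
```

With the names defined above, `Dataset(choose_source(path, url))` opens whichever source is available.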


## Extract a small subset

<br>
<div class="alert alert-info">
A major advantage of working directly on the filesystem is faster data access. For modest subsets the absolute difference is small (fractions of a second), but as requests grow, remote access becomes much slower and can even exceed the memory limits of NCI's THREDDS Data Server.
</div>

#### File variables

In [6]:

```python
# List each variable in the remote file with its dimensions and shape.
# Single-argument print(...) works under both Python 2 and Python 3.
for name, var in f2.variables.items():
    print('Variable: \t{}'.format(name))
    print('Dimensions: \t{}'.format(var.dimensions))
    print('Shape:    \t{}\n'.format(var.shape))
```

```
Variable: 	lat
Dimensions: 	(u'lat',)
Shape:    	(41882,)

Variable: 	lon
Dimensions: 	(u'lon',)
Shape:    	(50591,)

Variable: 	crs
Dimensions: 	(u'maxStrlen64',)
Shape:    	(64,)

Variable: 	mag_tmi_rtp_anomaly
Dimensions: 	(u'lat', u'lon')
Shape:    	(41882, 50591)
```



#### Extract: Remotely

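In [7]:

```python
%%time

# First 1000 coordinate values along each axis, read over OPeNDAP
lat = f2.variables['lat'][:1000]
lon = f2.variables['lon'][:1000]

# The matching 1000 x 1000 block of the anomaly grid
mag = f2.variables['mag_tmi_rtp_anomaly'][:1000,:1000]
```

```
CPU times: user 36 ms, sys: 24 ms, total: 60 ms
Wall time: 246 ms
```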


#### Extract: Locally

In [8]:

```python
%%time

lat = f1.variables['lat'][:1000]
lon = f1.variables['lon'][:1000]

mag = f1.variables['mag_tmi_rtp_anomaly'][:1000,:1000]
```

```
CPU times: user 10 ms, sys: 3 ms, total: 13 ms
Wall time: 25.4 ms
```
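The cells above select the first 1000 indices along each axis. To extract a geographic region instead, the 1-D `lat` and `lon` coordinate variables (small enough to read in full) can first be turned into index slices. `bbox_to_slices` is a hypothetical helper; it assumes both coordinates are monotonic (ascending or descending), which this notebook does not show explicitly.

```python
import numpy as np


def bbox_to_slices(lat, lon, lat_min, lat_max, lon_min, lon_max):
    """Hypothetical helper: convert a lat/lon bounding box into index slices.

    Assumes 1-D, monotonic coordinate arrays (ascending or descending).
    """
    lat = np.asarray(lat)
    lon = np.asarray(lon)
    # Indices of coordinate values that fall inside the box (inclusive).
    lat_idx = np.nonzero((lat >= lat_min) & (lat <= lat_max))[0]
    lon_idx = np.nonzero((lon >= lon_min) & (lon <= lon_max))[0]
    if lat_idx.size == 0 or lon_idx.size == 0:
        raise ValueError('bounding box does not overlap the grid')
    # For monotonic coordinates the matches are contiguous, so first..last+1
    # is a single slice that can be read in one request.
    return (slice(int(lat_idx[0]), int(lat_idx[-1]) + 1),
            slice(int(lon_idx[0]), int(lon_idx[-1]) + 1))
```

For example, `ys, xs = bbox_to_slices(f1.variables['lat'][:], f1.variables['lon'][:], lat_min, lat_max, lon_min, lon_max)` followed by `f1.variables['mag_tmi_rtp_anomaly'][ys, xs]`, with the four bounds chosen for your region of interest.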


## Extract the full file

<br>
<div class="alert alert-info">
The remote request below fails with an "Access failure" error because it asks for too much data in a single request.
</div>
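A quick way to see why the full request fails is to estimate its size from the variable shapes listed earlier. `request_nbytes` is a hypothetical helper. The element size should be taken from the variable itself (for example `f2.variables['mag_tmi_rtp_anomaly'].dtype.itemsize`), because this notebook does not show the stored data type; the server's actual limit is not stated here either.

```python
def request_nbytes(shape, itemsize):
    """Hypothetical helper: uncompressed bytes needed for an array of `shape`."""
    n_values = 1
    for dim in shape:
        n_values *= dim
    return n_values * itemsize
```

If the anomaly grid were stored as 4-byte floats (an assumption), `request_nbytes((41882, 50591), 4)` gives 8,475,409,048 bytes, roughly 8.5 GB, compared with 4,000,000 bytes for the 1000 × 1000 subset.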



#### Extract: Remotely

In [9]:

```python
%%time

# Request every value over OPeNDAP in one go
lat = f2.variables['lat'][:]
lon = f2.variables['lon'][:]

mag = f2.variables['mag_tmi_rtp_anomaly'][:,:]
```

```
RuntimeError: NetCDF: Access failure
```

The request above exceeds the remote access limit; to retrieve the whole variable over OPeNDAP, it would need to be requested in several smaller chunks.
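Requesting the variable in chunks can be sketched by reading blocks of rows and stacking them on the client. `read_in_row_chunks` is a hypothetical helper: the block size that stays under the THREDDS limit is not given in this notebook, so `chunk_rows=500` is an arbitrary placeholder to tune, and the assembled result still needs enough local memory for the whole grid. It accepts a netCDF4 variable or any 2-D array-like with NumPy-style slicing; `np.ma.concatenate` is used so that masked (fill-value) points returned by netCDF4 stay masked.

```python
import numpy as np


def read_in_row_chunks(var, chunk_rows=500):
    """Hypothetical helper: read a 2-D variable one block of rows at a time.

    `var` can be a netCDF4 Variable (local or OPeNDAP) or any 2-D array-like.
    """
    nrows = var.shape[0]
    blocks = []
    for start in range(0, nrows, chunk_rows):
        stop = min(start + chunk_rows, nrows)
        # Each slice is a separate, smaller request to the server.
        blocks.append(var[start:stop, :])
    # Join the blocks along the row axis, preserving any masked values.
    return np.ma.concatenate(blocks, axis=0)
```

For example, `mag = read_in_row_chunks(f2.variables['mag_tmi_rtp_anomaly'])`, after adjusting `chunk_rows` until each request succeeds.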

#### Extract: Locally

In [10]:

```python
%%time

lat = f1.variables['lat'][:]
lon = f1.variables['lon'][:]

mag = f1.variables['mag_tmi_rtp_anomaly'][:,:]
```

```
CPU times: user 39.8 s, sys: 5.01 s, total: 44.8 s
Wall time: 49.3 s
```
