# Example Notebook for Setting up and Installing NETCDF4 Data Format, HDF5 File Format and H5PY on Bluemix

<div class="alert alert-info" role="alert">
  <strong>Background</strong> What started out as a curiosity, helping answer a Stack Overflow question about installing h5py on Bluemix, turned into this huge notebook hack! :-) In the spirit of sharing to minimize pain, I thought I would document the process for getting the HDF5 file format, the NetCDF4 data format, and h5py + netCDF4 Python support working within an IBM Bluemix Jupyter notebook.  While these steps require compilation, time and in some cases browser reloading, fortunately this should be a one-time step.  After completion, you'll have the Python modules in your user-level Python package location, along with their prerequisite native libraries.
  
<strong>What does all of this mean??</strong> You'll be able to do Python module imports such as <strong>import h5py</strong> or <strong>import netCDF4</strong> in any other notebook within this Spark service instance.  You'll be able to read HDF5 files.  You'll be able to work with NetCDF4 file formats.  Cool!
</div>

In [1]:
# Python 2 interpreters are compiled as either "narrow" (UCS2) or "wide" (UCS4) Unicode builds; this is a build-time option
# Check which one this interpreter is by looking at sys.maxunicode (65535 for UCS2, 1114111 for UCS4)
import sys
if sys.maxunicode > 65535:
    print 'UCS4 build'
else:
    print 'UCS2 build'

UCS2 build
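The `sys.maxunicode` test above can be wrapped in a small helper. A narrow (UCS2) build reports 65535 (0xFFFF) and a wide (UCS4) build reports 1114111 (0x10FFFF). On Python 3.3 and later (PEP 393) the value is always 1114111, so the distinction only matters for Python 2 interpreters like the one in this notebook. The name `unicode_build` is illustrative, not part of any library.

```python
import sys

def unicode_build(maxunicode=None):
    """Return 'UCS2' for a narrow build or 'UCS4' for a wide build."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow builds cap code points at 0xFFFF; wide builds reach 0x10FFFF
    return 'UCS4' if maxunicode > 0xFFFF else 'UCS2'

print(unicode_build())
```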


## Identify your current working directory
### We will need to declare a few paths

In [2]:
# Author:  Sanjay Joshi (@jStartter) ibm.biz/sanjay_joshi
# Courtesy of jStart - IBM Emerging Technology's client engagement team

import os

# The notebook runs in <prefix>/notebook/work, so drop the last two path components
prefix = os.path.dirname(os.path.dirname(os.getcwd()))
shareDir = prefix + "/.local/share"
hdf5Dir = shareDir + "/notebook_hdf5"
netcdfDir = shareDir + "/notebook_netcdf"
print "prefix = " + prefix
print "shareDir = " + shareDir
print "hdf5Dir = " + hdf5Dir
print "netcdfDir = " + netcdfDir

prefix = /gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5
shareDir = /gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5/.local/share
hdf5Dir = /gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5/.local/share/notebook_hdf5
netcdfDir = /gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5/.local/share/notebook_netcdf


In [3]:
# Temp Testing cell to help reset setup
# Uncomment and Run if you want to test modifications/tweaks to this notebook and need to reset to an uninstalled state
#!rm -rf $shareDir
#!rm -rf $prefix/.local/lib/python2.7/site-packages/*
#!rm -rf *

In [4]:
# Let's test to see if we've already built HDF5
isHDF5Installed = os.path.isfile(hdf5Dir + "/hdf5/bin/h5cc")
if isHDF5Installed:
    print "Congratulations! HDF5 is already installed within your notebook user space"
else:
    print "HDF5 is NOT installed within this notebook's user space"
    
# Let's test to see if we've already built netcdf
isNETCDFInstalled = os.path.isfile(netcdfDir + "/netcdf/bin/nc-config")
if isNETCDFInstalled:
    print "Congratulations! NETCDF is already installed within your notebook user space"
    !$netcdfDir/netcdf/bin/nc-config --version
else:
    print "NETCDF is NOT installed within this notebook's user space"

HDF5 is NOT installed within this notebook's user space
NETCDF is NOT installed within this notebook's user space


In [5]:
if not isHDF5Installed:
    !mkdir $shareDir
    !mkdir $hdf5Dir
    !mkdir $hdf5Dir/hdf5
    print "HDF5 directories created to facilitate HDF5 build and install"

HDF5 directories created to facilitate HDF5 build and install


In [6]:
if not isNETCDFInstalled:
    !mkdir $netcdfDir
    !mkdir $netcdfDir/netcdf
    print "NETCDF directories created to facilitate netcdf build and install"

NETCDF directories created to facilitate netcdf build and install


## Fetch Snapshot of hdf5 tar gzip file

In [7]:
if not isHDF5Installed:
    !wget https://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.17.tar.gz -O $hdf5Dir/hdf5-1.8.17.tar.gz

--2016-05-23 00:31:27--  https://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.17.tar.gz
Resolving www.hdfgroup.org (www.hdfgroup.org)... 50.28.50.143
Connecting to www.hdfgroup.org (www.hdfgroup.org)|50.28.50.143|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12304149 (12M) [application/x-gzip]
Saving to: '/gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5/.local/share/notebook_hdf5/hdf5-1.8.17.tar.gz'


2016-05-23 00:31:32 (2.19 MB/s) - '/gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5/.local/share/notebook_hdf5/hdf5-1.8.17.tar.gz' saved [12304149/12304149]



## Fetch Snapshot of netcdf tar gzip file

In [8]:
if not isNETCDFInstalled:
    !wget https://github.com/Unidata/netcdf-c/archive/v4.4.0.tar.gz -O $netcdfDir/v4.4.0.tar.gz

--2016-05-23 00:31:32--  https://github.com/Unidata/netcdf-c/archive/v4.4.0.tar.gz
Resolving github.com (github.com)... 192.30.252.129
Connecting to github.com (github.com)|192.30.252.129|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/Unidata/netcdf-c/tar.gz/v4.4.0 [following]
--2016-05-23 00:31:33--  https://codeload.github.com/Unidata/netcdf-c/tar.gz/v4.4.0
Resolving codeload.github.com (codeload.github.com)... 192.30.252.161
Connecting to codeload.github.com (codeload.github.com)|192.30.252.161|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: '/gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5/.local/share/notebook_netcdf/v4.4.0.tar.gz'

    [              <=>                      ] 17,487,357  4.26MB/s   in 3.9s   

2016-05-23 00:31:37 (4.26 MB/s) - '/gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5

## Untar (extract) file

In [9]:
if not isHDF5Installed:
    !tar -zxvf $hdf5Dir/hdf5-1.8.17.tar.gz -C $hdf5Dir >/dev/null
    hdf5SrcDir = hdf5Dir + "/hdf5-1.8.17"
    print hdf5SrcDir

/gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5/.local/share/notebook_hdf5/hdf5-1.8.17


In [10]:
if not isHDF5Installed:
    !ls -al $hdf5SrcDir

total 2376
drwxr-xr-x 14 s1a2-472d95bcebf7db-bf066087ecf5 users    4096 May 23 00:31 .
drwxr-xr-x  4 s1a2-472d95bcebf7db-bf066087ecf5 users    4096 May 23 00:31 ..
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users     683 Apr 26 07:44 ACKNOWLEDGMENTS
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users    5701 Apr 26 07:44 CMakeFilters.cmake
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users   24952 Apr 26 07:44 CMakeInstallation.cmake
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users   40635 Apr 26 07:44 CMakeLists.txt
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users    4677 Apr 26 07:44 COPYING
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users    1508 Apr 26 07:44 CTestConfig.cmake
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users   68969 May  6 16:41 MANIFEST
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users    1318 Apr 26 07:44 Makefile
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users    7599 Apr 26 07:44 Makefile.am
-rw-r--r--  1 s1a2-472d95bce

In [11]:
if not isNETCDFInstalled:
    !tar -zxvf $netcdfDir/v4.4.0.tar.gz -C $netcdfDir >/dev/null
    netcdfSrcDir = netcdfDir + "/netcdf-c-4.4.0"
    print netcdfSrcDir

/gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5/.local/share/notebook_netcdf/netcdf-c-4.4.0


In [12]:
if not isNETCDFInstalled:
    !ls -al $netcdfSrcDir

total 1584
drwxr-xr-x 22 s1a2-472d95bcebf7db-bf066087ecf5 users   4096 Jan 13 15:40 .
drwxr-xr-x  4 s1a2-472d95bcebf7db-bf066087ecf5 users   4096 May 23 00:31 ..
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users    292 Jan 13 15:40 .gitignore
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users   1572 Jan 13 15:40 .travis.yml
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users    944 Jan 13 15:40 .travis.yml.old
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users   3427 Jan 13 15:40 CMakeInstallation.cmake
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users  56801 Jan 13 15:40 CMakeLists.txt
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users   3186 Jan 13 15:40 COMPILE.cmake.txt
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users  11363 Jan 13 15:40 CONTRIBUTING.html
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users   2032 Jan 13 15:40 COPYRIGHT
-rw-r--r--  1 s1a2-472d95bcebf7db-bf066087ecf5 users    666 Jan 13 15:40 CTestConfig.cmake.in
-rw-r--r--  1 s1a2-472d9
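The `tar -zxvf ... >/dev/null` calls above can also be done from Python with the standard `tarfile` module, which avoids printing thousands of file names into the notebook. This is a sketch: `extract_tarball` is a hypothetical helper, and it assumes (like both source tarballs here) that the archive unpacks into a single top-level directory.

```python
import os
import tarfile

def extract_tarball(archive_path, dest_dir):
    """Extract a .tar.gz into dest_dir and return the top-level directory path."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # First path component of every member, e.g. "hdf5-1.8.17"
        roots = set(m.name.split("/")[0] for m in tar.getmembers())
        tar.extractall(dest_dir)
    if len(roots) != 1:
        raise ValueError("expected one top-level directory, found %r" % sorted(roots))
    return os.path.join(dest_dir, roots.pop())

# Hypothetical usage:
# hdf5SrcDir = extract_tarball(hdf5Dir + "/hdf5-1.8.17.tar.gz", hdf5Dir)
```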

# HDF5 is required for netCDF-4 support, so it must be built first

## Run Configure for the HDF5 source directory

In [13]:
if not isHDF5Installed:
    # Setup Configuration Info
    !$hdf5SrcDir/configure --prefix=$hdf5Dir/hdf5

checking for a BSD-compatible install... /bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make supports nested variables... (cached) yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking shell variables initial values... done
checking if basename works... yes
checking if xargs works... yes
checking for cached host... none
checking for config x86_64-unknown-linux-gnu... no
checking for config x86_64-unknown-linux-gnu... no
checking for config unknown-linux-gnu... no
checking for config unknown-linux-gnu... no
checking for config x86_64-linux-gnu... no
checking for config x86_64-linux-gnu... no
checking for config x86_64-unknown... no
checki

In [14]:
if not isHDF5Installed:
    # Make the build and install output as quiet as possible by replacing the H5_CFLAGS compiler warning settings with plain -std=c99
    !rm config.status.new 2>/dev/null
    !cat config.status | sed -n '1h;1!H;${;g;s/"\-std.*O3"/"\-std=c99"/g;p;}' | sed 's/&amp;/\&/g' | sed 's/&lt;/\</g' | sed 's/&gt;/\>/g' > config.status.new
    !rm config.status
    !mv config.status.new config.status
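The `sed` pipeline above rewrites a quoted flag string that starts with `-std` and ends with `O3` in `config.status` down to `"-std=c99"`. A Python sketch of the same substitution follows, with one deliberate difference: the `sed` script loads the whole file into its buffer (`1h;1!H;$`), so its greedy `.*` can span lines, whereas this sketch keeps each match on a single line. The pattern assumes, as the original does, that the flags appear as a quoted string beginning with `-std`; check your own `config.status` before relying on it. Both function names are hypothetical.

```python
import re

# Same pattern as the sed command, but [^\n]*? keeps each match on one line
FLAGS_RE = re.compile(r'"-std[^\n]*?O3"')

def strip_warning_flags(text):
    """Replace '"-std ... O3"' flag strings with '"-std=c99"'."""
    return FLAGS_RE.sub('"-std=c99"', text)

def rewrite_config_status(path="config.status"):
    """Apply strip_warning_flags to a file in place (keeps its permissions)."""
    with open(path) as fh:
        text = fh.read()
    with open(path, "w") as fh:
        fh.write(strip_warning_flags(text))
```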

## Let's build HDF5

In [15]:
# WARNING:  This cell takes a while (~10-12 mins) and can push the browser's CPU usage high.
# Your browser may even become unresponsive for a period of time.  Be patient until the cell execution is complete.  Go grab something to drink.
# It compiles and installs the native HDF5 libraries that h5py requires.
# NOTE:  You may need to refresh your browser after this cell completes.  Carefully monitor the kernel indicator in the upper right as well.
#        The good news is that this is a one-time operation. After the native libs are built, they will be available to all existing and new notebooks within this Spark instance.
if not isHDF5Installed:
    !make -w -j2 && make -w install

make: Entering directory `/gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5/notebook/work'
 /bin/sh ./config.status
config.status: creating src/libhdf5.settings
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating test/Makefile
config.status: creating test/testcheck_version.sh
config.status: creating test/testerror.sh
config.status: creating test/H5srcdir_str.h
config.status: creating test/testlibinfo.sh
config.status: creating test/testlinks_env.sh
config.status: creating test/test_plugin.sh
config.status: creating testpar/Makefile
config.status: creating tools/Makefile
config.status: creating tools/h5dump/Makefile
config.status: creating tools/h5dump/testh5dump.sh
config.status: creating tools/h5dump/testh5dumppbits.sh
config.status: creating tools/h5dump/testh5dumpxml.sh
config.status: creating tools/h5ls/testh5ls.sh
config.status: creating tools/h5import/Makefile
config.status: creating tools/h5import/h5importtes
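The unresponsive browser described above comes largely from streaming thousands of lines of `make` output into the page. One way to avoid it, sketched here under the assumption that you only need the output when something fails, is to send each build command's output to a log file and look only at the exit status. `run_logged` is a hypothetical helper, not part of any library.

```python
import subprocess

def run_logged(cmd, log_path, cwd=None):
    """Run cmd (a list of args), appending stdout and stderr to log_path.

    Returns the process exit code; 0 means success.
    """
    with open(log_path, "a") as log:
        return subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT, cwd=cwd)

# Hypothetical usage mirroring `make -w -j2 && make -w install`:
# if run_logged(["make", "-w", "-j2"], "build.log") == 0:
#     run_logged(["make", "-w", "install"], "build.log")
```

If the exit code is non-zero, `!tail -n 40 build.log` shows the end of the log without flooding the notebook.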

<div class="alert alert-danger" role="alert">
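  <strong>WARNING</strong> You may have to reload your browser to continue.  The make and make install process can sometimes cause the notebook to lose its kernel context.  Don't worry, all of the compiled files will persist across the browser reload.  If the kernel was restarted, re-run the earlier cells that define the path variables (such as hdf5Dir and netcdfDir) before continuing.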
</div>

## Housekeeping

In [16]:
# Re-evaluate installation status
isHDF5Installed = os.path.isfile(hdf5Dir + "/hdf5/bin/h5cc")

if isHDF5Installed:
    # Remove all configure-generated by-products (this deletes everything in the current working directory)
    !rm -rf * 2>/dev/null
    # Remove the extracted source folder
    !rm -rf $hdf5Dir/hdf5-1.8.17 2>/dev/null
    # Remove the tar gzip file
    !rm $hdf5Dir/hdf5-1.8.17.tar.gz 2>/dev/null
    print "HDF5 directories for facilitating HDF5 build and install have been removed"
else:
    print "HDF5 build and install seems to have failed.  Cannot locate <HDF5_Install_Point>/bin/h5cc"

HDF5 directories for facilitating HDF5 build and install have been removed


# Now let's look at netCDF4 and build it with HDF5 support ;-)
[Sample Notebooks Exploring netcdf data](http://nbviewer.jupyter.org/url/www.hydro.washington.edu/~jhamman/hydro-logic/downloads/notebooks/2013-10-12-plot-netcdf-data.ipynb)
[Quickstart](http://unidata.github.io/netcdf4-python/#section1)

## Run Configure for the netCDF4 source directory

In [17]:
if not isNETCDFInstalled:
    # Setup Configuration Info: point configure at the HDF5 install built above
    # http://www.unidata.ucar.edu/mailing_lists/archives/netcdfgroup/2011/msg00340.html
    # http://www.unidata.ucar.edu/software/netcdf/netcdf-4/newdocs/netcdf-install/Configure.html
    os.environ['HDF5_DIR']= hdf5Dir + "/hdf5"
    os.environ['CPPFLAGS']= "-I" + hdf5Dir + "/hdf5/include"
    os.environ['LDFLAGS']= "-L" + hdf5Dir + "/hdf5/lib"
    !$netcdfSrcDir/configure --prefix=$netcdfDir/netcdf --enable-netcdf-4

configure: netCDF 4.4.0
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether to enable maintainer-specific portions of Makefiles... no
configure: checking user options
checking whether a win32 DLL is desired... no
checking whether a NCIO_MINBLOCKSIZE was specified... 256
checking if fsync support is enabled... no
checking if jna bug workaround is enabledd... no
checking whether extra valgrind tests should be run... no
checking whether we should build netCDF-4... yes
checking do we require hdf5 dynamic-loading support... yes
checking whether reading of HDF4 SD files is to be enabled
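The cell above points `configure` at the private HDF5 install by mutating `os.environ`, which affects every later command in the kernel. An alternative sketch, assuming the same `<prefix>/include` and `<prefix>/lib` layout that `make install` produced for HDF5, builds a separate environment dictionary and passes it only to the command that needs it. `hdf5_build_env` is a hypothetical name.

```python
import os

def hdf5_build_env(hdf5_prefix, base_env=None):
    """Return a copy of base_env with compiler/linker flags for an HDF5 prefix."""
    env = dict(os.environ if base_env is None else base_env)
    env["HDF5_DIR"] = hdf5_prefix
    # Prepend our paths so any flags the user already set are kept
    env["CPPFLAGS"] = ("-I%s/include %s" % (hdf5_prefix, env.get("CPPFLAGS", ""))).strip()
    env["LDFLAGS"] = ("-L%s/lib %s" % (hdf5_prefix, env.get("LDFLAGS", ""))).strip()
    return env

# Hypothetical usage with subprocess:
# subprocess.call([netcdfSrcDir + "/configure", "--prefix=" + netcdfDir + "/netcdf",
#                  "--enable-netcdf-4"], env=hdf5_build_env(hdf5Dir + "/hdf5"))
```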

## Let's build netCDF4

In [18]:
if not isNETCDFInstalled:
    !make -w -j2 && make -w install

make: Entering directory `/gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5/notebook/work'
make  all-recursive
make[1]: Entering directory `/gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5/notebook/work'
Making all in include
make[2]: Entering directory `/gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5/notebook/work/include'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5/notebook/work/include'
Making all in h5_test
make[2]: Entering directory `/gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5/notebook/work/h5_test'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/gpfs/global_fs01/sym_shared/YPProdSpark/user/s1a2-472d95bcebf7db-bf066087ecf5/notebook/work/h5_test'
Making all in libdispatch
make[2]: Entering directory `/gpfs/global_fs01/sym_shared/YPProdSpa

<div class="alert alert-danger" role="alert">
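  <strong>WARNING</strong> You may have to reload your browser to continue.  The make and make install process can sometimes cause the notebook to lose its kernel context.  Don't worry, all of the compiled files will persist across the browser reload.  If the kernel was restarted, re-run the earlier cells that define the path variables (such as hdf5Dir and netcdfDir) before continuing.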
</div>

## Housekeeping

In [19]:
# Re-evaluate installation status
isNETCDFInstalled = os.path.isfile(netcdfDir + "/netcdf/bin/nc-config")

if isNETCDFInstalled:
    # Remove all configure-generated by-products (this deletes everything in the current working directory)
    !rm -rf * 2>/dev/null
    # Remove the extracted source folder
    !rm -rf $netcdfDir/netcdf-c-4.4.0 2>/dev/null
    # Remove the tar gzip file
    !rm $netcdfDir/v4.4.0.tar.gz 2>/dev/null
    print "NETCDF directories for facilitating NETCDF build and install have been removed"
else:
    print "NETCDF build and install seems to have failed.  Cannot locate <NETCDF_Install_Point>/bin/nc-config"

NETCDF directories for facilitating NETCDF build and install have been removed
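Both housekeeping cells run `rm -rf *` in the current working directory, which deletes everything there, not just build by-products. A more conservative sketch removes only an explicit list of paths, and only if they sit under an expected parent directory. `remove_paths` is a hypothetical helper, and the safety check is an assumption about how cautious you want to be, not something the original notebook does.

```python
import os
import shutil

def remove_paths(paths, must_be_under):
    """Delete each file or directory in paths, refusing anything outside must_be_under."""
    root = os.path.abspath(must_be_under)
    removed = []
    for p in paths:
        full = os.path.abspath(p)
        # Guard against accidentally deleting something outside the build area
        if not full.startswith(root + os.sep):
            raise ValueError("refusing to delete %s (outside %s)" % (p, root))
        if os.path.isdir(full) and not os.path.islink(full):
            shutil.rmtree(full)
        elif os.path.lexists(full):
            os.remove(full)  # regular file or symlink
        else:
            continue  # nothing to delete
        removed.append(full)
    return removed

# Hypothetical usage:
# remove_paths([netcdfDir + "/netcdf-c-4.4.0", netcdfDir + "/v4.4.0.tar.gz"], netcdfDir)
```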


<div class="alert alert-info" role="alert">
  <strong>Prerequisite Native Library Installs Complete</strong> Let's now do some pip installs
</div>

### Setting up custom paths via Environment Variables

In [20]:
if isHDF5Installed:
    # netCDF4's setup.py reads HDF5_DIR, HDF5_INCDIR and HDF5_LIBDIR
    os.environ['HDF5_DIR']= hdf5Dir + "/hdf5"
    os.environ['HDF5_INCDIR']= hdf5Dir + "/hdf5/include"
    os.environ['HDF5_LIBDIR']= hdf5Dir + "/hdf5/lib"
    os.environ['PKG_CONFIG_PATH']= hdf5Dir + "/hdf5" 

if isNETCDFInstalled:
    os.environ['NETCDF4_DIR']= netcdfDir + "/netcdf"
    os.environ['NETCDF4_INCDIR']= netcdfDir + "/netcdf/include"
    os.environ['NETCDF4_LIBDIR']= netcdfDir + "/netcdf/lib" 
    os.environ['PKG_CONFIG_PATH']= netcdfDir + "/netcdf"
    
if isHDF5Installed and isNETCDFInstalled:
    os.environ['PKG_CONFIG_PATH']= netcdfDir + "/netcdf" + ":" + hdf5Dir + "/hdf5"   # Needed for NETCDF4 PIP Install
    os.environ['USE_NCCONFIG']= "1"  # Needed for NETCDF4 PIP Install
    os.environ['USE_SETUPCFG']= "0"  # Needed for NETCDF4 PIP Install
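A wrong directory in these variables (for example `inc` instead of `include`) only surfaces later as a confusing compiler error, so it can help to check for the expected files before running `pip`. A sketch, assuming the standard autotools layout that `make install` created above (`<prefix>/include/<header>` and a shared library `<prefix>/lib/lib<name>.so` on Linux); `missing_install_files` is a hypothetical helper.

```python
import os

def missing_install_files(prefix, header, libname):
    """Return the expected header/library paths that do not exist under prefix."""
    expected = [
        os.path.join(prefix, "include", header),
        os.path.join(prefix, "lib", "lib%s.so" % libname),  # Linux shared-library naming
    ]
    return [p for p in expected if not os.path.exists(p)]

# Hypothetical usage (an empty list means everything was found):
# print(missing_install_files(hdf5Dir + "/hdf5", "hdf5.h", "hdf5"))
# print(missing_install_files(netcdfDir + "/netcdf", "netcdf.h", "netcdf"))
```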

### Installing pkgconfig Python module :: interface with the pkg-config command line tool 

In [21]:
!pip install pkgconfig --user

Collecting pkgconfig
  Using cached pkgconfig-1.1.0.tar.gz
Installing collected packages: pkgconfig
  Running setup.py install for pkgconfig ... done
Successfully installed pkgconfig-1.1.0


### Installing h5py Python module :: The h5py package provides both a high- and low-level interface to the HDF5 library from Python. The low-level interface is intended to be a complete wrapping of the HDF5 API, while the high-level component supports access to HDF5 files, datasets and groups using established Python and NumPy concepts.

In [22]:
if isHDF5Installed:
    !pip install h5py --user

Collecting h5py
  Using cached h5py-2.6.0.tar.gz
Installing collected packages: h5py
  Running setup.py install for h5py ... done
Successfully installed h5py-2.6.0


### Installing NETCDF4 Python module :: netCDF version 4 has many features not found in earlier versions of the library and is implemented on top of HDF5. This module can read and write files in both the new netCDF 4 and the old netCDF 3 format, and can create files that are readable by HDF5 clients. The API is modelled after Scientific.IO.NetCDF, and should be familiar to users of that module.

<div class="alert alert-info" role="alert">
  The netCDF4 module's pip installation needs to know where HDF5 and NetCDF are installed.  
  Because it uses pkg-config to find them, we must create two *.pc files that describe their install locations:
  <ol><li>hdf5.pc</li>
  <li>netcdf.pc</li></ol>
</div>

### Create the hdf5.pc file

In [23]:
!echo "# An example pkg-config file for hdf5.  Fix the paths if necessary and put" > hdf5.pc
!echo "# this in a place where pkg-config will find it.  You can find where it's" >> hdf5.pc
!echo "# searching now with the following command:" >> hdf5.pc
!echo "#" >> hdf5.pc
!echo "#   $ pkg-config --variable pc_path pkg-config" >> hdf5.pc
!echo "#" >> hdf5.pc
!echo "# According to the man page, you can override this if necessary by" >> hdf5.pc
!echo "# setting PKG_CONFIG_PATH" >> hdf5.pc
!echo "#" >> hdf5.pc
!echo "" >> hdf5.pc
!echo "prefix=${HDF5_DIR}" >> hdf5.pc
!echo "exec_prefix=${HDF5_DIR}" >> hdf5.pc
!echo "includedir=${HDF5_DIR}/include" >> hdf5.pc
!echo "libdir=${HDF5_DIR}/lib" >> hdf5.pc
!echo "" >> hdf5.pc
!echo "Name: hdf5" >> hdf5.pc
!echo "Description: HDF5" >> hdf5.pc
!echo "Version: 1.8.17" >> hdf5.pc
!echo "Requires.private: zlib" >> hdf5.pc
!echo "Cflags: -I${HDF5_DIR}/include" >> hdf5.pc
!echo "Libs: -L${HDF5_DIR}/lib -lhdf5" >> hdf5.pc

### Move the hdf5.pc file into the HDF5 install location

In [24]:
!mv hdf5.pc $hdf5Dir/hdf5

### Create the netcdf.pc file

In [25]:
!echo "# An example pkg-config file for netcdf.  Fix the paths if necessary and put" > netcdf.pc
!echo "# this in a place where pkg-config will find it.  You can find where it's" >> netcdf.pc
!echo "# searching now with the following command:" >> netcdf.pc
!echo "#" >> netcdf.pc
!echo "#   $ pkg-config --variable pc_path pkg-config" >> netcdf.pc
!echo "#" >> netcdf.pc
!echo "# According to the man page, you can override this if necessary by" >> netcdf.pc
!echo "# setting PKG_CONFIG_PATH" >> netcdf.pc
!echo "#" >> netcdf.pc
!echo "" >> netcdf.pc
!echo "prefix=${NETCDF4_DIR}" >> netcdf.pc
!echo "exec_prefix=${NETCDF4_DIR}" >> netcdf.pc
!echo "includedir=${NETCDF4_DIR}/include" >> netcdf.pc
!echo "libdir=${NETCDF4_DIR}/lib" >> netcdf.pc
!echo "ccompiler=gcc" >> netcdf.pc

!echo "" >> netcdf.pc
!echo "Name: netcdf" >> netcdf.pc
!echo "Description: NetCDF Client Library" >> netcdf.pc
!echo "URL: http://www.unidata.ucar.edu/netcdf" >> netcdf.pc
!echo "Version: 4.4.0" >> netcdf.pc
!echo "Requires.private: zlib" >> netcdf.pc
!echo "Cflags: -I${NETCDF4_DIR}/include" >> netcdf.pc
!echo "Libs: -L${NETCDF4_DIR}/lib -lnetcdf" >> netcdf.pc

### Move the netcdf.pc file into the NETCDF install location

In [26]:
!mv netcdf.pc $netcdfDir/netcdf
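The two `echo` cells above build `hdf5.pc` and `netcdf.pc` one line at a time. The same files can be produced from a single template in Python. This sketch covers the fields the two files have in common (it leaves out the `ccompiler` and `URL` lines of `netcdf.pc`), writes paths literally rather than through `${HDF5_DIR}` shell expansion, and `write_pc_file` is a hypothetical helper.

```python
import os

PC_TEMPLATE = """prefix={prefix}
exec_prefix={prefix}
includedir={prefix}/include
libdir={prefix}/lib

Name: {name}
Description: {description}
Version: {version}
Requires.private: zlib
Cflags: -I{prefix}/include
Libs: -L{prefix}/lib -l{lib}
"""

def write_pc_file(dest_dir, name, description, version, prefix, lib):
    """Write <dest_dir>/<name>.pc and return its path."""
    path = os.path.join(dest_dir, name + ".pc")
    with open(path, "w") as fh:
        fh.write(PC_TEMPLATE.format(prefix=prefix, name=name, description=description,
                                    version=version, lib=lib))
    return path

# Hypothetical usage, equivalent to the echo + mv cells:
# write_pc_file(hdf5Dir + "/hdf5", "hdf5", "HDF5", "1.8.17", hdf5Dir + "/hdf5", "hdf5")
# write_pc_file(netcdfDir + "/netcdf", "netcdf", "NetCDF Client Library", "4.4.0",
#               netcdfDir + "/netcdf", "netcdf")
```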

In [27]:
if isHDF5Installed and isNETCDFInstalled:
    !pip install netCDF4 --user

Collecting netCDF4
  Using cached netCDF4-1.2.4.tar.gz
Installing collected packages: netCDF4
  Running setup.py install for netCDF4 ... done
Successfully installed netCDF4-1.2.4


<div class="alert alert-success" role="alert">
  <strong>Congratulations</strong> You went through the wormhole and made it to the other side.  
  Let's review our accomplishments:
  <ol><li>HDF5 Native Lib Installed .............................   Check</li>
      <li>NETCDF Native Lib Installed ...........................   Check</li>
      <li>Python Module Cython Installed ........................   Check</li>
      <li>Python Module H5PY Installed ..........................   Check</li>
      <li>Python Module Numpy Installed .........................   Check</li>
      <li>Python Module NETCDF4 with HDF5 support Installed .....   Check</li>
      <li>Python Module pkgconfig Installed .....................   Check</li></ol>
  
  Whew!
</div>

# Let's work through the h5py Quick Start Guide
[Quick Start Guide](http://docs.h5py.org/en/latest/quick.html)

In [28]:
import h5py
import numpy as np
f = h5py.File("mytestfile1.hdf5", "w")

In [29]:
dset = f.create_dataset("mydataset", (100,), dtype='i')

In [30]:
dset.shape

(100,)

In [31]:
dset.dtype

dtype('int32')

In [32]:
dset[...] = np.arange(100)

In [33]:
dset[0]

0

In [34]:
dset[10]

10

In [35]:
dset[0:100:10]

array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90], dtype=int32)

In [36]:
dset.name

u'/mydataset'

In [37]:
f.name

u'/'

In [38]:
grp = f.create_group("subgroup")

In [39]:
dset2 = grp.create_dataset("another_dataset", (50,), dtype='f')

In [40]:
dset2.name

u'/subgroup/another_dataset'

In [41]:
dset3 = f.create_dataset('subgroup2/dataset_three', (10,), dtype='i')

In [42]:
dset3.name

u'/subgroup2/dataset_three'

In [43]:
for name in f:
    print name

mydataset
subgroup
subgroup2


In [44]:
dset.attrs['temperature'] = 99.5
dset.attrs['temperature']

99.5

In [45]:
'temperature' in dset.attrs

True

In [46]:
f.close()

# Let's work through a few netCDF4 commands

In [47]:
!$netcdfDir/netcdf/bin/nc-config --version  # --cflags

netCDF 4.4.0


In [48]:
# standard imports
import netCDF4 as nc
import numpy as np
from netCDF4 import Dataset
np.set_printoptions(precision=3, linewidth=100, edgeitems=2)  # make numpy less verbose
nc.getlibversion()

u'4.4.0 of May 23 2016 00:44:04 $'

In [49]:
rootgrp = Dataset("test.nc", "w", format="NETCDF4")
print rootgrp.data_model
rootgrp.close()

NETCDF4


In [50]:
rootgrp = Dataset("test.nc", "a")
fcstgrp = rootgrp.createGroup("forecasts")
analgrp = rootgrp.createGroup("analyses")
print rootgrp.groups

OrderedDict([('forecasts', <type 'netCDF4._netCDF4.Group'>
group /forecasts:
    dimensions(sizes): 
    variables(dimensions): 
    groups: 
), ('analyses', <type 'netCDF4._netCDF4.Group'>
group /analyses:
    dimensions(sizes): 
    variables(dimensions): 
    groups: 
)])


In [51]:
fcstgrp1 = rootgrp.createGroup("/forecasts/model1")
fcstgrp2 = rootgrp.createGroup("/forecasts/model2")

In [52]:
def walktree(top):
    values = top.groups.values()
    yield values
    for value in top.groups.values():
        for children in walktree(value):
            yield children

print rootgrp
for children in walktree(rootgrp):
    for child in children:
        print child

<type 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): 
    variables(dimensions): 
    groups: forecasts, analyses

<type 'netCDF4._netCDF4.Group'>
group /forecasts:
    dimensions(sizes): 
    variables(dimensions): 
    groups: model1, model2

<type 'netCDF4._netCDF4.Group'>
group /analyses:
    dimensions(sizes): 
    variables(dimensions): 
    groups: 

<type 'netCDF4._netCDF4.Group'>
group /forecasts/model1:
    dimensions(sizes): 
    variables(dimensions): 
    groups: 

<type 'netCDF4._netCDF4.Group'>
group /forecasts/model2:
    dimensions(sizes): 
    variables(dimensions): 
    groups: 



In [53]:
level = rootgrp.createDimension("level", None)
time = rootgrp.createDimension("time", None)
lat = rootgrp.createDimension("lat", 73)
lon = rootgrp.createDimension("lon", 144)
print rootgrp.dimensions

OrderedDict([('level', <type 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0
), ('time', <type 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0
), ('lat', <type 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 73
), ('lon', <type 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 144
)])


In [54]:
print len(lon)
print lon.isunlimited()
print time.isunlimited()

144
False
True


In [55]:
for dimobj in rootgrp.dimensions.values():
    print dimobj

<type 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0

<type 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0

<type 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 73

<type 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 144



In [56]:
times = rootgrp.createVariable("time","f8",("time",))
levels = rootgrp.createVariable("level","i4",("level",))
latitudes = rootgrp.createVariable("latitude","f4",("lat",))
longitudes = rootgrp.createVariable("longitude","f4",("lon",))
temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",))
print temp

<type 'netCDF4._netCDF4.Variable'>
float32 temp(time, level, lat, lon)
unlimited dimensions: time, level
current shape = (0, 0, 73, 144)
filling on, default _FillValue of 9.96920996839e+36 used



In [57]:
ftemp = rootgrp.createVariable("/forecasts/model1/temp","f4",("time","level","lat","lon",))

In [58]:
print rootgrp["/forecasts/model1"]
print rootgrp["/forecasts/model1/temp"]

<type 'netCDF4._netCDF4.Group'>
group /forecasts/model1:
    dimensions(sizes): 
    variables(dimensions): float32 temp(time,level,lat,lon)
    groups: 

<type 'netCDF4._netCDF4.Variable'>
float32 temp(time, level, lat, lon)
path = /forecasts/model1
unlimited dimensions: time, level
current shape = (0, 0, 73, 144)
filling on, default _FillValue of 9.96920996839e+36 used



In [59]:
print rootgrp.variables

OrderedDict([('time', <type 'netCDF4._netCDF4.Variable'>
float64 time(time)
unlimited dimensions: time
current shape = (0,)
filling on, default _FillValue of 9.96920996839e+36 used
), ('level', <type 'netCDF4._netCDF4.Variable'>
int32 level(level)
unlimited dimensions: level
current shape = (0,)
filling on, default _FillValue of -2147483647 used
), ('latitude', <type 'netCDF4._netCDF4.Variable'>
float32 latitude(lat)
unlimited dimensions: 
current shape = (73,)
filling on, default _FillValue of 9.96920996839e+36 used
), ('longitude', <type 'netCDF4._netCDF4.Variable'>
float32 longitude(lon)
unlimited dimensions: 
current shape = (144,)
filling on, default _FillValue of 9.96920996839e+36 used
), ('temp', <type 'netCDF4._netCDF4.Variable'>
float32 temp(time, level, lat, lon)
unlimited dimensions: time, level
current shape = (0, 0, 73, 144)
filling on, default _FillValue of 9.96920996839e+36 used
)])


In [60]:
import time  # note: rebinds the name `time`, which previously held the "time" Dimension object
rootgrp.description = "bogus example script"
rootgrp.history = "Created " + time.ctime(time.time())
rootgrp.source = "netCDF4 python module tutorial from unidata.github.io"
latitudes.units = "degrees north"
longitudes.units = "degrees east"
levels.units = "hPa"
temp.units = "K"
times.units = "hours since 0001-01-01 00:00:00.0"
times.calendar = "gregorian"

In [61]:
for name in rootgrp.ncattrs():
    print "Global attr", name, "=", getattr(rootgrp,name)

Global attr description = bogus example script
Global attr history = Created Mon May 23 00:58:27 2016
Global attr source = netCDF4 python module tutorial from unidata.github.io


In [62]:
print rootgrp.__dict__

OrderedDict([(u'description', u'bogus example script'), (u'history', u'Created Mon May 23 00:58:27 2016'), (u'source', u'netCDF4 python module tutorial from unidata.github.io')])


In [63]:
lats = np.arange(-90,91,2.5)
lons = np.arange(-180,180,2.5)
latitudes[:] = lats
longitudes[:] = lons
print "latitudes =\n",latitudes[:]

latitudes =
[-90.  -87.5 -85.  -82.5 -80.  -77.5 -75.  -72.5 -70.  -67.5 -65.  -62.5 -60.  -57.5 -55.  -52.5
 -50.  -47.5 -45.  -42.5 -40.  -37.5 -35.  -32.5 -30.  -27.5 -25.  -22.5 -20.  -17.5 -15.  -12.5
 -10.   -7.5  -5.   -2.5   0.    2.5   5.    7.5  10.   12.5  15.   17.5  20.   22.5  25.   27.5
  30.   32.5  35.   37.5  40.   42.5  45.   47.5  50.   52.5  55.   57.5  60.   62.5  65.   67.5
  70.   72.5  75.   77.5  80.   82.5  85.   87.5  90. ]


In [64]:
# Append along the two unlimited dimensions by assigning to a slice
nlats = len(rootgrp.dimensions["lat"])
nlons = len(rootgrp.dimensions["lon"])
print "temp shape before adding data = ", temp.shape

temp shape before adding data =  (0, 0, 73, 144)


In [65]:
from numpy.random import uniform
temp[0:5,0:10,:,:] = uniform(size=(5,10,nlats,nlons))
print "temp shape after adding data = ",temp.shape

temp shape after adding data =  (5, 10, 73, 144)


In [66]:
# levels have grown, but no values yet assigned
print "levels shape after adding pressure data = ", levels.shape

levels shape after adding pressure data =  (10,)


In [67]:
# now, assign data to levels dimension variable
levels[:] = [1000.,850.,700.,500.,300.,250.,200.,150.,100.,50.]
temp[0, 0, [0,1,2,3], [0,1,2,3]]
tempdat = temp[::2, [1,3,6], lats>0, lons>0]
print "shape of fancy temp slice = ", tempdat.shape

shape of fancy temp slice =  (3, 3, 36, 71)
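The slices in the cell above do not follow NumPy's rules. The netCDF4-python documentation notes that integer sequences and 1-d boolean arrays index each dimension independently (orthogonal indexing, similar to Fortran vector subscripts), so `temp[0, 0, [0,1,2,3], [0,1,2,3]]` returns a 4 x 4 array. On a plain NumPy array the same expression pairs the indices element-wise and returns only 4 values. The sketch below uses a zero-filled NumPy stand-in for `temp` (an assumption; no netCDF file is involved) and shows how `np.ix_` or per-axis indexing reproduces the netCDF4-style shapes.

```python
import numpy as np

# Synthetic stand-in for the netCDF4 variable `temp` (time, level, lat, lon)
temp_np = np.zeros((5, 10, 73, 144), dtype=np.float32)
lats = np.arange(-90, 91, 2.5)
lons = np.arange(-180, 180, 2.5)

# NumPy "fancy" indexing pairs the two index lists element-wise -> 4 points
pointwise = temp_np[0, 0, [0, 1, 2, 3], [0, 1, 2, 3]]

# Orthogonal (outer-product) selection, which is what netCDF4 does -> 4 x 4 block
orthogonal = temp_np[0, 0][np.ix_([0, 1, 2, 3], [0, 1, 2, 3])]

# Equivalent of temp[::2, [1,3,6], lats>0, lons>0]: index one axis at a time
sliced = temp_np[::2][:, [1, 3, 6]][:, :, lats > 0][:, :, :, lons > 0]

print(pointwise.shape)   # (4,)
print(orthogonal.shape)  # (4, 4)
print(sliced.shape)      # (3, 3, 36, 71)
```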


In [68]:
# fill in times
from datetime import datetime, timedelta
from netCDF4 import num2date, date2num
dates = [datetime(2001,3,1)+n*timedelta(hours=12) for n in range(temp.shape[0])]
times[:] = date2num(dates, units=times.units,calendar=times.calendar)
print "time values (in units %s): " % times.units+"\n",times[:]

time values (in units hours since 0001-01-01 00:00:00.0): 
[ 17533104.  17533116.  17533128.  17533140.  17533152.]


In [69]:
dates = num2date(times[:],units=times.units,calendar=times.calendar)
print "dates corresponding to time values:\n",dates

dates corresponding to time values:
[datetime.datetime(2001, 3, 1, 0, 0) datetime.datetime(2001, 3, 1, 12, 0)
 datetime.datetime(2001, 3, 2, 0, 0) datetime.datetime(2001, 3, 2, 12, 0)
 datetime.datetime(2001, 3, 3, 0, 0)]
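The first time value above, 17533104 hours for 2001-03-01, is 48 hours more than Python's own `datetime` arithmetic gives for the same units. This sketch assumes that `times.calendar` was set in an earlier cell to `gregorian`/`standard`, netCDF's mixed Julian/Gregorian calendar. Under that assumption, the gap comes from the epoch. Before 1582 that calendar counts Julian dates, and Julian 0001-01-01 is two days earlier than the proleptic Gregorian 0001-01-01 that `datetime` uses. The sketch uses hypothetical `hours_since`/`from_hours` helpers that handle the proleptic Gregorian calendar only, unlike `date2num`/`num2date`.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1, 1, 1)  # proleptic Gregorian 0001-01-01, as Python's datetime uses

def hours_since(dates, epoch=EPOCH):
    """Hypothetical helper: datetimes -> float hours since epoch (proleptic Gregorian)."""
    return [(d - epoch).total_seconds() / 3600.0 for d in dates]

def from_hours(values, epoch=EPOCH):
    """Hypothetical inverse: float hours since epoch -> datetimes."""
    return [epoch + timedelta(hours=h) for h in values]

dates = [datetime(2001, 3, 1) + n * timedelta(hours=12) for n in range(5)]
values = hours_since(dates)
print(values[0])               # 17533056.0
print(17533104.0 - values[0])  # 48.0, the two-day epoch offset
```

Since both calendars agree on every date after 1582-10-15, the difference for these dates is this constant 48 hours. Passing `calendar='proleptic_gregorian'` to `date2num` should reproduce the `datetime` numbers.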


In [70]:
# write ten small NETCDF4_CLASSIC files, each holding 10 values along the unlimited dimension "q"
for nf in range(10):
    myFile = Dataset("mftest%s.nc" % nf,"w",format="NETCDF4_CLASSIC")
    myFile.createDimension("q",None)
    x = myFile.createVariable("q","i",("q",))
    x[0:10] = np.arange(nf*10,10*(nf+1))
    myFile.close()

In [71]:
from netCDF4 import MFDataset
# MFDataset only works with NETCDF3_* and NETCDF4_CLASSIC formatted files, not NETCDF4;
# it aggregates the files along their unlimited dimension (here "q")
aggFile = MFDataset("mftest*nc")
print aggFile.variables["q"][:]
aggFile.close()

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98
 99]
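When `MFDataset` is given a wildcard string, it expands the pattern with `glob` and sorts the names as plain strings. That ordering works in this notebook only because the ten files have single-digit suffixes. With eleven or more files, `mftest10.nc` would sort before `mftest2.nc` and the aggregated values would come out of order. The stdlib sketch below has two parts. It models aggregation as concatenation along the unlimited dimension, using pretend file contents. It also shows a natural sort through a hypothetical `natural_key` helper. Because `MFDataset` also accepts an explicit list of file names, a list sorted this way could be passed to it directly.

```python
import re

def natural_key(name):
    """Hypothetical helper: split out digit runs so 'mftest10' sorts after 'mftest9'."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]

# pretend contents of each file: 10 values along the unlimited dimension "q"
files = dict(("mftest%s.nc" % nf, list(range(nf * 10, 10 * (nf + 1)))) for nf in range(12))

lexical = sorted(files)
natural = sorted(files, key=natural_key)
print(lexical[:4])  # ['mftest0.nc', 'mftest1.nc', 'mftest10.nc', 'mftest11.nc']

# aggregating = concatenating each file's slice along "q", in file order
aggregated = [v for name in natural for v in files[name]]
print(aggregated == list(range(120)))  # True
```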


In [72]:
temp3 = rootgrp.createVariable("temp3","f4",("time","level","lat","lon",), zlib=True, least_significant_digit=3)
print temp3

<type 'netCDF4._netCDF4.Variable'>
float32 temp3(time, level, lat, lon)
    least_significant_digit: 3
unlimited dimensions: time, level
current shape = (5, 10, 73, 144)
filling on, default _FillValue of 9.96920996839e+36 used
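Setting `least_significant_digit=3` makes the `zlib` compression lossy. The netCDF4-python documentation describes the mechanism: values are quantized as `numpy.around(scale*data)/scale`, with `scale = 2**bits`, where `bits` is chosen so that the requested decimal precision survives. Zeroing the low-order bits in this way makes the data compress much better. The stdlib sketch below follows that description. Its `quantize` helper is hypothetical, and the bit-count formula `ceil(log2(10**digits))` is an assumption based on the documented example, in which one decimal digit needs 4 bits.

```python
import math

def quantize(values, least_significant_digit):
    """Hypothetical helper mirroring the documented quantization:
    round to a multiple of 1/2**bits, with bits = ceil(log2(10**digits))."""
    bits = int(math.ceil(math.log(10 ** least_significant_digit, 2)))
    scale = 2.0 ** bits
    # round() rounds halves to even on Python 3, like numpy.around
    return [round(v * scale) / scale for v in values]

raw = [0.123456789, 273.15678, -41.98765]
packed = quantize(raw, 3)   # bits = 10, scale = 1024
print(packed[0])            # 0.123046875
# every value stays within half a unit of the 3rd decimal place
print(max(abs(a - b) for a, b in zip(raw, packed)) <= 0.5e-3)  # True
```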



In [73]:
compoundFile = Dataset("complex.nc", "w")
size = 3 # length of 1-d complex array
# create sample complex data
datac = np.exp(1j*(1.+np.linspace(0, np.pi, size)))
# create complex128 compound data type.
complex128 = np.dtype([("real", np.float64),("imag",np.float64)])
complex128_t = compoundFile.createCompoundType(complex128,"complex128")
# create a variable with this data type, write some data to it.
compoundFile.createDimension("x_dim",None)
v = compoundFile.createVariable("cmplx_var", complex128_t, "x_dim")
data = np.empty(size,complex128) # numpy structured array
data["real"] = datac.real
data["imag"] = datac.imag # imaginary parts come from the complex source array
v[:] = data # write numpy structured array to netcdf compound var
# close and reopen the file, check the contents
compoundFile.close()
compoundFile = Dataset("complex.nc")
v = compoundFile.variables["cmplx_var"]
datain = v[:] # read in all the data into a numpy structured array
# create an empty numpy complex array
datac2 = np.empty(datain.shape,np.complex128)
# .. fill it with the contents of the structured array
datac2.real = datain["real"]
datac2.imag = datain["imag"]
print datac.dtype,datac
print datac2.dtype,datac2 # data from file

complex128 [ 0.540+0.841j -0.841+0.54j  -0.540-0.841j]
complex128 [ 0.540+0.841j -0.841+0.54j  -0.540-0.841j]


In [74]:
print compoundFile

<type 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): x_dim(3)
    variables(dimensions): {'names':[u'real',u'imag'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':16, 'aligned':True} cmplx_var(x_dim)
    groups: 



In [75]:
print compoundFile.variables["cmplx_var"]

<type 'netCDF4._netCDF4.Variable'>
compound cmplx_var(x_dim)
compound data type: {'names':[u'real',u'imag'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':16, 'aligned':True}
unlimited dimensions: x_dim
current shape = (3,)



In [76]:
print compoundFile.cmptypes

OrderedDict([(u'complex128', <type 'netCDF4._netCDF4.CompoundType'>: name = 'complex128', numpy dtype = {'names':[u'real',u'imag'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':16, 'aligned':True}
)])


In [77]:
print compoundFile.cmptypes["complex128"]
compoundFile.close()

<type 'netCDF4._netCDF4.CompoundType'>: name = 'complex128', numpy dtype = {'names':[u'real',u'imag'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':16, 'aligned':True}
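The printed `numpy dtype` gives the compound type's memory layout: two little-endian float64 fields, with `real` at byte offset 0 and `imag` at offset 8, for 16 bytes per element. The stdlib sketch below packs Python complex numbers into that same 16-byte record layout with `struct` and unpacks them again. It is independent of netCDF4 and HDF5, and the `pack_complex`/`unpack_complex` helpers are hypothetical. Both fields are read from the complex value itself, so the round trip is exact.

```python
import cmath
import struct

# '<dd': little-endian float64 'real' at offset 0, float64 'imag' at offset 8
RECORD = struct.Struct("<dd")

def pack_complex(values):
    """Hypothetical helper: complex numbers -> bytes in the compound layout."""
    return b"".join(RECORD.pack(z.real, z.imag) for z in values)

def unpack_complex(buf):
    """Hypothetical inverse: bytes -> list of complex numbers."""
    return [complex(*RECORD.unpack_from(buf, off)) for off in range(0, len(buf), RECORD.size)]

# sample values like the notebook's: exp(1j*(1 + t)) for t = 0, pi/2, pi
datac = [cmath.exp(1j * (1.0 + k * cmath.pi / 2)) for k in range(3)]
buf = pack_complex(datac)
print(RECORD.size)                  # 16 bytes per element
print(unpack_complex(buf) == datac)  # True
```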



In [78]:
varFile = Dataset("ts_vlen.nc","w")
vlen_t = varFile.createVLType(np.int32, "phony_vlen")

In [79]:
x = varFile.createDimension("x",3)
y = varFile.createDimension("y",4)
vlvar = varFile.createVariable("phony_vlen_var", vlen_t,("y","x"))

In [80]:
import random
data = np.empty(len(y)*len(x),object)
for n in range(len(y)*len(x)):
    data[n] = np.arange(random.randint(1,10),dtype="int32")+1

data = np.reshape(data,(len(y),len(x)))
vlvar[:] = data
print "vlen variable =\n",vlvar[:]

vlen variable =
[[array([1, 2], dtype=int32) array([1], dtype=int32) array([1, 2, 3, 4, 5, 6, 7, 8], dtype=int32)]
 [array([1, 2, 3, 4, 5, 6], dtype=int32) array([1, 2, 3], dtype=int32) array([1, 2, 3], dtype=int32)]
 [array([1, 2, 3, 4], dtype=int32) array([1, 2, 3, 4, 5, 6, 7, 8], dtype=int32)
  array([1, 2, 3, 4, 5, 6, 7], dtype=int32)]
 [array([1, 2, 3, 4, 5], dtype=int32) array([1, 2], dtype=int32)
  array([1, 2, 3, 4, 5, 6], dtype=int32)]]


In [81]:
print varFile.variables["phony_vlen_var"]

<type 'netCDF4._netCDF4.Variable'>
vlen phony_vlen_var(y, x)
vlen data type: int32
unlimited dimensions: 
current shape = (4, 3)
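A vlen variable stores an int32 sequence of a different length in each (y, x) cell. User-defined types like this require the NETCDF4 format, because NETCDF4_CLASSIC and NETCDF3 files use the classic data model. Where a classic-format file is needed, one alternative is to store the same ragged data as a single flat array plus a per-cell count. This is the idea behind the CF conventions' contiguous ragged array representation. The stdlib sketch below uses hypothetical `pack_ragged`/`unpack_ragged` helpers. It generates cells the same way as the cell above: `1..n`, with a random `n` between 1 and 10.

```python
import random

def make_ragged(ny, nx, rng):
    """Cells of 1..n with random n in [1, 10], like np.arange(n) + 1 above."""
    return [[list(range(1, rng.randint(1, 10) + 1)) for _ in range(nx)] for _ in range(ny)]

def pack_ragged(grid):
    """Hypothetical helper: 2-D grid of sequences -> (counts, flat values), row-major."""
    counts, flat = [], []
    for row in grid:
        for cell in row:
            counts.append(len(cell))
            flat.extend(cell)
    return counts, flat

def unpack_ragged(counts, flat, nx):
    """Hypothetical inverse: rebuild the grid from counts and flat values."""
    cells, start = [], 0
    for n in counts:
        cells.append(flat[start:start + n])
        start += n
    return [cells[i:i + nx] for i in range(0, len(cells), nx)]

grid = make_ragged(4, 3, random.Random(0))
counts, flat = pack_ragged(grid)
print(len(counts))                          # 12 cells for the (4, 3) grid
print(unpack_ragged(counts, flat, 3) == grid)  # True
```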

