<table style="width: 100%; border-collapse: collapse;" border="0">
<tr>
<td><b>Created:</b> Monday 30 January 2017</td>
<td style="text-align: right;"><a href="https://www.github.com/rhyswhitley/fire_limitation">github.com/rhyswhitley/fire_limitation</a></td>
</tr>
</table>

<div>
<center>
<font face="Times">
<br>
<h1>Quantifying the uncertainty of a global fire limitation model using Bayesian inference</h1>
<h2>Part 1: Staging data for analysis</h2>
<br>
<br>
<sup>1,* </sup>Douglas Kelley, 
<sup>2 </sup>Ioannis Bistinas, 
<sup>3, 4 </sup>Chantelle Burton, 
<sup>1 </sup>Tobias Marthews, 
<sup>5 </sup>Rhys Whitley
<br>
<br>
<br>
<sup>1 </sup>Centre for Ecology and Hydrology, Maclean Building, Crowmarsh Gifford, Wallingford, Oxfordshire, United Kingdom
<br>
<sup>2 </sup>Vrije Universiteit Amsterdam, Faculty of Earth and Life Sciences, Amsterdam, Netherlands
<br>
<sup>3 </sup>Met Office United Kingdom, Exeter, United Kingdom
<br>
<sup>4 </sup>Geography, University of Exeter, Exeter, United Kingdom
<br>
<sup>5 </sup>Natural Perils Pricing, Commercial & Consumer Portfolio & Pricing, Suncorp Group, Sydney, Australia
<br>
<br>
<h3>Summary</h3>
<hr>
<p> 
This notebook combines the separate netCDF4 files for the model drivers (X<sub>i=1, 2, ... M</sub>) and the model target (Y) into a single tabular data frame, which is exported as a compressed comma-separated values (CSV) file. That file is then used in the Bayesian inference study that forms the second notebook of this experiment. Pre-processing the data separately from the analysis means the data can be staged quickly on demand. Other file formats (e.g. an SQLite3 database file) may offer better compression.
</p>
<br>
<br>
<i>Python code and calculations below</i>
<hr>
</font>
</center>
</div>

## Load libraries

In [157]:
# data munging and analytical libraries 
import re
import os
import numpy as np
import pandas as pd
from netCDF4 import Dataset 

# graphical libraries
import matplotlib.pyplot as plt
%matplotlib inline

## Import and clean data

Set the directory path and collect every netCDF file (`.nc`) beneath it; these files hold the model drivers and the target.

In [129]:
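# walk the raw-data directory and keep the path of every netCDF file
driver_paths = [os.path.join(dp, f) for (dp, _, fn) in os.walk("../data/raw/") for f in fn if f.endswith('.nc')]
# driver name = leading letters/underscores of the file name (everything before the year range)
driver_names = [re.search(r'^[a-zA-Z_]*', os.path.basename(fp)).group(0) for fp in driver_paths]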

file_table = pd.DataFrame({'filepath': driver_paths, 'file_name': driver_names})
file_table

,file_name,filepath
0,alpha,../data/raw/alpha2000-2014.nc
1,cropland,../data/raw/cropland2000-2014.nc
2,fire,../data/raw/fire2000-2014.nc
3,lightning_ignitions,../data/raw/lightning_ignitions2000-2014.nc
4,NPP,../data/raw/NPP2000-2014.nc
5,pasture,../data/raw/pasture2000-2014.nc
6,population_density,../data/raw/population_density2000-2014.nc
7,urban_area,../data/raw/urban_area2000-2014.nc
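
To see what the regular expression used for `driver_names` does, the minimal sketch below applies it to a few of the file names listed above. The pattern `^[a-zA-Z_]*` matches the leading run of letters and underscores, so the match stops at the first digit of the year range. The helper name `driver_name` is hypothetical and is used here only for illustration.

```python
import os
import re

def driver_name(filepath):
    """Return the leading letters/underscores of the file's base name."""
    # os.path.basename strips the directory; the pattern stops at the first digit
    return re.search(r'^[a-zA-Z_]*', os.path.basename(filepath)).group(0)

paths = [
    "../data/raw/alpha2000-2014.nc",
    "../data/raw/lightning_ignitions2000-2014.nc",
    "../data/raw/NPP2000-2014.nc",
]
names = [driver_name(p) for p in paths]
print(names)
```

One consequence of this pattern is that a driver whose name itself contains a digit would be cut short at that digit, so it works here only because none of the file names above have digits before the year range.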


In [168]:
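def nc_extract(fpath):
    """Read 'variable' from a netCDF file and return its unmasked values as a flat array."""
    print("Processing: {0}".format(fpath))
    with Dataset(fpath, 'r') as nc_file:
        # read the full 3-D grid into memory
        gdata = nc_file.variables['variable'][:]
        if isinstance(gdata, np.ma.MaskedArray):
            # compressed() drops masked cells and returns a flat ndarray
            return gdata.compressed()
        # no mask present: simply flatten the grid
        return np.asarray(gdata).ravel()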
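
As an illustration of the flattening step in `nc_extract`, the sketch below builds a small synthetic masked array (standing in for a netCDF variable with missing cells; the shape and values are made up) and shows two equivalent ways to drop the masked cells: boolean indexing on the flattened mask, and `compressed()`.

```python
import numpy as np

# synthetic 3-D grid (2 x 2 x 3); values are made up for illustration
values = np.arange(12, dtype=float).reshape(2, 2, 3)
# mask the first column everywhere, as if those cells were missing data
mask = np.zeros(values.shape, dtype=bool)
mask[:, :, 0] = True
gdata = np.ma.MaskedArray(values, mask=mask)

# option 1: flatten, then index with the inverted mask
gflat = gdata.ravel()
by_index = gflat[~gflat.mask].data

# option 2: compressed() returns the unmasked values as a flat ndarray
by_compressed = gdata.compressed()

print(by_compressed)
print(np.array_equal(by_index, by_compressed))
```

Because each file is filtered by its own mask, the flattened arrays only line up element for element if every file masks the same cells; this is worth checking before the arrays are combined into one table.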

In [169]:
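# note: nc_extract returns an ndarray, so the trailing .data yields a memoryview of its
# buffer (see the output of my_dict below); drop .data to store the arrays themselves
my_dict = {row.file_name: nc_extract(row.filepath).data for (_, row) in file_table.iterrows()}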

Processing: ../data/raw/alpha2000-2014.nc
Processing: ../data/raw/cropland2000-2014.nc
Processing: ../data/raw/fire2000-2014.nc
Processing: ../data/raw/lightning_ignitions2000-2014.nc
Processing: ../data/raw/NPP2000-2014.nc
Processing: ../data/raw/pasture2000-2014.nc
Processing: ../data/raw/population_density2000-2014.nc
Processing: ../data/raw/urban_area2000-2014.nc


In [170]:
my_dict

{'NPP': <memory at 0x118ebf1c8>,
 'alpha': <memory at 0x111ce5348>,
 'cropland': <memory at 0x118ebfb88>,
 'fire': <memory at 0x118ebf888>,
 'lightning_ignitions': <memory at 0x118ebf588>,
 'pasture': <memory at 0x118ebff48>,
 'population_density': <memory at 0x118ebfd08>,
 'urban_area': <memory at 0x118eef048>}
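
The `<memory at 0x...>` entries above are Python `memoryview` objects: `nc_extract` returns a NumPy array, and the `.data` attribute of an `ndarray` is its raw buffer rather than the array itself. The sketch below uses small made-up arrays in place of the real drivers to show the difference, and one way the arrays could then be combined into a data frame and written to a gzip-compressed CSV, as the summary describes. The output file name `staged_example.csv.gz` and all values are assumptions for illustration only.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# made-up stand-ins for two flattened drivers of equal length
arrays = {
    "alpha": np.array([0.1, 0.5, 0.9]),
    "fire": np.array([0.0, 0.02, 0.3]),
}

# ndarray.data is a memoryview of the underlying buffer, not an array
buf = arrays["alpha"].data
print(type(buf).__name__)

# np.asarray wraps the buffer as an array again
restored = np.asarray(buf)

# keeping the arrays themselves gives a dict that pandas can tabulate directly
table = pd.DataFrame(arrays)

# write and re-read a gzip-compressed CSV (the file name is hypothetical)
out_path = os.path.join(tempfile.mkdtemp(), "staged_example.csv.gz")
table.to_csv(out_path, index=False, compression="gzip")
round_trip = pd.read_csv(out_path)
print(round_trip.shape)
```

Building the data frame this way requires every array to have the same length, which ties back to each driver being masked consistently.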