# Coding exercise

In this exercise, you will fit a simple first-order linear model (i.e. $y=m\cdot x + c$) that relates the monthly value of the Niño 3.4 index to MODIS-derived fire counts (i.e. the number of thermal anomalies detected by the MODIS instrument on board the Terra platform). Both datasets are provided.

## The ENSO (Niño 3.4) data

The ENSO data are provided as a [CSV file](data/nina34.csv). The format is straightforward; in summary:

1. The first line contains the start and end years of the record.
2. Each subsequent line contains the monthly values of the index (January to December) for one year of the record.
3. Missing data are flagged with the value -99.99.
4. The last three lines give additional information about where the data were obtained.
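A minimal sketch of a reader for this layout. It assumes (this is not stated above) that each data line starts with the year followed by the twelve monthly values, separated by commas and/or whitespace; `parse_enso_lines` is a hypothetical helper, not part of the exercise.

```python
import re

import numpy as np


def parse_enso_lines(lines, n_footer=3, missing=-99.99):
    """Parse the ENSO index text into a {year: 12-element array} dict.

    Hypothetical helper. Assumes every data line is "year v1 ... v12",
    with commas and/or whitespace as separators.
    """
    lines = [ln.strip() for ln in lines if ln.strip()]  # drop blank lines
    enso = {}
    # Skip the header line (start/end year) and the trailing information lines
    for ln in lines[1:len(lines) - n_footer]:
        tokens = [tok for tok in re.split(r"[,\s]+", ln) if tok]
        year = int(float(tokens[0]))
        values = np.array(tokens[1:13], dtype=float)
        values[np.isclose(values, missing)] = np.nan  # flag missing data as NaN
        enso[year] = values
    return enso
```

In the notebook it could be called as `with open("data/nina34.csv") as fobj: enso = parse_enso_lines(fobj.read().splitlines())`. Converting -99.99 to NaN means missing months cannot silently enter the fit as extreme values.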

## The fire counts data

Rather than fitting a model to every month of every year, we have computed the "peak fire month" (i.e. the month that, on average, has the largest number of fires) and the fire count in that month for each year. All fire data have been aggregated to a 5° grid.
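To make the definition concrete, here is a minimal sketch of how a peak fire month and the matching yearly counts could be derived from a (years × 12) array of monthly counts for one cell. It only illustrates the definition; the exercise does not say exactly how the provided files were produced, and `peak_month_and_counts` is a hypothetical name.

```python
import numpy as np


def peak_month_and_counts(monthly_counts):
    """Return (peak_month, counts_at_peak) for one grid cell.

    monthly_counts: array of shape (n_years, 12), January first.
    peak_month is 1-based (1 = January, 12 = December).
    """
    monthly_counts = np.asarray(monthly_counts, dtype=float)
    climatology = monthly_counts.mean(axis=0)  # average count per calendar month
    peak_month = int(np.argmax(climatology)) + 1  # month with most fires on average
    counts_at_peak = monthly_counts[:, peak_month - 1]  # one value per year
    return peak_month, counts_at_peak
```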

The peak fire month is available as a single-band georeferenced GeoTIFF file, [`data/peak_fire_month.tif`](data/peak_fire_month.tif).

The fire counts are provided as a multi-band georeferenced GeoTIFF file, [`data/FireCounts_2001_2016.tif`](data/FireCounts_2001_2016.tif). Each band holds the fire counts in the peak fire month of one year, from 2001 to 2016.
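Because both files are georeferenced, a (row, col) index can be mapped to geographic coordinates with the raster's GDAL geotransform, which helps when reporting where each test cell lies. A minimal sketch: `cell_centre` is a hypothetical helper, and the example geotransform (a north-up global 5° grid starting at 180°W, 90°N) is an assumption; read the real one with `gdal.Open(fname).GetGeoTransform()`.

```python
def cell_centre(geotransform, row, col):
    """Return the (x, y) coordinates of the centre of pixel (row, col).

    geotransform is the 6-tuple returned by GDAL's GetGeoTransform():
    (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height).
    """
    x0, dx, rx, y0, ry, dy = geotransform
    # Adding 0.5 moves from the pixel's top-left corner to its centre
    x = x0 + (col + 0.5) * dx + (row + 0.5) * rx
    y = y0 + (col + 0.5) * ry + (row + 0.5) * dy
    return x, y


# Assumed geotransform for a north-up global 5 degree grid (36 rows x 72 cols)
gt_example = (-180.0, 5.0, 0.0, 90.0, 0.0, -5.0)
print(cell_centre(gt_example, 21, 28))  # (-37.5, -17.5) under this assumption
```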


## Fitting the model

We assume that the relationship between fire counts and the ENSO index at the peak fire month can be approximated by a linear function, i.e.

$$
N_{\textrm{fires}} = m\cdot \textrm{ENSO}_{\textrm{peak month}} + c
$$

You should be able to write this linear problem in matrix form and solve it directly with matrix operations (i.e. without using any higher-level fitting package).
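Writing one equation per year $i = 1, \dots, n$ (here $n = 16$, 2001 to 2016), with $N_i$ the fire count and $E_i$ the ENSO index at the peak fire month of year $i$, gives an overdetermined system that can be solved in the least-squares sense through the normal equations. These are the standard ordinary least-squares results, and the covariance expression assumes independent errors with a common variance:

$$
\mathbf{y} = \begin{pmatrix} N_1 \\ \vdots \\ N_n \end{pmatrix}, \quad
\mathbf{A} = \begin{pmatrix} E_1 & 1 \\ \vdots & \vdots \\ E_n & 1 \end{pmatrix}, \quad
\mathbf{p} = \begin{pmatrix} m \\ c \end{pmatrix}, \quad
\mathbf{y} \approx \mathbf{A}\mathbf{p}
$$

$$
\hat{\mathbf{p}} = \left(\mathbf{A}^{T}\mathbf{A}\right)^{-1}\mathbf{A}^{T}\mathbf{y}, \quad
\hat{\sigma}^{2} = \frac{\lVert \mathbf{y} - \mathbf{A}\hat{\mathbf{p}} \rVert^{2}}{n-2}, \quad
\mathrm{Cov}(\hat{\mathbf{p}}) = \hat{\sigma}^{2}\left(\mathbf{A}^{T}\mathbf{A}\right)^{-1}
$$

The square roots of the diagonal of $\mathrm{Cov}(\hat{\mathbf{p}})$ are the standard errors of $m$ and $c$.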


## Deliverables

The aims of this exercise are:

1. to fit a linear model to the observations from 2001 to 2016,
2. to estimate the slope and intercept (ideally with the associated uncertainties in the parameters), and
3. to check the quality of the fit and illustrate it with suitable plots (e.g. a scatter plot, the line of best fit, a 1:1 line, etc.).
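A minimal NumPy sketch of the fit, using only array operations. `fit_linear` is a hypothetical helper: it solves the normal equations with `np.linalg.solve` rather than multiplying by an explicit inverse, estimates parameter uncertainties under the usual assumption of independent, equal-variance errors, and reports R² and RMSE as simple measures of fit quality. Missing values (NaN) are dropped first.

```python
import numpy as np


def fit_linear(x, y):
    """Least-squares fit of y = m * x + c using the normal equations.

    Returns a dict with the slope, intercept, their standard errors,
    the coefficient of determination (R^2) and the RMSE.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = np.isfinite(x) & np.isfinite(y)  # drop missing observations
    x, y = x[ok], y[ok]
    n = x.size
    if n < 3:
        raise ValueError("need at least 3 valid points to estimate uncertainties")

    A = np.column_stack([x, np.ones(n)])  # design matrix with columns [x, 1]
    AtA = A.T @ A
    m, c = np.linalg.solve(AtA, A.T @ y)  # solve (A^T A) p = A^T y

    residuals = y - (m * x + c)
    sigma2 = residuals @ residuals / (n - 2)  # residual variance, 2 fitted parameters
    cov = sigma2 * np.linalg.inv(AtA)  # parameter covariance matrix
    m_err, c_err = np.sqrt(np.diag(cov))

    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - residuals @ residuals / ss_tot if ss_tot > 0 else np.nan
    rmse = np.sqrt(np.mean(residuals ** 2))
    return {"m": m, "c": c, "m_err": m_err, "c_err": c_err,
            "r2": r2, "rmse": rmse}
```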

The fit will be performed on the following grid cells:

* `row, col =>` 21, 28
* `row, col =>` 19, 64
* `row, col =>` 18, 20

The provided function `extract_fire_counts_peak_fire` extracts the fire counts time series and the peak fire month for a given grid cell. The ENSO data are given as a CSV file, and you will have to extract the relevant values (the index in each year's peak fire month) from it yourself.
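The ENSO value to pair with each year's fire count is the index in that year's peak fire month. A minimal sketch, assuming the ENSO data have already been parsed into a `{year: 12 monthly values}` mapping (a hypothetical structure; adapt it to however you read the CSV) and that the fire series covers 2001 to 2016, as documented in `extract_fire_counts_peak_fire`. `enso_at_peak_month` is a hypothetical helper.

```python
import numpy as np


def enso_at_peak_month(enso_by_year, peak_month, years=range(2001, 2017)):
    """Return the ENSO index in `peak_month` (1-12) for each year in `years`.

    enso_by_year maps a year to its 12 monthly index values (January first).
    Returns None for cells without data (peak month flagged as 255).
    """
    peak_month = int(peak_month)
    if not 1 <= peak_month <= 12:  # e.g. 255 over the oceans
        return None
    return np.array([enso_by_year[yr][peak_month - 1] for yr in years], dtype=float)
```

For example, after `fires, pk = extract_fire_counts_peak_fire(21, 28)` and `x = enso_at_peak_month(enso, pk)`, the arrays `x` and `fires` are the inputs to the fit.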


## Installed software

The following packages are installed in this instance:

* numpy
* scipy
* matplotlib
* gdal
* pandas

Additionally, all modules of the Python 3.6+ standard library are available.



```python
import numpy as np
from osgeo import gdal
import matplotlib.pyplot as plt
%matplotlib inline
gdal.UseExceptions()  # raise Python exceptions instead of silently returning None


def extract_fire_counts_peak_fire(row,
                                  col,
                                  fire_cnts_f="data/FireCounts_2001_2016.tif",
                                  peak_month_f="data/peak_fire_month.tif"):
    """Retrieve the time series of fire counts for a given row/column,
    together with the peak fire month (1 is January, 12 is December).

    Parameters
    ----------
    row: int
        The desired row. Checked against the raster dimensions.
    col: int
        The desired column. Checked against the raster dimensions.
    fire_cnts_f: str
        Multi-band GeoTIFF with one band of peak-month fire counts per year.
    peak_month_f: str
        Single-band GeoTIFF with the peak fire month.

    Returns
    -------
    A tuple whose first element is a 16-element array with the fire counts
    for the years 2001 to 2016, and whose second element is an integer with
    the peak fire month. Where there is no data (e.g. over the oceans), the
    peak fire month is 255.
    """
    g = gdal.Open(fire_cnts_f)
    # Valid indices are 0..RasterYSize-1 and 0..RasterXSize-1 (36 x 72 for a
    # global 5 degree grid), so the upper bounds are exclusive.
    assert 0 <= row < g.RasterYSize, f"row needs to be between 0 and {g.RasterYSize - 1}"
    assert 0 <= col < g.RasterXSize, f"col needs to be between 0 and {g.RasterXSize - 1}"
    # Multi-band ReadAsArray() returns (n_bands, n_rows, n_cols): take all years at one cell
    f_counts = g.ReadAsArray()[:, row, col]
    g = gdal.Open(peak_month_f)
    # Single-band ReadAsArray() returns (n_rows, n_cols)
    pk_fire_month = g.ReadAsArray()[row, col]
    return f_counts, pk_fire_month
```