In [1]:
import pathlib
import platform

import numpy as np
import pandas as pd

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

import pymc3 as pm

import arviz as ar
import matplotlib.pyplot as pl
from matplotlib import rcParams
import seaborn as sb

### <u>Package Versions</u>

In [2]:
def pkg_ver(pkgs):
    """Print the Python version and the version of each package in pkgs."""
    print('Python & Package Versions')
    print('----------------')
    print(f'PYTHON: {platform.python_version()}')
    for pki in pkgs:
        print(f'{pki.__name__}: {pki.__version__}')


pkg_ver([np, pd, pm, ar, sb])

Python & Package Versions
----------------
PYTHON: 3.7.3
numpy: 1.17.2
pandas: 0.25.1
pymc3: 3.7
arviz: 0.5.1
seaborn: 0.9.0


In [3]:
ar.style.use('arviz-darkgrid')

### <u>Overview</u>

In this and [a subsequent notebook](), I implement Bayesian regression models to predict chlorophyll from satellite and ancillary data. For each model, the implementation follows the sequence below.

* The model is cast in a Bayesian framework using a probabilistic programming language (PPL);
* A set of prior predictive simulations is conducted to ascertain that model priors are reasonable;
* The model is fit to the data subset from NOMAD 2008 using the No-U-Turn Sampler (NUTS) variant of Hamiltonian Monte Carlo;
* Model predictive skill and uncertainty are quantified via posterior distribution evaluation and posterior predictive simulation.
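The prior predictive step listed above can be illustrated without a PPL. The sketch below is not one of this notebook's models: it assumes a straight-line regression of log10 chlorophyll on a single standardized predictor, and the Normal(0, 1) priors on intercept and slope, the Exponential(1) prior on the noise scale, and the "plausible" chlorophyll range are all hypothetical choices made for illustration. The function names are likewise invented here.

```python
import random


def prior_predictive(n_draws=2000, x_grid=(-2.0, -1.0, 0.0, 1.0, 2.0), seed=42):
    """Simulate log10(chl) from the priors alone; no observations are used."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_draws):
        alpha = rng.gauss(0.0, 1.0)    # hypothetical prior: intercept ~ Normal(0, 1)
        beta = rng.gauss(0.0, 1.0)     # hypothetical prior: slope ~ Normal(0, 1)
        sigma = rng.expovariate(1.0)   # hypothetical prior: noise scale ~ Exponential(1)
        # One simulated observation per grid point for this parameter draw.
        sims.append([rng.gauss(alpha + beta * x, sigma) for x in x_grid])
    return sims


def fraction_within(sims, low=-3.0, high=2.0):
    """Fraction of simulated log10(chl) values that fall inside [low, high]."""
    flat = [value for row in sims for value in row]
    return sum(low <= value <= high for value in flat) / len(flat)


sims = prior_predictive()
share = fraction_within(sims)
print(f"share of prior draws with 0.001 <= chl <= 100 (assumed range): {share:.2f}")
```

If a large share of the simulated values falls outside the range considered physically reasonable, the priors are probably too wide and are worth tightening before fitting.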

In a [third notebook](), the skill of the models is compared using information criterion (IC) based methods. These include the Watanabe-Akaike Information Criterion (WAIC) and/or Pareto-Smoothed Importance Sampling Leave-One-Out Cross-Validation (LOO).
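
For reference, a commonly used definition of WAIC on the deviance scale is written out below. The notation is an assumption introduced here, not taken from the notebooks: \\(y_i\\) are the \\(N\\) observations and \\(\theta^{(s)}\\) are the \\(S\\) posterior draws.

$$
\mathrm{lppd} = \sum_{i=1}^{N} \log\left(\frac{1}{S}\sum_{s=1}^{S} p\left(y_i \mid \theta^{(s)}\right)\right),
\qquad
p_{\mathrm{WAIC}} = \sum_{i=1}^{N} \operatorname{Var}_{s}\left[\log p\left(y_i \mid \theta^{(s)}\right)\right]
$$

$$
\mathrm{WAIC} = -2\left(\mathrm{lppd} - p_{\mathrm{WAIC}}\right)
$$

On this scale, lower values indicate better expected out-of-sample predictive skill. LOO targets the same expected log pointwise predictive density, but estimates it by reweighting the posterior draws rather than by applying the variance-based penalty.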

### <u>The Present Notebook's Linear Models</u>
I include three models here:
1. A simple maximum-blue band ratio (*MBR*) regression model.
2. An OC4-type \\(4^{th}\\)-degree polynomial regression.
3. An OCI-type mixture model in which either an OC4-type model or a Color Index (CI) model is applied to the data.
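
To make the first two models in the list above concrete, here is a minimal sketch of their deterministic part. It assumes that both models regress log10 chlorophyll on the log10 of the maximum-blue band ratio, that model 1 is linear in that predictor, and that the ratio is the largest of the 443, 489 and 510 nm reflectances divided by the 555 nm reflectance (the `mxBlue2Gr` column name suggests this, but the notebook does not show it). The reflectances, coefficients and function names are made up for illustration; in the Bayesian models the coefficients are the parameters to be estimated.

```python
import math


def max_blue_ratio(rrs443, rrs489, rrs510, rrs555):
    """Maximum-blue band ratio: largest blue Rrs divided by the green Rrs."""
    return max(rrs443, rrs489, rrs510) / rrs555


def poly_log_chl(log_mbr, coefs):
    """Evaluate a0 + a1*x + ... + an*x**n at x = log_mbr using Horner's scheme."""
    result = 0.0
    for a in reversed(coefs):
        result = result * log_mbr + a
    return result


# Made-up reflectances and made-up coefficients, for illustration only.
mbr = max_blue_ratio(rrs443=0.004, rrs489=0.005, rrs510=0.003, rrs555=0.002)
x = math.log10(mbr)
coefs_linear = [0.3, -2.5]                     # model 1: intercept and slope (hypothetical)
coefs_quartic = [0.3, -2.5, 1.5, -0.5, -1.0]   # model 2: a0..a4 (hypothetical)

print(f"MBR = {mbr:.2f}, log10(MBR) = {x:.4f}")
print(f"linear  log10(chl) = {poly_log_chl(x, coefs_linear):.4f}")
print(f"quartic log10(chl) = {poly_log_chl(x, coefs_quartic):.4f}")
```

Under these assumptions, the simple model is the degree-1 special case of the polynomial, so both share one evaluation function and differ only in the number of coefficients.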

### <u>Loading the Data</u>
The data were previously stored as a pickled [pandas DataFrame](https://pandas.pydata.org).

In [5]:
df = pd.read_pickle('./PickleJar/df_main_2_w_CI.pkl')

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4459 entries, 0 to 4458
Data columns (total 26 columns):
id                4459 non-null int64
etopo2            4459 non-null float64
log_etopo2        4459 non-null float64
lat               4459 non-null float64
rrs411            4293 non-null float64
log_rrs411        4293 non-null float64
rrs443            4456 non-null float64
log_rrs443        4456 non-null float64
rrs489            4422 non-null float64
log_rrs489        4422 non-null float64
rrs510            3435 non-null float64
log_rrs510        3435 non-null float64
rrs555            3255 non-null float64
log_rrs555        3255 non-null float64
rrs670            1598 non-null float64
log_rrs670        1598 non-null float64
CI                1163 non-null float64
CI_OK             4459 non-null int64
MaxBlue           4459 non-null float64
MaxBlueBand       4459 non-null object
MaxBlueBandIdx    4459 non-null int8
mxBlue2Gr         3255 non-null float64
log_mxBlue2Gr     325

### <u>Simple Band Ratio</u>
A. Pooled Model

B. Partially Pooled Model

---
End of this Notebook