## Open Source Tools For Rapid Statistical Model Development

### Overview
* Catching up with Probabilistic Programming
* Planning Models -- causalgraphicalmodels
* Loading Data -- pandas
* Preparing Data -- pandas, Seaborn, scikit-learn
* Rapid Model Development -- PyMC3
* Beyond the Basics

### Notebook Content:
1. [Model planning](#ModelPlan)
2. [Loading NASA's SeaBASS Data](#DataLoad)
3. [Prepare Data for Modeling](#DataPrep)
4. [Bayesian Modeling](#PyMC3)
   1. [Model coding](#writemodel)
   2. [Prior evaluation & Model modification](#priors)
   3. [Model fitting & diagnostics](#fit)
   4. [Model Evaluation](#eval)
       1. Posterior predictive checks (PPC)
       2. Widely applicable information criterion (WAIC)
       3. Test set evaluation
5. [Not Covered](#Aftermath)

In [10]:
import pandas as pd
import seaborn as sb
import pymc3 as pm
import cmocean as cm

In [11]:
print('pandas version %s' % pd.__version__)
print('seaborn version %s' % sb.__version__)
print('pymc3 version %s' % pm.__version__)
print('cmocean version %s' % cm.__version__)

pandas version 0.23.4
seaborn version 0.8.1
pymc3 version 3.4.1
cmocean version 1.2


In [8]:
%matplotlib inline

### Loading and preparing data -- pandas
* the NOMAD dataset
* reading in the data
* getting column names
* extracting the desired variables
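
The loading steps listed above could look roughly like the sketch below. It is a minimal illustration only: the in-memory CSV, its column names (`rrs443`, `rrs555`, `chl`) and its values are invented stand-ins, not the actual NOMAD file layout.

```python
import io
import pandas as pd

# Hypothetical stand-in for the data file; column names and values are invented.
raw = io.StringIO(
    "id,lat,lon,rrs443,rrs555,chl\n"
    "1,34.1,-120.5,0.0051,0.0019,0.35\n"
    "2,35.0,-121.2,0.0032,0.0024,1.20\n"
    "3,36.4,-122.0,0.0021,0.0030,4.80\n"
)

# Read the table into a DataFrame.
df = pd.read_csv(raw)

# Get the column names as a plain Python list.
columns = df.columns.tolist()
print(columns)

# Extract the desired variables: every Rrs column plus chlorophyll.
wanted = [c for c in columns if c.startswith("rrs")] + ["chl"]
d = df[wanted].copy()
print(d.shape)
```

Selecting columns by a name prefix keeps the extraction step short when a file has many reflectance bands; with a real file the prefix and target column would need to match its actual header.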

### Data Exploration -- pandas, Seaborn and scikit-learn
* distributions of individual predictors
* plotting predictors and the predicted variable against each other
* predictor correlation, multicollinearity and PCA
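
As a rough sketch of the correlation and PCA steps, the example below builds three synthetic predictors, two of which are nearly collinear, and inspects them. The data and column names are invented, and the PCA is computed with a plain NumPy SVD rather than scikit-learn, purely to keep the example dependency-light.

```python
import numpy as np
import pandas as pd

# Synthetic predictors (assumption): two bands share a common signal.
rng = np.random.RandomState(0)
base = rng.normal(size=200)
X = pd.DataFrame({
    "rrs443": base + 0.1 * rng.normal(size=200),
    "rrs490": base + 0.1 * rng.normal(size=200),
    "rrs555": rng.normal(size=200),
})

# Pairwise Pearson correlation; values near 1 flag multicollinearity.
corr = X.corr()
print(corr.round(2))

# PCA via SVD of the standardized predictors.
Z = (X - X.mean()) / X.std()
_, s, _ = np.linalg.svd(Z.values, full_matrices=False)

# Fraction of total variance carried by each principal component.
explained = s ** 2 / np.sum(s ** 2)
print(explained.round(3))
```

When one component carries most of the variance, as it does here by construction, the correlated predictors hold largely redundant information, which is worth knowing before putting them all into a regression.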

### Modeling -- Probabilistic Programming with PyMC3
* simple Bayesian regression to predict chlorophyll from remote sensing reflectance (Rrs)
* rapid but transparent model development
* evaluation of priors
* fitting and evaluation of the posterior distribution
* model comparison/selection
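
One way to picture the "evaluation of priors" step is a prior predictive simulation: draw parameters from the priors alone and look at the outcomes they imply before seeing any data. The sketch below does this with plain NumPy rather than PyMC3, for a hypothetical linear model of log10(chlorophyll) on a standardized predictor. The model form, the prior choices and the function name `prior_predictive` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.RandomState(42)

def prior_predictive(x, prior_sd, n_draws=1000):
    """Simulate log10(chl) from the priors alone (hypothetical model)."""
    # Normal priors on intercept and slope, half-normal prior on the noise scale.
    alpha = rng.normal(0.0, prior_sd, size=n_draws)
    beta = rng.normal(0.0, prior_sd, size=n_draws)
    sigma = np.abs(rng.normal(0.0, 1.0, size=n_draws))
    # One regression line per prior draw, evaluated at every x.
    mu = alpha[:, None] + beta[:, None] * x[None, :]
    return rng.normal(mu, sigma[:, None])

# Hypothetical standardized predictor values.
x = np.linspace(-1.0, 1.0, 20)

# Compare a very wide prior with a tighter one.
for sd in (10.0, 1.0):
    sims = prior_predictive(x, prior_sd=sd)
    lo, hi = np.percentile(sims, [2.5, 97.5])
    print("prior sd=%g: 95%% of simulated log10(chl) in [%.1f, %.1f]" % (sd, lo, hi))
```

Because the outcome is on a log10 scale, a simulated range spanning tens of units corresponds to chlorophyll concentrations differing by tens of orders of magnitude, which is a sign that the wider prior is less "uninformative" than implausible.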