### Optimize
This is the heart of the model.  This notebook optimizes the three coefficients *b1, b2, b3* by minimizing the root mean squared error (RMSE).  The results will feed into a Monte Carlo simulation of another model.  Therefore, I will create around 1000 models from the available data in a bootstrap process, with each of the n years of data contributing roughly 1000/n models.  The distribution of the fitted coefficients will be used to construct the Monte Carlo model.  
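
For reference, the objective can be written out as below.  The symbols are notation assumed here rather than taken from the notebook: $y_i$ is the $i$-th observed value, $\hat{y}_i(b_1, b_2, b_3)$ is the corresponding model prediction, and $N$ is the number of observations.

$$
\mathrm{RMSE}(b_1, b_2, b_3) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - \hat{y}_i(b_1, b_2, b_3)\bigr)^2}
$$

The optimizer searches for the $(b_1, b_2, b_3)$ that makes this quantity as small as possible.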

I will use the `minimize` function from [scipy's optimize](https://docs.scipy.org/doc/scipy/reference/optimize.html) package.  It requires a first guess for the model coefficients.  I will optimize on all the data first, as I think the result will be a good first guess for each model.  The bootstrap models within each year will use an average of the previous model runs as a first guess.
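
As a rough illustration of this step, the sketch below fits three coefficients by minimizing RMSE with `scipy.optimize.minimize` and then averages the result into the next first guess.  The `predict` function is a hypothetical linear stand-in, since the real model lives in `optimize.ipynb`; the synthetic data, the helper names `fit` and `update_x0`, and the choice to leave the solver at SciPy's default are likewise assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def predict(b, x):
    # Hypothetical stand-in model (NOT the model in optimize.ipynb):
    # linear in two predictors with coefficients b1, b2, b3.
    return b[0] + b[1] * x[:, 0] + b[2] * x[:, 1]


def rmse(b, x, y):
    # Root mean squared error between observations and predictions.
    return np.sqrt(np.mean((y - predict(b, x)) ** 2))


def fit(x, y, x0, maxiter=100):
    # x0 is the first guess; maxiter caps the number of solver iterations.
    return minimize(rmse, x0, args=(x, y), options={"maxiter": maxiter})


def update_x0(x0, x, i):
    # Running mean of the optimized coefficients after i previous runs,
    # mirroring the update used in the driver loop of this notebook.
    return list((np.array(x0) * i + np.array(x)) / (i + 1))


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(200, 2))
true_b = np.array([2.0, 0.8, 1.5])
y = predict(true_b, x) + rng.normal(0, 0.1, size=200)

result = fit(x, y, x0=[1, 0.5, 2])
next_x0 = update_x0([1, 0.5, 2], result.x, i=0)
```

Note that with `i=0` the update returns the first optimized coefficients unchanged, so the hand-picked starting guess only influences the first run.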
 

#### Parameters
 - data
 - x0
 - sample
 - maxiter
 - min_fb_tdg

The x0 parameter is the initial guess that the optimization algorithm starts from.  The sample parameter is a boolean that controls whether the data are sampled with replacement, as required for the bootstrap process.  The maxiter parameter is the maximum number of iterations the optimization algorithm may run before it stops.
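
The effect of the sample flag can be sketched as follows.  This is a minimal example assuming the data arrive as a pandas DataFrame; the helper name `maybe_resample`, the `seed` argument, and the column names are hypothetical and not taken from `optimize.ipynb`.

```python
import pandas as pd


def maybe_resample(data, sample, seed=None):
    # When sample is True, draw a bootstrap replicate: the same number of
    # rows as the input, drawn with replacement. Otherwise return the data as-is.
    if sample:
        return data.sample(frac=1, replace=True, random_state=seed)
    return data


# Toy data with hypothetical column names.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "y": [2.1, 3.9, 6.2, 8.1, 9.8]})

boot = maybe_resample(df, sample=True, seed=0)
unchanged = maybe_resample(df, sample=False)
```

Each bootstrap replicate has the same size as the original data, but some rows typically appear more than once and others not at all, which is what produces a spread of fitted coefficients.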


#### Notebook
[optimize.ipynb](optimize.ipynb)
 

In [1]:
import papermill as pm
import pandas as pd
import numpy as np

In [2]:
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

In [3]:
#params
data_dir = r'D:\gitClones\nteract_models\optimize\projects\the_dalles'
maxiter = 100
x0 = [1,.5,2]
sample = False
min_fb_tdg = 120
file_name_extension = ''

In [4]:
nbs = pm.read_notebooks(data_dir+'/train_test')
df = nbs.dataframe
data = df[df['name']=='data']
grouped = data.groupby('filename')
i = 0
for g,v in grouped: 
    name = g.split('.')[0]
    train_test_data = v['value'].values[0]
    file_name = data_dir + '/results/{}_optimized{}.ipynb'.format(name,file_name_extension)
    
    pm.execute_notebook(
               'optimize.ipynb',
               file_name,
               parameters = dict(data=train_test_data, sample=sample, maxiter=maxiter, x0=x0,min_fb_tdg=min_fb_tdg)
            )
    # running mean of the optimized coefficients, used as x0 for the next notebook
    nb = pm.read_notebook(file_name)
    nb_df = nb.dataframe
    x = nb_df[nb_df['name']=='x']['value'].values[0]
    x0 = list((np.array(x0) * i + np.array(x))/(i+1))
    i+=1
    
    


Input Notebook:  optimize.ipynb
Output Notebook: D:\gitClones\nteract_models\optimize\projects\the_dalles/results/the_dalles_2014-12-31_00_00_00_optimized.ipynb
100%|██████████| 10/10 [00:24<00:00,  1.65s/it]
Input Notebook:  optimize.ipynb
Output Notebook: D:\gitClones\nteract_models\optimize\projects\the_dalles/results/the_dalles_2015-12-31_00_00_00_optimized.ipynb
100%|██████████| 10/10 [00:19<00:00,  1.00it/s]
Input Notebook:  optimize.ipynb
Output Notebook: D:\gitClones\nteract_models\optimize\projects\the_dalles/results/the_dalles_2016-12-31_00_00_00_optimized.ipynb
100%|██████████| 10/10 [00:12<00:00,  1.54it/s]
Input Notebook:  optimize.ipynb
Output Notebook: D:\gitClones\nteract_models\optimize\projects\the_dalles/results/the_dalles_2017-12-31_00_00_00_optimized.ipynb
100%|██████████| 10/10 [00:13<00:00,  1.15it/s]
Input Notebook:  optimize.ipynb
Output Notebook: D:\gitClones\nteract_models\optimize\projects\the_dalles/results/the_dalles_2018-12-31_00_00_00_optimized.ipynb
100