The purpose of this notebook is to fit one SARIMAX model per product category and save each fitted model as a pickle file.

In [1]:
from pandas import read_excel

In [2]:
df = read_excel("../data/Sample - Superstore.xls")

In [3]:
def prepare_y(df, category):
    # Work on an explicit copy so that dropping columns does not trigger SettingWithCopyWarning
    data = df.loc[df['Category'] == category].copy()
    cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID', 'Customer Name', 'Segment', 'Country', 'City', 
            'State', 'Postal Code', 'Region', 'Product ID', 'Category', 'Sub-Category', 'Product Name', 'Quantity', 'Discount', 'Profit']
    data = data.drop(cols, axis=1)
    data = data.sort_values('Order Date')
    # Total sales per order date
    data = data.groupby('Order Date')['Sales'].sum().reset_index()
    data = data.set_index('Order Date')
    # Monthly series: mean of the daily totals, labelled with the month start ('MS')
    y = data['Sales'].resample('MS').mean()
    return y
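
The last step of prepare_y is easy to misread, so here is a minimal sketch of what `resample('MS').mean()` does. The dates and amounts are made up for illustration and are not taken from the Superstore data.

```python
import pandas as pd

# Hypothetical daily sales totals: two order dates in January, one in February.
daily = pd.Series(
    [100.0, 300.0, 50.0],
    index=pd.to_datetime(["2017-01-05", "2017-01-20", "2017-02-10"]),
    name="Sales",
)

# 'MS' groups by calendar month and labels each group with the month's first day.
monthly = daily.resample("MS").mean()
print(monthly)
```

Note that the result is the average of the daily totals within each month, not the monthly total, and that a month with no orders would come out as NaN.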

In [4]:
def gridSearch(y):
    '''
    Search for hyperparameters using the SARIMAX function from statsmodels; it also searches over the trend type and the seasonal component.
    '''
    import grid_search
    from ast import literal_eval

    # 12: seasonal period passed to the search (monthly data)
    scores = grid_search.sarima_grid_search(y, 12)

    # scores[0][0] holds the first-ranked configuration as a string; parse it once
    config = literal_eval(scores[0][0])
    order, seasonal_order, trend = config[0], config[1], config[2]

    return order, seasonal_order, trend
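
As a small illustration of the parsing step, the sketch below assumes that the configuration key has the same textual form as the entries printed in the grid search output further down (for example `[(0, 0, 0), (0, 1, 0, 12), 'n']`). The exact structure returned by the local grid_search module is not shown in this notebook, so treat the key here as an assumption.

```python
from ast import literal_eval

# Assumed key format, copied from the style of the "Model[...]" lines in the output.
key = "[(0, 0, 0), (0, 1, 0, 12), 'n']"

# literal_eval safely turns the string into Python literals (no arbitrary code is run).
order, seasonal_order, trend = literal_eval(key)
print(order, seasonal_order, trend)
```

literal_eval is used rather than eval because it only accepts literals such as tuples, lists, numbers and strings.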

In [5]:
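def fitModel(y, order, seasonal_order, trend):
    '''
    Fit a SARIMAX model to y with the given order, seasonal order and trend, and return the fitted results.
    '''
    import statsmodels.api as sm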
    mod = sm.tsa.statespace.SARIMAX(y,
                                    order=order,
                                    seasonal_order=seasonal_order,
                                    trend=trend,
                                    enforce_stationarity=False,
                                    enforce_invertibility=False)
    results = mod.fit()
    return results

In [6]:
def savePickle(results, category):
    import pickle
    filename = category + ".pkl"
    # Use a context manager so the file handle is closed after writing
    with open(filename, "wb") as f:
        pickle.dump(results, f)
    print('Pickle file saved as ' + filename)
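
Loading a saved model back is the mirror image of savePickle. The sketch below uses a plain dictionary as a stand-in for the fitted results object and writes to a temporary directory, so both the stand-in and the path are assumptions for illustration.

```python
import os
import pickle
import tempfile

# Stand-in for the fitted results object (the notebook pickles statsmodels results).
results = {"order": (0, 0, 0), "trend": "n"}
path = os.path.join(tempfile.mkdtemp(), "Furniture.pkl")

# Save, closing the file when done.
with open(path, "wb") as f:
    pickle.dump(results, f)

# Load it back.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code. A pickled model will also generally need a compatible statsmodels installation when it is loaded.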

In [7]:
import grid_search
for category in df.Category.unique():
    y = prepare_y(df, category)
    order, seasonal_order, trend = gridSearch(y)
    results = fitModel(y, order, seasonal_order, trend)
    grid_search.test_prediction(results, y)
    savePickle(results, category)

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


 > Model[[(0, 0, 0), (0, 0, 0, 12), 'n']] 845.331
 > Model[[(0, 0, 0), (0, 1, 0, 12), 'n']] 145.601
 > Model[[(0, 0, 0), (1, 1, 0, 12), 'n']] 191.856
 > Model[[(0, 0, 0), (1, 0, 0, 12), 'n']] 146.166
 > Model[[(0, 0, 0), (2, 0, 0, 12), 'n']] 203.185
 > Model[[(0, 0, 0), (0, 0, 0, 12), 'c']] 307.084
 > Model[[(0, 0, 0), (0, 1, 0, 12), 'ct']] 156.386
 > Model[[(0, 0, 0), (0, 1, 0, 12), 'c']] 149.634
 > Model[[(0, 0, 0), (1, 0, 0, 12), 't']] 186.491
 > Model[[(0, 0, 0), (1, 0, 0, 12), 'c']] 166.087
 > Model[[(0, 0, 1), (0, 0, 0, 12), 'n']] 533.418
 > Model[[(0, 0, 1), (0, 1, 0, 12), 'n']] 148.411
 > Model[[(0, 0, 0), (1, 1, 0, 12), 't']] 212.949
 > Model[[(0, 0, 0), (1, 1, 0, 12), 'c']] 207.907
 > Model[[(0, 0, 0), (1, 0, 0, 12), 'ct']] 176.507
 > Model[[(0, 0, 1), (1, 0, 0, 12), 'n']] 153.155
 > Model[[(0, 0, 1), (1, 1, 0, 12), 'n']] 203.120
 > Model[[(0, 0, 0), (2, 0, 0, 12), 't']] 218.754
 > Model[[(0, 0, 0), (1, 1, 0, 12), 'ct']] 233.905
 > Model[[(0, 0, 0), (0, 0, 0, 12), 'ct']] 327.


To register the converters:
	>>> from pandas.plotting import register_matplotlib_converters
	>>> register_matplotlib_converters()


<Figure size 1600x800 with 4 Axes>

<Figure size 1400x700 with 1 Axes>

The Mean Squared Error of our forecasts is 15689.57
The Root Mean Squared Error of our forecasts is 125.26




<Figure size 1400x700 with 1 Axes>

Pickle file saved as Furniture.pkl




 > Model[[(0, 0, 0), (0, 0, 0, 12), 'n']] 832.197
 > Model[[(0, 0, 0), (0, 1, 0, 12), 'n']] 357.019
 > Model[[(0, 0, 0), (1, 1, 0, 12), 'n']] 358.955
 > Model[[(0, 0, 0), (1, 0, 0, 12), 'n']] 362.823
 > Model[[(0, 0, 0), (2, 0, 0, 12), 'n']] 323.087
 > Model[[(0, 0, 0), (0, 0, 0, 12), 'ct']] 286.794
 > Model[[(0, 0, 0), (0, 0, 0, 12), 'c']] 356.559
 > Model[[(0, 0, 1), (0, 1, 0, 12), 'n']] 365.839
 > Model[[(0, 0, 0), (0, 1, 0, 12), 'c']] 334.596
 > Model[[(0, 0, 0), (0, 1, 0, 12), 'ct']] 341.378
 > Model[[(0, 0, 1), (1, 0, 0, 12), 'n']] 377.112
 > Model[[(0, 0, 0), (1, 0, 0, 12), 'c']] 316.209
 > Model[[(0, 0, 0), (2, 0, 0, 12), 'c']] 307.060
 > Model[[(0, 0, 1), (1, 1, 0, 12), 'n']] 358.311
 > Model[[(0, 0, 0), (1, 1, 0, 12), 'c']] 297.373
 > Model[[(0, 0, 0), (0, 0, 0, 12), 't']] 312.439
 > Model[[(0, 0, 0), (1, 0, 0, 12), 'ct']] 277.812
 > Model[[(0, 0, 0), (0, 1, 0, 12), 't']] 329.357
 > Model[[(0, 0, 0), (1, 0, 0, 12), 't']] 273.101
 > Model[[(0, 0, 1), (2, 0, 0, 12), 'n']] 380.3



<Figure size 1600x800 with 4 Axes>

<Figure size 1400x700 with 1 Axes>



The Mean Squared Error of our forecasts is 39660.31
The Root Mean Squared Error of our forecasts is 199.15


<Figure size 1400x700 with 1 Axes>

Pickle file saved as Office Supplies.pkl




 > Model[[(0, 0, 0), (0, 0, 0, 12), 'n']] 1146.988
 > Model[[(0, 0, 0), (0, 1, 0, 12), 'n']] 548.902
 > Model[[(0, 0, 0), (1, 1, 0, 12), 'n']] 443.929
 > Model[[(0, 0, 0), (1, 0, 0, 12), 'n']] 546.335
 > Model[[(0, 0, 0), (0, 0, 0, 12), 't']] 582.327
 > Model[[(0, 0, 0), (2, 0, 0, 12), 'n']] 455.707
 > Model[[(0, 0, 0), (0, 0, 0, 12), 'c']] 413.626
 > Model[[(0, 0, 0), (0, 1, 0, 12), 't']] 577.977
 > Model[[(0, 0, 0), (0, 1, 0, 12), 'c']] 559.901
 > Model[[(0, 0, 1), (0, 1, 0, 12), 'c']] 533.049
 > Model[[(0, 0, 0), (1, 0, 0, 12), 'c']] 418.208
 > Model[[(0, 0, 0), (1, 0, 0, 12), 't']] 508.871
 > Model[[(0, 0, 0), (1, 1, 0, 12), 't']] 459.305
 > Model[[(0, 0, 0), (1, 1, 0, 12), 'c']] 447.795
 > Model[[(0, 0, 1), (1, 0, 0, 12), 'c']] 431.933
 > Model[[(0, 0, 0), (2, 0, 0, 12), 'ct']] 510.447
 > Model[[(0, 0, 1), (0, 0, 0, 12), 'n']] 790.471
 > Model[[(0, 0, 1), (0, 1, 0, 12), 'n']] 523.727
 > Model[[(0, 0, 0), (2, 0, 0, 12), 't']] 532.483
 > Model[[(0, 0, 1), (1, 0, 0, 12), 'n']] 613.86



<Figure size 1600x800 with 4 Axes>

<Figure size 1400x700 with 1 Axes>

The Mean Squared Error of our forecasts is 146696.2
The Root Mean Squared Error of our forecasts is 383.01




<Figure size 1400x700 with 1 Axes>

Pickle file saved as Technology.pkl
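
The error figures printed for each category come in pairs, and the root mean squared error is simply the square root of the mean squared error. How grid_search.test_prediction computes them is not shown in this notebook, so the sketch below is a generic version under that assumption, using made-up values, followed by a consistency check on the Furniture figures.

```python
import math

def mse(actual, predicted):
    """Mean of the squared differences between paired values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical observed and forecast values, for illustration only.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]

error = mse(actual, predicted)
rmse = math.sqrt(error)
print(round(error, 2), round(rmse, 2))

# Consistency check: the square root of the reported Furniture MSE.
furniture_rmse = round(math.sqrt(15689.57), 2)
print(furniture_rmse)
```

Because the RMSE is in the same units as the sales series, it is usually the easier of the two figures to compare across categories.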
