# Library Usage in Seattle, 2005-2020

## Modeling

This project is based on the [Checkouts by Title (Physical Items)](https://data.seattle.gov/Community/Checkouts-By-Title-Physical-Items-/5src-czff) dataset from [Seattle Open Data](https://data.seattle.gov/), downloaded on December 15, 2020. I obtained the rest of the data for 2020 through API calls in [this notebook](0x_api_calls.ipynb).

In this notebook, I work only with the item checkout counts data created in the [previous (EDA) notebook](02_eda.ipynb). There are two versions of this data: one that is missing values for various dates, and one in which those missing values have been imputed (the imputation process is also detailed in the [previous notebook](02_eda.ipynb)). I will mostly work with the imputed data, but may try Facebook Prophet with the unimputed dataset, since Prophet can handle missing values.

I will look at seasonality, trends, and stationarity in order to build time series models and forecast future library use, and to measure the impact of the COVID-19 pandemic on Seattle's libraries.
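
As a rough illustration of what looking at trend, weekly seasonality, and stationarity can involve, here is a minimal sketch on a synthetic daily series. The series, its values, and the 7-day period are assumptions for illustration only, not the library data. It uses plain pandas operations as a stand-in for `seasonal_decompose` and `adfuller`, which are the more formal tools for this job.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for total_checkouts (illustrative values only)
rng = np.random.default_rng(0)
idx = pd.date_range("2019-01-01", periods=364, freq="D")
trend = np.linspace(10_000, 12_000, len(idx))          # slow upward drift
weekly = np.where(idx.dayofweek >= 5, -3_000, 0)       # assumed weekend dip
y = pd.Series(trend + weekly + rng.normal(0, 200, len(idx)), index=idx)

# Trend: a centered 7-day rolling mean averages out the weekly cycle
trend_est = y.rolling(window=7, center=True).mean()

# Seasonality: mean detrended value for each day of the week (0 = Monday)
detrended = y - trend_est
weekly_profile = detrended.groupby(detrended.index.dayofweek).mean()

# Stationarity (informal check): differencing at lag 7 removes the weekly
# pattern and most of the linear trend, so the spread shrinks sharply
diff7 = y.diff(7).dropna()

print(weekly_profile.round(0))
print("std before differencing:", round(y.std(), 1))
print("std after lag-7 differencing:", round(diff7.std(), 1))
```

If the differenced series still shows a drifting mean or variance, that suggests more differencing or a transformation is needed before fitting an ARIMA-type model.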


## Table of contents

1. [Import required packages](#Import-required-packages)
2. [Load data](#Load-data)

### Import required packages

[[go back to the top](#Library-Usage-in-Seattle,-2005-2020)]

In [None]:
# standard dataframe packages
import pandas as pd; pd.set_option('display.max_columns', 50)
import numpy as np

# graphing packages
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns; sns.set_style('ticks')

# time-related packages
import datetime
import holidays

# statistical packages
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_pacf, plot_acf
from sklearn.metrics import mean_squared_error

# modeling packages
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX

# saving packages
import gzip
import pickle

# custom functions
from functions.data_cleaning import *

# reload functions/libraries when edited
%load_ext autoreload
%autoreload 2

# ignore warnings
import warnings
warnings.filterwarnings('ignore')

# # may want to use the following:
# from keras.models import Sequential
# from keras.layers import Dense
# from keras.layers import LSTM
# from keras.layers import TimeDistributed
# from keras.layers import Flatten
# from keras.layers import Dropout
# from keras.layers import Conv1D
# from keras.layers import MaxPooling1D

### Load data

[[go back to the top](#Library-Usage-in-Seattle,-2005-2020)]

In [1]:
# load imputed data
df_imputed = pd.read_pickle('data/seattle_lib_counts_imputed.pkl', compression='gzip')

# load unimputed data
df_unimputed = pd.read_pickle('data/seattle_lib_counts_unimputed.pkl', compression='gzip')
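
The two versions differ in which dates carry data. A quick way to list the gaps in the unimputed version is to reindex it onto a complete daily calendar, as in the sketch below. The small frames here are synthetic stand-ins (hypothetical values), and the sketch assumes the frame has a daily `DatetimeIndex` and a `total_checkouts` column; it catches both dates that are absent as rows and dates present with `NaN`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the imputed and unimputed frames (hypothetical values)
full_idx = pd.date_range("2005-04-13", periods=10, freq="D")
imputed = pd.DataFrame({"total_checkouts": np.arange(10, dtype=float)}, index=full_idx)
unimputed = imputed.drop(full_idx[[3, 7]])  # pretend two dates were never recorded

# Build the complete daily calendar spanned by the data
expected = pd.date_range(unimputed.index.min(), unimputed.index.max(), freq="D")

# Reindexing inserts NaN rows for absent dates, so one isna() check covers
# both missing rows and rows whose value is already NaN
reindexed = unimputed.reindex(expected)
missing_dates = reindexed.index[reindexed["total_checkouts"].isna()]

print(missing_dates.strftime("%Y-%m-%d").tolist())
```

The same pattern applied to `df_unimputed` would show which dates the imputation step had to fill.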

In [2]:
# take a look
df_imputed.head()

| | total_checkouts | missing_title | missing_subjects | format_group_Equipment | format_group_Media | format_group_Other | format_group_Print | format_subgroup_Art | format_subgroup_Audio Disc | format_subgroup_Audio Tape | ... | category_group_Language | category_group_Nonfiction | category_group_Other | category_group_Reference | age_group_Adult | age_group_Juvenile | age_group_Teen | day | weekend | holiday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2005-04-13 | 16471.0 | 212.0 | 664.0 | 1.0 | 6397.0 | 32.0 | 10041.0 | 0.0 | 1874.0 | 63.0 | ... | 370.0 | 6719.0 | 1143.0 | 18.0 | 11257.0 | 4613.0 | 601.0 | Wednesday | 0 | 0 |
| 2005-04-14 | 10358.0 | 123.0 | 541.0 | 1.0 | 4015.0 | 75.0 | 6267.0 | 0.0 | 1245.0 | 31.0 | ... | 272.0 | 4104.0 | 621.0 | 12.0 | 6726.0 | 3381.0 | 251.0 | Thursday | 0 | 0 |
| 2005-04-15 | 12896.0 | 179.0 | 508.0 | 0.0 | 5351.0 | 51.0 | 7494.0 | 0.0 | 1462.0 | 54.0 | ... | 302.0 | 5166.0 | 1014.0 | 7.0 | 8795.0 | 3747.0 | 354.0 | Friday | 0 | 0 |
| 2005-04-16 | 1358.0 | 7.0 | 56.0 | 0.0 | 552.0 | 0.0 | 806.0 | 0.0 | 175.0 | 8.0 | ... | 29.0 | 666.0 | 95.0 | 1.0 | 950.0 | 367.0 | 41.0 | Saturday | 1 | 0 |
| 2005-04-17 | 4555.0 | 80.0 | 232.0 | 0.0 | 1555.0 | 8.0 | 2992.0 | 0.0 | 499.0 | 10.0 | ... | 177.0 | 2145.0 | 203.0 | 5.0 | 3035.0 | 1349.0 | 171.0 | Sunday | 1 | 0 |
