### Store Sales Forecasting

For the Kaggle competition [Store Sales](https://www.kaggle.com/competitions/store-sales-time-series-forecasting/overview)

Forecast store sales based on data from Corporación Favorita, a large Ecuador-based grocery retailer.

<b>train.csv</b>

- The training data, comprising time series of features <b>store_nbr</b>, <b>family</b>, and <b>onpromotion</b> as well as the target <b>sales</b>.
- <b>store_nbr</b> identifies the store at which the products are sold.
- <b>family</b> identifies the type of product sold.
- <b>sales</b> gives the total sales for a product family at a particular store at a given date. Fractional values are possible since products can be sold in fractional units (1.5 kg of cheese, for instance, as opposed to 1 bag of chips).
- <b>onpromotion</b> gives the total number of items in a product family that were being promoted at a store at a given date.

<b>test.csv</b>

- The test data, having the same features as the training data. You will predict the target <b>sales</b> for the dates in this file.
- The dates in the test data are for the 15 days after the last date in the training data.

<b>sample_submission.csv</b>

- A sample submission file in the correct format.

<b>stores.csv</b>

- Store metadata, including <b>city</b>, <b>state</b>, <b>type</b>, and <b>cluster</b>.
- <b>cluster</b> is a grouping of similar stores.

<b>oil.csv</b>

- Daily oil price. Includes values during both the train and test data timeframes. (Ecuador is an oil-dependent country and its economic health is highly vulnerable to shocks in oil prices.)

<b>holidays_events.csv</b>

- Holidays and events, with metadata.
- NOTE: Pay special attention to the <b>transferred</b> column. A holiday that is transferred officially falls on that calendar day, but was moved to another date by the government. A transferred day is more like a normal day than a holiday. To find the day on which it was actually celebrated, look for the corresponding row where <b>type</b> is Transfer. For example, the holiday Independencia de Guayaquil was transferred from 2012-10-09 to 2012-10-12, which means it was celebrated on 2012-10-12.
- Days of type Bridge are extra days added to a holiday (e.g., to extend the break across a long weekend). These are frequently made up for by a day of type Work Day, which is a day not normally scheduled for work (e.g., Saturday) that is meant to pay back the Bridge.
- Additional holidays are days added to a regular calendar holiday, as typically happens around Christmas (making Christmas Eve a holiday).
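
The transfer rules above can be turned into a filter that keeps only the days that were actually taken off. The following is a minimal sketch, not part of the original notebook: the helper name `effective_holidays` is hypothetical, and dropping Work Day rows is one possible reading of the note. The sample frame is hand-made; only the Independencia de Guayaquil dates come from the description above, and the other two rows are illustrative.

```python
import pandas as pd

def effective_holidays(hol: pd.DataFrame) -> pd.DataFrame:
    """Keep only the rows that represent days actually taken off (hypothetical helper)."""
    hol = hol.copy()
    hol["date"] = pd.to_datetime(hol["date"])
    # A transferred holiday was worked like a normal day, so drop it.
    observed = hol[~hol["transferred"].astype(bool)]
    # Work Day rows are make-up working days, not days off.
    observed = observed[observed["type"] != "Work Day"]
    return observed.reset_index(drop=True)

# Hand-made sample with the same columns the rule depends on.
sample = pd.DataFrame({
    "date": ["2012-10-09", "2012-10-12", "2012-12-24", "2013-01-05"],
    "type": ["Holiday", "Transfer", "Additional", "Work Day"],
    "description": [
        "Independencia de Guayaquil",
        "Independencia de Guayaquil (celebrated)",  # illustrative label
        "illustrative additional day",
        "illustrative make-up work day",
    ],
    "transferred": [True, False, False, False],
})

observed = effective_holidays(sample)
print(observed[["date", "type"]])
```

With the real file, the same function could be applied to the frame loaded from `holidays_events.csv`; whether Bridge and Additional days behave like full holidays for sales is something to check against the data.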

<b>Additional Notes</b>

- Wages in the public sector are paid every two weeks, on the 15th and on the last day of the month. Supermarket sales could be affected by this.
- A magnitude 7.8 earthquake struck Ecuador on April 16, 2016. People rallied in relief efforts, donating water and other basic necessities, which greatly affected supermarket sales for several weeks after the earthquake.
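
Both notes above can be encoded as simple calendar features. This is a sketch under stated assumptions rather than the notebook's own method: the function name `add_calendar_flags`, the column names it creates, and the 21-day post-earthquake window are all hypothetical choices ("several weeks" is not specified more precisely in the description).

```python
import pandas as pd

def add_calendar_flags(df: pd.DataFrame, date_col: str = "date",
                       quake_window_days: int = 21) -> pd.DataFrame:
    """Add payday and post-earthquake indicator columns (hypothetical helper)."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    # Public-sector wages are paid on the 15th and on the last day of the month.
    out["is_payday"] = (dates.dt.day == 15) | dates.dt.is_month_end
    # The earthquake date comes from the competition notes.
    quake = pd.Timestamp("2016-04-16")
    days_since = (dates - quake).dt.days
    # Assumption: treat the 21 days starting on the earthquake date as affected.
    out["post_quake"] = days_since.between(0, quake_window_days)
    return out

demo = pd.DataFrame({"date": ["2016-04-15", "2016-04-16", "2016-04-30", "2016-06-01"]})
flags = add_calendar_flags(demo)
print(flags)
```

The window length is worth tuning, for example by plotting sales around April 2016 before fixing a value.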

In [2]:
# The following cell is used every time the fastai library is used.
# These commands tell the notebook to reload any changes made to the libraries in use.
# They also ensure that any plotted graphs are shown in this notebook.
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [3]:
from fastai.tabular.all import *
from fastbook import *

from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_log_error

import seaborn as sns

from dtreeviz.trees import *
import dtreeviz

In [4]:
#| export
iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')
creds = ''


In [5]:
#| export
cred_path = Path('~/.kaggle/kaggle.json').expanduser()
if not cred_path.exists():
    cred_path.parent.mkdir(exist_ok=True)
    cred_path.write_text(creds)
    cred_path.chmod(0o600)


In [6]:
#| export
path = Path('store-sales-time-series-forecasting')

In [7]:
#| export
if not iskaggle and not path.exists():
    import zipfile, kaggle
    kaggle.api.competition_download_cli(str(path))    
    zipfile.ZipFile(f'{path}.zip').extractall(path)


In [8]:
#| export
if iskaggle:
    path = Path('../input/store-sales-time-series-forecasting')
    ! pip install -q dataset

In [9]:
[x for x in path.ls()]

[Path('store-sales-time-series-forecasting/sample_submission.csv'),
 Path('store-sales-time-series-forecasting/holidays_events.csv'),
 Path('store-sales-time-series-forecasting/oil.csv'),
 Path('store-sales-time-series-forecasting/test.csv'),
 Path('store-sales-time-series-forecasting/train.csv'),
 Path('store-sales-time-series-forecasting/transactions.csv'),
 Path('store-sales-time-series-forecasting/stores.csv')]

In [10]:
train_df = pd.read_csv(path/'train.csv', low_memory=False)
test_df = pd.read_csv(path/'test.csv', low_memory=False)
sample_df = pd.read_csv(path/'sample_submission.csv', low_memory=False)
stores_df = pd.read_csv(path/'stores.csv', low_memory=False)
oil_df = pd.read_csv(path/'oil.csv', low_memory=False)
hol_events_df = pd.read_csv(path/'holidays_events.csv', low_memory=False)
transactions_df = pd.read_csv(path/'transactions.csv', low_memory=False)

In [49]:
train_df.sample(n=5)

Unnamed: 0,id,date,store_nbr,family,sales,onpromotion
2475809,2475809,2016-10-24,26,HOME APPLIANCES,0.0,0
1365191,1365191,2015-02-08,14,HARDWARE,0.0,0
2179430,2179430,2016-05-11,10,FROZEN FOODS,26.0,0
303448,303448,2013-06-20,23,GROCERY II,26.0,0
1422914,1422914,2015-03-12,33,LAWN AND GARDEN,0.0,0


In [50]:
stores_df.sample(n=5)

Unnamed: 0,store_nbr,city,state,type,cluster
49,50,Ambato,Tungurahua,A,14
4,5,Santo Domingo,Santo Domingo de los Tsachilas,D,4
8,9,Quito,Pichincha,B,6
42,43,Esmeraldas,Esmeraldas,E,10
10,11,Cayambe,Pichincha,B,6


In [51]:
oil_df.head()

Unnamed: 0,date,dcoilwtico
0,2013-01-01,
1,2013-01-02,93.14
2,2013-01-03,92.97
3,2013-01-04,93.12
4,2013-01-07,93.2
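
The output above shows a missing price on 2013-01-01 and no rows for 2013-01-05 and 2013-01-06, so the oil series would need to be put on a daily calendar before joining it to the sales data. The sketch below is one possible approach, not the notebook's own: `fill_oil_gaps` is a hypothetical helper, and forward-filling (with a back-fill for the leading gap) is an assumed choice among several reasonable ones, such as interpolation.

```python
import pandas as pd

def fill_oil_gaps(oil: pd.DataFrame) -> pd.DataFrame:
    """Reindex oil prices to every calendar day and fill gaps (hypothetical helper)."""
    oil = oil.copy()
    oil["date"] = pd.to_datetime(oil["date"])
    prices = oil.set_index("date")["dcoilwtico"]
    # Build a complete daily index covering the observed date range.
    full_range = pd.date_range(prices.index.min(), prices.index.max(), freq="D")
    prices = prices.reindex(full_range)
    # Carry the last known price forward; back-fill only covers the leading gap.
    prices = prices.ffill().bfill()
    return prices.rename_axis("date").reset_index()

# The five rows shown in the output above.
oil_head = pd.DataFrame({
    "date": ["2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04", "2013-01-07"],
    "dcoilwtico": [float("nan"), 93.14, 92.97, 93.12, 93.2],
})

filled = fill_oil_gaps(oil_head)
print(filled)
```

Note that the back-fill step uses a later price for the first day, which is harmless for a single leading value but should not be applied across the train/test boundary.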


In [52]:
hol_events_df.head()

Unnamed: 0,date,type,locale,locale_name,description,transferred
0,2012-03-02,Holiday,Local,Manta,Fundacion de Manta,False
1,2012-04-01,Holiday,Regional,Cotopaxi,Provincializacion de Cotopaxi,False
2,2012-04-12,Holiday,Local,Cuenca,Fundacion de Cuenca,False
3,2012-04-14,Holiday,Local,Libertad,Cantonizacion de Libertad,False
4,2012-04-21,Holiday,Local,Riobamba,Cantonizacion de Riobamba,False


In [54]:
transactions_df.tail()

Unnamed: 0,date,store_nbr,transactions
83483,2017-08-15,50,2804
83484,2017-08-15,51,1573
83485,2017-08-15,52,2255
83486,2017-08-15,53,932
83487,2017-08-15,54,802


In [11]:
test_df.head()

Unnamed: 0,id,date,store_nbr,family,onpromotion
0,3000888,2017-08-16,1,AUTOMOTIVE,0
1,3000889,2017-08-16,1,BABY CARE,0
2,3000890,2017-08-16,1,BEAUTY,2
3,3000891,2017-08-16,1,BEVERAGES,20
4,3000892,2017-08-16,1,BOOKS,0
