# **<font color = '#3498eb'>Objective</font>**

Predict sales for each of the 33 product families sold at Favorita stores in Ecuador.

# **<font color = '#3498eb'>Introduction</font>**

This notebook loads the competition files and inspects their structure as a first step toward forecasting the sales of each product family at each store.

# **<font color = '#3498eb'>Libraries</font>**

In [1]:
# BASE ------------------------------------------------------
import numpy as np
import pandas as pd
import os
import gc
import warnings

# PACF - ACF ------------------------------------------------------
import statsmodels.api as sm

# DATA VISUALIZATION ------------------------------------------------------
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# CONFIGURATIONS ------------------------------------------------------
pd.set_option('display.max_columns', None)
pd.options.display.float_format = '{:.2f}'.format
warnings.filterwarnings('ignore')

# **<font color = '#3498eb'>Importing data</font>**

In [15]:
# Folder holding the competition CSV files (local path; adjust to your own setup)
DATA_DIR = "F:/GDrive_DATA/PROYECTOS/Kaggle_Store_Sales cvs"

df_train_favorita = pd.read_csv(f"{DATA_DIR}/train.csv", index_col="id")
df_test_favorita = pd.read_csv(f"{DATA_DIR}/test.csv", index_col="id")
df_stores_favorita = pd.read_csv(f"{DATA_DIR}/stores.csv", index_col="store_nbr")
df_transactions_favorita = pd.read_csv(f"{DATA_DIR}/transactions.csv")

# **<font color = '#3498eb'>Understanding data</font>**

In [29]:
df_train_favorita.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3000888 entries, 0 to 3000887
Data columns (total 5 columns):
 #   Column       Dtype  
---  ------       -----  
 0   date         object 
 1   store_nbr    int64  
 2   family       object 
 3   sales        float64
 4   onpromotion  int64  
dtypes: float64(1), int64(2), object(2)
memory usage: 137.4+ MB
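
The summary above shows `date` stored as `object`, meaning plain strings. A minimal sketch of one way to convert it, assuming the `YYYY-MM-DD` format seen in the data; it also casts `family` to a categorical dtype, which is an optional choice and not something the notebook does. The frame `toy` is a stand-in with made-up values, not the real training data.

```python
import pandas as pd

# Stand-in frame with the same columns as train.csv (values are illustrative).
toy = pd.DataFrame(
    {
        "date": ["2013-01-01", "2013-01-02"],
        "store_nbr": [1, 1],
        "family": ["AUTOMOTIVE", "AUTOMOTIVE"],
        "sales": [0.0, 2.0],
        "onpromotion": [0, 0],
    }
)

# Parse the date strings and store the repeated family labels as categories.
converted = toy.assign(
    date=pd.to_datetime(toy["date"], format="%Y-%m-%d"),
    family=toy["family"].astype("category"),
)

print(converted.dtypes)
```

The same `assign` call should work on `df_train_favorita`. Once `date` is a datetime column, accessors such as `.dt.dayofweek` become available.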


In [11]:
df_train_favorita.head()

| id | date | store_nbr | family | sales | onpromotion |
|---|---|---|---|---|---|
| 0 | 2013-01-01 | 1 | AUTOMOTIVE | 0.0 | 0 |
| 1 | 2013-01-01 | 1 | BABY CARE | 0.0 | 0 |
| 2 | 2013-01-01 | 1 | BEAUTY | 0.0 | 0 |
| 3 | 2013-01-01 | 1 | BEVERAGES | 0.0 | 0 |
| 4 | 2013-01-01 | 1 | BOOKS | 0.0 | 0 |


**store_nbr**: identifies the store at which the products are sold.

**family**: identifies the type of product sold.

**sales**: the total sales for a product family at a particular store on a given date. Fractional values are possible because some products are sold in fractional units (1.5 kg of cheese, for instance, as opposed to 1 bag of chips).

**onpromotion**: the total number of items in a product family that were being promoted at a store on a given date.
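
The note on fractional sales can be checked directly. The sketch below uses a small made-up frame with the same columns as `train.csv` and lists the families that have at least one non-integer `sales` value. The helper name `families_with_fractional_sales` and the toy values are assumptions for illustration, not part of the dataset.

```python
import pandas as pd


def families_with_fractional_sales(df: pd.DataFrame) -> list:
    """Return the sorted product families with at least one non-integer sales value."""
    is_fractional = (df["sales"] % 1) != 0
    return sorted(df.loc[is_fractional, "family"].unique())


# Toy rows shaped like train.csv (values are made up for illustration).
toy_train = pd.DataFrame(
    {
        "date": ["2013-01-02", "2013-01-02", "2013-01-02"],
        "store_nbr": [1, 1, 1],
        "family": ["DELI", "BEVERAGES", "MEATS"],
        "sales": [1.5, 3.0, 2.25],
        "onpromotion": [0, 0, 0],
    }
)

print(families_with_fractional_sales(toy_train))
```

Calling `families_with_fractional_sales(df_train_favorita)` on the real data would show which families are sold by weight or volume rather than by unit.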

In [22]:
family_list = df_train_favorita['family'].unique()

for each_family in family_list:
    print(each_family)

AUTOMOTIVE
BABY CARE
BEAUTY
BEVERAGES
BOOKS
BREAD/BAKERY
CELEBRATION
CLEANING
DAIRY
DELI
EGGS
FROZEN FOODS
GROCERY I
GROCERY II
HARDWARE
HOME AND KITCHEN I
HOME AND KITCHEN II
HOME APPLIANCES
HOME CARE
LADIESWEAR
LAWN AND GARDEN
LINGERIE
LIQUOR,WINE,BEER
MAGAZINES
MEATS
PERSONAL CARE
PET SUPPLIES
PLAYERS AND ELECTRONICS
POULTRY
PREPARED FOODS
PRODUCE
SCHOOL AND OFFICE SUPPLIES
SEAFOOD


In [16]:
# The dates in the test data are for the 15 days after the last date in the training data.

df_test_favorita.head()

| id | date | store_nbr | family | onpromotion |
|---|---|---|---|---|
| 3000888 | 2017-08-16 | 1 | AUTOMOTIVE | 0 |
| 3000889 | 2017-08-16 | 1 | BABY CARE | 0 |
| 3000890 | 2017-08-16 | 1 | BEAUTY | 2 |
| 3000891 | 2017-08-16 | 1 | BEVERAGES | 20 |
| 3000892 | 2017-08-16 | 1 | BOOKS | 0 |
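
The claim about the test window is easy to check instead of taking on trust. A minimal sketch, assuming both frames have a `date` column of `YYYY-MM-DD` strings; `forecast_horizon` is a hypothetical helper, and the toy frames below only demonstrate the call.

```python
import pandas as pd


def forecast_horizon(train: pd.DataFrame, test: pd.DataFrame) -> tuple:
    """Return (gap_days, horizon_days) for the train and test date ranges.

    gap_days: days between the last training date and the first test date.
    horizon_days: number of distinct dates in the test data.
    """
    train_dates = pd.to_datetime(train["date"])
    test_dates = pd.to_datetime(test["date"])
    gap_days = (test_dates.min() - train_dates.max()).days
    horizon_days = test_dates.nunique()
    return gap_days, horizon_days


# Toy frames (made-up dates) to show the call.
toy_train = pd.DataFrame({"date": ["2020-01-01", "2020-01-02", "2020-01-03"]})
toy_test = pd.DataFrame({"date": ["2020-01-04", "2020-01-04", "2020-01-05"]})

gap, horizon = forecast_horizon(toy_train, toy_test)
print(gap, horizon)
```

On the notebook's data, `forecast_horizon(df_train_favorita, df_test_favorita)` returns the gap and the number of test dates. A gap of 1 means the test period starts the day after training ends, and the horizon can be compared with the 15 days stated in the comment.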


In [12]:
df_stores_favorita.head()

| store_nbr | city | state | type | cluster |
|---|---|---|---|---|
| 1 | Quito | Pichincha | D | 13 |
| 2 | Quito | Pichincha | D | 13 |
| 3 | Quito | Pichincha | D | 8 |
| 4 | Quito | Pichincha | D | 9 |
| 5 | Santo Domingo | Santo Domingo de los Tsachilas | D | 4 |


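**store_nbr**: identifies the store at which the products are sold.

**city**: the city in which the store is located.

**state**: the state in which the store is located.

**type**: a categorical store-type label (for example, `D`).

**cluster**: a grouping of similar stores.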
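
Because `store_nbr` appears in both the sales data and the store metadata, it can serve as a join key. A minimal sketch of that join, using two small frames whose rows are copied from the previews in this notebook; the names `toy_sales`, `toy_stores`, and `merged` are illustrative.

```python
import pandas as pd

# A few sales rows, shaped like train.csv.
toy_sales = pd.DataFrame(
    {
        "date": ["2013-01-01", "2013-01-01"],
        "store_nbr": [1, 5],
        "family": ["AUTOMOTIVE", "AUTOMOTIVE"],
        "sales": [0.0, 0.0],
    }
)

# Store metadata indexed by store_nbr, as loaded with index_col="store_nbr".
toy_stores = pd.DataFrame(
    {"city": ["Quito", "Santo Domingo"], "type": ["D", "D"], "cluster": [13, 4]},
    index=pd.Index([1, 5], name="store_nbr"),
)

# The left join keeps every sales row. validate="many_to_one" raises an error
# if a store number appears more than once in the metadata.
merged = toy_sales.merge(
    toy_stores,
    how="left",
    left_on="store_nbr",
    right_index=True,
    validate="many_to_one",
)

print(merged[["store_nbr", "family", "city", "cluster"]])
```

The same call with `df_train_favorita` and `df_stores_favorita` would attach city, state, type, and cluster to every sales row.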

In [8]:
df_transactions_favorita.head()

|  | date | store_nbr | transactions |
|---|---|---|---|
| 0 | 2013-01-01 | 25 | 770 |
| 1 | 2013-01-02 | 1 | 2111 |
| 2 | 2013-01-02 | 2 | 2358 |
| 3 | 2013-01-02 | 3 | 3487 |
| 4 | 2013-01-02 | 4 | 1922 |
