## Context
You work in the data analysis team of a very important company. On Monday, the company shares some good news with you: it has just been hired by a major retail company! So, get ready for a huge amount of work!

Then you get to work with your team and define the following tasks to perform:   
1. You need to start your analysis using data from the past.  
2. You need to define a process that takes your daily data as an input and integrates it.  

You are in charge of the second part, so you are given a sample file that you will have to read daily. To complete your task, you need to build the following aggregates:
* One aggregate per store that adds up the rest of the values.
* One aggregate per item that adds up the rest of the values.

You can import the `raw_sales` table from the `retail_sales` database, one of Ironhack's databases.
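A minimal sketch of pulling that table into pandas with `pandas.read_sql`. The connection details for Ironhack's server are not given here, so the example uses an in-memory SQLite database as a stand-in. The three sample rows are copied from the first lines of the CSV output further down, and the helper name `load_raw_sales` is hypothetical.

```python
import sqlite3

import pandas as pd


def load_raw_sales(conn) -> pd.DataFrame:
    """Read raw_sales and index it by date (hypothetical helper)."""
    return pd.read_sql(
        "SELECT date, shop_id, item_id, item_price, item_cnt_day FROM raw_sales",
        conn,
        index_col="date",       # same index as the CSV version of the data
        parse_dates=["date"],   # turn the text dates into a DatetimeIndex
    )


# Stand-in for the real retail_sales database: an in-memory SQLite copy
# holding a few rows taken from the sample file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_sales (date TEXT, shop_id INTEGER, item_id INTEGER, "
    "item_price REAL, item_cnt_day REAL)"
)
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?, ?, ?)",
    [
        ("2015-01-04", 29, 1469, 1199.0, 1.0),
        ("2015-01-04", 28, 21364, 479.0, 1.0),
        ("2015-01-04", 28, 21365, 999.0, 2.0),
    ],
)
raw_sales = load_raw_sales(conn)
print(raw_sales)
```

Against the real server you would pass a SQLAlchemy engine for that database instead of the `sqlite3` connection; the query and the `read_sql` arguments stay the same.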

## Your task
Therefore, your process will consist of the following steps:
1. Read the sample file that a daily process will save in your folder. 
2. Clean up the data.
3. Create the aggregates.
4. Write three tables in your local database: 
    - A table for the cleaned data.
    - A table for the aggregate per store.
    - A table for the aggregate per item.

## Instructions
* Clean the data and create the aggregates as you see fit.
* Create the tables in your local database.
* Populate them with your process.

```python
import numpy as np
import pandas as pd
```

```python
data = pd.read_csv('../Datasets_as_CSV/retail_sales-raw_sales.csv', sep=';', index_col='date', parse_dates=True)
data
```

| date | shop_id | item_id | item_price | item_cnt_day |
|---|---|---|---|---|
| 2015-01-04 | 29 | 1469 | 1199.0 | 1.0 |
| 2015-01-04 | 28 | 21364 | 479.0 | 1.0 |
| 2015-01-04 | 28 | 21365 | 999.0 | 2.0 |
| 2015-01-04 | 28 | 22104 | 249.0 | 2.0 |
| 2015-01-04 | 28 | 22091 | 179.0 | 1.0 |
| ... | ... | ... | ... | ... |
| 2015-01-04 | 15 | 4240 | 1299.0 | 1.0 |
| 2015-01-04 | 14 | 21922 | 99.0 | 1.0 |
| 2015-01-04 | 15 | 1969 | 3999.0 | 1.0 |
| 2015-01-04 | 14 | 22091 | 179.0 | 1.0 |


```python
type(data.index)
```

```
pandas.core.indexes.datetimes.DatetimeIndex
```

```python
data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4545 entries, 2015-01-04 to 2015-01-04
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   shop_id       4545 non-null   int64
 1   item_id       4545 non-null   int64
 2   item_price    4545 non-null   float64
 3   item_cnt_day  4545 non-null   float64
dtypes: float64(2), int64(2)
memory usage: 177.5 KB
```


```python
data.describe()
```

| | shop_id | item_id | item_price | item_cnt_day |
|---|---|---|---|---|
| count | 4545.0 | 4545.0 | 4545.0 | 4545.0 |
| mean | 34.021122 | 11140.459406 | 1031.686121 | 1.10363 |
| std | 16.565517 | 6558.649572 | 2073.91999 | 0.536967 |
| min | 2.0 | 30.0 | 3.0 | -1.0 |
| 25% | 22.0 | 4977.0 | 249.0 | 1.0 |
| 50% | 31.0 | 11247.0 | 479.0 | 1.0 |
| 75% | 50.0 | 16671.0 | 1192.0 | 1.0 |
| max | 59.0 | 22162.0 | 27990.0 | 10.0 |
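The notebook goes straight from `describe()` to the aggregates, so step 2 (clean up the data) is not shown. The summary does surface a couple of things worth a cleaning pass: `item_cnt_day` has a minimum of -1.0, which most likely records a return, and in the per-item aggregate further down many items add up to exactly 3.0 units, which makes it worth checking for duplicated rows. One possible pass is sketched below. The rules (drop exact duplicates, drop rows with a non-positive price, keep returns but flag them) are assumptions to adjust to the business definition, and the name `clean_sales` and the `is_return` column are hypothetical.

```python
import pandas as pd


def clean_sales(raw: pd.DataFrame) -> pd.DataFrame:
    """One possible cleaning pass for the daily file (assumed rules).

    Assumes the index is named 'date', as produced by
    read_csv(..., index_col='date').
    """
    # Exact duplicates: same date, shop, item, price and count. The date
    # lives in the index, so move it into a column before comparing rows.
    df = raw.reset_index().drop_duplicates().set_index("date")
    # A zero or negative price is treated as invalid input and dropped.
    df = df[df["item_price"] > 0]
    # Negative counts are kept (they likely record returns) but flagged.
    return df.assign(is_return=df["item_cnt_day"] < 0)


# Illustrative rows: values from the sample file plus one duplicated
# row and one return (count -1.0) added for the example.
raw = pd.DataFrame(
    {
        "shop_id": [29, 29, 28, 28],
        "item_id": [1469, 1469, 21365, 22104],
        "item_price": [1199.0, 1199.0, 999.0, 249.0],
        "item_cnt_day": [1.0, 1.0, 2.0, -1.0],
    },
    index=pd.DatetimeIndex(["2015-01-04"] * 4, name="date"),
)
clean = clean_sales(raw)
print(clean)
```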


```python
df = data.copy()
```

```python
# Revenue per row: unit price times units sold
df['total_amount'] = df['item_price'] * df['item_cnt_day']
df
```

| date | shop_id | item_id | item_price | item_cnt_day | total_amount |
|---|---|---|---|---|---|
| 2015-01-04 | 29 | 1469 | 1199.0 | 1.0 | 1199.0 |
| 2015-01-04 | 28 | 21364 | 479.0 | 1.0 | 479.0 |
| 2015-01-04 | 28 | 21365 | 999.0 | 2.0 | 1998.0 |
| 2015-01-04 | 28 | 22104 | 249.0 | 2.0 | 498.0 |
| 2015-01-04 | 28 | 22091 | 179.0 | 1.0 | 179.0 |
| ... | ... | ... | ... | ... | ... |
| 2015-01-04 | 15 | 4240 | 1299.0 | 1.0 | 1299.0 |
| 2015-01-04 | 14 | 21922 | 99.0 | 1.0 | 99.0 |
| 2015-01-04 | 15 | 1969 | 3999.0 | 1.0 | 3999.0 |
| 2015-01-04 | 14 | 22091 | 179.0 | 1.0 | 179.0 |


```python
# Sum units and revenue for every (item_id, item_price) combination
item = df.groupby(['item_id', 'item_price'])[['item_cnt_day', 'total_amount']].sum()
item
```

| item_id | item_price | item_cnt_day | total_amount |
|---|---|---|---|
| 30 | 169.0 | 3.0 | 507.0 |
| 31 | 363.0 | 3.0 | 1089.0 |
| 32 | 149.0 | 3.0 | 447.0 |
| 42 | 299.0 | 3.0 | 897.0 |
| 59 | 249.0 | 3.0 | 747.0 |
| ... | ... | ... | ... |
| 22091 | 179.0 | 6.0 | 1074.0 |
| 22092 | 179.0 | 3.0 | 537.0 |
| 22104 | 249.0 | 6.0 | 1494.0 |
| 22140 | 217.5 | 3.0 | 652.5 |
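Grouping by both `item_id` and `item_price` gives one row per item *and price*, so an item sold at two different prices on the same day would show up twice. If the goal is strictly one aggregate per item, a sketch that groups on `item_id` alone with pandas named aggregation could look like this. Keeping the price range as `min_price`/`max_price` is an assumption, and the function and output column names are hypothetical.

```python
import pandas as pd


def aggregate_per_item(df: pd.DataFrame) -> pd.DataFrame:
    """One row per item_id: total units, total revenue and price range."""
    sales = df.assign(total_amount=df["item_price"] * df["item_cnt_day"])
    return sales.groupby("item_id").agg(
        units_sold=("item_cnt_day", "sum"),
        revenue=("total_amount", "sum"),
        min_price=("item_price", "min"),
        max_price=("item_price", "max"),
    )


# Illustrative rows; the second price (199.0) for item 22140 is invented
# to show two prices collapsing into a single row.
sales = pd.DataFrame(
    {
        "shop_id": [28, 14, 15, 16],
        "item_id": [22091, 22091, 22140, 22140],
        "item_price": [179.0, 179.0, 217.5, 199.0],
        "item_cnt_day": [1.0, 1.0, 1.0, 2.0],
    }
)
per_item = aggregate_per_item(sales)
print(per_item)
```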


```python
# Revenue for every (shop_id, item_id, item_cnt_day) combination
pd.DataFrame(df.groupby(['shop_id', 'item_id', 'item_cnt_day'])['total_amount'].sum())
```

| shop_id | item_id | item_cnt_day | total_amount |
|---|---|---|---|
| 2 | 1970 | 1.0 | 26997.00 |
| 2 | 1971 | 1.0 | 13497.00 |
| 2 | 2871 | 1.0 | 2997.00 |
| 2 | 2881 | 1.0 | 2997.00 |
| 2 | 3028 | 1.0 | 7797.00 |
| ... | ... | ... | ... |
| 59 | 20608 | 1.0 | 5997.00 |
| 59 | 20949 | 2.0 | 30.00 |
| 59 | 21362 | 1.0 | 3297.00 |
| 59 | 21364 | 1.0 | 1437.00 |


```python
# Aggregate per store: total revenue by shop_id
pd.DataFrame(df.groupby(['shop_id'])['total_amount'].sum())
```

| shop_id | total_amount |
|---|---|
| 2 | 103746.0 |
| 3 | 67443.0 |
| 4 | 29361.0 |
| 5 | 33138.0 |
| 6 | 138678.0 |
| 7 | 52371.0 |
| 10 | 22716.0 |
| 12 | 295173.0 |
| 14 | 57450.0 |
| 15 | 125139.0 |
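Step 4 asks for three tables in the local database, which the notebook does not reach. A minimal sketch using `DataFrame.to_sql`, assuming a local SQLite database: the per-store aggregate here sums both `item_cnt_day` and `total_amount` (the cell above sums only the amount), and both aggregates keep the date so that daily runs can be appended without mixing days. The table names `clean_sales`, `sales_per_store` and `sales_per_item`, the `retail_local.db` file name, the helper `write_daily_tables` and the `if_exists="append"` choice are all assumptions.

```python
import sqlite3

import pandas as pd


def write_daily_tables(clean: pd.DataFrame, conn) -> None:
    """Write the cleaned rows and both aggregates (assumed table names)."""
    rows = clean.reset_index()  # turn the date index into a regular column
    rows["total_amount"] = rows["item_price"] * rows["item_cnt_day"]
    value_cols = ["item_cnt_day", "total_amount"]
    # Keep the date in the aggregates so appending day after day
    # does not mix different days together.
    per_store = rows.groupby(["date", "shop_id"], as_index=False)[value_cols].sum()
    per_item = rows.groupby(["date", "item_id"], as_index=False)[value_cols].sum()
    rows.to_sql("clean_sales", conn, if_exists="append", index=False)
    per_store.to_sql("sales_per_store", conn, if_exists="append", index=False)
    per_item.to_sql("sales_per_item", conn, if_exists="append", index=False)


# ":memory:" keeps the example self-contained; a daily job would connect
# to a file instead, e.g. "retail_local.db" (assumed name).
conn = sqlite3.connect(":memory:")
clean = pd.DataFrame(
    {
        "shop_id": [28, 28, 14],
        "item_id": [21365, 22104, 22091],
        "item_price": [999.0, 249.0, 179.0],
        "item_cnt_day": [2.0, 2.0, 1.0],
    },
    index=pd.DatetimeIndex(["2015-01-04"] * 3, name="date"),
)
write_daily_tables(clean, conn)
print(pd.read_sql("SELECT * FROM sales_per_store", conn))
```

With a server-based local database (MySQL, PostgreSQL) you would pass a SQLAlchemy engine instead of the `sqlite3` connection. If the same day's file might be processed twice, `append` would duplicate it, so a delete-then-insert for that date, or `if_exists="replace"` on a per-day staging table, may be safer.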
