## Context
You work on the data analysis team of a very important company. On Monday, the company shares some good news with you: it has just been hired by a major retail company! So, let's get ready for a huge amount of work!

Then you get to work with your team and define the following tasks:
1. You need to start the analysis using historical data.
2. You need to define a process that takes the daily data as input and integrates it.

You are in charge of the second part, so you are provided with a sample of the file you will have to read every day. To complete your task, you need the following aggregates:
* One aggregate per store (`shop_id`) that adds up the remaining values.
* One aggregate per item (`item_id`) that adds up the remaining values.

You can import the `raw_sales` table from the `retail_sales` database, one of Ironhack's databases.
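If you pull the data from the database instead of the CSV export, `pandas.read_sql` can run the query for you. This is a minimal, hedged sketch: `load_raw_sales` is a hypothetical helper, and an in-memory SQLite database holding the first two rows of the sample stands in for `retail_sales`, because the source does not say which database engine is used. Against a real server you would typically pass a SQLAlchemy engine instead, e.g. `create_engine("mysql+pymysql://USER:PASSWORD@HOST/retail_sales")`, where the driver, host and credentials are placeholders.

```python
import sqlite3

import pandas as pd


def load_raw_sales(con) -> pd.DataFrame:
    """Hypothetical helper: read the whole raw_sales table and parse the dates."""
    return pd.read_sql("SELECT * FROM raw_sales", con, parse_dates=["date"])


# Stand-in for the remote retail_sales database: an in-memory SQLite copy
# holding the first two rows of the sample shown further down.
con = sqlite3.connect(":memory:")
pd.DataFrame(
    {
        "date": ["2015-01-04 00:00:00", "2015-01-04 00:00:00"],
        "shop_id": [29, 28],
        "item_id": [1469, 21364],
        "item_price": [1199.0, 479.0],
        "item_cnt_day": [1.0, 1.0],
    }
).to_sql("raw_sales", con, index=False)

raw_sales = load_raw_sales(con)
print(raw_sales.dtypes)
```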

## Your task
Therefore, your process will consist of the following steps:
1. Read the sample file that a daily process will save in your folder. 
2. Clean up the data.
3. Create the aggregates.
4. Write three tables in your local database: 
    - A table for the cleaned data.
    - A table for the aggregate per store.
    - A table for the aggregate per item.
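The four steps above map naturally onto four small functions. This is a minimal sketch, not the team's actual process: the function names, the output table names (`sales_clean`, `sales_per_store`, `sales_per_item`), SQLite as the "local database", and summing only `item_cnt_day` in the aggregates are all assumptions. A `StringIO` buffer with a few rows copied from the sample (plus one deliberate duplicate) replaces the daily file so the example runs on its own.

```python
import io
import sqlite3

import pandas as pd


def read_daily_file(path_or_buffer) -> pd.DataFrame:
    # Step 1: the daily file is semicolon-separated, like the sample.
    return pd.read_csv(path_or_buffer, sep=";", parse_dates=["date"])


def clean(sales: pd.DataFrame) -> pd.DataFrame:
    # Step 2: drop exact duplicate rows and renumber the index.
    return sales.drop_duplicates().reset_index(drop=True)


def aggregate(sales: pd.DataFrame, key: str) -> pd.DataFrame:
    # Step 3: one row per `key` with the total units sold (assumed metric).
    return sales.groupby(key, as_index=False)["item_cnt_day"].sum()


def write_tables(con, cleaned, per_store, per_item) -> None:
    # Step 4: one table per output; "replace" keeps the sketch re-runnable.
    cleaned.to_sql("sales_clean", con, if_exists="replace", index=False)
    per_store.to_sql("sales_per_store", con, if_exists="replace", index=False)
    per_item.to_sql("sales_per_item", con, if_exists="replace", index=False)


def run_daily_process(path_or_buffer, con) -> None:
    cleaned = clean(read_daily_file(path_or_buffer))
    write_tables(
        con,
        cleaned,
        aggregate(cleaned, "shop_id"),
        aggregate(cleaned, "item_id"),
    )


sample = io.StringIO(
    "date;shop_id;item_id;item_price;item_cnt_day\n"
    "2015-01-04 00:00:00;29;1469;1199.0;1.0\n"
    "2015-01-04 00:00:00;28;21364;479.0;1.0\n"
    "2015-01-04 00:00:00;28;21365;999.0;2.0\n"
    "2015-01-04 00:00:00;29;1469;1199.0;1.0\n"  # exact duplicate of the first row
)
con = sqlite3.connect(":memory:")
run_daily_process(sample, con)
print(pd.read_sql("SELECT * FROM sales_per_store", con))
```

`if_exists="replace"` rebuilds each table on every run, which is the simplest option for a sketch; a real daily load would more likely append each day's rows.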

## Instructions
* Clean the data and create the aggregates as you see fit.
* Create the tables in your local database.
* Populate them with your process.
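Creating the tables explicitly lets you declare types and a primary key before the process populates them. In this sketch, which covers only the per-store table, the schema, the `date` column on the aggregate, the `load_day` helper and the delete-then-append strategy are all assumptions. The idea is that re-running the process for a day that was already loaded replaces that day's rows instead of duplicating them.

```python
import sqlite3

import pandas as pd

# Assumed schema: one row per (day, store).
SCHEMA = """
CREATE TABLE IF NOT EXISTS sales_per_store (
    date         TEXT    NOT NULL,
    shop_id      INTEGER NOT NULL,
    item_cnt_day REAL    NOT NULL,
    PRIMARY KEY (date, shop_id)
);
"""


def load_day(con: sqlite3.Connection, per_store: pd.DataFrame) -> None:
    """Hypothetical loader: delete rows for the same day(s), then append."""
    days = per_store["date"].unique().tolist()
    with con:  # the sqlite3 context manager commits on success, rolls back on error
        con.executemany(
            "DELETE FROM sales_per_store WHERE date = ?", [(d,) for d in days]
        )
        per_store.to_sql("sales_per_store", con, if_exists="append", index=False)


con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)

day = pd.DataFrame(
    {"date": ["2015-01-04", "2015-01-04"], "shop_id": [28, 29], "item_cnt_day": [3.0, 1.0]}
)
load_day(con, day)
load_day(con, day)  # loading the same day again does not duplicate rows
print(con.execute("SELECT COUNT(*) FROM sales_per_store").fetchone()[0])  # 2
```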

In [1]:
import pandas as pd
import numpy as np

In [4]:
# The file uses a semicolon as its separator, so we have to pass sep=';'

sales = pd.read_csv("../Datasets/retail_sales-raw_sales.csv", sep=';')
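`read_csv` can also parse the timestamps and pin the key types while reading, so `date` becomes a real `datetime64` column instead of text. This is a hedged sketch: it assumes the file's header is `date;shop_id;item_id;item_price;item_cnt_day`, as the output below suggests, and it reads a `StringIO` buffer with two sample rows in place of `../Datasets/retail_sales-raw_sales.csv`.

```python
import io

import pandas as pd

# Two rows copied from the sample output; the real process reads the file on disk.
sample = io.StringIO(
    "date;shop_id;item_id;item_price;item_cnt_day\n"
    "2015-01-04 00:00:00;29;1469;1199.0;1.0\n"
    "2015-01-04 00:00:00;28;21364;479.0;1.0\n"
)

sales = pd.read_csv(
    sample,
    sep=";",  # the file is semicolon-separated
    parse_dates=["date"],  # turn the text timestamps into datetime64 values
    dtype={"shop_id": "int64", "item_id": "int64"},  # keys stay integers
)
print(sales.dtypes)
```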

In [12]:
sales.head(5)

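|   | date | shop_id | item_id | item_price | item_cnt_day |
|---|------|---------|---------|------------|--------------|
| 0 | 2015-01-04 00:00:00 | 29 | 1469 | 1199.0 | 1.0 |
| 1 | 2015-01-04 00:00:00 | 28 | 21364 | 479.0 | 1.0 |
| 2 | 2015-01-04 00:00:00 | 28 | 21365 | 999.0 | 2.0 |
| 3 | 2015-01-04 00:00:00 | 28 | 22104 | 249.0 | 2.0 |
| 4 | 2015-01-04 00:00:00 | 28 | 22091 | 179.0 | 1.0 |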


In [28]:
sales[sales.duplicated()]

|   | date | shop_id | item_id | item_price | item_cnt_day |
|---|------|---------|---------|------------|--------------|
| 1515 | 2015-01-04 00:00:00 | 29 | 1469 | 1199.0 | 1.0 |
| 1516 | 2015-01-04 00:00:00 | 28 | 21364 | 479.0 | 1.0 |
| 1517 | 2015-01-04 00:00:00 | 28 | 21365 | 999.0 | 2.0 |
| 1518 | 2015-01-04 00:00:00 | 28 | 22104 | 249.0 | 2.0 |
| 1519 | 2015-01-04 00:00:00 | 28 | 22091 | 179.0 | 1.0 |
| ... | ... | ... | ... | ... | ... |
| 4540 | 2015-01-04 00:00:00 | 15 | 4240 | 1299.0 | 1.0 |
| 4541 | 2015-01-04 00:00:00 | 14 | 21922 | 99.0 | 1.0 |
| 4542 | 2015-01-04 00:00:00 | 15 | 1969 | 3999.0 | 1.0 |
| 4543 | 2015-01-04 00:00:00 | 14 | 22091 | 179.0 | 1.0 |
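`sales.duplicated()` flags every row that is identical to an earlier one (`keep="first"` by default), so the rows listed are extra copies that any sum would count again. Whether they are true duplicates or genuine repeat sales cannot be decided from the file alone; assuming they are copies, a minimal cleaning step drops them before aggregating. The toy frame below is made up for illustration.

```python
import pandas as pd

# Made-up rows: the third row repeats the first one exactly.
sales = pd.DataFrame(
    {
        "date": ["2015-01-04"] * 3,
        "shop_id": [29, 28, 29],
        "item_id": [1469, 21364, 1469],
        "item_price": [1199.0, 479.0, 1199.0],
        "item_cnt_day": [1.0, 1.0, 1.0],
    }
)

n_dupes = int(sales.duplicated().sum())  # rows identical to an earlier row
clean_sales = sales.drop_duplicates().reset_index(drop=True)  # keep first copies
print(n_dupes, len(clean_sales))  # 1 2
```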


In [16]:
sales.item_id.value_counts()

21364    48
1969     48
20949    45
17717    42
11927    36
         ..
21377     3
2946      3
10615     3
11142     3
12488     3
Name: item_id, Length: 985, dtype: int64

In [21]:
sales.groupby("item_id").aggregate({"item_cnt_day":"sum"})

| item_id | item_cnt_day |
|---------|--------------|
| 30 | 3.0 |
| 31 | 3.0 |
| 32 | 3.0 |
| 42 | 3.0 |
| 59 | 3.0 |
| ... | ... |
| 22091 | 6.0 |
| 22092 | 3.0 |
| 22104 | 6.0 |
| 22140 | 3.0 |
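This aggregate, like the value counts above, is computed on `sales` before any de-duplication. Every count visible in those value counts is a multiple of 3, which hints that each record may appear three times; if so, the per-item totals are inflated by the same factor. The toy comparison below (made-up data) shows the effect, and `as_index=False` keeps `item_id` as a regular column, which is convenient when writing the result to a table.

```python
import pandas as pd

# Made-up data: the row for item 1469 appears three times.
sales = pd.DataFrame(
    {
        "shop_id": [29, 29, 29, 28],
        "item_id": [1469, 1469, 1469, 21364],
        "item_cnt_day": [1.0, 1.0, 1.0, 2.0],
    }
)

# Summing before cleaning counts every copy.
raw_per_item = sales.groupby("item_id")["item_cnt_day"].sum()

# Summing after dropping exact duplicates; item_id stays a column.
clean_per_item = (
    sales.drop_duplicates()
    .groupby("item_id", as_index=False)["item_cnt_day"]
    .sum()
)
print(raw_per_item.loc[1469])  # 3.0
print(clean_per_item)
```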


In [23]:
sales.groupby(["shop_id","item_id"]).aggregate({"item_cnt_day":"sum"})

| shop_id | item_id | item_cnt_day |
|---------|---------|--------------|
| 2 | 1970 | 3.0 |
| 2 | 1971 | 3.0 |
| 2 | 2871 | 3.0 |
| 2 | 2881 | 3.0 |
| 2 | 3028 | 3.0 |
| ... | ... | ... |
| 59 | 20608 | 3.0 |
| 59 | 20949 | 6.0 |
| 59 | 21362 | 3.0 |
| 59 | 21364 | 3.0 |
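The last cell aggregates per store and item together, whereas the task asks for one aggregate per store. A possible per-store version is sketched below with named aggregation. It assumes `sales` has already been de-duplicated, and it adds a `revenue` column (`item_price * item_cnt_day`) as an assumption, since summing unit prices directly is rarely meaningful; the data is made up.

```python
import pandas as pd

# Made-up, already de-duplicated rows: two stores, two items each.
sales = pd.DataFrame(
    {
        "shop_id": [2, 2, 59, 59],
        "item_id": [1970, 1971, 20949, 21364],
        "item_price": [10.0, 20.0, 5.0, 1.0],
        "item_cnt_day": [1.0, 1.0, 2.0, 1.0],
    }
)

per_store = (
    sales
    # Assumed metric: revenue of each row = unit price x units sold that day.
    .assign(revenue=lambda df: df["item_price"] * df["item_cnt_day"])
    .groupby("shop_id", as_index=False)
    .agg(
        items_sold=("item_cnt_day", "sum"),
        revenue=("revenue", "sum"),
        distinct_items=("item_id", "nunique"),
    )
)
print(per_store)
```

The same pattern with `groupby("item_id", ...)` gives the per-item table.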
