## Context
You work in the data analysis team of a very important company. On Monday, the company shares some good news with you: it has just been hired by a major retail company! So get ready for a huge amount of work.

Then you get to work with your team and define the following tasks to perform:   
1. You need to start your analysis using data from the past.  
2. You need to define a process that takes your daily data as an input and integrates it.  

You are in charge of the second part, so you are given a sample of the file that you will have to read every day. To complete your task, you need the following aggregates:
* One aggregate per store that sums the remaining numeric columns.
* One aggregate per item that sums the remaining numeric columns.

You can import the `raw_sales` table from the `retail_sales` database, one of Ironhack's databases.

## Your task
Therefore, your process will consist of the following steps:
1. Read the sample file that a daily process will save in your folder. 
2. Clean up the data.
3. Create the aggregates.
4. Write three tables in your local database: 
    - A table for the cleaned data.
    - A table for the aggregate per store.
    - A table for the aggregate per item.
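
The four steps above can be sketched as one function. This is a minimal sketch, not the expected solution: it assumes the local database is SQLite (via the standard `sqlite3` module), and the table names `sales_cleaned`, `sales_by_shop`, and `sales_by_item`, the clean-up rules, and the sample rows are all made up for illustration.

```python
import sqlite3

import pandas as pd


def run_daily_process(raw: pd.DataFrame, conn: sqlite3.Connection) -> None:
    """Clean the raw sales, build both aggregates, and write three tables."""
    # Step 2: a minimal clean-up (drop exact duplicates and incomplete rows).
    cleaned = raw.drop_duplicates().dropna()

    # Step 3: sum only the measures; summing an ID column has no meaning.
    measures = ["item_price", "item_cnt_day"]
    by_shop = cleaned.groupby("shop_id")[measures].sum().reset_index()
    by_item = cleaned.groupby("item_id")[measures].sum().reset_index()

    # Step 4: one table per result; "replace" overwrites any previous run.
    cleaned.to_sql("sales_cleaned", conn, if_exists="replace", index=False)
    by_shop.to_sql("sales_by_shop", conn, if_exists="replace", index=False)
    by_item.to_sql("sales_by_item", conn, if_exists="replace", index=False)


# Tiny made-up sample standing in for the daily file.
sample = pd.DataFrame(
    {
        "date": ["2015-01-04", "2015-01-04", "2015-01-04"],
        "shop_id": [2, 2, 3],
        "item_id": [30, 31, 30],
        "item_price": [169.0, 363.0, 169.0],
        "item_cnt_day": [1, 1, 2],
    }
)

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
run_daily_process(sample, conn)
shop_table = pd.read_sql("SELECT * FROM sales_by_shop ORDER BY shop_id", conn)
print(shop_table)
```

Using `if_exists="replace"` keeps the sketch idempotent; a real daily process would more likely append each day's rows, which is a design choice left to you.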

## Instructions
* Clean the data and create the aggregates as you see fit.
* Create the tables in your local database.
* Populate them with your process.

In [1]:
# your code here
import pandas as pd
import numpy as np

In [6]:
retail_raw_sales = pd.read_csv("../data/retail_sales-raw_sales.csv", sep=";")

In [20]:
# Convert item_cnt_day to an integer type so that counts carry no decimals
retail_raw_sales["item_cnt_day"] = retail_raw_sales["item_cnt_day"].astype(int)
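
Note that `astype(int)` truncates any fractional part silently and raises an error on missing values. A possible safeguard, sketched here on made-up values rather than on the real file, is to check that every count is a whole number before casting:

```python
import pandas as pd

counts = pd.Series([1.0, 2.0, -1.0, 3.0], name="item_cnt_day")  # made-up values

# Flag any value that is missing or has a fractional part before casting.
is_whole = counts.notna() & (counts % 1 == 0)
if not is_whole.all():
    raise ValueError(f"{(~is_whole).sum()} non-integer counts found")

# Safe to cast now: no information is lost.
counts_int = counts.astype("int64")
print(counts_int.dtype)
```

The negative value is kept on purpose: whether negative counts (for example, returns) should be dropped is a separate cleaning decision.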

In [22]:
retail_raw_sales.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4545 entries, 0 to 4544
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   date          4545 non-null   object 
 1   shop_id       4545 non-null   int64  
 2   item_id       4545 non-null   int64  
 3   item_price    4545 non-null   float64
 4   item_cnt_day  4545 non-null   int64  
dtypes: float64(1), int64(3), object(1)
memory usage: 177.7+ KB
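
The summary above shows that `date` is still stored as `object`, that is, as plain strings. A minimal sketch of converting it to a datetime type follows; the rows are made up, and the day.month.year format is an assumption that should be checked against the actual file.

```python
import pandas as pd

# Made-up rows; the day.month.year format is an assumption about the file.
df = pd.DataFrame({"date": ["04.01.2015", "15.01.2015", "not a date"]})

# errors="coerce" turns unparseable strings into NaT instead of raising.
df["date"] = pd.to_datetime(df["date"], format="%d.%m.%Y", errors="coerce")

# Count the rows that failed to parse so they can be inspected or dropped.
n_bad_dates = df["date"].isna().sum()
print(df["date"].dtype, n_bad_dates)
```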


In [25]:
# numeric_only=True leaves the text column `date` out of the sum
raw_sales_agg_by_shop = retail_raw_sales.groupby("shop_id").sum(numeric_only=True)

In [27]:
raw_sales_agg_by_shop.head(10)

| shop_id | item_id | item_price | item_cnt_day |
|---|---|---|---|
| 2 | 966879 | 99070.5 | 81 |
| 3 | 335745 | 67443.0 | 33 |
| 4 | 498624 | 29361.0 | 39 |
| 5 | 620868 | 33138.0 | 45 |
| 6 | 1266894 | 116352.0 | 150 |
| 7 | 669045 | 52371.0 | 63 |
| 10 | 310137 | 22707.0 | 30 |
| 12 | 1647339 | 212196.4 | 216 |
| 14 | 421977 | 33456.0 | 51 |
| 15 | 1210026 | 125139.0 | 93 |
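
In the per-shop output, the summed `item_id` column is a sum of identifiers and carries no meaning. One alternative, sketched below with made-up rows, is pandas named aggregation, which states explicitly how each column is combined. The `revenue` column and the output names are hypothetical, and the sketch assumes `item_price` is a unit price.

```python
import pandas as pd

# Made-up rows with the same columns as the sample file (minus the date).
sales = pd.DataFrame(
    {
        "shop_id": [2, 2, 3],
        "item_id": [30, 31, 30],
        "item_price": [169.0, 363.0, 169.0],
        "item_cnt_day": [1, 2, 3],
    }
)

# Revenue per row = unit price times units sold (assumes a unit price).
sales["revenue"] = sales["item_price"] * sales["item_cnt_day"]

# Each output column names its source column and its aggregation function.
by_shop = sales.groupby("shop_id").agg(
    units_sold=("item_cnt_day", "sum"),
    revenue=("revenue", "sum"),
    distinct_items=("item_id", "nunique"),
)
print(by_shop)
```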


In [28]:
# numeric_only=True leaves the text column `date` out of the sum
raw_sales_agg_by_item = retail_raw_sales.groupby("item_id").sum(numeric_only=True)

In [30]:
raw_sales_agg_by_item.head(10)

| item_id | shop_id | item_price | item_cnt_day |
|---|---|---|---|
| 30 | 84 | 507.0 | 3 |
| 31 | 18 | 1089.0 | 3 |
| 32 | 93 | 447.0 | 3 |
| 42 | 162 | 897.0 | 3 |
| 59 | 171 | 747.0 | 3 |
| 74 | 75 | 1497.0 | 3 |
| 109 | 162 | 747.0 | 3 |
| 259 | 162 | 747.0 | 3 |
| 464 | 36 | 897.0 | 3 |
| 482 | 222 | 19800.0 | 12 |
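
Because the per-shop and per-item aggregates group the same rows in two different ways, their grand totals should agree. A quick consistency check along those lines might look like the sketch below, which uses made-up rows and NumPy's `allclose` to tolerate floating-point rounding.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the cleaned sales data.
sales = pd.DataFrame(
    {
        "shop_id": [2, 2, 3, 3],
        "item_id": [30, 31, 30, 32],
        "item_price": [169.0, 363.0, 169.0, 149.0],
        "item_cnt_day": [1, 2, 3, 1],
    }
)

measures = ["item_price", "item_cnt_day"]
by_shop = sales.groupby("shop_id")[measures].sum()
by_item = sales.groupby("item_id")[measures].sum()

# Both aggregates partition the same rows, so their column totals must match.
totals_match = np.allclose(by_shop.sum(), by_item.sum())
print(totals_match)
```

A mismatch would point to rows lost during grouping, for example rows whose `shop_id` or `item_id` is missing, since `groupby` drops missing keys by default.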
