## Context
You work in the data analysis team of a very important company. On Monday, the company shares some good news with you: you have just been hired by a major retail company. So, get ready for a large amount of work!

Then you get to work with your team and define the following tasks to perform:   
1. You need to start your analysis using data from the past.  
2. You need to define a process that takes your daily data as an input and integrates it.  

You are in charge of the second part, so you are provided with a sample file that you will have to read daily. To complete your task, you need the following aggregates:
* One aggregate per store that adds up the rest of the values.
* One aggregate per item that adds up the rest of the values.

You can import the `raw_sales` table from the `retail_sales` database, one of Ironhack's databases.

## Your task
Therefore, your process will consist of the following steps:
1. Read the sample file that a daily process will save in your folder. 
2. Clean up the data.
3. Create the aggregates.
4. Write three tables in your local database: 
    - A table for the cleaned data.
    - A table for the aggregate per store.
    - A table for the aggregate per item.
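
The clean-up step can be sketched as a small function. This is only an illustration: the function name `clean_sales` and the cleaning rules (dropping the leftover `Unnamed: 0` column, parsing `date`, removing rows without a price or quantity, removing exact duplicates) are assumptions, not requirements of the lab, and the sample rows are made up.

```python
import pandas as pd


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the daily sales data (illustrative rules only)."""
    cleaned = df.copy()
    # Drop the leftover index column if the CSV export included one.
    cleaned = cleaned.drop(columns=["Unnamed: 0"], errors="ignore")
    # Parse the date strings into proper datetimes.
    cleaned["date"] = pd.to_datetime(cleaned["date"])
    # Assumption: rows without a price or a quantity cannot be aggregated.
    cleaned = cleaned.dropna(subset=["item_price", "item_cnt_day"])
    # Assumption: fully identical rows are accidental duplicates.
    cleaned = cleaned.drop_duplicates().reset_index(drop=True)
    return cleaned


# Made-up sample with one duplicated row and one row missing its price.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],
    "date": ["2015-01-04 00:00:00"] * 4,
    "shop_id": [1, 2, 2, 2],
    "item_id": [10, 11, 11, 12],
    "item_price": [100.0, 50.0, 50.0, None],
    "item_cnt_day": [1.0, 2.0, 2.0, 1.0],
})
cleaned_sample = clean_sales(sample)
print(cleaned_sample)
```

Whether identical rows are really duplicates depends on how the daily file is produced, so that rule in particular should be checked against the real data before it is applied.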

## Instructions
* Clean the data and create the aggregates as you see fit.
* Create the tables in your local database.
* Populate them with your process.
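
A minimal sketch of creating and populating the three tables, assuming a SQLite database accessed through the standard-library `sqlite3` module (the lab does not say which local database engine to use). The helper `write_tables`, the table names, and the sample rows are hypothetical.

```python
import sqlite3

import pandas as pd


def write_tables(tables: dict, con: sqlite3.Connection) -> None:
    """Write each DataFrame to a table named after its dictionary key."""
    for name, frame in tables.items():
        # "replace" recreates the table on every run, so the sketch is rerunnable.
        frame.to_sql(name, con, if_exists="replace", index=False)


# Made-up cleaned data standing in for the real daily file.
cleaned = pd.DataFrame({
    "shop_id": [1, 2, 2],
    "item_id": [10, 11, 12],
    "item_price": [100.0, 50.0, 200.0],
    "item_cnt_day": [1.0, 2.0, 3.0],
})
cleaned["revenue"] = cleaned["item_price"] * cleaned["item_cnt_day"]

# as_index=False keeps the grouping key as a regular column, ready for a table.
per_store = cleaned.groupby("shop_id", as_index=False)[["item_cnt_day", "revenue"]].sum()
per_item = cleaned.groupby("item_id", as_index=False)[["item_cnt_day", "revenue"]].sum()

# An in-memory database keeps the example self-contained;
# pass a file path instead to persist the tables.
con = sqlite3.connect(":memory:")
write_tables(
    {"sales_clean": cleaned, "sales_per_store": per_store, "sales_per_item": per_item},
    con,
)
store_back = pd.read_sql("SELECT * FROM sales_per_store ORDER BY shop_id", con)
table_count = con.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]
con.close()
print(store_back)
```

For a process that runs every day, `if_exists="append"` on the cleaned table may be the better fit, with the aggregates recomputed or appended depending on how they will be queried.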

In [1]:
# your code here
import pandas as pd
import numpy as np
import os

In [5]:
# Move to the lab folder so that relative paths resolve (this path is specific to one machine).
os.chdir("C:\\Users\\GiantsV3\\Documents\\Ironhack\\Week2\\Day3\\lab-df-calculation-and-transformation")

In [13]:
raw_sales = pd.read_csv("data/retail_sales-raw_sales.csv", sep=";")
raw_sales

Unnamed: 0,date,shop_id,item_id,item_price,item_cnt_day
0,2015-01-04 00:00:00,29,1469,1199.0,1.0
1,2015-01-04 00:00:00,28,21364,479.0,1.0
2,2015-01-04 00:00:00,28,21365,999.0,2.0
3,2015-01-04 00:00:00,28,22104,249.0,2.0
4,2015-01-04 00:00:00,28,22091,179.0,1.0
...,...,...,...,...,...
4540,2015-01-04 00:00:00,15,4240,1299.0,1.0
4541,2015-01-04 00:00:00,14,21922,99.0,1.0
4542,2015-01-04 00:00:00,15,1969,3999.0,1.0
4543,2015-01-04 00:00:00,14,22091,179.0,1.0


In [17]:
# Revenue per row: unit price times units sold that day.
raw_sales["revenue"] = raw_sales["item_price"] * raw_sales["item_cnt_day"]
raw_sales

Unnamed: 0,date,shop_id,item_id,item_price,item_cnt_day,revenue
0,2015-01-04 00:00:00,29,1469,1199.0,1.0,1199.0
1,2015-01-04 00:00:00,28,21364,479.0,1.0,479.0
2,2015-01-04 00:00:00,28,21365,999.0,2.0,1998.0
3,2015-01-04 00:00:00,28,22104,249.0,2.0,498.0
4,2015-01-04 00:00:00,28,22091,179.0,1.0,179.0
...,...,...,...,...,...,...
4540,2015-01-04 00:00:00,15,4240,1299.0,1.0,1299.0
4541,2015-01-04 00:00:00,14,21922,99.0,1.0,99.0
4542,2015-01-04 00:00:00,15,1969,3999.0,1.0,3999.0
4543,2015-01-04 00:00:00,14,22091,179.0,1.0,179.0


In [33]:
# Total revenue per store, highest first.
revenue_store = raw_sales.groupby(by="shop_id")["revenue"].sum().sort_values(ascending=False)
revenue_store

shop_id
42    330111.0
31    304692.0
12    295173.0
25    288432.0
21    228999.0
57    226269.0
37    220500.0
28    202512.0
27    172959.0
55    170847.6
22    150717.0
58    142863.0
50    142053.0
6     138678.0
44    137445.0
54    125343.0
15    125139.0
16    121923.0
26    120462.0
59    113109.0
2     103746.0
46     93903.0
29     85737.0
45     82350.0
47     80142.0
38     73482.0
3      67443.0
35     65769.0
52     63531.0
14     57450.0
24     56955.0
56     54906.0
7      52371.0
19     51420.0
53     50505.0
41     36840.0
18     35787.0
49     35784.0
39     34686.0
5      33138.0
48     32745.0
4      29361.0
10     22716.0
34     12117.0
51     10665.0
Name: revenue, dtype: float64

In [36]:
# Total revenue per item, highest first.
revenue_item = raw_sales.groupby(by="item_id")["revenue"].sum().sort_values(ascending=False)
revenue_item

item_id
1969     262134.0
6675     242910.0
1971     121473.0
1970     107988.0
13494     89940.0
           ...   
8095      -1497.0
1523      -2397.0
2690      -4794.0
2575      -6297.0
7877     -11970.0
Name: revenue, Length: 985, dtype: float64
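
The negative totals at the bottom of this output can only come from rows where `item_price` or `item_cnt_day` is negative, since `revenue` is their product. A minimal sketch for isolating such rows before deciding how the clean-up step should treat them; the sample values below are made up and do not come from the real file.

```python
import pandas as pd

# Made-up rows: item 12 has two entries with a negative quantity.
sales = pd.DataFrame({
    "shop_id": [1, 1, 2, 2],
    "item_id": [10, 11, 12, 12],
    "item_price": [100.0, 50.0, 200.0, 200.0],
    "item_cnt_day": [1.0, 2.0, -2.0, -1.0],
})
sales["revenue"] = sales["item_price"] * sales["item_cnt_day"]

# Keep only the rows that can push a total below zero.
negative_rows = sales[(sales["item_cnt_day"] < 0) | (sales["item_price"] < 0)]

# Sum them per item to see which items are affected and by how much.
negative_per_item = negative_rows.groupby("item_id")["revenue"].sum().sort_values()
print(negative_per_item)
```

Whether these rows are kept, removed, or stored separately is a judgment call for the clean-up step; the sketch only makes them visible.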