## Context
You work in the data analysis team of a very important company. On Monday, the company shares some good news with you: it has just been hired by a major retail company! So, get ready for a huge amount of work!

Then you get to work with your team and define the following tasks to perform:   
1. You need to start your analysis using data from the past.  
2. You need to define a process that takes your daily data as an input and integrates it.  

You are in charge of the second part, so you are provided with a sample file that you will have to read daily. To complete your task, you need the following aggregates:
* One aggregate per store that adds up the rest of the values.
* One aggregate per item that adds up the rest of the values.
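
The two aggregates described above can be sketched with `groupby` on a tiny made-up sample that reuses the column names of the `raw_sales` file. The `revenue` column name and the choice of which values to sum are assumptions made for this illustration, not part of the brief.

```python
import pandas as pd

# Made-up rows with the same column names as the raw_sales file.
sample = pd.DataFrame({
    "shop_id": [28, 28, 29, 29],
    "item_id": [21364, 21365, 21364, 1469],
    "item_price": [479.0, 999.0, 479.0, 1199.0],
    "item_cnt_day": [1.0, 2.0, 3.0, 1.0],
})

# Revenue of each row: price times units (hypothetical column name).
sample["revenue"] = sample["item_price"] * sample["item_cnt_day"]

# One row per store: total units and total revenue.
per_store = sample.groupby("shop_id")[["item_cnt_day", "revenue"]].sum()

# One row per item: total units and total revenue.
per_item = sample.groupby("item_id")[["item_cnt_day", "revenue"]].sum()

print(per_store)
print(per_item)
```

Summing `item_price` itself would rarely be meaningful, which is why this sketch only totals units and revenue.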

You can import the `raw_sales` table from the `retail_sales` database, one of Ironhack's databases.

## Your task
Therefore, your process will consist of the following steps:
1. Read the sample file that a daily process will save in your folder. 
2. Clean up the data.
3. Create the aggregates.
4. Write three tables in your local database: 
    - A table for the cleaned data.
    - A table for the aggregate per store.
    - A table for the aggregate per item.

## Instructions
* Clean the data and create the aggregates as you see fit.
* Create the tables in your local database.
* Populate them with your process.
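
As a minimal sketch of the last step, the three tables could be written with `DataFrame.to_sql`. The example assumes a SQLite database (in memory here, so it runs anywhere) and hypothetical table names `clean_sales`, `sales_per_store` and `sales_per_item`; a different local database would need its own connection.

```python
import sqlite3
import pandas as pd

def write_tables(conn, cleaned, per_store, per_item):
    """Write the three result tables, replacing the output of any previous run."""
    cleaned.to_sql("clean_sales", conn, if_exists="replace", index=False)
    # The aggregates keep their index (shop_id / item_id) as a column.
    per_store.to_sql("sales_per_store", conn, if_exists="replace", index=True)
    per_item.to_sql("sales_per_item", conn, if_exists="replace", index=True)

# Made-up rows standing in for the cleaned daily data.
cleaned = pd.DataFrame({
    "shop_id": [28, 28, 29],
    "item_id": [21364, 21365, 21364],
    "item_cnt_day": [1.0, 2.0, 3.0],
})
per_store = cleaned.groupby("shop_id")[["item_cnt_day"]].sum()
per_item = cleaned.groupby("item_id")[["item_cnt_day"]].sum()

conn = sqlite3.connect(":memory:")  # assumption: SQLite, in memory for the demo
write_tables(conn, cleaned, per_store, per_item)

tables = pd.read_sql(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name", conn
)
print(tables["name"].tolist())
```

`if_exists="replace"` keeps the sketch repeatable; a daily load that accumulates history may prefer `if_exists="append"` for the cleaned table.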

In [1]:
import pandas as pd
import numpy as np
raw_sales = pd.read_csv("../../Lab Datasets as CSV/retail_sales-raw_sales.csv", sep=";")

In [2]:
# Data of the day: 2015-01-04
raw_sales.info()
raw_sales.head()
raw_sales.describe()
# The file covers a single day, so the date column is dropped
raw_sales.drop("date", axis=1, inplace=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4545 entries, 0 to 4544
Data columns (total 5 columns):
date            4545 non-null object
shop_id         4545 non-null int64
item_id         4545 non-null int64
item_price      4545 non-null float64
item_cnt_day    4545 non-null float64
dtypes: float64(2), int64(2), object(1)
memory usage: 177.7+ KB


In [4]:
# Bottom 10 shops by revenue on 2015-01-04
# "item_sold" is the revenue of each row (price x units sold)
raw_sales["item_sold"] = raw_sales["item_price"] * raw_sales["item_cnt_day"]
raw_sales.groupby("shop_id")[["item_sold"]].sum().sort_values(by="item_sold", ascending=True).head(10)

shop_id,item_sold
51,10665.0
34,12117.0
10,22716.0
4,29361.0
48,32745.0
5,33138.0
39,34686.0
49,35784.0
18,35787.0
41,36840.0


In [25]:
# Top 10 shops by revenue on 2015-01-04, with their number of sales rows
# ("count" counts rows per shop, not units sold)
raw_sales["item_sold"] = raw_sales["item_price"] * raw_sales["item_cnt_day"]  # revenue of each row
raw_sales.groupby("shop_id").aggregate({"item_sold": "sum", "item_cnt_day": "count"}).sort_values(by="item_sold", ascending=False).head(10)

shop_id,item_sold,item_cnt_day
42,330111.0,240
31,304692.0,345
12,295173.0,144
25,288432.0,294
21,228999.0,174
57,226269.0,309
37,220500.0,60
28,202512.0,201
27,172959.0,162
55,170847.6,105
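
In the cell above, `"count"` returns the number of rows per shop, which is not the same as the number of units sold. A small made-up sample shows the difference between `"count"` and `"sum"` on `item_cnt_day`:

```python
import pandas as pd

# Made-up rows: shop 28 has two sales rows totalling 3 units.
sample = pd.DataFrame({
    "shop_id": [28, 28, 29],
    "item_cnt_day": [1.0, 2.0, 3.0],
})

# "count" -> number of rows per shop; "sum" -> units sold per shop.
agg = sample.groupby("shop_id")["item_cnt_day"].agg(["count", "sum"])
print(agg)
```

Which of the two is wanted depends on whether the aggregate should report transactions or units.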


In [27]:
# Top 10 shops by number of sales rows on 2015-01-04, with their revenue
raw_sales.groupby("shop_id").aggregate({"item_cnt_day": "count", "item_sold": "sum"}).sort_values(by="item_cnt_day", ascending=False).head(10)

shop_id,item_cnt_day,item_sold
31,345,304692.0
57,309,226269.0
25,294,288432.0
42,240,330111.0
28,201,202512.0
54,186,125343.0
21,174,228999.0
27,162,172959.0
58,153,142863.0
12,144,295173.0


In [30]:
# Top 10 items by units sold during the day, all shops combined
# "item_price" shows the first price seen for each item; prices can differ between rows
raw_sales.groupby("item_id").aggregate({"item_price": "first", "item_cnt_day": "sum", "item_sold": "sum"}).sort_values(by="item_cnt_day", ascending=False).head(10)

item_id,item_price,item_cnt_day,item_sold
20949,5.0,93.0,453.0
1969,3999.0,66.0,262134.0
21364,479.0,66.0,31470.0
17717,1000.0,60.0,88257.0
11927,599.0,51.0,29469.0
13802,499.0,39.0,19461.0
21365,999.0,36.0,35664.0
5822,1149.0,36.0,41364.0
10476,479.0,33.0,15807.0
20225,479.0,33.0,15807.0


In [18]:
# Top 10 shops where a single item was sold the most times in one row on 2015-01-04
# (keeps the first item_id for each quantity/shop pair)
raw_sales.groupby(["item_cnt_day", "shop_id"])[["item_id"]].first().sort_values(by="item_cnt_day", ascending=False).head(10)

item_cnt_day,shop_id,item_id
10.0,12,11370
6.0,55,1875
6.0,28,20949
5.0,58,21364
5.0,22,13387
5.0,12,13753
4.0,55,13097
4.0,50,20949
4.0,45,6738
4.0,31,5822
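
The groupby above keeps only the first `item_id` for each (`item_cnt_day`, `shop_id`) pair, so rows that share both values are collapsed into one. An alternative, sketched here on made-up rows, is to rank the rows directly with `DataFrame.nlargest`:

```python
import pandas as pd

# Made-up rows with the same column names as raw_sales.
sample = pd.DataFrame({
    "shop_id": [12, 55, 28, 31],
    "item_id": [11370, 1875, 20949, 5822],
    "item_cnt_day": [10.0, 6.0, 1.0, 4.0],
})

# Rows with the largest single-row quantities, highest first.
top = sample.nlargest(2, "item_cnt_day")[["shop_id", "item_id", "item_cnt_day"]]
print(top)
```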


In [33]:
raw_sales.head()

,shop_id,item_id,item_price,item_cnt_day,item_sold
0,29,1469,1199.0,1.0,1199.0
1,28,21364,479.0,1.0,479.0
2,28,21365,999.0,2.0,1998.0
3,28,22104,249.0,2.0,498.0
4,28,22091,179.0,1.0,179.0


In [23]:
raw_sales.describe()

,shop_id,item_id,item_price,item_cnt_day,item_sold
count,4545.0,4545.0,4545.0,4545.0,4545.0
mean,34.021122,11140.459406,1031.686121,1.10363,1118.102442
std,16.565517,6558.649572,2073.91999,0.536967,2238.801761
min,2.0,30.0,3.0,-1.0,-3990.0
25%,22.0,4977.0,249.0,1.0,249.0
50%,31.0,11247.0,479.0,1.0,479.0
75%,50.0,16671.0,1192.0,1.0,1199.0
max,59.0,22162.0,27990.0,10.0,27990.0
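
The summary above shows a minimum `item_cnt_day` of -1.0 and a minimum `item_sold` of -3990.0, so the file contains rows with negative quantities. What they represent is not stated in the data; assuming they are returns that should be left out of the sales aggregates, and that exact duplicate rows are loading errors, a cleaning step could be sketched as follows. The function name `clean_sales` and the sample rows are hypothetical.

```python
import pandas as pd

def clean_sales(df):
    """Return a cleaned copy: exact duplicates and non-positive quantities or prices removed."""
    cleaned = df.drop_duplicates()  # assumption: identical rows are loading errors
    keep = (cleaned["item_cnt_day"] > 0) & (cleaned["item_price"] > 0)
    return cleaned[keep].reset_index(drop=True)

# Made-up rows: one exact duplicate and one negative quantity.
sample = pd.DataFrame({
    "shop_id": [28, 28, 28, 29],
    "item_id": [21364, 21364, 21365, 1469],
    "item_price": [479.0, 479.0, 999.0, 3990.0],
    "item_cnt_day": [1.0, 1.0, 2.0, -1.0],
})

cleaned = clean_sales(sample)
print(cleaned)
```

If two identical rows can be legitimate separate sales, the `drop_duplicates` call should be removed; likewise, returns could be kept when net revenue is the quantity of interest.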
