# Mandatory Challenge
## Context
You work on the data analysis team of a very important company. On Monday, the company shares some good news with you: it has just been hired by a major retail company! So get ready for a huge amount of work.

Then you get to work with your team and define the following tasks to perform:   
1. You need to start your analysis using data from the past.  
2. You need to define a process that takes your daily data as an input and integrates it.  

You are in charge of the second part, so you are provided with a sample file that you will have to read daily. To complete your task, you need the following aggregates:
* One aggregate per supplier that sums the numeric value columns.
* One aggregate per item that sums the numeric value columns.

You can import the dataset `warehouse_and_retail_sales` from Ironhack's database. 

## Your task
Therefore, your process will consist of the following steps:
1. Read the sample file that a daily process will save in your folder. 
2. Clean up the data.
3. Create the aggregates.
4. Write three tables in your local database: 
    - A table for the cleaned data.
    - A table for the aggregate per supplier.
    - A table for the aggregate per item.
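
The aggregation and writing steps above could be sketched as follows. This is only a minimal illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in for the local database, the table names `sales_clean`, `sales_by_supplier` and `sales_by_item` are hypothetical, and the helper functions `build_aggregates` and `write_tables` are not part of the challenge. The column names come from the sample file.

```python
import sqlite3

import pandas as pd

# Numeric columns to add up; names taken from the sample file.
VALUE_COLS = ["RETAIL SALES", "RETAIL TRANSFERS", "WAREHOUSE SALES"]


def build_aggregates(clean):
    """Return (per_supplier, per_item) totals of the numeric columns."""
    per_supplier = clean.groupby("SUPPLIER", as_index=False)[VALUE_COLS].sum()
    per_item = clean.groupby("ITEM CODE", as_index=False)[VALUE_COLS].sum()
    return per_supplier, per_item


def write_tables(clean, per_supplier, per_item, conn):
    """Write the three tables, replacing earlier versions (hypothetical names)."""
    clean.to_sql("sales_clean", conn, if_exists="replace", index=False)
    per_supplier.to_sql("sales_by_supplier", conn, if_exists="replace", index=False)
    per_item.to_sql("sales_by_item", conn, if_exists="replace", index=False)


# Tiny stand-in for the cleaned data (three rows in the shape of the sample file).
sample = pd.DataFrame({
    "SUPPLIER": ["ROYAL WINE CORP", "JIM BEAM BRANDS CO", "ROYAL WINE CORP"],
    "ITEM CODE": ["100200", "10103", "101664"],
    "RETAIL SALES": [0.0, 0.0, 0.0],
    "RETAIL TRANSFERS": [1.0, 8.0, 4.0],
    "WAREHOUSE SALES": [0.0, 0.0, 0.0],
})

# Assumption: SQLite in memory; swap in your own local database connection.
conn = sqlite3.connect(":memory:")
per_supplier, per_item = build_aggregates(sample)
write_tables(sample, per_supplier, per_item, conn)

table_names = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
)
print(table_names)
```

Note that `groupby` drops rows whose grouping key is missing by default, so rows without a supplier would not appear in the per-supplier table unless they are handled during cleaning.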

## Instructions
* Read the CSV file you can find in Ironhack's database.
* Clean the data and create the aggregates as you see fit.
* Create the tables in your local database.
* Populate them with your process.

In [3]:
import pandas as pd
import numpy as np
wrs = pd.read_csv('Warehouse_and_Retail_Sales.csv')
wrs.head()

Unnamed: 0,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
0,2017,4,ROYAL WINE CORP,100200,GAMLA CAB - 750ML,WINE,0.0,1.0,0.0
1,2017,4,SANTA MARGHERITA USA INC,100749,SANTA MARGHERITA P/GRIG ALTO - 375ML,WINE,0.0,1.0,0.0
2,2017,4,JIM BEAM BRANDS CO,10103,KNOB CREEK BOURBON 9YR - 100P - 375ML,LIQUOR,0.0,8.0,0.0
3,2017,4,HEAVEN HILL DISTILLERIES INC,10120,J W DANT BOURBON 100P - 1.75L,LIQUOR,0.0,2.0,0.0
4,2017,4,ROYAL WINE CORP,101664,RAMON CORDOVA RIOJA - 750ML,WINE,0.0,4.0,0.0


In [4]:
null_cols = wrs.isnull().sum()
null_cols[null_cols > 0]

SUPPLIER     24
ITEM TYPE     1
dtype: int64
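
One possible way to handle these missing labels, sketched under assumptions: rather than dropping the rows, fill `SUPPLIER` and `ITEM TYPE` with a placeholder so the sales values are kept. The placeholder `"UNKNOWN"` and the helper `fill_missing_labels` are hypothetical choices, and the two sample rows are modeled on rows that appear in the outlier listing of this notebook (including a `BEER CREDIT` row without a supplier).

```python
import numpy as np
import pandas as pd

# Text columns that the null check reported as incomplete.
LABEL_COLS = ["SUPPLIER", "ITEM TYPE"]


def fill_missing_labels(df, placeholder="UNKNOWN"):
    """Return a copy of df with missing labels replaced by a placeholder."""
    out = df.copy()
    out[LABEL_COLS] = out[LABEL_COLS].fillna(placeholder)
    return out


# Small stand-in frame; the second row has no supplier.
sample = pd.DataFrame({
    "SUPPLIER": ["LEGENDS LTD", np.nan],
    "ITEM TYPE": ["KEGS", "REF"],
    "WAREHOUSE SALES": [2.0, -35.0],
})

cleaned = fill_missing_labels(sample)
print(cleaned)
```

Whether a placeholder or dropping the rows is the better choice depends on how the aggregates will be used; with only 25 affected cells, either is defensible.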

In [5]:
wrs.dtypes

YEAR                  int64
MONTH                 int64
SUPPLIER             object
ITEM CODE            object
ITEM DESCRIPTION     object
ITEM TYPE            object
RETAIL SALES        float64
RETAIL TRANSFERS    float64
WAREHOUSE SALES     float64
dtype: object

In [6]:
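low_variance = []

# Flag numeric columns whose 90th percentile equals their minimum (nearly constant).
for col in wrs.select_dtypes(include='number').columns:
    minimum = wrs[col].min()
    ninety_perc = np.percentile(wrs[col], 90)
    if ninety_perc == minimum:
        low_variance.append(col)

print(low_variance)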

[]


In [7]:
stats = wrs.describe().transpose()
stats['IQR'] = stats['75%'] - stats['25%']
stats

Unnamed: 0,count,mean,std,min,25%,50%,75%,max,IQR
YEAR,128355.0,2017.20603,0.404454,2017.0,2017.0,2017.0,2017.0,2018.0,0.0
MONTH,128355.0,7.079303,3.645826,1.0,5.0,8.0,10.0,12.0,5.0
RETAIL SALES,128355.0,6.563037,28.924944,-6.49,0.0,0.33,3.25,1616.6,3.25
RETAIL TRANSFERS,128355.0,7.188161,30.640156,-27.66,0.0,0.0,4.0,1587.99,4.0
WAREHOUSE SALES,128355.0,22.624213,239.693277,-4996.0,0.0,1.0,4.0,16271.75,4.0


In [8]:
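# Collect, per numeric column, the rows lying more than 3 * IQR outside the quartiles.
flagged = []

for col in stats.index:
    iqr = stats.at[col,'IQR']
    cutoff = iqr * 3
    upper = stats.at[col,'75%'] + cutoff
    lower = stats.at[col,'25%'] - cutoff
    results = wrs[(wrs[col] < lower) | (wrs[col] > upper)].copy()
    results['Outlier'] = col
    flagged.append(results)

# DataFrame.append was removed in pandas 2.0; pd.concat is the supported replacement.
outliers = pd.concat(flagged)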

outliers

Unnamed: 0,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES,Outlier
4068,2018,2,LEGENDS LTD,99090,BITBURGER 1/2K,KEGS,0.00,0.0,2.0,YEAR
101911,2018,1,REPUBLIC NATIONAL DISTRIBUTING CO,100009,BOOTLEG RED - 750ML,WINE,0.00,0.0,1.0,YEAR
101912,2018,1,INTERBALT PRODUCTS CORP,100012,PAPI P/GRIG - 750ML,WINE,0.00,0.0,1.0,YEAR
101913,2018,1,ROYAL WINE CORP,100080,KEDEM CREAM RED CONCORD - 750ML,WINE,0.00,0.0,1.0,YEAR
101914,2018,1,RELIABLE CHURCHILL LLLP,1001,SAM SMITH ORGANIC PEAR CIDER - 18.7OZ,BEER,0.00,0.0,1.0,YEAR
...,...,...,...,...,...,...,...,...,...,...
128350,2018,2,ANHEUSER BUSCH INC,9997,HOEGAARDEN 4/6NR - 12OZ,BEER,66.46,59.0,212.0,WAREHOUSE SALES
128351,2018,2,COASTAL BREWING COMPANY LLC,99970,DOMINION OAK BARREL STOUT 4/6 NR - 12OZ,BEER,9.08,7.0,35.0,WAREHOUSE SALES
128352,2018,2,BOSTON BEER CORPORATION,99988,SAM ADAMS COLD SNAP 1/6 KG,KEGS,0.00,0.0,32.0,WAREHOUSE SALES
128353,2018,2,,BC,BEER CREDIT,REF,0.00,0.0,-35.0,WAREHOUSE SALES
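
The listing above flags every 2018 row as a `YEAR` outlier because the IQR of `YEAR` is 0, so both fences collapse onto 2017. A possible refinement, sketched under assumptions, is to skip columns whose IQR is zero. The helper `iqr_outliers` and the demo values are hypothetical; the multiplier `k=3.0` mirrors the cutoff used in the cell above.

```python
import pandas as pd


def iqr_outliers(df, cols, k=3.0):
    """Return rows lying more than k * IQR outside the quartiles, per column.

    Columns with a zero IQR are skipped, since any value different from
    the quartiles would otherwise be flagged.
    """
    frames = []
    for col in cols:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        if iqr == 0:
            continue  # degenerate fences; nothing meaningful to flag
        mask = (df[col] < q1 - k * iqr) | (df[col] > q3 + k * iqr)
        frames.append(df[mask].assign(Outlier=col))
    if not frames:
        return df.head(0).assign(Outlier=None)
    return pd.concat(frames)


# Demo data (made up): YEAR is almost constant, WAREHOUSE SALES has one extreme value.
demo = pd.DataFrame({
    "YEAR": [2017] * 9 + [2018],
    "WAREHOUSE SALES": [0, 1, 1, 2, 2, 3, 3, 4, 4, 500],
})

result = iqr_outliers(demo, ["YEAR", "WAREHOUSE SALES"])
print(result)
```

With this demo data only the row with 500 warehouse sales is returned, and the single 2018 row is no longer reported as a `YEAR` outlier.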
