# Mandatory Challenge
## Context
You work in the data analysis team of a very important company. On Monday, the company shares some good news with you: you just got hired by a major retail company! So, let's get prepared for a huge amount of work!

Then you get to work with your team and define the following tasks to perform:   
1. You need to start your analysis using data from the past.  
2. You need to define a process that takes your daily data as an input and integrates it.  

You are in charge of the second part, so you are provided with a sample file that you will have to read daily. To complete you task, you need the following aggregates:
* One aggregate per store that adds up the rest of the values.
* One aggregate per item that adds up the rest of the values.

You can import the dataset `warehouse_and_retail_sales` from Ironhack's database. 

## Your task
Therefore, your process will consist of the following steps:
1. Read the sample file that a daily process will save in your folder. 
2. Clean up the data.
3. Create the aggregates.
4. Write three tables in your local database: 
    - A table for the cleaned data.
    - A table for the aggregate per supplier.
    - A table for the aggregate per item.

## Instructions
* Read the csv you can find in Ironhack's database.
* Clean the data and create the aggregates as you consider.
* Create the tables in your local database.
* Populate them with your process.

In [147]:
# your code here
#importing libraries and reading csv file
import pandas as pd
import numpy as np

store_df = pd.read_csv('/Users/erinberardi/Downloads/Warehouse_and_Retail_Sales.csv')
store_df



Unnamed: 0,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
0,2017,4,ROYAL WINE CORP,100200,GAMLA CAB - 750ML,WINE,0.00,1.0,0.0
1,2017,4,SANTA MARGHERITA USA INC,100749,SANTA MARGHERITA P/GRIG ALTO - 375ML,WINE,0.00,1.0,0.0
2,2017,4,JIM BEAM BRANDS CO,10103,KNOB CREEK BOURBON 9YR - 100P - 375ML,LIQUOR,0.00,8.0,0.0
3,2017,4,HEAVEN HILL DISTILLERIES INC,10120,J W DANT BOURBON 100P - 1.75L,LIQUOR,0.00,2.0,0.0
4,2017,4,ROYAL WINE CORP,101664,RAMON CORDOVA RIOJA - 750ML,WINE,0.00,4.0,0.0
...,...,...,...,...,...,...,...,...,...
128350,2018,2,ANHEUSER BUSCH INC,9997,HOEGAARDEN 4/6NR - 12OZ,BEER,66.46,59.0,212.0
128351,2018,2,COASTAL BREWING COMPANY LLC,99970,DOMINION OAK BARREL STOUT 4/6 NR - 12OZ,BEER,9.08,7.0,35.0
128352,2018,2,BOSTON BEER CORPORATION,99988,SAM ADAMS COLD SNAP 1/6 KG,KEGS,0.00,0.0,32.0
128353,2018,2,,BC,BEER CREDIT,REF,0.00,0.0,-35.0


In [148]:
#finding null values
store_df.isnull()
null_cols = store_df.isnull().sum()
null_cols[null_cols >0]

#null_cols

#saving returned items to separate DF 
returns = store_df[(store_df['SUPPLIER'].isnull()==True)].copy()

#changing null_SUPPLIER to Returned item as credit
returns['SUPPLIER']= returns['SUPPLIER'].fillna('Returned Item')

#returns

#deleting rows of null SUPPLIER.

store_df = store_df[store_df['SUPPLIER'].notna()]
store_df



Unnamed: 0,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
0,2017,4,ROYAL WINE CORP,100200,GAMLA CAB - 750ML,WINE,0.00,1.0,0.0
1,2017,4,SANTA MARGHERITA USA INC,100749,SANTA MARGHERITA P/GRIG ALTO - 375ML,WINE,0.00,1.0,0.0
2,2017,4,JIM BEAM BRANDS CO,10103,KNOB CREEK BOURBON 9YR - 100P - 375ML,LIQUOR,0.00,8.0,0.0
3,2017,4,HEAVEN HILL DISTILLERIES INC,10120,J W DANT BOURBON 100P - 1.75L,LIQUOR,0.00,2.0,0.0
4,2017,4,ROYAL WINE CORP,101664,RAMON CORDOVA RIOJA - 750ML,WINE,0.00,4.0,0.0
...,...,...,...,...,...,...,...,...,...
128348,2018,2,LEGENDS LTD,99753,DUTCHESS DE BOURGOGNE NR - 750ML,BEER,0.00,0.0,2.0
128349,2018,2,COASTAL BREWING COMPANY LLC,99813,DOMINION OAK BARREL STOUT 1/2K,KEGS,0.00,0.0,2.0
128350,2018,2,ANHEUSER BUSCH INC,9997,HOEGAARDEN 4/6NR - 12OZ,BEER,66.46,59.0,212.0
128351,2018,2,COASTAL BREWING COMPANY LLC,99970,DOMINION OAK BARREL STOUT 4/6 NR - 12OZ,BEER,9.08,7.0,35.0


In [196]:
#change data types for year and month so they are not used in stats

store_df['YEAR']=store_df['YEAR'].astype('object')
store_df['MONTH']=store_df['MONTH'].astype('object')

#store_df.describe()
stats=store_df.describe().transpose()
stats['IQR'] = stats['75%'] - stats['25%']
stats


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Unnamed: 0,count,mean,std,min,25%,50%,75%,max,IQR
RETAIL SALES,128331.0,6.564265,28.92751,-6.49,0.0,0.33,3.25,1616.6,3.25
RETAIL TRANSFERS,128331.0,7.189505,30.642863,-27.66,0.0,0.0,4.0,1587.99,4.0
WAREHOUSE SALES,128331.0,22.68151,239.573627,-4996.0,0.0,1.0,4.0,16271.75,4.0


In [189]:
outliers = pd.DataFrame(columns=store_df.columns)

for col in stats.index:
    iqr = stats.at[col,'IQR']
    cutoff = iqr * 2
    lower = stats.at[col,'25%'] - cutoff
    upper = stats.at[col,'75%'] + cutoff
    results = store_df[(store_df[col] < lower) | 
                   (store_df[col] > upper)].copy()
    results['Outlier'] = col
    outliers = outliers.append(results)
outliers

outliers.shape

(75480, 10)

In [203]:
#creating Supplier aggregate.

supplier_agg = store_df.groupby(['SUPPLIER'])['RETAIL SALES','RETAIL TRANSFERS','WAREHOUSE SALES'].agg('sum')
#supplier_agg

#creating Items aggregate
#if mean = 0 delete from aggregate.
item_retail_nonzero = store_df.loc[store_df['RETAIL SALES'] != 0].groupby('ITEM TYPE').mean()
item_warehouse_nonzero = store_df.loc[store_df['WAREHOUSE SALES'] != 0].groupby('ITEM TYPE').mean()
item_transfer_nonzero = store_df.loc[store_df['RETAIL TRANSFERS'] != 0].groupby('ITEM TYPE').mean()

items_agg = pd.concat([item_retail_nonzero,item_warehouse_nonzero,item_transfer_nonzero], axis=1)

#,'RETAIL TRANSFERS','WAREHOUSE SALES'
# want to delete if mean is 0 .
#items_agg.head

print('I was trying to get three df of RETAIL SALES, WAREHOUSE SALES, and RETAIL TRANSFERS without zeros')
print('but i am at a loss AGAIN.  I am also not sure how to best google what I want to do')

I was trying to get three df of RETAIL SALES, WAREHOUSE SALES, and RETAIL TRANSFERS without zeros
but i am at a loss AGAIN.  I am also not sure how to best google what I want to do


  This is separate from the ipykernel package so we can avoid doing imports until


In [200]:
#print (store_df)
print('This does not look like what I was aiming at. I am at a loss.')
print ('\n')
print (supplier_agg)
print (items_agg)



This does not look like what I was aiming at. I am at a loss.


                            RETAIL SALES  RETAIL TRANSFERS  WAREHOUSE SALES
SUPPLIER                                                                   
8 VINI INC                          2.78              2.00             1.00
A HARDY USA LTD                     0.40              0.00             0.00
A I G WINE & SPIRITS               12.52              5.92           134.00
A VINTNERS SELECTIONS            8640.57           8361.10         29776.67
A&E INC                            11.52              2.00             0.00
...                                  ...               ...              ...
WINEBOW INC                         1.24             -1.58             0.00
YOUNG WON TRADING INC            1058.65           1047.40          2528.90
YUENGLING BREWERY                9628.35          10851.17         53805.32
Z WINE GALLERY IMPORTS LLC          8.83             11.25            16.00
ZURENA LLC              