### Importing required libraries ###

In [248]:
%matplotlib notebook

In [249]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [250]:
df = pd.read_csv('data.csv')

In [251]:
df.head()

Unnamed: 0,order_id,shop_id,user_id,order_amount,total_items,payment_method,created_at
0,1,53,746,224,2,cash,3/13/2017 12:36
1,2,92,925,90,1,cash,3/3/2017 17:38
2,3,44,861,144,1,cash,3/14/2017 4:23
3,4,18,935,156,1,credit_card,3/26/2017 12:43
4,5,18,883,156,1,credit_card,3/1/2017 4:35


The average price per sneaker should be calculated as the total order_amount divided by the total number of items sold (the sum of total_items), rather than as the mean of order_amount per order.

In [252]:
avg_price = df['order_amount'].sum() / df['total_items'].sum()
print(f"The average price per sneaker is {avg_price:.2f} $")

The average price per sneaker is 357.92 $
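The difference between the two averages is easier to see on a tiny example. The sketch below uses a made-up table (the `toy` DataFrame and its values are hypothetical, not taken from data.csv) in which every pair costs 150 $ but one order is a bulk purchase.

```python
import pandas as pd

# Hypothetical toy orders (not from data.csv): two single-pair orders
# and one bulk order of 100 pairs, all priced at 150 $ per pair.
toy = pd.DataFrame({
    "order_amount": [150, 150, 15000],
    "total_items": [1, 1, 100],
})

# Mean order value: pulled upward by the single bulk order.
mean_order_value = toy["order_amount"].mean()

# Price per item: total revenue divided by total items sold.
price_per_item = toy["order_amount"].sum() / toy["total_items"].sum()

print(f"mean order value: {mean_order_value:.2f} $")
print(f"price per item:   {price_per_item:.2f} $")
```

The mean order value says little about what a sneaker costs when order sizes vary this much, which is why the ratio of sums is used above.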


In order to get more information, we should take a look at each store individually.

In [253]:
df.groupby('shop_id', as_index=False).sum(numeric_only=True).sort_values('order_amount', ascending=False)

Unnamed: 0,shop_id,order_id,user_id,order_amount,total_items
41,42,124538,38688,11990176,34063
77,78,122499,39916,2263800,88
88,89,172859,50618,23128,118
80,81,158452,49317,22656,128
5,6,143483,49818,22627,121
...,...,...,...,...,...
1,2,126448,47370,9588,102
99,100,94261,34093,8547,77
55,56,91134,31211,8073,69
31,32,96561,35986,7979,79


In [254]:
df.groupby('shop_id', as_index=False).sum(numeric_only=True).sort_values('total_items', ascending=False)

Unnamed: 0,shop_id,order_id,user_id,order_amount,total_items
41,42,124538,38688,11990176,34063
12,13,151982,54549,21760,136
83,84,137231,50177,20196,132
70,71,172484,56442,21320,130
52,53,162752,58381,14560,130
...,...,...,...,...,...
37,38,92390,29323,13680,72
15,16,109591,34778,11076,71
43,44,116528,32603,10224,71
55,56,91134,31211,8073,69


Based on total_items, store **#42** looks like a wholesale store: it sold 34063 items, while the next-largest store sold 136.
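One way to probe the wholesale hypothesis further would be to compare items per order across shops. The sketch below is only an illustration on a made-up table: the `toy` values and the derived column names (`orders`, `items`, `largest_order`, `items_per_order`) are assumptions, not part of the original analysis.

```python
import pandas as pd

# Hypothetical orders (not from data.csv): shop 1 sells a pair or two
# per order, shop 2 receives occasional bulk orders.
toy = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "shop_id": [1, 1, 2, 2, 2],
    "total_items": [1, 2, 2000, 1, 2000],
})

# Named aggregation: order count, item total and largest order per shop.
per_shop = toy.groupby("shop_id").agg(
    orders=("order_id", "count"),
    items=("total_items", "sum"),
    largest_order=("total_items", "max"),
)
per_shop["items_per_order"] = per_shop["items"] / per_shop["orders"]

print(per_shop.sort_values("items_per_order", ascending=False))
```

A shop whose `items_per_order` or `largest_order` is far above the rest would be a candidate for the same treatment as store 42.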

Next, we look at the average price per store:

In [255]:
df_per_stor = df.groupby('shop_id', as_index=False).sum(numeric_only=True)

In [256]:
df_per_stor['average_price'] = df_per_stor['order_amount']/df_per_stor['total_items']

In [257]:
df_per_stor.sort_values ('average_price', ascending=False)

Unnamed: 0,shop_id,order_id,user_id,order_amount,total_items,average_price
77,78,122499,39916,2263800,88,25725.0
41,42,124538,38688,11990176,34063,352.0
11,12,135437,44755,18693,93,201.0
88,89,172859,50618,23128,118,196.0
98,99,128844,45693,18330,94,195.0
...,...,...,...,...,...,...
52,53,162752,58381,14560,130,112.0
99,100,94261,34093,8547,77,111.0
31,32,96561,35986,7979,79,101.0
1,2,126448,47370,9588,102,94.0


Based on the average price per store, store **#78**, with an average price of **25725 $** per item, appears to be a luxury store.

If we drop these two stores (the luxury store and the wholesale store), we get a better estimate of the average price at typical stores, computed as the mean of the per-store average prices:

In [258]:
round(df_per_stor[(df_per_stor['shop_id'] !=42) & (df_per_stor['shop_id'] !=78)]['average_price'].mean(),2)

150.22
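As a cross-check that does not require naming the outlier stores by hand, the median of the per-store averages could be used. This is a different technique from the filtering above, shown here only as a sketch. The values are the nine average_price rows displayed in the table above (not all stores), so the numbers are illustrative rather than a result for the full dataset.

```python
import pandas as pd

# average_price for the nine shops visible in the output above
# (shop_id -> average price); the remaining shops are elided there.
visible = pd.Series(
    [25725.0, 352.0, 201.0, 196.0, 195.0, 112.0, 111.0, 101.0, 94.0],
    index=[78, 42, 12, 89, 99, 53, 100, 32, 2],
    name="average_price",
)

mean_price = visible.mean()      # dominated by shop 78
median_price = visible.median()  # barely affected by shop 78

print(f"mean:   {mean_price:.2f} $")
print(f"median: {median_price:.2f} $")
```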

## Exploratory data analysis

We can use the created_at column to find out how many items were sold per day.

First, we need to convert the column to datetime format. Then we can extract the day of the month and the day of the week from it.

In [259]:
df['created_at']=pd.to_datetime(df['created_at'])

In [260]:
df.head()

Unnamed: 0,order_id,shop_id,user_id,order_amount,total_items,payment_method,created_at
0,1,53,746,224,2,cash,2017-03-13 12:36:00
1,2,92,925,90,1,cash,2017-03-03 17:38:00
2,3,44,861,144,1,cash,2017-03-14 04:23:00
3,4,18,935,156,1,credit_card,2017-03-26 12:43:00
4,5,18,883,156,1,credit_card,2017-03-01 04:35:00


In [262]:
df['DayofMonth'] = df['created_at'].dt.day

In [263]:
df['DayofWeek'] = df['created_at'].dt.day_name()

In [265]:
df.head()

Unnamed: 0,order_id,shop_id,user_id,order_amount,total_items,payment_method,created_at,DayofMonth,DayofWeek
0,1,53,746,224,2,cash,2017-03-13 12:36:00,13,Monday
1,2,92,925,90,1,cash,2017-03-03 17:38:00,3,Friday
2,3,44,861,144,1,cash,2017-03-14 04:23:00,14,Tuesday
3,4,18,935,156,1,credit_card,2017-03-26 12:43:00,26,Sunday
4,5,18,883,156,1,credit_card,2017-03-01 04:35:00,1,Wednesday
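The conversion and extraction steps above can also be sketched with an explicit format string, which avoids relying on pandas to infer the date layout. The five strings are the created_at values shown in the first `df.head()` output; the format `%m/%d/%Y %H:%M` is an assumption based on how those values look.

```python
import pandas as pd

# The five created_at strings from the first df.head() output.
raw = pd.Series([
    "3/13/2017 12:36",
    "3/3/2017 17:38",
    "3/14/2017 4:23",
    "3/26/2017 12:43",
    "3/1/2017 4:35",
])

# Assumed layout: month/day/year, 24-hour clock, no seconds.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %H:%M")

day_of_month = parsed.dt.day.tolist()
day_of_week = parsed.dt.day_name().tolist()

print(day_of_month)
print(day_of_week)
```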


With the wholesale store removed, total_items per day can be plotted.

In [298]:
# Exclude the wholesale store (shop_id 42), then sum total_items per day of the month
sns.barplot(x='DayofMonth', y='total_items',
            data=df[df['shop_id'] != 42].groupby('DayofMonth', as_index=False)['total_items'].sum())
plt.title("total items per day of the month")

Text(0.5, 1.0, 'total items per day of the month')

In [319]:
# Exclude the wholesale store (shop_id 42), then sum total_items per day of the week
sns.barplot(x='DayofWeek', y='total_items',
            data=df[df['shop_id'] != 42].groupby('DayofWeek', sort=False, as_index=False)['total_items'].sum())
plt.title("total items per day of the week")

Text(0.5, 1.0, 'total items per day of the week')
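Because the grouping uses `sort=False`, the weekdays appear in the order they first occur in the data rather than in calendar order. One possible way to get a Monday-to-Sunday axis is an ordered categorical, sketched below on made-up totals (the `toy` values and the `week_order` list are illustrative assumptions).

```python
import pandas as pd

# Hypothetical per-weekday totals, listed in an arbitrary order.
toy = pd.DataFrame({
    "DayofWeek": ["Friday", "Monday", "Sunday", "Wednesday",
                  "Tuesday", "Saturday", "Thursday"],
    "total_items": [5, 3, 8, 2, 4, 7, 6],
})

week_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday"]

# An ordered categorical sorts by calendar position, not alphabetically.
toy["DayofWeek"] = pd.Categorical(toy["DayofWeek"],
                                  categories=week_order, ordered=True)
ordered = toy.sort_values("DayofWeek").reset_index(drop=True)

print(ordered)
```

Alternatively, seaborn's `barplot` accepts an `order` argument, so passing `order=week_order` to the call above should give the same axis order without changing the data.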