# StockX

* **Data:** `StockX-Data-Contest-2019-3.xlsx`
* **Description:** Adidas and Nike shoes are bought at retail price and then sold on platforms like StockX for a higher resale price. This data is from [StockX](https://stockx.com/).
* **Source:** https://stockx.com/news/the-2019-data-contest/
* **Columns of interest:**
    * `Order Date` is when the resale order was completed
    * `Brand` is the name of the company producing the shoe
    * `Sneaker Name` is the name of the shoe itself
    * `Sale Price` is how much the shoe went for at resale
    * `Retail Price` is how much the shoe originally cost
    * `Release Date` is when the shoe was originally released
    * `Shoe Size` is the size of the shoe being sold
    * `Buyer Region` is where the shoe's buyer is located

This dataset is topical due to the passing of [Virgil Abloh, founder of Off-White](https://www.newyorker.com/culture/postscript/the-remarkable-life-of-virgil-abloh).

In [5]:
import pandas as pd
pd.set_option("display.max_columns", None)
pd.set_option("display.float_format", '{:,}'.format)

df = pd.read_excel("StockX-Data-Contest-2019-3.xlsx", sheet_name="Raw Data")
df.head()

Unnamed: 0,Order Date,Brand,Sneaker Name,Sale Price,Retail Price,Release Date,Shoe Size,Buyer Region
0,2017-09-01,Yeezy,Adidas-Yeezy-Boost-350-Low-V2-Beluga,1097.0,220,2016-09-24,11.0,California
1,2017-09-01,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Copper,685.0,220,2016-11-23,11.0,California
2,2017-09-01,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Green,690.0,220,2016-11-23,11.0,California
3,2017-09-01,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Red,1075.0,220,2016-11-23,11.5,Kentucky
4,2017-09-01,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Red-2017,828.0,220,2017-02-11,11.0,Rhode Island


## What brand had more sales?

Yeezy outsold Off-White, with 72,162 sales to 27,794. Yeezy is partnered with Adidas; Off-White is partnered with Nike.

In [6]:
df.Brand.value_counts()

 Yeezy       72162
Off-White    27794
Name: Brand, dtype: int64

## What's the most common shoe size sold?

Size 10 is the most common shoe size sold, capturing 11% of shoe sales.


In [10]:

df['Shoe Size'].value_counts(normalize=True) * 100

10.0      11.09788306855016
9.0        9.71027251990876
11.0      9.255072231781984
10.5      8.787866661330986
9.5       8.688823082156148
12.0      7.300212093321061
8.0       5.365360758733843
8.5       5.302333026531674
13.0       4.60402577133939
11.5      4.503981751970867
6.0       4.014766497258794
7.0       3.868702228980751
5.0       3.578574572812037
7.5        2.66517267597743
5.5       2.622153747648966
4.0       2.241986474048581
6.5      2.2199767897875065
14.0     1.7917883868902318
4.5      1.3045740125655287
12.5     0.6282764416343192
13.5    0.14706470847172756
15.0     0.1300572251790788
14.5    0.08403697626955861
16.0     0.0790347753011325
17.0   0.004001760774740886
3.5    0.004001760774740886
Name: Shoe Size, dtype: float64

## What was the median difference between the resale price and the retail price?
The median difference between the resale price and the retail price was $154.

In [11]:
df['profit'] = df['Sale Price'] - df['Retail Price']
df.head()


Unnamed: 0,Order Date,Brand,Sneaker Name,Sale Price,Retail Price,Release Date,Shoe Size,Buyer Region,profit
0,2017-09-01,Yeezy,Adidas-Yeezy-Boost-350-Low-V2-Beluga,1097.0,220,2016-09-24,11.0,California,877.0
1,2017-09-01,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Copper,685.0,220,2016-11-23,11.0,California,465.0
2,2017-09-01,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Green,690.0,220,2016-11-23,11.0,California,470.0
3,2017-09-01,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Red,1075.0,220,2016-11-23,11.5,Kentucky,855.0
4,2017-09-01,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Red-2017,828.0,220,2017-02-11,11.0,Rhode Island,608.0


In [12]:
df.profit.median()

154.0

## What were the total sales (in dollars) to Illinois, New York, and California?
These three states produced $18,194,048.51 in sales combined.

In [19]:
regions=['Illinois', 'New York', 'California']
df[df['Buyer Region'].isin(regions)]['Sale Price'].sum()

18194048.5132
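
The combined figure hides how much each state contributes. As a minimal sketch, the same `isin` filter can be followed by a `groupby` to get one total per region before summing. The rows below are made up for illustration and are not taken from the StockX file.

```python
import pandas as pd

# Toy data, made up for illustration; not taken from the StockX file.
toy = pd.DataFrame({
    "Buyer Region": ["Illinois", "New York", "California", "Texas", "New York"],
    "Sale Price": [300.0, 450.0, 500.0, 250.0, 550.0],
})

regions = ["Illinois", "New York", "California"]
selected = toy[toy["Buyer Region"].isin(regions)]

# One total per region, then the combined total across the three regions.
per_region = selected.groupby("Buyer Region")["Sale Price"].sum()
combined = per_region.sum()

print(per_region)
print(combined)
```

The Texas row is dropped by the filter, so it does not appear in either result.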

## What were the total sales (in dollars) of shoes sized 10, 11 and 12?
Shoes in sizes 10, 11, and 12 (half sizes excluded) accounted for $13,030,998.74 in sales.

In [24]:

df[df['Shoe Size'].isin([10, 11, 12])]['Sale Price'].sum()

13030998.7429
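
Note that `isin([10, 11, 12])` matches only those three exact sizes, so 10.5 and 11.5 are left out. If the whole range were wanted instead, `between` would be one option. A small sketch on made-up rows (not from the StockX file) shows the difference:

```python
import pandas as pd

# Toy data, made up for illustration; not taken from the StockX file.
toy = pd.DataFrame({
    "Shoe Size": [10.0, 10.5, 11.0, 11.5, 12.0, 13.0],
    "Sale Price": [100.0, 200.0, 300.0, 400.0, 500.0, 600.0],
})

# Exact sizes only: half sizes are excluded.
whole_sizes = toy[toy["Shoe Size"].isin([10, 11, 12])]["Sale Price"].sum()

# Inclusive range: half sizes between 10 and 12 are included.
size_range = toy[toy["Shoe Size"].between(10, 12)]["Sale Price"].sum()

print(whole_sizes, size_range)
```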

## What sneakers sold, on average, for the highest sale price?
The Air Jordan 1 Retro High Off-White White had the highest average sale price, at about $1,826.

In [27]:
df.groupby('Sneaker Name')['Sale Price'].mean().sort_values(ascending=False).head(3)

Sneaker Name
Air-Jordan-1-Retro-High-Off-White-White     1,826.0688936102238
Air-Jordan-1-Retro-High-Off-White-Chicago               1,769.8
Adidas-Yeezy-Boost-350-Low-Turtledove       1,531.6617647058824
Name: Sale Price, dtype: float64
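
An average can be driven by a handful of sales, so it may help to look at the number of sales next to each mean. The sketch below, on made-up rows with hypothetical sneaker names, uses `agg` to compute both at once:

```python
import pandas as pd

# Toy data with hypothetical sneaker names; not taken from the StockX file.
toy = pd.DataFrame({
    "Sneaker Name": ["Shoe-A", "Shoe-A", "Shoe-B", "Shoe-C", "Shoe-C", "Shoe-C"],
    "Sale Price": [1000.0, 1200.0, 1500.0, 400.0, 500.0, 600.0],
})

# Mean sale price and number of sales per sneaker, highest mean first.
summary = (
    toy.groupby("Sneaker Name")["Sale Price"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)

print(summary)
```

In this toy example the top-ranked sneaker has a single sale, which is the kind of case the `count` column makes visible.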

## How many shoes in the dataset were produced by Nike?

Nike produced 27,794 of the shoes in the dataset, including Jordans. This matches the Off-White count above.

In [36]:
df[df['Sneaker Name'].str.contains("Nike") | df['Sneaker Name'].str.contains("Jordan")]

Unnamed: 0,Order Date,Brand,Sneaker Name,Sale Price,Retail Price,Release Date,Shoe Size,Buyer Region,profit
128,2017-09-07,Off-White,Nike-Air-Max-90-Off-White,1600.0,160,2017-09-09,8.0,California,1440.0
129,2017-09-07,Off-White,Nike-Air-Max-90-Off-White,1090.0,160,2017-09-09,11.5,New York,930.0
130,2017-09-07,Off-White,Nike-Air-Presto-Off-White,1344.0,160,2017-09-09,10.0,New York,1184.0
131,2017-09-07,Off-White,Nike-Air-Presto-Off-White,1325.0,160,2017-09-09,10.0,Massachusetts,1165.0
132,2017-09-07,Off-White,Nike-Air-VaporMax-Off-White,1800.0,250,2017-09-09,12.0,Kentucky,1550.0
...,...,...,...,...,...,...,...,...,...
99869,2019-02-13,Off-White,Nike-Zoom-Fly-Off-White-Pink,265.0,170,2018-11-28,11.0,New York,95.0
99870,2019-02-13,Off-White,Nike-Zoom-Fly-Off-White-Pink,331.0,170,2018-11-28,4.0,California,161.0
99871,2019-02-13,Off-White,Nike-Zoom-Fly-Off-White-Pink,405.0,170,2018-11-28,6.0,New York,235.0
99872,2019-02-13,Off-White,Nike-Zoom-Fly-Off-White-Pink,263.0,170,2018-11-28,10.0,Maryland,93.0


In [46]:
df[df['Sneaker Name'].str.contains("Nike") | df['Sneaker Name'].str.contains("Jordan")].shape
## 27,794 shoes were produced by Nike

(27794, 9)
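
If only the count is needed, one alternative is to sum the boolean mask directly, since `True` counts as 1. The sketch below also folds the two conditions into a single pattern, which works because `str.contains` treats its argument as a regular expression by default. The four names are copied from rows shown in this notebook; the short series itself is just an illustration.

```python
import pandas as pd

# A few sneaker names as they appear in the outputs above.
names = pd.Series(
    [
        "Nike-Air-Max-90-Off-White",
        "Air-Jordan-1-Retro-High-Off-White-Chicago",
        "Adidas-Yeezy-Boost-350-V2-Zebra",
        "Nike-Zoom-Fly-Off-White-Pink",
    ],
    name="Sneaker Name",
)

# "Nike|Jordan" matches names containing either word.
is_nike = names.str.contains("Nike|Jordan")

# Summing a boolean mask counts the True values.
nike_count = int(is_nike.sum())
print(nike_count)
```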

## How many shoes in the data set were produced by Adidas?
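A case-sensitive match on "Adidas" in `Sneaker Name` finds 52,572 shoes. This is lower than the 72,162 Yeezy sales counted above, even though Yeezy is partnered with Adidas, so the name match likely misses some Adidas shoes (for example, names with different capitalization).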

In [43]:
df[df['Sneaker Name'].str.contains("Adidas")]

Unnamed: 0,Order Date,Brand,Sneaker Name,Sale Price,Retail Price,Release Date,Shoe Size,Buyer Region,profit
0,2017-09-01,Yeezy,Adidas-Yeezy-Boost-350-Low-V2-Beluga,1097.0,220,2016-09-24,11.0,California,877.0
1,2017-09-01,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Copper,685.0,220,2016-11-23,11.0,California,465.0
2,2017-09-01,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Green,690.0,220,2016-11-23,11.0,California,470.0
3,2017-09-01,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Red,1075.0,220,2016-11-23,11.5,Kentucky,855.0
4,2017-09-01,Yeezy,Adidas-Yeezy-Boost-350-V2-Core-Black-Red-2017,828.0,220,2017-02-11,11.0,Rhode Island,608.0
...,...,...,...,...,...,...,...,...,...
99732,2019-02-13,Yeezy,Adidas-Yeezy-Boost-350-V2-Zebra,344.0,220,2017-02-25,10.0,Delaware,124.0
99733,2019-02-13,Yeezy,Adidas-Yeezy-Boost-350-V2-Zebra,341.0,220,2017-02-25,12.0,Oregon,121.0
99734,2019-02-13,Yeezy,Adidas-Yeezy-Boost-350-V2-Zebra,345.0,220,2017-02-25,9.0,California,125.0
99735,2019-02-13,Yeezy,Adidas-Yeezy-Boost-350-V2-Zebra,321.0,220,2017-02-25,8.0,California,101.0


In [45]:
df[df['Sneaker Name'].str.contains("Adidas")].shape
## 52,572 shoes match "Adidas" in this data set

(52572, 9)
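
`str.contains` is case-sensitive by default, so a name spelled with different capitalization would not be matched. A minimal sketch of a case-insensitive match follows. The lowercase name in it is hypothetical and is included only to show the effect; whether such names occur in the real file would need to be checked.

```python
import pandas as pd

# The second name is hypothetical, added only to show the effect of casing.
names = pd.Series(
    [
        "Adidas-Yeezy-Boost-350-V2-Zebra",
        "adidas-Yeezy-Boost-350-V2-Example",
        "Nike-Air-Presto-Off-White",
    ],
    name="Sneaker Name",
)

# Default behaviour: only the exact capitalization matches.
case_sensitive = int(names.str.contains("Adidas").sum())

# case=False ignores capitalization.
case_insensitive = int(names.str.contains("adidas", case=False).sum())

print(case_sensitive, case_insensitive)
```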

## What are the top 3 months for buying shoes? (This is order date, not release date)

The top three months are December, November, and January. People may like to buy shoes for Christmas, or with money they received at Christmas.

In [50]:
df['Order Date'].dt.month_name().value_counts()
## The top 3 months for purchasing shoes are December, November, and January

December     22292
November     15489
January      14511
February      7774
July          7434
October       7307
August        6090
June          5431
September     4671
May           3456
April         2756
March         2745
Name: Order Date, dtype: int64
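
Because `month_name()` discards the year, orders from the same month in different years are pooled together. The sketch below, on made-up dates, shows this pooling and uses `head(3)` to keep only the top three months:

```python
import pandas as pd

# Toy order dates, made up for illustration; not taken from the StockX file.
dates = pd.Series(
    pd.to_datetime(
        [
            "2018-12-01", "2018-12-15", "2017-12-20", "2018-12-25",
            "2018-11-02", "2018-11-23", "2017-11-24",
            "2019-01-05", "2018-01-10",
            "2018-07-04",
        ]
    ),
    name="Order Date",
)

# value_counts sorts by frequency, so head(3) gives the three busiest months.
# December 2017 and December 2018 are counted together.
top3 = dates.dt.month_name().value_counts().head(3)
print(top3)
```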

## What month had the most total money spent on the shoes in this dataset?

The most money was spent on shoes in December 2018.

In [None]:
#zooming out 
#focusing on data from just 2018

In [55]:
# The most money was spent on sneakers in December 2018.
df.resample('M', on='Order Date')['Sale Price'].sum().sort_values(ascending=False).head(5)

Order Date
2018-12-31   5,068,067.6894
2019-01-31   4,029,846.2624
2018-11-30   3,785,401.2927
2017-12-31      3,211,053.0
2018-08-31      3,162,458.0
Name: Sale Price, dtype: float64
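
Unlike the month-name count in the previous question, this total keeps the year, so December 2017 and December 2018 are separate rows. An equivalent way to build the same monthly totals is to group by a monthly period and take `idxmax` for the single best month. This is a sketch on made-up rows, not the notebook's own method:

```python
import pandas as pd

# Toy data, made up for illustration; not taken from the StockX file.
toy = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2017-12-05", "2017-12-20", "2018-11-10",
         "2018-12-01", "2018-12-24", "2019-01-03"]
    ),
    "Sale Price": [300.0, 200.0, 400.0, 500.0, 350.0, 450.0],
})

# Group by year-month period so the same month in different years stays separate.
monthly = toy.groupby(toy["Order Date"].dt.to_period("M"))["Sale Price"].sum()

# idxmax returns the period with the largest total.
best_month = monthly.idxmax()
print(best_month, monthly.max())
```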