## Import Data

#### Data Documentation:
<br>**Description**: Synthetic dataset from Gap Inc., representing a random sample of individual purchases from Q1 FY2020. Each row is a unique item purchased in an order.
<br><br>

| **Feature** | **Description**    | **Sample Value(s)**  |
| ------- | -----------    | ------------- |
| OrderID | Unique identifier per transaction (7-character) | DRW7C20   |
| CustomerID | Unique identifier per customer (5-character) | KP441   |
| ProductID  | Unique identifier per item (8-digit) | 13-817-239 |
| StoreID | Unique identifier per store (4-digit) | #4176 |
| | | | 
| OrderType | How purchase was completed  | InStore, HomeDelivery, Online |
| Timestamp | Timestamp of transaction (YYYY-MM-DD HH:MM:SS) | 2020-01-18 10:13:56 |
| | | | 
| Brand | Which reporting segment (brand) of Gap Inc. the item was bought from | Banana Republic |
| ItemSize | Size of item | XS, S, M, L, X, XL |
| ProductName | Name of item associated with item identifier | Pink Polo by Kanye |
| Collection | Which part of the store the item belongs to | Denim Shop |
| Price | Listed price of item | $29.95 |
| ClearanceType | Type of clearance | Retail, Clearance, Final Sale |
| DiscountType | Whether Gap Card rewards were used | Reward points, Promotion, GapCash, Other |
| | | | 
| StoreName | Store name (e.g. the mall), or facility where an online order was shipped from | Fair Oaks Mall |
| Location | State of store location | VA |

<br>

**Quick note on IDs**: 

<br>IDs are a really important part of many, if not most, datasets. Each unique *thing*, whether that's a product, a store, or a customer, gets assigned **its own unique identifier**.

This is important in case two stores have the same name (e.g. Gap and Banana Republic at the Fair Oaks Mall). When we group by Store Name, for example, we want to make sure we're not accidentally clumping both stores together, and instead keep the two separate.
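
To make that concrete, here is a minimal sketch on a made-up three-row frame. The rows and store numbers are invented for illustration and are not taken from `gap.csv`. Two different stores share the name "Fair Oaks Mall": grouping by `StoreName` merges them, while grouping by `StoreID` keeps them apart.

```python
import pandas as pd

# Toy data, invented for illustration: two different stores share one mall name.
toy = pd.DataFrame({
    "StoreID": ["#1001", "#1001", "#2002"],
    "StoreName": ["Fair Oaks Mall", "Fair Oaks Mall", "Fair Oaks Mall"],
    "Brand": ["Gap", "Gap", "Banana Republic"],
    "Price": [20.0, 30.0, 100.0],
})

# Grouping by name clumps both stores into a single row.
by_name = toy.groupby("StoreName")["Price"].sum()

# Grouping by ID keeps the two stores separate.
by_id = toy.groupby("StoreID")["Price"].sum()

print(by_name)
print(by_id)
```

The same idea applies to products and customers: when in doubt, group on the ID column and bring the name along only for display.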

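```python
import pandas as pd
import numpy as np
```

```python
# The file is pipe-delimited, so override the default comma separator.
df = pd.read_csv('gap.csv', sep='|')
```

```python
print(df.shape)  # (rows, columns)
df.sample(2)     # two random rows
```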

(4031, 14)

| | OrderID | CustomerID | ProductID | StoreID | OrderType | Timestamp | Brand | ItemSize | ProductName | Collection | Price | ClearanceType | StoreName | Location |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3142 | STZ2QD8 | QD165 | 10-966-490 | #1047 | InStore | 2020-03-11 13:18:31 | Banana Republic | S | Tan Slacks for Serious Press Conference | Men's Bottoms | 138.99 | Clearance | Fashion Centre at Pentagon City | VA |
| 2858 | 0ZT3Z0T | EV890 | 93-552-710 | #4291 | HomeDelivery | 2020-01-11 16:59:41 | Banana Republic | L | Acid-Washed Low-Rise Jeans with LSD-tab-sized ... | Women's Bottoms | 109.99 | Clearance | Potomac Mills | VA |


## Product Trends 

Let's take a look at some product trends. A couple of questions that management wants us to answer:

1. Which products are selling well? Which clothing sizes? Which collections?
2. Take a look at the top 10 best-selling items, or keep a list of them. Are these more likely to be on clearance?
3. Does one brand offer more products than the other?

<br>Functions / tools you'll probably wanna Google or refer to the week-2 notebook for:
- `value_counts()`, `unique()`, `nunique()`
- `groupby(by=)`, `agg(func=)`
- Subsetting dataframes by conditions
- Selecting columns as series
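
As a warm-up, here is a rough sketch of how the tools listed above behave on a tiny made-up frame. The rows are invented for illustration (only the column names follow the data documentation), so treat it as a pattern to adapt to `df`, not as the answer to the questions.

```python
import pandas as pd

# Toy rows, invented for illustration; column names follow the data documentation.
toy = pd.DataFrame({
    "ProductID": ["11-111-111", "11-111-111", "22-222-222", "33-333-333"],
    "Brand": ["Gap", "Gap", "Gap", "Banana Republic"],
    "ItemSize": ["M", "M", "S", "L"],
    "ClearanceType": ["Retail", "Retail", "Clearance", "Retail"],
})

# value_counts(): how many rows (items sold) per size, largest first.
size_counts = toy["ItemSize"].value_counts()

# nunique() after a groupby: number of distinct products per brand.
products_per_brand = toy.groupby(by="Brand")["ProductID"].nunique()

# Subsetting by a condition: keep only the rows for the best-selling product.
top_product = toy["ProductID"].value_counts().index[0]
top_rows = toy[toy["ProductID"] == top_product]

# unique(): which clearance types appear among those rows.
top_clearance = top_rows["ClearanceType"].unique()

print(size_counts)
print(products_per_brand)
print(top_product, top_clearance)
```

For a top-10 list on the real data, `value_counts().head(10)` is one option worth trying.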

## Segment Trends

Take a look at some of the segment trends. A couple of questions that management wants us to answer:

1. Which business segments (aka brands) saw the most sales (in $)? What about by number of orders?
2. Are there any stores competing in the same mall? Which stores are they?
3. Are products from one brand usually more expensive than the other?

<br>Functions / tools you'll probably wanna Google or refer to the week-2 notebook for:
- `unique()`
- `groupby(by=)`, `agg(func=)`
- Subsetting dataframes by conditions
- Selecting columns as series
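
One possible shape for these questions, sketched on invented rows. It assumes `Price` is already numeric and that your pandas version supports named aggregation (keyword arguments to `agg`, available since pandas 0.25). The store numbers and order IDs below are hypothetical.

```python
import pandas as pd

# Toy rows, invented for illustration; two stores sit in the same mall.
toy = pd.DataFrame({
    "OrderID": ["A000001", "A000001", "B000002", "C000003"],
    "StoreID": ["#1001", "#1001", "#2002", "#2002"],
    "StoreName": ["Fair Oaks Mall"] * 4,
    "Brand": ["Gap", "Gap", "Banana Republic", "Banana Republic"],
    "Price": [20.0, 30.0, 100.0, 80.0],
})

# Dollar sales, distinct orders, and average item price per brand in one agg call.
brand_summary = toy.groupby(by="Brand").agg(
    sales=("Price", "sum"),
    orders=("OrderID", "nunique"),
    avg_price=("Price", "mean"),
)

# Malls hosting more than one store: count distinct StoreIDs per StoreName.
stores_per_mall = toy.groupby(by="StoreName")["StoreID"].nunique()
shared_malls = stores_per_mall[stores_per_mall > 1]

print(brand_summary)
print(shared_malls)
```

Counting orders with `nunique` on `OrderID` matters here because each row is an item, so a plain row count would overstate the number of orders.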

## Customer Trends

Let's look at the customers. A couple of questions that management wants us to answer:

1. What sizes do customers usually buy? Make sure not to double-count answers.
2. On average, how much does each customer spend per transaction? Does this differ when broken down by brand?

<br>Functions / tools you'll probably wanna Google or refer to the week-2 notebook for:
- `value_counts()`, `unique()`
- `groupby(by=)`, `agg(func=)`
- Subsetting dataframes by conditions
- Selecting columns as series
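
A hedged sketch of both ideas on invented rows. It reads "double counting" as one customer buying the same size several times, which is only one possible interpretation, and it assumes `Price` is numeric.

```python
import pandas as pd

# Toy rows, invented for illustration: customer AA111 buys three mediums across two orders.
toy = pd.DataFrame({
    "OrderID": ["A000001", "A000001", "B000002", "C000003"],
    "CustomerID": ["AA111", "AA111", "AA111", "BB222"],
    "ItemSize": ["M", "M", "M", "S"],
    "Price": [20.0, 30.0, 10.0, 40.0],
})

# Naive count: every row is a vote, so AA111's three mediums count three times.
raw_counts = toy["ItemSize"].value_counts()

# Count each customer/size pair once before tallying sizes.
pairs = toy.drop_duplicates(subset=["CustomerID", "ItemSize"])
size_votes = pairs["ItemSize"].value_counts()

# Spend per transaction: total each order first, then average those totals per customer.
order_totals = toy.groupby(by=["CustomerID", "OrderID"])["Price"].sum()
avg_per_order = order_totals.groupby(level="CustomerID").mean()

print(raw_counts)
print(size_votes)
print(avg_per_order)
```

To break the spend figure down by brand, adding `Brand` to the first `groupby` keys is one way to go.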

## Store Trends

Almost done! Think at the store level:

1. Which stores have the most sales (in $)? By number of orders?
2. Are there some stores that see more online orders? 

<br>Functions / tools you'll probably wanna Google or refer to the week-2 notebook for:
- `value_counts()`, `unique()`
- `groupby(by=)`, `agg(func=)`
- Subsetting dataframes by conditions
- Selecting columns as series
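
A small sketch for the store-level view, again on invented rows. It assumes `Price` is numeric and that every order has a single `OrderType` and a single `StoreID`, which you may want to verify on the real data first.

```python
import pandas as pd

# Toy rows, invented for illustration.
toy = pd.DataFrame({
    "OrderID": ["A000001", "A000001", "B000002", "C000003"],
    "StoreID": ["#1001", "#1001", "#1001", "#2002"],
    "OrderType": ["InStore", "InStore", "Online", "Online"],
    "Price": [20.0, 30.0, 10.0, 40.0],
})

# Dollar sales per store, highest first.
store_sales = toy.groupby(by="StoreID")["Price"].sum().sort_values(ascending=False)

# Subset to online rows, then count distinct orders per store.
online = toy[toy["OrderType"] == "Online"]
online_orders = online.groupby(by="StoreID")["OrderID"].nunique()

# Share of each store's orders placed online: one row per order, then the mean of a boolean.
orders = toy.drop_duplicates(subset="OrderID")
online_share = (orders["OrderType"] == "Online").groupby(orders["StoreID"]).mean()

print(store_sales)
print(online_orders)
print(online_share)
```

Looking at the share as well as the raw count helps avoid simply ranking the busiest stores first.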