# Challenge 3

In this challenge we will work on the `Orders` data set. In your work you will apply the thinking process and workflow we showed you in Challenge 2.

You are serving as a Business Intelligence Analyst at the headquarters of an international fashion goods chain store. Today your boss asked you to do two things for her:

**First, identify two groups of customers from the data set.** The first group is **VIP Customers**, whose **aggregated expenses** at your global chain stores are **above the 95th percentile** (a.k.a. the 0.95 quantile). The second group is **Preferred Customers**, whose **aggregated expenses** are **between the 75th and 95th percentiles**.

**Second, identify which country has the most of your VIP customers, and which country has the most of your VIP+Preferred Customers combined.**

## Q1: How to identify VIP & Preferred Customers?

We start by importing all the required libraries:

In [2]:
# import required libraries
import numpy as np
import pandas as pd

Next, extract and import the `Orders` dataset into a dataframe variable called `orders`. Print the head of `orders` to get an overview of the data:

In [3]:
# load the zipped CSV into a DataFrame (pandas infers the compression from the .zip extension)
orders = pd.read_csv('orders.zip')
orders.head()

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.3
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.0
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34


---

"Identify VIP and Preferred Customers" is your boss's non-technical goal. You need to translate that goal into the technical language that data analysts use:

## How to label customers whose aggregated `amount_spent` is in a given quantile range?


We break down the main problem into several sub problems:

#### Sub Problem 1: How to aggregate the `amount_spent` for unique customers?

#### Sub Problem 2: How to select customers whose aggregated `amount_spent` is in a given quantile range?

#### Sub Problem 3: How to label selected customers as "VIP" or "Preferred"?

*Note: If you want to break down the main problem in a different way, please feel free to revise the sub problems above.*

Now, in the workspace below, tackle each of the sub problems using the iterative problem-solving workflow. Insert cells as necessary to write your code and explain your steps.

In [21]:
total_spent = orders.groupby(['CustomerID'])['amount_spent'].sum()
total_spent

CustomerID
12346    77183.60
12347     4310.00
12348     1797.24
12349     1757.55
12350      334.40
           ...   
18280      180.60
18281       80.82
18282      178.05
18283     2094.88
18287     1837.28
Name: amount_spent, Length: 4339, dtype: float64

In [22]:
orders['Total spent'] = orders.groupby('CustomerID')['amount_spent'].transform('sum')
orders

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent,Total spent
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.30,5391.21
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,5391.21
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.00,5391.21
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,5391.21
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,5391.21
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
397919,541904,581587,22613,2011,12,5,12,pack of 20 spaceboy napkins,12,2011-12-09 12:50:00,0.85,12680,France,10.20,862.81
397920,541905,581587,22899,2011,12,5,12,children's apron dolly girl,6,2011-12-09 12:50:00,2.10,12680,France,12.60,862.81
397921,541906,581587,23254,2011,12,5,12,childrens cutlery dolly girl,4,2011-12-09 12:50:00,4.15,12680,France,16.60,862.81
397922,541907,581587,23255,2011,12,5,12,childrens cutlery circus parade,4,2011-12-09 12:50:00,4.15,12680,France,16.60,862.81
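
The two cells above compute the same per-customer totals in two different shapes. As a minimal sketch on a made-up toy frame (the values are illustrative and not taken from `orders`), `sum()` returns one row per customer, while `transform('sum')` keeps one row per order line and repeats the customer's total on each of them:

```python
import pandas as pd

# Hypothetical toy data: three customers, six order lines.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "amount_spent": [10.0, 5.0, 7.5, 1.0, 2.0, 3.0],
})

# Aggregation: one value per customer, indexed by CustomerID.
per_customer = toy.groupby("CustomerID")["amount_spent"].sum()

# Transformation: same length as the original frame,
# each row carries the total of the customer it belongs to.
toy["Total spent"] = toy.groupby("CustomerID")["amount_spent"].transform("sum")

print(per_customer)
print(toy)
```

The per-customer series is the right input for computing percentiles, because each customer counts once; the row-level column is convenient for filtering order lines afterwards.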


In [31]:
# VIP Customers -> Percentile 95
vip_threshold = np.percentile(total_spent, 95)
vip_threshold

5840.181999999983
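
Note that the threshold is computed on `total_spent` (one value per customer), not on the row-level `Total spent` column, which would weight each customer by their number of order lines. As a small sketch with made-up totals, `np.percentile` and the pandas `Series.quantile` method give the same result under their default linear interpolation; the only difference is the scale of the argument (0 to 100 versus 0 to 1):

```python
import numpy as np
import pandas as pd

# Hypothetical per-customer totals, for illustration only.
totals = pd.Series([100.0, 200.0, 300.0, 400.0, 500.0])

# NumPy takes a percentile in [0, 100].
p95_numpy = np.percentile(totals, 95)

# pandas takes a quantile in [0, 1].
p95_pandas = totals.quantile(0.95)

# Several cut points can be requested in one call.
both = totals.quantile([0.75, 0.95])

print(p95_numpy, p95_pandas)
print(both)
```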

In [41]:
vip_costumers = orders.loc[(orders['Total spent'] >= vip_threshold)]
vip_costumers

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent,Total spent
26,26,536370,22728,2010,12,3,8,alarm clock bakelike pink,24,2010-12-01 08:45:00,3.75,12583,France,90.0,7281.38
27,27,536370,22727,2010,12,3,8,alarm clock bakelike red,24,2010-12-01 08:45:00,3.75,12583,France,90.0,7281.38
28,28,536370,22726,2010,12,3,8,alarm clock bakelike green,12,2010-12-01 08:45:00,3.75,12583,France,45.0,7281.38
29,29,536370,21724,2010,12,3,8,panda and bunnies sticker sheet,12,2010-12-01 08:45:00,0.85,12583,France,10.2,7281.38
30,30,536370,21883,2010,12,3,8,stars gift tape,24,2010-12-01 08:45:00,0.65,12583,France,15.6,7281.38
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
397883,541868,581584,85038,2011,12,5,12,6 chocolate love heart t-lights,48,2011-12-09 12:25:00,1.85,13777,United Kingdom,88.8,25977.16
397905,541890,581586,22061,2011,12,5,12,large cake stand hanging strawbery,8,2011-12-09 12:49:00,2.95,13113,United Kingdom,23.6,12245.96
397906,541891,581586,23275,2011,12,5,12,set of 3 hanging owls ollie beak,24,2011-12-09 12:49:00,1.25,13113,United Kingdom,30.0,12245.96
397907,541892,581586,21217,2011,12,5,12,red retrospot round cake tins,24,2011-12-09 12:49:00,8.95,13113,United Kingdom,214.8,12245.96


In [42]:
vip_costumers.CustomerID.unique()

array([12583, 15311, 16029, 12431, 17511, 13408, 13767, 15513, 13694,
       14849, 16210, 12748, 12433, 14911, 17841, 13093, 12921, 13777,
       18229, 14606, 13576, 13090, 15694, 17017, 15601, 13418, 14060,
       17381, 17581, 15061, 15640, 14031, 12971, 13798, 17396, 14156,
       14680, 12557, 16013, 17949, 12682, 15769, 13081, 17243, 15465,
       13089, 16033, 18055, 18109, 16839, 16814, 12567, 16353, 14527,
       15023, 12472, 16422, 15502, 17677, 17428, 15039, 15078, 14667,
       15194, 17450, 12681, 17735, 15838, 14733, 13488, 17675, 18102,
       13078, 12709, 16779, 14796, 13199, 17706, 16525, 16558, 15498,
       14051, 16713, 13113, 12766, 15005, 14866, 17340, 18092, 15358,
       13319, 12621, 12683, 13854, 17857, 15856, 13102, 13969, 12471,
       12731, 16656, 14952, 12989, 17865, 16873, 14062, 16923, 12753,
       13668, 15044, 14505, 12540, 13225, 13209, 17338, 12476, 15159,
       13324, 14961, 14057, 14298, 17404, 14415, 13097, 13458, 15290,
       15615, 15482,

In [35]:
# Preferred Customers -> Percentile 80 (note: the task statement above asks for the 75th percentile; the outputs below reflect 80)
preferred_threshold = np.percentile(total_spent, 80)
preferred_threshold

2057.9139999999998

In [39]:
preferred_costumers = orders.loc[(orders['Total spent'] > preferred_threshold) & (orders['Total spent'] < vip_threshold)]
preferred_costumers

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent,Total spent
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.30,5391.21
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,5391.21
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.00,5391.21
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,5391.21
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,5391.21
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
397900,541885,581585,21684,2011,12,5,12,small medina stamped metal bowl,12,2011-12-09 12:31:00,0.85,15804,United Kingdom,10.20,4206.39
397901,541886,581585,22398,2011,12,5,12,magnets pack of 4 swallows,12,2011-12-09 12:31:00,0.39,15804,United Kingdom,4.68,4206.39
397902,541887,581585,23328,2011,12,5,12,set 6 school milk bottles in crate,4,2011-12-09 12:31:00,3.75,15804,United Kingdom,15.00,4206.39
397903,541888,581585,23145,2011,12,5,12,zinc t-light holder star large,12,2011-12-09 12:31:00,0.95,15804,United Kingdom,11.40,4206.39


In [40]:
preferred_costumers.CustomerID.unique()

array([17850, 13047, 15291, 14688, 17809, 17924, 13448, 16218, 14307,
       17920, 13758, 17377, 12662, 15485, 18144, 16456, 17346, 13468,
       16928, 14696, 17690, 17069, 15235, 15752, 13941, 14135, 14388,
       18041, 15955, 14390, 15544, 15738, 14180, 14466, 16186, 17685,
       17567, 17838, 17659, 15299, 17757, 14395, 15093, 13520, 12841,
       16905, 13013, 16477, 12600, 12779, 17954, 17819, 12712, 15373,
       17238, 12395, 13069, 16241, 14800, 15708, 16168, 16931, 13269,
       14810, 18118, 13831, 17059, 16327, 17211, 15570, 15808, 17858,
       16393, 17863, 17402, 12647, 15867, 15555, 16143, 12720, 12747,
       17965, 13174, 16161, 18219, 12708, 14189, 15301, 14825, 17596,
       14085, 16919, 16722, 16710, 15984, 17682, 16550, 17068, 15356,
       17191, 14409, 12913, 17091, 14907, 13756, 17491, 14282, 13769,
       16904, 13880, 12347, 14739, 16293, 17419, 17591, 12839, 13267,
       13050, 15628, 13599, 18077, 17406, 15750, 13983, 13842, 14032,
       16081, 17975,
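
The cells above select the two groups, but Sub Problem 3 also asks for an explicit label. One possible way to attach it, sketched here on made-up totals, is `np.select` over the per-customer series. The `segment` column name and the `"Regular"` fallback label are assumptions introduced for this illustration; they do not appear in the challenge text.

```python
import numpy as np
import pandas as pd

# Hypothetical per-customer totals: customers 1..10 spent 10, 20, ..., 100.
totals = pd.Series(
    [10.0 * i for i in range(1, 11)],
    index=pd.Index(range(1, 11), name="CustomerID"),
    name="amount_spent",
)

lower = totals.quantile(0.75)  # lower bound for Preferred
upper = totals.quantile(0.95)  # lower bound for VIP

# Conditions are checked in order, so VIP wins over Preferred.
labels = pd.Series(
    np.select(
        [totals > upper, totals > lower],
        ["VIP", "Preferred"],
        default="Regular",  # assumed name for everyone else
    ),
    index=totals.index,
    name="segment",
)

print(pd.concat([totals, labels], axis=1))
```

Because the labels are indexed by `CustomerID`, they could be mapped back onto an order-level frame with something like `orders['CustomerID'].map(labels)`.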

Now we'll leave it to you to solve Q2 & Q3, which can build on your solution for Q1:

## Q2: How to identify which country has the most VIP Customers?

In [44]:
vip_country = vip_costumers.groupby(['Country'])['CustomerID'].nunique().sort_values(ascending = False)
vip_country.head(1)

Country
United Kingdom    177
Name: CustomerID, dtype: int64
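
The reason for `nunique()` here is that `vip_costumers` holds one row per order line, so simply counting rows per country would reward customers who placed many orders. A small sketch on made-up rows shows the two counts disagreeing:

```python
import pandas as pd

# Hypothetical VIP order lines: one French customer with four lines,
# two Spanish customers with one line each.
vip_rows = pd.DataFrame({
    "CustomerID": [1, 1, 1, 1, 2, 3],
    "Country": ["France", "France", "France", "France", "Spain", "Spain"],
})

# Counting rows favours the country with the most order lines.
top_by_rows = vip_rows["Country"].value_counts().idxmax()

# Counting distinct customers answers the actual question.
customers_per_country = vip_rows.groupby("Country")["CustomerID"].nunique()
top_by_customers = customers_per_country.idxmax()

print(top_by_rows, top_by_customers)
```

`idxmax()` returns only the country name; sorting and taking `head(1)`, as in the cell above, also shows the count.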

## Q3: How to identify which country has the most VIP+Preferred Customers combined?

In [50]:
tables = [preferred_costumers, vip_costumers]
result = pd.concat(tables)
result

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent,Total spent
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.30,5391.21
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,5391.21
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.00,5391.21
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,5391.21
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34,5391.21
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
397883,541868,581584,85038,2011,12,5,12,6 chocolate love heart t-lights,48,2011-12-09 12:25:00,1.85,13777,United Kingdom,88.80,25977.16
397905,541890,581586,22061,2011,12,5,12,large cake stand hanging strawbery,8,2011-12-09 12:49:00,2.95,13113,United Kingdom,23.60,12245.96
397906,541891,581586,23275,2011,12,5,12,set of 3 hanging owls ollie beak,24,2011-12-09 12:49:00,1.25,13113,United Kingdom,30.00,12245.96
397907,541892,581586,21217,2011,12,5,12,red retrospot round cake tins,24,2011-12-09 12:49:00,8.95,13113,United Kingdom,214.80,12245.96


In [51]:
super_country = result.groupby(['Country'])['CustomerID'].nunique().sort_values(ascending = False)
super_country.head(1)

Country
United Kingdom    737
Name: CustomerID, dtype: int64
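
Since the VIP group is everything at or above the upper threshold and the Preferred group is everything strictly between the two thresholds, their union is simply every row above the lower threshold. As a hedged sketch with made-up data and hypothetical threshold values, a single mask yields the same per-country counts as concatenating the two tables:

```python
import pandas as pd

# Hypothetical order lines with the per-customer total already attached.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 4],
    "Country": ["France", "France", "Spain", "Spain", "Spain"],
    "Total spent": [900.0, 900.0, 400.0, 250.0, 50.0],
})
lower, upper = 200.0, 800.0  # assumed thresholds, for illustration only

# Approach used above: build both groups, then concatenate.
preferred = toy[(toy["Total spent"] > lower) & (toy["Total spent"] < upper)]
vip = toy[toy["Total spent"] >= upper]
by_concat = pd.concat([preferred, vip]).groupby("Country")["CustomerID"].nunique()

# Equivalent shortcut: one mask on the lower threshold.
by_mask = toy[toy["Total spent"] > lower].groupby("Country")["CustomerID"].nunique()

print(by_concat)
print(by_mask)
```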