# Challenge 3

In this challenge we will work on the `Orders.csv` data set in the previous [Subsetting and Descriptive Stats lab](../../lab-subsetting-and-descriptive-stats/your-code/main.ipynb). In your work you will apply the thinking process and workflow we showed you in Challenge 2.

You are serving as a Business Intelligence Analyst at the headquarter of an international fashion goods chain store. Your boss today asked you to do two things for her:

**First, identify two groups of customers from the data set.** The first group is **VIP Customers** whose **aggregated expenses** at your global chain stores are **above the 95th percentile** (aka. 0.95 quantile). The second group is **Preferred Customers** whose **aggregated expenses** are **between the 75th and 95th percentile**.

**Second, identify which country has the most of your VIP customers, and which country has the most of your VIP+Preferred Customers combined.**

# Q1: How to identify VIP & Preferred Customers?

We start by importing all the required libraries:

In [1]:
# import required libraries
import numpy as np
import pandas as pd

Next, import `Orders.csv` from the "subsetting" lab folder into a dataframe variable called `orders`. Print the head of `orders` to overview the data:

In [2]:
# enter your code here
orders = pd.read_csv("../../lab-subsetting-and-descriptive-stats/your-code/Orders.csv")
orders.head()

Unnamed: 0.1,Unnamed: 0,InvoiceNo,StockCode,year,month,day,hour,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,amount_spent
0,0,536365,85123A,2010,12,3,8,white hanging heart t-light holder,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,15.3
1,1,536365,71053,2010,12,3,8,white metal lantern,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
2,2,536365,84406B,2010,12,3,8,cream cupid hearts coat hanger,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,22.0
3,3,536365,84029G,2010,12,3,8,knitted union flag hot water bottle,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34
4,4,536365,84029E,2010,12,3,8,red woolly hottie white heart.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,20.34


---

"Identify VIP and Preferred Customers" is the non-technical goal of your boss. You need to translate that goal into technical languages that data analysts use:

## How to label customers whose aggregated `amount_spent` is in a given quantile range?


We break down the main problem into several sub problems:

#### Sub Problem 1: How to aggregate the  `amount_spent` for unique customers?

#### Sub Problem 2: How to select customers whose aggregated `amount_spent` is in a given quantile range?

#### Sub Problem 3: How to label selected customers as "VIP" or "Preferred"?

*Note: If you want to break down the main problem in a different way, please feel free to revise the sub problems above.*

Now in the workspace below, tackle each of the sub problems using the iterative problem solving workflow. Insert cells as necessary to write your codes and explain your steps.

In [3]:
# your code here
#1.-
customer_spent = orders.groupby('CustomerID').agg({'amount_spent':'sum'})

#2.-
customer_spent.amount_spent[customer_spent.amount_spent > customer_spent.amount_spent.quantile(.90)]

#3.-
customer_spent.amount_spent[customer_spent.amount_spent > customer_spent.amount_spent.quantile(.90)] #VIP

customer_spent.amount_spent[(customer_spent.amount_spent > customer_spent.amount_spent.quantile(.75)) &
                            (customer_spent.amount_spent < customer_spent.amount_spent.quantile(.90))
                           ] # Prefered

#Clasificas a los clientes segun en qué cuantil se encuentran: Si se encuentran después del cuantil .9, quiere decir que la
#suma de lo que han gastado es mucha.
#Para los clientes que estan entre los cuantiles .75 y 9, se les considera Prefered, ya que, no han gastado tanto como los VIP,
#pero aun asi estan en un rango alto.

CustomerID
12348    1797.24
12349    1757.55
12352    2506.04
12356    2811.43
12360    2662.06
12370    3545.69
12371    1887.96
12380    2724.81
12381    1845.31
12383    1850.56
12388    2780.66
12395    3018.63
12397    2409.90
12405    1710.39
12406    3415.30
12407    1708.12
12408    2888.55
12423    1859.31
12424    1760.96
12438    2906.85
12454    3528.34
12455    2466.86
12456    3181.04
12457    2363.23
12480    3281.63
12483    2484.98
12501    2169.39
12517    2502.84
12518    2056.89
12520    2634.26
          ...   
18061    2119.41
18065    2392.83
18069    2036.67
18075    2611.75
18077    2633.01
18093    2106.52
18094    3017.30
18097    2697.80
18122    1826.21
18144    2888.75
18145    2861.55
18173    2106.84
18179    1793.17
18180    1843.75
18188    2001.04
18204    1993.70
18210    2621.38
18219    2069.77
18230    2810.20
18231    2083.27
18235    1796.48
18241    2073.09
18242    2232.49
18245    2567.06
18257    2337.63
18259    2338.60
18260    2643.20
182

Now we'll leave it to you to solve Q2 & Q3, which you can leverage from your solution for Q1:

# Q2: How to identify which country has the most VIP Customers?

# Q3: How to identify which country has the most VIP+Preferred Customers combined?

Provide your solution for Q2 below:

In [17]:
country_customers = orders.groupby(['Country','CustomerID'],as_index=False).agg({'amount_spent':'sum'})
country_customers

vips = country_customers[['Country','CustomerID']][country_customers.amount_spent > country_customers.amount_spent.quantile(.90)] #VIPS
vips_prefered = country_customers[['Country','CustomerID']][country_customers.amount_spent > country_customers.amount_spent.quantile(.75)] #VIPS + Prefered

vips.Country.value_counts() #Q2
vips_prefered.Country.value_counts() #Q3

United Kingdom     934
Germany             39
France              29
Belgium             11
Switzerland          9
Norway               7
Portugal             7
Spain                7
Italy                5
Finland              5
Japan                4
Australia            4
Channel Islands      4
Cyprus               3
EIRE                 3
Denmark              3
Sweden               2
Israel               2
Canada               1
Lebanon              1
Poland               1
Greece               1
Netherlands          1
Austria              1
Iceland              1
Malta                1
Singapore            1
Name: Country, dtype: int64