# Business Problem with Customer Segmentation


An e-commerce company wants to segment its customers and determine marketing strategies according to these segments.

For this purpose, we will describe customers by their purchasing behavior and form groups (segments) from these descriptions.

In other words, customers who exhibit similar behaviors will be placed in the same group, and we will try to develop sales and marketing techniques specific to each group.



### Data Set Story:

https://archive.ics.uci.edu/ml/datasets/Online+Retail+II

This Online Retail II data set contains all the transactions occurring for a UK-based and registered, non-store online retailer between 01/12/2009 and 09/12/2011.

The company mainly sells unique all-occasion gift-ware. 

Many customers of the company are wholesalers.




### Attribute Information:

- InvoiceNo: Invoice number (column `Invoice` in the file). Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'C', it indicates a cancellation.
- StockCode: Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product.
- Description: Product (item) name. Nominal.
- Quantity: The quantities of each product (item) per transaction. Numeric.
- InvoiceDate: Invoice date and time. Numeric. The day and time when a transaction was generated.
- UnitPrice: Unit price (column `Price` in the file). Numeric. Product price per unit in sterling (£).
- CustomerID: Customer number (column `Customer ID` in the file). Nominal. A 5-digit integral number uniquely assigned to each customer.
- Country: Country name. Nominal. The name of the country where a customer resides.



# Questions about the data set


All questions refer to the 2009-2010 sheet of the data set.

1. What is the number of unique products?
2. Which products do we have, and how often does each appear?
3. Which product is the most ordered?
4. How do we sort this output?
5. How many invoices have been issued?
6. How much money has been earned per invoice?
7. Which are the most expensive products?
8. How many orders came from each country?
9. How much revenue came from each country?
10. Which product is the most returned?
11. Show the last purchase dates of each customer.
12. What should we do for customer segmentation with RFM?
13. Scoring for RFM.
14. Finally, create an Excel file with the New Customers.

# Data Understanding 

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns

# to display all columns and rows:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)


Set how many digits are shown after the decimal point. The option below displays every float with 0 decimal places, so values such as 'Price' appear rounded to whole numbers.

In [2]:
pd.set_option('display.float_format', lambda x: '%.0f' % x)
import matplotlib.pyplot as plt

In [3]:
df_2009_2010 = pd.read_excel("../input/online-retail-ii-data-set-from-ml-repository/online_retail_II.xlsx", sheet_name = "Year 2009-2010")

In [4]:
df = df_2009_2010.copy()

Let's get a first look at the data using the basic pandas functions for exploring a DataFrame.
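Here is a minimal sketch of such a first look. The helper below collects the shape, column types, missing-value counts and distinct-value counts into one dictionary. The `first_look` name and the four sample rows are hypothetical and only mirror the columns of the real sheet. On the real data you would call `first_look(df)`.

```python
import pandas as pd

# Hypothetical sample rows with the same columns as the "Year 2009-2010" sheet
sample = pd.DataFrame({
    "Invoice": ["489434", "489434", "C489449", "489450"],
    "StockCode": ["85048", "79323P", "22087", "85048"],
    "Description": ["15CM CHRISTMAS GLASS BALL 20 LIGHTS", "PINK CHERRY LIGHTS",
                    None, "15CM CHRISTMAS GLASS BALL 20 LIGHTS"],
    "Quantity": [12, 12, -12, 6],
    "InvoiceDate": pd.to_datetime(["2009-12-01 07:45", "2009-12-01 07:45",
                                   "2009-12-01 10:33", "2009-12-01 10:41"]),
    "Price": [6.95, 6.75, 2.95, 6.95],
    "Customer ID": [13085.0, 13085.0, None, 12682.0],
    "Country": ["United Kingdom", "United Kingdom", "United Kingdom", "France"],
})


def first_look(frame: pd.DataFrame) -> dict:
    """Collect common first-look summaries of a DataFrame."""
    return {
        "shape": frame.shape,                          # (rows, columns)
        "dtypes": frame.dtypes.astype(str).to_dict(),  # column types as strings
        "missing": frame.isnull().sum().to_dict(),     # missing values per column
        "n_unique": frame.nunique().to_dict(),         # distinct non-null values per column
    }


summary = first_look(sample)
print(summary["shape"])  # (4, 8)
print(summary["missing"])
```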

## 1. What is the number of unique products?

In [5]:
df["Description"].nunique()

4681

## 2. Which products do we have, and how often does each appear?

In [6]:
df["Description"].value_counts().head()

WHITE HANGING HEART T-LIGHT HOLDER    3549
REGENCY CAKESTAND 3 TIER              2212
STRAWBERRY CERAMIC TRINKET BOX        1843
PACK OF 72 RETRO SPOT CAKE CASES      1466
ASSORTED COLOUR BIRD ORNAMENT         1457
Name: Description, dtype: int64

## 3. Which product is the most ordered?

In [7]:
df.groupby("Description").agg({"Quantity":"sum"}).head()

Description,Quantity
21494,-720
22467,-2
22719,2
DOORMAT UNION JACK GUNS AND ROSES,179
3 STRIPEY MICE FELTCRAFT,690


## 4. How do we sort this output?

In [8]:
df.groupby("Description").agg({"Quantity":"sum"}).sort_values("Quantity", ascending = False).head()

Description,Quantity
WHITE HANGING HEART T-LIGHT HOLDER,57733
WORLD WAR 2 GLIDERS ASSTD DESIGNS,54698
BROCADE RING PURSE,47647
PACK OF 72 RETRO SPOT CAKE CASES,46106
ASSORTED COLOUR BIRD ORNAMENT,44925
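The two rankings measure different things. `value_counts()` in question 2 counts how many invoice lines mention a product. `groupby(...).sum()` here counts units sold. That is why REGENCY CAKESTAND 3 TIER is second by line count but missing from the top five by quantity. The sketch below shows both rankings on a few hypothetical lines. It also shows `nlargest`, a shortcut for sort-then-head.

```python
import pandas as pd

# Hypothetical order lines; the negative quantity is a return
lines = pd.DataFrame({
    "Description": ["HEART T-LIGHT HOLDER", "CAKESTAND", "HEART T-LIGHT HOLDER",
                    "GLIDERS", "CAKESTAND", "CAKESTAND"],
    "Quantity": [32, 4, 24, 48, 2, -2],
})

# How often each product appears on a line (question 2)
by_lines = lines["Description"].value_counts()

# Units sold per product, largest first (questions 3 and 4)
by_units = lines.groupby("Description")["Quantity"].sum().sort_values(ascending=False)

# Same top-n result without sorting the whole Series
top2 = lines.groupby("Description")["Quantity"].sum().nlargest(2)

print(by_lines.index[0])  # CAKESTAND (3 lines)
print(by_units.index[0])  # HEART T-LIGHT HOLDER (56 units)
print(top2.tolist())      # [56, 48]
```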


## 5. How many invoices have been issued?

In [9]:
df["Invoice"].nunique()

28816

## 6. How much money has been earned per invoice?

In [10]:
# create a new variable: line revenue = quantity * unit price

df["TotalPrice"] = df["Quantity"]*df["Price"]

In [11]:
df.head()

,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country,TotalPrice
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,2009-12-01 07:45:00,7,13085,United Kingdom,83
1,489434,79323P,PINK CHERRY LIGHTS,12,2009-12-01 07:45:00,7,13085,United Kingdom,81
2,489434,79323W,WHITE CHERRY LIGHTS,12,2009-12-01 07:45:00,7,13085,United Kingdom,81
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2009-12-01 07:45:00,2,13085,United Kingdom,101
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,2009-12-01 07:45:00,1,13085,United Kingdom,30


In [12]:
df.groupby("Invoice").agg({"TotalPrice":"sum"}).head()

Invoice,TotalPrice
489434,505
489435,146
489436,630
489437,311
489438,2286
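Here is a minimal sketch of the same two steps on hypothetical invoice lines. The line revenue is a vectorized product of two columns, and `groupby("Invoice")` then sums it per invoice. Cancellation invoices (prefix "C") come out negative, as in the real data. The values displayed above are whole numbers only because of the `'%.0f'` float format set earlier.

```python
import pandas as pd

# Hypothetical invoice lines; "C" invoices are cancellations with negative quantities
lines = pd.DataFrame({
    "Invoice": ["489434", "489434", "489435", "C489436"],
    "Quantity": [12, 48, 10, -2],
    "Price": [6.95, 2.10, 1.25, 6.95],
})

# Line revenue: quantity times unit price, computed column-wise without a loop
lines["TotalPrice"] = lines["Quantity"] * lines["Price"]

# Revenue per invoice
per_invoice = lines.groupby("Invoice")["TotalPrice"].sum()
print(per_invoice.round(2))
```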


## 7. Which are the most expensive products?

In [13]:
df.sort_values("Price", ascending = False).head()

,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country,TotalPrice
241824,C512770,M,Manual,-1,2010-06-17 16:52:00,25111,17399.0,United Kingdom,-25111
241827,512771,M,Manual,1,2010-06-17 16:53:00,25111,,United Kingdom,25111
320581,C520667,BANK CHARGES,Bank Charges,-1,2010-08-27 13:42:00,18911,,United Kingdom,-18911
517953,C537630,AMAZONFEE,AMAZON FEE,-1,2010-12-07 15:04:00,13541,,United Kingdom,-13541
519294,C537651,AMAZONFEE,AMAZON FEE,-1,2010-12-07 15:49:00,13541,,United Kingdom,-13541
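The top of this list is not products at all. `M` (manual), `BANK CHARGES` and `AMAZONFEE` are bookkeeping entries, mostly on cancellation invoices. Below is a sketch of a filtered ranking. The set of non-product codes is an assumption built only from the codes visible above, and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows: three bookkeeping entries like the ones above, two ordinary products
items = pd.DataFrame({
    "Invoice": ["C512770", "512771", "C520667", "500001", "500002"],
    "StockCode": ["M", "M", "BANK CHARGES", "P001", "P002"],
    "Description": ["Manual", "Manual", "Bank Charges", "CAKESTAND 3 TIER", "T-LIGHT HOLDER"],
    "Price": [25111.0, 25111.0, 18911.0, 12.75, 2.95],
})

# Assumption: stock codes that are not real products (taken from the output above)
NON_PRODUCT_CODES = {"M", "BANK CHARGES", "AMAZONFEE"}

is_product = ~items["StockCode"].isin(NON_PRODUCT_CODES)
not_cancelled = ~items["Invoice"].str.startswith("C")

# Most expensive real products among non-cancelled lines
most_expensive = items[is_product & not_cancelled].sort_values("Price", ascending=False)
print(most_expensive[["StockCode", "Description", "Price"]])
```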


## 8. How many orders came from each country?

In [14]:
df["Country"].value_counts()

United Kingdom          485852
EIRE                      9670
Germany                   8129
France                    5772
Netherlands               2769
Spain                     1278
Switzerland               1187
Portugal                  1101
Belgium                   1054
Channel Islands            906
Sweden                     902
Italy                      731
Australia                  654
Cyprus                     554
Austria                    537
Greece                     517
United Arab Emirates       432
Denmark                    428
Norway                     369
Finland                    354
Unspecified                310
USA                        244
Japan                      224
Poland                     194
Malta                      172
Lithuania                  154
Singapore                  117
RSA                        111
Bahrain                    107
Canada                      77
Hong Kong                   76
Thailand                    76
Israel  
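`value_counts()` counts rows, and each row is one invoice line. These figures are therefore order lines per country rather than orders. If "orders" should mean invoices, one option is to count distinct invoice numbers per country, sketched here on hypothetical data.

```python
import pandas as pd

# Hypothetical lines: invoice "1" has three lines, invoices "2" and "3" fewer
lines = pd.DataFrame({
    "Invoice": ["1", "1", "1", "2", "3", "3"],
    "Country": ["United Kingdom", "United Kingdom", "United Kingdom",
                "United Kingdom", "France", "France"],
})

# Rows (invoice lines) per country, as in In [14]
line_counts = lines["Country"].value_counts()

# Distinct invoices (orders) per country
order_counts = lines.groupby("Country")["Invoice"].nunique().sort_values(ascending=False)

print(line_counts.to_dict())   # {'United Kingdom': 4, 'France': 2}
print(order_counts.to_dict())  # {'United Kingdom': 2, 'France': 1}
```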

## 9. How much revenue came from each country?

In [15]:
df.groupby("Country").agg({"TotalPrice":"sum"}).sort_values("TotalPrice", ascending = False).head()

Country,TotalPrice
United Kingdom,8194778
EIRE,352243
Netherlands,263863
Germany,196290
France,130770


## 10. Which product is the most returned?

In [16]:
df[df['Invoice'].str.startswith("C", na=False)].sort_values("Quantity", ascending = True).head()

,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country,TotalPrice
507225,C536757,84347,ROTATING SILVER ANGELS T-LIGHT HLDR,-9360,2010-12-02 14:23:00,0,15838,United Kingdom,-281
359669,C524235,21088,SET/6 FRUIT SALAD PAPER CUPS,-7128,2010-09-28 11:02:00,0,14277,France,-570
359670,C524235,21096,SET/6 FRUIT SALAD PAPER PLATES,-7008,2010-09-28 11:02:00,0,14277,France,-911
359630,C524235,16047,POP ART PEN CASE & PENS,-5184,2010-09-28 11:02:00,0,14277,France,-415
359636,C524235,37340,MULTICOLOUR SPRING FLOWER MUG,-4992,2010-09-28 11:02:00,0,14277,France,-499
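This cell ranks individual cancellation lines. The same product can appear several times, and a product returned in many small lots may not show up at all. To rank products by total returned quantity, group the cancellation lines by `Description` first. Here is a sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical lines; only invoices starting with "C" are cancellations
lines = pd.DataFrame({
    "Invoice": ["C1", "C1", "C2", "C3", "4"],
    "Description": ["PAPER CUPS", "PAPER PLATES", "PAPER CUPS", "PAPER CUPS", "PAPER CUPS"],
    "Quantity": [-100, -300, -150, -100, 500],
})

# Keep only cancellations
returns = lines[lines["Invoice"].str.startswith("C", na=False)]

# Total returned quantity per product; most negative = most returned
most_returned = returns.groupby("Description")["Quantity"].sum().sort_values()
print(most_returned)
```

Here PAPER PLATES has the single largest return line (-300), but PAPER CUPS has the largest total (-350).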


# Data Preparation

In [17]:
df.isnull().sum()

Invoice             0
StockCode           0
Description      2928
Quantity            0
InvoiceDate         0
Price               0
Customer ID    107927
Country             0
TotalPrice          0
dtype: int64

In [18]:
df.dropna(inplace = True)

In [19]:
df.shape

(417534, 9)

In [20]:
df.describe([0.01,0.05,0.10,0.25,0.50,0.75,0.90,0.95, 0.99]).T

,count,mean,std,min,1%,5%,10%,25%,50%,75%,90%,95%,99%,max
Quantity,417534,13,101,-9360,-2,1,1,2,4,12,24,36,144,19152
Price,417534,4,71,0,0,0,1,1,2,4,7,8,15,25111
Customer ID,417534,15361,1681,12346,12435,12725,13042,13983,15311,16799,17706,17913,18196,18287
TotalPrice,417534,20,100,-25111,-11,1,2,4,11,19,35,65,196,15818


In [21]:
for feature in ["Quantity", "Price", "TotalPrice"]:

    # despite the names, these are the 1st and 99th percentiles, not the quartiles
    Q1 = df[feature].quantile(0.01)
    Q3 = df[feature].quantile(0.99)
    IQR = Q3 - Q1
    upper = Q3 + 1.5 * IQR
    lower = Q1 - 1.5 * IQR

    # boolean mask of rows outside the fences
    outliers = (df[feature] > upper) | (df[feature] < lower)
    if outliers.any():
        print(feature, "yes")
        print(outliers.sum())
    else:
        print(feature, "no")

Quantity yes
1063
Price yes
953
TotalPrice yes
1150
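The loop builds its fences from the 1st and 99th percentiles, which gives much wider fences than the classic quartile-based (Tukey) rule. It also only reports outliers and does not remove them. Here is the same rule as a small reusable function, checked on a toy Series. `percentile_fences` is a hypothetical helper name.

```python
import pandas as pd


def percentile_fences(s: pd.Series, low: float = 0.01, high: float = 0.99, k: float = 1.5):
    """Return (lower, upper) fences built from the low/high percentiles, as in In [21]."""
    q_low, q_high = s.quantile(low), s.quantile(high)
    spread = q_high - q_low
    return q_low - k * spread, q_high + k * spread


# Toy data: 1..100 plus one extreme value
s = pd.Series(list(range(1, 101)) + [10_000])
lower, upper = percentile_fences(s)
outliers = s[(s < lower) | (s > upper)]

print(round(lower, 6), round(upper, 6))
print(outliers.tolist())  # [10000]
```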


# Customer Segmentation with RFM Scores

RFM stands for Recency, Frequency and Monetary.

It is a technique that helps determine marketing and sales strategies based on customers' buying habits.

- Recency: time since the customer's last purchase.
  - In other words, it is the “time since the customer's last contact”.
  - Recency = today's date - last purchase date
  - "Today" is a reference date: if we run the analysis on a given day, that day is "today". Because this data set ends in 2010, we will instead take it from the last date in the data.
  - For example, a customer with a recency of 20 days is "warmer" than one with 100 days, because they have been in contact with us more recently.

- Frequency: total number of purchases.

- Monetary (Monetary Value): total spending by the customer.
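To make the three definitions concrete, here is a minimal sketch that computes R, F and M for three hypothetical customers with a single named aggregation. The reference date and the data are assumptions. Frequency here is the number of distinct invoices; the notebook itself later counts invoice lines.

```python
import pandas as pd

# Hypothetical transactions: customer 1 has two invoices, 2 has one, 3 has two
tx = pd.DataFrame({
    "Customer ID": [1, 1, 1, 2, 3, 3],
    "Invoice": ["A1", "A1", "A2", "B1", "C1", "C2"],
    "InvoiceDate": pd.to_datetime(["2010-11-01", "2010-11-01", "2010-12-01",
                                   "2010-06-01", "2010-12-05", "2010-12-08"]),
    "TotalPrice": [10.0, 5.0, 20.0, 100.0, 7.0, 3.0],
})
today = pd.Timestamp("2010-12-10")  # assumed reference date

rfm = tx.groupby("Customer ID").agg(
    Recency=("InvoiceDate", lambda d: (today - d.max()).days),  # days since last purchase
    Frequency=("Invoice", "nunique"),                           # distinct invoices
    Monetary=("TotalPrice", "sum"),                             # total spending
)
print(rfm)
```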


In [22]:
df.head()

,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country,TotalPrice
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,2009-12-01 07:45:00,7,13085,United Kingdom,83
1,489434,79323P,PINK CHERRY LIGHTS,12,2009-12-01 07:45:00,7,13085,United Kingdom,81
2,489434,79323W,WHITE CHERRY LIGHTS,12,2009-12-01 07:45:00,7,13085,United Kingdom,81
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2009-12-01 07:45:00,2,13085,United Kingdom,101
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,2009-12-01 07:45:00,1,13085,United Kingdom,30


In [23]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 417534 entries, 0 to 525460
Data columns (total 9 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   Invoice      417534 non-null  object        
 1   StockCode    417534 non-null  object        
 2   Description  417534 non-null  object        
 3   Quantity     417534 non-null  int64         
 4   InvoiceDate  417534 non-null  datetime64[ns]
 5   Price        417534 non-null  float64       
 6   Customer ID  417534 non-null  float64       
 7   Country      417534 non-null  object        
 8   TotalPrice   417534 non-null  float64       
dtypes: datetime64[ns](1), float64(3), int64(1), object(4)
memory usage: 31.9+ MB


In [24]:
df["InvoiceDate"].min()

Timestamp('2009-12-01 07:45:00')

In [25]:
df["InvoiceDate"].max()

Timestamp('2010-12-09 20:01:00')

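What is "today"? If we used the actual current date, every customer's recency would be inflated by the years that have passed since the data ends in December 2010.

For this reason, let's define our own "today" based on the structure of this data set.

We can set this day as the maximum date in the data set, 9 December 2010.

Customers are then segmented relative to the day of the last recorded transaction. Note that `dt.datetime(2010, 12, 9)` is midnight, earlier than the last invoice at 20:01 on that day. Customers who bought on 9 December therefore get a Recency of -1, which is the minimum shown in `rfm.describe()` further down.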

In [26]:
import datetime as dt

today_date = dt.datetime(2010,12,9)

In [27]:
today_date

datetime.datetime(2010, 12, 9, 0, 0)
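With `dt.datetime(2010, 12, 9)`, which is midnight, a purchase at 20:01 on the same day falls after "today" and yields a negative recency. A common alternative is one day after the date of the last invoice, truncated to midnight, so every recency is at least 0. This would shift the recency values in the outputs below, so it is shown here only as an option, on two hypothetical timestamps.

```python
import datetime as dt

import pandas as pd

# Hypothetical invoice timestamps; the last one matches the data set's maximum
invoice_dates = pd.Series(pd.to_datetime(["2010-12-07 14:57", "2010-12-09 20:01"]))

# The notebook's choice: midnight of the last day gives a negative recency
midnight = dt.datetime(2010, 12, 9)
print((midnight - invoice_dates.max()).days)  # -1

# Alternative: the day after the last invoice, at midnight
today_date = invoice_dates.max().normalize() + pd.Timedelta(days=1)
print(today_date)                                     # 2010-12-10 00:00:00
print((today_date - invoice_dates).dt.days.tolist())  # [2, 0]
```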

## 11. Show the last purchase dates of each customer.

In [28]:
df.groupby("Customer ID").agg({"InvoiceDate":"max"}).head()

Customer ID,InvoiceDate
12346,2010-10-04 16:33:00
12347,2010-12-07 14:57:00
12348,2010-09-27 14:59:00
12349,2010-10-28 08:23:00
12351,2010-11-29 15:23:00


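Now we have the last purchase date of each customer. Let's convert "Customer ID" from float to int. It was stored as float64 because of the missing values, which we have already dropped.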

In [29]:
df["Customer ID"] = df["Customer ID"].astype(int)

## 12. What should we do for customer segmentation with RFM?

For each customer, we subtract the last purchase date from today's date.

This gives a single recency value per customer.

In [30]:
(today_date - df.groupby("Customer ID").agg({"InvoiceDate":"max"})).head()

Customer ID,InvoiceDate
12346,65 days 07:27:00
12347,1 days 09:03:00
12348,72 days 09:01:00
12349,41 days 15:37:00
12351,9 days 08:37:00


In [31]:
temp_df = (today_date - df.groupby("Customer ID").agg({"InvoiceDate":"max"}))

In [32]:
temp_df.rename(columns={"InvoiceDate": "Recency"}, inplace = True)

In [33]:
temp_df.head()

Customer ID,Recency
12346,65 days 07:27:00
12347,1 days 09:03:00
12348,72 days 09:01:00
12349,41 days 15:37:00
12351,9 days 08:37:00


In [34]:
recency_df = temp_df["Recency"].apply(lambda x: x.days)

In [35]:
recency_df.head()

Customer ID
12346    65
12347     1
12348    72
12349    41
12351     9
Name: Recency, dtype: int64

In [36]:
# One-step alternative that gives the same recency in days:
#df.groupby("Customer ID").agg({"InvoiceDate": lambda x: (today_date - x.max()).days}).head()

# Frequency

In [37]:
temp_df = df.groupby(["Customer ID","Invoice"]).agg({"Invoice":"count"})

In [38]:
temp_df.head()

Customer ID,Invoice,Invoice
12346,491725,1
12346,491742,1
12346,491744,1
12346,492718,1
12346,492722,1


In [39]:
temp_df.groupby("Customer ID").agg({"Invoice":"count"}).head()

Customer ID,Invoice
12346,15
12347,2
12348,1
12349,4
12351,1


In [40]:
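# note: "sum" adds up invoice lines per customer; "count" (In [39]) would give the number of invoices
freq_df = temp_df.groupby("Customer ID").agg({"Invoice":"sum"})
freq_df.rename(columns={"Invoice": "Frequency"}, inplace = True)
freq_df.head()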

Customer ID,Frequency
12346,46
12347,71
12348,20
12349,107
12351,21
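In [39] and In [40] differ by one word. `count` counts each customer's invoices, while `sum` adds up the lines on those invoices. `freq_df` stores the latter, so customer 12346 gets 46 instead of 15. Here is a hypothetical two-customer example of both measures:

```python
import pandas as pd

# Hypothetical lines: customer 1 has two invoices (A1 with two lines), customer 2 has one
lines = pd.DataFrame({
    "Customer ID": [1, 1, 1, 2],
    "Invoice": ["A1", "A1", "A2", "B1"],
})

# Lines per (customer, invoice) pair, like temp_df in In [37]
per_invoice = lines.groupby(["Customer ID", "Invoice"]).size()

# Summing over invoices gives invoice lines per customer (In [40])
line_frequency = per_invoice.groupby(level="Customer ID").sum()

# Counting distinct invoices gives purchases per customer (In [39])
invoice_frequency = lines.groupby("Customer ID")["Invoice"].nunique()

print(line_frequency.to_dict())     # {1: 3, 2: 1}
print(invoice_frequency.to_dict())  # {1: 2, 2: 1}
```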


# Monetary

In [41]:
monetary_df = df.groupby("Customer ID").agg({"TotalPrice":"sum"})

In [42]:
monetary_df.head()

Customer ID,TotalPrice
12346,-65
12347,1323
12348,222
12349,2647
12351,301


In [43]:
# rename the column to Monetary

monetary_df.rename(columns={"TotalPrice": "Monetary"}, inplace = True)

In [44]:
print(recency_df.shape,freq_df.shape,monetary_df.shape)

(4383,) (4383, 1) (4383, 1)


In [45]:
rfm = pd.concat([recency_df, freq_df, monetary_df],  axis=1)

In [46]:
rfm.head()

Customer ID,Recency,Frequency,Monetary
12346,65,46,-65
12347,1,71,1323
12348,72,20,222
12349,41,107,2647
12351,9,21,301


## Now we need to score customers by how recently they bought (Recency), how often they buy (Frequency) and how much they spend (Monetary).

## 13. Scoring for RFM

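- Each metric is scored from 1 to 5 with `pd.qcut`, which splits the values into five equal-sized quantile bins. Recency gets reversed labels (5 for the most recent customers) because a smaller recency is better. Frequency is ranked first with `rank(method="first")` so that tied values do not produce duplicate bin edges, which `qcut` rejects.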

In [47]:
rfm["RecencyScore"] = pd.qcut(rfm['Recency'], 5, labels = [5, 4, 3, 2, 1])   

In [48]:
rfm["FrequencyScore"] = pd.qcut(rfm['Frequency'].rank(method = "first"), 5, labels = [1, 2, 3, 4, 5])

In [49]:
rfm["MonetaryScore"] = pd.qcut(rfm['Monetary'], 5, labels = [1, 2, 3, 4, 5])
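Why is Frequency ranked before `qcut`? Many customers share the same small Frequency value, and equal-frequency bins built on tied values can have repeated edges, which `pd.qcut` rejects by default. The toy Series below reproduces the error. Ranking with `method="first"`, which breaks ties by order of appearance, gives unique edges. Whether Recency and Monetary could hit the same problem depends on the data; in this notebook they were cut directly.

```python
import pandas as pd

# Toy frequencies with many ties at the low end
freq = pd.Series([1, 1, 1, 1, 1, 1, 2, 3, 4, 5])

failed = False
try:
    pd.qcut(freq, 5, labels=[1, 2, 3, 4, 5])
except ValueError as err:
    failed = True  # duplicate bin edges are rejected
    print("qcut failed:", err)

# rank(method="first") turns the values into 1..10 with no ties
scores = pd.qcut(freq.rank(method="first"), 5, labels=[1, 2, 3, 4, 5])
print(scores.tolist())  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```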

In [50]:
rfm.head()

Customer ID,Recency,Frequency,Monetary,RecencyScore,FrequencyScore,MonetaryScore
12346,65,46,-65,3,3,1
12347,1,71,1323,5,4,4
12348,72,20,222,2,2,1
12349,41,107,2647,3,4,5
12351,9,21,301,5,2,2


Let's concatenate the three scores into a single RFM code:

In [51]:
(rfm['RecencyScore'].astype(str) + 
 rfm['FrequencyScore'].astype(str) + 
 rfm['MonetaryScore'].astype(str)).head()

Customer ID
12346    331
12347    544
12348    221
12349    345
12351    522
dtype: object

In [52]:
rfm["RFM_SCORE"] = rfm['RecencyScore'].astype(str) + rfm['FrequencyScore'].astype(str) + rfm['MonetaryScore'].astype(str)

In [53]:
rfm.head()

Customer ID,Recency,Frequency,Monetary,RecencyScore,FrequencyScore,MonetaryScore,RFM_SCORE
12346,65,46,-65,3,3,1,331
12347,1,71,1323,5,4,4,544
12348,72,20,222,2,2,1,221
12349,41,107,2647,3,4,5,345
12351,9,21,301,5,2,2,522


In [54]:
rfm.describe().T

,count,mean,std,min,25%,50%,75%,max
Recency,4383,89,98,-1,15,50,136,372
Frequency,4383,95,205,1,18,44,103,5710
Monetary,4383,1905,8519,-25111,285,656,1646,341777


Customers who score 5 on all three metrics (555) are the champions.

In [55]:
rfm[rfm["RFM_SCORE"] == "555"].head()

Customer ID,Recency,Frequency,Monetary,RecencyScore,FrequencyScore,MonetaryScore,RFM_SCORE
12415,9,212,19544,5,5,5,555
12431,7,173,4303,5,5,5,555
12433,0,287,7053,5,5,5,555
12471,6,767,19208,5,5,5,555
12472,3,658,10727,5,5,5,555


Customers who score 1 on all three metrics (111) are the lowest-ranked.

In [56]:
rfm[rfm["RFM_SCORE"] == "111"].head()

Customer ID,Recency,Frequency,Monetary,RecencyScore,FrequencyScore,MonetaryScore,RFM_SCORE
12362,372,1,130,1,1,1,111
12382,316,1,-18,1,1,1,111
12404,316,1,63,1,1,1,111
12416,290,11,203,1,1,1,111
12466,316,1,57,1,1,1,111


Now let's segment customers with regular expressions. We set the full RFM score aside and use only the Recency (R) and Frequency (F) scores.

Example: if R is 1 or 2 and F is 1 or 2, the segment is 'Hibernating'.

In [57]:
seg_map = {
    r'[1-2][1-2]': 'Hibernating',
    r'[1-2][3-4]': 'At Risk',
    r'[1-2]5': 'Can\'t Loose',
    r'3[1-2]': 'About to Sleep',
    r'33': 'Need Attention',
    r'[3-4][4-5]': 'Loyal Customers',
    r'41': 'Promising',
    r'51': 'New Customers',
    r'[4-5][2-3]': 'Potential Loyalists',
    r'5[4-5]': 'Champions'
}

In [58]:
rfm['Segment'] = rfm['RecencyScore'].astype(str) + rfm['FrequencyScore'].astype(str)
rfm['Segment'] = rfm['Segment'].replace(seg_map, regex=True)
rfm.head()

Customer ID,Recency,Frequency,Monetary,RecencyScore,FrequencyScore,MonetaryScore,RFM_SCORE,Segment
12346,65,46,-65,3,3,1,331,Need Attention
12347,1,71,1323,5,4,4,544,Champions
12348,72,20,222,2,2,1,221,Hibernating
12349,41,107,2647,3,4,5,345,Loyal Customers
12351,9,21,301,5,2,2,522,Potential Loyalists
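`Series.replace` with a dict and `regex=True` tries each pattern on each string and substitutes the label where it matches. Every RF code is exactly two digits, so the unanchored patterns in `seg_map` already match whole codes. The sketch below adds `^...$` anchors as a safeguard (an addition, not part of the original map). It then checks that the map covers all 25 R/F combinations.

```python
from itertools import product

import pandas as pd

# seg_map from above, with ^...$ anchors so each pattern must match the whole code
seg_map_anchored = {
    r'^[1-2][1-2]$': 'Hibernating',
    r'^[1-2][3-4]$': 'At Risk',
    r'^[1-2]5$': "Can't Lose",
    r'^3[1-2]$': 'About to Sleep',
    r'^33$': 'Need Attention',
    r'^[3-4][4-5]$': 'Loyal Customers',
    r'^41$': 'Promising',
    r'^51$': 'New Customers',
    r'^[4-5][2-3]$': 'Potential Loyalists',
    r'^5[4-5]$': 'Champions',
}

# Every possible two-digit RF code: "11", "12", ..., "55"
codes = pd.Series([r + f for r, f in product("12345", repeat=2)])
segments = codes.replace(seg_map_anchored, regex=True)

# Codes left unchanged were not matched by any pattern
unmapped = codes[segments == codes]
print(len(unmapped))  # 0
print(segments.value_counts())
```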


In [59]:
rfm[["Segment", "Recency","Frequency","Monetary"]].groupby("Segment").agg(["mean","count"])

Segment,Recency mean,Recency count,Frequency mean,Frequency count,Monetary mean,Monetary count
About to Sleep,51,346,15,346,383,346
At Risk,161,620,59,620,1062,620
Can't Loose,121,94,228,94,2876,94
Champions,5,667,272,667,6534,667
Hibernating,209,1024,13,1024,276,1024
Loyal Customers,35,768,170,768,2533,768
Need Attention,50,167,46,167,857,167
New Customers,6,65,7,65,441,65
Potential Loyalists,16,534,37,534,910,534
Promising,23,98,8,98,436,98


## Interpreting the table: the Champions example

- Recency: the 667 Champions made their last purchase 5 days before the reference date on average,
- Frequency: they have an average of 272 invoice lines (the Frequency measure used here),
- Monetary: they spent an average of 6534 (£).

Now let's pick the segment that needs attention (Need Attention).
As a strategy, we can take their "Customer ID"s, save them to an Excel file, send it to the sales department, and prepare a targeted campaign for them to make marketing more efficient.

In [60]:
rfm[rfm["Segment"] == "Need Attention"].head()

Customer ID,Recency,Frequency,Monetary,RecencyScore,FrequencyScore,MonetaryScore,RFM_SCORE,Segment
12346,65,46,-65,3,3,1,331,Need Attention
12374,55,50,2246,3,3,5,335,Need Attention
12379,56,41,768,3,3,3,333,Need Attention
12389,36,49,1433,3,3,4,334,Need Attention
12425,64,59,904,3,3,3,333,Need Attention


## 14. Finally, create an Excel file with the New Customers.

In [61]:
rfm[rfm["Segment"] == "New Customers"].index

Int64Index([12386, 12427, 12441, 12538, 12686, 12738, 13010, 13011, 13029,
            13094, 13145, 13254, 13258, 13270, 13369, 13747, 13848, 14119,
            14213, 14306, 14491, 14576, 14589, 14865, 14987, 15018, 15181,
            15212, 15299, 15304, 15649, 15728, 15899, 15914, 15922, 15973,
            16194, 16473, 16545, 16552, 16711, 16752, 16988, 16995, 17026,
            17170, 17181, 17262, 17281, 17339, 17378, 17468, 17556, 17616,
            17674, 17723, 17857, 17870, 17924, 17925, 17951, 18084, 18113,
            18161, 18269],
           dtype='int64', name='Customer ID')

In [62]:
new_df = pd.DataFrame()
new_df["NewCustomerID"] = rfm[rfm["Segment"] == "New Customers"].index

In [63]:
new_df.head()

,NewCustomerID
0,12386
1,12427
2,12441
3,12538
4,12686


In [64]:
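# write the IDs to an Excel file, as the task asks (needs an Excel engine such as openpyxl)
new_df.to_excel("new_customers.xlsx", index=False)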
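The same export works for any segment, for example the Need Attention customers. Below is a sketch with a hypothetical `segment_ids` helper. The demo frame reuses three customer IDs and their segments from the outputs above.

```python
import pandas as pd


def segment_ids(rfm: pd.DataFrame, segment: str, column: str = "CustomerID") -> pd.DataFrame:
    """Return a one-column DataFrame with the customer IDs of one segment."""
    ids = rfm.index[rfm["Segment"] == segment]
    return pd.DataFrame({column: ids})


# Small stand-in for the rfm table (IDs and segments taken from the outputs above)
rfm_demo = pd.DataFrame(
    {"Segment": ["Need Attention", "New Customers", "Need Attention"]},
    index=pd.Index([12346, 12386, 12374], name="Customer ID"),
)
print(segment_ids(rfm_demo, "Need Attention"))
```

On the real table, `segment_ids(rfm, "Need Attention").to_excel("need_attention.xlsx", index=False)` would write the file. Like the cell above, this needs an Excel writer engine such as openpyxl.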



# Conclusion

After this notebook, my aim is to prepare a kernel on a data set that is not yet clean.

If you have any suggestions, please write them to me. I will be happy to receive comments and criticism!

Thank you for your suggestions and votes ;)

