# **SOURCE**
https://www.kaggle.com/code/mgmarques/customer-segmentation-and-market-basket-analysis/notebook
- Customer segmentation: Customer segmentation is the problem of uncovering information about a firm's customer base from its interactions with the business. In most cases these interactions are purchase behavior and patterns. We explore some of the ways this can be used.
- Market basket analysis: Market basket analysis is a method for gaining insight into the granular behavior of customers. It helps in devising strategies that draw on a deeper understanding of customers' purchase decisions. This is interesting because customers are often unaware of such biases or trends in their own purchasing behavior.

Let's see the description of each column:
- InvoiceNo: A unique identifier for the invoice. An invoice number shared across rows means that those transactions were performed in a single invoice (multiple purchases).
- StockCode: Identifier for items contained in an invoice.
- Description: Textual description of each stock item.
- Quantity: The quantity of the item purchased.
- InvoiceDate: Date of purchase.
- UnitPrice: Value of each item.
- CustomerID: Identifier for customer making the purchase.
- Country: Country of customer.
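
To make the column layout concrete, here is a minimal sketch on a few made-up rows. It shows how rows sharing an InvoiceNo belong to one invoice. The `LineTotal` column and the `invoice_summary` frame are hypothetical names introduced only for this illustration; they are not part of the dataset.

```python
import pandas as pd

# Toy rows mirroring the column layout described above (values are illustrative only).
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366"],
    "StockCode": ["85123A", "71053", "22633"],
    "Quantity": [6, 6, 2],
    "UnitPrice": [2.55, 3.39, 1.85],
    "CustomerID": [17850.0, 17850.0, 17850.0],
})

# Hypothetical helper column: value of each line item.
toy["LineTotal"] = toy["Quantity"] * toy["UnitPrice"]

# One row per invoice: number of distinct items and total invoice value.
invoice_summary = toy.groupby("InvoiceNo").agg(
    n_items=("StockCode", "nunique"),
    invoice_total=("LineTotal", "sum"),
)
print(invoice_summary)
```

Invoice 536365 spans two rows here, so it counts as a single invoice with two items.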

# **DATA UNDERSTANDING**

In [76]:
import numpy as np
import pandas as pd
import warnings

warnings.filterwarnings('ignore')
pd.options.mode.chained_assignment = None

path = './db/online-retail.xlsx'
df = pd.read_excel(path)

In [77]:
df.head()

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom


In [78]:
line = '========================'
def dataProfile(data):
  dimension = data.shape
  dtype = data.dtypes
  countOfNull = data.isnull().sum()
  nullRatio = round(countOfNull/len(data)*100,4)
  countOfDistinct = data.nunique()
  distinctValue = data.apply(lambda x: x.unique())
  output = pd.DataFrame(list(zip(dtype, countOfNull, nullRatio, countOfDistinct, distinctValue)),
                        index=data.columns, 
                        columns=['dtype', 'count_of_null', 'null_ratio', 'count_of_distinct', 'distinct_value'])
  # output = pd.concat([dtype, countOfNull, nullRatio, countOfDistinct, distinctValue], axis=1)
  # output.rename(columns=['dtype', 'count_of_null', 'null_ratio', 'count_of_distinct', 'distinct_value'])
  print(f'Dimensions\t: {dimension}')
  print(f'Data Size\t: {round(data.memory_usage(deep=True).sum()/1000000, 2)} MB')
  print(line)
  print(f'Duplicated Data\t: {len(data[data.duplicated()])}')
  display(data[data.duplicated()])
  print(line)
  print('REVIEW')
  display(output)
  print(line)
  print('Stastical Numerics')
  display(data.describe())
  print(line)
  print('Stastical Categorics')
  display(data.describe(include=['category', 'object']))
  print(line)
  print('PREVIEW head(3)')
  display(data.head(3))
  

In [79]:
dataProfile(df)

Dimensions	: (541909, 8)
Data Size	: 141.48 MB
Duplicated Data	: 5268


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
517,536409,21866,UNION JACK FLAG LUGGAGE TAG,1,2010-12-01 11:45:00,1.25,17908.0,United Kingdom
527,536409,22866,HAND WARMER SCOTTY DOG DESIGN,1,2010-12-01 11:45:00,2.10,17908.0,United Kingdom
537,536409,22900,SET 2 TEA TOWELS I LOVE LONDON,1,2010-12-01 11:45:00,2.95,17908.0,United Kingdom
539,536409,22111,SCOTTIE DOG HOT WATER BOTTLE,1,2010-12-01 11:45:00,4.95,17908.0,United Kingdom
555,536412,22327,ROUND SNACK BOXES SET OF 4 SKULLS,1,2010-12-01 11:49:00,2.95,17920.0,United Kingdom
...,...,...,...,...,...,...,...,...
541675,581538,22068,BLACK PIRATE TREASURE CHEST,1,2011-12-09 11:34:00,0.39,14446.0,United Kingdom
541689,581538,23318,BOX OF 6 MINI VINTAGE CRACKERS,1,2011-12-09 11:34:00,2.49,14446.0,United Kingdom
541692,581538,22992,REVOLVER WOODEN RULER,1,2011-12-09 11:34:00,1.95,14446.0,United Kingdom
541699,581538,22694,WICKER STAR,1,2011-12-09 11:34:00,2.10,14446.0,United Kingdom


REVIEW


Unnamed: 0,dtype,count_of_null,null_ratio,count_of_distinct,distinct_value
InvoiceNo,object,0,0.0,25900,"[536365, 536366, 536367, 536368, 536369, 53637..."
StockCode,object,0,0.0,4070,"[85123A, 71053, 84406B, 84029G, 84029E, 22752,..."
Description,object,1454,0.2683,4223,"[WHITE HANGING HEART T-LIGHT HOLDER, WHITE MET..."
Quantity,int64,0,0.0,722,"[6, 8, 2, 32, 3, 4, 24, 12, 48, 18, 20, 36, 80..."
InvoiceDate,datetime64[ns],0,0.0,23260,"[2010-12-01T08:26:00.000000000, 2010-12-01T08:..."
UnitPrice,float64,0,0.0,1630,"[2.55, 3.39, 2.75, 7.65, 4.25, 1.85, 1.69, 2.1..."
CustomerID,float64,135080,24.9267,4372,"[17850.0, 13047.0, 12583.0, 13748.0, 15100.0, ..."
Country,object,0,0.0,38,"[United Kingdom, France, Australia, Netherland..."


Stastical Numerics


Unnamed: 0,Quantity,UnitPrice,CustomerID
count,541909.0,541909.0,406829.0
mean,9.55225,4.611114,15287.69057
std,218.081158,96.759853,1713.600303
min,-80995.0,-11062.06,12346.0
25%,1.0,1.25,13953.0
50%,3.0,2.08,15152.0
75%,10.0,4.13,16791.0
max,80995.0,38970.0,18287.0


Stastical Categorics


Unnamed: 0,InvoiceNo,StockCode,Description,Country
count,541909,541909,540455,541909
unique,25900,4070,4223,38
top,573585,85123A,WHITE HANGING HEART T-LIGHT HOLDER,United Kingdom
freq,1114,2313,2369,495478


PREVIEW head(3)


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom


The preceding output shows that Quantity and UnitPrice contain negative values, which suggests that the data includes return transactions. Since our goal is customer segmentation and market basket analysis, these records should be removed, but first we check whether there are records where both values are negative, or where one is negative and the other is zero.
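
One way to run that check is to count rows by the sign of each column. The sketch below uses made-up values, not the retail data, and the variable names (`sign_counts`, `both_negative`, `negative_and_zero`) are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data (made up) covering the sign combinations discussed above.
toy = pd.DataFrame({
    "Quantity":  [6, -2, -1, 3, -5],
    "UnitPrice": [2.55, 1.25, 0.0, -4.0, -1.0],
})

# np.sign maps each value to -1, 0, or 1, so a crosstab counts every combination.
sign_counts = pd.crosstab(
    np.sign(toy["Quantity"]).rename("Quantity sign"),
    np.sign(toy["UnitPrice"]).rename("UnitPrice sign"),
)
print(sign_counts)

# Rows where both columns are negative.
both_negative = ((toy["Quantity"] < 0) & (toy["UnitPrice"] < 0)).sum()

# Rows where one column is negative and the other is exactly zero.
negative_and_zero = (
    ((toy["Quantity"] < 0) & (toy["UnitPrice"] == 0))
    | ((toy["UnitPrice"] < 0) & (toy["Quantity"] == 0))
).sum()
print(both_negative, negative_and_zero)
```

On the real data, the same masks could be applied to `df` directly.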

# **DATA CLEANSING**

## **Drop Duplicates**

In [295]:
def dropDuplicates(df):
  print(f'Dimensions before remove duplicates: {df.shape}')
  df = df.drop_duplicates()
  print(f'Dimensions after remove duplicates: {df.shape}')
  return df

In [296]:
data = df.sort_values('CustomerID').copy()
data = dropDuplicates(data)
data

Dimensions before remove duplicates: (541909, 8)
Dimensions after remove duplicates: (536641, 8)


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
61619,541431,23166,MEDIUM CERAMIC TOP STORAGE JAR,74215,2011-01-18 10:01:00,1.04,12346.0,United Kingdom
61624,C541433,23166,MEDIUM CERAMIC TOP STORAGE JAR,-74215,2011-01-18 10:17:00,1.04,12346.0,United Kingdom
286628,562032,21578,WOODLAND DESIGN COTTON TOTE BAG,6,2011-08-02 08:48:00,2.25,12347.0,Iceland
72263,542237,47559B,TEA TIME OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347.0,Iceland
72264,542237,21154,RED RETROSPOT OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347.0,Iceland
...,...,...,...,...,...,...,...,...
541536,581498,85099B,JUMBO BAG RED RETROSPOT,5,2011-12-09 10:26:00,4.13,,United Kingdom
541537,581498,85099C,JUMBO BAG BAROQUE BLACK WHITE,4,2011-12-09 10:26:00,4.13,,United Kingdom
541538,581498,85150,LADIES & GENTLEMEN METAL SIGN,1,2011-12-09 10:26:00,4.96,,United Kingdom
541539,581498,85174,S/4 CACTI CANDLES,1,2011-12-09 10:26:00,10.79,,United Kingdom


## **Drop N/a CustomerID**

In [297]:
def dropNull(df, cols=None):
  # Drop rows with nulls in `cols` (or in any column when `cols` is None)
  print(f'Dimensions before removing nulls: {df.shape}')
  if cols is None:
    df = df.dropna()
  else:
    df = df.dropna(subset=cols)
  print(f'Dimensions after removing nulls: {df.shape}')
  return df

In [298]:
data = dropNull(data, cols=['CustomerID'])
data

Dimensions before removing nulls: (536641, 8)
Dimensions after removing nulls: (401604, 8)


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
61619,541431,23166,MEDIUM CERAMIC TOP STORAGE JAR,74215,2011-01-18 10:01:00,1.04,12346.0,United Kingdom
61624,C541433,23166,MEDIUM CERAMIC TOP STORAGE JAR,-74215,2011-01-18 10:17:00,1.04,12346.0,United Kingdom
286628,562032,21578,WOODLAND DESIGN COTTON TOTE BAG,6,2011-08-02 08:48:00,2.25,12347.0,Iceland
72263,542237,47559B,TEA TIME OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347.0,Iceland
72264,542237,21154,RED RETROSPOT OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347.0,Iceland
...,...,...,...,...,...,...,...,...
392737,570715,23269,SET OF 2 CERAMIC CHRISTMAS TREES,36,2011-10-12 10:23:00,1.45,18287.0,United Kingdom
392736,570715,23223,CHRISTMAS TREE HANGING SILVER,48,2011-10-12 10:23:00,0.83,18287.0,United Kingdom
392735,570715,23378,PACK OF 12 50'S CHRISTMAS TISSUES,24,2011-10-12 10:23:00,0.39,18287.0,United Kingdom
423939,573167,23264,SET OF 3 WOODEN SLEIGH DECORATIONS,36,2011-10-28 09:29:00,1.25,18287.0,United Kingdom


## **Data Types**

In [299]:
dataProfile(data)

Dimensions	: (401604, 8)
Data Size	: 107.96 MB
Duplicated Data	: 0


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country


REVIEW


Unnamed: 0,dtype,count_of_null,null_ratio,count_of_distinct,distinct_value
InvoiceNo,object,0,0.0,22190,"[541431, C541433, 562032, 542237, 573511, 5562..."
StockCode,object,0,0.0,3684,"[23166, 21578, 47559B, 21154, 21041, 21035, 22..."
Description,object,0,0.0,3896,"[MEDIUM CERAMIC TOP STORAGE JAR, WOODLAND DESI..."
Quantity,int64,0,0.0,436,"[74215, -74215, 6, 10, 3, 12, 4, 8, 24, 20, 2,..."
InvoiceDate,datetime64[ns],0,0.0,20460,"[2011-01-18T10:01:00.000000000, 2011-01-18T10:..."
UnitPrice,float64,0,0.0,620,"[1.04, 2.25, 1.25, 2.95, 12.75, 4.25, 0.42, 1...."
CustomerID,float64,0,0.0,4372,"[12346.0, 12347.0, 12348.0, 12349.0, 12350.0, ..."
Country,object,0,0.0,37,"[United Kingdom, Iceland, Finland, Italy, Norw..."


Stastical Numerics


Unnamed: 0,Quantity,UnitPrice,CustomerID
count,401604.0,401604.0,401604.0
mean,12.183273,3.474064,15281.160818
std,250.283037,69.764035,1714.006089
min,-80995.0,0.0,12346.0
25%,2.0,1.25,13939.0
50%,5.0,1.95,15145.0
75%,12.0,3.75,16784.0
max,80995.0,38970.0,18287.0


Stastical Categorics


Unnamed: 0,InvoiceNo,StockCode,Description,Country
count,401604,401604,401604,401604
unique,22190,3684,3896,37
top,576339,85123A,WHITE HANGING HEART T-LIGHT HOLDER,United Kingdom
freq,542,2065,2058,356728


PREVIEW head(3)


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
61619,541431,23166,MEDIUM CERAMIC TOP STORAGE JAR,74215,2011-01-18 10:01:00,1.04,12346.0,United Kingdom
61624,C541433,23166,MEDIUM CERAMIC TOP STORAGE JAR,-74215,2011-01-18 10:17:00,1.04,12346.0,United Kingdom
286628,562032,21578,WOODLAND DESIGN COTTON TOTE BAG,6,2011-08-02 08:48:00,2.25,12347.0,Iceland


In [256]:
data.CustomerID = data.CustomerID.astype('str')
# Strip the trailing ".0" left over from the float representation (e.g. '12346.0' -> '12346')
data.CustomerID = data.CustomerID.str.replace(r'\.0$', '', regex=True)
numericalColumns = ['Quantity', 'UnitPrice', 'InvoiceDate']
for value in data.columns:
  if value not in numericalColumns:
    data[value] = data[value].astype('str')
dataProfile(data)

Dimensions	: (401604, 8)
Data Size	: 149.77 MB
Duplicated Data	: 0


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country


REVIEW


Unnamed: 0,dtype,count_of_null,null_ratio,count_of_distinct,distinct_value
InvoiceNo,object,0,0.0,22190,"[541431, C541433, 562032, 542237, 573511, 5562..."
StockCode,object,0,0.0,3684,"[23166, 21578, 47559B, 21154, 21041, 21035, 22..."
Description,object,0,0.0,3896,"[MEDIUM CERAMIC TOP STORAGE JAR, WOODLAND DESI..."
Quantity,int64,0,0.0,436,"[74215, -74215, 6, 10, 3, 12, 4, 8, 24, 20, 2,..."
InvoiceDate,datetime64[ns],0,0.0,20460,"[2011-01-18T10:01:00.000000000, 2011-01-18T10:..."
UnitPrice,float64,0,0.0,620,"[1.04, 2.25, 1.25, 2.95, 12.75, 4.25, 0.42, 1...."
CustomerID,object,0,0.0,4372,"[12346, 12347, 12348, 12349, 12350, 12352, 123..."
Country,object,0,0.0,37,"[United Kingdom, Iceland, Finland, Italy, Norw..."


Stastical Numerics


Unnamed: 0,Quantity,UnitPrice
count,401604.0,401604.0
mean,12.183273,3.474064
std,250.283037,69.764035
min,-80995.0,0.0
25%,2.0,1.25
50%,5.0,1.95
75%,12.0,3.75
max,80995.0,38970.0


Stastical Categorics


Unnamed: 0,InvoiceNo,StockCode,Description,CustomerID,Country
count,401604,401604,401604,401604,401604
unique,22190,3684,3896,4372,37
top,576339,85123A,WHITE HANGING HEART T-LIGHT HOLDER,17841,United Kingdom
freq,542,2065,2058,7812,356728


PREVIEW head(3)


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
61619,541431,23166,MEDIUM CERAMIC TOP STORAGE JAR,74215,2011-01-18 10:01:00,1.04,12346,United Kingdom
61624,C541433,23166,MEDIUM CERAMIC TOP STORAGE JAR,-74215,2011-01-18 10:17:00,1.04,12346,United Kingdom
286628,562032,21578,WOODLAND DESIGN COTTON TOTE BAG,6,2011-08-02 08:48:00,2.25,12347,Iceland
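
The CustomerID cleanup in the cell above converts the float IDs to strings and strips the decimal suffix with a regular expression. A possible alternative, sketched here on made-up values and assuming rows with a null CustomerID have already been dropped, is to cast to an integer type first so that no suffix is produced at all.

```python
import pandas as pd

# Toy float IDs as they appear after reading the file (nulls assumed already removed).
ids = pd.Series([17850.0, 12346.0, 12347.0], name="CustomerID")

# Cast to integer first, then to string, so no ".0" suffix is ever produced.
ids_str = ids.astype("int64").astype(str)
print(ids_str.tolist())  # ['17850', '12346', '12347']
```

The integer cast raises an error if any null remains, which acts as a cheap sanity check on the earlier null-dropping step.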


## **Explore**

### **Duplicated and null values have been removed. Negative values in Quantity?**

In [257]:
print(f'negative quantity => refund?')
print(f'InvoiceNo startwith: {data[(data.Quantity<0)].InvoiceNo.apply(lambda x: str(x)[0]).unique()}\n{line}')
display(data[(data.Quantity<0)])
print(line)
print(f'zero unitprice => free/bug/error?')
print(f'length: {len(data[(data.UnitPrice==0)])}\n{line}')
display(data[(data.UnitPrice==0)])

negative quantity => refund?
InvoiceNo startwith: ['C']


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
61624,C541433,23166,MEDIUM CERAMIC TOP STORAGE JAR,-74215,2011-01-18 10:17:00,1.04,12346,United Kingdom
106397,C545330,M,Manual,-1,2011-03-01 15:49:00,376.50,12352,Norway
106395,C545329,M,Manual,-1,2011-03-01 15:47:00,183.75,12352,Norway
106394,C545329,M,Manual,-1,2011-03-01 15:47:00,280.05,12352,Norway
129743,C547388,21914,BLUE HARMONICA IN BOX,-12,2011-03-22 16:07:00,1.25,12352,Norway
...,...,...,...,...,...,...,...,...
488515,C577832,84988,SET OF 72 PINK HEART PAPER DOILIES,-12,2011-11-22 10:18:00,1.45,18274,United Kingdom
481908,C577386,23401,RUSTIC MIRROR WITH LACE HEART,-1,2011-11-18 16:54:00,6.25,18276,United Kingdom
481921,C577390,23401,RUSTIC MIRROR WITH LACE HEART,-1,2011-11-18 17:01:00,6.25,18276,United Kingdom
70604,C542086,22423,REGENCY CAKESTAND 3 TIER,-1,2011-01-25 12:34:00,12.75,18277,United Kingdom


zero unitprice => free/bug/error?
length: 40


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
436428,574138,23234,BISCUIT TIN VINTAGE CHRISTMAS,216,2011-11-03 11:26:00,0.0,12415,Australia
198383,554037,22619,SET OF 6 SOLDIER SKITTLES,80,2011-05-20 14:13:00,0.0,12415,Australia
439361,574469,22385,JUMBO BAG SPACEBOY DESIGN,12,2011-11-04 11:55:00,0.0,12431,Australia
436961,574252,M,Manual,1,2011-11-03 13:24:00,0.0,12437,France
480649,577314,23407,SET OF 2 TRAYS HOME SWEET HOME,2,2011-11-18 13:23:00,0.0,12444,Norway
395529,571035,M,Manual,1,2011-10-13 12:50:00,0.0,12446,RSA
157042,550188,22636,CHILDS BREAKFAST SET CIRCUS PARADE,1,2011-04-14 18:57:00,0.0,12457,Switzerland
282912,561669,22960,JAM MAKING SET WITH JARS,11,2011-07-28 17:09:00,0.0,12507,Spain
479546,577168,M,Manual,1,2011-11-18 10:42:00,0.0,12603,Germany
9302,537197,22841,ROUND CAKE TIN VINTAGE GREEN,1,2010-12-05 14:02:00,0.0,12647,Germany


In [301]:
zeroUP = data[data.UnitPrice==0][['StockCode', "Description"]]
priceZero = pd.merge(data, zeroUP, left_on=['StockCode', 'Description'], right_on=['StockCode', 'Description'], how='inner')
# priceZero
priceZero = priceZero.groupby(['StockCode', 'Description', 'UnitPrice'], as_index=False).agg(Count_=('UnitPrice', 'count')).reset_index(drop=True)
priceZero[priceZero.UnitPrice==0]

Unnamed: 0,StockCode,Description,UnitPrice,Count_
0,21208,PASTEL COLOUR HONEYCOMB FAN,0.0,1
4,21786,POLKADOT RAIN HAT,0.0,1
8,22055,MINI CAKE STAND HANGING STRAWBERY,0.0,1
12,22062,CERAMIC BOWL WITH LOVE HEART DESIGN,0.0,1
15,22065,CHRISTMAS PUDDING TRINKET POT,0.0,1
19,22089,PAPER BUNTING VINTAGE PAISLEY,0.0,1
22,22090,PAPER BUNTING RETROSPOT,0.0,1
26,22162,HEART GARLAND RUSTIC PADDED,0.0,1
28,22167,OVAL WALL MIRROR DIAMANTE,0.0,1
31,22168,ORGANISER WOOD ANTIQUE WHITE,0.0,1


### **Drop Zero UnitPrice**
Only 40 rows have a UnitPrice of zero, so they can be removed to avoid data inconsistencies.

In [302]:
data = data[data.UnitPrice > 0]
data

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
61619,541431,23166,MEDIUM CERAMIC TOP STORAGE JAR,74215,2011-01-18 10:01:00,1.04,12346.0,United Kingdom
61624,C541433,23166,MEDIUM CERAMIC TOP STORAGE JAR,-74215,2011-01-18 10:17:00,1.04,12346.0,United Kingdom
286628,562032,21578,WOODLAND DESIGN COTTON TOTE BAG,6,2011-08-02 08:48:00,2.25,12347.0,Iceland
72263,542237,47559B,TEA TIME OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347.0,Iceland
72264,542237,21154,RED RETROSPOT OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347.0,Iceland
...,...,...,...,...,...,...,...,...
392737,570715,23269,SET OF 2 CERAMIC CHRISTMAS TREES,36,2011-10-12 10:23:00,1.45,18287.0,United Kingdom
392736,570715,23223,CHRISTMAS TREE HANGING SILVER,48,2011-10-12 10:23:00,0.83,18287.0,United Kingdom
392735,570715,23378,PACK OF 12 50'S CHRISTMAS TISSUES,24,2011-10-12 10:23:00,0.39,18287.0,United Kingdom
423939,573167,23264,SET OF 3 WOODEN SLEIGH DECORATIONS,36,2011-10-28 09:29:00,1.25,18287.0,United Kingdom


In [260]:
dataProfile(data)

Dimensions	: (401564, 8)
Data Size	: 149.76 MB
Duplicated Data	: 0


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country


REVIEW


Unnamed: 0,dtype,count_of_null,null_ratio,count_of_distinct,distinct_value
InvoiceNo,object,0,0.0,22186,"[541431, C541433, 562032, 542237, 573511, 5562..."
StockCode,object,0,0.0,3684,"[23166, 21578, 47559B, 21154, 21041, 21035, 22..."
Description,object,0,0.0,3896,"[MEDIUM CERAMIC TOP STORAGE JAR, WOODLAND DESI..."
Quantity,int64,0,0.0,435,"[74215, -74215, 6, 10, 3, 12, 4, 8, 24, 20, 2,..."
InvoiceDate,datetime64[ns],0,0.0,20456,"[2011-01-18T10:01:00.000000000, 2011-01-18T10:..."
UnitPrice,float64,0,0.0,619,"[1.04, 2.25, 1.25, 2.95, 12.75, 4.25, 0.42, 1...."
CustomerID,object,0,0.0,4371,"[12346, 12347, 12348, 12349, 12350, 12352, 123..."
Country,object,0,0.0,37,"[United Kingdom, Iceland, Finland, Italy, Norw..."


Stastical Numerics


Unnamed: 0,Quantity,UnitPrice
count,401564.0,401564.0
mean,12.149911,3.47441
std,249.512649,69.767501
min,-80995.0,0.001
25%,2.0,1.25
50%,5.0,1.95
75%,12.0,3.75
max,80995.0,38970.0


Stastical Categorics


Unnamed: 0,InvoiceNo,StockCode,Description,CustomerID,Country
count,401564,401564,401564,401564,401564
unique,22186,3684,3896,4371,37
top,576339,85123A,WHITE HANGING HEART T-LIGHT HOLDER,17841,United Kingdom
freq,542,2065,2058,7812,356704


PREVIEW head(3)


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
61619,541431,23166,MEDIUM CERAMIC TOP STORAGE JAR,74215,2011-01-18 10:01:00,1.04,12346,United Kingdom
61624,C541433,23166,MEDIUM CERAMIC TOP STORAGE JAR,-74215,2011-01-18 10:17:00,1.04,12346,United Kingdom
286628,562032,21578,WOODLAND DESIGN COTTON TOTE BAG,6,2011-08-02 08:48:00,2.25,12347,Iceland


### **Explore Returned/Canceled Transactions**

#### **By Transactions and Transaction Items**

In [261]:
cancel = data.groupby(['InvoiceNo', 'CustomerID'], as_index=False).Quantity.sum().sort_values('CustomerID').reset_index(drop=True)
cancel['IsCanceled'] = np.where(cancel.InvoiceNo.str.startswith('C', na=False), 1, 0)

print(f'Total transactions\t\t: {len(cancel)}')
print(f'Total completed transactions\t: {len(cancel)-cancel.IsCanceled.sum()} => {round(100-(cancel.IsCanceled.sum()/len(cancel)*100),2)}%')
print(f'Total canceled transactions\t: {cancel.IsCanceled.sum()} => {round((cancel.IsCanceled.sum()/len(cancel)*100),2)}%')
print(line)
cancel

Total transactions		: 22186
Total completed transactions	: 18532 => 83.53%
Total canceled transactions	: 3654 => 16.47%


Unnamed: 0,InvoiceNo,CustomerID,Quantity,IsCanceled
0,541431,12346,74215,0
1,C541433,12346,-74215,1
2,549222,12347,483,0
3,537626,12347,319,0
4,562032,12347,277,0
...,...,...,...,...
22181,578262,18283,241,0
22182,579673,18283,132,0
22183,570715,18287,990,0
22184,554065,18287,488,0


In [310]:
# canceled items
data[data['InvoiceNo'].str.startswith("C", na = False)].sort_values('CustomerID').reset_index(drop=True)
# same as data[data.Quantity<0]

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,C541433,23166,MEDIUM CERAMIC TOP STORAGE JAR,-74215,2011-01-18 10:17:00,1.04,12346.0,United Kingdom
1,C547388,22784,LANTERN CREAM GAZEBO,-3,2011-03-22 16:07:00,4.95,12352.0,Norway
2,C547388,37448,CERAMIC CAKE DESIGN SPOTTED MUG,-12,2011-03-22 16:07:00,1.49,12352.0,Norway
3,C547388,22701,PINK DOG BOWL,-6,2011-03-22 16:07:00,2.95,12352.0,Norway
4,C547388,22645,CERAMIC HEART FAIRY CAKE MONEY BANK,-12,2011-03-22 16:07:00,1.45,12352.0,Norway
...,...,...,...,...,...,...,...,...
8867,C577832,84988,SET OF 72 PINK HEART PAPER DOILIES,-12,2011-11-22 10:18:00,1.45,18274.0,United Kingdom
8868,C577386,23401,RUSTIC MIRROR WITH LACE HEART,-1,2011-11-18 16:54:00,6.25,18276.0,United Kingdom
8869,C577390,23401,RUSTIC MIRROR WITH LACE HEART,-1,2011-11-18 17:01:00,6.25,18276.0,United Kingdom
8870,C542086,22423,REGENCY CAKESTAND 3 TIER,-1,2011-01-25 12:34:00,12.75,18277.0,United Kingdom


In [306]:
# canceled items
# same result
len(data[data.Quantity<0].sort_values('CustomerID').reset_index(drop=True).InvoiceNo.unique())

3654

In [311]:
# completed items
# same result
data[data.Quantity>0].reset_index(drop=True)

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,541431,23166,MEDIUM CERAMIC TOP STORAGE JAR,74215,2011-01-18 10:01:00,1.04,12346.0,United Kingdom
1,562032,21578,WOODLAND DESIGN COTTON TOTE BAG,6,2011-08-02 08:48:00,2.25,12347.0,Iceland
2,542237,47559B,TEA TIME OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347.0,Iceland
3,542237,21154,RED RETROSPOT OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347.0,Iceland
4,542237,21041,RED RETROSPOT OVEN GLOVE DOUBLE,6,2011-01-26 14:30:00,2.95,12347.0,Iceland
...,...,...,...,...,...,...,...,...
392687,570715,23269,SET OF 2 CERAMIC CHRISTMAS TREES,36,2011-10-12 10:23:00,1.45,18287.0,United Kingdom
392688,570715,23223,CHRISTMAS TREE HANGING SILVER,48,2011-10-12 10:23:00,0.83,18287.0,United Kingdom
392689,570715,23378,PACK OF 12 50'S CHRISTMAS TISSUES,24,2011-10-12 10:23:00,0.39,18287.0,United Kingdom
392690,573167,23264,SET OF 3 WOODEN SLEIGH DECORATIONS,36,2011-10-28 09:29:00,1.25,18287.0,United Kingdom


#### **Transactions Affected by Returns**

In [307]:
data.reset_index(drop=True, inplace=True)
dataIdx = data.copy()
dataIdx['idx'] = dataIdx.index
dataIdx

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,idx
0,541431,23166,MEDIUM CERAMIC TOP STORAGE JAR,74215,2011-01-18 10:01:00,1.04,12346.0,United Kingdom,0
1,C541433,23166,MEDIUM CERAMIC TOP STORAGE JAR,-74215,2011-01-18 10:17:00,1.04,12346.0,United Kingdom,1
2,562032,21578,WOODLAND DESIGN COTTON TOTE BAG,6,2011-08-02 08:48:00,2.25,12347.0,Iceland,2
3,542237,47559B,TEA TIME OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347.0,Iceland,3
4,542237,21154,RED RETROSPOT OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347.0,Iceland,4
...,...,...,...,...,...,...,...,...,...
401559,570715,23269,SET OF 2 CERAMIC CHRISTMAS TREES,36,2011-10-12 10:23:00,1.45,18287.0,United Kingdom,401559
401560,570715,23223,CHRISTMAS TREE HANGING SILVER,48,2011-10-12 10:23:00,0.83,18287.0,United Kingdom,401560
401561,570715,23378,PACK OF 12 50'S CHRISTMAS TISSUES,24,2011-10-12 10:23:00,0.39,18287.0,United Kingdom,401561
401562,573167,23264,SET OF 3 WOODEN SLEIGH DECORATIONS,36,2011-10-28 09:29:00,1.25,18287.0,United Kingdom,401562


In [308]:
dataCompleted = dataIdx[dataIdx.Quantity>0]
dataCanceled = dataIdx[dataIdx.Quantity<0]
dataReturned = pd.merge(dataCompleted, dataCanceled, how='inner',
                   on=['StockCode', 'Description', 'CustomerID', 'Country', 'UnitPrice'], 
                   suffixes=['_completed', '_canceled'])
dataReturned

Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,idx_completed,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled,idx_canceled
0,541431,23166,MEDIUM CERAMIC TOP STORAGE JAR,74215,2011-01-18 10:01:00,1.04,12346.0,United Kingdom,0,C541433,-74215,2011-01-18 10:17:00,1
1,545332,M,Manual,1,2011-03-01 15:52:00,183.75,12352.0,Norway,315,C545329,-1,2011-03-01 15:47:00,319
2,545332,M,Manual,1,2011-03-01 15:52:00,280.05,12352.0,Norway,316,C545329,-1,2011-03-01 15:47:00,320
3,545332,M,Manual,1,2011-03-01 15:52:00,376.50,12352.0,Norway,317,C545330,-1,2011-03-01 15:49:00,318
4,545323,84050,PINK HEART SHAPE EGG FRYING PAN,12,2011-03-01 14:57:00,1.65,12352.0,Norway,323,C547388,-12,2011-03-22 16:07:00,351
...,...,...,...,...,...,...,...,...,...,...,...,...,...
19811,575485,84509A,SET OF 4 ENGLISH ROSE PLACEMATS,4,2011-11-09 17:03:00,3.75,18274.0,United Kingdom,400707,C577832,-4,2011-11-22 10:18:00,400696
19812,575485,21231,SWEETHEART CERAMIC TRINKET BOX,12,2011-11-09 17:03:00,1.25,18274.0,United Kingdom,400708,C577832,-12,2011-11-22 10:18:00,400689
19813,572990,23401,RUSTIC MIRROR WITH LACE HEART,2,2011-10-27 10:54:00,6.25,18276.0,United Kingdom,400718,C577386,-1,2011-11-18 16:54:00,400713
19814,572990,23401,RUSTIC MIRROR WITH LACE HEART,2,2011-10-27 10:54:00,6.25,18276.0,United Kingdom,400718,C577390,-1,2011-11-18 17:01:00,400715


In [309]:
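# MT: returned quantity is more than the purchased quantity
dataReturnedQtyMT = dataReturned[dataReturned.Quantity_completed < np.abs(dataReturned.Quantity_canceled)]
# EQ: returned quantity equals the purchased quantity
dataReturnedQtyEQ = dataReturned[dataReturned.Quantity_completed == np.abs(dataReturned.Quantity_canceled)]
# LT: returned quantity is less than the purchased quantity
dataReturnedQtyLT = dataReturned[dataReturned.Quantity_completed > np.abs(dataReturned.Quantity_canceled)]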
print(line)
print(f'Transaction Items Affected by Returned => {len(dataReturned)}')
print(line)
print(f'Purchase Quantity > Return Quantity \t: {len(dataReturnedQtyLT)}')
display(dataReturnedQtyLT)
print(line)
print(f'Purchase Quantity == Return Quantity \t: {len(dataReturnedQtyEQ)}')
display(dataReturnedQtyEQ)
print(line)
print(f'Purchase Quantity < Return Quantity \t: {len(dataReturnedQtyMT)}')
display(dataReturnedQtyMT)


Transaction Items Affected by Returned => 19816
Purchase Quantity > Return Quantity 	: 13237


Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,idx_completed,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled,idx_canceled
19,540946,22666,RECIPE BOX PANTRY YELLOW DESIGN,6,2011-01-12 12:43:00,2.95,12359.0,Cyprus,696,C549955,-2,2011-04-13 13:38:00,684
20,543370,22666,RECIPE BOX PANTRY YELLOW DESIGN,6,2011-02-07 14:51:00,2.95,12359.0,Cyprus,726,C549955,-2,2011-04-13 13:38:00,684
21,540946,22720,SET OF 3 CAKE TINS PANTRY DESIGN,3,2011-01-12 12:43:00,4.95,12359.0,Cyprus,698,C580165,-1,2011-12-02 11:21:00,903
22,571034,22720,SET OF 3 CAKE TINS PANTRY DESIGN,3,2011-10-13 12:47:00,4.95,12359.0,Cyprus,889,C580165,-1,2011-12-02 11:21:00,903
24,571034,23245,SET OF 3 REGENCY CAKE TINS,4,2011-10-13 12:47:00,4.95,12359.0,Cyprus,882,C580165,-2,2011-12-02 11:21:00,710
...,...,...,...,...,...,...,...,...,...,...,...,...,...
19800,549185,22969,HOMEMADE JAM SCENTED CANDLES,24,2011-04-07 09:35:00,1.45,18272.0,United Kingdom,400583,C552720,-2,2011-05-11 09:49:00,400561
19801,551507,22204,MILK PAN BLUE POLKADOT,4,2011-04-28 18:11:00,3.75,18272.0,United Kingdom,400604,C552720,-1,2011-05-11 09:49:00,400564
19813,572990,23401,RUSTIC MIRROR WITH LACE HEART,2,2011-10-27 10:54:00,6.25,18276.0,United Kingdom,400718,C577386,-1,2011-11-18 16:54:00,400713
19814,572990,23401,RUSTIC MIRROR WITH LACE HEART,2,2011-10-27 10:54:00,6.25,18276.0,United Kingdom,400718,C577390,-1,2011-11-18 17:01:00,400715


Purchase Quantity == Return Quantity 	: 5382


Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,idx_completed,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled,idx_canceled
0,541431,23166,MEDIUM CERAMIC TOP STORAGE JAR,74215,2011-01-18 10:01:00,1.04,12346.0,United Kingdom,0,C541433,-74215,2011-01-18 10:17:00,1
1,545332,M,Manual,1,2011-03-01 15:52:00,183.75,12352.0,Norway,315,C545329,-1,2011-03-01 15:47:00,319
2,545332,M,Manual,1,2011-03-01 15:52:00,280.05,12352.0,Norway,316,C545329,-1,2011-03-01 15:47:00,320
3,545332,M,Manual,1,2011-03-01 15:52:00,376.50,12352.0,Norway,317,C545330,-1,2011-03-01 15:49:00,318
4,545323,84050,PINK HEART SHAPE EGG FRYING PAN,12,2011-03-01 14:57:00,1.65,12352.0,Norway,323,C547388,-12,2011-03-22 16:07:00,351
...,...,...,...,...,...,...,...,...,...,...,...,...,...
19808,575485,22989,SET 2 PANTRY DESIGN TEA TOWELS,6,2011-11-09 17:03:00,3.25,18274.0,United Kingdom,400704,C577832,-6,2011-11-22 10:18:00,400693
19809,575485,22423,REGENCY CAKESTAND 3 TIER,1,2011-11-09 17:03:00,12.75,18274.0,United Kingdom,400705,C577832,-1,2011-11-22 10:18:00,400688
19810,575485,22851,SET 20 NAPKINS FAIRY CAKES DESIGN,12,2011-11-09 17:03:00,0.85,18274.0,United Kingdom,400706,C577832,-12,2011-11-22 10:18:00,400692
19811,575485,84509A,SET OF 4 ENGLISH ROSE PLACEMATS,4,2011-11-09 17:03:00,3.75,18274.0,United Kingdom,400707,C577832,-4,2011-11-22 10:18:00,400696


Purchase Quantity < Return Quantity 	: 1197


Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,idx_completed,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled,idx_canceled
87,569402,POST,POSTAGE,1,2011-10-04 09:00:00,18.00,12413.0,France,3867,C540367,-3,2011-01-06 16:17:00,3846
193,546765,21843,RED RETROSPOT CAKE STAND,3,2011-03-16 14:37:00,10.95,12437.0,France,6608,C542714,-5,2011-01-31 13:28:00,6661
194,553577,21843,RED RETROSPOT CAKE STAND,4,2011-05-18 10:34:00,10.95,12437.0,France,6627,C542714,-5,2011-01-31 13:28:00,6661
280,576909,23198,PANTRY MAGNETIC SHOPPING LIST,12,2011-11-17 09:49:00,1.45,12471.0,Germany,8461,C573037,-13,2011-10-27 13:45:00,8884
284,559300,23198,PANTRY MAGNETIC SHOPPING LIST,12,2011-07-07 12:40:00,1.45,12471.0,Germany,8480,C573037,-13,2011-10-27 13:45:00,8884
...,...,...,...,...,...,...,...,...,...,...,...,...,...
19529,553913,22501,PICNIC BASKET WICKER LARGE,1,2011-05-19 19:47:00,9.95,18109.0,United Kingdom,391559,C556530,-3,2011-06-13 11:42:00,391558
19530,548698,22501,PICNIC BASKET WICKER LARGE,1,2011-04-03 10:55:00,9.95,18109.0,United Kingdom,391627,C556530,-3,2011-06-13 11:42:00,391558
19535,577503,22835,HOT WATER BOTTLE I AM SO POORLY,1,2011-11-20 12:34:00,4.95,18110.0,United Kingdom,391721,C577513,-4,2011-11-20 12:59:00,391731
19648,581099,23485,BOTANICAL GARDENS WALL CLOCK,1,2011-12-07 11:43:00,25.00,18219.0,United Kingdom,397782,C574489,-2,2011-11-04 13:03:00,397751


###### **Returned Qty >= Buying Qty**
Some return quantities exceed the purchase quantity of the matched purchase transaction. Why?
This may be due to a warranty claim by the customer on an earlier purchase of the same product. However, since only 1,197 transaction items are affected and the data has no information to confirm this, these items can be ignored or removed. The same applies to transaction items whose return quantity equals the purchase quantity: there is no documentation on how product returns are processed, so we assume that these items cancel each other out.
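
One caveat when reading these counts: the merge keys do not include the invoice number, so a single return row can be paired with several purchases of the same product by the same customer. The counts are therefore counts of purchase/return pairs, not of distinct return items. The sketch below illustrates this on made-up rows modeled on the pattern visible in the output above; the `purchase_gt_return` column is a hypothetical name.

```python
import pandas as pd

# Toy data (made up): one customer buys the same item twice, then returns 2 units once.
purchases = pd.DataFrame({
    "InvoiceNo": ["540946", "543370"],
    "StockCode": ["22666", "22666"],
    "CustomerID": ["12359", "12359"],
    "Quantity": [6, 6],
})
returns = pd.DataFrame({
    "InvoiceNo": ["C549955"],
    "StockCode": ["22666"],
    "CustomerID": ["12359"],
    "Quantity": [-2],
})

# Inner merge on product and customer only, as in the cell above (fewer keys for brevity).
pairs = purchases.merge(
    returns, on=["StockCode", "CustomerID"], suffixes=("_completed", "_canceled")
)

# The single return row is paired with both purchases, so it appears twice.
print(len(pairs))  # 2

# Flag pairs where the purchased quantity exceeds the returned quantity.
pairs["purchase_gt_return"] = pairs["Quantity_completed"] > pairs["Quantity_canceled"].abs()
print(pairs[["InvoiceNo_completed", "InvoiceNo_canceled", "purchase_gt_return"]])
```

If distinct return items are needed, counting unique values of `idx_canceled` instead of rows would be one option.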

In [269]:
transactionItemsRemovedByQty = pd.concat([dataReturnedQtyMT, dataReturnedQtyEQ])
transactionItemsRemovedByQty

Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,idx_completed,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled,idx_canceled
87,569402,POST,POSTAGE,1,2011-10-04 09:00:00,18.00,12413,France,3867,C540367,-3,2011-01-06 16:17:00,3846
193,546765,21843,RED RETROSPOT CAKE STAND,3,2011-03-16 14:37:00,10.95,12437,France,6608,C542714,-5,2011-01-31 13:28:00,6661
194,553577,21843,RED RETROSPOT CAKE STAND,4,2011-05-18 10:34:00,10.95,12437,France,6627,C542714,-5,2011-01-31 13:28:00,6661
280,576909,23198,PANTRY MAGNETIC SHOPPING LIST,12,2011-11-17 09:49:00,1.45,12471,Germany,8461,C573037,-13,2011-10-27 13:45:00,8884
284,559300,23198,PANTRY MAGNETIC SHOPPING LIST,12,2011-07-07 12:40:00,1.45,12471,Germany,8480,C573037,-13,2011-10-27 13:45:00,8884
...,...,...,...,...,...,...,...,...,...,...,...,...,...
19808,575485,22989,SET 2 PANTRY DESIGN TEA TOWELS,6,2011-11-09 17:03:00,3.25,18274,United Kingdom,400704,C577832,-6,2011-11-22 10:18:00,400693
19809,575485,22423,REGENCY CAKESTAND 3 TIER,1,2011-11-09 17:03:00,12.75,18274,United Kingdom,400705,C577832,-1,2011-11-22 10:18:00,400688
19810,575485,22851,SET 20 NAPKINS FAIRY CAKES DESIGN,12,2011-11-09 17:03:00,0.85,18274,United Kingdom,400706,C577832,-12,2011-11-22 10:18:00,400692
19811,575485,84509A,SET OF 4 ENGLISH ROSE PLACEMATS,4,2011-11-09 17:03:00,3.75,18274,United Kingdom,400707,C577832,-4,2011-11-22 10:18:00,400696


In [270]:
data.iloc[[200000]]

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
200000,558028,21232,STRAWBERRY CERAMIC TRINKET BOX,12,2011-06-24 11:39:00,1.25,15128,United Kingdom


In [271]:
# data_.iloc[[200000]]

In [272]:
# Drop both sides of each pair (the purchase row and the return row) by index label
data.drop(transactionItemsRemovedByQty.idx_completed.unique(), inplace=True)
data.drop(transactionItemsRemovedByQty.idx_canceled.unique(), inplace=True)
data

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
2,562032,21578,WOODLAND DESIGN COTTON TOTE BAG,6,2011-08-02 08:48:00,2.25,12347,Iceland
3,542237,47559B,TEA TIME OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347,Iceland
4,542237,21154,RED RETROSPOT OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347,Iceland
5,542237,21041,RED RETROSPOT OVEN GLOVE DOUBLE,6,2011-01-26 14:30:00,2.95,12347,Iceland
6,542237,21035,SET/2 RED RETROSPOT TEA TOWELS,6,2011-01-26 14:30:00,2.95,12347,Iceland
...,...,...,...,...,...,...,...,...
401559,570715,23269,SET OF 2 CERAMIC CHRISTMAS TREES,36,2011-10-12 10:23:00,1.45,18287,United Kingdom
401560,570715,23223,CHRISTMAS TREE HANGING SILVER,48,2011-10-12 10:23:00,0.83,18287,United Kingdom
401561,570715,23378,PACK OF 12 50'S CHRISTMAS TISSUES,24,2011-10-12 10:23:00,0.39,18287,United Kingdom
401562,573167,23264,SET OF 3 WOODEN SLEIGH DECORATIONS,36,2011-10-28 09:29:00,1.25,18287,United Kingdom


###### **Returned Qty < Purchase Qty**

In [273]:
dataReturnedQtyLT

Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,idx_completed,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled,idx_canceled
19,540946,22666,RECIPE BOX PANTRY YELLOW DESIGN,6,2011-01-12 12:43:00,2.95,12359,Cyprus,696,C549955,-2,2011-04-13 13:38:00,684
20,543370,22666,RECIPE BOX PANTRY YELLOW DESIGN,6,2011-02-07 14:51:00,2.95,12359,Cyprus,726,C549955,-2,2011-04-13 13:38:00,684
21,540946,22720,SET OF 3 CAKE TINS PANTRY DESIGN,3,2011-01-12 12:43:00,4.95,12359,Cyprus,698,C580165,-1,2011-12-02 11:21:00,903
22,571034,22720,SET OF 3 CAKE TINS PANTRY DESIGN,3,2011-10-13 12:47:00,4.95,12359,Cyprus,889,C580165,-1,2011-12-02 11:21:00,903
24,571034,23245,SET OF 3 REGENCY CAKE TINS,4,2011-10-13 12:47:00,4.95,12359,Cyprus,882,C580165,-2,2011-12-02 11:21:00,710
...,...,...,...,...,...,...,...,...,...,...,...,...,...
19800,549185,22969,HOMEMADE JAM SCENTED CANDLES,24,2011-04-07 09:35:00,1.45,18272,United Kingdom,400583,C552720,-2,2011-05-11 09:49:00,400561
19801,551507,22204,MILK PAN BLUE POLKADOT,4,2011-04-28 18:11:00,3.75,18272,United Kingdom,400604,C552720,-1,2011-05-11 09:49:00,400564
19813,572990,23401,RUSTIC MIRROR WITH LACE HEART,2,2011-10-27 10:54:00,6.25,18276,United Kingdom,400718,C577386,-1,2011-11-18 16:54:00,400713
19814,572990,23401,RUSTIC MIRROR WITH LACE HEART,2,2011-10-27 10:54:00,6.25,18276,United Kingdom,400718,C577390,-1,2011-11-18 17:01:00,400715


In [274]:
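rm = dataReturnedQtyLT.copy()
# Net quantity kept by the customer after a partial return
rm['Quantity'] = rm.Quantity_completed - rm.Quantity_canceled.abs()
# Keep the purchase-side columns, in the same order as `data`, with a fresh 0..n-1 index
newQty = rm[['InvoiceNo_completed', 'StockCode', 'Description', 'Quantity',
             'InvoiceDate_completed', 'UnitPrice', 'CustomerID', 'Country']].reset_index(drop=True)
newQty.columns = data.columns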
newQty

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,540946,22666,RECIPE BOX PANTRY YELLOW DESIGN,4,2011-01-12 12:43:00,2.95,12359,Cyprus
1,543370,22666,RECIPE BOX PANTRY YELLOW DESIGN,4,2011-02-07 14:51:00,2.95,12359,Cyprus
2,540946,22720,SET OF 3 CAKE TINS PANTRY DESIGN,2,2011-01-12 12:43:00,4.95,12359,Cyprus
3,571034,22720,SET OF 3 CAKE TINS PANTRY DESIGN,2,2011-10-13 12:47:00,4.95,12359,Cyprus
4,571034,23245,SET OF 3 REGENCY CAKE TINS,2,2011-10-13 12:47:00,4.95,12359,Cyprus
...,...,...,...,...,...,...,...,...
13232,549185,22969,HOMEMADE JAM SCENTED CANDLES,22,2011-04-07 09:35:00,1.45,18272,United Kingdom
13233,551507,22204,MILK PAN BLUE POLKADOT,3,2011-04-28 18:11:00,3.75,18272,United Kingdom
13234,572990,23401,RUSTIC MIRROR WITH LACE HEART,1,2011-10-27 10:54:00,6.25,18276,United Kingdom
13235,572990,23401,RUSTIC MIRROR WITH LACE HEART,1,2011-10-27 10:54:00,6.25,18276,United Kingdom
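In the rows shown above, one cancellation (`idx_canceled` 684) is paired with two purchases, and one purchase (`idx_completed` 400718) is paired with two cancellations, so the same return can be subtracted more than once and the same purchase can appear twice in `newQty`. The sketch below is one way to surface such many-to-many matches before adjusting quantities. It is not part of the original notebook; the toy frame copies only the four rows visible above, and `rm_toy`, `reused_cancel` and `reused_purchase` are hypothetical names.

```python
import pandas as pd

# Toy slice shaped like dataReturnedQtyLT (values taken from the rows displayed above).
rm_toy = pd.DataFrame({
    'idx_completed': [696, 726, 400718, 400718],
    'idx_canceled': [684, 684, 400713, 400715],
    'Quantity_completed': [6, 6, 2, 2],
    'Quantity_canceled': [-2, -2, -1, -1],
})

# A cancellation matched to several purchases is subtracted from each of them.
reused_cancel = rm_toy[rm_toy.duplicated('idx_canceled', keep=False)]
# A purchase matched to several cancellations produces several adjusted rows.
reused_purchase = rm_toy[rm_toy.duplicated('idx_completed', keep=False)]

print(reused_cancel.idx_canceled.unique())
print(reused_purchase.idx_completed.unique())
```

If such rows matter for the analysis, each cancellation could be attributed to a single purchase (for example the latest one before the return date) before the subtraction; whether that is appropriate depends on business rules the dataset does not document.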


In [275]:
data.drop(6)

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
2,562032,21578,WOODLAND DESIGN COTTON TOTE BAG,6,2011-08-02 08:48:00,2.25,12347,Iceland
3,542237,47559B,TEA TIME OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347,Iceland
4,542237,21154,RED RETROSPOT OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347,Iceland
5,542237,21041,RED RETROSPOT OVEN GLOVE DOUBLE,6,2011-01-26 14:30:00,2.95,12347,Iceland
7,542237,22423,REGENCY CAKESTAND 3 TIER,3,2011-01-26 14:30:00,12.75,12347,Iceland
...,...,...,...,...,...,...,...,...
401559,570715,23269,SET OF 2 CERAMIC CHRISTMAS TREES,36,2011-10-12 10:23:00,1.45,18287,United Kingdom
401560,570715,23223,CHRISTMAS TREE HANGING SILVER,48,2011-10-12 10:23:00,0.83,18287,United Kingdom
401561,570715,23378,PACK OF 12 50'S CHRISTMAS TISSUES,24,2011-10-12 10:23:00,0.39,18287,United Kingdom
401562,573167,23264,SET OF 3 WOODEN SLEIGH DECORATIONS,36,2011-10-28 09:29:00,1.25,18287,United Kingdom


In [276]:
data[data.index==57]

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
57,573511,22699,ROSES REGENCY TEACUP AND SAUCER,18,2011-10-31 12:25:00,2.95,12347,Iceland


In [277]:
transactionItemsRemovedByQty[transactionItemsRemovedByQty.idx_completed==57]

Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,idx_completed,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled,idx_canceled


In [278]:
rm[rm.idx_completed==57]

Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,idx_completed,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled,idx_canceled,Quantity


In [279]:
data.drop([3,5])

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
2,562032,21578,WOODLAND DESIGN COTTON TOTE BAG,6,2011-08-02 08:48:00,2.25,12347,Iceland
4,542237,21154,RED RETROSPOT OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347,Iceland
6,542237,21035,SET/2 RED RETROSPOT TEA TOWELS,6,2011-01-26 14:30:00,2.95,12347,Iceland
7,542237,22423,REGENCY CAKESTAND 3 TIER,3,2011-01-26 14:30:00,12.75,12347,Iceland
8,542237,84969,BOX OF 6 ASSORTED COLOUR TEASPOONS,6,2011-01-26 14:30:00,4.25,12347,Iceland
...,...,...,...,...,...,...,...,...
401559,570715,23269,SET OF 2 CERAMIC CHRISTMAS TREES,36,2011-10-12 10:23:00,1.45,18287,United Kingdom
401560,570715,23223,CHRISTMAS TREE HANGING SILVER,48,2011-10-12 10:23:00,0.83,18287,United Kingdom
401561,570715,23378,PACK OF 12 50'S CHRISTMAS TISSUES,24,2011-10-12 10:23:00,0.39,18287,United Kingdom
401562,573167,23264,SET OF 3 WOODEN SLEIGH DECORATIONS,36,2011-10-28 09:29:00,1.25,18287,United Kingdom


In [280]:
data.iloc[[57]]

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
59,573511,23421,PANTRY HOOK SPATULA,12,2011-10-31 12:25:00,2.08,12347,Iceland
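The two lookups above return different rows because `data[data.index==57]` selects by index label, while `data.iloc[[57]]` selects by position and lands on the row labelled 59 once earlier labels have been dropped. `drop()` also works on labels, and `errors='ignore'` skips labels that are already gone. A small sketch on a toy frame (values and variable names are illustrative):

```python
import pandas as pd

# Toy frame whose index labels have gaps, as `data` does after rows are dropped.
toy = pd.DataFrame({'Quantity': [6, 10, 10, 6]}, index=[2, 3, 5, 9])

# drop() removes rows by index label, not by position.
after_drop = toy.drop([3, 9])

# errors='ignore' skips missing labels instead of raising KeyError.
after_safe_drop = after_drop.drop([3, 42], errors='ignore')

# iloc is positional: position 1 now holds the row labelled 5.
label_at_pos_1 = after_drop.iloc[[1]].index[0]

# A label-based membership test on the index.
has_label_3 = 3 in after_drop.index

print(list(after_drop.index), label_at_pos_1, has_label_3)
```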


In [281]:
data.drop(rm.idx_completed.unique(), inplace=True, errors='ignore')
data.drop(rm.idx_canceled.unique(), inplace=True, errors='ignore')
data

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
2,562032,21578,WOODLAND DESIGN COTTON TOTE BAG,6,2011-08-02 08:48:00,2.25,12347,Iceland
3,542237,47559B,TEA TIME OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347,Iceland
4,542237,21154,RED RETROSPOT OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347,Iceland
5,542237,21041,RED RETROSPOT OVEN GLOVE DOUBLE,6,2011-01-26 14:30:00,2.95,12347,Iceland
6,542237,21035,SET/2 RED RETROSPOT TEA TOWELS,6,2011-01-26 14:30:00,2.95,12347,Iceland
...,...,...,...,...,...,...,...,...
401559,570715,23269,SET OF 2 CERAMIC CHRISTMAS TREES,36,2011-10-12 10:23:00,1.45,18287,United Kingdom
401560,570715,23223,CHRISTMAS TREE HANGING SILVER,48,2011-10-12 10:23:00,0.83,18287,United Kingdom
401561,570715,23378,PACK OF 12 50'S CHRISTMAS TISSUES,24,2011-10-12 10:23:00,0.39,18287,United Kingdom
401562,573167,23264,SET OF 3 WOODEN SLEIGH DECORATIONS,36,2011-10-28 09:29:00,1.25,18287,United Kingdom


In [282]:
data[data.Quantity<0]

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
906,C580165,22826,LOVE SEAT ANTIQUE WHITE METAL,-1,2011-12-02 11:21:00,42.50,12359,Cyprus
3541,C549253,20712,JUMBO BAG WOODLAND ANIMALS,-1,2011-04-07 12:20:00,2.08,12408,Belgium
4394,C574344,POST,POSTAGE,-1,2011-11-04 10:18:00,262.73,12415,Australia
4951,C557300,M,Manual,-1,2011-06-19 14:05:00,0.77,12421,Spain
6461,C538723,21217,RED RETROSPOT ROUND CAKE TINS,-1,2010-12-14 11:12:00,9.95,12434,Australia
...,...,...,...,...,...,...,...,...
400102,C555268,23057,BEADED CHANDELIER T-LIGHT HOLDER,-1,2011-06-01 16:17:00,4.95,18257,United Kingdom
400166,C545740,POST,POSTAGE,-1,2011-03-07 11:47:00,8.65,18257,United Kingdom
400508,C549945,POST,POSTAGE,-1,2011-04-13 12:39:00,5.95,18270,United Kingdom
400560,C552720,20932,PINK POT PLANT CANDLE,-1,2011-05-11 09:49:00,2.95,18272,United Kingdom
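The negative-quantity rows that remain appear to be cancellations with no matching purchase left in `data` on the join keys used for the merge. One way to check this directly is a left merge with `indicator=True`, sketched here on toy data; the key list mirrors the merge used in this notebook, and the frames and the name `orphans` are assumptions for illustration.

```python
import pandas as pd

keys = ['StockCode', 'Description', 'CustomerID', 'Country', 'UnitPrice']

# Toy purchases and cancellations; values are illustrative only.
completed = pd.DataFrame({
    'StockCode': ['22666', '21578'],
    'Description': ['RECIPE BOX PANTRY YELLOW DESIGN', 'WOODLAND DESIGN COTTON TOTE BAG'],
    'CustomerID': [12359, 12347],
    'Country': ['Cyprus', 'Iceland'],
    'UnitPrice': [2.95, 2.25],
    'Quantity': [6, 6],
})
canceled = pd.DataFrame({
    'StockCode': ['22666', '22826'],
    'Description': ['RECIPE BOX PANTRY YELLOW DESIGN', 'LOVE SEAT ANTIQUE WHITE METAL'],
    'CustomerID': [12359, 12359],
    'Country': ['Cyprus', 'Cyprus'],
    'UnitPrice': [2.95, 42.50],
    'Quantity': [-2, -1],
})

# Left merge against the distinct purchase keys; `_merge` records where each row was found.
checked = canceled.merge(completed[keys].drop_duplicates(),
                         on=keys, how='left', indicator=True)

# Cancellations with no purchase on the same keys.
orphans = checked[checked['_merge'] == 'left_only']
print(orphans[['StockCode', 'Quantity']])
```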


In [283]:
dataProfile(data)

Dimensions	: (378283, 8)
Data Size	: 141.09 MB
Duplicated Data	: 0


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country


REVIEW


Unnamed: 0,dtype,count_of_null,null_ratio,count_of_distinct,distinct_value
InvoiceNo,object,0,0.0,18939,"[562032, 542237, 573511, 556201, 549222, 53762..."
StockCode,object,0,0.0,3668,"[21578, 47559B, 21154, 21041, 21035, 22423, 84..."
Description,object,0,0.0,3880,"[WOODLAND DESIGN COTTON TOTE BAG, TEA TIME OV..."
Quantity,int64,0,0.0,338,"[6, 10, 3, 12, 4, 8, 24, 20, 2, 18, 36, 48, 16..."
InvoiceDate,datetime64[ns],0,0.0,17638,"[2011-08-02T08:48:00.000000000, 2011-01-26T14:..."
UnitPrice,float64,0,0.0,584,"[2.25, 1.25, 2.95, 12.75, 4.25, 0.42, 1.65, 3...."
CustomerID,object,0,0.0,4354,"[12347, 12348, 12349, 12350, 12352, 12353, 123..."
Country,object,0,0.0,37,"[Iceland, Finland, Italy, Norway, Bahrain, Spa..."


Statistical Numerics


Unnamed: 0,Quantity,UnitPrice
count,378283.0,378283.0
mean,12.229709,3.177673
std,43.479199,66.21398
min,-9360.0,0.001
25%,2.0,1.25
50%,6.0,1.85
75%,12.0,3.75
max,4300.0,38970.0


Statistical Categorics


Unnamed: 0,InvoiceNo,StockCode,Description,CustomerID,Country
count,378283,378283,378283,378283,378283
unique,18939,3668,3880,4354,37
top,576339,85123A,WHITE HANGING HEART T-LIGHT HOLDER,17841,United Kingdom
freq,542,1828,1821,6401,337536


PREVIEW head(3)


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
2,562032,21578,WOODLAND DESIGN COTTON TOTE BAG,6,2011-08-02 08:48:00,2.25,12347,Iceland
3,542237,47559B,TEA TIME OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347,Iceland
4,542237,21154,RED RETROSPOT OVEN GLOVE,10,2011-01-26 14:30:00,1.25,12347,Iceland


In [284]:
dataCompleted = data[data.Quantity>0]
dataCanceled = data[data.Quantity<0]
dataReturned = pd.merge(dataCompleted, dataCanceled, how='inner',
                   on=['StockCode', 'Description', 'CustomerID', 'Country', 'UnitPrice'], 
                   suffixes=['_completed', '_canceled'])
dataReturnedABeforeCheckout = dataReturned[dataReturned['InvoiceDate_completed'] >= dataReturned['InvoiceDate_canceled']]
dataReturnedAAfterCheckout = dataReturned[dataReturned['InvoiceDate_completed'] < dataReturned['InvoiceDate_canceled']]
print(line)
print(f'Returned Transactions\t\t\t: {len(dataReturned)}')
print(line)
print(f'Returned Transactions Before Checkout\t: {len(dataReturnedABeforeCheckout)}')
print('Samples:')
display(dataReturnedABeforeCheckout.sample(3))
print(line)
print(f'Returned Transactions After Checkout\t: {len(dataReturnedAAfterCheckout)}')
print('Samples:')
display(dataReturnedAAfterCheckout.sample(3))

Returned Transactions			: 0
Returned Transactions Before Checkout	: 0
Samples:


ValueError: a must be greater than 0 unless no samples are taken
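The `ValueError` above is raised because `sample(3)` is called on an empty frame: after the cleaning steps the merge returns 0 rows. A small guard avoids the failure; `safe_sample` is a hypothetical helper name, not something defined in this notebook.

```python
import pandas as pd

def safe_sample(frame: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Return up to n random rows; an empty frame is returned unchanged."""
    if frame.empty:
        return frame
    # Never ask for more rows than the frame holds.
    return frame.sample(min(n, len(frame)))

empty = pd.DataFrame({'Quantity': []})
small = pd.DataFrame({'Quantity': [1, 2]})

print(len(safe_sample(empty)), len(safe_sample(small)))
```

In the cell above, `display(safe_sample(dataReturnedABeforeCheckout))` would then show an empty frame instead of stopping the run.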

In [None]:
dataCompleted = data[data.Quantity>0]
dataCanceled = data[data.Quantity<0]
dataReturned = pd.merge(dataCompleted, dataCanceled, how='inner',
                   on=['StockCode', 'Description', 'CustomerID', 'Country', 'UnitPrice'], 
                   suffixes=['_completed', '_canceled'])
dataReturnedABeforeCheckout = dataReturned[dataReturned['InvoiceDate_completed'] >= dataReturned['InvoiceDate_canceled']]
dataReturnedAAfterCheckout = dataReturned[dataReturned['InvoiceDate_completed'] < dataReturned['InvoiceDate_canceled']]
print(line)
print(f'Returned Transactions\t\t\t: {len(dataReturned)}')
print(line)
print(f'Returned Transactions Before Checkout\t: {len(dataReturnedABeforeCheckout)}')
print('Samples:')
display(dataReturnedABeforeCheckout.sample(3))
print(line)
print(f'Returned Transactions After Checkout\t: {len(dataReturnedAAfterCheckout)}')
print('Samples:')
display(dataReturnedAAfterCheckout.sample(3))

Returned Transactions			: 19816
Returned Transactions Before Checkout	: 6629
Samples:


Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled
10202,567899,21471,STRAWBERRY RAFFIA FOOD COVER,6,2011-09-22 16:26:00,3.75,14911,EIRE,C564759,-2,2011-08-30 10:40:00
10634,550186,22698,PINK REGENCY TEACUP AND SAUCER,12,2011-04-14 18:28:00,2.95,14051,United Kingdom,C548485,-2,2011-03-31 12:50:00
9500,580362,21314,SMALL GLASS HEART TRINKET POT,8,2011-12-02 16:30:00,2.1,13884,United Kingdom,C545823,-3,2011-03-07 12:54:00


Returned Transactions After Checkout	: 13187
Samples:


Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled
11784,549835,37340,MULTICOLOUR SPRING FLOWER MUG,48,2011-04-12 13:24:00,0.39,17511,United Kingdom,C559136,-1,2011-07-06 13:21:00
8686,544301,21067,VINTAGE RED TEATIME MUG,2,2011-02-17 12:59:00,1.25,14606,United Kingdom,C545836,-1,2011-03-07 13:19:00
17995,567874,23239,SET OF 4 KNICK KNACK TINS POPPIES,6,2011-09-22 14:26:00,4.15,13055,United Kingdom,C569970,-1,2011-10-06 18:57:00
