# **SOURCE**
https://www.kaggle.com/code/mgmarques/customer-segmentation-and-market-basket-analysis/notebook
- Customer segmentation: Customer segmentation is the problem of uncovering information about a firm's customer base from its interactions with the business. In most cases these interactions are purchase behavior and patterns. We explore some of the ways in which this can be used.
- Market basket analysis: Market basket analysis is a method for gaining insight into the granular behavior of customers. It helps in devising strategies that build on a deeper understanding of customers' purchase decisions. This is interesting because customers are often unaware of such biases or trends in their own purchasing behavior.

Let's see the description of each column:
- InvoiceNo: A unique identifier for the invoice. An invoice number shared across rows means that those transactions were performed in a single invoice (multiple purchases).
- StockCode: Identifier for items contained in an invoice.
- Description: Textual description of each stock item.
- Quantity: The quantity of the item purchased.
- InvoiceDate: Date and time of purchase.
- UnitPrice: Price per unit of the item.
- CustomerID: Identifier for customer making the purchase.
- Country: Country of customer.

# **DATA UNDERSTANDING**

In [2]:
import numpy as np
import pandas as pd
import warnings

warnings.filterwarnings('ignore')
pd.options.mode.chained_assignment = None

path = './db/online-retail.xlsx'
df = pd.read_excel(path)

In [3]:
df.head()

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom


In [4]:
from IPython.display import display  # explicit import so the helper also runs outside a notebook cell

line = '========================'
def dataProfile(data):
  # Print a quick profile of a DataFrame: size, duplicates, per-column summary and descriptive statistics
  dimension = data.shape
  dtype = data.dtypes
  countOfNull = data.isnull().sum()
  nullRatio = round(countOfNull/len(data)*100,4)  # percentage of null values per column
  countOfDistinct = data.nunique()
  distinctValue = data.apply(lambda x: x.unique())
  # One row per column of `data`, one column per summary measure
  output = pd.DataFrame(list(zip(dtype, countOfNull, nullRatio, countOfDistinct, distinctValue)),
                        index=data.columns,
                        columns=['dtype', 'count_of_null', 'null_ratio', 'count_of_distinct', 'distinct_value'])
  print(f'Dimensions\t: {dimension}')
  print(f'Data Size\t: {round(data.memory_usage(deep=True).sum()/1000000, 2)} MB')
  print(line)
  # Rows that are exact copies of an earlier row
  print(f'Duplicated Data\t: {len(data[data.duplicated()])}')
  display(data[data.duplicated()])
  print(line)
  print('REVIEW')
  display(output)
  print(line)
  print('Stastical Numerics')
  display(data.describe())
  print(line)
  print('Stastical Categorics')
  display(data.describe(include=['category', 'object']))
  print(line)
  print('PREVIEW head(3)')
  display(data.head(3))
  

In [207]:
dataProfile(df)

Dimensions	: (541909, 8)
Data Size	: 141.48 MB
Duplicated Data	: 5268


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
517,536409,21866,UNION JACK FLAG LUGGAGE TAG,1,2010-12-01 11:45:00,1.25,17908.0,United Kingdom
527,536409,22866,HAND WARMER SCOTTY DOG DESIGN,1,2010-12-01 11:45:00,2.10,17908.0,United Kingdom
537,536409,22900,SET 2 TEA TOWELS I LOVE LONDON,1,2010-12-01 11:45:00,2.95,17908.0,United Kingdom
539,536409,22111,SCOTTIE DOG HOT WATER BOTTLE,1,2010-12-01 11:45:00,4.95,17908.0,United Kingdom
555,536412,22327,ROUND SNACK BOXES SET OF 4 SKULLS,1,2010-12-01 11:49:00,2.95,17920.0,United Kingdom
...,...,...,...,...,...,...,...,...
541675,581538,22068,BLACK PIRATE TREASURE CHEST,1,2011-12-09 11:34:00,0.39,14446.0,United Kingdom
541689,581538,23318,BOX OF 6 MINI VINTAGE CRACKERS,1,2011-12-09 11:34:00,2.49,14446.0,United Kingdom
541692,581538,22992,REVOLVER WOODEN RULER,1,2011-12-09 11:34:00,1.95,14446.0,United Kingdom
541699,581538,22694,WICKER STAR,1,2011-12-09 11:34:00,2.10,14446.0,United Kingdom


REVIEW


Unnamed: 0,dtype,count_of_null,null_ratio,count_of_distinct,distinct_value
InvoiceNo,object,0,0.0,25900,"[536365, 536366, 536367, 536368, 536369, 53637..."
StockCode,object,0,0.0,4070,"[85123A, 71053, 84406B, 84029G, 84029E, 22752,..."
Description,object,1454,0.2683,4223,"[WHITE HANGING HEART T-LIGHT HOLDER, WHITE MET..."
Quantity,int64,0,0.0,722,"[6, 8, 2, 32, 3, 4, 24, 12, 48, 18, 20, 36, 80..."
InvoiceDate,datetime64[ns],0,0.0,23260,"[2010-12-01T08:26:00.000000000, 2010-12-01T08:..."
UnitPrice,float64,0,0.0,1630,"[2.55, 3.39, 2.75, 7.65, 4.25, 1.85, 1.69, 2.1..."
CustomerID,float64,135080,24.9267,4372,"[17850.0, 13047.0, 12583.0, 13748.0, 15100.0, ..."
Country,object,0,0.0,38,"[United Kingdom, France, Australia, Netherland..."


Stastical Numerics


Unnamed: 0,Quantity,UnitPrice,CustomerID
count,541909.0,541909.0,406829.0
mean,9.55225,4.611114,15287.69057
std,218.081158,96.759853,1713.600303
min,-80995.0,-11062.06,12346.0
25%,1.0,1.25,13953.0
50%,3.0,2.08,15152.0
75%,10.0,4.13,16791.0
max,80995.0,38970.0,18287.0


Stastical Categorics


Unnamed: 0,InvoiceNo,StockCode,Description,Country
count,541909,541909,540455,541909
unique,25900,4070,4223,38
top,573585,85123A,WHITE HANGING HEART T-LIGHT HOLDER,United Kingdom
freq,1114,2313,2369,495478


PREVIEW head(3)


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom


The preceding output shows negative values in both Quantity and UnitPrice, which suggests that the data contains return transactions. Because our goal is customer segmentation and market basket analysis, these records should be removed. First, though, we check whether there are records where both values are negative, or where one is negative and the other is zero.
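The check described here can be sketched with boolean masks. This is a minimal illustration on a small made-up frame: the rows and the names `toy` and `sign_summary` are hypothetical, not taken from the dataset. On the real data the same masks would be applied to `df`.

```python
import pandas as pd

# Hypothetical toy rows that mimic the Quantity / UnitPrice columns
toy = pd.DataFrame({
    'Quantity':  [6, -1, -12, 0, -3],
    'UnitPrice': [2.55, 27.50, 0.0, -1.0, -4.95],
})

neg_qty = toy.Quantity < 0
neg_price = toy.UnitPrice < 0

# Count the sign combinations mentioned in the text
sign_summary = pd.Series({
    'both_negative': int((neg_qty & neg_price).sum()),
    'negative_qty_zero_price': int((neg_qty & (toy.UnitPrice == 0)).sum()),
    'negative_price_zero_qty': int((neg_price & (toy.Quantity == 0)).sum()),
})
print(sign_summary)
```

Counting each combination separately makes it easier to decide which rows are likely returns and which are more likely data-entry errors.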

# **DATA CLEANSING**

## **Drop Duplicated**

In [276]:
def dropDuplicates(df):
  print(f'Dimensions before remove duplicates: {df.shape}')
  df = df.drop_duplicates()
  print(f'Dimensions after remove duplicates: {df.shape}')
  return df

In [277]:
data = df.copy()
data = dropDuplicates(data)
data

Dimensions before remove duplicates: (541909, 8)
Dimensions after remove duplicates: (536641, 8)


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
...,...,...,...,...,...,...,...,...
541904,581587,22613,PACK OF 20 SPACEBOY NAPKINS,12,2011-12-09 12:50:00,0.85,12680.0,France
541905,581587,22899,CHILDREN'S APRON DOLLY GIRL,6,2011-12-09 12:50:00,2.10,12680.0,France
541906,581587,23254,CHILDRENS CUTLERY DOLLY GIRL,4,2011-12-09 12:50:00,4.15,12680.0,France
541907,581587,23255,CHILDRENS CUTLERY CIRCUS PARADE,4,2011-12-09 12:50:00,4.15,12680.0,France


## **Drop N/a CustomerID**

In [278]:
def dropNull(df, cols=None):
  print(f'Dimensions before remove nulls: {df.shape}')
  if cols is None:
    # no columns given: drop rows with a null in any column
    df = df.dropna()
  else:
    # drop rows with a null in any of the given columns
    df = df.dropna(subset=cols, axis=0)
  print(f'Dimensions after remove nulls: {df.shape}')
  return df

In [279]:
data = dropNull(data, cols=['CustomerID'])
data

Dimensions before remove nulls: (536641, 8)
Dimensions after remove nulls: (401604, 8)


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
...,...,...,...,...,...,...,...,...
541904,581587,22613,PACK OF 20 SPACEBOY NAPKINS,12,2011-12-09 12:50:00,0.85,12680.0,France
541905,581587,22899,CHILDREN'S APRON DOLLY GIRL,6,2011-12-09 12:50:00,2.10,12680.0,France
541906,581587,23254,CHILDRENS CUTLERY DOLLY GIRL,4,2011-12-09 12:50:00,4.15,12680.0,France
541907,581587,23255,CHILDRENS CUTLERY CIRCUS PARADE,4,2011-12-09 12:50:00,4.15,12680.0,France


## **Data Types**

In [280]:
dataProfile(data)

Dimensions	: (401604, 8)
Data Size	: 107.96 MB
Duplicated Data	: 0


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country


REVIEW


Unnamed: 0,dtype,count_of_null,null_ratio,count_of_distinct,distinct_value
InvoiceNo,object,0,0.0,22190,"[536365, 536366, 536367, 536368, 536369, 53637..."
StockCode,object,0,0.0,3684,"[85123A, 71053, 84406B, 84029G, 84029E, 22752,..."
Description,object,0,0.0,3896,"[WHITE HANGING HEART T-LIGHT HOLDER, WHITE MET..."
Quantity,int64,0,0.0,436,"[6, 8, 2, 32, 3, 4, 24, 12, 48, 18, 20, 36, 80..."
InvoiceDate,datetime64[ns],0,0.0,20460,"[2010-12-01T08:26:00.000000000, 2010-12-01T08:..."
UnitPrice,float64,0,0.0,620,"[2.55, 3.39, 2.75, 7.65, 4.25, 1.85, 1.69, 2.1..."
CustomerID,float64,0,0.0,4372,"[17850.0, 13047.0, 12583.0, 13748.0, 15100.0, ..."
Country,object,0,0.0,37,"[United Kingdom, France, Australia, Netherland..."


Stastical Numerics


Unnamed: 0,Quantity,UnitPrice,CustomerID
count,401604.0,401604.0,401604.0
mean,12.183273,3.474064,15281.160818
std,250.283037,69.764035,1714.006089
min,-80995.0,0.0,12346.0
25%,2.0,1.25,13939.0
50%,5.0,1.95,15145.0
75%,12.0,3.75,16784.0
max,80995.0,38970.0,18287.0


Stastical Categorics


Unnamed: 0,InvoiceNo,StockCode,Description,Country
count,401604,401604,401604,401604
unique,22190,3684,3896,37
top,576339,85123A,WHITE HANGING HEART T-LIGHT HOLDER,United Kingdom
freq,542,2065,2058,356728


PREVIEW head(3)


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom


In [281]:
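# CustomerID was read as float (e.g. 17850.0); cast through int to get a clean string ID such as '17850'
data.CustomerID = data.CustomerID.astype('int64').astype('str')
# Columns that keep their numeric/datetime dtype; every other column is treated as a string identifier
numericalColumns = ['Quantity', 'UnitPrice', 'InvoiceDate']
for value in data.columns:
  if value not in numericalColumns:
    data[value] = data[value].astype('str')
dataProfile(data)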

Dimensions	: (401604, 8)
Data Size	: 149.77 MB
Duplicated Data	: 0


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country


REVIEW


Unnamed: 0,dtype,count_of_null,null_ratio,count_of_distinct,distinct_value
InvoiceNo,object,0,0.0,22190,"[536365, 536366, 536367, 536368, 536369, 53637..."
StockCode,object,0,0.0,3684,"[85123A, 71053, 84406B, 84029G, 84029E, 22752,..."
Description,object,0,0.0,3896,"[WHITE HANGING HEART T-LIGHT HOLDER, WHITE MET..."
Quantity,int64,0,0.0,436,"[6, 8, 2, 32, 3, 4, 24, 12, 48, 18, 20, 36, 80..."
InvoiceDate,datetime64[ns],0,0.0,20460,"[2010-12-01T08:26:00.000000000, 2010-12-01T08:..."
UnitPrice,float64,0,0.0,620,"[2.55, 3.39, 2.75, 7.65, 4.25, 1.85, 1.69, 2.1..."
CustomerID,object,0,0.0,4372,"[17850, 13047, 12583, 13748, 15100, 15291, 146..."
Country,object,0,0.0,37,"[United Kingdom, France, Australia, Netherland..."


Stastical Numerics


Unnamed: 0,Quantity,UnitPrice
count,401604.0,401604.0
mean,12.183273,3.474064
std,250.283037,69.764035
min,-80995.0,0.0
25%,2.0,1.25
50%,5.0,1.95
75%,12.0,3.75
max,80995.0,38970.0


Stastical Categorics


Unnamed: 0,InvoiceNo,StockCode,Description,CustomerID,Country
count,401604,401604,401604,401604,401604
unique,22190,3684,3896,4372,37
top,576339,85123A,WHITE HANGING HEART T-LIGHT HOLDER,17841,United Kingdom
freq,542,2065,2058,7812,356728


PREVIEW head(3)


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850,United Kingdom


## **Explore**

### **Duplicated and null values have been removed. Negative values in Quantity?**

In [282]:
print(f'negative quantity => refund?')
print(f'InvoiceNo startwith: {data[(data.Quantity<0)].InvoiceNo.apply(lambda x: str(x)[0]).unique()}\n{line}')
display(data[(data.Quantity<0)])
print(line)
print(f'zero unitprice => free/bug/error?')
print(f'length: {len(data[(data.UnitPrice==0)])}\n{line}')
display(data[(data.UnitPrice==0)])

negative quantity => refund?
InvoiceNo startwith: ['C']


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
141,C536379,D,Discount,-1,2010-12-01 09:41:00,27.50,14527,United Kingdom
154,C536383,35004C,SET OF 3 COLOURED FLYING DUCKS,-1,2010-12-01 09:49:00,4.65,15311,United Kingdom
235,C536391,22556,PLASTERS IN TIN CIRCUS PARADE,-12,2010-12-01 10:24:00,1.65,17548,United Kingdom
236,C536391,21984,PACK OF 12 PINK PAISLEY TISSUES,-24,2010-12-01 10:24:00,0.29,17548,United Kingdom
237,C536391,21983,PACK OF 12 BLUE PAISLEY TISSUES,-24,2010-12-01 10:24:00,0.29,17548,United Kingdom
...,...,...,...,...,...,...,...,...
540449,C581490,23144,ZINC T-LIGHT HOLDER STARS SMALL,-11,2011-12-09 09:57:00,0.83,14397,United Kingdom
541541,C581499,M,Manual,-1,2011-12-09 10:28:00,224.69,15498,United Kingdom
541715,C581568,21258,VICTORIAN SEWING BOX LARGE,-5,2011-12-09 11:57:00,10.95,15311,United Kingdom
541716,C581569,84978,HANGING HEART JAR T-LIGHT HOLDER,-1,2011-12-09 11:58:00,1.25,17315,United Kingdom


zero unitprice => free/bug/error?
length: 40


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
9302,537197,22841,ROUND CAKE TIN VINTAGE GREEN,1,2010-12-05 14:02:00,0.0,12647,Germany
33576,539263,22580,ADVENT CALENDAR GINGHAM SACK,4,2010-12-16 14:36:00,0.0,16560,United Kingdom
40089,539722,22423,REGENCY CAKESTAND 3 TIER,10,2010-12-21 13:45:00,0.0,14911,EIRE
47068,540372,22090,PAPER BUNTING RETROSPOT,24,2011-01-06 16:41:00,0.0,13081,United Kingdom
47070,540372,22553,PLASTERS IN TIN SKULLS,24,2011-01-06 16:41:00,0.0,13081,United Kingdom
56674,541109,22168,ORGANISER WOOD ANTIQUE WHITE,1,2011-01-13 15:10:00,0.0,15107,United Kingdom
86789,543599,84535B,FAIRY CAKES NOTEBOOK A6 SIZE,16,2011-02-10 13:08:00,0.0,17560,United Kingdom
130188,547417,22062,CERAMIC BOWL WITH LOVE HEART DESIGN,36,2011-03-23 10:25:00,0.0,13239,United Kingdom
139453,548318,22055,MINI CAKE STAND HANGING STRAWBERY,5,2011-03-30 12:45:00,0.0,13113,United Kingdom
145208,548871,22162,HEART GARLAND RUSTIC PADDED,2,2011-04-04 14:42:00,0.0,14410,United Kingdom


In [283]:
zeroUP = data[data.UnitPrice==0][['StockCode', "Description"]]
priceZero = pd.merge(data, zeroUP, left_on=['StockCode', 'Description'], right_on=['StockCode', 'Description'], how='inner')
# priceZero
priceZero.groupby(['StockCode', 'Description', 'UnitPrice'], as_index=False).agg(Count_=('UnitPrice', 'count')).reset_index(drop=True)

Unnamed: 0,StockCode,Description,UnitPrice,Count_
0,21208,PASTEL COLOUR HONEYCOMB FAN,0.000,1
1,21208,PASTEL COLOUR HONEYCOMB FAN,0.390,50
2,21208,PASTEL COLOUR HONEYCOMB FAN,1.450,2
3,21208,PASTEL COLOUR HONEYCOMB FAN,1.650,9
4,21786,POLKADOT RAIN HAT,0.000,1
...,...,...,...,...
316,M,Manual,4287.630,6
317,M,Manual,6930.000,6
318,M,Manual,38970.000,6
319,PADS,PADS TO MATCH ALL CUSHIONS,0.000,1


### **Drop Zero UnitPrice**
Only 40 rows have a zero UnitPrice, so they can be removed to avoid inconsistencies in the data.
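To put the 40 rows in proportion, their share of the data can be computed directly from the row counts reported in the profiles above. The variable names below are illustrative only.

```python
zero_rows = 40        # rows with UnitPrice == 0, from the output above
total_rows = 401604   # rows left after removing duplicates and null CustomerID

share = zero_rows / total_rows * 100   # percentage of rows affected
remaining = total_rows - zero_rows     # rows expected after the filter

print(f'share of zero-price rows: {share:.4f}%')
print(f'rows remaining: {remaining}')
```

The share is roughly 0.01% of the rows, so dropping them has a negligible effect on the size of the dataset.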

In [284]:
data = data[data.UnitPrice > 0]
data

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom
...,...,...,...,...,...,...,...,...
541904,581587,22613,PACK OF 20 SPACEBOY NAPKINS,12,2011-12-09 12:50:00,0.85,12680,France
541905,581587,22899,CHILDREN'S APRON DOLLY GIRL,6,2011-12-09 12:50:00,2.10,12680,France
541906,581587,23254,CHILDRENS CUTLERY DOLLY GIRL,4,2011-12-09 12:50:00,4.15,12680,France
541907,581587,23255,CHILDRENS CUTLERY CIRCUS PARADE,4,2011-12-09 12:50:00,4.15,12680,France


In [285]:
dataProfile(data)

Dimensions	: (401564, 8)
Data Size	: 149.76 MB
Duplicated Data	: 0


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country


REVIEW


Unnamed: 0,dtype,count_of_null,null_ratio,count_of_distinct,distinct_value
InvoiceNo,object,0,0.0,22186,"[536365, 536366, 536367, 536368, 536369, 53637..."
StockCode,object,0,0.0,3684,"[85123A, 71053, 84406B, 84029G, 84029E, 22752,..."
Description,object,0,0.0,3896,"[WHITE HANGING HEART T-LIGHT HOLDER, WHITE MET..."
Quantity,int64,0,0.0,435,"[6, 8, 2, 32, 3, 4, 24, 12, 48, 18, 20, 36, 80..."
InvoiceDate,datetime64[ns],0,0.0,20456,"[2010-12-01T08:26:00.000000000, 2010-12-01T08:..."
UnitPrice,float64,0,0.0,619,"[2.55, 3.39, 2.75, 7.65, 4.25, 1.85, 1.69, 2.1..."
CustomerID,object,0,0.0,4371,"[17850, 13047, 12583, 13748, 15100, 15291, 146..."
Country,object,0,0.0,37,"[United Kingdom, France, Australia, Netherland..."


Stastical Numerics


Unnamed: 0,Quantity,UnitPrice
count,401564.0,401564.0
mean,12.149911,3.47441
std,249.512649,69.767501
min,-80995.0,0.001
25%,2.0,1.25
50%,5.0,1.95
75%,12.0,3.75
max,80995.0,38970.0


Stastical Categorics


Unnamed: 0,InvoiceNo,StockCode,Description,CustomerID,Country
count,401564,401564,401564,401564,401564
unique,22186,3684,3896,4371,37
top,576339,85123A,WHITE HANGING HEART T-LIGHT HOLDER,17841,United Kingdom
freq,542,2065,2058,7812,356704


PREVIEW head(3)


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850,United Kingdom


### **Explore Returned/Canceled Transactions**

#### **By Transactions and Transaction Items**

In [349]:
cancel = data.groupby(['InvoiceNo', 'CustomerID'], as_index=False).Quantity.sum().sort_values('CustomerID').reset_index(drop=True)
cancel['IsCanceled'] = np.where(cancel.InvoiceNo.str.startswith('C', na=False), 1, 0)

print(f'Total transactions\t\t: {len(cancel)}')
print(f'Total completed transactions\t: {len(cancel)-cancel.IsCanceled.sum()} => {round(100-(cancel.IsCanceled.sum()/len(cancel)*100),2)}%')
print(f'Total canceled transactions\t: {cancel.IsCanceled.sum()} => {round((cancel.IsCanceled.sum()/len(cancel)*100),2)}%')
print(line)
cancel

Total transactions		: 22186
Total completed transactions	: 18532 => 83.53%
Total canceled transactions	: 3654 => 16.47%


Unnamed: 0,InvoiceNo,CustomerID,Quantity,IsCanceled
0,541431,12346,74215,0
1,C541433,12346,-74215,1
2,549222,12347,483,0
3,537626,12347,319,0
4,562032,12347,277,0
...,...,...,...,...
22181,578262,18283,241,0
22182,579673,18283,132,0
22183,570715,18287,990,0
22184,554065,18287,488,0


In [342]:
# canceled items
data[data['InvoiceNo'].str.startswith("C", na = False)].sort_values('CustomerID').reset_index(drop=True)
# same as data[data.Quantity<0]

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,C541433,23166,MEDIUM CERAMIC TOP STORAGE JAR,-74215,2011-01-18 10:17:00,1.04,12346,United Kingdom
1,C547388,21914,BLUE HARMONICA IN BOX,-12,2011-03-22 16:07:00,1.25,12352,Norway
2,C547388,84050,PINK HEART SHAPE EGG FRYING PAN,-12,2011-03-22 16:07:00,1.65,12352,Norway
3,C547388,37448,CERAMIC CAKE DESIGN SPOTTED MUG,-12,2011-03-22 16:07:00,1.49,12352,Norway
4,C547388,22784,LANTERN CREAM GAZEBO,-3,2011-03-22 16:07:00,4.95,12352,Norway
...,...,...,...,...,...,...,...,...
8867,C577832,21231,SWEETHEART CERAMIC TRINKET BOX,-12,2011-11-22 10:18:00,1.25,18274,United Kingdom
8868,C577386,23401,RUSTIC MIRROR WITH LACE HEART,-1,2011-11-18 16:54:00,6.25,18276,United Kingdom
8869,C577390,23401,RUSTIC MIRROR WITH LACE HEART,-1,2011-11-18 17:01:00,6.25,18276,United Kingdom
8870,C542086,22423,REGENCY CAKESTAND 3 TIER,-1,2011-01-25 12:34:00,12.75,18277,United Kingdom


In [350]:
# canceled items
# same result
data[data.Quantity<0].sort_values('CustomerID').reset_index(drop=True)

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,C541433,23166,MEDIUM CERAMIC TOP STORAGE JAR,-74215,2011-01-18 10:17:00,1.04,12346,United Kingdom
1,C547388,21914,BLUE HARMONICA IN BOX,-12,2011-03-22 16:07:00,1.25,12352,Norway
2,C547388,84050,PINK HEART SHAPE EGG FRYING PAN,-12,2011-03-22 16:07:00,1.65,12352,Norway
3,C547388,37448,CERAMIC CAKE DESIGN SPOTTED MUG,-12,2011-03-22 16:07:00,1.49,12352,Norway
4,C547388,22784,LANTERN CREAM GAZEBO,-3,2011-03-22 16:07:00,4.95,12352,Norway
...,...,...,...,...,...,...,...,...
8867,C577832,21231,SWEETHEART CERAMIC TRINKET BOX,-12,2011-11-22 10:18:00,1.25,18274,United Kingdom
8868,C577386,23401,RUSTIC MIRROR WITH LACE HEART,-1,2011-11-18 16:54:00,6.25,18276,United Kingdom
8869,C577390,23401,RUSTIC MIRROR WITH LACE HEART,-1,2011-11-18 17:01:00,6.25,18276,United Kingdom
8870,C542086,22423,REGENCY CAKESTAND 3 TIER,-1,2011-01-25 12:34:00,12.75,18277,United Kingdom


In [351]:
# completed items (positive quantity)
data[data.Quantity>0].sort_values('CustomerID').reset_index(drop=True)

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,541431,23166,MEDIUM CERAMIC TOP STORAGE JAR,74215,2011-01-18 10:01:00,1.04,12346,United Kingdom
1,537626,84558A,3D DOG PICTURE PLAYING CARDS,24,2010-12-07 14:57:00,2.95,12347,Iceland
2,537626,22727,ALARM CLOCK BAKELIKE RED,4,2010-12-07 14:57:00,3.75,12347,Iceland
3,537626,22728,ALARM CLOCK BAKELIKE PINK,4,2010-12-07 14:57:00,3.75,12347,Iceland
4,537626,22729,ALARM CLOCK BAKELIKE ORANGE,4,2010-12-07 14:57:00,3.75,12347,Iceland
...,...,...,...,...,...,...,...,...
392687,570715,22065,CHRISTMAS PUDDING TRINKET POT,48,2011-10-12 10:23:00,0.39,18287,United Kingdom
392688,570715,21824,PAINTED METAL STAR WITH HOLLY BELLS,24,2011-10-12 10:23:00,0.39,18287,United Kingdom
392689,570715,22306,SILVER MUG BONE CHINA TREE OF LIFE,24,2011-10-12 10:23:00,1.06,18287,United Kingdom
392690,570715,21481,FAWN BLUE HOT WATER BOTTLE,4,2011-10-12 10:23:00,3.75,18287,United Kingdom


#### **Transactions Affected by Returns**

In [412]:
data.reset_index(drop=True, inplace=True)
dataIdx = data.copy()
dataIdx['idx'] = dataIdx.index
dataIdx

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,idx
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850,United Kingdom,0
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,1
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850,United Kingdom,2
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,3
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850,United Kingdom,4
...,...,...,...,...,...,...,...,...,...
401559,581587,22613,PACK OF 20 SPACEBOY NAPKINS,12,2011-12-09 12:50:00,0.85,12680,France,401559
401560,581587,22899,CHILDREN'S APRON DOLLY GIRL,6,2011-12-09 12:50:00,2.10,12680,France,401560
401561,581587,23254,CHILDRENS CUTLERY DOLLY GIRL,4,2011-12-09 12:50:00,4.15,12680,France,401561
401562,581587,23255,CHILDRENS CUTLERY CIRCUS PARADE,4,2011-12-09 12:50:00,4.15,12680,France,401562


In [413]:
dataCompleted = dataIdx[dataIdx.Quantity>0]
dataCanceled = dataIdx[dataIdx.Quantity<0]
dataReturned = pd.merge(dataCompleted, dataCanceled, how='inner',
                   on=['StockCode', 'Description', 'CustomerID', 'Country', 'UnitPrice'], 
                   suffixes=['_completed', '_canceled'])
dataReturned

Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,idx_completed,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled,idx_canceled
0,536365,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,2010-12-01 08:26:00,4.25,17850,United Kingdom,6,C543611,-1,2011-02-10 14:38:00,54577
1,536373,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,2010-12-01 09:02:00,4.25,17850,United Kingdom,64,C543611,-1,2011-02-10 14:38:00,54577
2,536375,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,2010-12-01 09:32:00,4.25,17850,United Kingdom,81,C543611,-1,2011-02-10 14:38:00,54577
3,536396,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,2010-12-01 10:51:00,4.25,17850,United Kingdom,295,C543611,-1,2011-02-10 14:38:00,54577
4,536406,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,2010-12-01 11:33:00,4.25,17850,United Kingdom,432,C543611,-1,2011-02-10 14:38:00,54577
...,...,...,...,...,...,...,...,...,...,...,...,...,...
19811,581469,51014C,"FEATHER PEN,COAL BLACK",12,2011-12-08 19:28:00,0.39,14606,United Kingdom,400832,C570331,-1,2011-10-10 12:18:00,284017
19812,581469,51014C,"FEATHER PEN,COAL BLACK",12,2011-12-08 19:28:00,0.39,14606,United Kingdom,400832,C574026,-1,2011-11-02 12:26:00,322301
19813,581469,84029E,RED WOOLLY HOTTIE WHITE HEART.,1,2011-12-08 19:28:00,4.25,14606,United Kingdom,400841,C565848,-1,2011-09-07 12:48:00,236299
19814,581469,21485,RETROSPOT HEART HOT WATER BOTTLE,1,2011-12-08 19:28:00,4.95,14606,United Kingdom,400842,C565848,-1,2011-09-07 12:48:00,236300
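
Note that this inner merge is many-to-many on the item keys: a single return row is paired with every purchase row that shares the same customer, product and price. That is why return `C543611` (idx 54577) appears against several purchase invoices in the output above. A minimal sketch of the effect, on hypothetical toy rows with a reduced set of key columns:

```python
import pandas as pd

# Hypothetical purchases: the same customer bought item 21730 twice
purchases = pd.DataFrame({
    'CustomerID': ['17850', '17850', '17850'],
    'StockCode':  ['21730', '21730', '71053'],
    'Quantity':   [6, 6, 6],
    'idx':        [0, 1, 2],
})
# Hypothetical single return of item 21730
returns = pd.DataFrame({
    'CustomerID': ['17850'],
    'StockCode':  ['21730'],
    'Quantity':   [-1],
    'idx':        [3],
})

pairs = pd.merge(purchases, returns, how='inner',
                 on=['CustomerID', 'StockCode'],
                 suffixes=('_completed', '_canceled'))

print(len(pairs))                    # number of purchase/return pairs
print(pairs.idx_canceled.nunique())  # number of distinct return rows
```

The number of pairs can therefore exceed the number of return rows, so the counts that follow are counts of purchase/return pairs rather than of distinct returns.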


##### **By Quantity**

In [414]:
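# MT / EQ / LT describe the return quantity relative to the purchase quantity
dataReturnedQtyMT = dataReturned[dataReturned.Quantity_completed < np.abs(dataReturned.Quantity_canceled)]  # return more than purchase
dataReturnedQtyEQ = dataReturned[dataReturned.Quantity_completed == np.abs(dataReturned.Quantity_canceled)]  # return equal to purchase
dataReturnedQtyLT = dataReturned[dataReturned.Quantity_completed > np.abs(dataReturned.Quantity_canceled)]  # return less than purchase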
print(line)
print(f'Transaction Items Affected by Returned => {len(dataReturned)}')
print(line)
print(f'Purchase Quantity > Return Quantity \t: {len(dataReturnedQtyLT)}')
display(dataReturnedQtyLT)
print(line)
print(f'Purchase Quantity == Return Quantity \t: {len(dataReturnedQtyEQ)}')
display(dataReturnedQtyEQ)
print(line)
print(f'Purchase Quantity < Return Quantity \t: {len(dataReturnedQtyMT)}')
display(dataReturnedQtyMT)


Transaction Items Affected by Returned => 19816
Purchase Quantity > Return Quantity 	: 13237


Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,idx_completed,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled,idx_canceled
0,536365,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,2010-12-01 08:26:00,4.25,17850,United Kingdom,6,C543611,-1,2011-02-10 14:38:00,54577
1,536373,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,2010-12-01 09:02:00,4.25,17850,United Kingdom,64,C543611,-1,2011-02-10 14:38:00,54577
2,536375,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,2010-12-01 09:32:00,4.25,17850,United Kingdom,81,C543611,-1,2011-02-10 14:38:00,54577
3,536396,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,2010-12-01 10:51:00,4.25,17850,United Kingdom,295,C543611,-1,2011-02-10 14:38:00,54577
4,536406,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,2010-12-01 11:33:00,4.25,17850,United Kingdom,432,C543611,-1,2011-02-10 14:38:00,54577
...,...,...,...,...,...,...,...,...,...,...,...,...,...
19803,580719,84946,ANTIQUE SILVER T-LIGHT GLASS,72,2011-12-05 16:54:00,1.06,14739,United Kingdom,392875,C581162,-3,2011-12-07 14:32:00,397647
19804,580978,22107,PIZZA PLATE IN BOX,8,2011-12-06 15:36:00,1.25,13078,United Kingdom,395841,C581460,-1,2011-12-08 18:48:00,400744
19808,581179,21232,STRAWBERRY CERAMIC TRINKET POT,48,2011-12-07 15:43:00,1.25,12471,Germany,398037,C573037,-2,2011-10-27 13:45:00,312437
19811,581469,51014C,"FEATHER PEN,COAL BLACK",12,2011-12-08 19:28:00,0.39,14606,United Kingdom,400832,C570331,-1,2011-10-10 12:18:00,284017


Purchase Quantity == Return Quantity 	: 5382


Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,idx_completed,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled,idx_canceled
28,557627,POST,POSTAGE,1,2011-06-21 14:30:00,18.00,12583,France,167229,C543992,-1,2011-02-15 10:29:00,57005
30,564471,POST,POSTAGE,1,2011-08-25 12:26:00,18.00,12583,France,225567,C543992,-1,2011-02-15 10:29:00,57005
33,570851,POST,POSTAGE,1,2011-10-12 14:46:00,18.00,12583,France,289242,C543992,-1,2011-02-15 10:29:00,57005
51,536373,21071,VINTAGE BILLBOARD DRINK ME MUG,6,2010-12-01 09:02:00,1.06,17850,United Kingdom,55,C543611,-6,2011-02-10 14:38:00,54580
52,536375,21071,VINTAGE BILLBOARD DRINK ME MUG,6,2010-12-01 09:32:00,1.06,17850,United Kingdom,72,C543611,-6,2011-02-10 14:38:00,54580
...,...,...,...,...,...,...,...,...,...,...,...,...,...
19809,581325,16169E,WRAP 50'S CHRISTMAS,100,2011-12-08 11:53:00,0.42,15877,United Kingdom,398941,C581330,-100,2011-12-08 11:57:00,398960
19810,581325,22959,WRAP CHRISTMAS VILLAGE,25,2011-12-08 11:53:00,0.42,15877,United Kingdom,398942,C581330,-25,2011-12-08 11:57:00,398959
19813,581469,84029E,RED WOOLLY HOTTIE WHITE HEART.,1,2011-12-08 19:28:00,4.25,14606,United Kingdom,400841,C565848,-1,2011-09-07 12:48:00,236299
19814,581469,21485,RETROSPOT HEART HOT WATER BOTTLE,1,2011-12-08 19:28:00,4.95,14606,United Kingdom,400842,C565848,-1,2011-09-07 12:48:00,236300


Purchase Quantity < Return Quantity 	: 1197


Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,idx_completed,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled,idx_canceled
97,536373,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-01 09:02:00,2.55,17850,United Kingdom,60,C543611,-12,2011-02-10 14:38:00,54579
99,536375,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-01 09:32:00,2.55,17850,United Kingdom,77,C543611,-12,2011-02-10 14:38:00,54579
103,536406,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-01 11:33:00,2.55,17850,United Kingdom,427,C543611,-12,2011-02-10 14:38:00,54579
105,536600,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-02 08:32:00,2.55,17850,United Kingdom,1939,C543611,-12,2011-02-10 14:38:00,54579
107,536609,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-02 09:41:00,2.55,17850,United Kingdom,2018,C543611,-12,2011-02-10 14:38:00,54579
...,...,...,...,...,...,...,...,...,...,...,...,...,...
19684,578088,21098,CHRISTMAS TOILET ROLL,2,2011-11-22 16:47:00,1.25,16376,United Kingdom,366699,C579948,-3,2011-12-01 10:56:00,384558
19685,578088,21098,CHRISTMAS TOILET ROLL,1,2011-11-22 16:47:00,1.25,16376,United Kingdom,366723,C579948,-3,2011-12-01 10:56:00,384558
19724,578936,22952,60 CAKE CASES VINTAGE CHRISTMAS,5,2011-11-27 13:00:00,0.55,16923,United Kingdom,375271,C542413,-24,2011-01-27 17:11:00,45474
19725,578936,21485,RETROSPOT HEART HOT WATER BOTTLE,2,2011-11-27 13:00:00,4.95,16923,United Kingdom,375316,C542413,-6,2011-01-27 17:11:00,45483


Some return quantities exceed the purchase quantity of the matched purchase transaction. Why?
This may be due to a warranty claim on an earlier purchase of the same product. However, these cases cover only 1,197 transaction items and the data carries no information to explain them, so the items can be ignored or removed. The same applies to transaction items whose return quantity equals the purchase quantity: since there is no explanation of how product returns are processed, it can be assumed that these items cancel each other out.
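
The cases discussed above (a return larger than, equal to, or smaller than the purchase) can be separated by comparing the purchase quantity with the absolute returned quantity. The sketch below uses a small hand-made frame with the same quantity column names as the merged table; the toy rows and the `split_by_return_quantity` helper are illustrative assumptions, not the notebook's actual code.

```python
import pandas as pd

# Toy stand-in for the merged purchase/return table (values are made up).
pairs = pd.DataFrame({
    "StockCode": ["A", "B", "C", "D"],
    "Quantity_completed": [6, 6, 1, 12],
    "Quantity_canceled": [-12, -6, -1, -1],
})

def split_by_return_quantity(df):
    """Hypothetical helper: split pairs by how the return compares to the purchase."""
    returned = df["Quantity_canceled"].abs()
    more_than = df[df["Quantity_completed"] < returned]   # returned more than bought
    equal = df[df["Quantity_completed"] == returned]      # returned exactly what was bought
    less_than = df[df["Quantity_completed"] > returned]   # partial return
    return more_than, equal, less_than

more_than, equal, less_than = split_by_return_quantity(pairs)
print(len(more_than), len(equal), len(less_than))
```

Concatenating the first two groups would give a removal list in the spirit of `transactionItemsRemovedByQty` in the next cell.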

In [415]:
transactionItemsRemovedByQty = pd.concat([dataReturnedQtyMT, dataReturnedQtyEQ])
transactionItemsRemovedByQty
# data[(data.InvoiceNo.isin(transactionItemsRemovedByQty.InvoiceNo_completed)&
#       data.StockCode.isin(transactionItemsRemovedByQty.StockCode)&
#       data.Description.isin(transactionItemsRemovedByQty.Description)&
#       data.Quantity.isin(transactionItemsRemovedByQty.Quantity_completed)&
#       data.InvoiceDate.isin(transactionItemsRemovedByQty.InvoiceDate_completed)&
#       data.UnitPrice.isin(transactionItemsRemovedByQty.UnitPrice)&
#       data.CustomerID.isin(transactionItemsRemovedByQty.CustomerID)&
#       data.Country.isin(transactionItemsRemovedByQty.Country)
#       )]

Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,idx_completed,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled,idx_canceled
97,536373,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-01 09:02:00,2.55,17850,United Kingdom,60,C543611,-12,2011-02-10 14:38:00,54579
99,536375,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-01 09:32:00,2.55,17850,United Kingdom,77,C543611,-12,2011-02-10 14:38:00,54579
103,536406,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-01 11:33:00,2.55,17850,United Kingdom,427,C543611,-12,2011-02-10 14:38:00,54579
105,536600,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-02 08:32:00,2.55,17850,United Kingdom,1939,C543611,-12,2011-02-10 14:38:00,54579
107,536609,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-02 09:41:00,2.55,17850,United Kingdom,2018,C543611,-12,2011-02-10 14:38:00,54579
...,...,...,...,...,...,...,...,...,...,...,...,...,...
19809,581325,16169E,WRAP 50'S CHRISTMAS,100,2011-12-08 11:53:00,0.42,15877,United Kingdom,398941,C581330,-100,2011-12-08 11:57:00,398960
19810,581325,22959,WRAP CHRISTMAS VILLAGE,25,2011-12-08 11:53:00,0.42,15877,United Kingdom,398942,C581330,-25,2011-12-08 11:57:00,398959
19813,581469,84029E,RED WOOLLY HOTTIE WHITE HEART.,1,2011-12-08 19:28:00,4.25,14606,United Kingdom,400841,C565848,-1,2011-09-07 12:48:00,236299
19814,581469,21485,RETROSPOT HEART HOT WATER BOTTLE,1,2011-12-08 19:28:00,4.95,14606,United Kingdom,400842,C565848,-1,2011-09-07 12:48:00,236300


In [442]:
transactionItemsRemovedByQty[transactionItemsRemovedByQty.duplicated()]

Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,idx_completed,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled,idx_canceled


In [441]:
transactionItemsRemovedByQty.idx_completed.value_counts()

200420    5
116812    5
98256     5
318520    5
258034    5
         ..
32591     1
32420     1
202204    1
335522    1
401091    1
Name: idx_completed, Length: 6068, dtype: int64

In [440]:
transactionItemsRemovedByQty.idx_completed.sort_values()

51           55
67           57
97           60
52           72
69           74
          ...  
19814    400842
10474    400899
19815    401091
1068     401484
17446    401526
Name: idx_completed, Length: 6579, dtype: int64

In [437]:
data[data.index.isin(transactionItemsRemovedByQty.idx_canceled.to_list())]

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
1405,C536543,22355,CHARLOTTE BAG SUKI DESIGN,-2,2010-12-01 14:30:00,0.85,17841,United Kingdom
1416,C536548,22168,ORGANISER WOOD ANTIQUE WHITE,-2,2010-12-01 14:33:00,8.50,12472,Germany
2154,C536622,22752,SET 7 BABUSHKA NESTING BOXES,-2,2010-12-02 10:37:00,8.50,12471,Germany
2226,C536625,22839,3 TIER CAKE TIN GREEN AND CREAM,-2,2010-12-02 10:46:00,14.95,14766,United Kingdom
2708,C536734,22780,LIGHT GARLAND BUTTERFILES PINK,-4,2010-12-02 12:50:00,4.25,16042,United Kingdom
...,...,...,...,...,...,...,...,...
399892,C581409,23462,ROCOCO WALL MIRROR WHITE,-1,2011-12-08 14:08:00,19.95,12476,Germany
399898,C581409,85127,SMALL SQUARE CUT GLASS CANDLESTICK,-5,2011-12-08 14:08:00,4.95,12476,Germany
400749,C581462,16219,HOUSE SHAPE PENCIL SHARPENER,-48,2011-12-08 18:51:00,0.06,12985,United Kingdom
400750,C581462,21642,ASSORTED TUTTI FRUTTI PEN,-72,2011-12-08 18:51:00,0.29,12985,United Kingdom


In [424]:
data.iloc[(transactionItemsRemovedByQty.idx_completed.to_list())]

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
60,536373,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-01 09:02:00,2.55,17850,United Kingdom
77,536375,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-01 09:32:00,2.55,17850,United Kingdom
427,536406,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-01 11:33:00,2.55,17850,United Kingdom
1939,536600,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-02 08:32:00,2.55,17850,United Kingdom
2018,536609,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-02 09:41:00,2.55,17850,United Kingdom
...,...,...,...,...,...,...,...,...
398941,581325,16169E,WRAP 50'S CHRISTMAS,100,2011-12-08 11:53:00,0.42,15877,United Kingdom
398942,581325,22959,WRAP CHRISTMAS VILLAGE,25,2011-12-08 11:53:00,0.42,15877,United Kingdom
400841,581469,84029E,RED WOOLLY HOTTIE WHITE HEART.,1,2011-12-08 19:28:00,4.25,14606,United Kingdom
400842,581469,21485,RETROSPOT HEART HOT WATER BOTTLE,1,2011-12-08 19:28:00,4.95,14606,United Kingdom


Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
398941,581325,16169E,WRAP 50'S CHRISTMAS,100,2011-12-08 11:53:00,0.42,15877,United Kingdom
398942,581325,22959,WRAP CHRISTMAS VILLAGE,25,2011-12-08 11:53:00,0.42,15877,United Kingdom
400841,581469,84029E,RED WOOLLY HOTTIE WHITE HEART.,1,2011-12-08 19:28:00,4.25,14606,United Kingdom
400842,581469,21485,RETROSPOT HEART HOT WATER BOTTLE,1,2011-12-08 19:28:00,4.95,14606,United Kingdom
401091,581483,23843,"PAPER CRAFT , LITTLE BIRDIE",80995,2011-12-09 09:15:00,2.08,16446,United Kingdom


In [433]:
data[data.index.isin(transactionItemsRemovedByQty.idx_completed.to_list())]

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
55,536373,21071,VINTAGE BILLBOARD DRINK ME MUG,6,2010-12-01 09:02:00,1.06,17850,United Kingdom
57,536373,82483,WOOD 2 DRAWER CABINET WHITE FINISH,2,2010-12-01 09:02:00,4.95,17850,United Kingdom
60,536373,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-01 09:02:00,2.55,17850,United Kingdom
72,536375,21071,VINTAGE BILLBOARD DRINK ME MUG,6,2010-12-01 09:32:00,1.06,17850,United Kingdom
74,536375,82483,WOOD 2 DRAWER CABINET WHITE FINISH,2,2010-12-01 09:32:00,4.95,17850,United Kingdom
...,...,...,...,...,...,...,...,...
400842,581469,21485,RETROSPOT HEART HOT WATER BOTTLE,1,2011-12-08 19:28:00,4.95,14606,United Kingdom
400899,581473,22992,REVOLVER WOODEN RULER,1,2011-12-08 19:57:00,1.95,12748,United Kingdom
401091,581483,23843,"PAPER CRAFT , LITTLE BIRDIE",80995,2011-12-09 09:15:00,2.08,16446,United Kingdom
401484,581579,22083,PAPER CHAIN KIT RETROSPOT,6,2011-12-09 12:19:00,2.95,17581,United Kingdom


In [380]:
vvv = ['536373', '82494L', 'WOODEN FRAME ANTIQUE WHITE', 6, '2010-12-01 09:02:00', 2.55, '17850', 'United Kingdom']
data[data.InvoiceNo.]

Series([], dtype: int64)

In [370]:
data[(
      data.InvoiceNo.isin(transactionItemsRemovedByQty.InvoiceNo_completed)&
      data.StockCode.isin(transactionItemsRemovedByQty.StockCode)&
      data.Description.isin(transactionItemsRemovedByQty.Description)&
      data.Quantity.isin(transactionItemsRemovedByQty.Quantity_completed)&
      data.InvoiceDate.isin(transactionItemsRemovedByQty.InvoiceDate_completed)&
      data.UnitPrice.isin(transactionItemsRemovedByQty.UnitPrice)&
      data.CustomerID.isin(transactionItemsRemovedByQty.CustomerID)&
      data.Country.isin(transactionItemsRemovedByQty.Country)
      )]

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
49,536373,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 09:02:00,2.55,17850,United Kingdom
50,536373,71053,WHITE METAL LANTERN,6,2010-12-01 09:02:00,3.39,17850,United Kingdom
51,536373,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 09:02:00,2.75,17850,United Kingdom
52,536373,20679,EDWARDIAN PARASOL RED,6,2010-12-01 09:02:00,4.95,17850,United Kingdom
54,536373,21871,SAVE THE PLANET MUG,6,2010-12-01 09:02:00,1.06,17850,United Kingdom
...,...,...,...,...,...,...,...,...
541880,581585,22727,ALARM CLOCK BAKELIKE RED,4,2011-12-09 12:31:00,3.75,15804,United Kingdom
541882,581585,21916,SET 12 RETRO WHITE CHALK STICKS,24,2011-12-09 12:31:00,0.42,15804,United Kingdom
541884,581585,84946,ANTIQUE SILVER T-LIGHT GLASS,12,2011-12-09 12:31:00,1.25,15804,United Kingdom
541886,581585,22398,MAGNETS PACK OF 4 SWALLOWS,12,2011-12-09 12:31:00,0.39,15804,United Kingdom
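
Chaining `isin` column by column, as in the filter above, tests each column independently, so a row passes whenever each of its values appears somewhere in the reference table, even if no single reference row contains that combination. A minimal sketch of matching on whole key tuples instead, using `merge` with `indicator=True`; the toy frames and the reduced key list are assumptions made for illustration.

```python
import pandas as pd

data_toy = pd.DataFrame({
    "InvoiceNo": ["536373", "536373", "536375", "536375"],
    "StockCode": ["82494L", "85123A", "82494L", "85123A"],
    "Quantity": [6, 6, 6, 6],
})
to_remove = pd.DataFrame({
    "InvoiceNo": ["536373", "536375"],
    "StockCode": ["82494L", "85123A"],
    "Quantity": [6, 6],
})
keys = ["InvoiceNo", "StockCode", "Quantity"]

# Column-wise isin: every column is checked on its own, so all four rows pass.
loose = data_toy[
    data_toy.InvoiceNo.isin(to_remove.InvoiceNo)
    & data_toy.StockCode.isin(to_remove.StockCode)
    & data_toy.Quantity.isin(to_remove.Quantity)
]

# Tuple-wise match: a left merge with an indicator marks rows whose full key
# combination exists in to_remove. Keys are de-duplicated so the merge keeps
# one output row per input row, in the original order.
marked = data_toy.merge(to_remove[keys].drop_duplicates(),
                        on=keys, how="left", indicator=True)
exact = data_toy[(marked["_merge"] == "both").to_numpy()]

print(len(loose), len(exact))
```

When the original row labels are available (as with `idx_completed` and `idx_canceled`), selecting by index is simpler than either approach.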


In [334]:
dataCompleted = data[data.Quantity>0]
dataCanceled = data[data.Quantity<0]
dataReturned = pd.merge(dataCompleted, dataCanceled, how='inner',
                   on=['StockCode', 'Description', 'CustomerID', 'Country', 'UnitPrice'], 
                   suffixes=['_completed', '_canceled'])
dataReturnedABeforeCheckout = dataReturned[dataReturned['InvoiceDate_completed'] >= dataReturned['InvoiceDate_canceled']]
dataReturnedAAfterCheckout = dataReturned[dataReturned['InvoiceDate_completed'] < dataReturned['InvoiceDate_canceled']]
print(line)
print(f'Returned Transactions\t\t\t: {len(dataReturned)}')
print(line)
print(f'Returned Transactions Before Checkout\t: {len(dataReturnedABeforeCheckout)}')
print('Samples:')
display(dataReturnedABeforeCheckout.sample(3))
print(line)
print(f'Returned Transactions After Checkout\t: {len(dataReturnedAAfterCheckout)}')
print('Samples:')
display(dataReturnedAAfterCheckout.sample(3))

Returned Transactions			: 19816
Returned Transactions Before Checkout	: 6629
Samples:


Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled
4514,544398,20829,GLITTER HANGING BUTTERFLY STRING,4,2011-02-18 12:27:00,2.1,15311,United Kingdom,C537805,-1,2010-12-08 13:18:00
12581,581334,23148,MINIATURE ANTIQUE ROSE HOOK IVORY,3,2011-12-08 12:07:00,0.83,17841,United Kingdom,C577547,-7,2011-11-20 14:48:00
5039,568654,15056N,EDWARDIAN PARASOL NATURAL,3,2011-09-28 12:20:00,5.95,14911,EIRE,C539576,-6,2010-12-20 12:25:00


Returned Transactions After Checkout	: 13187
Samples:


Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled
11839,553390,85114B,IVORY ENCHANTED FOREST PLACEMAT,8,2011-05-16 16:41:00,1.65,17841,United Kingdom,C569233,-2,2011-10-02 15:26:00
12072,576665,21927,BLUE/CREAM STRIPE CUSHION COVER,2,2011-11-16 11:46:00,1.25,17841,United Kingdom,C578280,-4,2011-11-23 13:53:00
10606,568225,21232,STRAWBERRY CERAMIC TRINKET BOX,12,2011-09-26 10:35:00,1.25,13014,United Kingdom,C568582,-3,2011-09-28 10:02:00
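
The merged tables shown earlier in this section carry `idx_completed` and `idx_canceled` columns, which the merge in this cell does not produce. One way such columns could be obtained is to turn the row index into a regular column before merging. The sketch below assumes that approach; the toy data, the `idx` column name, and the shortened key list are illustrative.

```python
import pandas as pd

# Toy rows with non-contiguous index labels, mimicking the original row numbers.
data_toy = pd.DataFrame({
    "InvoiceNo": ["536373", "C543611", "536375"],
    "StockCode": ["82494L", "82494L", "82494L"],
    "CustomerID": ["17850", "17850", "17850"],
    "Quantity": [6, -12, 6],
}, index=[60, 54579, 77])

# Expose the index as a column named "idx" so it survives the merge.
completed = data_toy[data_toy.Quantity > 0].rename_axis("idx").reset_index()
canceled = data_toy[data_toy.Quantity < 0].rename_axis("idx").reset_index()

# Non-key columns shared by both sides (idx, InvoiceNo, Quantity) get suffixes.
returned = pd.merge(completed, canceled, how="inner",
                    on=["StockCode", "CustomerID"],
                    suffixes=("_completed", "_canceled"))

print(returned[["idx_completed", "idx_canceled"]])
```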


In [340]:
dataCompleted = data[data.Quantity>0]
dataCanceled = data[data.Quantity<0]
dataReturned = pd.merge(dataCompleted, dataCanceled, how='inner',
                   on=['StockCode', 'Description', 'CustomerID', 'Country', 'UnitPrice'], 
                   suffixes=['_completed', '_canceled'])
dataReturnedABeforeCheckout = dataReturned[dataReturned['InvoiceDate_completed'] >= dataReturned['InvoiceDate_canceled']]
dataReturnedAAfterCheckout = dataReturned[dataReturned['InvoiceDate_completed'] < dataReturned['InvoiceDate_canceled']]
print(line)
print(f'Returned Transactions\t\t\t: {len(dataReturned)}')
print(line)
print(f'Returned Transactions Before Checkout\t: {len(dataReturnedABeforeCheckout)}')
print('Samples:')
display(dataReturnedABeforeCheckout.sample(3))
print(line)
print(f'Returned Transactions After Checkout\t: {len(dataReturnedAAfterCheckout)}')
print('Samples:')
display(dataReturnedAAfterCheckout.sample(3))

Returned Transactions			: 19816
Returned Transactions Before Checkout	: 6629
Samples:


Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled
1397,579910,22423,REGENCY CAKESTAND 3 TIER,3,2011-12-01 08:52:00,12.75,14911,EIRE,C563600,-1,2011-08-18 06:24:00
9681,565478,82483,WOOD 2 DRAWER CABINET WHITE FINISH,1,2011-09-05 11:18:00,6.95,15005,United Kingdom,C551285,-1,2011-04-27 14:07:00
5553,570388,22961,JAM MAKING SET PRINTED,12,2011-10-10 12:37:00,1.45,14911,EIRE,C557874,-1,2011-06-23 13:05:00


Returned Transactions After Checkout	: 13187
Samples:


Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled
2584,538900,21463,MIRRORED DISCO BALL,2,2010-12-15 09:51:00,5.95,13078,United Kingdom,C549898,-2,2011-04-12 16:00:00
13454,552950,22796,PHOTO FRAME 3 CLASSIC HANGING,2,2011-05-12 12:01:00,9.95,16116,United Kingdom,C553569,-2,2011-05-18 09:23:00
4599,548673,22423,REGENCY CAKESTAND 3 TIER,6,2011-04-01 16:18:00,12.75,13767,United Kingdom,C575662,-5,2011-11-10 14:48:00


There are refund transactions in which the refunded quantity is larger than the purchased quantity. This could be due to a warranty claim, human error, or something else; however, the data lacks information about it. Only 2 x 1197 = 2394 transaction rows are involved, so they can be removed to avoid data inconsistency.
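
A minimal sketch of such a removal, assuming the matched pairs carry the original row labels in `idx_completed` and `idx_canceled` columns as in the tables earlier in this section. The toy frames are made up. Because one return row can be paired with several purchase rows, the labels are de-duplicated first, so the number of distinct rows dropped can be lower than twice the number of pairs.

```python
import pandas as pd

data_toy = pd.DataFrame(
    {"InvoiceNo": ["536373", "536375", "C543611", "536380"],
     "Quantity": [6, 6, -12, 3]},
    index=[60, 77, 54579, 90],
)
# Two pairs that share the same return row (label 54579).
pairs = pd.DataFrame({"idx_completed": [60, 77],
                      "idx_canceled": [54579, 54579]})

# Collect the labels of both sides and keep each label once.
labels = pd.unique(pd.concat([pairs["idx_completed"], pairs["idx_canceled"]]))

# Drop by index label, which removes purchase and return rows together.
cleaned = data_toy.drop(index=labels)
print(len(labels), cleaned.index.tolist())
```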

In [330]:
dataCanceled

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
141,C536379,D,Discount,-1,2010-12-01 09:41:00,27.50,14527,United Kingdom
154,C536383,35004C,SET OF 3 COLOURED FLYING DUCKS,-1,2010-12-01 09:49:00,4.65,15311,United Kingdom
235,C536391,22556,PLASTERS IN TIN CIRCUS PARADE,-12,2010-12-01 10:24:00,1.65,17548,United Kingdom
236,C536391,21984,PACK OF 12 PINK PAISLEY TISSUES,-24,2010-12-01 10:24:00,0.29,17548,United Kingdom
237,C536391,21983,PACK OF 12 BLUE PAISLEY TISSUES,-24,2010-12-01 10:24:00,0.29,17548,United Kingdom
...,...,...,...,...,...,...,...,...
540449,C581490,23144,ZINC T-LIGHT HOLDER STARS SMALL,-11,2011-12-09 09:57:00,0.83,14397,United Kingdom
541541,C581499,M,Manual,-1,2011-12-09 10:28:00,224.69,15498,United Kingdom
541715,C581568,21258,VICTORIAN SEWING BOX LARGE,-5,2011-12-09 11:57:00,10.95,15311,United Kingdom
541716,C581569,84978,HANGING HEART JAR T-LIGHT HOLDER,-1,2011-12-09 11:58:00,1.25,17315,United Kingdom


In [333]:
dataReturnedQtyErr = dataReturned[dataReturned.Quantity_completed >= np.abs(dataReturned.Quantity_canceled)]
dataReturnedQtyErr

Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled
0,536365,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,2010-12-01 08:26:00,4.25,17850,United Kingdom,C543611,-1,2011-02-10 14:38:00
1,536373,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,2010-12-01 09:02:00,4.25,17850,United Kingdom,C543611,-1,2011-02-10 14:38:00
2,536375,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,2010-12-01 09:32:00,4.25,17850,United Kingdom,C543611,-1,2011-02-10 14:38:00
3,536396,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,2010-12-01 10:51:00,4.25,17850,United Kingdom,C543611,-1,2011-02-10 14:38:00
4,536406,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,2010-12-01 11:33:00,4.25,17850,United Kingdom,C543611,-1,2011-02-10 14:38:00
...,...,...,...,...,...,...,...,...,...,...,...
19811,581469,51014C,"FEATHER PEN,COAL BLACK",12,2011-12-08 19:28:00,0.39,14606,United Kingdom,C570331,-1,2011-10-10 12:18:00
19812,581469,51014C,"FEATHER PEN,COAL BLACK",12,2011-12-08 19:28:00,0.39,14606,United Kingdom,C574026,-1,2011-11-02 12:26:00
19813,581469,84029E,RED WOOLLY HOTTIE WHITE HEART.,1,2011-12-08 19:28:00,4.25,14606,United Kingdom,C565848,-1,2011-09-07 12:48:00
19814,581469,21485,RETROSPOT HEART HOT WATER BOTTLE,1,2011-12-08 19:28:00,4.95,14606,United Kingdom,C565848,-1,2011-09-07 12:48:00


##### **Drop Returned Quantity > Buying Quantity**

In [305]:
dataReturnedQtyErr

Unnamed: 0,InvoiceNo_completed,StockCode,Description,Quantity_completed,InvoiceDate_completed,UnitPrice,CustomerID,Country,InvoiceNo_canceled,Quantity_canceled,InvoiceDate_canceled
97,536373,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-01 09:02:00,2.55,17850,United Kingdom,C543611,-12,2011-02-10 14:38:00
99,536375,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-01 09:32:00,2.55,17850,United Kingdom,C543611,-12,2011-02-10 14:38:00
103,536406,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-01 11:33:00,2.55,17850,United Kingdom,C543611,-12,2011-02-10 14:38:00
105,536600,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-02 08:32:00,2.55,17850,United Kingdom,C543611,-12,2011-02-10 14:38:00
107,536609,82494L,WOODEN FRAME ANTIQUE WHITE,6,2010-12-02 09:41:00,2.55,17850,United Kingdom,C543611,-12,2011-02-10 14:38:00
...,...,...,...,...,...,...,...,...,...,...,...
19684,578088,21098,CHRISTMAS TOILET ROLL,2,2011-11-22 16:47:00,1.25,16376,United Kingdom,C579948,-3,2011-12-01 10:56:00
19685,578088,21098,CHRISTMAS TOILET ROLL,1,2011-11-22 16:47:00,1.25,16376,United Kingdom,C579948,-3,2011-12-01 10:56:00
19724,578936,22952,60 CAKE CASES VINTAGE CHRISTMAS,5,2011-11-27 13:00:00,0.55,16923,United Kingdom,C542413,-24,2011-01-27 17:11:00
19725,578936,21485,RETROSPOT HEART HOT WATER BOTTLE,2,2011-11-27 13:00:00,4.95,16923,United Kingdom,C542413,-6,2011-01-27 17:11:00


In [327]:
data[(data.InvoiceNo.isin(dataReturnedQtyErr.InvoiceNo_completed)&
      data.StockCode.isin(dataReturnedQtyErr.StockCode)&
      data.Description.isin(dataReturnedQtyErr.Description)&
      data.Quantity.isin(dataReturnedQtyErr.Quantity_completed)&
      data.InvoiceDate.isin(dataReturnedQtyErr.InvoiceDate_completed)&
      data.UnitPrice.isin(dataReturnedQtyErr.UnitPrice)&
      data.CustomerID.isin(dataReturnedQtyErr.CustomerID)&
      data.Country.isin(dataReturnedQtyErr.Country)
      )]

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
49,536373,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 09:02:00,2.55,17850,United Kingdom
52,536373,20679,EDWARDIAN PARASOL RED,6,2010-12-01 09:02:00,4.95,17850,United Kingdom
54,536373,21871,SAVE THE PLANET MUG,6,2010-12-01 09:02:00,1.06,17850,United Kingdom
57,536373,82483,WOOD 2 DRAWER CABINET WHITE FINISH,2,2010-12-01 09:02:00,4.95,17850,United Kingdom
59,536373,82482,WOODEN PICTURE FRAME WHITE FINISH,6,2010-12-01 09:02:00,2.10,17850,United Kingdom
...,...,...,...,...,...,...,...,...
540235,581473,21055,TOOL BOX SOFT TOY,1,2011-12-08 19:57:00,8.95,12748,United Kingdom
540240,581473,22086,PAPER CHAIN KIT 50'S CHRISTMAS,3,2011-12-08 19:57:00,2.95,12748,United Kingdom
540243,581473,23007,SPACEBOY BABY GIFT SET,1,2011-12-08 19:57:00,16.95,12748,United Kingdom
540245,581473,23080,RED METAL BOX TOP SECRET,1,2011-12-08 19:57:00,8.25,12748,United Kingdom


In [328]:
data[(data.InvoiceNo.isin(dataReturnedQtyErr.InvoiceNo_canceled)&
      data.StockCode.isin(dataReturnedQtyErr.StockCode)&
      data.Description.isin(dataReturnedQtyErr.Description)&
      data.Quantity.isin(dataReturnedQtyErr.Quantity_canceled)&
      data.InvoiceDate.isin(dataReturnedQtyErr.InvoiceDate_canceled)&
      data.UnitPrice.isin(dataReturnedQtyErr.UnitPrice)&
      data.CustomerID.isin(dataReturnedQtyErr.CustomerID)&
      data.Country.isin(dataReturnedQtyErr.Country)
      )]

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
1442,C536543,22355,CHARLOTTE BAG SUKI DESIGN,-2,2010-12-01 14:30:00,0.85,17841,United Kingdom
3410,C536625,22839,3 TIER CAKE TIN GREEN AND CREAM,-2,2010-12-02 10:46:00,14.95,14766,United Kingdom
4836,C536807,22501,PICNIC BASKET WICKER LARGE,-2,2010-12-02 16:45:00,9.95,15834,United Kingdom
21467,C538083,85099B,JUMBO BAG RED RETROSPOT,-3,2010-12-09 14:40:00,1.95,15750,United Kingdom
21468,C538083,22633,HAND WARMER UNION JACK,-4,2010-12-09 14:40:00,2.10,15750,United Kingdom
...,...,...,...,...,...,...,...,...
533838,C581117,23311,VINTAGE CHRISTMAS STOCKING,-2,2011-12-07 12:24:00,2.55,16393,United Kingdom
536911,C581228,22423,REGENCY CAKESTAND 3 TIER,-6,2011-12-08 10:06:00,10.95,16019,United Kingdom
536913,C581228,82494L,WOODEN FRAME ANTIQUE WHITE,-6,2011-12-08 10:06:00,2.95,16019,United Kingdom
536914,C581228,22781,GUMBALL MAGAZINE RACK,-24,2011-12-08 10:06:00,6.75,16019,United Kingdom


In [293]:
data[data.InvoiceNo == '562099']

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
287237,562099,21930,JUMBO STORAGE BAG SKULLS,10,2011-08-02 13:46:00,2.08,14688,United Kingdom
287238,562099,21931,JUMBO STORAGE BAG SUKI,10,2011-08-02 13:46:00,2.08,14688,United Kingdom
287239,562099,22386,JUMBO BAG PINK POLKADOT,10,2011-08-02 13:46:00,2.08,14688,United Kingdom
287240,562099,23201,JUMBO BAG ALPHABET,10,2011-08-02 13:46:00,2.08,14688,United Kingdom
287241,562099,85099C,JUMBO BAG BAROQUE BLACK WHITE,10,2011-08-02 13:46:00,2.08,14688,United Kingdom
287242,562099,85099F,JUMBO BAG STRAWBERRY,10,2011-08-02 13:46:00,2.08,14688,United Kingdom
287243,562099,85099B,JUMBO BAG RED RETROSPOT,10,2011-08-02 13:46:00,2.08,14688,United Kingdom
287244,562099,23199,JUMBO BAG APPLES,10,2011-08-02 13:46:00,2.08,14688,United Kingdom
287245,562099,22380,TOY TIDY SPACEBOY,5,2011-08-02 13:46:00,2.1,14688,United Kingdom
287246,562099,21934,SKULL SHOULDER BAG,10,2011-08-02 13:46:00,1.65,14688,United Kingdom


In [296]:
ea = dataReturnedQtyErr[['InvoiceNo_completed', 'InvoiceNo_canceled', 'StockCode']]
ea

Unnamed: 0,InvoiceNo_completed,InvoiceNo_canceled,StockCode
97,536373,C543611,82494L
99,536375,C543611,82494L
103,536406,C543611,82494L
105,536600,C543611,82494L
107,536609,C543611,82494L
...,...,...,...
19684,578088,C579948,21098
19685,578088,C579948,21098
19724,578936,C542413,22952
19725,578936,C542413,21485


In [304]:
for x in list(zip(ea.InvoiceNo_completed, ea.StockCode)):
  print(x)

('536373', '82494L')
('536375', '82494L')
('536406', '82494L')
('536600', '82494L')
('536609', '82494L')
('536612', '82494L')
('536628', '82494L')
('536630', '82494L')
('536685', '82494L')
('536690', '82494L')
('536750', '82494L')
('536752', '82494L')
('536787', '82494L')
('536790', '82494L')
('536378', '20723')
('543400', '20723')
('545290', '20723')
('562099', '20723')
('536378', '84991')
('544105', '22469')
('538998', '20719')
('540648', '20719')
('541488', '20719')
('545412', '20719')
('546527', '20719')
('546540', '20719')
('547419', '20719')
('548406', '20719')
('536783', '85178')
('538998', '85178')
('562286', '22659')
('580873', '22659')
('536800', '84378')
('536804', '22423')
('539443', '22423')
('540396', '22423')
('554974', '22423')
('562534', '22423')
('572274', '22423')
('577393', '22423')
('536831', '22207')
('540029', '22207')
('541610', '22207')
('543247', '22207')
('547101', '22207')
('553469', '22207')
('554364', '22207')
('555585', '22207')
('559419', '22207')
('5616