# UK Retail Store Analysis

This transactional data set contains all transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based, registered, non-store online retailer. The company mainly sells unique all-occasion gifts, and many of its customers are wholesalers.

### Variables

| Variable Name | Role | Type | Description | Missing Values |
|---------------|------|------|-------------------|----------------|
| InvoiceNo | ID | Categorical | A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'C', it indicates a cancellation | No |
| StockCode | ID | Categorical | A 5-digit integral number uniquely assigned to each distinct product | No |
| Description | ID | Categorical | Product name | No |
| Quantity | Feature | Integer | The quantities of each product (item) per transaction | No |
| InvoiceDate | Feature | Date | The date and time when each transaction was generated | No |
| UnitPrice | Feature | Continuous | Product price per unit in sterling | No |
| CustomerID | Feature | Categorical | A 5-digit integral number uniquely assigned to each customer | No |
| Country | Feature | Categorical | The name of the country where each customer resides | No |

### Additional Variable Information
- InvoiceNo: Invoice number. Nominal, a 6-digit integral number uniquely assigned to each transaction. **If this code starts with the letter 'C', it indicates a cancellation.**
- StockCode: Product (item) code. Nominal, a 5-digit integral number uniquely assigned to each distinct product.
- Description: Product (item) name. Nominal.
- Quantity: The quantities of each product (item) per transaction. Numeric.
- InvoiceDate: Invoice date and time. Numeric, the day and time when each transaction was generated.
- UnitPrice: Unit price. Numeric, product price per unit in sterling.
- CustomerID: Customer number. Nominal, a 5-digit integral number uniquely assigned to each customer.
- Country: Country name. Nominal, the name of the country where each customer resides.

## Loading the data

In [None]:
'''

!pip install numpy
!pip install pandas
!pip install matplotlib
!pip install openpyxl

'''



In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [7]:
#Load data
retail = pd.read_excel("./data/online_retail.xlsx")
retail.head()

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom


In [8]:
#Basic data features
retail.shape

(541909, 8)

In [12]:
retail.dtypes

InvoiceNo              object
StockCode              object
Description            object
Quantity                int64
InvoiceDate    datetime64[ns]
UnitPrice             float64
CustomerID            float64
Country                object
dtype: object

In [17]:
retail.isna().sum() #135080 rows have no CustomerID (and 1454 have no Description) -- rows without a customer should be removed

InvoiceNo           0
StockCode           0
Description      1454
Quantity            0
InvoiceDate         0
UnitPrice           0
CustomerID     135080
Country             0
dtype: int64
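
Removing the rows without a customer could look like the sketch below. It runs on a small frame named `sample`, a hypothetical stand-in for `retail` with illustrative values, so that it is self-contained; the same `dropna` call should apply to the full data. Casting `CustomerID` to an integer afterwards is an optional assumption on my part, motivated by the `float64` dtype that pandas assigns when a numeric column holds missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for `retail`; values are illustrative only.
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "536367", "536368"],
    "Description": ["WHITE METAL LANTERN", None, "HAND WARMER", "JAM JAR"],
    "CustomerID": [17850.0, np.nan, 13047.0, np.nan],
})

# Keep only the rows where the customer is known.
known_customers = sample.dropna(subset=["CustomerID"]).copy()

# With no NaN left in the column, the float IDs can be stored as integers.
known_customers["CustomerID"] = known_customers["CustomerID"].astype("int64")

print(known_customers.shape)
print(known_customers.dtypes["CustomerID"])
```

Using `subset=["CustomerID"]` leaves rows that only lack a `Description` in place, so the two kinds of missing values can be handled separately.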

### Examine InvoiceNo

In [36]:
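#Examine InvoiceNo --> count the unique invoice numbers that contain a "C" (cancellations)
retail["InvoiceNo"] = retail["InvoiceNo"].astype(str)

#Work on the unique invoice numbers so each invoice is counted once
unique_invoices = pd.Series(retail["InvoiceNo"].unique())
c_counter = unique_invoices.str.lower().str.contains("c").sum()

print(c_counter)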

3836
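
Since the data dictionary describes cancellations as invoice numbers that start with the letter, a prefix test is a slightly stricter alternative to a substring test. The sketch below is only an illustration under that assumption: `invoice_no` is a hypothetical stand-in for `retail["InvoiceNo"]`, and it separates the number of cancelled invoices from the number of cancelled rows.

```python
import pandas as pd

# Hypothetical invoice numbers standing in for retail["InvoiceNo"].
invoice_no = pd.Series(
    ["536365", "536365", "C536379", "C536379", "A563185", "c536383"]
)

# Normalise case once, then test the prefix rather than any position.
upper = invoice_no.str.upper()
is_cancellation = upper.str.startswith("C")

# One invoice can span several rows, so the two counts can differ.
n_cancelled_rows = int(is_cancellation.sum())
n_cancelled_invoices = int(upper[is_cancellation].nunique())

print(n_cancelled_invoices, n_cancelled_rows)
```

Keeping `is_cancellation` as a boolean mask also makes it easy to add a flag column later instead of dropping the rows outright.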


In [55]:
#Check which non-digit characters appear in InvoiceNo
#Strip every digit so that only the letters (if any) remain
only_letters_InvoiceNo = retail["InvoiceNo"].str.replace(r"\d", "", regex=True)

unique_letters = sorted(only_letters_InvoiceNo.unique())

unique_letters #We have an A and a C prefix -- A wasn't mentioned in the data dictionary

['', 'A', 'C']

In [65]:
retail[retail['InvoiceNo'].str.contains('A')].head(10) #There are only 3 such rows and they are bad-debt adjustments
#We should remove these before analyzing customer segments

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
299982,A563185,B,Adjust bad debt,1,2011-08-12 14:50:00,11062.06,,United Kingdom
299983,A563186,B,Adjust bad debt,1,2011-08-12 14:51:00,-11062.06,,United Kingdom
299984,A563187,B,Adjust bad debt,1,2011-08-12 14:52:00,-11062.06,,United Kingdom
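
One way to remove the adjustment rows is sketched below. It assumes that an "A" prefix is what identifies them, which is an inference from the three rows above rather than something the data dictionary states. The frame `sample` is a hypothetical stand-in for `retail`, built partly from the rows shown above.

```python
import pandas as pd

# Hypothetical stand-in for `retail`, reusing a few values shown above.
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "C536379", "A563185", "A563186"],
    "Description": [
        "WHITE METAL LANTERN",
        "ILLUSTRATIVE CANCELLED ITEM",
        "Adjust bad debt",
        "Adjust bad debt",
    ],
    "UnitPrice": [3.39, 1.00, 11062.06, -11062.06],
})

# Flag adjustment rows by their prefix (assumption: "A" marks adjustments).
is_adjustment = sample["InvoiceNo"].astype(str).str.startswith("A")

# Drop the adjustments; cancellations ("C") stay in for separate handling.
without_adjustments = sample[~is_adjustment].reset_index(drop=True)

print(without_adjustments["InvoiceNo"].tolist())
```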


### Examine StockCode

In [69]:
retail["StockCode"].head(10) #Some codes are purely numeric, others carry a letter suffix (e.g. 85123A)

0    85123A
1     71053
2    84406B
3    84029G
4    84029E
5     22752
6     21730
7     22633
8     22632
9     84879
Name: StockCode, dtype: object
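
To see how common each shape of code is, the codes could be grouped by pattern. The sketch below is a minimal illustration, not a full audit: `stock_code` is a hypothetical stand-in for `retail["StockCode"]` that reuses codes appearing earlier in this notebook, and the three pattern groups are my own assumption about what is worth separating.

```python
import pandas as pd

# Stand-in for retail["StockCode"], using codes seen earlier in the notebook.
stock_code = pd.Series(["85123A", "71053", "84406B", "22752", "B"]).astype(str)

# Exactly five digits, as the data dictionary describes.
digits_only = stock_code.str.fullmatch(r"\d{5}")

# Five digits followed by one or more letters (e.g. 85123A).
digits_plus_letters = stock_code.str.fullmatch(r"\d{5}[A-Za-z]+")

# Anything else, such as the "B" code used by the bad-debt adjustments.
other = ~(digits_only | digits_plus_letters)

summary = {
    "five digits": int(digits_only.sum()),
    "five digits + letters": int(digits_plus_letters.sum()),
    "other": int(other.sum()),
}
print(summary)
```

Inspecting `stock_code[other].unique()` on the real column would then list the special codes that do not follow either numeric pattern.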