## Task: Download the 'Portugal_online_retail', 'Sweden_online_retail', and 'UK_online_retail' datasets. Apply the apriori algorithm to all datasets using three different confidence levels. Select one confidence level for each dataset that you think works better. Determine the first three most important rules for each dataset using the selected confidence level and report them in the report cell. Explain what each rule means.

In [12]:
############### Write your code in this cell (If applicable) ##################
# Loading the necessary libraries
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

# Set display options to show all columns and rows without truncation
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Loading the datasets
portugal_df = pd.read_csv('C:/Users/muhab/Downloads/Portugal_online_retail.csv')
sweden_df = pd.read_csv('C:/Users/muhab/Downloads/Sweden_online_retail.csv')
uk_df = pd.read_csv('C:/Users/muhab/Downloads/UK_online_retail.csv')


# Drop the invoice identifier so that only the item columns are passed to apriori
portugal_df = portugal_df.drop('InvoiceNo', axis=1)
sweden_df = sweden_df.drop('InvoiceNo', axis=1)
uk_df = uk_df.drop('InvoiceNo', axis=1)

# Confidence levels: all three were tried; 0.6 is the level kept for the report
# confidence_levels = [0.5, 0.6, 0.7]
confidence_levels = [0.6]

# Frequent itemsets depend only on min_support, so compute them once per dataset
portugal_frequent_itemsets = apriori(portugal_df, min_support=0.05, use_colnames=True)
sweden_frequent_itemsets = apriori(sweden_df, min_support=0.05, use_colnames=True)
uk_frequent_itemsets = apriori(uk_df, min_support=0.03, use_colnames=True)

# Generate association rules at each confidence level for each dataset
for confidence_level in confidence_levels:
    print(f"Confidence Level: {confidence_level}")

    portugal_rules = association_rules(portugal_frequent_itemsets, metric="confidence", min_threshold=confidence_level)
    sweden_rules = association_rules(sweden_frequent_itemsets, metric="confidence", min_threshold=confidence_level)
    uk_rules = association_rules(uk_frequent_itemsets, metric="confidence", min_threshold=confidence_level)

    # Display the results
    #print("Portugal Rules:")
    #print(portugal_rules.sort_values(by=['confidence'],ascending=False))
    #print("\nSweden Rules:")
    #print(sweden_rules.sort_values(by=['confidence'],ascending=False))
    #print("\nUK Rules:")
    #print(uk_rules.sort_values(by=['confidence'],ascending=False))
    
    
# Display the top three rules for each dataset
print("\nPortugal Rules:")
print(portugal_rules.sort_values(by=['confidence'], ascending=False).head(3))
print("\nSweden Rules:")
print(sweden_rules.sort_values(by=['confidence'], ascending=False).head(3))
print("\nUK Rules:")
print(uk_rules.sort_values(by=['confidence'], ascending=False).head(3))



Confidence Level: 0.6

Portugal Rules:
                                              antecedents  \
122269  (JUMBO SHOPPER VINTAGE RED PAISLEY, JUMBO  BAG...   
114844  (JUMBO SHOPPER VINTAGE RED PAISLEY, PACK OF 12...   
114836  (PACK OF 12 RED RETROSPOT TISSUES, RETROSPOT T...   

                                              consequents  antecedent support  \
122269  (LUNCH BAG CARS BLUE, CHARLOTTE BAG SUKI DESIG...            0.051724   
114844  (PLASTERS IN TIN VINTAGE PAISLEY, RETROSPOT TE...            0.051724   
114836  (JUMBO SHOPPER VINTAGE RED PAISLEY, PLASTERS I...            0.051724   

        consequent support   support  confidence       lift  leverage  \
122269            0.051724  0.051724         1.0  19.333333  0.049049   
114844            0.068966  0.051724         1.0  14.500000  0.048157   
114836            0.068966  0.051724         1.0  14.500000  0.048157   

        conviction  zhangs_metric  
122269         inf       1.000000  
114844         inf       0.981818  
114836      

######################## REPORT #############################

I first applied three confidence levels (0.5, 0.6 and 0.7) to all the datasets. Based on the results, a confidence level of 0.5 or 0.6 appeared to work best, as both yield a larger number of rules while still including rules with high confidence (1.0).

I selected 0.6 as a trade-off between the number of rules and their reliability: 0.6 yields slightly fewer rules than 0.5, but every rule it keeps meets a higher minimum confidence, whereas 0.5 yields more rules, some of them with lower confidence.
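
The effect of the threshold can be illustrated with a minimal sketch in plain Python. The confidence values below are hypothetical (the first six merely echo figures quoted in this report), and `count_rules_at` and `top_n` are illustrative helpers, not part of mlxtend. The sketch only shows that raising the threshold removes low-confidence rules while leaving the strongest ones untouched.

```python
# Hypothetical confidence values for a set of mined rules (illustration only).
rule_confidences = [1.0, 1.0, 1.0, 0.82, 0.75, 0.73, 0.66, 0.64, 0.58, 0.52]


def count_rules_at(confidences, min_confidence):
    """Count the rules whose confidence meets the minimum threshold."""
    return sum(1 for c in confidences if c >= min_confidence)


def top_n(confidences, min_confidence, n=3):
    """Return the n highest confidences among the rules that pass the threshold."""
    kept = [c for c in confidences if c >= min_confidence]
    return sorted(kept, reverse=True)[:n]


for level in (0.5, 0.6, 0.7):
    print(level, count_rules_at(rule_confidences, level), top_n(rule_confidences, level))
```

In this toy list the three strongest rules are the same at every threshold; the threshold only changes how many weaker rules are kept alongside them.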

Subsequently, I sorted the rules by confidence in descending order and used the 'head(3)' function to extract the top three rules for each dataset at the 0.6 confidence level.
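
Because several rules tie at a confidence of 1.0, sorting on confidence alone leaves the order among them to how the sort handles ties. One possible refinement, offered as a sketch rather than as what the code cell does, is to break ties with lift. The confidence and lift values below are taken from the Portugal output; the short labels and the `rank_rules` helper are illustrative.

```python
# Confidence and lift copied from the Portugal output; labels are the row indices.
example_rules = [
    {"label": "rule 114844", "confidence": 1.0, "lift": 14.5},
    {"label": "rule 122269", "confidence": 1.0, "lift": 19.333333},
    {"label": "rule 114836", "confidence": 1.0, "lift": 14.5},
]


def rank_rules(rules, n=3):
    """Sort by confidence, then by lift, both descending, and keep the top n."""
    ordered = sorted(rules, key=lambda r: (r["confidence"], r["lift"]), reverse=True)
    return ordered[:n]


ranked = rank_rules(example_rules)
for rule in ranked:
    print(rule["label"], rule["confidence"], rule["lift"])
```

On the DataFrame returned by `association_rules`, the equivalent ordering would be `rules.sort_values(by=['confidence', 'lift'], ascending=False)`.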

PORTUGAL RULES:

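The first rule suggests that customers who purchase both the "JUMBO SHOPPER VINTAGE RED PAISLEY" and "JUMBO BAG WOODLAND ANIMALS" also buy the "LUNCH BAG CARS BLUE" and "CHARLOTTE BAG SUKI DESIGN". With a confidence of 1.0 (100%), every transaction in the dataset that contains the first pair also contains the second, and the lift of 19.33 shows that this happens far more often than it would if the two pairs were bought independently.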

The second rule indicates that customers who buy the "JUMBO SHOPPER VINTAGE RED PAISLEY" and "PACK OF 12 RED RETROSPOT TISSUES" also purchase the "PLASTERS IN TIN VINTAGE PAISLEY" and "RETROSPOT TEA SET CERAMIC 11 PC", with a confidence of 1.0 (100%) and a lift of 14.5.

The third rule shows that customers who purchase the "PACK OF 12 RED RETROSPOT TISSUES" and "RETROSPOT TEA SET CERAMIC 11 PC" also buy the "JUMBO SHOPPER VINTAGE RED PAISLEY" and "PLASTERS IN TIN VINTAGE PAISLEY", with a confidence of 1.0 (100%) and a lift of 14.5.
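
The metrics in the Portugal output follow from the three support values of each rule. As a worked check, the sketch below recomputes confidence, lift, leverage and conviction from the printed supports using the standard definitions. `rule_metrics` is an illustrative helper, not an mlxtend function, and small differences in the last digit come from the rounded inputs.

```python
import math


def rule_metrics(antecedent_support, consequent_support, support):
    """Compute standard association-rule metrics from three support values."""
    confidence = support / antecedent_support
    lift = confidence / consequent_support
    leverage = support - antecedent_support * consequent_support
    # Conviction is undefined (infinite) when the rule never fails.
    if confidence >= 1.0:
        conviction = math.inf
    else:
        conviction = (1 - consequent_support) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Supports copied from the first and second rows of the Portugal output.
first = rule_metrics(0.051724, 0.051724, 0.051724)
second = rule_metrics(0.051724, 0.068966, 0.051724)

for name, metrics in (("first", first), ("second", second)):
    print(name, {key: round(value, 6) for key, value in metrics.items()})
```

A lift of about 19.3 means the consequent appears roughly 19 times more often in baskets that contain the antecedent than it does across all baskets.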

SWEDEN RULES:

The first rule suggests that customers who buy the "12 PENCILS SMALL TUBE SKULL" are highly likely to purchase the "PACK OF 72 SKULL CAKE CASES" as well, with a confidence of 1.0 (100%).

The second rule indicates that customers who purchase both the "MINI PAINT SET VINTAGE" and "PACK OF 72 RETROSPOT CAKE CASES" are highly likely to buy the "60 CAKE CASES DOLLY GIRL DESIGN" and "RETROSPOT TEA SET CERAMIC 11 PC" as well, with a confidence of 1.0 (100%).

The third rule shows that customers who buy the "60 CAKE CASES DOLLY GIRL DESIGN" and "RETROSPOT TEA SET CERAMIC 11 PC" are highly likely to purchase the "PACK OF 72 RETROSPOT CAKE CASES" and "BAG 250g SWIRLY MARBLES" as well, with a confidence of 1.0 (100%).

UK RULES:

The first rule in the UK dataset suggests that customers who purchase the "PINK REGENCY TEACUP AND SAUCER" are highly likely to buy the "GREEN REGENCY TEACUP AND SAUCER" as well, with a confidence of 0.82 (82%).

The second rule indicates that customers who buy the "GREEN REGENCY TEACUP AND SAUCER" are highly likely to purchase the "ROSES REGENCY TEACUP AND SAUCER" as well, with a confidence of 0.75 (75%).

The third rule shows that customers who purchase the "ROSES REGENCY TEACUP AND SAUCER" are highly likely to buy the "GREEN REGENCY TEACUP AND SAUCER" as well, with a confidence of 0.73 (73%).
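
The second and third UK rules involve the same two items in opposite directions yet have different confidences (0.75 and 0.73). This is expected, because confidence divides the joint count by the antecedent's count, and that count differs between the two directions. The toy transactions below are hypothetical, with only the item names borrowed from this report, and `confidence` is an illustrative helper.

```python
GREEN = "GREEN REGENCY TEACUP AND SAUCER"
ROSES = "ROSES REGENCY TEACUP AND SAUCER"
PINK = "PINK REGENCY TEACUP AND SAUCER"

# Hypothetical baskets, not taken from the UK dataset.
transactions = [
    {GREEN, ROSES, PINK},
    {GREEN, ROSES},
    {GREEN, ROSES},
    {GREEN},
    {ROSES},
    {ROSES, PINK},
]


def confidence(baskets, antecedent, consequent):
    """Share of baskets containing the antecedent that also contain the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        raise ValueError("antecedent never occurs in the baskets")
    with_both = [b for b in with_antecedent if consequent <= b]
    return len(with_both) / len(with_antecedent)


green_to_roses = confidence(transactions, {GREEN}, {ROSES})  # 3 of the 4 GREEN baskets
roses_to_green = confidence(transactions, {ROSES}, {GREEN})  # 3 of the 5 ROSES baskets
print(green_to_roses, roses_to_green)
```

Both directions share the same three joint baskets, but the denominators differ, so the two confidences differ as well.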

These rules provide insights into purchasing patterns and associations between different products in each dataset.

