Domain: E-commerce Retail Transactions

Objective:

The objective is to extract meaningful patterns, trends, and insights from customer transactions data to improve business strategies, enhance customer experience, and optimize operations.
Data Features:

Customer ID: Unique identifier for each customer.
Age: Age of the customer.
Gender: Gender of the customer.
Item Purchased: Description of the purchased item.
Category: Category to which the item belongs.
Purchase Amount (USD): The amount spent on the purchase.
Location: Geographic location of the customer.
Size: Size of the purchased item.
Color: Color of the purchased item.
Season: Season associated with the purchase.
Review Rating: Rating provided by the customer for the purchased item.
Subscription Status: Indicates whether the customer is subscribed to any service.
Payment Method: Method used by the customer to make the payment.
Shipping Type: Type of shipping chosen by the customer.
Discount Applied: Whether a discount was applied to the purchase.
Promo Code Used: Indicates whether a promo code was used.
Previous Purchases: Number of previous purchases made by the customer.
Preferred Payment Method: Customer's preferred payment method.
Frequency of Purchases: How frequently the customer makes purchases.
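The feature list above can also serve as a schema check that runs before any preprocessing. This is a minimal sketch that assumes the CSV headers match these names exactly. `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names and are not part of the original notebook:

```python
import pandas as pd

# Column names as listed in the data feature description (assumed to match the CSV headers)
EXPECTED_COLUMNS = [
    'Customer ID', 'Age', 'Gender', 'Item Purchased', 'Category',
    'Purchase Amount (USD)', 'Location', 'Size', 'Color', 'Season',
    'Review Rating', 'Subscription Status', 'Payment Method', 'Shipping Type',
    'Discount Applied', 'Promo Code Used', 'Previous Purchases',
    'Preferred Payment Method', 'Frequency of Purchases',
]


def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected columns that are absent from df (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# Usage: a frame that contains only two of the expected columns
sample = pd.DataFrame({'Customer ID': [1], 'Age': [55]})
print(missing_columns(sample)[:3])  # ['Gender', 'Item Purchased', 'Category']
```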

Recommendation System: a hybrid recommendation system that combines clustering and collaborative filtering. It aims to provide personalized product recommendations to customers based on their own behavior and preferences, as well as their similarity to other customers in the same cluster.
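Below is a minimal sketch of the hybrid idea on hypothetical toy data. K-Means first groups customers by profile. User-based collaborative filtering with cosine similarity then scores items, using only customers from the same cluster. The data, the `recommend` helper and the choice of cosine similarity are illustrative assumptions. The notebook itself uses a RandomForestRegressor for its collaborative-filtering step.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical purchases: one row per (customer, item) transaction
purchases = pd.DataFrame({
    'Customer ID': [1, 1, 2, 2, 3, 4, 4, 5, 6, 6],
    'Item Purchased': ['Jeans', 'Belt', 'Jeans', 'Socks', 'Jeans',
                       'Sandals', 'Hat', 'Sandals', 'Hat', 'Scarf'],
    'Purchase Amount (USD)': [60, 20, 55, 10, 70, 40, 25, 45, 30, 35],
})
# Hypothetical customer profiles used only for clustering
profiles = pd.DataFrame({
    'Customer ID': [1, 2, 3, 4, 5, 6],
    'Age': [22, 25, 23, 61, 58, 64],
    'Previous Purchases': [5, 7, 6, 40, 45, 38],
}).set_index('Customer ID')

# Step 1 (clustering): group customers with similar standardized profiles
X = StandardScaler().fit_transform(profiles)
profiles['Cluster'] = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)


def recommend(customer_id, k=2):
    """Recommend up to k unseen items using only customers from the same cluster."""
    cluster = profiles.loc[customer_id, 'Cluster']
    members = profiles.index[profiles['Cluster'] == cluster]
    # Step 2 (collaborative filtering): user-item spend matrix restricted to the cluster
    matrix = (purchases[purchases['Customer ID'].isin(members)]
              .pivot_table(index='Customer ID', columns='Item Purchased',
                           values='Purchase Amount (USD)', aggfunc='sum', fill_value=0))
    target = matrix.loc[customer_id].to_numpy(dtype=float)
    others = matrix.drop(index=customer_id).to_numpy(dtype=float)
    # Cosine similarity between the target customer and every other cluster member
    norms = np.linalg.norm(others, axis=1) * np.linalg.norm(target)
    sims = others @ target / np.where(norms == 0, 1, norms)
    # Score items by similarity-weighted spend, keeping only items the customer has not bought
    scores = pd.Series(sims @ others, index=matrix.columns)
    scores = scores[matrix.loc[customer_id] == 0]
    return scores[scores > 0].sort_values(ascending=False).head(k).index.tolist()


print(recommend(3))  # ['Belt', 'Socks']
```

The restriction to same-cluster neighbours is what makes the sketch hybrid. Clustering narrows down the candidate neighbours, and collaborative filtering then ranks the items those neighbours bought.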



In [1]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from mlxtend.frequent_patterns import apriori, association_rules

# Load the dataset
df = pd.read_csv('shopping_trends.csv')
df
# Data Preprocessing




In [None]:



# Convert 'Frequency of Purchases' to a numeric format?
df
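One way to answer the question in the cell above is to look at what label encoding, used in the next cell, actually produces. `LabelEncoder` assigns codes in sorted (alphabetical) order of the labels, not in order of purchase frequency. A fitted encoder is also the only record of the code-to-label mapping. The frequency labels below are hypothetical examples:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical frequency labels
frequencies = ['Weekly', 'Annually', 'Monthly', 'Bi-Weekly']

encoder = LabelEncoder()
codes = encoder.fit_transform(frequencies)

# Codes follow alphabetical order: 'Bi-Weekly' gets a smaller code than 'Monthly'
# even though it describes more frequent purchases
print(encoder.classes_.tolist())  # ['Annually', 'Bi-Weekly', 'Monthly', 'Weekly']
print(codes.tolist())             # [3, 0, 2, 1]

# The fitted encoder maps codes back to labels; refitting it on another column overwrites this mapping
print(encoder.inverse_transform([0, 3]).tolist())  # ['Annually', 'Weekly']
```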






In [None]:
label_encoder = LabelEncoder()
categorical_cols = ['Gender', 'Location', 'Subscription Status', 'Payment Method', 'Item Purchased', 'Shipping Type', 'Promo Code Used', 'Preferred Payment Method', 'Frequency of Purchases']
for col in categorical_cols:
    df[col] = label_encoder.fit_transform(df[col])

# Convert 'Frequency of Purchases' to a numeric format
df['Frequency of Purchases'] = pd.to_numeric(df['Frequency of Purchases'])
df
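The clustering step fixes `num_clusters = 5` and leaves the value to be adjusted as needed. One common way to choose it is to compare silhouette scores over a range of k, where a higher score means tighter, better-separated clusters. This is offered as a suggestion, not as the notebook's method. The `best_k` helper and the synthetic data are hypothetical. On the real data, pass the (ideally standardized) clustering features in place of `X`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k(X, k_values=range(2, 7), random_state=42):
    """Return (k with the highest silhouette score, {k: score}) for K-Means."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores


# Hypothetical data: three well-separated groups of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 10], [0, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

k, scores = best_k(X)
print(k)  # 3
```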


In [None]:
from sklearn.preprocessing import StandardScaler

# Prepare the data for clustering
X_clustering = df[['Age', 'Review Rating', 'Previous Purchases', 'Frequency of Purchases']]

# Standardize the features so that those with larger numeric ranges (e.g. Age)
# do not dominate the Euclidean distances that K-Means relies on
X_scaled = StandardScaler().fit_transform(X_clustering)

# Apply K-Means clustering
num_clusters = 5  # Adjust the number of clusters as needed
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=42)
df['Cluster'] = kmeans.fit_predict(X_scaled)

# Prepare the data for product recommendation within a specific cluster
cluster_number = 2  # Choose the cluster for which you want to make recommendations
cluster_data = df[df['Cluster'] == cluster_number]

# Prepare the data for collaborative filtering (user-item matrix)
user_item_matrix = cluster_data.pivot_table(index='Customer ID', columns='Item Purchased', values='Purchase Amount (USD)', fill_value=0)

# Reset index to turn 'Customer ID' back into a regular column
user_item_matrix.reset_index(inplace=True)

# Collaborative Filtering: Use RandomForestRegressor as an example.
# Note: once the item columns are dropped, the only input feature is 'Customer ID',
# an arbitrary identifier, so the model cannot learn similarities between customers.
model = RandomForestRegressor(n_estimators=100, random_state=42)

# Fit the model
target_columns = user_item_matrix.columns[1:]  # Exclude 'Customer ID'
X_train = user_item_matrix.drop(target_columns, axis=1)
y_train = user_item_matrix[target_columns]
model.fit(X_train, y_train)

# Predict the purchase amounts of all products (in-sample, on the training rows)
predicted_purchase_all_products = model.predict(X_train)

# Rank products by average predicted purchase amount and keep the top 5
mean_predictions = predicted_purchase_all_products.mean(axis=0)
top_indices = mean_predictions.argsort()[::-1][:5]
top_products = target_columns[top_indices]

# Visualize the results ('Item Purchased' holds label-encoded codes at this point)
plt.figure(figsize=(10, 6))
plt.bar(top_products.astype(str), mean_predictions[top_indices])
plt.xlabel('Product')
plt.ylabel('Average Predicted Purchase Amount')
plt.title('Top Recommended Products')
plt.show()
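In the cell above, the RandomForestRegressor receives only 'Customer ID' as input, so it never compares purchase patterns between customers. A common collaborative-filtering alternative is item-based filtering, which measures cosine similarity between the item columns of the user-item matrix. This is a different technique from the notebook's, not a description of what the notebook does. The matrix, customer IDs and helper names below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical user-item matrix (rows: customers, columns: items, values: spend in USD)
user_item = pd.DataFrame(
    [[60, 20, 0, 0],
     [55, 0, 10, 0],
     [0, 25, 0, 30],
     [70, 15, 5, 0]],
    index=[101, 102, 103, 104],
    columns=['Jeans', 'Belt', 'Socks', 'Hat'],
)


def item_similarity(matrix: pd.DataFrame) -> pd.DataFrame:
    """Cosine similarity between every pair of item columns."""
    values = matrix.to_numpy(dtype=float)
    norms = np.linalg.norm(values, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items nobody bought
    sim = (values.T @ values) / np.outer(norms, norms)
    return pd.DataFrame(sim, index=matrix.columns, columns=matrix.columns)


def recommend_items(matrix: pd.DataFrame, customer_id, k=2) -> list:
    """Score the customer's unseen items by their similarity to the items they bought."""
    sim = item_similarity(matrix)
    bought = matrix.loc[customer_id]
    scores = pd.Series(sim.to_numpy() @ bought.to_numpy(dtype=float), index=matrix.columns)
    scores = scores[bought == 0]  # only recommend items the customer has not bought
    return scores[scores > 0].sort_values(ascending=False).head(k).index.tolist()


print(recommend_items(user_item, 102))  # ['Belt']
```

With the notebook's data, `user_item_matrix` (with 'Customer ID' kept as the index rather than reset into a column) could stand in for the toy matrix. If each customer appears in only one transaction row, however, every row has a single non-zero entry and all off-diagonal similarities are zero. In that case this method has no co-purchase signal to use.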

In [None]:
# Categorical columns were label-encoded in an earlier cell, so the one-hot item names
# built below take the form '<column>_<code>'

# Convert 'Frequency of Purchases' to a numeric format
df['Frequency of Purchases'] = pd.to_numeric(df['Frequency of Purchases'], errors='coerce')

# Select relevant columns for Apriori
apriori_df = df[['Gender', 'Location', 'Subscription Status', 'Payment Method', 'Item Purchased', 'Shipping Type', 'Promo Code Used', 'Preferred Payment Method']]

# Apriori expects one True/False column per item, not raw category values,
# so one-hot encode the columns (as strings) into '<column>_<value>' indicators
apriori_df = pd.get_dummies(apriori_df.astype(str)).astype(bool)

# Apply Apriori algorithm
frequent_itemsets = apriori(apriori_df, min_support=0.1, use_colnames=True)

# Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

# Display the generated rules
print(rules)
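mlxtend's `apriori` expects a DataFrame in which every column is an item and every value is True/False (or 0/1), and it rejects string values. `pd.get_dummies` produces that shape and names each column `<column>_<value>`. Here is a small check on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows with two categorical attributes
rows = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male'],
    'Shipping Type': ['Express', 'Express', 'Standard'],
})

# One boolean column per (column, value) pair; values within each column are sorted
onehot = pd.get_dummies(rows.astype(str)).astype(bool)
print(list(onehot.columns))
# ['Gender_Female', 'Gender_Male', 'Shipping Type_Express', 'Shipping Type_Standard']
```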

In [None]:


df = pd.read_csv('shopping_trends.csv')

# One-hot encode the categorical variables as True/False item columns
df_encoded = pd.get_dummies(df[['Item Purchased', 'Category', 'Shipping Type', 'Discount Applied', 'Promo Code Used']]).astype(bool)

# Keep the customer IDs alongside the encoded item columns
df = pd.concat([df[['Customer ID']], df_encoded], axis=1)

# Apriori algorithm (min_support=0.5 keeps only items present in at least half of all rows;
# lower it if no rules involving individual products appear)
frequent_itemsets = apriori(df_encoded, min_support=0.5, use_colnames=True)

# Generate association rules
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.7)

# Display the association rules
print("Association Rules:")
print(rules[['antecedents', 'consequents', 'confidence']])

# Function to recommend products based on a given item
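def recommend_products(item, rules):
    recommended_products = set()
    for _, row in rules.iterrows():
        # Antecedents contain the one-hot column names themselves (e.g. 'Item Purchased_Blouse'),
        # so test the column name directly rather than appending a '_1' suffix
        if item in row['antecedents']:
            recommended_products.update(row['consequents'])
    return recommended_products

# Function to recommend products for all items
def recommend_products_for_all(df, rules):
    all_recommendations = {}
    # Use the DataFrame passed in (minus the ID column) rather than the global df_encoded
    unique_items = [col for col in df.columns if col != 'Customer ID']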
    
    for item in unique_items:
        recommended_products = recommend_products(item, rules)
        all_recommendations[item] = recommended_products
    
    return all_recommendations

# Example: Recommend products for all items
all_recommendations = recommend_products_for_all(df, rules)

print("\nProducts recommended for each item:")
for item, recommendations in all_recommendations.items():
    print(f"\nProducts recommended for '{item}': {recommendations}")
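The functions above return an unordered set. When several rules point to different products, it may help to rank the consequents by rule confidence. The sketch below uses a hand-built rules table shaped like mlxtend's `association_rules` output, with frozenset `antecedents`/`consequents` and a `confidence` column. The table values and the `ranked_recommendations` name are hypothetical:

```python
import pandas as pd

# Hypothetical rules table in the shape returned by mlxtend's association_rules
rules = pd.DataFrame({
    'antecedents': [frozenset({'Item Purchased_Jeans'}),
                    frozenset({'Item Purchased_Jeans', 'Category_Clothing'}),
                    frozenset({'Item Purchased_Hat'})],
    'consequents': [frozenset({'Item Purchased_Belt'}),
                    frozenset({'Item Purchased_Socks'}),
                    frozenset({'Item Purchased_Scarf'})],
    'confidence': [0.72, 0.85, 0.90],
})


def ranked_recommendations(item, rules):
    """Consequents of rules whose antecedents contain item, highest confidence first."""
    matches = rules[rules['antecedents'].apply(lambda a: item in a)]
    best = {}
    for consequents, conf in zip(matches['consequents'], matches['confidence']):
        for product in consequents:
            # Keep the strongest rule that recommends each product
            best[product] = max(conf, best.get(product, 0.0))
    return sorted(best, key=best.get, reverse=True)


print(ranked_recommendations('Item Purchased_Jeans', rules))
# ['Item Purchased_Socks', 'Item Purchased_Belt']
```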


In [None]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Sample Data
data = {
    'Customer ID': [1, 2, 3, 4],
    'Age': [55, 19, 50, 21],
    'Gender': ['Male', 'Male', 'Male', 'Male'],
    'Item Purchased': ['Blouse', 'Sweater', 'Jeans', 'Sandals'],
    'Category': ['Clothing', 'Clothing', 'Clothing', 'Footwear'],
    'Purchase Amount (USD)': [53, 64, 73, 90],
    'Location': ['Kentucky', 'Maine', 'Massachusetts', 'Rhode Island'],
    'Size': ['L', 'L', 'S', 'M'],
    'Color': ['Gray', 'Maroon', 'Maroon', 'Maroon'],
    'Season': ['Winter', 'Winter', 'Spring', 'Spring'],
    'Review Rating': [3.1, 3.1, 3.1, 3.5],
    'Subscription Status': ['Yes', 'Yes', 'Yes', 'Yes'],
    'Payment Method': ['Credit Card', 'Bank Transfer', 'Cash', 'PayPal'],
    'Shipping Type': ['Express', 'Express', 'Free Shipping', 'Next Day Air'],
    'Discount Applied': ['Yes', 'Yes', 'Yes', 'Yes'],
    'Promo Code Used': ['Yes', 'Yes', 'Yes', 'Yes'],
    'Previous Purchases': [14, 2, 23, 49],
    'Preferred Payment Method': ['Venmo', 'Cash', 'Credit Card', 'PayPal'],
    'Frequency of Purchases': ['Fortnightly', 'Fortnightly', 'Weekly', 'Weekly']
}

df = pd.DataFrame(data)

# Apriori needs one True/False column per item. Label codes, raw IDs and raw amounts are not
# 0/1 indicators, so bin the purchase amount and one-hot encode the selected columns instead
df['Purchase Amount (USD)'] = pd.cut(df['Purchase Amount (USD)'], bins=[0, 50, 100, 150], labels=['Low', 'Medium', 'High'])
df = pd.get_dummies(df[['Purchase Amount (USD)', 'Discount Applied', 'Promo Code Used']]).astype(bool)

# Apriori algorithm
frequent_itemsets = apriori(df, min_support=0.1, use_colnames=True)

# Generate association rules
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.7)

# Display the association rules
print("Association Rules:")
print(rules[['antecedents', 'consequents', 'confidence']])

# Filter rules whose antecedents contain a 'Promo Code Used_<value>' item
promo_rules = rules[rules['antecedents'].apply(lambda x: any(item.startswith('Promo Code Used') for item in x))]

# Display rules related to Promo Code
print("\nAssociation Rules for Promo Code:")
print(promo_rules[['antecedents', 'consequents', 'confidence']])
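For a rule A → C, support is the fraction of rows that contain both A and C. Confidence is support(A and C) / support(A), and lift is confidence / support(C), so a lift above 1 means C appears more often alongside A than it does overall. Computing these by hand on a few hypothetical one-hot rows is a quick way to sanity-check the `rules` table. The `rule_metrics` helper is hypothetical:

```python
import pandas as pd

# Hypothetical one-hot transactions (one row per purchase)
tx = pd.DataFrame({
    'Promo Code Used_Yes': [True, True, True, False, False],
    'Discount Applied_Yes': [True, True, False, False, True],
})


def rule_metrics(tx, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    support_both = (tx[antecedent] & tx[consequent]).mean()  # P(A and C)
    support_a = tx[antecedent].mean()                        # P(A)
    support_c = tx[consequent].mean()                        # P(C)
    confidence = support_both / support_a                    # P(C | A)
    lift = confidence / support_c                            # P(C | A) / P(C)
    return {'support': support_both, 'confidence': confidence, 'lift': lift}


print(rule_metrics(tx, 'Promo Code Used_Yes', 'Discount Applied_Yes'))
```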


In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from mlxtend.frequent_patterns import apriori, association_rules

# Load the dataset
df = pd.read_csv('shopping_trends.csv')

# Discretize 'Purchase Amount (USD)' into bins first, so the bins can appear in the rules
df['Purchase Amount (USD)'] = pd.cut(df['Purchase Amount (USD)'], bins=[0, 50, 100, 150], labels=['Low', 'Medium', 'High'])

# One-hot encode the categorical variables and the binned amount as True/False item columns
df_encoded = pd.get_dummies(df[['Discount Applied', 'Promo Code Used', 'Item Purchased', 'Purchase Amount (USD)']]).astype(bool)

# Apriori algorithm
frequent_itemsets = apriori(df_encoded, min_support=0.1, use_colnames=True)

# Generate association rules
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1)

# Display the association rules including Purchase Amount (USD)
print("Association Rules:")
print(rules[['antecedents', 'consequents', 'confidence', 'lift']])

# Filter rules whose antecedents contain a 'Purchase Amount (USD)_<bin>' item
purchase_amount_rules = rules[rules['antecedents'].apply(lambda x: any(item.startswith('Purchase Amount (USD)') for item in x))]

# Display rules related to Purchase Amount (USD)
print("\nAssociation Rules for Purchase Amount (USD):")
print(purchase_amount_rules[['antecedents', 'consequents', 'confidence', 'lift']])


plt.figure(figsize=(10, 6))
sns.scatterplot(x='confidence', y='lift', size='support', data=rules)
plt.xlabel('Confidence')
plt.ylabel('Lift')
plt.title('Association Rules - Confidence vs Lift')
plt.grid(True)
plt.show()
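`pd.cut` intervals are right-closed by default. An amount of exactly 50 therefore falls in 'Low', and any amount above 150 (or at or below 0) becomes NaN rather than 'High'. The check below runs on hypothetical amounts. It also shows `pd.qcut`, which places roughly equal numbers of rows in each bin. `pd.qcut` is one possible alternative, assuming equal-sized groups are wanted, and is not the notebook's choice:

```python
import pandas as pd

# Hypothetical purchase amounts in USD
amounts = pd.Series([20, 50, 51, 100, 120, 180])

# Fixed-edge bins: (0, 50], (50, 100], (100, 150]; 180 falls outside every bin
fixed = pd.cut(amounts, bins=[0, 50, 100, 150], labels=['Low', 'Medium', 'High'])
print(fixed.tolist())

# Quantile-based bins: roughly one third of the rows in each label
quantile = pd.qcut(amounts, q=3, labels=['Low', 'Medium', 'High'])
print(quantile.tolist())  # ['Low', 'Low', 'Medium', 'Medium', 'High', 'High']
```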
