<a href="https://colab.research.google.com/github/patrickhuang5/project-3-cis-2100/blob/main/Project_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

1. Import Required Libraries
- Imports necessary libraries for data manipulation, random data generation, and market basket analysis.

In [20]:
# Project Analysis Notebook
# Goal: Identify the best-selling items for each store and across the entire organization

# Step 1: Import Required Libraries
import pandas as pd
import numpy as np
import random
from itertools import combinations
from collections import Counter



2. Define Goals

In [21]:
# Step 2: Define Goals
# The objective of this analysis is to determine the best-selling items for each store and across the organization.
# Using market basket analysis, we will analyze customer baskets to discover the most frequently purchased sets of
# items. The results will provide valuable insights into customer purchasing patterns for better inventory management
# and strategic planning.



3. Generate Synthetic Data
- Defines a function to generate synthetic transactional data for analysis, including random transactions, store names, and product names.

In [22]:
# Step 3: Generate Synthetic Data
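def generate_synthetic_data(filepath="synthetic_project2_data.csv"):
    """
    Generate synthetic transactional data for analysis.

    Each row is one purchased item. Transaction IDs are sampled with
    replacement, so rows that share an ID form one basket. Store and product
    are sampled independently for every row, so a basket can span several
    stores and can contain the same product more than once.

    Note: `filepath` is currently unused; the data is returned, not written to disk.
    """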
    num_transactions = 2000
    transaction_ids = [f"T{i}" for i in range(1, num_transactions + 1)]
    store_names = [f"Store_{i}" for i in range(1, 11)]  # 10 stores
    product_names = [f"Product_{chr(65+i)}" for i in range(10)]  # Products A-J

    synthetic_data = {
        "transaction_id": random.choices(transaction_ids, k=num_transactions),
        "store_name": random.choices(store_names, k=num_transactions),
        "product_name": random.choices(product_names, k=num_transactions),
    }

    df = pd.DataFrame(synthetic_data)
    print("Synthetic data generated.")
    return df

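Because the transaction IDs are drawn with `random.choices` (sampling with replacement), the number of distinct baskets is typically smaller than the number of rows, and basket sizes vary from run to run. The following is a minimal sketch for inspecting that effect reproducibly. The helper name `sample_transaction_ids` and the seed value 42 are assumptions made for illustration; neither appears in the notebook.

```python
import random
from collections import Counter


def sample_transaction_ids(num_rows=2000, seed=42):
    """Hypothetical helper: sample IDs the way the notebook does, but with a fixed seed."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    ids = [f"T{i}" for i in range(1, num_rows + 1)]
    return rng.choices(ids, k=num_rows)  # sampling with replacement


rows = sample_transaction_ids()

# transaction_id -> number of rows (items) that landed in that basket
basket_sizes = Counter(rows)

# basket size -> number of baskets with that size
size_distribution = Counter(basket_sizes.values())

print(f"rows: {len(rows)}, distinct baskets: {len(basket_sizes)}")
print("basket size distribution:", dict(sorted(size_distribution.items())))
```

Using a dedicated `random.Random` instance keeps the run repeatable without changing the behavior of other code that relies on the module-level `random` functions.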


4. Perform Market Basket Analysis
- Defines a function to perform market basket analysis and identify frequent item pairs based on their support in the data.

In [23]:
# Step 4: Perform Market Basket Analysis
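def frequent_itemsets(dataframe, min_support=0.01):
    """
    Find frequent item pairs (2-itemsets) across transactions using combinations.

    Support is the number of times a pair occurs divided by the number of
    transactions; pairs below `min_support` are dropped.
    """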
    transactions = dataframe.groupby('transaction_id')['product_name'].apply(list).tolist()

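    # Generate itemsets: every pair of items within a transaction, in row order.
    # Pairs are ordered tuples, so (A, B) and (B, A) are counted separately,
    # and a product repeated within a transaction yields pairs such as (A, A).
    itemsets = []
    for transaction in transactions:
        itemsets.extend(combinations(transaction, 2))  # Pairwise combinations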

    # Count itemsets
    itemset_counts = Counter(itemsets)

    # Calculate support: pair occurrences divided by the number of transactions
    total_transactions = len(transactions)
    supports = {
        itemset: count / total_transactions
        for itemset, count in itemset_counts.items()
        if count / total_transactions >= min_support
    }

    # Sort by support, highest first
    sorted_frequent_itemsets = dict(sorted(supports.items(), key=lambda x: x[1], reverse=True))

    return sorted_frequent_itemsets

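`combinations` yields pairs in the order the items appear in each transaction, so the same two products can be counted under two different keys, and a product listed twice in a basket can pair with itself. One possible variant, shown here only as a sketch, de-duplicates and sorts each basket first so that every unordered pair is counted at most once per transaction. The function name `pair_support` and the toy baskets are made up for illustration and are not part of the notebook.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions, min_support=0.01):
    """Support of unordered item pairs; each pair counts at most once per transaction."""
    counts = Counter()
    for basket in transactions:
        unique_items = sorted(set(basket))  # drop repeats and fix a canonical order
        counts.update(combinations(unique_items, 2))

    total = len(transactions)
    return {
        pair: n / total
        for pair, n in counts.most_common()  # highest count first
        if n / total >= min_support
    }


# Toy baskets, invented for this example
baskets = [
    ["Product_A", "Product_J"],
    ["Product_J", "Product_A", "Product_A"],
    ["Product_B"],
    ["Product_A", "Product_B", "Product_J"],
]
result = pair_support(baskets)
for pair, support in result.items():
    print(" & ".join(pair), support)
```

With this variant the support of a pair is the fraction of transactions that contain both products, which keeps the value between 0 and 1.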


5. Store-wise and Organization-wide Analysis
- Defines a function to perform market basket analysis for each store and overall across the organization.

In [24]:
# Step 5: Store-wise and Organization-wide Analysis
def store_wise_analysis(dataframe):
    """
    Perform market basket analysis for each store and across the organization.
    """
    stores = dataframe['store_name'].unique()
    store_frequent_itemsets = {}

    for store in stores:
        store_data = dataframe[dataframe['store_name'] == store]
        store_frequent_itemsets[store] = frequent_itemsets(store_data)

    overall_frequent_itemsets = frequent_itemsets(dataframe)
    return store_frequent_itemsets, overall_frequent_itemsets

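Each store's support is computed against that store's own transaction count, so the default `min_support=0.01` translates into a different absolute pair count for a single store than for the whole organization. The sketch below makes that threshold explicit. The helper `min_count_for_support` and the transaction counts passed to it are hypothetical values chosen for illustration, not figures from the notebook.

```python
import math


def min_count_for_support(n_transactions, min_support=0.01):
    """Smallest pair count c such that c / n_transactions >= min_support."""
    if n_transactions <= 0:
        raise ValueError("n_transactions must be positive")

    c = math.ceil(min_support * n_transactions)
    # Correct for floating-point rounding using the same comparison
    # that the notebook applies (count / total >= min_support).
    while c > 0 and (c - 1) / n_transactions >= min_support:
        c -= 1
    while c / n_transactions < min_support:
        c += 1
    return c


# Hypothetical transaction counts: one store versus the whole organization
for label, n in [("single store", 200), ("all stores", 1300)]:
    print(label, n, "transactions -> pair must occur at least",
          min_count_for_support(n), "times")
```

A small store can therefore pass the threshold with only a couple of co-occurrences, which is worth keeping in mind when comparing store-level and organization-level supports.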


6. Convert Analysis Results to Table
- Converts the results of the market basket analysis into a DataFrame for easy viewing and further analysis.

In [25]:
# Step 6: Convert Analysis Results to Table
def frequent_itemsets_to_table(store_frequent_itemsets, overall_frequent_itemsets):
    """
    Convert frequent itemsets for stores and organization into a DataFrame.
    """
    itemset_data = []

    for store, itemsets in store_frequent_itemsets.items():
        for itemset, support in itemsets.items():
            itemset_data.append({
                'Store': store,
                'Itemset': ' & '.join(itemset),
                'Support': support
            })

    for itemset, support in overall_frequent_itemsets.items():
        itemset_data.append({
            'Store': 'All Stores',
            'Itemset': ' & '.join(itemset),
            'Support': support
        })

    itemset_df = pd.DataFrame(itemset_data)
    return itemset_df

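The table lists every pair that passes the threshold, and a store whose dictionary is empty contributes no rows at all. To answer the "best-selling" question directly, one option is to keep only the highest-support pairs per store. The following sketch works on the same nested dictionary shape that `frequent_itemsets_to_table` receives. The function `top_pairs` and the support values in `example` are assumptions invented for illustration.

```python
def top_pairs(store_itemsets, n=1):
    """Return the n highest-support pairs for each store as (pair, support) tuples."""
    return {
        store: sorted(itemsets.items(), key=lambda kv: kv[1], reverse=True)[:n]
        for store, itemsets in store_itemsets.items()
    }


# Made-up supports in the shape {store: {(item_a, item_b): support}}
example = {
    "Store_1": {("Product_B", "Product_D"): 0.02, ("Product_E", "Product_I"): 0.03},
    "Store_6": {("Product_B", "Product_J"): 0.015},
    "Store_9": {},  # no pair reached the threshold
}

best = top_pairs(example)
for store, pairs in best.items():
    if pairs:
        pair, support = pairs[0]
        print(store, " & ".join(pair), support)
    else:
        print(store, "no frequent pairs")
```

Returning an empty list for stores without frequent pairs keeps them visible in the summary instead of silently dropping them.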


7. Main Execution
- Main code execution block that generates synthetic data, performs the analysis, converts results into a table, and displays them.

In [26]:
# Step 7: Main Execution
if __name__ == "__main__":
    # Generate synthetic data
    df = generate_synthetic_data()

    # Perform analysis
    store_frequent_itemsets, overall_frequent_itemsets = store_wise_analysis(df)

    # Convert results to table
    itemset_table = frequent_itemsets_to_table(store_frequent_itemsets, overall_frequent_itemsets)

    # Display results
    print("\nBest-Selling Itemsets by Store and Across the Organization:")
    print(itemset_table)

Synthetic data generated.

Best-Selling Itemsets by Store and Across the Organization:
         Store                Itemset   Support
0      Store_1  Product_B & Product_D  0.010811
1      Store_1  Product_E & Product_I  0.010811
2      Store_6  Product_B & Product_J  0.010363
3      Store_6  Product_E & Product_I  0.010363
4   All Stores  Product_E & Product_A  0.015576
5   All Stores  Product_H & Product_F  0.014798
6   All Stores  Product_D & Product_A  0.013240
7   All Stores  Product_D & Product_J  0.013240
8   All Stores  Product_A & Product_J  0.011682
9   All Stores  Product_B & Product_H  0.011682
10  All Stores  Product_E & Product_D  0.011682
11  All Stores  Product_J & Product_A  0.011682
12  All Stores  Product_J & Product_E  0.011682
13  All Stores  Product_G & Product_J  0.010903
14  All Stores  Product_E & Product_I  0.010903
15  All Stores  Product_B & Product_J  0.010903
16  All Stores  Product_J & Product_H  0.010125
17  All Stores  Product_I & Product_D  0.010125
