<a href="https://colab.research.google.com/github/patrickhuang5/project-3-cis-2100/blob/main/Project_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

1. Libraries and Dependencies
- Imports the libraries needed for data manipulation, generating combinations, counting occurrences, and handling file uploads in Google Colab.

In [1]:
# Google Colab Market Basket Analysis Notebook
# Goal: Identify the best-selling items for each store and across the organization

# Step 1: Import Required Libraries
import pandas as pd
from itertools import combinations
from collections import Counter
from google.colab import files

2. File Upload and Data Loading
- Handles the file upload and loads the transactional data into a pandas DataFrame for analysis.

In [2]:
# Step 2: Define Goals
"""
The objective of this analysis is to determine the best-selling items for each store and across the organization.
Using market basket analysis, we will analyze customer baskets to discover the most frequently purchased sets of
items. The results will provide valuable insights into customer purchasing patterns for better inventory management
and strategic planning.
"""

# Step 3: Load Data

def load_data():
    """Upload and load the transactional data from a CSV file."""
    print("Upload your CSV file containing transactional data.")
    uploaded = files.upload()  # Opens the Colab upload widget; returns {file name: content}
    if not uploaded:
        raise ValueError("No file was uploaded.")
    file_name = next(iter(uploaded))  # Use the first uploaded file
    data = pd.read_csv(file_name)  # Colab saves the upload to the working directory
    print(f"Data loaded successfully with {len(data)} rows.")
    return data
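
The later steps read the `transaction_id`, `product_name`, and `store_name` columns, so it can help to check for them right after loading. The following is a minimal sketch and not part of the original notebook: `load_from_source` and `REQUIRED_COLUMNS` are hypothetical names, the sample rows are made up, and the helper reads from a path or file-like object rather than the Colab upload widget.

```python
import io
import pandas as pd

# Columns that the analysis functions in this notebook read from the data.
REQUIRED_COLUMNS = {"transaction_id", "product_name", "store_name"}

def load_from_source(source):
    """Hypothetical helper: load a CSV and check that the expected columns exist."""
    data = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(data.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    return data

# Usage with an in-memory CSV (made-up rows, for illustration only)
sample_csv = io.StringIO(
    "transaction_id,store_name,product_name\n"
    "1,Store_1,Product_A\n"
    "1,Store_1,Product_B\n"
    "2,Store_2,Product_A\n"
)
sample = load_from_source(sample_csv)
print(len(sample))
```

Failing early with a clear message is one design choice; the alternative is to let a `KeyError` surface later inside the analysis functions.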

3. Product Frequency Analysis
- Calculates how often each product appears in the dataset and organizes it into a frequency table.

In [3]:
# Step 4: Product Frequency Analysis

def product_frequency(dataframe):
    """Analyze the frequency of individual products."""
    product_counts = dataframe['product_name'].value_counts().reset_index()
    product_counts.columns = ['Product', 'Frequency']
    return product_counts
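
To make the shape of the frequency table concrete, here is a small sketch of the same `value_counts()` pattern on a toy DataFrame. The rows are made up for illustration and are not taken from the notebook's dataset.

```python
import pandas as pd

# Made-up rows for illustration; only the product_name column matters here.
toy = pd.DataFrame({
    "product_name": [
        "Product_A", "Product_B", "Product_A",
        "Product_C", "Product_A", "Product_B",
    ]
})

# value_counts() sorts by count in descending order by default.
counts = toy["product_name"].value_counts().reset_index()
# Rename both columns explicitly, since the default names differ between pandas versions.
counts.columns = ["Product", "Frequency"]
print(counts)
```

Each row of the result pairs a product with the number of rows in which it appears, most frequent first.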

4. Frequent Itemsets Generation
- Identifies product pairs that are frequently purchased together and calculates each pair's support, i.e. the share of transactions that contain the pair.

In [4]:
# Step 5: Market Basket Analysis (Frequent Itemsets)

def frequent_itemsets(dataframe, min_support=0.01):
    """Find frequent product pairs across transactions using combinations."""
    transactions = dataframe.groupby('transaction_id')['product_name'].apply(list).tolist()

    # Generate pairs. Each basket is deduplicated and sorted first so that
    # (A, B) and (B, A) count as the same pair and a repeated product is
    # counted once per transaction.
    itemsets = []
    for transaction in transactions:
        itemsets.extend(combinations(sorted(set(transaction)), 2))

    # Count itemsets
    itemset_counts = Counter(itemsets)

    # Calculate support: the share of transactions that contain the pair
    total_transactions = len(transactions)
    supports = {
        itemset: count / total_transactions
        for itemset, count in itemset_counts.items()
        if count / total_transactions >= min_support
    }

    # Sort by support, highest first
    sorted_itemsets = sorted(supports.items(), key=lambda x: x[1], reverse=True)

    # Convert to DataFrame
    itemset_df = pd.DataFrame({
        'Itemset': [' & '.join(itemset) for itemset, _ in sorted_itemsets],
        'Support': [support for _, support in sorted_itemsets]
    })

    return itemset_df
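
The support calculation can be checked by hand on a few toy baskets. The sketch below uses made-up data and assumes pairs are unordered, so each basket is sorted and deduplicated before pairing. It also shows that a single-item basket contributes no pairs; if every transaction holds only one product, the resulting table is empty.

```python
from collections import Counter
from itertools import combinations

# Made-up baskets for illustration, one list of products per transaction.
baskets = [
    ["Product_A", "Product_B"],
    ["Product_B", "Product_A", "Product_C"],
    ["Product_A"],                # single-item basket: yields no pairs
    ["Product_C", "Product_B"],
]

pair_counts = Counter()
for basket in baskets:
    # Sort and deduplicate so (A, B) and (B, A) are counted as one pair.
    pair_counts.update(combinations(sorted(set(basket)), 2))

# Support = number of baskets containing the pair / total number of baskets.
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}
for pair, value in sorted(support.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, value)
```

Here `Product_A & Product_B` appears in two of the four baskets, giving a support of 0.5, while `Product_A & Product_C` appears in one, giving 0.25.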

5. Store-Wise Analysis and Organization-Wide Analysis
- Performs the product frequency and frequent itemset analysis for each store in the supplied list.
- Repeats the same analysis on all transactions across the organization.

In [5]:

# Step 6: Store-wise and Organization-wide Analysis

def store_analysis(dataframe, store_list):
    """Perform product frequency and itemset analysis for each store and overall."""
    results = {}

    # Analyze each store
    for store in store_list:
        store_data = dataframe[dataframe['store_name'] == store]
        print(f"Analyzing data for {store}...")
        results[store] = {
            'Product Frequency': product_frequency(store_data),
            'Frequent Itemsets': frequent_itemsets(store_data)
        }

    # Overall analysis
    print("Analyzing data for all stores...")
    results['All Stores'] = {
        'Product Frequency': product_frequency(dataframe),
        'Frequent Itemsets': frequent_itemsets(dataframe)
    }

    return results
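
As a point of comparison, per-store product counts can also be computed in one pass with `groupby`. This is an alternative technique and not the notebook's approach; the data is made up and the variable names are illustrative.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "store_name": ["Store_1", "Store_1", "Store_1", "Store_2", "Store_2"],
    "product_name": ["Product_A", "Product_A", "Product_B", "Product_C", "Product_C"],
})

# Count rows per (store, product) combination in a single pass.
per_store = (
    toy.groupby(["store_name", "product_name"])
       .size()
       .reset_index(name="Frequency")
       .sort_values(["store_name", "Frequency"], ascending=[True, False])
)

# Keep the most frequent product for each store.
top_per_store = per_store.groupby("store_name").head(1).set_index("store_name")
print(top_per_store)
```

The loop in `store_analysis` keeps a full table per store, which is convenient for display; the `groupby` form is more compact when only a per-store summary is needed.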

6. Display Results
- Prints the top 10 rows of the product frequency and frequent itemset tables for each analyzed store and for the entire organization.

In [6]:
# Step 7: Display Results as Tables

def display_results(results):
    """Display product frequency and frequent itemset tables."""
    for store, analyses in results.items():
        print(f"\n=== {store} ===")
        print("\nProduct Frequency:")
        print(analyses['Product Frequency'].head(10))  # Top 10 products
        print("\nFrequent Itemsets:")
        print(analyses['Frequent Itemsets'].head(10))  # Top 10 itemsets

7. Main Execution
- Executes the analysis for the first 5 unique stores and the entire organization, printing the results for each analysis step.
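
Note that `unique()` returns values in order of first appearance, so "the first 5 stores" depends on the row order of the file. The sketch below illustrates this on made-up data and contrasts it with one possible alternative, selecting the stores with the most rows. That alternative is an assumption for illustration, not what the notebook does.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "store_name": ["Store_1", "Store_4", "Store_2", "Store_2", "Store_4", "Store_2"]
})

# unique() preserves the order in which values first appear.
first_seen = list(toy["store_name"].unique()[:2])
print(first_seen)

# Alternative: choose the stores with the most rows instead.
busiest = toy["store_name"].value_counts().head(2).index.tolist()
print(busiest)
```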

In [7]:
# Step 8: Main Execution
if __name__ == "__main__":
    # Step 8.1: Load the data
    df = load_data()

    # Step 8.2: Define store list
    stores = df['store_name'].unique()[:5]  # Limit to the first 5 stores for analysis

    # Step 8.3: Perform analysis
    analysis_results = store_analysis(df, stores)

    # Step 8.4: Display results
    display_results(analysis_results)

Upload your CSV file containing transactional data.


Saving synthetic_project2_data.csv to synthetic_project2_data.csv
Data loaded successfully with 2000 rows.
Analyzing data for Store_2...
Analyzing data for Store_4...
Analyzing data for Store_1...
Analyzing data for Store_5...
Analyzing data for Store_9...
Analyzing data for all stores...

=== Store_2 ===

Product Frequency:
     Product  Frequency
0  Product_B         27
1  Product_H         26
2  Product_J         25
3  Product_A         24
4  Product_I         23
5  Product_C         18
6  Product_D         18
7  Product_E         18
8  Product_F         14
9  Product_G         13

Frequent Itemsets:
Empty DataFrame
Columns: [Itemset, Support]
Index: []

=== Store_4 ===

Product Frequency:
     Product  Frequency
0  Product_D         28
1  Product_B         27
2  Product_G         26
3  Product_F         26
4  Product_I         25
5  Product_A         25
6  Product_J         24
7  Product_C         20
8  Product_E         17
9  Product_H         14

Frequent Itemsets:
Empty DataFram