## Analysis of Results

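The Apriori algorithm identified frequent itemsets with a minimum support of 0.01, and association rules were kept when their lift was at least 1.0. Key findings:
- **Baby Food → Bread**: Support = 0.4074, Confidence = 0.6822, Lift = 1.1337. The pair is very frequent, but the association itself is modest: the two products co-occur about 13% more often than independence would predict.
- Similar patterns were observed for Baby Food with Butter (Support = 0.4063, Lift = 1.1307) and Cereal (Support = 0.4112, Lift = 1.1332), suggesting cross-selling opportunities.
- Because rules were filtered on lift, every reported rule has lift > 1 by construction. The values near 1.13 indicate that these product pairs are purchased together somewhat more often than expected by chance.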
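
To make the lift figure concrete, the sketch below recomputes it for Baby Food and Bread from the support and confidence values printed in the rule table. It is only a worked check of the arithmetic, not part of the original pipeline, and the variable names are illustrative.

```python
# Values copied from the first two rows of the rule table
support_ab = 0.4074       # support(Baby Food and Bread)
conf_a_to_b = 0.682241    # confidence(Baby Food -> Bread)
conf_b_to_a = 0.676969    # confidence(Bread -> Baby Food)

# confidence(A -> B) = support(A, B) / support(A), so the marginals follow
support_a = support_ab / conf_a_to_b   # share of baskets containing Baby Food
support_b = support_ab / conf_b_to_a   # share of baskets containing Bread

# lift = observed co-occurrence / co-occurrence expected under independence
lift = support_ab / (support_a * support_b)

print(f"support(Baby Food) = {support_a:.4f}")
print(f"support(Bread)     = {support_b:.4f}")
print(f"lift               = {lift:.4f}")
```

Both products appear in roughly 60% of baskets, so most of the high confidence is explained by their base rates. The lift isolates the part that is not.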

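The Logistic Regression model, using the features `quantity`, `price`, and a binary `is_diaper_babyfood` indicator, achieved a **Precision@1 of 0.778** on the held-out 20% validation split. This value is computed with `precision_score` on the model's binary predictions, so it is the precision of the positive class rather than a per-transaction ranking metric. It exceeds the PRD target of > 0.6, which makes it a reasonable baseline for predicting bundle-eligible products.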
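
If the PRD intends Precision@1 as a ranking metric, it could be computed per transaction along the following lines. This is a sketch under assumptions: the helper `precision_at_1`, the `score` column, and the toy data are hypothetical, and the PRD's exact definition is not given in this notebook.

```python
import pandas as pd


def precision_at_1(df, group_col="transaction_id",
                   score_col="score", label_col="is_bundle_target"):
    """Share of transactions whose highest-scored row is a true positive."""
    # Index of the top-scored row within each transaction
    top_idx = df.groupby(group_col)[score_col].idxmax()
    return df.loc[top_idx, label_col].mean()


# Toy data: three transactions with two candidate products each (made-up values)
toy = pd.DataFrame({
    "transaction_id":   [1, 1, 2, 2, 3, 3],
    "score":            [0.9, 0.2, 0.4, 0.7, 0.6, 0.1],
    "is_bundle_target": [1, 0, 1, 0, 1, 0],
})

result = precision_at_1(toy)
print(f"Precision@1 on toy data: {result:.3f}")
```

In practice the scores would come from something like `model.predict_proba(X)[:, 1]` rather than from hard 0/1 predictions, so that rows within a transaction can be ranked.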

## Next Steps
- Explore additional features (e.g., `product_category`, `store_id`) to improve model performance.
- Test advanced models like XGBoost or neural networks.
- Deploy the challenge platform to allow submissions and leaderboard updates.
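
As a starting point for the first item, categorical features can be one-hot encoded inside a scikit-learn pipeline. The sketch below assumes that `product_category` and `store_id` columns exist in the data, which this notebook does not confirm, and the toy values are invented purely for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame; column names beyond quantity/price are assumptions
toy = pd.DataFrame({
    "quantity": [1, 2, 1, 3, 2, 1, 4, 2],
    "price": [3.5, 10.0, 2.0, 8.0, 9.5, 1.5, 12.0, 4.0],
    "product_category": ["baby", "baby", "dairy", "baby",
                         "bakery", "dairy", "baby", "bakery"],
    "store_id": [1, 2, 1, 2, 1, 2, 1, 2],
    "is_bundle_target": [1, 1, 0, 1, 0, 0, 1, 0],
})

numeric_cols = ["quantity", "price"]
categorical_cols = ["product_category", "store_id"]

# Scale numeric columns, one-hot encode categorical ones
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

features = toy[numeric_cols + categorical_cols]
pipeline.fit(features, toy["is_bundle_target"])

# Probability of the positive class for each row
probabilities = pipeline.predict_proba(features)[:, 1]
print(probabilities.round(3))
```

Keeping the preprocessing inside the pipeline means the same encoding is applied to the submission data, and `handle_unknown="ignore"` avoids errors on categories that appear only at prediction time.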
  

In [5]:
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score

# Load data
train_df = pd.read_csv('../data/train.csv')

# 1. Apriori for Association Rules
# Create transaction basket
basket = train_df.groupby(['transaction_id', 'product_name'])['quantity'].sum().unstack().fillna(0)
basket = basket > 0  # True where the product appears in the transaction

# Run Apriori
frequent_itemsets = apriori(basket, min_support=0.01, use_colnames=True)
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1.0)
print('Top 5 Association Rules:')
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head())

# 2. Logistic Regression for Prediction
# Feature engineering: flag rows whose product_id is 101 or 102
# (presumably the diaper and baby food products, going by the feature name)
train_df['is_diaper_babyfood'] = train_df['product_id'].isin([101, 102]).astype(int)

# Features and target
X = train_df[['quantity', 'price', 'is_diaper_babyfood']]
y = train_df['is_bundle_target']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LogisticRegression()
model.fit(X_train, y_train)

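# Evaluate on the held-out split
# precision_score measures positive-class precision on hard 0/1 predictions;
# it is reported here under the "Precision@1" label.
y_pred = model.predict(X_test)
precision = precision_score(y_test, y_pred, average='binary')
print(f'Precision@1: {precision:.3f}')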

# Predict on test set
test_df = pd.read_csv('../data/test.csv')
test_df['is_diaper_babyfood'] = test_df['product_id'].isin([101, 102]).astype(int)
# Use a separate name so the validation split X_test is not overwritten
X_submit = test_df[['quantity', 'price', 'is_diaper_babyfood']]
test_df['is_bundle_target'] = model.predict(X_submit)

# Save predictions
submission = test_df[['transaction_id', 'product_id', 'is_bundle_target']]
submission.to_csv('../data/baseline_submission.csv', index=False)
print('Generated baseline_submission.csv')

Top 5 Association Rules:
   antecedents  consequents  support  confidence      lift
0  (Baby Food)      (Bread)   0.4074    0.682241  1.133667
1      (Bread)  (Baby Food)   0.4074    0.676969  1.133667
2  (Baby Food)     (Butter)   0.4063    0.680399  1.130700
3     (Butter)  (Baby Food)   0.4063    0.675197  1.130700
4     (Cereal)  (Baby Food)   0.4112    0.676705  1.133225



Precision@1: 0.778
Generated baseline_submission.csv
