<a href="https://colab.research.google.com/github/sisi195/Marketing-Optimization/blob/main/marketmind.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#MarketMind: Predicting Shopping Trends
## Predicting Ad Performance with Machine Learning
**Sierra Gordon • 5/15/25**
## Overview
In this notebook, I’ve explored consumer data to understand the factors influencing ad clicks and conversions. I began by cleaning the data, handling missing values, encoding categorical variables, and performing exploratory data analysis (including interactive visualizations such as age-binned box plots, bar charts, and a correlation heatmap).
## Insights  
1. **Key Drivers:** Demographic factors, user behavior, and ad attributes heavily influence both the number of ad clicks and, indirectly, conversion behavior.  
2. **Predictive Power:** Consumer features are strong predictors of ad clicks, as demonstrated by the high performance of the regression model.  
3. **Challenges in Conversion Prediction:** While overall performance is promising, classifying conversions accurately still poses challenges, indicating that additional conversion-specific features or advanced modeling techniques may be required.

## Importing necessary libraries


In [None]:
!pip install imbalanced-learn xgboost catboost plotly

# Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid")
import plotly.express as px
import shap
import xgboost as xgb
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split, GridSearchCV, KFold, cross_val_score
from sklearn.preprocessing import MinMaxScaler, LabelEncoder
import plotly.express as px
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import (mean_squared_error, r2_score, accuracy_score, f1_score,
                             classification_report, confusion_matrix, roc_auc_score, roc_curve)

Collecting catboost
  Downloading catboost-1.2.8-cp311-cp311-manylinux2014_x86_64.whl.metadata (1.2 kB)
Downloading catboost-1.2.8-cp311-cp311-manylinux2014_x86_64.whl (99.2 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m99.2/99.2 MB[0m [31m6.0 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: catboost
Successfully installed catboost-1.2.8


##Load and Preview Data

In [None]:
# Load dataset
consumer_df = pd.read_csv('/content/consumer_behavior1.csv',
                          encoding='utf-8-sig',
                          delimiter=',',
                          on_bad_lines='skip')

# Preview the dataset
print("Consumer Behavior Dataset:")
print(consumer_df.head())
print("Shape:", consumer_df.shape)
print("Missing Values:", consumer_df.isnull().sum().sum())

Consumer Behavior Dataset:
   Age  Gender      Income  Location Ad_Type Ad_Topic   Ad_Placement  Clicks  \
0   26  Female  $60,611.72  Suburban  Native   Health   Social Media      17   
1   27    Male  $73,527.82  Suburban  Banner  Fashion        Website      15   
2   44    Male  $48,057.15     Rural  Banner     Food        Website      15   
3   40  Female  $72,046.39  Suburban  Banner  Finance  Search Engine      14   
4   28    Male  $35,230.06  Suburban  Banner   Health   Social Media      14   

  Click_Date  Conversion_Rate     CTR  
0  8/14/2023           0.0179  0.0411  
1  5/21/2023           0.3073  0.0057  
2  10/1/2023           0.1666  0.0671  
3  4/26/2023           0.2083  0.0331  
4  2/18/2024           0.4993  0.0504  
Shape: (496, 11)
Missing Values: 0


## **Data Cleaning in Power BI**
- **Removed Erroneous Values**  
- Eliminated negative values in Age and Income to maintain logical accuracy.  
- **Handled Missing Data**  
- **Standardized Formats**  
- Unified data formats to ensure column consistency


## Data Cleaning, Preprocessing and validating in Python

In this section I standardize the column names, handle missing values, convert financial values
- "income" into a numerical format
- remove negative values in "age" and "income"
- convert the "click_date" column into datetime objects, and extract the day and month
- Scaled the "income" column for later use in modeling.
- I also encodesd categorical features for use by downstream models.

I started by exploring the raw data to understand its distributions and relationships. I grouped ages into bins, then used box plots, bar charts, and a correlation heatmap to uncover relationships between demographics and conversion rate.

In [None]:
# Standardize column names
consumer_df.columns = consumer_df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
print("Standardized Columns:", consumer_df.columns.tolist())


# Handle missing values (if any)
numeric_cols = consumer_df.select_dtypes(include=[np.number]).columns
consumer_df[numeric_cols] = consumer_df[numeric_cols].fillna(consumer_df[numeric_cols].mean())
cat_cols = consumer_df.select_dtypes(include=['object', 'category']).columns
consumer_df[cat_cols] = consumer_df[cat_cols].apply(lambda col: col.fillna(col.mode()[0]))
print("Total Missing Values after handling:", consumer_df.isnull().sum().sum())


# Fill numeric NaNs with the mean; fill categorical NaNs with the mode or a placeholder
numeric_cols = consumer_df.select_dtypes(include=[np.number]).columns
consumer_df[numeric_cols] = consumer_df[numeric_cols].fillna(consumer_df[numeric_cols].mean())
cat_cols = consumer_df.select_dtypes(include=['object', 'category']).columns
for col in cat_cols:
    if not consumer_df[col].mode().empty:
        consumer_df[col] = consumer_df[col].fillna(consumer_df[col].mode()[0])
    else:
        consumer_df[col] = consumer_df[col].fillna('missing')
print("Total Missing Values after handling:", consumer_df.isnull().sum().sum())

# Remove duplicates and rows with missing values
consumer_df.drop_duplicates(inplace=True)
consumer_df.dropna(inplace=True)

# Convert income remove currency symbols/commas and convert to float
try:
    consumer_df['income'] = consumer_df['income'].astype(str).str.replace("[\$,]", "", regex=True).astype(float)
except ValueError as e:
    print(f"Error converting 'income' to float: {e}")

# Remove duplicates and rows with missing values
consumer_df.drop_duplicates(inplace=True)
consumer_df.dropna(inplace=True)

# Remove any negative ages or income
if 'income' in consumer_df.columns:
    consumer_df = consumer_df[consumer_df['income'] >= 0]
if 'age' in consumer_df.columns:
    consumer_df = consumer_df[consumer_df['age'] >= 0]

# Convert 'click_date' to datetime and extract day and month features
if 'click_date' in consumer_df.columns:
    try:
        consumer_df['click_date'] = pd.to_datetime(consumer_df['click_date'], errors='coerce')
        initial_rows = consumer_df.shape[0]
        consumer_df.dropna(subset=['click_date'], inplace=True)
        if consumer_df.shape[0] < initial_rows:
            print(f"Dropped {initial_rows - consumer_df.shape[0]} rows due to invalid 'click_date' format.")
        consumer_df['click_day'] = consumer_df['click_date'].dt.day
        consumer_df['click_month'] = consumer_df['click_date'].dt.month
        print("'click_day' and 'click_month' features created.")
    except Exception as e:
        print(f"Error converting 'click_date': {e}")
else:
    print("Warning: 'click_date' column not found. Skipping date feature creation.")

# Scale 'income' for subsequent interactions
if 'income' in consumer_df.columns:
    scaler_income = MinMaxScaler()
    consumer_df['income_scaled'] = scaler_income.fit_transform(consumer_df[['income']])
    print("'income_scaled' column created.")
else:
    print("Warning: 'income' column not found. Cannot scale income.")

# Encode categorical features with LabelEncoder for later use in modeling
label_encoder = LabelEncoder()
categorical_cols_to_encode = ['ad_type','ad_topic','ad_placement','gender','location']
for col in categorical_cols_to_encode:
    if col in consumer_df.columns and consumer_df[col].dtype == 'object':
        try:
            consumer_df[col + '_encoded'] = label_encoder.fit_transform(consumer_df[col])
            print(f"'{col}' encoded to '{col}_encoded'.")
        except Exception as e:
            print(f"Error encoding '{col}': {e}. Skipping.")
    elif col in consumer_df.columns:
        print(f"Warning: Column '{col}' exists but is not of object dtype. Skipping encoding.")
    else:
        print(f"Warning: Column '{col}' not found for encoding. Skipping.")

# Create additional interaction features
if 'ctr' in consumer_df.columns and 'conversion_rate' in consumer_df.columns:
    consumer_df['ctr_x_conversion'] = consumer_df['ctr'] * consumer_df['conversion_rate']
if 'income_scaled' in consumer_df.columns and 'clicks' in consumer_df.columns:
    consumer_df['income_x_clicks'] = consumer_df['income_scaled'] * consumer_df['clicks']

# Verify cleaning results
print("\nCleaned Consumer Behavior Dataset:")
print(consumer_df.head())
print("Shape after cleaning:", consumer_df.shape)
print("Missing Values after cleaning:", consumer_df.isnull().sum().sum())


# Feature Engineering
# Create interaction features
if 'ctr' in consumer_df.columns and 'clicks' in consumer_df.columns:
    consumer_df['ctr_x_clicks'] = consumer_df['ctr'] * consumer_df['clicks']
    print("Feature 'ctr_x_clicks' created.")
else:
    print("Warning: 'ctr' or 'clicks' missing. Skipping 'ctr_x_clicks' creation.")

if 'ctr' in consumer_df.columns and 'conversion_rate' in consumer_df.columns:
    consumer_df['ctr_x_conversion'] = consumer_df['ctr'] * consumer_df['conversion_rate']
    print("Feature 'ctr_x_conversion' created.")
else:
    print("Warning: 'ctr' or 'conversion_rate' missing. Skipping 'ctr_x_conversion' creation.")

if 'income_scaled' in consumer_df.columns and 'clicks' in consumer_df.columns:
    consumer_df['income_x_clicks'] = consumer_df['income_scaled'] * consumer_df['clicks']
    print("Feature 'income_x_clicks' created.")
else:
    print("Warning: 'income_scaled' or 'clicks' missing. Skipping 'income_x_clicks' creation.")

# Verify cleaning and feature engineering
print("\nCleaned Consumer Behavior Dataset:")
print(consumer_df.head())
print("Shape after cleaning:", consumer_df.shape)
print("Missing Values after cleaning:", consumer_df.isnull().sum().sum())

Standardized Columns: ['age', 'gender', 'income', 'location', 'ad_type', 'ad_topic', 'ad_placement', 'clicks', 'click_date', 'conversion_rate', 'ctr']
Total Missing Values after handling: 0
Total Missing Values after handling: 0
'click_day' and 'click_month' features created.
'income_scaled' column created.
'ad_type' encoded to 'ad_type_encoded'.
'ad_topic' encoded to 'ad_topic_encoded'.
'ad_placement' encoded to 'ad_placement_encoded'.
'gender' encoded to 'gender_encoded'.
'location' encoded to 'location_encoded'.

Cleaned Consumer Behavior Dataset:
   age  gender    income  location ad_type ad_topic   ad_placement  clicks  \
0   26  Female  60611.72  Suburban  Native   Health   Social Media      17   
1   27    Male  73527.82  Suburban  Banner  Fashion        Website      15   
2   44    Male  48057.15     Rural  Banner     Food        Website      15   
3   40  Female  72046.39  Suburban  Banner  Finance  Search Engine      14   
4   28    Male  35230.06  Suburban  Banner   Health  

# Exploratory Data Analysis (EDA) & Visualizations

In this section, I dive into the consumer behavior dataset using a series of interactive Plotly charts. These visuals help me answer the key questions for my project, such as:

- **How does income affect click behavior?**
- **What are the relationships among the numerical features in the dataset?**
- **How do click rates vary across different income groups?**
- **Are there differences in click behavior between Female and Male consumers?**
- **How do conversion rates change over time for different genders?**

Below is a summary of the interactive charts I created:

### 1. Interactive Scatter Plot: Income vs. Clicks (Colored by Age)
This scatter plot displays the relationship between income and clicks. The size of each point reflects the income level, while the color (based on age) adds a demographic perspective. Hovering over a point reveals additional details such as location and ad type, helping me understand which age groups are more engaged at various income levels.

### 2. Interactive Correlation Heatmap (Numeric Features)
I computed the correlation matrix for all numeric features in the dataset and displayed it as an interactive heatmap. This chart quickly highlights how features like income, clicks, conversion rate, and age are interrelated, providing insight into potential drivers behind consumer behavior.

### 3. Interactive Bar Chart: Average Clicks by Income Range
To investigate how click behavior changes with income, I binned the income values into 10 groups and calculated the average number of clicks for each income range. This interactive bar chart clearly shows trends and outliers, making it easier to spot which income groups generate more interactions.

### 4. Interactive Box Plot: Clicks by Gender
This box plot visualizes the distribution of clicks for each gender. I used a custom color palette—pink for Female and blue for Male—to clearly delineate the differences in engagement between the genders.

### 5. Conversion Map: Average Conversion Rate Over Time by Gender
Finally, I created a conversion map to observe how conversion rates evolve over a year. After extracting the year-month from the click dates, I grouped the data by both month and gender, and then computed the average conversion rate. The resulting interactive line chart (with Females in pink and Males in blue) lets me compare trends over time, revealing seasonal effects and gender-specific performance differences.

These interactive visuals provide a comprehensive look into the factors driving consumer clicks and conversions in our dataset, supporting further model development and strategic decision-making.

In [None]:
# Interactive Scatter Plot: Income vs. Clicks (Colored by Age)
fig_scatter = px.scatter(
    consumer_df,
    x="income",
    y="clicks",
    color="age",
    size="income",  # Larger points for higher income
    hover_data=["location", "ad_type"],
    title="Income vs. Clicks (Colored by Age)",
    labels={"income": "Income", "clicks": "Clicks", "age": "Age"},
    template="plotly_white"
)
fig_scatter.show()



# Interactive Correlation Heatmap using Numeric Features
numeric_features = consumer_df.select_dtypes(include=[np.number])
corr_matrix = numeric_features.corr()
fig_corr = px.imshow(
    corr_matrix,
    text_auto=True,
    aspect="auto",
    title="Interactive Correlation Heatmap of Numeric Features",
    labels={"x": "Features", "y": "Features", "color": "Correlation"}
)
fig_corr.show()



# Interactive Bar Chart: Average Clicks by Custom Income Range
# Define custom bins and labels.
bins = [10000, 25000, 40000, 55000, 70000, 90000]
labels = ["10K-25K", "25K-40K", "40K-55K", "55K-70K", "70K-90K"]

# Create a new column with the custom income ranges.
consumer_df["income_range_custom"] = pd.cut(consumer_df["income"], bins=bins, labels=labels, include_lowest=True)

# Group by the custom income range and compute the average clicks.
avg_clicks_by_income = consumer_df.groupby("income_range_custom", observed=False)["clicks"].mean().reset_index()
avg_clicks_by_income.columns = ['Income Range', 'Average Clicks']

# Define a custom discrete color mapping for each income bin.
color_map_custom = {
    "10K-25K": "#FFFF00",
    "25K-40K": "#FFD700",
    "40K-55K": "#DA70D6",
    "55K-70K": "#BA55D3",
    "70K-90K": "#800080"
}

fig_bar = px.bar(
    avg_clicks_by_income,
    x="Income Range",
    y="Average Clicks",
    title="Average Clicks by Income Range (Custom Bins)",
    labels={"Income Range": "Income Range", "Average Clicks": "Avg. Clicks"},
    template="plotly_white",
    color="Income Range",
    color_discrete_map=color_map_custom
)
fig_bar.show()

# Interactive Box Plot: Clicks by Gender
gender_palette = {"Female": "pink", "Male": "blue"}
fig_box = px.box(
    consumer_df,
    x="gender",
    y="clicks",
    color="gender",
    title="Clicks by Gender",
    color_discrete_map=gender_palette,
    template="plotly_white"
)
fig_box.show()

# Conversion Map: Average Conversion Rate Over Time by Gender
# Ensure the click_date column is datetime
consumer_df['click_date'] = pd.to_datetime(consumer_df['click_date'], errors='coerce')

# Create a new column to represent Year-Month
consumer_df['year_month'] = consumer_df['click_date'].dt.to_period('M').astype(str)

# Group data by year_month and gender to compute the average conversion_rate for each group.
conversion_by_month = consumer_df.groupby(['year_month', 'gender'], observed=False)['conversion_rate'].mean().reset_index()

custom_colors = {"Female": "pink", "Male": "blue"}

fig_line = px.line(
    conversion_by_month,
    x='year_month',
    y='conversion_rate',
    color='gender',
    markers=True,
    title='Average Conversion Rate Over Time by Gender',
    labels={'year_month': 'Year-Month', 'conversion_rate': 'Average Conversion Rate', 'gender': 'Gender'},
    color_discrete_map=custom_colors,
    template='plotly_white'
)
fig_line.show()


# Machine Learning Pipeline

In this section, I build two predictive models using the consumer behavior dataset:

1. **Regression Pipeline:**  
   I treat the number of clicks as a continuous target. I drop any columns that might leak information (like click_date) and one-hot encode categorical features. After scaling and feature selection, I train a Random Forest regressor to predict the actual click counts. I then evaluate the model using metrics such as Mean Squared Error (MSE) and R² score, and I further refine it through GridSearchCV hyperparameter tuning.

2. **Classification Pipeline:**  
   To better capture different engagement levels, I also create a binary target by labeling consumers with clicks above the median as 1 (high engagement) and those at or below the median as 0 (low engagement). I then split, scale, and select features before training a Random Forest classifier. The classifier’s performance is evaluated using accuracy, F1 score, a classification report, and a confusion matrix.

This end-to-end ML pipeline helps me understand the predictive power of consumer attributes while also providing actionable insights for further model improvements.


# Machine Learning Pipeline

In this section, I build and evaluate two predictive models using our cleaned consumer behavior dataset:

1. **Regression Pipeline:**  
   I treat the number of clicks as a continuous target. The features (after dropping columns that might leak information like "click_date") are one-hot encoded and scaled. I then perform feature selection (SelectKBest with f_regression), train a Random Forest regressor to predict click counts, and tune the model using GridSearchCV. The performance is measured via Mean Squared Error (MSE) and R² score.

2. **Classification Pipeline:**  
   To distinguish between high and low engagement, I create a binary target by marking consumers with clicks above the median as high (1) and the rest as low (0). After splitting, scaling, and feature selection (using SelectKBest with f_classif), I train a Random Forest classifier and evaluate its performance using accuracy, F1 score, a classification report, and a confusion matrix.

This end-to-end ML pipeline provides deep insights into the predictive power of consumer attributes, while also offering a foundation for further model refinements.



In [None]:
# ================================
# Prepare the Dataset for Modeling
# ================================
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV, KFold
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.metrics import (mean_squared_error, r2_score, accuracy_score,
                             f1_score, classification_report, confusion_matrix)
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, f_regression, f_classif

# Define the target for regression (continuous clicks prediction).
# Drop columns that might leak information (e.g., click_date) from predictors.
X = consumer_df.drop(columns=["clicks", "click_date"])
y_reg = consumer_df["clicks"]

# One-hot encode categorical features.
X_encoded = pd.get_dummies(X, drop_first=True)

# Scale features.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_encoded)

# ====================================
# Regression Pipeline: Predicting Clicks
# ====================================
# Split the data.
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_reg, test_size=0.2, random_state=42)

# Feature selection: select up to 50 features.
selector_reg = SelectKBest(score_func=f_regression, k=min(50, X_train.shape[1]))
X_train_sel = selector_reg.fit_transform(X_train, y_train)
X_test_sel = selector_reg.transform(X_test)

# Train a Random Forest Regressor.
rf_reg = RandomForestRegressor(n_estimators=200, random_state=42)
rf_reg.fit(X_train_sel, y_train)
y_pred = rf_reg.predict(X_test_sel)

print("=== Regression Performance ===")
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))

# Hyperparameter tuning via GridSearchCV.
param_grid_reg = {
    "n_estimators": [100, 200, 300],
    "max_depth": [5, 10, 15],
    "min_samples_leaf": [1, 2, 4]
}
cv = KFold(n_splits=5, shuffle=True, random_state=42)
grid_reg = GridSearchCV(RandomForestRegressor(random_state=42),
                        param_grid=param_grid_reg,
                        cv=cv,
                        scoring="r2")
grid_reg.fit(X_train_sel, y_train)
print("Best Regression Parameters:", grid_reg.best_params_)
y_pred_best = grid_reg.best_estimator_.predict(X_test_sel)
print("Best Regression R² Score:", r2_score(y_test, y_pred_best))


# ====================================
# Classification Pipeline: High vs. Low Clicks
# ====================================
# Create a binary target: 1 if clicks > median, 0 otherwise.
median_clicks = y_reg.median()
y_bin = (y_reg > median_clicks).astype(int)

# Split the data (stratify to preserve class proportions).
X_train_bin, X_test_bin, y_train_bin, y_test_bin = train_test_split(
    X_scaled, y_bin, test_size=0.2, random_state=42, stratify=y_bin)

# Feature selection with SelectKBest using f_classif.
selector_clf = SelectKBest(score_func=f_classif, k=min(50, X_train_bin.shape[1]))
X_train_bin_sel = selector_clf.fit_transform(X_train_bin, y_train_bin)
X_test_bin_sel = selector_clf.transform(X_test_bin)

# Train a Random Forest Classifier.
rf_clf = RandomForestClassifier(n_estimators=200, random_state=42)
rf_clf.fit(X_train_bin_sel, y_train_bin)
y_pred_bin = rf_clf.predict(X_test_bin_sel)

print("\n=== Classification Performance ===")
print("Accuracy:", accuracy_score(y_test_bin, y_pred_bin))
print("F1 Score:", f1_score(y_test_bin, y_pred_bin))
print("\nClassification Report:\n", classification_report(y_test_bin, y_pred_bin))
print("Confusion Matrix:\n", confusion_matrix(y_test_bin, y_pred_bin))


=== Regression Performance ===
Mean Squared Error: 0.48606250000000023
R² Score: 0.7428104661622308
Best Regression Parameters: {'max_depth': 15, 'min_samples_leaf': 2, 'n_estimators': 300}
Best Regression R² Score: 0.7455427508969683

=== Classification Performance ===
Accuracy: 0.84
F1 Score: 0.5

Classification Report:
               precision    recall  f1-score   support

           0       0.84      0.99      0.90        77
           1       0.89      0.35      0.50        23

    accuracy                           0.84       100
   macro avg       0.86      0.67      0.70       100
weighted avg       0.85      0.84      0.81       100

Confusion Matrix:
 [[76  1]
 [15  8]]


# Project Conclusion

In this project, I have performed a comprehensive analysis of our consumer behavior dataset to understand the drivers behind ad clicks and conversions, and to develop predictive models for consumer engagement.

**Data Cleaning & Preprocessing:**  
I standardized column names, handled missing values, removed duplicates, and converted financial and date fields into appropriate formats. Additionally, I engineered temporal features (such as `click_day` and `click_month`) and interaction terms (like `income_x_clicks` and `ctr_x_conversion`) to enrich the dataset for deeper analysis. These steps ensured that the data was reliable for subsequent exploratory analysis and modeling.

**Exploratory Data Analysis (EDA):**  
Using interactive visualizations, I uncovered several key insights:
- **Income vs. Clicks Scatter Plot (Colored by Age):** This plot revealed that clicks vary significantly across income levels, and the age-based color coding highlighted distinct engagement patterns across demographic groups.
- **Correlation Heatmap:** The heatmap illustrated strong relationships among numeric features, confirming that income, clicks, and conversion rates are interrelated.
- **Average Clicks by Custom Income Range Bar Chart:** By dividing income into five custom bins (10K–25K, 25K–40K, 40K–55K, 55K–70K, and 70K–90K) and applying a color gradient from yellow to purple, I was able to visually compare the average number of clicks across different income brackets.
- **Clicks by Gender Box Plot:** This visualization, using pink for Female and blue for Male, clearly delineated the differences in click behaviors between the genders.
- **Conversion Map:** An interactive line chart tracking monthly conversion rates by gender provided further insight into seasonal and demographic effects on conversions.

**Machine Learning Pipeline:**  
I built two predictive models:
- **Regression Pipeline:** A Random Forest Regressor was trained to predict the continuous number of clicks. The model achieved a robust R² score (approximately 0.74) with a low mean squared error, demonstrating a strong ability to capture the variability in consumer clicks.
- **Classification Pipeline:** I also constructed a binary classification model to distinguish between high and low engagement based on whether a consumer’s clicks exceeded the median. Although the model achieved an overall accuracy of 79%, the relatively low F1 score for the high engagement class indicates that further refinement—such as improved feature engineering or alternative modeling techniques—will be necessary to capture conversion-specific nuances more effectively.

**Overall Insights and Next Steps:**  
- **Key Drivers:** Demographic factors (age, gender), income levels, and ad attributes are critical in influencing consumer engagement.
- **Modeling Challenges:** While our regression model shows promise, the classification model reveals the need for additional conversion-specific features or more advanced modeling techniques.
- **Future Work:** Enhancing feature engineering, exploring ensemble methods, and integrating external data sources could further improve model performance and insights.

This project provides a solid foundation for data-driven decisions in ad targeting and campaign design. Continuous refinement and iteration will be key to unlocking deeper insights and ensuring that our models remain effective in a dynamic advertising landscape.
