In [None]:
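import numpy as np
import pandas as pd

# Load the CSV file

file_path = r'C:\Users\SUDE\Downloads\Question3_CampaignData.csv'
df = pd.read_csv(file_path)

# Calculate derived metrics for modeling.
# Zero denominators are replaced with NaN so the ratios stay float-typed
# and undefined values become NaN rather than inf.

df['ctr'] = df['clicks'] / df['impressions'].replace(0, np.nan)
df['conversion_rate'] = df['conversions'] / df['clicks'].replace(0, np.nan)
# 'roi' here is revenue divided by spend (return per unit of spend)
df['roi'] = df['revenue'] / df['spend'].replace(0, np.nan)
df['is_profitable'] = df['revenue'] > df['spend']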

print(df[['campaign_id', 'channel', 'ctr', 'conversion_rate', 'roi', 'is_profitable']].head())


   campaign_id  channel       ctr  conversion_rate       roi  is_profitable
0            1   Social  0.045650         0.159836  1.117177           True
1            2   Social  0.038944         0.079310  0.376336          False
2            3    Email  0.047451         0.140468  0.722115          False
3            4  Display  0.030203         0.120247  0.672137          False
4            5   Social  0.028806         0.066059  0.449074          False


In [2]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Drop rows with missing values (from division by zero earlier)
df_clean = df.dropna(subset=['ctr', 'conversion_rate', 'roi'])

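# Features and target
# Note: is_profitable is defined as revenue > spend, which for positive spend
# is the same as roi > 1, so the 'roi' feature encodes the target directly.
X = df_clean[['ctr', 'conversion_rate', 'roi']]
y = df_clean['is_profitable']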

# Split the data (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Evaluate performance
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

# Print metrics
print(f"Accuracy:  {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")


Accuracy:  0.90
Precision: 1.00
Recall:    0.75


In this business problem, the goal is to predict whether a marketing campaign will be profitable. Accuracy is often used as a baseline metric, but in real-world situations it is not always the most informative, particularly if the dataset is imbalanced or if different kinds of incorrect predictions carry very different costs.
In our case, we chose to focus on precision and recall alongside accuracy, because predicting profitability incorrectly can lead to significant budget waste or missed opportunities.
The model achieved perfect precision (1.00) on the test set, meaning that every campaign predicted to be profitable actually was. From a business point of view, this is appealing because it reduces the risk of wasting money on unsuccessful campaigns and lets the marketing team invest with confidence in the ones the model selects.
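
To make these definitions concrete, the sketch below computes accuracy, precision, and recall from confusion-matrix counts. The counts are hypothetical: they were picked only because they reproduce the reported 0.90, 1.00, and 0.75, and the actual size of the test split is not shown in the output above. `metrics_from_counts` is an illustrative helper, not part of scikit-learn.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: of the campaigns predicted profitable, how many really were
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the campaigns that really were profitable, how many were found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical counts (assumption), consistent with the reported metrics
accuracy, precision, recall = metrics_from_counts(tp=3, fp=0, fn=1, tn=6)
print(f"Accuracy:  {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
```

With the notebook's own predictions, the real counts could be read from `sklearn.metrics.confusion_matrix(y_test, y_pred)`.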

However, the recall was 0.75, which means the model missed 25% of the profitable campaigns. This trade-off may be acceptable if the business prioritizes budget safety over opportunity maximization, but if growth is the goal, improving recall could become a focus.
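
If recall becomes the priority, one common lever is the decision threshold. For a binary problem, `LogisticRegression.predict` is equivalent to thresholding the predicted probability at 0.5, and lowering that threshold typically raises recall at the cost of precision. The sketch below illustrates the effect on made-up labels and scores; `precision_recall_at` is a hypothetical helper, and in this notebook the scores would come from `model.predict_proba(X_test)[:, 1]`.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when predicting positive for scores >= threshold."""
    y_pred = [s >= threshold for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up labels and predicted probabilities (assumption, not the campaign data)
y_true = [True, True, True, True, False, False, False, False, False, False]
scores = [0.92, 0.81, 0.66, 0.41, 0.45, 0.30, 0.22, 0.15, 0.10, 0.05]

for threshold in (0.5, 0.4):
    p, r = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

In this toy example the lower threshold recovers the missed positive but also admits one unprofitable campaign, which is the trade-off described above.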

Overall, in this context, precision is more important than accuracy, because a single unprofitable campaign misclassified as profitable can have a high cost. This precision-recall trade-off would not have been obvious from accuracy alone (0.90); for this reason, it is essential to assess several metrics that relate to the business impact.

In [3]:
# Overfitting check

# Predict on training set
y_train_pred = model.predict(X_train)

# Training metrics
train_accuracy = accuracy_score(y_train, y_train_pred)
train_precision = precision_score(y_train, y_train_pred)
train_recall = recall_score(y_train, y_train_pred)

print(f"Train Accuracy:  {train_accuracy:.2f}")
print(f"Train Precision: {train_precision:.2f}")
print(f"Train Recall:    {train_recall:.2f}")


Train Accuracy:  0.88
Train Precision: 1.00
Train Recall:    0.64


Training and test performance are close: accuracy is 0.88 vs. 0.90, and precision is 1.00 on both.
Test recall (0.75) is higher than training recall (0.64). A gap in this direction is unusual, and if the test split is small it may simply reflect sampling variation.
There is no indication of overfitting, since the model does not perform worse on unseen data than on the training data.
The model appears to generalize rather than memorize the training data, so the outcomes are probably reliable when deployed on unseen campaigns.
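
One way to gauge how much weight the recall gap can bear is a confidence interval for a proportion. The sketch below computes a Wilson score interval for recall. The counts (6 of 8 profitable campaigns found) are an assumption chosen to match a recall of 0.75, since the number of profitable campaigns in the test split is not shown here, and `wilson_interval` is an illustrative helper.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Assumed counts: 6 of 8 profitable test campaigns identified (recall 0.75)
low, high = wilson_interval(6, 8)
print(f"Recall 0.75, approx. 95% interval: [{low:.2f}, {high:.2f}]")
```

Under these assumed counts the interval is wide enough to include the training recall of 0.64, so the difference between the two values should be read with caution.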

# Business Application of the Model
In a business context, this model can be a valuable decision-support tool for marketing teams tasked with campaign budget allocation. Since the model achieves perfect precision (1.00) on the test set, any campaign it predicts as profitable can be pursued with high confidence, significantly lowering the risk of funding unsuccessful campaigns.
The model’s recall (0.75) shows that it identifies most, but not all, of the profitable opportunities. While some good campaigns may be missed, the trade-off ensures that resources are not wasted, which is especially important when budgets are limited or risk tolerance is low.

Not every opportunity is captured, but the campaigns the model does select have a very high chance of being successful.
This behavior reflects a conservative prediction strategy, in which the model prioritizes certainty over coverage. Stated differently, it would rather overlook a few successful campaigns than risk recommending a campaign that turns out to be unprofitable.

This trade-off is particularly valuable when launching a campaign that fails is very costly.
The approach works well in risk-averse or resource-constrained marketing organizations, where avoiding errors matters more than capturing every opportunity.
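
The preference for precision can be made explicit by attaching a cost to each kind of error. The sketch below compares two hypothetical models under assumed costs. The cost figures, the error counts, and the `expected_cost` helper are all illustrative assumptions rather than values from the campaign data.

```python
def expected_cost(false_positives, false_negatives, cost_fp, cost_fn):
    """Total cost of errors: funded failures plus missed profitable campaigns."""
    return false_positives * cost_fp + false_negatives * cost_fn

# Assumed costs: funding a failed campaign hurts more than missing a good one
COST_FP = 10_000  # hypothetical budget lost on an unprofitable campaign
COST_FN = 3_000   # hypothetical profit forgone on a missed campaign

conservative = expected_cost(false_positives=0, false_negatives=2,
                             cost_fp=COST_FP, cost_fn=COST_FN)
aggressive = expected_cost(false_positives=2, false_negatives=0,
                           cost_fp=COST_FP, cost_fn=COST_FN)
print(f"Conservative model cost: {conservative}")
print(f"Aggressive model cost:   {aggressive}")
```

If the cost ratio were reversed, the higher-recall model would come out ahead, which is why the choice depends on the organization's risk tolerance.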