# AI-Driven Media Investment Plan

## Introduction

The goal of this project is to develop an AI-driven media investment plan that optimizes budget allocation across various advertising channels. The solution involves ingesting data related to ad spend, analyzing it to understand channel performance, and reallocating budgets to maximize conversions.

The approach includes:
- **Data Ingestion and Preprocessing**: Load and clean the dataset.
- **Feature Engineering**: Create new features to enhance model performance.
- **Machine Learning Model Training**: Use a Gradient Boosting Regressor to predict conversions.
- **Budget Optimization**: Apply linear programming to allocate budgets efficiently.


## Libraries and Versions
Below is a list of libraries used in this notebook along with their versions:

In [None]:
# Libraries Installation
!pip install pandas==1.3.3
!pip install numpy==1.21.2
!pip install scikit-learn==0.24.2
!pip install plotly==5.5.0
!pip install scipy==1.7.3
!pip install streamlit==1.12.2

## Input Section

### New Budget as Input

First, provide the total budget to reallocate. The optimizer treats this value as a hard constraint: the channel allocations must sum to it exactly.

### Select and Read Dataset

Upload a CSV file of ad performance data below. The implementation expects the columns `channel`, `impressions`, `clicks`, `conversions`, and `amount_spent`.
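The implementation later in this notebook reads the columns `channel`, `impressions`, `clicks`, `conversions`, and `amount_spent`, so it can help to fail early when one is missing. The sketch below is one possible check; the `load_ad_spend` helper, the `REQUIRED_COLUMNS` list, and the sample rows are hypothetical and not part of the original notebook.

```python
import io

import pandas as pd

# Columns the later cells read from the dataset.
REQUIRED_COLUMNS = ["channel", "impressions", "clicks", "conversions", "amount_spent"]


def load_ad_spend(csv_source) -> pd.DataFrame:
    """Read the CSV and raise early if an expected column is missing."""
    df = pd.read_csv(csv_source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    return df


# Made-up two-row CSV, for illustration only.
sample_csv = io.StringIO(
    "channel,impressions,clicks,conversions,amount_spent\n"
    "Facebook,1000,50,5,100.0\n"
    "Google Ads,2000,80,8,160.0\n"
)
ad_spend_data = load_ad_spend(sample_csv)
print(ad_spend_data.shape)
```

Since `pd.read_csv` accepts file-like objects, the same helper should also work on the object returned by `st.file_uploader`.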


In [None]:
import pandas as pd
import numpy as np
import streamlit as st

# File upload for the dataset
uploaded_file = st.file_uploader("Upload Ad Spend Data", type="csv")
if uploaded_file is not None:
    ad_spend_data = pd.read_csv(uploaded_file)
    st.write(ad_spend_data.head())



## Approach and Methodology

### Data Processing

Data processing involves:
- **Cleaning**: Handling missing values and correcting any inconsistencies in the dataset.
- **Feature Engineering**: Creating new features such as `click_through_rate`, `conversion_rate`, and `cost_per_click` to enhance the model's predictive power.
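The three derived ratios can be sketched as follows. This is a minimal illustration on a made-up three-row frame, assuming the column names `impressions`, `clicks`, `conversions`, and `amount_spent`. The `add_ratio_features` helper is hypothetical, and mapping zero denominators to NaN is only one possible way to treat rows without clicks or impressions.

```python
import numpy as np
import pandas as pd


def add_ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with CTR, conversion rate, and CPC columns added."""
    out = df.copy()
    # Zero denominators become NaN so the ratios are NaN instead of inf.
    clicks = out["clicks"].replace(0, np.nan)
    impressions = out["impressions"].replace(0, np.nan)
    out["click_through_rate"] = out["clicks"] / impressions
    out["conversion_rate"] = out["conversions"] / clicks
    out["cost_per_click"] = out["amount_spent"] / clicks
    return out


# Made-up sample rows, for illustration only; the second row has no clicks.
sample = pd.DataFrame({
    "impressions": [1000, 2000, 500],
    "clicks": [50, 0, 25],
    "conversions": [5, 0, 1],
    "amount_spent": [100.0, 40.0, 50.0],
})
features = add_ratio_features(sample)
print(features[["click_through_rate", "conversion_rate", "cost_per_click"]])
```

Rows whose ratios come out as NaN would need to be dropped or imputed before training, because `GradientBoostingRegressor` does not accept missing values.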

### Algorithm

**Machine Learning Model**:
- **Model**: Gradient Boosting Regressor.
- **Purpose**: Predict conversions based on features like impressions, clicks, and cost per click.
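
As a rough illustration of this step, the sketch below fits a `GradientBoostingRegressor` on synthetic data and reports the mean absolute error on a held-out split. The data, the assumed 10% conversion relationship, and the hyperparameter values are all made up for the example; they are not taken from the project's dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for the ad data: impressions, clicks, cost per click.
n_rows = 200
impressions = rng.integers(1_000, 10_000, size=n_rows)
clicks = (impressions * rng.uniform(0.01, 0.05, size=n_rows)).astype(int) + 1
cost_per_click = rng.uniform(0.5, 3.0, size=n_rows)
# Assumed relationship: roughly 10% of clicks convert, plus noise.
conversions = 0.1 * clicks + rng.normal(0, 1, size=n_rows)

X = np.column_stack([impressions, clicks, cost_per_click])
y = conversions

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = GradientBoostingRegressor(
    n_estimators=100, max_depth=3, learning_rate=0.1, random_state=42
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Absolute Error on held-out rows: {mae:.2f}")
```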

**Budget Optimization**:
- **Method**: Linear Programming.
- **Purpose**: Allocate the total budget across channels to maximize predicted conversions while respecting constraints (e.g., budget bounds).
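
A toy version of the linear program may make the formulation concrete. The sketch assumes three hypothetical channels with made-up prediction scores and a total budget of 1000. It uses `scipy.optimize.linprog`, which minimizes its objective, so the scores are negated.

```python
import numpy as np
from scipy.optimize import linprog

total_budget = 1000.0
# Made-up per-channel scores standing in for model predictions.
predicted = {"channel_a": 120.0, "channel_b": 80.0, "channel_c": 50.0}

# linprog minimizes c @ x, so negate the scores to maximize them.
c = [-score for score in predicted.values()]
# Equality constraint: allocations must sum to the total budget.
A_eq = np.ones((1, len(predicted)))
b_eq = [total_budget]
# Each channel gets between 10% and 50% of the total budget.
bounds = [(0.1 * total_budget, 0.5 * total_budget)] * len(predicted)

result = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
allocation = dict(zip(predicted, result.x)) if result.success else {}
print(allocation)
```

Because the objective is linear, the solver fills the highest-scoring channel to its upper bound first: here `channel_a` receives 500, `channel_b` 400, and `channel_c` the 100 minimum. The allocation therefore depends only on the ranking of the scores and the bounds, not on how far apart the scores are.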

### Assumptions

- The dataset is assumed to be clean and well-formatted.
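- Each channel's budget is constrained to between 10% and 50% of the total budget. These bounds are feasible only with 2 to 10 channels: with a single channel the 50% cap cannot absorb the full budget, and with more than 10 channels the 10% minimums alone exceed it.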


## Algorithm Implementation

Below is the code implementation for data processing, model training, and budget optimization.


In [None]:
import streamlit as st
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_absolute_error
import plotly.graph_objects as go
from scipy.optimize import linprog

st.title("Advanced AI-Driven Media Investment Plan")

# Upload CSV file
uploaded_file = st.file_uploader("Upload Ad Spend Data", type="csv")
if uploaded_file is not None:
    ad_spend_data = pd.read_csv(uploaded_file)
    st.write(ad_spend_data.head())
    
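    # Feature Engineering
    # Treat zero denominators as NaN so the ratios never become inf, then drop
    # incomplete rows (GradientBoostingRegressor rejects NaN and infinite values).
    clicks = ad_spend_data['clicks'].replace(0, np.nan)
    impressions = ad_spend_data['impressions'].replace(0, np.nan)
    ad_spend_data['click_through_rate'] = ad_spend_data['clicks'] / impressions
    ad_spend_data['conversion_rate'] = ad_spend_data['conversions'] / clicks
    ad_spend_data['cost_per_click'] = ad_spend_data['amount_spent'] / clicks
    ad_spend_data = ad_spend_data.dropna(subset=['click_through_rate', 'conversion_rate', 'cost_per_click'])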
    
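    # Prepare data for model
    # Note: conversion_rate is derived from the target (conversions / clicks), so
    # using it together with clicks leaks the label and makes the MAE optimistic.
    X = ad_spend_data[['impressions', 'clicks', 'click_through_rate', 'conversion_rate', 'cost_per_click']]
    y = ad_spend_data['conversions']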
    
    # Split data for training and testing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Train Gradient Boosting Model with Grid Search
    param_grid = {
        'n_estimators': [100, 200],
        'max_depth': [5, 10],
        'learning_rate': [0.01, 0.1]
    }
    model = GradientBoostingRegressor()
    grid_search = GridSearchCV(model, param_grid, cv=5)
    grid_search.fit(X_train, y_train)
    best_model = grid_search.best_estimator_
    
    # Predict
    y_pred = best_model.predict(X_test)
    mae = mean_absolute_error(y_test, y_pred)
    st.write(f"Mean Absolute Error: {mae:.2f}")
    
    total_budget = st.number_input("Enter Total Budget", value=100000)
    channels = ad_spend_data['channel'].unique()
    
    # Predict conversions for each channel
    channel_budgets = {}
    predicted_conversions = {}
    
    for channel in channels:
        channel_data = ad_spend_data[ad_spend_data['channel'] == channel]
        if not channel_data.empty:
            mean_features = pd.DataFrame(channel_data[['impressions', 'clicks', 'click_through_rate', 'conversion_rate', 'cost_per_click']].mean().values.reshape(1, -1), 
                                         columns=['impressions', 'clicks', 'click_through_rate', 'conversion_rate', 'cost_per_click'])
            predicted_conversion = best_model.predict(mean_features)[0]
            predicted_conversions[channel] = predicted_conversion
    
    # Optimization using Linear Programming
    # linprog minimizes c @ x, so negate the predictions to maximize them.
    c = [-predicted_conversions.get(channel, 0) for channel in channels]
    # Equality constraint: channel budgets must sum to the total budget.
    A_eq = np.ones((1, len(channels)))
    b_eq = [total_budget]
    # Per-channel bounds: 10% to 50% of the total budget.
    bounds = [(0.1 * total_budget, 0.5 * total_budget) for _ in channels]
    
    result = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method='highs')
    
    if result.success:
        for i, channel in enumerate(channels):
            channel_budgets[channel] = result.x[i]
        
        st.write("Reallocated Budgets:")
        st.json(channel_budgets)
        
        # Enhanced Visualization
        fig = go.Figure()

        # Bar chart for Reallocated Budgets
        fig.add_trace(go.Bar(
            x=list(channel_budgets.keys()),
            y=list(channel_budgets.values()),
            name='Reallocated Budget'
        ))

        # Add predicted conversions as a secondary axis
        fig.add_trace(go.Scatter(
            x=list(predicted_conversions.keys()),
            y=list(predicted_conversions.values()),
            mode='lines+markers',
            name='Predicted Conversions',
            yaxis='y2'
        ))

        fig.update_layout(
            title='Budget Allocation by Channel',
            xaxis_title='Channel',
            yaxis_title='Reallocated Budget',
            yaxis2=dict(
                title='Predicted Conversions',
                overlaying='y',
                side='right'
            ),
            template='plotly_dark'
        )

        st.plotly_chart(fig)
    else:
        st.write("Optimization failed. Please check constraints and try again.")
    
    # Scenario Analysis
    st.subheader("Scenario Analysis")
    scenario_budget = st.number_input("Enter Budget for Scenario Analysis", value=total_budget)
    scenario_predictions = {}
    
    for channel in channels:
        channel_data = ad_spend_data[ad_spend_data['channel'] == channel]
        if not channel_data.empty:
            mean_features = pd.DataFrame(channel_data[['impressions', 'clicks', 'click_through_rate', 'conversion_rate', 'cost_per_click']].mean().values.reshape(1, -1), 
                                         columns=['impressions', 'clicks', 'click_through_rate', 'conversion_rate', 'cost_per_click'])
            predicted_conversion = best_model.predict(mean_features)[0]
            scenario_predictions[channel] = predicted_conversion
    
    # Scenario Optimization
    c = [-scenario_predictions.get(channel, 0) for channel in channels]
    bounds = [(0.1 * scenario_budget, 0.5 * scenario_budget) for _ in channels]
    
    scenario_result = linprog(c, A_eq=A_eq, b_eq=[scenario_budget], bounds=bounds, method='highs')
    
    if scenario_result.success:
        scenario_budgets = {}
        for i, channel in enumerate(channels):
            scenario_budgets[channel] = scenario_result.x[i]
        
        st.write("Scenario Analysis - Reallocated Budgets:")
        st.json(scenario_budgets)
        
        # Enhanced Visualization for Scenario
        fig_scenario = go.Figure()

        # Bar chart for Scenario Reallocated Budgets
        fig_scenario.add_trace(go.Bar(
            x=list(scenario_budgets.keys()),
            y=list(scenario_budgets.values()),
            name='Scenario Reallocated Budget'
        ))

        # Add predicted conversions for scenario
        fig_scenario.add_trace(go.Scatter(
            x=list(scenario_predictions.keys()),
            y=list(scenario_predictions.values()),
            mode='lines+markers',
            name='Scenario Predicted Conversions',
            yaxis='y2'
        ))

        fig_scenario.update_layout(
            title='Scenario Budget Allocation by Channel',
            xaxis_title='Channel',
            yaxis_title='Scenario Reallocated Budget',
            yaxis2=dict(
                title='Scenario Predicted Conversions',
                overlaying='y',
                side='right'
            ),
            template='plotly_dark'
        )

        st.plotly_chart(fig_scenario)
    else:
        st.write("Scenario Optimization failed. Please check constraints and try again.")


## Results

### Input
The total budget for reallocation is set to 1000 USD.

### Output
Based on the optimization algorithm, the reallocated budgets are as follows:
- Facebook: 400 USD
- Google Ads: 300 USD
- Bing/Microsoft Ads: 300 USD
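
One way to sanity-check a reported allocation against the stated constraints (allocations sum to the total budget, each channel between 10% and 50% of it) is sketched below. The `check_allocation` helper and its tolerance are hypothetical and not part of the original notebook.

```python
def check_allocation(allocation, total_budget, lower_share=0.10, upper_share=0.50, tol=1e-6):
    """Return a list of human-readable constraint violations (empty if none)."""
    problems = []
    spent = sum(allocation.values())
    if abs(spent - total_budget) > tol:
        problems.append(f"allocations sum to {spent}, expected {total_budget}")
    lower, upper = lower_share * total_budget, upper_share * total_budget
    for channel, amount in allocation.items():
        if amount < lower - tol or amount > upper + tol:
            problems.append(f"{channel}: {amount} outside [{lower}, {upper}]")
    return problems


# Allocation reported in the Output section above.
reported = {"Facebook": 400, "Google Ads": 300, "Bing/Microsoft Ads": 300}
violations = check_allocation(reported, total_budget=1000)
print(violations or "all constraints satisfied")
```

For the figures above, the three amounts sum to 1000 USD and each lies within the 100 to 500 USD range, so the check reports no violations.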


In [None]:
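# Scenario Analysis
# Standalone version of the scenario step; it reuses ad_spend_data, channels,
# best_model, A_eq, and total_budget defined in the implementation cell above.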
scenario_budget = st.number_input("Enter Budget for Scenario Analysis", value=total_budget)
scenario_predictions = {}

for channel in channels:
    channel_data = ad_spend_data[ad_spend_data['channel'] == channel]
    if not channel_data.empty:
        mean_features = pd.DataFrame(channel_data[['impressions', 'clicks', 'click_through_rate', 'conversion_rate', 'cost_per_click']].mean().values.reshape(1, -1), 
                                     columns=['impressions', 'clicks', 'click_through_rate', 'conversion_rate', 'cost_per_click'])
        predicted_conversion = best_model.predict(mean_features)[0]
        scenario_predictions[channel] = predicted_conversion

c = [-scenario_predictions.get(channel, 0) for channel in channels]
bounds = [(0.1 * scenario_budget, 0.5 * scenario_budget) for _ in channels]

scenario_result = linprog(c, A_eq=A_eq, b_eq=[scenario_budget], bounds=bounds, method='highs')

if scenario_result.success:
    scenario_budgets = {}
    for i, channel in enumerate(channels):
        scenario_budgets[channel] = scenario_result.x[i]
    
    st.write("Scenario Analysis - Reallocated Budgets:")
    st.json(scenario_budgets)
    
    # Enhanced Visualization for Scenario
    fig_scenario = go.Figure()
    fig_scenario.add_trace(go.Bar(x=list(scenario_budgets.keys()), y=list(scenario_budgets.values()), name='Scenario Reallocated Budget'))
    fig_scenario.add_trace(go.Scatter(x=list(scenario_predictions.keys()), y=list(scenario_predictions.values()), mode='lines+markers', name='Scenario Predicted Conversions', yaxis='y2'))
    fig_scenario.update_layout(title='Scenario Budget Allocation by Channel', xaxis_title='Channel', yaxis_title='Scenario Reallocated Budget', yaxis2=dict(title='Scenario Predicted Conversions', overlaying='y', side='right'), template='plotly_dark')
    st.plotly_chart(fig_scenario)
else:
    st.write("Scenario Optimization failed. Please check constraints and try again.")


## Conclusion
In this project, we developed an AI-driven media investment plan that reallocates a fixed budget across advertising channels. A Gradient Boosting Regressor predicts conversions for each channel, and a linear program distributes the budget according to those predictions, subject to per-channel bounds. Future work could include refining the model with additional features or experimenting with different optimization techniques.