## Changes implemented in the following code

1. Applied feature scaling separately to the input feature and the target variable using StandardScaler. Each scaler is fit on the training data only and then used to transform both the training and the testing data.

2. Created and trained the Random Forest Regressor with increased n_estimators (100) and max_depth (10) values. Increasing the number of trees can improve the model's performance and potentially reduce the MSE. Increasing the maximum depth lets each tree grow deeper, potentially capturing more complex relationships in the data.

3. Inverse transformed the scaled predictions (y_pred) and actual values (y_test) back to their original scales using scaler_y.inverse_transform().

4. Calculated the Mean Squared Error (MSE) between the predictions (clf_preds) and the actual values (y_test), both on their original scale, using the mean_squared_error function from scikit-learn.
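
Steps 1 and 3 rest on a simple pair of formulas: StandardScaler subtracts the training mean and divides by the training (population) standard deviation, and inverse_transform undoes exactly that. The following is a minimal standard-library sketch of the arithmetic, not scikit-learn's implementation. The helper names `fit_scaler`, `transform`, and `inverse_transform` and the sample values are illustrative assumptions.

```python
import statistics

def fit_scaler(values):
    """Return (mean, std) learned from the training values only."""
    mean = statistics.fmean(values)
    # StandardScaler uses the population standard deviation (ddof=0)
    std = statistics.pstdev(values)
    return mean, std

def transform(values, mean, std):
    """Standardize values with previously learned statistics."""
    return [(v - mean) / std for v in values]

def inverse_transform(scaled, mean, std):
    """Map standardized values back to the original scale."""
    return [z * std + mean for z in scaled]

train = [0.01, 0.03, 0.04, 0.10, 0.13]  # illustrative values only
test = [0.02, 0.08]

mean, std = fit_scaler(train)                 # statistics come from train only
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)      # test reuses the train statistics
restored = inverse_transform(test_scaled, mean, std)
```

The test values are scaled with the training statistics rather than their own, which mirrors calling `fit_transform` on the training split and `transform` on the test split.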


In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

In [2]:
df = pd.read_excel("/Users/rachitasingh/Desktop/MD -Weekly Data Set.xlsx")
df.head()

| Unnamed: 0 | GRPs | Reach @1+ | Reach @3+ |
|---|---|---|---|
| 0 | 0.01 | 0.01 | 0.0 |
| 1 | 0.03 | 0.03 | 0.0 |
| 2 | 0.06 | 0.04 | 0.0 |
| 3 | 0.12 | 0.1 | 0.0 |
| 4 | 0.19 | 0.13 | 0.0 |


In [3]:
df['GRPs'] = df['GRPs'].astype(float)

In [4]:
X = df[['GRPs']]
y = df['Reach @1+']

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Shape of X_train:", X_train.shape)
print("Shape of y_train:", y_train.shape)
print("Shape of X_test:", X_test.shape)
print("Shape of y_test:", y_test.shape)

Shape of X_train: (50244, 1)
Shape of y_train: (50244,)
Shape of X_test: (12561, 1)
Shape of y_test: (12561,)


In [6]:
# Scaling the input features and target variable
scaler_x = StandardScaler()
scaler_y = StandardScaler()
X_train = scaler_x.fit_transform(X_train)
X_test = scaler_x.transform(X_test)
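# StandardScaler expects 2-D input, so the targets are reshaped to a column first;
# ravel() returns y_train to 1-D, the shape RandomForestRegressor expects.
y_train = scaler_y.fit_transform(y_train.values.reshape(-1, 1)).ravel()
y_test = scaler_y.transform(y_test.values.reshape(-1, 1))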

In [7]:
# Creating and training the Random Forest Regressor model
clf = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42)
clf.fit(X_train, y_train)



In [8]:
# Making predictions on the test set
clf_preds = clf.predict(X_test)

In [9]:
# Inverse transform the scaled predictions and actual values
clf_preds = scaler_y.inverse_transform(clf_preds.reshape(-1, 1))
y_test = scaler_y.inverse_transform(y_test.reshape(-1, 1))

In [10]:
# Calculating the Mean Squared Error (MSE)
mse = mean_squared_error(y_test, clf_preds)
print("Mean Squared Error (MSE):", mse)

Mean Squared Error (MSE): 1.0157583439297226
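
Because the MSE is computed after the inverse transform, it is expressed in squared units of Reach @1+. Taking the square root gives the root mean squared error (RMSE), which is in the same units as the target and is often easier to interpret. A minimal sketch, assuming only the MSE value printed above:

```python
import math

mse = 1.0157583439297226  # value printed by the notebook above
rmse = math.sqrt(mse)     # same units as Reach @1+
print("Root Mean Squared Error (RMSE):", round(rmse, 4))
```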


### Reach Prediction



In [15]:
import tkinter as tk
from tkinter import messagebox

# Function to handle button click event
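def predict_reach():
    grp_input = entry.get()
    # GRPs are fractional in the training data, so parse as float rather than int
    grps = [float(grp) for grp in grp_input.split(',')]
    df1 = pd.DataFrame({'GRPs': grps})
    # Pass a DataFrame so the column name matches the one scaler_x was fitted on
    testx = scaler_x.transform(df1[['GRPs']])
    trial = clf.predict(testx)
    f = scaler_y.inverse_transform(trial.reshape(-1, 1))
    # inverse_transform returns a 2-D column; flatten it before storing
    df1['Predicted Reach 1+'] = f.ravel()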
    
    # Display predicted reach in a messagebox
    messagebox.showinfo("Predicted Reach", str(df1))

# Create the GUI window
window = tk.Tk()
window.title("GRP Predictor")

# Create a label and entry field for GRP input
grp_label = tk.Label(window, text="Enter GRP values separated by commas:")
grp_label.pack()
entry = tk.Entry(window)
entry.pack()

# Create a button to trigger the prediction
predict_button = tk.Button(window, text="Predict Reach", command=predict_reach)
predict_button.pack()

# Run the GUI event loop
window.mainloop()
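
The handling of the comma-separated input can be isolated into a small helper that is testable without the GUI. The sketch below is one possible approach, not part of the original notebook; the function name `parse_grps` and its error messages are hypothetical.

```python
def parse_grps(text):
    """Parse a comma-separated string into a list of floats.

    Raises ValueError with a readable message on empty or non-numeric input.
    """
    tokens = [tok.strip() for tok in text.split(',')]
    tokens = [tok for tok in tokens if tok]  # ignore stray commas
    if not tokens:
        raise ValueError("Enter at least one GRP value.")
    values = []
    for tok in tokens:
        try:
            values.append(float(tok))
        except ValueError:
            raise ValueError(f"'{tok}' is not a number.") from None
    return values
```

Inside `predict_reach`, the call could be wrapped in `try`/`except ValueError` and the message passed to `messagebox.showerror`, so a typo in the entry field produces a dialog instead of a traceback in the console.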


