# 0. Overview: Real-Life Employee Performance Case in Machine Learning (Regression) using PyCaret 2.1

We are going to use PyCaret (version 2.1), a low-code machine learning library for Python, to predict the First Call Resolution (FCR) metric for Customer Service Agents (employees) in call centers. FCR is an important call center metric: it is the percentage of issues that were resolved the first time the customer called. Every repeat call about the same issue costs the company money, so we want customers to get their issue resolved on the first call.

Below is the approach we will take to predict a Customer Service Agent's (Contact Agent's) FCR metric:

* Conduct Exploratory Data Analysis (EDA) on the real data from a global call center.
* Execute regression models to determine how accurately we can predict the FCR metric for each employee.
* Create a classification indicator to determine whether predicting an employee's increase or decrease in FCR metric performance is more meaningful than regression.

We will leverage real-case data from a business process outsourcer (BPO) that supports many Fortune 500 companies. *Note: the dataset has been sanitized of personal information, as it is a real dataset.*

Let's get started!

## 1. Lab Scenario

Imagine you're working with a global call center that handles millions of customer queries. The company wants to predict FCR for each employee so it can **improve service quality, reduce costs, and boost customer satisfaction**.

You are provided with **real anonymized data** from a major Business Process Outsourcer (BPO) supporting Fortune 500 clients.

## 2. Goal
By the end of this lab, you will be able to:

* Build and evaluate a regression model using PyCaret.

* Predict the FCR score for a given employee.

* Understand how this prediction can be used to enhance performance management.

* Explore how machine learning can be applied to real business problems.

## 3. Key Concepts (for Beginners)
🔹 **First Call Resolution (FCR)**: FCR shows whether a customer’s issue was solved in the first call. Higher FCR = better support = lower cost.

🔹 **Regression**: A type of machine learning used to predict numbers (e.g., FCR score). It’s like drawing a best-fit line to estimate future values.

🔹 **Exploratory Data Analysis (EDA)**: EDA helps us understand the dataset before modeling. We explore trends, missing values, and patterns.

🔹 **PyCaret**: An easy-to-use ML library that automates model training, evaluation, and deployment with just a few lines of code.
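
To make the "best-fit line" idea from the regression concept concrete, here is a minimal sketch using NumPy only. The five data points are hypothetical (they are not taken from the lab dataset), and a real model would use many more features than a single previous-week score.

```python
import numpy as np

# Hypothetical toy data: last week's FCR (x) and this week's FCR (y) for five agents.
x = np.array([88.0, 90.0, 92.0, 94.0, 96.0])
y = np.array([89.0, 90.5, 93.0, 93.5, 97.0])

# Fit a straight line y = slope * x + intercept by least squares.
slope, intercept = np.polyfit(x, y, deg=1)

# Use the fitted line to estimate FCR for an agent who scored 91 last week.
predicted = slope * 91.0 + intercept
print(f"slope={slope:.3f}, intercept={intercept:.3f}, prediction={predicted:.2f}")
```

The fitted slope says how much this week's FCR is expected to change for each extra point of last week's FCR; the regression models in this lab do the same thing with more inputs.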

## 4. Load Libraries

In [None]:
# Import libraries for data processing.
import numpy as np       # For numerical operations
import pandas as pd      # For working with datasets (rows & columns)
import warnings          # To suppress noisy library warnings
import time              # For timing code, if needed
warnings.filterwarnings('ignore')

# Import libraries for visualization and set default values.
import matplotlib.pyplot as plt            # For creating graphs
import seaborn as sns                      # High-level plots with easy syntax
sns.set_theme(style="darkgrid")            # Use seaborn's 'darkgrid' style for all plots

from sklearn import set_config
set_config(display='text')                 # Display pipeline steps in text format

In [None]:
# Install the specific scikit-learn version compatible with PyCaret 2.1.x.
# Note: this pin may fail to build on newer Python runtimes.
!pip install scikit-learn==0.23.2 --quiet

# Remove any previously installed PyCaret, then install it fresh.
!pip uninstall pycaret -y
!pip install pycaret --quiet

# Import PyCaret's regression module.
from pycaret.regression import *

# Confirm the PyCaret version is 2.1.x.
from pycaret.utils import version
print('Confirm Pycaret version is 2.1.X?')
print('Pycaret Version: ', version())


  Installing build dependencies ... error
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Found existing installation: pycaret 3.3.2
Uninstalling pycaret-3.3.2:
  Successfully uninstalled pycaret-3.3.2

ImportError: cannot import name 'is_scalar_nan' from 'sklearn.utils' (/usr/local/lib/python3.11/dist-packages/sklearn/utils/__init__.py)

## 5. Load Dataset

In this step, we’ll load a **real-world Employee Performance dataset** that contains information about customer service agents in a call center. This data will help us build a machine learning model to predict **First Call Resolution (FCR)**.

**What is pd.read_csv()?**
* This function reads a CSV (Comma-Separated Values) file and turns it into a table (called a DataFrame) that we can analyze.
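
As a small illustration of what `pd.read_csv()` does, the sketch below parses a CSV held in a string instead of a URL. The two rows are hypothetical (the agent IDs are made up), although the column names mirror a few of those in the lab dataset.

```python
import io
import pandas as pd

# Hypothetical two-row CSV that reuses a few of the lab's column names.
csv_text = """Agent_ID,Site,tenure,Actual Value
1001,Kuala Lumpur,33,85.71
1002,Santiago,12,92.50
"""

# read_csv accepts a URL, a file path, or any file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # pandas infers a type for every column
```

The first line of the CSV becomes the column names, and each following line becomes one row of the DataFrame.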

In [None]:
# Load the dataset.
url = 'https://raw.githubusercontent.com/aliagowani/Pycaret_2.1_Regression_EmployeePerformance/master/employee_performance.csv'
dataset = pd.read_csv(url)

In [None]:
# Check shape of dataset and view first few observations to ensure data loaded correctly.
print("Shape of dataset (observations, features):", dataset.shape)
dataset.head(5).round(2)

Shape of dataset (observations, features): (102, 19)


Unnamed: 0,Agent_ID,Friday,Monday,Saturday,Sunday,Thursday,Tuesday,Wednesday,Site,Function_Field,tenure,Total number of calls,Assistance,Recommend,CSat,total coaching,total coaching improved,Actual Value,FCR Week before
0,384091,100.0,90.0,90.0,96.15,100.0,96.88,100.0,Kuala Lumpur,Agent,33,163,95.06,94.23,4.87,0,0,85.71,97.14
1,369185,94.0,100.0,96.87,96.87,96.1,95.89,98.33,Kuala Lumpur,Agent,33,306,95.08,94.67,4.83,0,0,100.0,88.64
2,360854,94.44,80.0,92.94,92.94,100.0,93.94,96.3,Kuala Lumpur,Agent,32,138,94.16,94.74,4.82,2,1,80.0,92.31
3,374407,100.0,95.0,96.63,96.63,92.5,95.65,100.0,Kuala Lumpur,Agent,32,148,98.65,95.83,4.84,0,0,94.44,100.0
4,372496,96.88,95.83,94.28,94.28,83.33,95.35,100.0,Kuala Lumpur,Agent,29,142,97.18,98.55,4.86,0,0,100.0,90.91


## 6. Exploratory Data Analysis (EDA)

In [None]:
# Below is a high-level description of each feature. The dataset comprises 102 agents and 19 features. The data covers the period from June 1, 2020 to July 31, 2020, and 'Actual Value' is each agent's actual FCR value for August 7, 2020. The goal is to create a model that predicts the FCR an employee will have at the end of the week (Friday).

# 'Agent_ID': unique identifier of the employee or agent.
# 'Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday', 'Wednesday': shows the FCR percent for each agent as an aggregate (mean). The higher the percentage the higher the rate at which the customer's issue was resolved in the first call.
# 'Site': city location of the call center.
# 'Function_Field': this is the role of the employee. In our case, it should all be Agents.
# 'tenure': tenure of the agent at the company in months.
# 'Total number of calls': number of all phone calls taken by the agent in the given timeframe.
# 'Assistance': percentage of time the agent needed to provide additional assistance by escalating to supervisor.
# 'Recommend': percentage of time the agent would be recommended by the customer to resolve an issue.
# 'CSat': average survey response (on a scale from 1 (least satisfied) to 5 (most satisfied)) that the customer was satisfied with the experience.
# 'total coaching': number of times the agent received coaching sessions from supervisor to improve FCR metric in the given timeframe.
# 'total coaching improved': number of times the agent's FCR value increased after a week from the initial coaching.
# 'Actual Value': the label we are trying to predict. It is the next Friday (August 7, 2020) FCR value for the agent.
# 'FCR Week before': a strong indicator is how well the agent performed the previous Friday. This is the FCR value for the Friday before the predicted value ('Actual Value').

# Describe the features in the dataset, such as, count, mean, standard deviation, min, max, etc.
dataset.describe().round(2)

### Correlation Heatmap
* A heatmap helps us see **how features are related**. Darker colors or values close to **+1 or -1** show stronger relationships.

### Why it's useful:
**Correlation tells us which features might help in predicting the target (Actual Value)**.
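
One way to read correlations against a target is to rank the features by the absolute value of their correlation with it. The sketch below does this on synthetic data: the numbers are generated at random under stated assumptions and only the column names echo the lab dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): the target depends on
# 'FCR Week before' but not on 'Total number of calls'.
rng = np.random.default_rng(0)
n = 200
fcr_week_before = rng.normal(92, 3, n)
total_calls = rng.normal(150, 20, n)
actual_value = 0.8 * fcr_week_before + rng.normal(0, 1, n)

df = pd.DataFrame({
    'FCR Week before': fcr_week_before,
    'Total number of calls': total_calls,
    'Actual Value': actual_value,
})

# Correlation of every feature with the target, strongest (by magnitude) first.
ranking = (df.corr()['Actual Value']
             .drop('Actual Value')
             .abs()
             .sort_values(ascending=False))
print(ranking.round(3))
```

Keep in mind that correlation only captures linear relationships, so a low value does not prove a feature is useless.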

In [None]:
# Compute pairwise correlations between the numeric columns only.
numeric_cols = dataset.select_dtypes(include=['float64', 'int64']).columns
dataset_corr = dataset[numeric_cols].corr().round(4)

# Mask the upper triangle (including the diagonal) so each pair is shown only once.
mask = np.triu(np.ones_like(dataset_corr, dtype=bool))

# Draw the correlation matrix as a heatmap using Seaborn.
with sns.axes_style("whitegrid"):
    f, ax = plt.subplots(figsize=(12, 10))
    ax = sns.heatmap(dataset_corr.round(2), mask=mask, vmax=1, center=0, vmin=-1,
                     square=True, cmap='PuOr', linewidths=.5,
                     annot=True, annot_kws={"size": 12}, fmt='.1f')
    plt.title('Heatmap (Correlations) of Features in the Dataset', fontsize=15)
    plt.xlabel('Features', fontsize=15)
    plt.ylabel('Features', fontsize=15)
plt.show()


* To understand how FCR (First Call Resolution) values vary across different days of the week, we use **KDE plots (Kernel Density Estimation)**.

* This chart shows the **distribution (or shape) of FCR for each day — Sunday to Saturday — in smooth curves**. Each curve represents how frequently certain FCR values occur for that day.
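
For intuition, a KDE curve can be built by hand: place a small Gaussian bump on every observation and average the bumps. The sketch below uses NumPy only, with hypothetical FCR values and a bandwidth fixed by hand; Seaborn chooses the bandwidth automatically, so its curves will not match this one exactly.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average a Gaussian bump centred on every sample, evaluated on `grid`."""
    samples = np.asarray(samples, dtype=float)
    # z[i, j] = distance from grid point i to sample j, in bandwidth units.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

# Hypothetical FCR percentages for one weekday.
fcr = np.array([90.0, 92.5, 94.0, 95.0, 96.0, 96.5, 100.0])
grid = np.linspace(70, 120, 501)
density = gaussian_kde_1d(fcr, grid, bandwidth=2.0)

# A density integrates to (approximately) 1 over its range.
area = float(np.sum(density) * (grid[1] - grid[0]))
print(f"area under curve: {area:.4f}")
print(f"most likely FCR: {grid[np.argmax(density)]:.1f}")
```

A smaller bandwidth gives a spikier curve and a larger one gives a smoother curve, which is worth remembering when comparing the weekday curves.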

In [None]:
# Plot one smoothed probability density curve of FCR per weekday on shared axes.
day_colors = {'Sunday': '#4E79A7', 'Monday': '#F28E2B', 'Tuesday': '#59A14F', 'Wednesday': '#E15759',
              'Thursday': '#B6992D', 'Friday': '#499894', 'Saturday': '#B07AA1'}
facet = sns.FacetGrid(dataset, aspect=3, height=5)
for day, color in day_colors.items():
    # fill=True replaces the deprecated shade=True; label gives the legend an entry per day.
    facet.map(sns.kdeplot, day, fill=True, color=color, label=day)
facet.set(xlim=(0, dataset['Thursday'].max()))
facet.add_legend()
plt.title('First Call Resolution by Day', fontsize=12)
plt.ylabel('Probability Density', fontsize=12)
plt.xlabel('First Call Resolution (FCR)', fontsize=12)
plt.show()

### Boxplot + Swarmplot: FCR Distribution by Day
This visualization helps us understand how **FCR values are spread out across each day of the week** and if there are any **outliers (unusual values)**.
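
A box plot usually marks a point as an outlier when it falls more than 1.5 times the interquartile range (IQR) beyond the box, which is the default whisker rule in Matplotlib and Seaborn. The sketch below applies that rule by hand to hypothetical FCR values.

```python
import numpy as np

# Hypothetical FCR percentages for one weekday, including two low values.
values = np.array([83.3, 92.5, 94.4, 95.0, 96.1, 96.9, 98.3, 100.0, 60.0])

# The box spans the first to the third quartile.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(f"Q1={q1:.1f}, Q3={q3:.1f}, IQR={iqr:.1f}")
print(f"fences: [{lower:.1f}, {upper:.1f}]")
print("outliers:", sorted(outliers.tolist()))
```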

In [None]:
# Visualize the dispersion of FCR values for each day and identify the outliers.

# Create a color set that matches the probability density chart.
my_pal = {'Sunday': '#4E79A7','Monday': '#F28E2B', 'Tuesday': '#59A14F', 'Wednesday': '#E15759', 'Thursday': '#B6992D', 'Friday': '#499894', 'Saturday': '#B07AA1'}

# Reshape the seven day columns into long format: one row per (day, FCR value).
days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
fcr_long = pd.melt(dataset[days])

# Draw the box plot, then overlay the individual agents as a swarm plot.
plt.figure(figsize=(15,5))
sns.boxplot(x="value", y="variable", data=fcr_long, color='#cccccc')
sns.swarmplot(x="value", y="variable", data=fcr_long, palette=my_pal, alpha=0.5)
plt.title('First Call Resolution by Day', fontsize=12)
plt.ylabel('Days', fontsize=12)
plt.xlabel('FCR', fontsize=12)
plt.show()

## 7. Conduct Regression Model in Pycaret

In this step, we set up the data for **training a regression model using PyCaret**. This step prepares and transforms the data so PyCaret can find the best model to **predict the FCR (First Call Resolution)** for each agent.

🚀 Once setup is done, PyCaret gives an interactive report **showing data types, number of features, transformations applied, and more**. It's like preparing your kitchen before cooking — **clean, sort, and set up ingredients before making the dish (model)!**

In [None]:
# Configure preprocessing and split the dataset into train (75%) and hold-out (25%) sets.
reg_fcr = setup(data=dataset,
                target='Actual Value',
                session_id=786,
                transformation=True,
                normalize=False,
                train_size=0.75,
                numeric_features=['Sunday', 'Monday', 'Saturday', 'Thursday', 'Tuesday', 'Wednesday', 'Friday',
                                  'Total number of calls', 'CSat', 'total coaching', 'total coaching improved', 'FCR Week before'],
                remove_outliers=True,
                outliers_threshold=0.05,
                remove_multicollinearity=True,
                multicollinearity_threshold=0.9,
                feature_selection=True,
                ignore_features=['Agent_ID'],
                polynomial_features=True,
                feature_selection_method='classic',
                pca=True)


In [None]:
# Below is a list of models that PyCaret can use for regression. The ID of each model can be used to include or exclude it in functions such as compare_models().
models()

ID,Name,Reference,Turbo
lr,Linear Regression,sklearn.linear_model._base.LinearRegression,True
lasso,Lasso Regression,sklearn.linear_model._coordinate_descent.Lasso,True
ridge,Ridge Regression,sklearn.linear_model._ridge.Ridge,True
en,Elastic Net,sklearn.linear_model._coordinate_descent.Elast...,True
lar,Least Angle Regression,sklearn.linear_model._least_angle.Lars,True
llar,Lasso Least Angle Regression,sklearn.linear_model._least_angle.LassoLars,True
omp,Orthogonal Matching Pursuit,sklearn.linear_model._omp.OrthogonalMatchingPu...,True
br,Bayesian Ridge,sklearn.linear_model._bayes.BayesianRidge,True
ard,Automatic Relevance Determination,sklearn.linear_model._bayes.ARDRegression,False
par,Passive Aggressive Regressor,sklearn.linear_model._passive_aggressive.Passi...,True


In [None]:
# compare_models() can be called without assigning its result. Here we keep the five best models (n_select=5) in the top5 variable so we can use them later for stacking and blending. We exclude the 'ransac' (Random Sample Consensus) and 'knn' (K-Nearest Neighbors) models, rank the results by RMSE, and reduce the number of folds from the default of 10 to 5.
top5 = compare_models(n_select=5, exclude=['ransac', 'knn'], sort='RMSE', fold=5)

ID,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
dummy,Dummy Regressor,4.4985,37.5842,5.9886,-0.0788,0.0652,0.0497,0.096
omp,Orthogonal Matching Pursuit,4.6248,40.6462,6.2665,-0.1999,0.0681,0.0511,0.092
lightgbm,Light Gradient Boosting Machine,4.9516,41.2393,6.298,-0.1984,0.0683,0.0542,0.112
llar,Lasso Least Angle Regression,5.1937,53.6806,7.1933,-0.8474,0.0788,0.0567,0.092
lar,Least Angle Regression,5.1937,53.6806,7.1933,-0.8474,0.0788,0.0567,0.094
en,Elastic Net,5.1937,53.6806,7.1933,-0.8474,0.0788,0.0567,0.094
br,Bayesian Ridge,5.1937,53.6806,7.1933,-0.8474,0.0788,0.0567,0.106
lasso,Lasso Regression,5.1937,53.6806,7.1933,-0.8474,0.0788,0.0567,1.006
lr,Linear Regression,5.1937,53.6806,7.1933,-0.8474,0.0788,0.0567,1.144
ridge,Ridge Regression,5.1937,53.6806,7.1933,-0.8474,0.0788,0.0567,0.714



In [None]:
# You may want to include the compare_models() results in a report, but the yellow highlighting in the styled table can be hard to read in a screenshot. PyCaret's pull() function returns the latest results as a plain DataFrame, which we can then sort (here by RMSE in ascending order).
pull().sort_values(by='RMSE', ascending=True)

ID,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
dummy,Dummy Regressor,4.4985,37.5842,5.9886,-0.0788,0.0652,0.0497,0.096
omp,Orthogonal Matching Pursuit,4.6248,40.6462,6.2665,-0.1999,0.0681,0.0511,0.092
lightgbm,Light Gradient Boosting Machine,4.9516,41.2393,6.298,-0.1984,0.0683,0.0542,0.112
ridge,Ridge Regression,5.1937,53.6806,7.1933,-0.8474,0.0788,0.0567,0.714
lasso,Lasso Regression,5.1937,53.6806,7.1933,-0.8474,0.0788,0.0567,1.006
br,Bayesian Ridge,5.1937,53.6806,7.1933,-0.8474,0.0788,0.0567,0.106
lr,Linear Regression,5.1937,53.6806,7.1933,-0.8474,0.0788,0.0567,1.144
lar,Least Angle Regression,5.1937,53.6806,7.1933,-0.8474,0.0788,0.0567,0.094
llar,Lasso Least Angle Regression,5.1937,53.6806,7.1933,-0.8474,0.0788,0.0567,0.092
en,Elastic Net,5.1937,53.6806,7.1933,-0.8474,0.0788,0.0567,0.094
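
The metric columns in the table above can be computed by hand, which helps when interpreting them. The sketch below uses the five 'Actual Value' entries from the dataset preview together with hypothetical predictions (the predicted numbers are made up for illustration, not produced by any model in this lab).

```python
import numpy as np

# 'Actual Value' for the first five agents shown in the dataset preview.
actual = np.array([85.71, 100.0, 80.0, 94.44, 100.0])
# Hypothetical model predictions for the same agents.
predicted = np.array([90.0, 95.0, 88.0, 93.0, 96.0])

errors = actual - predicted
mae = np.mean(np.abs(errors))             # Mean Absolute Error
mse = np.mean(errors ** 2)                # Mean Squared Error
rmse = np.sqrt(mse)                       # Root Mean Squared Error
mape = np.mean(np.abs(errors / actual))   # Mean Absolute Percentage Error (as a fraction)

# R2 compares the model's squared error with that of always predicting the mean.
ss_total = np.sum((actual - actual.mean()) ** 2)
r2 = 1 - np.sum(errors ** 2) / ss_total

# A constant "predict the mean" baseline scores exactly 0 on the data it was averaged from.
baseline = np.full_like(actual, actual.mean())
r2_baseline = 1 - np.sum((actual - baseline) ** 2) / ss_total

print(f"MAE={mae:.4f} MSE={mse:.4f} RMSE={rmse:.4f} MAPE={mape:.4f} R2={r2:.4f}")
print(f"R2 of mean baseline: {r2_baseline:.4f}")
```

An R2 below zero therefore means the model's squared error is larger than that of a constant mean prediction on the evaluated data, which is worth keeping in mind when reading the R2 column.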


Now that we've compared the candidate models, it's time to **fine-tune the top-performing models** to improve their performance further.

✅ What we are doing:
* Create a list tuned_top5 to store our tuned models.

* Loop through the top 5 models identified earlier.

* Use tune_model() to optimize each model for best performance.

* Handle exceptions in case tuning fails for a model (for example, a model with no tunable hyperparameters).

* Store the tuned model, or the original if tuning fails.

**📌 Result:** A list of tuned models (or the originals where tuning failed), ready to be blended and stacked.

In [None]:
# Create a list to store tuned models
tuned_top5 = []

# Iterate through top5 models and tune those that support it
for model in top5:
    try:
        tuned_model = tune_model(model, n_iter=120, optimize='RMSE', fold=5)
        tuned_top5.append(tuned_model)
    except ValueError as e:
        print(f"Skipping tuning for {type(model).__name__} - {str(e)}")
        tuned_top5.append(model)  # Add original model if tuning not supported

print("\nTuned models (or original if tuning not supported):")
for model in tuned_top5:
    print(type(model).__name__)



In [None]:
# Blending is an ensemble method that combines different machine learning algorithms. For regression, the blended prediction is the average of the individual models' predictions (majority voting applies to classification). Let's build a blending model from our top 5 models and evaluate the results.
blender_specific = blend_models(estimator_list=tuned_top5, fold=5, optimize='RMSE', choose_better=False)

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,3.3478,24.5022,4.95,0.0116,0.0538,0.036
1,4.3667,31.6857,5.629,-0.2434,0.0603,0.0469
2,5.0781,59.8878,7.7387,-0.0755,0.0874,0.0601
3,4.9074,35.6489,5.9707,-0.6761,0.0626,0.0515
4,4.8027,44.3548,6.6599,-0.0086,0.073,0.0537
Mean,4.5005,39.2159,6.1897,-0.1984,0.0674,0.0496
Std,0.6225,12.1627,0.9508,0.2551,0.0117,0.008
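
The blended model shown in this lab is a scikit-learn `VotingRegressor`, which averages the predictions of its member models. A minimal NumPy sketch of that averaging step, using hypothetical predictions from three models for four agents:

```python
import numpy as np

# Hypothetical predictions: one row per model, one column per agent.
model_predictions = np.array([
    [92.0, 95.0, 90.0, 97.0],   # model A
    [94.0, 93.0, 91.0, 96.0],   # model B
    [93.0, 94.0, 92.0, 98.0],   # model C
])

# With equal weights, the blended prediction is the mean across models.
blended = model_predictions.mean(axis=0)
print(blended)
```

Averaging tends to cancel out errors that the individual models do not share, but it cannot help when all members make the same mistake.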



In [None]:
# Below is a view of the model parameters.
blender_specific

VotingRegressor(estimators=[('Dummy Regressor', DummyRegressor()),
                            ('Orthogonal Matching Pursuit',
                             OrthogonalMatchingPursuit(n_nonzero_coefs=2)),
                            ('Light Gradient Boosting Machine',
                             LGBMRegressor(bagging_fraction=1.0, bagging_freq=6,
                                           feature_fraction=0.7,
                                           learning_rate=0.001,
                                           min_child_samples=16,
                                           min_split_gain=0, n_estimators=200,
                                           n_jobs=-1, num_leaves=2,
                                           random_state=786, reg_alpha=0.5,
                                           reg_lambda=5)),
                            ('Lasso Least Angle Regression',
                             LassoLars(random_state=786)),
                            ('Least Angle Regression', L

In [None]:
# Stacking models is an ensemble method of using meta learning, where a meta model is created using multiple base estimators to generate the final prediction. Let's try building a stacking model from our top 5 models and evaluate the results.
stacker_specific = stack_models(estimator_list=tuned_top5[1:], meta_model=tuned_top5[0], fold=5, optimize='RMSE', choose_better=False)

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,4.1201,28.2817,5.3181,-0.1408,0.0573,0.0438
1,3.6623,25.4974,5.0495,-0.0005,0.0542,0.0396
2,5.3236,67.4885,8.2151,-0.212,0.0923,0.0632
3,4.2083,21.6214,4.6499,-0.0166,0.0487,0.0443
4,5.178,45.0323,6.7106,-0.0241,0.0737,0.0578
Mean,4.4985,37.5842,5.9886,-0.0788,0.0652,0.0497
Std,0.6433,16.954,1.3117,0.0832,0.0159,0.0091
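
The stacked model in this lab is a scikit-learn `StackingRegressor`. As a rough sketch of the same idea outside PyCaret, the example below stacks two base models under a linear meta model on synthetic data; the choice of base models, the meta model, and the generated dataset are all assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (not the lab dataset).
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=786)

# Base models produce cross-validated predictions; the meta model learns
# how to combine those predictions into the final output.
stack = StackingRegressor(
    estimators=[
        ('ridge', Ridge()),
        ('tree', DecisionTreeRegressor(max_depth=3, random_state=786)),
    ],
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X, y)

sample_predictions = stack.predict(X[:3])
print(sample_predictions)
```

Unlike blending, the combination weights here are learned by the meta model, so the quality of the stack depends on which model is chosen for that role.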



In [None]:
# Below is a view of the model parameters.
stacker_specific

StackingRegressor(cv=5,
                  estimators=[('Orthogonal Matching Pursuit',
                               OrthogonalMatchingPursuit(n_nonzero_coefs=2)),
                              ('Light Gradient Boosting Machine',
                               LGBMRegressor(bagging_fraction=1.0,
                                             bagging_freq=6,
                                             feature_fraction=0.7,
                                             learning_rate=0.001,
                                             min_child_samples=16,
                                             min_split_gain=0, n_estimators=200,
                                             n_jobs=-1, num_leaves=2,
                                             random_state=786, reg_alpha=0.5,
                                             reg_lambda=5)),
                              ('Lasso Least Angle Regression',
                               LassoLars(random_state=786)),
                            

## 8. Evaluate Results and Finalize Model

This step involves evaluating the performance of the trained models by **analyzing various diagnostic plots and generating predictions**. Here's a breakdown:

### Plotting Model Diagnostics:

* The plot_model() function is used to generate diagnostic plots to visually assess the performance of the models.
* For example: Cook's Distance and t-SNE plots to check for outliers and visualize the data distribution.
* Residuals, Prediction Error, and Learning Plots to compare the performance of two models (blender and stacker) side by side.
* These plots help identify issues like overfitting, underfitting, or irregularities in model predictions.
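
To show what the Cook's Distance plot measures, the sketch below computes Cook's distance by hand for a simple linear regression with NumPy. The eight data points are hypothetical, with the last one deliberately placed far from the trend; a common rule of thumb treats points with a distance above 4/n as influential.

```python
import numpy as np

# Hypothetical data following y ≈ x, except for the last point.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.1, 2.0, 2.9, 4.2, 5.0, 6.1, 6.9, 20.0])

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

# Hat matrix: maps observed y to fitted y; its diagonal is each point's leverage.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)
residuals = y - H @ y

# Residual variance estimate, then Cook's distance for each observation.
s2 = residuals @ residuals / (n - p)
cooks_d = residuals ** 2 / (p * s2) * leverage / (1 - leverage) ** 2

most_influential = int(np.argmax(cooks_d))
print(np.round(cooks_d, 3))
print("most influential observation:", most_influential, "| threshold 4/n =", 4 / n)
```

A point scores high only when it has both a large residual and high leverage, which is why an unusual agent at the edge of the feature range can sway the fitted model more than one near the middle.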

In [None]:
# We can use Pycaret's built in plot_model() function to generate side-by-side plots: the Cook's Distance Outliers and t-SNE Manifold charts.
fig = plt.figure(figsize=(20,30))
ax = fig.add_subplot(5,2,1)
plot_model(blender_specific, plot='cooks', save=True, verbose=False, scale=1.1)
ax = fig.add_subplot(5,2,2)
plot_model(blender_specific, plot='manifold', save=True, verbose=False, scale=1.1)
plt.savefig('plots_cooks_and_manifold.png', dpi=300, pad_inches=0.25)
plt.show()


### Prediction:

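The predict_model() function generates **predicted values from the trained models** (blender_specific and stacker_specific).

When called without a `data` argument, it scores the hold-out set that setup() reserved (25% of the data here, given train_size=0.75), which shows how well the models generalize to data they were not trained on.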

In [None]:
# We can use Pycaret's built in plot_model() function to generate side-by-side plots: the Residuals chart, Prediction Error and Cross Validation (learning) charts. Let's compare the Blend and Stack model plots in a side-by-side comparison.
fig = plt.figure(figsize=(25,20))
ax = fig.add_subplot(3,2,1)
plot_model(blender_specific, plot='residuals', save=True, verbose=False, scale=1.1)
ax = fig.add_subplot(3,2,2)
plot_model(stacker_specific, plot='residuals', save=True, verbose=False, scale=1.1)
ax = fig.add_subplot(3,2,3)
plot_model(blender_specific, plot='error', save=True, verbose=False, scale=1.1)
ax = fig.add_subplot(3,2,4)
plot_model(stacker_specific, plot='error', save=True, verbose=False, scale=1.1)
ax = fig.add_subplot(3,2,5)
plot_model(blender_specific, plot='learning', save=True, verbose=False, scale=1.1)
ax = fig.add_subplot(3,2,6)
plot_model(stacker_specific, plot='learning', save=True, verbose=False, scale=1.1)
plt.savefig('plots_blender_vs_stacker.png', dpi=300, pad_inches=0.25)
plt.show()


In [None]:
# We can execute the predict_model() function to use the model to generate the predicted values.
pred_tuned_blender = predict_model(blender_specific)

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Voting Regressor,4.1873,24.8257,4.9825,-0.2582,0.0523,0.0443




In [None]:
# We can execute the predict_model() function to use the model to generate the predicted values.
pred_tuned_stacker = predict_model(stacker_specific)

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Stacking Regressor,3.9484,20.2534,4.5004,-0.0265,0.0473,0.0415




### Interactive Prediction with Gradio:

Gradio is used to create an **interactive interface for making real-time predictions**. Users enter an agent's feature values, and the trained model returns the predicted FCR.

This user-friendly interface makes it **easy to deploy the model for real-world applications**.

In [None]:
import gradio as gr

# Define the prediction function using the `blender_specific` model
def make_prediction(Friday, Monday, Saturday, Sunday, Thursday, Tuesday, Wednesday, Site, Function_Field, tenure, Total_number_of_calls, Assistance, Recommend, CSat, total_coaching, total_coaching_improved, FCR_Week_before):
    input_data = pd.DataFrame({
        'Friday': [Friday],
        'Monday': [Monday],
        'Saturday': [Saturday],
        'Sunday': [Sunday],
        'Thursday': [Thursday],
        'Tuesday': [Tuesday],
        'Wednesday': [Wednesday],
        'Site': [Site],
        'Function_Field': [Function_Field],
        'tenure': [tenure],
        'Total number of calls': [Total_number_of_calls],
        'Assistance': [Assistance],
        'Recommend': [Recommend],
        'CSat': [CSat],
        'total coaching': [total_coaching],
        'total coaching improved': [total_coaching_improved],
        'FCR Week before': [FCR_Week_before]
    })
    # Ensure the input data has the same columns and types as the training data
    input_data = input_data.astype({
        'Friday': 'float32',
        'Monday': 'float32',
        'Saturday': 'float32',
        'Sunday': 'float32',
        'Thursday': 'float32',
        'Tuesday': 'float32',
        'Wednesday': 'float32',
        'Site': 'category',
        'Function_Field': 'category',
        'tenure': 'int8',
        'Total number of calls': 'int32',
        'Assistance': 'float32',
        'Recommend': 'float32',
        'CSat': 'float32',
        'total coaching': 'int8',
        'total coaching improved': 'int8',
        'FCR Week before': 'float32'
    })
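    # Run the raw inputs through PyCaret's full preprocessing pipeline with predict_model().
    # Calling blender_specific.predict() directly would skip the transformations applied in setup().
    predictions = predict_model(blender_specific, data=input_data)
    # The predicted value is appended as the last column of the returned DataFrame.
    return float(predictions.iloc[0, -1])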

# Create Gradio interface
iface = gr.Interface(
    fn=make_prediction,
    inputs=[
        gr.Number(value=95.0, label="Friday"),
        gr.Number(value=90.0, label="Monday"),
        gr.Number(value=92.0, label="Saturday"),
        gr.Number(value=94.0, label="Sunday"),
        gr.Number(value=93.0, label="Thursday"),
        gr.Number(value=91.0, label="Tuesday"),
        gr.Number(value=92.0, label="Wednesday"),
        gr.Dropdown(choices=["Kuala Lumpur", "Santiago"], value="Kuala Lumpur", label="Site"),
        gr.Dropdown(choices=["Agent"], value="Agent", label="Function_Field"),
        gr.Number(value=10, label="tenure"),
        gr.Number(value=150, label="Total number of calls"),
        gr.Number(value=95.0, label="Assistance"),
        gr.Number(value=94.0, label="Recommend"),
        gr.Number(value=4.8, label="CSat"),
        gr.Number(value=1, label="total coaching"),
        gr.Number(value=1, label="total coaching improved"),
        gr.Number(value=90.0, label="FCR Week before")
    ],
    outputs="number",
    title="Employee Performance Prediction",
    description="Enter the values to predict the employee's performance."
)

# Launch the Gradio app
iface.launch()




