# Machine Learning Model Selection for Customer Inquiry Classification
To accurately classify customer inquiries in the banking industry, we implemented multiple machine learning models to find the best-performing algorithm. The models were trained on SBERT embeddings generated from customer inquiries and optimized using hyperparameter tuning.

## Imports

In [1]:
!pip install --upgrade scikit-learn xgboost lightgbm
!pip install catboost
!pip install scikeras tensorflow



import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.preprocessing import LabelEncoder
from collections import Counter
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier
from sklearn.neural_network import MLPClassifier

from scikeras.wrappers import KerasClassifier

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Model

from sklearn.metrics import classification_report

import joblib
import os






In [2]:
# Mount to drive
from google.colab import drive
drive.mount('/content/drive/', force_remount=True)

Mounted at /content/drive/


## Load Data

In [3]:
data_path = 'drive/MyDrive/Colab_Notebooks/financial_customer_inq/data/final_data.csv'

embeddings_path = 'drive/MyDrive/Colab_Notebooks/financial_customer_inq/data/embeddings.npy'

In [4]:
# Load the final dataset
df = pd.read_csv(data_path)

# Load the precomputed SBERT embeddings
embeddings = np.load(embeddings_path)

In [5]:
# Display the size of the dataset
print(f"Number of features: {df.shape[1]}\n\n")
df.head()

Number of features: 6




| | conversation_id | utterance | sentiment_score | cluster | cleaned_utterance | topic |
|---|---|---|---|---|---|---|
| 0 | acs-00051ccd-7f2b-4b4d-85dc-f716c7e9f34f-1 | agent: hey may help today customer: hi want ch... | 0.9896 | 0 | : hey may today : want change address associ... | credit card charge/address update |
| 1 | acs-000e0b37-c8f0-46fe-9ffb-0c727e888339-1 | agent: hellohow may help today customer: hi ne... | 0.9977 | 0 | : hellohow may today : need remove unwanted... | credit card charge/address update |
| 2 | acs-000efddb-1d74-4422-808e-1b4ccbf988f1-1 | customer: good morning agent: good morning may... | 0.9926 | 1 | : good : good may today : lost card safe mo... | lost or stolen credit card |
| 3 | acs-001df2c1-5318-4715-8d99-7ece76c95fa2-1 | agent: hello customer: hello jamie customer: n... | 0.9961 | 0 | : : jamie : najma : good may assist today :... | credit card charge/address update |
| 4 | acs-002c64ef-f434-41cd-8c36-39c4d8b9cd30-1 | customer: hi agent: hello may help today custo... | 0.9943 | 2 | : : may today : want transfer money another... | money transfer request |


## Baseline Score

In [6]:
# Define X (features) and y (target)
X = embeddings

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(df["topic"])  # Convert topics to numbers

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)
print(f"X shape ---------- {X.shape}")
print(f"y shape ---------- {y.shape}")

X shape ---------- (8832, 384)
y shape ---------- (8832,)
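
As a rough illustration of what the label encoding step does: `LabelEncoder` sorts the distinct labels and assigns each one an integer code in that order, so the mapping is reproducible and can be reversed with `label_encoder.inverse_transform`. The sketch below mimics that behaviour in plain Python. The helper `fit_label_encoding` is hypothetical (it is not part of scikit-learn), and the topic strings are the three that appear in the preview above.

```python
def fit_label_encoding(labels):
    """Map each distinct label to an integer code, assigned in sorted order."""
    classes = sorted(set(labels))
    to_code = {label: code for code, label in enumerate(classes)}
    return classes, to_code


# Topic strings taken from the data preview; the helper itself is illustrative.
topics = [
    "credit card charge/address update",
    "lost or stolen credit card",
    "money transfer request",
    "credit card charge/address update",
]

classes, to_code = fit_label_encoding(topics)
encoded = [to_code[topic] for topic in topics]   # integer targets for the models
decoded = [classes[code] for code in encoded]    # back to readable topic names

print(encoded)
print(decoded == topics)
```

Keeping the fitted encoder (or the `classes` list in this sketch) alongside the saved models is what allows predicted integers to be turned back into topic names later.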


### Baseline Accuracy

In [7]:
# Calculate the frequency of each class in the topic
class_frequencies = Counter(y_train)  # Using y_train for baseline calculation

# Get the most frequent class
most_frequent_class = class_frequencies.most_common(1)[0][0]

# Calculate baseline accuracy
baseline_accuracy = class_frequencies[most_frequent_class] / len(y_train)

print(f"Baseline Accuracy: {baseline_accuracy}")

Baseline Accuracy: 0.35778985507246375
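
The cell above measures the share of the majority class inside `y_train`. Because the split is stratified, the share in `y_test` should be nearly identical, but a baseline is usually reported on held-out labels. A minimal sketch of that variant follows; `majority_baseline` is a hypothetical helper and the toy labels are made up for illustration.

```python
from collections import Counter


def majority_baseline(train_labels, test_labels):
    """Pick the most frequent training class and score it on the test labels."""
    majority_class, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_class)
    return majority_class, hits / len(test_labels)


# Toy labels (assumed values, not taken from the dataset).
toy_train = [2, 2, 2, 5, 5, 6]
toy_test = [2, 5, 6, 2]

majority, accuracy = majority_baseline(toy_train, toy_test)
print(f"Majority class: {majority}, baseline accuracy on test: {accuracy}")
```

In the notebook, the same helper could be called as `majority_baseline(y_train, y_test)`.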


## Fine Tuning with GridSearch

To improve the accuracy of our customer inquiry classifiers, we tuned hyperparameters with GridSearchCV. It evaluates every combination in a parameter grid with 5-fold cross-validation and keeps the best-scoring configuration for each algorithm.

We applied GridSearchCV to the following models:

- Logistic Regression
- Support Vector Machine (SVC)
- Random Forest
- LGBM Classifier

Gradient Boosting was also planned, but its search was cancelled because of project timing, so its grid is commented out below.

In [8]:
# Define hyperparameter grids
param_grids = {
    "LogisticRegression": {
        "C": [0.1, 1, 10, 100],
        "solver": ["lbfgs", "newton-cg", "saga"],
        "max_iter": [500, 1000]
    },
    "SVC": {
        "C": [0.1, 1, 10],
        "kernel": ["linear", "rbf"],
        "gamma": ["scale", "auto"]
    },
    "RandomForestClassifier": {
        "n_estimators": [50, 100],
        "max_depth": [None, 10, 20],
        "min_samples_split": [2, 5, 10]
    },
    # "GradientBoostingClassifier": {   # Had to cancel this model because of project timing
    #     "n_estimators": [100],
    #     "max_depth": [3, 5],
    #     "learning_rate": [0.05, 0.1]
    # },
    "LGBMClassifier": {
        "n_estimators": [100, 200],
        "max_depth": [3, 5],
        "learning_rate": [0.01, 0.1],
        "subsample": [0.7, 0.9],
        "colsample_bytree": [0.7, 0.9]
    }
}
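
Grid search cost grows multiplicatively with every added value, which is worth estimating before launching a run. The sketch below copies the four active grids and counts candidate configurations and cross-validation fits, assuming `cv=5` as used in the search loop. The names `grids` and `count_fits` are illustrative, chosen so they do not overwrite the notebook's own `param_grids`.

```python
from math import prod

# Copies of the active grids from the cell above (Gradient Boosting omitted).
grids = {
    "LogisticRegression": {
        "C": [0.1, 1, 10, 100],
        "solver": ["lbfgs", "newton-cg", "saga"],
        "max_iter": [500, 1000],
    },
    "SVC": {
        "C": [0.1, 1, 10],
        "kernel": ["linear", "rbf"],
        "gamma": ["scale", "auto"],
    },
    "RandomForestClassifier": {
        "n_estimators": [50, 100],
        "max_depth": [None, 10, 20],
        "min_samples_split": [2, 5, 10],
    },
    "LGBMClassifier": {
        "n_estimators": [100, 200],
        "max_depth": [3, 5],
        "learning_rate": [0.01, 0.1],
        "subsample": [0.7, 0.9],
        "colsample_bytree": [0.7, 0.9],
    },
}
cv_folds = 5  # assumed to match cv=5 in the GridSearchCV call


def count_fits(grid, folds):
    """Return (number of candidate configurations, number of CV fits)."""
    candidates = prod(len(values) for values in grid.values())
    return candidates, candidates * folds


fit_counts = {name: count_fits(grid, cv_folds) for name, grid in grids.items()}
for name, (candidates, fits) in fit_counts.items():
    print(f"{name}: {candidates} candidates x {cv_folds} folds = {fits} fits")
```

With its default `refit=True`, GridSearchCV adds one more fit per model to retrain the best configuration on the full training set.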


In [9]:
# Models to evaluate
models = {
    "LogisticRegression": LogisticRegression(multi_class="multinomial"),
    "SVC": SVC(),
    "RandomForestClassifier": RandomForestClassifier(),
    #"GradientBoostingClassifier": GradientBoostingClassifier(), # Had to cancel this model because of project timing
    "LGBMClassifier": LGBMClassifier(objective="multiclass")
}

In [10]:
# Define the directory path
save_dir = '/content/drive/MyDrive/Colab_Notebooks/financial_customer_inq/models/'

# Create the directory
os.makedirs(save_dir, exist_ok=True)

# Dictionary to store best models
best_models = {}

# Perform GridSearchCV for each model
for model_name, model in models.items():
    print(f"Running GridSearch for {model_name}...")
    grid_search = GridSearchCV(model, param_grids[model_name], cv=5, scoring="accuracy", n_jobs=-1)
    grid_search.fit(X_train, y_train)
    best_models[model_name] = grid_search.best_estimator_

    # Save model
    joblib.dump(grid_search.best_estimator_, os.path.join(save_dir, f"{model_name}_best_model.pkl"))

    # Print best parameters and score
    print(f"Best Parameters for {model_name}: {grid_search.best_params_}")
    print(f"Best Score: {grid_search.best_score_}\n")

Running GridSearch for LogisticRegression...




Best Parameters for LogisticRegression: {'C': 100, 'max_iter': 1000, 'solver': 'saga'}
Best Score: 0.9916968591460981

Running GridSearch for SVC...
Best Parameters for SVC: {'C': 10, 'gamma': 'scale', 'kernel': 'rbf'}
Best Score: 0.9935084079119877

Running GridSearch for RandomForestClassifier...
Best Parameters for RandomForestClassifier: {'max_depth': 20, 'min_samples_split': 2, 'n_estimators': 100}
Best Score: 0.9814320241691842

Running GridSearch for LGBMClassifier...




[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.047838 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 97920
[LightGBM] [Info] Number of data points in the train set: 6624, number of used features: 384
[LightGBM] [Info] Start training from score -5.854016
[LightGBM] [Info] Start training from score -4.062256
[LightGBM] [Info] Start training from score -1.027809
[LightGBM] [Info] Start training from score -3.515251
[LightGBM] [Info] Start training from score -3.369109
[LightGBM] [Info] Start training from score -1.399668
[LightGBM] [Info] Start training from score -1.167023
Best Parameters for LGBMClassifier: {'colsample_bytree': 0.9, 'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 200, 'subsample': 0.7}
Best Score: 0.9874703300461724



## XGBoost Model

In [11]:
# Define XGBoost model
xgb = XGBClassifier(n_estimators=100, max_depth=5, learning_rate=0.1, objective="multi:softmax")

# Train model
xgb.fit(X_train, y_train)

# Predict and evaluate
y_pred_xgb = xgb.predict(X_test)

# Save model
joblib.dump(xgb, os.path.join(save_dir, "xgb_model.pkl"))

print("XGBoost Accuracy:", xgb.score(X_test, y_test))

XGBoost Accuracy: 0.9818840579710145


## CatBoost Model

In [12]:
# Define CatBoost model
cat = CatBoostClassifier(task_type="CPU", iterations=500, depth=6, learning_rate=0.1, loss_function="MultiClass")

# Train model
cat.fit(X_train, y_train)

# Predict and evaluate
y_pred_cat = cat.predict(X_test)

# Save model
joblib.dump(cat, os.path.join(save_dir, "cat_model.pkl"))

print("CatBoost Accuracy:", cat.score(X_test, y_test))

0:	learn: 1.5651640	total: 1.43s	remaining: 11m 55s
1:	learn: 1.3322840	total: 2.38s	remaining: 9m 52s
2:	learn: 1.1710979	total: 3.36s	remaining: 9m 16s
3:	learn: 1.0238261	total: 4.35s	remaining: 8m 59s
4:	learn: 0.9160066	total: 5.36s	remaining: 8m 50s
5:	learn: 0.8282993	total: 6.16s	remaining: 8m 27s
6:	learn: 0.7522055	total: 6.72s	remaining: 7m 53s
7:	learn: 0.6850857	total: 7.3s	remaining: 7m 29s
8:	learn: 0.6275024	total: 7.88s	remaining: 7m 9s
9:	learn: 0.5768769	total: 8.47s	remaining: 6m 55s
10:	learn: 0.5352684	total: 9.07s	remaining: 6m 43s
11:	learn: 0.4950167	total: 9.65s	remaining: 6m 32s
12:	learn: 0.4613123	total: 10.2s	remaining: 6m 23s
13:	learn: 0.4280224	total: 10.8s	remaining: 6m 14s
14:	learn: 0.4026869	total: 11.4s	remaining: 6m 7s
15:	learn: 0.3765264	total: 11.9s	remaining: 6m 1s
16:	learn: 0.3541340	total: 12.5s	remaining: 5m 55s
17:	learn: 0.3343917	total: 13.1s	remaining: 5m 51s
18:	learn: 0.3157249	total: 13.7s	remaining: 5m 47s
19:	learn: 0.2980298	tota

## MLPClassifier

In [13]:
# Define neural network MLPClassifier
mlp = MLPClassifier(hidden_layer_sizes=(256, 128, 64), activation="relu", solver="adam", max_iter=500)

# Train model
mlp.fit(X_train, y_train)

# Predict and evaluate
y_pred_mlp = mlp.predict(X_test)

# Save model
joblib.dump(mlp, os.path.join(save_dir, "mlp_model.pkl"))

print("MLPClassifier Accuracy:", mlp.score(X_test, y_test))


MLPClassifier Accuracy: 0.9895833333333334


## Deep Neural Network (DNN)

A Deep Neural Network (DNN) is a stack of fully connected layers that can learn non-linear relationships in data. In this project, we use a DNN to classify customer inquiries from the SBERT embeddings of each conversation.

**Key benefits**
- Can capture subtle differences between customer inquiries that linear models may miss.
- Works directly on the 384-dimensional SBERT embeddings without dimensionality reduction.
- Can be retrained as the volume of customer interactions grows.
- Can be tuned with RandomizedSearchCV (learning rate, activation functions, batch size); the run below trains a single fixed configuration.
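
For a sense of scale, the number of trainable parameters in a fully connected network can be worked out by hand: each dense layer contributes `inputs * units` weights plus `units` biases. The sketch below does this for the architecture defined in the next cell (384 inputs, hidden layers of 256, 128 and 64 units). The output size of 7 is an assumption, inferred from the seven class priors LightGBM logs earlier; `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(layer_sizes):
    """Sum weights and biases over consecutive fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# 384-dim SBERT input, three hidden layers, 7 output classes (assumed).
layer_sizes = [384, 256, 128, 64, 7]
total_params = dense_param_count(layer_sizes)

print(f"Trainable parameters: {total_params}")
```

On a Keras model, `model.summary()` reports the same kind of total, which makes it a convenient check against this hand calculation.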

In [18]:
# Define DNN Model Using Functional API
def build_dnn_model(input_dim, num_classes):
    inputs = layers.Input(shape=(input_dim,))  # Explicit Input layer
    x = layers.Dense(256, activation="relu")(inputs)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)  # Output layer

    model = Model(inputs=inputs, outputs=outputs)  # Define Model

    # Compile model
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

# Define model
input_dim = X_train.shape[1]  # Number of input features (SBERT embedding size)
num_classes = len(set(y_train))  # Number of classes
dnn_model = build_dnn_model(input_dim, num_classes)

# Train DNN
dnn_model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=32)

# Evaluate performance
test_loss, test_acc = dnn_model.evaluate(X_test, y_test)
print("Test Accuracy:", test_acc)


Epoch 1/10
207/207 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.8234 - loss: 0.7450 - val_accuracy: 0.9769 - val_loss: 0.0722
Epoch 2/10
207/207 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9816 - loss: 0.0592 - val_accuracy: 0.9819 - val_loss: 0.0608
Epoch 3/10
207/207 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9863 - loss: 0.0419 - val_accuracy: 0.9823 - val_loss: 0.0524
Epoch 4/10
207/207 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9878 - loss: 0.0312 - val_accuracy: 0.9769 - val_loss: 0.0754
Epoch 5/10
207/207 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9884 - loss: 0.0311 - val_accuracy: 0.9769 - val_loss: 0.0767
Epoch 6/10
207/207 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9920 - loss: 0.0203 - val_accuracy: 0.9805 - val_loss: 0.0521
Epoch 7/10
207/207

In [19]:
# Save model
joblib.dump(dnn_model, os.path.join(save_dir, "dnn_model.pkl"))

['/content/drive/MyDrive/Colab_Notebooks/financial_customer_inq/models/dnn_model.pkl']

In [25]:
# Define the directory path
save_dir = '/content/drive/MyDrive/Colab_Notebooks/financial_customer_inq/models/'

# List of models to load
model_filenames = {
    "XGBoost": "xgb_model.pkl",
    "CatBoost": "cat_model.pkl",
    "MLP": "mlp_model.pkl",
    "DNN": "dnn_model.pkl",
    "SVC": "SVC_best_model.pkl",
    "Logistic Regression": "LogisticRegression_best_model.pkl",
    "LightGBM": "LGBMClassifier_best_model.pkl",
    "Random Forest": "RandomForestClassifier_best_model.pkl"
}

# Load models
models = {name: joblib.load(os.path.join(save_dir, filename)) for name, filename in model_filenames.items()}

# Dictionary to store classification reports
classification_reports = {}

# Evaluate each model
for model_name, model in models.items():
    y_pred = model.predict(X_test)

    # Convert probability outputs to class labels if necessary
    if y_pred.ndim > 1 and y_pred.shape[1] > 1:  # Check if model outputs probabilities
        y_pred = np.argmax(y_pred, axis=1)  # Convert to class labels

    # Generate classification report
    report = classification_report(y_test, y_pred, output_dict=True)
    classification_reports[model_name] = report["weighted avg"]  # Store only weighted avg metrics

# Convert results to DataFrame
df_results = pd.DataFrame.from_dict(classification_reports, orient="index")

# Display results
print("\nModel Performance Comparison:")
print(df_results)

# Save results to CSV for reference
df_results.to_csv("model_classification_reports.csv", index=True)

print("\nClassification reports saved as 'model_classification_reports.csv'")



69/69 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step





Model Performance Comparison:
                     precision    recall  f1-score  support
XGBoost               0.981708  0.981884  0.981364   2208.0
CatBoost              0.985652  0.985507  0.985375   2208.0
MLP                   0.989474  0.989583  0.989492   2208.0
DNN                   0.986122  0.985960  0.985889   2208.0
SVC                   0.991463  0.991395  0.991420   2208.0
Logistic Regression   0.989109  0.989130  0.989112   2208.0
LightGBM              0.985153  0.985054  0.984708   2208.0
Random Forest         0.975099  0.978261  0.975759   2208.0

Classification reports saved as 'model_classification_reports.csv'




## Summary

- Trained eight classification models for customer inquiry categorization: XGBoost, CatBoost, MLP, DNN, SVC, Logistic Regression, LightGBM, and Random Forest.
- Tuned Logistic Regression, SVC, Random Forest, and LightGBM with GridSearchCV; the remaining models were trained with fixed hyperparameters.
- Evaluated models using accuracy, precision, recall, and F1-score.
- The SVC tuned with GridSearchCV performed best, with a mean cross-validated accuracy of 0.9935 and a weighted F1-score of 0.9914 on the test set.
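
The comparison table keeps only the "weighted avg" row of each classification report. That row is the per-class score averaged with class support as the weight, so large classes dominate it; a macro average treats every class equally and can expose weak performance on rare topics. The sketch below shows the difference on made-up per-class F1 scores. The numbers and the helper `weighted_average` are illustrative assumptions, not results from this notebook.

```python
def weighted_average(per_class_scores, supports):
    """Average per-class scores, weighting each class by its sample count."""
    total = sum(supports)
    return sum(score * n for score, n in zip(per_class_scores, supports)) / total


# Hypothetical per-class F1 scores and supports for a 3-class problem.
toy_f1 = [0.5, 1.0, 0.75]
toy_support = [2, 6, 2]

weighted_f1 = weighted_average(toy_f1, toy_support)
macro_f1 = sum(toy_f1) / len(toy_f1)

print(f"weighted avg F1: {weighted_f1:.2f}")
print(f"macro avg F1:    {macro_f1:.2f}")
```

In the evaluation loop, storing `report["macro avg"]` next to `report["weighted avg"]` would give both views for each model.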
