# Sentiment Analysis with MLflow Experiment Tracking

This notebook demonstrates how to use MLflow for:
- Experiment tracking
- Hyperparameter comparison
- Model logging and registration

- Dataset: Flipkart Reviews (Badminton Product)
- Metric: F1-score


In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.metrics import f1_score

import mlflow
import mlflow.sklearn

import warnings
# Silences all warnings, including MLflow and scikit-learn deprecation notices.
warnings.filterwarnings('ignore')

plt.style.use('bmh')
sns.set_style('darkgrid')

In [2]:
df = pd.read_csv("reviews_badminton.csv")
df.head()


|   | Reviewer Name | Review Title | Place of Review | Up Votes | Down Votes | Month | Review text | Ratings |
|---|---|---|---|---|---|---|---|---|
| 0 | Kamal Suresh | Nice product | Certified Buyer, Chirakkal | 889.0 | 64.0 | Feb 2021 | Nice product, good quality, but price is now r... | 4 |
| 1 | Flipkart Customer | Don't waste your money | Certified Buyer, Hyderabad | 109.0 | 6.0 | Feb 2021 | They didn't supplied Yonex Mavis 350. Outside ... | 1 |
| 2 | A. S. Raja Srinivasan | Did not meet expectations | Certified Buyer, Dharmapuri | 42.0 | 3.0 | Apr 2021 | Worst product. Damaged shuttlecocks packed in ... | 1 |
| 3 | Suresh Narayanasamy | Fair | Certified Buyer, Chennai | 25.0 | 1.0 | NaN | Quite O. K. , but nowadays the quality of the... | 3 |
| 4 | ASHIK P A | Over priced | NaN | 147.0 | 24.0 | Apr 2016 | Over pricedJust ₹620 ..from retailer.I didn'... | 1 |


In [3]:
df.shape
df.columns


(8518, 8)

Index(['Reviewer Name', 'Review Title', 'Place of Review', 'Up Votes',
       'Down Votes', 'Month', 'Review text', 'Ratings'],
      dtype='object')

In [4]:
def rating_to_sentiment(rating):
    if rating >= 4:
        return 1
    elif rating <= 2:
        return 0
    else:
        return None

df["sentiment"] = df["Ratings"].apply(rating_to_sentiment)
df = df.dropna(subset=["sentiment"])
df["sentiment"] = df["sentiment"].astype(int)
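
The labelling rule above can also be expressed without a row-wise `apply`. The sketch below is a vectorized alternative using `numpy.select`; `ratings_to_sentiment` is a hypothetical helper, not part of the notebook. As with `rating_to_sentiment`, 3-star ratings and missing ratings map to neither class and are dropped.

```python
import numpy as np
import pandas as pd


def ratings_to_sentiment(ratings: pd.Series) -> pd.Series:
    """Map 4-5 stars to 1 (positive) and 1-2 stars to 0 (negative); drop every other value."""
    # NaN fails both comparisons, so missing ratings fall through to the -1 default.
    labels = np.select([ratings >= 4, ratings <= 2], [1, 0], default=-1)
    keep = labels != -1
    return pd.Series(labels[keep], index=ratings.index[keep], name="sentiment")


print(ratings_to_sentiment(pd.Series([5, 4, 3, 2, 1, np.nan])).tolist())  # → [1, 1, 0, 0]
```

On the reviews DataFrame it could be applied as `labels = ratings_to_sentiment(df["Ratings"])` followed by `df = df.loc[labels.index].assign(sentiment=labels)`, which yields an integer column directly.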


In [5]:
df["text"] = df["Review Title"].fillna("") + " " + df["Review text"].fillna("")
df = df[["text", "sentiment"]]
df.head()


|   | text | sentiment |
|---|---|---|
| 0 | Nice product Nice product, good quality, but p... | 1 |
| 1 | Don't waste your money They didn't supplied Yo... | 0 |
| 2 | Did not meet expectations Worst product. Damag... | 0 |
| 4 | Over priced Over pricedJust ₹620 ..from reta... | 0 |
| 5 | Mind-blowing purchase Good quality product. De... | 1 |


In [6]:
X = df["text"]
y = df["sentiment"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
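
`stratify=y` makes the train and test splits keep the same positive/negative ratio as the labelled data, so the held-out F1-score is computed on a test set that is representative of the class balance. A toy check with a hypothetical 80/20 label vector (not the Flipkart data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 positives, 20 negatives.
y = np.array([1] * 80 + [0] * 20)
X = np.arange(len(y)).reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The share of positives is identical in both splits.
print(y_train.mean(), y_test.mean())  # → 0.8 0.8
```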


In [7]:
mlflow.set_experiment("Flipkart Sentiment Analysis - MLflow")


2026/02/08 00:09:28 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2026/02/08 00:09:28 INFO mlflow.store.db.utils: Updating database tables
2026/02/08 00:09:28 INFO alembic.runtime.migration: Context impl SQLiteImpl.
2026/02/08 00:09:28 INFO alembic.runtime.migration: Will assume non-transactional DDL.
2026/02/08 00:09:28 INFO alembic.runtime.migration: Running upgrade  -> 451aebb31d03, add metric step
2026/02/08 00:09:28 INFO alembic.runtime.migration: Running upgrade 451aebb31d03 -> 90e64c465722, migrate user column to tags
2026/02/08 00:09:28 INFO alembic.runtime.migration: Running upgrade 90e64c465722 -> 181f10493468, allow nulls for metric values
2026/02/08 00:09:28 INFO alembic.runtime.migration: Running upgrade 181f10493468 -> df50e92ffc5e, Add Experiment Tags Table
2026/02/08 00:09:28 INFO alembic.runtime.migration: Running upgrade df50e92ffc5e -> 7ac759974ad8, Update run tags with larger limit
2026/02/08 00:09:28 INFO alembic.runtime.migration: Running 

<Experiment: artifact_location='file:///C:/Users/kishore/OneDrive/Desktop/GenAI__/flipkart_sentiment_mlflow/mlruns/1', creation_time=1770489570690, experiment_id='1', last_update_time=1770489570690, lifecycle_stage='active', name='Flipkart Sentiment Analysis - MLflow', tags={}>

In [8]:
with mlflow.start_run(run_name="LogReg_C_1"):
    
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1,2))),
        ("clf", LogisticRegression(C=1, max_iter=1000))
    ])
    
    pipeline.fit(X_train, y_train)
    preds = pipeline.predict(X_test)
    
    f1 = f1_score(y_test, preds)
    
    mlflow.log_param("model", "LogisticRegression")
    mlflow.log_param("C", 1)
    mlflow.log_metric("f1_score", f1)
    
    mlflow.sklearn.log_model(pipeline, "model")
    
    print("F1 Score:", f1)



'LogisticRegression'

1



<mlflow.models.model.ModelInfo at 0x172de09dcd0>

F1 Score: 0.9605403483825098


In [9]:
with mlflow.start_run(run_name="LogReg_C_10"):
    
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1,2))),
        ("clf", LogisticRegression(C=10, max_iter=1000))
    ])
    
    pipeline.fit(X_train, y_train)
    preds = pipeline.predict(X_test)
    
    f1 = f1_score(y_test, preds)
    
    mlflow.log_param("model", "LogisticRegression")
    mlflow.log_param("C", 10)
    mlflow.log_metric("f1_score", f1)
    
    mlflow.sklearn.log_model(pipeline, "model")
    
    print("F1 Score:", f1)



'LogisticRegression'

10



<mlflow.models.model.ModelInfo at 0x172de20fa50>

F1 Score: 0.9635773530472412


In [10]:
with mlflow.start_run(run_name="LinearSVC"):
    
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1,2))),
        ("clf", LinearSVC(C=1))
    ])
    
    pipeline.fit(X_train, y_train)
    preds = pipeline.predict(X_test)
    
    f1 = f1_score(y_test, preds)
    
    mlflow.log_param("model", "LinearSVC")
    mlflow.log_param("C", 1)
    mlflow.log_metric("f1_score", f1)
    
    mlflow.sklearn.log_model(pipeline, "model")
    
    print("F1 Score:", f1)



'LinearSVC'

1



<mlflow.models.model.ModelInfo at 0x172de2ebc50>

F1 Score: 0.964969302997472


## Experiment Summary

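Three configurations, covering two model families, were trained and evaluated on the held-out test split (20% of the labelled reviews) using F1-score:

- Logistic Regression (C=1): F1 = 0.9605
- Logistic Regression (C=10): F1 = 0.9636
- Linear SVC (C=1): F1 = 0.9650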

All experiments were tracked using MLflow, including parameters, metrics, and model artifacts.


Based on the logged F1-scores, LinearSVC achieved the highest performance, although its margin over Logistic Regression with C=10 is only about 0.0014 on a single train/test split.
This model will be registered in the MLflow Model Registry using the MLflow UI.


## Model Registration (via MLflow UI)

Steps followed to register the best model:

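1. Start the MLflow UI with `mlflow ui`, run from the project directory so it reads the same local tracking store as the notebook
2. Navigate to the experiment: *Flipkart Sentiment Analysis - MLflow*
3. Select the run with the highest F1-score (`LinearSVC`)
4. Register the run's logged `model` artifact in the Model Registry
5. Add tags to the registered model, such as: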
   - dataset: Flipkart Reviews
   - metric: F1-score
   - algorithm: LinearSVC


## Metrics and Hyperparameter Visualization

The MLflow UI was used to:
- Compare F1-scores across the three runs
- Visualize how hyperparameter changes affected performance
- Analyze experiment history for reproducibility
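
The same comparison can also be exported as a static chart, for example for a report. A sketch assuming a DataFrame in the shape returned by `mlflow.search_runs` (columns `tags.mlflow.runName` and `metrics.f1_score`); `plot_f1_by_run` is a hypothetical helper.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_f1_by_run(runs: pd.DataFrame):
    """Horizontal bar chart of F1-score per run, from a DataFrame shaped like mlflow.search_runs() output."""
    data = runs.sort_values("metrics.f1_score")  # best run ends up at the top
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.barh(data["tags.mlflow.runName"], data["metrics.f1_score"])
    ax.set_xlabel("F1-score (test split)")
    # The scores are close together, so start the axis just below the lowest one instead of at 0.
    ax.set_xlim(data["metrics.f1_score"].min() - 0.01, 1.0)
    fig.tight_layout()
    return fig, ax
```

In the notebook it would be called as `plot_f1_by_run(mlflow.search_runs(experiment_names=["Flipkart Sentiment Analysis - MLflow"]))`. Zooming the axis makes the small differences visible, but it also exaggerates them, so the axis range should be kept in mind when reading the chart.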


## Reproducibility

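MLflow supports reproducibility by tracking:
- Dataset used (only if it is logged explicitly, e.g. with `mlflow.log_input`; the runs above do not log it)
- Model parameters
- Evaluation metrics
- Model artifacts, which also record the model's Python environment (`conda.yaml`, `requirements.txt`)

Each logged run can be revisited in MLflow at any time; re-running it also requires the same data file, code, and split seed (`random_state=42`).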


## Prefect Workflow

A Prefect workflow was created to orchestrate the training process.
It automates the data loading, model training, and logging steps.


## Conclusion

This project demonstrates the integration of MLflow for experiment tracking and model management in a sentiment analysis task.

Key outcomes:
- Multiple models were trained and compared
- Parameters and metrics were logged systematically
- The best-performing model was registered in MLflow
- MLflow UI was used for visualization and analysis

This workflow supports transparency, reproducibility, and effective model selection.
