# 🧠 Custom Text Classifier Practice

This notebook demonstrates a **supervised sentiment classification pipeline**
using classical machine-learning models and **Bag-of-Words text vectorization**.

The goal is to:
- convert raw text into numerical features
- train multiple classifiers
- compare their performance on sentiment prediction

This example is part of my **AI & Robotics Portfolio** and focuses on
**understanding the full ML workflow for NLP**, not just model accuracy.


## 1️⃣ Problem Definition

We are solving a **multi-class text classification problem**.

Each input sentence is labeled as one of:
- **positive**
- **neutral**
- **negative**

The task is to train a model that can automatically predict
the sentiment of unseen text.


In [None]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

## 2️⃣ Dataset Overview

The dataset consists of **150 short sentences**, evenly split across:
- 50 positive samples
- 50 neutral samples
- 50 negative samples

Each row contains:
- `text` → the sentence
- `sentiment` → the target label

The dataset is shuffled to avoid ordering bias before training.


In [None]:
data = pd.DataFrame([
    # ---------- POSITIVE (50) ----------
    ("i enjoyed the workshop and learned many useful skills", "positive"),
    ("the support team resolved my issue quickly", "positive"),
    ("i feel motivated after completing my workout", "positive"),
    ("the app update made everything faster and smoother", "positive"),
    ("i had a relaxing weekend with no stress", "positive"),
    ("the presentation was clear and well organized", "positive"),
    ("my interview went better than expected", "positive"),
    ("the food tasted fresh and delicious", "positive"),
    ("i am happy with the progress i am making", "positive"),
    ("the new feature works exactly as promised", "positive"),
    ("i received positive feedback from my manager", "positive"),
    ("the project was completed successfully", "positive"),
    ("i feel confident about my upcoming exam", "positive"),
    ("the customer experience exceeded my expectations", "positive"),
    ("i enjoyed collaborating with my teammates", "positive"),
    ("the design looks clean and professional", "positive"),
    ("everything worked smoothly without issues", "positive"),
    ("i am satisfied with the final result", "positive"),
    ("the service was fast and reliable", "positive"),
    ("this solution saved me a lot of time", "positive"),
    ("the training session was very helpful", "positive"),
    ("i feel proud of the work i delivered", "positive"),
    ("the update improved overall performance", "positive"),
    ("i had a pleasant experience using the platform", "positive"),
    ("the interface feels intuitive and simple", "positive"),
    ("my goals are becoming clearer every day", "positive"),
    ("the response was accurate and helpful", "positive"),
    ("i appreciate the clear communication", "positive"),
    ("the task was completed ahead of schedule", "positive"),
    ("i am optimistic about the next phase", "positive"),
    ("the collaboration went really well", "positive"),
    ("i found the documentation easy to follow", "positive"),
    ("the service met all my expectations", "positive"),
    ("the system performed reliably throughout", "positive"),
    ("i feel encouraged by the results", "positive"),
    ("the process was efficient and smooth", "positive"),
    ("i learned something valuable today", "positive"),
    ("the tool makes my work easier", "positive"),
    ("everything was set up without problems", "positive"),
    ("the quality of work was impressive", "positive"),
    ("i enjoyed using the new feature", "positive"),
    ("the outcome was better than planned", "positive"),
    ("i feel accomplished after finishing the task", "positive"),
    ("the experience was enjoyable overall", "positive"),
    ("the instructions were clear and helpful", "positive"),
    ("the workflow feels much improved", "positive"),
    ("i am pleased with the service provided", "positive"),
    ("the platform feels stable and responsive", "positive"),
    ("the results were exactly what i hoped for", "positive"),
    ("this update adds real value", "positive"),

    # ---------- NEUTRAL (50) ----------
    ("the meeting started at nine in the morning", "neutral"),
    ("i updated the software on my laptop", "neutral"),
    ("the package arrived this afternoon", "neutral"),
    ("i worked from home today", "neutral"),
    ("the train stopped at every station", "neutral"),
    ("i read the documentation before installing the tool", "neutral"),
    ("the report contains five sections", "neutral"),
    ("i logged into the system and checked the dashboard", "neutral"),
    ("the event lasted two hours", "neutral"),
    ("i submitted the form online", "neutral"),
    ("the email was sent this morning", "neutral"),
    ("the device was turned on successfully", "neutral"),
    ("i attended the scheduled appointment", "neutral"),
    ("the file was saved in the documents folder", "neutral"),
    ("the software requires an internet connection", "neutral"),
    ("the update was installed yesterday", "neutral"),
    ("the task appears in the project list", "neutral"),
    ("the system recorded the input data", "neutral"),
    ("the instructions are listed on the website", "neutral"),
    ("the user logged out of the application", "neutral"),
    ("the meeting was moved to next week", "neutral"),
    ("the form includes several required fields", "neutral"),
    ("the report was shared with the team", "neutral"),
    ("the device was connected to the network", "neutral"),
    ("the process consists of three steps", "neutral"),
    ("the schedule was updated in the calendar", "neutral"),
    ("the document was reviewed earlier", "neutral"),
    ("the file size is listed on the page", "neutral"),
    ("the system uses cloud storage", "neutral"),
    ("the task was marked as complete", "neutral"),
    ("the message was received successfully", "neutral"),
    ("the user profile was updated", "neutral"),
    ("the application supports multiple users", "neutral"),
    ("the data was exported as a csv file", "neutral"),
    ("the tool was opened from the menu", "neutral"),
    ("the button is located at the top", "neutral"),
    ("the configuration file was loaded", "neutral"),
    ("the session ended automatically", "neutral"),
    ("the input was processed by the system", "neutral"),
    ("the table contains ten rows", "neutral"),
    ("the device restarted after the update", "neutral"),
    ("the request was logged", "neutral"),
    ("the platform supports dark mode", "neutral"),
    ("the version number was displayed", "neutral"),
    ("the setup guide explains the steps", "neutral"),
    ("the user accessed the dashboard", "neutral"),
    ("the page refreshed automatically", "neutral"),
    ("the notification was shown on screen", "neutral"),
    ("the settings were saved", "neutral"),
    ("the system check completed", "neutral"),

    # ---------- NEGATIVE (50) ----------
    ("the application crashed multiple times", "negative"),
    ("i am stressed because of tight deadlines", "negative"),
    ("the customer service did not respond", "negative"),
    ("i missed the deadline and feel frustrated", "negative"),
    ("the update caused more problems than before", "negative"),
    ("i am disappointed with the overall experience", "negative"),
    ("the system is slow and unreliable", "negative"),
    ("i feel exhausted after working late nights", "negative"),
    ("the instructions were confusing and unclear", "negative"),
    ("the issue remains unresolved after several attempts", "negative"),
    ("the process took much longer than expected", "negative"),
    ("i encountered multiple errors during setup", "negative"),
    ("the service was unavailable for hours", "negative"),
    ("the application failed to load properly", "negative"),
    ("i am unhappy with the final outcome", "negative"),
    ("the response time was extremely slow", "negative"),
    ("the feature does not work as described", "negative"),
    ("i experienced frequent interruptions", "negative"),
    ("the results were inaccurate and misleading", "negative"),
    ("the problem keeps happening repeatedly", "negative"),
    ("the interface is difficult to use", "negative"),
    ("the task failed without explanation", "negative"),
    ("i am annoyed by constant errors", "negative"),
    ("the system froze during operation", "negative"),
    ("the instructions did not help at all", "negative"),
    ("the update broke existing functionality", "negative"),
    ("i feel overwhelmed by the workload", "negative"),
    ("the service quality has declined", "negative"),
    ("the tool behaves inconsistently", "negative"),
    ("the error message was unclear", "negative"),
    ("the issue caused significant delays", "negative"),
    ("the application consumes too much memory", "negative"),
    ("the setup process was frustrating", "negative"),
    ("i regret using this solution", "negative"),
    ("the system crashed unexpectedly", "negative"),
    ("the feature is poorly implemented", "negative"),
    ("the performance degraded noticeably", "negative"),
    ("the service failed during peak hours", "negative"),
    ("the task could not be completed", "negative"),
    ("the experience was disappointing overall", "negative"),
    ("the interface feels cluttered", "negative"),
    ("the workflow is inefficient", "negative"),
    ("the system produced incorrect results", "negative"),
    ("the problem remains unsolved", "negative"),
    ("the tool failed under load", "negative"),
    ("the update introduced new bugs", "negative"),
    ("i am dissatisfied with the service", "negative"),
    ("the process was unnecessarily complex", "negative"),
    ("the system response was inconsistent", "negative"),
    ("the application stopped working entirely", "negative"),
], columns=["text", "sentiment"])
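
Before vectorizing, it can help to confirm that the labels really are balanced and that no sentence is empty or duplicated. The sketch below uses a small hypothetical frame named `toy` as a stand-in for `data`; the same calls should apply to the full frame.

```python
import pandas as pd

# Toy stand-in for the notebook's `data` frame (illustrative rows only).
toy = pd.DataFrame(
    [
        ("the service was fast and reliable", "positive"),
        ("i enjoyed using the new feature", "positive"),
        ("the report contains five sections", "neutral"),
        ("the settings were saved", "neutral"),
        ("the system crashed unexpectedly", "negative"),
        ("the update introduced new bugs", "negative"),
    ],
    columns=["text", "sentiment"],
)

# Count rows per label; a balanced set shows the same count for every class.
label_counts = toy["sentiment"].value_counts().to_dict()
print(label_counts)

# Empty or duplicated sentences would distort the word counts later on.
n_empty = int((toy["text"].str.strip() == "").sum())
n_duplicates = int(toy["text"].duplicated().sum())
print(n_empty, n_duplicates)
```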

## 3️⃣ Text Vectorization (Bag-of-Words)

Machine-learning models cannot work directly with raw text.

We use **Bag-of-Words (BoW)** via `CountVectorizer` to:
- tokenize text into words
- build a vocabulary
- count word occurrences per document

Each sentence becomes a **numerical feature vector** where:
- rows represent documents
- columns represent words
- values represent word counts


In [None]:
data = data.sample(frac=1, random_state=42).reset_index(drop=True)  # shuffle rows; fixed seed for reproducibility

In [None]:
x = data["text"]        # input sentences
y = data["sentiment"]   # target labels

In [None]:
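countVec = CountVectorizer()               # defaults: lowercase text, keep tokens of 2+ characters
countVec_Fit = countVec.fit_transform(x)   # learn the vocabulary and count words per sentence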

In [None]:
bag_of_words = pd.DataFrame(countVec_Fit.toarray(), columns=countVec.get_feature_names_out())

In [None]:
print(bag_of_words)

## 4️⃣ Train-Test Split

The dataset is split into:
- **70% training data**
- **30% test data**

This allows us to:
- train models on one portion
- evaluate performance on unseen data


In [None]:
x_train, x_test, y_train, y_test = train_test_split(bag_of_words, y, test_size=0.3, random_state=7)
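
One caveat with the split above: the vectorizer was fitted on all 150 sentences before splitting, so the vocabulary already contains words that occur only in test sentences. A minimal sketch of a leakage-free variant, assuming a small toy corpus (`texts` and `labels` below are illustrative, not the notebook's data), splits the raw text first and fits `CountVectorizer` on the training part only. Passing `stratify` keeps the class ratio the same in both parts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Illustrative corpus: 4 sentences per class.
texts = [
    "the service was fast and reliable",
    "i enjoyed using the new feature",
    "the training session was very helpful",
    "the tool makes my work easier",
    "the report contains five sections",
    "the settings were saved",
    "the table contains ten rows",
    "the request was logged",
    "the system crashed unexpectedly",
    "the update introduced new bugs",
    "the workflow is inefficient",
    "the tool failed under load",
]
labels = ["positive"] * 4 + ["neutral"] * 4 + ["negative"] * 4

# Split the raw sentences first; stratify preserves the class balance.
text_train, text_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=7
)

vec = CountVectorizer()
X_train = vec.fit_transform(text_train)  # vocabulary comes from training text only
X_test = vec.transform(text_test)        # unseen test words are simply ignored

print(X_train.shape, X_test.shape)
```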

## 5️⃣ Models Trained

We train and evaluate **three different classifiers**:

### 🔹 Logistic Regression
- Linear classifier commonly used for text classification
- Works well with high-dimensional sparse data

### 🔹 Multinomial Naive Bayes
- Probabilistic model well-suited for word counts
- Often a strong baseline for NLP tasks

### 🔹 Linear SVM (SGDClassifier)
- Margin-based classifier; `SGDClassifier` with its default hinge loss fits a linear SVM by stochastic gradient descent
- Effective for large feature spaces


In [None]:
lr = LogisticRegression(random_state=1).fit(x_train, y_train)

In [None]:
y_pred_lr = lr.predict(x_test)

## 6️⃣ Model Evaluation

Each model is evaluated using:
- **Accuracy**
- **Precision**
- **Recall**
- **F1-score**

The `classification_report` lists precision, recall, and F1-score for each class,
which shows how well a model handles:
- positive
- neutral
- negative sentiment
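
To show where these numbers come from, here is a small worked example in plain Python on hypothetical labels (`y_true` and `y_pred` are made up for illustration). For one class, precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. Returning 0.0 when a denominator is zero mirrors the `zero_division=0` argument used in this notebook.

```python
# Hypothetical ground truth and predictions, for illustration only.
y_true = ["positive", "positive", "neutral", "neutral", "negative", "negative"]
y_pred = ["positive", "neutral", "neutral", "neutral", "negative", "positive"]


def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy: {accuracy:.2f}")

for label in ["positive", "neutral", "negative"]:
    p, r, f = per_class_metrics(y_true, y_pred, label)
    print(f"{label:>8}  precision={p:.2f}  recall={r:.2f}  f1={f:.2f}")
```

Here the neutral class has perfect recall (both neutral sentences are found) but a precision of 2/3, because one positive sentence was also predicted as neutral.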


In [None]:
accuracy_score(y_test, y_pred_lr)

In [None]:
print(classification_report(y_test, y_pred_lr, zero_division=0))

In [None]:
from sklearn.naive_bayes import MultinomialNB

nb = MultinomialNB().fit(x_train, y_train)


In [None]:
y_pred_nb = nb.predict(x_test)

In [None]:
accuracy_score(y_test, y_pred_nb)

In [None]:
print(classification_report(y_test, y_pred_nb, zero_division=0))

In [None]:
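from sklearn.linear_model import SGDClassifier

# hinge loss (the default) gives a linear SVM; a fixed seed makes the run repeatable
svm = SGDClassifier(loss="hinge", random_state=1).fit(x_train, y_train)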

In [None]:
y_pred_svm = svm.predict(x_test)

In [None]:
accuracy_score(y_test, y_pred_svm)
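
Accuracy alone hides which classes get confused with each other. `confusion_matrix` is imported at the top of the notebook but never called, so here is a sketch of how it could be used alongside `classification_report`. The labels below are hypothetical; in the notebook the same calls would presumably take `y_test` and `y_pred_svm`.

```python
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix

labels = ["negative", "neutral", "positive"]

# Hypothetical ground truth and predictions, for illustration only.
y_true = ["positive", "positive", "positive",
          "neutral", "neutral", "neutral",
          "negative", "negative", "negative"]
y_hat = ["positive", "positive", "neutral",
         "neutral", "neutral", "neutral",
         "negative", "neutral", "positive"]

# Rows are true classes, columns are predicted classes, both in `labels` order.
cm = confusion_matrix(y_true, y_hat, labels=labels)
cm_table = pd.DataFrame(
    cm,
    index=[f"true_{name}" for name in labels],
    columns=[f"pred_{name}" for name in labels],
)
print(cm_table)

# Per-class precision, recall, and F1 for the same predictions.
report = classification_report(y_true, y_hat, labels=labels, zero_division=0)
print(report)
```

Off-diagonal cells show the errors: in this toy case two non-neutral sentences were predicted as neutral.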

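## 7️⃣ Key Observations

- The three models can score differently on the same split, so one model's accuracy says little on its own
- Naive Bayes often performs well with small text datasets
- Linear models handle sparse, high-dimensional Bag-of-Words features well
- With only 45 test sentences (30% of 150), one misclassified sentence moves accuracy by about 2.2 percentage points, so small gaps between models are not conclusive
- Model comparison is essential before choosing a final approach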
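
Because a single 70/30 split of a small dataset gives noisy scores, one option is to compare the models with cross-validation instead. The sketch below is an assumption-laden illustration rather than the notebook's method: it uses a tiny made-up corpus, two folds, and a `Pipeline` so that the vectorizer is refitted on each training fold.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Illustrative corpus: 4 sentences per class.
texts = [
    "the service was fast and reliable",
    "i enjoyed using the new feature",
    "the training session was very helpful",
    "the tool makes my work easier",
    "the report contains five sections",
    "the settings were saved",
    "the table contains ten rows",
    "the request was logged",
    "the system crashed unexpectedly",
    "the update introduced new bugs",
    "the workflow is inefficient",
    "the tool failed under load",
]
labels = ["positive"] * 4 + ["neutral"] * 4 + ["negative"] * 4

models = {
    "logistic_regression": LogisticRegression(max_iter=1000, random_state=1),
    "naive_bayes": MultinomialNB(),
    "linear_svm_sgd": SGDClassifier(loss="hinge", random_state=1),
}

# Two stratified folds only because the toy corpus is tiny.
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=7)

mean_accuracy = {}
for name, model in models.items():
    # The pipeline refits CountVectorizer inside every training fold.
    pipe = make_pipeline(CountVectorizer(), model)
    scores = cross_val_score(pipe, texts, labels, cv=cv, scoring="accuracy")
    mean_accuracy[name] = float(scores.mean())

for name, score in mean_accuracy.items():
    print(f"{name}: {score:.2f}")
```

On the full 150-sentence dataset, a larger number of folds (for example 5) would be a more natural choice.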


## ✅ Final Takeaways

- Text must be converted into numerical form before ML can be applied
- Bag-of-Words is a simple but powerful baseline representation
- Classical ML models remain effective baselines for small, well-defined NLP tasks
- Comparing multiple classifiers provides better insight than relying on one
- Understanding the pipeline matters more than chasing accuracy alone
