<a href="https://colab.research.google.com/github/rutripathi96/Text_based_emotion_calssifier/blob/main/Emotion_Classifier_ipynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Natural Language Processing (NLP) for Text Classification: Create a text classification model for sentiment analysis, spam detection, or topic categorization using NLP techniques and libraries like NLTK or spaCy.


# Create a text classification model for sentiment analysis

**Emotion Classification on Twitter Dataset**

This Colab notebook demonstrates how to perform emotion classification (a form of sentiment analysis) on a Twitter dataset using machine learning techniques. The dataset consists of Twitter messages labeled with six emotions: sadness, joy, love, anger, fear, and surprise. The goal is to build a text classification model that predicts the predominant emotion conveyed in each message.

**Dataset Description:**

https://www.kaggle.com/datasets/nelgiriyewithana/emotions  

The dataset contains text segments extracted from Twitter messages, along with corresponding labels indicating the predominant emotion conveyed in each message. The emotions are classified into six categories:

- Sadness (0)
- Joy (1)
- Love (2)
- Anger (3)
- Fear (4)
- Surprise (5)

Whether you're interested in sentiment analysis, emotion classification, or text mining, this dataset provides a foundation for exploring how emotions are expressed on social media.
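
The label scheme listed above can be written as a small lookup table, which makes printed predictions easier to read. This is a minimal sketch; the names `EMOTION_NAMES` and `label_to_emotion` are illustrative and do not come from the notebook.

```python
# Mapping taken from the dataset description above.
EMOTION_NAMES = {
    0: "sadness",
    1: "joy",
    2: "love",
    3: "anger",
    4: "fear",
    5: "surprise",
}


def label_to_emotion(label):
    """Return the emotion name for a numeric label (accepts 0 or 0.0)."""
    return EMOTION_NAMES[int(label)]


print(label_to_emotion(1))
print(label_to_emotion(5.0))
```

Casting to `int` is there because the classification report further down prints the labels as floats (`0.0`, `1.0`, ...).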

# Importing Necessary Libraries:

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score


# Loading the Dataset:

In [None]:
# Load the dataset
df = pd.read_csv('text.csv')
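
The evaluation cell later in the notebook filters out NaN labels, which suggests the `label` column may contain missing values. A hedged sketch of checking for them right after loading is shown below. It uses a small hypothetical DataFrame (`toy_df`) in place of `text.csv`, so the rows and counts are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for text.csv (hypothetical rows, not real data)
toy_df = pd.DataFrame({
    "text": ["i feel sad", "so happy today", "i am scared", "what a surprise"],
    "label": [0, 1, np.nan, 5],
})

# Count missing values per column
missing_counts = toy_df.isna().sum()
print(missing_counts)

# Drop rows with a missing label and restore an integer dtype
clean_df = toy_df.dropna(subset=["label"]).copy()
clean_df["label"] = clean_df["label"].astype(int)
print(clean_df["label"].value_counts().sort_index())
```

Dropping unlabeled rows before the train/test split keeps features and labels aligned, so no index bookkeeping is needed later.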


# Data Preprocessing:

In [None]:
# Split data into features (text) and labels (emotions)
X = df['text']
y = df['label']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
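
The split above is purely random. Because the class sizes in this dataset are uneven, one option (not what the notebook does) is to pass `stratify` so that each split keeps the same class proportions. The sketch below uses made-up toy labels to show the effect.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy, deliberately imbalanced data (hypothetical)
texts = [f"message {i}" for i in range(20)]
labels = [0] * 10 + [1] * 6 + [2] * 4

# stratify=labels preserves the 10:6:4 class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.5, random_state=42, stratify=labels
)

print(Counter(y_tr))
print(Counter(y_te))
```

Note that `stratify` requires labels without missing values, so any NaN labels would have to be removed first.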


# Feature Engineering: TF-IDF Vectorization:

In [None]:
# Vectorize the text data using TF-IDF
vectorizer = TfidfVectorizer(max_features=5000)  # You can adjust max_features as needed
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)
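
To make the fit/transform distinction concrete, here is a small sketch on a made-up corpus. The vocabulary is learned from the training documents only, so words that appear only in the test documents are ignored at transform time. The documents below are illustrative, not taken from the dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus
train_docs = ["i feel so happy", "i feel so sad", "i am afraid of the dark"]
test_docs = ["i feel afraid and lonely"]

vec = TfidfVectorizer(max_features=5000)
train_matrix = vec.fit_transform(train_docs)  # learns vocabulary and IDF weights
test_matrix = vec.transform(test_docs)        # reuses the learned vocabulary

print(sorted(vec.vocabulary_))
print(train_matrix.shape, test_matrix.shape)
print("non-zero test features:", test_matrix.nnz)
```

With the default tokenizer, single-character tokens such as "i" are dropped, and "and" and "lonely" contribute nothing to the test vector because they never occurred in the training documents.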


# Perform dimensionality reduction using TruncatedSVD

In [None]:
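from sklearn.decomposition import TruncatedSVD

# TruncatedSVD (rather than PCA) works directly on the sparse TF-IDF matrix.
# The `pca` variable names are kept because later cells refer to them.
n_components = 100  # Adjust the number of components as needed
pca = TruncatedSVD(n_components=n_components, random_state=42)
X_train_pca = pca.fit_transform(X_train_vectorized)  # fit on training data only
X_test_pca = pca.transform(X_test_vectorized)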
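
How much information survives the reduction can be estimated from `explained_variance_ratio_`. The sketch below runs on a random sparse matrix (an assumption standing in for the TF-IDF features), so the printed number says nothing about the real dataset.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# Random sparse matrix as a stand-in for TF-IDF features (hypothetical data)
rng = np.random.default_rng(0)
dense = rng.random((200, 50))
dense[rng.random((200, 50)) > 0.1] = 0.0  # keep roughly 10% of the entries
X_sparse = csr_matrix(dense)

svd = TruncatedSVD(n_components=10, random_state=0)
X_reduced = svd.fit_transform(X_sparse)

# Fraction of the total variance captured by the kept components
retained = svd.explained_variance_ratio_.sum()
print(X_reduced.shape)
print(f"variance retained: {retained:.2f}")
```

If the retained fraction is low on the real features, increasing `n_components` is one thing to try.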

# Model Training and Prediction:

In [None]:
import numpy as np
from sklearn.linear_model import LogisticRegression

# Boolean masks marking the rows whose label is present (not NaN)
train_mask = ~np.isnan(y_train.to_numpy(dtype=float))
test_mask = ~np.isnan(y_test.to_numpy(dtype=float))

# Keep only the labeled rows so features and labels stay aligned
X_train_pca_clean = X_train_pca[train_mask]
y_train_clean = y_train[train_mask]
X_test_pca_clean = X_test_pca[test_mask]
y_test_clean = y_test[test_mask]

# Train the Logistic Regression classifier on the reduced training features
logreg_classifier = LogisticRegression()
logreg_classifier.fit(X_train_pca_clean, y_train_clean)

# Predict on the cleaned testing set
y_pred_logreg = logreg_classifier.predict(X_test_pca_clean)

# Evaluate the Logistic Regression model
accuracy_logreg = accuracy_score(y_test_clean, y_pred_logreg)
print("Logistic Regression Accuracy:", accuracy_logreg)
print("\nLogistic Regression Classification Report:")
print(classification_report(y_test_clean, y_pred_logreg))


Logistic Regression Accuracy: 0.42615053275901155

Logistic Regression Classification Report:
              precision    recall  f1-score   support

         0.0       0.40      0.56      0.47      5186
         1.0       0.46      0.72      0.56      6048
         2.0       0.21      0.01      0.02      1399
         3.0       0.29      0.04      0.07      2386
         4.0       0.35      0.07      0.12      1957
         5.0       0.27      0.01      0.01       668

    accuracy                           0.43     17644
   macro avg       0.33      0.24      0.21     17644
weighted avg       0.38      0.43      0.35     17644
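
The summary rows of the report can be reproduced from the per-class rows. The sketch below recomputes the macro and weighted recall from the rounded values printed above, so the results match the report only to about two decimals. It also computes the share of the largest class (joy) as a reference point for the accuracy.

```python
# Per-class recall and support, copied from the classification report above
recall = [0.56, 0.72, 0.01, 0.04, 0.07, 0.01]
support = [5186, 6048, 1399, 2386, 1957, 668]

total = sum(support)

# Macro average: every class counts equally
macro_recall = sum(recall) / len(recall)

# Weighted average: classes count in proportion to their support.
# For recall, this equals the overall accuracy.
weighted_recall = sum(r * s for r, s in zip(recall, support)) / total

# Accuracy of always predicting the largest class (joy)
majority_baseline = max(support) / total

print(f"macro recall:      {macro_recall:.3f}")
print(f"weighted recall:   {weighted_recall:.3f}")
print(f"majority baseline: {majority_baseline:.3f}")
```

The gap between the weighted and macro figures comes from the four smaller classes, whose recall is close to zero but which carry little weight in the accuracy.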



The code above addresses the text classification task using Natural Language Processing (NLP) techniques for emotion classification on the Twitter dataset. Here is how it maps to the task description:

**Text Classification Model:** The code trains a Logistic Regression classifier on the provided dataset to predict the predominant emotion conveyed in each Twitter message.

**NLP Techniques:** The code uses TF-IDF vectorization to extract features from the text, then reduces their dimensionality with Truncated Singular Value Decomposition (TruncatedSVD), a technique commonly applied to sparse text features.

**Libraries:** The code relies on pandas for data manipulation, scikit-learn for feature extraction and the classifier, and NumPy for numerical operations. NLTK and spaCy, mentioned in the task description, are not used.

**Task Fulfillment:** The pipeline covers data loading, preprocessing, model training, prediction, and evaluation, and it reports accuracy along with a classification report. The reported accuracy is about 0.43 with a macro-averaged F1 of 0.21, and recall for love (2), anger (3), fear (4), and surprise (5) stays at or below 0.07, so the model mostly predicts sadness and joy.
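
The steps described above (TF-IDF, TruncatedSVD, Logistic Regression) can also be chained in a scikit-learn `Pipeline`, which guarantees that the vectorizer and the reducer are fitted on training data only. This is a sketch on a made-up six-sentence corpus, not the notebook's actual code, and the component count is chosen to suit the toy data.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: 1 = joy, 0 = sadness
texts = [
    "i feel so happy today",
    "what a joyful and happy day",
    "happy and full of joy",
    "i am so sad and lonely",
    "feeling sad and down today",
    "such a sad lonely night",
]
labels = [1, 1, 1, 0, 0, 0]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svd", TruncatedSVD(n_components=2, random_state=0)),
    ("clf", LogisticRegression()),
])

# fit() runs fit_transform on each step in order, then fits the classifier
pipe.fit(texts, labels)

# predict() applies the already-fitted transforms to new text
preds = pipe.predict(["i feel happy"])
print(preds)
```

On the real dataset, the same pipeline object could be passed to `train_test_split` outputs directly, without keeping separate vectorized and reduced copies of the data.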