
# 1. Data Loading

In [49]:
#import library
import pandas as pd

In [50]:
#load the data
data = pd.read_csv('/content/emotions.csv')

Each entry in this dataset pairs the text of a Twitter message with a label for the predominant emotion it conveys. There are six classes: sadness (0), joy (1), love (2), anger (3), fear (4), and surprise (5). The dataset is suited to sentiment analysis, emotion classification, and general text mining on social media text.

Key Features:

- `text`: the message text
- `label`: the emotion class, an integer from 0 to 5

# 2. Data Preprocessing

In [None]:
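# Import libraries
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Download the NLTK resources used below
nltk.download('stopwords', download_dir='/usr/local/share/nltk_data')
nltk.download('punkt', download_dir='/usr/local/share/nltk_data')
# Recent NLTK releases load 'punkt_tab' in word_tokenize instead of 'punkt'
nltk.download('punkt_tab', download_dir='/usr/local/share/nltk_data')
nltk.download('wordnet', download_dir='/usr/local/share/nltk_data')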

In [52]:
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

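def preprocess_text(text):
    """Clean one text: strip symbols and digits, tokenize, lowercase, lemmatize, drop stopwords."""
    # Replace non-word characters and digits with spaces
    text = re.sub(r'\W', ' ', text)
    text = re.sub(r'\d', ' ', text)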
    # Tokenization
    tokens = word_tokenize(text)
    # Convert to lowercase and lemmatize
    tokens = [lemmatizer.lemmatize(word.lower()) for word in tokens]
    # Remove stopwords
    tokens = [word for word in tokens if word not in stop_words]
    return ' '.join(tokens)

data['processed_text'] = data['text'].apply(preprocess_text)
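
The two `re.sub` calls in `preprocess_text` replace every non-word character and every digit with a space, which also splits contractions at the apostrophe. The sketch below isolates that step using only the standard library. The helper name `strip_symbols_and_digits` and the sample sentence are made up for illustration, and `str.split()` stands in for `word_tokenize`.

```python
import re

def strip_symbols_and_digits(text):
    # Same two substitutions as in preprocess_text
    text = re.sub(r'\W', ' ', text)
    text = re.sub(r'\d', ' ', text)
    return text

sample = "I can't wait, see you at 9am!"
cleaned = strip_symbols_and_digits(sample)

# Whitespace splitting is a simplification of word_tokenize
tokens = cleaned.lower().split()
print(tokens)
```

Here "can't" becomes the two tokens "can" and "t", and "9am" loses its digit. Whether that matters depends on how much signal negations and contractions carry for the task.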

# 3. Feature Extraction

In [30]:
#import library
from sklearn.feature_extraction.text import TfidfVectorizer

In [31]:
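# Build TF-IDF features, keeping at most the 5000 most frequent terms
tfidf_vectorizer = TfidfVectorizer(max_features=5000)
# X is a sparse matrix with one row per message and one column per term
X = tfidf_vectorizer.fit_transform(data['processed_text'])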
y = data['label']
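
To make the TF-IDF weighting more concrete, here is a small pure-Python sketch on three made-up documents. It assumes the smoothed IDF `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization of each row, which is how the default `TfidfVectorizer` settings are documented. It is an illustration, not a replacement for the vectorizer.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration
docs = ["feel lonely today", "feel happy today", "feel angry"]
tokenized = [doc.split() for doc in docs]

vocab = sorted({term for doc in tokenized for term in doc})
n_docs = len(tokenized)

# Document frequency: in how many documents each term appears
df = Counter(term for doc in tokenized for term in set(doc))

# Smoothed inverse document frequency (assumed scikit-learn default)
idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}

def tfidf_row(tokens):
    counts = Counter(tokens)
    raw = [counts[term] * idf[term] for term in vocab]
    norm = math.sqrt(sum(value * value for value in raw))
    # L2-normalize so every document vector has unit length
    return [value / norm if norm else 0.0 for value in raw]

rows = [tfidf_row(doc) for doc in tokenized]
weights = dict(zip(vocab, rows[0]))
print(weights)
```

In the first document, "lonely" appears in only one document and gets the largest weight, while "feel" appears in all three and gets the smallest.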

# 4. Model Selection and Training

In [32]:
#import library
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

In [33]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
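
Note that the vectorizer in this notebook is fitted on the full dataset before the split, so vocabulary and IDF statistics from the test rows feed into the features. One possible alternative, sketched here on made-up toy data, is to split the raw text first and wrap both steps in a scikit-learn `Pipeline`, so the vectorizer only sees the training split. The toy texts and labels are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy data, invented for illustration (0 = sadness, 1 = joy)
texts = ["feel sad lonely", "feel happy glad", "so sad today", "so happy today",
         "lonely sad night", "glad happy day", "sad and lonely", "happy and glad"]
labels = [0, 1, 0, 1, 0, 1, 0, 1]

# Split the raw text before any feature extraction
texts_train, texts_test, labels_train, labels_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels)

# The pipeline fits TF-IDF on the training texts only, then trains the classifier
pipeline = make_pipeline(
    TfidfVectorizer(max_features=5000),
    LogisticRegression(max_iter=1000),
)
pipeline.fit(texts_train, labels_train)

predictions = pipeline.predict(texts_test)
print(predictions)
```

With the notebook's data, the same pattern would take `data['processed_text']` and `data['label']` in place of the toy lists. Scores obtained this way may differ slightly from the report in the evaluation section.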

# 5. Evaluation

In [34]:
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.94      0.94      0.94     24201
           1       0.92      0.93      0.92     28164
           2       0.81      0.77      0.79      6929
           3       0.90      0.90      0.90     11441
           4       0.85      0.85      0.85      9594
           5       0.77      0.70      0.74      3033

    accuracy                           0.90     83362
   macro avg       0.86      0.85      0.86     83362
weighted avg       0.90      0.90      0.90     83362
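
The report shows two kinds of averages. The macro average treats all six classes equally, while the weighted average weights each class by its support. The sketch below recomputes both from the rounded per-class values printed above, so results can differ from the report in the last digit.

```python
# Per-class values copied from the classification report above
report = {
    0: {"precision": 0.94, "recall": 0.94, "f1": 0.94, "support": 24201},
    1: {"precision": 0.92, "recall": 0.93, "f1": 0.92, "support": 28164},
    2: {"precision": 0.81, "recall": 0.77, "f1": 0.79, "support": 6929},
    3: {"precision": 0.90, "recall": 0.90, "f1": 0.90, "support": 11441},
    4: {"precision": 0.85, "recall": 0.85, "f1": 0.85, "support": 9594},
    5: {"precision": 0.77, "recall": 0.70, "f1": 0.74, "support": 3033},
}

total = sum(row["support"] for row in report.values())

def macro_average(metric):
    # Unweighted mean over classes
    return sum(row[metric] for row in report.values()) / len(report)

def weighted_average(metric):
    # Mean over classes, weighted by the number of test samples per class
    return sum(row[metric] * row["support"] for row in report.values()) / total

for metric in ("precision", "recall", "f1"):
    print(metric, round(macro_average(metric), 3), round(weighted_average(metric), 3))

# Share of each class in the test set
for label, row in report.items():
    print(label, round(row["support"] / total, 3))
```

The macro averages come out lower than the weighted ones because the two smallest classes, love (2) and surprise (5), also have the lowest scores.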



# 6. Inference

In [35]:
def predict_sentiment(tweet):
    """Return the predicted numeric emotion label (0 to 5) for one raw text."""
    processed_tweet = preprocess_text(tweet)
    vectorized_tweet = tfidf_vectorizer.transform([processed_tweet])
    sentiment = model.predict(vectorized_tweet)[0]
    return sentiment
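
`predict_sentiment` returns only the top label. If a confidence score per emotion is useful, `LogisticRegression.predict_proba` can supply one. The sketch below uses a tiny made-up two-class model so that it runs on its own. All `toy_` names and the helper `predict_with_confidence` are hypothetical and not part of this notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data, invented for illustration
toy_texts = ["feel sad lonely", "feel happy glad", "so sad today",
             "so happy today", "lonely sad night", "glad happy day"]
toy_labels = [0, 1, 0, 1, 0, 1]
toy_names = {0: "sadness", 1: "joy"}

toy_vectorizer = TfidfVectorizer()
toy_model = LogisticRegression(max_iter=1000)
toy_model.fit(toy_vectorizer.fit_transform(toy_texts), toy_labels)

def predict_with_confidence(text):
    # predict_proba returns one row per input, one column per class in classes_
    probabilities = toy_model.predict_proba(toy_vectorizer.transform([text]))[0]
    return {toy_names[int(label)]: float(p)
            for label, p in zip(toy_model.classes_, probabilities)}

scores = predict_with_confidence("feel lonely")
print(scores)
```

The same idea would apply to the notebook's `model` and `tfidf_vectorizer`, with `preprocess_text` applied to the input first.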

# 7. Testing

In [57]:
# Map numeric labels to emotion names
sentiment_mapping = {
    0: "sadness",
    1: "joy",
    2: "love",
    3: "anger",
    4: "fear",
    5: "surprise"
}

# Example usage
new_tweet = "after my friend is married, I feel lonely" #test here
predicted_sentiment = predict_sentiment(new_tweet)

# Print the predicted sentiment label
if predicted_sentiment in sentiment_mapping:
    print("Predicted Sentiment:", sentiment_mapping[predicted_sentiment])
else:
    print("Unknown sentiment label")

Predicted Sentiment: sadness
