**Mini Project: Airline Tweet Sentiment Classifier using Natural Language Processing**

Note: This notebook uses the sample dataset at https://github.com/salman1256/aimltraining/blob/main/Day-30/airline_tweets_sample.csv



---


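Steps:
1. Import libraries and download NLTK resources
2. Load and explore the dataset
3. Clean and preprocess the tweet text
4. Encode sentiment labels and define X and y
5. Split into train and test sets, then convert text to numerical vectors (TF-IDF)
6. Train a Logistic Regression model
7. Evaluate accuracy, confusion matrix, and classification report
8. Predict sentiment for new example tweets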

In [220]:
# Step 1 a: Import required libraries
import re

import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

In [221]:
# Step 1 b: Download the NLTK data used below (stopwords; WordNet is only needed for lemmatization)
nltk.download('stopwords')
nltk.download('wordnet')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [222]:
# Step 2 a: Load the dataset
df = pd.read_csv('/content/airline_tweets_sample.csv')
df.head()

Unnamed: 0,text,sentiment
0,"@United flight was delayed for 3 hours, worst ...",negative
1,"Loved the service on @Delta, crew was super fr...",positive
2,"@AmericanAir lost my luggage again, so disappo...",negative
3,Smooth boarding and on-time arrival. Great job...,positive
4,The seats were uncomfortable but staff was polite,neutral


In [223]:
# Step 3: Text cleaning and preprocessing
# For each tweet:
# a. Convert to lowercase
# b. Remove URLs
# c. Replace everything except letters (punctuation, digits, @, #) with spaces
# d. Remove stopwords (common words like "the", "is", "and"), keeping the
#    negations "no", "not" and "never" because they flip sentiment
# e. Apply stemming (reduce words to their root form, e.g. "delayed" -> "delay")

# Build the stemmer and stopword set once instead of on every call
stemmer = PorterStemmer()
stop_words = set(stopwords.words('english')) - {'no', 'not', 'never'}

def clean_text(text):
    text = text.lower()                          # a. lowercase
    text = re.sub(r'http\S+|www\S+', ' ', text)  # b. URLs
    text = re.sub(r'[^a-z]', ' ', text)          # c. keep letters only
    text = re.sub(r'\s+', ' ', text).strip()     # collapse repeated spaces
    # d + e: drop stopwords, then stem the remaining words
    return ' '.join(stemmer.stem(word) for word in text.split()
                    if word not in stop_words)
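One gap in the cleaning step: because step c turns the apostrophe into a space, a contraction such as "didn’t" (from the second example tweet in Step 8) becomes the two tokens "didn" and "t", and NLTK's English stopword list contains both fragments, so the negation vanishes even though "not" is deliberately kept. The sketch below is a hypothetical helper, `expand_negations`, that is not part of the original notebook; it assumes you would call it at the start of `clean_text`, before the letters-only regex, so the negation survives as "not".

```python
import re

# Hypothetical helper (not in the original notebook): rewrite "n't" / "n’t"
# as " not" BEFORE punctuation is stripped, so the negation survives cleaning.
NEGATION = re.compile(r"n['’]t\b")

def expand_negations(text: str) -> str:
    # "didn’t" -> "did not", "isn't" -> "is not".
    # Irregular forms come out as "ca not" / "wo not"; they still keep "not".
    return NEGATION.sub(" not", text.lower())

print(expand_negations("My flight got delayed and the airline didn’t provide any updates!"))
```

Both the straight (') and curly (’) apostrophes are matched, since tweets often contain the curly form.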

In [224]:
df['cleaned_text'] = df['text'].apply(clean_text)
df

Unnamed: 0,text,sentiment,cleaned_text
0,"@United flight was delayed for 3 hours, worst ...",negative,unit flight delay hour worst experi ever
1,"Loved the service on @Delta, crew was super fr...",positive,love servic delta crew super friendli
2,"@AmericanAir lost my luggage again, so disappo...",negative,americanair lost luggag disappoint
3,Smooth boarding and on-time arrival. Great job...,positive,smooth board time arriv great job southwestair
4,The seats were uncomfortable but staff was polite,neutral,seat uncomfort staff polit
5,"@JetBlue flight attendants were rude, not flyi...",negative,jetblu flight attend rude not fli
6,"Got a free upgrade to business class, thank yo...",positive,got free upgrad busi class thank unit
7,"Average flight, nothing special to mention",neutral,averag flight noth special mention
8,@DeltaAirLines provided excellent support with...,positive,deltaairlin provid excel support book
9,The in-flight entertainment was not working,negative,flight entertain not work
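Step 2 asks for exploring the dataset, but the class balance is never checked. In the notebook, `df['sentiment'].value_counts()` does this; the plain-Python sketch below counts only the ten labels visible in the table above (rows 0 to 9 of the 30), so it is a partial view, not the full distribution.

```python
from collections import Counter

# Sentiment labels of the ten rows shown in the cleaned_text table above.
visible_labels = [
    "negative", "positive", "negative", "positive", "neutral",
    "negative", "positive", "neutral", "positive", "negative",
]

counts = Counter(visible_labels)
for label, n in counts.most_common():
    # Label, count, and share of the visible rows
    print(f"{label:<8} {n:>2}  ({n / len(visible_labels):.0%})")
```

Even in this slice, neutral tweets are the minority, which matters once only a handful of tweets land in the test set.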


In [225]:
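# Step 4:
# a) Encode sentiment labels as integers (alphabetical: negative=0, neutral=1, positive=2)
# b) Define features X (cleaned text) and target y
# (TF-IDF vectorization happens after the train/test split, in Step 5)
le = LabelEncoder()
df['sentiment'] = le.fit_transform(df['sentiment'])
X = df['cleaned_text']
y = df['sentiment']
y  # only the last expression in a cell is displayed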

Unnamed: 0,sentiment
0,0
1,2
2,0
3,2
4,1
5,0
6,2
7,1
8,2
9,0
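`LabelEncoder` assigns integer codes to the sorted unique labels, which is why the codes above are alphabetical. This stdlib sketch reproduces that rule on the first five labels of the dataset; in the notebook, `le.classes_` holds the same sorted list, so `dict(enumerate(le.classes_))` gives the decoding map that `sentiment_map` hard-codes in Step 8 (with lowercase names).

```python
# Reproduce LabelEncoder's rule: code = position in the sorted unique labels.
labels = ["negative", "positive", "negative", "positive", "neutral"]

classes = sorted(set(labels))                 # same order as le.classes_
encode = {c: i for i, c in enumerate(classes)}
decode = dict(enumerate(classes))             # integer code -> label

encoded = [encode[label] for label in labels]
print(encode)
print(encoded)
```

The encoded list matches rows 0 to 4 of the `y` output above.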


In [226]:
X.shape, y.shape, len(X)

((30,), (30,), 30)

In [227]:
# Step 5 a: Split into train and test sets
# 80% training and 20% testing (24 and 6 of the 30 tweets)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
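With `test_size=0.2` and the 30 tweets reported by `X.shape`, scikit-learn rounds the test share up and gives the rest to training, so the split is 24/6. That matches the support column of the classification report (1 + 1 + 4 = 6), and it means each test tweet moves accuracy by 1/6 (about 0.17). The sketch below shows the size arithmetic and a simple shuffle-and-slice split; it uses Python's own RNG, so the rows it picks will not match `train_test_split`'s even with seed 42.

```python
import math
import random

def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    # Test share is rounded UP; training gets the remainder.
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n_train, n_test = split_sizes(30, 0.2)
print(n_train, n_test)  # 24 6

# Illustrative split: shuffle all indices, take the first n_test for testing.
indices = list(range(30))
random.Random(42).shuffle(indices)
test_idx, train_idx = indices[:n_test], indices[n_test:]
```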

In [228]:
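# Step 5 b: Convert text to TF-IDF vectors
# Fit the vocabulary and IDF weights on the training text only, then reuse them for the test text
tfidf = TfidfVectorizer(max_features=5000, sublinear_tf=True)
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)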
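To see what the vectorizer computes, here is a from-scratch sketch of the weighting, assuming the defaults `smooth_idf=True` and `norm='l2'` plus the notebook's `sublinear_tf=True`: IDF is ln((1 + n) / (1 + df)) + 1, term frequency becomes 1 + ln(count), and each row is scaled to unit length. Pre-tokenised lists stand in for the cleaned strings; the real vectorizer's default token pattern also drops one-letter tokens, and `max_features=5000` only matters once the vocabulary exceeds 5000 terms.

```python
import math
from collections import Counter

def tfidf_fit(docs):
    """Learn a sorted vocabulary and smoothed IDF weights from tokenised docs."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # document frequency: count each term once per doc
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    return vocab, idf

def tfidf_transform(doc, vocab, idf):
    """Weight one tokenised doc; words not in the vocabulary are ignored."""
    counts = Counter(t for t in doc if t in idf)
    # sublinear tf: 1 + ln(count) instead of the raw count
    weights = [(1 + math.log(counts[t])) * idf[t] if counts[t] else 0.0 for t in vocab]
    norm = math.sqrt(sum(w * w for w in weights))   # L2 normalisation
    return [w / norm for w in weights] if norm else weights

train = [["flight", "delay", "worst"], ["love", "crew", "flight"]]
vocab, idf = tfidf_fit(train)
vec = tfidf_transform(["flight", "delay", "delay", "wifi"], vocab, idf)
print(vocab)
print([round(w, 3) for w in vec])
```

"flight" appears in every training doc, so its IDF is the minimum (1.0), while the rarer, repeated "delay" dominates the vector; the unseen "wifi" contributes nothing, just as `transform` ignores words missing from the fitted vocabulary.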

In [229]:
# Step 6: Train a Logistic Regression model
# a: Create the model
# b: Fit it on the TF-IDF training vectors
model = LogisticRegression()
model.fit(X_train_tfidf, y_train)
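For three classes with the default `lbfgs` solver, recent scikit-learn versions fit a multinomial (softmax) logistic regression: class k gets a weight vector (row k of `model.coef_`) and a bias (`model.intercept_[k]`), `predict` picks the class with the largest linear score, and `predict_proba` applies softmax to those scores. The sketch below replays that arithmetic in plain Python; the two-word vocabulary and all weights are made up for illustration, not learned from this dataset.

```python
import math

def predict_proba(x, weights, biases):
    # One linear score per class: z_k = w_k . x + b_k
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    # Softmax turns scores into probabilities (subtract the max for stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical model over the vocabulary ["delay", "love"]; values are invented.
weights = [[2.0, -1.0],    # class 0: negative
           [0.0, 0.0],     # class 1: neutral
           [-1.0, 2.0]]    # class 2: positive
biases = [0.0, 0.1, 0.0]

probs = predict_proba([1.0, 0.0], weights, biases)     # a tweet containing only "delay"
label = max(range(len(probs)), key=probs.__getitem__)  # argmax, as predict() does
print([round(p, 3) for p in probs], label)
```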

In [230]:
# Step 7: Evaluate the model
# a. Predict on the test set
# b. Report accuracy, the confusion matrix, and precision, recall and F1-score for each sentiment
y_pred = model.predict(X_test_tfidf)
print('Accuracy', accuracy_score(y_test, y_pred))
print('Confusion Matrix\n', confusion_matrix(y_test, y_pred))
print("\n Classification Report:\n", classification_report(y_test, y_pred))

Accuracy 0.3333333333333333
Confusion Matrix
 [[1 0 0]
 [0 0 1]
 [3 0 1]]

 Classification Report:
               precision    recall  f1-score   support

           0       0.25      1.00      0.40         1
           1       0.00      0.00      0.00         1
           2       0.50      0.25      0.33         4

    accuracy                           0.33         6
   macro avg       0.25      0.42      0.24         6
weighted avg       0.38      0.33      0.29         6



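Every number in the classification report follows from the confusion matrix (rows are true classes, columns are predicted classes). The sketch below recomputes them in plain Python. Class 1 (neutral) was never predicted, so its precision is 0/0; scikit-learn reports it as 0.0 and emits an `UndefinedMetricWarning`. Passing `zero_division=0` to `classification_report` sets that value explicitly and silences the warning.

```python
# Confusion matrix printed above: rows = true class, columns = predicted class
cm = [[1, 0, 0],
      [0, 0, 1],
      [3, 0, 1]]

def per_class_metrics(cm):
    n = len(cm)
    rows = []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))  # column total
        actual = sum(cm[k])                          # row total (support)
        # Undefined ratios are reported as 0.0, like zero_division=0
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        rows.append((precision, recall, f1, actual))
    return rows

metrics = per_class_metrics(cm)
for k, (p, r, f, s) in enumerate(metrics):
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")

accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))
print(f"accuracy={accuracy:.2f}")
```

The macro average is the plain mean of the per-class values, and the weighted average weights each class by its support (1, 1 and 4 here).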


In [231]:
# Step 8: Predict sentiment for new example tweets
new_tweets = [
    "The flight was very comfortable and the staff were really friendly and helpful!",
    "My flight got delayed and the airline didn’t provide any updates!",
    "The flight was fine, nothing went wrong but nothing was particularly impressive either."
]
# Apply the same cleaning and the already-fitted vectorizer (transform, not fit_transform)
new_clean_tweets = [clean_text(tweet) for tweet in new_tweets]
new_tweet_vectors = tfidf.transform(new_clean_tweets)
predictions = model.predict(new_tweet_vectors)

# Matches the alphabetical order of le.classes_ (negative, neutral, positive)
sentiment_map = {0: "Negative", 1: "Neutral", 2: "Positive"}

print('\nPredictions for new Tweets are:')
for msg, prediction in zip(new_tweets, predictions):
    print(f'{msg} --> {sentiment_map[prediction]}')


Predictions for new Tweets are:
The flight was very comfortable and the staff were really friendly and helpful! --> Positive
My flight got delayed and the airline didn’t provide any updates! --> Negative
The flight was fine, nothing went wrong but nothing was particularly impressive either. --> Neutral
