
##**1. Import Required Libraries**

In [42]:
import nltk
import pandas as pd
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

##**2. Load Dataset**

In [43]:
df = pd.read_csv("/content/Restaurant_Reviews.tsv",sep='\t')
df.head()

| | Review | Liked |
|---|---|---|
| 0 | Wow... Loved this place. | 1 |
| 1 | Crust is not good. | 0 |
| 2 | Not tasty and the texture was just nasty. | 0 |
| 3 | Stopped by during the late May bank holiday of... | 1 |
| 4 | The selection on the menu was great and so wer... | 1 |


##**3. Check First Review**

In [44]:
df.iloc[0]['Review']

'Wow... Loved this place.'

##**4. Initialize NLP Tools and Text Cleaning & Preprocessing**

In [45]:
import re
import nltk
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords

nltk.download('stopwords')
ps = PorterStemmer()
# Build the stop-word set once instead of once per word
stop_words = set(stopwords.words('english'))

corpus = []
for review in df['Review']:
    # Keep letters only, lowercase, and split into words
    words = re.sub('[^a-zA-Z]', ' ', review).lower().split()
    # Drop stop words and stem what remains
    words = [ps.stem(word) for word in words if word not in stop_words]
    corpus.append(' '.join(words))

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


##**5. Check Cleaned Review**

In [46]:
corpus[0]

'wow love place'
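
The cleaning loop drops every NLTK English stop word, and that list includes negations such as "not", so a review like "Crust is not good." loses the word that carries its sentiment. The sketch below shows one possible way to keep negations. It assumes a small hand-written stop-word set (`STOP_WORDS`, `NEGATIONS`, and `clean` are hypothetical names, not NLTK's full list or API) and skips stemming for brevity.

```python
import re

# Hypothetical, abbreviated stop-word set for illustration only;
# NLTK's English list is much longer but also contains "not" and "no".
STOP_WORDS = {"is", "the", "this", "and", "was", "not", "no", "a"}
NEGATIONS = {"not", "no"}

def clean(text, keep_negations=False):
    """Lowercase, strip non-letters, and drop stop words (no stemming)."""
    words = re.sub("[^a-zA-Z]", " ", text).lower().split()
    stop = STOP_WORDS - NEGATIONS if keep_negations else STOP_WORDS
    return " ".join(w for w in words if w not in stop)

print(clean("Crust is not good."))                       # crust good
print(clean("Crust is not good.", keep_negations=True))  # crust not good
```

Whether keeping negations helps on this dataset would need to be checked against the evaluation results.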

##**6. Check Missing Values**

In [47]:
df.isnull().sum()

| | 0 |
|---|---|
| Review | 0 |
| Liked | 0 |


##**7. Define Features & Target**

In [48]:
# The model is trained on the raw review text; the cleaned `corpus` from step 4 is not used
X = df['Review']
y = df['Liked']

##**8. Class Distribution**

In [49]:
y.value_counts(normalize=True) * 100

| Liked | proportion |
|---|---|
| 1 | 50.0 |
| 0 | 50.0 |


##**9. Train-Test Split**

In [50]:
# Hold out 30% of the reviews for testing; random_state makes the split reproducible
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
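
The split above is purely random, so the two classes need not keep their 50/50 ratio in the test set (the report in step 12 shows supports of 143 and 157). A stratified split keeps the ratio fixed. The sketch below illustrates the idea in plain Python on made-up labels; `stratified_split` is a hypothetical helper, not scikit-learn's implementation.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.30, seed=0):
    """Return (train_idx, test_idx) with each class split in the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels: 10 positive and 10 negative reviews (made-up data).
labels = [1] * 10 + [0] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))     # 14 6
print(sum(labels[i] for i in test_idx))  # 3 positives in the test set
```

In scikit-learn, `train_test_split` accepts a `stratify` argument (for example `stratify=y`) for the same purpose.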

##**10. Convert Text to Numbers (TF-IDF)**

In [51]:
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
X_train_vect = vectorizer.fit_transform(X_train)
X_test_vect = vectorizer.transform(X_test)
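
To make the weighting concrete, here is a minimal plain-Python sketch of the computation `TfidfVectorizer` performs with its default settings: raw term counts, a smoothed IDF of `ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each document. The `tfidf` function and the two-sentence corpus are illustrative assumptions; the real vectorizer returns a sparse matrix rather than dictionaries.

```python
import math
import re
from collections import Counter

def tfidf(docs):
    """TF-IDF with smoothed IDF and L2 normalization per document."""
    # Lowercase and keep tokens of two or more word characters.
    tokenized = [re.findall(r"\b\w\w+\b", d.lower()) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    rows = []
    for tokens in tokenized:
        weights = {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        # Guard against empty documents, whose norm would be zero.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({t: w / norm for t, w in weights.items()})
    return rows

docs = ["Crust is not good", "The food is good"]  # made-up mini corpus
for row in tfidf(docs):
    print({t: round(w, 3) for t, w in sorted(row.items())})
```

Terms that appear in only one document ("crust", "food") receive a higher weight than terms shared by both ("is", "good").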

##**11. Train Logistic Regression Model**

In [52]:
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(solver='lbfgs')
clf.fit(X_train_vect,y_train)
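
Once fitted, a binary logistic regression model scores a review by taking a weighted sum of its TF-IDF features and passing it through the sigmoid function. The sketch below illustrates that step only; the weights and bias are hypothetical values chosen for the example, not coefficients learned by `clf`.

```python
import math

def predict_proba(features, weights, bias):
    """Probability of the positive class for one sparse feature vector."""
    z = bias + sum(weights.get(term, 0.0) * value
                   for term, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights, not values learned by the notebook's model.
weights = {"good": 2.0, "nasty": -2.5}
bias = 0.0

# Terms without a weight ("place") contribute nothing to the score.
p = predict_proba({"good": 0.6, "place": 0.8}, weights, bias)
label = 1 if p >= 0.5 else 0
print(round(p, 3), label)  # 0.769 1
```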

##**12. Model Evaluation**

In [53]:
from sklearn.metrics import accuracy_score, classification_report
y_pred = clf.predict(X_test_vect)
# Stored rather than displayed: a notebook cell only echoes its last expression
accuracy = accuracy_score(y_test, y_pred)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.75      0.85      0.80       143
           1       0.84      0.75      0.79       157

    accuracy                           0.79       300
   macro avg       0.80      0.80      0.79       300
weighted avg       0.80      0.79      0.79       300
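
For reference, the per-class columns of the report follow directly from counts of true positives, false positives, and false negatives. The sketch below computes them on made-up labels; `binary_report` is a hypothetical helper, and the numbers are unrelated to the results above.

```python
def binary_report(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # made-up labels
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(binary_report(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

Passing `positive=0` gives the row for the negative class.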



##**13. Test with a New Review**

In [54]:
nltk.download('wordnet')
test = "this resturnt is good"

# Keep letters only, lowercase, and lemmatize each word
a = re.sub('[^a-zA-Z]',' ',test)
a = a.lower()
a = a.split()
a = [lemmatizer.lemmatize(word) for word in a]
a = ' '.join(a)

# Vectorize with the TF-IDF vocabulary fitted on the training set
example_vect = vectorizer.transform([a])

prediction = clf.predict(example_vect)

if prediction[0]==0:
    print("This is Negative Review")
elif prediction[0]==1:
    print("This is Positive Review")

This is Positive Review


[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
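
In this notebook the training text goes to the vectorizer raw, while the new review is cleaned and lemmatized first. One way to keep the two paths identical is to route both through a single helper. The sketch below assumes a hypothetical `normalize` function and uses only the regex step, not stemming or lemmatization.

```python
import re

def normalize(text):
    """Replace non-letters with spaces, lowercase, and collapse whitespace."""
    return " ".join(re.sub("[^a-zA-Z]", " ", text).lower().split())

# The same helper is applied to training reviews and to new input.
train_reviews = ["Wow... Loved this place.", "Crust is not good."]
cleaned_train = [normalize(r) for r in train_reviews]
cleaned_new = normalize("this resturnt is good")

print(cleaned_train)  # ['wow loved this place', 'crust is not good']
print(cleaned_new)    # this resturnt is good
```

`TfidfVectorizer` also accepts a `preprocessor` callable, which would be one place to plug in such a helper so that `fit_transform` and `transform` share it automatically.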
