
# Natural Language Processing

## Importing the libraries

In [81]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Importing the dataset

In [82]:
dataset = pd.read_csv('Restaurant_Reviews.tsv', delimiter = '\t', quoting = 3)

## Cleaning the texts

In [83]:
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
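corpus = []
# Build the stemmer and the stop-word set once, outside the loop
ps = PorterStemmer()
all_stopwords = stopwords.words('english')
all_stopwords.remove('not')  # keep 'not' so negations survive cleaning
stopword_set = set(all_stopwords)
# Iterate over every row instead of a hard-coded count
for i in range(len(dataset)):
  review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i])  # keep letters only
  review = review.lower().split()
  review = [ps.stem(word) for word in review if word not in stopword_set]
  corpus.append(' '.join(review))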

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
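The cleaning steps in this cell are repeated verbatim further down for each single review. A minimal sketch of how they could be wrapped in one reusable function is shown here. The name `clean_text` and its parameters are hypothetical, and the tiny stop-word list and identity stemmer in the usage lines are placeholders so the snippet runs without NLTK; in this notebook you would pass `ps.stem` and the NLTK stop-word list with `'not'` removed.

```python
import re

def clean_text(text, stem, stop_words):
    """Hypothetical helper mirroring the notebook's cleaning steps."""
    letters_only = re.sub('[^a-zA-Z]', ' ', text)   # replace non-letters with spaces
    tokens = letters_only.lower().split()           # lowercase, split on whitespace
    stop_set = set(stop_words)                      # build the set once per call
    return ' '.join(stem(w) for w in tokens if w not in stop_set)

# Placeholder inputs for illustration only (not the NLTK list or Porter stemmer)
demo_stop_words = ['the', 'was', 'a', 'and']
demo = clean_text("The day was NOT a success, and I led 3 cadets!",
                  stem=lambda w: w, stop_words=demo_stop_words)
print(demo)  # day not success i led cadets
```

Passing the stemmer and stop-word list as arguments keeps the function free of global state, so the same call works for the training corpus and for a single new review.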


In [84]:
print(corpus)

['raf cadet journey demonstr leadership took charg cadet activ collect money chariti supermarket help pack bag outset ensur cadet knew repres raf need full correct uniform look proud explain format day includ time made sure cadet awar behaviour need display made clear bad languag behaviour would toler saturday morn met cadet supermarket ask older cadet work newer cadet creat support buddi team morn quiet lunch time approach got busi call cadet short break explain idea chang lunch break slightli later us could collect money busi period group discuss idea agre take later break instruct anyon need take lunch could peopl might differ need diabet need eat throughout day regularli check team spoke cadet home time thank made sure everyon ok happi day went day success collect', 'work role half year lot experi knowledg role trust manag lead team throughout day alloc task new member join team train use equip carri differ job furthermor key holder one show new manag lock build teach lock everi fi

## Creating the Bag of Words model (TF-IDF weighted)

In [85]:
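from sklearn.feature_extraction.text import TfidfVectorizer
# Keep at most the 1500 most frequent terms as features
cv = TfidfVectorizer(max_features = 1500)
X = cv.fit_transform(corpus).toarray()  # dense array, which GaussianNB requires
y = dataset.iloc[:, -1].values  # labels come from the last column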
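To make the weighting concrete, here is a small pure-Python sketch of what `TfidfVectorizer` computes under its default settings: raw term counts multiplied by a smoothed inverse document frequency, `ln((1 + n) / (1 + df)) + 1`, with each row then scaled to unit L2 length. The function name `tfidf_rows` and the two toy documents are assumptions for illustration. The sketch splits on whitespace only, whereas scikit-learn's default tokenizer also drops single-character tokens.

```python
import math
from collections import Counter

def tfidf_rows(docs):
    """Hypothetical helper: smoothed TF-IDF with L2-normalised rows."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        raw = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0  # guard empty docs
        rows.append([v / norm for v in raw])
    return vocab, rows

vocab, rows = tfidf_rows(["cadet team lead", "team manag lock"])
print(vocab)
print([round(v, 3) for v in rows[0]])
```

A term that appears in both documents ("team") receives a lower weight than a term unique to one document ("cadet"), which is the property that distinguishes TF-IDF from plain counts.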

## Splitting the dataset into the Training set and Test set

In [86]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
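The split sizes can be checked by hand. This sketch assumes 49 samples (the number of reviews cleaned earlier) and assumes the test count is rounded up when `test_size` is a fraction, which agrees with the 13 prediction rows printed further down.

```python
import math

n_samples = 49      # reviews in the corpus (taken from the cleaning loop)
test_size = 0.25

n_test = math.ceil(test_size * n_samples)   # fractional test size, rounded up
n_train = n_samples - n_test                # the remainder goes to training

print(n_train, n_test)  # 36 13
```

With only 13 test samples, each single misclassification moves the accuracy by about 7.7 percentage points, so the reported score is a coarse estimate.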

## Training the Naive Bayes model on the Training set

In [87]:
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

GaussianNB()
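For readers who want to see the idea behind the classifier, the following is a simplified pure-Python sketch of Gaussian Naive Bayes: each class gets a prior plus a per-feature mean and variance, and prediction picks the class with the highest log-posterior. The names `fit_gnb` and `predict_gnb`, the fixed variance floor `eps`, and the toy data are all assumptions for illustration; scikit-learn's `GaussianNB` handles variance smoothing differently, so this is not a drop-in replacement.

```python
import math

def fit_gnb(X, y, eps=1e-9):
    """Hypothetical helper: per-class log prior, feature means and variances."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Population variance plus a small floor to avoid division by zero
        variances = [sum((v - m) ** 2 for v in col) / n + eps
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(y)), means, variances)
    return model

def predict_gnb(model, x):
    """Return the label with the highest Gaussian log-posterior for x."""
    def score(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=lambda label: score(model[label]))

# Toy data: class 0 clusters near (0, 0), class 1 near (1, 1)
X_toy = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.8]]
y_toy = [0, 0, 1, 1]
toy_model = fit_gnb(X_toy, y_toy)
print(predict_gnb(toy_model, [0.05, 0.05]), predict_gnb(toy_model, [0.95, 0.9]))
```

The "naive" part is the sum over features: each feature contributes independently to the score, which is a strong assumption for sparse TF-IDF columns.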

## Predicting the Test set results

In [88]:
y_pred = classifier.predict(X_test)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

[[0 0]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [0 1]
 [0 0]
 [0 0]
 [1 1]
 [0 0]
 [0 0]
 [0 0]
 [0 0]]


## Making the Confusion Matrix

In [89]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

[[11  0]
 [ 1  1]]


0.9230769230769231
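The accuracy figure is easier to interpret alongside the other quantities that the same confusion matrix yields. The sketch below hard-codes the matrix printed above (rows are true labels, columns are predicted labels) and derives precision and recall for class 1, plus the accuracy of a baseline that always predicts the majority class. The variable names are illustrative only.

```python
# Confusion matrix copied from the output above: rows = true, columns = predicted
cm = [[11, 0],
      [1, 1]]

tn, fp = cm[0]
fn, tp = cm[1]
total = tn + fp + fn + tp

accuracy = (tp + tn) / total                 # 12 of 13 correct
precision = tp / (tp + fp)                   # predicted 1s that were right
recall = tp / (tp + fn)                      # true 1s that were found
baseline = max(tn + fp, fn + tp) / total     # always predict the majority class

print(round(accuracy, 4), precision, recall, round(baseline, 4))
# 0.9231 1.0 0.5 0.8462
```

Because 11 of the 13 test samples belong to class 0, a model that always answers 0 would already score about 0.846, and the classifier recovers only one of the two class 1 samples.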

In [90]:
print(X_test)

[[0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.04664339 0.         ... 0.         0.         0.        ]
 [0.         0.         0.07860934 ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.04328291 0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]]


## Predicting if a single review is positive or negative

### Positive review

In [91]:
new_review = "Recently I lead a charity event which successfully raised £3000.Throughout this I was in charge of all the accounts, contacting companies, fixing dates, presenting ideas, delegating roles. During this process a member of our group lost the research and final PowerPoint a few days before the pitch to our school and to the venue for the event which at the time was a devastating problem. However because of my persistent nature I decided to stay late after school and take responsibility as a leader to accumulate the lost research and create a PowerPoint within the deadline. I worked fast, efficiently and professionally and managed to get our group back on track. The extra work I put in for our group showed how determined I was regardless of tensions and external pressures which eventually did pay of as everything went as planned and we raised the highest amount in the borough. This example shows my leadership capabilities alongside my ability to work at pace whilst communicating with my team members and leading the group to an overall success."
new_review = re.sub('[^a-zA-Z]', ' ', new_review)
new_review = new_review.lower()
new_review = new_review.split()
ps = PorterStemmer()
all_stopwords = stopwords.words('english')
all_stopwords.remove('not')
new_review = [ps.stem(word) for word in new_review if not word in set(all_stopwords)]
new_review = ' '.join(new_review)
new_corpus = [new_review]
new_X_test = cv.transform(new_corpus).toarray()
new_y_pred = classifier.predict(new_X_test)
print(new_y_pred)

[1]


The model correctly predicted this review as positive.

### Negative review

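**Solution:** Repeat the same text preprocessing as before, this time on a single review, then pass the result through the already fitted vectorizer (`cv.transform`, not `cv.fit_transform`) and the trained classifier.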

In [92]:
new_review = "I was responsible for running the Budget Day marketing campaign. I organised meetings with the tax and marketing teams. I was briefed by my line manager on my responsibilities and created a contingency plan. I had liaised with the designers to make sure they were on standby and the layout was approved. I collaborated with the marketing team to ensure they understood and were happy with their responsibilities. During the day, the budget was announced and the tax team were drafting their articles. However, there was a mix up with the banners for our social media posts. It seemed there was miscommunication with the marketing team. The banner concept was wrong and the colour scheme did not match the overall branding. Whist waiting for the articles from compliance check, I decided to rectify the problem by taking a leading role and arranged a meeting with the marketing team. We decided to make some minor adjustments to the banner which worked with contrast to the design of the publication. The marketing team felt better and they assisted me with proof checking the articles before sending to the designers. As a result, I managed to diffuse the conflict with the marketing team and I proactively created a stronger relationship with them. The budget day publication was sent out on time and everyone was happy with the end result. The marketing team were happy that their needs were considered and I received positive feedback from my line manager since I resolved the situation."

new_review = re.sub('[^a-zA-Z]', ' ', new_review)
new_review = new_review.lower()
new_review = new_review.split()
ps = PorterStemmer()
all_stopwords = stopwords.words('english')
all_stopwords.remove('not')
new_review = [ps.stem(word) for word in new_review if not word in set(all_stopwords)]
new_review = ' '.join(new_review)
new_corpus = [new_review]
new_X_test = cv.transform(new_corpus).toarray()
new_y_pred = classifier.predict(new_X_test)
print(new_y_pred)

[0]


The model correctly predicted this review as negative.
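Both single-review cells call `cv.transform` rather than `cv.fit_transform`, so the new text is mapped onto the vocabulary learned from the training corpus and any unseen word is ignored. The sketch below illustrates that fixed-vocabulary behaviour with plain counts instead of TF-IDF weights; `transform_counts` and the three-term vocabulary are hypothetical.

```python
def transform_counts(text, vocabulary):
    """Hypothetical helper: count tokens against a fixed vocabulary."""
    index = {term: i for i, term in enumerate(vocabulary)}
    row = [0] * len(vocabulary)
    for token in text.split():
        if token in index:          # tokens outside the vocabulary are dropped
            row[index[token]] += 1
    return row

fitted_vocab = ['cadet', 'lead', 'team']   # stands in for the fitted vocabulary
row = transform_counts('lead team team budget', fitted_vocab)
print(row)  # [0, 1, 2]
```

The word "budget" contributes nothing because it was not in the vocabulary at fit time, which is why a new review made up mostly of unseen words gives the classifier very little to work with.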