This lesson was adapted from code posted in the following article: 
[A Simple NLP Sentiment Analysis with Keras and Google Colaboratory](https://medium.com/i-a/simple-nlp-sentiment-analysis-with-google-colaboratory-761a5391b57c) 


**Step 1:**  
Load required modules

In [0]:
import numpy as np
import requests

**Step 2:**

Download three training data files and append the lines to a single list.

In [0]:
urlpath = 'https://raw.githubusercontent.com/hdekk/elag19/master/'
filenames = ['imdb_labelled.txt', 'amazon_cells_labelled.txt', 'yelp_labelled.txt']
corpus = []
for f in filenames:
  response = requests.get(urlpath + f)
  corpus += response.text.split('\r\n')



**Step 3:**  
Split each line on the tab character. Keep only lines that contain exactly two fields (the sentence and its label) and drop any line whose label field is empty.

In [0]:
clean_corpus = [fields for fields in (line.split('\t') for line in corpus) if len(fields) == 2 and fields[1] != '']
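
To see which lines the filter keeps and which it drops, here is a minimal sketch on a few made-up lines. The sample strings and the names `sample_lines` and `sample_clean` are illustrative assumptions and do not come from the downloaded files.

```python
# Hypothetical sample lines, not taken from the real data files.
sample_lines = [
    'Great phone, works well.\t1',   # kept: text and label
    'Battery died in a week.\t0',    # kept: text and label
    'No label here',                 # dropped: no tab, so only one field
    'Empty label\t',                 # dropped: label field is empty
    '',                              # dropped: empty line
    'Too\tmany\tfields',             # dropped: three fields
]

# Same rule as the cell above: exactly two fields and a non-empty second field.
sample_clean = [
    fields
    for fields in (line.split('\t') for line in sample_lines)
    if len(fields) == 2 and fields[1] != ''
]

for text, label in sample_clean:
    print(label, text)
```

Only the first two sample lines survive, each as a `[text, label]` pair.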

**Optional:**  
Compare the number of lines in the raw data to the number in the cleaned data.

In [24]:
print(len(corpus))
print(len(clean_corpus))

3003
3000
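
A likely reason for the three dropped lines, assuming each of the three files ends with a `\r\n` line break, is that `split('\r\n')` returns an empty string after the final separator. The short check below uses a made-up two-line string rather than the real files.

```python
# Made-up file content that ends with a line break (an assumption about the real files).
text = 'Good movie.\t1\r\nBad movie.\t0\r\n'

lines = text.split('\r\n')
print(lines)        # the last element is an empty string
print(len(lines))   # one more than the number of real lines

# str.splitlines() does not produce that trailing empty string.
print(len(text.splitlines()))
```

If that assumption holds, one trailing empty string per file would account for the difference between the two counts.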


**Train the model**

Separate the sentences from their labels and convert the labels to integers.

In [0]:
train_documents = [line[0] for line in clean_corpus]
train_labels = [int(line[1]) for line in clean_corpus]

**Use the Naive Bayes algorithm from the scikit-learn library**

In [0]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

**Convert the training set to a binary matrix that records whether each token occurs in each document**

In [0]:
# binary must be the boolean True, not the string 'true'
count_vectorizer = CountVectorizer(binary=True)
train_documents = count_vectorizer.fit_transform(train_documents)
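
With the `binary` option enabled, each matrix entry records only whether a token occurs in a document, not how often. The following is a rough pure-Python sketch of that idea, not scikit-learn's implementation. The helper names `tokenize` and `binary_matrix` are hypothetical, and the regular expression is assumed to approximate `CountVectorizer`'s default tokenization (lowercased words of two or more characters).

```python
import re

def tokenize(text):
    # Lowercase, then keep runs of two or more word characters.
    return re.findall(r'\b\w\w+\b', text.lower())

def binary_matrix(documents):
    # Build a sorted vocabulary, then mark presence (1) or absence (0) per document.
    vocabulary = sorted({token for doc in documents for token in tokenize(doc)})
    rows = []
    for doc in documents:
        present = set(tokenize(doc))
        rows.append([1 if term in present else 0 for term in vocabulary])
    return vocabulary, rows

docs = ['Good good food', 'Bad food']
vocabulary, rows = binary_matrix(docs)
print(vocabulary)  # ['bad', 'food', 'good']
print(rows)        # [[0, 1, 1], [1, 1, 0]]
```

Note that "good" appears twice in the first document but is still recorded as 1.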

**Fit BernoulliNB Classifier**

In [0]:
classifier = BernoulliNB().fit(train_documents, train_labels)
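
Bernoulli Naive Bayes treats each vocabulary term as either present or absent in a document, and it counts the absence of a term as evidence too. The sketch below is a simplified pure-Python illustration with add-one (Laplace) smoothing, not scikit-learn's code. The function names `fit_bernoulli_nb` and `predict_bernoulli_nb` and the toy data are hypothetical.

```python
import math

def fit_bernoulli_nb(rows, labels, alpha=1.0):
    # rows: binary feature vectors; labels: one class label per row.
    model = {}
    n_features = len(rows[0])
    for cls in sorted(set(labels)):
        class_rows = [r for r, y in zip(rows, labels) if y == cls]
        log_prior = math.log(len(class_rows) / len(rows))
        # Smoothed probability that each feature is present in this class.
        probs = [
            (sum(r[j] for r in class_rows) + alpha) / (len(class_rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[cls] = (log_prior, probs)
    return model

def predict_bernoulli_nb(model, row):
    # Score each class by its log prior plus the log likelihood of every feature,
    # using p when the feature is present and (1 - p) when it is absent.
    scores = {}
    for cls, (log_prior, probs) in model.items():
        score = log_prior
        for x, p in zip(row, probs):
            score += math.log(p) if x else math.log(1.0 - p)
        scores[cls] = score
    return max(scores, key=scores.get)

# Toy data over the hypothetical vocabulary ['bad', 'good', 'movie'].
X = [[0, 1, 1], [0, 1, 0], [1, 0, 1], [1, 0, 0]]
y = [1, 1, 0, 0]

model = fit_bernoulli_nb(X, y)
print(predict_bernoulli_nb(model, [0, 1, 1]))  # "good" and "movie" present
print(predict_bernoulli_nb(model, [1, 0, 1]))  # "bad" and "movie" present
```

In this toy data "movie" appears equally often in both classes, so the prediction is driven by "good" and "bad".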

**Create a function to output the sentiment**

In [0]:
def predictionOutput(sentence):
  # Vectorize the sentence with the fitted vocabulary, then classify it.
  prediction = classifier.predict(count_vectorizer.transform([sentence]))
  if prediction[0] == 1:
    print('This is a positive statement')
  elif prediction[0] == 0:
    print('This is a negative statement')

**Let's test it!**

In [31]:
predictionOutput('Berlin is a wonderful city')

This is a positive statement


In [32]:
predictionOutput('The traffic in Berlin is terrible')

This is a negative statement
