# Natural Language Processing (NLP)

Natural Language Processing (or NLP for short) is a discipline in computing that deals with how computers interact with natural (human) languages.

[*Pedro St Clair*](http://github.com/pedrostclair)

## Introduction

NLP is essentially the field that focuses on how computers can understand and process natural (human) languages. Common examples of NLP include useful tools we take for granted these days, such as autocomplete and spellcheck.

 

## Recurrent Neural Networks

We will introduce a 'new' kind of neural network, the recurrent neural network (RNN for short), which is much more capable of processing sequential data such as text or characters.

We will use a recurrent neural network to do the following:

    Sentiment Analysis
    Character Generation

RNNs are rather complex and come in a variety of forms. For now, the focus is merely on how they work and what they are best suited for.

## Sequence Data

We now look more closely at sequences of text and learn how to encode them in a meaningful way. Unlike images, sequence data such as long chains of text, weather patterns, videos, and practically anything else where the notion of a step or of time is relevant must be processed and handled in a special way. But what is meant by a 'sequence', and why is text data a sequence? Textual data contains many words that follow a very specific and meaningful order, so we need the ability to keep track of each word and of when it occurs in the data. Encoding an entire passage of text into one data point wouldn't give a very meaningful picture of the data and would be very difficult to do anything with. The aim is to keep track of where each word appears and to use that information to try to understand the meaning of pieces of text.
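As a rough illustration of what "keeping track of where each word appears" can mean, the sketch below records the step at which every word occurs. The function name `word_positions` and the example sentence are hypothetical and chosen only for this illustration; splitting on whitespace is a simplifying assumption, not a full tokenizer.

```python
from collections import defaultdict


def word_positions(text):
    """Map each lowercased word to the list of 0-based steps at which it occurs."""
    positions = defaultdict(list)
    # enumerate() supplies the step (position) of each word in the sequence
    for step, word in enumerate(text.lower().split()):
        positions[word].append(step)
    return dict(positions)


# Hypothetical example sentence, used only for illustration
sentence = "the cat sat on the mat"
positions = word_positions(sentence)
print(positions)
# {'the': [0, 4], 'cat': [1], 'sat': [2], 'on': [3], 'mat': [5]}
```

Each word keeps its own list of steps, so the original order can still be recovered from the positions.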

## Encoding Text

Machine learning models and neural networks don't take raw text as input. We need a way to encode textual data as numeric values that the models can work with. Fortunately, a few methods exist. We look at an example below.

Let's understand the information we can get from textual data by looking at the following two movie reviews:

 
 
 
 
 
 `I thought the movie was going to be bad, but it was actually amazing!`


 `I thought the movie was going to be amazing, but it was actually bad!`





The two sentences above are very similar, but we know they have opposite meanings.

This is largely thanks to the **ordering** of words, a vital property of textual data.

We will keep this in mind as we consider various ways of encoding textual data.






### Bag of Words
The first and simplest way to encode our data is to use something called a **`bag_of_words`**.
It's an easy technique where each word in a sentence is encoded with an integer and thrown into a collection that does not maintain the order of the words but does keep track of their frequency.
The Python function below encodes a string of text into a **`bag_of_words`**.

#### Setup

In [None]:
import tensorflow as tf
import keras
import sys
from sys import exit

#### Encode textual data

In [None]:
vocab = {}  # maps each word to the integer representing it
word_encoding = 1  # next unused integer, shared across calls

def bag_of_words(text):
  global word_encoding

  # split() with no argument ignores leading, trailing and repeated spaces,
  # so no empty-string "word" ends up in the vocab
  words = text.lower().split()  # create a list of all the words in the text
  bag = {}  # stores each encoding and its frequency in this text

  for word in words:
    if word in vocab:
      encoding = vocab[word]  # get encoding from vocab
    else:
      vocab[word] = word_encoding
      encoding = word_encoding
      word_encoding += 1

    if encoding in bag:
      bag[encoding] += 1
    else:
      bag[encoding] = 1

  # The whole body, including this return, must be indented inside the function.
  # At module level Python rejects it with "SyntaxError: 'return' outside function".
  # for more --> https://stackoverflow.com/questions/7842120/python-return-statement-error-return-outside-function
  return bag

#### Define 'text' and 'bag'

In [None]:
# Throwing punctuation in the bag? Leave a space between words:
# "arts ." gives two tokens, while "is," stays a single token
text1 = " Yes we know who Morgan Freeman is, and his contribution to the arts . "
text2 = " However , that movie was horrid, and then the inquisition just barging in like that ? We demand a refund ! "
text = text1 + text2
bag = bag_of_words(text)
print(vocab)
print(bag)

{'yes': 1, 'we': 2, 'know': 3, 'who': 4, 'morgan': 5, 'freeman': 6, 'is,': 7, 'and': 8, 'his': 9, 'contribution': 10, 'to': 11, 'the': 12, 'arts': 13, '.': 14, 'however': 15, ',': 16, 'that': 17, 'movie': 18, 'was': 19, 'horrid,': 20, 'then': 21, 'inquisition': 22, 'just': 23, 'barging': 24, 'in': 25, 'like': 26, '?': 27, 'demand': 28, 'a': 29, 'refund': 30, '!': 31}
{1: 1, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 2, 9: 1, 10: 1, 11: 1, 12: 2, 13: 1, 14: 1, 15: 1, 16: 1, 17: 2, 18: 1, 19: 1, 20: 1, 21: 1, 22: 1, 23: 1, 24: 1, 25: 1, 26: 1, 27: 1, 28: 1, 29: 1, 30: 1, 31: 1}

In [None]:
# A later call reuses the same vocab: words seen above keep their integers,
# and new words receive the next unused ones
text = "Oh , A cup of rooi bos tea ? What a marvelous idea !"
bag = bag_of_words(text)
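The two movie reviews from the Encoding Text section make the cost of discarding word order concrete. The sketch below is a minimal, self-contained illustration and not the notebook's own `bag_of_words` function: it uses `collections.Counter` for the counts, and the `tokenize` helper with its regular expression is an assumption made here so that punctuation attached to a word (such as `bad,` versus `bad!`) does not produce different tokens.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters or apostrophes (assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())


review_a = "I thought the movie was going to be bad, but it was actually amazing!"
review_b = "I thought the movie was going to be amazing, but it was actually bad!"

tokens_a = tokenize(review_a)
tokens_b = tokenize(review_b)

# A bag of words keeps only how often each word occurs
bag_a = Counter(tokens_a)
bag_b = Counter(tokens_b)

print(bag_a == bag_b)        # True: identical word counts
print(tokens_a == tokens_b)  # False: the word order differs
```

Both reviews collapse to the same bag even though their meanings are opposite, which is why an order-aware encoding is needed for tasks such as sentiment analysis.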


# References

### 1. NLP FreeCodeCamp Curriculum
https://www.freecodecamp.org/learn/machine-learning-with-python/tensorflow/natural-language-processing-with-rnns

### 2. Return Statements
https://stackoverflow.com/questions/7842120/python-return-statement-error-return-outside-function