<a href="https://colab.research.google.com/github/purvasingh96/Deep-learning-with-neural-networks/blob/master/Deep-learning-with-pytorch/3.%20Recurrent%20Neural%20Networks/Sentiment_analysis_via_RNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sentiment Analysis with an RNN
In this notebook I implement an RNN that performs sentiment analysis. <br>
I use an RNN rather than a strictly feedforward network because an RNN can also take the *sequence* of words into account.

### Network Architecture
The architecture of the sentiment analysis model is shown below: <br>
<img src="https://github.com/purvasingh96/Deep-learning-with-neural-networks/blob/master/Deep-learning-with-pytorch/3.%20Recurrent%20Neural%20Networks/images/network_diagram.png?raw=1" width=40%></img>

**Notes -**
1. Sentiment analysis works over a large vocabulary, so we need a more efficient representation of words than one-hot encoded vectors. Hence the model uses an *embedding layer for dimensionality reduction.*
2. The embeddings are passed to LSTM cells. The LSTM cells add recurrent connections, which give the network the ability to *include information about the sequence of words.*
3. The final LSTM outputs go to a *sigmoid output layer.*
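
To make note 1 concrete, here is a minimal pure-Python sketch of why an embedding layer can replace one-hot vectors: multiplying a one-hot vector by a weight matrix simply selects one row of that matrix, so the layer can do a direct lookup instead. The sizes, the weight values, and the helper names `one_hot` and `matmul_row` are made up for illustration and are not part of the model in this notebook.

```python
# Toy sizes chosen for illustration only; a real vocabulary is far larger.
vocab_size = 5
embedding_dim = 2

# Hypothetical embedding weights: one row of length embedding_dim per word index.
embedding_weights = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8],
    [0.9, 1.0],
]

def one_hot(index, size):
    """Return a one-hot vector with a 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def matmul_row(vector, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    n_cols = len(matrix[0])
    return [
        sum(vector[r] * matrix[r][c] for r in range(len(matrix)))
        for c in range(n_cols)
    ]

word_index = 3

# Route 1: build the full one-hot vector and multiply (vocab_size multiplications per column).
via_one_hot = matmul_row(one_hot(word_index, vocab_size), embedding_weights)

# Route 2: look the row up directly, which is what an embedding layer does.
via_lookup = embedding_weights[word_index]

print(via_one_hot)
print(via_lookup)
```

Both routes give the same vector of length `embedding_dim`, but the lookup never materializes the `vocab_size`-long one-hot vector.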

### Load in and visualize the data

In [0]:
import numpy as np

In [0]:
with open('reviews.txt', 'r') as f:
  reviews = f.read()
with open('labels.txt', 'r') as f:
  labels = f.read()

In [0]:
print(reviews[:100])
print(labels[:100])

bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life
positive
negative
positive
negative
positive
negative
positive
negative
positive
negative
positive
n


### Data pre-processing
1. Lowercase the text and remove punctuation marks.
2. Reviews are delimited by `\n`, so split the text on `\n` to get the individual reviews.
3. Join the reviews from step 2 back into one big string, then split that string into a list of words.
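
The steps above can be tried on a tiny sample before running them on the full dataset. The sketch below uses a made-up two-review string in the same newline-delimited layout; the names `sample`, `sample_reviews`, and `sample_words` are hypothetical and exist only for this illustration.

```python
from string import punctuation

# Made-up sample: two reviews, each terminated by a newline.
sample = "Great movie, loved it!\nTerrible plot... would not watch again.\n"

# Step 1: lowercase and drop every punctuation character.
sample = sample.lower()
no_punct = ''.join(c for c in sample if c not in punctuation)

# Step 2: split on newlines to get one string per review.
sample_reviews = no_punct.split('\n')

# Step 3: rejoin with spaces and split on whitespace to get a flat word list.
sample_words = ' '.join(sample_reviews).split()

print(sample_reviews)
print(sample_words)
```

Note that the trailing newline leaves an empty string as the last entry of `sample_reviews`. If `reviews.txt` also ends with a newline (an assumption, not checked here), the same empty review would appear in `reviews_split` and may need to be filtered out later.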

In [0]:
from string import punctuation

# lowercase, then get rid of punctuation
reviews = reviews.lower()
all_text = ''.join([c for c in reviews if c not in punctuation])

# split into individual reviews on newlines, then rejoin them with spaces
reviews_split = all_text.split('\n')
all_text = ' '.join(reviews_split)

# create a list of words
words = all_text.split()

In [0]:
words[:20]

['bromwell',
 'high',
 'is',
 'a',
 'cartoon',
 'comedy',
 'it',
 'ran',
 'at',
 'the',
 'same',
 'time',
 'as',
 'some',
 'other',
 'programs',
 'about',
 'school',
 'life',
 'such']

In [19]:
from collections import Counter

counts = Counter(words)
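# counts maps each word to its frequency, e.g.
# counts = Counter({'bromwell': 5, 'high': 742, 'is': 39879, 'a': 60733})
#
# vocabulary_to_int maps each word to an integer, most frequent word first, e.g.
# vocabulary_to_int = {'the': 1, 'and': 2, 'a': 3, 'of': 4, 'to': 5, 'is': 6}

# sort the words from most to least frequent
vocabulary = sorted(counts, key=counts.get, reverse=True)
# indices start at 1, which leaves 0 unused
vocabulary_to_int = {word: ii for ii, word in enumerate(vocabulary, 1)}

# encode each review as a list of integers
# (loop variable is `review`, so the `reviews` string is not overwritten)
reviews_int = []
for review in reviews_split:
  reviews_int.append([vocabulary_to_int[word] for word in review.split()])
reviews_int[:1]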


[[21025,
  308,
  6,
  3,
  1050,
  207,
  8,
  2138,
  32,
  1,
  171,
  57,
  15,
  49,
  81,
  5785,
  44,
  382,
  110,
  140,
  15,
  5194,
  60,
  154,
  9,
  1,
  4975,
  5852,
  475,
  71,
  5,
  260,
  12,
  21025,
  308,
  13,
  1978,
  6,
  74,
  2395,
  5,
  613,
  73,
  6,
  5194,
  1,
  24103,
  5,
  1983,
  10166,
  1,
  5786,
  1499,
  36,
  51,
  66,
  204,
  145,
  67,
  1199,
  5194,
  19869,
  1,
  37442,
  4,
  1,
  221,
  883,
  31,
  2988,
  71,
  4,
  1,
  5787,
  10,
  686,
  2,
  67,
  1499,
  54,
  10,
  216,
  1,
  383,
  9,
  62,
  3,
  1406,
  3686,
  783,
  5,
  3483,
  180,
  1,
  382,
  10,
  1212,
  13583,
  32,
  308,
  3,
  349,
  341,
  2913,
  10,
  143,
  127,
  5,
  7690,
  30,
  4,
  129,
  5194,
  1406,
  2326,
  5,
  21025,
  308,
  10,
  528,
  12,
  109,
  1448,
  4,
  60,
  543,
  102,
  12,
  21025,
  308,
  6,
  227,
  4146,
  48,
  3,
  2211,
  12,
  8,
  215,
  23]]
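
As a quick sanity check of the encoding step, the same approach can be run on a toy corpus and decoded back to words. This is only a sketch: the three short reviews and every `toy_*` name are invented for illustration, and the inverse mapping `toy_int_to_vocab` is an addition that does not appear in the notebook.

```python
from collections import Counter

# Made-up corpus standing in for reviews_split.
toy_reviews = ["the movie was good", "the plot was bad", "good good movie"]
toy_words = ' '.join(toy_reviews).split()

# Count words, then rank them from most to least frequent.
toy_counts = Counter(toy_words)
toy_vocab = sorted(toy_counts, key=toy_counts.get, reverse=True)

# Map each word to an integer starting at 1 (0 stays unused).
toy_vocab_to_int = {word: ii for ii, word in enumerate(toy_vocab, 1)}

# Encode every review as a list of integers.
toy_encoded = [[toy_vocab_to_int[w] for w in r.split()] for r in toy_reviews]

# Inverse mapping (an addition for this check) to decode integers back to words.
toy_int_to_vocab = {ii: word for word, ii in toy_vocab_to_int.items()}
toy_decoded = [' '.join(toy_int_to_vocab[i] for i in r) for r in toy_encoded]

print(toy_vocab_to_int)
print(toy_encoded)
print(toy_decoded)
```

The most frequent word (`good` here) gets index 1, and decoding the integer lists reproduces the original reviews, which confirms that the mapping is one-to-one.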

In [0]:
reviews_

In [0]:
from google.colab import drive
drive.mount('/content/drive')