## Elective: Text Sentiment Analysis

If you would like more practice with analyzing sequences of words with a simple network, now would be a great time to check out the elective section: Text Sentiment Analysis. In this section, Andrew Trask teaches you how to convert words into vectors and then analyze the sentiment of these vectors. He goes through constructing and tuning a model and addresses some common errors in text analysis. This section does not contain material that is required to complete this program or the project in this section, but it is interesting and you may find it useful!


## Deep Learning for NLP

Deep learning is a suite of tools that allows you to take what you know and predict what you want to know using neural networks.
![image.png](attachment:24b451d9-bd51-4950-adb9-e8e56cb5c056.png)

Natural language processing is the study of human language using tools such as machine learning, and in this case, deep learning.

In this tutorial, we're going to be talking about sentiment classification: deciding whether a section of human-generated text is positive or negative. So in this case, what we know is a section of human-generated text, and what we want to know is whether its label is positive or negative. What this tutorial is really about is framing a problem so that the network can succeed in discovering the correlation between your input and your output data. Sentiment classification is a good fit for this because neural nets don't naturally accept text input; they accept numbers.

So we're going to have to transform our textual input data into numerical form in such a way that the neural network can easily discover the correlation. Our goal is to see how changing this transformation lets us frame the problem so that the neural net discovers the correlation as quickly and easily as possible.
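As a rough illustration of what "transforming text into numerical form" can look like, the sketch below counts how often each vocabulary word appears in a piece of text. The helper names `build_vocab` and `text_to_counts` and the two toy reviews are assumptions made up for this example; the tutorial develops its own encoding in the later lessons.

```python
from collections import Counter

def build_vocab(texts):
    """Map every distinct word in `texts` to a column index (hypothetical helper)."""
    words = sorted({word for text in texts for word in text.split(" ") if word})
    return {word: index for index, word in enumerate(words)}

def text_to_counts(text, vocab):
    """Turn one text into a list of word counts, one slot per vocabulary word."""
    counts = Counter(text.split(" "))
    vector = [0] * len(vocab)
    for word, count in counts.items():
        if word in vocab:  # words outside the vocabulary are ignored
            vector[vocab[word]] = count
    return vector

# Toy data invented for this sketch, not taken from reviews.txt
toy_reviews = ["this movie was great great fun", "this movie was terrible"]
vocab = build_vocab(toy_reviews)
vectors = [text_to_counts(review, vocab) for review in toy_reviews]
print(vocab)
print(vectors)
```

Each review becomes a fixed-length list of numbers, which is the kind of input a neural network can accept. Whether raw counts are the best framing is exactly the sort of question this tutorial goes on to explore.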




# Sentiment Classification & How To "Frame Problems" for a Neural Network

### What You Should Already Know

- neural networks, forward and back-propagation
- stochastic gradient descent
- mean squared error
- train/test splits

### Where to Get Help if You Need it
- Re-watch previous Udacity Lectures
- Leverage the recommended Course Reading Material - [Grokking Deep Learning](https://www.manning.com/books/grokking-deep-learning) (Check inside your classroom for a discount code)
- Shoot me a tweet @iamtrask


### Tutorial Outline:

- Intro: The Importance of "Framing a Problem" (this lesson)

- [Curate a Dataset](#lesson_1)
- [Developing a "Predictive Theory"](#lesson_2)
- [**PROJECT 1**: Quick Theory Validation](#project_1)


- [Transforming Text to Numbers](#lesson_3)
- [**PROJECT 2**: Creating the Input/Output Data](#project_2)


- Putting it all together in a Neural Network (video only - nothing in notebook)
- [**PROJECT 3**: Building our Neural Network](#project_3)


- [Understanding Neural Noise](#lesson_4)
- [**PROJECT 4**: Making Learning Faster by Reducing Noise](#project_4)


- [Analyzing Inefficiencies in our Network](#lesson_5)
- [**PROJECT 5**: Making our Network Train and Run Faster](#project_5)


- [Further Noise Reduction](#lesson_6)
- [**PROJECT 6**: Reducing Noise by Strategically Reducing the Vocabulary](#project_6)


- [Analysis: What's going on in the weights?](#lesson_7)

# Lesson: Curate a Dataset<a id='lesson_1'></a>
The cells from here until Project 1 include code Andrew shows in the videos leading up to mini project 1. We've included them so you can run the code along with the videos without having to type in everything.

In [10]:
def pretty_print_review_and_label(i):
    # Show the label next to the first 80 characters of the review
    print(labels[i] + "\t:\t" + reviews[i][:80] + "...")

# What we know! One review per line.
with open('reviews.txt', 'r') as g:
    reviews = [line.rstrip('\n') for line in g]

# What we WANT to know! One label per line, upper-cased.
with open('labels.txt', 'r') as g:
    labels = [line.rstrip('\n').upper() for line in g]

**Note:** The data in `reviews.txt` we're using has already been preprocessed a bit and contains only lower case characters. If we were working from raw data, where we didn't know it was all lower case, we would want to add a step here to convert it. That's so we treat different variations of the same word, like `The`, `the`, and `THE`, all the same way.
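If we did have to work from raw, mixed-case data, the conversion step could look something like the sketch below. The function `normalize_case` and the sample strings are hypothetical and are not part of the notebook.

```python
def normalize_case(texts):
    """Return lower-cased copies so 'The', 'the', and 'THE' become one token."""
    return [text.lower() for text in texts]

# Sample strings invented for this sketch
raw_reviews = ["The plot was GREAT", "THE acting was great"]
clean_reviews = normalize_case(raw_reviews)
print(clean_reviews)
```

Applying this right after loading would keep every later step, such as counting words, from treating case variants as different words.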

In [11]:
len(reviews)

25000

In [12]:
reviews[0]

In [13]:
labels[0]

'POSITIVE'

# Lesson: Develop a Predictive Theory<a id='lesson_2'></a>

In [7]:
print("labels.txt \t : \t reviews.txt\n")
pretty_print_review_and_label(2137)
pretty_print_review_and_label(12816)
pretty_print_review_and_label(6267)
pretty_print_review_and_label(21934)
pretty_print_review_and_label(5297)
pretty_print_review_and_label(4998)

labels.txt 	 : 	 reviews.txt


