# Intro to simple text classification in Keras

In [1]:
# Do our imports
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential  # base Keras model
from tensorflow.keras.layers import Dense, Activation  # Dense = fully connected layer
from tensorflow.keras.optimizers import SGD  # optional alternative to the 'adam' optimizer used below

In [2]:
# If any of these give you problems, make sure you've installed all libraries used (pandas, scikit-learn, and matplotlib)
# using conda install or pip install
# See the Moodle page "Instructions for setting up and using Python and Jupyter" for more info on how to do this
import pandas as pd
from sklearn.feature_extraction import _stop_words
from sklearn.metrics.pairwise import cosine_similarity as cosine
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.decomposition import TruncatedSVD
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

## Loading a dataset

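This notebook was originally written for a small set of 1000 movie reviews from IMDB. [The original dataset can be found here.](https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews?resource=download) The version below instead loads `data.csv`, a small table of 33 book summaries (title, author, favorite quote, one-line review, and key takeaways).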

Here's how to load the dataset into the notebook:

In [3]:
# This assumes the data.csv file lives in the same directory as this notebook.
# ***If you want to edit this notebook to use a different dataset, edit this line to specify a different file:
df = pd.read_csv("data.csv")

Now we've got the data read in! We've used a special data type called a "data frame", using the Pandas library, to store this data. Pandas makes working with data pretty convenient.

Printing df will show you the data in a table-like format (specifically, it'll show you the first and last few rows of the table).

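**Note: in the original IMDB version of this notebook, a sentiment of "1" means "positive" and "0" means "negative". This book dataset has no such 0/1 label column, and a binary classifier needs one.**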

In [4]:
df

Unnamed: 0.1,Unnamed: 0,name,authors,favorite_quote,One_line_review,why_should_read,key_takeaway_1,key_takeaway_2,key_takeaway_3
0,0,Solve For Happy,Mo Gawdat,If you can afford the brain cycles to worry ab...,Solve For Happy lays out a former Google engin...,Sometimes- it takes a tragedy to understand ha...,Your inner voice is not the real you.,Many cognitive filters prevent you from seeing...,No matter if life is good or bad- staying in t...
1,1,Stumbling On Happiness,Dan Gilbert,The secret of happiness is variety- but the se...,Stumbling On Happiness examines the capacity o...,This book examines how your brain tries to lie...,Your brain is really bad at filling in the bla...,You should always compare products based on va...,Bad experiences are better than no experiences.
2,2,The Happiness Advantage,Shawn Achor,I could care less about whether it's half full...,The Happiness Advantage turns the tables on ha...,Shawn Achor's research reveals the lies in the...,Happiness comes before success- not after it.,You can train yourself to be optimistic with t...,Fall up instead of down.
3,3,The Happiness Hypothesis,Jonathan Haidt,Love and work are to people what water and sun...,The Happiness Hypothesis is the most thorough ...,This book dives into the neurological aspects ...,Surround yourself with the people you love the...,Do work that matters to you.,Find a partner who will stand by your side thr...
4,4,Flourish,Martin Seligman,I'm trying to broaden the scope of positive ps...,Flourish establishes a new model for well-bein...,Martin Seligman is the father of positive psyc...,A life of profound fulfillment is built on the...,Simple positivity exercises can have life-chan...,IQ isn't everything - success is based on char...
5,5,The Power Of No,James Altucher,When you get in the mud with a pig- you get di...,The Power Of No is an encompassing instruction...,Ultimately- this book is not about saying no- ...,Rate your regulars to say no to the wrong people.,Stop doing things you don't like- and everyone...,"Say no to scarcity to go beyond ""glass half fu..."
6,6,Don't Sweat The Small Stuff,Richard Carlson,Success is nothing more than a socially accept...,Don't Sweat The Small Stuff (… And It's All Sm...,This book spent 100 weeks on the New York Time...,Remember that your life isn't an emergency.,Give others a break- especially when they don'...,Don't procrastinate on relaxing.
7,7,Happier At Home,Gretchen Rubin,I am living my real life- this is it. Now is n...,Happier At Home is an instruction manual to tr...,This book is a result of the author feeling ho...,Get rid of clutter.,Underreact to problems.,Meet your neighbors.
8,8,How To Stop Worrying And Start Living,Dale Carnegie,Let's not allow ourselves to be upset by small...,How To Stop Worrying And Start Living is a sel...,This book is a classic in identifying roadbloc...,Use a 3-step approach to deal with confusion- ...,Put a stop-loss on stress and grief.,Take criticism as compliments.
9,9,Happiness,Richard Layard,Competition for status is a zero sum game,Happiness will teach you how our desire for it...,The author of this book has researched happine...,Our capability to feel happiness is a result o...,More money actually makes you less happy- unle...,One of the simplest ways for Western countries...


This is useful, but if we want to read the full text in each column (handy for later), we can change pandas' display options:

In [5]:
pd.set_option('display.max_colwidth', None) #show me everything in the column, even if it's long!
df #Show me the first and last few examples

Unnamed: 0.1,Unnamed: 0,name,authors,favorite_quote,One_line_review,why_should_read,key_takeaway_1,key_takeaway_2,key_takeaway_3
0,0,Solve For Happy,Mo Gawdat,If you can afford the brain cycles to worry about the future- then by definition you have nothing to worry about right now. Right now- you're okay,Solve For Happy lays out a former Google engineers formula for happiness- which shows you that it's our default state and how to overcome the obstacles we face in remaining in it.,"Sometimes- it takes a tragedy to understand happiness is a choice. Mo Gawdat knows. He lost his 21-year-old son Ali. He taught himself to choose happiness instead of sadness regardless. What made Gawdat's choice an obvious one was the formula that he and Ali had been working on for years: ""Happiness is equal to or greater than the events of your life minus your expectation of how life should be.""This incredible book shows you why your perspective- more than anything else- determines your happiness.",Your inner voice is not the real you.,Many cognitive filters prevent you from seeing the whole world around you.,No matter if life is good or bad- staying in the present always makes you feel more content with it.
1,1,Stumbling On Happiness,Dan Gilbert,The secret of happiness is variety- but the secret of variety- like the secret of all spices- is knowing when to use it.,Stumbling On Happiness examines the capacity of our brains to fill in gaps and simulate experiences- shows how our lack of awareness of these powers sometimes leads us to wrong decisions- and how we can change our behavior to synthesize our own happiness.,This book examines how your brain tries to lie to you- specifically about what will happen in the future. Dan Gilbert's years of research show just how our minds trick us into worrying- which makes us unhappy with our decisions even before we make them. It turns out that a big key to happiness is figuring out how to tell the difference between fact and fiction!,Your brain is really bad at filling in the blanks- but it keeps on trying.,You should always compare products based on value- never on price.,Bad experiences are better than no experiences.
2,2,The Happiness Advantage,Shawn Achor,I could care less about whether it's half full or half empty - as long as I can fill it up,The Happiness Advantage turns the tables on happiness by proving it is a tool for success rather than of the result of it- sharing seven actionable principles you can use to increase both.,Shawn Achor's research reveals the lies in the conventional idea that hard work and success lead to happiness. He's identified- with science- that happiness comes first- then you will become successful. This book points to several ways that you can start being happier right now.,Happiness comes before success- not after it.,"You can train yourself to be optimistic with the ""Tetris Effect.""",Fall up instead of down.
3,3,The Happiness Hypothesis,Jonathan Haidt,Love and work are to people what water and sunshine are to plants.,The Happiness Hypothesis is the most thorough analysis of how you can find happiness in our modern society- backed by plenty of scientific research- real-life examples- and even a literal formula for happiness.,This book dives into the neurological aspects that contribute to happiness with a twist. Instead of getting lost in medical terms- Haidt employs the memorable analogy of a rider on an elephant. The metaphor shows how we can harness our brains to make us happy. More importantly- you'll learn how to build thinking and relationship habits that will lead to long-term happiness.,Surround yourself with the people you love the most and live in accordance with reciprocity,Do work that matters to you.,Find a partner who will stand by your side through sunshine and rain.
4,4,Flourish,Martin Seligman,I'm trying to broaden the scope of positive psychology well beyond the smiley face. Happiness is just one-fifth of what human beings choose to do.,Flourish establishes a new model for well-being rooted in positive psychology- building on five key pillars to help you create a happy life through the power of simple exercises.,Martin Seligman is the father of positive psychology. Prior to his work- brain science was based solely on the problems with the mind. Seligman changed that with his research. He is one of the best sources for beating dysfunctional thinking patterns. This book stands out with simple but powerful exercises you can do immediately to improve your happiness.,A life of profound fulfillment is built on the acronym PERMA.,Simple positivity exercises can have life-changing effects.,IQ isn't everything - success is based on character traits- not just intelligence.
5,5,The Power Of No,James Altucher,When you get in the mud with a pig- you get dirty and the pig gets happy,The Power Of No is an encompassing instruction manual on using the power of a little word to get healthy- rid yourself of bad relationships- embrace abundance- and ultimately say yes to yourself.,Ultimately- this book is not about saying no- although you'll get a lot of tips on how to do that. The benefit of this book lies in learning to eliminate unnecessary things from your life so that you can say yes to yourself. It's packed with practical tips for ridding your life of that which pulls you down- which will make you feel freer and happier.,Rate your regulars to say no to the wrong people.,Stop doing things you don't like- and everyone will be better off.,"Say no to scarcity to go beyond ""glass half full."""
6,6,Don't Sweat The Small Stuff,Richard Carlson,Success is nothing more than a socially acceptable form of mental illness.,Don't Sweat The Small Stuff (… And It's All Small Stuff) will keep you from letting the little things drive you insane- like your email inbox- rushing to trains- and annoying co-workers- and help you find peace and calm in a stressful world.,This book spent 100 weeks on the New York Times Bestseller list. If you've ever rushed in traffic only to end up next to the same person you passed a couple of miles back- this book might change your life. It'll open your mind to the idea of letting go of the unimportant things that society has trained us to think of as vital. The author left a great legacy that lives on in this good book.,Remember that your life isn't an emergency.,Give others a break- especially when they don't deserve it.,Don't procrastinate on relaxing.
7,7,Happier At Home,Gretchen Rubin,I am living my real life- this is it. Now is now and if I waited to be happier- waited to have fun- waited to do the things that I know I ought to do- I might never get the chance,Happier At Home is an instruction manual to transform your home into a castle of happiness by figuring out what needs to be changed- what needs to stay the same- and embracing the gift of family.,This book is a result of the author feeling homesick while standing in her own kitchen. Knowing it was time to make some changes- she worked hard for the next nine months to improve her home and family life. She investigates four themes that make for a happy home: time- possessions- parenthood- and marriage and family. Reading this book will help you feel happy at home.,Get rid of clutter.,Underreact to problems.,Meet your neighbors.
8,8,How To Stop Worrying And Start Living,Dale Carnegie,Let's not allow ourselves to be upset by small things we should despise and forget. Remember: life's too short to be little.,How To Stop Worrying And Start Living is a self-help classic that addresses one of the leading causes of physical illness - worry - by showing you simple and actionable techniques to eliminate it from your life.,This book is a classic in identifying roadblocks to happiness and how to eliminate them. Nobody likes worrying- it's a killer of joy- but it has several different causes. Having sold six million copies- this book can help you deal with all kinds of negative emotions like confusion- stress- grief- and criticism.,Use a 3-step approach to deal with confusion- and you'll eliminate the worry caused by it.,Put a stop-loss on stress and grief.,Take criticism as compliments.
9,9,Happiness,Richard Layard,Competition for status is a zero sum game,Happiness will teach you how our desire for it developed- what its benefits are- why money actually hurts our happiness and where it really comes from- and how Western countries could easily increase their happiness with a few changes.,The author of this book has researched happiness for almost 50 years. He takes a holistic approach to this emotion we all seek. You'll learn the history of happiness in mankind- what money has to do with it- and why higher taxes might- counterintuitively- be a good thing.,Our capability to feel happiness is a result of evolution- we weren't always able to feel happy.,More money actually makes you less happy- unless you live in poverty.,One of the simplest ways for Western countries to increase happiness would be to raise taxes.


Let's do something super simple to transform this into a dataset that we can send to a neural network. 

Similarly to the sentiment classification we discussed in lecture last week, we're going to represent each example (here, the book title in the `name` column) as a vector of word counts.

The CountVectorizer object from sklearn allows us to make these word count vectors pretty easily. Once we do the counts, we'll store these in a new dataframe.


The following code transforms the text column of our dataframe into a word-count dataframe called `data`.

In [6]:
# Transform a text column of the dataframe so that each row is represented by a set of word counts,
# corresponding to the number of times each term appears in that row's text.
# These next two lines perform word counting:
vectorizer = CountVectorizer(stop_words='english', min_df=0.01)

# stop_words='english' removes very common English words that are unlikely to be useful (e.g. "and", "the")
# min_df=0.01 ignores words that appear in fewer than 1% of rows (very rare words: typos, uninformative terms, etc.)
#   With only 33 rows that threshold is 0.33 rows, so in practice no word is removed here.
# You can type ?CountVectorizer in its own cell to read its documentation
# ***Note that df['name'] is used below because "name" (the book title) is the column containing our text.
# If you apply this to your own data, you will probably need to change this column name!
matrix = vectorizer.fit_transform(df['name'])

# This line converts matrix into another dataframe, with column names corresponding to the word being counted
data = pd.DataFrame(matrix.toarray(), columns=vectorizer.get_feature_names_out())




Take a look at the data:

In [7]:
data 
#prints data to screen

Unnamed: 0,10,advantage,aren,art,book,brain,buddha,changing,don,equation,...,start,stop,stuff,stumbling,sweat,thinking,tidying,try,trying,worrying
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
2,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,1,0,...,0,0,1,0,1,0,0,0,0,0
7,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,0,0,0,0,0,0,0,0,0,0,...,1,1,0,0,0,0,0,0,0,1
9,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Note that you can examine this dataset, e.g. to look at the column of counts for the word "happiness":

In [8]:
data["happiness"]

0     0
1     1
2     1
3     1
4     0
5     0
6     0
7     0
8     0
9     1
10    0
11    0
12    0
13    0
14    0
15    1
16    1
17    1
18    0
19    0
20    1
21    0
22    0
23    0
24    0
25    0
26    0
27    1
28    0
29    0
30    0
31    0
32    0
Name: happiness, dtype: int64

In [9]:
# or the word "worrying" (any column name shown in the output above works):
#data["worrying"]
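Beyond single columns, it can help to see which words are most common overall. Below is a minimal sketch on a small made-up table shaped like `data`; in the notebook you could run the same expressions on `data` directly (e.g. `data.sum().sort_values(ascending=False).head(10)`).

```python
import pandas as pd

# A hypothetical word-count table shaped like `data` (rows = examples, columns = words)
counts_df = pd.DataFrame({
    "happiness": [1, 1, 1, 0],
    "book":      [1, 0, 0, 1],
    "stop":      [0, 0, 0, 1],
})

# Total number of times each word occurs across all rows, most frequent first
totals = counts_df.sum().sort_values(ascending=False)
print(totals)

# Number of rows in which each word occurs at least once (its "document frequency")
doc_freq = (counts_df > 0).sum()
print(doc_freq)
```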

## Let's do some machine learning!

In [10]:
# First, let's split our dataset into training and test sets
# Remember: X is for input, y is for output
# The first argument of train_test_split is your input data (here, the "data" object you created using word counts)
# The second argument of train_test_split is your labels/targets. These should be numeric 0/1 class labels
# (in the original IMDB version, the "sentiment" column of df).
# ***df['favorite_quote'] below is free text, not a 0/1 label, so model.fit() will fail later with
#    "Cast string to float is not supported". Replace it with a numeric label column from your dataset.
# The test_size argument specifies the fraction of data going into the test set: here, 20% goes into the test set and 80% into the training set
X_train, X_test, y_train, y_test = train_test_split(data, df['favorite_quote'], test_size=0.2, random_state=0)

In [11]:
#If you ever want to learn more about a function, you can always use ? 
?train_test_split

In [12]:
# We can examine it a bit using np.shape:
np.shape(X_train) # What does our training data look like? Here it's 26 rows (80% of our 33 examples), with 51 input dimensions (features), one per word in the vocabulary

(26, 51)
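Where does 26 come from? With `test_size=0.2`, scikit-learn puts ceil(0.2 * 33) = 7 rows in the test set and the remaining 26 in the training set. The sketch below reproduces that arithmetic with placeholder arrays of the same shape (the zeros and alternating labels are stand-ins, not our data).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-ins with the same shape as our data: 33 rows x 51 word-count features
X = np.zeros((33, 51))
y = np.arange(33) % 2  # placeholder 0/1 labels, for illustration only

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# scikit-learn rounds the test set size up: ceil(0.2 * 33) = 7 test rows, leaving 26 for training
print(X_train.shape, X_test.shape)
```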

In [13]:
#Now let's make a simple neural network with 1 hidden layer containing 10 neurons
num_neurons = 10 # neurons in each layer
model = Sequential()

# Make the first (hidden) layer, which will have num_neurons neurons. Each neuron gets inputs from every
# column of X_train (i.e. one input per word in the vocabulary)
model.add(Dense(num_neurons, input_dim=np.shape(X_train)[1]))
model.add(Activation('sigmoid')) #Now we'll use a sigmoid activation function

#Now let's add another layer for the output: A single sigmoid neuron.
model.add(Dense(1)) 
model.add(Activation('sigmoid'))
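To make the architecture concrete, here's a NumPy sketch of the computation this `Sequential` model performs: a weighted sum plus a bias, squashed by a sigmoid, done twice. The random weights are an assumption for illustration; Keras uses its own initialiser (Glorot uniform for `Dense` kernels by default). The parameter count, 51 * 10 + 10 for the hidden layer plus 10 * 1 + 1 for the output, is the same total `model.summary()` reports for this architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_features, num_neurons = 51, 10  # matches X_train's 51 columns and our 10 hidden neurons

# Randomly initialised weights (illustrative; not the weights Keras actually creates)
W1 = rng.normal(scale=0.1, size=(n_features, num_neurons))
b1 = np.zeros(num_neurons)
W2 = rng.normal(scale=0.1, size=(num_neurons, 1))
b2 = np.zeros(1)

def forward(x):
    """x has shape (n_examples, n_features); returns shape (n_examples, 1)."""
    hidden = sigmoid(x @ W1 + b1)     # Dense(num_neurons) + Activation('sigmoid')
    return sigmoid(hidden @ W2 + b2)  # Dense(1) + Activation('sigmoid')

x = np.zeros((1, n_features))
x[0, 4] = 1  # one example in which a single word occurs once
print(forward(x))

n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)  # 531
```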

In [14]:
#Use compile() to set up our training

# For loss, we'll use binary cross-entropy loss,
# which is appropriate for a binary classification problem (0/1 for negative/positive) and requires numeric 0/1 labels
# ***If you edit this notebook to apply it to a multi-class classification problem, you'll need
#    to change the loss to something like categorical_crossentropy, change the output layer to one neuron per class
#    with a softmax activation, and change the encoding of the class to a one-hot representation
#    (see https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/)


model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
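Binary cross-entropy compares each predicted probability p with its true label y (0 or 1) and averages -(y log p + (1 - y) log(1 - p)). The small NumPy sketch below is not Keras's implementation, though it clips predictions by a small epsilon the way Keras does. It shows that confident mistakes are penalised heavily, and why labels must be numbers: the formula multiplies by y.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; y_true must be numeric 0/1 labels."""
    y_true = np.asarray(y_true, dtype=float)  # raises ValueError for text labels
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))))

print(binary_crossentropy([1, 0], [0.9, 0.1]))  # confident and correct: small loss
print(binary_crossentropy([1, 0], [0.1, 0.9]))  # confident and wrong: large loss
print(binary_crossentropy([1, 0], [0.5, 0.5]))  # undecided: log(2), about 0.693
```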

In [15]:
#Train it!
# Plus store history of training in a variable called "history"
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))

Epoch 1/10


UnimplementedError: Graph execution error:

Detected at node 'binary_crossentropy/Cast' defined at (most recent call last):
    ...
    File "<ipython-input-15-2f0298a259e7>", line 3, in <cell line: 3>
      history = model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
    ...
    File "/opt/anaconda3/envs/emi/lib/python3.8/site-packages/keras/losses.py", line 1922, in binary_crossentropy
      y_true = tf.cast(y_true, y_pred.dtype)
Node: 'binary_crossentropy/Cast'
Cast string to float is not supported
	 [[{{node binary_crossentropy/Cast}}]] [Op:__inference_train_function_684]
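The error happens because `y_train` holds the free-text `favorite_quote` strings, and `binary_crossentropy` needs each label as a number (0 or 1) that it can cast to float. If your dataset stores classes as text, here is a minimal sketch of converting them; the `sentiment` column and its `"positive"`/`"negative"` values are hypothetical placeholders, so substitute your own label column and class names.

```python
import pandas as pd

# Hypothetical dataframe with text class names (column name and values are placeholders)
labels_df = pd.DataFrame({"sentiment": ["positive", "negative", "positive"]})

# Map each class name to 0 or 1 so binary_crossentropy can cast it to float
y = labels_df["sentiment"].map({"negative": 0, "positive": 1})

# Fail early if any value wasn't in the mapping (map() turns unknown values into NaN)
assert y.notna().all(), "some labels were not mapped to 0/1"
print(y.tolist())  # [1, 0, 1]
print(pd.api.types.is_numeric_dtype(y))  # True
```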

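Once training runs successfully (it needs numeric 0/1 labels; the error above comes from passing text labels), let's take a look at how training set and test set accuracy change with each epoch: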

In [None]:
plt.plot(history.history['accuracy'], label='training set accuracy')
plt.plot(history.history['val_accuracy'], label = 'test set accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')

## Examining model behaviour

First, let's explore how we can apply the trained model to a specific example in our test data (or training data), to examine what it's done.

We'll use the following code techniques:
* We can apply the trained model to any example using its `.predict()` method
* We can get the nth row of any dataframe, as a one-row dataframe, using `.iloc[[n]]`
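The double brackets matter: `.iloc[n]` returns row n as a 1-D Series, whereas `.iloc[[n]]` returns a one-row DataFrame that keeps the 2-D (examples by features) shape that `model.predict()` expects. A small sketch with a made-up table:

```python
import pandas as pd

frame = pd.DataFrame({"book": [1, 0, 2], "happiness": [0, 1, 1]})  # made-up word counts

row_as_series = frame.iloc[0]   # a Series indexed by column name, shape (2,)
row_as_frame = frame.iloc[[0]]  # a one-row DataFrame, shape (1, 2)

print(type(row_as_series).__name__, row_as_series.shape)  # Series (2,)
print(type(row_as_frame).__name__, row_as_frame.shape)    # DataFrame (1, 2)
```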

In [16]:
#For instance, let's make z the first test example:
z = X_test.iloc[[0]]

#and let's output the prediction for this example:
model.predict(z)

array([[0.7812595]], dtype=float32)

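Note that this prediction will be somewhere between 0 and 1, because the output neuron uses a sigmoid activation. It can be interpreted loosely as confidence: closer to 1 means more confident the example is positive (class 1), closer to 0 means more confident it is negative (class 0). In the run recorded here, `model.fit()` failed before updating any weights, so this output comes from the untrained network's random initial weights and doesn't mean anything yet.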
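To turn these probabilities into hard class predictions, a common convention (an assumption here, since the notebook doesn't fix a threshold) is to predict class 1 when the output is at least 0.5. In the sketch below the first value is the prediction shown above; the others are made up.

```python
import numpy as np

# Example sigmoid outputs, shaped like model.predict()'s (n_examples, 1) result
probs = np.array([[0.7812595], [0.31], [0.5]])

# Turn probabilities into hard 0/1 class predictions with a 0.5 cut-off
predicted_class = (probs[:, 0] >= 0.5).astype(int)
print(predicted_class)  # [1 0 1]

# Distance from 0.5 gives a rough "confidence" between 0 and 0.5
confidence = np.abs(probs[:, 0] - 0.5)
print(confidence)
```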

Let's compare this to the actual label, as stored in y_test (here, because `favorite_quote` was passed to `train_test_split`, the "label" is a quote rather than a 0/1 value):

In [17]:
y_test.iloc[[0]]

11    Although happiness is a very important goal for most people- they also seem to devalue it as they go about their lives. That is- people seem to routinely sacrifice happiness for the sake of other goals.
Name: favorite_quote, dtype: object

To make sense of this, we probably also want to look at the original text, which doesn't live in X_test (that only holds word counts) but does live in the original dataframe we loaded from the CSV file, i.e. `df`. Since `train_test_split` randomised the order of the data before splitting it into training and test sets, we need to get the id (index label) in `df` corresponding to this first test example.

In [19]:
test_ids = list(X_test.index) # gets the original index labels in the df dataframe
# test_ids[n] now refers to the id of the nth test example; .loc looks rows up by these labels
originalFavorite_quote = df.loc[[test_ids[0]]].favorite_quote
originalFavorite_quote

11    Although happiness is a very important goal for most people- they also seem to devalue it as they go about their lives. That is- people seem to routinely sacrifice happiness for the sake of other goals.
Name: favorite_quote, dtype: object
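Why this works: `train_test_split` shuffles rows but keeps each row's original index label, so `X_test.index` tells us where each test example came from in `df`. Looking those labels up with `.loc` is the safe choice; `.iloc` counts positions and only gives the same answer because `df` has the default 0, 1, 2, ... index. A sketch with a made-up five-row table:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

frame = pd.DataFrame({"name": ["A", "B", "C", "D", "E"], "count": [0, 1, 2, 3, 4]})
X_tr, X_te = train_test_split(frame[["count"]], test_size=0.4, random_state=0)

# X_te keeps the original index labels of the rows it took from `frame`
test_ids = list(X_te.index)

# .loc looks rows up by index label, so it finds the matching original rows
original_rows = frame.loc[test_ids]
print(original_rows)
```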

Try this with a few more examples and see what you find. If you're comfortable with python, can you think of a way to identify misclassified test examples and just print out those? Or, even better, find test examples that are confidently classified correctly, or test examples that are "confidently" misclassified, and examine those?
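One way to approach this, sketched with hypothetical labels and probabilities: put true labels, predicted probabilities and hard predictions side by side, then filter and sort. Once the labels are numeric you could build the same table from `y_test.values` and `model.predict(X_test)[:, 0]`, with `index=X_test.index`.

```python
import pandas as pd

# Hypothetical true labels and predicted probabilities for five test examples
results = pd.DataFrame(
    {"y_true": [1, 0, 1, 0, 1], "prob": [0.92, 0.85, 0.40, 0.10, 0.55]},
    index=[11, 3, 27, 8, 19],  # made-up original df row ids, as in X_test.index
)

results["y_pred"] = (results["prob"] >= 0.5).astype(int)
results["confidence"] = (results["prob"] - 0.5).abs()

misclassified = results[results["y_pred"] != results["y_true"]]
# Most "confidently" wrong examples first
confident_mistakes = misclassified.sort_values("confidence", ascending=False)
# Correct predictions the model was most sure about
correct = results[results["y_pred"] == results["y_true"]]
confident_correct = correct.sort_values("confidence", ascending=False)

print(confident_mistakes)
print(confident_correct.head(3))
```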

For more fun, how about testing this classifier on our own new, fake "reviews"? Here's code to create an example of your own and apply the classifier to it. We'll have to first convert a string of text to a vector of word counts and put it in a dataframe, so here's a function for that:

In [20]:
# Turns a text string into a one-row dataframe of word counts, using the vocabulary learned from our training text
def createExample(myText):
    # transform() expects a list of documents, so wrap the single string in a list
    matrix = vectorizer.transform([myText])
    newDf = pd.DataFrame(matrix.toarray(), columns=vectorizer.get_feature_names_out())
    return newDf

In [21]:
# Here's a short made-up review of a book
myText = "This book is really great. I love it!"
t = createExample(myText) # When we print the dataframe, you'll see the count for "book" is 1:
t

Unnamed: 0,10,advantage,aren,art,book,brain,buddha,changing,don,equation,...,start,stop,stuff,stumbling,sweat,thinking,tidying,try,trying,worrying
0,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [22]:
#apply the model to classify your new text:
model.predict(t)

array([[0.74233204]], dtype=float32)

Now try writing some "great" and "terrible" reviews and see what happens to the classification outputs:

In [23]:
model.predict(createExample("This book is talk about happiness"))

array([[0.7402961]], dtype=float32)

## Explore on your own

Change the code above to explore:
* Does changing the number of neurons in the hidden layer change the results? What happens to accuracy when you use 1 neuron? 100 neurons? 
* Try editing the neural network so that you have 2 hidden layers of 10 neurons each. What happens to accuracy? 
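When comparing these variants, it helps to know how many weights each network must learn from only 26 training examples. Each Dense layer has one weight per input-neuron pair plus one bias per neuron, which is the count `model.summary()` reports. A small helper (the name `count_dense_params` is just for illustration):

```python
def count_dense_params(layer_sizes):
    """Number of trainable weights in a stack of fully connected (Dense) layers.

    layer_sizes lists the input size first, then each layer's number of neurons,
    e.g. [51, 10, 1] for 51 inputs, one hidden layer of 10 neurons and 1 output.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per neuron
    return total

for sizes in ([51, 1, 1], [51, 10, 1], [51, 100, 1], [51, 10, 10, 1]):
    print(sizes, count_dense_params(sizes))
```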

Investigating the model:
* Can you examine the model's performance on the test data to discover anything about what mistakes this model makes? Or anything about what types of reviews are easy to classify accurately?
* Can you come up with your own, new examples of positive or negative reviews that illustrate the mistakes the model makes?