# Multilayer Perceptron

An MLP is a deep model (i.e., a model with multiple layers) and a feed-forward network: data flows from the input through each layer to the output, with no cycles.

An MLP is a stack of Linear layers, each mapping an input tensor to an output tensor.

Nonlinearities are applied between each pair of Linear layers. They break the linearity (without them, the stacked layers would collapse into a single linear map) and allow the model to twist the vector space. In a classification setting, this twisting should ideally make the classes linearly separable in the representation produced by the last hidden layer.
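
To see why the nonlinearity matters, here is a small framework-free sketch. The weights are arbitrary toy values (assumptions, not taken from this notebook). Two stacked affine layers without an activation equal a single affine layer with `W = W2 @ W1` and `b = W2 @ b1 + b2`, while a ReLU between them breaks that equivalence.

```python
# Toy 2-D "layers" written with plain lists (weights are arbitrary, for illustration only)
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    # (A @ B)[i][j] = sum over k of A[i][k] * B[k][j]
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def relu(v):
    return [max(0.0, a) for a in v]

W1, b1 = [[1.0, -2.0], [0.5, 1.0]], [0.0, 1.0]
W2, b2 = [[2.0, 1.0], [-1.0, 3.0]], [1.0, -1.0]

def two_linear_layers(x):
    return add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The same function expressed as ONE linear layer
W, b = matmul(W2, W1), add(matvec(W2, b1), b2)

def one_linear_layer(x):
    return add(matvec(W, x), b)

def with_relu(x):
    # ReLU between the layers: no single (W, b) reproduces this in general
    return add(matvec(W2, relu(add(matvec(W1, x), b1))), b2)

x = [1.0, 1.0]
print(two_linear_layers(x), one_linear_layer(x))  # [1.5, 7.5] [1.5, 7.5]
print(with_relu(x))                               # [3.5, 6.5]
```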

Additionally, you can apply the softmax function (or the sigmoid, for binary classification) to interpret MLP outputs as probabilities. However, you should not apply softmax before PyTorch loss functions that expect raw scores (logits), such as `nn.CrossEntropyLoss` or `nn.BCEWithLogitsLoss`: these losses combine the activation and the logarithm internally, which is faster and more numerically stable.
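
The following plain-Python sketch illustrates the numerical point. It is not PyTorch's implementation. It contrasts "softmax, then log" with a log-softmax computed by the log-sum-exp trick (subtracting the largest logit before exponentiating). `nn.CrossEntropyLoss` is documented as log-softmax followed by negative log-likelihood on raw logits, which is why it should receive logits rather than probabilities. The function names here are hypothetical.

```python
import math

def naive_cross_entropy(logits, target):
    # softmax first, then log: math.exp overflows for large logits
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[target] / sum(exps))

def log_softmax(logits):
    # log-sum-exp trick: exp(z - max) is at most 1, so it never overflows
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def stable_cross_entropy(logits, target):
    return -log_softmax(logits)[target]

logits = [1000.0, 0.0]
print(stable_cross_entropy(logits, 1))  # 1000.0

try:
    naive_cross_entropy(logits, 1)
except OverflowError as err:
    print("naive version failed:", err)
```

For small logits both functions agree. The stable version is only needed when scores get large, which is common with unnormalized network outputs.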

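**Note one of the major limitations of MLPs: lack of parameter sharing.** Every weight is tied to one specific input position, so a pattern learned at one position of a sequence is not reused at any other position.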
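
A quick parameter count makes the cost concrete. The sizes are hypothetical: 70 tokens, 100-dimensional embeddings, and 64 output units. The sketch compares a Linear layer applied to the flattened sequence with a 1-D convolution whose kernel of width 3 is shared across all positions. The formulas are the standard weight-plus-bias counts for a fully connected layer and a 1-D convolution (bias on, no grouping).

```python
def linear_params(in_features, out_features):
    # one weight per (input, output) pair plus one bias per output
    return in_features * out_features + out_features

def conv1d_params(in_channels, out_channels, kernel_size):
    # one kernel per output channel, reused at every sequence position
    return in_channels * out_channels * kernel_size + out_channels

SEQ_LEN, EMB_DIM, HIDDEN = 70, 100, 64   # hypothetical sizes

print(linear_params(SEQ_LEN * EMB_DIM, HIDDEN))  # 448064
print(conv1d_params(EMB_DIM, HIDDEN, 3))         # 19264
```

The convolution's count does not depend on the sequence length at all, whereas the MLP's grows linearly with it.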

In [1]:
import pandas as pd
import numpy as np
import re
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

from sklearn.model_selection import train_test_split

import spacy

In [2]:
pd.set_option('display.max_rows', None)             # show all rows of a dataframe
pd.set_option('display.max_columns', None)          # show all columns of a dataframe
pd.set_option('display.max_colwidth', None)         # show the full width of columns
pd.set_option('display.precision', 2)               # round to 2 decimal points
pd.options.display.float_format = '{:,.2f}'.format  # comma separators and two decimal points: 4756.7890 => 4,756.79 and 4656 => 4,656.00

In [3]:
seed = 1337

## Read datasets

In [4]:
path = r'C:\Users\Mari\Desktop\MACHINE_LEARNING\NLP_Stanford_University\BOOK\data\yelp'
train_filename = 'raw_train.csv'
test_filename = 'raw_test.csv'

import os

# join directory and file name with the operating system's path separator
train_full_path = os.path.join(path, train_filename)
test_full_path = os.path.join(path, test_filename)

In [5]:
print(train_full_path)
print(test_full_path)

C:\Users\Mari\Desktop\MACHINE_LEARNING\NLP_Stanford_University\BOOK\data\yelp\raw_train.csv
C:\Users\Mari\Desktop\MACHINE_LEARNING\NLP_Stanford_University\BOOK\data\yelp\raw_test.csv


In [6]:
train_original = pd.read_csv(train_full_path, encoding='utf-8', names=['target', 'review'])
test = pd.read_csv(test_full_path, encoding='utf-8', names=['target', 'review'])

In [7]:
print(type(train_original))
print(train_original.shape)
print(train_original.head())

<class 'pandas.core.frame.DataFrame'>
(560000, 2)
   target  \
0       1   
1       2   
2       1   
3       1   
4       2   


In [8]:
print(type(test))
print(test.shape)
print(test.head())

<class 'pandas.core.frame.DataFrame'>
(38000, 2)
   target  \
0       1   
1       2   
2       1   
3       2   
4       1   

review


## Split Targets from Features

In [9]:
X_train_original = train_original['review']
y_train_original = train_original['target']

In [10]:
X_test = test['review']
y_test = test['target']

In [11]:
print(X_train_original.head())
print(y_train_original.head())
print()
print(X_test.head())
print(y_test.head())

0    Unfortunately, the frustration of being Dr. Goldberg's patient is a repeat of the experience I've had with so many other doctors in NYC -- good doctor, terrible staff.  It seems that his staff simply never answers the phone.  It usually takes 2 hours of repeated calling to get an answer.  Who has time for that or wants to deal with it?  I have run into this problem with many other doctors and I just don't get it.  You have office workers, you have patients with medical needs, why isn't anyo

## Re-label Targets

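The original labels `1` (negative) and `2` (positive) become `0` (negative) and `1` (positive), because PyTorch losses expect 0-based class indices (`nn.CrossEntropyLoss`) or 0/1 targets (`nn.BCEWithLogitsLoss`).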

In [12]:
y_train_original = y_train_original - 1

In [13]:
y_train_original.unique()

array([0, 1], dtype=int64)

In [14]:
y_test = y_test - 1

In [15]:
y_test.unique()

array([0, 1], dtype=int64)

## Split 'train_original' into 'train' and 'val' datasets

In [16]:
X_train, X_val, y_train, y_val = train_test_split(X_train_original, y_train_original, test_size=0.3, random_state=seed)

In [17]:
print(X_train.shape)
print(y_train.shape)
print(type(X_train))
print(type(y_train))
print()
print(X_val.shape)
print(y_val.shape)
print(type(X_val))
print(type(y_val))
print()
print(X_test.shape)
print(y_test.shape)
print(type(X_test))
print(type(y_test))

(392000,)
(392000,)
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>

(168000,)
(168000,)
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>

(38000,)
(38000,)
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>


## Frequency Distribution of Labels

- 0: negative sentiment
- 1: positive sentiment

In [18]:
y_train.value_counts(normalize=True, sort=False, ascending=False, bins=None, dropna=False)

0   0.50
1   0.50
Name: target, dtype: float64

In [19]:
y_val.value_counts(normalize=True, sort=False, ascending=False, bins=None, dropna=False)

0   0.50
1   0.50
Name: target, dtype: float64

In [20]:
y_test.value_counts(normalize=True, sort=False, ascending=False, bins=None, dropna=False)

0   0.50
1   0.50
Name: target, dtype: float64

## Minimally cleaning the data

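After splitting the data into training, validation, and test partitions, we minimally clean all three splits. We lowercase the text, add whitespace around the punctuation marks `.`, `,`, `!`, and `?`, and replace each run of characters that are neither letters nor these punctuation marks (digits, apostrophes, other symbols, and whitespace) with a single space. As a side effect, `don't` becomes `don t`, numbers disappear, and what appear to be literal `\n` sequences in the raw reviews leave a stray `n` attached to the next word (e.g., `nthey`).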

In [21]:
def preprocess_text(text):
    text = text.lower()
    text = re.sub(r"([.,!?])", r" \1 ", text)
    text = re.sub(r"[^a-zA-Z.,!?]+", r" ", text)
    return text
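
A self-contained copy of `preprocess_text`, run on a made-up review (the input string is hypothetical), shows each step's effect. The apostrophe and the digit are replaced with spaces, and a literal backslash followed by `n` leaves an `n` glued to the next word. That would explain tokens such as `nthey` in the samples.

```python
import re

def preprocess_text(text):
    text = text.lower()
    text = re.sub(r"([.,!?])", r" \1 ", text)     # put spaces around . , ! ?
    text = re.sub(r"[^a-zA-Z.,!?]+", r" ", text)  # collapse every other run of characters into one space
    return text

raw = r"Don't go! 2 stars.\nMeh."   # raw string: contains a backslash followed by 'n'
print(repr(preprocess_text(raw)))   # 'don t go ! stars . nmeh . '
```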

In [22]:
X_train = X_train.map(preprocess_text)

In [23]:
X_val = X_val.map(preprocess_text)

In [24]:
X_test = X_test.map(preprocess_text)

In [25]:
# show 3 randomly selected rows from each split

print(X_train.sample(3))
print()
print(X_val.sample(3))
print()
print(X_test.sample(3))

29188    you could receive better customer service at a walmart ! ! ! ! ! that is being very generous . at first i was really impressed with their selection and very eager to buy quite a few things . i then i proceeded to ask if i could try on a wig you don t know what it looks like unless you try it the first sales associate seemed like she was going to let me then this extremely grumpy and very unpersonable lady approached us and said we don t have time for that get back to work . . . i turned around to see if there was people behind me , no one who looked like they needed help . i was assuming she was the owner , she did not greet me nor did she ask if she could help me with something . all she had was an extremely unfriendly and rude personality . oh did i mention she wasnt even helping customers she was just bitching the whole time . i would really hope

## Tokenization

In [26]:
tok = spacy.load('en_core_web_sm')   # only its tokenizer (tok.tokenizer) is used below

In [27]:
def tokenize(string):
    return [token.text for token in tok.tokenizer(string)]
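
Because `preprocess_text` already surrounds punctuation with spaces, a plain whitespace split recovers most tokens. The following hypothetical alternative is shown only for comparison and is not what the notebook uses. spaCy's tokenizer also applies English-specific rules: for example, it splits `cannot` into `can` and `not`, as the token samples further down show, while `str.split()` keeps it whole.

```python
def whitespace_tokenize(string):
    # Hypothetical lightweight tokenizer: split on runs of whitespace
    return string.split()

print(whitespace_tokenize("don t go ! stars . nmeh . "))
# ['don', 't', 'go', '!', 'stars', '.', 'nmeh', '.']
print(whitespace_tokenize("i cannot eat another bite ."))
# ['i', 'cannot', 'eat', 'another', 'bite', '.']
```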

In [28]:
X_train = pd.DataFrame(X_train)

In [29]:
X_val = pd.DataFrame(X_val)

In [30]:
X_test = pd.DataFrame(X_test)

In [31]:
print(type(X_train))
print(type(X_val))
print(type(X_test))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>


In [32]:
print(X_train.shape)
print(X_val.shape)
print(X_test.shape)

(392000, 1)
(168000, 1)
(38000, 1)


In [33]:
print(X_train.columns)
print(X_val.columns)
print(X_test.columns)

Index(['review'], dtype='object')
Index(['review'], dtype='object')
Index(['review'], dtype='object')


In [34]:
X_train['tokens'] = X_train['review'].apply(tokenize)

In [35]:
X_val['tokens'] = X_val['review'].apply(tokenize)

In [36]:
X_test['tokens'] = X_test['review'].apply(tokenize)

In [37]:
X_train.sample(2)

Unnamed: 0,review,tokens
353621,brew grow personnel are total hypocrates . nthey sell all kinds of products for growing pot but don t mention pot or you will get kicked out with all their disdain . don t even mention arugula which i guess is code for pot now . nthe employess are illinois corporate servants . all male and all sexist pigs . n nthe new store is so boring . small and totally devoid of any personality . nleave it to the flatlanders to screw up a potentially great space . nthey are not very smart or creative . n ngo to wine and hop shop and paradigm for much better products and service . nthese guys have no idea what they are talking about . ntrust me . namy .,"[brew, grow, personnel, are, total, hypocrates, ., nthey, sell, all, kinds, of, products, for, growing, pot, but, don, t, mention, pot, or, you, will, get, kicked, out, with, all, their, disdain, ., don, t, even, mention, arugula, which, i, guess, is, code, for, pot, now, ., nthe, employess, are, illinois, corporate, servants, ., all, male, and, all, sexist, pigs, ., n, nthe, new, store, is, so, boring, ., small, and, totally, devoid, of, any, personality, ., nleave, it, to, the, flatlanders, to, screw, up, a, potentially, great, space, ., nthey, are, not, very, smart, or, creative, ., n, ngo, to, ...]"
357766,"uhhh ! ! ! my stomach feels like it is going to explode . i am so full . i cannot eat another bite . . . but there is still half a dish of truffle mac n cheese . all are common responses when visiting herbs rye . n ni have been so bogged down with work , or work related situations that i was set on treating myself to a nice meal . i called up a friend and we met up for happy hour . both of us didn t feel like drinking , so we opted for their food specials . the deals are limited to five or six items , but they offer some meals for half off . instead of just filling up on quick fryer food , h r includes a few steak and a pork chop entree . we began with the spicy mussels hh , regular . the tomato sauce was so good that i was scooping it into empty shells to slurp down . for my entree i got the oz . bone in strip cooked to a medium rare with the peppercorn brandy sauce and the truffle mac . . i have to say this was near perfection . yet , i would have been happier without the sauce . my friend had got the same steak with the chimiccuri and asparagus . i stole most of her sauce . i should have listened to our server , the chimicurri was much more flavorful and bold . usually i am not a mac n cheese person . my gaze usually zooms over these words , but robyn and kevin l . swears by it . luckily , i trusted them and ended up a winner . n nby the end , two steak dinners with an large order of mussels to start cost us less than . 
even though their hh list is minimal , herbs rye s offerings are more than satisfactory .","[uhhh, !, !, !, my, stomach, feels, like, it, is, going, to, explode, ., i, am, so, full, ., i, can, not, eat, another, bite, ., ., ., but, there, is, still, half, a, dish, of, truffle, mac, n, cheese, ., all, are, common, responses, when, visiting, herbs, rye, ., n, ni, have, been, so, bogged, down, with, work, ,, or, work, related, situations, that, i, was, set, on, treating, myself, to, a, nice, meal, ., i, called, up, a, friend, and, we, met, up, for, happy, hour, ., both, of, us, didn, t, feel, like, drinking, ,, so, we, ...]"


In [38]:
X_val.sample(2)

Unnamed: 0,review,tokens
392675,"so , i went here for lunch . paid extra to have the large sub combo so i would have leftovers for dinner . didn t happen . n ni ordered the blt meal , paid extra to have the larger sub , cost about and a half bucks . ok if i can make two meals out of it , which was my plan . this did not happen . not too impressed . n nnow the bread , stars on that . thick and soft , good flavor . sadly , that is where the compliments end . there was three pieces of bacon on the sub . three . and a half dollars . three pieces of bacon . and a half bucks . i am trying to prove a point here i am not happy . it was basically a lettuce sub . everyone else i went to lunch with had more meat in their sub than ron jeremy has in his pants . not mine . n ni threw away the second half of my lettuce sub because , well it was a lettuce sub . n nthe staff was friendly and service was quick . sadly , does not make up for screwing me out of the bacon content of my sub . the slices of bacon that were evident were small , like the ones you get on a cent menu fast food burger . n ni was told to try something else by my friends if i go back there . i don t think they will get the chance . n nbacon up the blt . until then . . . pass .","[so, ,, i, went, here, for, lunch, ., paid, extra, to, have, the, large, sub, combo, so, i, would, have, leftovers, for, dinner, ., didn, t, happen, ., n, ni, ordered, the, blt, meal, ,, paid, extra, to, have, the, larger, sub, ,, cost, about, and, a, half, bucks, ., ok, if, i, can, make, two, meals, out, of, it, ,, which, was, my, plan, ., this, did, not, happen, ., not, too, impressed, ., n, nnow, the, bread, ,, stars, on, that, ., thick, and, soft, ,, good, flavor, ., sadly, ,, that, is, where, the, compliments, end, ., ...]"
334192,i had ordered on the phone for my order to go . when i picked my order up it was fast . so when i got home i could not believe the small portions that was in the the bag . i had ordered the penne franco and the italian classics dishes . for the amount they charge it is a ripoff . my strong suggestion is to head over to romano s macaroni grill . much more worth it .,"[i, had, ordered, on, the, phone, for, my, order, to, go, ., when, i, picked, my, order, up, it, was, fast, ., so, when, i, got, home, i, could, not, believe, the, small, portions, that, was, in, the, the, bag, ., i, had, ordered, the, penne, franco, and, the, italian, classics, dishes, ., for, the, amount, they, charge, it, is, a, ripoff, ., my, strong, suggestion, is, to, head, over, to, romano, s, macaroni, grill, ., much, more, worth, it, .]"


In [39]:
X_test.sample(2)

Unnamed: 0,review,tokens
2741,came here on a friday around and was seated right away . its a cute little place and had a nice vibe . i was here with jessie b and we ordered a few of the small plates . since we sat at a table we did not get hh prices . . o well . we wanted to be comfy not sit at the bar . nso we started with crab artichoke dip . . it had real chunks of crab and was super creamy and came with toasted bread slices . nnext was the sashimi . . i know its a tapas style place but there was only pieces . it was so yummy i could have eaten orders ! nlast was something we got talked into by our waiter who was very friendly even brought us extra ice for our sippy cup drinks we already had walking in . nwe were gonna get the pizza pops but he said . . ehh nso we ended up with the crazy mac cheese flat bread . nit came with a pesto ranchy sauce for dipping . it was soooo wierd . ni still dont kno if i loved it but it was something different for sure . nthere is a very good selection of yummy small plates . . something for everyone . i love that ! nand i really loved the server being so nice helpful . thumbs up ! !,"[came, here, on, a, friday, around, and, was, seated, right, away, ., its, a, cute, little, place, and, had, a, nice, vibe, ., i, was, here, with, jessie, b, and, we, ordered, a, few, of, the, small, plates, ., since, we, sat, at, a, table, we, did, not, get, hh, prices, ., ., o, well, ., we, wanted, to, be, comfy, not, sit, at, the, bar, ., nso, we, started, with, crab, artichoke, dip, ., ., it, had, real, chunks, of, crab, and, was, super, creamy, and, came, with, toasted, bread, slices, ., nnext, was, the, sashimi, ., ., i, ...]"
36391,"charcoal room , we are breaking up . it s not me it s you , you have changed and not for the better . last week my husband and i went to t bones chophouse at red rock , and it made me miss the charcoal room . so this week we made our reservations , and i had one thing on my mind , boneless baby back ribs . my most favorite thing here has changed . i thought they were made of pork but these were beef and seemed like instead of baby backs , they were more like spare ribs . they changed the meat and the quality on me and kept the price . sad day . n nwe had reservations at five , but they didn t open on time . service wasn t as good as it was before and i felt like we got a little attitude because we didn t order alcohol . the last slap was a cheap thank you take out bag and generic styrofoam to go containers instead of the classic red string reusable bag they had before . i know it seems silly , but if you are spending sixty dollars plus a head for dinner , dressing up and bother to make a reservation , you want a good experience right down the the last detail . the search for a new favorite is on .","[charcoal, room, ,, we, are, breaking, up, ., it, s, not, me, it, s, you, ,, you, have, changed, and, not, for, the, better, ., last, week, my, husband, and, i, went, to, t, bones, chophouse, at, red, rock, ,, and, it, made, me, miss, the, charcoal, room, ., so, this, week, we, made, our, reservations, ,, and, i, had, one, thing, on, my, mind, ,, boneless, baby, back, ribs, ., my, most, favorite, thing, here, has, changed, ., i, thought, they, were, made, of, pork, but, these, were, beef, and, seemed, like, instead, of, baby, backs, ,, they, were, ...]"


## Frequency Table

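We count how many times each token occurs in each split and drop the tokens that occur only once (a count below 2). Only the training counts are used to build the vocabulary; the validation and test counts are shown for comparison.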

In [40]:
# Count the number of occurrences of each token in a column of token lists

def count_occurrences(df_col):
    start = time.time()

    counts = {}
    for tokens in df_col:
        for word in tokens:
            counts[word] = counts.get(word, 0) + 1

    print(time.time() - start)   # elapsed time in seconds
    return counts
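
The standard library's `collections.Counter` computes the same counts and keeps words in first-seen order, just like the dictionary above. The function name and the toy token lists below are hypothetical. Since `Counter` is a `dict` subclass, `delete_infrequent` would work on its result unchanged.

```python
from collections import Counter

def count_occurrences_counter(token_lists):
    # Counter.update() adds 1 for every element of each token list
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts

toy = [["good", "food", "good"], ["bad", "food"]]   # hypothetical token lists
counts = count_occurrences_counter(toy)
print(counts["good"], counts["food"], counts["bad"], counts["missing"])  # 2 2 1 0
```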

In [41]:
count_train = count_occurrences(X_train['tokens'])
count_val = count_occurrences(X_val['tokens'])
count_test = count_occurrences(X_test['tokens'])

10.629961729049683
4.8203020095825195
1.0712366104125977


In [42]:
print(len(count_train))
print(len(count_val))
print(len(count_test))

179181
118981
58356


In [43]:
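# Delete infrequent words (those occurring fewer than `min_count` times).
# The dictionary is modified in place, so after this call count_train and
# short_count_train refer to the same, shortened, dictionary.

def delete_infrequent(count_dict, min_count=2):
    print('num_words before: {}'.format(len(count_dict)))
    for word in list(count_dict):      # iterate over a copy of the keys so we can delete safely
        if count_dict[word] < min_count:
            del count_dict[word]
    print('num_words after: {}'.format(len(count_dict)))
    return count_dict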
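
If the full counts are still needed after filtering, a dictionary comprehension can build a new, filtered dictionary instead. This differs from `delete_infrequent`, which deletes keys from the dictionary it receives. The name `keep_frequent` and the toy counts are hypothetical.

```python
def keep_frequent(count_dict, min_count=2):
    # Returns a NEW dict; the input dictionary is left untouched
    return {word: n for word, n in count_dict.items() if n >= min_count}

counts = {"good": 5, "food": 2, "typo": 1}   # hypothetical counts
short = keep_frequent(counts)
print(short)         # {'good': 5, 'food': 2}
print(len(counts))   # 3
```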

In [44]:
short_count_train = delete_infrequent(count_train)
short_count_val = delete_infrequent(count_val)
short_count_test = delete_infrequent(count_test)

num_words before: 179181
num_words after: 95564
num_words before: 118981
num_words after: 65270
num_words before: 58356
num_words after: 33125


## Encoding

**We need to convert our text into a numerical form that can be fed to our model as input.**

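**1. Create a vocabulary, `vocab2index`, that maps each word kept in the training counts (`short_count_train`) to an integer index. Indices 0 and 1 are reserved for the empty string `''` and the unknown-word token `'UNK'`, so the first real word gets index 2. The list `words` stores the inverse mapping (index to word).**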

In [45]:
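# Index 0: '' covers blank reviews, and it is also the padding value used by encoding()
#          (np.zeros fills unused positions with 0)
# Index 1: 'UNK' stands for words missing from the training vocabulary (in val/test or
#          future reviews); it is uppercase, so it cannot clash with the lowercased tokens

vocab2index = {'': 0, 'UNK': 1}
words = ['', 'UNK']

for word in short_count_train:
    vocab2index[word] = len(words)   # next free index
    words.append(word)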
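
The same loop, wrapped in a function (the name `build_vocab` and the toy counts are hypothetical), shows the resulting indices. It also shows that `words` works as the reverse lookup for decoding indices back to text.

```python
def build_vocab(counts):
    # Reserved indices first, then one index per word in iteration order
    vocab2index = {'': 0, 'UNK': 1}
    words = ['', 'UNK']
    for word in counts:
        vocab2index[word] = len(words)
        words.append(word)
    return vocab2index, words

vocab2index, words = build_vocab({"good": 5, "food": 2})   # hypothetical counts
print(vocab2index)   # {'': 0, 'UNK': 1, 'good': 2, 'food': 3}
print(words[3])      # food
```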

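**2. Choose a fixed maximum length N for every encoded review: longer reviews are truncated to their first N tokens, and shorter ones are padded with 0. I'm choosing N = 70 because the average length of the token lists is around 61.**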
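
To check how many reviews a given N keeps intact, we can compute both the average length and the share of reviews that fit. The helper below is a hypothetical sketch using toy lengths. In the notebook, `X_train['tokens']` could be passed directly, since iterating over a Series yields its values. Picking N from a high percentile of the lengths instead of the mean is another common choice.

```python
from statistics import mean

def length_stats(token_lists, max_len):
    # Average token-list length, and the fraction of lists with at most max_len tokens
    lengths = [len(tokens) for tokens in token_lists]
    coverage = sum(1 for n in lengths if n <= max_len) / len(lengths)
    return mean(lengths), coverage

toy = [["w"] * n for n in (10, 20, 30, 40, 100)]   # hypothetical review lengths
avg, coverage = length_stats(toy, max_len=70)
print(avg, coverage)   # average 40, coverage 0.8
```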

**3. Encode each token list by replacing each word with its index in the `vocab2index` dictionary (words not in the vocabulary map to the `'UNK'` index), then truncate or pad the result to length N.**

In [46]:
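def encoding(tokens_lst, N):
    # Start from a length-N vector of zeros (index 0 = '' acts as padding)
    encoded = np.zeros(N, dtype=int)
    # Map every token to its index, falling back to the 'UNK' index for unseen words
    encoded_lst = np.array([vocab2index.get(word, vocab2index['UNK']) for word in tokens_lst])
    # Copy at most N indices: long reviews are truncated, short ones stay zero-padded
    length = min(N, len(encoded_lst))
    encoded[:length] = encoded_lst[:length]
    return encoded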
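
The following list-based sketch follows the same logic but takes the vocabulary as a parameter, where the notebook's `encoding` reads the global `vocab2index`. That makes it easy to try on a toy vocabulary. The name `encode` and the vocabulary are hypothetical.

```python
def encode(tokens, vocab2index, N):
    # Map tokens to indices (unknown words -> 'UNK'), then truncate / zero-pad to length N
    ids = [vocab2index.get(word, vocab2index['UNK']) for word in tokens]
    ids = ids[:N]
    return ids + [0] * (N - len(ids))

vocab2index = {'': 0, 'UNK': 1, 'good': 2, 'food': 3}   # hypothetical vocabulary
print(encode(['good', 'spicy', 'food'], vocab2index, 5))  # [2, 1, 3, 0, 0]
print(encode(['good', 'spicy', 'food'], vocab2index, 2))  # [2, 1]
```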

In [47]:
MAX_LEN = 70   # maximum number of tokens kept per review
X_train['encoded'] = X_train['tokens'].apply(lambda x: encoding(x, MAX_LEN))
X_val['encoded'] = X_val['tokens'].apply(lambda x: encoding(x, MAX_LEN))
X_test['encoded'] = X_test['tokens'].apply(lambda x: encoding(x, MAX_LEN))

In [48]:
X_train.sample(2)

Unnamed: 0,review,tokens,encoded
12702,"okay , so because i m so excited to have eaten here , and i want to use obscenities to describe the deliciousness , i have decided to replace any pg words with pg words . and . . . . n nthis place was flippin good ! i mean , gosh darn ! how the heck did they just do that ? each bite made whoopee to my taste buds and stayed around afterwards to cuddle ! it s not everyday that i find a restaurant that i absolutely want to be a regular at , but la tolteca is a restaurant that i can see myself having a healthy relationship with , and , if i m lucky , possibly growing old with . i m all for market style restaurants , and la tolteca seriously nailed it ! n nokay , now that that s out of my system , here s what i tried i tried the chips and salsas , huevos rancheros and the chilaquiles . the chips were fresh as can be still warm in the bag , and the salsas . . . oh the salsas . . . they were all great , but the stand out had to be the habanero . it was definitely one of the best salsas i ve ever had , and my girlfriend considered it the best she had ever had . that s not to be taken lightly because we eat salsa almost religiously . because my girlfriend got the chilaquiles , i went with the huevos rancheros . they were piping fresh and everything one would expect . the second i cut open the yolk of the egg and saw the yellow and red run together , i knew i was in for a treat . the flavors of that mixed with beans and a little rice were really great ! for those of you that have had good huevos rancheros before you ll know what to expect . alright , and lastly the chilaquiles . in all seriousness , these were the best , and i mean the best chilaquiles i ve ever had ! i m not sure exactly what was going on in the kitchen , but i suspect a tiny sprinkling of crack right before it was served to us , but that may have just been the garlic . either way though , it was phenomenal ! 
every single bite made me so happy , and even after everyone was full at the table , we still managed to pick away at it until it was completely consumed . n nevery person reading this needs to do themselves a favor and endure the trip to central phoenix for this place . if you love mexican food especially chilaquiles , la tolteca should not disappoint in the slightest .","[okay, ,, so, because, i, m, so, excited, to, have, eaten, here, ,, and, i, want, to, use, obscenities, to, describe, the, deliciousness, ,, i, have, decided, to, replace, any, pg, words, with, pg, words, ., and, ., ., ., ., n, nthis, place, was, flippin, good, !, i, mean, ,, gosh, darn, !, how, the, heck, did, they, just, do, that, ?, each, bite, made, whoopee, to, my, taste, buds, and, stayed, around, afterwards, to, cuddle, !, it, s, not, everyday, that, i, find, a, restaurant, that, i, absolutely, want, to, be, a, regular, at, ,, but, la, tolteca, ...]","[1300, 14, 10, 484, 63, 499, 10, 1883, 48, 297, 326, 194, 14, 26, 63, 174, 48, 495, 33972, 48, 1506, 6, 5626, 14, 63, 297, 220, 48, 4725, 121, 50718, 4431, 78, 50718, 4431, 18, 26, 18, 18, 18, 18, 19, 344, 208, 3, 7939, 44, 140, 63, 2110, 14, 4138, 8212, 140, 262, 6, 1382, 117, 74, 202, 191, 89, 114, 2014, 1773, 354, 29638, 48, 104, 1313]"
58891,"before a few months ago , wildfire was the type of scummy locals casino you avoided like a scummy bar where people s tattoos were bigger than their brains . so that review by wesley was accurate . n nhowever , stations casinos finished their takeover and completely renovated the place top to bottom this summer . and the difference is truly night and day . thus , carrie s review in may is also accurate . n nbut as clean as the renovations are , the biggest renovation has to be one of the nicest and hard working staffs i have ever seen in this town . the cordiality , smiles and willingness to help from the folks in the restaurant to the bowling alley actually reminded me more of disneyland than a local casino . n nand the great thing about great service is it always makes everything else seem better . the food at the restaurant wouldn t be described as top notch , but the great service made it all the better . so did the fact you could get a dinner , soup or salad and a desert for just . n nwhile smaller than most of the mega bowling alleys we see in the vegas area , the technical gizmos in the wildfire lanes was top notch , complete with speed guns to show how fast you were throwing the ball . 
kids had a great time .","[before, a, few, months, ago, ,, wildfire, was, the, type, of, scummy, locals, casino, you, avoided, like, a, scummy, bar, where, people, s, tattoos, were, bigger, than, their, brains, ., so, that, review, by, wesley, was, accurate, ., n, nhowever, ,, stations, casinos, finished, their, takeover, and, completely, renovated, the, place, top, to, bottom, this, summer, ., and, the, difference, is, truly, night, and, day, ., thus, ,, carrie, s, review, in, may, is, also, accurate, ., n, nbut, as, clean, as, the, renovations, are, ,, the, biggest, renovation, has, to, be, one, of, the, nicest, and, hard, working, staffs, ...]","[380, 49, 401, 2363, 346, 14, 57337, 3, 6, 880, 36, 19200, 4314, 1125, 77, 4210, 40, 49, 19200, 203, 485, 151, 102, 6056, 12, 5238, 24, 85, 9455, 18, 10, 89, 1205, 92, 10578, 3, 1975, 18, 19, 5667, 14, 7563, 4776, 2248, 85, 31846, 26, 1843, 3504, 6, 208, 754, 48, 4640, 207, 332, 18, 26, 6, 7152, 128, 2801, 383, 26, 440, 18, 6360, 14, 17555, 102]"


In [49]:
X_val.sample(2)

Unnamed: 0,review,tokens,encoded
73328,"i work fairly close to panera and at first , i was excited to be so nearby , but after all the mediocre experiences and the last nail in the coffin , i won t be returning here . ni ve always felt that the customer service here has been sub par to the service i have received at most other panera locations . most associates have either seemed bored or snarky . but i can overlook this most times as long as i receive tasty food and drinks at a timely fashion . nthis certainly did not happen on valentines day , when i decided to use the very last bit of money i had to treat myself and get a cappuccino . i had thought about stopping off at the mcdonalds near my house , but i dislike mickey ds and wanted to use my money on something that i would enjoy . nwhen i walked in . . . . . there was one cashier and many many customers . and about employees scrambling around behind the cashier to get pastries for people strange logic . nafter waiting patiently for my turn for minutes , i ordered a cappuccino with some sugar free vanilla syrup from a guy who acted a little short with me . then i waited another minutes for my drink and when the guy handed me my drink , i knew immediately that something was wrong because the cup had no weight to it . ni looked inside , stirred around trying to find espresso , but the cup was pure foam . ni wasted almost on a cup of milk foam . at this time , i was needing to hurry and get to work , so i tried to fix it and put some coffee in my cup , but all of the coffees were out except for the hazelnut . i didn t want the hazelnut , but there was no choice . nand that s it . i will never be back again . 
n nit pains me to say this , but i probably would have received a better drink or at least a full one and better customer service at mc donalds .","[i, work, fairly, close, to, panera, and, at, first, ,, i, was, excited, to, be, so, nearby, ,, but, after, all, the, mediocre, experiences, and, the, last, nail, in, the, coffin, ,, i, won, t, be, returning, here, ., ni, ve, always, felt, that, the, customer, service, here, has, been, sub, par, to, the, service, i, have, received, at, most, other, panera, locations, ., most, associates, have, either, seemed, bored, or, snarky, ., but, i, can, overlook, this, most, times, as, long, as, i, receive, tasty, food, and, drinks, at, a, timely, fashion, ., nthis, certainly, did, not, happen, on, ...]","[63, 477, 1301, 59, 48, 475, 26, 5, 34, 14, 63, 3, 1883, 48, 195, 10, 2930, 14, 15, 101, 277, 6, 1337, 853, 26, 6, 345, 1514, 13, 6, 16867, 14, 63, 280, 133, 195, 1272, 194, 18, 96, 64, 144, 921, 89, 6, 534, 393, 194, 71, 239, 247, 248, 48, 6, 393, 63, 297, 422, 5, 61, 122, 475, 1480, 18, 61, 13323, 297, 942, 1286, 3192]"
97199,"the yelp reviews led me here . i loved the independent feel of the small restaurant and the food looked fresh . having no idea what to order , i saw the guy in front of me get a chicken salad sandwich and it looked good . i took his cue and got the same thing with a side of pasta salad . n nthe food was really good . lots of flavor . it was a bit heavy on the sprouts for me . the portions are huge . but what really made the meal was the raspberry iced tea . it s unsweetened and very fruity . that alone was worth the tab . n nthis place is a must stop for lunch if you are near mill street .","[the, yelp, reviews, led, me, here, ., i, loved, the, independent, feel, of, the, small, restaurant, and, the, food, looked, fresh, ., having, no, idea, what, to, order, ,, i, saw, the, guy, in, front, of, me, get, a, chicken, salad, sandwich, and, it, looked, good, ., i, took, his, cue, and, got, the, same, thing, with, a, side, of, pasta, salad, ., n, nthe, food, was, really, good, ., lots, of, flavor, ., it, was, a, bit, heavy, on, the, sprouts, for, me, ., the, portions, are, huge, ., but, what, really, made, the, meal, was, the, raspberry, iced, ...]","[6, 1345, 1346, 3797, 88, 194, 18, 63, 1877, 6, 9333, 361, 36, 6, 782, 355, 26, 6, 269, 576, 301, 18, 1064, 22, 1035, 313, 48, 976, 14, 63, 977, 6, 1826, 13, 1828, 36, 88, 80, 49, 309, 310, 949, 26, 16, 576, 44, 18, 63, 865, 30, 11389, 26, 438, 6, 607, 902, 78, 49, 492, 36, 2678, 310, 18, 19, 415, 269, 3, 434, 44, 18]"


In [50]:
X_test.sample(2)

Unnamed: 0,review,tokens,encoded
13275,"this place is great ! finally something different to eat on mill ave ! i was so tired of the shitty wannabe mexican food on mill . i work on mill avenue so when i heard they where opening a new bbq place i was already excited . i would walk by everyday waiting for this place to open it s doors , til finally the day came last week . their award winning smoked meat is the brisket , it was incredibly tender and just melted away with every bite . their spicy bbq sauce is made in house , ask for matt s spicy sauce . the beach pit takes pride in the food , and serves it with care and great service . i will be back at least once a week to enjoy some brisket and mac and cheese . keep up the good work guys !","[this, place, is, great, !, finally, something, different, to, eat, on, mill, ave, !, i, was, so, tired, of, the, shitty, wannabe, mexican, food, on, mill, ., i, work, on, mill, avenue, so, when, i, heard, they, where, opening, a, new, bbq, place, i, was, already, excited, ., i, would, walk, by, everyday, waiting, for, this, place, to, open, it, s, doors, ,, til, finally, the, day, came, last, week, ., their, award, winning, smoked, meat, is, the, brisket, ,, it, was, incredibly, tender, and, just, melted, away, with, every, bite, ., their, spicy, bbq, sauce, is, made, in, house, ...]","[207, 208, 128, 72, 140, 1660, 176, 1341, 48, 193, 29, 5923, 9075, 140, 63, 3, 10, 1626, 36, 6, 2169, 15512, 190, 269, 29, 5923, 18, 63, 477, 29, 5923, 4202, 10, 378, 63, 1781, 74, 485, 1978, 49, 187, 728, 208, 63, 3, 228, 1883, 18, 63, 320, 214, 92, 907, 1269, 21, 207, 208, 48, 893, 16, 102, 2713, 14, 4104, 1660, 6, 440, 802, 345, 1053]"
37941,"stayed at the pallazo in mid june mid week n ncheck in was friendly and very helpful n nhowever if you have wifi in your room , the connection cuts in and out if you have an ipad . same happens to my friends mac . n nthe suites are very spacious n nand the staff was friendly . n nthis place is missing a buffet but it s close enough to the wynn and treasure island and the fashion shoes mall n n ni would stay here again","[stayed, at, the, pallazo, in, mid, june, mid, week, n, ncheck, in, was, friendly, and, very, helpful, n, nhowever, if, you, have, wifi, in, your, room, ,, the, connection, cuts, in, and, out, if, you, have, an, ipad, ., same, happens, to, my, friends, mac, ., n, nthe, suites, are, very, spacious, n, nand, the, staff, was, friendly, ., n, nthis, place, is, missing, a, buffet, but, it, s, close, enough, to, the, wynn, and, treasure, island, and, the, fashion, shoes, mall, n, n, ni, would, stay, here, again]","[2128, 5, 6, 21311, 13, 147, 2756, 147, 1053, 19, 3202, 13, 3, 358, 26, 196, 1522, 19, 5667, 126, 77, 297, 5298, 13, 157, 572, 14, 6, 2296, 8311, 13, 26, 222, 126, 77, 297, 449, 8984, 18, 607, 4556, 48, 104, 218, 406, 18, 19, 415, 4168, 143, 196, 3205, 19, 1137, 6, 357, 3, 358, 18, 19, 344, 208, 128, 2526, 49, 226, 15, 16, 102, 59]"


In [51]:
X_train['len_vec'] = X_train['encoded'].apply(lambda x: len(x))

In [52]:
X_train['len_vec'].mean()

70.0

In [53]:
X_val['len_vec'] = X_val['encoded'].apply(lambda x: len(x))

In [54]:
X_val['len_vec'].mean()

70.0

In [55]:
X_test['len_vec'] = X_test['encoded'].apply(lambda x: len(x))

In [56]:
X_test['len_vec'].mean()

70.0

## PyTorch

### Dataset

In [57]:
# In deep learning, we see tensors everywhere.
# In NumPy, you may have an array that has three dimensions. That is, technically speaking, a tensor.
# A scalar (a single number) has zero dimensions, a vector has one dimension, a matrix has two dimensions, and a tensor has
# three or more dimensions. That’s it!
# But, to keep things simple, it is commonplace to call vectors and matrices tensors as well, so from now on
# everything is either a scalar or a tensor.


# What if I want my code to fall back to the CPU if no GPU is available?
# You can use torch.cuda.is_available() to find out whether you have a GPU at your disposal and set your device accordingly.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')



In [58]:
# PANDAS SERIES

print(type(X_train['encoded']))
print()
X_train['encoded'].head()

<class 'pandas.core.series.Series'>



391327                                                              [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 3, 17, 18, 19, 1, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 21, 33, 18, 6, 34, 35, 36, 30, 37, 3, 38, 26, 39, 40, 6, 41, 42, 20, 5, 6, 7, 18, 6, 43, 33, 3, 44, 45, 16, 46, 47, 48, 49, 38, 50, 51, 18]
10116     [6, 136, 137, 138, 13, 139, 140, 141, 26, 142, 143, 144, 145, 21, 49, 146, 70, 147, 148, 149, 14, 26, 143, 150, 151, 18, 19, 152, 153, 128, 154, 6, 136, 138, 48, 155, 48, 126, 19, 19, 77, 143, 156, 157, 158, 26, 82, 159, 14, 160, 14, 161, 19, 77, 143, 162, 48, 163, 49, 164, 165, 21, 49, 166, 167, 168, 19, 77, 169, 170]
269200                                                                                  [126, 77, 143, 190, 1, 191, 192, 193, 194, 18, 77, 75, 195, 196, 197, 18, 198, 3, 196, 199, 18, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
5372   

In [59]:
# ARRAY OF ARRAYS
# A 1-D NumPy object array of shape (392000,) whose elements are NumPy arrays of shape (70,)

print(type(X_train['encoded'].values))
print()
X_train['encoded'].values   

<class 'numpy.ndarray'>



array([array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,  3, 17,
       18, 19,  1, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 21,
       33, 18,  6, 34, 35, 36, 30, 37,  3, 38, 26, 39, 40,  6, 41, 42, 20,
        5,  6,  7, 18,  6, 43, 33,  3, 44, 45, 16, 46, 47, 48, 49, 38, 50,
       51, 18]),
       array([  6, 136, 137, 138,  13, 139, 140, 141,  26, 142, 143, 144, 145,
        21,  49, 146,  70, 147, 148, 149,  14,  26, 143, 150, 151,  18,
        19, 152, 153, 128, 154,   6, 136, 138,  48, 155,  48, 126,  19,
        19,  77, 143, 156, 157, 158,  26,  82, 159,  14, 160,  14, 161,
        19,  77, 143, 162,  48, 163,  49, 164, 165,  21,  49, 166, 167,
       168,  19,  77, 169, 170]),
       array([126,  77, 143, 190,   1, 191, 192, 193, 194,  18,  77,  75, 195,
       196, 197,  18, 198,   3, 196, 199,  18,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0, 

In [60]:
# LIST OF ARRAYS

print(type(list(X_train['encoded'].values)))
print()
list(X_train['encoded'].values)

<class 'list'>



[array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,  3, 17,
        18, 19,  1, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 21,
        33, 18,  6, 34, 35, 36, 30, 37,  3, 38, 26, 39, 40,  6, 41, 42, 20,
         5,  6,  7, 18,  6, 43, 33,  3, 44, 45, 16, 46, 47, 48, 49, 38, 50,
        51, 18]),
 array([  6, 136, 137, 138,  13, 139, 140, 141,  26, 142, 143, 144, 145,
         21,  49, 146,  70, 147, 148, 149,  14,  26, 143, 150, 151,  18,
         19, 152, 153, 128, 154,   6, 136, 138,  48, 155,  48, 126,  19,
         19,  77, 143, 156, 157, 158,  26,  82, 159,  14, 160,  14, 161,
         19,  77, 143, 162,  48, 163,  49, 164, 165,  21,  49, 166, 167,
        168,  19,  77, 169, 170]),
 array([126,  77, 143, 190,   1, 191, 192, 193, 194,  18,  77,  75, 195,
        196, 197,  18, 198,   3, 196, 199,  18,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  

In [61]:
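# Convert pd.Series to PyTorch tensors

# NB: X_train['encoded'].values is an array of arrays (see above). np.stack merges it into a single 2-D array of shape
# (n_reviews, 70), which torch.tensor converts in one step; this is typically much faster than building a tensor from a
# list of arrays. dtype=torch.float32 gives the same dtype that torch.Tensor(...) produces.

x_train_tensor = torch.tensor(np.stack(X_train['encoded'].values), dtype=torch.float32)
x_val_tensor = torch.tensor(np.stack(X_val['encoded'].values), dtype=torch.float32)
x_test_tensor = torch.tensor(np.stack(X_test['encoded'].values), dtype=torch.float32)
y_train_tensor = torch.tensor(np.asarray(y_train.values, dtype=np.float32))
y_val_tensor = torch.tensor(np.asarray(y_val.values, dtype=np.float32))
y_test_tensor = torch.tensor(np.asarray(y_test.values, dtype=np.float32))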

In [62]:
print(x_train_tensor.requires_grad)
print(x_val_tensor.requires_grad)
print(x_test_tensor.requires_grad)
print(y_train_tensor.requires_grad)
print(y_val_tensor.requires_grad)
print(y_test_tensor.requires_grad)

False
False
False
False
False
False


In [63]:
print(type(x_train_tensor))
print(type(x_val_tensor))
print(type(x_test_tensor))
print()
print(type(y_train_tensor))
print(type(y_val_tensor))
print(type(y_test_tensor))

<class 'torch.Tensor'>
<class 'torch.Tensor'>
<class 'torch.Tensor'>

<class 'torch.Tensor'>
<class 'torch.Tensor'>
<class 'torch.Tensor'>


In [64]:
print(x_train_tensor.shape)
print(x_val_tensor.shape)
print(x_test_tensor.shape)
print()
print(y_train_tensor.shape)
print(y_val_tensor.shape)
print(y_test_tensor.shape)

torch.Size([392000, 70])
torch.Size([168000, 70])
torch.Size([38000, 70])

torch.Size([392000])
torch.Size([168000])
torch.Size([38000])


In [65]:
# Create a full dataset (like a DataFrame in Pandas) from the two tensors

train_dataset = TensorDataset(x_train_tensor, y_train_tensor)
val_dataset = TensorDataset(x_val_tensor, y_val_tensor)
test_dataset = TensorDataset(x_test_tensor, y_test_tensor)

In [66]:
len_train = len(train_dataset)
len_val = len(val_dataset)
len_test = len(test_dataset)

In [67]:
print(len_train)
print(len_val)
print(len_test)

392000
168000
38000


## DataLoader

In [68]:
# For a small dataset, it is fine to use the whole training set at every training step (i.e. batch gradient descent).
# For a dataset of this size, we should use mini-batch gradient descent, which means slicing the dataset into mini-batches.
# Rather than doing that by hand, we use the 'DataLoader' class: we tell it which dataset to use, the desired mini-batch
# size, and whether to shuffle the data.
# The loader is an iterable, so we can loop over it and fetch a different mini-batch every time.

train_loader = DataLoader(dataset=train_dataset, batch_size=1048, shuffle=True)
val_loader = DataLoader(dataset=val_dataset, batch_size=1048, shuffle=False)
test_loader = DataLoader(dataset=test_dataset, batch_size=1048, shuffle=False)

# To retrieve a sample mini-batch, one can simply run the command below.
# It will return a list containing two tensors: one for the features, another one for the labels:
# next(iter(train_loader))

In [69]:
# for x, y in train_loader:
#     print(x)
#     print(y)
#     print('-' * 70)
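
As a hedged illustration of how 'DataLoader' slices a dataset, the sketch below uses a small synthetic 'TensorDataset' (the sizes are made up, not the Yelp data) and counts the resulting mini-batches. The same arithmetic applied to this notebook's settings shows that 392,000 training examples with batch_size=1048 give 375 mini-batches, the last one holding only 48 examples, because 'DataLoader' keeps the incomplete final batch unless drop_last=True.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-in for the real data: 10 "reviews" of 70 token indices each
features = torch.randint(0, 100, (10, 70)).float()
labels = torch.randint(0, 2, (10,)).float()
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=4, shuffle=True)
batch_sizes = [x.shape[0] for x, y in loader]
print(len(loader), batch_sizes)   # 3 [4, 4, 2]

# drop_last=True discards the incomplete final batch
loader_dropped = DataLoader(dataset, batch_size=4, shuffle=True, drop_last=True)
print(len(loader_dropped))        # 2

# Same arithmetic for the training set used in this notebook
n_train, batch_size = 392000, 1048
n_batches = -(-n_train // batch_size)              # ceiling division
last_batch = n_train - (n_batches - 1) * batch_size
print(n_batches, last_batch)      # 375 48
```

With shuffle=True the composition of each batch changes every epoch, but the number and sizes of the batches do not.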

## An MLP Classifier

The model we use in this example is a multilayer perceptron (MLP) classifier.
The 'MLPClassifier' inherits from PyTorch’s Module and creates two Linear layers with a ReLU nonlinearity between them: the first maps the input vector to a hidden layer, and the second maps the hidden layer to a single output. Because this is a binary classification setting (negative or positive review), a single output is an appropriate setup.

The sigmoid function is used as the final nonlinearity.

We parameterize the forward() method to allow for the sigmoid function to be optionally applied. To understand why, it is important to first point out that in a binary classification task, binary cross-entropy loss (torch.nn.BCELoss()) is
the most appropriate loss function. It is mathematically formulated for binary probabilities. However, there are numerical stability issues with applying a sigmoid and then using this loss function. To provide its users with shortcuts that
are more numerically stable, PyTorch provides BCEWithLogitsLoss(). To use this loss function, the output should not have the sigmoid function applied. Therefore, by default, we do not apply the sigmoid. However, in the case that the
user of the classifier would like a probability value, the sigmoid is required, and it is left as an option.
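
A minimal check of the claim above, using made-up logits and targets. For moderate logits, BCEWithLogitsLoss() and a sigmoid followed by BCELoss() give the same value. For a very confident wrong prediction (logit 200, label 0), the sigmoid rounds to exactly 1.0 in float32; according to the PyTorch documentation, BCELoss clamps its log terms to be at least -100, so it reports 100 instead of the true loss of about 200, while BCEWithLogitsLoss still returns the correct value.

```python
import torch
import torch.nn as nn

logits = torch.tensor([-2.0, 0.5, 3.0])
targets = torch.tensor([0.0, 1.0, 1.0])

with_logits = nn.BCEWithLogitsLoss()(logits, targets)
two_step = nn.BCELoss()(torch.sigmoid(logits), targets)
print(with_logits.item(), two_step.item())   # nearly identical

# A very confident wrong prediction: logit 200, true label 0
big_logit = torch.tensor([200.0])
zero_target = torch.tensor([0.0])
stable = nn.BCEWithLogitsLoss()(big_logit, zero_target)           # about 200, the correct value
saturated = nn.BCELoss()(torch.sigmoid(big_logit), zero_target)    # sigmoid gives 1.0, log(0) is clamped
print(stable.item(), saturated.item())
```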

In [70]:
#import torch.nn as nn
#import torch.nn.functional as F

class MLPClassifier(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):   # input_dim: the size of the input feature vector
        super(MLPClassifier, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x_in, apply_sigmoid=False):
        # We don't apply the sigmoid in the final layer during training because nn.BCEWithLogitsLoss()
        # combines the sigmoid and the binary cross-entropy in a single, numerically stable operation.
        hidden_layer = F.relu(self.fc1(x_in))
        output_layer = self.fc2(hidden_layer).squeeze(1)   # (batch, 1) -> (batch,)

        if apply_sigmoid:
            # torch.sigmoid is element-wise and takes no 'dim' argument
            output_layer = torch.sigmoid(output_layer)
        return output_layer

## The Training Routine

At its core, the training routine is responsible for instantiating the model, iterating over the dataset, computing the output of the model when given the data as input, computing the loss (how wrong the model is), and updating the model parameters using the gradient of the loss.

Although this may seem like a lot of detail to manage, there are not many places to change the training routine, and as such it will become habitual in your deep learning development process.

In [71]:
input_dim = x_train_tensor.shape[1]
hidden_dim = 300
output_dim = 1

In [72]:
# Create a model.
# We need to send our model to the same device where the data is. If our data is made of GPU tensors, 
# our model must “live” inside the GPU as well. That's what '.to(device)' is there for.

classifier = MLPClassifier(input_dim=input_dim, hidden_dim=hidden_dim, output_dim=output_dim).to(device)

In [73]:
print(classifier)

MLPClassifier(
  (fc1): Linear(in_features=70, out_features=300, bias=True)
  (fc2): Linear(in_features=300, out_features=1, bias=True)
)


In [74]:
# loss and optimizer
loss_func = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(classifier.parameters(), lr=0.001)

In [75]:
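def binary_acc(y_hat, y):
    # y_hat holds raw logits: apply the sigmoid, then round at 0.5 to get 0/1 labels
    y_hat_label = torch.round(torch.sigmoid(y_hat))

    correct_predictions_sum = (y_hat_label == y).sum().float()
    acc = correct_predictions_sum/y.shape[0]
    acc = torch.round(acc * 100)   # batch accuracy as a whole-number percentage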
    
    return acc
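
'binary_acc' rounds each batch's accuracy to a whole percentage, and the training loop then averages those per-batch values, so the smaller final batch counts as much as a full one. A hedged alternative is sketched below: the helper 'count_correct' (a hypothetical name, not part of the notebook) returns the number of correct predictions per batch, so the epoch accuracy can be computed exactly as total correct divided by total examples. It also uses the fact that a logit above 0 is the same as a sigmoid probability above 0.5, so no sigmoid is needed for thresholding.

```python
import torch

def count_correct(logits, targets):
    """Number of correct binary predictions in a batch (logits are raw model outputs)."""
    predictions = (logits > 0).float()   # logit > 0  <=>  sigmoid(logit) > 0.5
    return (predictions == targets).sum().item()

# Two synthetic batches of unequal size
batches = [
    (torch.tensor([[2.0], [-1.0], [0.3], [-0.2]]), torch.tensor([[1.0], [0.0], [0.0], [0.0]])),
    (torch.tensor([[1.5]]), torch.tensor([[0.0]])),
]

total_correct = sum(count_correct(l, t) for l, t in batches)
total_seen = sum(t.shape[0] for _, t in batches)
print(100 * total_correct / total_seen)   # 60.0
```

Here the exact accuracy is 3/5 = 60%, whereas averaging the two per-batch accuracies (75% and 0%) would give 37.5%.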

### The training loop

The training loop is composed of two loops: an inner loop over minibatches in the dataset, and an outer loop, which repeats the inner loop a number of times. In the inner loop, losses are computed for each minibatch, and the optimizer is used to
update the model parameters.
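
The two-loop structure can be condensed into a reusable function. This is a sketch, not the notebook's code: 'run_epoch' is a hypothetical helper that trains when an optimizer is passed and only evaluates otherwise, weights each batch loss by its size, and keeps the model output 1-D to match the targets. The tiny synthetic dataset and model are placeholders used only to exercise it.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

def run_epoch(model, loader, loss_func, optimizer=None, device="cpu"):
    """One pass over `loader` (the inner loop); updates the weights only if an optimizer is given."""
    training = optimizer is not None
    model.train(training)                      # dropout etc. active only when training
    total_loss, total_correct, total_seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):     # no autograd graph during evaluation
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x).squeeze(1)       # shape (batch,) to match y
            loss = loss_func(logits, y)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * y.shape[0]
            total_correct += ((logits > 0).float() == y).sum().item()
            total_seen += y.shape[0]
    return total_loss / total_seen, 100 * total_correct / total_seen

# Placeholder data and model
torch.manual_seed(0)
features = torch.randn(64, 10)
labels = (features[:, 0] > 0).float()         # label depends on the first feature
loader = DataLoader(TensorDataset(features, labels), batch_size=16, shuffle=True)
model = nn.Sequential(nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 1))
loss_func = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):                        # outer loop: epochs
    train_loss, train_acc = run_epoch(model, loader, loss_func, optimizer)
val_loss, val_acc = run_epoch(model, loader, loss_func)   # evaluation only
print(f"train loss {train_loss:.3f} | eval acc {val_acc:.1f}%")
```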

In [76]:
n_epochs = 50

print('Starting training')
print()

# Print model parameters before training
#print(classifier.state_dict())  #equivalent to line below
#print()
print(list(classifier.parameters()))
print()


# Enumerate epochs
for epoch in range(n_epochs):
    
    # Training part
    classifier.train()
    
    epoch_train_loss = 0
    epoch_train_acc = 0
    
    for i, (x_train, y_train) in enumerate(train_loader):
        x_train = x_train.to(device)
        y_train = y_train.unsqueeze(1) 
        y_train = y_train.to(device)

        # Clear the gradients
        optimizer.zero_grad()
        
        # Forward propagation: compute the model output (i.e. predictions)
        y_pred = classifier(x_in=x_train)
        y_pred = y_pred.unsqueeze(1)
        
        #print(x_train.requires_grad, y_train.requires_grad, y_pred.requires_grad)
        #print(x_train.shape, y_train.shape, y_pred.shape)
                                                                                                        
        # Loss calculation
        t_loss = loss_func(y_pred, y_train)
        
        # Accuracy
        t_acc = binary_acc(y_pred, y_train)
        
        # Backward propagation: use loss to produce gradients
        t_loss.backward()
        
        # Weight optimization: use optimizer to take gradient step and update parameters (w,b) 
        optimizer.step()
        
        epoch_train_loss += t_loss.item()
        epoch_train_acc += t_acc.item()
                                                                                                             
   
    # Evaluation part
    classifier.eval()  # switch layers such as dropout to evaluation mode; gradients are disabled separately by torch.no_grad()
    
    epoch_val_loss = 0
    epoch_val_acc = 0
    
    # torch.no_grad() stops the autograd engine from computing gradients, which reduces memory usage and speeds up
    # computation; it is the recommended way of doing validation
    # (https://discuss.pytorch.org/t/model-eval-vs-with-torch-no-grad/19615/3)
    with torch.no_grad():
        for i, (x_val, y_val) in enumerate(val_loader):
            x_val = x_val.to(device)
            y_val = y_val.unsqueeze(1) 
            y_val = y_val.to(device)
        
            # Forward propagation: compute the model output (i.e. predictions)
            y_pred = classifier(x_in=x_val)     # raw logits (no sigmoid applied)
            y_pred = y_pred.unsqueeze(1)
        
            # Loss calculation
            v_loss = loss_func(y_pred, y_val)  
            
            # Accuracy
            v_acc = binary_acc(y_pred, y_val)
            
            epoch_val_loss += v_loss.item()
            epoch_val_acc += v_acc.item()
            
    print('Epoch: {} | Train Loss: {:.3f} | Val Loss: {:.3f} | Train Acc: {:.3f} | Val Acc: {:.3f}'.format(epoch,
                                                                                epoch_train_loss/len(train_loader),
                                                                                epoch_val_loss/len(val_loader),
                                                                                epoch_train_acc/len(train_loader),
                                                                                epoch_val_acc/len(val_loader)))

print()
print('Training complete')
print()

# Print model parameters after training
#print(classifier.state_dict())   #equivalent to line below
#print()
print(list(classifier.parameters()))

Starting training

[Parameter containing:
tensor([[-0.0962, -0.1143, -0.0610,  ...,  0.0348,  0.0977, -0.0272],
        [ 0.0391, -0.0953,  0.0944,  ..., -0.1064, -0.0047,  0.0708],
        [ 0.1099,  0.0971,  0.0887,  ...,  0.0204,  0.0130,  0.0643],
        ...,
        [-0.0880,  0.0787, -0.0488,  ...,  0.0218, -0.0442,  0.0688],
        [-0.0933, -0.0847,  0.0993,  ...,  0.0538,  0.0910, -0.0162],
        [ 0.0072, -0.0068, -0.1073,  ..., -0.0896,  0.0960,  0.0304]],
       requires_grad=True), Parameter containing:
tensor([-0.0413,  0.0916,  0.0650,  0.0350,  0.1114, -0.0785, -0.0951,  0.0264,
        -0.1068,  0.0801,  0.0532, -0.0787,  0.0465,  0.0496,  0.0122, -0.0627,
         0.0545,  0.0385,  0.0578, -0.0319,  0.0339,  0.0343,  0.1164, -0.0397,
         0.0335,  0.0975, -0.0642,  0.0454,  0.1126, -0.0212,  0.0700, -0.1188,
        -0.1115, -0.0672,  0.0199, -0.0987, -0.1091, -0.1189,  0.0143, -0.0112,
        -0.0970,  0.0625,  0.0020,  0.0331, -0.1169,  0.0069,  0.0286, -0.

Epoch: 18 | Train Loss: 0.721 | Val Loss: 0.737 | Train Acc: 51.856 | Val Acc: 51.273
Epoch: 19 | Train Loss: 0.714 | Val Loss: 0.706 | Train Acc: 52.048 | Val Acc: 52.335
Epoch: 20 | Train Loss: 0.705 | Val Loss: 0.703 | Train Acc: 52.256 | Val Acc: 52.280
Epoch: 21 | Train Loss: 0.701 | Val Loss: 0.704 | Train Acc: 52.416 | Val Acc: 50.845
Epoch: 22 | Train Loss: 0.699 | Val Loss: 0.697 | Train Acc: 52.296 | Val Acc: 52.478
Epoch: 23 | Train Loss: 0.696 | Val Loss: 0.697 | Train Acc: 52.539 | Val Acc: 52.087
Epoch: 24 | Train Loss: 0.694 | Val Loss: 0.696 | Train Acc: 52.696 | Val Acc: 52.118
Epoch: 25 | Train Loss: 0.693 | Val Loss: 0.695 | Train Acc: 52.928 | Val Acc: 52.292
Epoch: 26 | Train Loss: 0.693 | Val Loss: 0.695 | Train Acc: 52.787 | Val Acc: 52.273
Epoch: 27 | Train Loss: 0.693 | Val Loss: 0.694 | Train Acc: 52.661 | Val Acc: 52.360
Epoch: 28 | Train Loss: 0.692 | Val Loss: 0.694 | Train Acc: 52.787 | Val Acc: 52.509
Epoch: 29 | Train Loss: 0.691 | Val Loss: 0.694 | Trai

## Evaluating on test data

To evaluate the model on the held-out test set, the code is exactly the same as the validation loop in the training routine we saw in the previous step. 

The test set should be run as little as possible. Each time you run a trained model on the test set, make a new model decision (such as changing the size of the layers), and remeasure the new retrained model on the test set, you are biasing your
modeling decisions toward the test data. In other words, if you repeat that process often enough, the test set will become meaningless as an accurate measure of truly held-out data.

In [77]:
classifier.eval()  

test_loss = 0
test_acc = 0
    
with torch.no_grad():     # stop autograd from computing gradients during evaluation
    for i, (x_test, y_test) in enumerate(test_loader):
        x_test = x_test.to(device)
        y_test = y_test.unsqueeze(1) 
        y_test = y_test.to(device)
        
        # Forward propagation: compute the model output (i.e. predictions)
        y_pred = classifier(x_in=x_test)     # raw logits (no sigmoid applied)
        y_pred = y_pred.unsqueeze(1)
        
        # Loss calculation
        tst_loss = loss_func(y_pred, y_test)
        
        # Accuracy
        tst_acc = binary_acc(y_pred, y_test)
            
        test_loss += tst_loss.item()
        test_acc += tst_acc.item()
            
print('Test Loss: {:.3f} | Test Acc: {:.3f}'.format(test_loss/len(test_loader), test_acc/len(test_loader)))

print()
print('Done!')

Test Loss: 0.691 | Test Acc: 53.811

Done!


## Inference and classifying new data points

Another method for evaluating the model is to do inference on new data and make qualitative judgments about whether the model is working.
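
The notebook does not show inference code at this point, so here is a hedged sketch. It assumes things not confirmed in this section: a token-to-index dictionary (called 'vocab' here, hypothetical), padding index 0 and unknown-token index 1 (suggested by the encoded samples above, not confirmed), a fixed length of 70, and a crude lowercase regex tokenizer standing in for whatever tokenizer produced the 'tokens' column. New text must be prepared exactly like the training data, otherwise the predictions are meaningless.

```python
import re
import torch
import torch.nn as nn

PAD_IDX, UNK_IDX, MAX_LEN = 0, 1, 70   # assumptions, see text above

def encode_review(text, vocab, max_len=MAX_LEN):
    """Tokenize, map tokens to indices, then truncate/pad to a fixed length."""
    tokens = re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())   # crude stand-in tokenizer
    ids = [vocab.get(tok, UNK_IDX) for tok in tokens][:max_len]
    ids += [PAD_IDX] * (max_len - len(ids))
    return torch.tensor(ids, dtype=torch.float32)

def predict_review(model, text, vocab, device="cpu"):
    """Return P(positive) for one review, with the model in evaluation mode."""
    model.eval()
    x = encode_review(text, vocab).unsqueeze(0).to(device)   # add a batch dimension
    with torch.no_grad():
        logit = model(x).reshape(-1)[0]                      # works for (1,) or (1, 1) outputs
    return torch.sigmoid(logit).item()

# Toy vocabulary and an untrained model, only to show the call pattern
vocab = {"the": 2, "food": 3, "was": 4, "great": 5, "!": 6}
model = nn.Sequential(nn.Linear(MAX_LEN, 8), nn.ReLU(), nn.Linear(8, 1))
print(encode_review("The food was GREAT!", vocab)[:6])
print(predict_review(model, "The food was GREAT!", vocab))
```

With this notebook's model, the call would look like predict_review(classifier, text, vocab, device=device), given the real vocabulary.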

## Inspecting model weights

Finally, the last way to understand whether a model is doing well after it has finished training is to inspect the weights and make qualitative judgments about whether they seem correct.
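
Weights can be read as per-word scores only when each input dimension corresponds to one vocabulary entry. In this notebook, the 70 input features are positions in a padded review (holding token indices), and fc2 has one weight per hidden unit, so neither weight matrix lines up with vocabulary words. The sketch below shows, under that assumption, a setup in which this inspection does make sense: reviews converted to multi-hot bag-of-words vectors and fed to a single Linear layer, so weight[0, j] is the contribution of vocabulary word j. The helper 'multi_hot', the tiny vocabulary, and the hand-set weights are hypothetical placeholders, not learned values.

```python
import torch
import torch.nn as nn

def multi_hot(index_rows, vocab_size, pad_idx=0):
    """Turn padded index sequences (batch, seq_len) into bag-of-words vectors (batch, vocab_size)."""
    bow = torch.zeros(index_rows.shape[0], vocab_size)
    bow.scatter_(1, index_rows.long(), 1.0)   # 1.0 wherever a word index occurs
    bow[:, pad_idx] = 0.0                     # the padding token carries no meaning
    return bow

vocab_words = ["<pad>", "<unk>", "great", "awful", "food", "the"]   # hypothetical index -> word list
batch = torch.tensor([[2, 4, 0, 0], [5, 3, 3, 0]])
print(multi_hot(batch, len(vocab_words)))

# Single-layer classifier on bag-of-words input: weight has shape (1, vocab_size)
bow_clf = nn.Linear(len(vocab_words), 1)
with torch.no_grad():                          # placeholder weights for illustration
    bow_clf.weight.copy_(torch.tensor([[0.0, 0.0, 2.0, -3.0, 0.5, 0.1]]))

weights = bow_clf.weight.detach().cpu()[0]
order = torch.argsort(weights, descending=True).tolist()
top_positive = [vocab_words[i] for i in order[:2]]
top_negative = [vocab_words[i] for i in order[-1:]]
print("most positive:", top_positive)
print("most negative:", top_negative)
```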

In [78]:
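# Sort weights
# NB: fc2 has one weight per hidden unit (hidden_dim = 300), so the sorted indices refer to hidden units, not to
# vocabulary entries; fc1's 70 input columns are token positions, not words either. Looking the indices up in
# words[idx] below therefore does not identify influential words.

fc2_weights = classifier.fc2.weight.detach().cpu()[0]   # .cpu() so this also works when the model is on a GPU
_, indices = torch.sort(fc2_weights, dim=0, descending=True)
indices = indices.tolist()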

In [79]:
#print(len(indices))
#indices

In [80]:
#indices[:20]

In [81]:
#indices[-20:]

In [82]:
#indices[:-21:-1]

In [83]:
# Top 20 words

print("Influential words in Positive Reviews:")
print("--------------------------------------")

for idx in indices[:20]:
    print(idx, words[idx])

Influential words in Positive Reviews:
--------------------------------------
173 design
45 then
144 always
37 set
20 played
32 helmet
227 sushi
66 attended
217 games
114 ?
48 to
18 .
71 has
43 next
3 was
263 never
107 experience
78 with
210 cool
209 dim


In [84]:
# Top 20 negative words

print("Influential words in Negative Reviews:")
print("--------------------------------------")

for idx in indices[:-21:-1]:
    print(idx, words[idx])

Influential words in Negative Reviews:
--------------------------------------
278 uncomfortable
117 did
293 head
284 tip
68 let
124 party
129 headline
139 champaign
125 even
275 dining
73 employees
192 not
31 signature
294 over
4 playing
38 boring
21 for
240 long
103 express
9 club


# Regularizing MLPs: Weight Regularization and Structural Regularization (or Dropout)

Regularization is one way to combat overfitting. Two important types of weight regularization are L1 and L2.
These weight regularization methods apply to MLPs as well as to convolutional neural networks. 
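
In PyTorch, an L2 penalty is commonly applied through the optimizer's weight_decay argument, while an L1 penalty is usually added to the loss by hand. The sketch below shows both on a placeholder model with the same layer sizes as 'MLPClassifier'; the coefficient values and the helper 'l1_penalty' are arbitrary, hypothetical choices. Note that for Adam, weight_decay adds the L2 term to the gradient, whereas optim.AdamW applies decoupled weight decay instead.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(70, 300), nn.ReLU(), nn.Linear(300, 1))  # same shape as MLPClassifier
loss_func = nn.BCEWithLogitsLoss()

# L2: the optimizer adds weight_decay * w to each parameter's gradient
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)

# L1: add l1_lambda * sum(|w|) over the weight matrices (biases excluded) to the data loss
l1_lambda = 1e-5

def l1_penalty(module):
    return sum(p.abs().sum() for name, p in module.named_parameters() if name.endswith("weight"))

x = torch.randn(8, 70)
y = torch.randint(0, 2, (8,)).float()
optimizer.zero_grad()
data_loss = loss_func(model(x).squeeze(1), y)
loss = data_loss + l1_lambda * l1_penalty(model)
loss.backward()
optimizer.step()
print(data_loss.item(), loss.item())
```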

In addition to weight regularization, a structural regularization approach called dropout becomes very important.

In simple terms, **dropout probabilistically drops connections between units belonging to two adjacent layers during training**.

Neural networks, especially deep networks with a large number of layers, can create interesting co-adaptation between the units. “Co-adaptation” is a term from neuroscience, but here it simply refers to a situation in which the connection
between two units becomes excessively strong at the expense of connections between other units. This usually results in the model overfitting to the data. By probabilistically dropping connections between units, we can ensure that no single unit always depends on another single unit, which makes the model more robust.

Dropout does not add additional parameters to the model, but requires a single hyperparameter — the “drop probability.” This is
the probability with which the connections between units are dropped. It is typical to set the drop probability to 0.5.

## MLP with Dropout

**It is important to note that dropout is applied only during training and not during evaluation.**
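
A small check of how PyTorch implements dropout, using a synthetic vector of ones: during training it zeroes each element with probability p and scales the survivors by 1/(1 - p), so the expected activation is unchanged, and in evaluation mode it passes the input through untouched. The nn.Dropout module follows the model's train()/eval() state automatically; the functional F.dropout does so only if you pass training=self.training, because its training argument defaults to True.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(1000)
dropout = nn.Dropout(p=0.5)

dropout.train()
out_train = dropout(x)
print(sorted(set(out_train.tolist())))         # dropped (0.0) or scaled by 1/(1-0.5) (2.0)
print((out_train == 0).float().mean().item())  # fraction dropped, close to 0.5

dropout.eval()
print(torch.equal(dropout(x), x))              # identity at evaluation time

# The functional form ignores model.eval() unless told otherwise
print(torch.equal(F.dropout(x, p=0.5), x))                  # almost surely False
print(torch.equal(F.dropout(x, p=0.5, training=False), x))
```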

In [85]:
#import torch.nn as nn
#import torch.nn.functional as F

class MLPDropoutClf(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):   # input_dim: the size of the input feature vector
        super(MLPDropoutClf, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x_in, apply_sigmoid=False):
        # We don't apply the sigmoid in the final layer during training because nn.BCEWithLogitsLoss()
        # combines the sigmoid and the binary cross-entropy in a single, numerically stable operation.
        hidden_layer = F.relu(self.fc1(x_in))
        # training=self.training: F.dropout defaults to training=True, so without it dropout would
        # also be applied after classifier.eval()
        hidden_layer = F.dropout(hidden_layer, p=0.5, training=self.training)
        output_layer = self.fc2(hidden_layer).squeeze(1)   # (batch, 1) -> (batch,)

        if apply_sigmoid:
            # torch.sigmoid is element-wise and takes no 'dim' argument
            output_layer = torch.sigmoid(output_layer)
        return output_layer

In [86]:
input_dim = x_train_tensor.shape[1]
hidden_dim = 300
output_dim = 1

In [87]:
# Create a model.
# We need to send our model to the same device where the data is. If our data is made of GPU tensors, 
# our model must “live” inside the GPU as well. That's what '.to(device)' is there for.

classifier = MLPDropoutClf(input_dim=input_dim, hidden_dim=hidden_dim, output_dim=output_dim).to(device)

In [88]:
# loss and optimizer
loss_func = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(classifier.parameters(), lr=0.001)

In [89]:
n_epochs = 50

print('Starting training')
print()

# Print model parameters before training
#print(classifier.state_dict())  #equivalent to line below
#print()
print(list(classifier.parameters()))
print()


# Enumerate epochs
for epoch in range(n_epochs):
    
    # Training part
    classifier.train()
    
    epoch_train_loss = 0
    epoch_train_acc = 0
    
    for i, (x_train, y_train) in enumerate(train_loader):
        x_train = x_train.to(device)
        y_train = y_train.unsqueeze(1) 
        y_train = y_train.to(device)

        # Clear the gradients
        optimizer.zero_grad()
        
        # Forward propagation: compute the model output (i.e. predictions)
        y_pred = classifier(x_in=x_train)
        y_pred = y_pred.unsqueeze(1)
        
        #print(x_train.requires_grad, y_train.requires_grad, y_pred.requires_grad)
        #print(x_train.shape, y_train.shape, y_pred.shape)
                                                                                                        
        # Loss calculation
        t_loss = loss_func(y_pred, y_train)
        
        # Accuracy
        t_acc = binary_acc(y_pred, y_train)
        
        # Backward propagation: use loss to produce gradients
        t_loss.backward()
        
        # Weight optimization: use optimizer to take gradient step and update parameters (w,b) 
        optimizer.step()
        
        epoch_train_loss += t_loss.item()
        epoch_train_acc += t_acc.item()
                                                                                                             
   
    # Evaluation part
    classifier.eval()  # switch layers such as dropout to evaluation mode; gradients are disabled separately by torch.no_grad()
    
    epoch_val_loss = 0
    epoch_val_acc = 0
    
    # torch.no_grad() stops the autograd engine from computing gradients, which reduces memory usage and speeds up
    # computation; it is the recommended way of doing validation
    # (https://discuss.pytorch.org/t/model-eval-vs-with-torch-no-grad/19615/3)
    with torch.no_grad():
        for i, (x_val, y_val) in enumerate(val_loader):
            x_val = x_val.to(device)
            y_val = y_val.unsqueeze(1) 
            y_val = y_val.to(device)
        
            # Forward propagation: compute the model output (i.e. predictions)
            y_pred = classifier(x_in=x_val)     # raw logits (no sigmoid applied)
            y_pred = y_pred.unsqueeze(1)
        
            # Loss calculation
            v_loss = loss_func(y_pred, y_val)  
            
            # Accuracy
            v_acc = binary_acc(y_pred, y_val)
            
            epoch_val_loss += v_loss.item()
            epoch_val_acc += v_acc.item()
            
    print('Epoch: {} | Train Loss: {:.3f} | Val Loss: {:.3f} | Train Acc: {:.3f} | Val Acc: {:.3f}'.format(epoch,
                                                                                epoch_train_loss/len(train_loader),
                                                                                epoch_val_loss/len(val_loader),
                                                                                epoch_train_acc/len(train_loader),
                                                                                epoch_val_acc/len(val_loader)))

print()
print('Training complete')
print()

# Print model parameters after training
#print(classifier.state_dict())   #equivalent to line below
#print()
print(list(classifier.parameters()))

Starting training

[Parameter containing:
tensor([[ 0.0417, -0.0786,  0.0139,  ..., -0.0589,  0.0801,  0.0903],
        [ 0.0723, -0.0565,  0.0946,  ...,  0.0465,  0.0791, -0.1172],
        [-0.0899, -0.0619, -0.0685,  ..., -0.0264,  0.0311, -0.0446],
        ...,
        [ 0.0244,  0.0158,  0.0024,  ..., -0.0861,  0.1102, -0.0193],
        [ 0.0267, -0.0733, -0.0617,  ...,  0.0996,  0.0555,  0.0716],
        [ 0.0288, -0.1115,  0.0632,  ..., -0.0613, -0.0934,  0.0169]],
       requires_grad=True), Parameter containing:
tensor([ 0.0925, -0.0732,  0.0741,  0.0874,  0.0240,  0.0553,  0.0609, -0.0818,
         0.1018,  0.0480, -0.0612, -0.0992, -0.0714,  0.0385, -0.0970, -0.1077,
         0.1186, -0.1039, -0.0303,  0.0684, -0.0815,  0.0320,  0.0301,  0.0286,
         0.0736, -0.0216, -0.1010, -0.0262,  0.0299, -0.0138, -0.0572,  0.0728,
        -0.0044,  0.0832, -0.0551, -0.0832,  0.0489, -0.0823,  0.0314, -0.1188,
         0.0808, -0.0320,  0.0836,  0.1134,  0.1095, -0.1077,  0.1100, -0.

Epoch: 18 | Train Loss: 0.707 | Val Loss: 0.708 | Train Acc: 52.208 | Val Acc: 52.056
Epoch: 19 | Train Loss: 0.702 | Val Loss: 0.701 | Train Acc: 52.440 | Val Acc: 52.155
Epoch: 20 | Train Loss: 0.698 | Val Loss: 0.698 | Train Acc: 52.688 | Val Acc: 52.578
Epoch: 21 | Train Loss: 0.696 | Val Loss: 0.695 | Train Acc: 52.899 | Val Acc: 52.193
Epoch: 22 | Train Loss: 0.694 | Val Loss: 0.696 | Train Acc: 53.155 | Val Acc: 53.273
Epoch: 23 | Train Loss: 0.693 | Val Loss: 0.694 | Train Acc: 53.333 | Val Acc: 52.888
Epoch: 24 | Train Loss: 0.692 | Val Loss: 0.694 | Train Acc: 53.443 | Val Acc: 53.273
Epoch: 25 | Train Loss: 0.691 | Val Loss: 0.693 | Train Acc: 53.637 | Val Acc: 53.416
Epoch: 26 | Train Loss: 0.691 | Val Loss: 0.693 | Train Acc: 53.509 | Val Acc: 52.863
Epoch: 27 | Train Loss: 0.691 | Val Loss: 0.693 | Train Acc: 53.515 | Val Acc: 53.211
Epoch: 28 | Train Loss: 0.691 | Val Loss: 0.693 | Train Acc: 53.645 | Val Acc: 53.292
Epoch: 29 | Train Loss: 0.691 | Val Loss: 0.694 | Trai

In [90]:
classifier.eval()  

test_loss = 0
test_acc = 0
    
with torch.no_grad():     # stop autograd from computing gradients during evaluation
    for i, (x_test, y_test) in enumerate(test_loader):
        x_test = x_test.to(device)
        y_test = y_test.unsqueeze(1) 
        y_test = y_test.to(device)
        
        # Forward propagation: compute the model output (i.e. predictions)
        y_pred = classifier(x_in=x_test)     # raw logits (no sigmoid applied)
        y_pred = y_pred.unsqueeze(1)
        
        # Loss calculation
        tst_loss = loss_func(y_pred, y_test)
        
        # Accuracy
        tst_acc = binary_acc(y_pred, y_test)
            
        test_loss += tst_loss.item()
        test_acc += tst_acc.item()
            
print('Test Loss: {:.3f} | Test Acc: {:.3f}'.format(test_loss/len(test_loader), test_acc/len(test_loader)))

print()
print('Done!')

Test Loss: 0.691 | Test Acc: 54.000

Done!
