The goal of this problem is to implement a neural network that classifies IMDB reviews as
positive or negative. To do so, we work with the data provided by the keras library
(datasets.imdb.load_data(num_words=10000)).
The data are encoded according to the dictionary imdb.get_word_index(), where each word
is encoded as a number. Before working with the data, they must be preprocessed in a way
similar to what was done with the MNIST dataset. Analyze the performance of the proposed
network and, in case of overfitting, study the impact of the different regularization
variants seen in class on the generalization of the network
(L2, BN and Dropout).

In [1]:
import numpy as np
import pandas as pd
from tensorflow import keras
from matplotlib import pyplot as plt

%matplotlib inline

In [39]:
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=10000)

word_index = keras.datasets.imdb.get_word_index()

In [48]:
top_values = [i for i in word_index if word_index[i]<10]
print(top_values)

['it', 'is', 'in', 'of', 'a', 'br', 'the', 'and', 'to']
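
The list above follows the iteration order of the dictionary, not the frequency rank. A minimal sketch of listing the most frequent words in rank order, assuming `word_index` maps each word to its 1-based frequency rank as `imdb.get_word_index()` does. The helper name `top_words` and the toy dictionary are made up for illustration:

```python
def top_words(word_index, n):
    """Return the n words with the smallest rank, most frequent first."""
    return sorted(word_index, key=word_index.get)[:n]

# Toy stand-in for keras.datasets.imdb.get_word_index() (hypothetical values)
toy_index = {"it": 9, "the": 1, "a": 3, "and": 2, "of": 4}
print(top_words(toy_index, 3))
```

With the real dictionary the call would be `top_words(word_index, 9)`.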


In [68]:
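def get_word(word_index, idx):
    # Linear scan over the vocabulary for the word whose rank equals idx
    l = [word for word in word_index if word_index[word] == idx]
    return l[0]

# Note: load_data() shifts every word index by 3 by default (index_from=3) and
# reserves 1 for the start marker and 2 for out-of-vocabulary words. Looking up
# idx directly, as done here, therefore does not recover the original text;
# it only yields a sequence of unrelated frequent words.
def print_review(review, word_index):
    streview = []
    for idx in review:
        streview.append(get_word(word_index, idx))
    print(' '.join(streview))
    return None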

print_review(x_train[9], word_index)

the as on there plot she's iii film that for find that saw better just is along wrong silly awesome or play this you doing was one in own that successful are make and old plot gets unfortunately of on was although except value omar that with her do they gets for that with timing really way that is played character i i what poor set but is along 100 studio on film is missing br received fact to is mercifully br fabulous and them powers is tapes br enjoys indicate good women show to one good played i i was plain film because avoid for of totally it time do period it couple in college in viewers get br of my to of material it yet br out more
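
The sequence printed above is not readable because the indices stored in the reviews are shifted with respect to `word_index`. A hedged sketch of a decoder that undoes the shift, assuming the `load_data` defaults (`start_char=1`, `oov_char=2`, `index_from=3`). `decode_review`, the placeholder tokens and the toy vocabulary are illustrative names, not part of the original notebook:

```python
def decode_review(review, word_index, index_from=3):
    """Map an encoded review back to text, undoing the index shift."""
    # Build the reverse lookup once instead of scanning the dictionary per word
    reverse_index = {rank + index_from: word for word, rank in word_index.items()}
    # Reserved indices (assumed Keras defaults): 0 padding, 1 start, 2 unknown
    reverse_index.update({0: "<pad>", 1: "<start>", 2: "<unk>"})
    return " ".join(reverse_index.get(idx, "<unk>") for idx in review)

# Toy vocabulary standing in for imdb.get_word_index() (hypothetical values)
toy_index = {"the": 1, "movie": 2, "was": 3, "great": 4}
encoded = [1, 4, 5, 6, 7, 2]
print(decode_review(encoded, toy_index))
```

On the real data the call would be `decode_review(x_train[9], word_index)`. Building the reverse dictionary once also avoids the per-word scan over the whole vocabulary.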


In [91]:
X = [1, 3, 4, 1, 1, 1, 5, 6, 9, 9]
d = {x:X.count(x) for x in X}
data = np.zeros((1, 10))
data[0, list(d.keys())]+=list(d.values())
data

array([[0., 4., 0., 1., 1., 1., 1., 0., 0., 2.]])

In [92]:
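def preprocess(X, num_words=10000):
    # Bag-of-words encoding: data[i, j] is the number of times
    # word index j appears in review i
    m = len(X)
    
    data = np.zeros((m, num_words), dtype=float)

    for i, review in enumerate(X):
        # Map each distinct index in the review to its number of occurrences
        d = {idx:review.count(idx) for idx in review}
        data[i, list(d.keys())] += list(d.values())
    
    return data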

In [93]:
Xtrain = preprocess(x_train)
Xtrain.shape

(25000, 10000)

In [95]:
Xtest = preprocess(x_test)
Xtest.shape

(25000, 10000)
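
The same count vectors can be built with `np.bincount`, which tallies every index of a review in one pass instead of calling `count` once per word. This is a sketch, not the notebook's method: the name `preprocess_bincount` and the `float32` default are assumptions. A 25000 x 10000 array of 8-byte floats takes about 2 GB, so the smaller dtype halves that.

```python
import numpy as np

def preprocess_bincount(X, num_words=10000, dtype=np.float32):
    """Bag-of-words counts: row i holds how often each word index occurs in review i."""
    data = np.zeros((len(X), num_words), dtype=dtype)
    for i, review in enumerate(X):
        # bincount tallies all indices of the review in a single pass
        data[i] = np.bincount(review, minlength=num_words)
    return data

# Same toy review as in the earlier cell
toy = [[1, 3, 4, 1, 1, 1, 5, 6, 9, 9]]
print(preprocess_bincount(toy, num_words=10))
```

Like the original function, this assumes every index is smaller than `num_words`, which holds when the data were loaded with the same `num_words`.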

Once the data are preprocessed, we implement a neural network with 2 hidden layers, of 10 and 25 neurons respectively.

In [105]:
opt = keras.optimizers.Adam(learning_rate=.0001)

# Named `inputs` to avoid shadowing the Python builtin `input`
inputs = keras.layers.Input(shape=(10000,))
# L2(0) is a placeholder: a coefficient of 0 applies no penalty
l1 = keras.layers.Dense(10, activation='relu', use_bias=True, kernel_regularizer=keras.regularizers.L2(0))(inputs)
l2 = keras.layers.Dense(25, activation='relu', use_bias=True, kernel_regularizer=keras.regularizers.L2(0))(l1)
# Linear output: the network produces logits, not probabilities
output = keras.layers.Dense(1, activation='linear', use_bias=True)(l2)

model = keras.Model(inputs=inputs, outputs=output)

model.summary()

Model: "model_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_4 (InputLayer)        [(None, 10000)]           0         
                                                                 
 dense_9 (Dense)             (None, 10)                100010    
                                                                 
 dense_10 (Dense)            (None, 25)                275       
                                                                 
 dense_11 (Dense)            (None, 1)                 26        
                                                                 
Total params: 100,311
Trainable params: 100,311
Non-trainable params: 0
_________________________________________________________________
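
For reference, a small NumPy sketch of what the three regularization variants named in the problem statement compute. It is only an illustration under simplifying assumptions (batch normalization without the learned scale and shift, dropout in training mode); the function names are made up here. In Keras the counterparts are `keras.regularizers.L2`, `keras.layers.BatchNormalization` and `keras.layers.Dropout`.

```python
import numpy as np

def l2_penalty(weights, lam):
    """L2 term added to the loss: lam times the sum of squared weights."""
    return lam * np.sum(np.square(weights))

def batch_norm(x, eps=1e-3):
    """Normalize each feature over the batch (learned scale/shift omitted)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

def dropout(x, rate, rng):
    """Inverted dropout: zero a random fraction of units, rescale the rest."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
W = np.array([[1.0, 2.0], [3.0, 4.0]])
acts = rng.normal(loc=5.0, scale=2.0, size=(64, 4))

print(l2_penalty(W, 0.01))
print(batch_norm(acts).mean(axis=0))
print(dropout(acts, 0.5, rng)[:2])
```

With `lam = 0` the penalty is exactly zero, which is why a model built with `L2(0)` trains as if it were unregularized.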


In [106]:
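# The last layer is linear, so the loss is told to expect logits (from_logits=True).
# Note: BinaryAccuracy compares the raw model output against the threshold. A logit
# of 0.8 is a probability of about 0.69; a logit threshold of 0.0 corresponds to 0.5.
model.compile(optimizer=opt, loss=keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=[keras.metrics.BinaryAccuracy(threshold=.8)])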
hist3 = model.fit(Xtrain, y_train, validation_data=(Xtest, y_test), epochs=2, 
                  batch_size=2048, verbose=2)

Epoch 1/2
13/13 - 47s - loss: 0.6838 - binary_accuracy: 0.5038 - val_loss: 0.6763 - val_binary_accuracy: 0.5030 - 47s/epoch - 4s/step
Epoch 2/2
13/13 - 2s - loss: 0.6666 - binary_accuracy: 0.5117 - val_loss: 0.6608 - val_binary_accuracy: 0.5159 - 2s/epoch - 173ms/step
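
Both epochs report an accuracy close to 0.5. One possible contributor, offered as a hypothesis rather than a diagnosis, is that the metric thresholds raw logits while the model is still near its initial state (two epochs with a small learning rate). The sketch below only shows how a threshold on the logit scale translates to the probability scale; `sigmoid` and `logit` are helper names defined here.

```python
import math

def sigmoid(z):
    """Map a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Map a probability to a logit (inverse of sigmoid)."""
    return math.log(p / (1.0 - p))

# Threshold 0.8 applied to logits, expressed as a probability
print(sigmoid(0.8))
# Logit thresholds matching probability thresholds of 0.5 and 0.8
print(logit(0.5))
print(logit(0.8))
```

Under this reading, a metric threshold of 0.0 would match the usual 0.5 probability cut-off for a model that outputs logits.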
