#### Data description
This dataset has three columns: label (party name), Twitter handle, and tweet text.


#### Problem Description:

Design a feed-forward deep neural network to predict the political party using PyTorch or TensorFlow.
Build two models:

1. Without using the handle

2. Using the handle


#### Deliverables

- Report the performance on the test set.

- Try multiple models with different hyperparameters. Present the results of each model on the test set. No need to create a dev set.

- Experiment with:
    - L2 and dropout regularization
    - SGD, RMSProp, and Adam optimizers



- Create a fixed-size vocabulary: give each word in your selected vocabulary a unique id and use these ids as the input to the network.

    - Option 1: Feed-forward networks can only handle fixed-size inputs, so you can take a fixed number K of words from the tweet text (e.g. the first K words or K randomly selected words). K can be a hyperparameter.

    - Option 2: choose the top N (e.g. N=1000) most frequent words in the dataset and use an N-sized input layer. If a word is present in a tweet, pass its id; otherwise pass 0.
    
    - Clearly state your design choices and assumptions, and think about the pros and cons of each option.
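To make the two options concrete, here is a minimal pure-Python sketch of the three encodings tried in this notebook (first K words, K random words, top-N presence slots) on a toy corpus. The helper names `first_k_ids`, `random_k_ids`, and `top_n_ids` and the toy tweets are hypothetical; the sketch assumes id 0 is reserved for padding and out-of-vocabulary words, and it mirrors the cycling fill used by `first_k_words` below rather than any library API.

```python
import random
from collections import Counter

PAD = 0  # id 0 is reserved for padding and out-of-vocabulary words

def first_k_ids(tokens, word_ids, k):
    """Option 1a: the first K tokens; a short tweet is filled by cycling
    through its tokens again. (This sketch pads an empty tweet with PAD.)"""
    if not tokens:
        return [PAD] * k
    return [word_ids.get(tokens[j % len(tokens)], PAD) for j in range(k)]

def random_k_ids(tokens, word_ids, k, rng):
    """Option 1b: K tokens sampled uniformly with replacement."""
    if not tokens:
        return [PAD] * k
    return [word_ids.get(rng.choice(tokens), PAD) for _ in range(k)]

def top_n_ids(tokens, top_terms):
    """Option 2: one slot per top-N term; slot i holds i + 1 if the term
    occurs in the tweet and PAD otherwise."""
    present = set(tokens)
    return [i + 1 if term in present else PAD for i, term in enumerate(top_terms)]

# Toy corpus standing in for the tokenized tweets
corpus = [["tax", "cut", "now"], ["climate", "bill", "now"], ["tax", "climate"]]
counts = Counter(t for tweet in corpus for t in tweet)
word_ids = {t: i + 1 for i, t in enumerate(sorted(counts))}  # ids 1..V
top_terms = [t for t, _ in counts.most_common(3)]            # N = 3

tweet = ["tax", "cut", "now"]
print(first_k_ids(tweet, word_ids, 5))   # [5, 3, 4, 5, 3]
print(random_k_ids(tweet, word_ids, 5, random.Random(0)))
print(top_n_ids(tweet, top_terms))       # [1, 2, 0]
```

First-K keeps word order but truncates long tweets and repeats short ones; random-K loses order and adds sampling noise; top-N gives every tweet the same slots regardless of length but ignores word order and every word outside the top N.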

 

<b> Tabulate your results, either at the end of the code file or in the text box on the submission page. The final result should have:</b>

1. Experiment description

2. Hyperparameters used and their values

3. Performance on the test set

 

### Imports

In [None]:
from keras import layers, losses
from keras.models import Sequential

import keras
import numpy as np
import os
import pandas as pd
import random
import tensorflow as tf

PATH = r"C:\Users\samue\Documents\Applied Data Science\INFO-H518 Deep Learning\Assignments\A3\Input"

In [2]:
# Grab the data (rows are shuffled once here; the validation splits below rely on it)
train = pd.read_pickle(os.path.join(PATH, 'train_tokenized.pickle')).dropna().sample(frac=1)
train_vocab = pd.read_csv(os.path.join(PATH, 'train_vocab_frequency.csv'), index_col=0).dropna()
test = pd.read_pickle(os.path.join(PATH, 'test_tokenized.pickle')).dropna().sample(frac=1)
test_vocab = pd.read_csv(os.path.join(PATH, 'test_vocab_frequency.csv'), index_col=0).dropna()
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
vocab = pd.concat([train_vocab, test_vocab]).reset_index()

# Change Party/Handle to categorical codes. The categories are fitted on the
# training set and reused for the test set, so the same party/handle gets the
# same code in both sets.
party_categories = pd.Categorical(train.Party).categories
train['Party'] = pd.Categorical(train.Party, categories=party_categories).codes
test['Party'] = pd.Categorical(test.Party, categories=party_categories).codes

# Handle codes start at 1; 0 is left for test handles never seen in training
handle_categories = pd.Categorical(train.Handle).categories
train['Handle'] = pd.Categorical(train.Handle, categories=handle_categories).codes + 1
test['Handle'] = pd.Categorical(test.Handle, categories=handle_categories).codes + 1

# Remove 10% of the vocabulary (specifically infrequent terms). This drops the
# first 10% of training-vocabulary rows, assuming the frequency CSV is sorted
# by ascending frequency.
vocab_cut = int(train_vocab.shape[0] * (10 / 100))
vocab = vocab.iloc[vocab_cut:].drop_duplicates('Terms').reset_index().drop(columns=['index', 'level_0'])
# Ids start at 1 so that 0 stays free for padding / out-of-vocabulary words
vocab = {v: k + 1 for k, v in vocab.to_dict()['Terms'].items()}

# Provide each term a unique id
train_words_i = {v: vocab[v] for v in train_vocab['Terms'] if v in vocab}
test_words_i = {v: vocab[v] for v in test_vocab['Terms'] if v in vocab}
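The cell above reads precomputed frequency tables and relies on their row order. As a hedged, framework-free sketch of the same two ideas (drop the rarest terms and reserve id 0; fit category codes on train only and reuse them for test), the hypothetical helpers below work from raw token lists instead of the CSV files; the toy tweets and party labels are made up for illustration.

```python
from collections import Counter

def build_vocab(token_lists, drop_fraction=0.10):
    """Map terms to ids 1..V, most frequent first, after dropping the rarest
    `drop_fraction` of distinct terms. Id 0 is left free for padding/OOV."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    ranked = [t for t, _ in counts.most_common()]   # ties keep first-seen order
    keep = len(ranked) - int(len(ranked) * drop_fraction)
    return {t: i + 1 for i, t in enumerate(ranked[:keep])}

def encode_categories(train_values, test_values):
    """Fit category codes on the training values only and reuse them for test;
    a value never seen in training gets -1."""
    codes = {v: i for i, v in enumerate(sorted(set(train_values)))}
    return [codes[v] for v in train_values], [codes.get(v, -1) for v in test_values]

tweets = [["gop", "tax", "vote"], ["dem", "vote", "tax"], ["vote", "rally"]]
print(build_vocab(tweets, drop_fraction=0.2))
# {'vote': 1, 'tax': 2, 'gop': 3, 'dem': 4}   ('rally', a least frequent term, is dropped)
print(encode_categories(["R", "D", "R"], ["D", "I"]))
# ([1, 0, 1], [0, -1])
```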

## Model #1: Select the first K words from the Tweet

In [4]:
def first_k_words(words_i: dict, data: pd.DataFrame, k: int):
    """Yield (ids, label) pairs built from the first K tokens of each tweet.

    Out-of-vocabulary tokens become 0, a tweet shorter than K is filled by
    cycling back through its tokens, and a tweet with no tokens is skipped.
    """
    for _, row in data.iterrows():
        tokens = row['Tokens']
        if len(tokens) == 0:
            continue
        terms = [words_i.get(tokens[j % len(tokens)], 0) for j in range(k)]
        x = np.array(terms, dtype=np.float32)
        y = row['Party']
        yield (x, y)

### M1 | Build the Data

In [5]:
# Parameters
epochs = 5
embedding_dim = 32
k = 20

# Train
print(" Build Training Data...\n[#", end='')
train_data, train_labels = [], []
for x, y in first_k_words(train_words_i, train, k):
    train_data.append(x)
    train_labels.append(y)
    if len(train_labels) % int(train.shape[0]*.1) == 0: 
        print(f"##", end='')
train_labels = tf.convert_to_tensor(train_labels, dtype=tf.float32)
train_data = tf.cast(train_data, dtype=tf.float32)
print("#]")

# Validation
split = round(train_data.shape[0]*.8)
valid_data, valid_labels = train_data[split:], train_labels[split:]
train_data, train_labels = train_data[:split], train_labels[:split]

# Test
print(" Build Testing Data...\n[#", end='')
test_data, test_labels = [], []
for x, y in first_k_words(test_words_i, test, k):
    test_data.append(x)
    test_labels.append(y)
    if len(test_labels) % int(test.shape[0]*.1) == 0: 
        print(f"##", end='')
    
test_labels = tf.convert_to_tensor(test_labels, dtype=tf.float32)
test_data = tf.cast(test_data, dtype=tf.float32)
print("#]")

(train_data[0], train_labels[0])

 Build Training Data...
[####################]
 Build Testing Data...
[####################]


(<tf.Tensor: shape=(20,), dtype=float32, numpy=
 array([63422., 63869., 63120., 63874., 63647., 61277., 63746., 63703.,
        63784., 63871., 63863., 63725., 63442., 63817., 63858., 63822.,
        62282., 63854., 63422., 63869.], dtype=float32)>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.0>)

### M1 | Create the model

In [6]:
# Structure
k_words = Sequential([
    layers.Embedding(len(vocab)+1, embedding_dim),
    layers.Dropout(0.1),
    layers.GlobalAveragePooling1D(),
    layers.Dropout(0.2),
    layers.Dense(1)
])

k_words.compile(
    loss=losses.BinaryCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(
        learning_rate=0.002,
        beta_1=0.8,
        beta_2=0.999,
        epsilon=1e-07,
        amsgrad=False,
        name='Adam'
    ),
    metrics=tf.metrics.BinaryAccuracy(threshold=0.0)
)
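The final `Dense(1)` has no activation, so the model outputs a raw logit z. That is why the loss uses `from_logits=True` and `BinaryAccuracy` uses `threshold=0.0`: sigmoid(z) > 0.5 exactly when z > 0. The pure-Python sketch below walks one toy example through embedding lookup, average pooling, and the dense layer, and computes binary cross-entropy straight from the logit in its usual numerically stable form. The function names and toy weights are hypothetical and only illustrate the arithmetic, not Keras internals.

```python
import math

def forward(ids, emb, w, b):
    """Embedding lookup -> average over positions -> Dense(1); returns a raw logit."""
    dim = len(emb[0])
    pooled = [sum(emb[i][d] for i in ids) / len(ids) for d in range(dim)]
    return sum(p * wd for p, wd in zip(pooled, w)) + b

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_logits(y, z):
    """Binary cross-entropy computed directly from a logit z, in the numerically
    stable form max(z, 0) - z*y + log(1 + exp(-|z|))."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

emb = [[0.0, 0.0], [1.0, -1.0], [0.5, 0.5]]   # toy 3-word embedding table, row 0 = padding
w, b = [2.0, 1.0], -0.5                        # toy Dense(1) weights
z = forward([1, 2, 2], emb, w, b)              # pooled = [2/3, 0.0] -> z = 5/6
print(round(z, 4), z > 0.0, sigmoid(z) > 0.5)  # 0.8333 True True
print(round(bce_from_logits(1, z), 4))
```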

### M1 | Training

In [7]:
history = k_words.fit(
    train_data,
    train_labels,
    validation_data=(valid_data, valid_labels),
    epochs=epochs
)
k_words.save(r'C:\Users\samue\Documents\Applied Data Science\INFO-H518 Deep Learning\Assignments\A3\Models\M1')

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
INFO:tensorflow:Assets written to: C:\Users\samue\Documents\Applied Data Science\INFO-H518 Deep Learning\Assignments\A3\Models\M1\assets


### M1 | Testing

In [8]:
loss, accuracy = k_words.evaluate(test_data, test_labels)
print(f"Loss: {loss} \nAccuracy: {accuracy}")

Loss: 0.5386927723884583 
Accuracy: 0.7904887199401855


Use the FIRST K words of a tweet to predict the author's political party
(if a tweet has fewer than K words, the feature vector is filled to length K by cycling through its words again)

Epochs = 5 

K = 20 

Embedding_dim = 32 

Optimizer = Adam, lr = 0.002, b1 = 0.8 

Loss (binary cross-entropy) = 0.5387

Accuracy = 0.7905

## Model #2: Select K random words from the tweet

In [9]:
def random_k_words(words_i:dict, data: pd.DataFrame, k: int):
    for i, row in data.iterrows():
        terms = []
        for _ in range(k):
            if len(row['Tokens']) > 0:
                token = random.choice(row['Tokens'])
                val = words_i[token] \
                    if token in words_i.keys() \
                    else np.float32(0)
                terms.append(val)
            else:
                terms = [np.float32(0) for _ in range(k)]
                break
        x = np.array(terms, dtype=np.float32)
        y = row['Party']
        yield (x, y)

### M2 | Build the data (keep the parameters the same)

In [10]:
# Parameters
epochs = 5
embedding_dim = 32
k = 20

# Train
train_data, train_labels = [], []
for x, y in random_k_words(train_words_i, train, k):
    train_data.append(x)
    train_labels.append(y)

train_labels = tf.convert_to_tensor(train_labels, dtype=tf.float32)
train_data = tf.cast(train_data, dtype=tf.float32)

# Validation
split = round(train_data.shape[0]*.8)
valid_data, valid_labels = train_data[split:], train_labels[split:]
train_data, train_labels = train_data[:split], train_labels[:split]

# Test
test_data, test_labels = [], []
for x, y in random_k_words(test_words_i, test, k):
    test_data.append(x)
    test_labels.append(y)
    
test_labels = tf.convert_to_tensor(test_labels, dtype=tf.float32)
test_data = tf.cast(test_data, dtype=tf.float32)
(train_data[0], train_labels[0])

(<tf.Tensor: shape=(20,), dtype=float32, numpy=
 array([62282., 63822., 63817., 63854., 63647., 63869., 63422., 63422.,
        63746., 63442., 63442., 63874., 63863., 63422., 61277., 63869.,
        63863., 61277., 63120., 62282.], dtype=float32)>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.0>)

### M2 | Keep the structure the same so the K-selection methods can be compared

In [11]:
# Structure
kr_words = Sequential([
    layers.Embedding(len(vocab)+1, embedding_dim),
    layers.Dropout(0.1),
    layers.GlobalAveragePooling1D(),
    layers.Dropout(0.2),
    layers.Dense(1)
])

kr_words.compile(
    loss=losses.BinaryCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(
        learning_rate=0.002,
        beta_1=0.8,
        beta_2=0.999,
        epsilon=1e-07,
        amsgrad=False,
        name='Adam'
    ),
    metrics=tf.metrics.BinaryAccuracy(threshold=0.0)
)

### M2 | Training

In [12]:
history = kr_words.fit(
    train_data,
    train_labels,
    validation_data=(valid_data, valid_labels),
    epochs=epochs
)
kr_words.save(r'C:\Users\samue\Documents\Applied Data Science\INFO-H518 Deep Learning\Assignments\A3\Models\M2')

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
INFO:tensorflow:Assets written to: C:\Users\samue\Documents\Applied Data Science\INFO-H518 Deep Learning\Assignments\A3\Models\M2\assets


### M2 | Testing

In [13]:
loss, accuracy = kr_words.evaluate(test_data, test_labels)
print(f"Loss: {loss} \nAccuracy: {accuracy}")

Loss: 0.6418583393096924 
Accuracy: 0.7323328256607056


Use K words sampled at random (with replacement) from a tweet to predict the author's political party
(a tweet with no tokens becomes an all-zero vector)

Epochs = 5 

K = 20 

Embedding_dim = 32 

Optimizer = Adam, lr = 0.002, b1 = 0.8 

Loss (binary cross-entropy) = 0.6419

Accuracy = 0.7323

## Model #3: N Most Frequent Words

In [29]:
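def n_chosen_words(words_i: dict, data: pd.DataFrame, n: int):
    """Yield (ids, label) pairs with one input slot per top-N training term.

    Slot i holds i + 1 if the i-th most frequent term occurs in the tweet, else 0.
    NOTE: `words_i` is ignored on purpose. The top-N list always comes from
    `train_words_i` (ordered from least to most frequent, as in the frequency
    CSV) so that train and test features use the same slots.
    """
    top_n = list(train_words_i.keys())[::-1][:n]   # the n most frequent terms
    slot = {term: i + 1 for i, term in enumerate(top_n)}
    for _, row in data.iterrows():
        present = set(row['Tokens'])
        x = np.array([slot[t] if t in present else 0 for t in top_n], dtype=np.float32)
        y = row['Party']
        yield (x, y)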

### M3 | Build the data

In [34]:
# Parameters
epochs = 10
embedding_dim = 64
n = 1024

# Train
print(" Build Training Data...\n[#", end='')
train_data, train_labels = [], []
for x, y in n_chosen_words(train_words_i, train, n):
    train_data.append(x)
    train_labels.append(y)
    if len(train_labels) % int(train.shape[0]*.1) == 0: 
        print(f"##", end='')
train_labels = tf.convert_to_tensor(train_labels, dtype=tf.float32)
train_data = tf.convert_to_tensor(train_data, dtype=tf.float32)
print("#]")

# Validation
split = round(train_data.shape[0]*.8)
valid_data, valid_labels = train_data[split:], train_labels[split:]
train_data, train_labels = train_data[:split], train_labels[:split]

# Test
test_data, test_labels = [], []
print(" Build Testing Data...\n[#", end='')
for x, y in n_chosen_words(test_words_i, test, n):
    test_data.append(x)
    test_labels.append(y)
    if len(test_labels) % int(test.shape[0]*.1) == 0: 
        print(f"##", end='')
test_labels = tf.convert_to_tensor(test_labels, dtype=tf.float32)
test_data = tf.cast(test_data, dtype=tf.float32)
print("#]")

(train_data[0], train_labels[0])

 Build Training Data...
[######################]
 Build Testing Data...
[######################]


(<tf.Tensor: shape=(1023,), dtype=float32, numpy=array([0., 2., 0., ..., 0., 0., 0.], dtype=float32)>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.0>)

### M3 | Create the model

In [35]:
# Structure
choose_n = Sequential([
    layers.Embedding(n+1, embedding_dim),
    layers.Dropout(0.1),
    layers.GlobalAveragePooling1D(),
    layers.Dropout(0.2),
    layers.Dense(1)
])

choose_n.compile(
    loss=losses.BinaryCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(
        learning_rate=0.002,
        beta_1=0.8,
        beta_2=0.999,
        epsilon=1e-07,
        amsgrad=False,
        name='Adam'
    ),
    metrics=tf.metrics.BinaryAccuracy(threshold=0.0)
)

### M3 | Training

In [36]:
history = choose_n.fit(
    train_data,
    train_labels,
    validation_data=(valid_data, valid_labels),
    epochs=epochs
)
choose_n.save(r'C:\Users\samue\Documents\Applied Data Science\INFO-H518 Deep Learning\Assignments\A3\Models\M3')

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
INFO:tensorflow:Assets written to: C:\Users\samue\Documents\Applied Data Science\INFO-H518 Deep Learning\Assignments\A3\Models\M3\assets


### M3 | Testing

In [37]:
loss, accuracy = choose_n.evaluate(test_data, test_labels)
print(f"Loss: {loss} \nAccuracy: {accuracy}")

Loss: 0.5777549147605896 
Accuracy: 0.6708436608314514


Use the N most frequent training words as fixed input positions: a position holds the word's id within the top-N list if the word appears in the tweet, and 0 otherwise. Use this feature vector to predict the author's political party.

Epochs = 10

N = 1024

Embedding_dim = 64

Optimizer = Adam, lr = 0.002, b1 = 0.8

Loss (binary cross-entropy) = 0.5778

Accuracy = 0.6708
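One plausible reason (not verified in this notebook) why this representation trails Model #1: almost every one of the N positions holds 0, so `GlobalAveragePooling1D` averages mostly copies of the embedding row for id 0 and the few present words are diluted. Keras offers `Embedding(..., mask_zero=True)`, and `GlobalAveragePooling1D` accepts a mask and excludes masked steps from the average. The hypothetical pure-Python `average_pool` below shows the size of the effect on a toy input with N = 10.

```python
def average_pool(ids, emb, mask_zero=False):
    """Mean of the embedding rows selected by `ids`. With mask_zero=True,
    positions holding id 0 are left out of the mean."""
    rows = [emb[i] for i in ids if not (mask_zero and i == 0)]
    if not rows:                       # every position masked
        return [0.0] * len(emb[0])
    return [sum(col) / len(rows) for col in zip(*rows)]

# Toy top-N input with N = 10: only the 2nd and 5th top terms occur in the tweet
ids = [0, 2, 0, 0, 5, 0, 0, 0, 0, 0]
emb = [[float(i), -float(i)] for i in range(11)]   # toy table with N + 1 rows
emb[0] = [1.0, 1.0]                                # the learned "absent" vector

print(average_pool(ids, emb))                  # [1.5, 0.1]: dominated by row 0
print(average_pool(ids, emb, mask_zero=True))  # [3.5, -3.5]: only the present words
```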

## Model #4: K Random Words with Handle

In [38]:
def handle_random_k_words(words_i:dict, data: pd.DataFrame, k: int):
    for i, row in data.iterrows():
        terms = []
        for _ in range(k):
            if len(row['Tokens']) > 0:
                token = random.choice(row['Tokens'])
                val = words_i[token] \
                    if token in words_i.keys() \
                    else np.float32(0)
                terms.append(val)
            else:
                terms = [np.float32(0) for _ in range(k)]
                break
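        # NOTE: the handle code is appended as one more id for the same Embedding,
        # so handle c and the word whose id is c share an embedding row.
        x = np.array(terms + [np.float32(row['Handle'])], dtype=np.float32)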
        y = row['Party']
        yield (x, y)

### M4 | Build the data

In [39]:
# Parameters
epochs = 10
embedding_dim = 256
k = 256

# Train
print(" Build Training Data...\n[#", end='')
train_data, train_labels = [], []
for x, y in handle_random_k_words(train_words_i, train, k):
    train_data.append(x)
    train_labels.append(y)
    if len(train_labels) % int(train.shape[0]*.1) == 0: 
        print(f"##", end='')
train_labels = tf.convert_to_tensor(train_labels, dtype=tf.float32)
train_data = tf.cast(train_data, dtype=tf.float32)
print("#]")

# Validation
split = round(train_data.shape[0]*.8)
valid_data, valid_labels = train_data[split:], train_labels[split:]
train_data, train_labels = train_data[:split], train_labels[:split]

# Test
print(" Build Testing Data...\n[#", end='')
test_data, test_labels = [], []
for x, y in handle_random_k_words(test_words_i, test, k):
    test_data.append(x)
    test_labels.append(y)
    if len(test_labels) % int(test.shape[0]*.1) == 0: 
        print(f"##", end='')
test_labels = tf.convert_to_tensor(test_labels, dtype=tf.float32)
test_data = tf.cast(test_data, dtype=tf.float32)
print("#]")

(train_data[0], train_labels[0])

 Build Training Data...
[######################]
 Build Testing Data...
[######################]


(<tf.Tensor: shape=(257,), dtype=float32, numpy=
 array([63854., 63858., 63871., 63874., 62282., 63120., 63120., 63647.,
        63647., 63120., 63854., 63746., 61277., 63817., 63725., 63784.,
        63120., 63784., 63120., 63703., 63854., 63647., 61277., 63863.,
        63120., 63822., 63422., 63871., 62282., 63854., 63725., 63647.,
        61277., 63422., 63647., 63422., 63871., 63854., 63442., 63858.,
        63869., 63746., 62282., 63442., 63874., 63725., 63858., 61277.,
        63871., 62282., 63863., 63784., 63703., 63822., 63746., 63120.,
        63863., 63854., 63784., 63422., 63703., 63817., 63746., 63874.,
        63746., 63874., 63871., 63422., 62282., 63817., 62282., 63858.,
        63120., 61277., 63822., 63120., 63858., 63869., 63703., 63647.,
        63120., 61277., 63863., 63869., 63725., 63869., 63784., 63817.,
        63854., 63817., 61277., 63858., 61277., 63817., 63725., 63854.,
        61277., 63422., 63817., 63817., 63784., 63647., 63422., 63422.,
        63858.,

### M4 | Create the model

In [40]:
# Structure
handle_kr_words = Sequential([
    layers.Embedding(len(vocab)+1, embedding_dim),
    layers.Dropout(0.2),
    layers.GlobalAveragePooling1D(),
    layers.Dense(1)
])

handle_kr_words.compile(
    loss=losses.BinaryCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(
        learning_rate=0.001,
        beta_1=0.85,
        beta_2=0.990,
        epsilon=1e-05,
        amsgrad=False,
        name='Adam'
    ),
    metrics=tf.metrics.BinaryAccuracy(threshold=0.0)
)

### M4 | Training

In [41]:
history = handle_kr_words.fit(
    train_data,
    train_labels,
    validation_data=(valid_data, valid_labels),
    epochs=epochs,
    batch_size=64
)
handle_kr_words.save(r'C:\Users\samue\Documents\Applied Data Science\INFO-H518 Deep Learning\Assignments\A3\Models\M4')

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
INFO:tensorflow:Assets written to: C:\Users\samue\Documents\Applied Data Science\INFO-H518 Deep Learning\Assignments\A3\Models\M4\assets


### M4 | Testing

In [42]:
loss, accuracy = handle_kr_words.evaluate(test_data, test_labels)
print(f"Loss: {loss} \nAccuracy: {accuracy}")

Loss: 0.7322496175765991 
Accuracy: 0.7758268713951111


## Model #5: First K Words with Handle

In [71]:
def handle_fk_words(words_i:dict, data: pd.DataFrame, k: int):
    for i, row in data.iterrows():
        terms, token_count = [], len(row['Tokens'])
        if token_count > 0:
            j = 0
            while len(terms) < k:
                if row['Tokens'][j] in words_i.keys():
                    terms.append(words_i[row['Tokens'][j]])
                else:
                    terms.append(np.float32(0))
                j = j+1 if j+1 < token_count else 0
        else:
            continue
        x = np.array(terms+[np.float32(row['Handle'])], dtype=np.float32)
        y = row['Party']
        yield (x, y)

In [7]:
train_hands = train.rename(columns={'Handle':'Terms'})['Terms'].unique()
train_hands

array([408, 415,  36,  52, 230, 337, 331, 418, 367, 228, 121,  74, 324,
       255, 205,  98, 283, 220, 134, 222,  10, 240, 333, 139, 160,  93,
       417, 243, 391,  50, 247, 236, 290,   3, 298, 182, 378, 244, 128,
       270, 421, 289,  63,  21, 322, 354,  40, 214, 335, 392, 149, 286,
       288, 225, 280, 365, 241, 217, 159, 143, 138, 406, 261,  60,  12,
       302, 178, 104,  35, 154, 284, 266, 265, 133, 193,  66, 151, 248,
       136, 390, 282, 400,  91, 371,  19, 328,  47, 140, 314, 281, 357,
       109, 350, 430,  49, 315,  94, 277, 377, 207, 211,  20, 124, 410,
       148, 300, 113, 344, 258,   4,  55,  70, 239, 276,  13, 338, 112,
        26, 419, 368, 242,  81, 231, 273, 330, 353, 246, 147, 360, 183,
       101, 186,  15,  34,  72, 278, 166, 385, 105, 122, 257,   2,  71,
        68,  65, 201, 126,  29, 363,  24, 414, 125, 170, 127,  43, 346,
        84, 299, 108,  11, 249, 413, 123, 309, 224, 342,  25, 194, 210,
       422, 389,  22, 187,  87, 411, 326, 252, 374, 232, 293, 31

In [72]:
# Parameters
epochs = 5
embedding_dim = 32
k = 10

# # Remove 10% of the vocabulary (specifically infrequent terms)
train_hands = train['Handle'].unique()  # handle codes seen in training (not used below)
# vocab = train_vocab.append(test_vocab).append(train['Handle']).append(test['Handle'].reset_index()
# vocab_cut = int(train_vocab.shape[0] * (10 / 100))
# vocab = vocab.iloc[vocab_cut:].drop_duplicates('Terms').reset_index().drop(columns=['index', 'level_0'])
# vocab = {v:k for k, v in vocab.to_dict()['Terms'].items()}

# # Provide each term a unique id
# train_words_i = {v:vocab[v] for v in list(train_vocab.to_dict()['Terms'].values()) if v in vocab.keys()}
# test_words_i = {v:vocab[v] for v in list(test_vocab.to_dict()['Terms'].values()) if v in vocab.keys()}

# Train
print(" Build Training Data...\n[#", end='')
train_data, train_labels = [], []
for x, y in handle_fk_words(train_words_i, train, k):
    train_data.append(x)
    train_labels.append(y)
    if len(train_labels) % int(train.shape[0]*.1) == 0: 
        print(f"##", end='')
train_labels = tf.convert_to_tensor(train_labels, dtype=tf.float32)
train_data = tf.cast(train_data, dtype=tf.float32)
print("#]")

# Validation
split = round(train_data.shape[0]*.8)
valid_data, valid_labels = train_data[split:], train_labels[split:]
train_data, train_labels = train_data[:split], train_labels[:split]

# Test
print(" Build Testing Data...\n[#", end='')
test_data, test_labels = [], []
for x, y in handle_fk_words(test_words_i, test, k):
    test_data.append(x)
    test_labels.append(y)
    if len(test_labels) % int(test.shape[0]*.1) == 0: 
        print(f"##", end='')
test_labels = tf.convert_to_tensor(test_labels, dtype=tf.float32)
test_data = tf.cast(test_data, dtype=tf.float32)
print("#]")

(train_data[0], train_labels[0])

 Build Training Data...
[####################]
 Build Testing Data...
[####################]


(<tf.Tensor: shape=(11,), dtype=float32, numpy=
 array([63422., 63869., 63120., 63874., 63647., 61277., 63746., 63703.,
        63784., 63871.,   333.], dtype=float32)>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.0>)

In [75]:
# Structure
handle_k_words = Sequential([
    layers.Embedding(len(vocab)+1, embedding_dim),
    layers.Dropout(0.1),
    layers.GlobalAveragePooling1D(),
    layers.Dropout(0.2),
    layers.Dense(1)
])

handle_k_words.compile(
    loss=losses.BinaryCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(
        learning_rate=0.001,
        beta_1=0.95,
        beta_2=0.999,
        epsilon=1e-07,
        amsgrad=False,
        name='Adam'
    ),
    metrics=tf.metrics.BinaryAccuracy(threshold=0.0)
)

In [76]:
history = handle_k_words.fit(
    train_data,
    train_labels,
    validation_data=(valid_data, valid_labels),
    epochs=epochs
)
handle_k_words.save(r'C:\Users\samue\Documents\Applied Data Science\INFO-H518 Deep Learning\Assignments\A3\Models\M5')

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
INFO:tensorflow:Assets written to: C:\Users\samue\Documents\Applied Data Science\INFO-H518 Deep Learning\Assignments\A3\Models\M4\assets


In [70]:
loss, accuracy = handle_k_words.evaluate(test_data, test_labels)
print(f"Loss: {loss} \nAccuracy: {accuracy}")

Loss: 2.704810857772827 
Accuracy: 0.571407675743103


### MODEL 1:

Use the FIRST K words of a tweet to predict the author's political party
(if a tweet has fewer than K words, the feature vector is filled to length K by cycling through its words again)

Epochs = 5 

K = 20 

Embedding_dim = 32 

Optimizer = 'adam', lr = 0.002, b1 = 0.8 

Loss = 0.6934

Accuracy = 0.5060

### MODEL 2:

Use K words sampled at random (with replacement) from a tweet to predict the author's political party
(a tweet with no tokens becomes an all-zero vector)

Epochs = 5 

K = 20 

Embedding_dim = 32 

Optimizer = 'adam', lr = 0.002, b1 = 0.8 

Loss (binary cross-entropy) = 0.6932

Accuracy = 0.5060

### MODEL 3:
Use N words from the vocabulary as fixed input positions, each set to 0 unless the word appears in the tweet. Use this feature vector to predict the author's political party.

Epochs = 10

N = 1000

Embedding_dim = 32

Optimizer = 'adam', lr = 0.002, b1 = 0.8 

Loss = 0.6931

Accuracy = 0.5060

### MODEL 4: TAKE 1
Use K random words PLUS the user's handle to predict the user's political party.
Additionally, we randomly sample K words and draw a new word whenever the sampled word is not in the vocabulary, instead of using 0 as the token.
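In `handle_random_k_words` and `handle_fk_words`, the handle code is appended to the word ids and looked up in the same `Embedding` table, so a handle and the word that happens to have the same id share one embedding row; the handle is also only one of K + 1 positions in the average. A minimal sketch of one way to keep the two id ranges apart, using a hypothetical helper `append_handle`: shift handle codes past the word ids and size the Embedding's `input_dim` to cover both ranges. (An alternative, not shown, is a separate handle input with its own Embedding, concatenated with the pooled text features via the Keras functional API.)

```python
def append_handle(word_ids, handle, handle_codes, vocab_size):
    """Hypothetical helper: append the handle as one extra id placed *after*
    the word-id range. Word ids use 1..vocab_size (0 = padding), so handle code
    c becomes vocab_size + 1 + c; a handle unseen in training becomes 0."""
    code = handle_codes.get(handle)
    handle_id = 0 if code is None else vocab_size + 1 + code
    return list(word_ids) + [handle_id]

vocab_size = 5                                   # toy word vocabulary: ids 1..5
handle_codes = {"@rep_a": 0, "@sen_b": 1}        # codes fitted on training handles
input_dim = vocab_size + 1 + len(handle_codes)   # Embedding(input_dim, ...) -> 8

print(append_handle([3, 4], "@sen_b", handle_codes, vocab_size))   # [3, 4, 7]
print(append_handle([3, 4], "@new_c", handle_codes, vocab_size))   # [3, 4, 0]
```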

Epoch = 10

K = 64 

Embedding_dim = 128 

Optimizer = 'adam', lr = 0.001, b1 = 0.85

Loss: 1.470948338508606 

Accuracy: 0.5401427745819092


### MODEL 4: TAKE 2
We are clearly overfitting: our training accuracy and loss are drastically different from our test results.
Let's try increasing the dropout.

Doubled the dropout rate in D1 from 0.1 to 0.2.

Doubled the dropout rate in D2 from 0.2 to 0.4.
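Keras `Dropout` layers use inverted dropout: during training each unit is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate), so the expected activation is unchanged; at inference the layer does nothing. The pure-Python sketch below (hypothetical function `dropout`, not the Keras implementation) shows that behavior at the 0.4 rate tried here for D2.

```python
import random

def dropout(values, rate, rng, training=True):
    """Inverted dropout: during training each value is zeroed with probability
    `rate` and the survivors are scaled by 1 / (1 - rate), so the expected
    output equals the input. At inference the values pass through unchanged."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(42)
out = dropout([1.0] * 100_000, rate=0.4, rng=rng)   # the D2 rate tried in Take 2
print(round(sum(out) / len(out), 2))                # mean stays close to 1.0
print(round(out.count(0.0) / len(out), 2))          # about 40% of units dropped
```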

RESULTS:

Loss: 1.4692225456237793 

Accuracy: 0.5401427745819092

### MODEL 4: TAKE 3
Since increasing dropout didn't help, let's try decreasing it.

Removed D2.

RESULTS:

Train: Loss: 0.4890 | Accuracy: 0.6533

Test: Loss: 1.2920 | Accuracy: 0.5496

### MODEL 4: TAKE 4
Let's switch to SGD, a simpler optimizer than Adam. We use the same learning rate and, initially, 0.0 momentum, just to get a baseline.

RESULT:

Although SGD did solve our overfitting problem, it also hurt test performance: accuracy was only 0.0506.

Unfortunately, I couldn't get RMSProp to work correctly on either the GPU (CUDA) or the CPU, so instead I checked whether test accuracy keeps improving as the sample size K increases while using Adam.
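The assignment asks for SGD, RMSProp, and Adam together with L2 regularization. As a framework-free illustration (not the notebook's code), the sketch below applies the textbook update rules to a one-parameter toy loss (w - 3)^2 plus an L2 penalty LAM * w^2; the penalty moves the minimum from 3 to 3 / (1 + LAM). The names `grad`, `sgd`, `rmsprop`, `adam`, and the value of `LAM` are hypothetical. In Keras the corresponding pieces are `keras.optimizers.SGD`, `keras.optimizers.RMSprop`, `keras.optimizers.Adam`, and `kernel_regularizer=keras.regularizers.l2(...)` on a layer.

```python
import math

LAM = 0.1  # L2 strength (hypothetical value for the toy problem)

def grad(w: float) -> float:
    """Gradient of the toy loss (w - 3)^2 + LAM * w^2 (squared error + L2 penalty)."""
    return 2.0 * (w - 3.0) + 2.0 * LAM * w

def sgd(w: float, lr: float = 0.1, steps: int = 200) -> float:
    for _ in range(steps):
        w -= lr * grad(w)                      # plain gradient step, no momentum
    return w

def rmsprop(w: float, lr: float = 0.01, rho: float = 0.9, eps: float = 1e-7, steps: int = 2000) -> float:
    s = 0.0                                    # moving average of squared gradients
    for _ in range(steps):
        g = grad(w)
        s = rho * s + (1 - rho) * g * g
        w -= lr * g / (math.sqrt(s) + eps)     # step scaled by the gradient's recent RMS
    return w

def adam(w: float, lr: float = 0.01, b1: float = 0.9, b2: float = 0.999, eps: float = 1e-7, steps: int = 2000) -> float:
    m = v = 0.0                                # first and second moment estimates
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)              # bias correction for the zero start
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

target = 3.0 / (1.0 + LAM)                     # minimum of the regularized loss
for name, opt in [("SGD", sgd), ("RMSProp", rmsprop), ("Adam", adam)]:
    print(f"{name:8s} w = {opt(0.0):.4f}   target = {target:.4f}")
```

Setting `LAM = 0` moves every optimizer's answer back to 3, which is the shrinkage effect L2 regularization is meant to have on the weights.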

RESULT:

By increasing k to 256, 

decreasing lr to 0.001,

decreasing b2 to 0.990,

and increasing epsilon to 1e-5 (up from 1e-7)

We were able to get

Loss: 0.8411399126052856 

Accuracy: 0.5572636127471924
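The deliverables ask for a results table (experiment, hyperparameters, test performance). A small sketch that renders one as Markdown follows; the rows are copied from the test-cell outputs recorded in this notebook (rounded to four decimals), which differ from some of the MODEL summaries above that appear to come from other runs. The helper `to_markdown` is hypothetical.

```python
# Rows copied from the test-cell outputs recorded in this notebook
results = [
    ("M1: first K words", "K=20, emb=32, epochs=5, Adam lr=0.002 b1=0.8, dropout 0.1/0.2", 0.5387, 0.7905),
    ("M2: K random words", "K=20, emb=32, epochs=5, Adam lr=0.002 b1=0.8, dropout 0.1/0.2", 0.6419, 0.7323),
    ("M3: top-N words", "N=1024, emb=64, epochs=10, Adam lr=0.002 b1=0.8, dropout 0.1/0.2", 0.5778, 0.6708),
    ("M4: K random words + handle", "K=256, emb=256, epochs=10, batch=64, Adam lr=0.001 b1=0.85 b2=0.99 eps=1e-5, dropout 0.2", 0.7322, 0.7758),
    ("M5: first K words + handle", "K=10, emb=32, epochs=5, Adam lr=0.001 b1=0.95, dropout 0.1/0.2", 2.7048, 0.5714),
]

def to_markdown(rows):
    """Render (experiment, hyperparameters, loss, accuracy) rows as a Markdown table."""
    lines = ["| Experiment | Hyperparameters | Test loss | Test accuracy |",
             "|---|---|---|---|"]
    for name, params, loss, acc in rows:
        lines.append(f"| {name} | {params} | {loss:.4f} | {acc:.4f} |")
    return "\n".join(lines)

print(to_markdown(results))
```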