<a href="https://colab.research.google.com/github/ivyclare/Project-50_Projects_In_Deep_Learning/blob/master/IMDB_Federated_Learning_With_LSTM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Complete Beginners Guide to Federated Learning With LSTM 
on Movie Reviews Dataset

We are going to implement this in the following steps:
- Create  devices (Virtual Workers)
- Distribute our data to those devices
- Create our model
- Send our model to the devices (Cause our model is located in our computer , while the data is located in their machines)
- Do normal Training
-Get the smarter model back from devices


In [0]:
from google.colab import drive
drive.mount('/content/drive')
# Change To The WORKING DIRECTORY
%cd /content/drive/My Drive/Colab Notebooks/PrivateAI
!pwd

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/My Drive/Colab Notebooks/PrivateAI
/content/drive/My Drive/Colab Notebooks/PrivateAI


## Installing Pysyft

Pysyft is an extension of Pytorch that is needed inorder to perform Federated Learning. Since we are using Google Colab for this project, we only need to run the command below to install Pysyft and we are good to go. 

If you are not using Google Colab, please follow the instructions here , to set up your environment.

In [0]:
!pip install syft



## Import Required Libraries


The next step is to import the required libraries and hook Pytorch using  *sy.TorchHook* which makes the extended extended functions on Pytorch tensors available to us.

In [0]:
import torch
import syft as sy
hook = sy.TorchHook(torch)


W0820 14:58:35.139327 140301770483584 secure_random.py:26] Falling back to insecure randomness since the required custom op could not be found for the installed version of TensorFlow. Fix this by compiling custom ops. Missing file was '/usr/local/lib/python3.6/dist-packages/tf_encrypted/operations/secure_random/secure_random_module_tf_1.14.0.so'
W0820 14:58:35.161387 140301770483584 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tf_encrypted/session.py:26: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.



## Creating New Workers

As earlier mentioned, in order to perform Federated Learning we need to have data on more different devices and we have to send our model to those devices where the training will be done. Hence, we need to create 2 devices. We will assume the person with the first device is called Bob and the second device called Alice. 

In [0]:
bob = sy.VirtualWorker(hook, id="bob")
alice = sy.VirtualWorker(hook, id="alice")

## The LSTM Network

A short description of lstm and how they work  here

## Importing Libraries for LSTM


In [0]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns

import torch
torch.set_default_tensor_type(torch.cuda.FloatTensor)

import torch.nn.functional as F
from torchtext import datasets
from torchtext import data
import torch.optim as optim
from torch import nn,optim
import torch.nn.functional as F
from torch.utils.data import *

import random
import matplotlib.pyplot as plt
%matplotlib inline

import time
import copy
import os


# check if CUDA is available
train_on_gpu = torch.cuda.is_available()

if not train_on_gpu:
    device = torch.device('cpu')
    print('CUDA is not available.  Training on CPU ...')
else:
    device = torch.device('cuda')
    print('CUDA is available!  Training on GPU ...')

CUDA is available!  Training on GPU ...


## Loading the Data

We are going to use the IMDB dataset which is provided in Pytorch in the [torchtext.data.Dataset](https://torchtext.readthedocs.io/en/latest/datasets.html#imdb). 
And we split the data into train and test sets. 


In [0]:
import numpy as np

# read data from text files
with open('data/reviews.txt', 'r') as f:
    reviews = f.read()
with open('data/labels.txt', 'r') as f:
    labels = f.read()

We take a look at what our data and then split the training data into a train set and validation set.

In [0]:
print(reviews[:5000])
print()
print(labels[:20])


bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life  such as  teachers  . my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers  . the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i  m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn  t   
story of a man who has unnatural feelings for a pig . starts out with a opening scene that is a terrific example of absurd comedy . a formal orchestra audience is turn

## Data Preprocessing

In [0]:
from string import punctuation

print(punctuation)

# get rid of punctuation
reviews = reviews.lower() # lowercase, standardize
all_text = ''.join([c for c in reviews if c not in punctuation])

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~


In [0]:
# split by new lines and spaces

reviews_split = all_text.split('\n')

all_text = ' '.join(reviews_split)

# create a list of words
words = all_text.split()

In [0]:
words[:30]

['bromwell',
 'high',
 'is',
 'a',
 'cartoon',
 'comedy',
 'it',
 'ran',
 'at',
 'the',
 'same',
 'time',
 'as',
 'some',
 'other',
 'programs',
 'about',
 'school',
 'life',
 'such',
 'as',
 'teachers',
 'my',
 'years',
 'in',
 'the',
 'teaching',
 'profession',
 'lead',
 'me']

### Encoding the words

In [0]:
# feel free to use this import 
from collections import Counter

## Build a dictionary that maps words to integers
counts =  Counter(words)
vocab = sorted(counts, key=counts.get, reverse=True)
vocab_to_int = {word: ii for ii, word in enumerate(vocab, 1)}

## use the dict to tokenize each review in reviews_split
## store the tokenized reviews in reviews_ints
reviews_ints = []
for review in reviews_split:
  reviews_ints.append([vocab_to_int[word] for word in review.split()])
  

  # stats about vocabulary
print('Unique words: ', len((vocab_to_int)))  # should ~ 74000+
print()

# print tokens in first review
print('Tokenized review: \n', reviews_ints[:1])

Unique words:  74072

Tokenized review: 
 [[21025, 308, 6, 3, 1050, 207, 8, 2138, 32, 1, 171, 57, 15, 49, 81, 5785, 44, 382, 110, 140, 15, 5194, 60, 154, 9, 1, 4975, 5852, 475, 71, 5, 260, 12, 21025, 308, 13, 1978, 6, 74, 2395, 5, 613, 73, 6, 5194, 1, 24103, 5, 1983, 10166, 1, 5786, 1499, 36, 51, 66, 204, 145, 67, 1199, 5194, 19869, 1, 37442, 4, 1, 221, 883, 31, 2988, 71, 4, 1, 5787, 10, 686, 2, 67, 1499, 54, 10, 216, 1, 383, 9, 62, 3, 1406, 3686, 783, 5, 3483, 180, 1, 382, 10, 1212, 13583, 32, 308, 3, 349, 341, 2913, 10, 143, 127, 5, 7690, 30, 4, 129, 5194, 1406, 2326, 5, 21025, 308, 10, 528, 12, 109, 1448, 4, 60, 543, 102, 12, 21025, 308, 6, 227, 4146, 48, 3, 2211, 12, 8, 215, 23]]


### Encoding the labels

In [0]:
# 1=positive, 0=negative label conversion
labels_split = labels.split('\n')
encoded_labels = np.array([1 if label == 'positive' else 0 for label in labels_split])

### Removing outliers

In [0]:
# stats about vocabulary
print('Unique words: ', len((vocab_to_int)))  # should ~ 74000+
print()

# print tokens in first review
print('Tokenized review: \n', reviews_ints[:1])

Unique words:  74072

Tokenized review: 
 [[21025, 308, 6, 3, 1050, 207, 8, 2138, 32, 1, 171, 57, 15, 49, 81, 5785, 44, 382, 110, 140, 15, 5194, 60, 154, 9, 1, 4975, 5852, 475, 71, 5, 260, 12, 21025, 308, 13, 1978, 6, 74, 2395, 5, 613, 73, 6, 5194, 1, 24103, 5, 1983, 10166, 1, 5786, 1499, 36, 51, 66, 204, 145, 67, 1199, 5194, 19869, 1, 37442, 4, 1, 221, 883, 31, 2988, 71, 4, 1, 5787, 10, 686, 2, 67, 1499, 54, 10, 216, 1, 383, 9, 62, 3, 1406, 3686, 783, 5, 3483, 180, 1, 382, 10, 1212, 13583, 32, 308, 3, 349, 341, 2913, 10, 143, 127, 5, 7690, 30, 4, 129, 5194, 1406, 2326, 5, 21025, 308, 10, 528, 12, 109, 1448, 4, 60, 543, 102, 12, 21025, 308, 6, 227, 4146, 48, 3, 2211, 12, 8, 215, 23]]


In [0]:
print('Number of reviews before removing outliers: ', len(reviews_ints))

## remove any reviews/labels with zero length from the reviews_ints list.
non_zero_idx = [ii for ii, review in enumerate(reviews_ints) if len(review) != 0]

reviews_ints = [reviews_ints[ii] for ii in non_zero_idx]
encoded_labels = np.array([encoded_labels[ii] for ii in non_zero_idx])

print('Number of reviews after removing outliers: ', len(reviews_ints))

Number of reviews before removing outliers:  25001
Number of reviews after removing outliers:  25000


### Padding Sequences

In [0]:
def pad_features(reviews_ints, seq_length):
    ''' Return features of review_ints, where each review is padded with 0's 
        or truncated to the input seq_length.
    '''    
    #getting the correct row x col 
    features= np.zeros((len(reviews_ints), seq_length), dtype=int)
    
    # for each review, grab that review and 
    for i, row in enumerate(reviews_ints):
      features[i, -len(row):] =  np.array(row)[:seq_length]
    
    return features

In [0]:
# Test your implementation!

seq_length = 200

features = pad_features(reviews_ints, seq_length=seq_length)

## test statements - do not change - ##
assert len(features)==len(reviews_ints), "Your features should have as many rows as reviews."
assert len(features[0])==seq_length, "Each feature row should contain seq_length values."

# print first 10 values of the first 30 batches 
print(features[:30,:10])

[[    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [22382    42 46418    15   706 17139  3389    47    77    35]
 [ 4505   505    15     3  3342   162  8312  1652     6  4819]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [   54    10    14   116    60   798   552    71   364     5]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    0     0     0     0     0     0     0     0     0     0]
 [    1   330   578    34     3   162   748  2731     9   325]
 [    9    11 10171  5305  1946   689   444    22   280   673]
 [    0     0     0     0     0     0     0     0     0

In [0]:
type(features)

torch.from_numpy

<function _VariableFunctions.from_numpy>

### Splitting Data 

In [0]:
split_frac = 0.8

## split data into training, validation, and test data (features and labels, x and y)
split_idx = int(len(features)*0.8)
train_x, remaining_x = features[:split_idx], features[split_idx:]
train_y, remaining_y = encoded_labels[:split_idx], encoded_labels[split_idx:]


test_idx = int(len(remaining_x)*0.5)
val_x, test_x = remaining_x[:test_idx], remaining_x[test_idx:]
val_y, test_y = remaining_y[:test_idx], remaining_y[test_idx:]

## print out the shapes of your resultant feature data
print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}\n".format(train_x.shape),
      "Val set: \t\t{}\n".format(val_x.shape),
      "Test set: \t\t{}\n".format(test_x.shape))


			Feature Shapes:
Train set: 		(20000, 200)
 Val set: 		(2500, 200)
 Test set: 		(2500, 200)



## Distributing the Data

Our virtual workers have been created but they don't have any data on them. After loading our data, we distribute the data between Alice and Bob. We do this by creating the appropriate dataset using sy.BaseDataset. Then we load these datasets and send them to Alice and Bob using FederatedDataLoader.

In [0]:
alice._objects

{}

In [0]:
BATCH_SIZE = 32
train_dataset = sy.BaseDataset(torch.from_numpy(train_x),torch.from_numpy(train_y))

valid_dataset = sy.BaseDataset(torch.from_numpy(val_x),torch.from_numpy(val_y))

test_dataset = sy.BaseDataset(torch.from_numpy(test_x), torch.from_numpy(test_y))


federated_train_loader = sy.FederatedDataLoader(train_dataset.federate((bob,alice)), batch_size=BATCH_SIZE, shuffle=True)
federated_valid_loader = sy.FederatedDataLoader(valid_dataset.federate((bob,alice)), batch_size=BATCH_SIZE, shuffle=True)
federated_test_loader = sy.FederatedDataLoader(test_dataset.federate((bob,alice)), batch_size=BATCH_SIZE, shuffle=True)

In [0]:
alice._objects
print(alice._objects)

# train_idx = int(train_x.shape[0]/2)
# valid_idx = int(val_x.shape[0]/2)
# test_idx = int(test_x.shape[0]/2)

# # Sending toy datasets to virtual workers
# bob_train_dataset = sy.BaseDataset(torch.from_numpy(train_x[:train_idx]), 
#                                   torch.from_numpy(train_y[:train_idx])).send(bob)

# alice_train_dataset = sy.BaseDataset(torch.from_numpy(train_x[train_idx:]), 
#                                     torch.from_numpy(train_y[train_idx:])).send(alice)


# bob_valid_dataset = sy.BaseDataset(torch.from_numpy(val_x[:valid_idx]), 
#                                   torch.from_numpy(val_y[:valid_idx])).send(bob)
                                     
# alice_valid_dataset = sy.BaseDataset(torch.from_numpy(val_x[valid_idx:]), 
#                                   torch.from_numpy(val_y[valid_idx:])).send(alice)


# bob_test_dataset = sy.BaseDataset(torch.from_numpy(test_x[:test_idx]), 
#                                   torch.from_numpy(test_y[:test_idx])).send(bob)
# alice_test_dataset = sy.BaseDataset(torch.from_numpy(test_x[test_idx:]), 
#                                   torch.from_numpy(test_y[test_idx:])).send(alice)


{3793490452: tensor([[ 159,   30,   16,  ...,  423,    4,   62],
        [  54,   10,  329,  ...,  146,  695,    9],
        [ 244,  159, 1005,  ...,  440,   22,   12],
        ...,
        [   0,    0,    0,  ...,   28,   77,  384],
        [   0,    0,    0,  ...,    1, 1893, 3610],
        [   0,    0,    0,  ...,    2, 2428,    8]], device='cpu'), 91643162938: tensor([1, 0, 1,  ..., 0, 1, 0], device='cpu')}


## Creating Federated DataLoaders

Now, we load datasets using dataloaders. In Federated learning, we load datasets from different devices in a federated manner using **Federated DataLoaders**

In [0]:
# # Creating federated datasets, an extension of Pytorch TensorDataset class
# federated_train_dataset = sy.FederatedDataset([bob_train_dataset, alice_train_dataset])
# federated_valid_dataset = sy.FederatedDataset([bob_valid_dataset, alice_valid_dataset])
# federated_test_dataset = sy.FederatedDataset([bob_test_dataset, alice_test_dataset])

# BATCH_SIZE = 50

# # Creating federated datal/oaders, an extension of Pytorch DataLoader class
# federated_train_loader = sy.FederatedDataLoader(federated_train_dataset, 
#                                                 shuffle=True, batch_size=BATCH_SIZE)

# federated_valid_loader = sy.FederatedDataLoader(federated_valid_dataset, 
#                                                 shuffle=True, batch_size=BATCH_SIZE)

# federated_test_loader = sy.FederatedDataLoader(federated_test_dataset, 
#                                                shuffle=False, batch_size=BATCH_SIZE)


### Building Our Network 

In [0]:
class SentimentRNN(nn.Module):
    """
    The RNN model that will be used to perform Sentiment analysis.
    """

    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, drop_prob=0.5):
        """
        Initialize the model by setting up the layers.
        """
        super(SentimentRNN, self).__init__()
        
        
        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        
        # define all layers
        #embedding layer
        self.embedding = nn.Embedding(vocab_size,embedding_dim)
        
        #lstm layer
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers, dropout=drop_prob, batch_first = True)
        
        self.dropout = nn.Dropout(0.3)
        
        #fully connected layer
        self.fc = nn.Linear(hidden_dim, output_size)
        
        self.sig = nn.Sigmoid()

    def forward(self, x, hidden):
      
        """
        Perform a forward pass of our model on some input and hidden state.
        """
        # Batch_size used for shaping data
        # batch_size = x.size(0)
        
        print("****Batch SIZE IS ",x.shape)
                
        # embeddings and lstm_out
        embeds = self.embedding(x) 
        
        print("****x SIZE IS ",x.shape)
        print("****Embeds SIZE IS ",embeds.shape)
        print("****Hidden SIZE IS ",hidden)
        
        lstm_out, hidden = self.lstm(embeds, hidden)
        
        print("ENTERS HERE*********")
        # stack up lstm outputs
        lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)
        
        # dropout and fully-connected layer
        out = self.dropout(lstm_out)
        
        out = self.fc(out)
        
        # sigmoid funtion
        sig_out = self.sig(out)
        
        # reshape to be batch_size first
        sig_out = sig_out.view(batch_size, -1)
        sig_out = sig_out[:, -1]  # get last batch of labels
        
        # return last sigmoid output and hidden state
        return sig_out, hidden
      
    
    def init_hidden(self, batch_size):
        ''' Initializes hidden state '''
        # Create two new tensors with sizes n_layers x batch_size x hidden_dim,
        # initialized to zero, for hidden state and cell state of LSTM
        
        weight = next(self.parameters()).data
        
        if(train_on_gpu):
          hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                   weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())
        
        return hidden
        

In [0]:
# class LSTM_Model(nn.Module):
    
#     def __init__(self, vocab_size, embedding_dim, hidden_dim, batch_size):
      
#       super(LSTM_Model, self).__init__()
#       self.num_layers = 1
#       self.batch_size = batch_size
#       self.hidden_dim = hidden_dim
      
#       self.word_embeddings = nn.Embedding(vocab_size, embedding_dim) 
#       # The LSTM takes word embeddings as inputs, and outputs hidden states
#       # with dimensionality hidden_dim.
#       self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=self.num_layers) 
#       self.fc = nn.Linear(hidden_dim, 1)
#       self.hidden = self.init_hidden()      
      
#     def forward(self, sentence):
      
#         embeds = self.word_embeddings(sentence)
#         # [sent_len, batch_size] --> [sent_len, batch_size, emb_dim]
#         lstm_out, self.hidden = self.lstm(embeds, self.hidden) 
#         # [sent_len, batch_size, emb_dim] --> [seq_len, batch, num_directions*hidden_size]
#         (hidden, cell) =  self.hidden
#         preds = self.fc(lstm_out[-1].squeeze(0))
#         # [batch, num_directions*hidden_size] --> [batch_size, 1]
#         return preds
      
      
#     def init_hidden(self):
#         # Before we've done anything, we dont have any hidden state.
#         # The axes semantics are (num_layers, minibatch_size, hidden_dim)
#         return (torch.zeros(self.num_layers, self.batch_size, self.hidden_dim).to(device),
#                 torch.zeros(self.num_layers, self.batch_size, self.hidden_dim).to(device))

In [0]:
# hh = model.init_hidden()
# hh

In [0]:
# HIDDEN_DIM = 10
# hhh = torch.Tensor(np.zeros((BATCH_SIZE, HIDDEN_DIM)))
# hhh

In [0]:
# Instantiate the model w/ hyperparams
vocab_size = len(vocab_to_int)+1 # +1 for the 0 padding + our word tokens
output_size = 1
embedding_dim = 400
hidden_dim = 256
n_layers = 2 

net = SentimentRNN(vocab_size, output_size, embedding_dim, hidden_dim, n_layers)

print(net)

# model = LSTM_Model(vocab_size, embedding_dim, hidden_dim, batch_size=BATCH_SIZE)
# model.to(device)


SentimentRNN(
  (embedding): Embedding(74073, 400)
  (lstm): LSTM(400, 256, num_layers=2, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.3)
  (fc): Linear(in_features=256, out_features=1, bias=True)
  (sig): Sigmoid()
)


## Training The Network

In [0]:
# loss and optimization functions
lr=0.001

criterion = nn.BCELoss()
#optimizer = torch.optim.Adam(model.parameters(), lr=lr)

optimizer = torch.optim.Adam(net.parameters(), lr=lr)

# training params

epochs = 4 # 3-4 is approx where I noticed the validation loss stop decreasing

counter = 0
print_every = 100
clip=5 # gradient clipping

# move model to GPU, if available
# if(train_on_gpu):
#     net.cuda()

net.to(device)

#model.to(device)
device

device(type='cuda')

In [0]:
# # Create training and validation dataloaders
# dataloaders_dict = {'train': federated_train_loader, 
#                     'val': federated_valid_loader}

In [0]:
#model, history = train_model(model, dataloaders_dict, criterion, optimizer, num_epochs=10)

In [0]:
# print(torch.__version__)
# bob._objects

# #h = torch.Tensor(np.zeros((BATCH_SIZE, 10))).send(worker)    
# s = np.zeros((BATCH_SIZE, 10))
# type(s)

1.1.0


numpy.ndarray

In [0]:
HIDDEN_DIM = 10
net.train()
# train for some number of epochs
for e in range(epochs):
    # initialize hidden state
    h = net.init_hidden(BATCH_SIZE)
    losses = []

    # batch loop
    for inputs, labels in federated_train_loader:
        counter += 1
        
        print("ROUND")
        # Location of current batch
        worker = inputs.location  # <---- Where will send the model to
        
        print(worker)
        
        inputs, labels = inputs.to(device), labels.to(device)

#         if(train_on_gpu):
#             inputs, labels = inputs.cuda(), labels.cuda()

        # Creating new variables for the hidden state, otherwise
        # we'd backprop through the entire training history
    
        h = tuple([each.data for each in h])      
       
#         h = h[0].send(worker)   # <---- These steps are crucial
#         h = h[1].send(worker)   # <---- These steps are crucial
        
        #h = torch.Tensor(np.zeros((BATCH_SIZE, HIDDEN_DIM))).send(worker)
    
        #h = torch.cat(hid, dim=0).send(worker)
      
        #print("****H SIZE IS ",h.shape)
      
        net.send(worker)   # <---- for Federated Learning

        # zero accumulated gradients
        #net.zero_grad()
        optimizer.zero_grad()
        
        print("****INPUT SIZE IS ",inputs.shape)

        # get the output from the model
        output, h = net(inputs.unsqueeze(dim=0), h)

        # calculate the loss and perform backprop
        loss = criterion(output.squeeze(), labels.float())
        
        #loss.requres_grad = True

        loss.backward()
        # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
        nn.utils.clip_grad_norm_(net.parameters(), clip)
        optimizer.step()
        
        # Get the model back to the local worker
        net.get() # <--- We need get the model back before sending to another worker
        losses.append(loss.get())

        # loss stats
        if counter % print_every == 0:
            # Get validation loss
            val_h = net.init_hidden(BATCH_SIZE)
            val_losses = []
            net.eval()
            with torch.no_grad():
              for inputs, labels in federated_valid_loader:
                # get current location
                worker = inputs.location           
                
                # Creating new variables for the hidden state, otherwise
              # we'd backprop through the entire training history
              #val_h = torch.Tensor(tuple([each.data for each in val_h]))

              #               val_h = net.init_hidden(batch_size)
#               val_h[0].send(worker)
#               val_h[1].send(worker) 
              
              #h = torch.Tensor(np.zeros((BATCH_SIZE, HIDDEN_DIM))).send(worker)
          
              h = tuple([each.data for each in h]) 
              
              #h = torch.cat(hid, dim=1).send(worker)
              
              inputs, labels = inputs.to(device), labels.to(device)
              
              # Send model to worker
              net.send(worker)

              output, val_h = net(inputs, val_h)
              val_loss = criterion(output.squeeze(), labels.float())
              
              
              # val_losses.append(val_loss.item())
              val_losses.append(val_loss.get())
              
              net.get()

            net.train()
            print("Epoch: {}/{}...".format(e+1, epochs),
                  "Step: {}...".format(counter),
                  "Loss: {:.6f}...".format(loss.item()),
                  "Val Loss: {:.6f}".format(np.mean(val_losses)))

ROUND
<VirtualWorker id:bob #objects:18>
****INPUT SIZE IS  torch.Size([32, 200])
****Batch SIZE IS  torch.Size([1, 32, 200])
****x SIZE IS  torch.Size([1, 32, 200])
****Embeds SIZE IS  torch.Size([1, 32, 200])
****Hidden SIZE IS  ((Wrapper)>[PointerTensor | me:9709228355 -> bob:17425394974]::data, (Wrapper)>[PointerTensor | me:11026984648 -> bob:59887112607]::data)


RuntimeError: ignored

In [0]:
alice._objects

In [0]:
# HIDDEN_DIM = 10
# for e in range(epochs):
    
#     ######### Training ##########
    
#     losses = []
#     # Batch loop
#     for inputs, labels in federated_train_loader:
#         # Location of current batch
#         worker = inputs.location
#         # Initialize hidden state and send it to worker
#         h = torch.Tensor(np.zeros((BATCH_SIZE, HIDDEN_DIM))).send(worker)
#         # Send model to current worker
#         net.send(worker)
#         # Setting accumulated gradients to zero before backward step
#         optimizer.zero_grad()
#         # Output from the model
#         output, _ = net(inputs, h)
#         # Calculate the loss and perform backprop
#         loss = criterion(output.squeeze(), labels.float())
#         loss.backward()
#         # Clipping the gradient to avoid explosion
#         nn.utils.clip_grad_norm_(net.parameters(), clip)
#         # Backpropagation step
#         optimizer.step() 
#         # Get the model back to the local worker
#         net.get()
#         losses.append(loss.get())
    
#     ######## Evaluation ##########
#     # Model in evaluation mode
#     net.eval()

#     with torch.no_grad():
#         test_preds = []
#         test_labels_list = []
#         eval_losses = []

#         for inputs, labels in federated_test_loader:
#             # get current location
#             worker = inputs.location
#             # Initialize hidden state and send it to worker
#             h = torch.Tensor(np.zeros((BATCH_SIZE, HIDDEN_DIM))).send(worker)    
#             # Send model to worker
#             net.send(worker)
            
#             output, _ = net(inputs, h)
#             loss = criterion(output.squeeze(), labels.float())
#             eval_losses.append(loss.get())
#             preds = output.squeeze().get()
#             test_preds += list(preds.numpy())
#             test_labels_list += list(labels.get().numpy().astype(int))
#             # Get the model back to the local worker
#             net.get()
        
#         score = roc_auc_score(test_labels_list, test_preds)
    
#     print("Epoch {}/{}...  \
#     AUC: {:.3%}...  \
#     Training loss: {:.5f}...  \
#     Validation loss: {:.5f}".format(e+1, EPOCHS, score, sum(losses)/len(losses), sum(eval_losses)/len(eval_losses)))
    
#     net.train()

### Creating Glove Embeddings 
Now, we have to create an embedding layer. We need an embedding layer because we have tens of thousands of words, so we'll need a more efficient representation for our input data than one-hot encoded vectors. We use Glove create our embedding layer. 
GloVe stands for global vectors for word representation. It is an unsupervised learning algorithm developed by Stanford for generating word embeddings by aggregating global word-word co-occurrence matrix from a corpus. To read more about Glove and how it can be used refer to this [blog post](https://medium.com/@japneet121/word-vectorization-using-glove-76919685ee0b)

In [0]:
# build the vocabulary

TEXT.build_vocab(train_data, max_size=25000, vectors="glove.6B.100d")
LABEL.build_vocab(train_data)

In [0]:
# def train_model(model, dataloaders, criterion, optimizer, num_epochs, batch_size=BATCH_SIZE):
#     since = time.time()

#     history = dict()

#     best_model_wts = copy.deepcopy(model.state_dict())
#     best_acc = 0.0
#     skip_count = 0

#     for epoch in range(num_epochs):
#         print('Epoch {}/{}'.format(epoch, num_epochs - 1))
#         print('-' * 10)

#         # Each epoch has a training and validation phase
#         for phase in ['train', 'val']:
#             if phase == 'train':
#                 model.train()  # Set model to training mode
#             else:
#                 model.eval()   # Set model to evaluate mode

#             running_loss = 0.0
#             running_corrects = 0
            

#             # Iterate over data.
#             for inputs,labels in dataloaders[phase]:
#                 #inputs, labels = data.text, data.label
#                 inputs = inputs.to(device)
#                 labels = labels.to(device)
                
#                 # Location of current batch
#                 worker = inputs.location  # <---- Where will send the model to
                
#                 # zero the parameter gradients
#                 optimizer.zero_grad()

#                 # forward
#                 # track history if only in train
#                 with torch.set_grad_enabled(phase == 'train'):
#                     # Get model outputs and calculate loss

#                     # backward + optimize only if in training phase
#                     if phase == 'train':
#                         # we need to clear out the hidden state of the LSTM,
#                         # detaching it from its history on the last instance.
#                         model.batch_size = inputs.shape[1]
#                         model.hidden = model.init_hidden()
                        
#                         model.hidden = model.hidden
#                         model.hidden[0].send(worker)   # <---- These steps are crucial
#                         model.hidden[1].send(worker)   # <---- These steps are crucial
#                         model.send(worker)   # <---- for Federated Learning
                        
#                         outputs = model(inputs).squeeze(1)
#                         loss = criterion(outputs, labels)
#                         loss.backward()
#                         optimizer.step()
                        
#                     else:
#                         model.batch_size = inputs.shape[1]
#                         model.hidden = model.init_hidden()
                        
#                         model.hidden[0].send(worker)   # <---- These steps are crucial
#                         model.hidden[1].send(worker)   # <---- These steps are crucial
                        
#                         model.hidden = model.hidden.send(worker)   # <---- These steps are crucial
#                         model.send(worker)   # <---- for Federated Learning
                        
                        
#                         outputs = model(inputs).squeeze(1)
#                         loss = criterion(outputs, labels)
                        
#                     # Get the model back to the local worker
#                     model.get() # <--- We need get the model back before sending to another worker


#                 # statistics
#                 running_loss += loss.item()
#                 outputs = torch.round(torch.sigmoid(outputs))
#                 corrects = (outputs == labels).float()
#                 acc = corrects.sum()/len(corrects)
#                 running_corrects += acc.item()

#             epoch_loss = running_loss / len(dataloaders[phase])
#             epoch_acc = running_corrects / len(dataloaders[phase])

#             print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

#             # deep copy the model
#             if phase == 'val' and epoch_acc > best_acc:
#                 best_acc = epoch_acc
#                 best_model_wts = copy.deepcopy(model.state_dict())
            
#             if phase+'_acc' in history:
#                 # append the new number to the existing array at this slot 
#                                    history[phase+'_acc'].append(epoch_acc)
#             else:
#                 # create a new array in this slot
#                 history[phase+'_acc'] = [epoch_acc]
            
#             if phase+'_loss' in history:
#                 # append the new number to the existing array at this slot
#                 history[phase+'_loss'].append(epoch_loss)
#             else:
#                 # create a new array in this slot
#                 history[phase+'_loss'] = [epoch_loss]            

#     time_elapsed = time.time() - since
#     print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
#     print('Best val Acc: {:4f}'.format(best_acc))

#     # load best model weights
#     model.load_state_dict(best_model_wts)
#     return model, history

In [0]:
pretrained_embeddings = TEXT.vocab.vectors
print(pretrained_embeddings.shape)

Finding words with the highest frequency

In [0]:
print(TEXT.vocab.freqs.most_common(20))

In [0]:
# # build the vocabulary
# TEXT.build_vocab(train, vectors=GloVe(name='6B', dim=300))
# LABEL.build_vocab(train)

# # make iterator for splits
# train_iter, test_iter = data.BucketIterator.splits(
#     (train, test), batch_size=3, device=0)

# BATCH_SIZE = 64

# train_iterator, val_iterator, test_iterator = data.BucketIterator.splits(
#     (train_data, val_data, test_data),
#     batch_size=BATCH_SIZE,
#     device=device)

Using Torch Text

https://medium.com/@sonicboom8/sentiment-analysis-torchtext-55fb57b1fab8

https://towardsdatascience.com/use-torchtext-to-load-nlp-datasets-part-i-5da6f1c89d84

https://medium.com/@adam.wearne/lets-get-sentimental-with-pytorch-dcdd9e1ea4c9

https://github.com/OpenMined/PySyft/blob/dev/examples/tutorials/Part%2007%20-%20Federated%20Learning%20with%20Federated%20Dataset.ipynb