# Word Level Federated Text Generation with Stack Overflow
- Joel Stremmel
- 01-27-20

**About:**

This notebook loads the Stack Overflow data available through `tff.simulation.datasets` and trains an LSTM model with Federated Averaging by following the Federated Learning for Text Generation [example notebook](https://github.com/tensorflow/federated/blob/master/docs/tutorials/federated_learning_for_text_generation.ipynb).

**Notes:**

This notebook prepares the Stack Overflow dataset for word level language modeling using this [module](https://github.com/tensorflow/federated/blob/master/tensorflow_federated/python/research/baselines/stackoverflow/dataset.py).


**Data:** 
- https://www.kaggle.com/stackoverflow/stackoverflow

**License:** 
- https://creativecommons.org/licenses/by-sa/3.0/

**Data and Model References:**
- https://www.tensorflow.org/federated/api_docs/python/tff/simulation/datasets/stackoverflow/load_data
- https://github.com/tensorflow/federated/blob/master/docs/tutorials/federated_learning_for_text_generation.ipynb
- https://github.com/tensorflow/federated/
- https://www.tensorflow.org/tutorials/text/text_generation
- https://ruder.io/deep-learning-nlp-best-practices/

**Environment Setup References:**
- https://www.tensorflow.org/install/gpu
- https://gist.github.com/matheustguimaraes/43e0b65aa534db4df2918f835b9b361d
- https://www.tensorflow.org/install/source#tested_build_configurations
- https://anbasile.github.io/programming/2017/06/25/jupyter-venv/

### Environment Setup
Pip install these packages in the order listed.

In [1]:
# !pip install --upgrade pip
# !pip install --upgrade tensorflow-federated
# !pip uninstall tensorflow -y
# !pip install --upgrade tensorflow-gpu==2.0
# !pip install --upgrade nltk
# !pip install matplotlib
# !pip install nest_asyncio

### Imports

In [2]:
import nest_asyncio
nest_asyncio.apply()

In [3]:
import os, sys
sys.path.append(os.path.dirname(os.path.dirname(os.getcwd())))

In [4]:
# from https://github.com/tensorflow/federated/blob/master/tensorflow_federated/python/research/baselines/stackoverflow/dataset.py
from utils.dataset import construct_word_level_datasets, get_vocab, get_special_tokens

In [5]:
import collections
import functools
import six
import time
import string

import numpy as np
import matplotlib.pyplot as plt
from nltk.corpus import stopwords

import tensorflow as tf
import tensorflow_federated as tff

### Set Compatibility Behavior

In [6]:
tf.compat.v1.enable_v2_behavior()

### Check TensorFlow Install

In [7]:
print('Built with Cuda: {}'.format(tf.test.is_built_with_cuda()))
print('Build with GPU support: {}'.format(tf.test.is_built_with_gpu_support()))
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

Built with Cuda: True
Build with GPU support: True
Num GPUs Available:  1


### Set TensorFlow to Use GPU

In [8]:
physical_devices = tf.config.experimental.list_physical_devices(device_type=None)
tf.config.experimental.set_memory_growth(physical_devices[-1], enable=True)
for device in physical_devices:
    print(device)

PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')
PhysicalDevice(name='/physical_device:XLA_CPU:0', device_type='XLA_CPU')
PhysicalDevice(name='/physical_device:XLA_GPU:0', device_type='XLA_GPU')
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')


### Test TFF

In [9]:
tff.federated_computation(lambda: 'Hello, World!')()

'Hello, World!'

### Set Some Parameters for Preprocessing the Data and Training the Model
**Note:** Ask Keith how he's been setting these for internal experiments.

In [10]:
VOCAB_SIZE = 5000
BATCH_SIZE = 16
CLIENTS_EPOCHS_PER_ROUND = 1
MAX_SEQ_LENGTH = 100
MAX_ELEMENTS_PER_USER = 100
CENTRALIZED_TRAIN = False
SHUFFLE_BUFFER_SIZE = 5000
NUM_VALIDATION_EXAMPLES = 200
NUM_TEST_EXAMPLES = 200

NUM_ROUNDS = 10
NUM_TRAIN_CLIENTS = 10

### Load and Preprocess Word Level Datasets

In [11]:
train_data, val_data, test_data = construct_word_level_datasets(
    vocab_size=VOCAB_SIZE,
    batch_size=BATCH_SIZE,
    client_epochs_per_round=CLIENTS_EPOCHS_PER_ROUND,
    max_seq_len=MAX_SEQ_LENGTH,
    max_elements_per_user=MAX_ELEMENTS_PER_USER,
    centralized_train=CENTRALIZED_TRAIN,
    shuffle_buffer_size=SHUFFLE_BUFFER_SIZE,
    num_validation_examples=NUM_VALIDATION_EXAMPLES,
    num_test_examples=NUM_TEST_EXAMPLES)

### Retrieve the Dataset Vocab

In [12]:
vocab = get_vocab(VOCAB_SIZE)

### Retrieve the Special Characters Created During Preprocessing
The four special tokens are:
- pad: padding token
- oov: out of vocabulary
- bos: begin of sentence
- eos: end of sentence

In [13]:
special2idx = dict(zip(['pad', 'oov', 'bos', 'eos'], get_special_tokens(VOCAB_SIZE)))
idx2special = {v:k for k, v in special2idx.items()}

### Set Vocabulary
Offset each word index by one, because the pad token occupies index 0.

In [14]:
word2idx = {word:i+1 for i, word in enumerate(vocab)}
idx2word = {i+1:word for i, word in enumerate(vocab)}

### Add Special Characters

In [15]:
word2idx = {**word2idx, **special2idx}
idx2word = {**idx2word, **idx2special}

### Reset Vocab Size
This accounts for having added the special characters.

In [16]:
VOCAB_SIZE = VOCAB_SIZE + len(special2idx)
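
To make the index layout concrete, here is a minimal, self-contained sketch with a six-word toy vocabulary. It assumes the special tokens sit at `pad = 0` and `oov`, `bos`, `eos` directly after the last word, and that each sentence is wrapped in `bos`/`eos`, padded, and shifted by one position to form input and target sequences. The `encode` helper and `toy_vocab` are hypothetical names for illustration and are not part of the `utils.dataset` module.

```python
# Toy vocabulary; the real one comes from get_vocab(VOCAB_SIZE).
toy_vocab = ["how", "do", "i", "sort", "a", "list"]
base_size = len(toy_vocab)

# Assumed special-token layout: pad at 0, then oov/bos/eos after the words.
special2idx = {
    "pad": 0,
    "oov": base_size + 1,
    "bos": base_size + 2,
    "eos": base_size + 3,
}

# Words are shifted by one so index 0 stays reserved for padding.
word2idx = {word: i + 1 for i, word in enumerate(toy_vocab)}
word2idx.update(special2idx)
idx2word = {i: w for w, i in word2idx.items()}

# The model's output layer needs one unit per word plus one per special token.
total_vocab_size = base_size + len(special2idx)


def encode(sentence, max_seq_len):
    """Map a sentence to (inputs, targets) id lists of length max_seq_len."""
    words = sentence.lower().split()
    ids = [word2idx["bos"]]
    ids += [word2idx.get(w, word2idx["oov"]) for w in words]
    ids.append(word2idx["eos"])
    # Keep max_seq_len + 1 ids so the shifted pair has max_seq_len items each.
    ids = ids[: max_seq_len + 1]
    ids += [word2idx["pad"]] * (max_seq_len + 1 - len(ids))
    # Targets are the inputs shifted left by one position.
    return ids[:-1], ids[1:]


inputs, targets = encode("How do I sort a dictionary", max_seq_len=8)
print(inputs)
print(targets)
```

In this toy setup "dictionary" is not in the vocabulary, so it maps to the `oov` index, and the largest index equals `total_vocab_size - 1`, which is why the vocabulary size is increased by the number of special tokens.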

### Build Model

In [17]:
def build_model(batch_size, vocab_size, seq_length, embedding_dim=256, rnn_units=512):
    """
    Build model with architecture from: https://www.tensorflow.org/tutorials/text/text_generation.
    """

    model1_input = tf.keras.Input(shape=(seq_length, ),
                                  name='model1_input')
    
    model1_embedding = tf.keras.layers.Embedding(input_dim=vocab_size,
                                                 output_dim=embedding_dim,
                                                 input_length=seq_length,
                                                 batch_input_shape=[batch_size, None],
                                                 name='model1_embedding')(model1_input)
    
    model1_lstm = tf.keras.layers.LSTM(units=rnn_units,
                                       return_sequences=True,
                                       recurrent_initializer='glorot_uniform',
                                       name='model1_lstm')(model1_embedding)
    
    model1_dense = tf.keras.layers.Dense(units=vocab_size)(model1_lstm)
    
    final_model = tf.keras.Model(inputs=model1_input, outputs=model1_dense)
                 
    return final_model

### Define the Text Generation Strategy

In [18]:
def generate_text(model, start_string):
    """
    Generate text by sampling from the model output distribution,
    as in https://www.tensorflow.org/tutorials/sequences/text_generation.
    """
    
    # split() with no argument drops empty strings from repeated or trailing spaces
    start_words = [word.lower() for word in start_string.split()]

    num_generate = 50
    # Map words missing from the vocabulary to the oov token instead of raising a KeyError
    input_eval = [word2idx.get(word, special2idx['oov']) for word in start_words]
    input_eval = tf.expand_dims(input_eval, 0)
    text_generated = []
    temperature = 1.0

    model.reset_states()
    for i in range(num_generate):
        predictions = model(input_eval)
        predictions = tf.squeeze(predictions, 0)
        predictions = predictions / temperature
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        input_eval = tf.expand_dims([predicted_id], 0)
        text_generated.append(idx2word[predicted_id])

    # Join the prompt and the generated words with single spaces
    return ' '.join(start_words + text_generated)
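
The sampling step divides the logits by a temperature before drawing the next word. The sketch below illustrates that idea in plain Python, without TensorFlow; `softmax_with_temperature`, `sample_id`, and the toy logits are hypothetical names and values chosen for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_id(logits, temperature=1.0, rng=random):
    """Draw one index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.1]  # made-up scores for a three-word vocabulary
sharp = softmax_with_temperature(toy_logits, temperature=0.5)
flat = softmax_with_temperature(toy_logits, temperature=2.0)
print(sharp)
print(flat)
print(sample_id(toy_logits, temperature=1.0, rng=random.Random(0)))
```

A temperature below 1.0 concentrates probability on the highest-scoring word, giving more predictable text, while a temperature above 1.0 flattens the distribution and gives more varied text.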

### Load or Build the Model
Text generation requires a batch_size=1 model.

In [19]:
keras_model_batch1 = build_model(batch_size=1, vocab_size=VOCAB_SIZE, seq_length=MAX_SEQ_LENGTH)
print(generate_text(keras_model_batch1, "How are you today"))

how are you todayperl popup conventions removes shortcuts signals di messages single publicly although objective realise persistent 27 wrong tries letter somebody webkit look importantly at case template buggy must beginning watching greatly fancy persons illustrate camera 10 postgresql reused homepage param vertex numbers what misunderstood signing we've succeeds what collect tries aggregation


### Define Lists to Track Loss and Accuracy at Each Training Round

In [20]:
train_loss = []
train_accuracy = []
val_loss = []
val_accuracy = []

### Define the Evaluation Function to Use During Training

In [21]:
def keras_evaluate(keras_model, state, val_dataset):
    
    tff.learning.assign_weights_to_keras_model(keras_model, state.model)
    loss, accuracy = keras_model.evaluate(val_dataset, steps=2)
    
    val_loss.append(loss)
    val_accuracy.append(accuracy)

### Define Loss Function and Metrics

In [22]:
class FlattenedCategoricalAccuracy(tf.keras.metrics.SparseCategoricalAccuracy):

    def __init__(self, name='accuracy', dtype=None):
        super(FlattenedCategoricalAccuracy, self).__init__(name, dtype=dtype)

    def update_state(self, y_true, y_pred, sample_weight=None):
        
        y_true = tf.reshape(y_true, [-1, 1])
        y_pred = tf.reshape(y_pred, [-1, VOCAB_SIZE, 1])
        
        return super(FlattenedCategoricalAccuracy, self).update_state(y_true, y_pred, sample_weight)
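
As a rough illustration of what flattening means for this metric, the sketch below computes accuracy over every (sequence, position) pair in plain Python. The `flattened_accuracy` helper and the toy labels and logits are hypothetical; the real metric delegates to `SparseCategoricalAccuracy` after reshaping.

```python
def flattened_accuracy(y_true, logits):
    """Fraction of (sequence, position) pairs whose argmax matches the label."""
    correct = 0
    total = 0
    for true_seq, logit_seq in zip(y_true, logits):
        for label, scores in zip(true_seq, logit_seq):
            predicted = max(range(len(scores)), key=scores.__getitem__)
            correct += int(predicted == label)
            total += 1
    return correct / total


# Two sequences of three positions over a made-up four-word vocabulary.
toy_labels = [[1, 2, 0], [3, 3, 0]]
toy_logits = [
    [[0.1, 2.0, 0.3, 0.0], [0.0, 0.1, 1.5, 0.2], [3.0, 0.1, 0.1, 0.1]],
    [[0.2, 0.1, 0.0, 1.0], [1.2, 0.1, 0.0, 0.3], [2.0, 0.0, 0.1, 0.1]],
]
print(flattened_accuracy(toy_labels, toy_logits))
```

Note that positions labeled with the pad index count like any other position here. Because no mask or sample weight is applied in the metric above either, correctly predicted padding likely raises the reported accuracy when sequences are much shorter than `MAX_SEQ_LENGTH`.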

In [23]:
def compile(keras_model):
    
    keras_model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[FlattenedCategoricalAccuracy()]
    )
    
    return keras_model

### Load and Compile the Model
The Keras model is accessed as a global variable: `create_tff_model()` clones it for TFF, and the training loop that follows updates its weights.

In [24]:
keras_model = build_model(batch_size=BATCH_SIZE,
                          vocab_size=VOCAB_SIZE,
                          seq_length=MAX_SEQ_LENGTH)
compile(keras_model)

<tensorflow.python.keras.engine.training.Model at 0x7eff48225890>

In [25]:
keras_model.summary()

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
model1_input (InputLayer)    [(None, 100)]             0         
_________________________________________________________________
model1_embedding (Embedding) (None, 100, 256)          1281024   
_________________________________________________________________
model1_lstm (LSTM)           (None, 100, 512)          1574912   
_________________________________________________________________
dense_1 (Dense)              (None, 100, 5004)         2567052   
Total params: 5,422,988
Trainable params: 5,422,988
Non-trainable params: 0
_________________________________________________________________
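
The parameter counts in the summary can be checked by hand. This short sketch uses the standard formulas for embedding, LSTM, and dense layers, with the sizes taken from the cells above (a 5004-entry vocabulary, 256 embedding dimensions, 512 LSTM units).

```python
vocab_size = 5004      # 5000 words plus the four special tokens
embedding_dim = 256
rnn_units = 512

# One embedding vector per vocabulary entry.
embedding_params = vocab_size * embedding_dim
# An LSTM has four gates, each with input weights, recurrent weights, and a bias.
lstm_params = 4 * (embedding_dim + rnn_units + 1) * rnn_units
# The dense layer maps every LSTM output to one logit per vocabulary entry.
dense_params = rnn_units * vocab_size + vocab_size

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```

The four values match the summary: 1,281,024 for the embedding, 1,574,912 for the LSTM, 2,567,052 for the dense layer, and 5,422,988 in total. Almost half of the parameters sit in the output layer, so the vocabulary size has a large effect on how much each client must send and receive per round.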


### Create TFF Version of the Model to be Trained with Federated Averaging
- Clone the keras_model inside `create_tff_model()`, which TFF will call to produce a new copy of the model inside the graph that it will serialize.
- TFF uses a `dummy_batch` so it knows the types and shapes that your model expects.
- Build and serialize the TensorFlow graph with `build_federated_averaging_process`.
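
For intuition about the aggregation step, here is a simplified sketch of weighted averaging in plain Python. It is not TFF's implementation: TFF averages model updates and applies a server optimizer, whereas this hypothetical `federated_average` helper averages toy weight vectors directly, weighted by each client's number of examples.

```python
def federated_average(client_weights, client_num_examples):
    """Average per-client weight vectors, weighted by local example counts."""
    total_examples = sum(client_num_examples)
    num_params = len(client_weights[0])
    averaged = [0.0] * num_params
    for weights, n in zip(client_weights, client_num_examples):
        for i, w in enumerate(weights):
            averaged[i] += w * n / total_examples
    return averaged


# Three hypothetical clients, each holding a two-parameter model.
toy_client_weights = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
toy_num_examples = [10, 30, 60]
print(federated_average(toy_client_weights, toy_num_examples))
```

The client with 60 examples pulls the average toward its own weights, which is the effect of weighting by example count. Passing equal counts for every client would reduce this to a plain mean.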

In [26]:
def create_tff_model():
    
    x = tf.constant(np.random.randint(1, VOCAB_SIZE, size=[BATCH_SIZE, MAX_SEQ_LENGTH]))
    dummy_batch = collections.OrderedDict([('x', x), ('y', x)]) 
    keras_model_clone = compile(tf.keras.models.clone_model(keras_model))
    
    return tff.learning.from_compiled_keras_model(keras_model_clone, dummy_batch=dummy_batch)

In [27]:
fed_avg = tff.learning.build_federated_averaging_process(model_fn=create_tff_model)

Instructions for updating:
If using Keras pass *_constraint arguments to layers.


### Initialize the Federated Averaging Process and the Starting Model State

In [28]:
# NOTE: If the statement below fails, it means that you are
# using an older version of TFF without the high-performance
# executor stack. Call `tff.framework.set_default_executor()`
# instead to use the default reference runtime.
if six.PY3:
    tff.framework.set_default_executor(tff.framework.create_local_executor())

In [29]:
# The state of the FL server, containing the model and optimization state.
state = fed_avg.initialize()

state = tff.learning.state_with_new_model_weights(
    state,
    trainable_weights=[v.numpy() for v in keras_model.trainable_weights],
    non_trainable_weights=[v.numpy() for v in keras_model.non_trainable_weights]
)

### Define Function to Create Training Datasets from Randomly Sampled Clients

In [30]:
def get_sample_clients(dataset, num_clients):
    
    random_indices = np.random.choice(len(dataset.client_ids), size=num_clients, replace=False)
    
    return np.array(dataset.client_ids)[random_indices]
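
If repeatable experiments matter, the client draw can be seeded. The sketch below is a standard-library variant under that assumption; `sample_clients`, the `seed` parameter, and the toy client ids are hypothetical and not part of the notebook's code.

```python
import random


def sample_clients(client_ids, num_clients, seed=None):
    """Pick num_clients distinct ids uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(client_ids, num_clients)


toy_client_ids = [f"client_{i}" for i in range(100)]  # hypothetical ids
for round_num in range(3):
    # Seeding with the round number gives a different but repeatable draw per round.
    picked = sample_clients(toy_client_ids, num_clients=10, seed=round_num)
    print(round_num, picked[:3])
```

Sampling without replacement guarantees that no client appears twice in the same round, matching `replace=False` in the function above.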

### Train Model Across Many Randomly Sampled Clients with Federated Averaging

In [31]:
for round_num in range(NUM_ROUNDS):
    
#     # Examine validation metrics
#     print(f'Evaluating before training round #{round_num} on {NUM_VALIDATION_EXAMPLES} clients.')
#     keras_evaluate(keras_model, state, val_data)
    
    # Sample train clients to create a train dataset
    print(f'Sampling {NUM_TRAIN_CLIENTS} new clients.')
    train_clients = get_sample_clients(train_data, num_clients=NUM_TRAIN_CLIENTS)
    train_datasets = [train_data.create_tf_dataset_for_client(client) for client in train_clients]
    
    # Apply federated training round
    print('Applying federated training round.')
    state, metrics = fed_avg.next(state, train_datasets)
    
    # Examine training metrics
    print(f'Training metrics - loss: {metrics[1]:4.4f}; accuracy: {metrics[0]:4.4f}')
    train_loss.append(metrics[1])
    train_accuracy.append(metrics[0])

Sampling 10 new clients.


Applying federated training round.
Training metrics - loss: 7.0183; accuracy: 0.7115
Sampling 10 new clients.
Applying federated training round.
Training metrics - loss: 1.6740; accuracy: 0.8480
Sampling 10 new clients.
Applying federated training round.
Training metrics - loss: 1.4121; accuracy: 0.8418
Sampling 10 new clients.
Applying federated training round.
Training metrics - loss: 1.1855; accuracy: 0.8563
Sampling 10 new clients.
Applying federated training round.
Training metrics - loss: 1.2930; accuracy: 0.8414
Sampling 10 new clients.
Applying federated training round.
Training metrics - loss: 1.1104; accuracy: 0.8367
Sampling 10 new clients.
Applying federated training round.
Training metrics - loss: 1.2125; accuracy: 0.8303
Sampling 10 new clients.
Applying federated training round.
Training metrics - loss: 1.0321; accuracy: 0.8385
Sampling 10 new clients.
Applying federated training round.
Training metrics - loss: 0.9202; accuracy: 0.8582
Sampling 10 new clients.
Applying f

### Plot Model Objective Function

In [None]:
fig, ax = plt.subplots()
x_axis = range(0, NUM_ROUNDS)
ax.plot(x_axis, train_loss, label='Train')
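# Index validation points by their own length, since validation may run for fewer rounds than training
ax.plot(range(len(val_loss)), val_loss, label='Validation')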
ax.legend(loc='best')
plt.ylabel('Value of Objective Function')
plt.title('Model Objective Function at Each Training Round')
plt.show()

### Plot Model Accuracy

In [None]:
fig, ax = plt.subplots()
x_axis = range(0, NUM_ROUNDS)
ax.plot(x_axis, train_accuracy, label='Train')
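# Index validation points by their own length, since validation may run for fewer rounds than training
ax.plot(range(len(val_accuracy)), val_accuracy, label='Validation')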
ax.legend(loc='best')
plt.ylabel('Accuracy')
plt.title('Model Accuracy at Each Training Round')
plt.show()

### Get Final Evaluation

In [None]:
keras_evaluate(keras_model, state, val_data)

### Generate Text
Text generation requires batch_size=1.

In [None]:
keras_model_batch1.set_weights([v.numpy() for v in keras_model.weights])
print(generate_text(keras_model_batch1, "How's the water today? "))

**Suggested extensions:**

- Use `.repeat(NUM_EPOCHS)` on the client datasets to try multiple epochs of local training (e.g., as in McMahan et al.). See also Federated Learning for Image Classification, which does this.
- Change the `compile()` command to experiment with different optimization algorithms on the client.
- Use the `server_optimizer` argument to `build_federated_averaging_process` to try different algorithms for applying the model updates on the server.
- Use the `client_weight_fn` argument to `build_federated_averaging_process` to try different weightings of the clients. The default weights client updates by the number of examples on the client, but you can use, for example, `client_weight_fn=lambda _: tf.constant(1.0)`.