
# RNN-based language model

## Utility functions and classes

In the cells below, we first check the available GPU, then import the dependencies and define the utility functions and the model classes:

In [2]:
!nvidia-smi

Tue Dec 10 19:47:00 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA A100-SXM4-40GB          Off | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0              44W / 400W |      2MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [3]:
# Import required libraries
import os               # For file and path operations (check_file_exists, extract_dataset)
import urllib.request   # For downloading dataset files from URLs
import tarfile          # For extracting .tar.gz dataset archives
import torch            # Main PyTorch library for tensor operations and deep learning
import torch.nn as nn   # Neural network modules, layers, and utilities
from torch.utils.data import DataLoader, IterableDataset  # For efficient data loading and streaming
import random           # For seeding Python's random number generator (reproducibility)
from tqdm import tqdm   # For progress bars in training and evaluation
import math             # For computing perplexity using exp()
import re               # For preprocessing text (replacing numbers with placeholders)
from transformers import AutoTokenizer # For loading a pre-trained tokenizer

# ----------------------------
# Utility Functions
# ----------------------------

def set_seed(seed):
    """
    Sets random seeds for reproducibility across different Python libraries.
    This ensures that random operations give the same results across runs.

    Args:
        seed (int): Seed value for random number generation
    """
    # Set seed for Python's built-in random module
    random.seed(seed)
    # Set seed for PyTorch's CPU random number generator
    torch.manual_seed(seed)
    # Set seed for PyTorch's GPU random number generator
    torch.cuda.manual_seed_all(seed)
    # Forces cuDNN to use deterministic algorithms for better reproducibility
    torch.backends.cudnn.deterministic = True
    # Disables cuDNN's auto-tuner which finds the best algorithm for your specific input size
    # Ensures consistent behavior but might be slower as it doesn't optimize for input sizes
    torch.backends.cudnn.benchmark = False

class IterableTextDataset(IterableDataset):
    """
    An iterable dataset for processing text data in a memory-efficient way.
    Instead of loading all data into memory, it streams data from disk.
    Inherits from PyTorch's IterableDataset for streaming support.

    Args:
        file_path (str): Path to the text file containing sentences
        tokenizer: Tokenizer object for converting text to tokens
        max_length (int): Maximum sequence length to process (default: 30)
    """
    def __init__(self, file_path, tokenizer, max_length=30):
        # Store file path for reading data
        self.file_path = file_path
        # Store tokenizer for text processing
        self.tokenizer = tokenizer
        # Set maximum sequence length to truncate long sequences
        self.max_length = max_length
        self._count_sentences()

    def __iter__(self):
        """
        Creates an iterator over the dataset.
        This method is called when iterating over the dataset.

        Yields:
            tuple: (input_sequence, target_sequence) pairs for language modeling
                  input_sequence is every token except the last
                  target_sequence is every token except the first, so
                  target_sequence[i] is the token that follows input_sequence[i]
        """
        # Open file in read mode with UTF-8 encoding
        with open(self.file_path, 'r', encoding='utf-8') as f:
            # Process each line (sentence) in the file
            for line in f:
                # Remove leading/trailing whitespace
                sentence = line.strip()
                # Replace every run of digits with the ### placeholder
                # This collapses the many distinct numbers into one pattern, which helps the model generalize
                sentence = re.sub(r'\d+', '###', sentence)

                # Convert sentence to token IDs
                encoded_sentence = self.tokenizer.encode(
                    sentence,
                    max_length=self.max_length,
                    truncation=True
                )

                # Only use sequences with at least 2 tokens
                # (need at least one input and one target token)
                if len(encoded_sentence) >= 2:
                    # Input is all tokens except last
                    input_seq = encoded_sentence[:-1]
                    # Target is all tokens except first
                    target_seq = encoded_sentence[1:]
                    # Convert to PyTorch tensors and yield
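                    yield torch.tensor(input_seq, dtype=torch.long), torch.tensor(target_seq, dtype=torch.long)

    def __len__(self):
        # Number of lines in the file; __iter__ may yield fewer items because
        # it skips lines that encode to fewer than 2 tokens
        return self._num_sentences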

    def _count_sentences(self):
        print(f"Counting sentences in {self.file_path}...")
        with open(self.file_path, 'r', encoding='utf-8') as f:
            self._num_sentences = sum(1 for _ in f)
        print(f"Found {self._num_sentences} sentences in {self.file_path}.")

# ----------------------------
# Download and prepare data
# ----------------------------

def create_collate_fn(tokenizer):
    """
    Creates a collate function for batching sequences of different lengths.
    This function pads shorter sequences to match the longest sequence in the batch.

    Args:
        tokenizer: Tokenizer object containing padding token information

    Returns:
        function: Collate function that handles padding in batches
    """
    def collate_fn(batch):
        # Separate inputs and targets from batch
        input_seqs, target_seqs = zip(*batch)
        # Get padding token ID from tokenizer
        pad_index = tokenizer.pad_token_id
        # Pad input sequences to same length
        input_padded = nn.utils.rnn.pad_sequence(input_seqs, batch_first=True, padding_value=pad_index)
        # Pad target sequences to same length
        target_padded = nn.utils.rnn.pad_sequence(target_seqs, batch_first=True, padding_value=pad_index)
        return input_padded, target_padded
    return collate_fn

def check_file_exists(filename):
    """
    Checks whether a file exists at the given path (relative paths resolve against the current directory).
    Args:
        filename (str): Name of the file to check
    Returns:
        bool: True if file exists, False otherwise
    """
    return os.path.exists(filename)

def download_file(url):
    """
    Downloads a file from the given URL if it doesn't exist locally, handling redirects.
    Forces the downloaded file to be saved as news.tar.gz regardless of URL.

    Args:
        url (str): URL of the file to download
    Returns:
        str: Name of the downloaded file
    """
    # Always use news.tar.gz as the filename, regardless of URL
    filename = "news.tar.gz"

    if not check_file_exists(filename):
        print(f"Downloading dataset from {url}...")
        req = urllib.request.Request(
            url,
            headers={'User-Agent': 'Mozilla/5.0'}
        )
        with urllib.request.urlopen(req) as response:
            with open(filename, 'wb') as out_file:
                out_file.write(response.read())
        print("Download completed.")
    else:
        print(f"{filename} already downloaded.")
    return filename

def is_within_directory(directory, target):
    """
    Checks if a target path is within a specified directory (prevents path traversal).
    Args:
        directory (str): Base directory path
        target (str): Target path to check
    Returns:
        bool: True if target is within directory, False otherwise
    """
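    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    # commonpath compares whole path components; commonprefix compares characters,
    # so it would wrongly accept /data/news-evil as being inside /data/news
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory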

def extract_dataset(filename):
    """
    Extracts train.txt and test.txt from the downloaded archive.
    Includes debug information about archive contents.

    Args:
        filename (str): Name of the archive file
    Returns:
        tuple: Paths to extracted train and test files
    """
    data_dir = os.path.join(os.path.dirname(filename), 'news')
    train_path = os.path.join(data_dir, 'train.txt')
    test_path = os.path.join(data_dir, 'test.txt')

    if check_file_exists(train_path) and check_file_exists(test_path):
        print("Data files already extracted.")
        return train_path, test_path

    print("\nListing archive contents:")
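    with tarfile.open(filename, 'r:gz') as tar:
        for member in tar.getmembers():
            print(f"Archive member: {member.name}")
            # Refuse members that would be written outside the extraction directory
            if not is_within_directory('.', os.path.join('.', member.name)):
                raise ValueError(f"Unsafe path in archive: {member.name}")

        print("\nExtracting files...")
        # Extract into the current directory; the archive is expected to contain news/
        tar.extractall('.')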

    if not (check_file_exists(train_path) and check_file_exists(test_path)):
        raise FileNotFoundError("Required files not found in the archive. Please check the paths above.")

    print("Extraction completed.")
    return train_path, test_path

def create_datasets(train_file, test_file, tokenizer):
    """
    Creates IterableTextDataset objects for training and testing.
    These datasets will stream data from disk instead of loading it all into memory.

    Args:
        train_file (str): Path to training data file
        test_file (str): Path to test data file
        tokenizer: Tokenizer object for text processing

    Returns:
        tuple: (train_dataset, test_dataset) - Dataset objects for training and testing
    """
    # Create training dataset
    train_dataset = IterableTextDataset(train_file, tokenizer)
    # Create test dataset
    test_dataset = IterableTextDataset(test_file, tokenizer)

    # Print dataset sizes
    print(f"Training sentences: {len(train_dataset)}")
    print(f"Test sentences: {len(test_dataset)}")

    return train_dataset, test_dataset

def create_dataloaders(train_dataset, test_dataset, batch_size, collate_fn):
    """
    Creates DataLoader objects that handle batching of the streamed data.
    No shuffling is applied: the datasets are IterableDataset instances, so
    examples arrive in file order.

    Args:
        train_dataset: Training dataset
        test_dataset: Test dataset
        batch_size (int): Number of sequences per batch
        collate_fn: Function to handle padding when creating batches

    Returns:
        tuple: (train_dataloader, test_dataloader) - DataLoader objects
    """
    # Create training data loader
    train_dataloader = DataLoader(
        train_dataset,
        batch_size=batch_size,
        collate_fn=collate_fn,    # Function to handle padding
        num_workers=0             # Number of worker processes (0 = single process)
    )
    # Create test data loader
    test_dataloader = DataLoader(
        test_dataset,
        batch_size=batch_size,
        collate_fn=collate_fn,
        num_workers=0
    )
    return train_dataloader, test_dataloader

def download_and_prepare_data(url, batch_size, tokenizer):
    """
    Main function to handle the complete data preparation pipeline.
    Downloads data, extracts it, and creates necessary dataset objects.

    Args:
        url (str): URL where the dataset archive can be downloaded
        batch_size (int): Batch size for data loading
        tokenizer: Tokenizer object for text processing

    Returns:
        tuple: (train_dataloader, test_dataloader) - Ready-to-use data loaders
    """
    # Step 1: Download dataset archive from URL
    filename = download_file(url)

    # Step 2: Extract training and test files from archive
    train_file, test_file = extract_dataset(filename)

    # Step 3: Create dataset objects for streaming data
    train_dataset, test_dataset = create_datasets(train_file, test_file, tokenizer)

    # Step 4: Create function to handle batch creation
    collate_fn = create_collate_fn(tokenizer)

    # Step 5: Create and return data loaders
    return create_dataloaders(train_dataset, test_dataset, batch_size, collate_fn)

# ----------------------------
# Recurrent Language Model Class
# ----------------------------

def initialize_weights(model):
    """
    Initializes model weights using Xavier uniform initialization for multi-dimensional
    parameters and uniform initialization for biases and other 1D parameters.

    Args:
        model (nn.Module): PyTorch model whose weights need to be initialized
    """
    # Loop through all named parameters in the model
    for name, param in model.named_parameters():
        # Check if parameter has more than 1 dimension (e.g., weight matrices)
        if param.dim() > 1:
            # Use Xavier uniform initialization for weight matrices
            # This helps prevent vanishing/exploding gradients by keeping the variance constant
            nn.init.xavier_uniform_(param)
        else:
            # For 1D parameters (like biases), use simple uniform initialization
            nn.init.uniform_(param)

class ElmanRNNUnit(nn.Module):
    """
    Implementation of a single Elman RNN unit (a simple recurrent neural network cell).
    This is the basic building block of our RNN that processes one time step of input.

    Args:
        emb_dim (int): Dimension of the embedding/hidden state vectors
    """
    def __init__(self, emb_dim):
        super().__init__()
        # Hidden-to-hidden weight matrix: transforms previous hidden state
        # Shape: (emb_dim, emb_dim)
        self.Uh = nn.Parameter(torch.rand(emb_dim, emb_dim))

        # Input-to-hidden weight matrix: transforms current input
        # Shape: (emb_dim, emb_dim)
        self.Wh = nn.Parameter(torch.rand(emb_dim, emb_dim))

        # Bias term added to the sum of transformations
        # Shape: (emb_dim,)
        self.b = nn.Parameter(torch.rand(emb_dim))

    def forward(self, x, h):
        """
        Computes one step of the RNN unit.

        Args:
            x (torch.Tensor): Current input tensor of shape (batch_size, emb_dim)
            h (torch.Tensor): Previous hidden state of shape (batch_size, emb_dim)

        Returns:
            torch.Tensor: New hidden state of shape (batch_size, emb_dim)

        The formula implemented is: h_new = tanh(x @ Wh + h @ Uh + b)
        where @ represents matrix multiplication
        """
        # 1. Transform current input: x @ Wh
        input_transform = x @ self.Wh

        # 2. Transform previous hidden state: h @ Uh
        hidden_transform = h @ self.Uh

        # 3. Add both transformations and bias
        # 4. Apply tanh activation function to get new hidden state
        # tanh squashes values to the range (-1, 1), which keeps hidden-state activations bounded
        return torch.tanh(input_transform + hidden_transform + self.b)

class ElmanRNN(nn.Module):
    """
    Multi-layer Elman RNN implementation that processes entire sequences.
    Stacks multiple RNN units to create a deeper network that can learn more complex patterns.

    Args:
        emb_dim (int): Dimension of embeddings and hidden states
        num_layers (int): Number of stacked RNN layers
    """
    def __init__(self, emb_dim, num_layers):
        super().__init__()
        self.emb_dim = emb_dim
        self.num_layers = num_layers

        # Create a list of RNN units, one for each layer
        # ModuleList is used so PyTorch tracks all parameters
        self.rnn_units = nn.ModuleList(
            [ElmanRNNUnit(emb_dim) for _ in range(num_layers)]
        )

    def forward(self, x):
        """
        Processes input sequence through all RNN layers.

        Args:
            x (torch.Tensor): Input tensor of shape (batch_size, seq_len, emb_dim)

        Returns:
            torch.Tensor: Output tensor of shape (batch_size, seq_len, emb_dim)
        """
        # Get dimensions from input tensor
        batch_size, seq_len, emb_dim = x.size()

        # Initialize hidden states for each layer with zeros
        # Each hidden state has shape (batch_size, emb_dim)
        h_prev = [
            torch.zeros(batch_size, emb_dim, device=x.device)
            for _ in range(self.num_layers)
        ]

        # Will store outputs for each time step
        outputs = []

        # Process each time step
        for t in range(seq_len):
            # Get input for current time step
            input_t = x[:, t]

            # Process through each layer
            for l, rnn_unit in enumerate(self.rnn_units):
                # Compute new hidden state for this layer
                h_new = rnn_unit(input_t, h_prev[l])

                # Update hidden state for this layer
                h_prev[l] = h_new

                # Output of this layer becomes input to next layer
                input_t = h_new

            # Add final layer's output to results
            outputs.append(input_t)

        # Stack all time steps' outputs into a single tensor
        # Shape: (batch_size, seq_len, emb_dim)
        return torch.stack(outputs, dim=1)

class RecurrentLanguageModel(nn.Module):
    """
    Complete language model implementation combining embedding layer,
    multi-layer RNN, and output projection layer.

    The model architecture is:
    1. Input tokens -> Embedding Layer -> Embedded Vectors
    2. Embedded Vectors -> RNN Layers -> Context Vectors
    3. Context Vectors -> Linear Layer -> Vocabulary Predictions

    Args:
        vocab_size (int): Size of the vocabulary (number of unique tokens)
        emb_dim (int): Dimension of embeddings and hidden states
        num_layers (int): Number of RNN layers
        pad_index (int): Index used for padding tokens
    """
    def __init__(self, vocab_size, emb_dim, num_layers, pad_index):
        super().__init__()

        # Save model parameters
        self.vocab_size = vocab_size
        self.emb_dim = emb_dim
        self.num_layers = num_layers
        self.pad_index = pad_index

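        # Embedding layer: converts token indices to dense vectors
        # padding_idx zero-initializes the pad row and excludes it from gradient updates
        # (note: initialize_weights() later overwrites that row with Xavier values)
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=pad_index)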

        # RNN layers for processing sequences
        self.rnn = ElmanRNN(
            emb_dim=emb_dim,
            num_layers=num_layers
        )

        # Final linear layer to convert RNN outputs to vocabulary predictions
        # Output size is vocab_size to get logits for each possible token
        self.fc = nn.Linear(emb_dim, vocab_size)

    def forward(self, x):
        """
        Processes input sequences through the entire model.

        Args:
            x (torch.Tensor): Input tensor of token indices
                            Shape: (batch_size, seq_len)

        Returns:
            torch.Tensor: Output logits for next token prediction
                         Shape: (batch_size, seq_len, vocab_size)

        Process:
        1. Convert token indices to embeddings
        2. Process embeddings through RNN layers
        3. Project RNN outputs to vocabulary size
        """
        # Convert token indices to embeddings
        # Shape: (batch_size, seq_len) -> (batch_size, seq_len, emb_dim)
        embeddings = self.embedding(x)

        # Process through RNN layers
        # Shape: (batch_size, seq_len, emb_dim) -> (batch_size, seq_len, emb_dim)
        rnn_output = self.rnn(embeddings)

        # Project to vocabulary size to get logits
        # Shape: (batch_size, seq_len, emb_dim) -> (batch_size, seq_len, vocab_size)
        logits = self.fc(rnn_output)

        return logits

def compute_loss_and_perplexity(model, vocab_size, val_dataloader, criterion, device, max_sentences=1000):
    """
    Evaluates model performance by computing loss and perplexity on validation data.

    Args:
        model (nn.Module): The language model to evaluate
        vocab_size (int): Size of the vocabulary, used to flatten the logits
        val_dataloader (DataLoader): Validation data loader
        criterion: Loss function (usually CrossEntropyLoss)
        device: Device to run computation on (cuda/cpu)
        max_sentences (int): Maximum number of sentences to evaluate

    Returns:
        tuple: (average_loss, perplexity, sentences_processed)
    """
    # Set model to evaluation mode (disables dropout, etc.)
    model.eval()

    # Initialize counters for loss calculation
    total_loss = 0.0          # Accumulator for total loss across all batches
    total_tokens = 0          # Counter for total number of tokens processed
    sentences_processed = 0    # Counter for number of sentences processed

    # Disable gradient computation for efficiency
    with torch.no_grad():
        # Iterate through validation data with progress bar
        for input_seq, target_seq in tqdm(val_dataloader, desc="Evaluating", leave=False):
            # Move input and target sequences to specified device
            input_seq = input_seq.to(device)      # Shape: (batch_size, seq_len)
            target_seq = target_seq.to(device)    # Shape: (batch_size, seq_len)

            # Get current batch size (might be smaller for last batch)
            batch_size_current = input_seq.size(0)

            # Forward pass through the model
            output = model(input_seq)                     # Shape: (batch_size, seq_len, vocab_size)
            # Reshape output and target for loss calculation
            output = output.view(-1, vocab_size)          # Shape: (batch_size * seq_len, vocab_size)
            target = target_seq.view(-1)                  # Shape: (batch_size * seq_len)

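            # Non-padding positions; criterion.ignore_index holds the pad token ID
            # (assumes criterion is nn.CrossEntropyLoss built with ignore_index=pad_token_id)
            mask = target != criterion.ignore_index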

            # Compute loss
            loss = criterion(output, target)

            # Update counters
            # multiply loss by number of tokens to get total loss
            loss_value = loss.item() * mask.sum().item()
            total_loss += loss_value
            # Add number of actual tokens (non-padding) processed
            total_tokens += mask.sum().item()

            # Update sentence counter and check if we've reached maximum
            sentences_processed += batch_size_current
            if sentences_processed >= max_sentences:
                break

    # Calculate final metrics
    average_loss = total_loss / total_tokens           # Normalize loss by number of tokens
    perplexity = math.exp(average_loss)               # Convert loss to perplexity (lower is better)
    return average_loss, perplexity, sentences_processed

def perform_model_evaluation(model, test_dataloader, criterion, tokenizer, device, contexts):
    """
    Perform evaluation of the model including loss calculation, perplexity, and text generation.

    Args:
        model: The neural network model
        test_dataloader: DataLoader containing test/validation data
        criterion: Loss function
        tokenizer: Tokenizer for text generation
        device: Device to run computations on (cuda/cpu)
        contexts: List of context strings for text generation

    Returns:
        tuple: (average_loss, perplexity)
    """
    # Switch to evaluation mode
    model.eval()

    # Compute metrics
    average_loss, perplexity, sentences_processed = compute_loss_and_perplexity(
        model=model,
        vocab_size=len(tokenizer),
        val_dataloader=test_dataloader,
        criterion=criterion,
        device=device,
        max_sentences=1000
    )

    print(f"Validation Average Loss: {average_loss:.4f}, Perplexity: {perplexity:.2f}")
    print(f"Computed on {sentences_processed} sentences")

    # Generate text using the contexts
    print("Generating text based on contexts using generate_text:\n")
    for context in contexts:
        generated_text = generate_text(
            model=model,
            start_string=context,
            tokenizer=tokenizer,
            device=device,
            max_length=50
        )
        print(f"\nContext: {context}")
        print(f"\nGenerated text: {generated_text}\n")

    return average_loss, perplexity

def generate_text(model, start_string, tokenizer, device, max_length=50):
    """
    Generates text continuation from a given start string using greedy decoding.
    This method always chooses the most likely next token.

    Args:
        model (nn.Module): Trained language model
        start_string (str): Initial text to continue from
        tokenizer: Tokenizer for text processing
        device: Device to run generation on (cuda/cpu)
        max_length (int): Maximum length of generated sequence

    Returns:
        str: Decoded text of the start string followed by the generated continuation
    """
    # Set model to evaluation mode
    model.eval()

    # Convert start string to token ids and move to device
    # return_tensors='pt' returns PyTorch tensor instead of list
    tokens = tokenizer.encode(start_string, return_tensors='pt', max_length=max_length, truncation=True).to(device)

    # Initialize generated sequence with input tokens
    generated = tokens

    # Generate new tokens one at a time
    # no_grad avoids building the autograd graph during inference
    with torch.no_grad():
        for _ in range(max_length):
            # Get model's predictions
            output = model(generated)                    # Shape: (1, seq_len, vocab_size)
            # Get logits for the next token (last position)
            next_token_logits = output[0, -1, :]        # Shape: (vocab_size)

            # Choose token with highest probability (greedy decoding)
            # unsqueeze twice to match expected shape (1, 1)
            next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(0).unsqueeze(0)

            # Add new token to generated sequence
            generated = torch.cat((generated, next_token_id), dim=1)

            # Stop if end of sequence token is generated
            if next_token_id.item() == tokenizer.eos_token_id:
                break

    # Convert token ids back to text
    generated_text = tokenizer.decode(generated.squeeze().tolist())
    return generated_text

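def save_model(model, tokenizer, file_prefix):
    """
    Saves the model weights and constructor arguments to {file_prefix}_model.pth
    and the tokenizer to the {file_prefix}_tokenizer directory.

    Args:
        model (RecurrentLanguageModel): Model to save
        tokenizer: Tokenizer to save alongside the model
        file_prefix (str): Prefix for the output file and directory names
    """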
    model_state = {
        'state_dict': model.state_dict(),
        'vocab_size': model.vocab_size,
        'emb_dim': model.emb_dim,
        'num_layers': model.num_layers,
        'pad_index': model.pad_index,
        'training': model.training  # Save training state
    }

    torch.save(model_state, f'{file_prefix}_model.pth')
    tokenizer.save_pretrained(f'{file_prefix}_tokenizer')

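def load_model(file_prefix):
    """
    Loads a model and tokenizer previously written by save_model.

    Args:
        file_prefix (str): Prefix used when the model was saved
    Returns:
        tuple: (model, tokenizer), with the model on GPU if available and in evaluation mode
    """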
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Load state dict to the correct device first
    model_state = torch.load(f'{file_prefix}_model.pth', map_location=device)

    # Create model and move it to device before loading state dict
    model = RecurrentLanguageModel(
        model_state['vocab_size'],
        model_state['emb_dim'],
        model_state['num_layers'],
        model_state['pad_index']
    ).to(device)

    # Load state dict after model is on correct device
    model.load_state_dict(model_state['state_dict'])

    # Keep model on device
    model.eval()  # Set to evaluation mode

    tokenizer = AutoTokenizer.from_pretrained(f'{file_prefix}_tokenizer')
    return model, tokenizer

def get_hyperparameters():
    """
    Returns default hyperparameters for model training.

    Returns:
        tuple: (emb_dim, num_layers, batch_size, learning_rate, num_epochs)
    """
    emb_dim = 128         # Embedding dimension
    num_layers = 2        # Number of RNN layers
    batch_size = 128      # Training batch size
    learning_rate = 0.001 # Learning rate for optimization
    num_epochs = 1        # Number of training epochs
    return emb_dim, num_layers, batch_size, learning_rate, num_epochs

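To make the input/target construction in `IterableTextDataset.__iter__` concrete, here is a minimal sketch that uses whitespace splitting as a stand-in for the real tokenizer. The `toy_encode` and `make_pair` helpers and the toy vocabulary are hypothetical and not part of the notebook; only the digit replacement and the one-position shift mirror the class above.

```python
import re

def toy_encode(sentence, vocab):
    """Hypothetical stand-in for tokenizer.encode: one ID per whitespace-separated word."""
    return [vocab.setdefault(word, len(vocab)) for word in sentence.split()]

def make_pair(line, vocab, max_length=30):
    # Same preprocessing as the dataset class: strip, then replace digit runs with ###
    sentence = re.sub(r'\d+', '###', line.strip())
    # Truncate to max_length token IDs
    ids = toy_encode(sentence, vocab)[:max_length]
    if len(ids) < 2:
        return None  # need at least one input token and one target token
    # Input drops the last token; target drops the first
    return ids[:-1], ids[1:]

vocab = {}
input_ids, target_ids = make_pair("Stocks rose 3 percent in 2024\n", vocab)
print(input_ids)   # [0, 1, 2, 3, 4]
print(target_ids)  # [1, 2, 3, 4, 2]
```

Both numbers map to the same ID (2) because they become the same `###` placeholder, and at every position the target holds the token that follows the corresponding input token.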

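The recurrence in `ElmanRNNUnit.forward`, `h_new = tanh(x @ Wh + h @ Uh + b)`, can be traced by hand on tiny inputs. The sketch below uses plain Python lists instead of tensors, with made-up 2-dimensional weights (all values are illustrative assumptions, and `vec_mat` and `elman_step` are hypothetical helpers). It processes a two-step sequence starting from a zero hidden state, as `ElmanRNN.forward` does for a single layer.

```python
import math

def vec_mat(v, m):
    """Row vector times matrix: (len(v),) @ (len(v), n) -> (n,)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def elman_step(x, h, Wh, Uh, b):
    # h_new = tanh(x @ Wh + h @ Uh + b), computed element by element
    xw = vec_mat(x, Wh)
    hu = vec_mat(h, Uh)
    return [math.tanh(xw[j] + hu[j] + b[j]) for j in range(len(b))]

# Illustrative weights (assumed values, not taken from a trained model)
Wh = [[1.0, 0.0], [0.0, 1.0]]   # input-to-hidden: identity
Uh = [[0.5, 0.0], [0.0, 0.5]]   # hidden-to-hidden: halves the previous state
b = [0.0, 0.0]

sequence = [[1.0, 0.0], [0.0, 1.0]]  # two time steps
h = [0.0, 0.0]                       # zero initial hidden state
outputs = []
for x in sequence:
    h = elman_step(x, h, Wh, Uh, b)
    outputs.append(h)

for t, out in enumerate(outputs):
    print(t, [round(v, 4) for v in out])
```

The first component of the second output is nonzero even though the second input has a zero there: it comes entirely from the previous hidden state passed through `Uh`, which is how the network carries context forward.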

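`compute_loss_and_perplexity` weights each batch's mean loss by its count of non-padding tokens before averaging, then exponentiates the result. The sketch below reproduces that arithmetic on made-up per-batch numbers (the loss values and token counts are illustrative assumptions, and `corpus_loss_and_perplexity` is a hypothetical helper). It also computes the plain mean of batch losses for comparison.

```python
import math

def corpus_loss_and_perplexity(batches):
    """batches: iterable of (mean_loss_per_token, non_padding_token_count) pairs."""
    total_loss = 0.0
    total_tokens = 0
    for mean_loss, n_tokens in batches:
        # Recover the batch's summed loss from its per-token mean
        total_loss += mean_loss * n_tokens
        total_tokens += n_tokens
    average_loss = total_loss / total_tokens
    # Perplexity is the exponential of the average per-token loss
    return average_loss, math.exp(average_loss)

# Two batches with different numbers of real tokens (assumed values)
batches = [(2.0, 300), (4.0, 100)]
avg, ppl = corpus_loss_and_perplexity(batches)

# Unweighted mean of batch losses, for comparison
naive = sum(loss for loss, _ in batches) / len(batches)

print(avg)    # 2.5
print(naive)  # 3.0
print(ppl)
```

The token-weighted average (2.5) differs from the unweighted mean (3.0) because the second batch contributes only a quarter of the tokens; weighting by token count gives the true per-token loss over all batches.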
## Training the language model

In the cell below, we load the data, train, and save the language model:

In [4]:
# ----------------------------
# Main training loop for a Recurrent Neural Network Language Model
# This script handles the entire training process including data loading,
# model training, validation, and text generation
# ----------------------------

if __name__ == "__main__":
    # Initialize random seeds to ensure reproducible results
    set_seed(42)

    # Check for CUDA-capable GPU and set the device accordingly
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Retrieve model architecture and training hyperparameters from configuration
    # emb_dim: dimensionality of word embeddings and hidden states
    # num_layers: number of recurrent layers in the model
    # batch_size: mini-batch size
    # learning_rate: step size for optimizer updates
    # num_epochs: number of complete passes through the training dataset
    emb_dim, num_layers, batch_size, learning_rate, num_epochs = get_hyperparameters()

    # Initialize the tokenizer using Microsoft's Phi-3.5-mini model
    tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")

    # Get the size of the vocabulary that the model needs to handle
    vocab_size = len(tokenizer)

    # Download the news dataset and create DataLoader objects for training and testing
    # DataLoaders handle batching and padding (the streamed data is not shuffled)
    data_url = "https://www.thelmbook.com/data/news"
    train_dataloader, test_dataloader = download_and_prepare_data(data_url, batch_size, tokenizer)

    # Initialize the RNN language model with specified architecture parameters
    # vocab_size: determines output layer dimensionality
    # emb_dim: size of word embeddings and hidden states
    # num_layers: number of RNN layers
    # pad_token_id: special token ID used for padding shorter sequences
    model = RecurrentLanguageModel(vocab_size, emb_dim, num_layers, tokenizer.pad_token_id)

    # Move the model to GPU if available
    model.to(device)

    # Initialize model weights using custom initialization scheme
    # This is important for stable training of deep neural networks
    initialize_weights(model)

    # Calculate and display the total number of trainable parameters in the model
    total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Total trainable parameters: {total_params}\n")

    # Initialize the loss function (Cross Entropy) for training
    # ignore_index=pad_token_id ensures that padding tokens don't contribute to the loss
    # This prevents the model from learning to predict padding tokens
    criterion = nn.CrossEntropyLoss(ignore_index=tokenizer.pad_token_id)

    # Initialize the AdamW optimizer with specified learning rate
    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

    # Set evaluation interval (number of examples after which to perform validation)
    # 200,000 examples balances training time against monitoring frequency
    eval_interval = 200_000
    examples_processed = 0  # Counter for tracking progress toward next evaluation

    # Define test contexts for generating sample text during evaluation
    contexts = [
        "Moscow",
        "New York",
        "The hurricane",
        "The President"
    ]

    # Main training loop - iterate through specified number of epochs
    for epoch in range(num_epochs):
        # Set model to training mode
        model.train()
        print(f"Starting Epoch {epoch+1}/{num_epochs}, Model in training mode: {model.training}")

        # Initialize tracking variables for this epoch
        total_loss = 0.0      # Accumulator for loss across all batches
        total_tokens = 0      # Counter for actual tokens processed (excluding padding)
        # Create progress bar for monitoring training progress
        progress_bar = tqdm(train_dataloader, desc=f"Epoch {epoch+1}/{num_epochs}")

        # Iterate through batches in the training data
        for batch_idx, (input_seq, target_seq) in enumerate(progress_bar):
            # Move input and target sequences to GPU if available
            input_seq = input_seq.to(device)
            target_seq = target_seq.to(device)
            # Get current batch dimensions for reshaping operations
            batch_size_current, seq_len = input_seq.shape

            # Clear gradients from previous batch
            # This is necessary as PyTorch accumulates gradients by default
            optimizer.zero_grad()

            # Forward pass: get model predictions for this batch
            # output shape: (batch_size, seq_len, vocab_size)
            output = model(input_seq)

            # Reshape output and target tensors for loss computation
            # - output: reshape to (batch_size * seq_len, vocab_size) for CrossEntropyLoss
            # - target: reshape to (batch_size * seq_len) to match CrossEntropyLoss requirements
            output = output.reshape(batch_size_current * seq_len, vocab_size)
            target = target_seq.reshape(batch_size_current * seq_len)

            # Count the non-padding tokens in the target.
            # criterion(output, target) returns the mean loss per non-padding token in this batch.
            # To average correctly over several batches, multiply that mean by the token count to
            # recover the batch's summed loss, then divide the accumulated sum by the accumulated token count.
            non_padding_token_count = (target != tokenizer.pad_token_id).sum().item()

            # Compute loss between model predictions and actual targets
            loss = criterion(output, target)
            # Backward pass: compute gradients of loss with respect to model parameters
            loss.backward()
            # Update model parameters using calculated gradients
            optimizer.step()

            # Calculate actual loss value for this batch
            # Multiply the loss per token by number of non-padding tokens to get total loss for the batch
            loss_value = loss.item() * non_padding_token_count
            # Accumulate total loss for epoch statistics
            total_loss += loss_value
            # Accumulate total number of actual tokens processed
            total_tokens += non_padding_token_count
            # Increment counter for examples processed
            examples_processed += batch_size_current

            # Update progress bar with current batch loss
            progress_bar.set_postfix({'loss': f"{loss.item():.4f}"})

            # Periodic evaluation after processing specified number of examples
            if examples_processed >= eval_interval:
                # Calculate average loss over the last eval_interval examples
                avg_loss = total_loss / total_tokens
                print(f"\nAfter {examples_processed} examples, Average Loss: {avg_loss:.4f}")

                # Switch to evaluation mode
                model.eval()

                # Compute validation metrics
                # average_loss: mean loss on validation set
                # perplexity: exponential of average loss, lower is better
                # sentences_processed: number of validation sequences evaluated
                average_loss, perplexity, sentences_processed = compute_loss_and_perplexity(
                    model=model,
                    vocab_size=vocab_size,
                    val_dataloader=test_dataloader,
                    criterion=criterion,
                    device=device,
                    max_sentences=1000  # Limit validation to 1000 sentences for speed
                )
                print(f"Validation Average Loss: {average_loss:.4f}, Perplexity: {perplexity:.2f}")
                print(f"Computed on {sentences_processed} sentences")

                # Generate sample texts to qualitatively assess model performance
                print("Generating text based on contexts using generate_text:\n")
                for context in contexts:
                    # Generate text continuation for each test context
                    generated_text = generate_text(
                        model=model,
                        start_string=context,
                        tokenizer=tokenizer,
                        device=device,
                        max_length=50  # Limit generation to 50 tokens for brevity
                    )
                    print(f"\nContext: {context}")
                    print(f"\nGenerated text: {generated_text}")

                # Switch back to training mode for continued training
                model.train()
                # Reset counters for next evaluation interval
                examples_processed = 0
                total_loss = 0.0
                total_tokens = 0

        # End-of-epoch reporting
        if total_tokens > 0:
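            # Average loss over the tokens seen since the last periodic evaluation
            # (the accumulators are reset after each evaluation, so this is not a whole-epoch average)
            avg_loss = total_loss / total_tokens
            print(f"\nEpoch {epoch+1}/{num_epochs}, Average Loss: {avg_loss:.4f}")
        else:
            # Edge case: no tokens were processed since the last evaluation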
            print(f"\nEpoch {epoch+1}/{num_epochs} completed.")

        # Perform end-of-epoch validation
        model.eval()
        average_loss, perplexity, sentences_processed = compute_loss_and_perplexity(
            model=model,
            vocab_size=vocab_size,
            val_dataloader=test_dataloader,
            criterion=criterion,
            device=device
        )
        print(f"Validation Average Loss: {average_loss:.4f}, Perplexity: {perplexity:.2f}\n")

        # Reset to training mode for next epoch
        model.train()

    # Save the trained model and tokenizer for later use
    # This includes model architecture, weights, and tokenizer configuration
    model_name = "RNN_LM"
    save_model(model, tokenizer, model_name)
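
The training loop above accumulates `loss.item() * non_padding_token_count` rather than the raw per-batch loss. The sketch below illustrates why, using plain Python and made-up per-batch numbers (they are hypothetical, not taken from the run): when batches contain different numbers of non-padding tokens, the mean of the per-batch means differs from the true per-token average.

```python
# Hypothetical per-batch records: (mean loss per non-padding token, non-padding token count).
# These values are invented for illustration only.
batches = [(5.0, 100), (4.0, 300), (6.0, 50)]

total_loss = 0.0
total_tokens = 0
for mean_loss, token_count in batches:
    # Undo the per-batch averaging to recover the summed loss of the batch
    total_loss += mean_loss * token_count
    total_tokens += token_count

# Per-token average over all batches (what the training loop reports)
token_weighted_avg = total_loss / total_tokens

# Naive alternative: average the per-batch means, ignoring batch sizes
naive_avg = sum(mean_loss for mean_loss, _ in batches) / len(batches)

print(f"token-weighted: {token_weighted_avg:.4f}, mean of batch means: {naive_avg:.4f}")
```

The token-weighted figure is pulled toward the largest batch, which is the behaviour one would want when reporting a per-token loss.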

tokenizer_config.json:   0%|          | 0.00/3.98k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/306 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

Downloading dataset from https://www.thelmbook.com/data/news...
Download completed.

Listing archive contents:
Archive member: news
Archive member: news/train.txt
Archive member: news/test.txt

Extracting files...
Extraction completed.
Counting sentences in news/train.txt...
Found 22034911 sentences in news/train.txt.
Counting sentences in news/test.txt...
Found 449693 sentences in news/test.txt.
Training sentences: 22034911
Test sentences: 449693
Total trainable parameters: 8292619
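
The counts in this log are consistent with a batch size of 128. That value is an inference, since the batch size is not shown in this section. The sketch below redoes the arithmetic under that assumption; it also assumes that `compute_loss_and_perplexity` stops after the batch that reaches `max_sentences`, which is a guess about its behaviour rather than something the code above confirms.

```python
import math

# Sentence counts as printed in the log
train_sentences = 22_034_911
test_sentences = 449_693

# ASSUMPTION: inferred from the log figures, not stated in this section
batch_size = 128

eval_interval = 200_000   # as set in the training code
max_sentences = 1000      # as passed to compute_loss_and_perplexity

# Number of batches needed to cover each split (the last batch may be partial)
train_batches = math.ceil(train_sentences / batch_size)
test_batches = math.ceil(test_sentences / batch_size)

# `examples_processed >= eval_interval` is checked once per batch,
# so it first becomes true at a whole number of batches
batches_per_eval = math.ceil(eval_interval / batch_size)
examples_at_eval = batches_per_eval * batch_size

# ASSUMPTION: validation stops after the batch that reaches max_sentences
sentences_evaluated = math.ceil(max_sentences / batch_size) * batch_size

print(train_batches, test_batches, examples_at_eval, sentences_evaluated)
# prints: 172148 3514 200064 1024
```

These values match the progress-bar totals (172148 and 3514), the "After 200064 examples" messages, and the "Computed on 1024 sentences" lines in the output.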

Starting Epoch 1/1, Model in training mode: True


Epoch 1/1:   1%|          | 1561/172148 [01:11<2:11:35, 21.61it/s, loss=5.7836]


After 200064 examples, Average Loss: 6.3757



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:43, 33.76it/s]

Validation Average Loss: 5.5840, Perplexity: 266.13
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow , the ##-year-old , who was a lot of the world . '' he said . 's the first time . '' he said . 's the first time , the first time of the time , I 've been a lot of

Context: New York

Generated text: New York , ## , was a lot of the world . '' he said . 's the first time . '' he said . 's the first time , the first time of the time , I 've been a lot of the world . '' he said

Context: The hurricane

Generated text: The hurricane , the first-year-old , who was a lot of the world . '' he said . 's the first time . '' he said . 's the first time , the first time of the time , I 've been a lot of


Epoch 1/1:   1%|          | 1567/172148 [01:16<17:28:43,  2.71it/s, loss=5.6652]


Context: The President

Generated text: The President was a lot of the world . '' he said . 's the first time . '' he said . 's the first time , 's the first time . '' he said . 's the first time . '' he said . 's the
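
The validation loss and perplexity reported in this log are two views of the same number: as the code comment notes, perplexity is the exponential of the average loss, and `nn.CrossEntropyLoss` uses the natural logarithm. A minimal check in plain Python, using the first three loss/perplexity pairs copied from the log (small differences are expected because the printed losses are rounded to four decimals):

```python
import math

# (validation average loss, reported perplexity) pairs copied from the log
reported = [(5.5840, 266.13), (5.2107, 183.23), (5.0319, 153.22)]

# Perplexity recomputed as exp(average loss)
recomputed = [math.exp(avg_loss) for avg_loss, _ in reported]

for (avg_loss, ppl), value in zip(reported, recomputed):
    print(f"loss={avg_loss:.4f}  reported={ppl:.2f}  exp(loss)={value:.2f}")
```

One way to read the figure: a perplexity of about 266 means the model is, on average, as uncertain as if it were choosing uniformly among roughly 266 tokens at each step.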


Epoch 1/1:   2%|▏         | 3123/172148 [02:29<2:17:25, 20.50it/s, loss=5.4569]


After 200064 examples, Average Loss: 5.4135



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:40, 34.92it/s]

Validation Average Loss: 5.2107, Perplexity: 183.23
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's mother , who was a `` to be able to be a good way . '' he said . 's the time . '' he said . 's the time . '' he said . 's the time . '' he said . 's

Context: New York

Generated text: New York City , who was a `` to be able to be a good way . '' he said . 's the time . '' he said . 's the time . '' he said . 's the time . '' he said . 's the time

Context: The hurricane

Generated text: The hurricane , the ##-year-old was a `` to be able to be a good way . '' he said . 's the time . '' he said . 's the time . '' he said . 's the time . '' he said .


Epoch 1/1:   2%|▏         | 3129/172148 [02:30<6:51:10,  6.85it/s, loss=5.2555]


Context: The President

Generated text: The President , ## , said : 'We are not a good way . '' he said . 's the time . '' he said . 's the time . '' he said . 's the time . '' he said . 's the time . ''


Epoch 1/1:   3%|▎         | 4688/172148 [03:43<2:07:05, 21.96it/s, loss=5.1191]


After 200064 examples, Average Loss: 5.1565



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:37, 35.94it/s]

Validation Average Loss: 5.0319, Perplexity: 153.22
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's mother , who was a key of the most of the most of the most of the most of the most of the most of the most of the most of the most of the most of the most of the most of the most of the most

Context: New York

Generated text: New York City Council , who was a second-time . 's a lot of people . 's a lot of people . '' he said . 's a lot of people . 's a lot of . '' . 's a time . '' said

Context: The hurricane

Generated text: The hurricane was a free kick in the #### . ' I 'm not going to be a bit of the right . '' . 's a time . '' said the . 's the time . '' said the . 's the time . '' said the


Epoch 1/1:   3%|▎         | 4691/172148 [03:44<8:40:39,  5.36it/s, loss=5.0359]


Context: The President

Generated text: The President of the UK , the ##-year-old was a `` <rare> '' . 's the right . '' he said . 's a lot of people . 's a lot of . '' . 's a time . '' said the


Epoch 1/1:   4%|▎         | 6250/172148 [04:58<2:08:52, 21.46it/s, loss=4.9857]


After 200064 examples, Average Loss: 5.0092



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:39, 35.34it/s]

Validation Average Loss: 4.9154, Perplexity: 136.38
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's mother , who was a ##-year-old . ' I 'm not going to be a lot of people . '' said the . 's . '' said the ##-year-old was a `` <rare> ' , and

Context: New York

Generated text: New York , the ##-year-old was a `` <rare> ' , and the family 's mother , who was a ##-year-old . ' I 'm not going to be a lot of people . '' said the . 's

Context: The hurricane

Generated text: The hurricane was a very good thing . ' '' he said . 's the time . 's the time . 's the time . 's the time . 's the time . 's the time . 's the time . 's the time


Epoch 1/1:   4%|▎         | 6256/172148 [05:00<6:43:07,  6.86it/s, loss=5.0330]


Context: The President

Generated text: The President Barack Obama said the ##-year-old was a `` <rare> ' , '' said the former <rare> , said : ' I 'm not going to be a lot of people . '' said the . 's . ''


Epoch 1/1:   5%|▍         | 7813/172148 [06:13<2:08:31, 21.31it/s, loss=4.8549]


After 200064 examples, Average Loss: 4.9085



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:40, 34.83it/s]

Validation Average Loss: 4.8310, Perplexity: 125.34
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's first-degree murder , was a `` <rare> '' . 's the end of the day . '' said . '' he said . 's the time . '' said . 's ) . '' he said . 's the

Context: New York

Generated text: New York City Council , the former presidential candidate , who was a `` <rare> '' . 's the end of the day . '' said . '' he said . 's the time . '' said . 's ) . '' he said . '

Context: The hurricane

Generated text: The hurricane was a very good way to the . 's ] . '' he said . 's the time . '' said . 's ) . '' he said . 's the time . '' said . 's ) . '' he said . 's


Epoch 1/1:   5%|▍         | 7819/172148 [06:14<6:35:35,  6.92it/s, loss=4.7741]


Context: The President

Generated text: The President Barack Obama has been a `` to be a good way to get a lot of people who are not going to be a good way . '' . 's . '' said the . 's ] . '' he said . 's the time


Epoch 1/1:   5%|▌         | 9375/172148 [07:27<2:03:30, 21.97it/s, loss=4.8292]


After 200064 examples, Average Loss: 4.8369



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:39, 35.12it/s]

Validation Average Loss: 4.7661, Perplexity: 117.47
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's first-team-class-year-old was also charged with murdering a few months ago . 's a . ' '' he said . 's the time . 's ] . 's a . '' . 's ] .

Context: New York

Generated text: New York City Council said the ##-year-old was found in the first half of the first time . 's a . ' '' he said . 's the time . 's ] . 's a . '' . 's ] . 's

Context: The hurricane

Generated text: The hurricane of the <rare> , the world 's most important thing you can be able to be able to be able to get out of the world . '' said . 's the right . 's ] . 's a . '' . 's


Epoch 1/1:   5%|▌         | 9381/172148 [07:29<6:28:31,  6.98it/s, loss=4.7916]


Context: The President

Generated text: The President of the National Assembly of the National Institute of Defence , which is also known as a result of the first time . 's a . ' '' he said . 's the time . 's ] . 's a . '' . 's


Epoch 1/1:   6%|▋         | 10938/172148 [08:42<2:10:36, 20.57it/s, loss=4.7031]


After 200064 examples, Average Loss: 4.7787



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:44, 33.54it/s]

Validation Average Loss: 4.7127, Perplexity: 111.36
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow is the first time in the first time in the ####s , which is the first time in the ####s , which is the first time in the ####s . 's the . '' . 's the . '' . 's the . ''

Context: New York

Generated text: New York City Council said the ##-year-old was a `` serious condition '' . 's the . '' . 's the . '' . 's the . '' . 's the . '' . 's the . '' . 's the .

Context: The hurricane

Generated text: The hurricane is a very good way to get the . '' . 's the . '' . 's the . '' . 's the . '' . 's the . '' . 's the . '' . 's the . '' . 's the


Epoch 1/1:   6%|▋         | 10943/172148 [08:43<7:10:08,  6.25it/s, loss=4.6813]


Context: The President

Generated text: The President Barack Obama said the ##-year-old was a `` serious condition '' . 's the . '' . 's the . '' . 's the . '' . 's the . '' . 's the . '' . 's


Epoch 1/1:   7%|▋         | 12502/172148 [09:57<2:07:10, 20.92it/s, loss=4.8597]


After 200064 examples, Average Loss: 4.7274



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:40, 34.83it/s]

Validation Average Loss: 4.6745, Perplexity: 107.18
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's official said the report is not the case . '' . 's ] . ' '' he said . '' said . 's the . '' . 's Office . '' said . 's the . '' . 's Office . '' said

Context: New York

Generated text: New York City Council said the ##-year-old was a `` unprecedented '' of the country 's most-elected . '' . 's ] . ' '' he said . '' said . 's the . '' . 's Office

Context: The hurricane

Generated text: The hurricane , which is the most popularity of the world 's most of the world . '' . 's ] . ' '' he said . '' said . 's the . '' . 's Office . '' said . 's the . '' .


Epoch 1/1:   7%|▋         | 12508/172148 [09:58<6:26:32,  6.88it/s, loss=4.7940]


Context: The President

Generated text: The President of the city of the city of the city of the city of the city of the city of the city of the city of the city . '' . 's ] . ' '' he said . '' said . 's the . '' . 's


Epoch 1/1:   8%|▊         | 14066/172148 [11:11<2:00:49, 21.81it/s, loss=4.7040]


After 200064 examples, Average Loss: 4.6893



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:46, 33.10it/s]

Validation Average Loss: 4.6457, Perplexity: 104.14
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's most important thing is the most important thing that is the most important thing . '' ' said the . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he

Context: New York

Generated text: New York City Council , who has been charged with murdering the death of the ##-year-old son , who was born in the ####s . ' '' he said . ' '' he said . ' '' he said . ' '' he said . '

Context: The hurricane

Generated text: The hurricane of the <rare> , the U.S. military has been linked to the U.S. military . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' ''


Epoch 1/1:   8%|▊         | 14069/172148 [11:13<8:19:29,  5.27it/s, loss=4.7456]


Context: The President

Generated text: The President 's death was not the first time to be a `` <rare> '' . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he


Epoch 1/1:   9%|▉         | 15627/172148 [12:26<2:02:56, 21.22it/s, loss=4.5647]


After 200064 examples, Average Loss: 4.6595



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:38, 35.50it/s]

Validation Average Loss: 4.6143, Perplexity: 100.92
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's most popular director , who is the most popular in the world , and the other side of the world . '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he

Context: New York

Generated text: New York City Council said the ##-year-old was a `` unfortunate decision . '' ' said the company said . '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' ''

Context: The hurricane

Generated text: The hurricane was found in the area of the city 's most popular in the world . '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said .


Epoch 1/1:   9%|▉         | 15633/172148 [12:27<6:15:44,  6.94it/s, loss=4.5966]


Context: The President

Generated text: The President of the U.S. government has been accused of the death of the ##-year-old . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said


Epoch 1/1:  10%|▉         | 17192/172148 [13:41<2:07:52, 20.20it/s, loss=4.6015]


After 200064 examples, Average Loss: 4.6339



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:44, 33.50it/s]

Validation Average Loss: 4.5910, Perplexity: 98.59
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's largest city of the country , and the U.S. military spokesman said . 'S ] . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . '

Context: New York

Generated text: New Yorkers have been a very good way to get the way . '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said

Context: The hurricane

Generated text: The hurricane is a big-day location . '' . ' ] is a good job . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said


Epoch 1/1:  10%|▉         | 17195/172148 [13:43<8:19:29,  5.17it/s, loss=4.5845]


Context: The President

Generated text: The President 's newborn was a `` unacceptable '' . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he


Epoch 1/1:  11%|█         | 18755/172148 [14:55<1:58:46, 21.53it/s, loss=4.5637]


After 200064 examples, Average Loss: 4.6104



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:43, 34.05it/s]

Validation Average Loss: 4.5709, Perplexity: 96.63
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's first-choice-born man was arrested in #### , was found in the ####s , and the city of the city of the city was in the middle of the country . '' . ' '' he said . ' '' he said . '

Context: New York

Generated text: New York City Council said the government was `` not to be a good thing . '' . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said

Context: The hurricane

Generated text: The hurricane is the first time in the UK , and the UK is the first time in the UK . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . '


Epoch 1/1:  11%|█         | 18758/172148 [14:57<8:00:22,  5.32it/s, loss=4.5959]


Context: The President

Generated text: The President 's office said the government was `` not to be a good thing . '' . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he


Epoch 1/1:  12%|█▏        | 20317/172148 [16:10<1:57:11, 21.59it/s, loss=4.5015]


After 200064 examples, Average Loss: 4.5863



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:36, 36.35it/s]

Validation Average Loss: 4.5529, Perplexity: 94.91
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's deaths were not immediately clear . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . '

Context: New York

Generated text: New York City Council , who is the most important thing that is the most important thing that is the most important thing . '' 's ] . '' 's ] . ' '' he said . ' '' he said . ' '' he said . ' '' he

Context: The hurricane

Generated text: The hurricane of the most common sense of the most important thing . '' 's ] , '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said .


Epoch 1/1:  12%|█▏        | 20323/172148 [16:12<6:05:39,  6.92it/s, loss=4.5711]


Context: The President

Generated text: The President 's death was a `` significant problem '' . '' 's ] , ' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . '


Epoch 1/1:  13%|█▎        | 21880/172148 [17:25<1:59:05, 21.03it/s, loss=4.6180]


After 200064 examples, Average Loss: 4.5733



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:38, 35.59it/s]

Validation Average Loss: 4.5302, Perplexity: 92.78
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's most popularity of the world 's largest , and the country 's most importantly , the country 's most important thing . '' ) . '' ) . '' . ' '' he said . '' ) . '' said . '' )

Context: New York

Generated text: New York City Council said the couple had been a `` <rare> '' . '' he said . '' ) . '' said . '' ) . '' said . '' ) . '' said . '' ) . '' said . '' ) . '' said . '' )

Context: The hurricane

Generated text: The hurricane is the first time in the UK . '' the report said . '' ) . '' . ' '' he said . '' ) . '' said . '' ) . '' said . '' ) . '' said . '' ) . '' said . '' ) .


Epoch 1/1:  13%|█▎        | 21886/172148 [17:27<6:06:53,  6.83it/s, loss=4.5460]


Context: The President

Generated text: The President of the U.S. military has been in the UK and the UK 's most important thing . '' ) . '' . '' ) . '' said . '' ) . '' said . '' ) . '' said . '' ) . '' said .


Epoch 1/1:  14%|█▎        | 23443/172148 [18:41<1:55:11, 21.52it/s, loss=4.5115]


After 200064 examples, Average Loss: 4.5587



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:44, 33.69it/s]

Validation Average Loss: 4.5235, Perplexity: 92.16
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's government has said the government 's decision to be `` a very serious '' . '' said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said

Context: New York

Generated text: New York City Council has been a major role in the public that the government has been `` not to be able to do . '' . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said .

Context: The hurricane

Generated text: The hurricane was found in the area . 's the club . '' ) . '' said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he


Epoch 1/1:  14%|█▎        | 23448/172148 [18:42<6:35:59,  6.26it/s, loss=4.4742]


Context: The President

Generated text: The President 's office said the government 's decision to be `` a very serious '' . '' said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said .


Epoch 1/1:  15%|█▍        | 25005/172148 [19:56<1:55:07, 21.30it/s, loss=4.5187]


After 200064 examples, Average Loss: 4.5378



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:43, 34.07it/s]

Validation Average Loss: 4.5127, Perplexity: 91.16
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been criticised by the U.S. District Court in #### . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said .

Context: New York

Generated text: New York City Council said the ##-year-old was arrested on the scene of the incident . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' ''

Context: The hurricane

Generated text: The hurricane of the city 's eastern town of the city of the city of the city of the city . '' ) said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . '


Epoch 1/1:  15%|█▍        | 25011/172148 [19:57<6:00:35,  6.80it/s, loss=4.5142]


Context: The President

Generated text: The President was the first time of the attack . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' ''


Epoch 1/1:  15%|█▌        | 26570/172148 [21:11<1:55:19, 21.04it/s, loss=4.6281]


After 200064 examples, Average Loss: 4.5264



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:45, 33.27it/s]

Validation Average Loss: 4.4918, Perplexity: 89.29
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a major role in the first time in the ####s . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . '

Context: New York

Generated text: New York City : The ##-year-old man who was a ##-year-old girl who was a ##-year-old girl who was a member of the family . ' '' he said . ' '' he said . ' '' he said .

Context: The hurricane

Generated text: The hurricane of the city 's body was found in the area . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he


Epoch 1/1:  15%|█▌        | 26573/172148 [21:13<7:39:31,  5.28it/s, loss=4.5934]


Context: The President

Generated text: The President 's office said the government would be a `` unacceptable '' . '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . '


Epoch 1/1:  16%|█▋        | 28132/172148 [22:27<1:51:27, 21.53it/s, loss=4.4456]


After 200064 examples, Average Loss: 4.5144



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:39, 35.42it/s]

Validation Average Loss: 4.4849, Perplexity: 88.67
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been in the country 's most popular destination for the United States , and the United States , the United States , said : ‘ The incident was a very good person . '' said . ' '' he said . ' '' he said . ' ''

Context: New York

Generated text: New York City Council , who is the first time in the United States , said : ' I 'm not going to be a good player . '' said . ' '' he said . ' '' he said . ' '' he said . ' '' he said .

Context: The hurricane

Generated text: The hurricane of the country is the most popular in the world , and the world 's largest . 'S . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said


Epoch 1/1:  16%|█▋        | 28138/172148 [22:29<5:46:48,  6.92it/s, loss=4.3693]


Context: The President

Generated text: The President 's office said the government has been `` not to be a lot of people to do . '' . ' '' said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said .


Epoch 1/1:  17%|█▋        | 29694/172148 [23:42<1:47:42, 22.04it/s, loss=4.2938]


After 200064 examples, Average Loss: 4.5016




Validation Average Loss: 4.4723, Perplexity: 87.56
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's death , the report said . 'Donnell said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . '

Context: New York

Generated text: New York City Council , who has been charged with the murder of the incident . ' '' said the .##m . 'F . ' '' said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . '

Context: The hurricane

Generated text: The hurricane was found in the water and thefts . 'Fotland is the first time . '' said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he


Epoch 1/1:  17%|█▋        | 29700/172148 [23:43<5:36:40,  7.05it/s, loss=4.5202]


Context: The President

Generated text: The President of the U.S. military has been killed in the attack . ' '' said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' '' he said . ' ''


Epoch 1/1:  18%|█▊        | 31257/172148 [24:56<1:51:02, 21.15it/s, loss=4.5722]


After 200064 examples, Average Loss: 4.4976




Validation Average Loss: 4.4633, Perplexity: 86.77
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been in the UK and the UK 's most recent ##-year-old is the first time in the Premier League . ' '' he said . ' '' he said . ' '' she said . ' '' she said . ' '' she said

Context: New York

Generated text: New York City 's ##-year-old son was arrested in the attack , and the ##-year-old was a ##-year-old girl who was a ##-year-old girl who was a ##-year-old girl who was

Context: The hurricane

Generated text: The hurricane of the ##-year-old was a ##-year-old girl who was a ##-year-old girl who was a ##-year-old girl who was a ##-year-old girl who was a ##-year-old


Epoch 1/1:  18%|█▊        | 31263/172148 [24:57<5:40:30,  6.90it/s, loss=4.4452]


Context: The President

Generated text: The President of the U.S. military has been in the UK and the UK 's most recent ##-year-old has been in the UK . ' '' he said . ' '' he said . ' '' she said . ' '' she said .


Epoch 1/1:  19%|█▉        | 32820/172148 [26:10<1:48:03, 21.49it/s, loss=4.3727]


After 200064 examples, Average Loss: 4.4855




Validation Average Loss: 4.4544, Perplexity: 86.01
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's government has been criticized by the government to the U.S. Embassy in #### . ' '' said the .### . ' '' he said . ' '' said the .### . ' '' he said . ' ''

Context: New York

Generated text: New York City , ## , was found in the area . 's the .### . ' '' he said . ' '' said the .### . ' '' he said . ' '' said the .### . ' '' he said . ' ''

Context: The hurricane

Generated text: The hurricane is the most popular destination for the first time , and the most popular players have been in the world . '' 's first lady . ' '' he said . ' '' said the .### . ' '' he said . ' '' said the .


Epoch 1/1:  19%|█▉        | 32826/172148 [26:12<5:38:56,  6.85it/s, loss=4.4859]


Context: The President

Generated text: The President 's newborns , the U.S. Embassy in #### , and the U.S. Embassy in #### . 's the . ' '' he said . ' '' said the .### . ' '' he said


Epoch 1/1:  20%|█▉        | 34384/172148 [27:25<1:45:38, 21.74it/s, loss=4.5607]


After 200064 examples, Average Loss: 4.4781




Validation Average Loss: 4.4410, Perplexity: 84.86
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been charged with murder and murdered the death of the death of the death of the .### . ' '' said the .##.# . ' '' said the .### . ' '' said the statement . ' '' said the .

Context: New York

Generated text: New York City : The ##-year-old was shot in the attacking half . '## . ' '' said the .##m . ' '' said the statement . ' '' said the .### . ' '' said the statement . ' '' said

Context: The hurricane

Generated text: The hurricane of the ##-year-old was found in the area of the city of the city of the city . 's . '' ) said . ' '' said the .### . ' '' said the statement . ' '' said the .###


Epoch 1/1:  20%|█▉        | 34390/172148 [27:27<5:23:44,  7.09it/s, loss=4.2880]


Context: The President

Generated text: The President of the U.S. Embassy in the UK , the U.S. Embassy in the UK , the U.S. Embassy in #### , the ##-year-old , who has been charged with murder and


Epoch 1/1:  21%|██        | 35946/172148 [28:40<1:43:59, 21.83it/s, loss=4.4307]


After 200064 examples, Average Loss: 4.4688




Validation Average Loss: 4.4416, Perplexity: 84.91
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's office said the law is not the case . ' '' said the .### , '' said the company . ' '' said the company . ' '' said the company . ' '' said the company . ' '' said the company . ' ''

Context: New York

Generated text: New York City : The ##-year-old woman was arrested in the attack , which was found dead in the city of the city . ' '' said the .### , '' said the U.S. . ' '' said the .### ,

Context: The hurricane

Generated text: The hurricane of the ship 's body was found in the water . '## . '' 's not . ' '' said the .### , '' said the .### , '' said the company . ' '' said the company . ' '' said the


Epoch 1/1:  21%|██        | 35952/172148 [28:42<5:27:52,  6.92it/s, loss=4.4555]


Context: The President

Generated text: The President 's ##-year-old was a member of the U.S. Embassy in the city of the city . ' '' said the .### , '' said the U.S. . ' '' said the company . ' ''


Epoch 1/1:  22%|██▏       | 37509/172148 [29:55<1:47:29, 20.88it/s, loss=4.3672]


After 200064 examples, Average Loss: 4.4635




Validation Average Loss: 4.4315, Perplexity: 84.06
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's first lady , who was a ##-year-old daughter , said : ' I 'm not going to be a good friend . ' '' said the .### . ' '' said the .### . ' '' said the .

Context: New York

Generated text: New York City Council , who is the first of the largest city of the city , and the UK is the largest of the largest . 'S . ' '' said the report . ' I 'm not going to be a good friend . ' '' said the

Context: The hurricane

Generated text: The hurricane of the disease is the only way to the . ' '' said the company . ' '' said the .### . ' '' said the . ' I 'm not going to be a good friend . ' '' said the .### . '


Epoch 1/1:  22%|██▏       | 37515/172148 [29:57<5:29:20,  6.81it/s, loss=4.5374]


Context: The President

Generated text: The President 's name was the first time that the government has been a `` significant '' of the `` <rare> '' of the <rare> . ' '' said the company said . ' I 'm not going to be a good job . ' ''


Epoch 1/1:  23%|██▎       | 39074/172148 [31:11<1:42:32, 21.63it/s, loss=4.4933]


After 200064 examples, Average Loss: 4.4565




Validation Average Loss: 4.5218, Perplexity: 92.00
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been charged with murder and the death of the death of the death of the death of the death of the death of the death of the death of the death of the death of the death of the death of the death of the death of the death

Context: New York

Generated text: New York 's most recent interview with the U.S. military in the UK and the U.S. military in the United States , where the death of the death of the death of the death of the .### , and the victim of a

Context: The hurricane

Generated text: The hurricane was found in the area , which was found in the area of the area . ' '' said the .### , '' he said . ' I 'm not going to be a very difficult time to do it . '' 's a very good


Epoch 1/1:  23%|██▎       | 39077/172148 [31:13<6:55:37,  5.34it/s, loss=4.5660]


Context: The President

Generated text: The President 's office has been released in the case of the case , but the police were not immediately returned to the hospital . ' ' I 'm not going to be a very difficult time to do it . '' 's a very good friend . '
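
Several of the samples above fall into loops such as "the death of the death of the death". One rough way to put a number on that behaviour is the fraction of n-grams in a sample that repeat an earlier n-gram. The sketch below is an illustrative diagnostic only: `repeated_ngram_fraction` is a hypothetical helper, it assumes whitespace-separated tokens as printed in the log, and it is not part of the notebook's `generate_text` code.

```python
def repeated_ngram_fraction(text: str, n: int = 3) -> float:
    """Return the share of n-grams that duplicate an earlier n-gram.

    0.0 means every n-gram is unique; values near 1.0 indicate heavy looping.
    """
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # Text shorter than n tokens has nothing to compare.
    return 1.0 - len(set(ngrams)) / len(ngrams)


# A short looping phrase in the style of the samples above.
sample = "the death of the death of the death of the death"
print(f"{repeated_ngram_fraction(sample):.2f}")
```

Tracking a score like this next to validation perplexity would show whether lower loss also brings less repetitive samples, which the perplexity figure alone does not reveal.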


Epoch 1/1:  24%|██▎       | 40635/172148 [32:27<1:42:47, 21.32it/s, loss=4.3749]


After 200064 examples, Average Loss: 4.4505




Validation Average Loss: 4.4200, Perplexity: 83.10
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's office said the government would be `` a very difficult time '' . ' '' said the . ' I 'm not going to do . '' 's said . ' I 'm not going to do . ' '' said the .###

Context: New York

Generated text: New York City : The ##-year-old was arrested on suspicion of murdering a woman who was arrested on suspicion of murdering a woman who was arrested on suspicion of murdering a woman who was arrested on suspicion of murdering a

Context: The hurricane

Generated text: The hurricane in the UK are the largest city of the city . '' ' said the . 'S . ' '' said . ' I 'm not going to do . ' '' said the .### . ' '' said the statement . ' I 'm


Epoch 1/1:  24%|██▎       | 40641/172148 [32:29<5:17:16,  6.91it/s, loss=4.2400]


Context: The President

Generated text: The President was the first time of the attack . ' '' said the .### . ' '' said the statement . ' '' said the statement . ' I 'm not going to do . ' '' said the .### . ' '' said the statement


Epoch 1/1:  25%|██▍       | 42198/172148 [33:42<1:44:43, 20.68it/s, loss=4.3259]


After 200064 examples, Average Loss: 4.4474




Validation Average Loss: 4.4173, Perplexity: 82.87
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a member of the House of Commons in #### . ' I 'm not going to do . '' ' said the .##am . ' '' said . ' I 'm not going to do . '' ' said the .##am .

Context: New York

Generated text: New York Times : ' I 'm not going to be a good job . '' 's a .### . ' '' said . ' I 'm not going to do . '' ' said the .##am . ' '' said . ' I '

Context: The hurricane

Generated text: The hurricane , the ##-year-old , who was arrested in #### , was arrested in #### . ' I 'm not going to do . ' '' said the .##am . ' '' said . ' I 'm not going to do . ''


Epoch 1/1:  25%|██▍       | 42204/172148 [33:44<5:16:27,  6.84it/s, loss=4.5252]


Context: The President

Generated text: The President 's most recent cases are not the first time in the UK . ' I 'm not going to do . '' ' said the .##am . ' '' said . ' I 'm not going to do . '' ' said the .##


Epoch 1/1:  25%|██▌       | 43761/172148 [34:58<1:41:08, 21.16it/s, loss=4.5171]


After 200064 examples, Average Loss: 4.4414




Validation Average Loss: 4.4110, Perplexity: 82.35
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a very good idea . '' 's said . ' I 'm not going to be a very good job . '' 's said . ' I 'm not going to be a very good job . '' 's said . ' I

Context: New York

Generated text: New York City Council , who is a professor of the U.S. government to be able to provide a new deal with the government 's government to be able to make a new deal with the government . ' '' said the company . ' I 'm

Context: The hurricane

Generated text: The hurricane of the disease is a very rare condition , and it 's not a problem . '' 's said . ' I 'm not going to be a very good job . '' 's said . ' I 'm not going to be a very


Epoch 1/1:  25%|██▌       | 43767/172148 [35:00<5:16:14,  6.77it/s, loss=4.3998]


Context: The President

Generated text: The President 's family have been charged with a child in the case . ' I 'm not going to be a very good job . '' 's said . ' I 'm not going to be a very good job . '' 's said . '


Epoch 1/1:  26%|██▋       | 45325/172148 [36:13<1:37:56, 21.58it/s, loss=4.3642]


After 200064 examples, Average Loss: 4.4341




Validation Average Loss: 4.3983, Perplexity: 81.31
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's first minister , who has been in the United States , which is the first time in the United States . 's . '' 's a .### . 'D# . ' '' said the . ' I 'm not going to

Context: New York

Generated text: New York City 's first-round tie was a second goal in the second half . ' I 'm not going to be a good job . '' 's a good job . ' '' said the .### . ' I 'm not going to

Context: The hurricane

Generated text: The hurricane is a very small number of people who have been killed . 's the first time . '' 's a .### . 'Dowell said . ' I 'm not going to be a good thing . '' 's a .##


Epoch 1/1:  26%|██▋       | 45331/172148 [36:15<5:07:46,  6.87it/s, loss=4.5039]


Context: The President

Generated text: The President 's new lawyers have been in the case , which is the first time in the United States . 's the .### . 'S.# . ) . '' 's a . ' I think it 's a good thing


Epoch 1/1:  27%|██▋       | 46887/172148 [37:28<1:37:41, 21.37it/s, loss=4.5328]


After 200064 examples, Average Loss: 4.4303




Validation Average Loss: 4.4014, Perplexity: 81.56
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's government has been a member of the state 's government , which is the most important thing to do with the . '' . ' '' said . ' I 'm not going to do . '' 's said . ' I 'm not

Context: New York

Generated text: New York City Council , who has been a member of the House of Commons in #### . ' '' said the .### . ' '' said the statement . ' '' said the .##am . ' '' said . ' I 'm not going to do

Context: The hurricane

Generated text: The hurricane is a huge amount of time . '' 's a . ' '' said . ' I 'm not going to do . '' 's said . ' I 'm not going to do . '' 's said . ' I 'm not going


Epoch 1/1:  27%|██▋       | 46893/172148 [37:29<5:05:41,  6.83it/s, loss=4.4631]


Context: The President

Generated text: The President 's office said the government has been a `` unacceptable '' of the `` <rare> '' of the `` <rare> '' of the `` <rare> '' of the `` <rare> '' of the `` <rare> '' of


Epoch 1/1:  28%|██▊       | 48450/172148 [38:42<1:34:24, 21.84it/s, loss=4.4368]


After 200064 examples, Average Loss: 4.4254




Validation Average Loss: 4.3907, Perplexity: 80.69
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow , who has been charged with the death of the U.S. military . ' '' said the .##m . ' '' said the .##am . ' '' said the .##am . ' '' said the .##am . ' ''

Context: New York

Generated text: New York City , who was a member of the National Institute of Technology , and the U.S. military . '' 's said . ' I 'm not going to be a very good job . ' '' said the .##am . ' '' said

Context: The hurricane

Generated text: The hurricane of the plane was found in the area . 's the .### . 'S. . ' '' said the .##m . ' '' said the .##m . ' '' said the .##am . ' '' said the .##


Epoch 1/1:  28%|██▊       | 48456/172148 [38:44<5:03:12,  6.80it/s, loss=4.3742]


Context: The President

Generated text: The President 's office said the government would be `` a very serious '' of the case . ' '' said the .##am . ' '' said the .##am . ' '' said the .##am . ' '' said the .##am . '


Epoch 1/1:  29%|██▉       | 50015/172148 [39:58<1:35:45, 21.26it/s, loss=4.3985]


After 200064 examples, Average Loss: 4.4214




Validation Average Loss: 4.3863, Perplexity: 80.35
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been accused of the murder of the ##-year-old , who was arrested on suspicion of murder and was arrested on suspicion of murder . ' ' I 'm not going to do . ' '' said the .### . '

Context: New York

Generated text: New York City Council , who has been accused of the murder of the ##-year-old , who was arrested on suspicion of murder and was arrested on suspicion of murder . ' ' I 'm not going to do . ' '' said the .

Context: The hurricane

Generated text: The hurricane is also known to be a popularity of the world 's most popular destination for the world . '' 's said . '' 's a great-grandmother . ' ) . '' 's a great man . ' '' said the .


Epoch 1/1:  29%|██▉       | 50020/172148 [39:59<5:26:24,  6.24it/s, loss=4.3058]


Context: The President

Generated text: The President 's office has been accused of being a member of the country 's most recent history . '' . 'Fort Office said . '' 's a very good thing . ' '' said . ' I 'm not going to do . ' ''


Epoch 1/1:  30%|██▉       | 51576/172148 [41:14<1:33:37, 21.46it/s, loss=4.4827]


After 200064 examples, Average Loss: 4.4175




Validation Average Loss: 4.3941, Perplexity: 80.98
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a member of the country 's most recent years . '' 's said . ' '' said the . ' I 'm not going to do . '' 's said . ' I 'm not going to be a . ' I '

Context: New York

Generated text: New York City Council , the chief executive of the U.S. Army , said : 'The police are not going to be a very good guy . '' 's said . ' I 'm not going to be a . ' I 'm not

Context: The hurricane

Generated text: The hurricane of the attack is the largest of the world , and the country is the most common . '' . ' '' said . ' I 'm not going to be a . '' . ' '' said . ' I 'm not going to be a .


Epoch 1/1:  30%|██▉       | 51582/172148 [41:16<4:51:14,  6.90it/s, loss=4.5257]


Context: The President

Generated text: The President 's office has been charged with the murder of the . 'Fort Office said . ' I 'm not going to be a . '' . ' '' said . ' I 'm not going to be a . ' I 'm not going


Epoch 1/1:  31%|███       | 53139/172148 [42:29<1:35:26, 20.78it/s, loss=4.4079]


After 200064 examples, Average Loss: 4.4105




Validation Average Loss: 4.3859, Perplexity: 80.31
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a member of the country 's government . ' '' said the company . 'Fox . ' '' said the .### . 'Fort . '' 's said . ' I 'm not going to do . ' '' said

Context: New York

Generated text: New York City manager , who has been charged with murder and attempted murder . ' ' I 'm not going to do . ' '' said the .### . 'Fort . '' 's said . ' I 'm not going to do . '

Context: The hurricane

Generated text: The hurricane is to be a great place in the world . '' 's said . ' '' said the .### . 'Fort . '' 's said . ' I 'm not going to do . ' '' said the .### . '


Epoch 1/1:  31%|███       | 53145/172148 [42:30<4:49:53,  6.84it/s, loss=4.3846]


Context: The President

Generated text: The President 's office has been charged with murder . ' '' said the .### . 'Fort . '' 's said . ' I 'm not going to do . ' '' said the .### . 'Fort . '' 's


Epoch 1/1:  32%|███▏      | 54704/172148 [43:44<1:32:59, 21.05it/s, loss=4.4164]


After 200064 examples, Average Loss: 4.4119




Validation Average Loss: 4.3771, Perplexity: 79.61
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been accused of the death penalty . ' '' said the .### . ' '' said the company . 's . ' '' said the company . 's . ' '' said the company . 's . ' '' said the company . '

Context: New York

Generated text: New York City Council , who has been charged with murdering a child , and the family said . ' I 'm not going to be able to get the best friend . '' 's a .### . ' '' said the company . 's .

Context: The hurricane

Generated text: The hurricane was found in the area , which was found in the area , which was found in the area , which was found in the area , which was found in the area . 's the .### . ' '' said the company . 's .


Epoch 1/1:  32%|███▏      | 54707/172148 [43:45<6:07:57,  5.32it/s, loss=4.5318]


Context: The President

Generated text: The President 's decision to be a `` tough '' of the president 's decision to be a good job . '' 's not . ' '' said the .### . ' '' said the .### . ' '' said the company . '


Epoch 1/1:  33%|███▎      | 56265/172148 [44:59<1:30:36, 21.32it/s, loss=4.3546]


After 200064 examples, Average Loss: 4.4035




Validation Average Loss: 4.3719, Perplexity: 79.19
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been charged with murder and a ##-year-old man was shot dead . ' I was . ' I 'm not going to be a very good job . ' '' said the .### . 'S . ' '' said . '

Context: New York

Generated text: New York City : The ##-year-old was arrested in connection with the death of the death of the .### . 'S. . ' '' said the company said . ' I 'm not going to be a very good job . ' ''

Context: The hurricane

Generated text: The hurricane is a very difficult time for the future . '' 's official said . ' I 'm not going to be a very good job . ' '' said the .### . 'S . ' '' said . ' I 'm not going to


Epoch 1/1:  33%|███▎      | 56271/172148 [45:00<4:41:57,  6.85it/s, loss=4.4099]


Context: The President

Generated text: The President 's office has been charged with murder and a ##-year-old man was shot dead . ' I was . ' I 'm not going to be a very good job . ' '' said the .### . 'S . ' ''


Epoch 1/1:  34%|███▎      | 57829/172148 [46:14<1:30:58, 20.94it/s, loss=4.3800]


After 200064 examples, Average Loss: 4.4031




Validation Average Loss: 4.3674, Perplexity: 78.84
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a major role in the attacking half . ' '' said the .### . '' #### #### . ) . '' said . '' 's a .### . '' said . '' 'Suchy said . '' 's a

Context: New York

Generated text: New York City Council , who has been in the UK , and the United States has been in the country . ' '' said the .### . '' #### #### . ) . '' said . '' 's a .### . '' said . '' '

Context: The hurricane

Generated text: The hurricane is a major problem , but the government has been in the country . '' 's a . '' said . '' said . '' 's a .### . '' said . '' 'Suchy said . '' 's a . '' said


Epoch 1/1:  34%|███▎      | 57835/172148 [46:15<4:37:02,  6.88it/s, loss=4.3439]


Context: The President

Generated text: The President 's lawyers have been in the country 's largest city in the city of the city of the city . 'Fort . '' 's most . '' said the .### . '' #### #### . ) . '' said . ''


Epoch 1/1:  35%|███▍      | 59392/172148 [47:28<1:26:57, 21.61it/s, loss=4.4734]


After 200064 examples, Average Loss: 4.4007




Validation Average Loss: 4.3672, Perplexity: 78.82
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's ##-year-old , who was a ##-year-old son of a ##-year-old girl , who was a ##-year-old son of a woman who was a ##-year-old son of a woman

Context: New York

Generated text: New York City Council , who has been charged with murder and a police officer who was a `` tough '' of the `` unacceptable '' of the `` unacceptable '' of the `` unacceptable '' of the `` American '' . '' . '

Context: The hurricane

Generated text: The hurricane is expected to be a third-party agreement with the United States . '' 's said . 'Fort . '' said the company . 's . '' said the company . 's . '' said the company . 's . '' said the


Epoch 1/1:  35%|███▍      | 59398/172148 [47:30<4:25:20,  7.08it/s, loss=4.3757]


Context: The President

Generated text: The President 's office has been charged with a ##-year-old , who was a `` tougher '' to be a `` tough '' of the `` unacceptable '' of the `` American '' . '' . 'Fort . '' said


Epoch 1/1:  35%|███▌      | 60955/172148 [48:42<1:27:15, 21.24it/s, loss=4.3208]


After 200064 examples, Average Loss: 4.3974




Validation Average Loss: 4.3711, Perplexity: 79.13
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow is not the first time in the UK . ' '' said the .### . ' '' said the .### . ' '' said the .### . ' '' said the .### . ' '' said the .### . '

Context: New York

Generated text: New York City : The ##-year-old was arrested on suspicion of murder and was arrested . ' I was a very good friend . '' 's said . '' said the .##pm . ' '' said . '' said . '' said the .

Context: The hurricane

Generated text: The hurricane is the first time in the world . '' 's said . '' said the .### . ' '' said the .### . ' '' said the .### . ' '' said the .### . ' '' said the .##


Epoch 1/1:  35%|███▌      | 60961/172148 [48:43<4:23:54,  7.02it/s, loss=4.4582]


Context: The President

Generated text: The President 's lawyers are not the first time in the UK . ' '' said the .### . ' '' said the .### . ' '' said the .### . ' '' said the .### . ' '' said the


Epoch 1/1:  36%|███▋      | 62518/172148 [49:56<1:23:07, 21.98it/s, loss=4.5049]


After 200064 examples, Average Loss: 4.3918




Validation Average Loss: 4.3683, Perplexity: 78.91
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` significant step '' for the first time in the case . ' '' said the .### . '' said . ' I 'm not going to do . '' 's home . ' '' said the .### . ' ''

Context: New York

Generated text: New York City 's first-half-year-old was arrested in the attack , and the ##-year-old was arrested in the attack . ' I 'm not going to be a good job . '' 's home . ' '' said the

Context: The hurricane

Generated text: The hurricane is the largest , the largest city of the world . '' 's official said . '' said the .### . '' . 'S. . '' . 'S. . ' '' said . ' I 'm not going to do . ''


Epoch 1/1:  36%|███▋      | 62524/172148 [49:58<4:21:02,  7.00it/s, loss=4.2597]


Context: The President

Generated text: The President 's lawyers said the government has been `` not the most important thing '' . ' '' said the .### . '' said . ' I 'm not going to do . '' 's home . ' '' said the .###


Epoch 1/1:  37%|███▋      | 64081/172148 [51:10<1:24:17, 21.37it/s, loss=4.4122]


After 200064 examples, Average Loss: 4.3873




Validation Average Loss: 4.3658, Perplexity: 78.72
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` great example of the issue of the government , '' he said . 'We are not going to be a very good job . '' 's not . ' '' said the .### . 'S.D . 'S.

Context: New York

Generated text: New York City 's first-half-year-old was a ##-year-old man who was a ##-year-old man , who was a ##-year-old man , who was a ##-year-old man , who was

Context: The hurricane

Generated text: The hurricane is a major problem in the UK , and the UK 's economy . '' '' said . 'Fort . '' said . 'Fort . '' said . 'Fort . '' said . 'Fort . '' said . 'Fort


Epoch 1/1:  37%|███▋      | 64087/172148 [51:11<4:23:11,  6.84it/s, loss=4.4445]


Context: The President

Generated text: The President 's office has been in the case , and the government has been in the case . ' '' said the company . 'Fort . '' said . 'Fort . '' said . 'Fort . '' said . 'Fort . ''


Epoch 1/1:  38%|███▊      | 65643/172148 [52:24<1:24:29, 21.01it/s, loss=4.3692]


After 200064 examples, Average Loss: 4.3863




Validation Average Loss: 4.3605, Perplexity: 78.30
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been in the country 's largest city in the UK . ' '' said the .### . '' ) said . '' said the .### . '' . 'S ) . '' said the .### . 'S. . ''

Context: New York

Generated text: New York City Council , who has been in the country , but the government has not been able to use the money to be able to use the money . ' '' said the .### . 'S. . '' . 'S ) . '' said the

Context: The hurricane

Generated text: The hurricane is not the first time in the country . '' 's official said . '' said the .### . '' . 'S ) . '' said the .### . 'S. . '' . 'S ) . '' said the .##


Epoch 1/1:  38%|███▊      | 65649/172148 [52:25<4:11:59,  7.04it/s, loss=4.4611]


Context: The President

Generated text: The President was also charged with murder and a police officer . 's said . '' said the .### . 'S. . '' . 'S ) . '' said the .### . 'S. . '' . 'S ) . '' said


Epoch 1/1:  39%|███▉      | 67207/172148 [53:38<1:23:29, 20.95it/s, loss=4.3660]


After 200064 examples, Average Loss: 4.3825




Validation Average Loss: 4.3598, Perplexity: 78.24
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's first minister , who has been accused of being charged with murder . ' ' I 'm not going to be able to get to the floor . ' '' said the .##pm . ' '' said . ' I 'm not going to

Context: New York

Generated text: New York City Council , who has been accused of being charged with murder . ' ' I 'm not going to be able to get to the floor . ' '' said the .##pm . ' '' said . ' I 'm not going to be able

Context: The hurricane

Generated text: The hurricane is the first time in the area , and the ##-year-old was killed . ' '' said the .##pm . ' '' said the .##pm . ' '' said . ' I 'm not going to be able to get to


Epoch 1/1:  39%|███▉      | 67213/172148 [53:39<4:10:32,  6.98it/s, loss=4.4699]


Context: The President

Generated text: The President of the country 's government has been accused of being a `` very serious issue '' . ' '' said the .##pm . ' '' said . ' I 'm not going to be able to get to the floor . ' '' said the .


Epoch 1/1:  40%|███▉      | 68770/172148 [54:52<1:20:31, 21.40it/s, loss=4.5108]


After 200064 examples, Average Loss: 4.3829




Validation Average Loss: 4.3553, Perplexity: 77.89
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's government has said the government has not yet been released . ' '' said the .##pm . ' '' said . ' I 'm not going to do . ' '' said the .##pm . ' '' said . ' I 'm

Context: New York

Generated text: New York City 's first-half years ago , he said : ' I 'm not going to be a little bit . ' '' said the .##-caliber pistol . ' I 'm not going to be a little bit . ' ''

Context: The hurricane

Generated text: The hurricane is to be a huge amount of time . ' '' said the .##-caliber gunfire . ' '' said the .##-caliber gunfire . ' '' said the .##-caliber gunfire . ' '' said the .


Epoch 1/1:  40%|███▉      | 68776/172148 [54:53<4:11:43,  6.84it/s, loss=4.4126]


Context: The President

Generated text: The President was a ##-year-old woman who was a ##-year-old woman , who was arrested in #### , was arrested in connection with the death of the ##-year-old woman who was arrested in the ##-year-old ,


Epoch 1/1:  41%|████      | 70332/172148 [56:06<1:18:25, 21.64it/s, loss=4.3449]


After 200064 examples, Average Loss: 4.3789




Validation Average Loss: 4.3459, Perplexity: 77.16
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` significant '' of the country 's most recent incidents , including the <rare> , and the most important thing . '' said . '' said . ' I said . '' said . ' I said . ' I think it '

Context: New York

Generated text: New York City Council has been accused of being a `` significant '' of the `` significant '' of the `` <rare> '' of the `` <rare> '' of the `` <rare> '' of the `` <rare> '' of the `` <rare>

Context: The hurricane

Generated text: The hurricane was the first time in the area , and the ##-year-old was found in the area . 's said . ' I said . '' said . ' I said . ' I think it 's a very good thing . '' 's


Epoch 1/1:  41%|████      | 70338/172148 [56:07<4:05:49,  6.90it/s, loss=4.3671]


Context: The President

Generated text: The President was a `` significant '' of the `` significant '' of the `` <rare> '' of the `` <rare> '' of the `` <rare> '' of the `` <rare> '' of the `` <rare> '' of the `` <rare


Epoch 1/1:  42%|████▏     | 71897/172148 [57:20<1:17:41, 21.51it/s, loss=4.3304]


After 200064 examples, Average Loss: 4.3780




Validation Average Loss: 4.3457, Perplexity: 77.14
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a major role in the country 's largest city . '' 's official . '' said . '' said the .### .### .### .### . ) . '' 's a .### .### .

Context: New York

Generated text: New York City 's #-# draw with the goalkeeper , but the club is a good job . ' '' said the .### .### .### .### . ) . '' 's a .### .### .

Context: The hurricane

Generated text: The hurricane is the first time the world 's largest city of the country . '' 's a .### .### .### .### . ) . '' 's a .### .### .### .### .


Epoch 1/1:  42%|████▏     | 71900/172148 [57:21<5:16:09,  5.28it/s, loss=4.3734]


Context: The President

Generated text: The President 's office is a `` good-class '' and `` unacceptable '' of the `` <rare> '' -- the .### . '' said . '' said the .### .### .### .### . ) .


Epoch 1/1:  43%|████▎     | 73460/172148 [58:34<1:15:51, 21.68it/s, loss=4.3886]


After 200064 examples, Average Loss: 4.3763




Validation Average Loss: 4.3463, Perplexity: 77.19
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been accused of the attackers . ' '' said the incident . ' '' said the incident . ' '' said the incident . ' '' said the incident . ' '' said the incident . ' '' said the incident . ' '' said the incident .

Context: New York

Generated text: New York City , the ##-year-old , was a former president of the United States . ' '' said the . 'Following of the . ' '' said the . 'Following of the . ' '' said the . 'Follow

Context: The hurricane

Generated text: The hurricane is expected to be a major role in the first time . '' 's a . 'Furbart , '' he said . ' '' said the . 'Following of the incident . ' '' said the . 'Following of


Epoch 1/1:  43%|████▎     | 73463/172148 [58:35<5:05:42,  5.38it/s, loss=4.2916]


Context: The President

Generated text: The President 's office has been accused of the attackers . ' '' said the incident . ' '' said the incident . ' '' said the incident . ' '' said the incident . ' '' said the incident . ' '' said the incident . ' '' said


Epoch 1/1:  44%|████▎     | 75023/172148 [59:47<1:13:04, 22.15it/s, loss=4.3844]


After 200064 examples, Average Loss: 4.3729




Validation Average Loss: 4.3500, Perplexity: 77.48
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been criticised for the first time in the country 's government . ' '' said the .### . '' ) said . ' '' said the .### . '' ) said . ' '' said the .### . '' ) said

Context: New York

Generated text: New York City Council has been criticised for the first time in the case . ' '' said the .### . '' ) said . ' '' said the .### . '' ) said . ' '' said the .### . '' ) said .

Context: The hurricane

Generated text: The hurricane is a major threat to the country . ' '' said the .### . '' ) said . ' '' said the .### . '' ) said . ' '' said the .### . '' ) said . ' '' said the .##


Epoch 1/1:  44%|████▎     | 75026/172148 [59:49<4:58:23,  5.42it/s, loss=4.5674]


Context: The President

Generated text: The President has been criticised for the first time in the case . ' '' said the .### . '' ) said . ' '' said the .### . '' ) said . ' '' said the .### . '' ) said . ' ''


Epoch 1/1:  44%|████▍     | 76585/172148 [1:01:01<1:14:03, 21.51it/s, loss=4.2767]


After 200064 examples, Average Loss: 4.3711




Validation Average Loss: 4.3427, Perplexity: 76.92
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a major factor in the United States , and the United States , the United States , and the United States , and the United States , the United States , said the government has been in the UK . ' '' said . ' I 'm

Context: New York

Generated text: New York City 's first-ever-minute video shows the video of the video . ' I 'm not going to be a very good friend . ' '' said . ' I 'm not going to be a very good friend . ' '' said

Context: The hurricane

Generated text: The hurricane is the first time in the world , and the United States is the largest of the world 's largest cities . '' . ) . '' 's a .### .### .### .### .### . ) . ''


Epoch 1/1:  44%|████▍     | 76591/172148 [1:01:03<3:47:09,  7.01it/s, loss=4.3731]


Context: The President

Generated text: The President of the U.S. government is not the first time in the United States , and the United States , the United States , and the United States , and the United States , the United States , said the government has been in the UK . '


Epoch 1/1:  45%|████▌     | 78147/172148 [1:02:15<1:13:38, 21.27it/s, loss=4.3380]


After 200064 examples, Average Loss: 4.3747




Validation Average Loss: 4.3410, Perplexity: 76.79
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's office said the government 's decision to make a new deal of the new rules . ' '' said the company 's website . 'F . ) . ' I 'm not going to do . ' '' said the .### .

Context: New York

Generated text: New York City Council said the ##-year-old was arrested in #### , but he was arrested in #### . ' I was a very good man . ' '' said the .### . 'Fort . '' said . ' I think it 's

Context: The hurricane

Generated text: The hurricane is the first time in the UK , which is the largest of the largest population of ## % of the population . '' 's official . 'F.C. , . 'Fort . '' said . 'Fort . '' said . '


Epoch 1/1:  45%|████▌     | 78153/172148 [1:02:17<3:46:59,  6.90it/s, loss=4.4432]


Context: The President

Generated text: The President 's office has been a `` very difficult time '' . ' '' said the .### . 'Fort . '' said . ) . ' I 'm not going to do . ' '' said the .### . 'Fort .


Epoch 1/1:  46%|████▋     | 79712/172148 [1:03:30<1:09:59, 22.01it/s, loss=4.3885]


After 200064 examples, Average Loss: 4.3651




Validation Average Loss: 4.3445, Perplexity: 77.05
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's government has been criticised for the first time in the ####s . ' '' he said . ' I 'm not going to be a little bit . ' '' said the .##pm . '' ) said . ' I 'm not

Context: New York

Generated text: New York City Council , who has been a major role in the United States . ' '' said the .### . ' '' said the .### . 'Fort . '' said the .### . 'Fort . '' said the .##

Context: The hurricane

Generated text: The hurricane is to be seen in the city of the city . ' '' said the .### . ' '' said the .### . 'Fort . '' said the .### . 'Fort . '' said the .### . '


Epoch 1/1:  46%|████▋     | 79715/172148 [1:03:31<4:44:38,  5.41it/s, loss=4.3442]


Context: The President

Generated text: The President 's office has been criticised for the `` austerity '' of the `` <rare> '' . ' '' said the .### . ' '' said the .### . 'Fort . '' said the .### . '


Epoch 1/1:  47%|████▋     | 81275/172148 [1:04:45<1:09:42, 21.73it/s, loss=4.3220]


After 200064 examples, Average Loss: 4.3678




Validation Average Loss: 4.3365, Perplexity: 76.44
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's government has been criticised for the first time in the ####s . '' said the .### . '' said . '' said the .##pm . '' said . '' said the .##pm . '' said . '' said the .

Context: New York

Generated text: New York City 's first lady , who was a ##-year-old girl who was a ##-year-old girl , who was arrested in #### , was arrested in connection with the death of the death of the .##am . '' said .

Context: The hurricane

Generated text: The hurricane is the first time in the United States . 's the same time . '' said the .### . '' said . '' said the .##pm . '' said . '' said the .##pm . '' said . '' said the .##


Epoch 1/1:  47%|████▋     | 81278/172148 [1:04:46<4:38:19,  5.44it/s, loss=4.3253]


Context: The President

Generated text: The President 's office said the government would not be allowed to pay for the money . ' '' said the .### . '' said . '' said the .##pm . '' said . '' said the .##pm . '' said . '' said the


Epoch 1/1:  48%|████▊     | 82838/172148 [1:05:59<1:08:59, 21.57it/s, loss=4.3338]


After 200064 examples, Average Loss: 4.3635




Validation Average Loss: 4.3440, Perplexity: 77.02
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` significant step in the country '' . ' '' said the .##-caliber gun . ' '' said the .##-caliber gun . ' '' said the .##-## . 'Fort . '' said . '

Context: New York

Generated text: New York City , ## , was arrested and charged with murder and a police officer . ' I was a very good person . ' '' said . ' I was a very good person . ' '' said . ' I 'm not sure . ' '' said the

Context: The hurricane

Generated text: The hurricane is to be a major problem . '' 's a . 'Fort . '' said . 'Following . '' said the .##-caliber gun . ' '' said the .##-caliber gun . ' '' said the .


Epoch 1/1:  48%|████▊     | 82841/172148 [1:06:00<4:37:11,  5.37it/s, loss=4.3178]


Context: The President

Generated text: The President 's office has been a `` significant step in the country '' . ' '' said the .##-caliber gun . ' '' said the .##-caliber gun . ' '' said the .##-## . 'Fort . ''


Epoch 1/1:  49%|████▉     | 84401/172148 [1:07:13<1:07:57, 21.52it/s, loss=4.4841]


After 200064 examples, Average Loss: 4.3607




Validation Average Loss: 4.3337, Perplexity: 76.22
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been charged with murder and a criminal investigation . ' '' said the .### .### ) . '' ) said . ' I 'm not going to do . ' '' said the .### .### ) . '' ) said

Context: New York

Generated text: New York City Council said the ##-year-old was arrested in the hospital , and the police were found in the area . ' '' said the .### . ' '' said the .### . 'F ) , '' he said . ' ''

Context: The hurricane

Generated text: The hurricane is the first time in the country , and the country 's most recent cases were reported . ' '' said the .### . '' ) said . ' I 'm not going to do . ' '' said the .### .###


Epoch 1/1:  49%|████▉     | 84404/172148 [1:07:15<4:38:12,  5.26it/s, loss=4.3879]


Context: The President

Generated text: The President has been charged with murder and a criminal investigation . ' '' said the .### .### ) . '' ) said . ' I 'm not going to do . ' '' said the .### .### ) . '' ) said


Epoch 1/1:  50%|████▉     | 85964/172148 [1:08:27<1:04:17, 22.34it/s, loss=4.3207]


After 200064 examples, Average Loss: 4.3601




Validation Average Loss: 4.3315, Perplexity: 76.06
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been accused of the alleged attack on the death of the ##-year-old . 'Forta . '' said . 'Furbart , which is the first time , the first time , the president 's office said the

Context: New York

Generated text: New York City 's ##-year-old , who was a former presidential candidate , who has been accused of being a `` very serious '' of the `` terrorist '' of the Islamic State . '' . 'Forta said . ' I

Context: The hurricane

Generated text: The hurricane was the first time in the ####s , which is the first time of the ####s , which is the first time of the ####s , which is the first time in the ####s , which is the first time in the ####s , which


Epoch 1/1:  50%|████▉     | 85967/172148 [1:08:29<4:27:08,  5.38it/s, loss=4.3711]


Context: The President

Generated text: The President of the U.S. military has been accused of the attack , which is the first of the most recent years , the president said . ' I 'm not going to be able to get the money . ' '' said the . 'Science


Epoch 1/1:  51%|█████     | 87527/172148 [1:09:40<1:04:18, 21.93it/s, loss=4.3941]


After 200064 examples, Average Loss: 4.3601




Validation Average Loss: 4.3288, Perplexity: 75.85
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's government has been a `` very important thing '' . ' '' said the .### . ' '' said the .### . ' '' said the .### . ' '' said the .### . ' '' said the .##

Context: New York

Generated text: New York City 's #-# win over the World Cup . 's win . '' 's the World Cup . ' '' said the .### . ' '' said the .### . ' '' said the .### . ' '' said

Context: The hurricane

Generated text: The hurricane is the first time in the UK , and the ##-year-old has been in the UK . '##s . '' 's the World Cup . ' '' said the .### . ' '' said the .### . ' ''


Epoch 1/1:  51%|█████     | 87530/172148 [1:09:42<4:19:43,  5.43it/s, loss=4.4521]


Context: The President

Generated text: The President of the U.S. Embassy in the U.S. Embassy in the U.S. Embassy in the U.S. Embassy in the U.S. Embassy in the U.S


Epoch 1/1:  52%|█████▏    | 89090/172148 [1:10:54<1:05:06, 21.26it/s, loss=4.3759]


After 200064 examples, Average Loss: 4.3584




Validation Average Loss: 4.3299, Perplexity: 75.94
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's first-half-year-old was a ##-year-old , who has been a ##-year-old , who has been a ##-year-old , who has been a ##-year-old , who has been

Context: New York

Generated text: New York City 's ##-year-old , who has been a ##-year-old , who has been a ##-year-old , who has been a ##-year-old , who has been a ##-year-old , who

Context: The hurricane

Generated text: The hurricane , the U.S. military , and the U.S. military has been killed in the attack . '## . '' said . 'Bothers and the .### . '' ) said . ' I 'm not going to be


Epoch 1/1:  52%|█████▏    | 89093/172148 [1:10:55<4:17:57,  5.37it/s, loss=4.3973]


Context: The President

Generated text: The President 's office has been criticised for the first time in the UK . '## . '' said . 'Bothers and the .### . '' ) said . ' I 'm not going to be a good job . ' '' said


Epoch 1/1:  53%|█████▎    | 90653/172148 [1:12:09<1:04:23, 21.09it/s, loss=4.3013]


After 200064 examples, Average Loss: 4.3566




Validation Average Loss: 4.3224, Perplexity: 75.37
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's first-largest city of the country is the first time in the country . '' 's a .##-caliber and a .##-caliber and a bit of a good day . '' ' I said . '' ' I

Context: New York

Generated text: New York City 's #-#-# win over the Champions League . '' ' I 'm not going to be a good player . '' 's a .##-caliber and a bit of a good day . '' ' I said . ''

Context: The hurricane

Generated text: The hurricane is the first time the attack was a `` very serious '' and the case . ' '' said the incident . ' '' said the incident . ' I was a very good friend . '' 's a .##-caliber handgun . ' ''


Epoch 1/1:  53%|█████▎    | 90656/172148 [1:12:10<4:15:11,  5.32it/s, loss=4.4102]


Context: The President

Generated text: The President of the country 's largest group of the country 's largest city of the country is the first time . '' 's a .##-caliber and a .##-caliber and a bit of a good day . '' ' I said


Epoch 1/1:  54%|█████▎    | 92215/172148 [1:13:24<1:03:07, 21.11it/s, loss=4.3260]


After 200064 examples, Average Loss: 4.3561




Validation Average Loss: 4.3267, Perplexity: 75.70
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's government has been a `` very difficult time '' . ' '' said the .### . '' ) said . '' 's a .##-caliber and right footed shot from the right side of the box is saved in the bottom

Context: New York

Generated text: New York City 's #-#-# win over the World Cup in #### . ' '' said the .### . '' ) . ' '' said the .### . '' ) . 'Fort . '' 's office said . '' said

Context: The hurricane

Generated text: The hurricane was the first time in the city of the city . '' 's office said . '' said . 'Fort . '' said . 'Fort . '' said . 'Fort . '' said . 'Fort . '' said . 'F


Epoch 1/1:  54%|█████▎    | 92221/172148 [1:13:25<3:10:57,  6.98it/s, loss=4.4005]


Context: The President

Generated text: The President 's office is a `` very difficult time '' . ' '' said the .### . '' ) said . '' 's a .##-caliber and right footed shot from the right side of the box is saved in the bottom right


Epoch 1/1:  54%|█████▍    | 93778/172148 [1:14:38<1:00:27, 21.61it/s, loss=4.3673]


After 200064 examples, Average Loss: 4.3531




Validation Average Loss: 4.3332, Perplexity: 76.19
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's government has been in the midst of a `` significant impact on the ground '' . ' '' said the .### . '' said . '' said . '' said . '' said . '' said . '' said . '' said . '' said .

Context: New York

Generated text: New York City Council , who has been in the UK , and the US government has been given a number of people to pay for the UK . ' '' said the .### . '' said . '' said . '' said . '' said . '' said .

Context: The hurricane

Generated text: The hurricane is the first time in the ####s . '' #### #### #### . '' #### #### #### . '' ) said . '' 's a source of the <rare> . '' said . '' said . '' said . '' said . '' said . ''


Epoch 1/1:  54%|█████▍    | 93784/172148 [1:14:39<3:07:09,  6.98it/s, loss=4.3566]


Context: The President

Generated text: The President 's office is a `` significant '' of the `` <rare> '' -- a `` <rare> '' -- a `` <rare> '' -- a `` <rare> '' -- a `` <rare> '' -- a `` <rare> ''


Epoch 1/1:  55%|█████▌    | 95342/172148 [1:15:52<1:03:14, 20.24it/s, loss=4.2964]


After 200064 examples, Average Loss: 4.3505




Validation Average Loss: 4.3276, Perplexity: 75.76
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's government has been in the UK to be a `` significant '' of the country 's most recent financial crisis . '' 's a .### , '' he said . ' I 'm sure it 's a good thing . '' '

Context: New York

Generated text: New York City , ## , #### , and the ##-year-old woman was found dead in the ##th minute , and the ##-year-old was found dead in the attack . ' I was in the middle of the day . ' '' said

Context: The hurricane

Generated text: The hurricane , the ##-year-old , who was killed in the attack , and the police said . ' I 'm not sure . ' '' said the .### . '' said . 'Farms said . ' '' said the .##


Epoch 1/1:  55%|█████▌    | 95345/172148 [1:15:53<3:59:11,  5.35it/s, loss=4.2796]


Context: The President

Generated text: The President of the U.S. government has been in the country 's most recent financial crisis . '' 's a .### , '' he said . ' I 'm sure it 's a good thing . '' 's a .###


Epoch 1/1:  56%|█████▋    | 96903/172148 [1:17:06<57:05, 21.96it/s, loss=4.3469]


After 200064 examples, Average Loss: 4.3521




Validation Average Loss: 4.3213, Perplexity: 75.29
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been charged with murder and was not a member of the family . ' '' said the .### . '' said . 'Furbart , '' said the .### . '' said . ) . '' 's a .###

Context: New York

Generated text: New York City Council said the ##-year-old was killed in the attack . '##s . '' 's a .### . '' said . ) . '' 's a .### . '' said . 'Furbart , ''

Context: The hurricane

Generated text: The hurricane is the first time in the UK . '##s . '' said the .### . '' said . ) . '' 's a .### . '' said . 'Furbart , '' said the .### . '' said


Epoch 1/1:  56%|█████▋    | 96909/172148 [1:17:07<2:55:49,  7.13it/s, loss=4.2564]


Context: The President

Generated text: The President of the UK 's most recent report , which is the first time in the UK , the UK government said . '### . '' said . 'Furbart , '' said the .### . '' said . ) . '' '


Epoch 1/1:  57%|█████▋    | 98466/172148 [1:18:19<56:02, 21.91it/s, loss=4.2648]


After 200064 examples, Average Loss: 4.3502




Validation Average Loss: 4.3235, Perplexity: 75.45
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's official statement said : 'The ICC 's decision is not the first time . ' '' said the .### . '' said . 'Fort . '' 's official website . 'Five . 'Fort . '' said

Context: New York

Generated text: New York City officials said the ##-year-old was arrested in connection with the death of the ##-year-old . ' I was a very good friend . ' '' said the .### . ' I 'm not going to do . '

Context: The hurricane

Generated text: The hurricane is the first time in the country . ' '' said the .### . '' said . 'Fort . '' said . 'Five . ' '' said the .### . ' '' said the .### . ' '' said .


Epoch 1/1:  57%|█████▋    | 98472/172148 [1:18:21<2:53:36,  7.07it/s, loss=4.3644]


Context: The President

Generated text: The President 's office said the government 's `` unacceptable '' of the `` unacceptable '' of the case is not the first time . ' '' said the .### . '' said . 'Fort . '' 's official website .


Epoch 1/1:  58%|█████▊    | 100029/172148 [1:19:34<54:08, 22.20it/s, loss=4.2603]


After 200064 examples, Average Loss: 4.3464




Validation Average Loss: 4.3186, Perplexity: 75.08
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been charged with murder and murder , and the ##-year-old man was arrested in connection with the incident . ' I was a man . ' '' said the .### . '' said . ' I 'm not sure . ' ''

Context: New York

Generated text: New York City Council , who have been charged with murder and murder , and the ##-year-old man was arrested in connection with the incident . ' I was a man . ' '' said the .### . '' said . ' I 'm not

Context: The hurricane

Generated text: The hurricane is the first time in the UK , and the ##-year-old is the most expensive . '' #### .### ) # .### . '' ) said . '' ' I 'm not sure . ' '' he said . ' I


Epoch 1/1:  58%|█████▊    | 100035/172148 [1:19:35<2:56:06,  6.82it/s, loss=4.3628]


Context: The President

Generated text: The President 's office has been criticised for the first time in the UK . ' '' said the company 's `` <rare> '' -- a . '' said . ' I 'm not sure . ' '' said the .### . '' said


Epoch 1/1:  59%|█████▉    | 101592/172148 [1:20:48<54:14, 21.68it/s, loss=4.3285]


After 200064 examples, Average Loss: 4.3461



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:37, 36.15it/s]

Validation Average Loss: 4.3133, Perplexity: 74.69
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` significant step in the way '' . '' 's the news . ' ) . '' ' I 'm not going to be a good person . '' 's funny . ' ) . '' ' I 'm not going to

Context: New York

Generated text: New York City Council , who has been a `` good idea '' . ' '' he said . ' I 'm not going to be a bit of a bit of a bit of a bit of a bit of a bit of a bit of a bit of a

Context: The hurricane

Generated text: The hurricane was the first time in the city of the city of the city . 'Fifseyside . '' ) said . '' 's a source . ' ) . '' 's a very high school . '' ) . ' I 'm not


Epoch 1/1:  59%|█████▉    | 101598/172148 [1:20:49<2:45:34,  7.10it/s, loss=4.4346]


Context: The President

Generated text: The President has been a `` significant step in the way '' . '' 's the news . ' ) . '' ' I 'm not going to be a good person . '' 's funny . ' ) . '' ' I 'm not going to


Epoch 1/1:  60%|█████▉    | 103155/172148 [1:22:02<53:36, 21.45it/s, loss=4.3693]


After 200064 examples, Average Loss: 4.3509



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 3/3514 [00:00<02:00, 29.17it/s]
Evaluating:   0%|          | 7/3514 [00:00<01:48, 32.31it/s]

Validation Average Loss: 4.3144, Perplexity: 74.77
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` very important step '' . ' '' said the .### . '' ) . '' ) . ' I 'm not going to be able to get the best wishes . '' ' I 'm not going to be able to get

Context: New York

Generated text: New York City Council has been charged with the murder of the murder of the murder of the . ' I 'm not going to be able to get the chance to get a lot of people . '' 's naked . ' I 'm not going to

Context: The hurricane

Generated text: The hurricane was the first time in the ####s . '' #### . ) # .### ) # . '' ) . '' ) . '' ) . ' I 'm not going to be able to get the chance to get a lot of money . ''


Epoch 1/1:  60%|█████▉    | 103160/172148 [1:22:03<3:04:07,  6.24it/s, loss=4.3363]


Context: The President

Generated text: The President of the U.S. military , the U.S. military , the U.S. military , the U.S. military , and the U.S. military . '' ) said . '' 's a . '' ) said .


Epoch 1/1:  61%|██████    | 104719/172148 [1:23:16<52:09, 21.55it/s, loss=4.2933]


After 200064 examples, Average Loss: 4.3441



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:39, 35.15it/s]

Validation Average Loss: 4.3180, Perplexity: 75.04
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been in the same way of the .### . '' ) said . '' 's a big part of the game . '' 's nudge . ' '' he said . ' I 'm not going to be a good friend .

Context: New York

Generated text: New York 's ##-year-old , who was a ##-year-old man , who was a ##-year-old man , who was arrested in the hospital , and the police were killed . ' I 'm not going to be a

Context: The hurricane

Generated text: The hurricane was found in the city of the city . 'Farms said . 's a lot of people . '' 's a very high school . '' ) . ' I 'm not going to be a big . '' ) . ' I '


Epoch 1/1:  61%|██████    | 104725/172148 [1:23:18<2:42:44,  6.90it/s, loss=4.2774]


Context: The President

Generated text: The President 's office has been criticised by the government 's office . ' '' said the company 's chief executive of the United States . 's the same time . '' 's not . ' '' said the .##-caliber pig


Epoch 1/1:  62%|██████▏   | 106283/172148 [1:24:31<50:44, 21.63it/s, loss=4.2497]


After 200064 examples, Average Loss: 4.3437



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:37, 35.82it/s]

Validation Average Loss: 4.3153, Perplexity: 74.84
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` very serious '' of the `` terrorist '' of the `` terrorist '' of the `` terrorist '' of the `` terrorist '' of the `` terrorist '' of the `` terrorist '' of the `` terrorist '' of

Context: New York

Generated text: New York City 's first-half-year-old was shot in the attacking half . ' I was a bit of a good friend . ' '' said the .### . ' '' said . ' I 'm not sure what I 'm

Context: The hurricane

Generated text: The hurricane is the first time in the first time , the first time in the world 's top-flight match was a first-half-year-old 's first goal of the goal . ' I was going to be a bit of a good


Epoch 1/1:  62%|██████▏   | 106286/172148 [1:24:33<3:22:07,  5.43it/s, loss=4.2938]


Context: The President

Generated text: The President 's office has been accused of being a `` very serious '' of the `` terrorist '' of the `` terrorist '' of the `` terrorist '' of the `` terrorist '' of the `` terrorist '' of the `` terrorist '' of


Epoch 1/1:  63%|██████▎   | 107846/172148 [1:25:46<51:37, 20.76it/s, loss=4.3404]


After 200064 examples, Average Loss: 4.3450



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:45, 33.19it/s]

Validation Average Loss: 4.3130, Perplexity: 74.66
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been charged with conspiracy to commit suicide . ' . ' '' she said . ' I 'm not going to be a very good thing . '' ' I said . '' ' I 'm not sure . ' '' he said .

Context: New York

Generated text: New York City Council , who has been charged with conspiracy to commit suicide . ' . ' '' she said . ' I 'm not going to be a very good thing . '' ' I said . '' ' I 'm not sure . '

Context: The hurricane

Generated text: The hurricane is the first time of the crash , the police said . ' I 'm not going to be a very good thing . '' ' I said . '' ' I 'm not sure . ' '' he said . ' I 'm not going to


Epoch 1/1:  63%|██████▎   | 107849/172148 [1:25:48<3:20:11,  5.35it/s, loss=4.3235]


Context: The President

Generated text: The President of the U.S. military has been charged with conspiracy to commit suicide . ' . ' '' said the company 's name . ' '' said the company 's name . ' '' said the company 's name . ' ''


Epoch 1/1:  64%|██████▎   | 109407/172148 [1:27:01<50:12, 20.83it/s, loss=4.1687]


After 200064 examples, Average Loss: 4.3431



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:38, 35.76it/s]

Validation Average Loss: 4.3025, Perplexity: 73.88
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's government has been criticised for the first time in the ####s . ' '' said the .### . '' ) . '' ) . '' ' I said . '' ' I said . '' ' I was a very good friend . '

Context: New York

Generated text: New York City Council , who has been charged with murdering a ##-year-old , who was arrested in the hospital , where he was arrested in the hospital . ' I was a very good man . '' ' I said . '' ' I was a

Context: The hurricane

Generated text: The hurricane is a very difficult time for the future . '' 's time . '' ' I said . '' ' I said . '' ' I was a very good friend . ' '' said the .### . '' ) . ' I 'm not going


Epoch 1/1:  64%|██████▎   | 109412/172148 [1:27:03<2:43:13,  6.41it/s, loss=4.3469]


Context: The President

Generated text: The President of the country 's largest cities are expected to be a major boost for the country . ' '' said the company 's website . ) . '' ) . '' ' I 'm not going to be a good thing . '' ' I said .


Epoch 1/1:  64%|██████▍   | 110970/172148 [1:28:16<47:22, 21.52it/s, loss=4.2438]


After 200064 examples, Average Loss: 4.3389



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:43, 34.02it/s]

Validation Average Loss: 4.3107, Perplexity: 74.49
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` very serious issue '' of the `` <rare> '' of the `` <rare> '' of the `` <rare> '' of the `` <rare> '' of the `` <rare> '' of the `` <rare> ''

Context: New York

Generated text: New York City , the ##-year-old , who was born in the village of the city , and the same . ' '' said the .### . '' ) . '' ' I 've got a lot of people . ' '' said the .

Context: The hurricane

Generated text: The hurricane is the same as the first time in the world . '' 's not . '' said . '' ' I think it 's a good thing . '' ' I 've got a lot of money . ' '' said the .### . ''


Epoch 1/1:  64%|██████▍   | 110976/172148 [1:28:18<2:26:08,  6.98it/s, loss=4.1974]


Context: The President

Generated text: The President of the U.S. military has been a `` very serious '' of the government 's government . '' 's not clear whether the government is not the first time . '' 's not clear whether the government is not the first time . ''


Epoch 1/1:  65%|██████▌   | 112535/172148 [1:29:31<47:54, 20.74it/s, loss=4.3836]


After 200064 examples, Average Loss: 4.3408



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:39, 35.36it/s]

Validation Average Loss: 4.3095, Perplexity: 74.40
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow , the U.S. military said the government would not be able to use the new rules . ' '' said the .##-caliber rifle . ' '' said the .##-caliber rifle . ' I was a man who

Context: New York

Generated text: New York City Council said the ##-year-old was arrested in connection with the death of the ##-year-old man , who was arrested in connection with the death of the ##-year-old . ' I was a man who was a man

Context: The hurricane

Generated text: The hurricane was the first time in the area , the U.S. military said . 's a lot of people . '' 's a time . ' '' said the .##pm . '' ) . ' I 'm a man who was a man


Epoch 1/1:  65%|██████▌   | 112538/172148 [1:29:32<3:05:11,  5.36it/s, loss=4.4693]


Context: The President

Generated text: The President of the U.S. military has been arrested in connection with the deaths of the U.S. and the U.S. military . '' 's the court heard . ' I 'm not going to be a man . ' ''


Epoch 1/1:  66%|██████▋   | 114097/172148 [1:30:46<45:27, 21.28it/s, loss=4.3274]


After 200064 examples, Average Loss: 4.3418



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:46, 33.05it/s]

Validation Average Loss: 4.3198, Perplexity: 75.17
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's official statement said : 'The other was a very good idea , '' he said . ' I 'm not going to be a very good friend . ' '' said the .##-caliber gun . ' '' he said . ' I

Context: New York

Generated text: New York City : The ##-year-old was a ##-year-old girl who was a ##-year-old girl , who was a member of the family , who was a member of the family , who was a member of the family ,

Context: The hurricane

Generated text: The hurricane was the first time in the ####s , and the ##-year-old was shot dead . ' I was a very good man . ' '' said the .##-caliber gun . ' I 'm not going to be a very good


Epoch 1/1:  66%|██████▋   | 114103/172148 [1:30:47<2:24:34,  6.69it/s, loss=4.2517]


Context: The President

Generated text: The President has been a `` very serious '' of the country 's government to be a `` very good '' and a `` very good '' and a `` very good '' and a `` very good '' and a `` very good '' and a `` very good ''


Epoch 1/1:  67%|██████▋   | 115660/172148 [1:32:01<43:55, 21.43it/s, loss=4.3790]


After 200064 examples, Average Loss: 4.3388



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:38, 35.58it/s]

Validation Average Loss: 4.3015, Perplexity: 73.81
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's presidential candidate , who is the first of the most popular presidential candidate . ' '' said the .### . '' ) . '' ' I 've got to be able to play . '' ' I 'm not going to be

Context: New York

Generated text: New York City 's ##-year-old , who was ## , and the ##-year-old was arrested on suspicion of murder . ' ' I 'm not going to be a very good . ' '' I was . ' '' she said

Context: The hurricane

Generated text: The hurricane was the first time in the ####s , and the ##-year-old was in the top ## . ' I 've been a good player . ' '' he said . ' I 'm not going to be able to play . '' '


Epoch 1/1:  67%|██████▋   | 115666/172148 [1:32:03<2:16:44,  6.88it/s, loss=4.3209]


Context: The President

Generated text: The President has been charged with murder and a ##-year-old man . ' ' I 'm not going to be a very good . ' '' I 've been told . ' I 'm not going to be a very good player . ' ''


Epoch 1/1:  68%|██████▊   | 117223/172148 [1:33:15<42:22, 21.61it/s, loss=4.3882]


After 200064 examples, Average Loss: 4.3369



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:41, 34.48it/s]

Validation Average Loss: 4.3054, Perplexity: 74.10
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's first ministerial candidate has been charged with the murder of the ####s . ' I said . '' said the .### . '' ) said . ' I 'm not going to be a very good thing . ' '' said the .

Context: New York

Generated text: New York City Council said the ##-year-old was a member of the country 's largest city in the country . '' 's not . '' said . '' said . ' I said . '' said . ' I was going to be a very good

Context: The hurricane

Generated text: The hurricane was the first time in the city of the city of the city . ' '' said the .### ) . '' said . ' '' said the .### ) . '' said . ' I 've got to be a very good thing .


Epoch 1/1:  68%|██████▊   | 117229/172148 [1:33:17<2:11:16,  6.97it/s, loss=4.4126]


Context: The President

Generated text: The President of the U.S. military is a `` very difficult time '' to be a member of the country 's most recent . '' . ) # . '' ) said . ' I 'm not going to be a very good thing . ' ''


Epoch 1/1:  69%|██████▉   | 118785/172148 [1:34:30<42:24, 20.98it/s, loss=4.3110]


After 200064 examples, Average Loss: 4.3370



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 3/3514 [00:00<01:58, 29.73it/s]
Evaluating:   0%|          | 7/3514 [00:00<01:50, 31.83it/s]

Validation Average Loss: 4.3036, Perplexity: 73.96
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` very serious '' of the government 's decision . '' 's not . '' said . '' ) . '' said . '' ) . '' said . '' ) . '' said . '' said . '' ) . '' said . ''

Context: New York

Generated text: New York City 's ##-year-old was a ##-year-old man who was arrested in #### . ' I was a bit of a bit of a bit of a lot of time . '' ' I was a bit of a bit of a

Context: The hurricane

Generated text: The hurricane season was a ##-year-old man . ' I was a bit of a bit of a bit of a lot of the time . '' ' I 'm not going to be a good job . ' '' said the .### . ''


Epoch 1/1:  69%|██████▉   | 118791/172148 [1:34:32<2:11:50,  6.74it/s, loss=4.3990]


Context: The President

Generated text: The President of the U.S. military has been a `` very serious '' of the government 's decision . '' 's not . '' said . '' ) . '' said . '' ) . '' said . '' ) . '' said . '' said .


Epoch 1/1:  70%|██████▉   | 120348/172148 [1:35:45<41:57, 20.57it/s, loss=4.3366]


After 200064 examples, Average Loss: 4.3337



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:42, 34.11it/s]

Validation Average Loss: 4.3014, Perplexity: 73.80
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's official statement said . '' ) said . '' ' I 'm not sure . ' '' he said . ' I 'm not sure . ' '' he said . ' I 'm not sure . ' '' she said . ' I '

Context: New York

Generated text: New York City 's #-#-# #-# win over the ##th minute , but the goal is to be able to get the ball to the bottom of the table . '' ' I 'm not sure . '' ' I think it '

Context: The hurricane

Generated text: The hurricane is the first time in the world , and the ##-year-old was a 'tremont ' . '' ) said . ' I 'm not going to be a very good thing . ' '' said . ' I 'm not sure


Epoch 1/1:  70%|██████▉   | 120353/172148 [1:35:46<2:16:04,  6.34it/s, loss=4.3852]


Context: The President

Generated text: The President 's office said the ##-year-old was a 'very untrue ' . ' '' said . ' I 'm not sure . ' '' he said . ' I 'm not sure . ' '' she said . ' I 'm


Epoch 1/1:  71%|███████   | 121912/172148 [1:36:59<39:14, 21.33it/s, loss=4.2976]


After 200064 examples, Average Loss: 4.3319



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:39, 35.24it/s]

Validation Average Loss: 4.2999, Perplexity: 73.69
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's official statement said . ' I 'm not going to be a very important thing . ' '' said . ' I 'm not going to be a very nice man . ' '' said . ' I 'm not going to be a very

Context: New York

Generated text: New York City Council , which is the first time in the UK , which is the most common problem . ' '' said the company said . ' I 'm not going to be a very important thing . ' '' said . ' I 'm not going to

Context: The hurricane

Generated text: The hurricane is the first time in the world , and the world 's largest <rare> , the company said . ' I 'm not going to be a very important thing . ' '' said . ' I 'm not going to be a very nice


Epoch 1/1:  71%|███████   | 121918/172148 [1:37:01<2:01:11,  6.91it/s, loss=4.3987]


Context: The President

Generated text: The President 's office said the government is not the only way to make a new deal . '' 's not the same way . '' ' I 'm not going to be a good friend . ' '' said . ' I 'm not going to be


Epoch 1/1:  72%|███████▏  | 123475/172148 [1:38:14<37:52, 21.42it/s, loss=4.2147]


After 200064 examples, Average Loss: 4.3322



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:39, 35.43it/s]

Validation Average Loss: 4.2978, Perplexity: 73.54
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` very serious '' of the government 's decision to be a `` very serious '' of the government 's decision to be a `` very serious '' of the government 's decision to be a `` very serious '' of the government '

Context: New York

Generated text: New York , the ##-year-old , who was a member of the family , said the man was killed in the attack . ' '' said the .## . '' ) . '' ' I 've got a lot of people . '' 's a

Context: The hurricane

Generated text: The hurricane was the first time in the city , and the ##-year-old was found in the city of the city . ' '' said the .## '' . ' '' said . '' ' I 'm a friend of the family . '' ' I


Epoch 1/1:  72%|███████▏  | 123481/172148 [1:38:15<1:56:32,  6.96it/s, loss=4.3273]


Context: The President

Generated text: The President of the U.S. government is not the first time in the country . '' 's not . '' said the .## '' . '' ) . '' ' I 've got a lot of people . '' 's a very good friend .


Epoch 1/1:  73%|███████▎  | 125037/172148 [1:39:28<36:42, 21.39it/s, loss=4.3047]


After 200064 examples, Average Loss: 4.3301



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:36, 36.21it/s]

Validation Average Loss: 4.3017, Perplexity: 73.83
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's first-half-year-old , who was a ##-year-old man who was a ##-year-old girl who was a ##-year-old girl , who was born in #### , was found guilty of murdering

Context: New York

Generated text: New York City Council , who have been arrested in connection with the death of the death of the .## . '' ) . '' ' I said . '' ' I 'm not sure what I 'm going to be a good person . ' '' said .

Context: The hurricane

Generated text: The hurricane , the ##-year-old , who was born in the ####s , and the ##-year-old was born in #### . ' I 'm not sure what I 'm going to be a good player . '' ' I think I


Epoch 1/1:  73%|███████▎  | 125043/172148 [1:39:30<1:51:47,  7.02it/s, loss=4.3341]


Context: The President

Generated text: The President 's office is not the first time in the world . '' 's report . '' ) said . '' 's a lot of people who are going to be a good person . '' 's n't . ' '' said . ' I


Epoch 1/1:  74%|███████▎  | 126600/172148 [1:40:43<36:05, 21.03it/s, loss=4.3640]


After 200064 examples, Average Loss: 4.3347



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:39, 35.32it/s]

Validation Average Loss: 4.2953, Perplexity: 73.35
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's first-half-year-old was a ##-year-old man who was shot dead in the attack . ' I was a bit of a bad foul . ' '' he said . ' I 'm not going to be a

Context: New York

Generated text: New York City Council said : 'The first time I was going to be a little bit of the way I 've got to be able to get the ball . '' ' I 'm not going to be a good player . ' '' he said . '

Context: The hurricane

Generated text: The hurricane is the first time in the world , and the most important thing to do . '' 's not the first time . ' '' he said . ' I 'm not going to be a little bit . ' '' he said . ' I 'm


Epoch 1/1:  74%|███████▎  | 126606/172148 [1:40:44<1:50:21,  6.88it/s, loss=4.4825]


Context: The President

Generated text: The President of the U.S. government is not the first time in the United States . '' 's not the first time . '' 's `` The Situation Room '' . ' '' he said . ' I 'm not going to be a little


Epoch 1/1:  74%|███████▍  | 128163/172148 [1:41:58<35:56, 20.40it/s, loss=4.2922]


After 200064 examples, Average Loss: 4.3320



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:41, 34.55it/s]

Validation Average Loss: 4.3186, Perplexity: 75.09
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been charged with murder and was arrested on suspicion of murdering a ##-year-old girl . ' I was a very good man . ' '' said the .### . '' ) . ' I 'm not going to be a

Context: New York

Generated text: New York City Council said the `` <rare> '' is the first time , the company has been in the UK . '## . '' ' I 'm a fan of the world . '' ' I 'm a fan of the world . '' ' I

Context: The hurricane

Generated text: The hurricane season , which is the first time in the ####s , and the ##-year-old is a member of the world 's largest in the world , and the most popular tourist group , which is the first time in the ####s ,


Epoch 1/1:  74%|███████▍  | 128169/172148 [1:41:59<1:51:27,  6.58it/s, loss=4.2988]


Context: The President

Generated text: The President has been charged with murder and was arrested on suspicion of murdering a ##-year-old girl . ' I was a very good man . ' '' said the .### . '' ) . ' I 'm not going to be a


Epoch 1/1:  75%|███████▌  | 129726/172148 [1:43:13<32:55, 21.47it/s, loss=4.3756]


After 200064 examples, Average Loss: 4.3311



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:43, 34.08it/s]

Validation Average Loss: 4.2982, Perplexity: 73.57
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's chief executive of the United Nations said the ##-year-old was killed in the attack . ' '' said the .### . '' ) . '' ) . '' ) . '' ' I 've got a lot of people . ''

Context: New York

Generated text: New York City officials said the ##-year-old was a `` very good '' and he was a very good friend . ' '' said the .### . '' ) . '' ) . '' ) . '' ' I 've got a lot of people

Context: The hurricane

Generated text: The hurricane was the first time in the ####s , and the ##-year-old was found dead in the car . ' '' said the .### . '' ) . '' ) . '' ) . '' ' I 've got a lot of people


Epoch 1/1:  75%|███████▌  | 129732/172148 [1:43:14<1:41:02,  7.00it/s, loss=4.3863]


Context: The President

Generated text: The President of the country 's parliamentary committee will be able to provide a `` thorough investigation '' . '' 's report . '' ) . '' ' I told me . ' I 'm not sure what I 'm going to be a very good job


Epoch 1/1:  76%|███████▋  | 131291/172148 [1:44:29<32:09, 21.18it/s, loss=4.3057]


After 200064 examples, Average Loss: 4.3251



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:40, 34.88it/s]

Validation Average Loss: 4.2960, Perplexity: 73.41
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's office , which is the first of the most popular , and the most popularity of the world 's most popular destinations are the most popular to the world . '' 's most of the world . '' 's most of the world

Context: New York

Generated text: New York City Council has been criticised for the first time in the ####s , and the ##-year-old was a `` very good idea '' . ' '' he said . '' ' I think I 'm not going to be a very good friend

Context: The hurricane

Generated text: The hurricane was the first time in the ####s , and the ##-year-old was a ##-year-old girl , who was arrested in #### , and the ##-year-old was arrested in connection with the death of the ##-year


Epoch 1/1:  76%|███████▋  | 131294/172148 [1:44:30<2:06:51,  5.37it/s, loss=4.4463]


Context: The President

Generated text: The President has been criticised for the first time in the ####s , and the ##-year-old was a `` very good idea '' . ' '' he said . '' ' I think I 'm not going to be a very good friend . ''


Epoch 1/1:  77%|███████▋  | 132854/172148 [1:45:44<31:36, 20.72it/s, loss=4.2561]


After 200064 examples, Average Loss: 4.3326



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:37, 36.12it/s]

Validation Average Loss: 4.2987, Perplexity: 73.60
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` very good place '' in the country . '' 's narcotics , '' he said . '' ' I 'm not sure what 's going to happen . '' ' I 'm not sure what 's going to

Context: New York

Generated text: New York , the ##-year-old , who has been a great player , and he has been a good player . ' '' he said . ' I 'm not going to be a good player . ' '' he said . ' I 'm not

Context: The hurricane

Generated text: The hurricane is a very good idea . '' 's a great player . ' '' said the .### . '' ) . '' ' I 'm going to be a good player . ' '' said the .### . '' ) . '' ' I


Epoch 1/1:  77%|███████▋  | 132857/172148 [1:45:45<2:02:41,  5.34it/s, loss=4.3477]


Context: The President

Generated text: The President has been a `` very good idea '' to be a good idea . '' ' I 'm going to be a good player . ' '' said the .### . '' ) . '' ' I 'm going to be a good player . '


Epoch 1/1:  78%|███████▊  | 134415/172148 [1:46:58<29:21, 21.43it/s, loss=4.4530]


After 200064 examples, Average Loss: 4.3265



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:40, 35.02it/s]

Validation Average Loss: 4.2989, Perplexity: 73.62
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been charged with murder and a ##-year-old boy who was a `` very dangerous person '' . ' '' said the .### . '' ) . '' ' I said . ' I 'm not going to be a little bit .

Context: New York

Generated text: New York City 's #-# win over the World Cup . ' I 've got to get back to the ground . '' ' I 'm sure he 's a good player . '' ' I think it 's a good player . ' ''

Context: The hurricane

Generated text: The hurricane is a very difficult time for the first time . '' 's a lot of people . '' ' I think it 's a good thing . '' ' I said . ' I 'm not going to be a little bit . ' '' said the


Epoch 1/1:  78%|███████▊  | 134421/172148 [1:47:00<1:32:03,  6.83it/s, loss=4.2934]


Context: The President

Generated text: The President 's office has been charged with the murder of the ##-year-old boy who was killed in the attack . ' I was a bit of a bad foul . ' '' said . ' I 'm not going to be a little bit


Epoch 1/1:  79%|███████▉  | 135978/172148 [1:48:14<28:37, 21.06it/s, loss=4.3501]


After 200064 examples, Average Loss: 4.3274



Evaluating:   0%|          | 0/3514 [00:00<?, ?it/s]
Evaluating:   0%|          | 4/3514 [00:00<01:43, 33.83it/s]

Validation Average Loss: 4.2953, Perplexity: 73.36
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been charged with murder and a ##-year-old man who was arrested on suspicion of murder and was arrested on suspicion of murder . ' ' I was a very nice man . ' '' said . ' I was a very nice man

Context: New York

Generated text: New York City 's ##-year-old , who has been in the world , and the ##-year-old was born in #### . ' I was a very nice man . ' '' said the .### . '' ) . '' ' I

Context: The hurricane

Generated text: The hurricane is a long-term solution , and the .### , '' said the .## '' of the #### World War . ' '' said . ' I 've been in the world . '' ' I said . ' I 'm not going to


Epoch 1/1:  79%|███████▉  | 135984/172148 [1:48:15<1:26:19,  6.98it/s, loss=4.3847]


Context: The President

Generated text: The President of the National Assembly , which is the most important thing for the country 's most recent . '' #### #### #### #### .### , #### . '' ) . '' said . ' I 've been in the world . '' ' I 'm


Epoch 1/1:  80%|███████▉  | 137542/172148 [1:49:29<27:14, 21.17it/s, loss=4.2844]


After 200064 examples, Average Loss: 4.3285




Validation Average Loss: 4.2956, Perplexity: 73.38
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` very good '' of the country 's most recent .##-caliber rifle . ' '' said the .##pm . '' ) . '' said . ' I 've been able to get a good time . ' ''

Context: New York

Generated text: New York City Council said the ##-year-old was found guilty of murdering ##-year-old daughter , who was found guilty of murdering ##-year-old daughter , who was found guilty of murdering ##-year-old daughter ,

Context: The hurricane

Generated text: The hurricane is the first time the first time in the world is the most commonplace . ' '' said the .##-caliber rifle . ' '' said the .##pm . '' ) . '' said . ' I 've been able to get


Epoch 1/1:  80%|███████▉  | 137548/172148 [1:49:30<1:23:37,  6.90it/s, loss=4.3947]


Context: The President

Generated text: The President 's office is the first time in the United States , which is the first time in the United States . '' 's . '' said . '' ' I said . '' ' I said . ' I 've been able to get a good time


Epoch 1/1:  81%|████████  | 139105/172148 [1:50:44<25:37, 21.50it/s, loss=4.4562]


After 200064 examples, Average Loss: 4.3237




Validation Average Loss: 4.2957, Perplexity: 73.38
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` very good '' of the president 's speech , '' he said . '' ' I 'm not sure . '' ' I think it was a very good thing . '' ' I think it was a great thing . '' ' I

Context: New York

Generated text: New York City 's ##-year-old was arrested in the hospital after the attack was found in the area . ' ' I was a very good man . '' ' I was a bit of a little bit . ' '' said the .##-cal

Context: The hurricane

Generated text: The hurricane of the crash occurred in the city of the city . '' 's . '' ) . '' ' I 'm not sure . '' ' I think it was a very good thing . '' ' I think it was a very good thing . '' '


Epoch 1/1:  81%|████████  | 139111/172148 [1:50:45<1:18:58,  6.97it/s, loss=4.2655]


Context: The President

Generated text: The President 's office is a very good man , '' he said . '' ' I 'm not sure . '' ' I think it was a very good thing . '' ' I think it was a very good thing . '' ' I think it was a


Epoch 1/1:  82%|████████▏ | 140667/172148 [1:51:59<24:25, 21.49it/s, loss=4.3837]


After 200064 examples, Average Loss: 4.3247




Validation Average Loss: 4.2875, Perplexity: 72.79
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been charged with the murder of the ##-year-old , who was arrested in the hospital . ' ' I was in the house . ' '' said . ' I was a man who was a man who was a man who was a member

Context: New York

Generated text: New York City Council said the ##-year-old was arrested in the hospital , where he was arrested in the hospital . ' . ' '' said the .## . '' ) . ' I was a very good friend . ' '' said . ' I was

Context: The hurricane

Generated text: The hurricane is a huge increase in the number of people in the country . ' . '' ) . '' ' I think I 'm not going to be able to get the right to be able to get the best to the first time . ' '' he said


Epoch 1/1:  82%|████████▏ | 140672/172148 [1:52:01<1:24:08,  6.23it/s, loss=4.2623]


Context: The President

Generated text: The President of the U.S. government is not the first time in the country . '' 's . ' '' said . ' I 'm not sure what I 'm not going to be a very good thing . ' '' said . ' I '


Epoch 1/1:  83%|████████▎ | 142232/172148 [1:53:14<24:14, 20.56it/s, loss=4.3806]


After 200064 examples, Average Loss: 4.3265




Validation Average Loss: 4.3570, Perplexity: 78.03
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` very dangerous '' and that he would be a very good idea . '' 's a very good player . '' ' I 'm not going to be a bit of a very good team . 's a very good player . ''

Context: New York

Generated text: New York City 's #-# win over the last ## . ' I 've got to be a good player . '' ' I 'm not going to be a bit of a bit of a bit of a bit of a bit of a bit of

Context: The hurricane

Generated text: The hurricane is a long time , the government has been a `` very dangerous '' to the government . '' 's not the case . '' ' I 'm not going to be a very good idea . '' 's a very good player . '' ' I


Epoch 1/1:  83%|████████▎ | 142235/172148 [1:53:16<1:37:30,  5.11it/s, loss=4.4508]


Context: The President

Generated text: The President has been a `` very strong '' to the government . '' 's not the president , '' he said . '' ) . '' ' I said . '' ' I said . '' ' I do n't think I 'm going to be a


Epoch 1/1:  84%|████████▎ | 143793/172148 [1:54:29<21:51, 21.63it/s, loss=4.3707]


After 200064 examples, Average Loss: 4.3268




Validation Average Loss: 4.2968, Perplexity: 73.47
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a 'happy ' of the 'deal ' by the . ' I was in the room . ' '' he said . ' I 'm not sure . ' '' he said . ' I was a very good friend . ' ''

Context: New York

Generated text: New York City 's #-# win over the world , but the ##-year-old was a ##-year-old man who was a ##-year-old girl , who was a ##-year-old girl , who was found guilty

Context: The hurricane

Generated text: The hurricane is a long time , but it is not a problem . ' '' said the .### ) . '' ' I 've got to be a very good thing . ' '' said . ' I was a very good friend . ' '' said .


Epoch 1/1:  84%|████████▎ | 143799/172148 [1:54:31<1:08:17,  6.92it/s, loss=4.3940]


Context: The President

Generated text: The President has been a 'most important role in the first time , but the government has been in the case . '' 's not the case . ' '' he said . ' I 'm not sure . ' '' he said . ' I was a very


Epoch 1/1:  84%|████████▍ | 145356/172148 [1:55:44<20:07, 22.19it/s, loss=4.2614]


After 200064 examples, Average Loss: 4.3223




Validation Average Loss: 4.2903, Perplexity: 72.99
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's official statement said : `` The U.S. '' of the U.S. military . '' 's `` <rare> '' . ' '' said the .### ) . '' ' I 'm not sure what I 'm

Context: New York

Generated text: New York City Council has been charged with murdering a ##-year-old girl , who was arrested in connection with the incident . ' ' I 'm not sure what I 'm not going to do . '' ' I said . ' I 'm

Context: The hurricane

Generated text: The hurricane is a long-term solution to the .### . '' ) . '' ' I said . ' I 'm not sure how to do it . ' '' he said . ' I 'm not sure how to do it . ' '' she


Epoch 1/1:  84%|████████▍ | 145362/172148 [1:55:46<1:03:43,  7.01it/s, loss=4.4544]


Context: The President

Generated text: The President of the U.S. military has been a `` very serious '' of the country 's most recent years . '' 's not the first time . ' '' he said . ' I 'm not sure how to do it . ' '' he


Epoch 1/1:  85%|████████▌ | 146919/172148 [1:56:59<20:08, 20.88it/s, loss=4.3030]


After 200064 examples, Average Loss: 4.3227




Validation Average Loss: 4.2991, Perplexity: 73.63
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` very good opportunity to get the best to the team . '' ' I think it 's a good thing . '' ' I think I 'm not sure what I 'm going to be a good player . ' '' he said

Context: New York

Generated text: New York City Council said the ##-year-old was a ##-year-old boy who was a woman who was a woman who was a woman who was a woman who was a woman who was a woman who was a woman who was a woman who

Context: The hurricane

Generated text: The hurricane was a major problem in the region . '' 's a source . ) . '' said . '' ' I said . '' ' I told him . ' I 'm not sure what I was going to be a bit . ' '' said . '


Epoch 1/1:  85%|████████▌ | 146925/172148 [1:57:01<1:01:17,  6.86it/s, loss=4.2996]


Context: The President

Generated text: The President 's office , which is the first time of the year , and the ##-year-old was in the midst of a ##-year-old man who was a ##-year-old boy who was shot dead by a police officer .


Epoch 1/1:  86%|████████▋ | 148484/172148 [1:58:16<19:33, 20.16it/s, loss=4.2752]


After 200064 examples, Average Loss: 4.3204




Validation Average Loss: 4.2960, Perplexity: 73.40
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` very sadistic '' of the president 's speech . '' 's `` The <rare> '' . ' '' said the .### . '' ) . '' said . ' I 'm a friend who was a little girl

Context: New York

Generated text: New York City 's #-# win over the world . ' '' he said . ' I 'm not going to be a little bit . ' '' said the .### . ' '' said the .### . ' '' said . '' '

Context: The hurricane

Generated text: The hurricane was discovered in the area . ' '' said the .### . '' ) . '' said . '' ' I said . ' I 'm not going to be a little bit . ' '' said the .## . '' ) . '' said .


Epoch 1/1:  86%|████████▋ | 148487/172148 [1:58:17<1:15:15,  5.24it/s, loss=4.4657]


Context: The President

Generated text: The President 's office has been a `` very sadistic '' of the president 's speech . '' 's `` The <rare> '' . ' '' said the .### . '' ) . '' said . ' I 'm a friend who was


Epoch 1/1:  87%|████████▋ | 150046/172148 [1:59:33<17:35, 20.94it/s, loss=4.4097]


After 200064 examples, Average Loss: 4.3193




Validation Average Loss: 4.2945, Perplexity: 73.30
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's government has been accused of the attacking the Islamic State . ' '' said the .##-caliber handgun . ' '' said the .##pm . ' ) . '' ' I 'm a friend of the . ' I

Context: New York

Generated text: New York City 's ##-year-old , who was a ##-year-old , who was a ##-year-old , who was a ##-year-old , who was a ##-year-old , who was a ##-

Context: The hurricane

Generated text: The hurricane was found in the area , and the ##-year-old was killed . ' . ' '' said the .##pm . ' ) . '' ' I 'm a friend of the . ' I was a bit of a good job . '


Epoch 1/1:  87%|████████▋ | 150052/172148 [1:59:35<54:33,  6.75it/s, loss=4.2847]  


Context: The President

Generated text: The President 's office is not the first time . '' 's `` The right '' . ' '' said the .##pm . ' ) . '' ' I 'm a friend of the . ' I was a bit of a good job . ' ''


Epoch 1/1:  88%|████████▊ | 151610/172148 [2:00:51<16:22, 20.90it/s, loss=4.3501]


After 200064 examples, Average Loss: 4.3186




Validation Average Loss: 4.2950, Perplexity: 73.33
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been accused of being a `` very good '' of the country 's most powerful . '' . '' said . '' ' I said . ' I was a very good friend . ' '' said . ' I was a very good friend . ' ''

Context: New York

Generated text: New York City 's official website said : 'The whole thing is that I 've been doing . ' I was a very good friend . ' '' said . ' I was a very good friend . ' '' said . ' I was a very good friend

Context: The hurricane

Generated text: The hurricane was the first time in the UK , and the ##-year-old was arrested in connection with the incident . ' I was a man . ' '' said . ' I was a very good friend . ' '' said . ' I was a very


Epoch 1/1:  88%|████████▊ | 151613/172148 [2:00:52<1:05:01,  5.26it/s, loss=4.2721]


Context: The President

Generated text: The President of the U.S. government has been accused of the attacking the government 's government to be able to control the government 's government to be able to control the government 's government to be able to control the government 's government to


Epoch 1/1:  89%|████████▉ | 153172/172148 [2:02:08<15:34, 20.30it/s, loss=4.2498]


After 200064 examples, Average Loss: 4.3202




Validation Average Loss: 4.2971, Perplexity: 73.48
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's government has been accused of being a `` very dangerous '' of the country 's largest . '' '' said the .## '' . '' ) . '' said . '' ' I was a member of the school . ' '' said the .##

Context: New York

Generated text: New York City , the ##-year-old , who has been in the world , and the most popular destination for the world 's biggest-ever-generation of the world 's biggest-ever-generation is the most popular destination for the

Context: The hurricane

Generated text: The hurricane was the first time in the area , the ##-year-old said . 's not a case of the case . '' ' I was a member of the school . ' '' said the .## p.m. , the .## .


Epoch 1/1:  89%|████████▉ | 153177/172148 [2:02:09<51:54,  6.09it/s, loss=4.4326]


Context: The President

Generated text: The President was also accused of the `` serious threat '' to be a `` very dangerous '' of the country 's largest . '' '' said the .## '' . '' ) . '' said . '' ' I was a member of the school . ' '' said


Epoch 1/1:  90%|████████▉ | 154734/172148 [2:03:26<14:17, 20.31it/s, loss=4.1536]


After 200064 examples, Average Loss: 4.3179




Validation Average Loss: 4.2869, Perplexity: 72.74
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a major issue of the country 's most recent years . '' 's not . '' said . '' ' I said . ' I was a bit of a bit of a lot of people . ' '' said the .## . '' )

Context: New York

Generated text: New York City Council , which is the first time in the UK , which is expected to be a key part of the country 's most recent years . '' 's not . ' '' said the .### . '' ) . '' ' I was a

Context: The hurricane

Generated text: The hurricane was the first time in the crash , which was found in the area . ' '' said the police officer . ' I was a woman . ' '' said . ' I was a woman . ' '' said . ' I was a woman . ' ''


Epoch 1/1:  90%|████████▉ | 154740/172148 [2:03:27<43:32,  6.66it/s, loss=4.3694]


Context: The President

Generated text: The President has been charged with the death of the two-year-old , who was killed in the crash , which was found in the village of the city of the city of the city of the city of the city of the city of the city of the


Epoch 1/1:  91%|█████████ | 156299/172148 [2:04:44<13:11, 20.02it/s, loss=4.3814]


After 200064 examples, Average Loss: 4.3203




Validation Average Loss: 4.2888, Perplexity: 72.88
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` very good opportunity to be a good player . '' ' I 'm not sure what I 'm going to be a little bit . ' '' said the .### ) . '' ' I told him . ' I was a

Context: New York

Generated text: New York City 's ##-year-old was arrested in #### after the deaths of the ##-year-old . ' I was a little bit . ' '' said . ' I 'm not sure what I 'm going to do . '

Context: The hurricane

Generated text: The hurricane was the first time in the world , and the ##-year-old was in the first half . ' I was a bit of a good job . ' '' said the .### . ' I 'm not going to be a little bit


Epoch 1/1:  91%|█████████ | 156302/172148 [2:04:45<52:01,  5.08it/s, loss=4.3695]


Context: The President

Generated text: The President of the United States has been in the UK , which is expected to be a third of the most expensive . 's #.# million . '' ) . '' said . ' I 'm not sure what I 'm going to do . '


Epoch 1/1:  92%|█████████▏| 157860/172148 [2:06:03<11:47, 20.18it/s, loss=4.3152]


After 200064 examples, Average Loss: 4.3167




Validation Average Loss: 4.2851, Perplexity: 72.61
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been charged with the murder of the murder of the murder of the murder of the murder of the woman . ' ' I was a man . ' '' said . ' I 've got a lot of people . ' '' said the . ' I

Context: New York

Generated text: New York City 's #-# draw with the United States is the first to be a great player in the Premier League . ' '' he said . ' I 've got a lot of the players . '' ' I 've got a lot of the

Context: The hurricane

Generated text: The hurricane was found in the area , and the ##-year-old was killed in the crash . ' ' I 've got a lot of people . ' '' said the . ' I 'm not sure . ' '' he said . ' I '


Epoch 1/1:  92%|█████████▏| 157865/172148 [2:06:05<39:48,  5.98it/s, loss=4.2367]


Context: The President

Generated text: The President 's office has been charged with the murder of the murder of the murder of the murder of the murder of the woman . ' ' I was a man . ' '' said . ' I 've got a lot of people . ' '' said the


Epoch 1/1:  93%|█████████▎| 159423/172148 [2:07:24<10:56, 19.39it/s, loss=4.2477]


After 200064 examples, Average Loss: 4.3144




Validation Average Loss: 4.2908, Perplexity: 73.02
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` very important '' of the country 's government . '' 's not the same . '' ) . '' said . ' I was a very good friend . ' '' said . ' I was a very good friend . ' '' said

Context: New York

Generated text: New York City , the ##-year-old , who is a member of the United States . 's the same . '' ) . '' ' I 'm not sure what happened . ' '' she said . ' I was a very good friend . '

Context: The hurricane

Generated text: The hurricane was the first time in the UK , the ##-year-old , who is the first of the most famous man . ' '' said the .##pm . '' ) . '' ' I 'm not sure what happened . ' '' she said


Epoch 1/1:  93%|█████████▎| 159429/172148 [2:07:25<35:04,  6.05it/s, loss=4.3521]


Context: The President

Generated text: The President 's office is a `` very important '' of the American government . '' 's not the same . '' ' I 'm not sure what happened . ' '' she said . ' I was a very good friend . ' '' said . ' I


Epoch 1/1:  94%|█████████▎| 160987/172148 [2:08:42<08:42, 21.34it/s, loss=4.4473]


After 200064 examples, Average Loss: 4.3161




Validation Average Loss: 4.2853, Perplexity: 72.63
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's official statement said . 'Given the . '' ) . '' said . ' I 'm not going to be a little bit . ' '' she said . ' I 'm not going to be a little bit . ' '' she said

Context: New York

Generated text: New York City 's #-# win over the World Cup . ' I 'm going to be a good player . '' ' I 'm going to be a good player . '' ' I 'm going to be a good player . '' ' I

Context: The hurricane

Generated text: The hurricane was discovered in the city of the city . ' '' I said . ' '' she said . ' I 'm not going to be a little bit . ' '' she said . ' I 'm not going to be a little bit . ' ''


Epoch 1/1:  94%|█████████▎| 160993/172148 [2:08:43<26:50,  6.93it/s, loss=4.2262]


Context: The President

Generated text: The President has been a `` very good opportunity '' to the United States . '' 's a news conference . ' '' he said . ' I 'm not going to be a little bit . ' '' she said . ' I 'm not going to be


Epoch 1/1:  94%|█████████▍| 162550/172148 [2:10:00<07:51, 20.37it/s, loss=4.3133]


After 200064 examples, Average Loss: 4.3144




Validation Average Loss: 4.2890, Perplexity: 72.89
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a major factor in the attacking government . '' 's a statement . ) . '' ' I 'm sure . ' '' he said . ' I 'm not sure . ' '' he said . ' I 'm not sure .

Context: New York

Generated text: New York City 's #-# draw with the ##-year-old , who was a ##-year-old , who was a ##-year-old , who was a ##-year-old , who was arrested in the hospital , and

Context: The hurricane

Generated text: The hurricane was found in the water , and the ##-year-old was found in the area . ' '' I 'm a friend of the family and friends . ' '' said . ' I 'm a friend of the family and I was n'


Epoch 1/1:  94%|█████████▍| 162556/172148 [2:10:02<23:57,  6.67it/s, loss=4.3957]


Context: The President

Generated text: The President of the U.S. government is not the only one of the most important people . '' 's a .### ) . '' ' I 'm sure . ' '' said the .### ) . '' ' I 'm a friend


Epoch 1/1:  95%|█████████▌| 164113/172148 [2:11:19<06:38, 20.17it/s, loss=4.2926]


After 200064 examples, Average Loss: 4.3151




Validation Average Loss: 4.2917, Perplexity: 73.09
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's official statement said : 'We are not going to be able to get the money to be able to do it . '' ' I told him . ' I 'm not sure what happened . ' '' she said . '' ' I told him

Context: New York

Generated text: New York City Council has been accused of being a `` very serious threat '' to be a . '' ) . '' ' I told the BBC . ' I 'm not sure what happened . ' '' she said . '' ' I told him . ' I '

Context: The hurricane

Generated text: The hurricane was found in the city of the city of the city of the city of the city of the city of the city of the city of the city of the city of the city of the city of the city of the city of the city of the city


Epoch 1/1:  95%|█████████▌| 164119/172148 [2:11:21<20:03,  6.67it/s, loss=4.3839]


Context: The President

Generated text: The President of the U.S. military has been accused of being a `` very serious threat '' to be a `` very serious threat '' . '' ' I told him . ' I 'm not sure what happened . ' '' she said . '' ' I


Epoch 1/1:  96%|█████████▌| 165676/172148 [2:12:38<05:29, 19.67it/s, loss=4.2406]


After 200064 examples, Average Loss: 4.3164




Validation Average Loss: 4.2904, Perplexity: 72.99
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been a `` very good '' of the presidential election . '' . '' ) . '' said . '' ' I 'm a friend . '' . ) . '' ) . '' ' I 'm a friend . '' . ) . '' )

Context: New York

Generated text: New York City , the ##-year-old , who was a ##-year-old man who was a ##-year-old man who was a ##-year-old man who was shot dead by a police officer . ' ' I was a

Context: The hurricane

Generated text: The hurricane was found in the area . ' '' said the .### ) . '' . ) # . '' ) . ' I 'm a friend . '' . ) . '' ) . '' ' I 'm a friend . '' . ) . ''


Epoch 1/1:  96%|█████████▌| 165681/172148 [2:12:40<19:14,  5.60it/s, loss=4.4089]


Context: The President

Generated text: The President 's office said the government has not been able to identify the suspects . ' '' said the .## . '' ) . '' said . '' ' I 'm a friend . '' . ) . '' ) . '' ' I 'm a


Epoch 1/1:  97%|█████████▋| 167238/172148 [2:13:57<03:56, 20.74it/s, loss=4.2733]


After 200064 examples, Average Loss: 4.3129




Validation Average Loss: 4.2897, Perplexity: 72.95
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's official statement said : 'We are not aware of the situation . '' ' I said . ' '' she said . ' I 'm not sure what happened . ' '' she said . ' I 'm not sure what happened . ' ''

Context: New York

Generated text: New York City Council said the `` <rare> '' of the `` <rare> '' of the `` <rare> '' of the `` <rare> '' of the `` <rare> '' of the <rare> , which is the first time . ''

Context: The hurricane

Generated text: The hurricane was the first time in the past year . ' '' he said . ' '' he said . ' I 'm not sure what happened . ' '' she said . ' I 'm not sure what happened . ' '' she said . ' I '


Epoch 1/1:  97%|█████████▋| 167244/172148 [2:13:59<12:05,  6.76it/s, loss=4.2491]


Context: The President

Generated text: The President has been a `` very strong '' of the president 's decision to end the . '' ) of the first time . ' '' he said . ' I 'm not sure what happened . ' '' she said . ' I 'm not sure what


Epoch 1/1:  98%|█████████▊| 168803/172148 [2:15:14<02:33, 21.84it/s, loss=4.3676]


After 200064 examples, Average Loss: 4.3152




Validation Average Loss: 4.2865, Perplexity: 72.71
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been charged with the murder of a ##-year-old woman who was arrested in connection with the death of the death of the death of the death of the death of the death of the death of the death of the death of the death of

Context: New York

Generated text: New York City 's #-# #-# #-# #-# #-# #-# #-# #-# #-# #-# #-# #-# #-# #-# #-# #-

Context: The hurricane

Generated text: The hurricane was the first time of the year , and the ##-year-old was in the UK . ' ' I 'm not sure what I 'm going to be a little bit . ' '' said . ' I 'm not sure what I


Epoch 1/1:  98%|█████████▊| 168806/172148 [2:15:16<10:28,  5.32it/s, loss=4.2422]


Context: The President

Generated text: The President 's office said the government has been `` very concerned '' by the government . '' 's . ' ) . '' ' I said . ' I was a very good friend . ' '' said . ' I was a friend of the couple . '


Epoch 1/1:  99%|█████████▉| 170366/172148 [2:16:28<01:22, 21.63it/s, loss=4.3556]


After 200064 examples, Average Loss: 4.3114




Validation Average Loss: 4.2846, Perplexity: 72.57
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow 's first-handed attacker was the first of the deadliest attack on the island of the city . '' ' I 'm a friend . '' ' I 'm a friend . ' '' said . '' ' I 'm a friend

Context: New York

Generated text: New York City 's #-# win over the World Cup . '' ' I 've got to be a bit more . '' ' I 've got to be a bit more . '' ' I 've got to be a bit more . '' '

Context: The hurricane

Generated text: The hurricane was found in the area , and the plane was found in the area . ' '' he said . ' '' she said . ' I 'm not sure what I 'm not going to be a bit of a good job . '' ' I '


Epoch 1/1:  99%|█████████▉| 170369/172148 [2:16:30<05:32,  5.36it/s, loss=4.2799]


Context: The President

Generated text: The President 's government has been in the country 's government and the government 's government to be a `` very dangerous '' . '' '' said . '' ) . '' ' I 'm not sure what I 'm going to be a very good thing


Epoch 1/1: 100%|█████████▉| 171929/172148 [2:17:43<00:10, 21.45it/s, loss=4.2463]


After 200064 examples, Average Loss: 4.3117




Validation Average Loss: 4.2843, Perplexity: 72.55
Computed on 1024 sentences
Generating text based on contexts using generate_text:


Context: Moscow

Generated text: Moscow has been charged with the murder of the death of the ##-year-old . ' I was a very good friend . ' '' said . ' I was a friend . ' '' said . ' I was a friend . ' '' said . '

Context: New York

Generated text: New York City Mayor Michael Bloomberg said the attack was `` very concerned '' by the president 's government . '' ' I said . '' ' I said . ' I was a little bit of the car . ' '' she said . ' I was a

Context: The hurricane

Generated text: The hurricane was the first time in the UK , and the ##-year-old is the first time in the UK . ' '' he said . ' I 'm not going to be a good time . ' '' she said . ' I was a friend


Epoch 1/1: 100%|█████████▉| 171932/172148 [2:17:44<00:40,  5.33it/s, loss=4.2707]


Context: The President

Generated text: The President 's office is not the first time to be able to take the lead . '' ' I 'm not sure what 's going to happen . '' ' I said . '' ' I was a friend . ' '' said . ' I was a


Epoch 1/1: 100%|██████████| 172148/172148 [2:17:54<00:00, 20.80it/s, loss=4.3168]



Epoch 1/1, Average Loss: 4.3072




Validation Average Loss: 4.2824, Perplexity: 72.41
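
The perplexity values in the log above are consistent with perplexity being the exponential of the average validation loss. The short check below is a sketch under the assumption that the logged loss is the mean per-token cross-entropy in nats (natural logarithm); the `perplexity` helper and the `reported` dictionary are illustrative names, not part of the notebook's code. Because the logged losses are rounded to four decimals, agreement is only expected up to rounding.

```python
import math

# Validation losses and perplexities copied from the training log above
reported = {
    4.2846: 72.57,
    4.2843: 72.55,
    4.2824: 72.41,
}

def perplexity(avg_loss: float) -> float:
    """Return exp(avg_loss), assuming avg_loss is mean cross-entropy in nats."""
    return math.exp(avg_loss)

for loss, logged_ppl in reported.items():
    # Compare the recomputed value with the one printed during training
    print(f"loss={loss:.4f} -> perplexity={perplexity(loss):.2f} (logged: {logged_ppl})")
```

Read this way, a perplexity of about 72 means the model is, on average, roughly as uncertain as if it were choosing uniformly among 72 tokens at each step.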



## Testing the model

In the cell below, we load and test the language model:

In [5]:
# ----------------------------
# Model tests
# ----------------------------

# Load the previously saved model and tokenizer from disk
# This restores the model state saved at the end of training
model, tokenizer = load_model(model_name)

# Switch to evaluation mode so layers such as dropout use their inference behavior
model.eval()

# Print header for test section
print("Testing the model:\n")

# Define a list of test prompts to evaluate model performance
contexts = [
    "Moscow",
    "New York",
    "A hurricane",
    "The President"
]

# Iterate through each test prompt and generate text
for context in contexts:
    # Generate text with greedy decoding (the most likely token at each step)
    # The prompt is passed unchanged; no trailing space is appended here
    generated_text = generate_text(
        model=model,          # The loaded language model
        start_string=context,  # Prompt used to seed generation
        tokenizer=tokenizer,  # Tokenizer for text conversion
        device=device,        # CPU or GPU device
        max_length=50         # Maximum length of generated sequence
    )

    # Print the original prompt and model's response
    print(f"\nPrompt: {context}")
    print(f"\nGenerated response: {generated_text}")



Testing the model:


Prompt: Moscow

Generated response: Moscow has been a `` very serious '' to the government 's decision . '' 's . '' ) . '' ) . '' ' I 'm not sure what 's wrong . ' '' ) . '' ) . ' '' she said . ' ''

Prompt: New York

Generated response: New York City 's #-#-# win over the world 's top ## . ' '' he said . ' '' he said . ' '' he said . ' '' I 'm not sure ] . ' '' she said . ' '' she said

Prompt: A hurricane

Generated response: A hurricane season , the ##-year-old was found dead in the car . ' '' . ' '' ) . '' ' I 'm not sure what 's ] . ' '' she said . ' '' she said . ' '' she said . '

Prompt: The President

Generated response: The President 's office has been criticised for the first time in the past . '' ' I 'm not sure what 's happening . '' ' I 'm not sure what I 'm not going to be a little bit . ' '' he said
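
The code comments in the cell above describe `generate_text` as greedy decoding, and the repeated fragments in the responses (for example `she said . ' ''`) are what one would expect from that strategy: once the most likely continuation leads back to an earlier state, the same tokens are chosen again. The toy sketch below illustrates the effect. It is not the notebook's `generate_text`; the `NEXT_TOKEN_SCORES` table and `greedy_generate` function are hypothetical, the scores are made up, and the table conditions only on the previous token, whereas the RNN also carries a hidden state.

```python
# Hypothetical next-token scores, keyed by the current token (made-up numbers)
NEXT_TOKEN_SCORES = {
    "The": {"President": 2.0, "said": 0.5},
    "President": {"said": 1.5, ".": 1.0},
    "said": {".": 2.0, "he": 1.0},
    ".": {"he": 1.8, "said": 0.2},
    "he": {"said": 2.5, ".": 0.1},
}

def greedy_generate(start_token: str, max_length: int) -> str:
    """Extend start_token by always picking the highest-scoring next token."""
    tokens = [start_token]
    while len(tokens) < max_length:
        scores = NEXT_TOKEN_SCORES[tokens[-1]]
        # Greedy step: take the argmax, never sample an alternative
        tokens.append(max(scores, key=scores.get))
    return " ".join(tokens)

print(greedy_generate("The", 12))
# prints: The President said . he said . he said . he said
```

Because the choice at each step is deterministic, the toy sequence falls into the cycle `said . he` and never leaves it. Sampling from the predicted distribution instead of taking the argmax is one common way to break such loops, at the cost of less predictable output.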


In [6]:
# Release the Colab runtime (and its GPU) now that the notebook has finished
from google.colab import runtime
runtime.unassign()