This code outlines the implementation of a **Memorizing Transformer** model, which incorporates **K-Nearest Neighbors (KNN)** memory retrieval and **relative position encoding** into a Transformer-like architecture. I'll walk you through the important parts of the code to help you understand what's going on, step-by-step.

### Key Concepts:

1. **Transformer Architecture**: This is a deep learning model used primarily for NLP tasks, where words (or tokens) are passed through layers of attention mechanisms. Transformers use "self-attention" to process all tokens simultaneously, allowing them to capture long-range dependencies in sequences.

2. **Relative Position Encoding**: Rather than using absolute positions (fixed numbers that represent token positions), this model uses relative positions between tokens. This helps the model to handle variable-length sequences better.

3. **K-Nearest Neighbors (KNN)**: In this context, KNN is used to retrieve relevant "memories" from previous computations. This means that instead of only considering the current input sequence, the model can also access stored memory (previous computations or knowledge), which is used to augment the attention mechanism.

### High-Level Overview of Workflow:

1. **Tokenization**: Raw text is tokenized into tokens using `AutoTokenizer`.
2. **Embedding**: Token IDs are passed through an embedding layer to convert them into continuous vectors.
3. **Layer Processing**: The input passes through multiple Transformer blocks:
   - **Attention**: Captures relationships between tokens using self-attention (or KNN-enhanced attention).
   - **Feedforward**: Further processes the attention outputs.
4. **Memory Augmentation**: Each layer may augment its attention with memories from the KNN structure.
5. **Output**: The final output is passed through a linear layer to generate token probabilities, and the loss is calculated using these predictions.

### Why This Approach is Important:

- **Relative Position Encoding**: This allows the model to handle long sequences and variable lengths without needing to explicitly memorize the position of each token in the sequence. It can generalize better over different lengths.
  
- **KNN Memory**: The addition of KNN memory helps the model "remember" past computations, making it more powerful for long-term sequence generation or tasks where context from distant tokens is necessary.

### Key Terms Explained:

- **Attention**: A mechanism in neural networks where each output token is computed as a weighted sum of input tokens. The weights (attention scores) are learned during training.
  
- **KNN**: A machine learning algorithm that stores a set of "memories" (previous computations) and retrieves the closest ones based on a distance metric.
  
- **Cross-entropy loss**: A loss function used for classification problems, comparing the predicted probabilities with the true labels.
  
- **Residual connections**: Skip connections that allow the model to retain previous layer outputs, which helps mitigate issues like vanishing gradients during training.

Let me know if you would like any part of this explained further!

#### **Imports**

- `torch`, `torch.nn`, and `torch.optim`: Standard PyTorch libraries for defining neural networks and optimization.
- `transformers` and `AutoTokenizer`: Used to load and preprocess data (e.g., converting text into tokens).
- `faiss`: A library used for efficient similarity search and clustering (useful for KNN memory retrieval).
- `datasets`: A library for managing datasets.
- `einops`: A powerful utility for tensor manipulation, used to rearrange, repeat, or unpack tensors efficiently.

In [1]:
import torch
from torch import nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
import numpy as np
import math
import os
import random
import tqdm
import gzip

!pip install einops
from einops import rearrange, repeat, pack, unpack, einsum
from einops.layers.torch import Rearrange


from functools import partial, wraps
from contextlib import contextmanager, ExitStack
from pathlib import Path
from filelock import FileLock
import pickle

import transformers
from transformers import AutoTokenizer

!pip install faiss-gpu
import faiss

!pip install datasets
import datasets


Collecting faiss-gpu
  Downloading faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)
Downloading faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (85.5 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m85.5/85.5 MB[0m [31m8.4 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: faiss-gpu
Successfully installed faiss-gpu-1.7.2
Collecting datasets
  Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.9.0,>=2023.1.0 (from fsspec[http]<=2024.9.0,>=2023.1.0->datasets)
  Downloading fsspec-2024.9.0-py3-no

# Fix up Classes

#### **RelativePosition Class**
This class is responsible for creating **relative position embeddings**.

- `relative_position_bucket`: This function computes relative positions between tokens in the sequence and maps them to predefined buckets. It normalizes positions and uses a logarithmic scale to handle longer sequences more effectively.
  
- `forward`: This function calculates the position of each token relative to others and returns an embedding vector for each relative position.

In [3]:
class RelativePosition(nn.Module):
  def __init__(
      self,
      rp_scale,
      num_buckets = 32,
      rp_max_distance = 128,
      heads = 8
  ):
      super().__init__()
      self.scale = rp_scale
      self.num_buckets = num_buckets
      self.rp_max_distance = rp_max_distance
      self.relative_attention_embedding = nn.Embedding(num_buckets, heads)

  def relative_position_bucket(self, relative_position_matrix):
      n = -relative_position_matrix
      n = torch.max(n, torch.zeros_like(n))

      max_exact = self.num_buckets // 2

      is_small = n < max_exact
      val_if_large = max_exact + (torch.log(n.float() / max_exact) / math.log(self.rp_max_distance / max_exact) * (self.num_buckets - max_exact)).long()
      val_if_large = torch.min(val_if_large, torch.full_like(val_if_large, self.num_buckets - 1))

      return torch.where(is_small, n, val_if_large)

  def forward(self, sequence_length):

      # Change: In the new version, context_pos is created with 2 * sequence_length, and the positions are rearranged using rearrange() to create a more explicit relation between sequence and context positions.
      # Reason: The change is likely made to increase the context size relative to the sequence length, allowing the model to process larger contexts and better capture long-range dependencies. The use of rearrange clarifies how the positions are expanded in dimensions for later computations.
      sequence_pos = torch.arange(sequence_length, dtype=torch.long)
      #########
      context_pos = torch.arange(2 * sequence_length, dtype=torch.long)
      sequence_rel_pos = rearrange(sequence_pos, 'i -> i 1')
      context_rel_pos = rearrange(context_pos, 'j -> 1 j')
      rel_pos = context_rel_pos - sequence_rel_pos

      position_bucket_indices = self.relative_position_bucket(rel_pos)

      rp_values = self.relative_attention_embedding(position_bucket_indices)

      # Change: In the new version, rearrange() is used instead of transpose() and unsqueeze(). It reformats the rp_values tensor from its original dimensions (i, j, h) to ((), h, i, j), where () implies the batch dimension is empty or not used explicitly.
      # Reason: This change might be made to clarify or standardize the dimensionality of the output tensor, ensuring that the relative position embeddings align correctly with the model's expected shape. The rearrange operation here likely facilitates batch processing or ensures the correct dimensionality for multi-head attention.
      rp_values = rearrange(rp_values, 'i j h -> () h i j')
      return rp_values * self.scale



- Change:

In the new version, context_pos is created with 2 * sequence_length, and the positions are rearranged using rearrange() to create a more explicit relation between sequence and context positions.

- Reason:

The change is likely made to increase the context size relative to the sequence length, allowing the model to process larger contexts and better capture long-range dependencies. The use of rearrange clarifies how the positions are expanded in dimensions for later computations.

In [4]:
## Old

sequence_length = 5
sequence_pos = torch.arange(sequence_length, dtype=torch.long)
context_pos = torch.arange(sequence_length, dtype=torch.long)
context_pos = torch.arange(-sequence_length, sequence_length, dtype=torch.long)
sequence_rel_pos = rearrange(sequence_pos, 'i -> i 1')
context_rel_pos = rearrange(context_pos, 'j -> 1 j')
rel_pos = context_rel_pos - sequence_rel_pos
rel_pos

tensor([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4],
        [-6, -5, -4, -3, -2, -1,  0,  1,  2,  3],
        [-7, -6, -5, -4, -3, -2, -1,  0,  1,  2],
        [-8, -7, -6, -5, -4, -3, -2, -1,  0,  1],
        [-9, -8, -7, -6, -5, -4, -3, -2, -1,  0]])

In [5]:
## Improved

sequence_length = 5
sequence_pos = torch.arange(sequence_length, dtype=torch.long)
context_pos = torch.arange(2*sequence_length, dtype=torch.long)
#context_pos = torch.arange(-sequence_length, sequence_length, dtype=torch.long)
sequence_rel_pos = rearrange(sequence_pos, 'i -> i 1')
context_rel_pos = rearrange(context_pos, 'j -> 1 j')
rel_pos = context_rel_pos - sequence_rel_pos
rel_pos

tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [-1,  0,  1,  2,  3,  4,  5,  6,  7,  8],
        [-2, -1,  0,  1,  2,  3,  4,  5,  6,  7],
        [-3, -2, -1,  0,  1,  2,  3,  4,  5,  6],
        [-4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])

#### **KNN_XLAttention Class**
This class defines a custom attention mechanism called **KNN Attention**. It combines traditional self-attention with KNN retrieval to enhance the model's ability to memorize long-term dependencies.

- **Input Tensors**:
  - `queries`, `keys`, and `values`: These are standard components of the attention mechanism. Queries are compared to keys to calculate attention scores, which are then used to weight the values.
  
- **KNN Memory Retrieval**:
  - The `knn.search` function searches for the closest memories (key-value pairs) to the current query. These memories are then integrated into the attention mechanism.

- **Masking**:
  - A triangular mask is applied to ensure that the model does not attend to future tokens (important for causal language modeling).

- **Output**:
  - The attention outputs are combined with the KNN-based outputs, weighted by a gate bias.

In [6]:
class KNN_XLAttention(nn.Module):
    def __init__(
        self,
        embedding_dimension,
        knn,
        heads = 8,
        head_dimension = 32,
        topk_retrieved_memories = 3,
        dropout = 0.
    ):
        super().__init__()
        self.heads = heads
        self.scale = head_dimension ** -0.5

        # Change: A dropout parameter is introduced, which is used to apply dropout to the attention scores and softmax results.
        # Reason: Dropout is commonly used to prevent overfitting and improve generalization. By introducing a configurable dropout rate, the model can be adapted for different regularization needs.
        self.dropout = nn.Dropout(dropout)

        self.query_matrix = nn.Linear(embedding_dimension, heads * head_dimension)
        self.key_matrix = nn.Linear(embedding_dimension, heads * head_dimension)
        self.value_matrix = nn.Linear(embedding_dimension, heads * head_dimension)
        self.output_matrix = nn.Linear(heads * head_dimension, embedding_dimension)

        self.gate_bias = nn.Parameter(torch.randn(self.heads, 1, 1))
        self.topk_retrieved_memories = topk_retrieved_memories
        self.knn = knn

    def forward(
        self,
        x, # batch_size, sequence_length, embedding_dimension

        # Change: The new version introduces a relative_positions argument in the forward method.
        # Reason: This change enables the model to incorporate relative position encodings, which can help the model better capture long-range dependencies and improve the effectiveness of attention over sequences with varying lengths.
        relative_positions = None,
        xl_memory = None
    ):
        batch_size, sequence_length = x.shape[:2]
        queries = self.query_matrix(x)
        keys = self.key_matrix(x)
        values = self.value_matrix(x)

        # Change: In the new version, F.normalize is applied to the queries and keys along the last dimension.
        # Reason: Normalization helps to stabilize training by ensuring that the inputs to the attention mechanism are on a consistent scale. This can also help with convergence speed and overall model performance.
        queries = F.normalize(queries, dim=-1)
        keys = F.normalize(keys, dim=-1)

        if xl_memory is not None:
            k_xl, v_xl = xl_memory.unbind(dim = -2) # unstack
            keys = torch.cat((k_xl, keys), dim = -2) # prepend XL memory
            values = torch.cat((v_xl, values), dim = -2) # prepend XL memory
            xl_sequence_length = k_xl.shape[1]

        ### LOCAL ATTENTION

        queries = rearrange(queries, 'b t (h d) -> b h t d', h = self.heads)
        keys    = rearrange(keys, 'b t (h d) -> b h t d', h = self.heads)
        qk      = einsum(queries, keys, 'b h i d, b h j d -> b h i j')

        qk = qk * self.scale

        # Change: The new version adds the relative_positions information to the attention scores (qk), but only if relative_positions is provided.
        # Reason: The addition of relative position encoding enhances the model's ability to understand the relative position between different elements in the sequence. This is especially useful in tasks that require long-term dependencies or in transformers that are sensitive to sequence length and order.
        i, j = qk.shape[-2:]
        if relative_positions is not None:
            qk = relative_positions[..., -i:, -j:] + qk



        mask = torch.ones((i,j), dtype = torch.bool).triu(j-i+1)
        qk = qk.masked_fill(mask, float('-inf'))

        qk = F.softmax(qk, dim=-1)

        qk = self.dropout(qk)

        values = rearrange(values, 'b t (h d) -> b h t d', h=self.heads)
        qkv = qk@values
        qkv = rearrange(qkv, 'b h t d -> b t (h d)')

        ### KNN ATTENTION

        # Convert queries to search form
        queries = rearrange(queries, 'b h t d -> b t (h d)')
        mem_kv = knn.search(queries, topk = self.topk_retrieved_memories) # returns b t k 2 d
        mem_k, mem_v = mem_kv.unbind(dim = -2)
        mem_k = rearrange(mem_k, 'b t k (h d) -> b h t k d', h=self.heads)
        mem_v = rearrange(mem_v, 'b t k (h d) -> b h t k d', h=self.heads)

        # Convert queries to attention form
        queries = rearrange(queries, 'b t (h d) -> b h t d', h = self.heads)
        mem_qk = einsum('b h t d, b h t k d -> b h t k', queries, mem_k)
        mem_qk = mem_qk * self.scale

        mem_qk = F.softmax(mem_qk, dim=-1)
        mem_qk = self.dropout(mem_qk)
        mem_qkv = einsum('b h t k, b h t k d -> b h t d', mem_qk, mem_v)

        # Combined attentions
        # Change: The gate bias (self.gate_bias) is used to combine the local and KNN-based attention results, controlling how much influence each should have on the final output.
        # Reason: The gate bias introduces a learnable weighting mechanism between the two attention types (local and KNN), potentially improving the model's ability to blend these two sources of information in a task-specific way.
        combined_qkv = mem_qkv * self.gate_bias + qkv * (1 - self.gate_bias)
        combined_qkv = rearrange(combined_qkv, 'b h t d -> b t (h d)')
        out = self.output_matrix(combined_qkv)

        # New XL memories
        keys = rearrange(keys, 'b h t d -> b t (h d)', h = self.heads)
        values = rearrange(values, 'b h t d -> b t (h d)', h=self.heads)
        kv_memories = torch.stack((keys, values), dim=-2) # (batch, sequence_len, 2, dimension)

        if xl_memory is not None:
            # if we're on a middle/end segment of a document (there are previous XL memories)
            xl_memories, current_kv = kv_memories[:, :-xl_sequence_length], kv_memories[:, -xl_sequence_length:]
        else:
            # if we're at the first segment
            current_kv = kv_memories

        knn.add(current_kv)

        return out, current_kv

Let's build the model!

From the paper:

"The input text is tokenized, and the tokens
are embedded into vector space. The embedding vectors are passed through a series of transformer
layers, each of which does dense self-attention, followed by a feed-forward network (FFN). Since
this is a decoder-only language model, we use a causal attention mask and the token embeddings of
the last layer are used to predict the next token."


This code outline demonstrates the basic flow of a neural network model for natural language processing (NLP), possibly a transformer-based architecture like GPT or BERT. Here's a breakdown of each section:

### 1. **Embedding**:
- **Explanation**: The raw text is first tokenized using the `tokenizer`, which converts the text into a series of token IDs. These IDs represent words or subwords in a vocabulary. The `embedding` layer then transforms these token IDs into dense vector representations (embeddings), which are used as the initial input for the model.

### 2. **BLOCK (n layers)**:
This section indicates that the model consists of `n` layers of blocks. Each block typically contains an attention mechanism followed by a feedforward network, and each block operates on the output of the previous one.

### 3. **Attention**:
  - **Residual Connection**: A copy of the input `x` is saved as `residual` before any transformation. This is part of the **residual connection**, a technique introduced in ResNet (and used in transformers) where the original input is added back after the transformation to help with gradient flow and improve convergence during training.
  - **Layer Normalization**: Before applying attention, `x` is normalized using layer normalization. Layer normalization standardizes the inputs to each layer, helping stabilize training.
  - **Attention**: The core mechanism here is the attention layer, which could be **XL (Extended Memory) attention** or **KNN_XL** (K-nearest neighbors with extended memory). These attention mechanisms allow the model to focus on different parts of the input sequence, with KNN_XL additionally enabling the model to retrieve and attend to past memory sequences for better long-term dependency modeling.
  - **Residual Addition**: After the attention transformation, the original input (stored in `residual`) is added back to the result. This helps preserve important features from the input that might be lost in transformations.

### 4. **Feedforward**:
  - **Residual Connection**: As with the attention part, a copy of `x` is saved as `residual` to be added back later.
  - **Layer Normalization**: The input is normalized again to ensure stable training before passing it through the feedforward network.
  - **Linear Layers**: The `linear` and `linear_2` layers are fully connected layers that transform the input data to higher or lower dimensions.
  - **Activation Function**: After the first linear transformation, a nonlinear activation function (like ReLU, GELU, etc.) is applied to introduce nonlinearity into the model, enabling it to learn more complex functions.
  - **Dropout**: Dropout is applied to prevent overfitting. It randomly zeros out a fraction of the neurons during training to reduce reliance on any single neuron.
  - **Residual Addition**: As in the attention section, the original input to the feedforward block (`residual`) is added back to the transformed output to form the final result. This is another residual connection.

### 5. **Output**:
  - **Layer Normalization**: The final output `x` is normalized before being passed to the output layer. This ensures that the output has stable values and is well-scaled.
  - **Embedding Reverse**: The output embeddings are transformed back into token IDs using the `embedding_reverse` layer. This step converts the model's learned representation back into a form that can be compared to the target output (token IDs).
  - **Loss Calculation**: The model's predicted token IDs are compared to the true `labels` using the `cross_entropy` loss function. Cross-entropy is commonly used in classification tasks, where the model is trying to predict the correct label from a set of possible classes (in this case, token IDs). The loss is minimized during training, which helps the model improve its performance over time.

### **Summary of the Process**:
- **Input**: The raw text is tokenized and embedded.
- **Attention Mechanism**: The model processes the embeddings using a combination of attention (XL or KNN_XL) and residual connections.
- **Feedforward**: The output from attention is further processed through a feedforward network with normalization, linear transformations, activation, and dropout.
- **Output**: Finally, the output is transformed back into token IDs, and a loss is computed using cross-entropy.

The overall structure resembles that of transformer-based architectures (like GPT, BERT), where attention mechanisms allow the model to focus on different parts of the sequence and feedforward networks help to capture complex patterns in the data. The use of residual connections and layer normalization helps ensure stable and efficient training.

In [None]:
# Outline

# Embedding
token_ids = tokenizer(raw_text)
x = embedding(token_ids)

# BLOCK x n (layers)

# Attention
residual = x.copy()
x = layernorm(x)
x = attention(x) # XL, KNN_XL
x = x + residual

# Feedforward
residual = x.copy()
x = layernorm(x)
x = linear(x)
x = activation(x)
x = dropout(x)
x = linear_2(x)
x = x + residual


# Output
x = layernorm(x)
token_ids = embedding_reverse(x)
loss = cross_entropy(token_ids, labels)

In [None]:
# build a pseudocode version of things first
# build the simplest version possible
# keep a checklist
# test

#### **Block Class**
This class defines a standard block (layer) in the Transformer model, consisting of **Attention** and **Feedforward** sub-blocks.

- **Attention**:
  - The input is passed through attention layers (either XL or KNN-based). Residual connections (additive shortcuts) are used to stabilize learning.
  
- **Feedforward Network**:
  - After the attention step, the output goes through a feedforward network (linear layers with activation functions) to further transform the representations.

In [9]:
class Block(nn.Module):
    def __init__(self, knn=None, dropout=0.):
        if knn:
            self.attention = KNNAttention(self.embedding_dimension,
                            knn,
                            heads = self.heads,
                            head_dimension = self.head_dimension,
                            dropout = self.dropout)
        else:
            self.attention = XLAttention(self.embedding_dimension,
                            heads = self.heads,
                            head_dimension = self.head_dimension,
                            dropout = self.dropout)

        self.ff_block = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(dim * 4, dim))

    def forward(self, x, xl_memories, rel_pos):
        residual = x
        norm = nn.LayerNorm(dim)
        attn_out = norm(x)
        attn_out, new_xl_memories = self.attention(attn_out, relative_positions=rel_pos, xl_memory=xl_memories)
        attn_out += residual

        residual = attn_out
        ff_out = self.ff_block(attn_out)
        ff_out += residual
        return ff_out, new_xl_memories



#### **MemorizingTransformer Class**
This is the main model class, which builds the entire Transformer network.

- **Initialization**:
  - It initializes embedding layers, KNN, relative position encoders, and multiple layers of attention blocks. The KNN is used to store and retrieve memories throughout the model's processing.

- **Forward Pass**:
  - The input sequence `x` is passed through multiple layers of attention and feedforward blocks. Each layer uses either relative position encoding or KNN-based attention, depending on the layer index.
  
- **Memory Management**:
  - The model maintains "XL memories," which are propagated through the layers. This allows the model to retain information from previous sequences and incorporate it into the current computation.
  
- **Loss Calculation**:
  - The output logits (predictions) are compared to the true labels using **cross-entropy loss**.

In [None]:
class MemorizingTransformer(nn.Module):
    def __init__(
        self,
        embedding_dimension,
        vocab_size,
        heads = 8,
        depth = 10,
        dropout = 0,
        head_dimension = 64,
        max_knn_memories = 32000,
        topk = 5,

    ):
        super().__init__()
        self.heads = heads
        self.embedding_dimension = embedding_dimension
        self.dropout = dropout
        self.depth = depth
        self.head_dimension = head_dimension
        self.max_knn_memories = max_knn_memories
        self.topk = topk
        self.rel_pos = RelativePosition(rp_scale = head_dimension** 0.5, heads = self.heads)
        self.rel_pos_knn = RelativePosition(rp_scale = head_dimension** 0.5, heads = self.heads)
        self.embedding_matrix = nn.Embedding(vocab_size, self.embedding_dimension)
        self.knn = KNN(head_dimension * heads, self.max_knn_memories)
        self.layers = nn.ModuleList([])

        for i in range(self.depth):

            if i == self.depth-2:
                layer_knn = self.knn
            else:
                layer_knn = None

            self.layers.append(Block(layer_knn))

        self.to_logits = nn.Sequential(
            nn.LayerNorm(self.embedding_dimension),
            nn.Linear(self.embedding_dimension, vocab_size)
        )

    def forward(
        self,
        x,
        relative_positions = None,
        xl_memories = None,
        labels = None,
    ):

        batch_size, sequence_length = x.shape[0], x.shape[1]

        # Position values
        rel_pos = self.rel_pos(sequence_length)
        rel_pos_knn = self.rel_pos_knn(sequence_length)

        if xl_memories is not None:
            xl_memories = xl_memories
        else:
            xl_memories = (None,) * self.depth

        # Iterator
        xl_memories_iter = iter(xl_memories)

        # Store the XL memories for each pass
        new_xl_memories = []

        # Embeddings
        x = self.embedding_matrix(x)

        for ind, block in enumerate(self.layers):

            if ind == self.depth-2:
                layer_rel_pos = rel_pos_knn
            else:
                layer_rel_pos = rel_pos

            x, xl_mem = block(x, next(xl_memories_iter), layer_rel_pos)

            if xl_mem is not None:
                new_xl_memories.append(xl_mem)

        logits = self.to_logits(x)

        loss = F.cross_entropy(rearrange(logits, 'b n c -> b c n'), labels)
        if len(new_xl_memories) > 0:
            return loss, new_xl_memories
        return loss

# Full Model

Your code represents an implementation of a transformer architecture with support for different attention mechanisms, such as XLAttention and KNNAttention, which incorporate memory mechanisms like KNN (K-Nearest Neighbors) and external memories for long-range dependencies. Here's a summary of key components:

1. **RelativePosition**: This class computes relative position embeddings, which help the model capture the relationship between positions in the input sequence. It supports a variety of configurations to determine the bucket indices based on the distance between sequence positions.

2. **KNN Class**: Implements a memory-based approach using K-Nearest Neighbors (KNN). It stores embeddings (keys and values) of previous sequences and retrieves the top-k nearest neighbors during attention, enabling the model to use external memory to enhance its ability to process long-range dependencies efficiently. This is achieved by combining the current sequence with previously stored embeddings.

3. **XLAttention**: A standard self-attention mechanism, enhanced with external memory (via `xl_memory`). This allows the model to perform attention over previously seen sequence tokens, supporting long-range dependencies without quadratic complexity growth.

4. **KNNAttention**: A modification of the self-attention mechanism that incorporates KNN-based retrieval of memory. The top-k nearest neighbors are retrieved for each query, allowing the model to access relevant past context dynamically. This approach enhances memory management for transformers handling long sequences, improving performance on tasks like document summarization.

5. **Block**: Each block consists of a combination of attention (either XLAttention or KNNAttention) and a feed-forward layer. It integrates residual connections, normalization, and dropout for regularization.

6. **MemorizingTransformer**: The main model class that integrates multiple transformer blocks. It uses both relative position encodings and either XL or KNN attention mechanisms. This class manages the processing of sequences, including retrieval of memory, updating memory, and predicting outputs using a final linear layer.

### Training Setup:
- The dataset being processed is the **arxiv-summarization** dataset, which has been pre-processed into chunks of text to fit within a transformer model's input constraints.
- A training loop is set up to use a dynamic batch size, learning rate, and gradient clipping to handle the model’s optimization.

This architecture is designed to improve long-term dependency handling in sequence models by utilizing memory mechanisms like KNN and external XL memory, which can be especially useful in tasks requiring processing of large documents or long sequences.

Let me know if you'd like further clarifications or have specific questions!

In [None]:
class RelativePosition(nn.Module):
  def __init__(
      self,
      rp_scale,
      num_buckets = 32,
      rp_max_distance = 128,
      heads = 8
  ):
      super().__init__()
      self.scale = rp_scale
      self.num_buckets = num_buckets
      self.rp_max_distance = rp_max_distance
      self.relative_attention_embedding = nn.Embedding(num_buckets, heads)

  def relative_position_bucket(self, relative_position_matrix):
      n = -relative_position_matrix
      n = torch.max(n, torch.zeros_like(n))

      max_exact = self.num_buckets // 2

      is_small = n < max_exact
      val_if_large = max_exact + (torch.log(n.float() / max_exact) / math.log(self.rp_max_distance / max_exact) * (self.num_buckets - max_exact)).long()
      val_if_large = torch.min(val_if_large, torch.full_like(val_if_large, self.num_buckets - 1))

      return torch.where(is_small, n, val_if_large)

  def forward(self, sequence_length):

      sequence_pos = torch.arange(sequence_length, dtype=torch.long)
      context_pos = torch.arange(2 * sequence_length, dtype=torch.long)
      sequence_rel_pos = rearrange(sequence_pos, 'i -> i 1')
      context_rel_pos = rearrange(context_pos, 'j -> 1 j')
      rel_pos = context_rel_pos - sequence_rel_pos

      position_bucket_indices = self.relative_position_bucket(rel_pos)

      rp_values = self.relative_attention_embedding(position_bucket_indices)
      rp_values = rearrange(rp_values, 'i j h -> () h i j')
      return rp_values * self.scale



class KNN():
    def __init__(
        self,
        dim,
        max_memories,
        ):
        self.dim = dim
        self.max_memories = max_memories
        self.shape = (max_memories, 2, dim)
        self.db_offset = 0
        self.db_filepath = "./memory.memmap"
        self.db = np.memmap(self.db_filepath, mode = 'w+', dtype = np.float32, shape = self.shape)
        self.index = faiss.IndexFlatL2(dim)


    def add_to_db(self, new_data):
        new_data_len = new_data.shape[0]
        ids = (np.arange(new_data_len) + self.db_offset)
        self.db[ids] = new_data.detach().numpy()
        self.db_offset += new_data_len
        # Write to file
        self.db.flush()


    def search_and_retrieve(self, query_vecs, topk):
        query_vecs = query_vecs
        distances, indices = self.index.search(query_vecs, topk)
        kvs = self.db[indices]
        return kvs

    def add(self, new_data):
        # Input is b n 2 d, flatten to (b n) 2 d
        new_data = new_data.flatten(0,1)
        # Add to db
        self.add_to_db(new_data)
        # Only keys are used in knn index
        keys, vals = new_data.unbind(dim=-2)
        keys = keys.detach().numpy()
        # Add (b n) d tensors to index
        keys = np.ascontiguousarray(keys)
        # Add to index
        self.index.add(keys)

    def search(self, query_vecs, topk):
        # can override topk
        query_batch_size, query_seq_len = query_vecs.shape[0], query_vecs.shape[1]
        # Input is b n d, flatten to (b n) d
        query_vecs = query_vecs.flatten(0,1)
        kvs = self.search_and_retrieve(np.ascontiguousarray(query_vecs.detach().numpy()), topk)
        # kvs are (b n) k 2 d, unflatten to b n k 2 d
        kvs = torch.tensor(kvs)
        kvs = torch.unflatten(kvs, 0, (query_batch_size, query_seq_len))
        return kvs


    def clear(self):
        self.index.reset()
        self.db[:] = 0
        self.db_offset = 0


class XLAttention(nn.Module):
    def __init__(
        self,
        embedding_dimension,
        heads = 8,
        head_dimension = 64,
        dropout = 0.,
    ):
        super().__init__()
        self.heads = heads
        self.dropout = nn.Dropout(dropout)
        self.scale = head_dimension ** -0.5

        self.query_matrix = nn.Linear(embedding_dimension, self.heads * head_dimension)
        self.key_matrix = nn.Linear(embedding_dimension, self.heads * head_dimension)
        self.value_matrix = nn.Linear(embedding_dimension, self.heads * head_dimension)
        self.output_matrix = nn.Linear(self.heads * head_dimension, embedding_dimension)

    def forward(
        self,
        x, # batch_size, sequence_length, embedding_dimension
        relative_positions = None,
        xl_memory = None
    ):

        queries = self.query_matrix(x)
        keys = self.key_matrix(x)
        values = self.value_matrix(x)

        queries = queries * self.scale

        if xl_memory is not None:
            k_xl, v_xl = xl_memory.unbind(dim = -2) # assume stacked
            keys = torch.cat((k_xl, keys), dim = -2) # prepend XL memory
            values = torch.cat((v_xl, values), dim = -2) # prepend XL memory
            xl_sequence_length = k_xl.shape[1]

        queries = rearrange(queries, 'b t (h d) -> b h t d', h = self.heads)
        keys    = rearrange(keys, 'b t (h d) -> b h t d', h = self.heads)
        qk      = einsum(queries, keys, 'b h i d, b h j d -> b h i j')

        i, j = qk.shape[-2:]
        if relative_positions is not None:
            qk = relative_positions[..., -i:, -j:] + qk

        qk = qk * self.scale

        mask = torch.ones((i,j), dtype = torch.bool).triu(j-i+1)
        qk = qk.masked_fill(mask, float('-inf'))

        qk = F.softmax(qk, dim=-1)
        qk = self.dropout(qk)

        values = rearrange(values, 'b t (h d) -> b h t d', h=self.heads)
        qkv = qk@values
        qkv = rearrange(qkv, 'b h t d -> b t (h d)')

        out = self.output_matrix(qkv)

        # new XL memories

        keys = rearrange(keys, 'b h t d -> b t (h d)', h = self.heads)
        values = rearrange(values, 'b h t d -> b t (h d)', h=self.heads)
        kv_memories = torch.stack((keys, values), dim=-2) # (batch, sequence_len, 2, dimension)


        if xl_memory is not None:
            xl_memories, current_input = kv_memories[:, :-xl_sequence_length], kv_memories[:, -xl_sequence_length:]
            kv_to_add_xl = current_input
        else:
            kv_to_add_xl = kv_memories

        return out, kv_to_add_xl



class KNNAttention(nn.Module):
    def __init__(
        self,
        embedding_dimension,
        knn,
        heads = 8,
        head_dimension = 64,
        topk_retrieved_memories = 3,
        dropout = 0.
    ):
        super().__init__()
        self.heads = heads
        self.scale = head_dimension ** -0.5
        self.dropout = nn.Dropout(dropout)

        self.query_matrix = nn.Linear(embedding_dimension, heads * head_dimension)
        self.key_matrix = nn.Linear(embedding_dimension, heads * head_dimension)
        self.value_matrix = nn.Linear(embedding_dimension, heads * head_dimension)
        self.output_matrix = nn.Linear(heads * head_dimension, embedding_dimension)

        self.gate_bias = nn.Parameter(torch.randn(self.heads, 1, 1))
        self.topk_retrieved_memories = topk_retrieved_memories
        self.knn = knn

    def forward(
        self,
        x, # batch_size, sequence_length, embedding_dimension
        relative_positions = None,
        xl_memory = None
    ):
        batch_size, sequence_length = x.shape[:2]
        queries = self.query_matrix(x)
        keys = self.key_matrix(x)
        values = self.value_matrix(x)

        queries = F.normalize(queries, dim=-1)
        keys = F.normalize(keys, dim=-1)

        if xl_memory is not None:
            k_xl, v_xl = xl_memory.unbind(dim = -2) # unstack
            keys = torch.cat((k_xl, keys), dim = -2) # prepend XL memory
            values = torch.cat((v_xl, values), dim = -2) # prepend XL memory
            xl_sequence_length = k_xl.shape[1]

        ### LOCAL ATTENTION

        queries = rearrange(queries, 'b t (h d) -> b h t d', h = self.heads)
        keys    = rearrange(keys, 'b t (h d) -> b h t d', h = self.heads)
        qk      = einsum(queries, keys, 'b h i d, b h j d -> b h i j')

        i, j = qk.shape[-2:]
        if relative_positions is not None:
            qk = relative_positions[..., -i:, -j:] + qk

        qk = qk * self.scale

        mask = torch.ones((i,j), dtype = torch.bool).triu(j-i+1)
        qk = qk.masked_fill(mask, float('-inf'))

        qk = F.softmax(qk, dim=-1)

        qk = self.dropout(qk)

        values = rearrange(values, 'b t (h d) -> b h t d', h=self.heads)
        qkv = qk@values

        ### KNN ATTENTION

        # If there are knn memories (we're not on the first segment) then perform knn attention
        if self.knn.index.ntotal > 0:
            # Convert queries to search form
            queries = rearrange(queries, 'b h t d -> b t (h d)')
            mem_kv = self.knn.search(queries, topk = self.topk_retrieved_memories) # returns b t k 2 d
            mem_k, mem_v = mem_kv.unbind(dim = -2)
            mem_k = rearrange(mem_k, 'b t k (h d) -> b h t k d', h=self.heads)
            mem_v = rearrange(mem_v, 'b t k (h d) -> b h t k d', h=self.heads)

            # Convert queries to attention form
            queries = rearrange(queries, 'b t (h d) -> b h t d', h = self.heads)
            mem_qk = einsum(queries, mem_k, 'b h t d, b h t k d -> b h t k')
            mem_qk = mem_qk * self.scale

            mem_qk = F.softmax(mem_qk, dim=-1)
            mem_qk = self.dropout(mem_qk)
            mem_qkv = einsum(mem_qk, mem_v, 'b h t k, b h t k d -> b h t d')

            # Combined attentions

            combined_qkv = mem_qkv * self.gate_bias + qkv * (1 - self.gate_bias)
            combined_qkv = rearrange(combined_qkv, 'b h t d -> b t (h d)')
            out = self.output_matrix(combined_qkv)

        else:
            qkv = rearrange(qkv, 'b h t d -> b t (h d)')
            out = self.output_matrix(qkv)

        # New XL memories
        keys = rearrange(keys, 'b h t d -> b t (h d)', h = self.heads)
        values = rearrange(values, 'b h t d -> b t (h d)', h=self.heads)
        kv_memories = torch.stack((keys, values), dim=-2) # (batch, sequence_len, 2, dimension)

        if xl_memory is not None:
            # if we're on a middle/end segment of a document (there are previous XL memories)
            xl_memories, current_kv = kv_memories[:, :-xl_sequence_length], kv_memories[:, -xl_sequence_length:]
        else:
            # if we're at the first segment
            current_kv = kv_memories

        self.knn.add(current_kv)

        return out, current_kv


class Block(nn.Module):
    def __init__(self, embedding_dimension, attention_type, dropout=0.):
        super().__init__()
        self.attention = attention_type
        self.dim = embedding_dimension

        self.ff_block = nn.Sequential(
            nn.LayerNorm(self.dim),
            nn.Linear(self.dim, self.dim * 4),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(self.dim * 4, self.dim))

    def forward(self, x, xl_memories, rel_pos):
        residual = x
        norm = nn.LayerNorm(self.dim)
        attn_out = norm(x)
        attn_out, new_xl_memories = self.attention(attn_out, relative_positions=rel_pos, xl_memory=xl_memories)
        attn_out += residual

        residual = attn_out
        ff_out = self.ff_block(attn_out)
        ff_out += residual
        return ff_out, new_xl_memories


class MemorizingTransformer(nn.Module):
    def __init__(
        self,
        embedding_dimension,
        vocab_size,
        max_knn_memories = 81920,
        heads = 8,
        depth = 10,
        dropout = 0,
        head_dimension = 64,
        topk = 5,

    ):
        super().__init__()
        self.heads = heads
        self.embedding_dimension = embedding_dimension
        self.dropout = dropout
        self.depth = depth
        self.head_dimension = head_dimension
        self.max_knn_memories = max_knn_memories
        self.topk = topk

        ###########
        self.rel_pos = RelativePosition(rp_scale = head_dimension** 0.5,
                                        heads = self.heads)
        self.rel_pos_knn = RelativePosition(rp_scale = head_dimension** 0.5,
                                        heads = self.heads)
        self.embedding_matrix = nn.Embedding(vocab_size, self.embedding_dimension)

        self.knn = KNN(head_dimension * heads, self.max_knn_memories)



        self.layers = nn.ModuleList([])
        for i in range(self.depth):

            if i == self.depth-2:
                attention_type = KNNAttention(self.embedding_dimension,
                            self.knn,
                            heads = self.heads,
                            head_dimension = self.head_dimension,
                            dropout = self.dropout)
            else:
                attention_type = XLAttention(self.embedding_dimension,
                            heads = self.heads,
                            head_dimension = self.head_dimension,
                            dropout = self.dropout)

            self.layers.append(Block(self.embedding_dimension, attention_type))

        self.to_logits = nn.Sequential(
            nn.LayerNorm(self.embedding_dimension),
            nn.Linear(self.embedding_dimension, vocab_size)
        )


    def forward(
        self,
        x,
        relative_positions = None,
        xl_memories = None,
        labels = None,
    ):

        batch_size, sequence_length = x.shape[0], x.shape[1]

        # Position values
        rel_pos = self.rel_pos(sequence_length)
        rel_pos_knn = self.rel_pos_knn(sequence_length)

        # If no XL memories (start of a sequence) then None type for each layer.
        # There is one set of XL memories for each layer
        # xl_memories = default(xl_memories, (None,) * self.num_xl_memory_layers)
        if xl_memories is not None:
            xl_memories = xl_memories
        else:
            xl_memories = (None,) * self.depth

        # Iterator
        xl_memories_iter = iter(xl_memories)

        # Embeddings
        x = self.embedding_matrix(x)

        # Store the XL memories for each pass
        new_xl_memories = []

        for ind, block in enumerate(self.layers):

            if i == self.depth-2:
                layer_rel_pos = rel_pos_knn
            else:
                layer_rel_pos = rel_pos

            x, xl_mem = block(x, next(xl_memories_iter), layer_rel_pos)

            if xl_mem is not None:
                ############
                new_xl_memories.append(xl_mem.detach())



        logits = self.to_logits(x)

        # Training
        loss = F.cross_entropy(rearrange(logits, 'b n c -> b c n'), labels)
        if len(new_xl_memories) > 0:
            return loss, new_xl_memories
        return loss

# Training loop

Your code outlines a training loop for a custom transformer model, `MemorizingTransformer`, with enhancements for memory management using both KNN-based memory and XL memory mechanisms. Here's a breakdown of key aspects and some points to consider:

### Key Elements of Your Training Loop:

1. **Model Setup**:
   - The model is initialized with `embedding_dimension = 128` and `vocab_size = 128`, implying a relatively small embedding space. The `max_knn_memories` parameter limits the amount of memory retained by the KNN mechanism.
   - The optimizer used is Adam with a learning rate defined by `LEARNING_RATE`.

2. **Training Loop**:
   - The outer loop runs for 200 iterations, and the training loss is calculated for each batch. You clear both the XL and KNN memories at the beginning of each iteration, allowing the model to learn afresh.
   - **Memory Clearing**: The use of `xl_memories = None` and `model.knn.clear()` ensures that no old memory persists between training iterations, which can help avoid excessive memory usage and stale context.
   - **Sequence Segmentation**: The input sequences (`seq`) and labels (`labels`) are chunked into smaller segments for training. This is beneficial when working with long sequences, as it reduces memory overhead and allows the model to focus on smaller portions of the sequence at a time.
   
3. **Loss Calculation**:
   - The loss is computed for each segment and averaged. The backward pass is done for each segment, and gradients are clipped using `torch.nn.utils.clip_grad_norm_` to avoid gradient explosion.

4. **Validation**:
   - Every few steps (`VALIDATE_EVERY`), the model switches to evaluation mode (`model.eval()`). It then processes validation data in a similar manner as the training loop but without gradient computation (`torch.no_grad()`).

### Potential Improvements or Considerations:

- **Memory Management**: If you're using external memory such as KNN or XL memories, the model's memory management should be carefully handled. This means ensuring that memories are cleared only when appropriate, and potentially saving key states between epochs or segments if needed for consistency across batches.
- **Efficient Data Loading**: If your data is large, consider optimizing the way data is loaded using `DataLoader` or parallel processing to ensure you're not bottlenecked by disk I/O.
- **Learning Rate and Grad Clipping**: The learning rate and gradient clipping threshold (`MAX_GRAD_CLIP_NORM`) should be tuned to the specific problem to prevent overfitting or slow convergence.

This setup appears to be designed for handling sequences with long-range dependencies, particularly where standard attention mechanisms may struggle due to memory constraints.

In [None]:
SEGMENTS = 10
SEQUENCE_LENGTH = 512
CHUNK_SIZE = (SEGMENTS * SEQUENCE_LENGTH) + 1 #### we need +1
BATCH_SIZE = 8
LEARNING_RATE = 2e-4
MAX_GRAD_CLIP_NORM = 0.5
VALIDATE_EVERY = 100
MAX_KNN_MEMORIES = BATCH_SIZE * 1 * SEQUENCE_LENGTH * SEGMENTS


dataset = datasets.load_dataset("ccdv/arxiv-summarization", split='train', streaming=True)
raw_dataset = list(dataset.take(3500))

raw_articles = [x['article'] for x in raw_dataset]
raw_articles = [x for x in raw_articles if len(x) > CHUNK_SIZE]
converted = [np.fromstring(doc, dtype=np.uint8) for doc in raw_articles]

def clip_article(doc, chunk_size):
    remainder = len(doc) % chunk_size
    return doc[:-remainder]

clipped = [clip_article(doc, CHUNK_SIZE) for doc in converted]
chunked = np.array([doc.reshape(-1, CHUNK_SIZE) for doc in clipped])
processed_data = torch.tensor(np.concatenate(chunked), dtype=torch.long)
processed_data.shape

eighty_split = int(processed_data.shape[0] * .8)
ninety_split = int(processed_data.shape[0] * .9)
train_loader = iter(DataLoader(processed_data[:eighty_split], batch_size = BATCH_SIZE, shuffle = True))
val_loader = iter(DataLoader(processed_data[eighty_split:ninety_split], batch_size = BATCH_SIZE, shuffle = True))
test_loader = iter(DataLoader(processed_data[ninety_split:], batch_size = BATCH_SIZE, shuffle = True))

Downloading builder script:   0%|          | 0.00/5.14k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/2.83k [00:00<?, ?B/s]

  converted = [np.fromstring(doc, dtype=np.uint8) for doc in raw_articles]
  chunked = np.array([doc.reshape(-1, CHUNK_SIZE) for doc in clipped])


In [None]:
model = MemorizingTransformer(embedding_dimension = 128,
                              vocab_size = 128,
                              max_knn_memories = MAX_KNN_MEMORIES)

optim = torch.optim.Adam(model.parameters(), lr = LEARNING_RATE)
model.train()


for i in tqdm.tqdm(range(200), mininterval = 10., desc = 'training'):

    model.train()
    train_loss = 0.
    #########
    # Clear XL memories
    xl_memories = None
    #########
    # Clear KNN memory
    model.knn.clear()

    data = next(train_loader)
    seq, labels = data[:, :-1], data[:, 1:]


    # Each pass will be (BATCH_SIZE * SEGMENTS) iterations
    for seq_segment, labels_segment in zip(seq.chunk(SEGMENTS, dim = -1), labels.chunk(SEGMENTS, dim = -1)):

        loss, xl_memories = model(
            seq_segment,
            labels = labels_segment,
            xl_memories = xl_memories
        )

        train_loss += loss.item() / SEGMENTS
        (loss / SEGMENTS).backward()
        print ("segment complete")


    print(f'training loss: {train_loss}')
    torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_CLIP_NORM)
    optim.step()
    optim.zero_grad()


    if not (i % VALIDATE_EVERY):
        model.eval()

        valid_data = next(val_loader)
        valid_loss = 0.

        with torch.no_grad():
            xl_memories = None
            model.knn.clear()
            seq, labels = data[:, :-1], data[:, 1:]

            for seq_segment, labels_segment in zip(seq.chunk(SEGMENTS, dim = -1), labels.chunk(SEGMENTS, dim = -1)):

                loss, xl_memories = model(
                    seq_segment,
                    labels = labels_segment,
                    xl_memories = xl_memories
                )

                valid_loss += loss.item() / SEGMENTS

        print(f'valid loss: {valid_loss}')

training:   0%|          | 0/200 [00:00<?, ?it/s]

segment complete
segment complete
segment complete
segment complete
segment complete
segment complete
segment complete
segment complete
segment complete
segment complete
training loss: 4.96113452911377


training:   0%|          | 1/200 [02:15<7:31:01, 135.99s/it]

valid loss: 4.396467971801758
segment complete
segment complete
segment complete
segment complete
segment complete
segment complete
segment complete
segment complete
segment complete


training:   1%|          | 2/200 [03:45<5:59:15, 108.87s/it]

segment complete
training loss: 4.425439643859864
segment complete
segment complete
segment complete
segment complete
segment complete
segment complete
segment complete
segment complete
segment complete


training:   1%|          | 2/200 [05:13<8:37:28, 156.81s/it]


KeyboardInterrupt: ignored

# End to End GPU

In [None]:
import torch
from torch import nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
import numpy as np
import math
import os
import random
import tqdm
import gzip
import time

!pip install einops
from einops import rearrange, repeat, pack, unpack, einsum
from einops.layers.torch import Rearrange


from functools import partial, wraps
from contextlib import contextmanager, ExitStack
from pathlib import Path
from filelock import FileLock
import pickle

import transformers
from transformers import AutoTokenizer

!pip install faiss-gpu
import faiss

!pip install datasets
import datasets

Collecting einops
  Downloading einops-0.7.0-py3-none-any.whl (44 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/44.6 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.6/44.6 kB[0m [31m1.1 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: einops
Successfully installed einops-0.7.0
Collecting faiss-gpu
  Downloading faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (85.5 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m85.5/85.5 MB[0m [31m18.7 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: faiss-gpu
Successfully installed faiss-gpu-1.7.2
Collecting datasets
  Downloading datasets-2.15.0-py3-none-any.whl (521 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m521.2/521.2 kB[0m [31m9.0 MB/s[0m eta [36m0:00:00[0m
Collecting pyarrow-hotfix (from datasets)
  Downloading pyarrow_hotfix-0.6-py3-none-any.whl

In [None]:
# can check our GPU

In [None]:
!nvidia-smi

Thu Dec 21 17:48:33 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   50C    P8              12W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

The code provided outlines a large and complex deep learning model built using PyTorch. The model seems to be a variant of a transformer model with memory and attention mechanisms, specifically integrating a KNN-based (k-nearest neighbors) attention mechanism. Here's a breakdown of the key components:

### 1. **Relative Positioning** (`RelativePosition` class):
   - This class calculates relative positions of sequence elements in the input data. The relative position embeddings help the model learn dependencies between elements at various distances.
   - It utilizes a technique that buckets positions into a predefined number of ranges (`num_buckets`) to handle distances more effectively.
   - This is used in the main attention mechanism to improve performance over simple absolute position embeddings.

### 2. **KNN Memory** (`KNN` class):
   - This component provides a memory mechanism that stores and retrieves embeddings (or representations) of sequences to enhance performance.
   - It uses FAISS, a library for efficient similarity search, to perform nearest neighbor search and add new data to the memory (a form of external memory augmentation for transformers).

### 3. **Attention Mechanisms**:
   - The **XLAttention** class implements a standard transformer-style attention mechanism with the addition of extended memory (XL memory) for longer sequences.
   - The **KNNAttention** class adds the KNN-based attention mechanism that allows the model to retrieve relevant past memories during the attention calculation, improving the model’s ability to handle longer contexts and provide more relevant attention.

### 4. **Transformer Block** (`Block` class):
   - Each block contains a layer of attention followed by a feed-forward network. The attention type in each block could either be the standard XLAttention or the KNNAttention, depending on the depth of the block.
   - The `forward` method computes both the attention output and the updated memory.

### 5. **Main Model** (`MemorizingTransformer` class):
   - The `MemorizingTransformer` class is the overall transformer model that integrates these components.
   - It defines multiple layers of attention blocks (either XL or KNN-based) and manages the storage and retrieval of XL memory across segments.
   - The model is designed to handle long sequences and includes an optimized method for computing attention over long-term memory using KNN retrieval.
   - The final output is passed through a linear layer to produce logits, and a cross-entropy loss is computed for training.

### 6. **Data Preparation**:
   - The code uses the `ccdv/arxiv-summarization` dataset to train the model. It splits the dataset into chunks, ensuring that each chunk size matches the model's required input size (CHUNK_SIZE).
   - The data is then processed into batches, ready to be fed into the model for training.

### 7. **Training Setup**:
   - The model is set to be trained with a learning rate of `2e-4` and gradient clipping at a norm of `0.5`.
   - Training, validation, and test data loaders are created from the processed dataset.

### High-Level Summary:
This model is a custom transformer with both XL memory and KNN-based attention, designed to efficiently handle long sequences of text by leveraging external memory and relevant context from previous inputs. It processes large chunks of text data, integrates relative position embeddings, and dynamically retrieves memory during attention calculations. The training setup uses typical cross-entropy loss and data batching techniques suitable for text generation or summarization tasks.

In [None]:
class RelativePosition(nn.Module):
  def __init__(
      self,
      rp_scale,
      num_buckets = 32,
      rp_max_distance = 128,
      heads = 8
  ):
      super().__init__()
      self.scale = rp_scale
      self.num_buckets = num_buckets
      self.rp_max_distance = rp_max_distance
      self.relative_attention_embedding = nn.Embedding(num_buckets, heads)

  def relative_position_bucket(self, relative_position_matrix):
      n = -relative_position_matrix
      n = torch.max(n, torch.zeros_like(n))

      max_exact = self.num_buckets // 2

      is_small = n < max_exact
      val_if_large = max_exact + (torch.log(n.float() / max_exact) / math.log(self.rp_max_distance / max_exact) * (self.num_buckets - max_exact)).long()
      val_if_large = torch.min(val_if_large, torch.full_like(val_if_large, self.num_buckets - 1))

      return torch.where(is_small, n, val_if_large)

  def forward(self, sequence_length, device):

      sequence_pos = torch.arange(sequence_length, dtype=torch.long, device=device) ##########
      context_pos = torch.arange(2 * sequence_length, dtype=torch.long, device=device) ###########
      sequence_rel_pos = rearrange(sequence_pos, 'i -> i 1')
      context_rel_pos = rearrange(context_pos, 'j -> 1 j')
      rel_pos = context_rel_pos - sequence_rel_pos

      position_bucket_indices = self.relative_position_bucket(rel_pos)

      rp_values = self.relative_attention_embedding(position_bucket_indices)
      rp_values = rearrange(rp_values, 'i j h -> () h i j')
      return rp_values * self.scale



class KNN():
    def __init__(
        self,
        dim,
        max_memories,
        ):
        self.dim = dim
        self.max_memories = max_memories
        self.shape = (max_memories, 2, dim)
        self.db_offset = 0
        self.db_filepath = "./memory.memmap"
        self.db = np.memmap(self.db_filepath, mode = 'w+', dtype = np.float32, shape = self.shape)
        self.index = faiss.IndexFlatL2(dim)


    def add_to_db(self, new_data):
        new_data_len = new_data.shape[0]
        ids = (np.arange(new_data_len) + self.db_offset)
        self.db[ids] = new_data.detach().cpu().numpy() #######
        self.db_offset += new_data_len
        # Write to file
        self.db.flush()


    def search_and_retrieve(self, query_vecs, topk):
        query_vecs = query_vecs
        distances, indices = self.index.search(query_vecs, topk)
        kvs = self.db[indices]
        return kvs

    def add(self, new_data):
        # Input is b n 2 d, flatten to (b n) 2 d
        new_data = new_data.flatten(0,1)
        # Add to db
        self.add_to_db(new_data)
        # Only keys are used in knn index
        keys, vals = new_data.unbind(dim=-2)
        keys = keys.detach().cpu().numpy() ######
        # Add (b n) d tensors to index
        keys = np.ascontiguousarray(keys)
        # Add to index
        self.index.add(keys)

    def search(self, query_vecs, topk):
        query_batch_size, query_seq_len = query_vecs.shape[0], query_vecs.shape[1]
        device = query_vecs.device ######
        # Input is b n d, flatten to (b n) d
        query_vecs = query_vecs.flatten(0,1)
        kvs = self.search_and_retrieve(np.ascontiguousarray(query_vecs.detach().cpu().numpy()), topk) ######
        # kvs are (b n) k 2 d, unflatten to b n k 2 d
        kvs = torch.tensor(kvs)
        kvs = torch.unflatten(kvs, 0, (query_batch_size, query_seq_len))
        return kvs.to(device)

    def clear(self):
        self.index.reset()
        self.db[:] = 0
        self.db_offset = 0


class XLAttention(nn.Module):
    def __init__(
        self,
        embedding_dimension,
        heads = 8,
        head_dimension = 64,
        dropout = 0.,
    ):
        super().__init__()
        self.heads = heads
        self.dropout = nn.Dropout(dropout)
        self.scale = head_dimension ** -0.5

        self.query_matrix = nn.Linear(embedding_dimension, self.heads * head_dimension)
        self.key_matrix = nn.Linear(embedding_dimension, self.heads * head_dimension)
        self.value_matrix = nn.Linear(embedding_dimension, self.heads * head_dimension)
        self.output_matrix = nn.Linear(self.heads * head_dimension, embedding_dimension)

    def forward(
        self,
        x, # batch_size, sequence_length, embedding_dimension
        relative_positions = None,
        xl_memory = None
    ):

        device = x.device ##########
        queries = self.query_matrix(x)
        keys = self.key_matrix(x)
        values = self.value_matrix(x)

        queries = queries * self.scale

        if xl_memory is not None:
            k_xl, v_xl = xl_memory.unbind(dim = -2) # assume stacked
            keys = torch.cat((k_xl, keys), dim = -2) # prepend XL memory
            values = torch.cat((v_xl, values), dim = -2) # prepend XL memory
            xl_sequence_length = k_xl.shape[1]

        queries = rearrange(queries, 'b t (h d) -> b h t d', h = self.heads)
        keys    = rearrange(keys, 'b t (h d) -> b h t d', h = self.heads)
        qk      = einsum(queries, keys, 'b h i d, b h j d -> b h i j')

        i, j = qk.shape[-2:]
        if relative_positions is not None:
            qk = relative_positions[..., -i:, -j:] + qk

        qk = qk * self.scale

        mask = torch.ones((i,j), dtype = torch.bool, device=device).triu(j-i+1) ########
        qk = qk.masked_fill(mask, float('-inf'))

        qk = F.softmax(qk, dim=-1)
        qk = self.dropout(qk)

        values = rearrange(values, 'b t (h d) -> b h t d', h=self.heads)
        qkv = qk@values
        qkv = rearrange(qkv, 'b h t d -> b t (h d)')

        out = self.output_matrix(qkv)

        # new XL memories

        keys = rearrange(keys, 'b h t d -> b t (h d)', h = self.heads)
        values = rearrange(values, 'b h t d -> b t (h d)', h=self.heads)
        kv_memories = torch.stack((keys, values), dim=-2) # (batch, sequence_len, 2, dimension)


        if xl_memory is not None:
            xl_memories, current_input = kv_memories[:, :-xl_sequence_length], kv_memories[:, -xl_sequence_length:]
            kv_to_add_xl = current_input
        else:
            kv_to_add_xl = kv_memories

        return out, kv_to_add_xl



class KNNAttention(nn.Module):
    def __init__(
        self,
        embedding_dimension,
        knn,
        heads = 8,
        head_dimension = 64,
        topk_retrieved_memories = 3,
        dropout = 0.
    ):
        super().__init__()
        self.heads = heads
        self.scale = head_dimension ** -0.5
        self.dropout = nn.Dropout(dropout)

        self.query_matrix = nn.Linear(embedding_dimension, heads * head_dimension)
        self.key_matrix = nn.Linear(embedding_dimension, heads * head_dimension)
        self.value_matrix = nn.Linear(embedding_dimension, heads * head_dimension)
        self.output_matrix = nn.Linear(heads * head_dimension, embedding_dimension)

        self.gate_bias = nn.Parameter(torch.randn(self.heads, 1, 1))
        self.topk_retrieved_memories = topk_retrieved_memories
        self.knn = knn

    def forward(
        self,
        x, # batch_size, sequence_length, embedding_dimension
        relative_positions = None,
        xl_memory = None
    ):

        device = x.device ########
        batch_size, sequence_length = x.shape[:2]
        queries = self.query_matrix(x)
        keys = self.key_matrix(x)
        values = self.value_matrix(x)

        queries = F.normalize(queries, dim=-1)
        keys = F.normalize(keys, dim=-1)

        if xl_memory is not None:
            k_xl, v_xl = xl_memory.unbind(dim = -2) # unstack
            keys = torch.cat((k_xl, keys), dim = -2) # prepend XL memory
            values = torch.cat((v_xl, values), dim = -2) # prepend XL memory
            xl_sequence_length = k_xl.shape[1]

        ### LOCAL ATTENTION

        queries = rearrange(queries, 'b t (h d) -> b h t d', h = self.heads)
        keys    = rearrange(keys, 'b t (h d) -> b h t d', h = self.heads)
        qk      = einsum(queries, keys, 'b h i d, b h j d -> b h i j')

        i, j = qk.shape[-2:]
        if relative_positions is not None:
            qk = relative_positions[..., -i:, -j:] + qk

        qk = qk * self.scale

        mask = torch.ones((i,j), dtype = torch.bool, device=device).triu(j-i+1) ########
        qk = qk.masked_fill(mask, float('-inf'))

        qk = F.softmax(qk, dim=-1)

        qk = self.dropout(qk)

        values = rearrange(values, 'b t (h d) -> b h t d', h=self.heads)
        qkv = qk@values

        ### KNN ATTENTION

        # If there are knn memories (we're not on the first segment) then perform knn attention
        if self.knn.index.ntotal > 0:
            t1 = time.time()
            print ("Begin KNN operations")
            # Convert queries to search form
            queries = rearrange(queries, 'b h t d -> b t (h d)')
            mem_kv = self.knn.search(queries, topk = self.topk_retrieved_memories) # returns b t k 2 d
            mem_k, mem_v = mem_kv.unbind(dim = -2)
            mem_k = rearrange(mem_k, 'b t k (h d) -> b h t k d', h=self.heads)
            mem_v = rearrange(mem_v, 'b t k (h d) -> b h t k d', h=self.heads)

            # Convert queries to attention form
            queries = rearrange(queries, 'b t (h d) -> b h t d', h = self.heads)
            mem_qk = einsum(queries, mem_k, 'b h t d, b h t k d -> b h t k')
            mem_qk = mem_qk * self.scale

            mem_qk = F.softmax(mem_qk, dim=-1)
            mem_qk = self.dropout(mem_qk)
            mem_qkv = einsum(mem_qk, mem_v, 'b h t k, b h t k d -> b h t d')

            # Combined attentions

            combined_qkv = mem_qkv * self.gate_bias + qkv * (1 - self.gate_bias)
            combined_qkv = rearrange(combined_qkv, 'b h t d -> b t (h d)')
            out = self.output_matrix(combined_qkv)
            t2 = time.time()
            print ("End KNN operations, time taken:", t2-t1)

        else:
            qkv = rearrange(qkv, 'b h t d -> b t (h d)')
            out = self.output_matrix(qkv)

        # New XL memories
        keys = rearrange(keys, 'b h t d -> b t (h d)', h = self.heads)
        values = rearrange(values, 'b h t d -> b t (h d)', h=self.heads)
        kv_memories = torch.stack((keys, values), dim=-2) # (batch, sequence_len, 2, dimension)

        if xl_memory is not None:
            # if we're on a middle/end segment of a document (there are previous XL memories)
            xl_memories, current_kv = kv_memories[:, :-xl_sequence_length], kv_memories[:, -xl_sequence_length:]
        else:
            # if we're at the first segment
            current_kv = kv_memories

        self.knn.add(current_kv)

        return out, current_kv


class Block(nn.Module):
    def __init__(self, embedding_dimension, attention_type, dropout=0.):
        super().__init__()
        self.attention = attention_type
        self.dim = embedding_dimension
        self.norm = nn.LayerNorm(self.dim)

        self.ff_block = nn.Sequential(
            nn.LayerNorm(self.dim),
            nn.Linear(self.dim, self.dim * 4),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(self.dim * 4, self.dim))

    def forward(self, x, xl_memories, rel_pos):
        residual = x
        #norm = nn.LayerNorm(self.dim) #########
        attn_out = self.norm(x)
        attn_out, new_xl_memories = self.attention(attn_out, relative_positions=rel_pos, xl_memory=xl_memories)
        attn_out += residual

        residual = attn_out
        ff_out = self.ff_block(attn_out)
        ff_out += residual
        return ff_out, new_xl_memories


class MemorizingTransformer(nn.Module):
    def __init__(
        self,
        embedding_dimension,
        vocab_size,
        max_knn_memories = 81920,
        heads = 8,
        depth = 10,
        dropout = 0,
        head_dimension = 64,
        topk = 5,

    ):
        super().__init__()
        self.heads = heads
        self.embedding_dimension = embedding_dimension
        self.dropout = dropout
        self.depth = depth
        self.head_dimension = head_dimension
        self.max_knn_memories = max_knn_memories
        self.topk = topk

        ###########
        self.rel_pos = RelativePosition(rp_scale = head_dimension** 0.5,
                                        heads = self.heads)
        self.rel_pos_knn = RelativePosition(rp_scale = head_dimension** 0.5,
                                        heads = self.heads)
        self.embedding_matrix = nn.Embedding(vocab_size, self.embedding_dimension)

        self.knn = KNN(head_dimension * heads, self.max_knn_memories)



        self.layers = nn.ModuleList([])
        for i in range(self.depth):

            if i == self.depth-2:
                attention_type = KNNAttention(self.embedding_dimension,
                            self.knn,
                            heads = self.heads,
                            head_dimension = self.head_dimension,
                            dropout = self.dropout)
            else:
                attention_type = XLAttention(self.embedding_dimension,
                            heads = self.heads,
                            head_dimension = self.head_dimension,
                            dropout = self.dropout)

            self.layers.append(Block(self.embedding_dimension, attention_type))

        self.to_logits = nn.Sequential(
            nn.LayerNorm(self.embedding_dimension),
            nn.Linear(self.embedding_dimension, vocab_size)
        )


    def forward(
        self,
        x,
        relative_positions = None,
        xl_memories = None,
        labels = None,
    ):

        device = x.device ########
        batch_size, sequence_length = x.shape[0], x.shape[1]

        # Position values
        rel_pos = self.rel_pos(sequence_length, device=device) ########
        rel_pos_knn = self.rel_pos_knn(sequence_length, device=device) ########

        # If no XL memories (start of a sequence) then None type for each layer.
        # There is one set of XL memories for each layer
        # xl_memories = default(xl_memories, (None,) * self.num_xl_memory_layers)
        if xl_memories is not None:
            xl_memories = xl_memories
        else:
            xl_memories = (None,) * self.depth

        # Iterator
        xl_memories_iter = iter(xl_memories)

        # Embeddings
        x = self.embedding_matrix(x)

        # Store the XL memories for each pass
        new_xl_memories = []

        for ind, block in enumerate(self.layers):

            if i == self.depth-2:
                layer_rel_pos = rel_pos_knn
            else:
                layer_rel_pos = rel_pos

            x, xl_mem = block(x, next(xl_memories_iter), layer_rel_pos)

            if xl_mem is not None:
                ############
                new_xl_memories.append(xl_mem.detach())



        logits = self.to_logits(x)

        # Training
        loss = F.cross_entropy(rearrange(logits, 'b n c -> b c n'), labels)
        if len(new_xl_memories) > 0:
            return loss, new_xl_memories
        return loss

The code you've shared preprocesses the **Arxiv Summarization Dataset** for use in a deep learning model. Here's a breakdown of the key steps involved in the preprocessing:

1. **Dataset Loading**:
   - The dataset `ccdv/arxiv-summarization` is loaded using `datasets.load_dataset()`, specifically the 'train' split, which is streamed to avoid loading the entire dataset into memory at once.
   - The `take(3500)` function fetches the first 3500 articles from the dataset.

2. **Filtering Articles**:
   - The articles are filtered to ensure that they are large enough to be split into chunks. This is done by checking if their length exceeds the `CHUNK_SIZE`.
   - The articles are then converted to a `numpy` array of `uint8` values.

3. **Chunking**:
   - The function `clip_article()` ensures that each article’s length is a multiple of `CHUNK_SIZE`, trimming the last part if needed.
   - After clipping, the articles are reshaped into chunks, each of size `CHUNK_SIZE` (which is determined by multiplying the number of segments by the sequence length).
   - This results in a 2D array of shape `(num_articles, num_chunks)`.

4. **Data Preparation**:
   - The articles are concatenated into a single tensor, `processed_data`, which is then split into training, validation, and test sets.
   - The dataset is split in an 80-10-10 ratio (train, validation, test), with each split fed into separate `DataLoader` instances for batching.

5. **Hyperparameters**:
   - `BATCH_SIZE` is set to 8, meaning 8 chunks of data will be processed at a time.
   - `SEQUENCE_LENGTH` and `SEGMENTS` define the size of each chunk, with the total `CHUNK_SIZE` calculated as `(SEGMENTS * SEQUENCE_LENGTH) + 1`.
   - The code also specifies other hyperparameters like `LEARNING_RATE`, `MAX_GRAD_CLIP_NORM`, and the maximum number of KNN memories (`MAX_KNN_MEMORIES`).

This data preparation process is essential for training the transformer model efficiently, particularly when dealing with large datasets such as the Arxiv Summarization Dataset. By chunking articles and splitting the dataset into manageable batches, the model can be trained on sequences of meaningful length without overwhelming memory limitations.

In [None]:
SEGMENTS = 10
SEQUENCE_LENGTH = 512
CHUNK_SIZE = (SEGMENTS * SEQUENCE_LENGTH) + 1
BATCH_SIZE = 8
LEARNING_RATE = 2e-4
MAX_GRAD_CLIP_NORM = 0.5
VALIDATE_EVERY = 100
MAX_KNN_MEMORIES = BATCH_SIZE * 1 * SEQUENCE_LENGTH * SEGMENTS


dataset = datasets.load_dataset("ccdv/arxiv-summarization", split='train', streaming=True)
raw_dataset = list(dataset.take(3500))

raw_articles = [x['article'] for x in raw_dataset]
raw_articles = [x for x in raw_articles if len(x) > CHUNK_SIZE]
converted = [np.fromstring(doc, dtype=np.uint8) for doc in raw_articles]

def clip_article(doc, chunk_size):
    remainder = len(doc) % chunk_size
    return doc[:-remainder]

clipped = [clip_article(doc, CHUNK_SIZE) for doc in converted]


chunked = np.array([doc.reshape(-1, CHUNK_SIZE) for doc in clipped])

processed_data = torch.tensor(np.concatenate(chunked), dtype=torch.long)
processed_data.shape
eighty_split = int(processed_data.shape[0] * .8)
ninety_split = int(processed_data.shape[0] * .9)
train_loader = iter(DataLoader(processed_data[:eighty_split], batch_size = BATCH_SIZE, shuffle = True))
val_loader = iter(DataLoader(processed_data[eighty_split:ninety_split], batch_size = BATCH_SIZE, shuffle = True))
test_loader = iter(DataLoader(processed_data[ninety_split:], batch_size = BATCH_SIZE, shuffle = True))

Downloading builder script:   0%|          | 0.00/5.14k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/2.83k [00:00<?, ?B/s]

  converted = [np.fromstring(doc, dtype=np.uint8) for doc in raw_articles]
  chunked = np.array([doc.reshape(-1, CHUNK_SIZE) for doc in clipped])


In the provided code, the **training loop** for the **Memorizing Transformer** model is implemented using the `torch` library. Here's a breakdown of what's happening in each part:

### Key Steps in the Training Loop:

1. **Model Initialization**:
   - The model is instantiated with `MemorizingTransformer`, using an `embedding_dimension` of 128 and `vocab_size` of 128. This means the transformer operates in an embedding space of 128-dimensional vectors, and its vocabulary size (number of unique tokens) is also 128.
   - `MAX_KNN_MEMORIES` is passed as a hyperparameter for managing the KNN memory size in the model.

2. **Optimizer Setup**:
   - The Adam optimizer is used to optimize the model’s parameters with a learning rate of `LEARNING_RATE` (set to `2e-4`).
   - The optimizer step is executed after each training iteration.

3. **Training Loop**:
   - The outer loop runs for 200 iterations (`range(200)`). During each iteration:
     - **Training Mode**: The model is set to training mode with `model.train()`.
     - The training loss is initialized and stored in `train_loss`. The KNN memory is also cleared at the start of each iteration (`model.knn.clear()`).
   
4. **Processing a Batch**:
   - A batch of data is fetched from the training data loader and moved to the appropriate device (e.g., GPU) using `data.to(device)`.
   - The batch is split into `seq` (input sequences) and `labels` (target sequences) where `seq` is everything except the last token, and `labels` is everything except the first token (for next-token prediction).

5. **Chunking and Model Forward Pass**:
   - The sequences (`seq`) and labels (`labels`) are split into smaller chunks using `.chunk(SEGMENTS, dim=-1)`. Each chunk represents one segment of the input sequence, where `SEGMENTS` defines how many chunks will be processed per document.
   - In each chunk, the model computes the loss, and `backward()` is called to accumulate gradients.
   - Gradients are clipped with `torch.nn.utils.clip_grad_norm_()` to avoid gradient explosion, a common issue with deep learning models like transformers.

6. **Validation**:
   - Every `VALIDATE_EVERY` iterations, the model enters evaluation mode (`model.eval()`), and validation data is used to compute the validation loss.
   - Validation loss is computed similarly to the training loss, but with `torch.no_grad()` to avoid computing gradients (since no parameter updates are needed during validation).
   - After processing the validation batch, the validation loss is printed out.

### Things to Note:
- **KNN Memory Management**:
   - The model makes use of KNN memory, where `model.knn.clear()` clears the memory between training iterations. This likely helps the model in retrieving past memories to influence future predictions, an important feature of memory-augmented neural networks.
   
- **Handling Segments**:
   - The chunking mechanism (`seq.chunk(SEGMENTS, dim=-1)`) splits sequences into smaller segments. This allows the model to handle long sequences efficiently without exceeding memory limits.

- **Loss Backpropagation**:
   - The loss is calculated for each segment (`loss / SEGMENTS`) and then backpropagated using `loss.backward()`. This ensures that the gradients are computed for each segment and accumulated before applying the optimizer step.

### Potential Improvements:
- **Gradient Accumulation**: If memory issues arise, gradient accumulation can be employed, where gradients are accumulated over several mini-batches before performing an optimization step.
- **Learning Rate Scheduler**: A learning rate scheduler could be added to decrease the learning rate as training progresses, often improving convergence.

This loop demonstrates an efficient and structured way of training a model with memory capabilities like KNN, using segmented sequences to ensure the model can handle large inputs. The validation step ensures that the model is evaluated periodically to prevent overfitting and monitor progress.

In [None]:
model = MemorizingTransformer(embedding_dimension = 128,
                              vocab_size = 128,
                              max_knn_memories = MAX_KNN_MEMORIES)

model.to(device) ###########

optim = torch.optim.Adam(model.parameters(), lr = LEARNING_RATE)
model.train()

for i in tqdm.tqdm(range(200), mininterval = 10., desc = 'training'):

    model.train()
    train_loss = 0.
    # Clear XL memories
    xl_memories = None
    # Clear KNN memory
    model.knn.clear()

    data = next(train_loader).to(device=device)
    seq, labels = data[:, :-1], data[:, 1:]

    t0 = time.time()
    print ("Begin document")

    # Each pass will be (BATCH_SIZE * SEGMENTS) iterations
    for seq_segment, labels_segment in zip(seq.chunk(SEGMENTS, dim = -1), labels.chunk(SEGMENTS, dim = -1)):

        loss, xl_memories = model(
            seq_segment,
            labels = labels_segment,
            xl_memories = xl_memories
        )

        train_loss += loss.item() / SEGMENTS
        (loss / SEGMENTS).backward()


    print(f'training loss: {train_loss}')
    t1 = time.time()
    print ("End document, total time:", t1 - t0)
    torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_CLIP_NORM)
    optim.step()
    optim.zero_grad()


    if not (i % VALIDATE_EVERY):
        model.eval()

        valid_data = next(val_loader)
        valid_loss = 0.

        with torch.no_grad():
            xl_memories = None
            model.knn.clear()
            seq, labels = data[:, :-1], data[:, 1:]

            for seq_segment, labels_segment in zip(seq.chunk(SEGMENTS, dim = -1), labels.chunk(SEGMENTS, dim = -1)):

                loss, xl_memories = model(
                    seq_segment,
                    labels = labels_segment,
                    xl_memories = xl_memories
                )

                valid_loss += loss.item() / SEGMENTS

        print(f'valid loss: {valid_loss}')


training:   0%|          | 0/200 [00:00<?, ?it/s]

Begin document
Begin KNN operations
End KNN operations, time taken: 0.4727809429168701
Begin KNN operations
End KNN operations, time taken: 0.7829794883728027
Begin KNN operations
End KNN operations, time taken: 0.9652132987976074
Begin KNN operations
End KNN operations, time taken: 1.1984717845916748
Begin KNN operations
End KNN operations, time taken: 1.5844461917877197
Begin KNN operations
End KNN operations, time taken: 1.8131227493286133
Begin KNN operations
End KNN operations, time taken: 1.985107183456421
Begin KNN operations
End KNN operations, time taken: 2.0774049758911133
Begin KNN operations
End KNN operations, time taken: 2.413853883743286
training loss: 5.044507646560668
End document, total time: 16.55513596534729
Begin KNN operations
End KNN operations, time taken: 0.4018399715423584
Begin KNN operations
End KNN operations, time taken: 0.6185629367828369
Begin KNN operations
End KNN operations, time taken: 0.8297295570373535
Begin KNN operations
End KNN operations, time 

training:   0%|          | 1/200 [00:30<1:40:42, 30.36s/it]

End KNN operations, time taken: 2.240504741668701
valid loss: 4.508983087539673
Begin document
Begin KNN operations
End KNN operations, time taken: 0.5132331848144531
Begin KNN operations
End KNN operations, time taken: 0.7702808380126953
Begin KNN operations
End KNN operations, time taken: 0.9941208362579346
Begin KNN operations
End KNN operations, time taken: 1.1803114414215088
Begin KNN operations
End KNN operations, time taken: 1.409938097000122
Begin KNN operations
End KNN operations, time taken: 1.6306748390197754
Begin KNN operations
End KNN operations, time taken: 1.9200448989868164
Begin KNN operations
End KNN operations, time taken: 2.0634706020355225
Begin KNN operations


training:   1%|          | 2/200 [00:45<1:09:48, 21.16s/it]

End KNN operations, time taken: 2.4543967247009277
training loss: 4.4936493873596195
End document, total time: 14.63275933265686
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47864532470703125
Begin KNN operations
End KNN operations, time taken: 0.7505688667297363
Begin KNN operations
End KNN operations, time taken: 0.9767007827758789
Begin KNN operations
End KNN operations, time taken: 1.1900599002838135
Begin KNN operations
End KNN operations, time taken: 1.459259271621704
Begin KNN operations
End KNN operations, time taken: 1.6699275970458984
Begin KNN operations
End KNN operations, time taken: 1.862091302871704
Begin KNN operations
End KNN operations, time taken: 2.0714664459228516
Begin KNN operations


training:   2%|▏         | 3/200 [00:59<59:41, 18.18s/it]  

End KNN operations, time taken: 2.3291754722595215
training loss: 4.209649324417114
End document, total time: 14.470992088317871
Begin document
Begin KNN operations
End KNN operations, time taken: 0.46877169609069824
Begin KNN operations
End KNN operations, time taken: 0.8324177265167236
Begin KNN operations
End KNN operations, time taken: 1.1357784271240234
Begin KNN operations
End KNN operations, time taken: 1.4139971733093262
Begin KNN operations
End KNN operations, time taken: 1.3958423137664795
Begin KNN operations
End KNN operations, time taken: 1.6528310775756836
Begin KNN operations
End KNN operations, time taken: 1.8521735668182373
Begin KNN operations
End KNN operations, time taken: 2.067760944366455
Begin KNN operations


training:   2%|▏         | 4/200 [01:14<55:23, 16.96s/it]

End KNN operations, time taken: 2.3974204063415527
training loss: 3.9477367401123047
End document, total time: 14.90838074684143
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4724555015563965
Begin KNN operations
End KNN operations, time taken: 0.7393865585327148
Begin KNN operations
End KNN operations, time taken: 0.9631404876708984
Begin KNN operations
End KNN operations, time taken: 1.2113425731658936
Begin KNN operations
End KNN operations, time taken: 1.4207866191864014
Begin KNN operations
End KNN operations, time taken: 1.6314060688018799
Begin KNN operations
End KNN operations, time taken: 1.8876287937164307
Begin KNN operations
End KNN operations, time taken: 2.2193658351898193
Begin KNN operations


training:   2%|▎         | 5/200 [01:29<52:33, 16.17s/it]

End KNN operations, time taken: 2.345904588699341
training loss: 3.8189241886138916
End document, total time: 14.60579514503479
Begin document
Begin KNN operations
End KNN operations, time taken: 0.46891236305236816
Begin KNN operations
End KNN operations, time taken: 0.7345902919769287
Begin KNN operations
End KNN operations, time taken: 0.9585428237915039
Begin KNN operations
End KNN operations, time taken: 1.198284387588501
Begin KNN operations
End KNN operations, time taken: 1.4171674251556396
Begin KNN operations
End KNN operations, time taken: 1.6903340816497803
Begin KNN operations
End KNN operations, time taken: 1.9220991134643555
Begin KNN operations
End KNN operations, time taken: 2.0827016830444336
Begin KNN operations


training:   3%|▎         | 6/200 [01:44<50:42, 15.68s/it]

End KNN operations, time taken: 2.3680665493011475
training loss: 3.7139760255813594
End document, total time: 14.544044494628906
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47707223892211914
Begin KNN operations
End KNN operations, time taken: 0.7348482608795166
Begin KNN operations
End KNN operations, time taken: 0.9631738662719727
Begin KNN operations
End KNN operations, time taken: 1.2477152347564697
Begin KNN operations
End KNN operations, time taken: 1.4663197994232178
Begin KNN operations
End KNN operations, time taken: 1.6496047973632812
Begin KNN operations
End KNN operations, time taken: 1.9732475280761719
Begin KNN operations
End KNN operations, time taken: 2.1018075942993164
Begin KNN operations


training:   4%|▎         | 7/200 [01:59<49:40, 15.44s/it]

End KNN operations, time taken: 2.466362953186035
training loss: 3.601336359977722
End document, total time: 14.766701698303223
Begin document
Begin KNN operations
End KNN operations, time taken: 0.49947428703308105
Begin KNN operations
End KNN operations, time taken: 0.8538920879364014
Begin KNN operations
End KNN operations, time taken: 1.0156290531158447
Begin KNN operations
End KNN operations, time taken: 1.2388160228729248
Begin KNN operations
End KNN operations, time taken: 1.427992343902588
Begin KNN operations
End KNN operations, time taken: 1.6942250728607178
Begin KNN operations
End KNN operations, time taken: 1.9053585529327393
Begin KNN operations
End KNN operations, time taken: 2.113377571105957
Begin KNN operations


training:   4%|▍         | 8/200 [02:14<49:06, 15.35s/it]

End KNN operations, time taken: 2.479506015777588
training loss: 3.59384458065033
End document, total time: 14.974068641662598
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47244787216186523
Begin KNN operations
End KNN operations, time taken: 0.7435071468353271
Begin KNN operations
End KNN operations, time taken: 0.9578852653503418
Begin KNN operations
End KNN operations, time taken: 1.2166187763214111
Begin KNN operations
End KNN operations, time taken: 1.4513208866119385
Begin KNN operations
End KNN operations, time taken: 1.6346697807312012
Begin KNN operations
End KNN operations, time taken: 1.9282689094543457
Begin KNN operations
End KNN operations, time taken: 2.1835741996765137
Begin KNN operations


training:   4%|▍         | 9/200 [02:29<48:23, 15.20s/it]

End KNN operations, time taken: 2.3962347507476807
training loss: 3.49993793964386
End document, total time: 14.691726207733154
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4682152271270752
Begin KNN operations
End KNN operations, time taken: 0.7831428050994873
Begin KNN operations
End KNN operations, time taken: 1.0058505535125732
Begin KNN operations
End KNN operations, time taken: 1.2197179794311523
Begin KNN operations
End KNN operations, time taken: 1.4976515769958496
Begin KNN operations
End KNN operations, time taken: 1.7119696140289307
Begin KNN operations
End KNN operations, time taken: 1.916504144668579
Begin KNN operations
End KNN operations, time taken: 2.1696271896362305
Begin KNN operations


training:   5%|▌         | 10/200 [02:44<47:59, 15.16s/it]

End KNN operations, time taken: 2.389899969100952
training loss: 3.5132748603820803
End document, total time: 14.860655307769775
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4852752685546875
Begin KNN operations
End KNN operations, time taken: 0.7510387897491455
Begin KNN operations
End KNN operations, time taken: 1.0050795078277588
Begin KNN operations
End KNN operations, time taken: 1.268385887145996
Begin KNN operations
End KNN operations, time taken: 1.4492740631103516
Begin KNN operations
End KNN operations, time taken: 1.6691501140594482
Begin KNN operations
End KNN operations, time taken: 1.8804457187652588
Begin KNN operations
End KNN operations, time taken: 2.1173648834228516
Begin KNN operations


training:   6%|▌         | 11/200 [02:59<47:31, 15.09s/it]

End KNN operations, time taken: 2.3890981674194336
training loss: 3.4747895240783695
End document, total time: 14.73562502861023
Begin document
Begin KNN operations
End KNN operations, time taken: 0.49285340309143066
Begin KNN operations
End KNN operations, time taken: 0.7544581890106201
Begin KNN operations
End KNN operations, time taken: 0.9725499153137207
Begin KNN operations
End KNN operations, time taken: 1.2382423877716064
Begin KNN operations
End KNN operations, time taken: 1.4407711029052734
Begin KNN operations
End KNN operations, time taken: 1.7210760116577148
Begin KNN operations
End KNN operations, time taken: 1.905010461807251
Begin KNN operations
End KNN operations, time taken: 2.190648078918457
Begin KNN operations


training:   6%|▌         | 12/200 [03:14<47:13, 15.07s/it]

End KNN operations, time taken: 2.4482409954071045
training loss: 3.386559796333313
End document, total time: 14.851515293121338
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4909632205963135
Begin KNN operations
End KNN operations, time taken: 0.7589480876922607
Begin KNN operations
End KNN operations, time taken: 0.9773187637329102
Begin KNN operations
End KNN operations, time taken: 1.2471277713775635
Begin KNN operations
End KNN operations, time taken: 1.444913387298584
Begin KNN operations
End KNN operations, time taken: 1.6832737922668457
Begin KNN operations
End KNN operations, time taken: 1.9738366603851318
Begin KNN operations
End KNN operations, time taken: 2.1448676586151123
Begin KNN operations


training:   6%|▋         | 13/200 [03:29<46:56, 15.06s/it]

End KNN operations, time taken: 2.422680616378784
training loss: 3.328277254104614
End document, total time: 14.856416463851929
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4757106304168701
Begin KNN operations
End KNN operations, time taken: 0.7565667629241943
Begin KNN operations
End KNN operations, time taken: 0.9876401424407959
Begin KNN operations
End KNN operations, time taken: 1.252009391784668
Begin KNN operations
End KNN operations, time taken: 1.5215215682983398
Begin KNN operations
End KNN operations, time taken: 1.685412883758545
Begin KNN operations
End KNN operations, time taken: 1.9056010246276855
Begin KNN operations
End KNN operations, time taken: 2.133671283721924
Begin KNN operations


training:   7%|▋         | 14/200 [03:44<46:39, 15.05s/it]

End KNN operations, time taken: 2.432290554046631
training loss: 3.343562436103821
End document, total time: 14.857432126998901
Begin document
Begin KNN operations
End KNN operations, time taken: 0.49019408226013184
Begin KNN operations
End KNN operations, time taken: 0.7760670185089111
Begin KNN operations
End KNN operations, time taken: 0.9874296188354492
Begin KNN operations
End KNN operations, time taken: 1.2164781093597412
Begin KNN operations
End KNN operations, time taken: 1.4383196830749512
Begin KNN operations
End KNN operations, time taken: 1.6609790325164795
Begin KNN operations
End KNN operations, time taken: 1.8868112564086914
Begin KNN operations
End KNN operations, time taken: 2.124955892562866
Begin KNN operations


training:   8%|▊         | 15/200 [03:59<46:15, 15.00s/it]

End KNN operations, time taken: 2.4269495010375977
training loss: 3.2937793970108027
End document, total time: 14.707275867462158
Begin document
Begin KNN operations
End KNN operations, time taken: 0.46740150451660156
Begin KNN operations
End KNN operations, time taken: 0.7640590667724609
Begin KNN operations
End KNN operations, time taken: 0.9636237621307373
Begin KNN operations
End KNN operations, time taken: 1.208648443222046
Begin KNN operations
End KNN operations, time taken: 1.4129302501678467
Begin KNN operations
End KNN operations, time taken: 1.6658267974853516
Begin KNN operations
End KNN operations, time taken: 1.932525873184204
Begin KNN operations
End KNN operations, time taken: 2.1615779399871826
Begin KNN operations


training:   8%|▊         | 16/200 [04:14<45:49, 14.94s/it]

End KNN operations, time taken: 2.3360683917999268
training loss: 3.221469950675964
End document, total time: 14.61255693435669
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48010945320129395
Begin KNN operations
End KNN operations, time taken: 0.7415997982025146
Begin KNN operations
End KNN operations, time taken: 0.9570870399475098
Begin KNN operations
End KNN operations, time taken: 1.190089225769043
Begin KNN operations
End KNN operations, time taken: 1.4016540050506592
Begin KNN operations
End KNN operations, time taken: 1.6867079734802246
Begin KNN operations
End KNN operations, time taken: 1.9373207092285156
Begin KNN operations
End KNN operations, time taken: 2.0857913494110107
Begin KNN operations


training:   8%|▊         | 17/200 [04:28<45:19, 14.86s/it]

End KNN operations, time taken: 2.316899061203003
training loss: 3.213758087158203
End document, total time: 14.491326332092285
Begin document
Begin KNN operations
End KNN operations, time taken: 0.472273588180542
Begin KNN operations
End KNN operations, time taken: 0.7450556755065918
Begin KNN operations
End KNN operations, time taken: 0.9654116630554199
Begin KNN operations
End KNN operations, time taken: 1.2047827243804932
Begin KNN operations
End KNN operations, time taken: 1.5426788330078125
Begin KNN operations
End KNN operations, time taken: 1.6558418273925781
Begin KNN operations
End KNN operations, time taken: 1.9560770988464355
Begin KNN operations
End KNN operations, time taken: 2.151282787322998
Begin KNN operations


training:   9%|▉         | 18/200 [04:43<45:10, 14.89s/it]

End KNN operations, time taken: 2.373253107070923
training loss: 3.1794835090637212
End document, total time: 14.776739120483398
Begin document
Begin KNN operations
End KNN operations, time taken: 0.49563026428222656
Begin KNN operations
End KNN operations, time taken: 0.7708992958068848
Begin KNN operations
End KNN operations, time taken: 0.9785656929016113
Begin KNN operations
End KNN operations, time taken: 1.2354960441589355
Begin KNN operations
End KNN operations, time taken: 1.427525520324707
Begin KNN operations
End KNN operations, time taken: 1.6597142219543457
Begin KNN operations
End KNN operations, time taken: 1.8884117603302002
Begin KNN operations
End KNN operations, time taken: 2.190594434738159
Begin KNN operations


training:  10%|▉         | 19/200 [04:58<44:59, 14.92s/it]

End KNN operations, time taken: 2.430086851119995
training loss: 3.130792760848999
End document, total time: 14.777361392974854
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47788143157958984
Begin KNN operations
End KNN operations, time taken: 0.752366304397583
Begin KNN operations
End KNN operations, time taken: 0.9830517768859863
Begin KNN operations
End KNN operations, time taken: 1.20573091506958
Begin KNN operations
End KNN operations, time taken: 1.439152479171753
Begin KNN operations
End KNN operations, time taken: 1.697037935256958
Begin KNN operations
End KNN operations, time taken: 2.0117909908294678
Begin KNN operations
End KNN operations, time taken: 2.13496994972229
Begin KNN operations


training:  10%|█         | 20/200 [05:13<44:45, 14.92s/it]

End KNN operations, time taken: 2.336294174194336
training loss: 3.226847219467163
End document, total time: 14.751800537109375
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4813246726989746
Begin KNN operations
End KNN operations, time taken: 0.7528209686279297
Begin KNN operations
End KNN operations, time taken: 0.9790835380554199
Begin KNN operations
End KNN operations, time taken: 1.2063846588134766
Begin KNN operations
End KNN operations, time taken: 1.5338404178619385
Begin KNN operations
End KNN operations, time taken: 1.7010774612426758
Begin KNN operations
End KNN operations, time taken: 1.9007501602172852
Begin KNN operations
End KNN operations, time taken: 2.1467225551605225
Begin KNN operations


training:  10%|█         | 21/200 [05:28<44:39, 14.97s/it]

End KNN operations, time taken: 2.5020229816436768
training loss: 3.0892097949981694
End document, total time: 14.906522035598755
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47337794303894043
Begin KNN operations
End KNN operations, time taken: 0.773470401763916
Begin KNN operations
End KNN operations, time taken: 1.0073421001434326
Begin KNN operations
End KNN operations, time taken: 1.1949923038482666
Begin KNN operations
End KNN operations, time taken: 1.4644927978515625
Begin KNN operations
End KNN operations, time taken: 1.6687071323394775
Begin KNN operations
End KNN operations, time taken: 1.9097273349761963
Begin KNN operations
End KNN operations, time taken: 2.1129660606384277
Begin KNN operations


training:  11%|█         | 22/200 [05:43<44:21, 14.95s/it]

End KNN operations, time taken: 2.427049398422241
training loss: 3.089197564125061
End document, total time: 14.724693775177002
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47465085983276367
Begin KNN operations
End KNN operations, time taken: 0.7682478427886963
Begin KNN operations
End KNN operations, time taken: 0.9956700801849365
Begin KNN operations
End KNN operations, time taken: 1.196533441543579
Begin KNN operations
End KNN operations, time taken: 1.4422450065612793
Begin KNN operations
End KNN operations, time taken: 1.6435515880584717
Begin KNN operations
End KNN operations, time taken: 1.8868541717529297
Begin KNN operations
End KNN operations, time taken: 2.2037408351898193
Begin KNN operations


training:  12%|█▏        | 23/200 [05:58<44:00, 14.92s/it]

End KNN operations, time taken: 2.3340039253234863
training loss: 3.06550407409668
End document, total time: 14.643192768096924
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47101879119873047
Begin KNN operations
End KNN operations, time taken: 0.7480041980743408
Begin KNN operations
End KNN operations, time taken: 0.9704079627990723
Begin KNN operations
End KNN operations, time taken: 1.2045562267303467
Begin KNN operations
End KNN operations, time taken: 1.4370367527008057
Begin KNN operations
End KNN operations, time taken: 1.7046914100646973
Begin KNN operations
End KNN operations, time taken: 1.9006049633026123
Begin KNN operations
End KNN operations, time taken: 2.138112783432007
Begin KNN operations


training:  12%|█▏        | 24/200 [06:13<43:41, 14.89s/it]

End KNN operations, time taken: 2.3646271228790283
training loss: 2.9498643636703488
End document, total time: 14.65877389907837
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4741785526275635
Begin KNN operations
End KNN operations, time taken: 0.7709894180297852
Begin KNN operations
End KNN operations, time taken: 1.0089998245239258
Begin KNN operations
End KNN operations, time taken: 1.2497055530548096
Begin KNN operations
End KNN operations, time taken: 1.470510482788086
Begin KNN operations
End KNN operations, time taken: 1.6337504386901855
Begin KNN operations
End KNN operations, time taken: 1.8509299755096436
Begin KNN operations
End KNN operations, time taken: 2.08579683303833
Begin KNN operations


training:  12%|█▎        | 25/200 [06:27<43:15, 14.83s/it]

End KNN operations, time taken: 2.3334097862243652
training loss: 2.9686380147933957
End document, total time: 14.508661985397339
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48314857482910156
Begin KNN operations
End KNN operations, time taken: 0.7755308151245117
Begin KNN operations
End KNN operations, time taken: 0.9779248237609863
Begin KNN operations
End KNN operations, time taken: 1.1929638385772705
Begin KNN operations
End KNN operations, time taken: 1.4462125301361084
Begin KNN operations
End KNN operations, time taken: 1.6658692359924316
Begin KNN operations
End KNN operations, time taken: 1.8584511280059814
Begin KNN operations
End KNN operations, time taken: 2.168804883956909
Begin KNN operations


training:  13%|█▎        | 26/200 [06:42<43:00, 14.83s/it]

End KNN operations, time taken: 2.3843650817871094
training loss: 2.944056987762451
End document, total time: 14.652920961380005
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4770162105560303
Begin KNN operations
End KNN operations, time taken: 0.7433979511260986
Begin KNN operations
End KNN operations, time taken: 0.9721581935882568
Begin KNN operations
End KNN operations, time taken: 1.1928770542144775
Begin KNN operations
End KNN operations, time taken: 1.418426275253296
Begin KNN operations
End KNN operations, time taken: 1.6342692375183105
Begin KNN operations
End KNN operations, time taken: 1.8772122859954834
Begin KNN operations
End KNN operations, time taken: 2.127389907836914
Begin KNN operations


training:  14%|█▎        | 27/200 [06:57<42:37, 14.78s/it]

End KNN operations, time taken: 2.344355344772339
training loss: 2.8968286991119387
End document, total time: 14.482965230941772
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47774410247802734
Begin KNN operations
End KNN operations, time taken: 0.7452883720397949
Begin KNN operations
End KNN operations, time taken: 0.9726924896240234
Begin KNN operations
End KNN operations, time taken: 1.1885566711425781
Begin KNN operations
End KNN operations, time taken: 1.4347498416900635
Begin KNN operations
End KNN operations, time taken: 1.7191710472106934
Begin KNN operations
End KNN operations, time taken: 1.8747823238372803
Begin KNN operations
End KNN operations, time taken: 2.1721982955932617
Begin KNN operations


training:  14%|█▍        | 28/200 [07:12<42:24, 14.79s/it]

End KNN operations, time taken: 2.3360049724578857
training loss: 2.9301986932754516
End document, total time: 14.630445003509521
Begin document
Begin KNN operations
End KNN operations, time taken: 0.481644868850708
Begin KNN operations
End KNN operations, time taken: 0.7387776374816895
Begin KNN operations
End KNN operations, time taken: 1.0193018913269043
Begin KNN operations
End KNN operations, time taken: 1.3591184616088867
Begin KNN operations
End KNN operations, time taken: 1.4519975185394287
Begin KNN operations
End KNN operations, time taken: 1.7114591598510742
Begin KNN operations
End KNN operations, time taken: 1.8902223110198975
Begin KNN operations
End KNN operations, time taken: 2.1285479068756104
Begin KNN operations


training:  14%|█▍        | 29/200 [07:27<42:22, 14.87s/it]

End KNN operations, time taken: 2.376988410949707
training loss: 2.851096415519714
End document, total time: 14.864301919937134
Begin document
Begin KNN operations
End KNN operations, time taken: 0.5014276504516602
Begin KNN operations
End KNN operations, time taken: 0.7511723041534424
Begin KNN operations
End KNN operations, time taken: 0.9982819557189941
Begin KNN operations
End KNN operations, time taken: 1.2075042724609375
Begin KNN operations
End KNN operations, time taken: 1.4458949565887451
Begin KNN operations
End KNN operations, time taken: 1.714268445968628
Begin KNN operations
End KNN operations, time taken: 1.92277193069458
Begin KNN operations
End KNN operations, time taken: 2.21049427986145
Begin KNN operations


training:  15%|█▌        | 30/200 [07:42<42:22, 14.96s/it]

End KNN operations, time taken: 2.501437187194824
training loss: 2.8863308429718018
End document, total time: 14.97035264968872
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48146891593933105
Begin KNN operations
End KNN operations, time taken: 0.7465074062347412
Begin KNN operations
End KNN operations, time taken: 1.0283007621765137
Begin KNN operations
End KNN operations, time taken: 1.218005895614624
Begin KNN operations
End KNN operations, time taken: 1.4626636505126953
Begin KNN operations
End KNN operations, time taken: 1.6636149883270264
Begin KNN operations
End KNN operations, time taken: 1.9387452602386475
Begin KNN operations
End KNN operations, time taken: 2.1270751953125
Begin KNN operations


training:  16%|█▌        | 31/200 [07:57<42:07, 14.95s/it]

End KNN operations, time taken: 2.4086737632751465
training loss: 2.8669010162353517
End document, total time: 14.76512336730957
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4795355796813965
Begin KNN operations
End KNN operations, time taken: 0.7531108856201172
Begin KNN operations
End KNN operations, time taken: 0.9606637954711914
Begin KNN operations
End KNN operations, time taken: 1.2067415714263916
Begin KNN operations
End KNN operations, time taken: 1.4830780029296875
Begin KNN operations
End KNN operations, time taken: 1.6190588474273682
Begin KNN operations
End KNN operations, time taken: 1.8842883110046387
Begin KNN operations
End KNN operations, time taken: 2.119518756866455
Begin KNN operations


training:  16%|█▌        | 32/200 [08:12<41:44, 14.91s/it]

End KNN operations, time taken: 2.413867473602295
training loss: 2.8205511569976807
End document, total time: 14.627187252044678
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48348021507263184
Begin KNN operations
End KNN operations, time taken: 0.7575562000274658
Begin KNN operations
End KNN operations, time taken: 0.9890642166137695
Begin KNN operations
End KNN operations, time taken: 1.1879761219024658
Begin KNN operations
End KNN operations, time taken: 1.4156365394592285
Begin KNN operations
End KNN operations, time taken: 1.6397037506103516
Begin KNN operations
End KNN operations, time taken: 1.8881428241729736
Begin KNN operations
End KNN operations, time taken: 2.1196882724761963
Begin KNN operations


training:  16%|█▋        | 33/200 [08:27<41:23, 14.87s/it]

End KNN operations, time taken: 2.427802562713623
training loss: 2.8472460508346558
End document, total time: 14.601011276245117
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4754905700683594
Begin KNN operations
End KNN operations, time taken: 0.7452170848846436
Begin KNN operations
End KNN operations, time taken: 0.9751391410827637
Begin KNN operations
End KNN operations, time taken: 1.2030563354492188
Begin KNN operations
End KNN operations, time taken: 1.4143941402435303
Begin KNN operations
End KNN operations, time taken: 1.646275520324707
Begin KNN operations
End KNN operations, time taken: 1.8994617462158203
Begin KNN operations
End KNN operations, time taken: 2.188796043395996
Begin KNN operations


training:  17%|█▋        | 34/200 [08:41<41:08, 14.87s/it]

End KNN operations, time taken: 2.4158904552459717
training loss: 2.959435081481934
End document, total time: 14.68399691581726
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4740316867828369
Begin KNN operations
End KNN operations, time taken: 0.7462482452392578
Begin KNN operations
End KNN operations, time taken: 0.9661769866943359
Begin KNN operations
End KNN operations, time taken: 1.2222347259521484
Begin KNN operations
End KNN operations, time taken: 1.4424116611480713
Begin KNN operations
End KNN operations, time taken: 1.7324423789978027
Begin KNN operations
End KNN operations, time taken: 1.8857624530792236
Begin KNN operations
End KNN operations, time taken: 2.1318373680114746
Begin KNN operations


training:  18%|█▊        | 35/200 [08:56<40:52, 14.86s/it]

End KNN operations, time taken: 2.3730199337005615
training loss: 2.7400868415832518
End document, total time: 14.679750919342041
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4837615489959717
Begin KNN operations
End KNN operations, time taken: 0.7577097415924072
Begin KNN operations
End KNN operations, time taken: 0.9823005199432373
Begin KNN operations
End KNN operations, time taken: 1.250227689743042
Begin KNN operations
End KNN operations, time taken: 1.4231040477752686
Begin KNN operations
End KNN operations, time taken: 1.6632652282714844
Begin KNN operations
End KNN operations, time taken: 1.8743431568145752
Begin KNN operations
End KNN operations, time taken: 2.100351333618164
Begin KNN operations


training:  18%|█▊        | 36/200 [09:11<40:35, 14.85s/it]

End KNN operations, time taken: 2.400280237197876
training loss: 2.806778407096863
End document, total time: 14.642825603485107
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48655152320861816
Begin KNN operations
End KNN operations, time taken: 0.7907907962799072
Begin KNN operations
End KNN operations, time taken: 0.9788825511932373
Begin KNN operations
End KNN operations, time taken: 1.2261714935302734
Begin KNN operations
End KNN operations, time taken: 1.427187204360962
Begin KNN operations
End KNN operations, time taken: 1.7028543949127197
Begin KNN operations
End KNN operations, time taken: 1.8910143375396729
Begin KNN operations
End KNN operations, time taken: 2.113215923309326
Begin KNN operations


training:  18%|█▊        | 37/200 [09:26<40:25, 14.88s/it]

End KNN operations, time taken: 2.455847978591919
training loss: 2.7729701519012453
End document, total time: 14.760037422180176
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4813520908355713
Begin KNN operations
End KNN operations, time taken: 0.7643742561340332
Begin KNN operations
End KNN operations, time taken: 0.9680168628692627
Begin KNN operations
End KNN operations, time taken: 1.2202026844024658
Begin KNN operations
End KNN operations, time taken: 1.4482755661010742
Begin KNN operations
End KNN operations, time taken: 1.6623249053955078
Begin KNN operations
End KNN operations, time taken: 2.0018551349639893
Begin KNN operations
End KNN operations, time taken: 2.1277453899383545
Begin KNN operations


training:  19%|█▉        | 38/200 [09:41<40:15, 14.91s/it]

End KNN operations, time taken: 2.424384117126465
training loss: 2.844007110595703
End document, total time: 14.80804705619812
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47114133834838867
Begin KNN operations
End KNN operations, time taken: 0.7529392242431641
Begin KNN operations
End KNN operations, time taken: 0.967336893081665
Begin KNN operations
End KNN operations, time taken: 1.2306413650512695
Begin KNN operations
End KNN operations, time taken: 1.4748492240905762
Begin KNN operations
End KNN operations, time taken: 1.7129149436950684
Begin KNN operations
End KNN operations, time taken: 1.8675670623779297
Begin KNN operations
End KNN operations, time taken: 2.147794246673584
Begin KNN operations


training:  20%|█▉        | 39/200 [09:56<39:58, 14.90s/it]

End KNN operations, time taken: 2.342142343521118
training loss: 2.8186784505844114
End document, total time: 14.67184853553772
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47324204444885254
Begin KNN operations
End KNN operations, time taken: 0.7986817359924316
Begin KNN operations
End KNN operations, time taken: 0.9927470684051514
Begin KNN operations
End KNN operations, time taken: 1.229191780090332
Begin KNN operations
End KNN operations, time taken: 1.4203336238861084
Begin KNN operations
End KNN operations, time taken: 1.6777386665344238
Begin KNN operations
End KNN operations, time taken: 1.8801183700561523
Begin KNN operations
End KNN operations, time taken: 2.1415486335754395
Begin KNN operations


training:  20%|██        | 40/200 [10:11<39:48, 14.93s/it]

End KNN operations, time taken: 2.45054292678833
training loss: 2.715817737579346
End document, total time: 14.80181622505188
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47714710235595703
Begin KNN operations
End KNN operations, time taken: 0.7589898109436035
Begin KNN operations
End KNN operations, time taken: 0.9853904247283936
Begin KNN operations
End KNN operations, time taken: 1.2385976314544678
Begin KNN operations
End KNN operations, time taken: 1.4634366035461426
Begin KNN operations
End KNN operations, time taken: 1.6917216777801514
Begin KNN operations
End KNN operations, time taken: 1.9327266216278076
Begin KNN operations
End KNN operations, time taken: 2.2502424716949463
Begin KNN operations


training:  20%|██        | 41/200 [10:26<39:40, 14.97s/it]

End KNN operations, time taken: 2.3895626068115234
training loss: 2.6923449277877807
End document, total time: 14.89809250831604
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4876270294189453
Begin KNN operations
End KNN operations, time taken: 0.7764434814453125
Begin KNN operations
End KNN operations, time taken: 0.9698896408081055
Begin KNN operations
End KNN operations, time taken: 1.2145845890045166
Begin KNN operations
End KNN operations, time taken: 1.4390404224395752
Begin KNN operations
End KNN operations, time taken: 1.704408884048462
Begin KNN operations
End KNN operations, time taken: 1.9530525207519531
Begin KNN operations
End KNN operations, time taken: 2.138350486755371
Begin KNN operations


training:  21%|██        | 42/200 [10:41<39:24, 14.96s/it]

End KNN operations, time taken: 2.3712191581726074
training loss: 2.690593600273132
End document, total time: 14.759241819381714
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47267889976501465
Begin KNN operations
End KNN operations, time taken: 0.748471736907959
Begin KNN operations
End KNN operations, time taken: 0.9642109870910645
Begin KNN operations
End KNN operations, time taken: 1.2429907321929932
Begin KNN operations
End KNN operations, time taken: 1.479578971862793
Begin KNN operations
End KNN operations, time taken: 1.665168285369873
Begin KNN operations
End KNN operations, time taken: 1.896916151046753
Begin KNN operations
End KNN operations, time taken: 2.097381114959717
Begin KNN operations


training:  22%|██▏       | 43/200 [10:56<39:02, 14.92s/it]

End KNN operations, time taken: 2.3548190593719482
training loss: 2.8634023189544675
End document, total time: 14.636755228042603
Begin document
Begin KNN operations
End KNN operations, time taken: 0.49655771255493164
Begin KNN operations
End KNN operations, time taken: 0.7838079929351807
Begin KNN operations
End KNN operations, time taken: 1.0153534412384033
Begin KNN operations
End KNN operations, time taken: 1.2066550254821777
Begin KNN operations
End KNN operations, time taken: 1.4137358665466309
Begin KNN operations
End KNN operations, time taken: 1.6485509872436523
Begin KNN operations
End KNN operations, time taken: 1.8766167163848877
Begin KNN operations
End KNN operations, time taken: 2.1220099925994873
Begin KNN operations


training:  22%|██▏       | 44/200 [11:11<38:44, 14.90s/it]

End KNN operations, time taken: 2.4202733039855957
training loss: 2.6240177392959594
End document, total time: 14.679930686950684
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4675469398498535
Begin KNN operations
End KNN operations, time taken: 0.7508378028869629
Begin KNN operations
End KNN operations, time taken: 0.9782910346984863
Begin KNN operations
End KNN operations, time taken: 1.2278494834899902
Begin KNN operations
End KNN operations, time taken: 1.4260716438293457
Begin KNN operations
End KNN operations, time taken: 1.6808650493621826
Begin KNN operations
End KNN operations, time taken: 1.9488465785980225
Begin KNN operations
End KNN operations, time taken: 2.139589548110962
Begin KNN operations


training:  22%|██▎       | 45/200 [11:25<38:27, 14.89s/it]

End KNN operations, time taken: 2.3489320278167725
training loss: 2.7804325342178347
End document, total time: 14.664414882659912
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4862401485443115
Begin KNN operations
End KNN operations, time taken: 0.7490832805633545
Begin KNN operations
End KNN operations, time taken: 0.9786937236785889
Begin KNN operations
End KNN operations, time taken: 1.222395420074463
Begin KNN operations
End KNN operations, time taken: 1.4288158416748047
Begin KNN operations
End KNN operations, time taken: 1.7154099941253662
Begin KNN operations
End KNN operations, time taken: 1.9413020610809326
Begin KNN operations
End KNN operations, time taken: 2.1051032543182373
Begin KNN operations


training:  23%|██▎       | 46/200 [11:40<38:12, 14.88s/it]

End KNN operations, time taken: 2.3642489910125732
training loss: 2.741799092292786
End document, total time: 14.696877479553223
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4747326374053955
Begin KNN operations
End KNN operations, time taken: 0.755265474319458
Begin KNN operations
End KNN operations, time taken: 1.0200474262237549
Begin KNN operations
End KNN operations, time taken: 1.264587640762329
Begin KNN operations
End KNN operations, time taken: 1.4393599033355713
Begin KNN operations
End KNN operations, time taken: 1.6587743759155273
Begin KNN operations
End KNN operations, time taken: 1.8928468227386475
Begin KNN operations
End KNN operations, time taken: 2.100233793258667
Begin KNN operations


training:  24%|██▎       | 47/200 [11:55<37:56, 14.88s/it]

End KNN operations, time taken: 2.3609704971313477
training loss: 2.6494030237197874
End document, total time: 14.67220425605774
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4892141819000244
Begin KNN operations
End KNN operations, time taken: 0.7402758598327637
Begin KNN operations
End KNN operations, time taken: 0.9885721206665039
Begin KNN operations
End KNN operations, time taken: 1.2023820877075195
Begin KNN operations
End KNN operations, time taken: 1.4274814128875732
Begin KNN operations
End KNN operations, time taken: 1.6591954231262207
Begin KNN operations
End KNN operations, time taken: 1.8671772480010986
Begin KNN operations
End KNN operations, time taken: 2.163363218307495
Begin KNN operations


training:  24%|██▍       | 48/200 [12:10<37:39, 14.86s/it]

End KNN operations, time taken: 2.39823842048645
training loss: 2.749323344230652
End document, total time: 14.64768671989441
Begin document
Begin KNN operations
End KNN operations, time taken: 0.46961212158203125
Begin KNN operations
End KNN operations, time taken: 0.7542741298675537
Begin KNN operations
End KNN operations, time taken: 0.9913489818572998
Begin KNN operations
End KNN operations, time taken: 1.2325193881988525
Begin KNN operations
End KNN operations, time taken: 1.4375066757202148
Begin KNN operations
End KNN operations, time taken: 1.6672134399414062
Begin KNN operations
End KNN operations, time taken: 1.9428534507751465
Begin KNN operations
End KNN operations, time taken: 2.1326897144317627
Begin KNN operations


training:  24%|██▍       | 49/200 [12:25<37:25, 14.87s/it]

End KNN operations, time taken: 2.3618173599243164
training loss: 2.607356953620911
End document, total time: 14.711924314498901
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48039674758911133
Begin KNN operations
End KNN operations, time taken: 0.7480154037475586
Begin KNN operations
End KNN operations, time taken: 0.9833352565765381
Begin KNN operations
End KNN operations, time taken: 1.2031750679016113
Begin KNN operations
End KNN operations, time taken: 1.4800760746002197
Begin KNN operations
End KNN operations, time taken: 1.6396331787109375
Begin KNN operations
End KNN operations, time taken: 1.898409128189087
Begin KNN operations
End KNN operations, time taken: 2.1105380058288574
Begin KNN operations


training:  25%|██▌       | 50/200 [12:40<37:10, 14.87s/it]

End KNN operations, time taken: 2.432177782058716
training loss: 2.6865880012512204
End document, total time: 14.675077676773071
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4717404842376709
Begin KNN operations
End KNN operations, time taken: 0.7807877063751221
Begin KNN operations
End KNN operations, time taken: 1.0474658012390137
Begin KNN operations
End KNN operations, time taken: 1.2240893840789795
Begin KNN operations
End KNN operations, time taken: 1.479430913925171
Begin KNN operations
End KNN operations, time taken: 1.6746032238006592
Begin KNN operations
End KNN operations, time taken: 1.8954284191131592
Begin KNN operations
End KNN operations, time taken: 2.1839168071746826
Begin KNN operations


training:  26%|██▌       | 51/200 [12:55<37:07, 14.95s/it]

End KNN operations, time taken: 2.442199468612671
training loss: 2.696741771697998
End document, total time: 14.939074277877808
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48301267623901367
Begin KNN operations
End KNN operations, time taken: 0.7587316036224365
Begin KNN operations
End KNN operations, time taken: 0.9844322204589844
Begin KNN operations
End KNN operations, time taken: 1.2042758464813232
Begin KNN operations
End KNN operations, time taken: 1.479644775390625
Begin KNN operations
End KNN operations, time taken: 1.6373004913330078
Begin KNN operations
End KNN operations, time taken: 1.8552041053771973
Begin KNN operations
End KNN operations, time taken: 2.201768398284912
Begin KNN operations


training:  26%|██▌       | 52/200 [13:10<36:48, 14.92s/it]

End KNN operations, time taken: 2.3680057525634766
training loss: 2.628322458267212
End document, total time: 14.67643141746521
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4760417938232422
Begin KNN operations
End KNN operations, time taken: 0.7396316528320312
Begin KNN operations
End KNN operations, time taken: 0.9658162593841553
Begin KNN operations
End KNN operations, time taken: 1.1840918064117432
Begin KNN operations
End KNN operations, time taken: 1.4718334674835205
Begin KNN operations
End KNN operations, time taken: 1.6791386604309082
Begin KNN operations
End KNN operations, time taken: 1.8778131008148193
Begin KNN operations
End KNN operations, time taken: 2.104203462600708
Begin KNN operations


training:  26%|██▋       | 53/200 [13:24<36:25, 14.87s/it]

End KNN operations, time taken: 2.3664238452911377
training loss: 2.6204872846603395
End document, total time: 14.561211347579956
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48076462745666504
Begin KNN operations
End KNN operations, time taken: 0.7408137321472168
Begin KNN operations
End KNN operations, time taken: 0.9785575866699219
Begin KNN operations
End KNN operations, time taken: 1.2583739757537842
Begin KNN operations
End KNN operations, time taken: 1.4369118213653564
Begin KNN operations
End KNN operations, time taken: 1.625713586807251
Begin KNN operations
End KNN operations, time taken: 1.889455795288086
Begin KNN operations
End KNN operations, time taken: 2.1129531860351562
Begin KNN operations


training:  27%|██▋       | 54/200 [13:39<36:08, 14.85s/it]

End KNN operations, time taken: 2.407468557357788
training loss: 2.60718343257904
End document, total time: 14.631522178649902
Begin document
Begin KNN operations
End KNN operations, time taken: 0.49815988540649414
Begin KNN operations
End KNN operations, time taken: 0.7786014080047607
Begin KNN operations
End KNN operations, time taken: 0.9667832851409912
Begin KNN operations
End KNN operations, time taken: 1.2133917808532715
Begin KNN operations
End KNN operations, time taken: 1.460608959197998
Begin KNN operations
End KNN operations, time taken: 1.6640639305114746
Begin KNN operations
End KNN operations, time taken: 1.8982925415039062
Begin KNN operations
End KNN operations, time taken: 2.1381781101226807
Begin KNN operations


training:  28%|██▊       | 55/200 [13:54<35:56, 14.88s/it]

End KNN operations, time taken: 2.41849422454834
training loss: 2.604366230964661
End document, total time: 14.734868049621582
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47937822341918945
Begin KNN operations
End KNN operations, time taken: 0.7470798492431641
Begin KNN operations
End KNN operations, time taken: 0.9914131164550781
Begin KNN operations
End KNN operations, time taken: 1.2243766784667969
Begin KNN operations
End KNN operations, time taken: 1.470991611480713
Begin KNN operations
End KNN operations, time taken: 1.6466319561004639
Begin KNN operations
End KNN operations, time taken: 1.9708294868469238
Begin KNN operations
End KNN operations, time taken: 2.108631134033203
Begin KNN operations


training:  28%|██▊       | 56/200 [14:09<35:45, 14.90s/it]

End KNN operations, time taken: 2.3930649757385254
training loss: 2.5575271368026735
End document, total time: 14.75770092010498
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4764716625213623
Begin KNN operations
End KNN operations, time taken: 0.7501785755157471
Begin KNN operations
End KNN operations, time taken: 0.9805431365966797
Begin KNN operations
End KNN operations, time taken: 1.226604700088501
Begin KNN operations
End KNN operations, time taken: 1.4646484851837158
Begin KNN operations
End KNN operations, time taken: 1.7059504985809326
Begin KNN operations
End KNN operations, time taken: 1.9187233448028564
Begin KNN operations
End KNN operations, time taken: 2.192823648452759
Begin KNN operations


training:  28%|██▊       | 57/200 [14:24<35:33, 14.92s/it]

End KNN operations, time taken: 2.3747470378875732
training loss: 2.578276920318604
End document, total time: 14.79167127609253
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48154520988464355
Begin KNN operations
End KNN operations, time taken: 0.7564833164215088
Begin KNN operations
End KNN operations, time taken: 0.9923138618469238
Begin KNN operations
End KNN operations, time taken: 1.2453651428222656
Begin KNN operations
End KNN operations, time taken: 1.4346625804901123
Begin KNN operations
End KNN operations, time taken: 1.6373074054718018
Begin KNN operations
End KNN operations, time taken: 1.8817858695983887
Begin KNN operations
End KNN operations, time taken: 2.1308867931365967
Begin KNN operations


training:  29%|██▉       | 58/200 [14:39<35:17, 14.91s/it]

End KNN operations, time taken: 2.453918933868408
training loss: 2.771521043777466
End document, total time: 14.702804327011108
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4741995334625244
Begin KNN operations
End KNN operations, time taken: 0.7700541019439697
Begin KNN operations
End KNN operations, time taken: 1.0021657943725586
Begin KNN operations
End KNN operations, time taken: 1.19630765914917
Begin KNN operations
End KNN operations, time taken: 1.4264171123504639
Begin KNN operations
End KNN operations, time taken: 1.7068665027618408
Begin KNN operations
End KNN operations, time taken: 1.8969395160675049
Begin KNN operations
End KNN operations, time taken: 2.187418222427368
Begin KNN operations


training:  30%|██▉       | 59/200 [14:54<35:02, 14.91s/it]

End KNN operations, time taken: 2.349080801010132
training loss: 2.606078386306763
End document, total time: 14.726085662841797
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48328471183776855
Begin KNN operations
End KNN operations, time taken: 0.740339994430542
Begin KNN operations
End KNN operations, time taken: 0.9675750732421875
Begin KNN operations
End KNN operations, time taken: 1.2378408908843994
Begin KNN operations
End KNN operations, time taken: 1.4150660037994385
Begin KNN operations
End KNN operations, time taken: 1.7043242454528809
Begin KNN operations
End KNN operations, time taken: 1.9261276721954346
Begin KNN operations
End KNN operations, time taken: 2.1081275939941406
Begin KNN operations


training:  30%|███       | 60/200 [15:09<34:44, 14.89s/it]

End KNN operations, time taken: 2.365102767944336
training loss: 2.615067791938782
End document, total time: 14.643923282623291
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4769477844238281
Begin KNN operations
End KNN operations, time taken: 0.7515971660614014
Begin KNN operations
End KNN operations, time taken: 0.9730794429779053
Begin KNN operations
End KNN operations, time taken: 1.2316861152648926
Begin KNN operations
End KNN operations, time taken: 1.5167782306671143
Begin KNN operations
End KNN operations, time taken: 1.7244126796722412
Begin KNN operations
End KNN operations, time taken: 1.9355065822601318
Begin KNN operations
End KNN operations, time taken: 2.1623685359954834
Begin KNN operations


training:  30%|███       | 61/200 [15:24<34:35, 14.94s/it]

End KNN operations, time taken: 2.372631311416626
training loss: 2.568788909912109
End document, total time: 14.8570077419281
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47847509384155273
Begin KNN operations
End KNN operations, time taken: 0.7711021900177002
Begin KNN operations
End KNN operations, time taken: 0.9669861793518066
Begin KNN operations
End KNN operations, time taken: 1.2191946506500244
Begin KNN operations
End KNN operations, time taken: 1.4286115169525146
Begin KNN operations
End KNN operations, time taken: 1.6668822765350342
Begin KNN operations
End KNN operations, time taken: 1.8881070613861084
Begin KNN operations
End KNN operations, time taken: 2.149484157562256
Begin KNN operations


training:  31%|███       | 62/200 [15:39<34:20, 14.93s/it]

End KNN operations, time taken: 2.4538211822509766
training loss: 2.6289670944213865
End document, total time: 14.745342493057251
Begin document
Begin KNN operations
End KNN operations, time taken: 0.46546196937561035
Begin KNN operations
End KNN operations, time taken: 0.7507402896881104
Begin KNN operations
End KNN operations, time taken: 0.9967920780181885
Begin KNN operations
End KNN operations, time taken: 1.213670015335083
Begin KNN operations
End KNN operations, time taken: 1.4419465065002441
Begin KNN operations
End KNN operations, time taken: 1.701819658279419
Begin KNN operations
End KNN operations, time taken: 1.9599266052246094
Begin KNN operations
End KNN operations, time taken: 2.160651683807373
Begin KNN operations


training:  32%|███▏      | 63/200 [15:54<34:06, 14.94s/it]

End KNN operations, time taken: 2.352407217025757
training loss: 2.5338738441467283
End document, total time: 14.776860237121582
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4750690460205078
Begin KNN operations
End KNN operations, time taken: 0.7574999332427979
Begin KNN operations
End KNN operations, time taken: 0.9821300506591797
Begin KNN operations
End KNN operations, time taken: 1.2286546230316162
Begin KNN operations
End KNN operations, time taken: 1.4602837562561035
Begin KNN operations
End KNN operations, time taken: 1.7348923683166504
Begin KNN operations
End KNN operations, time taken: 1.9910333156585693
Begin KNN operations
End KNN operations, time taken: 2.1486945152282715
Begin KNN operations


training:  32%|███▏      | 64/200 [16:09<33:56, 14.98s/it]

End KNN operations, time taken: 2.3888533115386963
training loss: 2.5856857776641844
End document, total time: 14.870795249938965
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4678151607513428
Begin KNN operations
End KNN operations, time taken: 0.7542836666107178
Begin KNN operations
End KNN operations, time taken: 1.024719476699829
Begin KNN operations
End KNN operations, time taken: 1.2315280437469482
Begin KNN operations
End KNN operations, time taken: 1.435359239578247
Begin KNN operations
End KNN operations, time taken: 1.674009084701538
Begin KNN operations
End KNN operations, time taken: 1.8799784183502197
Begin KNN operations
End KNN operations, time taken: 2.114600896835327
Begin KNN operations


training:  32%|███▎      | 65/200 [16:24<33:37, 14.95s/it]

End KNN operations, time taken: 2.390817880630493
training loss: 2.528374147415161
End document, total time: 14.690115928649902
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47277259826660156
Begin KNN operations
End KNN operations, time taken: 0.7578020095825195
Begin KNN operations
End KNN operations, time taken: 0.9775185585021973
Begin KNN operations
End KNN operations, time taken: 1.2513353824615479
Begin KNN operations
End KNN operations, time taken: 1.4551608562469482
Begin KNN operations
End KNN operations, time taken: 1.6749250888824463
Begin KNN operations
End KNN operations, time taken: 1.898463487625122
Begin KNN operations
End KNN operations, time taken: 2.1926989555358887
Begin KNN operations


training:  33%|███▎      | 66/200 [16:39<33:22, 14.94s/it]

End KNN operations, time taken: 2.3676295280456543
training loss: 2.564573431015015
End document, total time: 14.745608568191528
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4646310806274414
Begin KNN operations
End KNN operations, time taken: 0.7568714618682861
Begin KNN operations
End KNN operations, time taken: 1.0036964416503906
Begin KNN operations
End KNN operations, time taken: 1.237626075744629
Begin KNN operations
End KNN operations, time taken: 1.4314064979553223
Begin KNN operations
End KNN operations, time taken: 1.6697790622711182
Begin KNN operations
End KNN operations, time taken: 1.9991204738616943
Begin KNN operations
End KNN operations, time taken: 2.1304714679718018
Begin KNN operations


training:  34%|███▎      | 67/200 [16:54<33:07, 14.95s/it]

End KNN operations, time taken: 2.364893674850464
training loss: 2.7500630378723145
End document, total time: 14.765351295471191
Begin document
Begin KNN operations
End KNN operations, time taken: 0.469022274017334
Begin KNN operations
End KNN operations, time taken: 0.7596838474273682
Begin KNN operations
End KNN operations, time taken: 0.9812347888946533
Begin KNN operations
End KNN operations, time taken: 1.2271111011505127
Begin KNN operations
End KNN operations, time taken: 1.4897730350494385
Begin KNN operations
End KNN operations, time taken: 1.6925036907196045
Begin KNN operations
End KNN operations, time taken: 1.9196393489837646
Begin KNN operations
End KNN operations, time taken: 2.1179535388946533
Begin KNN operations


training:  34%|███▍      | 68/200 [17:08<32:51, 14.94s/it]

End KNN operations, time taken: 2.378558397293091
training loss: 2.635791277885437
End document, total time: 14.743864297866821
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48126697540283203
Begin KNN operations
End KNN operations, time taken: 0.7874021530151367
Begin KNN operations
End KNN operations, time taken: 0.988133430480957
Begin KNN operations
End KNN operations, time taken: 1.2147183418273926
Begin KNN operations
End KNN operations, time taken: 1.42689847946167
Begin KNN operations
End KNN operations, time taken: 1.6750996112823486
Begin KNN operations
End KNN operations, time taken: 1.935544729232788
Begin KNN operations
End KNN operations, time taken: 2.1185412406921387
Begin KNN operations


training:  34%|███▍      | 69/200 [17:23<32:38, 14.95s/it]

End KNN operations, time taken: 2.4426350593566895
training loss: 2.552727389335632
End document, total time: 14.79772424697876
Begin document
Begin KNN operations
End KNN operations, time taken: 0.46926069259643555
Begin KNN operations
End KNN operations, time taken: 0.7424726486206055
Begin KNN operations
End KNN operations, time taken: 0.9777898788452148
Begin KNN operations
End KNN operations, time taken: 1.1964218616485596
Begin KNN operations
End KNN operations, time taken: 1.4132826328277588
Begin KNN operations
End KNN operations, time taken: 1.6563756465911865
Begin KNN operations
End KNN operations, time taken: 1.8932862281799316
Begin KNN operations
End KNN operations, time taken: 2.1615655422210693
Begin KNN operations


training:  35%|███▌      | 70/200 [17:38<32:15, 14.89s/it]

End KNN operations, time taken: 2.346811294555664
training loss: 2.5738286972045903
End document, total time: 14.560983180999756
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4760708808898926
Begin KNN operations
End KNN operations, time taken: 0.7408454418182373
Begin KNN operations
End KNN operations, time taken: 0.9715361595153809
Begin KNN operations
End KNN operations, time taken: 1.1882503032684326
Begin KNN operations
End KNN operations, time taken: 1.414712905883789
Begin KNN operations
End KNN operations, time taken: 1.7124686241149902
Begin KNN operations
End KNN operations, time taken: 1.899348258972168
Begin KNN operations
End KNN operations, time taken: 2.0792453289031982
Begin KNN operations


training:  36%|███▌      | 71/200 [17:53<31:52, 14.83s/it]

End KNN operations, time taken: 2.341251850128174
training loss: 2.528142619132996
End document, total time: 14.518078327178955
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4665868282318115
Begin KNN operations
End KNN operations, time taken: 0.7376775741577148
Begin KNN operations
End KNN operations, time taken: 0.9850814342498779
Begin KNN operations
End KNN operations, time taken: 1.2506160736083984
Begin KNN operations
End KNN operations, time taken: 1.4383397102355957
Begin KNN operations
End KNN operations, time taken: 1.6646349430084229
Begin KNN operations
End KNN operations, time taken: 1.946664810180664
Begin KNN operations
End KNN operations, time taken: 2.132096767425537
Begin KNN operations


training:  36%|███▌      | 72/200 [18:08<31:41, 14.85s/it]

End KNN operations, time taken: 2.3863372802734375
training loss: 2.5227254390716554
End document, total time: 14.730503797531128
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4815032482147217
Begin KNN operations
End KNN operations, time taken: 0.7561345100402832
Begin KNN operations
End KNN operations, time taken: 0.9847314357757568
Begin KNN operations
End KNN operations, time taken: 1.1909160614013672
Begin KNN operations
End KNN operations, time taken: 1.4472923278808594
Begin KNN operations
End KNN operations, time taken: 1.6491897106170654
Begin KNN operations
End KNN operations, time taken: 1.877561330795288
Begin KNN operations
End KNN operations, time taken: 2.151244640350342
Begin KNN operations


training:  36%|███▋      | 73/200 [18:23<31:25, 14.85s/it]

End KNN operations, time taken: 2.4174702167510986
training loss: 2.5149769067764276
End document, total time: 14.669179916381836
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4834325313568115
Begin KNN operations
End KNN operations, time taken: 0.7436859607696533
Begin KNN operations
End KNN operations, time taken: 0.9710967540740967
Begin KNN operations
End KNN operations, time taken: 1.1888623237609863
Begin KNN operations
End KNN operations, time taken: 1.462282657623291
Begin KNN operations
End KNN operations, time taken: 1.6457622051239014
Begin KNN operations
End KNN operations, time taken: 1.9531774520874023
Begin KNN operations
End KNN operations, time taken: 2.1105844974517822
Begin KNN operations


training:  37%|███▋      | 74/200 [18:37<31:08, 14.83s/it]

End KNN operations, time taken: 2.348281145095825
training loss: 2.5396602630615233
End document, total time: 14.610424041748047
Begin document
Begin KNN operations
End KNN operations, time taken: 0.46762776374816895
Begin KNN operations
End KNN operations, time taken: 0.7516872882843018
Begin KNN operations
End KNN operations, time taken: 0.9710323810577393
Begin KNN operations
End KNN operations, time taken: 1.1944026947021484
Begin KNN operations
End KNN operations, time taken: 1.489962100982666
Begin KNN operations
End KNN operations, time taken: 1.700695514678955
Begin KNN operations
End KNN operations, time taken: 1.8575019836425781
Begin KNN operations
End KNN operations, time taken: 2.154622793197632
Begin KNN operations


training:  38%|███▊      | 75/200 [18:52<30:53, 14.83s/it]

End KNN operations, time taken: 2.3542850017547607
training loss: 2.506850600242615
End document, total time: 14.6300208568573
Begin document
Begin KNN operations
End KNN operations, time taken: 0.470719575881958
Begin KNN operations
End KNN operations, time taken: 0.7637248039245605
Begin KNN operations
End KNN operations, time taken: 1.026726484298706
Begin KNN operations
End KNN operations, time taken: 1.1995429992675781
Begin KNN operations
End KNN operations, time taken: 1.4237730503082275
Begin KNN operations
End KNN operations, time taken: 1.6449148654937744
Begin KNN operations
End KNN operations, time taken: 1.8742167949676514
Begin KNN operations
End KNN operations, time taken: 2.1255245208740234
Begin KNN operations


training:  38%|███▊      | 76/200 [19:07<30:37, 14.82s/it]

End KNN operations, time taken: 2.3899855613708496
training loss: 2.5461823463439943
End document, total time: 14.636356830596924
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47539734840393066
Begin KNN operations
End KNN operations, time taken: 0.7399113178253174
Begin KNN operations
End KNN operations, time taken: 0.9739258289337158
Begin KNN operations
End KNN operations, time taken: 1.2055492401123047
Begin KNN operations
End KNN operations, time taken: 1.4212892055511475
Begin KNN operations
End KNN operations, time taken: 1.651106357574463
Begin KNN operations
End KNN operations, time taken: 1.874826431274414
Begin KNN operations
End KNN operations, time taken: 2.276411294937134
Begin KNN operations


training:  38%|███▊      | 77/200 [19:22<30:25, 14.84s/it]

End KNN operations, time taken: 2.370936870574951
training loss: 2.457496762275696
End document, total time: 14.705230474472046
Begin document
Begin KNN operations
End KNN operations, time taken: 0.491715669631958
Begin KNN operations
End KNN operations, time taken: 0.7708802223205566
Begin KNN operations
End KNN operations, time taken: 0.973243236541748
Begin KNN operations
End KNN operations, time taken: 1.1970114707946777
Begin KNN operations
End KNN operations, time taken: 1.431044578552246
Begin KNN operations
End KNN operations, time taken: 1.6657383441925049
Begin KNN operations
End KNN operations, time taken: 1.918776512145996
Begin KNN operations
End KNN operations, time taken: 2.1162781715393066
Begin KNN operations


training:  39%|███▉      | 78/200 [19:37<30:10, 14.84s/it]

End KNN operations, time taken: 2.3701255321502686
training loss: 2.51853449344635
End document, total time: 14.66662883758545
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4862833023071289
Begin KNN operations
End KNN operations, time taken: 0.7438139915466309
Begin KNN operations
End KNN operations, time taken: 0.9721667766571045
Begin KNN operations
End KNN operations, time taken: 1.2476918697357178
Begin KNN operations
End KNN operations, time taken: 1.4946296215057373
Begin KNN operations
End KNN operations, time taken: 1.6826093196868896
Begin KNN operations
End KNN operations, time taken: 1.8435540199279785
Begin KNN operations
End KNN operations, time taken: 2.1032235622406006
Begin KNN operations


training:  40%|███▉      | 79/200 [19:52<29:54, 14.83s/it]

End KNN operations, time taken: 2.3476321697235107
training loss: 2.5386951684951784
End document, total time: 14.630144357681274
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47504162788391113
Begin KNN operations
End KNN operations, time taken: 0.7716445922851562
Begin KNN operations
End KNN operations, time taken: 0.9698569774627686
Begin KNN operations
End KNN operations, time taken: 1.188272476196289
Begin KNN operations
End KNN operations, time taken: 1.398186206817627
Begin KNN operations
End KNN operations, time taken: 1.6721210479736328
Begin KNN operations
End KNN operations, time taken: 1.8783609867095947
Begin KNN operations
End KNN operations, time taken: 2.095641851425171
Begin KNN operations


training:  40%|████      | 80/200 [20:06<29:38, 14.82s/it]

End KNN operations, time taken: 2.4635801315307617
training loss: 2.5113148927688593
End document, total time: 14.609966516494751
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4712240695953369
Begin KNN operations
End KNN operations, time taken: 0.7517721652984619
Begin KNN operations
End KNN operations, time taken: 0.9648957252502441
Begin KNN operations
End KNN operations, time taken: 1.2094357013702393
Begin KNN operations
End KNN operations, time taken: 1.416046142578125
Begin KNN operations
End KNN operations, time taken: 1.6483829021453857
Begin KNN operations
End KNN operations, time taken: 1.9387497901916504
Begin KNN operations
End KNN operations, time taken: 2.1624202728271484
Begin KNN operations


training:  40%|████      | 81/200 [20:21<29:23, 14.82s/it]

End KNN operations, time taken: 2.3971712589263916
training loss: 2.462646770477295
End document, total time: 14.643921136856079
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47158265113830566
Begin KNN operations
End KNN operations, time taken: 0.762514591217041
Begin KNN operations
End KNN operations, time taken: 0.9741842746734619
Begin KNN operations
End KNN operations, time taken: 1.2114818096160889
Begin KNN operations
End KNN operations, time taken: 1.4251296520233154
Begin KNN operations
End KNN operations, time taken: 1.7360844612121582
Begin KNN operations
End KNN operations, time taken: 1.914323329925537
Begin KNN operations
End KNN operations, time taken: 2.1131818294525146
Begin KNN operations


training:  41%|████      | 82/200 [20:36<29:11, 14.84s/it]

End KNN operations, time taken: 2.387451410293579
training loss: 2.5013988733291628
End document, total time: 14.699655532836914
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4786558151245117
Begin KNN operations
End KNN operations, time taken: 0.7526321411132812
Begin KNN operations
End KNN operations, time taken: 0.9840781688690186
Begin KNN operations
End KNN operations, time taken: 1.261307716369629
Begin KNN operations
End KNN operations, time taken: 1.426377534866333
Begin KNN operations
End KNN operations, time taken: 1.661661148071289
Begin KNN operations
End KNN operations, time taken: 1.9070806503295898
Begin KNN operations
End KNN operations, time taken: 2.1039021015167236
Begin KNN operations


training:  42%|████▏     | 83/200 [20:51<28:58, 14.86s/it]

End KNN operations, time taken: 2.4030041694641113
training loss: 2.57192587852478
End document, total time: 14.710221767425537
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4899141788482666
Begin KNN operations
End KNN operations, time taken: 0.7529196739196777
Begin KNN operations
End KNN operations, time taken: 0.9722995758056641
Begin KNN operations
End KNN operations, time taken: 1.2114949226379395
Begin KNN operations
End KNN operations, time taken: 1.4296512603759766
Begin KNN operations
End KNN operations, time taken: 1.6683447360992432
Begin KNN operations
End KNN operations, time taken: 1.8850393295288086
Begin KNN operations
End KNN operations, time taken: 2.137258768081665
Begin KNN operations


training:  42%|████▏     | 84/200 [21:06<28:41, 14.84s/it]

End KNN operations, time taken: 2.3798396587371826
training loss: 2.5576464176177978
End document, total time: 14.620124340057373
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47376322746276855
Begin KNN operations
End KNN operations, time taken: 0.7520012855529785
Begin KNN operations
End KNN operations, time taken: 0.9605176448822021
Begin KNN operations
End KNN operations, time taken: 1.1778604984283447
Begin KNN operations
End KNN operations, time taken: 1.4349710941314697
Begin KNN operations
End KNN operations, time taken: 1.6619701385498047
Begin KNN operations
End KNN operations, time taken: 1.9927165508270264
Begin KNN operations
End KNN operations, time taken: 2.0944058895111084
Begin KNN operations


training:  42%|████▎     | 85/200 [21:21<28:27, 14.85s/it]

End KNN operations, time taken: 2.4217112064361572
training loss: 2.5311460971832274
End document, total time: 14.664672136306763
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4725840091705322
Begin KNN operations
End KNN operations, time taken: 0.7494316101074219
Begin KNN operations
End KNN operations, time taken: 0.9717395305633545
Begin KNN operations
End KNN operations, time taken: 1.2137718200683594
Begin KNN operations
End KNN operations, time taken: 1.4504377841949463
Begin KNN operations
End KNN operations, time taken: 1.6692874431610107
Begin KNN operations
End KNN operations, time taken: 1.8786425590515137
Begin KNN operations
End KNN operations, time taken: 2.114656925201416
Begin KNN operations


training:  43%|████▎     | 86/200 [21:35<28:08, 14.81s/it]

End KNN operations, time taken: 2.3308627605438232
training loss: 2.5905379295349125
End document, total time: 14.558025121688843
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47187089920043945
Begin KNN operations
End KNN operations, time taken: 0.7589824199676514
Begin KNN operations
End KNN operations, time taken: 1.0112996101379395
Begin KNN operations
End KNN operations, time taken: 1.1933789253234863
Begin KNN operations
End KNN operations, time taken: 1.4311680793762207
Begin KNN operations
End KNN operations, time taken: 1.6523926258087158
Begin KNN operations
End KNN operations, time taken: 1.8987207412719727
Begin KNN operations
End KNN operations, time taken: 2.15474271774292
Begin KNN operations


training:  44%|████▎     | 87/200 [21:50<27:55, 14.82s/it]

End KNN operations, time taken: 2.388509750366211
training loss: 2.6397346258163448
End document, total time: 14.660804271697998
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47022080421447754
Begin KNN operations
End KNN operations, time taken: 0.7437329292297363
Begin KNN operations
End KNN operations, time taken: 0.9965255260467529
Begin KNN operations
End KNN operations, time taken: 1.1986289024353027
Begin KNN operations
End KNN operations, time taken: 1.414896011352539
Begin KNN operations
End KNN operations, time taken: 1.6782941818237305
Begin KNN operations
End KNN operations, time taken: 1.8725097179412842
Begin KNN operations
End KNN operations, time taken: 2.183170795440674
Begin KNN operations


training:  44%|████▍     | 88/200 [22:05<27:42, 14.84s/it]

End KNN operations, time taken: 2.416184425354004
training loss: 2.633846259117127
End document, total time: 14.686503410339355
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47601318359375
Begin KNN operations
End KNN operations, time taken: 0.7535943984985352
Begin KNN operations
End KNN operations, time taken: 0.9923362731933594
Begin KNN operations
End KNN operations, time taken: 1.2010531425476074
Begin KNN operations
End KNN operations, time taken: 1.4337401390075684
Begin KNN operations
End KNN operations, time taken: 1.6801555156707764
Begin KNN operations
End KNN operations, time taken: 1.9335744380950928
Begin KNN operations
End KNN operations, time taken: 2.097400665283203
Begin KNN operations


training:  44%|████▍     | 89/200 [22:20<27:25, 14.83s/it]

End KNN operations, time taken: 2.341240882873535
training loss: 2.4436646461486817
End document, total time: 14.610204696655273
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48482465744018555
Begin KNN operations
End KNN operations, time taken: 0.7456140518188477
Begin KNN operations
End KNN operations, time taken: 1.0056543350219727
Begin KNN operations
End KNN operations, time taken: 1.222994327545166
Begin KNN operations
End KNN operations, time taken: 1.5477147102355957
Begin KNN operations
End KNN operations, time taken: 1.6448097229003906
Begin KNN operations
End KNN operations, time taken: 1.8929080963134766
Begin KNN operations
End KNN operations, time taken: 2.119114398956299
Begin KNN operations


training:  45%|████▌     | 90/200 [22:35<27:13, 14.85s/it]

End KNN operations, time taken: 2.3589320182800293
training loss: 2.495332145690918
End document, total time: 14.717903852462769
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48929476737976074
Begin KNN operations
End KNN operations, time taken: 0.7605185508728027
Begin KNN operations
End KNN operations, time taken: 0.9952042102813721
Begin KNN operations
End KNN operations, time taken: 1.196218490600586
Begin KNN operations
End KNN operations, time taken: 1.436943769454956
Begin KNN operations
End KNN operations, time taken: 1.6530430316925049
Begin KNN operations
End KNN operations, time taken: 1.937880516052246
Begin KNN operations
End KNN operations, time taken: 2.1413357257843018
Begin KNN operations


training:  46%|████▌     | 91/200 [22:50<26:59, 14.86s/it]

End KNN operations, time taken: 2.4425621032714844
training loss: 2.481265521049499
End document, total time: 14.714669704437256
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4872119426727295
Begin KNN operations
End KNN operations, time taken: 0.7483394145965576
Begin KNN operations
End KNN operations, time taken: 0.9760856628417969
Begin KNN operations
End KNN operations, time taken: 1.2449560165405273
Begin KNN operations
End KNN operations, time taken: 1.4479453563690186
Begin KNN operations
End KNN operations, time taken: 1.642622470855713
Begin KNN operations
End KNN operations, time taken: 1.9104030132293701
Begin KNN operations
End KNN operations, time taken: 2.1726934909820557
Begin KNN operations


training:  46%|████▌     | 92/200 [23:05<26:45, 14.86s/it]

End KNN operations, time taken: 2.37551212310791
training loss: 2.57516725063324
End document, total time: 14.687765121459961
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47295260429382324
Begin KNN operations
End KNN operations, time taken: 0.754814863204956
Begin KNN operations
End KNN operations, time taken: 0.9831476211547852
Begin KNN operations
End KNN operations, time taken: 1.2232756614685059
Begin KNN operations
End KNN operations, time taken: 1.4277048110961914
Begin KNN operations
End KNN operations, time taken: 1.7636561393737793
Begin KNN operations
End KNN operations, time taken: 1.8975331783294678
Begin KNN operations
End KNN operations, time taken: 2.164088249206543
Begin KNN operations


training:  46%|████▋     | 93/200 [23:19<26:32, 14.89s/it]

End KNN operations, time taken: 2.3626585006713867
training loss: 2.5782726526260373
End document, total time: 14.765538454055786
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47562336921691895
Begin KNN operations
End KNN operations, time taken: 0.7566874027252197
Begin KNN operations
End KNN operations, time taken: 1.0039701461791992
Begin KNN operations
End KNN operations, time taken: 1.323681116104126
Begin KNN operations
End KNN operations, time taken: 1.5119898319244385
Begin KNN operations
End KNN operations, time taken: 1.6573379039764404
Begin KNN operations
End KNN operations, time taken: 1.8879098892211914
Begin KNN operations
End KNN operations, time taken: 2.150114059448242
Begin KNN operations


training:  47%|████▋     | 94/200 [23:35<26:23, 14.94s/it]

End KNN operations, time taken: 2.4253273010253906
training loss: 2.414963531494141
End document, total time: 14.887116193771362
Begin document
Begin KNN operations
End KNN operations, time taken: 0.5265626907348633
Begin KNN operations
End KNN operations, time taken: 0.7613039016723633
Begin KNN operations
End KNN operations, time taken: 0.9776895046234131
Begin KNN operations
End KNN operations, time taken: 1.2054200172424316
Begin KNN operations
End KNN operations, time taken: 1.4496784210205078
Begin KNN operations
End KNN operations, time taken: 1.6814284324645996
Begin KNN operations
End KNN operations, time taken: 1.9140174388885498
Begin KNN operations
End KNN operations, time taken: 2.2267978191375732
Begin KNN operations


training:  48%|████▊     | 95/200 [23:50<26:11, 14.97s/it]

End KNN operations, time taken: 2.3952369689941406
training loss: 2.5232537269592283
End document, total time: 14.858660459518433
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4815988540649414
Begin KNN operations
End KNN operations, time taken: 0.7546072006225586
Begin KNN operations
End KNN operations, time taken: 0.9761495590209961
Begin KNN operations
End KNN operations, time taken: 1.235231876373291
Begin KNN operations
End KNN operations, time taken: 1.4430947303771973
Begin KNN operations
End KNN operations, time taken: 1.658827304840088
Begin KNN operations
End KNN operations, time taken: 1.9690873622894287
Begin KNN operations
End KNN operations, time taken: 2.136800527572632
Begin KNN operations


training:  48%|████▊     | 96/200 [24:04<25:55, 14.96s/it]

End KNN operations, time taken: 2.4064722061157227
training loss: 2.4811808586120607
End document, total time: 14.7568519115448
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47007012367248535
Begin KNN operations
End KNN operations, time taken: 0.7514452934265137
Begin KNN operations
End KNN operations, time taken: 0.9764204025268555
Begin KNN operations
End KNN operations, time taken: 1.236595869064331
Begin KNN operations
End KNN operations, time taken: 1.5080397129058838
Begin KNN operations
End KNN operations, time taken: 1.658846139907837
Begin KNN operations
End KNN operations, time taken: 1.9031400680541992
Begin KNN operations
End KNN operations, time taken: 2.1173548698425293
Begin KNN operations


training:  48%|████▊     | 97/200 [24:19<25:39, 14.94s/it]

End KNN operations, time taken: 2.4001526832580566
training loss: 2.4674472570419312
End document, total time: 14.735835075378418
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4972233772277832
Begin KNN operations
End KNN operations, time taken: 0.7862269878387451
Begin KNN operations
End KNN operations, time taken: 1.0086803436279297
Begin KNN operations
End KNN operations, time taken: 1.2004237174987793
Begin KNN operations
End KNN operations, time taken: 1.4785442352294922
Begin KNN operations
End KNN operations, time taken: 1.6373159885406494
Begin KNN operations
End KNN operations, time taken: 1.9012744426727295
Begin KNN operations
End KNN operations, time taken: 2.0988709926605225
Begin KNN operations


training:  49%|████▉     | 98/200 [24:34<25:23, 14.94s/it]

End KNN operations, time taken: 2.444739580154419
training loss: 2.526894474029541
End document, total time: 14.758423089981079
Begin document
Begin KNN operations
End KNN operations, time taken: 0.476900577545166
Begin KNN operations
End KNN operations, time taken: 0.7623569965362549
Begin KNN operations
End KNN operations, time taken: 0.9832375049591064
Begin KNN operations
End KNN operations, time taken: 1.2219336032867432
Begin KNN operations
End KNN operations, time taken: 1.4281811714172363
Begin KNN operations
End KNN operations, time taken: 1.7111566066741943
Begin KNN operations
End KNN operations, time taken: 1.9218769073486328
Begin KNN operations
End KNN operations, time taken: 2.214728593826294
Begin KNN operations


training:  50%|████▉     | 99/200 [24:49<25:11, 14.97s/it]

End KNN operations, time taken: 2.417328119277954
training loss: 2.442824172973633
End document, total time: 14.830383777618408
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48880624771118164
Begin KNN operations
End KNN operations, time taken: 0.7509138584136963
Begin KNN operations
End KNN operations, time taken: 0.9931952953338623
Begin KNN operations
End KNN operations, time taken: 1.245408058166504
Begin KNN operations
End KNN operations, time taken: 1.4740209579467773
Begin KNN operations
End KNN operations, time taken: 1.7178058624267578
Begin KNN operations
End KNN operations, time taken: 1.914421558380127
Begin KNN operations
End KNN operations, time taken: 2.140699863433838
Begin KNN operations


training:  50%|█████     | 100/200 [25:04<24:57, 14.97s/it]

End KNN operations, time taken: 2.396540641784668
training loss: 2.535315823554993
End document, total time: 14.815797328948975
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4852259159088135
Begin KNN operations
End KNN operations, time taken: 0.7567269802093506
Begin KNN operations
End KNN operations, time taken: 1.0283093452453613
Begin KNN operations
End KNN operations, time taken: 1.2616333961486816
Begin KNN operations
End KNN operations, time taken: 1.446686029434204
Begin KNN operations
End KNN operations, time taken: 1.6736257076263428
Begin KNN operations
End KNN operations, time taken: 1.950615644454956
Begin KNN operations
End KNN operations, time taken: 2.1581199169158936
Begin KNN operations
End KNN operations, time taken: 2.401674509048462
training loss: 2.440053391456604
End document, total time: 14.885385990142822
Begin KNN operations
End KNN operations, time taken: 0.4277188777923584
Begin KNN operations
End KNN operations, time taken: 0.6317911

training:  50%|█████     | 101/200 [25:33<31:25, 19.04s/it]

End KNN operations, time taken: 2.3555569648742676
valid loss: 2.4369598388671876
Begin document
Begin KNN operations
End KNN operations, time taken: 0.5045680999755859
Begin KNN operations
End KNN operations, time taken: 0.7499287128448486
Begin KNN operations
End KNN operations, time taken: 1.00726318359375
Begin KNN operations
End KNN operations, time taken: 1.2051723003387451
Begin KNN operations
End KNN operations, time taken: 1.4644033908843994
Begin KNN operations
End KNN operations, time taken: 1.669435739517212
Begin KNN operations
End KNN operations, time taken: 1.9418632984161377
Begin KNN operations
End KNN operations, time taken: 2.189811944961548
Begin KNN operations


training:  51%|█████     | 102/200 [25:48<29:05, 17.81s/it]

End KNN operations, time taken: 2.380985975265503
training loss: 2.409187126159668
End document, total time: 14.849442720413208
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47577762603759766
Begin KNN operations
End KNN operations, time taken: 0.755051851272583
Begin KNN operations
End KNN operations, time taken: 0.9919335842132568
Begin KNN operations
End KNN operations, time taken: 1.223465919494629
Begin KNN operations
End KNN operations, time taken: 1.4681470394134521
Begin KNN operations
End KNN operations, time taken: 1.7448980808258057
Begin KNN operations
End KNN operations, time taken: 1.898604393005371
Begin KNN operations
End KNN operations, time taken: 2.1327173709869385
Begin KNN operations


training:  52%|█████▏    | 103/200 [26:03<27:24, 16.96s/it]

End KNN operations, time taken: 2.3725717067718506
training loss: 2.4347650051116942
End document, total time: 14.785827398300171
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4811384677886963
Begin KNN operations
End KNN operations, time taken: 0.74283766746521
Begin KNN operations
End KNN operations, time taken: 1.014498233795166
Begin KNN operations
End KNN operations, time taken: 1.250821828842163
Begin KNN operations
End KNN operations, time taken: 1.4640302658081055
Begin KNN operations
End KNN operations, time taken: 1.676604986190796
Begin KNN operations
End KNN operations, time taken: 1.9074301719665527
Begin KNN operations
End KNN operations, time taken: 2.1358511447906494
Begin KNN operations


training:  52%|█████▏    | 104/200 [26:18<26:12, 16.38s/it]

End KNN operations, time taken: 2.422354221343994
training loss: 2.6652334213256834
End document, total time: 14.850351810455322
Begin document
Begin KNN operations
End KNN operations, time taken: 0.5011942386627197
Begin KNN operations
End KNN operations, time taken: 0.7543399333953857
Begin KNN operations
End KNN operations, time taken: 0.9939873218536377
Begin KNN operations
End KNN operations, time taken: 1.202859878540039
Begin KNN operations
End KNN operations, time taken: 1.4295103549957275
Begin KNN operations
End KNN operations, time taken: 1.6801753044128418
Begin KNN operations
End KNN operations, time taken: 1.887634038925171
Begin KNN operations
End KNN operations, time taken: 2.1668787002563477
Begin KNN operations


training:  52%|█████▎    | 105/200 [26:33<25:14, 15.94s/it]

End KNN operations, time taken: 2.43272066116333
training loss: 2.4429415225982667
End document, total time: 14.755014657974243
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47464680671691895
Begin KNN operations
End KNN operations, time taken: 0.7414791584014893
Begin KNN operations
End KNN operations, time taken: 0.9993994235992432
Begin KNN operations
End KNN operations, time taken: 1.1921823024749756
Begin KNN operations
End KNN operations, time taken: 1.4422738552093506
Begin KNN operations
End KNN operations, time taken: 1.662853479385376
Begin KNN operations
End KNN operations, time taken: 1.924957036972046
Begin KNN operations
End KNN operations, time taken: 2.1051015853881836
Begin KNN operations


training:  53%|█████▎    | 106/200 [26:48<24:27, 15.61s/it]

End KNN operations, time taken: 2.380603075027466
training loss: 2.5049993276596068
End document, total time: 14.637948751449585
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4793386459350586
Begin KNN operations
End KNN operations, time taken: 0.755385160446167
Begin KNN operations
End KNN operations, time taken: 0.9884142875671387
Begin KNN operations
End KNN operations, time taken: 1.216371774673462
Begin KNN operations
End KNN operations, time taken: 1.531996488571167
Begin KNN operations
End KNN operations, time taken: 1.6762409210205078
Begin KNN operations
End KNN operations, time taken: 1.9027941226959229
Begin KNN operations
End KNN operations, time taken: 2.144386053085327
Begin KNN operations


training:  54%|█████▎    | 107/200 [27:03<23:53, 15.42s/it]

End KNN operations, time taken: 2.3786513805389404
training loss: 2.4930582761764524
End document, total time: 14.794999122619629
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4739091396331787
Begin KNN operations
End KNN operations, time taken: 0.7603678703308105
Begin KNN operations
End KNN operations, time taken: 1.009087085723877
Begin KNN operations
End KNN operations, time taken: 1.2086763381958008
Begin KNN operations
End KNN operations, time taken: 1.4567975997924805
Begin KNN operations
End KNN operations, time taken: 1.6719176769256592
Begin KNN operations
End KNN operations, time taken: 1.9007749557495117
Begin KNN operations
End KNN operations, time taken: 2.147172689437866
Begin KNN operations


training:  54%|█████▍    | 108/200 [27:18<23:28, 15.31s/it]

End KNN operations, time taken: 2.484539747238159
training loss: 2.432080245018005
End document, total time: 14.856906175613403
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4803750514984131
Begin KNN operations
End KNN operations, time taken: 0.7476484775543213
Begin KNN operations
End KNN operations, time taken: 0.97739577293396
Begin KNN operations
End KNN operations, time taken: 1.2045025825500488
Begin KNN operations
End KNN operations, time taken: 1.447324514389038
Begin KNN operations
End KNN operations, time taken: 1.6749267578125
Begin KNN operations
End KNN operations, time taken: 1.9567766189575195
Begin KNN operations
End KNN operations, time taken: 2.3000969886779785
Begin KNN operations


training:  55%|█████▍    | 109/200 [27:33<23:07, 15.25s/it]

End KNN operations, time taken: 2.4276368618011475
training loss: 2.4561029195785524
End document, total time: 14.939300060272217
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4849867820739746
Begin KNN operations
End KNN operations, time taken: 0.7517640590667725
Begin KNN operations
End KNN operations, time taken: 0.9850435256958008
Begin KNN operations
End KNN operations, time taken: 1.2132747173309326
Begin KNN operations
End KNN operations, time taken: 1.4508326053619385
Begin KNN operations
End KNN operations, time taken: 1.7219278812408447
Begin KNN operations
End KNN operations, time taken: 1.9050688743591309
Begin KNN operations
End KNN operations, time taken: 2.112518787384033
Begin KNN operations


training:  55%|█████▌    | 110/200 [27:48<22:42, 15.14s/it]

End KNN operations, time taken: 2.363731622695923
training loss: 2.3830352783203126
End document, total time: 14.705094337463379
Begin document
Begin KNN operations
End KNN operations, time taken: 0.5001044273376465
Begin KNN operations
End KNN operations, time taken: 0.7497515678405762
Begin KNN operations
End KNN operations, time taken: 1.0214636325836182
Begin KNN operations
End KNN operations, time taken: 1.2611310482025146
Begin KNN operations
End KNN operations, time taken: 1.4573018550872803
Begin KNN operations
End KNN operations, time taken: 1.6791431903839111
Begin KNN operations
End KNN operations, time taken: 1.918607473373413
Begin KNN operations
End KNN operations, time taken: 2.089823007583618
Begin KNN operations


training:  56%|█████▌    | 111/200 [28:03<22:23, 15.09s/it]

End KNN operations, time taken: 2.3733484745025635
training loss: 2.4027997493743896
End document, total time: 14.774574041366577
Begin document
Begin KNN operations
End KNN operations, time taken: 0.49249982833862305
Begin KNN operations
End KNN operations, time taken: 0.7422544956207275
Begin KNN operations
End KNN operations, time taken: 0.9722058773040771
Begin KNN operations
End KNN operations, time taken: 1.2223641872406006
Begin KNN operations
End KNN operations, time taken: 1.4491682052612305
Begin KNN operations
End KNN operations, time taken: 1.673612117767334
Begin KNN operations
End KNN operations, time taken: 1.9216701984405518
Begin KNN operations
End KNN operations, time taken: 2.1869969367980957
Begin KNN operations


training:  56%|█████▌    | 112/200 [28:18<22:04, 15.05s/it]

End KNN operations, time taken: 2.3989145755767822
training loss: 2.5604620695114133
End document, total time: 14.782248258590698
Begin document
Begin KNN operations
End KNN operations, time taken: 0.46946024894714355
Begin KNN operations
End KNN operations, time taken: 0.7520546913146973
Begin KNN operations
End KNN operations, time taken: 0.9883167743682861
Begin KNN operations
End KNN operations, time taken: 1.2731554508209229
Begin KNN operations
End KNN operations, time taken: 1.474588394165039
Begin KNN operations
End KNN operations, time taken: 1.70145583152771
Begin KNN operations
End KNN operations, time taken: 2.0193119049072266
Begin KNN operations
End KNN operations, time taken: 2.1326487064361572
Begin KNN operations


training:  56%|█████▋    | 113/200 [28:33<21:51, 15.08s/it]

End KNN operations, time taken: 2.370041608810425
training loss: 2.4550796508789063
End document, total time: 14.9417142868042
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4854886531829834
Begin KNN operations
End KNN operations, time taken: 0.771334171295166
Begin KNN operations
End KNN operations, time taken: 0.9929828643798828
Begin KNN operations
End KNN operations, time taken: 1.2226674556732178
Begin KNN operations
End KNN operations, time taken: 1.5170783996582031
Begin KNN operations
End KNN operations, time taken: 1.652480125427246
Begin KNN operations
End KNN operations, time taken: 1.8879072666168213
Begin KNN operations
End KNN operations, time taken: 2.0944197177886963
Begin KNN operations


training:  57%|█████▋    | 114/200 [28:48<21:31, 15.01s/it]

End KNN operations, time taken: 2.368873357772827
training loss: 2.43939893245697
End document, total time: 14.682762384414673
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4943501949310303
Begin KNN operations
End KNN operations, time taken: 0.8218052387237549
Begin KNN operations
End KNN operations, time taken: 0.9984045028686523
Begin KNN operations
End KNN operations, time taken: 1.1986217498779297
Begin KNN operations
End KNN operations, time taken: 1.4489963054656982
Begin KNN operations
End KNN operations, time taken: 1.6597914695739746
Begin KNN operations
End KNN operations, time taken: 1.9086899757385254
Begin KNN operations
End KNN operations, time taken: 2.1637940406799316
Begin KNN operations


training:  57%|█████▊    | 115/200 [29:03<21:17, 15.03s/it]

End KNN operations, time taken: 2.430562973022461
training loss: 2.485150074958801
End document, total time: 14.868916273117065
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4921576976776123
Begin KNN operations
End KNN operations, time taken: 0.7565023899078369
Begin KNN operations
End KNN operations, time taken: 0.9829456806182861
Begin KNN operations
End KNN operations, time taken: 1.2165143489837646
Begin KNN operations
End KNN operations, time taken: 1.4600846767425537
Begin KNN operations
End KNN operations, time taken: 1.674659013748169
Begin KNN operations
End KNN operations, time taken: 1.92830491065979
Begin KNN operations
End KNN operations, time taken: 2.2081196308135986
Begin KNN operations


training:  58%|█████▊    | 116/200 [29:18<21:01, 15.02s/it]

End KNN operations, time taken: 2.3958632946014404
training loss: 2.4325634479522704
End document, total time: 14.822831869125366
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4819469451904297
Begin KNN operations
End KNN operations, time taken: 0.740398645401001
Begin KNN operations
End KNN operations, time taken: 0.9861576557159424
Begin KNN operations
End KNN operations, time taken: 1.2049798965454102
Begin KNN operations
End KNN operations, time taken: 1.4556002616882324
Begin KNN operations
End KNN operations, time taken: 1.7202672958374023
Begin KNN operations
End KNN operations, time taken: 1.9337084293365479
Begin KNN operations
End KNN operations, time taken: 2.1453020572662354
Begin KNN operations


training:  58%|█████▊    | 117/200 [29:33<20:44, 15.00s/it]

End KNN operations, time taken: 2.371356964111328
training loss: 2.4632086753845215
End document, total time: 14.765772581100464
Begin document
Begin KNN operations
End KNN operations, time taken: 0.485318660736084
Begin KNN operations
End KNN operations, time taken: 0.7563354969024658
Begin KNN operations
End KNN operations, time taken: 0.9937379360198975
Begin KNN operations
End KNN operations, time taken: 1.2779533863067627
Begin KNN operations
End KNN operations, time taken: 1.4444949626922607
Begin KNN operations
End KNN operations, time taken: 1.6785593032836914
Begin KNN operations
End KNN operations, time taken: 1.9655354022979736
Begin KNN operations
End KNN operations, time taken: 2.153592109680176
Begin KNN operations


training:  59%|█████▉    | 118/200 [29:48<20:31, 15.02s/it]

End KNN operations, time taken: 2.379467248916626
training loss: 2.440673279762268
End document, total time: 14.886446714401245
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4815535545349121
Begin KNN operations
End KNN operations, time taken: 0.7388386726379395
Begin KNN operations
End KNN operations, time taken: 0.9863848686218262
Begin KNN operations
End KNN operations, time taken: 1.22678542137146
Begin KNN operations
End KNN operations, time taken: 1.4787170886993408
Begin KNN operations
End KNN operations, time taken: 1.6845011711120605
Begin KNN operations
End KNN operations, time taken: 1.9066143035888672
Begin KNN operations
End KNN operations, time taken: 2.188184976577759
Begin KNN operations


training:  60%|█████▉    | 119/200 [30:03<20:15, 15.01s/it]

End KNN operations, time taken: 2.363415002822876
training loss: 2.4627794265747074
End document, total time: 14.784887552261353
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4772021770477295
Begin KNN operations
End KNN operations, time taken: 0.7458128929138184
Begin KNN operations
End KNN operations, time taken: 0.9655041694641113
Begin KNN operations
End KNN operations, time taken: 1.1983635425567627
Begin KNN operations
End KNN operations, time taken: 1.4462273120880127
Begin KNN operations
End KNN operations, time taken: 1.764589786529541
Begin KNN operations
End KNN operations, time taken: 1.9209399223327637
Begin KNN operations
End KNN operations, time taken: 2.1710522174835205
Begin KNN operations


training:  60%|██████    | 120/200 [30:18<19:59, 14.99s/it]

End KNN operations, time taken: 2.3625001907348633
training loss: 2.428338527679444
End document, total time: 14.769282817840576
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4726741313934326
Begin KNN operations
End KNN operations, time taken: 0.7386107444763184
Begin KNN operations
End KNN operations, time taken: 1.010319709777832
Begin KNN operations
End KNN operations, time taken: 1.2102036476135254
Begin KNN operations
End KNN operations, time taken: 1.4898145198822021
Begin KNN operations
End KNN operations, time taken: 1.662956953048706
Begin KNN operations
End KNN operations, time taken: 1.8797385692596436
Begin KNN operations
End KNN operations, time taken: 2.1535332202911377
Begin KNN operations


training:  60%|██████    | 121/200 [30:32<19:41, 14.96s/it]

End KNN operations, time taken: 2.390233278274536
training loss: 2.4195380210876465
End document, total time: 14.713079452514648
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47112226486206055
Begin KNN operations
End KNN operations, time taken: 0.7678544521331787
Begin KNN operations
End KNN operations, time taken: 1.0013504028320312
Begin KNN operations
End KNN operations, time taken: 1.2629311084747314
Begin KNN operations
End KNN operations, time taken: 1.4748213291168213
Begin KNN operations
End KNN operations, time taken: 1.692138671875
Begin KNN operations
End KNN operations, time taken: 1.959545373916626
Begin KNN operations
End KNN operations, time taken: 2.119903802871704
Begin KNN operations


training:  61%|██████    | 122/200 [30:48<19:29, 15.00s/it]

End KNN operations, time taken: 2.44974684715271
training loss: 2.4866338491439817
End document, total time: 14.913536548614502
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4852101802825928
Begin KNN operations
End KNN operations, time taken: 0.7522933483123779
Begin KNN operations
End KNN operations, time taken: 0.9719839096069336
Begin KNN operations
End KNN operations, time taken: 1.1932237148284912
Begin KNN operations
End KNN operations, time taken: 1.4491524696350098
Begin KNN operations
End KNN operations, time taken: 1.686173915863037
Begin KNN operations
End KNN operations, time taken: 1.895684003829956
Begin KNN operations
End KNN operations, time taken: 2.176408052444458
Begin KNN operations


training:  62%|██████▏   | 123/200 [31:02<19:11, 14.96s/it]

End KNN operations, time taken: 2.344111204147339
training loss: 2.427736496925354
End document, total time: 14.682348489761353
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4816105365753174
Begin KNN operations
End KNN operations, time taken: 0.7448389530181885
Begin KNN operations
End KNN operations, time taken: 0.9671216011047363
Begin KNN operations
End KNN operations, time taken: 1.2096729278564453
Begin KNN operations
End KNN operations, time taken: 1.4313762187957764
Begin KNN operations
End KNN operations, time taken: 1.6942028999328613
Begin KNN operations
End KNN operations, time taken: 1.899322509765625
Begin KNN operations
End KNN operations, time taken: 2.143131971359253
Begin KNN operations


training:  62%|██████▏   | 124/200 [31:17<18:54, 14.93s/it]

End KNN operations, time taken: 2.3819453716278076
training loss: 2.5011828660964963
End document, total time: 14.67779541015625
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4793071746826172
Begin KNN operations
End KNN operations, time taken: 0.744431734085083
Begin KNN operations
End KNN operations, time taken: 0.9893295764923096
Begin KNN operations
End KNN operations, time taken: 1.2746200561523438
Begin KNN operations
End KNN operations, time taken: 1.448009967803955
Begin KNN operations
End KNN operations, time taken: 1.6711757183074951
Begin KNN operations
End KNN operations, time taken: 1.8895604610443115
Begin KNN operations
End KNN operations, time taken: 2.163437843322754
Begin KNN operations


training:  62%|██████▎   | 125/200 [31:32<18:42, 14.96s/it]

End KNN operations, time taken: 2.4841561317443848
training loss: 2.4568708419799807
End document, total time: 14.872765064239502
Begin document
Begin KNN operations
End KNN operations, time taken: 0.49338269233703613
Begin KNN operations
End KNN operations, time taken: 0.7495369911193848
Begin KNN operations
End KNN operations, time taken: 0.9838554859161377
Begin KNN operations
End KNN operations, time taken: 1.2411067485809326
Begin KNN operations
End KNN operations, time taken: 1.4266462326049805
Begin KNN operations
End KNN operations, time taken: 1.633098840713501
Begin KNN operations
End KNN operations, time taken: 1.8902275562286377
Begin KNN operations
End KNN operations, time taken: 2.1657862663269043
Begin KNN operations


training:  63%|██████▎   | 126/200 [31:47<18:25, 14.94s/it]

End KNN operations, time taken: 2.4127540588378906
training loss: 2.432656121253967
End document, total time: 14.702885150909424
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4834456443786621
Begin KNN operations
End KNN operations, time taken: 0.750023603439331
Begin KNN operations
End KNN operations, time taken: 0.9856810569763184
Begin KNN operations
End KNN operations, time taken: 1.2430782318115234
Begin KNN operations
End KNN operations, time taken: 1.4511780738830566
Begin KNN operations
End KNN operations, time taken: 1.6509590148925781
Begin KNN operations
End KNN operations, time taken: 1.993642807006836
Begin KNN operations
End KNN operations, time taken: 2.118161201477051
Begin KNN operations


training:  64%|██████▎   | 127/200 [32:02<18:12, 14.96s/it]

End KNN operations, time taken: 2.402989387512207
training loss: 2.3937404870986936
End document, total time: 14.83591341972351
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4897613525390625
Begin KNN operations
End KNN operations, time taken: 0.7381970882415771
Begin KNN operations
End KNN operations, time taken: 0.98679518699646
Begin KNN operations
End KNN operations, time taken: 1.202974796295166
Begin KNN operations
End KNN operations, time taken: 1.490168571472168
Begin KNN operations
End KNN operations, time taken: 1.6524951457977295
Begin KNN operations
End KNN operations, time taken: 1.8719422817230225
Begin KNN operations
End KNN operations, time taken: 2.1294639110565186
Begin KNN operations


training:  64%|██████▍   | 128/200 [32:17<17:54, 14.92s/it]

End KNN operations, time taken: 2.357837200164795
training loss: 2.428193521499634
End document, total time: 14.646480560302734
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48375940322875977
Begin KNN operations
End KNN operations, time taken: 0.786684513092041
Begin KNN operations
End KNN operations, time taken: 1.0003869533538818
Begin KNN operations
End KNN operations, time taken: 1.1958568096160889
Begin KNN operations
End KNN operations, time taken: 1.4419965744018555
Begin KNN operations
End KNN operations, time taken: 1.6643691062927246
Begin KNN operations
End KNN operations, time taken: 1.8767008781433105
Begin KNN operations
End KNN operations, time taken: 2.1269490718841553
Begin KNN operations


training:  64%|██████▍   | 129/200 [32:32<17:40, 14.93s/it]

End KNN operations, time taken: 2.4889776706695557
training loss: 2.518852877616882
End document, total time: 14.777384042739868
Begin document
Begin KNN operations
End KNN operations, time taken: 0.46874260902404785
Begin KNN operations
End KNN operations, time taken: 0.748436450958252
Begin KNN operations
End KNN operations, time taken: 0.9734935760498047
Begin KNN operations
End KNN operations, time taken: 1.2266368865966797
Begin KNN operations
End KNN operations, time taken: 1.4372575283050537
Begin KNN operations
End KNN operations, time taken: 1.6465342044830322
Begin KNN operations
End KNN operations, time taken: 1.8518400192260742
Begin KNN operations
End KNN operations, time taken: 2.1803996562957764
Begin KNN operations


training:  65%|██████▌   | 130/200 [32:47<17:22, 14.89s/it]

End KNN operations, time taken: 2.3580267429351807
training loss: 2.396738481521606
End document, total time: 14.61776328086853
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4806809425354004
Begin KNN operations
End KNN operations, time taken: 0.7418522834777832
Begin KNN operations
End KNN operations, time taken: 0.9700930118560791
Begin KNN operations
End KNN operations, time taken: 1.2053813934326172
Begin KNN operations
End KNN operations, time taken: 1.4615166187286377
Begin KNN operations
End KNN operations, time taken: 1.7043311595916748
Begin KNN operations
End KNN operations, time taken: 1.8980622291564941
Begin KNN operations
End KNN operations, time taken: 2.101118803024292
Begin KNN operations


training:  66%|██████▌   | 131/200 [33:02<17:06, 14.87s/it]

End KNN operations, time taken: 2.377603054046631
training loss: 2.47064778804779
End document, total time: 14.64644479751587
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4827299118041992
Begin KNN operations
End KNN operations, time taken: 0.7404534816741943
Begin KNN operations
End KNN operations, time taken: 0.9753270149230957
Begin KNN operations
End KNN operations, time taken: 1.258988618850708
Begin KNN operations
End KNN operations, time taken: 1.4484574794769287
Begin KNN operations
End KNN operations, time taken: 1.6473596096038818
Begin KNN operations
End KNN operations, time taken: 1.8675205707550049
Begin KNN operations
End KNN operations, time taken: 2.131556510925293
Begin KNN operations


training:  66%|██████▌   | 132/200 [33:16<16:49, 14.84s/it]

End KNN operations, time taken: 2.32680606842041
training loss: 2.412659931182861
End document, total time: 14.607059955596924
Begin document
Begin KNN operations
End KNN operations, time taken: 0.473691463470459
Begin KNN operations
End KNN operations, time taken: 0.7873721122741699
Begin KNN operations
End KNN operations, time taken: 0.9895682334899902
Begin KNN operations
End KNN operations, time taken: 1.2024986743927002
Begin KNN operations
End KNN operations, time taken: 1.4352524280548096
Begin KNN operations
End KNN operations, time taken: 1.6475274562835693
Begin KNN operations
End KNN operations, time taken: 1.9000217914581299
Begin KNN operations
End KNN operations, time taken: 2.125429630279541
Begin KNN operations


training:  66%|██████▋   | 133/200 [33:31<16:35, 14.86s/it]

End KNN operations, time taken: 2.436379909515381
training loss: 2.4916328430175785
End document, total time: 14.705511808395386
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48430347442626953
Begin KNN operations
End KNN operations, time taken: 0.7373864650726318
Begin KNN operations
End KNN operations, time taken: 0.9669344425201416
Begin KNN operations
End KNN operations, time taken: 1.2083323001861572
Begin KNN operations
End KNN operations, time taken: 1.436211347579956
Begin KNN operations
End KNN operations, time taken: 1.6704888343811035
Begin KNN operations
End KNN operations, time taken: 1.9263992309570312
Begin KNN operations
End KNN operations, time taken: 2.132408618927002
Begin KNN operations


training:  67%|██████▋   | 134/200 [33:46<16:19, 14.85s/it]

End KNN operations, time taken: 2.363961696624756
training loss: 2.463526105880737
End document, total time: 14.619823217391968
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4868957996368408
Begin KNN operations
End KNN operations, time taken: 0.7601380348205566
Begin KNN operations
End KNN operations, time taken: 0.9724602699279785
Begin KNN operations
End KNN operations, time taken: 1.1866655349731445
Begin KNN operations
End KNN operations, time taken: 1.4486920833587646
Begin KNN operations
End KNN operations, time taken: 1.7218375205993652
Begin KNN operations
End KNN operations, time taken: 1.895387887954712
Begin KNN operations
End KNN operations, time taken: 2.1453301906585693
Begin KNN operations


training:  68%|██████▊   | 135/200 [34:01<16:04, 14.84s/it]

End KNN operations, time taken: 2.3361661434173584
training loss: 2.447981524467468
End document, total time: 14.65674376487732
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4751927852630615
Begin KNN operations
End KNN operations, time taken: 0.7799596786499023
Begin KNN operations
End KNN operations, time taken: 1.022296667098999
Begin KNN operations
End KNN operations, time taken: 1.2383403778076172
Begin KNN operations
End KNN operations, time taken: 1.4316904544830322
Begin KNN operations
End KNN operations, time taken: 1.656416893005371
Begin KNN operations
End KNN operations, time taken: 1.8656790256500244
Begin KNN operations
End KNN operations, time taken: 2.1948273181915283
Begin KNN operations


training:  68%|██████▊   | 136/200 [34:16<15:52, 14.88s/it]

End KNN operations, time taken: 2.397704601287842
training loss: 2.565702819824219
End document, total time: 14.79352617263794
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4748106002807617
Begin KNN operations
End KNN operations, time taken: 0.7453892230987549
Begin KNN operations
End KNN operations, time taken: 0.9824364185333252
Begin KNN operations
End KNN operations, time taken: 1.2183306217193604
Begin KNN operations
End KNN operations, time taken: 1.4108893871307373
Begin KNN operations
End KNN operations, time taken: 1.6666455268859863
Begin KNN operations
End KNN operations, time taken: 1.895658254623413
Begin KNN operations
End KNN operations, time taken: 2.1726343631744385
Begin KNN operations


training:  68%|██████▊   | 137/200 [34:31<15:37, 14.88s/it]

End KNN operations, time taken: 2.3820767402648926
training loss: 2.3712112665176392
End document, total time: 14.686437368392944
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47903943061828613
Begin KNN operations
End KNN operations, time taken: 0.7695419788360596
Begin KNN operations
End KNN operations, time taken: 0.9562814235687256
Begin KNN operations
End KNN operations, time taken: 1.208012580871582
Begin KNN operations
End KNN operations, time taken: 1.4744369983673096
Begin KNN operations
End KNN operations, time taken: 1.7098453044891357
Begin KNN operations
End KNN operations, time taken: 1.9700648784637451
Begin KNN operations
End KNN operations, time taken: 2.139573574066162
Begin KNN operations


training:  69%|██████▉   | 138/200 [34:46<15:24, 14.92s/it]

End KNN operations, time taken: 2.3745784759521484
training loss: 2.4076129198074336
End document, total time: 14.818456649780273
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4754610061645508
Begin KNN operations
End KNN operations, time taken: 0.748281717300415
Begin KNN operations
End KNN operations, time taken: 0.9834816455841064
Begin KNN operations
End KNN operations, time taken: 1.2125914096832275
Begin KNN operations
End KNN operations, time taken: 1.4882595539093018
Begin KNN operations
End KNN operations, time taken: 1.6493291854858398
Begin KNN operations
End KNN operations, time taken: 1.885077953338623
Begin KNN operations
End KNN operations, time taken: 2.1126461029052734
Begin KNN operations


training:  70%|██████▉   | 139/200 [35:01<15:07, 14.87s/it]

End KNN operations, time taken: 2.3334953784942627
training loss: 2.4063446998596194
End document, total time: 14.58866024017334
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4793670177459717
Begin KNN operations
End KNN operations, time taken: 0.7551579475402832
Begin KNN operations
End KNN operations, time taken: 1.0252904891967773
Begin KNN operations
End KNN operations, time taken: 1.20115327835083
Begin KNN operations
End KNN operations, time taken: 1.4345388412475586
Begin KNN operations
End KNN operations, time taken: 1.647745132446289
Begin KNN operations
End KNN operations, time taken: 1.8676633834838867
Begin KNN operations
End KNN operations, time taken: 2.090959072113037
Begin KNN operations


training:  70%|███████   | 140/200 [35:15<14:52, 14.87s/it]

End KNN operations, time taken: 2.4694786071777344
training loss: 2.4452295541763305
End document, total time: 14.678569078445435
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47990965843200684
Begin KNN operations
End KNN operations, time taken: 0.7370789051055908
Begin KNN operations
End KNN operations, time taken: 0.9721946716308594
Begin KNN operations
End KNN operations, time taken: 1.1836848258972168
Begin KNN operations
End KNN operations, time taken: 1.44722318649292
Begin KNN operations
End KNN operations, time taken: 1.6498980522155762
Begin KNN operations
End KNN operations, time taken: 1.892388105392456
Begin KNN operations
End KNN operations, time taken: 2.147860050201416
Begin KNN operations


training:  70%|███████   | 141/200 [35:30<14:35, 14.84s/it]

End KNN operations, time taken: 2.32706618309021
training loss: 2.407580614089966
End document, total time: 14.564261674880981
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47922515869140625
Begin KNN operations
End KNN operations, time taken: 0.7386200428009033
Begin KNN operations
End KNN operations, time taken: 1.038360834121704
Begin KNN operations
End KNN operations, time taken: 1.1830387115478516
Begin KNN operations
End KNN operations, time taken: 1.3991622924804688
Begin KNN operations
End KNN operations, time taken: 1.6899404525756836
Begin KNN operations
End KNN operations, time taken: 1.862443447113037
Begin KNN operations
End KNN operations, time taken: 2.092245101928711
Begin KNN operations


training:  71%|███████   | 142/200 [35:45<14:19, 14.82s/it]

End KNN operations, time taken: 2.377941846847534
training loss: 2.483642506599426
End document, total time: 14.603607892990112
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47589588165283203
Begin KNN operations
End KNN operations, time taken: 0.7650480270385742
Begin KNN operations
End KNN operations, time taken: 0.9880118370056152
Begin KNN operations
End KNN operations, time taken: 1.2412405014038086
Begin KNN operations
End KNN operations, time taken: 1.4302799701690674
Begin KNN operations
End KNN operations, time taken: 1.6537203788757324
Begin KNN operations
End KNN operations, time taken: 1.969944953918457
Begin KNN operations
End KNN operations, time taken: 2.137004852294922
Begin KNN operations


training:  72%|███████▏  | 143/200 [36:00<14:06, 14.86s/it]

End KNN operations, time taken: 2.3660900592803955
training loss: 2.4174384117126464
End document, total time: 14.757503747940063
Begin document
Begin KNN operations
End KNN operations, time taken: 0.486147403717041
Begin KNN operations
End KNN operations, time taken: 0.7462027072906494
Begin KNN operations
End KNN operations, time taken: 0.9841244220733643
Begin KNN operations
End KNN operations, time taken: 1.2415401935577393
Begin KNN operations
End KNN operations, time taken: 1.4374139308929443
Begin KNN operations
End KNN operations, time taken: 1.6670928001403809
Begin KNN operations
End KNN operations, time taken: 1.898876428604126
Begin KNN operations
End KNN operations, time taken: 2.134093999862671
Begin KNN operations


training:  72%|███████▏  | 144/200 [36:15<13:52, 14.86s/it]

End KNN operations, time taken: 2.4082062244415283
training loss: 2.3873800754547116
End document, total time: 14.688004732131958
Begin document
Begin KNN operations
End KNN operations, time taken: 0.46868109703063965
Begin KNN operations
End KNN operations, time taken: 0.7481489181518555
Begin KNN operations
End KNN operations, time taken: 0.9908411502838135
Begin KNN operations
End KNN operations, time taken: 1.1871955394744873
Begin KNN operations
End KNN operations, time taken: 1.4233031272888184
Begin KNN operations
End KNN operations, time taken: 1.656891107559204
Begin KNN operations
End KNN operations, time taken: 1.941403865814209
Begin KNN operations
End KNN operations, time taken: 2.124816417694092
Begin KNN operations


training:  72%|███████▎  | 145/200 [36:30<13:37, 14.86s/it]

End KNN operations, time taken: 2.3603782653808594
training loss: 2.3848864078521728
End document, total time: 14.659026861190796
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4795231819152832
Begin KNN operations
End KNN operations, time taken: 0.7443497180938721
Begin KNN operations
End KNN operations, time taken: 0.9744627475738525
Begin KNN operations
End KNN operations, time taken: 1.2320587635040283
Begin KNN operations
End KNN operations, time taken: 1.4948527812957764
Begin KNN operations
End KNN operations, time taken: 1.74021577835083
Begin KNN operations
End KNN operations, time taken: 1.8601999282836914
Begin KNN operations
End KNN operations, time taken: 2.093977451324463
Begin KNN operations


training:  73%|███████▎  | 146/200 [36:44<13:22, 14.86s/it]

End KNN operations, time taken: 2.342506170272827
training loss: 2.388828635215759
End document, total time: 14.675872802734375
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47386956214904785
Begin KNN operations
End KNN operations, time taken: 0.7643105983734131
Begin KNN operations
End KNN operations, time taken: 0.9910299777984619
Begin KNN operations
End KNN operations, time taken: 1.2074615955352783
Begin KNN operations
End KNN operations, time taken: 1.419813871383667
Begin KNN operations
End KNN operations, time taken: 1.6541180610656738
Begin KNN operations
End KNN operations, time taken: 1.9228711128234863
Begin KNN operations
End KNN operations, time taken: 2.1411807537078857
Begin KNN operations


training:  74%|███████▎  | 147/200 [36:59<13:07, 14.86s/it]

End KNN operations, time taken: 2.3802199363708496
training loss: 2.3985049486160275
End document, total time: 14.666146039962769
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47608017921447754
Begin KNN operations
End KNN operations, time taken: 0.7452242374420166
Begin KNN operations
End KNN operations, time taken: 0.9688980579376221
Begin KNN operations
End KNN operations, time taken: 1.188694953918457
Begin KNN operations
End KNN operations, time taken: 1.4677300453186035
Begin KNN operations
End KNN operations, time taken: 1.6584558486938477
Begin KNN operations
End KNN operations, time taken: 1.9000732898712158
Begin KNN operations
End KNN operations, time taken: 2.1774539947509766
Begin KNN operations


training:  74%|███████▍  | 148/200 [37:14<12:52, 14.86s/it]

End KNN operations, time taken: 2.3699018955230713
training loss: 2.3899832010269164
End document, total time: 14.66923189163208
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4837307929992676
Begin KNN operations
End KNN operations, time taken: 0.747593879699707
Begin KNN operations
End KNN operations, time taken: 0.9889736175537109
Begin KNN operations
End KNN operations, time taken: 1.19976806640625
Begin KNN operations
End KNN operations, time taken: 1.4089276790618896
Begin KNN operations
End KNN operations, time taken: 1.7145426273345947
Begin KNN operations
End KNN operations, time taken: 1.9281589984893799
Begin KNN operations
End KNN operations, time taken: 2.1483752727508545
Begin KNN operations


training:  74%|███████▍  | 149/200 [37:29<12:37, 14.86s/it]

End KNN operations, time taken: 2.3486478328704834
training loss: 2.4139554977416995
End document, total time: 14.693651676177979
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4675755500793457
Begin KNN operations
End KNN operations, time taken: 0.7395918369293213
Begin KNN operations
End KNN operations, time taken: 0.9707059860229492
Begin KNN operations
End KNN operations, time taken: 1.2446537017822266
Begin KNN operations
End KNN operations, time taken: 1.504953145980835
Begin KNN operations
End KNN operations, time taken: 1.667140007019043
Begin KNN operations
End KNN operations, time taken: 1.8701303005218506
Begin KNN operations
End KNN operations, time taken: 2.0989630222320557
Begin KNN operations


training:  75%|███████▌  | 150/200 [37:44<12:22, 14.84s/it]

End KNN operations, time taken: 2.3128325939178467
training loss: 2.5014663219451907
End document, total time: 14.6157705783844
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4926302433013916
Begin KNN operations
End KNN operations, time taken: 0.7816822528839111
Begin KNN operations
End KNN operations, time taken: 0.9819612503051758
Begin KNN operations
End KNN operations, time taken: 1.199267864227295
Begin KNN operations
End KNN operations, time taken: 1.3932342529296875
Begin KNN operations
End KNN operations, time taken: 1.6272222995758057
Begin KNN operations
End KNN operations, time taken: 1.8573317527770996
Begin KNN operations
End KNN operations, time taken: 2.0859522819519043
Begin KNN operations


training:  76%|███████▌  | 151/200 [37:59<12:05, 14.81s/it]

End KNN operations, time taken: 2.425490617752075
training loss: 2.4604860305786134
End document, total time: 14.560389041900635
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4825103282928467
Begin KNN operations
End KNN operations, time taken: 0.7521376609802246
Begin KNN operations
End KNN operations, time taken: 0.966801643371582
Begin KNN operations
End KNN operations, time taken: 1.214573621749878
Begin KNN operations
End KNN operations, time taken: 1.4263055324554443
Begin KNN operations
End KNN operations, time taken: 1.676576852798462
Begin KNN operations
End KNN operations, time taken: 1.9150938987731934
Begin KNN operations
End KNN operations, time taken: 2.1541943550109863
Begin KNN operations


training:  76%|███████▌  | 152/200 [38:13<11:51, 14.82s/it]

End KNN operations, time taken: 2.3276634216308594
training loss: 2.375802326202393
End document, total time: 14.63212513923645
Begin document
Begin KNN operations
End KNN operations, time taken: 0.5100173950195312
Begin KNN operations
End KNN operations, time taken: 0.7895247936248779
Begin KNN operations
End KNN operations, time taken: 0.9876017570495605
Begin KNN operations
End KNN operations, time taken: 1.2125182151794434
Begin KNN operations
End KNN operations, time taken: 1.434650182723999
Begin KNN operations
End KNN operations, time taken: 1.6989421844482422
Begin KNN operations
End KNN operations, time taken: 1.9047675132751465
Begin KNN operations
End KNN operations, time taken: 2.0913469791412354
Begin KNN operations


training:  76%|███████▋  | 153/200 [38:28<11:38, 14.85s/it]

End KNN operations, time taken: 2.3805129528045654
training loss: 2.4342401981353765
End document, total time: 14.74928593635559
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47413158416748047
Begin KNN operations
End KNN operations, time taken: 0.7516460418701172
Begin KNN operations
End KNN operations, time taken: 1.0137066841125488
Begin KNN operations
End KNN operations, time taken: 1.2747840881347656
Begin KNN operations
End KNN operations, time taken: 1.4380438327789307
Begin KNN operations
End KNN operations, time taken: 1.7088468074798584
Begin KNN operations
End KNN operations, time taken: 1.9672622680664062
Begin KNN operations
End KNN operations, time taken: 2.0798094272613525
Begin KNN operations


training:  77%|███████▋  | 154/200 [38:43<11:26, 14.92s/it]

End KNN operations, time taken: 2.4063632488250732
training loss: 2.4688193798065186
End document, total time: 14.881106615066528
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48352956771850586
Begin KNN operations
End KNN operations, time taken: 0.7417008876800537
Begin KNN operations
End KNN operations, time taken: 0.9733119010925293
Begin KNN operations
End KNN operations, time taken: 1.273914098739624
Begin KNN operations
End KNN operations, time taken: 1.4816341400146484
Begin KNN operations
End KNN operations, time taken: 1.6673035621643066
Begin KNN operations
End KNN operations, time taken: 1.8933954238891602
Begin KNN operations
End KNN operations, time taken: 2.182614803314209
Begin KNN operations


training:  78%|███████▊  | 155/200 [38:58<11:12, 14.94s/it]

End KNN operations, time taken: 2.372025966644287
training loss: 2.4255188465118405
End document, total time: 14.783387422561646
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4710555076599121
Begin KNN operations
End KNN operations, time taken: 0.7488691806793213
Begin KNN operations
End KNN operations, time taken: 0.9597430229187012
Begin KNN operations
End KNN operations, time taken: 1.201068639755249
Begin KNN operations
End KNN operations, time taken: 1.4129581451416016
Begin KNN operations
End KNN operations, time taken: 1.7111053466796875
Begin KNN operations
End KNN operations, time taken: 1.921602725982666
Begin KNN operations
End KNN operations, time taken: 2.093860626220703
Begin KNN operations


training:  78%|███████▊  | 156/200 [39:13<10:54, 14.88s/it]

End KNN operations, time taken: 2.3517801761627197
training loss: 2.496750569343567
End document, total time: 14.563497304916382
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4679524898529053
Begin KNN operations
End KNN operations, time taken: 0.7517707347869873
Begin KNN operations
End KNN operations, time taken: 0.962329626083374
Begin KNN operations
End KNN operations, time taken: 1.1885266304016113
Begin KNN operations
End KNN operations, time taken: 1.4644253253936768
Begin KNN operations
End KNN operations, time taken: 1.65277099609375
Begin KNN operations
End KNN operations, time taken: 1.8645014762878418
Begin KNN operations
End KNN operations, time taken: 2.0968942642211914
Begin KNN operations


training:  78%|███████▊  | 157/200 [39:28<10:36, 14.81s/it]

End KNN operations, time taken: 2.322021007537842
training loss: 2.3820565223693846
End document, total time: 14.471640348434448
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4707345962524414
Begin KNN operations
End KNN operations, time taken: 0.7548933029174805
Begin KNN operations
End KNN operations, time taken: 0.9894640445709229
Begin KNN operations
End KNN operations, time taken: 1.1865763664245605
Begin KNN operations
End KNN operations, time taken: 1.4365954399108887
Begin KNN operations
End KNN operations, time taken: 1.7034542560577393
Begin KNN operations
End KNN operations, time taken: 1.961618423461914
Begin KNN operations
End KNN operations, time taken: 2.1416497230529785
Begin KNN operations


training:  79%|███████▉  | 158/200 [39:43<10:26, 14.92s/it]

End KNN operations, time taken: 2.612401247024536
training loss: 2.3851552724838254
End document, total time: 15.009032964706421
Begin document
Begin KNN operations
End KNN operations, time taken: 0.5099802017211914
Begin KNN operations
End KNN operations, time taken: 0.8536992073059082
Begin KNN operations
End KNN operations, time taken: 1.0763652324676514
Begin KNN operations
End KNN operations, time taken: 1.3537828922271729
Begin KNN operations
End KNN operations, time taken: 1.6580300331115723
Begin KNN operations
End KNN operations, time taken: 1.9321954250335693
Begin KNN operations
End KNN operations, time taken: 2.226323127746582
Begin KNN operations
End KNN operations, time taken: 2.4008119106292725
Begin KNN operations


training:  80%|███████▉  | 159/200 [40:00<10:32, 15.43s/it]

End KNN operations, time taken: 2.6571083068847656
training loss: 2.6031097412109374
End document, total time: 16.438806772232056
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4915616512298584
Begin KNN operations
End KNN operations, time taken: 0.769399881362915
Begin KNN operations
End KNN operations, time taken: 1.0433287620544434
Begin KNN operations
End KNN operations, time taken: 1.2799928188323975
Begin KNN operations
End KNN operations, time taken: 1.6145882606506348
Begin KNN operations
End KNN operations, time taken: 1.7554502487182617
Begin KNN operations
End KNN operations, time taken: 1.9589340686798096
Begin KNN operations
End KNN operations, time taken: 2.2445902824401855
Begin KNN operations


training:  80%|████████  | 160/200 [40:15<10:17, 15.44s/it]

End KNN operations, time taken: 2.3994808197021484
training loss: 2.493677854537964
End document, total time: 15.281883478164673
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48687267303466797
Begin KNN operations
End KNN operations, time taken: 0.7829005718231201
Begin KNN operations
End KNN operations, time taken: 1.0380394458770752
Begin KNN operations
End KNN operations, time taken: 1.202610969543457
Begin KNN operations
End KNN operations, time taken: 1.438857078552246
Begin KNN operations
End KNN operations, time taken: 1.6651253700256348
Begin KNN operations
End KNN operations, time taken: 1.8751769065856934
Begin KNN operations
End KNN operations, time taken: 2.1201741695404053
Begin KNN operations


training:  80%|████████  | 161/200 [40:30<09:57, 15.31s/it]

End KNN operations, time taken: 2.5096957683563232
training loss: 2.3673330307006837
End document, total time: 14.827304124832153
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4815051555633545
Begin KNN operations
End KNN operations, time taken: 0.7452266216278076
Begin KNN operations
End KNN operations, time taken: 1.0168609619140625
Begin KNN operations
End KNN operations, time taken: 1.1988856792449951
Begin KNN operations
End KNN operations, time taken: 1.4426047801971436
Begin KNN operations
End KNN operations, time taken: 1.7479376792907715
Begin KNN operations
End KNN operations, time taken: 1.9149236679077148
Begin KNN operations
End KNN operations, time taken: 2.1472091674804688
Begin KNN operations


training:  81%|████████  | 162/200 [40:45<09:37, 15.21s/it]

End KNN operations, time taken: 2.3780500888824463
training loss: 2.4083000898361204
End document, total time: 14.783788681030273
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48186254501342773
Begin KNN operations
End KNN operations, time taken: 0.7424678802490234
Begin KNN operations
End KNN operations, time taken: 0.9691317081451416
Begin KNN operations
End KNN operations, time taken: 1.2066781520843506
Begin KNN operations
End KNN operations, time taken: 1.4301505088806152
Begin KNN operations
End KNN operations, time taken: 1.7471916675567627
Begin KNN operations
End KNN operations, time taken: 1.888408899307251
Begin KNN operations
End KNN operations, time taken: 2.1825554370880127
Begin KNN operations


training:  82%|████████▏ | 163/200 [41:00<09:18, 15.11s/it]

End KNN operations, time taken: 2.334460973739624
training loss: 2.4302654027938844
End document, total time: 14.694174528121948
Begin document
Begin KNN operations
End KNN operations, time taken: 0.49190664291381836
Begin KNN operations
End KNN operations, time taken: 0.7319080829620361
Begin KNN operations
End KNN operations, time taken: 0.9800536632537842
Begin KNN operations
End KNN operations, time taken: 1.2342071533203125
Begin KNN operations
End KNN operations, time taken: 1.4386541843414307
Begin KNN operations
End KNN operations, time taken: 1.6660380363464355
Begin KNN operations
End KNN operations, time taken: 1.8624792098999023
Begin KNN operations
End KNN operations, time taken: 2.0895626544952393
Begin KNN operations


training:  82%|████████▏ | 164/200 [41:15<08:59, 15.00s/it]

End KNN operations, time taken: 2.342637777328491
training loss: 2.417797803878784
End document, total time: 14.569597482681274
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4977073669433594
Begin KNN operations
End KNN operations, time taken: 0.7708849906921387
Begin KNN operations
End KNN operations, time taken: 0.9885990619659424
Begin KNN operations
End KNN operations, time taken: 1.1912376880645752
Begin KNN operations
End KNN operations, time taken: 1.4174821376800537
Begin KNN operations
End KNN operations, time taken: 1.6485881805419922
Begin KNN operations
End KNN operations, time taken: 1.8705687522888184
Begin KNN operations
End KNN operations, time taken: 2.1245765686035156
Begin KNN operations


training:  82%|████████▎ | 165/200 [41:29<08:43, 14.94s/it]

End KNN operations, time taken: 2.416954755783081
training loss: 2.3553969144821165
End document, total time: 14.62838625907898
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48126792907714844
Begin KNN operations
End KNN operations, time taken: 0.7423617839813232
Begin KNN operations
End KNN operations, time taken: 0.9842920303344727
Begin KNN operations
End KNN operations, time taken: 1.1982309818267822
Begin KNN operations
End KNN operations, time taken: 1.4487171173095703
Begin KNN operations
End KNN operations, time taken: 1.6539857387542725
Begin KNN operations
End KNN operations, time taken: 1.9585278034210205
Begin KNN operations
End KNN operations, time taken: 2.103053331375122
Begin KNN operations


training:  83%|████████▎ | 166/200 [41:44<08:26, 14.91s/it]

End KNN operations, time taken: 2.349135398864746
training loss: 2.4682327985763552
End document, total time: 14.648686408996582
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48046255111694336
Begin KNN operations
End KNN operations, time taken: 0.7459230422973633
Begin KNN operations
End KNN operations, time taken: 0.9610879421234131
Begin KNN operations
End KNN operations, time taken: 1.200223684310913
Begin KNN operations
End KNN operations, time taken: 1.4815783500671387
Begin KNN operations
End KNN operations, time taken: 1.6982922554016113
Begin KNN operations
End KNN operations, time taken: 1.874723196029663
Begin KNN operations
End KNN operations, time taken: 2.0974273681640625
Begin KNN operations


training:  84%|████████▎ | 167/200 [41:59<08:10, 14.86s/it]

End KNN operations, time taken: 2.3374931812286377
training loss: 2.3959198951721192
End document, total time: 14.567464590072632
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47565603256225586
Begin KNN operations
End KNN operations, time taken: 0.7504312992095947
Begin KNN operations
End KNN operations, time taken: 0.9815487861633301
Begin KNN operations
End KNN operations, time taken: 1.2190008163452148
Begin KNN operations
End KNN operations, time taken: 1.416940689086914
Begin KNN operations
End KNN operations, time taken: 1.64913010597229
Begin KNN operations
End KNN operations, time taken: 1.8619449138641357
Begin KNN operations
End KNN operations, time taken: 2.1407127380371094
Begin KNN operations


training:  84%|████████▍ | 168/200 [42:14<07:54, 14.84s/it]

End KNN operations, time taken: 2.3665642738342285
training loss: 2.3805790185928344
End document, total time: 14.618211030960083
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48456239700317383
Begin KNN operations
End KNN operations, time taken: 0.7541618347167969
Begin KNN operations
End KNN operations, time taken: 0.9993681907653809
Begin KNN operations
End KNN operations, time taken: 1.2654507160186768
Begin KNN operations
End KNN operations, time taken: 1.4295704364776611
Begin KNN operations
End KNN operations, time taken: 1.659271478652954
Begin KNN operations
End KNN operations, time taken: 1.890763759613037
Begin KNN operations
End KNN operations, time taken: 2.235283851623535
Begin KNN operations


training:  84%|████████▍ | 169/200 [42:29<07:41, 14.89s/it]

End KNN operations, time taken: 2.394209861755371
training loss: 2.356136417388916
End document, total time: 14.823554039001465
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4730074405670166
Begin KNN operations
End KNN operations, time taken: 0.7438881397247314
Begin KNN operations
End KNN operations, time taken: 0.9835340976715088
Begin KNN operations
End KNN operations, time taken: 1.2112605571746826
Begin KNN operations
End KNN operations, time taken: 1.416794776916504
Begin KNN operations
End KNN operations, time taken: 1.6807279586791992
Begin KNN operations
End KNN operations, time taken: 2.043508291244507
Begin KNN operations
End KNN operations, time taken: 2.1451454162597656
Begin KNN operations


training:  85%|████████▌ | 170/200 [42:44<07:27, 14.93s/it]

End KNN operations, time taken: 2.393927574157715
training loss: 2.3565099239349365
End document, total time: 14.818392753601074
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48593568801879883
Begin KNN operations
End KNN operations, time taken: 0.7574894428253174
Begin KNN operations
End KNN operations, time taken: 0.9863393306732178
Begin KNN operations
End KNN operations, time taken: 1.2699790000915527
Begin KNN operations
End KNN operations, time taken: 1.4787907600402832
Begin KNN operations
End KNN operations, time taken: 1.6699600219726562
Begin KNN operations
End KNN operations, time taken: 1.8861896991729736
Begin KNN operations
End KNN operations, time taken: 2.1335928440093994
Begin KNN operations


training:  86%|████████▌ | 171/200 [42:59<07:12, 14.92s/it]

End KNN operations, time taken: 2.373709201812744
training loss: 2.443582034111023
End document, total time: 14.71847152709961
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4926173686981201
Begin KNN operations
End KNN operations, time taken: 0.7794528007507324
Begin KNN operations
End KNN operations, time taken: 0.9961233139038086
Begin KNN operations
End KNN operations, time taken: 1.2217752933502197
Begin KNN operations
End KNN operations, time taken: 1.4296290874481201
Begin KNN operations
End KNN operations, time taken: 1.674199104309082
Begin KNN operations
End KNN operations, time taken: 1.9039568901062012
Begin KNN operations
End KNN operations, time taken: 2.1846392154693604
Begin KNN operations


training:  86%|████████▌ | 172/200 [43:14<06:58, 14.94s/it]

End KNN operations, time taken: 2.4180846214294434
training loss: 2.3732224702835087
End document, total time: 14.807067394256592
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4770936965942383
Begin KNN operations
End KNN operations, time taken: 0.7408545017242432
Begin KNN operations
End KNN operations, time taken: 0.9873425960540771
Begin KNN operations
End KNN operations, time taken: 1.2290441989898682
Begin KNN operations
End KNN operations, time taken: 1.4613332748413086
Begin KNN operations
End KNN operations, time taken: 1.6680870056152344
Begin KNN operations
End KNN operations, time taken: 1.9493200778961182
Begin KNN operations
End KNN operations, time taken: 2.148102045059204
Begin KNN operations


training:  86%|████████▋ | 173/200 [43:29<06:43, 14.94s/it]

End KNN operations, time taken: 2.3858802318573
training loss: 2.4538488388061523
End document, total time: 14.762593030929565
Begin document
Begin KNN operations
End KNN operations, time taken: 0.49339866638183594
Begin KNN operations
End KNN operations, time taken: 0.754967451095581
Begin KNN operations
End KNN operations, time taken: 0.999413251876831
Begin KNN operations
End KNN operations, time taken: 1.2152376174926758
Begin KNN operations
End KNN operations, time taken: 1.4439594745635986
Begin KNN operations
End KNN operations, time taken: 1.7337100505828857
Begin KNN operations
End KNN operations, time taken: 2.011958122253418
Begin KNN operations
End KNN operations, time taken: 2.134168863296509
Begin KNN operations


training:  87%|████████▋ | 174/200 [43:44<06:29, 14.98s/it]

End KNN operations, time taken: 2.4013216495513916
training loss: 2.412290954589844
End document, total time: 14.913878440856934
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4991791248321533
Begin KNN operations
End KNN operations, time taken: 0.7507734298706055
Begin KNN operations
End KNN operations, time taken: 0.9967303276062012
Begin KNN operations
End KNN operations, time taken: 1.2593066692352295
Begin KNN operations
End KNN operations, time taken: 1.434366226196289
Begin KNN operations
End KNN operations, time taken: 1.708223581314087
Begin KNN operations
End KNN operations, time taken: 1.9363758563995361
Begin KNN operations
End KNN operations, time taken: 2.1516928672790527
Begin KNN operations


training:  88%|████████▊ | 175/200 [43:59<06:15, 15.03s/it]

End KNN operations, time taken: 2.520169496536255
training loss: 2.4417675733566284
End document, total time: 14.970349550247192
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4871039390563965
Begin KNN operations
End KNN operations, time taken: 0.7626852989196777
Begin KNN operations
End KNN operations, time taken: 0.9692604541778564
Begin KNN operations
End KNN operations, time taken: 1.2246124744415283
Begin KNN operations
End KNN operations, time taken: 1.4412622451782227
Begin KNN operations
End KNN operations, time taken: 1.66184663772583
Begin KNN operations
End KNN operations, time taken: 1.8886699676513672
Begin KNN operations
End KNN operations, time taken: 2.236783266067505
Begin KNN operations


training:  88%|████████▊ | 176/200 [44:14<06:00, 15.00s/it]

End KNN operations, time taken: 2.3586368560791016
training loss: 2.418433427810669
End document, total time: 14.751944303512573
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4991283416748047
Begin KNN operations
End KNN operations, time taken: 0.7358343601226807
Begin KNN operations
End KNN operations, time taken: 0.9795465469360352
Begin KNN operations
End KNN operations, time taken: 1.206777811050415
Begin KNN operations
End KNN operations, time taken: 1.411858081817627
Begin KNN operations
End KNN operations, time taken: 1.7165617942810059
Begin KNN operations
End KNN operations, time taken: 1.9322419166564941
Begin KNN operations
End KNN operations, time taken: 2.1168806552886963
Begin KNN operations


training:  88%|████████▊ | 177/200 [44:29<05:44, 14.96s/it]

End KNN operations, time taken: 2.356098175048828
training loss: 2.40255982875824
End document, total time: 14.681588411331177
Begin document
Begin KNN operations
End KNN operations, time taken: 0.49348998069763184
Begin KNN operations
End KNN operations, time taken: 0.7425234317779541
Begin KNN operations
End KNN operations, time taken: 0.9780991077423096
Begin KNN operations
End KNN operations, time taken: 1.2606730461120605
Begin KNN operations
End KNN operations, time taken: 1.4883520603179932
Begin KNN operations
End KNN operations, time taken: 1.6645777225494385
Begin KNN operations
End KNN operations, time taken: 1.966902494430542
Begin KNN operations
End KNN operations, time taken: 2.168020248413086
Begin KNN operations


training:  89%|████████▉ | 178/200 [44:44<05:30, 15.00s/it]

End KNN operations, time taken: 2.4056639671325684
training loss: 2.3868262767791752
End document, total time: 14.911438465118408
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4948756694793701
Begin KNN operations
End KNN operations, time taken: 0.7649695873260498
Begin KNN operations
End KNN operations, time taken: 0.9693529605865479
Begin KNN operations
End KNN operations, time taken: 1.2088525295257568
Begin KNN operations
End KNN operations, time taken: 1.4489843845367432
Begin KNN operations
End KNN operations, time taken: 1.7063243389129639
Begin KNN operations
End KNN operations, time taken: 1.8937599658966064
Begin KNN operations
End KNN operations, time taken: 2.120159149169922
Begin KNN operations


training:  90%|████████▉ | 179/200 [44:59<05:14, 15.00s/it]

End KNN operations, time taken: 2.448420524597168
training loss: 2.410606265068054
End document, total time: 14.78546929359436
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4768824577331543
Begin KNN operations
End KNN operations, time taken: 0.7529046535491943
Begin KNN operations
End KNN operations, time taken: 0.9788327217102051
Begin KNN operations
End KNN operations, time taken: 1.219067096710205
Begin KNN operations
End KNN operations, time taken: 1.4308533668518066
Begin KNN operations
End KNN operations, time taken: 1.682737112045288
Begin KNN operations
End KNN operations, time taken: 1.9552996158599854
Begin KNN operations
End KNN operations, time taken: 2.1508145332336426
Begin KNN operations


training:  90%|█████████ | 180/200 [45:14<04:59, 14.98s/it]

End KNN operations, time taken: 2.4067533016204834
training loss: 2.4287614107131956
End document, total time: 14.777807712554932
Begin document
Begin KNN operations
End KNN operations, time taken: 0.506218671798706
Begin KNN operations
End KNN operations, time taken: 0.7413341999053955
Begin KNN operations
End KNN operations, time taken: 0.9806361198425293
Begin KNN operations
End KNN operations, time taken: 1.2234728336334229
Begin KNN operations
End KNN operations, time taken: 1.5351502895355225
Begin KNN operations
End KNN operations, time taken: 1.7446131706237793
Begin KNN operations
End KNN operations, time taken: 1.911421775817871
Begin KNN operations
End KNN operations, time taken: 2.1320111751556396
Begin KNN operations


training:  90%|█████████ | 181/200 [45:29<04:45, 15.01s/it]

End KNN operations, time taken: 2.3698747158050537
training loss: 2.4814214944839477
End document, total time: 14.888548851013184
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4800136089324951
Begin KNN operations
End KNN operations, time taken: 0.7590842247009277
Begin KNN operations
End KNN operations, time taken: 1.0248165130615234
Begin KNN operations
End KNN operations, time taken: 1.2444391250610352
Begin KNN operations
End KNN operations, time taken: 1.430100679397583
Begin KNN operations
End KNN operations, time taken: 1.6556332111358643
Begin KNN operations
End KNN operations, time taken: 1.937166690826416
Begin KNN operations
End KNN operations, time taken: 2.1095995903015137
Begin KNN operations


training:  91%|█████████ | 182/200 [45:44<04:30, 15.01s/it]

End KNN operations, time taken: 2.437260389328003
training loss: 2.4695059061050415
End document, total time: 14.811882495880127
Begin document
Begin KNN operations
End KNN operations, time taken: 0.505286455154419
Begin KNN operations
End KNN operations, time taken: 0.7533068656921387
Begin KNN operations
End KNN operations, time taken: 0.9720304012298584
Begin KNN operations
End KNN operations, time taken: 1.2163944244384766
Begin KNN operations
End KNN operations, time taken: 1.4241833686828613
Begin KNN operations
End KNN operations, time taken: 1.679398536682129
Begin KNN operations
End KNN operations, time taken: 1.8725502490997314
Begin KNN operations
End KNN operations, time taken: 2.1671218872070312
Begin KNN operations


training:  92%|█████████▏| 183/200 [45:59<04:14, 14.97s/it]

End KNN operations, time taken: 2.3745827674865723
training loss: 2.429330611228943
End document, total time: 14.719929218292236
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4783191680908203
Begin KNN operations
End KNN operations, time taken: 0.7473235130310059
Begin KNN operations
End KNN operations, time taken: 0.9797215461730957
Begin KNN operations
End KNN operations, time taken: 1.2123236656188965
Begin KNN operations
End KNN operations, time taken: 1.4250519275665283
Begin KNN operations
End KNN operations, time taken: 1.7194304466247559
Begin KNN operations
End KNN operations, time taken: 1.9485337734222412
Begin KNN operations
End KNN operations, time taken: 2.121549367904663
Begin KNN operations


training:  92%|█████████▏| 184/200 [46:14<03:59, 14.96s/it]

End KNN operations, time taken: 2.3905134201049805
training loss: 2.4299749374389648
End document, total time: 14.77001142501831
Begin document
Begin KNN operations
End KNN operations, time taken: 0.479952335357666
Begin KNN operations
End KNN operations, time taken: 0.7424354553222656
Begin KNN operations
End KNN operations, time taken: 0.9817807674407959
Begin KNN operations
End KNN operations, time taken: 1.2338428497314453
Begin KNN operations
End KNN operations, time taken: 1.505204677581787
Begin KNN operations
End KNN operations, time taken: 1.6696350574493408
Begin KNN operations
End KNN operations, time taken: 1.8827574253082275
Begin KNN operations
End KNN operations, time taken: 2.0949268341064453
Begin KNN operations


training:  92%|█████████▎| 185/200 [46:29<03:44, 14.94s/it]

End KNN operations, time taken: 2.3667898178100586
training loss: 2.4081895828247073
End document, total time: 14.684232473373413
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4885241985321045
Begin KNN operations
End KNN operations, time taken: 0.7827787399291992
Begin KNN operations
End KNN operations, time taken: 0.9887032508850098
Begin KNN operations
End KNN operations, time taken: 1.2065036296844482
Begin KNN operations
End KNN operations, time taken: 1.4282379150390625
Begin KNN operations
End KNN operations, time taken: 1.6561460494995117
Begin KNN operations
End KNN operations, time taken: 1.9183769226074219
Begin KNN operations
End KNN operations, time taken: 2.153038740158081
Begin KNN operations


training:  93%|█████████▎| 186/200 [46:44<03:29, 14.99s/it]

End KNN operations, time taken: 2.558911085128784
training loss: 2.455559277534485
End document, total time: 14.913006067276001
Begin document
Begin KNN operations
End KNN operations, time taken: 0.49091339111328125
Begin KNN operations
End KNN operations, time taken: 0.7540910243988037
Begin KNN operations
End KNN operations, time taken: 0.9769001007080078
Begin KNN operations
End KNN operations, time taken: 1.238715648651123
Begin KNN operations
End KNN operations, time taken: 1.437567949295044
Begin KNN operations
End KNN operations, time taken: 1.6708641052246094
Begin KNN operations
End KNN operations, time taken: 1.9239609241485596
Begin KNN operations
End KNN operations, time taken: 2.1676981449127197
Begin KNN operations


training:  94%|█████████▎| 187/200 [46:59<03:14, 14.98s/it]

End KNN operations, time taken: 2.395047187805176
training loss: 2.446070599555969
End document, total time: 14.783535957336426
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4786233901977539
Begin KNN operations
End KNN operations, time taken: 0.7472701072692871
Begin KNN operations
End KNN operations, time taken: 0.9911580085754395
Begin KNN operations
End KNN operations, time taken: 1.2235124111175537
Begin KNN operations
End KNN operations, time taken: 1.4226317405700684
Begin KNN operations
End KNN operations, time taken: 1.7702739238739014
Begin KNN operations
End KNN operations, time taken: 1.9478838443756104
Begin KNN operations
End KNN operations, time taken: 2.1349563598632812
Begin KNN operations


training:  94%|█████████▍| 188/200 [47:14<02:59, 14.99s/it]

End KNN operations, time taken: 2.4218485355377197
training loss: 2.3666858196258547
End document, total time: 14.828848600387573
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4818277359008789
Begin KNN operations
End KNN operations, time taken: 0.7442662715911865
Begin KNN operations
End KNN operations, time taken: 1.0054192543029785
Begin KNN operations
End KNN operations, time taken: 1.296264886856079
Begin KNN operations
End KNN operations, time taken: 1.4338972568511963
Begin KNN operations
End KNN operations, time taken: 1.6872432231903076
Begin KNN operations
End KNN operations, time taken: 1.9039804935455322
Begin KNN operations
End KNN operations, time taken: 2.1462671756744385
Begin KNN operations


training:  94%|█████████▍| 189/200 [47:29<02:45, 15.01s/it]

End KNN operations, time taken: 2.4132308959960938
training loss: 2.5280448436737064
End document, total time: 14.852529525756836
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4887707233428955
Begin KNN operations
End KNN operations, time taken: 0.7523133754730225
Begin KNN operations
End KNN operations, time taken: 0.9949605464935303
Begin KNN operations
End KNN operations, time taken: 1.2267885208129883
Begin KNN operations
End KNN operations, time taken: 1.41664719581604
Begin KNN operations
End KNN operations, time taken: 1.6593611240386963
Begin KNN operations
End KNN operations, time taken: 1.9258999824523926
Begin KNN operations
End KNN operations, time taken: 2.234637498855591
Begin KNN operations


training:  95%|█████████▌| 190/200 [47:44<02:30, 15.03s/it]

End KNN operations, time taken: 2.439959764480591
training loss: 2.364933633804321
End document, total time: 14.892935991287231
Begin document
Begin KNN operations
End KNN operations, time taken: 0.49733757972717285
Begin KNN operations
End KNN operations, time taken: 0.7526311874389648
Begin KNN operations
End KNN operations, time taken: 0.9941813945770264
Begin KNN operations
End KNN operations, time taken: 1.2159323692321777
Begin KNN operations
End KNN operations, time taken: 1.436155080795288
Begin KNN operations
End KNN operations, time taken: 1.6817348003387451
Begin KNN operations
End KNN operations, time taken: 1.972733736038208
Begin KNN operations
End KNN operations, time taken: 2.1388726234436035
Begin KNN operations


training:  96%|█████████▌| 191/200 [47:59<02:15, 15.01s/it]

End KNN operations, time taken: 2.392364978790283
training loss: 2.4179248332977292
End document, total time: 14.788467645645142
Begin document
Begin KNN operations
End KNN operations, time taken: 0.49327588081359863
Begin KNN operations
End KNN operations, time taken: 0.7558395862579346
Begin KNN operations
End KNN operations, time taken: 0.9780697822570801
Begin KNN operations
End KNN operations, time taken: 1.2210755348205566
Begin KNN operations
End KNN operations, time taken: 1.4855782985687256
Begin KNN operations
End KNN operations, time taken: 1.6647076606750488
Begin KNN operations
End KNN operations, time taken: 1.8999497890472412
Begin KNN operations
End KNN operations, time taken: 2.1057357788085938
Begin KNN operations


training:  96%|█████████▌| 192/200 [48:14<01:59, 14.98s/it]

End KNN operations, time taken: 2.3831241130828857
training loss: 2.4041568756103517
End document, total time: 14.71794581413269
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4798102378845215
Begin KNN operations
End KNN operations, time taken: 0.7629203796386719
Begin KNN operations
End KNN operations, time taken: 1.0227642059326172
Begin KNN operations
End KNN operations, time taken: 1.2225341796875
Begin KNN operations
End KNN operations, time taken: 1.446256160736084
Begin KNN operations
End KNN operations, time taken: 1.6886744499206543
Begin KNN operations
End KNN operations, time taken: 1.9248032569885254
Begin KNN operations
End KNN operations, time taken: 2.136507272720337
Begin KNN operations


training:  96%|█████████▋| 193/200 [48:29<01:45, 15.00s/it]

End KNN operations, time taken: 2.4651529788970947
training loss: 2.3889214277267454
End document, total time: 14.88183307647705
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48223161697387695
Begin KNN operations
End KNN operations, time taken: 0.7533752918243408
Begin KNN operations
End KNN operations, time taken: 0.9761703014373779
Begin KNN operations
End KNN operations, time taken: 1.215714454650879
Begin KNN operations
End KNN operations, time taken: 1.4572865962982178
Begin KNN operations
End KNN operations, time taken: 1.6708719730377197
Begin KNN operations
End KNN operations, time taken: 1.9707183837890625
Begin KNN operations
End KNN operations, time taken: 2.1308834552764893
Begin KNN operations


training:  97%|█████████▋| 194/200 [48:44<01:29, 14.98s/it]

End KNN operations, time taken: 2.404109001159668
training loss: 2.457029509544373
End document, total time: 14.759172916412354
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4824042320251465
Begin KNN operations
End KNN operations, time taken: 0.7406229972839355
Begin KNN operations
End KNN operations, time taken: 0.9779098033905029
Begin KNN operations
End KNN operations, time taken: 1.2147362232208252
Begin KNN operations
End KNN operations, time taken: 1.4429512023925781
Begin KNN operations
End KNN operations, time taken: 1.7391200065612793
Begin KNN operations
End KNN operations, time taken: 1.9141991138458252
Begin KNN operations
End KNN operations, time taken: 2.1633503437042236
Begin KNN operations


training:  98%|█████████▊| 195/200 [48:59<01:14, 14.98s/it]

End KNN operations, time taken: 2.402719259262085
training loss: 2.4014109849929812
End document, total time: 14.790935277938843
Begin document
Begin KNN operations
End KNN operations, time taken: 0.48105692863464355
Begin KNN operations
End KNN operations, time taken: 0.7500784397125244
Begin KNN operations
End KNN operations, time taken: 1.069657325744629
Begin KNN operations
End KNN operations, time taken: 1.3068561553955078
Begin KNN operations
End KNN operations, time taken: 1.4589998722076416
Begin KNN operations
End KNN operations, time taken: 1.6680009365081787
Begin KNN operations
End KNN operations, time taken: 1.925264835357666
Begin KNN operations
End KNN operations, time taken: 2.1091079711914062
Begin KNN operations


training:  98%|█████████▊| 196/200 [49:14<01:00, 15.02s/it]

End KNN operations, time taken: 2.422109603881836
training loss: 2.5023083448410035
End document, total time: 14.914488792419434
Begin document
Begin KNN operations
End KNN operations, time taken: 0.4888577461242676
Begin KNN operations
End KNN operations, time taken: 0.7518036365509033
Begin KNN operations
End KNN operations, time taken: 0.9941596984863281
Begin KNN operations
End KNN operations, time taken: 1.2284283638000488
Begin KNN operations
End KNN operations, time taken: 1.4349896907806396
Begin KNN operations
End KNN operations, time taken: 1.674189805984497
Begin KNN operations
End KNN operations, time taken: 1.9345476627349854
Begin KNN operations
End KNN operations, time taken: 2.189295768737793
Begin KNN operations


training:  98%|█████████▊| 197/200 [49:29<00:45, 15.03s/it]

End KNN operations, time taken: 2.4340102672576904
training loss: 2.4155546188354493
End document, total time: 14.877753973007202
Begin document
Begin KNN operations
End KNN operations, time taken: 0.47931623458862305
Begin KNN operations
End KNN operations, time taken: 0.755953311920166
Begin KNN operations
End KNN operations, time taken: 0.9894661903381348
Begin KNN operations
End KNN operations, time taken: 1.2214202880859375
Begin KNN operations
End KNN operations, time taken: 1.4421958923339844
Begin KNN operations
End KNN operations, time taken: 1.7027809619903564
Begin KNN operations
End KNN operations, time taken: 1.981520414352417
Begin KNN operations
End KNN operations, time taken: 2.1422431468963623
Begin KNN operations


training:  99%|█████████▉| 198/200 [49:44<00:30, 15.03s/it]

End KNN operations, time taken: 2.406026840209961
training loss: 2.4486013174057
End document, total time: 14.846877336502075
Begin document
Begin KNN operations
End KNN operations, time taken: 0.49159955978393555
Begin KNN operations
End KNN operations, time taken: 0.7573680877685547
Begin KNN operations
End KNN operations, time taken: 0.9809677600860596
Begin KNN operations
End KNN operations, time taken: 1.2288784980773926
Begin KNN operations
End KNN operations, time taken: 1.5425071716308594
Begin KNN operations
End KNN operations, time taken: 1.705092430114746
Begin KNN operations
End KNN operations, time taken: 1.929915428161621
Begin KNN operations
End KNN operations, time taken: 2.1608359813690186
Begin KNN operations


training: 100%|█████████▉| 199/200 [49:59<00:15, 15.05s/it]

End KNN operations, time taken: 2.4031240940093994
training loss: 2.50151789188385
End document, total time: 14.913466453552246
Begin document
Begin KNN operations
End KNN operations, time taken: 0.49132657051086426
Begin KNN operations
End KNN operations, time taken: 0.7613787651062012
Begin KNN operations
End KNN operations, time taken: 0.9866693019866943
Begin KNN operations
End KNN operations, time taken: 1.256568431854248
Begin KNN operations
End KNN operations, time taken: 1.4531452655792236
Begin KNN operations
End KNN operations, time taken: 1.6854360103607178
Begin KNN operations
End KNN operations, time taken: 1.9294054508209229
Begin KNN operations
End KNN operations, time taken: 2.147642135620117
Begin KNN operations


training: 100%|██████████| 200/200 [50:14<00:00, 15.07s/it]

End KNN operations, time taken: 2.4479188919067383
training loss: 2.3589978218078613
End document, total time: 14.891256093978882



