# Introduction

This notebook tests the hypothesis that the order in which training data is presented can significantly affect how quickly neural network training converges.

We use the LEGO dataset for this experiment. Its examples are built from sequential variable assignments and negations, which makes it a convenient testbed for our hypothesis.

We will experiment with several data orderings: multiple random orderings, and an ordering in which datapoints are ranked by GPT-3.5 according to their perceived difficulty.
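As a rough illustration of the two kinds of ordering described above, the sketch below builds several reproducible random permutations of the dataset indices and one ordering sorted by a difficulty score. The helper names `random_orderings` and `order_by_difficulty` and the example scores are hypothetical placeholders, not part of the experiment code, and the scores are not real GPT-3.5 output.

```python
import random

def random_orderings(num_examples, num_orderings, base_seed=0):
    """Return `num_orderings` reproducible shuffles of the indices 0..num_examples-1."""
    orderings = []
    for k in range(num_orderings):
        rng = random.Random(base_seed + k)  # a separate seed per ordering
        order = list(range(num_examples))
        rng.shuffle(order)
        orderings.append(order)
    return orderings

def order_by_difficulty(difficulty_scores):
    """Return indices sorted from easiest to hardest; ties keep their original order."""
    return sorted(range(len(difficulty_scores)), key=lambda i: difficulty_scores[i])

# Hypothetical difficulty scores, one per datapoint (placeholders only)
scores = [3, 1, 2, 1]
curriculum_order = order_by_difficulty(scores)
shuffles = random_orderings(len(scores), num_orderings=3)
```

Working with index lists rather than reordered copies of the data keeps every ordering comparable: each one is a permutation of the same set of examples, so any difference in convergence can be attributed to order alone.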

We will also explore a scenario in which we construct a hypothetical skill tree of the skills required to achieve a low loss on the dataset. The datapoints are then sorted according to a topological sort of the skills involved.
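A minimal sketch of the skill-tree idea, assuming Python 3.9+ for the standard-library `graphlib` module. The skill names, the prerequisite graph, and the tagged datapoints are all invented for illustration. Ranking each datapoint by the latest skill it requires is one possible choice, not necessarily the rule used in the experiment.

```python
from graphlib import TopologicalSorter

# Hypothetical skill tree: each skill maps to the set of skills it depends on
skill_prerequisites = {
    "read_assignment": set(),
    "apply_negation": {"read_assignment"},
    "follow_chain": {"read_assignment", "apply_negation"},
}

# Hypothetical datapoints, each tagged with the skills it requires
datapoints = [
    {"text": "example C", "skills": {"follow_chain"}},
    {"text": "example A", "skills": {"read_assignment"}},
    {"text": "example B", "skills": {"apply_negation", "read_assignment"}},
]

# Prerequisites always appear before the skills that depend on them
skill_order = list(TopologicalSorter(skill_prerequisites).static_order())
skill_rank = {skill: rank for rank, skill in enumerate(skill_order)}

def datapoint_rank(dp):
    """Rank a datapoint by the most advanced skill it requires."""
    return max(skill_rank[s] for s in dp["skills"])

sorted_datapoints = sorted(datapoints, key=datapoint_rank)
```

With this ordering, a datapoint never appears before datapoints whose hardest required skill comes earlier in the topological order.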

Let's begin by importing the necessary libraries and loading the GPT-2 model and tokenizer.

In [6]:
import torch
from torchtyping import TensorType
from transformers import GPT2Model, GPT2Tokenizer

# Type alias for a batch of token ids with shape [batch, sequence]
InputTensorType = TensorType["batch", "sequence"]

# Load the pretrained GPT-2 weights and the matching tokenizer
model = GPT2Model.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Example usage: random token ids as a smoke test, shape [batch_size, sequence_length]
input_tensor: InputTensorType = torch.randint(0, 1000, (1, 10))
# The model returns an output object, not a bare tensor;
# output.last_hidden_state has shape [batch_size, sequence_length, hidden_size]
output = model(input_tensor)

# Decode each sequence of token ids back to text with batch_decode.
# The ids are random, so the printed text is meaningless and changes on every run.
token_ids = input_tensor.tolist()
decoded_output = tokenizer.batch_decode(token_ids, skip_special_tokens=True)
print(decoded_output)

[' gamears su 0 years saithper��']
