# Seq2Seq DQ Test Notebook

This notebook tests the dq client for Seq2Seq using simulated (fake) data. The goal is to battle-test the client's components without training an actual model, which keeps each run fast.

Things that we want to test:
1. Setting the tokenizer
2. Logging data (input + target outputs)
3. Logging model outputs for one or more epochs
4. Fake model generations. Interestingly, the best way to do this may be a small validation dataset plus a real LLM; this depends on design decisions around logging for generation.

NOTE: For a first pass we work with just a training dataset.

Let's get testing!

In [2]:
# from transformers import T5Tokenizer, T5ForConditionalGeneration
from datasets import load_dataset, Dataset
import numpy as np
# import torch

%load_ext autoreload
%autoreload 2

## Pull data from hf hub

Since part of the dq processing involves tokenizing and aligning text / token indices, we work with a small real-world dataset rather than dummy data.
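
To make "aligning text / token indices" concrete, here is a minimal sketch using a toy whitespace tokenizer. It is only a stand-in: the real alignment relies on the character offsets that a fast Hugging Face tokenizer can return (for example via `return_offsets_mapping=True`), and the helper name `whitespace_offsets` is hypothetical.

```python
import re


def whitespace_offsets(text):
    """Toy tokenizer: split on whitespace and return (token, (start, end)) pairs.

    The (start, end) character span plays the role of the offset mapping
    that a fast tokenizer produces for each token.
    """
    return [(m.group(), (m.start(), m.end())) for m in re.finditer(r"\S+", text)]


sample = "Shields a business entity"
pairs = whitespace_offsets(sample)

# Each span must slice back to exactly the token it belongs to
for token, (start, end) in pairs:
    assert sample[start:end] == token
```

Real text exercises this alignment far better than dummy strings, because subword tokenizers split punctuation, numbers, and rare words in ways that synthetic data rarely covers.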

The Billsum dataset contains three columns:

<p style="text-align: center;">|| text || summary || title ||</p>

We look at just **summary** and **title** and map them as follows:
<p style="text-align: center;">(summary, title) --> (input context,  target output)</p>

We also use only a small subset: the first 100 rows of each split.

In [3]:
dataset_size = 100

ds = load_dataset("billsum")
ds = ds.remove_columns('text')
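# Add a unique integer id to every row
ds = ds.map(lambda _, idx: {"id": idx}, with_indices=True)
# Keep only the first `dataset_size` rows of each split
ds_train = Dataset.from_dict(ds['train'][:dataset_size])
ds_val = Dataset.from_dict(ds['test'][:dataset_size])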
ds_train

Found cached dataset billsum (/Users/jonathangomesselman/.cache/huggingface/datasets/billsum/default/3.0.0/75cf1719d38d6553aa0e0714c393c74579b083ae6e164b2543684e3e92e0c4cc)


  0%|          | 0/3 [00:00<?, ?it/s]

Loading cached processed dataset at /Users/jonathangomesselman/.cache/huggingface/datasets/billsum/default/3.0.0/75cf1719d38d6553aa0e0714c393c74579b083ae6e164b2543684e3e92e0c4cc/cache-8163760ca7c203c4.arrow
Loading cached processed dataset at /Users/jonathangomesselman/.cache/huggingface/datasets/billsum/default/3.0.0/75cf1719d38d6553aa0e0714c393c74579b083ae6e164b2543684e3e92e0c4cc/cache-6f832c3394bf0964.arrow
Loading cached processed dataset at /Users/jonathangomesselman/.cache/huggingface/datasets/billsum/default/3.0.0/75cf1719d38d6553aa0e0714c393c74579b083ae6e164b2543684e3e92e0c4cc/cache-fa696985d54ba920.arrow


Dataset({
    features: ['summary', 'title', 'id'],
    num_rows: 100
})

In [4]:
ds_train[0]

{'summary': "Shields a business entity from civil liability relating to any injury or death occurring at a facility of that entity in connection with a use of such facility by a nonprofit organization if: (1) the use occurs outside the scope of business of the business entity; (2) such injury or death occurs during a period that such facility is used by such organization; and (3) the business entity authorized the use of such facility by the organization. \nMakes this Act inapplicable to an injury or death that results from an act or omission of a business entity that constitutes gross negligence or intentional misconduct, including misconduct that: (1) constitutes a hate crime or a crime of violence or act of international terrorism for which the defendant has been convicted in any court; or (2) involves a sexual offense for which the defendant has been convicted in any court or misconduct for which the defendant has been found to have violated a Federal or State civil rights law. \nP

## Logging Data

1. Before logging input data, log the tokenizer (making sure we use the fast tokenizer)
2. Log the input and target output data

In [45]:
from transformers import AutoTokenizer, GenerationConfig, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small", use_fast=True)
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Tokenize the target outputs (titles) into label ids
def tokenize_outputs(row):
    label_ids = tokenizer(row['title'])['input_ids']
    return {'labels': label_ids}

ds_train = ds_train.map(tokenize_outputs)
ds_val = ds_val.map(tokenize_outputs)

Map:   0%|          | 0/100 [00:00<?, ? examples/s]

Map:   0%|          | 0/100 [00:00<?, ? examples/s]

In [None]:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [41]:
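import os
from getpass import getpass

os.environ['GALILEO_CONSOLE_URL'] = "https://console.dev.rungalileo.io"
os.environ["GALILEO_USERNAME"] = "galileo@rungalileo.io"
# Prompt for the password instead of hard-coding it in the notebook
os.environ["GALILEO_PASSWORD"] = getpass("Galileo password: ")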

import dataquality as dq
from dataquality.integrations.seq2seq.hf import watch
dq.configure()
dq.init("seq2seq")

temperature = 0.4
generation_config = GenerationConfig(
    max_new_tokens=15,
    # Whether we use multinomial sampling
    do_sample=temperature >= 1e-5,
    temperature=temperature,
)

watch(
    model,
    tokenizer,
    generation_config,
    generate_training_data=True
)



📡 https://console.dev.rungalileo.io
🔭 Logging you into Galileo

🚀 You're logged in to Galileo as galileo@rungalileo.io!
✨ Initializing new public project 'dry_aquamarine_reindeer_c156e'
🏃‍♂️ Creating new run '2023-09-19_1'
🛰 Connected to new project 'dry_aquamarine_reindeer_c156e', and new run '2023-09-19_1'.
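
The generation config above enables sampling with `temperature = 0.4`. As a rough illustration of what that does, the sketch below divides a toy logit vector by the temperature before applying softmax, which is the standard way temperature is applied when sampling. The function name and the toy logits are assumptions made for this example only.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return sampling probabilities for `logits` scaled by `temperature`."""
    scaled = [x / temperature for x in logits]
    # Subtract the max for numerical stability before exponentiating
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]
probs_default = softmax_with_temperature(toy_logits, 1.0)
probs_sharp = softmax_with_temperature(toy_logits, 0.4)

print("T=1.0:", [round(p, 3) for p in probs_default])
print("T=0.4:", [round(p, 3) for p in probs_sharp])
```

A temperature below 1 concentrates probability on the highest-scoring token, so generations at 0.4 stay closer to greedy decoding while still varying between runs.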


In [42]:
def log_dataset(ds, input_col="summary", target_col="title"):
    dq.log_dataset(
        ds,
        text=input_col,
        label=target_col,
        split="training"
    )

# Log just for training
log_dataset(ds_train)

Aligning characters with tokens:   0%|          | 0/100 [00:00<?, ?it/s]

Logging 100 samples [########################################] 100.00% elapsed time  :     0.00s =  0.0m =  0.0h
 

## Logging Model Outputs
Log one epoch of fake model outputs. Only logits are logged.

In [43]:
num_logits = len(tokenizer)


def log_epoch(ds):
    ids = ds['id']
    # Longest tokenized target; the fake logits are padded to this length
    max_seq_length = max(len(label_ids) for label_ids in ds['labels'])
    print("len ids", len(ids))
    print("max seq len", max_seq_length)
    # Shape: [num_samples, max_seq_len, num_logits]
    fake_logits = np.random.randn(len(ids), max_seq_length, num_logits)
    dq.log_model_outputs(
        logits=fake_logits,
        ids=ids
    )

dq.set_epoch(0)
dq.set_split("train")
log_epoch(ds_train)

len ids 100
max seq len 111
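
The fake logits form a dense array of shape `[100, 111, num_logits]`, which can be larger than expected. The sketch below estimates its memory footprint. The vocabulary size of 32,100 is an assumption about what `len(tokenizer)` returns for `t5-small`, and `dense_array_nbytes` is a hypothetical helper.

```python
def dense_array_nbytes(shape, itemsize=8):
    """Bytes needed for a dense array of `shape` with `itemsize` bytes per element."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * itemsize


# Assumption: len(tokenizer) == 32_100 for t5-small
assumed_vocab_size = 32_100
shape = (100, 111, assumed_vocab_size)

nbytes_float64 = dense_array_nbytes(shape)              # np.random.randn returns float64
nbytes_float32 = dense_array_nbytes(shape, itemsize=4)  # for comparison

print(f"float64: {nbytes_float64 / 1e9:.2f} GB")
print(f"float32: {nbytes_float32 / 1e9:.2f} GB")
```

Under that assumption the array holds about 356 million values, so logging in smaller batches of ids is worth considering if memory becomes a constraint.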


In [44]:
dq.finish()

☁️ Uploading Data
CuML libraries not found, running standard process. For faster Galileo processing, consider installing
`pip install 'dataquality[cuda]' --extra-index-url=https://pypi.nvidia.com/`


training:   0%|          | 0/1 [00:00<?, ?it/s]

Aligning characters with tokens:   0%|          | 0/1 [00:00<?, ?it/s]

training (epoch=0):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/10.6k [00:00<?, ?B/s]

Aligning characters with tokens:   0%|          | 0/1 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/609k [00:00<?, ?B/s]

Job default successfully submitted. Results will be available soon at https://console.dev.rungalileo.io/insights?projectId=3a311494-d5bb-4d80-b340-3b1316066bcc&runId=b5fc021d-91be-460e-9745-28b80c0ed65d&taskType=8&split=training
Waiting for job (you can safely close this window)...
	Downloading all embedding files for this run
	[training] 👀 Looking for data anomalies
Done! Job finished with status completed
🧹 Cleaning up


'https://console.dev.rungalileo.io/insights?projectId=3a311494-d5bb-4d80-b340-3b1316066bcc&runId=b5fc021d-91be-460e-9745-28b80c0ed65d&taskType=8&split=training'