## End-to-end examples of logging data to Galileo for Text Classification, Multi-Label Text Classification (MLTC), and NER

### For an introduction to the client and how to get started, see the [Dataquality Client Demo](./Dataquality-Client-Demo.ipynb)
### The full documentation is available in the [Galileo docs](https://rungalileo.gitbook.io/galileo/getting-started)
### For end-to-end notebooks that train real ML models, see [this folder](https://drive.google.com/drive/folders/17-cHuRzXIpWaD8rYwy69RMQr__HiAiDk?usp=sharing)

In [None]:
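# Local development: point the client at a locally running Galileo console.
# These are example credentials only; replace them with your own.

import os

os.environ["GALILEO_CONSOLE_URL"] = "http://localhost:8088"
os.environ["GALILEO_USERNAME"] = "user@example.com"
os.environ["GALILEO_PASSWORD"] = "Th3secret_"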
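The cell above configures the client entirely through environment variables. A minimal pre-flight sketch that reports which of those variables are missing or empty before `dq.login()` is called; the helper name `missing_galileo_env` is hypothetical, and it only checks that the three variables used above are set, not that their values are valid.

```python
import os

# The three variables set in the cell above (this list is an assumption of the sketch)
REQUIRED_GALILEO_VARS = ["GALILEO_CONSOLE_URL", "GALILEO_USERNAME", "GALILEO_PASSWORD"]


def missing_galileo_env(env=None):
    """Return the names of required Galileo variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_GALILEO_VARS if not env.get(name)]


missing = missing_galileo_env()
if missing:
    print(f"Set these before calling dq.login(): {missing}")
```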

In [3]:
import dataquality as dq
# dq.configure()
dq.login()

📡 http://console.dev.rungalileo.io
🔭 Logging you into Galileo

👀 Found auth method email set via env, skipping prompt.
🚀 You're logged in to Galileo as galileo@rungalileo.io!


***Helper function: wait for processing, then export, load, and display every split of the current run***

In [4]:
from time import sleep

import pandas as pd
from IPython.display import display
from tqdm.notebook import tqdm

from dataquality import config
from dataquality.clients.api import ApiClient


api_client = ApiClient()


def see_results(wait=True):
    """Optionally wait for processing, then export each split to CSV and display it."""
    if wait:
        print("Waiting for data to be processed")
        if "localhost" in config.api_url:
            # Local console: give processing a fixed 50 seconds
            for _ in tqdm(range(50)):
                sleep(1)
        else:
            api_client.wait_for_run()

    task_type = config.task_type
    proj = api_client.get_project(config.current_project_id)["name"]
    run = api_client.get_project_run(config.current_project_id, config.current_run_id)["name"]
    # Export every split of the current run to <task_type>_<split>.csv
    api_client.export_run(proj, run, "training", f"{task_type}_training.csv")
    api_client.export_run(proj, run, "test", f"{task_type}_test.csv")
    api_client.export_run(proj, run, "validation", f"{task_type}_validation.csv")
    print(f"Exported to {task_type}_training.csv, {task_type}_test.csv, and {task_type}_validation.csv")
    df_train = pd.read_csv(f"{task_type}_training.csv")
    df_test = pd.read_csv(f"{task_type}_test.csv")
    df_val = pd.read_csv(f"{task_type}_validation.csv")
    print("Training")
    display(df_train)
    print("\nTest")
    display(df_test)
    print("\nValidation")
    display(df_val)
    return df_train, df_test, df_val
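On a local console, `see_results` simply sleeps for a fixed 50 seconds. If you have some way to check whether processing is done, a polling loop with a timeout returns as soon as the check passes instead. A minimal sketch; `wait_until` is a hypothetical helper, not part of the `dataquality` API, and `condition` is any zero-argument callable you supply.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 50.0, interval: float = 1.0) -> bool:
    """Poll `condition` every `interval` seconds until it returns True or `timeout` elapses.

    Returns True if the condition was met, False if the timeout was reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Using `time.monotonic()` rather than `time.time()` keeps the timeout correct even if the system clock is adjusted while waiting.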

## Text Classification
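The simulated training loop in the cell below walks the dataset in `BATCH_SIZE`-row chunks and logs one set of outputs per chunk, keyed by the `id` column. The same batching pulled out into a small generator (a sketch; `iter_batches` is a hypothetical helper) makes it easy to confirm that every id is visited exactly once:

```python
from typing import Iterator

import pandas as pd


def iter_batches(df: pd.DataFrame, batch_size: int) -> Iterator[pd.DataFrame]:
    """Yield consecutive row slices of `df`, each with at most `batch_size` rows."""
    for start in range(0, len(df), batch_size):
        # .iloc slices by position, so this works regardless of the index labels
        yield df.iloc[start : start + batch_size]


# Toy frame with the same "id" column the example logs
toy = pd.DataFrame({"text": [f"doc {i}" for i in range(10)], "id": range(10)})
batches = list(iter_batches(toy, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```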

In [5]:
import time

import numpy as np
import pandas as pd
from sklearn.datasets import fetch_20newsgroups
from tqdm.notebook import tqdm


dq.init("text_classification", "test-tc-run")


BATCH_SIZE=8
EMB_DIM=768
NUM_EPOCHS=1


newsgroups = fetch_20newsgroups(subset="train", remove=('headers', 'footers', 'quotes'))
dataset = pd.DataFrame()
dataset["text"] = newsgroups.data
label_ind = newsgroups.target_names
dataset["label"] = [label_ind[i] for i in newsgroups.target]
dataset["id"] = list(range(len(dataset)))


def generate_random_embeddings(batch_size: int, emb_dims: int) -> np.ndarray:
    return np.random.rand(batch_size, emb_dims)


def generate_random_probabilities(batch_size: int, num_classes: int) -> np.ndarray:
    probs = np.random.rand(batch_size, num_classes)
    return probs / probs.sum(axis=-1).reshape(-1, 1)  # Normalize to sum to 1


t_start = time.time()
dq.set_labels_for_run(dataset["label"].unique())

print("Logging input data")
# For demo purposes, the same samples are logged as the train, test, and validation splits
for split in ["train", "test", "validation"]:
    dq.log_dataset(dataset, split=split)

print("Done")
print(f"Input logging took {time.time() - t_start} seconds\n\n")


print("Logging model outputs")
t_start = time.time()
num_classes = dataset["label"].nunique()
# Simulate a model training loop
for epoch_idx in range(NUM_EPOCHS):
    print(f"Epoch {epoch_idx}")
    print('-'*100)
    # Only the train split gets model outputs here; add "test" and "validation"
    # to this list to log them too (otherwise their exports contain no data)
    for split in ["train"]:
        print(split.capitalize())
        dq.set_split(split)
        for i in tqdm(range(0, len(dataset), BATCH_SIZE)):
            batch = dataset.iloc[i : i + BATCH_SIZE]
            # Random embeddings and probabilities stand in for real model outputs
            embeddings = generate_random_embeddings(len(batch), EMB_DIM)
            probs = generate_random_probabilities(len(batch), num_classes)
            dq.log_model_outputs(
                embs=embeddings,
                probs=probs,
                epoch=epoch_idx,
                ids=batch["id"],
            )
    print('-'*100, end="\n\n")

print("Done")

time_spent = time.time() - t_start
print(f"Logging output took {time_spent} seconds")

dq.finish()
df_train, df_test, df_val = see_results()

📡 Retrieved project, test-tc-run, and starting a new run
🏃‍♂️ Starting run competitive_turquoise_shrimp
🛰 Connected to project, test-tc-run, and created run, competitive_turquoise_shrimp.
Logging input data
Exporting input data [########################################] 100.00% elapsed time  :     0.03s =  0.0m =  0.0h
Appending input data [########################################] 100.00% elapsed time  :     0.04s =  0.0m =  0.0h
Appending input data [########################################] 100.00% elapsed time  :     0.04s =  0.0m =  0.0h
 Done
Input logging took 12.208525896072388 seconds


Logging model outputs
Epoch 0
----------------------------------------------------------------------------------------------------
Train


----------------------------------------------------------------------------------------------------

Done
Logging output took 7.984760046005249 seconds
☁️ Uploading Data


🧹 Cleaning up
Job default successfully submitted. Results will be available soon at https://console.dev.rungalileo.io/insights?projectId=4a8a50d8-9a50-48c0-9c5b-d3f015b775d3&runId=044b3228-9416-4d07-af9a-d8213c8728b4&split=training&taskType=0&depHigh=1&depLow=0
Waiting for data to be processed
Waiting for job...
Done! Job finished with status finished
Your export has been written to text_classification_training.csv
Your export has been written to text_classification_test.csv
Your export has been written to text_classification_validation.csv
Exported to text_classification_training.csv, text_classification_test.csv, and text_classification_validation.csv
Training


Unnamed: 0,epoch,pred,text,split,data_schema_version,galileo_text_length,galileo_language_id,galileo_pii,confidence,data_error_potential,gold,id
0,0,sci.crypt,I was wondering if anyone out there could enli...,training,1,475,en,,0.117047,0.554383,rec.autos,0
1,0,comp.windows.x,A fair number of brave souls who upgraded thei...,training,1,530,en,,0.084308,0.511718,comp.sys.mac.hardware,1
2,0,comp.sys.mac.hardware,"well folks, my mac plus finally gave up the gh...",training,1,1659,en,email,0.093667,0.499922,comp.sys.mac.hardware,2
3,0,comp.os.ms-windows.misc,\nDo you have Weitek's address/phone number? ...,training,1,95,en,,0.087112,0.542990,comp.graphics,3
4,0,soc.religion.christian,"From article <C5owCB.n3p@world.std.com>, by to...",training,1,448,en,email,0.097053,0.503830,sci.space,4
...,...,...,...,...,...,...,...,...,...,...,...,...
11309,0,sci.electronics,DN> From: nyeda@cnsvax.uwec.edu (David Nye)\nD...,training,1,1782,en,email,0.101604,0.532813,sci.med,11309
11310,0,comp.sys.mac.hardware,"I have a (very old) Mac 512k and a Mac Plus, b...",training,1,674,en,email,0.094729,0.499111,comp.sys.mac.hardware,11310
11311,0,comp.sys.mac.hardware,I just installed a DX2-66 CPU in a clone mothe...,training,1,581,en,,0.094065,0.536087,comp.sys.ibm.pc.hardware,11311
11312,0,talk.politics.mideast,\nWouldn't this require a hyper-sphere. In 3-...,training,1,311,en,,0.092663,0.526501,comp.graphics,11312
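Each exported row carries a `data_error_potential` score. A sketch of pulling the highest-scoring rows out of an exported split, assuming (as the name suggests) that a higher score marks a sample more likely to be problematic; `top_error_candidates` is a hypothetical helper. With the random embeddings and probabilities used in this notebook the scores carry no real signal, so the toy frame below only reuses values from the first rows of the training export above.

```python
import pandas as pd


def top_error_candidates(df: pd.DataFrame, k: int = 5) -> pd.DataFrame:
    """Return the k rows with the highest data_error_potential, highest first."""
    columns = ["id", "text", "gold", "pred", "data_error_potential"]
    return df.nlargest(k, "data_error_potential")[columns]


# gold, pred, and data_error_potential copied from the first five rows of the
# training export above; the text column is shortened to placeholders
train_preview = pd.DataFrame(
    {
        "id": [0, 1, 2, 3, 4],
        "text": ["sample 0", "sample 1", "sample 2", "sample 3", "sample 4"],
        "gold": ["rec.autos", "comp.sys.mac.hardware", "comp.sys.mac.hardware", "comp.graphics", "sci.space"],
        "pred": ["sci.crypt", "comp.windows.x", "comp.sys.mac.hardware", "comp.os.ms-windows.misc", "soc.religion.christian"],
        "data_error_potential": [0.554383, 0.511718, 0.499922, 0.542990, 0.503830],
    }
)
print(top_error_candidates(train_preview, k=2)["id"].tolist())  # [0, 3]
```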



Test


Unnamed: 0,"{""detail"":""No data in minio for object 4a8a50d8-9a50-48c0-9c5b-d3f015b775d3/044b3228-9416-4d07-af9a-d8213c8728b4/test/data/data.hdf5""}"



Validation


Unnamed: 0,"{""detail"":""No data in minio for object 4a8a50d8-9a50-48c0-9c5b-d3f015b775d3/044b3228-9416-4d07-af9a-d8213c8728b4/validation/data/data.hdf5""}"
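The Test and Validation exports above failed: no model outputs were logged for those splits, so each file holds a JSON error body, which `pd.read_csv` then loads as a frame whose only column header is that JSON. A minimal heuristic sketch for catching this case before analyzing results; the `is_error_export` name and the `{"detail"` prefix check are assumptions based only on the output shown above.

```python
import io

import pandas as pd


def is_error_export(df: pd.DataFrame) -> bool:
    """Heuristic: True if the CSV held a JSON error body instead of data rows."""
    return any(str(col).startswith('{"detail"') for col in df.columns)


# A file containing only an error body vs. a normal export (toy contents)
failed = pd.read_csv(io.StringIO('"{""detail"":""No data in minio for object x""}"\n'))
ok = pd.read_csv(io.StringIO("id,pred\n0,sci.crypt\n"))
print(is_error_export(failed), is_error_export(ok))  # True False
```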


## Multi Label
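The text-classification example logged normalized probabilities (`probs`), while the multi-label cell below logs raw `logits`: one pair per task, where index 0 is `"not <label>"` and index 1 is `"<label>"`. A sketch of how such per-task logits turn into per-task probabilities and predicted label names, like the `pred_task_*` columns in the export; `softmax` and `decode_predictions` are hypothetical helpers, and whether Galileo derives its predictions exactly this way is an assumption.

```python
import numpy as np

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
# Same per-task label pairs as the cell below: index 0 = "not <label>", index 1 = "<label>"
TASK_LABELS = [["not " + label, label] for label in LABELS]


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def decode_predictions(logits: np.ndarray) -> list:
    """Map logits of shape (num_samples, num_tasks, 2) to predicted label names per task."""
    probs = softmax(logits, axis=-1)
    pred_idx = probs.argmax(axis=-1)  # shape (num_samples, num_tasks)
    return [[TASK_LABELS[t][i] for t, i in enumerate(row)] for row in pred_idx]


logits = np.zeros((1, 6, 2))
logits[0, :, 1] = 1.0  # favor the positive label on every task
print(decode_predictions(logits)[0])
# ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']
```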

In [6]:
from random import choice
from typing import List

import numpy as np


dq.init("text_multi_label", "test-mltc-run")
# Each task is binary, with labels ["not <label>", "<label>"]
dq.set_labels_for_run(
    [["not " + _label, _label] for _label in ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']]
)
dq.set_tasks_for_run(['task_0', 'task_1', 'task_2', 'task_3', 'task_4', 'task_5'])

n = 5000

texts: List[str] = [f"text sample {i}" for i in range(n)]

# For every sample, pick one random gold label per task
labels: List[List[str]] = [
    [choice(i) for i in dq.get_data_logger().logger_config.labels]
    for _ in range(n)
]

ids = list(range(n))


dq.log_data_samples(texts=texts, task_labels=labels, ids=ids, split="training")
dq.log_data_samples(texts=texts, task_labels=labels, ids=ids, split="test")
dq.log_data_samples(texts=texts, task_labels=labels, ids=ids, split="validation")

for split in ["train", "test", "validation"]:
    for epoch in range(5):
        emb = np.random.rand(n, 768)
        # NOTE: list multiplication repeats one random array, so every sample and task
        # gets identical logits; np.random.rand(n, 6, 2) would draw values for each
        logits = [[np.random.rand(2)] * 6] * n

        # NOTE: the stride is 32 but each slice takes 5 samples, so only the first 5
        # of every 32 samples (785 of 5000) are logged; slice with i:i+32 to log all
        for i in range(0, n, 32):
            dq.log_model_outputs(
                embs=emb[i:i+5],
                logits=logits[i:i+5],
                ids=ids[i:i+5],
                split=split,
                epoch=epoch
            )

dq.finish()
df_train, df_test, df_val = see_results()


💭 Project test-mltc-run was not found.
✨ Initializing public project test-mltc-run
🏃‍♂️ Starting run selective_coral_cougar
Exporting input data [########################################] 100.00% elapsed time  :     0.01s =  0.0m =  0.0h
Appending input data [########################################] 100.00% elapsed time  :     0.01s =  0.0m =  0.0h
Appending input data [########################################] 100.00% elapsed time  :     0.01s =  0.0m =  0.0h
 ☁️ Uploading Data


🧹 Cleaning up
Job default successfully submitted. Results will be available soon at https://console.dev.rungalileo.io/insights?projectId=d172b38a-3170-4bfa-a88d-0f0cdeb613ef&runId=c45aa28d-0f8c-4628-ba46-1064529e9b0f&split=training&taskType=1&depHigh=1&depLow=0
Waiting for data to be processed
Waiting for job...
Done! Job finished with status finished
Your export has been written to text_multi_label_training.csv
Your export has been written to text_multi_label_test.csv
Your export has been written to text_multi_label_validation.csv
Exported to text_multi_label_training.csv, text_multi_label_test.csv, and text_multi_label_validation.csv
Training


Unnamed: 0,epoch,pred_task_0,pred_task_1,pred_task_2,pred_task_3,pred_task_4,pred_task_5,text,split,data_schema_version,...,gold_task_3,data_error_potential_task_4,gold_task_4,data_error_potential_task_5,gold_task_5,id,pred,gold,data_error_potential,confidence
0,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 0,training,1,...,threat,0.49348,not insult,0.49348,not identity_hate,0,1,1,0.50652,0.537259
1,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 1,training,1,...,threat,0.50652,insult,0.50652,identity_hate,1,1,0,0.49348,0.537259
2,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 2,training,1,...,threat,0.49348,not insult,0.49348,not identity_hate,2,1,0,0.49348,0.537259
3,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 3,training,1,...,not threat,0.50652,insult,0.50652,identity_hate,3,1,0,0.49348,0.537259
4,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4,training,1,...,threat,0.49348,not insult,0.49348,not identity_hate,4,1,0,0.49348,0.537259
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
780,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4992,training,1,...,threat,0.50652,insult,0.49348,not identity_hate,4992,1,1,0.50652,0.537259
781,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4993,training,1,...,threat,0.50652,insult,0.49348,not identity_hate,4993,1,1,0.50652,0.537259
782,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4994,training,1,...,threat,0.50652,insult,0.50652,identity_hate,4994,1,1,0.50652,0.537259
783,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4995,training,1,...,threat,0.50652,insult,0.50652,identity_hate,4995,1,0,0.49348,0.537259



Test


Unnamed: 0,epoch,pred_task_0,pred_task_1,pred_task_2,pred_task_3,pred_task_4,pred_task_5,text,split,data_schema_version,...,gold_task_3,data_error_potential_task_4,gold_task_4,data_error_potential_task_5,gold_task_5,id,pred,gold,data_error_potential,confidence
0,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 0,test,1,...,threat,0.464641,not insult,0.464641,not identity_hate,0,0,1,0.535359,0.5446
1,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 1,test,1,...,threat,0.535359,insult,0.535359,identity_hate,1,0,0,0.464641,0.5446
2,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 2,test,1,...,threat,0.464641,not insult,0.464641,not identity_hate,2,0,0,0.464641,0.5446
3,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 3,test,1,...,not threat,0.535359,insult,0.535359,identity_hate,3,0,0,0.464641,0.5446
4,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 4,test,1,...,threat,0.464641,not insult,0.464641,not identity_hate,4,0,0,0.464641,0.5446
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
780,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 4992,test,1,...,threat,0.535359,insult,0.464641,not identity_hate,4992,0,1,0.535359,0.5446
781,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 4993,test,1,...,threat,0.535359,insult,0.464641,not identity_hate,4993,0,1,0.535359,0.5446
782,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 4994,test,1,...,threat,0.535359,insult,0.535359,identity_hate,4994,0,1,0.535359,0.5446
783,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 4995,test,1,...,threat,0.535359,insult,0.535359,identity_hate,4995,0,0,0.464641,0.5446



Validation


Unnamed: 0,epoch,pred_task_0,pred_task_1,pred_task_2,pred_task_3,pred_task_4,pred_task_5,text,split,data_schema_version,...,gold_task_3,data_error_potential_task_4,gold_task_4,data_error_potential_task_5,gold_task_5,id,pred,gold,data_error_potential,confidence
0,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 0,validation,1,...,threat,0.591826,not insult,0.591826,not identity_hate,0,1,1,0.408174,0.67595
1,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 1,validation,1,...,threat,0.408174,insult,0.408174,identity_hate,1,1,0,0.591826,0.67595
2,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 2,validation,1,...,threat,0.591826,not insult,0.591826,not identity_hate,2,1,0,0.591826,0.67595
3,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 3,validation,1,...,not threat,0.408174,insult,0.408174,identity_hate,3,1,0,0.591826,0.67595
4,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4,validation,1,...,threat,0.591826,not insult,0.591826,not identity_hate,4,1,0,0.591826,0.67595
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
780,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4992,validation,1,...,threat,0.408174,insult,0.591826,not identity_hate,4992,1,1,0.408174,0.67595
781,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4993,validation,1,...,threat,0.408174,insult,0.591826,not identity_hate,4993,1,1,0.408174,0.67595
782,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4994,validation,1,...,threat,0.408174,insult,0.408174,identity_hate,4994,1,1,0.408174,0.67595
783,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4995,validation,1,...,threat,0.408174,insult,0.408174,identity_hate,4995,1,0,0.591826,0.67595
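Every row shown in the multi-label exports above has the same predictions and confidence within a split. That follows from how the cell built its random logits: `[[np.random.rand(2)] * 6] * n` repeats references to a single array instead of drawing new values. A short sketch contrasting that with an independent draw per sample and per task (sizes kept small for illustration):

```python
import numpy as np

n, num_tasks = 4, 6

# List multiplication copies references: every task of every sample shares ONE array
aliased = [[np.random.rand(2)] * num_tasks] * n
print(aliased[0][0] is aliased[3][5])  # True

# An independent random draw for each sample and each task
independent = np.random.rand(n, num_tasks, 2)
print(independent.shape)  # (4, 6, 2)
```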


## NER
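In the NER cell below, each sample comes with `text_token_indices` (character offsets per token) and `gold_spans` (character offsets per entity). A quick sanity check, sketched under the assumption that every gold span should begin at a token start and end at a token end, catches off-by-one offsets before they reach `dq.log_data_samples`; `spans_align_with_tokens` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def spans_align_with_tokens(
    text: str, token_indices: List[Tuple[int, int]], spans: List[Dict]
) -> bool:
    """Check that each gold span lies in the text and starts/ends on token boundaries."""
    starts = {s for s, _ in token_indices}
    ends = {e for _, e in token_indices}
    return all(
        0 <= span["start"] < span["end"] <= len(text)
        and span["start"] in starts
        and span["end"] in ends
        for span in spans
    )


# First sample from the NER cell
text = "what movies star bruce willis"
tokens = [(0, 4), (5, 11), (12, 16), (17, 22), (17, 22), (23, 29), (23, 29)]
gold = [{"start": 17, "end": 29, "label": "ACTOR"}]
print(text[17:29], spans_align_with_tokens(text, tokens, gold))  # bruce willis True
```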

In [7]:
import numpy as np
from tqdm.notebook import tqdm


dq.init("text_ner", "test-ner-run")


def log_inputs():
    text_inputs = [
        'what movies star bruce willis',
        'show me films with drew barrymore from the 1980s',
        'what movies starred both al pacino and robert deniro',
        'find me all of the movies that starred harold ramis and bill murray',
        'find me a movie with a quote about baseball in it',
    ]
    # (start, end) character offsets of every token in each text; a repeated pair
    # appears to mark a word that the tokenizer split into several sub-word tokens
    tokens = [
        [(0, 4), (5, 11), (12, 16), (17, 22), (17, 22), (23, 29), (23, 29)],
        [(0, 4), (5, 7), (8, 13), (14, 18), (19, 23), (24, 33), (24, 33), (24, 33), (34, 38), (39, 42), (43, 48)],
        [(0, 4), (5, 11), (12, 19), (20, 24), (25, 27), (28, 34), (28, 34), (28, 34), (35, 38), (39, 45), (39, 45), (46, 52), (46, 52)],
        [(0, 4), (5, 7), (8, 11), (12, 14), (15, 18), (19, 25), (26, 30), (31, 38), (39, 45), (39, 45), (39, 45), (46, 51), (46, 51), (52, 55), (56, 60), (61, 67), (61, 67), (61, 67)],
        [(0, 4), (5, 7), (8, 9), (10, 15), (16, 20), (21, 22), (23, 28), (29, 34), (35, 43), (44, 46), (47, 49)],
    ]
    # Gold entities as character spans; the last sample has none
    gold_spans = [
        [{'start': 17, 'end': 29, 'label': 'ACTOR'}],
        [{'start': 19, 'end': 33, 'label': 'ACTOR'}, {'start': 43, 'end': 48, 'label': 'YEAR'}],
        [{'start': 25, 'end': 34, 'label': 'ACTOR'}, {'start': 39, 'end': 52, 'label': 'ACTOR'}],
        [{'start': 39, 'end': 51, 'label': 'ACTOR'}, {'start': 56, 'end': 67, 'label': 'ACTOR'}],
        [],
    ]
    ids = [0, 1, 2, 3, 4]

    labels = [
        '[PAD]', '[CLS]', '[SEP]', 'O', 'B-ACTOR', 'I-ACTOR', 'B-YEAR', 'B-TITLE', 'B-GENRE', 'I-GENRE',
        'B-DIRECTOR', 'I-DIRECTOR', 'B-SONG', 'I-SONG', 'B-PLOT', 'I-PLOT', 'B-REVIEW', 'B-CHARACTER',
        'I-CHARACTER', 'B-RATING', 'B-RATINGS_AVERAGE', 'I-RATINGS_AVERAGE', 'I-TITLE', 'I-RATING',
        'B-TRAILER', 'I-TRAILER', 'I-REVIEW', 'I-YEAR',
    ]
    dq.set_labels_for_run(labels)
    dq.set_tagging_schema("BIO")
    for split in ["training", "validation", "test"]:
        dq.log_data_samples(
            texts=text_inputs, text_token_indices=tokens, ids=ids, gold_spans=gold_spans, split=split
        )


def log_outputs():
    num_classes = 28  # one logit per tag in `labels`
    seq_len = 119  # token positions per sample in these random outputs
    embs = [np.random.rand(seq_len, 768) for _ in range(5)]
    logits = [np.random.rand(seq_len, num_classes) for _ in range(5)]
    ids = list(range(5))
    for epoch in tqdm(range(6)):
        for split in ["training", "test", "validation"]:
            dq.log_model_outputs(
                embs=embs, logits=logits, ids=ids, split=split, epoch=epoch
            )


def finish():
    dq.finish()


def runit():
    log_inputs()
    log_outputs()
    finish()


runit()
df_train, df_test, df_val = see_results()

📡 Retrieved project, test-ner-run, and starting a new run
🏃‍♂️ Starting run combative_red_moth
🛰 Connected to project, test-ner-run, and created run, combative_red_moth.
Exporting input data [########################################] 100.00% elapsed time  :     0.00s =  0.0m =  0.0h
Appending input data [########################################] 100.00% elapsed time  :     0.00s =  0.0m =  0.0h
Appending input data [########################################] 100.00% elapsed time  :     0.00s =  0.0m =  0.0h
 

☁️ Uploading Data


🧹 Cleaning up
Job default successfully submitted. Results will be available soon at https://console.dev.rungalileo.io/insights?projectId=9683fbaa-5f4b-471d-9c68-40095c543d5c&runId=049ce9b8-4267-4239-931c-8b9d35e50e3d&split=training&taskType=2&depHigh=1&depLow=0
Waiting for data to be processed
Waiting for job...
Done! Job finished with status finished
Your export has been written to text_ner_training.csv
Your export has been written to text_ner_test.csv
Your export has been written to text_ner_validation.csv
Exported to text_ner_training.csv, text_ner_test.csv, and text_ner_validation.csv
Training


Unnamed: 0,sample_id,text,id,missed_label,span_shift,wrong_tag,ghost_span,total_errors,galileo_text_length,galileo_language_id,galileo_pii,spans
0,0,what movies star bruce willis,0,0,1,0,1,2,29,en,,"[{""start"":17,""end"":29,""data_error_potential"":0..."
1,1,show me films with drew barrymore from the 1980s,1,0,2,0,5,7,48,en,,"[{""start"":19,""end"":33,""data_error_potential"":0..."
2,2,what movies starred both al pacino and robert ...,2,0,2,0,3,5,52,en,,"[{""start"":25,""end"":34,""data_error_potential"":0..."
3,3,find me all of the movies that starred harold ...,3,0,2,0,3,5,67,en,,"[{""start"":39,""end"":51,""data_error_potential"":0..."
4,4,find me a movie with a quote about baseball in it,4,0,0,0,5,5,49,en,,"[{""start"":5,""end"":7,""data_error_potential"":0.5..."



Test


Unnamed: 0,sample_id,text,id,missed_label,span_shift,wrong_tag,ghost_span,total_errors,galileo_text_length,galileo_language_id,galileo_pii,spans
0,0,what movies star bruce willis,0,0,1,0,1,2,29,en,,"[{""start"":17,""end"":29,""data_error_potential"":0..."
1,1,show me films with drew barrymore from the 1980s,1,0,2,0,5,7,48,en,,"[{""start"":19,""end"":33,""data_error_potential"":0..."
2,2,what movies starred both al pacino and robert ...,2,0,2,0,3,5,52,en,,"[{""start"":25,""end"":34,""data_error_potential"":0..."
3,3,find me all of the movies that starred harold ...,3,0,2,0,3,5,67,en,,"[{""start"":39,""end"":51,""data_error_potential"":0..."
4,4,find me a movie with a quote about baseball in it,4,0,0,0,5,5,49,en,,"[{""start"":5,""end"":7,""data_error_potential"":0.5..."



Validation


Unnamed: 0,sample_id,text,id,missed_label,span_shift,wrong_tag,ghost_span,total_errors,galileo_text_length,galileo_language_id,galileo_pii,spans
0,0,what movies star bruce willis,0,0,1,0,1,2,29,en,,"[{""start"":17,""end"":29,""data_error_potential"":0..."
1,1,show me films with drew barrymore from the 1980s,1,0,2,0,5,7,48,en,,"[{""start"":19,""end"":33,""data_error_potential"":0..."
2,2,what movies starred both al pacino and robert ...,2,0,2,0,3,5,52,en,,"[{""start"":25,""end"":34,""data_error_potential"":0..."
3,3,find me all of the movies that starred harold ...,3,0,2,0,3,5,67,en,,"[{""start"":39,""end"":51,""data_error_potential"":0..."
4,4,find me a movie with a quote about baseball in it,4,0,0,0,5,5,49,en,,"[{""start"":5,""end"":7,""data_error_potential"":0.5..."
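The NER example declares a BIO tagging schema with `dq.set_tagging_schema("BIO")` and passes gold entities as character spans. A minimal sketch of how such spans map onto per-token BIO tags, using the first sample from that cell; `spans_to_bio` is a hypothetical helper, and tagging every sub-word piece (rather than only the first piece of each word) is one convention among several, not necessarily the one Galileo applies internally.

```python
from typing import Dict, List, Tuple


def spans_to_bio(token_indices: List[Tuple[int, int]], spans: List[Dict]) -> List[str]:
    """Tag each token B-<label>/I-<label> if it lies inside a gold span, else O.

    The first token inside a span gets B-, the following ones I-. Repeated
    (start, end) pairs are tagged like any other token.
    """
    tags = ["O"] * len(token_indices)
    for span in spans:
        inside = [
            i for i, (start, end) in enumerate(token_indices)
            if start >= span["start"] and end <= span["end"]
        ]
        for position, token_idx in enumerate(inside):
            prefix = "B" if position == 0 else "I"
            tags[token_idx] = f"{prefix}-{span['label']}"
    return tags


tokens = [(0, 4), (5, 11), (12, 16), (17, 22), (17, 22), (23, 29), (23, 29)]
gold = [{"start": 17, "end": 29, "label": "ACTOR"}]
print(spans_to_bio(tokens, gold))
# ['O', 'O', 'O', 'B-ACTOR', 'I-ACTOR', 'I-ACTOR', 'I-ACTOR']
```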
