## End-to-end examples of logging data to Galileo for Text Classification, Multi-Label Text Classification (MLTC), and NER

### To understand the client and how to get started, see the [Dataquality Demo](./Dataquality-Client-Demo.ipynb)
### For the full documentation, see [here](https://rungalileo.gitbook.io/galileo/getting-started)
### For end-to-end notebooks that train real ML models, see [here](https://drive.google.com/drive/folders/17-cHuRzXIpWaD8rYwy69RMQr__HiAiDk?usp=sharing)

In [6]:
## Local

import os

os.environ['GALILEO_CONSOLE_URL']="http://localhost:8088"
os.environ["GALILEO_USERNAME"]="user@example.com"
os.environ["GALILEO_PASSWORD"]="Th3secret_"

In [7]:
import dataquality as dq
dq.configure()

📡 http://localhost:8088
🔭 Logging you into Galileo

👀 Found auth method email set via env, skipping prompt.
🚀 You're logged in to Galileo as user@example.com!
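The login output above suggests that `dq.configure()` picks up credentials from the environment. As a minimal sketch, the check below reports which of the three variables set in the first cell are missing before the client is configured. The helper name `missing_env_vars` is hypothetical, and whether the client strictly requires all three variables is an assumption.

```python
import os
from typing import Iterable, List, Mapping, Optional

# The three variables set in the "Local" cell of this notebook.
REQUIRED_VARS = ("GALILEO_CONSOLE_URL", "GALILEO_USERNAME", "GALILEO_PASSWORD")


def missing_env_vars(
    names: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping so the real environment is not touched.
example_env = {
    "GALILEO_CONSOLE_URL": "http://localhost:8088",
    "GALILEO_USERNAME": "user@example.com",
}
missing = missing_env_vars(env=example_env)
print(missing)  # ['GALILEO_PASSWORD']
```

Calling `missing_env_vars()` with no arguments inspects `os.environ` instead of the example mapping.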


***Helper function***

In [2]:
from dataquality import config
import pandas as pd
from dataquality.clients.api import ApiClient
from time import sleep
from tqdm.notebook import tqdm  # used by see_results; otherwise only imported in a later cell


api_client = ApiClient()


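def see_results(wait=True, body=None):
    """Optionally wait for processing, then export each split to CSV and display it (body is unused)."""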
    if wait:
        print("Waiting for data to be processed")
        if "localhost" in config.api_url:
            for i in tqdm(range(50)):
                sleep(1)
        else:
            api_client.wait_for_run()

    task_type = dq.config.task_type
    proj = api_client.get_project(config.current_project_id)["name"]
    run = api_client.get_project_run(config.current_project_id, config.current_run_id)["name"]
    api_client.export_run(proj, run, "training", f"{task_type}_training.csv")
    api_client.export_run(proj, run, "test", f"{task_type}_test.csv")
    api_client.export_run(proj, run, "validation", f"{task_type}_validation.csv")
    print(f"Exported to {task_type}_training.csv, {task_type}_test.csv, and {task_type}_validation.csv")
    df_train = pd.read_csv(f"{task_type}_training.csv")
    df_test = pd.read_csv(f"{task_type}_test.csv")
    df_val = pd.read_csv(f"{task_type}_validation.csv")
    print("Training")
    display(df_train)
    print("\nTest")
    display(df_test)
    print("\nValidation")
    display(df_val)
    return df_train, df_test, df_val
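When running against localhost, `see_results` waits a fixed 50 seconds. One alternative is to poll a completion check with a timeout so the wait ends as soon as the work is done. The sketch below is generic: `wait_until` and the `is_done` callable are hypothetical and do not correspond to any `dataquality` API.

```python
import time
from typing import Callable


def wait_until(
    is_done: Callable[[], bool],
    timeout_s: float = 50.0,
    poll_s: float = 1.0,
) -> bool:
    """Poll is_done() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_done():
            return True
        time.sleep(poll_s)
    # One last check after the deadline has passed.
    return is_done()


# Simulated job that reports completion on the third poll.
attempts = iter([False, False, True])
finished = wait_until(lambda: next(attempts), timeout_s=1.0, poll_s=0.01)
print(finished)  # True
```

In a real run, `is_done` would wrap whatever status check is available for the job; that part is left open here.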

In [8]:
dq.finish?

## Text Classification

In [19]:
from tqdm.notebook import tqdm
import time
import numpy as np
from uuid import uuid4
import pandas as pd
from sklearn.datasets import fetch_20newsgroups


dq.init("text_classification", "CBO-testing", "TC")


BATCH_SIZE=32
EMB_DIM=768
NUM_EPOCHS=1


newsgroups = fetch_20newsgroups(subset="train", remove=('headers', 'footers', 'quotes'))
dataset = pd.DataFrame()
dataset["text"] = newsgroups.data
label_ind = newsgroups.target_names
dataset["label"] = [label_ind[i] for i in newsgroups.target]
dataset["id"] = list(range(len(dataset)))


def generate_random_embeddings(batch_size: int, emb_dims: int) -> np.ndarray:
    return np.random.rand(batch_size, emb_dims)


def generate_random_probabilities(batch_size: int, num_classes: int) -> np.ndarray:
    probs = np.random.rand(batch_size, num_classes)
    return probs / probs.sum(axis=-1).reshape(-1, 1)  # Normalize to sum to 1


t_start = time.time()
dq.set_labels_for_run(dataset["label"].unique())

print("Logging input data")
for split in ["train", "test", "validation"]:
    dq.log_dataset(dataset, split=split)
    
print("Done")
print(f"Input logging took {time.time() - t_start} seconds\n\n")


print("Logging model outputs")
t_start = time.time()
num_classes = dataset["label"].nunique()
# Simulates model training loop
for epoch_idx in range(NUM_EPOCHS):
    print(f"Epoch {epoch_idx}")
    print('-'*100)
    for split in ["train", "test", "validation"]:
        print(split.capitalize())
        dq.set_split(split)
        for i in tqdm(range(0, len(dataset), BATCH_SIZE)):
            batch = dataset[i : i + BATCH_SIZE]
            embeddings = generate_random_embeddings(len(batch), EMB_DIM)
            probs = generate_random_probabilities(len(batch), num_classes)
            dq.log_model_outputs(
                embs=embeddings,
                probs=probs,
                epoch=epoch_idx,
                ids=batch["id"],
            )
    print('-'*100,end="\n\n")
            
print("Done")

time_spent = time.time() - t_start
print(f"Logging output took {time_spent} seconds")

dq.finish(wait=False)
t = time.time()
dq.wait_for_run()
t1 = time.time()
print("Finish took", round((t1-t) / 60, 3), "minutes")
# df_train, df_test, df_val = see_results()

📡 Retrieving run from existing project, CBO-testing
🛰 Connected to project, CBO-testing, and run, TC.




Logging input data
Exporting input data [########################################] 100.00% elapsed time  :     0.01s =  0.0m =  0.0h
Appending input data [########################################] 100.00% elapsed time  :     0.02s =  0.0m =  0.0h
Appending input data [########################################] 100.00% elapsed time  :     0.03s =  0.0m =  0.0h
 Done
Input logging took 10.947309017181396 seconds


Logging model outputs
Epoch 0
----------------------------------------------------------------------------------------------------
Train


  0%|          | 0/354 [00:00<?, ?it/s]



Test


  0%|          | 0/354 [00:00<?, ?it/s]

Validation


  0%|          | 0/354 [00:00<?, ?it/s]

----------------------------------------------------------------------------------------------------

Done
Logging output took 9.517239093780518 seconds
☁️ Uploading Data


[Progress bars omitted: training, validation, and test uploads for epoch 0, each combining 354 batches]

🧹 Cleaning up
Job default successfully submitted. Results will be available soon at http://127.0.0.1:3000/insights?projectId=2d31c035-6c46-49e2-8515-4c5d0c5713e1&runId=c2b437b0-1ef7-4271-80bd-cd44df9980e5&split=training&depHigh=1&depLow=0&taskType=0
Waiting for job...
Done! Job finished with status completed
Finish took 2.566 minutes


In [None]:
# original chunk size + current default - 5.032 min
# default (from vaex) chunk size + current default - 4.532 min
# original chunk size + new default (export then xray) - 3.279 min
# default (from vaex) chunk size + new default (export then xray) - 3.762 min
# last test (get_dataframe(...)) - 2.566 min
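The text classification loop above slices the dataset in steps of `BATCH_SIZE` and logs one probability matrix per batch. The sketch below reproduces only that arithmetic with NumPy. The row count of 11,314 is an assumption, chosen because it yields the 354 steps seen in the progress output; `normalized_probs` is a hypothetical stand-in for `generate_random_probabilities`.

```python
import numpy as np

N_SAMPLES = 11314   # assumed dataset size; gives 354 batches of 32
BATCH_SIZE = 32
NUM_CLASSES = 20


def normalized_probs(batch_size: int, num_classes: int, rng: np.random.Generator) -> np.ndarray:
    """Random non-negative scores, scaled so each row sums to 1."""
    probs = rng.random((batch_size, num_classes))
    return probs / probs.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)

# Size of each batch produced by range(0, N_SAMPLES, BATCH_SIZE) slicing.
batch_sizes = [min(BATCH_SIZE, N_SAMPLES - i) for i in range(0, N_SAMPLES, BATCH_SIZE)]
last_probs = normalized_probs(batch_sizes[-1], NUM_CLASSES, rng)

print(len(batch_sizes), batch_sizes[-1])  # 354 18
print(last_probs.shape)                   # (18, 20)
```

The final batch is smaller than `BATCH_SIZE`, which is why the loop passes `len(batch)` rather than `BATCH_SIZE` when generating embeddings and probabilities.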

## Multi Label

In [21]:
from typing import List
from random import choice
import numpy as np


dq.init("text_multi_label", "test-mltc-run")
dq.set_labels_for_run([["not "+_label, _label] for _label in ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult','identity_hate']]) 
dq.set_tasks_for_run(['task_0', 'task_1', 'task_2', 'task_3', 'task_4', 'task_5'])

n = 5000

texts: List[str] = [f"text sample {i}" for i in range(n)]

labels: List[List[str]] = [
    [choice(i) for i in dq.get_data_logger().logger_config.labels]
    for _ in range(n)
]

ids = list(range(n))


dq.log_data_samples(texts=texts, task_labels=labels, ids=ids, split="training")
dq.log_data_samples(texts=texts, task_labels=labels, ids=ids, split="test")
dq.log_data_samples(texts=texts, task_labels=labels, ids=ids, split="validation")

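for split in ["train", "test", "validation"]:
    for epoch in range(5):
        emb=np.random.rand(n, 768)
        # List multiplication repeats one random vector, so every sample and task shares the same logits
        logits=[[np.random.rand(2)] * 6] * n
        ids=list(range(n))

        # NOTE: the loop advances 32 rows per step but each slice covers only 5 rows,
        # so 157 steps x 5 = 785 of the 5000 samples receive model outputs
        for i in range(0, n, 32):
            dq.log_model_outputs(
                embs=emb[i:i+5],
                logits=logits[i:i+5],
                ids=ids[i:i+5],
                split=split,
                epoch=epoch
            )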

dq.finish()
# df_train, df_test, df_val = see_results()


💭 Project test-mltc-run was not found.
✨ Initializing public project test-mltc-run
🏃‍♂️ Starting run loose_fuchsia_parrot
Exporting input data [########################################] 100.00% elapsed time  :     0.00s =  0.0m =  0.0h
Appending input data [########################################] 100.00% elapsed time  :     0.01s =  0.0m =  0.0h
Appending input data [########################################] 100.00% elapsed time  :     0.01s =  0.0m =  0.0h
 ☁️ Uploading Data


[Progress bars omitted: training, validation, and test uploads for epochs 0-4, each combining 157 batches]

🧹 Cleaning up
Job default successfully submitted. Results will be available soon at http://127.0.0.1:3000/insights?projectId=dc4e5868-daac-43a9-89fa-ff813826a76f&runId=f55948b5-be5b-4dd4-85e5-e0a13ad8b00b&split=training&depHigh=1&depLow=0&taskType=1
Waiting for job...
Done! Job finished with status completed


{'project_id': 'dc4e5868-daac-43a9-89fa-ff813826a76f',
 'run_id': 'f55948b5-be5b-4dd4-85e5-e0a13ad8b00b',
 'job_name': 'default',
 'labels': [['not toxic', 'toxic'],
  ['not severe_toxic', 'severe_toxic'],
  ['not obscene', 'obscene'],
  ['not threat', 'threat'],
  ['not insult', 'insult'],
  ['not identity_hate', 'identity_hate']],
 'task_type': 1,
 'tasks': ['task_0', 'task_1', 'task_2', 'task_3', 'task_4', 'task_5'],
 'non_inference_logged': False,
 'migration_name': None,
 'xray': True,
 'process_existing_inference_runs': False,
 'message': 'Processing job!',
 'link': 'http://127.0.0.1:3000/insights?projectId=dc4e5868-daac-43a9-89fa-ff813826a76f&runId=f55948b5-be5b-4dd4-85e5-e0a13ad8b00b&split=training&depHigh=1&depLow=0&taskType=1'}
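The multi-label cell above builds logits for 6 tasks with 2 classes each. The sketch below shows the resulting shape, contrasts the list-multiplication construction with independently drawn logits, and counts how many sample ids the `range(0, n, 32)` loop with `i:i+5` slices actually covers. The `softmax` helper is illustrative only and is not claimed to match how Galileo computes its metrics.

```python
import numpy as np

n, num_tasks, num_classes = 5000, 6, 2
rng = np.random.default_rng(0)

# Same construction as the notebook: one random vector repeated everywhere.
shared = np.array([[rng.random(num_classes)] * num_tasks] * n)
# Alternative: an independent draw for every sample and task.
independent = rng.random((n, num_tasks, num_classes))


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


all_identical = bool((shared == shared[0, 0]).all())
pred = softmax(independent).argmax(axis=-1)  # one class index per sample and task

# Ids covered when stepping by 32 but slicing only 5 rows per step.
logged_ids = [j for i in range(0, n, 32) for j in range(n)[i:i + 5]]

print(shared.shape, all_identical)  # (5000, 6, 2) True
print(pred.shape)                   # (5000, 6)
print(len(logged_ids))              # 785
```

Shared logits may be why exported columns such as `confidence` repeat a single value within a split, though that is an inference from the output rather than something the source states.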

In [24]:
see_results(wait=False)


Exported to text_multi_label_training.csv, text_multi_label_test.csv, and text_multi_label_validation.csv
Training


Unnamed: 0,epoch,pred_task_0,pred_task_1,pred_task_2,pred_task_3,pred_task_4,pred_task_5,text,split,data_schema_version,...,likely_mislabeled_3,likely_mislabeled_4,likely_mislabeled_5,x,y,pred,gold,data_error_potential,confidence,likely_mislabeled
0,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 0,training,1,...,False,True,True,9.072045,9.058199,1,0,0.546335,0.569694,True
1,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 1,training,1,...,False,True,False,8.276835,9.542554,1,0,0.546335,0.569694,True
2,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 2,training,1,...,True,True,True,8.531170,7.458549,1,0,0.546335,0.569694,True
3,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 3,training,1,...,True,False,True,7.104985,11.084365,1,0,0.546335,0.569694,True
4,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4,training,1,...,False,True,True,8.001328,10.265515,1,1,0.453665,0.569694,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
780,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4992,training,1,...,False,True,False,3.017641,8.699756,1,0,0.546335,0.569694,True
781,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4993,training,1,...,True,False,False,2.765950,8.672996,1,0,0.546335,0.569694,True
782,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4994,training,1,...,False,False,False,7.245671,11.039823,1,0,0.546335,0.569694,True
783,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4995,training,1,...,True,False,False,7.348386,10.798277,1,0,0.546335,0.569694,False



Test


Unnamed: 0,epoch,pred_task_0,pred_task_1,pred_task_2,pred_task_3,pred_task_4,pred_task_5,text,split,data_schema_version,...,likely_mislabeled_3,likely_mislabeled_4,likely_mislabeled_5,x,y,pred,gold,data_error_potential,confidence,likely_mislabeled
0,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 0,test,1,...,True,False,False,5.973294,4.578478,0,0,0.464583,0.586153,False
1,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 1,test,1,...,True,False,True,6.046090,4.740110,0,0,0.464583,0.586153,False
2,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 2,test,1,...,False,False,False,3.478628,2.340235,0,0,0.464583,0.586153,False
3,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 3,test,1,...,False,True,False,3.796821,3.943028,0,0,0.464583,0.586153,False
4,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 4,test,1,...,True,False,False,8.796474,3.168915,0,1,0.535417,0.586153,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
780,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 4992,test,1,...,True,False,False,3.531583,0.557858,0,0,0.464583,0.586153,False
781,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 4993,test,1,...,False,True,True,9.169562,1.940018,0,0,0.464583,0.586153,False
782,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 4994,test,1,...,False,False,True,8.480592,3.483148,0,0,0.464583,0.586153,False
783,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 4995,test,1,...,False,True,True,5.343930,-1.017015,0,0,0.464583,0.586153,False



Validation


Unnamed: 0,epoch,pred_task_0,pred_task_1,pred_task_2,pred_task_3,pred_task_4,pred_task_5,text,split,data_schema_version,...,likely_mislabeled_3,likely_mislabeled_4,likely_mislabeled_5,x,y,pred,gold,data_error_potential,confidence,likely_mislabeled
0,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 0,validation,1,...,False,True,True,4.761479,-0.415481,1,0,0.583715,0.686851,True
1,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 1,validation,1,...,False,True,False,6.292846,-0.318058,1,0,0.583715,0.686851,True
2,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 2,validation,1,...,True,True,True,6.234266,5.554712,1,0,0.583715,0.686851,True
3,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 3,validation,1,...,True,False,True,5.533935,5.467899,1,0,0.583715,0.686851,True
4,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4,validation,1,...,False,True,True,4.307050,0.537139,1,1,0.416285,0.686851,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
780,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4992,validation,1,...,False,True,False,7.970720,1.048047,1,0,0.583715,0.686851,True
781,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4993,validation,1,...,True,False,False,2.784461,3.391117,1,0,0.583715,0.686851,True
782,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4994,validation,1,...,False,False,False,2.884337,2.255941,1,0,0.583715,0.686851,True
783,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4995,validation,1,...,True,False,False,7.039492,0.170575,1,0,0.583715,0.686851,False
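Once the splits are exported to CSV, the files can be filtered like any other table. As a small sketch, the snippet below uses the standard `csv` module on three rows copied from the training export shown above (limited to five of its columns) to list rows flagged as likely mislabeled and rows where `pred` and `gold` disagree. Reading the real files with `pd.read_csv`, as `see_results` does, would work just as well.

```python
import csv
import io

# Three rows copied from the training export above, five columns only.
sample_csv = """text,pred,gold,data_error_potential,likely_mislabeled
text sample 0,1,0,0.546335,True
text sample 3,1,0,0.546335,True
text sample 4,1,1,0.453665,False
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Rows flagged as likely mislabeled, highest data error potential first.
flagged = [r for r in rows if r["likely_mislabeled"] == "True"]
flagged.sort(key=lambda r: float(r["data_error_potential"]), reverse=True)

# Rows where the prediction differs from the gold label.
disagreements = [r["text"] for r in rows if r["pred"] != r["gold"]]

print([r["text"] for r in flagged])  # ['text sample 0', 'text sample 3']
print(disagreements)                 # ['text sample 0', 'text sample 3']
```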



## NER

In [20]:
from dataquality.schemas.task_type import TaskType
from dataquality import config 
from uuid import uuid4
import numpy as np
from time import sleep
from tqdm.notebook import tqdm


dq.init("text_ner", "test-ner-run")


def log_inputs():
    text_inputs = ['what movies star bruce willis', 'show me films with drew barrymore from the 1980s', 'what movies starred both al pacino and robert deniro', 'find me all of the movies that starred harold ramis and bill murray', 'find me a movie with a quote about baseball in it']
    tokens = [[(0, 4), (5, 11), (12, 16), (17, 22), (17, 22), (23, 29), (23, 29)], [(0, 4), (5, 7), (8, 13), (14, 18), (19, 23), (24, 33), (24, 33), (24, 33), (34, 38), (39, 42), (43, 48)], [(0, 4), (5, 11), (12, 19), (20, 24), (25, 27), (28, 34), (28, 34), (28, 34), (35, 38), (39, 45), (39, 45), (46, 52), (46, 52)], [(0, 4), (5, 7), (8, 11), (12, 14), (15, 18), (19, 25), (26, 30), (31, 38), (39, 45), (39, 45), (39, 45), (46, 51), (46, 51), (52, 55), (56, 60), (61, 67), (61, 67), (61, 67)], [(0, 4), (5, 7), (8, 9), (10, 15), (16, 20), (21, 22), (23, 28), (29, 34), (35, 43), (44, 46), (47, 49)]]
    gold_spans = [[{'start': 17, 'end': 29, 'label': 'ACTOR'}], [{'start': 19, 'end': 33, 'label': 'ACTOR'}, {'start': 43, 'end': 48, 'label': 'YEAR'}], [{'start': 25, 'end': 34, 'label': 'ACTOR'}, {'start': 39, 'end': 52, 'label': 'ACTOR'}], [{'start': 39, 'end': 51, 'label': 'ACTOR'}, {'start': 56, 'end': 67, 'label': 'ACTOR'}], []]
    ids = [0, 1, 2, 3, 4]

    labels = ['[PAD]', '[CLS]', '[SEP]', 'O', 'B-ACTOR', 'I-ACTOR', 'B-YEAR', 'B-TITLE', 'B-GENRE', 'I-GENRE', 'B-DIRECTOR', 'I-DIRECTOR', 'B-SONG', 'I-SONG', 'B-PLOT', 'I-PLOT', 'B-REVIEW', 'B-CHARACTER', 'I-CHARACTER', 'B-RATING', 'B-RATINGS_AVERAGE', 'I-RATINGS_AVERAGE', 'I-TITLE', 'I-RATING', 'B-TRAILER', 'I-TRAILER', 'I-REVIEW', 'I-YEAR']
    dq.set_labels_for_run(labels)
    dq.set_tagging_schema("BIO")
    dq.log_data_samples(texts=text_inputs, text_token_indices=tokens, ids=ids, gold_spans=gold_spans, split="training")
    dq.log_data_samples(texts=text_inputs, text_token_indices=tokens, ids=ids, gold_spans=gold_spans, split="validation")
    dq.log_data_samples(texts=text_inputs, text_token_indices=tokens, ids=ids, gold_spans=gold_spans, split="test")

def log_outputs():
    num_classes = 28  # matches the 28 labels set in log_inputs
    embs = [np.random.rand(119, 768) for _ in range(5)]
    logits = [np.random.rand(119, num_classes) for _ in range(5)]
    ids = list(range(5))
    for epoch in tqdm(range(6)):
        for split in ["training", "test", "validation"]:
            dq.log_model_outputs(
                embs=embs, logits=logits, ids=ids, split=split, epoch=epoch
            )
    
def finish():
    dq.finish()
    
    
def runit():
    log_inputs()
    log_outputs()
    finish()
    
runit()
df_train, df_test, df_val = see_results()

💭 Project test-ner-run was not found.
✨ Initializing public project test-ner-run
🏃‍♂️ Starting run eventual_purple_felidae
Exporting input data [########################################] 100.00% elapsed time  :     0.00s =  0.0m =  0.0h
Appending input data [########################################] 100.00% elapsed time  :     0.00s =  0.0m =  0.0h
Appending input data [########################################] 100.00% elapsed time  :     0.00s =  0.0m =  0.0h
 

  0%|          | 0/6 [00:00<?, ?it/s]

☁️ Uploading Data


[Progress bars omitted: training, validation, and test uploads, six epochs each, one batch per epoch]

🧹 Cleaning up
Job default successfully submitted. Results will be available soon at http://127.0.0.1:3000/insights?projectId=5d9ac2c9-b968-45ce-8ce7-f2db3d82b803&runId=691a7af3-2f61-413b-8395-487ff3dc2cea&split=training&depHigh=1&depLow=0&taskType=2
Waiting for job...
Done! Job finished with status completed
Waiting for data to be processed


  0%|          | 0/50 [00:00<?, ?it/s]

Exported to text_ner_training.csv, text_ner_test.csv, and text_ner_validation.csv
Training


Unnamed: 0,sample_id,spans,id,text,missed_label,span_shift,wrong_tag,ghost_span,total_errors,galileo_text_length,galileo_language_id,galileo_pii
0,0,"[{""start"":17,""end"":29,""data_error_potential"":0...",0,what movies star bruce willis,0,1,0,3,4,29,en,
1,1,"[{""start"":19,""end"":33,""data_error_potential"":0...",1,show me films with drew barrymore from the 1980s,0,2,0,1,3,48,en,
2,2,"[{""start"":25,""end"":34,""data_error_potential"":0...",2,what movies starred both al pacino and robert ...,0,2,0,2,4,52,en,
3,3,"[{""start"":39,""end"":51,""data_error_potential"":0...",3,find me all of the movies that starred harold ...,0,2,0,4,6,67,en,



Test


Unnamed: 0,sample_id,spans,id,text,missed_label,span_shift,wrong_tag,ghost_span,total_errors,galileo_text_length,galileo_language_id,galileo_pii
0,0,"[{""start"":17,""end"":29,""data_error_potential"":0...",0,what movies star bruce willis,0,1,0,3,4,29,en,
1,1,"[{""start"":19,""end"":33,""data_error_potential"":0...",1,show me films with drew barrymore from the 1980s,0,2,0,1,3,48,en,
2,2,"[{""start"":25,""end"":34,""data_error_potential"":0...",2,what movies starred both al pacino and robert ...,0,2,0,2,4,52,en,
3,3,"[{""start"":39,""end"":51,""data_error_potential"":0...",3,find me all of the movies that starred harold ...,0,2,0,4,6,67,en,



Validation


Unnamed: 0,sample_id,spans,id,text,missed_label,span_shift,wrong_tag,ghost_span,total_errors,galileo_text_length,galileo_language_id,galileo_pii
0,0,"[{""start"":17,""end"":29,""data_error_potential"":0...",0,what movies star bruce willis,0,1,0,3,4,29,en,
1,1,"[{""start"":19,""end"":33,""data_error_potential"":0...",1,show me films with drew barrymore from the 1980s,0,2,0,1,3,48,en,
2,2,"[{""start"":25,""end"":34,""data_error_potential"":0...",2,what movies starred both al pacino and robert ...,0,2,0,2,4,52,en,
3,3,"[{""start"":39,""end"":51,""data_error_potential"":0...",3,find me all of the movies that starred harold ...,0,2,0,4,6,67,en,
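The NER cell passes character-level `text_token_indices` and `gold_spans` to `dq.log_data_samples` and sets the tagging schema to BIO. The sketch below shows one way those two inputs relate, using the first sample from `log_inputs`: it slices the span text out of the sentence and derives a BIO tag per token. The function `spans_to_bio` is hypothetical and is not claimed to be how `dataquality` aligns spans internally.

```python
from typing import Dict, List, Tuple, Union

Span = Dict[str, Union[int, str]]


def spans_to_bio(token_indices: List[Tuple[int, int]], gold_spans: List[Span]) -> List[str]:
    """Tag each token B-/I-<label> if it lies inside a gold span, else O."""
    tags = []
    started = set()  # indices of spans that already received their B- tag
    for tok_start, tok_end in token_indices:
        tag = "O"
        for span_idx, span in enumerate(gold_spans):
            if span["start"] <= tok_start and tok_end <= span["end"]:
                prefix = "I" if span_idx in started else "B"
                started.add(span_idx)
                tag = f"{prefix}-{span['label']}"
                break
        tags.append(tag)
    return tags


# First sample from log_inputs; repeated offsets are sub-word tokens of one word.
text = "what movies star bruce willis"
tokens = [(0, 4), (5, 11), (12, 16), (17, 22), (17, 22), (23, 29), (23, 29)]
gold = [{"start": 17, "end": 29, "label": "ACTOR"}]

span_texts = [text[s["start"]:s["end"]] for s in gold]
bio_tags = spans_to_bio(tokens, gold)

print(span_texts)  # ['bruce willis']
print(bio_tags)    # ['O', 'O', 'O', 'B-ACTOR', 'I-ACTOR', 'I-ACTOR', 'I-ACTOR']
```

Because span boundaries are character offsets, a quick check like `text[start:end]` is a cheap way to confirm that gold spans point at the intended words before logging.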
