## End to end examples logging data to Galileo for Text Classification, MLTC, and NER

### For understanding the client and how to get started, see the [Dataquality Demo](./Dataquality-Client-Demo.ipynb)
### Check out the full documentation [here](https://rungalileo.gitbook.io/galileo/getting-started)
### To see real end-to-end notebooks training real ML models, see [here](https://drive.google.com/drive/folders/17-cHuRzXIpWaD8rYwy69RMQr__HiAiDk?usp=sharing)

In [None]:
## Local

import os

os.environ['GALILEO_CONSOLE_URL']="http://localhost:8088"
os.environ["GALILEO_USERNAME"]="user@example.com"
os.environ["GALILEO_PASSWORD"]="Th3secret_"

In [2]:
import dataquality as dq
dq.configure()

Welcome to Galileo v0.5.4!
To skip this prompt in the future, set the following environment variable: GALILEO_CONSOLE_URL
🔭 Enter the url of your Galileo console
console.dev.rungalileo.io
📡 https://console.dev.rungalileo.io
🔭 Logging you into Galileo

👀 Found auth method email set via env, skipping prompt.
📧 Enter your email:galileo@rungalileo.io
🤫 Enter your password:········
🚀 You're logged in to Galileo as galileo@rungalileo.io!


***Helper function***

In [5]:
from dataquality import config
import pandas as pd
from dataquality.clients.api import ApiClient
from time import sleep


api_client = ApiClient()


def see_results(wait=True, body={}):
    if wait:
        api_client.wait_for_run()

    task_type = dq.config.task_type
    proj = api_client.get_project(config.current_project_id)["name"]
    run = api_client.get_project_run(config.current_project_id, config.current_run_id)["name"]
    api_client.export_run(proj, run, "training", f"{task_type}_training.csv")
    api_client.export_run(proj, run, "test", f"{task_type}_test.csv")
    api_client.export_run(proj, run, "validation", f"{task_type}_validation.csv")
    print(f"Exported to {task_type}_training.csv, {task_type}_test.csv, and {task_type}_validation.csv")
    df_train = pd.read_csv(f"{task_type}_training.csv")
    df_test = pd.read_csv(f"{task_type}_test.csv")
    df_val = pd.read_csv(f"{task_type}_validation.csv")
    print("Training")
    display(df_train)
    print("\nTest")
    display(df_test)
    print("\nValidation")
    display(df_val)
    return df_train, df_test, df_val

## Text Classification

In [None]:
%%time
import dataquality as dq
from tqdm.notebook import tqdm
import time
import numpy as np
from uuid import uuid4
import pandas as pd
from sklearn.datasets import fetch_20newsgroups


dq.login()
dq.init("text_classification", "test-tc-run")


BATCH_SIZE=32
EMB_DIM=768
NUM_EPOCHS=1


newsgroups = fetch_20newsgroups(subset="train", remove=('headers', 'footers', 'quotes'))
dataset = pd.DataFrame()
dataset["text"] = newsgroups.data
label_ind = newsgroups.target_names
dataset["label"] = [label_ind[i] for i in newsgroups.target]
dataset["id"] = list(range(len(dataset)))


def generate_random_embeddings(batch_size: int, emb_dims: int) -> np.ndarray:
    return np.random.rand(batch_size, emb_dims)


def generate_random_probabilities(batch_size: int, num_classes: int) -> np.ndarray:
    probs = np.random.rand(batch_size, num_classes)
    return probs / probs.sum(axis=-1).reshape(-1, 1)  # Normalize to sum to 1


t_start = time.time()
dq.set_labels_for_run(dataset["label"].unique())

print("Logging input data")
for split in ["train", "test", "validation"]:
    dq.log_dataset(dataset, split=split)
    
print("Done")
print(f"Input logging took {time.time() - t_start} seconds\n\n")


print("Logging model outputs")
t_start = time.time()
num_classes = dataset["label"].nunique()
# Simulates model training loop
for epoch_idx in range(NUM_EPOCHS):
    print(f"Epoch {epoch_idx}")
    print('-'*100)
    for split in ["train", "test", "validation"]:
        print(split.capitalize())
        dq.set_split(split)
        for i in tqdm(range(0, len(dataset), BATCH_SIZE)):
            batch = dataset[i : i + BATCH_SIZE]
            embeddings = generate_random_embeddings(len(batch), EMB_DIM)
            probs = generate_random_probabilities(len(batch), num_classes)
            dq.log_model_outputs(
                embs=embeddings,
                probs=probs,
                epoch=epoch_idx,
                ids=batch["id"],
            )
    print('-'*100,end="\n\n")
            
print("Done")

time_spent = time.time() - t_start
print(f"Logging output took {time_spent} seconds")

dq.finish()
df_train, df_test, df_val = see_results()

📡 https://console.dev.rungalileo.io
🔭 Logging you into Galileo

👀 Found auth method email set via env, skipping prompt.
🚀 You're logged in to Galileo as galileo@rungalileo.io!
📡 Retrieved project, test-tc-run, and starting a new run
🏃‍♂️ Starting run complex_black_stingray
🛰 Connected to project, test-tc-run, and created run, complex_black_stingray.
Logging input data
Logging 11314 samples [########################################] 100.00% elapsed time  :     0.02s =  0.0m =  0.0h
Logging 11314 samples [########################################] 100.00% elapsed time  :     0.01s =  0.0m =  0.0h
Logging 11314 samples [########################################] 100.00% elapsed time  :     0.01s =  0.0m =  0.0h
 Done
Input logging took 11.115311861038208 seconds


Logging model outputs
Epoch 0
----------------------------------------------------------------------------------------------------
Train


  0%|          | 0/354 [00:00<?, ?it/s]



Test


  0%|          | 0/354 [00:00<?, ?it/s]

Validation


  0%|          | 0/354 [00:00<?, ?it/s]

----------------------------------------------------------------------------------------------------

Done
Logging output took 7.216957330703735 seconds
☁️ Uploading Data


training:   0%|          | 0/1 [00:00<?, ?it/s]

Processing data for upload:   0%|          | 0/354 [00:00<?, ?it/s]

training (epoch=0):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/66.4M [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/2.04M [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/13.7M [00:00<?, ?B/s]

validation:   0%|          | 0/1 [00:00<?, ?it/s]

Processing data for upload:   0%|          | 0/354 [00:00<?, ?it/s]

validation (epoch=0):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/66.4M [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/2.04M [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/13.7M [00:00<?, ?B/s]

test:   0%|          | 0/1 [00:00<?, ?it/s]

Processing data for upload:   0%|          | 0/354 [00:00<?, ?it/s]

test (epoch=0):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/66.4M [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/2.04M [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/13.6M [00:00<?, ?B/s]

Job default successfully submitted. Results will be available soon at https://console.dev.rungalileo.io/insights?projectId=4a8a50d8-9a50-48c0-9c5b-d3f015b775d3&runId=a1037cad-2e95-4f91-9536-c03fa79551f7&split=training&depHigh=1&depLow=0&taskType=0
Waiting for job...


In [7]:
dq.configure()

Welcome to Galileo v0.5.4a2!
To skip this prompt in the future, set the following environment variable: GALILEO_CONSOLE_URL
🔭 Enter the url of your Galileo console
console.preprod.rungalileo.io
📡 https://console.preprod.rungalileo.io
🔭 Logging you into Galileo

👀 Found auth method email set via env, skipping prompt.
📧 Enter your email:galileo@rungalileo.io
🤫 Enter your password:········
🚀 You're logged in to Galileo as galileo@rungalileo.io!


In [8]:
dq.get_run_status("computer_vision", "fake_example_image_size_32_num_samples_1024000")

{'id': '5c65a4c5-d27c-4fd3-acbf-cf39d05cb7ac',
 'created_at': '2022-09-21T03:32:28.919484',
 'updated_at': '2022-09-22T13:28:36.229386',
 'failed_at': None,
 'completed_at': '2022-09-22T13:28:36.227931',
 'job_name': 'default',
 'migration_name': None,
 'project_id': '5d0dc18a-82ff-4b2b-b453-e361fa3f4a72',
 'run_id': '0d4e4674-544f-4b26-abf1-f605d1e4dd69',
 'status': 'completed',
 'retries': 4,
 'request_data': {'xray': True,
  'tasks': None,
  'labels': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
  'run_id': '0d4e4674-544f-4b26-abf1-f605d1e4dd69',
  'job_name': 'default',
  'task_type': 0,
  'project_id': '5d0dc18a-82ff-4b2b-b453-e361fa3f4a72',
  'migration_name': None,
  'non_inference_logged': False,
  'process_existing_inference_runs': False},
 'error_message': None}

## Multi Label

In [3]:
%%time
from typing import *
import dataquality as dq
from random import choice
import numpy as np

dq.login()
dq.init("text_multi_label", "test-mltc-run")
dq.set_tasks_for_run(['task_0', 'task_1', 'task_2', 'task_3', 'task_4', 'task_5'], binary=False)
dq.set_labels_for_run([["not "+_label, _label] for _label in ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult','identity_hate']]) 


n = 5000

texts: List[str] = [f"text sample {i}" for i in range(n)]

labels: List[str] = [
    [choice(i) for i in dq.get_data_logger().logger_config.labels]
    for _ in range(n)
]

ids = list(range(n))


dq.log_data_samples(texts=texts, task_labels=labels, ids=ids, split="training")
dq.log_data_samples(texts=texts, task_labels=labels, ids=ids, split="test")
dq.log_data_samples(texts=texts, task_labels=labels, ids=ids, split="validation")

for split in ["train", "test", "validation"]:
    for epoch in range(5):
        emb=np.random.rand(n, 768)
        logits=[[np.random.rand(2)] * 6] * n
        ids=list(range(n))
        
        for i in range(0, n, 32):
            dq.log_model_outputs(
                embs=emb[i:i+32],
                logits=logits[i:i+32],
                ids=ids[i:i+32],
                split=split,
                epoch=epoch
            )

dq.finish()
df_train, df_test, df_val = see_results()


📡 https://console.dev.rungalileo.io
🔭 Logging you into Galileo

👀 Found auth method email set via env, skipping prompt.
🚀 You're logged in to Galileo as galileo@rungalileo.io!
📡 Retrieved project, test-mltc-run, and starting a new run
🏃‍♂️ Starting run worrying_blue_tarantula
🛰 Connected to project, test-mltc-run, and created run, worrying_blue_tarantula.
Exporting input data [########################################] 100.00% elapsed time  :     0.00s =  0.0m =  0.0h
Appending input data [########################################] 100.00% elapsed time  :     0.01s =  0.0m =  0.0h
Appending input data [########################################] 100.00% elapsed time  :     0.01s =  0.0m =  0.0h
 ☁️ Uploading Data


training:   0%|          | 0/5 [00:00<?, ?it/s]

Combining batches for upload:   0%|          | 0/157 [00:00<?, ?it/s]

training (epoch=0):   0%|          | 0/3 [00:00<?, ?it/s]

Combining batches for upload:   0%|          | 0/157 [00:00<?, ?it/s]

training (epoch=1):   0%|          | 0/3 [00:00<?, ?it/s]

Combining batches for upload:   0%|          | 0/157 [00:00<?, ?it/s]

training (epoch=2):   0%|          | 0/3 [00:00<?, ?it/s]

Combining batches for upload:   0%|          | 0/157 [00:00<?, ?it/s]

training (epoch=3):   0%|          | 0/3 [00:00<?, ?it/s]

Combining batches for upload:   0%|          | 0/157 [00:00<?, ?it/s]

training (epoch=4):   0%|          | 0/3 [00:00<?, ?it/s]

validation:   0%|          | 0/5 [00:00<?, ?it/s]

Combining batches for upload:   0%|          | 0/157 [00:00<?, ?it/s]

validation (epoch=0):   0%|          | 0/3 [00:00<?, ?it/s]

Combining batches for upload:   0%|          | 0/157 [00:00<?, ?it/s]

validation (epoch=1):   0%|          | 0/3 [00:00<?, ?it/s]

Combining batches for upload:   0%|          | 0/157 [00:00<?, ?it/s]

validation (epoch=2):   0%|          | 0/3 [00:00<?, ?it/s]

Combining batches for upload:   0%|          | 0/157 [00:00<?, ?it/s]

validation (epoch=3):   0%|          | 0/3 [00:00<?, ?it/s]

Combining batches for upload:   0%|          | 0/157 [00:00<?, ?it/s]

validation (epoch=4):   0%|          | 0/3 [00:00<?, ?it/s]

test:   0%|          | 0/5 [00:00<?, ?it/s]

Combining batches for upload:   0%|          | 0/157 [00:00<?, ?it/s]

test (epoch=0):   0%|          | 0/3 [00:00<?, ?it/s]

Combining batches for upload:   0%|          | 0/157 [00:00<?, ?it/s]

test (epoch=1):   0%|          | 0/3 [00:00<?, ?it/s]

Combining batches for upload:   0%|          | 0/157 [00:00<?, ?it/s]

test (epoch=2):   0%|          | 0/3 [00:00<?, ?it/s]

Combining batches for upload:   0%|          | 0/157 [00:00<?, ?it/s]

test (epoch=3):   0%|          | 0/3 [00:00<?, ?it/s]

Combining batches for upload:   0%|          | 0/157 [00:00<?, ?it/s]

test (epoch=4):   0%|          | 0/3 [00:00<?, ?it/s]

🧹 Cleaning up
Job default successfully submitted. Results will be available soon at https://console.dev.rungalileo.io/insights?projectId=d172b38a-3170-4bfa-a88d-0f0cdeb613ef&runId=ac4a2aee-b3c4-4d7d-9263-a37949c99cac&split=training&depHigh=1&depLow=0&taskType=1
Waiting for job...
Done! Job finished with status completed
Waiting for job...
Done! Job finished with status completed
Exported to text_multi_label_training.csv, text_multi_label_test.csv, and text_multi_label_validation.csv
Training


Unnamed: 0,epoch,pred_task_0,pred_task_1,pred_task_2,pred_task_3,pred_task_4,pred_task_5,text,split,data_schema_version,...,likely_mislabeled_3,likely_mislabeled_4,likely_mislabeled_5,x,y,pred,gold,data_error_potential,confidence,likely_mislabeled
0,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 0,training,1,...,True,False,True,1.786621,4.558834,0,0,0.48071,0.566915,False
1,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 1,training,1,...,True,False,True,7.350076,3.551716,0,0,0.48071,0.566915,False
2,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 2,training,1,...,True,True,False,5.624312,5.730748,0,1,0.51929,0.566915,True
3,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 3,training,1,...,False,True,True,5.397110,0.678771,0,0,0.48071,0.566915,False
4,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 4,training,1,...,False,False,True,2.005091,5.033963,0,0,0.48071,0.566915,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
780,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 4992,training,1,...,True,False,True,7.128171,1.930935,0,0,0.48071,0.566915,False
781,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 4993,training,1,...,False,True,False,6.632451,5.065577,0,1,0.51929,0.566915,True
782,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 4994,training,1,...,False,True,False,7.229775,2.224858,0,1,0.51929,0.566915,False
783,4,not toxic,not severe_toxic,not obscene,not threat,not insult,not identity_hate,text sample 4995,training,1,...,False,False,False,1.886498,1.698276,0,0,0.48071,0.566915,False



Test


Unnamed: 0,epoch,pred_task_0,pred_task_1,pred_task_2,pred_task_3,pred_task_4,pred_task_5,text,split,data_schema_version,...,likely_mislabeled_3,likely_mislabeled_4,likely_mislabeled_5,x,y,pred,gold,data_error_potential,confidence,likely_mislabeled
0,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 0,test,1,...,False,True,False,4.474843,8.008639,1,0,0.460010,0.580129,True
1,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 1,test,1,...,False,True,False,2.741239,6.963940,1,0,0.460010,0.580129,True
2,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 2,test,1,...,False,False,True,2.317329,5.569234,1,1,0.539991,0.580129,False
3,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 3,test,1,...,True,False,False,2.581812,6.093951,1,0,0.460010,0.580129,True
4,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4,test,1,...,True,True,False,2.395960,4.643307,1,0,0.460010,0.580129,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
780,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4992,test,1,...,False,True,False,8.050829,6.160383,1,0,0.460010,0.580129,True
781,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4993,test,1,...,True,False,False,6.934625,2.536875,1,1,0.539991,0.580129,False
782,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4994,test,1,...,True,False,True,2.475639,4.034199,1,1,0.539991,0.580129,False
783,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4995,test,1,...,False,False,True,6.956821,2.610488,1,0,0.460010,0.580129,True



Validation


Unnamed: 0,epoch,pred_task_0,pred_task_1,pred_task_2,pred_task_3,pred_task_4,pred_task_5,text,split,data_schema_version,...,likely_mislabeled_3,likely_mislabeled_4,likely_mislabeled_5,x,y,pred,gold,data_error_potential,confidence,likely_mislabeled
0,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 0,validation,1,...,False,True,False,5.149038,1.502900,1,0,0.506017,0.578045,True
1,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 1,validation,1,...,False,True,False,6.208752,1.534972,1,0,0.506017,0.578045,True
2,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 2,validation,1,...,False,False,True,5.541367,1.950478,1,1,0.493983,0.578045,False
3,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 3,validation,1,...,True,False,False,3.018355,6.921872,1,0,0.506017,0.578045,True
4,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4,validation,1,...,True,True,False,7.786403,3.463464,1,0,0.506017,0.578045,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
780,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4992,validation,1,...,False,True,False,5.057399,1.465297,1,0,0.506017,0.578045,True
781,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4993,validation,1,...,True,False,False,7.199044,5.597103,1,1,0.493983,0.578045,False
782,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4994,validation,1,...,True,False,True,2.195957,3.063439,1,1,0.493983,0.578045,False
783,4,toxic,severe_toxic,obscene,threat,insult,identity_hate,text sample 4995,validation,1,...,False,False,True,3.256423,5.456805,1,0,0.506017,0.578045,True


## NER

In [3]:
from dataquality.schemas.task_type import TaskType
from dataquality import config 
from uuid import uuid4
import numpy as np
from time import sleep
from tqdm.notebook import tqdm


dq.init("text_ner", "test-ner-run")


def log_inputs():
    text_inputs = ['what movies star bruce willis', 'show me films with drew barrymore from the 1980s', 'what movies starred both al pacino and robert deniro', 'find me all of the movies that starred harold ramis and bill murray', 'find me a movie with a quote about baseball in it']
    tokens = [[(0, 4), (5, 11), (12, 16), (17, 22), (17, 22), (23, 29), (23, 29)], [(0, 4), (5, 7), (8, 13), (14, 18), (19, 23), (24, 33), (24, 33), (24, 33), (34, 38), (39, 42), (43, 48)], [(0, 4), (5, 11), (12, 19), (20, 24), (25, 27), (28, 34), (28, 34), (28, 34), (35, 38), (39, 45), (39, 45), (46, 52), (46, 52)], [(0, 4), (5, 7), (8, 11), (12, 14), (15, 18), (19, 25), (26, 30), (31, 38), (39, 45), (39, 45), (39, 45), (46, 51), (46, 51), (52, 55), (56, 60), (61, 67), (61, 67), (61, 67)], [(0, 4), (5, 7), (8, 9), (10, 15), (16, 20), (21, 22), (23, 28), (29, 34), (35, 43), (44, 46), (47, 49)]]
    gold_spans = [[{'start': 17, 'end': 29, 'label': 'ACTOR'}], [{'start': 19, 'end': 33, 'label': 'ACTOR'}, {'start': 43, 'end': 48, 'label': 'YEAR'}], [{'start': 25, 'end': 34, 'label': 'ACTOR'}, {'start': 39, 'end': 52, 'label': 'ACTOR'}], [{'start': 39, 'end': 51, 'label': 'ACTOR'}, {'start': 56, 'end': 67, 'label': 'ACTOR'}], []]
    ids = [0, 1, 2, 3, 4]

    labels = ['[PAD]', '[CLS]', '[SEP]', 'O', 'B-ACTOR', 'I-ACTOR', 'B-YEAR', 'B-TITLE', 'B-GENRE', 'I-GENRE', 'B-DIRECTOR', 'I-DIRECTOR', 'B-SONG', 'I-SONG', 'B-PLOT', 'I-PLOT', 'B-REVIEW', 'B-CHARACTER', 'I-CHARACTER', 'B-RATING', 'B-RATINGS_AVERAGE', 'I-RATINGS_AVERAGE', 'I-TITLE', 'I-RATING', 'B-TRAILER', 'I-TRAILER', 'I-REVIEW', 'I-YEAR']
    dq.set_labels_for_run(labels)
    dq.set_tagging_schema("BIO")
    dq.log_data_samples(texts=text_inputs, text_token_indices=tokens, ids=ids, gold_spans=gold_spans, split="training")
    dq.log_data_samples(texts=text_inputs, text_token_indices=tokens, ids=ids, gold_spans=gold_spans, split="validation")
    dq.log_data_samples(texts=text_inputs, text_token_indices=tokens, ids=ids, gold_spans=gold_spans, split="test")

def log_outputs():
    num_classes = 28
    embs = [np.random.rand(119, 768) for _ in range(5)]
    logits= [np.random.rand(119, 28) for _ in range(5)]                                      
    ids= list(range(5))
    for epoch in tqdm(range(6)):
        for split in ["training", "test", "validation"]:
            dq.log_model_outputs(
                embs=embs, logits=logits, ids=ids, split=split, epoch=epoch
            )
    
def finish():
    dq.finish()
    
    
def runit():
    log_inputs()
    log_outputs()
    finish()
    
runit()
df_train, df_test, df_val = see_results()

📡 Retrieved project, test-ner-run, and starting a new run
🏃‍♂️ Starting run soft_scarlet_wallaby
🛰 Connected to project, test-ner-run, and created run, soft_scarlet_wallaby.
Logging 5 samples [########################################] 100.00% elapsed time  :     0.00s =  0.0m =  0.0h
Logging 5 samples [########################################] 100.00% elapsed time  :     0.00s =  0.0m =  0.0h
Logging 5 samples [########################################] 100.00% elapsed time  :     0.00s =  0.0m =  0.0h
 

  0%|          | 0/6 [00:00<?, ?it/s]

☁️ Uploading Data


training:   0%|          | 0/6 [00:00<?, ?it/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

training (epoch=0):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/28.3k [00:00<?, ?B/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

training (epoch=1):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/28.3k [00:00<?, ?B/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

training (epoch=2):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/28.3k [00:00<?, ?B/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

training (epoch=3):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/28.3k [00:00<?, ?B/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

training (epoch=4):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/229k [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/28.1k [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/2.16k [00:00<?, ?B/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

training (epoch=5):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/229k [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/28.1k [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/2.16k [00:00<?, ?B/s]

validation:   0%|          | 0/6 [00:00<?, ?it/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

validation (epoch=0):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/28.3k [00:00<?, ?B/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

validation (epoch=1):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/28.3k [00:00<?, ?B/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

validation (epoch=2):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/28.3k [00:00<?, ?B/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

validation (epoch=3):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/28.3k [00:00<?, ?B/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

validation (epoch=4):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/229k [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/28.1k [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/2.18k [00:00<?, ?B/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

validation (epoch=5):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/229k [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/28.1k [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/2.18k [00:00<?, ?B/s]

test:   0%|          | 0/6 [00:00<?, ?it/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

test (epoch=0):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/28.3k [00:00<?, ?B/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

test (epoch=1):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/28.3k [00:00<?, ?B/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

test (epoch=2):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/28.3k [00:00<?, ?B/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

test (epoch=3):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/28.3k [00:00<?, ?B/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

test (epoch=4):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/229k [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/28.1k [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/2.15k [00:00<?, ?B/s]

Processing data for upload:   0%|          | 0/1 [00:00<?, ?it/s]

test (epoch=5):   0%|          | 0/3 [00:00<?, ?it/s]

Uploading data to Galileo:   0%|          | 0.00/229k [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/28.1k [00:00<?, ?B/s]

Uploading data to Galileo:   0%|          | 0.00/2.15k [00:00<?, ?B/s]

Job default successfully submitted. Results will be available soon at https://console.dev.rungalileo.io/insights?projectId=9683fbaa-5f4b-471d-9c68-40095c543d5c&runId=359b481f-e2fb-454b-a69a-de0ff9746ccf&split=training&depHigh=1&depLow=0&taskType=2
Waiting for job...
Done! Job finished with status completed
🧹 Cleaning up
Waiting for job...
Done! Job finished with status completed
Exported to text_ner_training.csv, text_ner_test.csv, and text_ner_validation.csv
Training


Unnamed: 0,sample_id,spans,id,text,missed_label,span_shift,wrong_tag,ghost_span,total_errors,galileo_text_length,galileo_language_id,galileo_pii
0,0,"[{""start"":17,""end"":29,""data_error_potential"":0...",0,what movies star bruce willis,0,1,0,0,1,29,en,
1,1,"[{""start"":19,""end"":33,""data_error_potential"":0...",1,show me films with drew barrymore from the 1980s,0,1,0,4,5,48,en,
2,2,"[{""start"":25,""end"":34,""data_error_potential"":0...",2,what movies starred both al pacino and robert ...,0,2,0,3,5,52,en,
3,3,"[{""start"":39,""end"":51,""data_error_potential"":0...",3,find me all of the movies that starred harold ...,0,2,0,5,7,67,en,



Test


Unnamed: 0,sample_id,spans,id,text,missed_label,span_shift,wrong_tag,ghost_span,total_errors,galileo_text_length,galileo_language_id,galileo_pii
0,0,"[{""start"":17,""end"":29,""data_error_potential"":0...",0,what movies star bruce willis,0,1,0,0,1,29,en,
1,1,"[{""start"":19,""end"":33,""data_error_potential"":0...",1,show me films with drew barrymore from the 1980s,0,1,0,4,5,48,en,
2,2,"[{""start"":25,""end"":34,""data_error_potential"":0...",2,what movies starred both al pacino and robert ...,0,2,0,3,5,52,en,
3,3,"[{""start"":39,""end"":51,""data_error_potential"":0...",3,find me all of the movies that starred harold ...,0,2,0,5,7,67,en,



Validation


Unnamed: 0,sample_id,spans,id,text,missed_label,span_shift,wrong_tag,ghost_span,total_errors,galileo_text_length,galileo_language_id,galileo_pii
0,0,"[{""start"":17,""end"":29,""data_error_potential"":0...",0,what movies star bruce willis,0,1,0,0,1,29,en,
1,1,"[{""start"":19,""end"":33,""data_error_potential"":0...",1,show me films with drew barrymore from the 1980s,0,1,0,4,5,48,en,
2,2,"[{""start"":25,""end"":34,""data_error_potential"":0...",2,what movies starred both al pacino and robert ...,0,2,0,3,5,52,en,
3,3,"[{""start"":39,""end"":51,""data_error_potential"":0...",3,find me all of the movies that starred harold ...,0,2,0,5,7,67,en,


In [4]:
dq.metrics.api_client.get_project_run("9683fbaa-5f4b-471d-9c68-40095c543d5c","c3b328c2-80a6-47a6-abc9-d4fa2ed9f80c" )

{'name': 'youngest_silver_wolf',
 'project_id': '9683fbaa-5f4b-471d-9c68-40095c543d5c',
 'created_by': '4dac718f-e33a-4351-8d7a-9a122afe19f3',
 'id': 'c3b328c2-80a6-47a6-abc9-d4fa2ed9f80c',
 'created_at': '2022-09-22T17:55:03.735385',
 'updated_at': '2022-09-22T17:56:11.151571',
 'task_type': 2,
 'num_samples': 15,
 'is_example': False,
 'creator': {'email': 'galileo@rungalileo.io',
  'name': '',
  'auth_method': 'email',
  'is_admin': True,
  'org': '',
  'role': '',
  'interests': [],
  'tasks': [],
  'show_welcome_modal': False,
  'id': '4dac718f-e33a-4351-8d7a-9a122afe19f3',
  'created_at': '2022-04-18T21:14:05.315150',
  'updated_at': '2022-09-19T22:29:12.906941'},
 'example_content_id': None}

In [17]:
dq.metrics.get_embeddings("test-ner-run", "youngest_silver_wolf", "train", epoch=0)

GalileoException: Only the last 2 epochs of embeddings are available. Must request 5 or 4

In [7]:
dq.metrics.get_epochs("test-ner-run", "youngest_silver_wolf", "train")

[0, 1, 2, 3, 4, 5]