# GPT2 Model

This notebook fine-tunes GPT-2 Medium to translate transliterated Sumerian cuneiform text into English. Its key components are:

- **Model Setup:** Loads GPT-2 Medium and its tokenizer, and reuses the end-of-sequence token as the padding token because GPT-2 does not define one.
- **Data Preparation:** Loads the SumTablets training CSV and formats each example as `Sumerian: <transliteration>\nEnglish: <translation><|endoftext|>`.
- **Dataset Processing:**
  - Filters out overly long examples (more than 528 whitespace-separated words)
  - Implements a custom `SumerianEnglishDataset` class that tokenizes, truncates, and pads each example to 528 tokens
  - Splits the data into training and validation sets (90/10)
- **Training Configuration:**
  - Uses DataCollatorForLanguageModeling for causal language modeling
  - Implements gradient checkpointing and mixed precision for memory efficiency
  - Configures output directories for model checkpoints and logs
- **Evaluation Metrics:** Implements custom evaluation with BLEU, METEOR, and ROUGE scores to measure translation quality.

In [None]:
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import numpy as np
import pandas as pd

import torch
from torch.utils.data import Dataset, DataLoader, random_split
from transformers import (
    GPT2Tokenizer,
    GPT2LMHeadModel,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling
)

In [None]:
model_name = 'gpt2-medium'
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

model = GPT2LMHeadModel.from_pretrained(model_name)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = model.config.eos_token_id

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

In [None]:
# Quick sanity check that the pretrained model generates text
# (pad token, device, and model placement were already set up in the previous cell)
prompt_text = "Once upon a time, in a land far, far away,"

input_ids = tokenizer.encode(prompt_text, return_tensors='pt').to(device)

print("Generating text...")
try:
    output_sequences = model.generate(
        input_ids=input_ids,
        max_length=150,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.95,
        repetition_penalty=1.1,
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id,
        attention_mask=torch.ones_like(input_ids)  # Add explicit attention mask
    )

    for i, generated_sequence in enumerate(output_sequences):
        text = tokenizer.decode(generated_sequence, skip_special_tokens=True)
        print(f"\n--- Generated text {i+1} ---")
        print(text)

except Exception as e:
    print(f"Error when generating text: {e}")

Model loaded on: cuda
Generating text...

--- Generated text 1 ---
Once upon a time, in a land far, far away, there lived a wise man named Arathorn the Wise. A king of many faces and great stature with long hair on his head, he was known to everyone as "the Great Sage."
From that day forth, all who knew him were called gods by those around them. Even when they grew tired from their labors or lost power for any reason at all...they still kept up appearances while taking care not look like anything else! That's why you always see so much people wearing these masks today!"
"And then what happened?" I asked her gently. The voice sounded deep within my mind but it wasn't very clear…did she know? She must have felt something


In [7]:
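train_data = pd.read_csv('datasets/SumTablets_English_train.csv')
# NOTE: test_data is read from the same training CSV, so the "test" examples used
# later were seen during fine-tuning; point this at a held-out file if one is available.
test_data = pd.read_csv('datasets/SumTablets_English_train.csv')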

# Format the data for GPT-2:
# We'll combine Sumerian and English with a separator.
# GPT-2 will learn to generate the English part after seeing "English: ".
# The <|endoftext|> token is GPT-2's standard end-of-sequence token.
formatted_texts = []
for index, row in train_data.iterrows():
    sumerian_texts = row['transliteration']
    english_translations = row['translation']
    if isinstance(sumerian_texts, str) and isinstance(english_translations, str):
        sumerian_texts = sumerian_texts.replace('\n', ' ')
        english_translations = english_translations.replace('\n', ' ')
        formatted_texts.append(f"Sumerian: {sumerian_texts}\nEnglish: {english_translations}<|endoftext|>")
print(f"Loaded {len(formatted_texts)} formatted examples.")

lengths = [len(text.split()) for text in formatted_texts]
print(lengths)
mean_length = np.mean(lengths)
print(f"Mean length of the texts: {mean_length} words")
print(f"Percentage of texts longer than 528 words: {sum(length > 528 for length in lengths) / len(lengths) * 100:.2f}%")

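# Remove texts longer than 528 whitespace-separated words.
# This is a word count, not a GPT-2 token count, so the tokenizer may still truncate some examples.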
formatted_texts = [text for text in formatted_texts if len(text.split()) <= 528]
print(len(formatted_texts), "texts after filtering by length.")

print(f"\nExample formatted text:\n{formatted_texts[0]}")

Loaded 1905 formatted examples.
[45, 51, 47, 39, 55, 73, 148, 23, 49, 72, 52, 45, 125, 82, 78, 223, 96, 57, 58, 49, 48, 55, 38, 52, 107, 97, 31, 27, 24, 48, 75, 116, 121, 39, 51, 15, 41, 65, 79, 52, 176, 46, 39, 41, 36, 35, 120, 184, 77, 63, 38, 106, 40, 28, 17, 42, 21, 47, 124, 224, 90, 92, 87, 57, 262, 66, 67, 63, 57, 31, 45, 51, 72, 44, 49, 78, 52, 78, 79, 138, 49, 102, 847, 630, 490, 368, 409, 77, 410, 264, 696, 184, 62, 338, 454, 1560, 9, 9, 11, 10, 12, 13, 16, 11, 14, 12, 11, 9, 9, 7, 11, 11, 16, 9, 22, 12, 12, 26, 30, 9, 16, 12, 11, 11, 11, 7, 9, 8, 11, 13, 20, 25, 11, 11, 27, 11, 11, 11, 12, 9, 22, 9, 24, 6, 11, 15, 31, 32, 31, 30, 30, 26, 33, 41, 37, 31, 35, 31, 38, 26, 29, 26, 17, 23, 27, 36, 39, 41, 35, 37, 25, 29, 31, 31, 32, 30, 32, 26, 33, 41, 33, 31, 31, 30, 29, 32, 32, 37, 29, 32, 32, 28, 32, 33, 31, 28, 26, 32, 32, 32, 32, 33, 30, 33, 32, 32, 28, 28, 39, 41, 39, 37, 31, 35, 31, 36, 31, 36, 31, 35, 34, 36, 34, 36, 39, 36, 36, 43, 34, 39, 36, 36, 36, 39, 36, 36, 39, 36, 
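The prompt used at inference time has to match the training format character for character; otherwise the model is asked to continue a prefix it never saw during fine-tuning. The following is a minimal sketch of one way to keep the two in sync. The helpers `build_prompt` and `build_training_text` are hypothetical and not part of this notebook, and unlike the cell above they collapse all runs of whitespace rather than only replacing newlines.

```python
# Hypothetical helpers (not in the original notebook) that share one template,
# so the inference prompt is always an exact prefix of the training text.
EOS_TOKEN = "<|endoftext|>"  # GPT-2's end-of-sequence marker


def build_prompt(sumerian: str) -> str:
    """Return the text the model sees before it starts generating English."""
    flat = " ".join(sumerian.split())  # collapse newlines and repeated spaces
    return f"Sumerian: {flat}\nEnglish:"


def build_training_text(sumerian: str, english: str) -> str:
    """Return a full training example: prompt, translation, end marker."""
    flat = " ".join(english.split())
    return f"{build_prompt(sumerian)} {flat}{EOS_TOKEN}"


example = build_training_text("dingir inana\nza-me-en", "Goddess Inana, you are")
print(example)
```

Because the training text is built on top of `build_prompt`, a change to the template cannot make the two formats drift apart.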

In [None]:
MODEL_NAME = 'gpt2-medium'
OUTPUT_DIR = './sumerian_gpt2_finetuned'    # Directory to save the fine-tuned model
LOG_DIR = './sumerian_gpt2_finetuned_logs'  # Directory for training logs

# Create output and log directories if they don't exist
os.makedirs(OUTPUT_DIR, exist_ok=True)
os.makedirs(LOG_DIR, exist_ok=True)

# Training hyperparameters (adjust these based on your dataset size and resources)
NUM_EPOCHS = 5                         # Number of training epochs
LEARNING_RATE = 3e-5                   # Learning rate
WARMUP_RATIO = 0.1                     # Fraction of total training steps used for learning-rate warmup
WEIGHT_DECAY = 0.01                    # Weight decay
MAX_LENGTH = 528                       # Maximum sequence length in tokens (the earlier filter counts words)
TRAIN_VALID_SPLIT = 0.1                # Proportion of data to use for validation


class SumerianEnglishDataset(Dataset):
    def __init__(self, texts, tokenizer, max_length):
        self.tokenizer = tokenizer
        self.texts = texts
        self.max_length = max_length
        self.encodings = []
        for text in texts:
            
            # Tokenize the combined text
            # truncation=True ensures that sequences longer than max_length are cut.
            # padding='max_length' pads shorter sequences to max_length.
            # return_tensors='pt' returns PyTorch tensors.
            encoding = self.tokenizer(
                text,
                truncation=True,
                max_length=self.max_length,     # Truncate to max_length
                padding="max_length",           # Ensure all sequences have the same length for batching
                return_attention_mask=True,     # Return attention masks
                return_tensors='pt'             # Explicitly specify to return PyTorch tensors
            )
            
            # For language modeling, the 'labels' are typically the same as 'input_ids'.
            self.encodings.append({
                "input_ids": encoding["input_ids"].squeeze(),
                "attention_mask": encoding["attention_mask"].squeeze()
            })

    def __len__(self):
        return len(self.encodings)

    def __getitem__(self, idx):
        item = self.encodings[idx]
        # The labels start as a copy of input_ids; the model shifts them internally when it computes the loss.
        # DataCollatorForLanguageModeling (mlm=False) does not shift: it rebuilds labels from input_ids and sets padding positions to -100.
        return {"input_ids": item["input_ids"], "attention_mask": item["attention_mask"], "labels": item["input_ids"].clone()}

# Create the full dataset
full_dataset = SumerianEnglishDataset(formatted_texts, tokenizer, MAX_LENGTH)

# Split into training and validation sets
if TRAIN_VALID_SPLIT > 0:
    num_train = int((1 - TRAIN_VALID_SPLIT) * len(full_dataset))
    num_valid = len(full_dataset) - num_train
    train_dataset, eval_dataset = random_split(full_dataset, [num_train, num_valid])
    print(f"Split dataset into {len(train_dataset)} training samples and {len(eval_dataset)} validation samples.")
else:
    train_dataset = full_dataset
    eval_dataset = None     # No validation
    print(f"Using all {len(train_dataset)} samples for training. No validation set.")

Split dataset into 1662 training samples and 185 validation samples.
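One consequence of reusing the EOS token as the pad token: if labels are masked wherever `input_ids == pad_token_id` (which is how `DataCollatorForLanguageModeling` treats padding when `mlm=False`), the real `<|endoftext|>` that closes each example is masked too, and the model gets no training signal for when to stop. The sketch below builds labels from the attention mask instead, so only true padding is ignored. It uses plain lists for clarity, and `mask_padding_labels` is a hypothetical helper that is not part of this notebook.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default
EOS_ID = 50256       # GPT-2's <|endoftext|> id, reused as the pad id in this notebook


def mask_padding_labels(input_ids, attention_mask):
    """Copy input_ids into labels, ignoring only positions the mask marks as padding."""
    return [
        token if keep == 1 else IGNORE_INDEX
        for token, keep in zip(input_ids, attention_mask)
    ]


# Toy example: three real tokens, the real EOS, then two padding positions.
input_ids = [318, 257, 1332, EOS_ID, EOS_ID, EOS_ID]
attention_mask = [1, 1, 1, 1, 0, 0]

labels = mask_padding_labels(input_ids, attention_mask)
print(labels)
```

Labels built this way would need a collator that leaves the `labels` field untouched (for example `default_data_collator` from `transformers`); whether that trade-off is worthwhile here is an assumption to verify on the validation set.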


In [None]:
# The debug messages below were useful during development and are kept for troubleshooting.

from evaluate import load
import numpy as np

# Load the evaluation metrics
bleu_metric = load("bleu")
meteor_metric = load("meteor")
rouge_metric = load("rouge")

def compute_metrics(eval_preds):
    preds, labels = eval_preds

    print(f"DEBUG: Initial type of preds: {type(preds)}")
    if hasattr(preds, 'shape'):
        print(f"DEBUG: Initial shape of preds: {preds.shape}")
    elif isinstance(preds, (list, tuple)):
        print(f"DEBUG: Initial length of preds: {len(preds)}")

    actual_token_ids = preds

    # Case 1: If preds is a tuple, the first element holds the logits (or token IDs).
    if isinstance(preds, tuple):
        print("DEBUG: preds is a tuple, taking the first element.")
        actual_token_ids = preds[0]

    print(f"DEBUG: Type of actual_token_ids after the tuple check: {type(actual_token_ids)}")
    if hasattr(actual_token_ids, 'shape'):
        print(f"DEBUG: Shape of actual_token_ids after the tuple check: {actual_token_ids.shape}")


    # Case 2: A 3-D array or tensor (batch, sequence, vocabulary) holds logits, so take the argmax.
    # These are teacher-forced next-token predictions, not free-running generations.
    if isinstance(actual_token_ids, (np.ndarray, torch.Tensor)) and actual_token_ids.ndim == 3:
        print("DEBUG: actual_token_ids look like logits, applying argmax.")
        if isinstance(actual_token_ids, torch.Tensor):
            actual_token_ids = actual_token_ids.cpu().numpy()
        actual_token_ids = np.argmax(actual_token_ids, axis=-1)
        print(f"DEBUG: Shape of actual_token_ids after argmax: {actual_token_ids.shape}")

    # Replace -100 labels with pad_token_id so they can be decoded
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)

    try:
        decoded_preds = tokenizer.batch_decode(actual_token_ids, skip_special_tokens=True)
        decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    except Exception:
        print("ERROR during tokenizer.batch_decode:")
        print(f"  Type of actual_token_ids: {type(actual_token_ids)}")
        if hasattr(actual_token_ids, 'shape'): print(f"  Shape of actual_token_ids: {actual_token_ids.shape}")
        if hasattr(actual_token_ids, 'dtype'): print(f"  Dtype of actual_token_ids: {actual_token_ids.dtype}")
        print(f"  Example element of actual_token_ids (if list/array): {actual_token_ids[0] if len(actual_token_ids)>0 else 'N/A'}")
        raise


    # Keep only the text after the last "English:" marker and drop any end-of-text token
    cleaned_preds = [pred.split("English:")[-1].replace("<|endoftext|>", "").strip() if "English:" in pred else pred.replace("<|endoftext|>", "").strip() for pred in decoded_preds]
    cleaned_labels = [label.split("English:")[-1].replace("<|endoftext|>", "").strip() if "English:" in label else label.replace("<|endoftext|>", "").strip() for label in decoded_labels]

    # BLEU expects a list of reference lists (one list per prediction)
    list_of_lists_labels = [[label] for label in cleaned_labels]
    results = {}

    try:
        bleu_score_dict = bleu_metric.compute(predictions=cleaned_preds, references=list_of_lists_labels)
        results["bleu"] = bleu_score_dict.get("score", bleu_score_dict.get("bleu", 0.0))

        meteor_score_dict = meteor_metric.compute(predictions=cleaned_preds, references=cleaned_labels)
        results["meteor"] = meteor_score_dict["meteor"]

        rouge_score_dict = rouge_metric.compute(predictions=cleaned_preds, references=cleaned_labels)
        results["rougeL"] = rouge_score_dict.get("rougeLsum", rouge_score_dict.get("rougeL", 0.0))
    except Exception as e:
        print(f"WARNING: Error while computing a metric: {e}")
        print(f"  cleaned_preds (first 2): {cleaned_preds[:2]}")
        print(f"  list_of_lists_labels (first 2): {list_of_lists_labels[:2]}")
        # Fall back to default values if a metric fails so that evaluation can continue
        results["bleu"] = results.get("bleu", 0.0)
        results["meteor"] = results.get("meteor", 0.0)
        results["rougeL"] = results.get("rougeL", 0.0)

    try:
        prediction_lens = [len(tokenizer.encode(p, add_special_tokens=False)) for p in cleaned_preds]
        results["gen_len"] = np.mean(prediction_lens) if prediction_lens else 0.0
    except Exception as e:
        print(f"WARNING: Error while computing gen_len: {e}")
        results["gen_len"] = 0.0

    return {k: round(v, 4) if isinstance(v, float) else v for k, v in results.items()}

[nltk_data] Downloading package wordnet to /home/default/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     /home/default/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /home/default/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
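Taking the argmax over logits in `compute_metrics` yields teacher-forced next-token guesses: the prediction at position t is the model's guess for the token at position t+1. Any token-level comparison therefore has to shift one of the two sequences by one position. The sketch below shows that alignment on plain lists; `token_accuracy` is a hypothetical helper and not one of the metrics this notebook reports.

```python
IGNORE_INDEX = -100  # label value used to mark positions that should not be scored


def token_accuracy(pred_ids, label_ids):
    """Next-token accuracy: pred_ids[t] is compared with label_ids[t + 1]."""
    shifted_preds = pred_ids[:-1]   # the last prediction has no target
    shifted_labels = label_ids[1:]  # the first token is never predicted
    pairs = [
        (pred, label)
        for pred, label in zip(shifted_preds, shifted_labels)
        if label != IGNORE_INDEX
    ]
    if not pairs:
        return 0.0
    return sum(pred == label for pred, label in pairs) / len(pairs)


labels = [10, 11, 12, 13, -100, -100]
preds = [11, 12, 99, 7, 7, 7]
print(token_accuracy(preds, labels))
```

Text-level scores such as BLEU are less sensitive to this offset because they compare n-grams rather than positions, but they still measure teacher-forced output rather than what `generate()` would produce.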


In [None]:
# Set the pad_token_id in the model configuration (important for generation and padding)
model.config.pad_token_id = tokenizer.pad_token_id
print(f"Set model.config.pad_token_id to {tokenizer.pad_token_id}")

# DataCollatorForLanguageModeling batches the examples (padding them if needed) and, with mlm=False,
# builds the labels as a copy of input_ids with padding positions set to -100.
# The next-token shift for Causal Language Modeling (CLM) happens inside the model's loss computation, not in the collator.
# `mlm=False` selects CLM rather than Masked Language Modeling (MLM).
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False  # Causal Language Modeling for GPT-2
)


training_args = TrainingArguments(
    num_train_epochs=NUM_EPOCHS,                        # Total number of training epochs
    per_device_train_batch_size=2,                      # Batch size per device during training
    per_device_eval_batch_size=2,                       # Batch size for evaluation
    eval_accumulation_steps=4,                          # Move accumulated predictions to the CPU every 4 eval steps (saves GPU memory)
    warmup_ratio=WARMUP_RATIO,                          # Warmup ratio for learning rate scheduler
    weight_decay=WEIGHT_DECAY,                          # Strength of weight decay
    learning_rate=LEARNING_RATE,
    gradient_checkpointing=True,                        # Enable gradient checkpointing to save memory

    output_dir=OUTPUT_DIR,                              # Directory to save model checkpoints and outputs
    logging_dir=LOG_DIR,                                # Directory for storing logs
    
    eval_strategy="epoch" if eval_dataset else "no",    # Evaluate at the end of each epoch if eval_dataset exists
    save_strategy="epoch",                              # Save a checkpoint at the end of each epoch
    
    load_best_model_at_end=True if eval_dataset else False, # Load the best checkpoint at the end, ranked by metric_for_best_model
    metric_for_best_model="bleu" if eval_dataset else None, # Metric to use for determining the best model
    greater_is_better=True if eval_dataset else None,   # Whether a higher metric is better (for BLEU, it is)
    fp16=torch.cuda.is_available(),                     # Use 16-bit (mixed) precision training if a GPU is available
    report_to="tensorboard",                            # Report metrics to TensorBoard
    save_total_limit=2,                                 # Limit the total amount of checkpoints. Deletes the older checkpoints.
    
    gradient_accumulation_steps=2,                      # Gradient accumulation steps (if you want to simulate larger batch sizes)
    lr_scheduler_type="linear",                         # Learning rate scheduler type
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,  # Function to compute metrics during evaluation
)

print("Starting fine-tuning...")
try:
    trainer.train()
    print("Fine-tuning completed.")

except Exception as e:
    print(f"An error occurred during training: {e}")
    raise e

print(f"Saving model to {OUTPUT_DIR}")
trainer.save_model(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
print(f"Model and tokenizer saved to {OUTPUT_DIR}")
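With a per-device batch size of 2 and 2 gradient accumulation steps, each optimizer update sees 4 examples, and `WARMUP_RATIO` is applied to the total number of updates. The sketch below works through that arithmetic for the 1662 training samples reported above. `estimate_schedule` is a hypothetical helper, a single device is assumed, and the Trainer's own rounding may differ by a step or two.

```python
import math


def estimate_schedule(num_samples, per_device_batch, grad_accum, epochs,
                      warmup_ratio, num_devices=1):
    """Rough estimate of optimizer steps; not the Trainer's exact bookkeeping."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = round(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


plan = estimate_schedule(
    num_samples=1662,      # training samples after the 90/10 split
    per_device_batch=2,    # per_device_train_batch_size
    grad_accum=2,          # gradient_accumulation_steps
    epochs=5,              # NUM_EPOCHS
    warmup_ratio=0.1,      # WARMUP_RATIO
)
print(plan)
```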

In [None]:
# Inference Example (How to use the fine-tuned model)
print("\n--- Inference Example ---")

OUTPUT_DIR = 'sumerian_gpt2_finetuned'  # Directory where the fine-tuned model is saved
MAX_LENGTH = 528  # Maximum length for generation, adjust as needed

# Load the fine-tuned model and tokenizer
fine_tuned_model = GPT2LMHeadModel.from_pretrained(OUTPUT_DIR)
fine_tuned_tokenizer = GPT2Tokenizer.from_pretrained(OUTPUT_DIR)

# Ensure the pad token is set for the loaded tokenizer (it should be saved, but good to double check)
if fine_tuned_tokenizer.pad_token is None:
    fine_tuned_tokenizer.pad_token = fine_tuned_tokenizer.eos_token
    fine_tuned_model.config.pad_token_id = fine_tuned_tokenizer.eos_token_id


# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
fine_tuned_model.to(device)
fine_tuned_model.eval()

# Example Sumerian transliteration to translate
sumerian_prompt = "dingir inana za-me-en" # "Goddess Inana, you are"

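# Build the prompt up to the point where generation should start.
# NOTE: training examples put a newline before "English:", while this prompt uses a space,
# so it does not match the training format exactly.
prompt_for_generation = f"Sumerian: {sumerian_prompt.strip()} English:"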
print(f"Prompt for generation: '{prompt_for_generation}'")

# Tokenize the prompt
input_ids = fine_tuned_tokenizer.encode(prompt_for_generation, return_tensors='pt').to(device)

# Generate text
# max_length caps the total length (prompt + generated tokens); max_new_tokens is often
# preferred because it limits only the generated part.
# NOTE: do_sample is not set here, so by default generate() does not sample and the
# temperature, top_k and top_p values below are ignored; add do_sample=True to enable them.
output_sequences = fine_tuned_model.generate(
    input_ids=input_ids,
    max_length=MAX_LENGTH, # Max length of prompt + generated text
    # max_new_tokens=50, # Alternative: specify only the number of new tokens to generate
    temperature=0.7,          # Sampling temperature. Lower is more deterministic.
    top_k=50,                 # Sample only from the K most probable tokens at each step.
    top_p=0.95,               # Nucleus sampling: sample from the smallest token set whose cumulative probability reaches P.
    repetition_penalty=1.2,   # Penalizes tokens that already appear in the sequence.
    num_return_sequences=1,   # Number of different sequences to generate.
    pad_token_id=fine_tuned_tokenizer.eos_token_id # GPT-2 has no pad token, so the EOS id is used for padding
)

# Decode and print the generated text
for generated_sequence in output_sequences:
    full_text = fine_tuned_tokenizer.decode(generated_sequence, skip_special_tokens=False) # Keep special tokens initially for inspection
    # Extract only the generated English part
    # This depends on your prompt format. We look for text after "English: "
    generated_english = full_text.split(prompt_for_generation)[-1]
    # Remove the <|endoftext|> token if present at the end
    generated_english = generated_english.replace(fine_tuned_tokenizer.eos_token, "").strip()

    print(f"Sumerian Input: {sumerian_prompt}")
    print(f"Generated English: {generated_english}")

print("\nScript finished.")


--- Inference Example ---
Prompt for generation: 'Sumerian: dingir inana za-me-en English:'
Sumerian Input: dingir inana za-me-en
Generated English: Dingira, beloved one of Zabala. Šulgi praised(?) him. ... year: “... .” Amar al-ŋu₁₀ was king. Zanin 1(banše). SIGMA ARAD(-Amar)-kiel 3/4 sila3 (silver): Dabin 2(barig) 4(ban2)... barzagal gur ki ur{d}nin <unk> ta  mu us₂-sa bad₃ mar-tu ba-du₈
English text on the tablet reciting praise for Dingira; month : "Dingira Festival" , Year following that which destroyed Mar-tud and Babylonia The silver is credited to Ur-Ninkaya instead of Manzil it has been confirmed by means out of Karakalla Temple administrator Zubida received into his temple seal a copy of this document from Uzzah Inanna hereby gives her approval as wife of Adad At length he swore an oath not even touching anything with his mouth He did not swear before anyone like That man who does not know justice may be struck off From among men there are no rivals! After having sworn thus 
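The sample above keeps generating long after the translation should have ended. The following post-processing sketch keeps only the text between the prompt and the first end marker. `extract_translation` is a hypothetical helper, and cutting at the first newline is an assumption based on the one-line-per-field training format.

```python
def extract_translation(full_text, prompt, eos_token="<|endoftext|>"):
    """Return the generated text after the prompt, cut at the first EOS or newline."""
    # Drop the prompt only if the decoded text really starts with it.
    generated = full_text[len(prompt):] if full_text.startswith(prompt) else full_text
    for marker in (eos_token, "\n"):
        # Strip leading whitespace first so a leading newline does not yield an empty result.
        generated = generated.lstrip().split(marker, 1)[0]
    return generated.strip()


prompt = "Sumerian: dingir inana za-me-en\nEnglish:"
decoded = prompt + " Goddess Inana, you are<|endoftext|>Sumerian: ..."
print(extract_translation(decoded, prompt))
```

Checking `startswith` instead of splitting on the prompt avoids surprises when the prompt text happens to recur inside the generated continuation.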

In [11]:
# Reload the fine-tuned model and tokenizer saved in OUTPUT_DIR
model = GPT2LMHeadModel.from_pretrained(OUTPUT_DIR).to(device)
tokenizer = GPT2Tokenizer.from_pretrained(OUTPUT_DIR)

for index, row in test_data.iloc[:30,:].iterrows():
    sumerian_texts = row['transliteration'].replace('\n', ' ')
    english_translations = row['translation'].replace('\n', ' ')
    prompt_text = f"Sumerian: {sumerian_texts} \nEnglish:"

    # --- Tokenize the input ---
    input_ids = tokenizer.encode(prompt_text, return_tensors='pt').to(device)

    # --- Generate text ---
    print("Generating text...")
    try:
        output_sequences = model.generate(
            input_ids=input_ids,
            max_length=200,             # Max length of prompt + generated text, in tokens
            temperature=0.2,            # Sampling temperature; ignored here because do_sample is not set
            top_k=40,                   # Top-k sampling cutoff; ignored without do_sample=True
            top_p=0.9,                  # Nucleus sampling cutoff; ignored without do_sample=True
            repetition_penalty=1,       # 1 means no repetition penalty
            num_return_sequences=1,     # Number of different sequences to generate.
            pad_token_id=tokenizer.eos_token_id, # Pad token ID for generation
            no_repeat_ngram_size=3,     # Prevent any 3-gram from repeating
            early_stopping=True,        # Stop beam search once num_beams finished candidates exist
            length_penalty=1.0,         # Neutral - neither favor short nor long outputs
            num_beams=3                 # Use beam search instead of sampling
        )

        print(f"Input text: {prompt_text}")
        print(f"Actual translation: {english_translations}")

        # --- Decode and print ---
        for i, generated_sequence in enumerate(output_sequences):
            text = tokenizer.decode(generated_sequence, skip_special_tokens=True)
            print('Generated text:', text)
            print('---')

    except Exception as e:
        print(f"Error during text generation: {e}")

Generating text...

Input text: Sumerian:  1(u) la₂ 1(diš) udu u₄ 2(u) 8(diš)-kam ki ab-ba-sa₆-ga-ta na-lu₅ i₃-dab₅   iti <unk> bi₂-gu₇ mu en-unu₆-gal {d}inana unu{ki}ga ba-hun  1(u) la₂ 1(diš) 
English:
Actual translation: 9 rams, 28th day, from Abba-saga, Nalu accepted; month: “ubi-feast,” year: “Enunugal of Inanna of Uruk was installed;” (total:) 9 (rams).
Generated text: Sumerian:  1(u) la₂ 1(diš) udu u₄ 2(u) 8(diš)-kam ki ab-ba-sa₆-ga-ta na-lu₅ i₃-dab₅   iti <unk> bi₂-gu₇ mu en-unu₆-gal {d}inana unu{ki}ga ba-hun  1(u) la₂ 1(diš) 
English: 9 rams, 28th day, from Abba-saga, Nalu accepted; month: “Ubi-feast,” “Enunugal of Inanna in Uruk was installed;” (total:) 9 (rams). 1 rams. Šulgi. Foreman: Iddab. ARADmu, the
---
Generating text...
Input text: Sumerian:  3(diš) 1/2(diš) gin₂ 1(u) 5(diš) še ku₃-babbar ur₅-še₃ ur{d}en-lil₂-la₂-ta lugal-sa₆-ga u₃ ur{d}šu-mah šu ba-ti  iti ku₃ <unk> u₄ 2(u) 2(diš) ba-zal  mu si-ma-num₂{ki} ba-hul 
English:
Actual translation: 3 1/2 shekels 15 grains of

In [None]:
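import sys
sys.path.insert(1, '../utils')

from rclone import move_folder_to_onedrive

# Move the fine-tuned model folder to OneDrive (this call produced the output below)
move_folder_to_onedrive('sumerian_gpt2_finetuned')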

Checking if the destination folder 'sumerian_gpt2_finetuned' already exists on OneDrive...
Moving 'sumerian_gpt2_finetuned' to OneDrive: 'onedrive_bocconi:AI-project/sumerian_gpt2_finetuned'...
rclone command: rclone move sumerian_gpt2_finetuned onedrive_bocconi:AI-project/sumerian_gpt2_finetuned --create-empty-src-dirs -P
SUCCESS: Folder 'sumerian_gpt2_finetuned' moved successfully to OneDrive.


True

In [13]:
move_folder_to_onedrive('sumerian_gpt2_finetuned_logs')

Checking if the destination folder 'sumerian_gpt2_finetuned_logs' already exists on OneDrive...
Moving 'sumerian_gpt2_finetuned_logs' to OneDrive: 'onedrive_bocconi:AI-project/sumerian_gpt2_finetuned_logs'...
rclone command: rclone move sumerian_gpt2_finetuned_logs onedrive_bocconi:AI-project/sumerian_gpt2_finetuned_logs --create-empty-src-dirs -P
SUCCESS: Folder 'sumerian_gpt2_finetuned_logs' moved successfully to OneDrive.


True