# Evaluating Google API and GPT Translations on ArabAcquis using BLEU Scores

This Colab notebook scores two machine translation systems, the Google Translate API and OpenAI's GPT model, against the human reference translations in the ArabAcquis dataset. Scoring uses the NLTK library's **sentence_bleu** function with the **method1** smoothing function. Before BLEU is computed, the reference and both candidate sentences are tokenized with **GPT2TokenizerFast** from the transformers library.

The script reads an input JSON file, computes a sentence-level BLEU score for each system on every entry, and stores the scores in two new fields (GoogleAPI_bleu_score, GPT_bleu_score). It then appends a summary object, holding the average BLEU score of each system and the total number of entries processed, as the last element of the output JSON file.

The goal is an objective, metric-based comparison of translation quality between the two systems on this dataset.
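
To make the metric concrete, here is a minimal pure-Python sketch of smoothed sentence-level BLEU for a single reference. It follows the usual definition (geometric mean of clipped 1- to 4-gram precisions times a brevity penalty) and replaces a zero match count with a small epsilon, in the spirit of NLTK's **method1**. The function name `smoothed_sentence_bleu` and its parameters are illustrative assumptions; this is not NLTK's implementation and may differ from it in edge cases.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def smoothed_sentence_bleu(reference, candidate, max_n=4, epsilon=0.1):
    """Illustrative single-reference BLEU with epsilon smoothing (an assumption, not NLTK's code)."""
    if not candidate:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))

        # Clipped matches: a candidate n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = max(1, sum(cand_counts.values()))

        # No unigram overlap at all: the score is zero.
        if n == 1 and matches == 0:
            return 0.0

        # Smoothing: swap a zero match count for a small epsilon so log() stays defined.
        if matches == 0:
            matches = epsilon

        log_precisions.append(math.log(matches / total))

    # Brevity penalty: penalize candidates that are shorter than the reference.
    cand_len, ref_len = len(candidate), len(reference)
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)

    # Uniform weights over the n-gram orders.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

Calling `smoothed_sentence_bleu(reference_tokens, candidate_tokens)` on two identical token lists of at least four tokens yields 1.0. Without smoothing, a candidate that shares no 4-gram with the reference would collapse to a score of 0, which is common for single sentences.
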
* **Input file:** ArabAcquis_TranslatedAndfiltered.json (JSON list of entries, each containing the original English text, the reference Arabic translation, the Google API translation, and the GPT translation; the file is expected to have been filtered beforehand).
* **Output file:** ArabAcquis_with_bleu_scores.json (the same entries with added GoogleAPI_bleu_score and GPT_bleu_score fields, followed by a summary object as the last list element containing the average scores and the total number of entries).
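
Because the scoring code reads the keys `arabic`, `googleAPI_translated`, and `GPT_translated` from every entry, it can help to check the input before scoring. The sketch below is one possible check; the helper `find_incomplete_entries` and the sample entries are hypothetical and not part of the notebook.

```python
# Keys the scoring code reads from each entry.
REQUIRED_KEYS = ("arabic", "googleAPI_translated", "GPT_translated")


def find_incomplete_entries(entries):
    """Return (index, missing_keys) for each entry lacking a non-empty required field."""
    problems = []
    for index, entry in enumerate(entries):
        missing = []
        for key in REQUIRED_KEYS:
            value = entry.get(key)
            # Treat absent keys, non-string values, and blank strings as missing.
            if not isinstance(value, str) or not value.strip():
                missing.append(key)
        if missing:
            problems.append((index, missing))
    return problems


# Made-up sample entries, for illustration only.
sample_entries = [
    {"arabic": "نص مرجعي", "googleAPI_translated": "نص أول", "GPT_translated": "نص ثان"},
    {"arabic": "نص مرجعي", "googleAPI_translated": "نص أول", "GPT_translated": ""},
]

print(find_incomplete_entries(sample_entries))
```

An empty result means every entry can be scored; any reported index points to an entry that would raise a `KeyError` or receive a meaningless score.
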

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%cd /content/drive/MyDrive/ColabData/ArabAcquis Dataset

/content/drive/MyDrive/ColabData/ArabAcquis Dataset


In [None]:
import json
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from transformers import GPT2TokenizerFast

def load_json(file_path):
    """Load data from a JSON file."""
    with open(file_path, 'r', encoding='utf-8') as file:
        data = json.load(file)
    return data

def save_json(data, file_path):
    """Save data to a JSON file."""
    with open(file_path, 'w', encoding='utf-8') as file:
        json.dump(data, file, ensure_ascii=False, indent=4)

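def calculate_bleu_scores(data):
    """Add a BLEU score for each translation system to every entry and return the data plus a summary."""
    # The GPT-2 tokenizer is used only to split text into tokens; nothing is fed to the model,
    # so the "sequence length is longer than the specified maximum" warning can be ignored here.
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    smooth_fn = SmoothingFunction().method1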

    googleAPI_scores = []
    GPT_scores = []

    for item in data:
        reference = [tokenizer.tokenize(item['arabic'])]  # sentence_bleu takes a list of references, each a list of tokens
        googleAPI_candidate = tokenizer.tokenize(item['googleAPI_translated'])
        GPT_candidate = tokenizer.tokenize(item['GPT_translated'])

        googleAPI_score = sentence_bleu(reference, googleAPI_candidate, smoothing_function=smooth_fn)
        GPT_score = sentence_bleu(reference, GPT_candidate, smoothing_function=smooth_fn)

        item['GoogleAPI_bleu_score'] = googleAPI_score
        item['GPT_bleu_score'] = GPT_score

        googleAPI_scores.append(googleAPI_score)
        GPT_scores.append(GPT_score)

    # Calculate average BLEU scores
    avg_googleAPI_score = sum(googleAPI_scores) / len(googleAPI_scores) if googleAPI_scores else 0
    avg_GPT_score = sum(GPT_scores) / len(GPT_scores) if GPT_scores else 0

    # Add summary to the data
    summary = {
        "average_GoogleAPI_bleu_score": avg_googleAPI_score,
        "average_GPT_bleu_score": avg_GPT_score,
        "total_entries": len(data)
    }

    return data, summary

def main():
    input_file = 'ArabAcquis_TranslatedAndfiltered.json'  # The path to your original JSON file
    output_file = 'ArabAcquis_with_bleu_scores.json'  # The desired output file path

    # Load the JSON data
    data = load_json(input_file)

    # Calculate BLEU scores and get the updated data with summary
    updated_data, summary = calculate_bleu_scores(data)

    # Append the summary as the last element of the list, after all scored entries
    updated_data.append(summary)

    # Save the updated data with BLEU scores and summary
    save_json(updated_data, output_file)
    print(f"Updated data with BLEU scores saved to {output_file}")

if __name__ == "__main__":
    main()



Token indices sequence length is longer than the specified maximum sequence length for this model (1391 > 1024). Running this sequence through the model will result in indexing errors


Updated data with BLEU scores saved to ArabAcquis_with_bleu_scores.json
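
Since the summary is appended to the same list as the scored entries, anything that reads the output file has to separate the two. The following is a minimal sketch of one way to do that; the helper names `split_entries_and_summary`, `mean_score`, and `load_scored_file` are hypothetical, and detecting the summary by its `total_entries` key is an assumption based on the summary structure built in the scoring code.

```python
import json


def split_entries_and_summary(records):
    """Separate per-entry records from the trailing summary object, if present."""
    if records and "total_entries" in records[-1]:
        return records[:-1], records[-1]
    return records, None


def mean_score(entries, field):
    """Average one score field over the entries (0 for an empty list)."""
    values = [entry[field] for entry in entries]
    return sum(values) / len(values) if values else 0


def load_scored_file(path="ArabAcquis_with_bleu_scores.json"):
    """Load the output file and return (entries, summary)."""
    with open(path, "r", encoding="utf-8") as handle:
        return split_entries_and_summary(json.load(handle))
```

For example, `entries, summary = load_scored_file()` followed by `mean_score(entries, "GPT_bleu_score")` should reproduce `summary["average_GPT_bleu_score"]` up to floating-point rounding, which serves as a quick consistency check.
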
