<a href="https://colab.research.google.com/github/frank-morales2020/MLxDL/blob/main/H2E_HF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

HF: https://huggingface.co/frankmorales2020/akkadian-to-english-translator



ARTICLE: https://medium.com/ai-simplified-in-plain-english/akkadian-to-english-building-a-neural-bridge-to-the-ancient-world-dd7bacbb9a02

In [1]:
!nvidia-smi

Sun Feb  8 16:07:44 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   49C    P8             13W /   72W |       0MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

## Case 1: Standard Translation Test

In [None]:

import torch
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

# 1. SETUP
repo_id = "frankmorales2020/akkadian-to-english-translator"
device = "cuda" if torch.cuda.is_available() else "cpu"

print(f"Loading updated model from {repo_id}...")

# Load with a built-in mBART-50 language code; the custom Akkadian code is
# registered in step 2 and set as src_lang inside test_translate
tokenizer = MBart50TokenizerFast.from_pretrained(repo_id, src_lang="en_XX", tgt_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained(repo_id).to(device)

# 2. REGISTER CUSTOM TOKEN
akk_token = "[akk_AK]"
if akk_token not in tokenizer.lang_code_to_id:
    tokenizer.add_special_tokens({'additional_special_tokens': [akk_token]})
    tokenizer.lang_code_to_id[akk_token] = tokenizer.convert_tokens_to_ids(akk_token)

# 3. TRANSLATION FUNCTION
def test_translate(text):
    tokenizer.src_lang = akk_token
    inputs = tokenizer(text, return_tensors="pt").to(device)
    en_id = tokenizer.convert_tokens_to_ids("en_XX")

    with torch.no_grad():
        generated_tokens = model.generate(
            **inputs,
            forced_bos_token_id=en_id,
            max_new_tokens=60,
            num_beams=5
        )
    return tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]

# 4. COMPREHENSIVE TEST SUITE
# Testing legal, astronomical, and environmental categories
new_test_data = {
    "Legal Formulas": [
        "šumma wardum bēlšu ittabal",
        "šumma tamkārum kaspam iddin",
        "šumma šarrum mēšaram ištakan"
    ],
    "Astronomical Observations": [
        "Sin u Šamaš itti ahāmeš innamrū",
        "bibbu ina libbi Zuqaqīpī ittiqi",
        "kakkab ra-bi-i ittanmar"
    ],
    "Environmental/Historical": [
        "Idiglat u Purattu mīla imlū",
        "ina rēš šattim rādu ištakan",
        "šubtu nēhtu ina māti ibašši"
    ]
}



In [5]:
print("\n" + "="*60)
print("EXPANDED AKKADIAN-TO-ENGLISH TEST RESULTS")
print("="*60)

for category, phrases in new_test_data.items():
    print(f"\n>>> CATEGORY: {category}")
    print("-" * 30)
    for phrase in phrases:
        result = test_translate(phrase)
        print(f"AK: {phrase}")
        print(f"EN: {result}\n")


EXPANDED AKKADIAN-TO-ENGLISH TEST RESULTS

>>> CATEGORY: Legal Formulas
------------------------------
AK: šumma wardum bēlšu ittabal
EN: if a slave strikes his master

AK: šumma tamkārum kaspam iddin
EN: if a merchant has given silver

AK: šumma šarrum mēšaram ištakan
EN: if the king has established justice


>>> CATEGORY: Astronomical Observations
------------------------------
AK: Sin u Šamaš itti ahāmeš innamrū
EN: the Moon and Sun were seen together

AK: bibbu ina libbi Zuqaqīpī ittiqi
EN: a planet passed through the heart of Scorpio

AK: kakkab ra-bi-i ittanmar
EN: a great star was seen


>>> CATEGORY: Environmental/Historical
------------------------------
AK: Idiglat u Purattu mīla imlū
EN: the Tigris and Euphrates were filled with the flood

AK: ina rēš šattim rādu ištakan
EN: at the beginning of the year a rainstorm occurred

AK: šubtu nēhtu ina māti ibašši
EN: there is a peaceful dwelling in the land
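
The custom-token registration in step 2 assumes that `[akk_AK]` already has a trained embedding in the published checkpoint. The notebook does not verify this, so the following is only a minimal sketch of a defensive check. The helper name `ensure_lang_token` is hypothetical. It relies on the standard tokenizer methods `convert_tokens_to_ids` and `add_tokens`, and on the model method `resize_token_embeddings`.

```python
def ensure_lang_token(tokenizer, model, token):
    """Return the vocabulary ID of `token`, adding it first if it is missing.

    Hypothetical helper, not part of the original notebook.
    """
    token_id = tokenizer.convert_tokens_to_ids(token)

    # An unknown token resolves to the <unk> ID (or None for some tokenizers).
    if token_id is None or token_id == tokenizer.unk_token_id:
        tokenizer.add_tokens([token], special_tokens=True)
        token_id = tokenizer.convert_tokens_to_ids(token)

    # Grow the embedding matrix only if the vocabulary outgrew it.
    # Newly created rows are untrained, so output quality for that token
    # is not guaranteed without further fine-tuning.
    if len(tokenizer) > model.get_input_embeddings().num_embeddings:
        model.resize_token_embeddings(len(tokenizer))

    return token_id
```

With the objects loaded above, `akk_id = ensure_lang_token(tokenizer, model, akk_token)` would return the existing ID when the checkpoint already contains the token, and would resize the embeddings only when it does not.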



## Case 2: H2E Governed Translation

In [None]:
import torch
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration
from sentence_transformers import SentenceTransformer, util

# 1. SETUP
repo_id = "frankmorales2020/akkadian-to-english-translator"
device = "cuda" if torch.cuda.is_available() else "cpu"

# H2E COMPONENT: Normalized Expert Zone (NEZ) Validator
# A sentence-embedding model that encodes the translation and the expert
# reference (the 'Expert DNA') so their semantic similarity can be measured
nez_validator = SentenceTransformer('all-MiniLM-L6-v2')

print(f"Loading H2E-Governed Model from {repo_id}...")
tokenizer = MBart50TokenizerFast.from_pretrained(repo_id, src_lang="en_XX", tgt_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained(repo_id).to(device)

# 2. REGISTER CUSTOM TOKEN
akk_token = "[akk_AK]"
if akk_token not in tokenizer.lang_code_to_id:
    tokenizer.add_special_tokens({'additional_special_tokens': [akk_token]})
    tokenizer.lang_code_to_id[akk_token] = tokenizer.convert_tokens_to_ids(akk_token)

# 3. H2E GOVERNED TRANSLATION FUNCTION
def h2e_translate(akkadian_text, expert_reference):
    """
    Translates Akkadian text and calculates the Semantic ROI (SROI)
    against an expert reference translation to flag 'Semantic Drift'.

    Returns a (translation, sroi_score, status) tuple.
    """
    tokenizer.src_lang = akk_token
    inputs = tokenizer(akkadian_text, return_tensors="pt").to(device)
    en_id = tokenizer.convert_tokens_to_ids("en_XX")

    # IGZ Logic: Deterministic beam search (no sampling) to maintain persona
    with torch.no_grad():
        generated_tokens = model.generate(
            **inputs,
            forced_bos_token_id=en_id,
            max_new_tokens=60,
            num_beams=5,
            do_sample=False  # temperature only applies when sampling, so it is omitted
        )

    translation = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]

    # 4. SROI CALCULATION (The Measurement Signal)
    # V_agent: Real-time intent vector
    v_agent = nez_validator.encode(translation, convert_to_tensor=True)
    # V_expert: Gold Standard expert target
    v_expert = nez_validator.encode(expert_reference, convert_to_tensor=True)

    # Cosine similarity between the two embeddings is the SROI signal
    # (util.cos_sim is the current name for util.pytorch_cos_sim)
    sroi_score = util.cos_sim(v_agent, v_expert).item()

    # 5. IGZ (Intent Governance Zone) Gate
    # 0.5535 is the H2E alignment threshold used in this notebook;
    # scores below it are flagged as drift
    status = "✅ ALIGNED" if sroi_score >= 0.5535 else "❌ DRIFT DETECTED"

    return translation, sroi_score, status

# 6. H2E TEST SUITE
test_cases = [
    {
        "akk": "šumma wardum bēlšu ittabal",
        "gold": "If a slave has stolen from his master."
    },
    {
        "akk": "Idiglat u Purattu mīla imlū",
        "gold": "The Tigris and Euphrates were filled with floodwaters."
    }
]


In [10]:
print("\n" + "="*70)
print("H2E GOVERNANCE REPORT: AKKADIAN-TO-ENGLISH TELEMETRY")
print("="*70)

for test in test_cases:
    res, sroi, audit = h2e_translate(test["akk"], test["gold"])
    print(f"AK: {test['akk']}")
    print(f"EN: {res}")
    print(f"SROI Score: {sroi:.4f} | Status: {audit}\n")


H2E GOVERNANCE REPORT: AKKADIAN-TO-ENGLISH TELEMETRY
AK: šumma wardum bēlšu ittabal
EN: if a slave strikes his master
SROI Score: 0.8676 | Status: ✅ ALIGNED

AK: Idiglat u Purattu mīla imlū
EN: the Tigris and Euphrates were filled with the flood
SROI Score: 0.9666 | Status: ✅ ALIGNED
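
The SROI gate above reduces to a cosine similarity followed by a threshold comparison. The sketch below reproduces that logic in plain Python so it can be inspected without loading any model. The function names `cosine_similarity` and `igz_gate` are hypothetical, the status strings omit the emoji used in the notebook, and the 0.90 threshold is an arbitrary value chosen only for comparison. The two scores are the ones reported in the output above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def igz_gate(score, threshold=0.5535):
    """Apply the notebook's alignment rule: pass when score >= threshold."""
    return "ALIGNED" if score >= threshold else "DRIFT DETECTED"


# SROI scores copied from the governance report above.
reported_scores = {
    "šumma wardum bēlšu ittabal": 0.8676,
    "Idiglat u Purattu mīla imlū": 0.9666,
}

# 0.5535 is the notebook's threshold; 0.90 is a hypothetical stricter one.
for threshold in (0.5535, 0.90):
    for phrase, score in reported_scores.items():
        print(f"threshold={threshold:.4f} | {phrase} | {igz_gate(score, threshold)}")
```

In the first test case the model output uses "strikes" where the gold reference has "stolen from", yet the pair still scores 0.8676. Sentence-level similarity can therefore stay high when a single key verb differs, so the threshold may need tuning against reviewed examples.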

