This notebook walks through the **six stages** needed to deliver an
end-to-end demo that satisfies every “Phase 1 Prototype” requirement in
the **Digital Prybar SOO §3.1.2** and the **Tamasheq White Paper §4**.

| Stage | Goal | Matches Document |
|-------|------|-----------------|
| 0 | Prep Colab runtime | White Paper §5.1 “Reference impl. may be cloud during R&D” |
| 1 | Quick sanity check (online) | SOO §3.1.3.b “Demonstrate baseline capability” |
| 2 | Download open-source checkpoints | White Paper §4.2 “All models must be locally cacheable” |
| 3 | Freeze Python wheels | SOO §3.2.1 “Execute without public internet” |
| 4 | Assemble `demo_pipeline.py` + Dockerfile | White Paper §5.3 “Turnkey container deliverable” |
| 5 | Bundle context → tarball | SOO §3.3 “USB-drop submission” |
| 6 | (offline laptop) Build & run | SOO §4 “Acceptance test procedure” |

In [1]:
!pip -q install torch torchaudio transformers sentencepiece accelerate huggingface_hub

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m363.4/363.4 MB[0m [31m3.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.8/13.8 MB[0m [31m105.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m24.6/24.6 MB[0m [31m101.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m883.7/883.7 kB[0m [31m59.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m664.8/664.8 MB[0m [31m1.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m211.5/211.5 MB[0m [31m11.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m56.3/56.3 MB[0m [31m41.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m127.9/127.9 MB[0m [31m19.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━


## 1 · Verify the pipeline online  
Before caching ~2 GB of models, verify the pipeline works online with a **minimal “Hello World”**:

* **ASR** → `facebook/mms-1b-all`  
  *Justification*: White Paper §4.2.1 names MMS as the multilingual
  fallback model supporting ISO-639-3 `taq`.

* **NMT** → `facebook/nllb-200-distilled-600M`  
  *Justification*: SOO §3.1.3.a demands translation to English; NLLB
  ships tags `taq_Latn → eng_Latn` out-of-box.

Success means Hugging Face IDs and decoding logic are
sound before going offline.


In [2]:
from google.colab import drive
drive.mount('/content/drive')

MessageError: Error: credential propagation was unsuccessful

In [3]:
from google.colab import files
import torchaudio, torch, io
from transformers import AutoProcessor, AutoModelForCTC

# ── 1. Upload any WAV ────────────────────────────────────────────────
uploaded = files.upload()
name, raw = next(iter(uploaded.items()))
buf = io.BytesIO(raw)                       # in-memory file

# ── 2. Load & normalise to 16 kHz mono ───────────────────────────────
wave, sr = torchaudio.load(buf, format="wav")
if wave.shape[0] > 1:      # stereo → mono
    wave = wave.mean(dim=0, keepdim=True)
if sr != 16_000:
    wave = torchaudio.functional.resample(wave, sr, 16_000)

# ── 3. MMS ASR ───────────────────────────────────────────────────────
proc  = AutoProcessor.from_pretrained("facebook/mms-1b-all")
asr   = AutoModelForCTC.from_pretrained("facebook/mms-1b-all")

# tell the tokenizer we want Tamasheq letters
proc.tokenizer.set_target_lang("taq")

inputs = proc(wave.squeeze(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = asr(**inputs).logits
ids = torch.argmax(logits, dim=-1)

taq_text = proc.batch_decode(ids, skip_special_tokens=True)[0]
print("TAQ transcript:", taq_text)

StopIteration: 

## 1 · Tamasheq text → English text  
White-paper §4.3 specifies NLLB-200 as the baseline MT engine.  
We feed the `taq_text` produced by MMS into `facebook/nllb-200-distilled-600M`.

*If you changed the MMS target language (e.g. to English), adjust `src_lang` accordingly.*

In [None]:
!pip install -U "transformers==4.41.2" -q

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m43.8/43.8 kB[0m [31m2.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m9.1/9.1 MB[0m [31m82.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.6/3.6 MB[0m [31m110.4 MB/s[0m eta [36m0:00:00[0m
[?25h

In [None]:
import importlib, transformers, torch, torchaudio   # reload to pick up the new wheel
print("Transformers version:", transformers.__version__)

Transformers version: 4.53.2


In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# 1pick the NLLB checkpoint
mt_id = "facebook/nllb-200-distilled-600M"    # ← define it here

# load tokenizer & model  (fast version is fine)
tok = AutoTokenizer.from_pretrained(mt_id)    # Fast tokenizer
model_mt = AutoModelForSeq2SeqLM.from_pretrained(mt_id)

# source language is Tamasheq in Latin script
tok.src_lang = "taq_Latn"

# `taq_text` must already contain the MMS transcript from the earlier step
enc = tok(taq_text, return_tensors="pt")

# look up the English BOS token ID manually (works on any Transformers version)
forced_id = tok.convert_tokens_to_ids("eng_Latn")

# 5️⃣  generate English translation
out = model_mt.generate(**enc, forced_bos_token_id=forced_id)
eng_text = tok.batch_decode(out, skip_special_tokens=True)[0]
print("🌐 English:", eng_text)

🌐 English: I'm going to tell you what I'm going to do. I'm going to tell you what I'm going to do. I'm going to tell you what I'm going to do.


## 2 · Cache model repositories locally  
SOO §3.2.1 forbids external calls during the onsite demo.  
Therefore, **the model snapshots are cloned** into a
`docker-context/models/` directory so the container will never ping
`huggingface.co`.

*Each repo is stored with real files, not symlinks, to keep the context
self-contained.*

In [None]:
from huggingface_hub import snapshot_download
import os, pathlib, getpass, json, textwrap, subprocess, sys
from google.colab import userdata
HF_TOKEN = userdata.get('HF_TOKEN')
# ── 0. one-time login ───────────────────────────────────────────────
#    (If already logged in this cell is a no-op.)
if not os.path.exists("~/.huggingface/token"):
    token = HF_TOKEN
    subprocess.run(["huggingface-cli", "login", "--token", token, "--add-to-git-credential"], check=True)

# ── 1. enable fast transfer ─────────────────────────────────────────
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

# ── 2. download filtered snapshot ───────────────────────────────────
CTX = pathlib.Path("taq_offline_ctx/models")
dst = CTX / "facebook__mms-1b-all"          # keep folder name for consistency
dst.parent.mkdir(parents=True, exist_ok=True)

snapshot_download(
    repo_id="facebook/mms-1b-all",
    local_dir=dst,
    local_dir_use_symlinks=False,
    allow_patterns=[
        "pytorch_model.bin",                # or "model.safetensors"
        "config.json",
        "sentencepiece.bpe.model",
        "tokenizer_config.json",
        "special_tokens_map.json",
        "adapter.taq.bin",
        "adapter.eng.bin"
    ],
    resume_download=True,   # safe to rerun
    max_workers=2           # fewer parallel HEADs = fewer 429s
)

print("✅ MMS files cached to", dst)

For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.


Fetching 6 files:   0%|          | 0/6 [00:00<?, ?it/s]

adapter.eng.bin:   0%|          | 0.00/9.49M [00:00<?, ?B/s]

adapter.taq.bin:   0%|          | 0.00/8.91M [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/3.86G [00:00<?, ?B/s]

✅ MMS files cached to taq_offline_ctx/models/facebook__mms-1b-all


## 3 · Freeze Python wheels (PyPI mirror in a folder)  
White Paper §5.3.2 requires the deliverable to **build on an air-gapped
machine**.  
Here we run `pip download` to grab binary wheels for every dependency
listed in `requirements.txt`.  
These wheels go into `docker-context/wheels/` and are installed with
`--no-index`, guaranteeing that the Docker build never touches PyPI.

In [None]:
import subprocess, textwrap

(CTX / "wheels").mkdir(exist_ok=True)
(CTX / "requirements.txt").write_text(textwrap.dedent("""\
    torch
    torchaudio
    transformers
    sentencepiece
    accelerate
    """))

subprocess.run(["pip", "download", "-r", str(CTX / "requirements.txt"),
                "-d", str(CTX / "wheels")], check=True)
print("Wheels cached to:", CTX / "wheels")

Wheels cached to: taq_offline_ctx/models/wheels


## 4 · Create `demo_pipeline.py` and Dockerfile  
* **`demo_pipeline.py`** implements the logic you just tested online,
  but with `local_files_only=True` and the
  `TRANSFORMERS_OFFLINE/HF_HUB_OFFLINE` envs set by default.  
* The **Dockerfile** installs wheels from the local folder, copies
  models & script, and disables the network in later layers.

This satisfies **SOO §3.3 “Self-contained container”** and
White Paper §5.3 “Turnkey USB demo”.

In [None]:
import textwrap, os, json

(CTX / "demo_pipeline.py").write_text(textwrap.dedent("""\
    import sys, torch, torchaudio, os
    from transformers import AutoProcessor, AutoModelForCTC, \
         AutoTokenizer, AutoModelForSeq2SeqLM
    os.environ.update(TRANSFORMERS_OFFLINE="1", HF_HUB_OFFLINE="1")
    ASR = "models/facebook__mms-1b-all"
    MT  = "models/facebook__nllb-200-distilled-600M"
    wav, sr = torchaudio.load(sys.argv[1])
    if sr!=16000: wav=torchaudio.functional.resample(wav,sr,16000)

    proc = AutoProcessor.from_pretrained(ASR, local_files_only=True)
    asr  = AutoModelForCTC.from_pretrained(ASR, local_files_only=True)
    ids  = asr(**proc(wav.squeeze(), sampling_rate=16000,
                      return_tensors="pt")).logits.argmax(-1)
    taq  = proc.decode(ids[0], language="taq", skip_special_tokens=True)
    print("TAQ:", taq)

    tok = AutoTokenizer.from_pretrained(MT, local_files_only=True); tok.src_lang="taq_Latn"
    mt  = AutoModelForSeq2SeqLM.from_pretrained(MT, local_files_only=True)
    eng = tok.batch_decode(mt.generate(**tok(taq, return_tensors="pt"),
                        forced_bos_token_id=tok.lang_code_to_id["eng_Latn"]),
                        skip_special_tokens=True)[0]
    print("ENG:", eng)
    """))

(CTX / "Dockerfile").write_text(textwrap.dedent("""\
    FROM python:3.10-slim

    ENV TRANSFORMERS_OFFLINE=1
    ENV HF_HUB_OFFLINE=1
    WORKDIR /app

    # install deps from wheel cache
    COPY wheels /tmp/wheels
    RUN pip install --no-index --find-links=/tmp/wheels torch torchaudio \
        transformers accelerate sentencepiece && rm -rf /tmp/wheels

    # copy models & code
    COPY models ./models
    COPY demo_pipeline.py .

    ENTRYPOINT ["python","demo_pipeline.py"]
    """))

print("Script and Dockerfile written.")

Script and Dockerfile written.


In [None]:
from huggingface_hub import snapshot_download
import pathlib

CTX = pathlib.Path("taq_offline_ctx/models")
dst = CTX / "facebook__nllb-200-distilled-600M"
dst.parent.mkdir(parents=True, exist_ok=True)

snapshot_download(
    repo_id="facebook/nllb-200-distilled-600M",
    local_dir=dst,
    local_dir_use_symlinks=False,
    resume_download=True
)
print("✅ NLLB cached to", dst)

For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.


Fetching 9 files:   0%|          | 0/9 [00:00<?, ?it/s]

README.md: 0.00B [00:00, ?B/s]

.gitattributes: 0.00B [00:00, ?B/s]

✅ NLLB cached to taq_offline_ctx/models/facebook__nllb-200-distilled-600M


## 5 · Bundle the Docker context into a tarball  
SOO §3.3 states that the prototype must be deliverable “on removable
media”.  
We archive everything under `taq_offline_ctx/` so you can download a
single file, copy it to the laptop, and build the image completely
offline.

In [None]:
# From a Colab notebook code cell (bash)
!apt-get -yqq install pigz && \
  tar -C taq_offline_ctx -cf - . | pigz -9 -p 4 > taq_ctx.tar.gz

In [None]:
from google.colab import files; files.download("taq_offline_ctx.tar.gz")

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

In [None]:
# Install split if needed (most Linuxes have it by default)
!apt-get -yqq update && apt-get -yqq install coreutils

# Create 2 GiB TAR parts (no gzip, so even faster)
!tar -C taq_offline_ctx -cf - . | split -b 2000m - taq_ctx.tar.part-

# Now download each part:
from google.colab import files
for f in sorted([fn for fn in os.listdir('.') if fn.startswith('taq_ctx.tar.part-')]):
    files.download(f)

W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)


<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

In [None]:
!mkdir /content/drive/MyDrive/taq_parts/

In [None]:
!cp taq_ctx.tar.part-* /content/drive/MyDrive/taq_parts/