## Comet Score Tutorial

This tutorial doesn't come with any sample data files.
Upload your own from the Files tab on the left.

In [1]:
pip install unbabel-comet

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting unbabel-comet
  Downloading unbabel_comet-2.0.1-py3-none-any.whl (81 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 81.6/81.6 kB 3.1 MB/s eta 0:00:00
Collecting entmax<2.0,>=1.1 (from unbabel-comet)
  Downloading entmax-1.1-py3-none-any.whl (12 kB)
Collecting huggingface-hub<0.13.0,>=0.12.0 (from unbabel-comet)
  Downloading huggingface_hub-0.12.1-py3-none-any.whl (190 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 190.3/190.3 kB 10.5 MB/s eta 0:00:00
Collecting jsonargparse==3.13.1 (from unbabel-comet)
  Downloading jsonargparse-3.13.1-py3-none-any.whl (101 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 101.4/101.4 kB 9.9 MB/s eta 0:00:00
Collecting pytorch-lightning<2.0.0,>=1.6.4 (from unbabel-comet)
  Downloading pytorch_lightning-1.9.5-py

Let's load a model to compute the scores.
A full list of available models can be found here: https://github.com/Unbabel/COMET/blob/master/MODELS.md

In [2]:
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")
model = load_from_checkpoint(model_path)


Fetching 5 files:   0%|          | 0/5 [00:00<?, ?it/s]

Downloading (…)0f75ec7e72/README.md:   0%|          | 0.00/3.53k [00:00<?, ?B/s]

Downloading (…)c7e72/.gitattributes:   0%|          | 0.00/1.48k [00:00<?, ?B/s]

Downloading (…)080f75ec7e72/LICENSE:   0%|          | 0.00/9.69k [00:00<?, ?B/s]

Downloading (…)5ec7e72/hparams.yaml:   0%|          | 0.00/567 [00:00<?, ?B/s]

Downloading (…)"model.ckpt";:   0%|          | 0.00/2.32G [00:00<?, ?B/s]

INFO:pytorch_lightning.utilities.migration.utils:Lightning automatically upgraded your loaded checkpoint from v1.8.3.post1 to v1.9.5. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file ../root/.cache/huggingface/hub/models--Unbabel--wmt22-comet-da/snapshots/371e9839ca4e213dde891b066cf3080f75ec7e72/checkpoints/model.ckpt`


Downloading (…)tencepiece.bpe.model:   0%|          | 0.00/5.07M [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/9.10M [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/616 [00:00<?, ?B/s]

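Let's load the data from files into a list of dictionaries, each holding one src, mt, and ref sentence.
This example reads Source_0.5k.txt for the source sentences,
Adapted_0.5k.txt for the MT output to score,
and Reference_0.5k.txt for the reference translations.
The three files must be line-aligned: line i of each file refers to the same sentence.
Adapt the code to your own file names.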

In [16]:
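# Read the three line-aligned files. UTF-8 is set explicitly because the
# data contains Chinese text and the platform default encoding may differ.
with open("Source_0.5k.txt", encoding="utf-8") as f:
    srcs = [line.strip() for line in f]

with open("Adapted_0.5k.txt", encoding="utf-8") as f:
    hyps = [line.strip() for line in f]

with open("Reference_0.5k.txt", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

# Build one {"src", "mt", "ref"} dictionary per sentence.
data = []
for idx, src in enumerate(srcs):
    data.append({"src": src, "mt": hyps[idx], "ref": refs[idx]})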
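Since the scores are only meaningful when the three files are line-aligned, it can help to check the line counts before building the list. The following is a minimal sketch, not part of COMET; `build_samples` is a hypothetical helper name introduced here for illustration, and it assumes the `srcs`, `hyps`, and `refs` lists have already been read as above.

```python
def build_samples(srcs, hyps, refs):
    """Pair up aligned sentences, failing loudly if the line counts differ.

    Hypothetical helper for illustration; it is not part of the COMET API.
    """
    if not (len(srcs) == len(hyps) == len(refs)):
        raise ValueError(
            f"Line counts differ: {len(srcs)} src, {len(hyps)} mt, {len(refs)} ref"
        )
    # Same dictionary layout as the loop above: one entry per sentence.
    return [{"src": s, "mt": h, "ref": r} for s, h, r in zip(srcs, hyps, refs)]


# Tiny made-up example, only to show the resulting structure.
demo = build_samples(["Hello."], ["Bonjour."], ["Bonjour."])
print(demo[0])
```

Calling `data = build_samples(srcs, hyps, refs)` would then replace the loop, with the difference that a missing or extra line raises a `ValueError` instead of an `IndexError` or a silently shortened list.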



Let's look at the data:

In [17]:
print(data[0])

{'src': "WASHINGTON President Biden signed an executive order on Thursday designed to sharpen the federal government's powers to block Chinese investment in technology in the United States and limit its access to private data on citizens, in a move that is bound to heighten tensions with Beijing.", 'mt': '华盛顿（路透社） -华盛顿总统拜登周四签署了一项行政命令，旨在加强联邦政府阻止中国在美国技术投资并限制其获取公民私人数据的权力，此举必将加剧与北京的紧张关系。', 'ref': '华盛顿——拜登总统周四签署了一项行政命令，以加强联邦政府阻止中国在美国的技术投资、限制其获取美国公民私人数据的权力，此举势必将加剧与北京的紧张关系。'}


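Now we compute the scores. In the run shown here no GPU was available, so prediction ran on the CPU and took about 20 minutes for 65 batches.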


In [18]:
model_output = model.predict(data, batch_size=8, gpus=1)


INFO:pytorch_lightning.utilities.rank_zero:GPU available: False, used: False
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
Predicting DataLoader 0: 100%|██████████| 65/65 [20:19<00:00, 18.76s/it]
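The progress bar counts batches, not sentences. As a rough sketch of the arithmetic, the snippet below relates the number of sentences to the number of batches and to the expected run time. The sentence count of 520 is an assumption (the log only shows 65 batches at `batch_size=8`, which corresponds to between 513 and 520 sentences), and both function names are hypothetical.

```python
import math


def num_batches(num_sentences: int, batch_size: int) -> int:
    """Number of batches needed to cover all sentences (the last may be partial)."""
    return math.ceil(num_sentences / batch_size)


def estimated_minutes(num_sentences: int, batch_size: int, seconds_per_batch: float) -> float:
    """Rough run time, assuming every batch takes about the same time."""
    return num_batches(num_sentences, batch_size) * seconds_per_batch / 60


# Assumed sentence count; 18.76 s per batch is the rate reported in the log above.
print(num_batches(520, 8))                          # 65
print(round(estimated_minutes(520, 8, 18.76), 1))   # 20.3
```

The same estimate can be used to judge whether a larger file is practical to score on a CPU-only runtime before starting the run.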


Let's look at the results: we print the score for each individual sentence, followed by the system score, which is the average over all sentences.

In [19]:
print(model_output['scores'])

print(f"System score: {model_output['system_score']:.3f}")


[0.9244939088821411, 0.8830664753913879, 0.8995802998542786, 0.8602110743522644, 0.8254000544548035, 0.9032465815544128, 0.8621066212654114, 0.8013413548469543, 0.841986358165741, 0.8853585124015808, 0.8767821788787842, 0.8054440021514893, 0.8803325295448303, 0.8527320623397827, 0.8755542635917664, 0.8328056931495667, 0.8280830979347229, 0.8147920966148376, 0.8496097326278687, 0.9225757718086243, 0.750066876411438, 0.7917352318763733, 0.8620169758796692, 0.8597740530967712, 0.7531883716583252, 0.857298731803894, 0.8582188487052917, 0.8849930763244629, 0.8895737528800964, 0.8651179671287537, 0.9011660814285278, 0.9038150906562805, 0.8529009819030762, 0.8952792286872864, 0.9375094175338745, 0.8967161178588867, 0.8723231554031372, 0.8109388947486877, 0.8432278633117676, 0.8733023405075073, 0.8764284253120422, 0.8388397097587585, 0.8882470726966858, 0.8824660181999207, 0.8685099482536316, 0.9173029065132141, 0.910012423992157, 0.8589187264442444, 0.8249977231025696, 0.8857035040855408, 0.8
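To make the relationship between the two outputs concrete, here is a small sketch that averages sentence scores and picks out the lowest-scoring sentences for manual review. It uses only the first five scores printed above, so the resulting average is not the system score of the full run; `lowest_scoring` is a hypothetical helper name.

```python
from statistics import mean

# First five sentence scores from the output above (not the full list).
scores = [
    0.9244939088821411,
    0.8830664753913879,
    0.8995802998542786,
    0.8602110743522644,
    0.8254000544548035,
]

# Average of the sentence scores in this small sample.
sample_average = mean(scores)
print(f"Average of sample: {sample_average:.3f}")  # 0.879


def lowest_scoring(scores, k=3):
    """Return the indices of the k lowest scores, worst first."""
    return sorted(range(len(scores)), key=lambda i: scores[i])[:k]


# Indices into `scores` (and therefore into `data`) worth inspecting by hand.
print(lowest_scoring(scores, k=2))  # [4, 3]
```

With the full run, passing `model_output['scores']` to `lowest_scoring` and then printing `data[i]` for each returned index would show the source, MT output, and reference of the weakest translations.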