# BLASER 2.0: Testing Sentence Similarity

This notebook demonstrates how to use BLASER 2.0 to predict sentence similarity in the SONAR embedding space. BLASER 2.0 is a family of models for automatic evaluation of machine translation quality, built on SONAR sentence embeddings.

There are two main models available:
- **BLASER 2.0 QE (Quality Estimation)**: Predicts similarity between source text and translation without requiring reference translations
- **BLASER 2.0 Ref**: Predicts similarity using source, translation, and reference translation

Both models are based on the [SONAR](https://github.com/facebookresearch/SONAR) (Sentence-level multimOdal and laNguage-Agnostic Representations) framework.

## Installing Required Packages with UV

We'll use `uv` to create a virtual environment and install the necessary packages. `uv` is a fast Python package installer and resolver.

SONAR requires specific versions of fairseq2 that match the PyTorch and CUDA versions. If you don't have `uv` installed, you can install it first with:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

In [1]:
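# Create a virtual environment with uv
!uv venv .venv
# Each `!` command runs in its own subshell, so this activation does not persist;
# `uv pip` still targets .venv because uv discovers it in the working directory
!source .venv/bin/activate

# Install the project and its dependencies from pyproject.toml
!uv pip install -e .

# Print the uv version for reference
!uv --version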

Using CPython 3.10.18
Creating virtual environment at: .venv
Activate with: source .venv/bin/activate
Resolved 147 packages in 38ms
Prepared 1 package in 590ms
Installed 147 packages in 250ms
 + anyio==4.9.0
 + argon2-cffi==25.1.0
 + argon2-cffi-bindings==21.2.0
 + arrow==1.3.0
 + asttokens==3.0.0
 + async-lru==2.0.5
 + attrs==25.3.0
 + babel==2.17.0
 + beautifulsoup4==4.13.4
 + blaser-testing==0.1.0 (from file:///home/ec2-user/Projects/blaser_experiment)
 + bleach==6.2.0

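Because `!source .venv/bin/activate` runs in a throwaway subshell, it does not change which interpreter the notebook kernel uses. As a minimal sketch using only the standard library, the check below reports whether the running kernel belongs to a virtual environment. The helper name `in_virtualenv` is hypothetical and not part of this project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtualenv()}")
```

If the interpreter path is not under `.venv`, the kernel is likely running from a different environment than the one `uv` just populated.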

## Installing System Dependencies

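Before installing fairseq2, we need to install libsndfile, a system library that fairseq2 requires for audio processing. The cell below uses `dnf`, which matches the Amazon Linux 2023 host this notebook was run on; other distributions need their own package manager.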

In [3]:
# Install libsndfile system dependency needed for fairseq2
!sudo dnf install -y libsndfile

# Verify that libsndfile is installed
!ls -la /usr/lib64/libsndfile.so*

Last metadata expiration check: 0:30:32 ago on Wed Jun 11 16:59:58 2025.
Dependencies resolved.
 Package         Arch        Version                     Repository        Size
Installing:
 libsndfile      x86_64      1.2.2-3.amzn2023.0.3        amazonlinux      225 k
Installing dependencies:
 flac-libs       x86_64      1.3.4-1.amzn2023.0.2        amazonlinux      234 k
 gsm             x86_64      1.0.19-5.amzn2023.0.3       amazonlinux       39 k
 libogg          x86_64      2:1.3.4-4.amzn2023.0.2      amazonlinux       34 k
 libvorbis       x86_64      1:1.3.7-3.amzn2023.0.2      amazonlinux      206 k
 opus            x86_64      1.3.1-8.amzn2023.0.3        amazonlinux      225 k

Transaction Summary
Install  6 Packages

Total download size: 962 k
Installed size: 2.5 M
Downloading Packages:
(1/6): libogg-1.3.4-4.amzn2023.0.2.x86_64.rpm   1.1 MB/s |  34 kB     00:00    
(2/6): gsm-1.0.19-5.amzn2023.0.3.x86_64.rp

In [4]:
# Install fairseq2 with the appropriate PyTorch version using our helper script
# This script automatically detects your PyTorch version and installs the compatible fairseq2
!python install_fairseq2.py

# Verify PyTorch version
import torch
print(f"PyTorch version: {torch.__version__}")

# Verify fairseq2 installation
try:
    import fairseq2
    print(f"fairseq2 version: {fairseq2.__version__}")
    print("fairseq2 installation successful!")
except ImportError:
    print("fairseq2 not installed correctly. Please run the helper script again or install manually.")

Installing fairseq2 for PyTorch 2.6.0 with cu124
Audited 1 package in 4ms
fairseq2 installation completed successfully!
PyTorch version: 2.6.0+cu124
fairseq2 version: 0.4.6
fairseq2 installation successful!

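The helper script `install_fairseq2.py` is not shown in this notebook. As a rough illustration of the detection step it describes, the sketch below splits a PyTorch version string into its base version and build variant. The function name `split_torch_version` and the "no suffix means CPU" rule are assumptions of this sketch, not the script's actual logic.

```python
def split_torch_version(version: str) -> tuple[str, str]:
    """Split a PyTorch version string into (base version, build variant).

    "2.6.0+cu124" becomes ("2.6.0", "cu124"). A string without a "+"
    suffix is treated as a CPU build, which is an assumption of this sketch.
    """
    base, sep, variant = version.partition("+")
    return base, (variant if sep else "cpu")


base, variant = split_torch_version("2.6.0+cu124")
print(f"Installing fairseq2 for PyTorch {base} with {variant}")
```

In a live environment, `torch.version.cuda` is a more direct way to confirm the CUDA version than parsing the version string.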

## Setting Up BLASER 2.0

Now, let's import the necessary modules and set up the BLASER 2.0 models. We'll use both the Quality Estimation (QE) model and the Reference-based model.

In [9]:
# Import necessary modules
from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline
from sonar.models.blaser.loader import load_blaser_model
import torch

# Set device based on availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Load BLASER models
try:
    blaser_ref = load_blaser_model("blaser_2_0_ref").eval().to(device)
    blaser_qe = load_blaser_model("blaser_2_0_qe").eval().to(device)
    print("BLASER 2.0 models loaded successfully")
    
    # Initialize text embedder
    text_embedder = TextToEmbeddingModelPipeline(
        encoder="text_sonar_basic_encoder", 
        tokenizer="text_sonar_basic_encoder",
        device=device
    )
    print("Text embedder initialized")
except Exception as e:
    print(f"Error loading models: {e}")

Using device: cuda
BLASER 2.0 models loaded successfully
Text embedder initialized


## Testing Sentence Similarity with BLASER 2.0

Let's test the sentence similarity functionality with some examples in different languages.

In [10]:
# Example 1: French to English translation evaluation
src_text = ["Le chat s'assit sur le tapis."]
ref_text = ["The cat sat on the mat."]
mt_text = ["The cat sat down on the carpet."]

print("Source (French):", src_text[0])
print("Reference (English):", ref_text[0])
print("Machine Translation (English):", mt_text[0])

# Get embeddings and move all to the same device
src_embs = text_embedder.predict(src_text, source_lang="fra_Latn").to(device)
ref_embs = text_embedder.predict(ref_text, source_lang="eng_Latn").to(device)
mt_embs = text_embedder.predict(mt_text, source_lang="eng_Latn").to(device)

# Predict similarity scores
with torch.inference_mode():
    # Using reference-based model
    ref_score = blaser_ref(src=src_embs, ref=ref_embs, mt=mt_embs).item()
    
    # Using quality estimation model (no reference)
    qe_score = blaser_qe(src=src_embs, mt=mt_embs).item()

print("\nBLASER 2.0 Similarity Scores (scale 1-5, higher is more similar):")
print(f"Reference-based score: {ref_score:.3f}")
print(f"Quality Estimation score: {qe_score:.3f}")

Source (French): Le chat s'assit sur le tapis.
Reference (English): The cat sat on the mat.
Machine Translation (English): The cat sat down on the carpet.

BLASER 2.0 Similarity Scores (scale 1-5, higher is more similar):
Reference-based score: 4.688
Quality Estimation score: 4.708

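The printed label describes a 1-5 scale, but the model output is not restricted to that range: the QE score for the English-to-French example later in this notebook is 5.026. If a report needs values strictly inside the nominal range, one option is to clip them. This is a reporting choice for illustration, not part of BLASER, and `clip_score` is a hypothetical helper.

```python
def clip_score(score: float, low: float = 1.0, high: float = 5.0) -> float:
    """Clamp a raw score into the nominal [low, high] range for reporting."""
    return max(low, min(high, score))


# Two raw scores taken from this notebook's outputs
for raw in (4.688, 5.026):
    print(f"raw={raw:.3f} clipped={clip_score(raw):.3f}")
```

Keeping the raw value alongside the clipped one avoids hiding differences between translations that both score above 5.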

In [11]:
# Example 2: Testing with multiple language pairs
language_examples = [
    {
        "src": ["Es ist ein schöner Tag heute."], 
        "src_lang": "deu_Latn",
        "ref": ["It is a beautiful day today."],
        "ref_lang": "eng_Latn",
        "mt": ["It's a nice day today."],
        "mt_lang": "eng_Latn",
        "description": "German to English"
    },
    {
        "src": ["El libro está sobre la mesa."], 
        "src_lang": "spa_Latn",
        "ref": ["The book is on the table."],
        "ref_lang": "eng_Latn",
        "mt": ["A book is placed on the table."],
        "mt_lang": "eng_Latn",
        "description": "Spanish to English"
    },
    {
        "src": ["I love studying languages."], 
        "src_lang": "eng_Latn",
        "ref": ["J'adore étudier les langues."],
        "ref_lang": "fra_Latn",
        "mt": ["J'aime apprendre des langues."],
        "mt_lang": "fra_Latn",
        "description": "English to French"
    }
]

print("\n===== Testing Multiple Language Pairs =====\n")

for example in language_examples:
    print(f"\n----- {example['description']} -----")
    print(f"Source ({example['src_lang']}): {example['src'][0]}")
    print(f"Reference ({example['ref_lang']}): {example['ref'][0]}")
    print(f"Machine Translation ({example['mt_lang']}): {example['mt'][0]}")
    
    # Get embeddings and ensure they're on the same device
    src_embs = text_embedder.predict(example['src'], source_lang=example['src_lang']).to(device)
    ref_embs = text_embedder.predict(example['ref'], source_lang=example['ref_lang']).to(device)
    mt_embs = text_embedder.predict(example['mt'], source_lang=example['mt_lang']).to(device)
    
    # Predict similarity scores
    with torch.inference_mode():
        ref_score = blaser_ref(src=src_embs, ref=ref_embs, mt=mt_embs).item()
        qe_score = blaser_qe(src=src_embs, mt=mt_embs).item()
        
    print(f"Reference-based score: {ref_score:.3f}")
    print(f"Quality Estimation score: {qe_score:.3f}")


===== Testing Multiple Language Pairs =====


----- German to English -----
Source (deu_Latn): Es ist ein schöner Tag heute.
Reference (eng_Latn): It is a beautiful day today.
Machine Translation (eng_Latn): It's a nice day today.
Reference-based score: 4.761
Quality Estimation score: 4.895

----- Spanish to English -----
Source (spa_Latn): El libro está sobre la mesa.
Reference (eng_Latn): The book is on the table.
Machine Translation (eng_Latn): A book is placed on the table.
Reference-based score: 4.385
Quality Estimation score: 4.303

----- English to French -----
Source (eng_Latn): I love studying languages.
Reference (fra_Latn): J'adore étudier les langues.
Machine Translation (fra_Latn): J'aime apprendre des langues.
Reference-based score: 4.855
Quality Estimation score: 5.026

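The loop above embeds one sentence per call. Since each `predict` call takes a list of sentences and a single `source_lang`, sentences that share a language could be embedded together. The sketch below shows only the grouping step, using plain Python; `group_by_language` is a hypothetical helper, and the embedding call itself is left out.

```python
from collections import defaultdict


def group_by_language(items):
    """Group (lang, sentence) pairs by language.

    Returns {lang: (indices, sentences)} so that embeddings computed per
    language can be written back to their original positions.
    """
    groups = defaultdict(lambda: ([], []))
    for index, (lang, sentence) in enumerate(items):
        indices, sentences = groups[lang]
        indices.append(index)
        sentences.append(sentence)
    return dict(groups)


# Machine translations from the three examples above, in order
machine_translations = [
    ("eng_Latn", "It's a nice day today."),
    ("eng_Latn", "A book is placed on the table."),
    ("fra_Latn", "J'aime apprendre des langues."),
]

for lang, (indices, sentences) in group_by_language(machine_translations).items():
    print(lang, indices, sentences)
```

Each group could then be passed to the embedder in one call, with the stored indices used to restore the original order before scoring.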

## Evaluating Translations of Varying Quality

Let's score translations of decreasing quality, plus one unrelated sentence, to see how well BLASER 2.0 separates them. Only the QE model is used here because no reference translation is supplied.

In [12]:
# Example 3: Translations of varying quality
source = ["This research paper presents a novel approach to natural language processing."]
source_lang = "eng_Latn"

translations = [
    {
        "text": ["Cet article de recherche présente une approche novatrice du traitement du langage naturel."],
        "lang": "fra_Latn",
        "quality": "High quality"
    },
    {
        "text": ["Ce papier de recherche présente une nouvelle approche pour le traitement du langage naturel."],
        "lang": "fra_Latn",
        "quality": "Medium quality"
    },
    {
        "text": ["Ce document recherche montre nouveau façon pour traitement langue naturelle."],
        "lang": "fra_Latn",
        "quality": "Low quality"
    },
    {
        "text": ["Ce texte parle de cuisine française et de recettes traditionnelles."],
        "lang": "fra_Latn",
        "quality": "Unrelated content"
    }
]

print(f"\n===== Evaluating Translations of Varying Quality =====\n")
print(f"Source (English): {source[0]}")

# Get source embedding and ensure it's on the correct device
src_embs = text_embedder.predict(source, source_lang=source_lang).to(device)

# Evaluate each translation
for translation in translations:
    print(f"\n{translation['quality']} translation ({translation['lang']}): {translation['text'][0]}")
    
    # Get translation embedding and ensure it's on the same device
    mt_embs = text_embedder.predict(translation['text'], source_lang=translation['lang']).to(device)
    
    # Predict quality estimation score
    with torch.inference_mode():
        qe_score = blaser_qe(src=src_embs, mt=mt_embs).item()
        
    print(f"Quality Estimation score: {qe_score:.3f}")


===== Evaluating Translations of Varying Quality =====

Source (English): This research paper presents a novel approach to natural language processing.

High quality translation (fra_Latn): Cet article de recherche présente une approche novatrice du traitement du langage naturel.
Quality Estimation score: 5.184

Medium quality translation (fra_Latn): Ce papier de recherche présente une nouvelle approche pour le traitement du langage naturel.
Quality Estimation score: 5.124

Low quality translation (fra_Latn): Ce document recherche montre nouveau façon pour traitement langue naturelle.
Quality Estimation score: 4.568

Unrelated content translation (fra_Latn): Ce texte parle de cuisine française et de recettes traditionnelles.
Quality Estimation score: 3.170

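To make the comparison explicit, the sketch below takes the four QE scores printed above and checks that they fall in the intended quality order, then prints the gap between adjacent levels. The helper `is_strictly_decreasing` is introduced here for illustration only.

```python
# QE scores copied from the output above, in the intended quality order
scores = [
    ("High quality", 5.184),
    ("Medium quality", 5.124),
    ("Low quality", 4.568),
    ("Unrelated content", 3.170),
]


def is_strictly_decreasing(values):
    """Return True if each value is strictly smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


values = [score for _, score in scores]
print("Ranking preserved:", is_strictly_decreasing(values))

# Gaps between adjacent quality levels show how sharply they are separated
for (name_a, a), (name_b, b) in zip(scores, scores[1:]):
    print(f"{name_a} -> {name_b}: drop of {a - b:.3f}")
```

The ordering holds, but the high and medium translations differ by only 0.060, and the unrelated sentence still scores 3.170, so absolute thresholds deserve caution on a single example like this.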

## Interactive Text Similarity Evaluation

The following cell allows you to input custom text for similarity evaluation.

In [None]:
# Check ipywidgets installation and enable the extension
try:
    import ipywidgets
    print(f"ipywidgets version: {ipywidgets.__version__}")
    
    # Enable the extension
    from IPython import get_ipython
    if get_ipython() is not None:
        get_ipython().run_line_magic('load_ext', 'ipywidgets')
        print("ipywidgets extension loaded successfully")
    else:
        print("Not running in an IPython environment")
    
    # Verify that the widgets are properly registered
    print("Available widget models:", ipywidgets.Widget.widget_types)
    
except ImportError as e:
    print(f"ipywidgets not properly installed: {e}")
    print("Installing ipywidgets...")
    %pip install ipywidgets
    print("Please restart the kernel after installation.")

In [None]:
# Custom text input
import ipywidgets as widgets
from IPython.display import display, clear_output

# Reload the ipywidgets extension in case it was not loaded earlier
from IPython import get_ipython
if get_ipython() is not None:
    get_ipython().run_line_magic('reload_ext', 'ipywidgets')
    
# Create input widgets
src_lang_input = widgets.Text(value='eng_Latn', description='Source Lang:', layout={'width': '300px'})
src_text_input = widgets.Textarea(value='The weather is nice today.', description='Source Text:', layout={'width': '500px'})
mt_lang_input = widgets.Text(value='fra_Latn', description='Trans Lang:', layout={'width': '300px'})
mt_text_input = widgets.Textarea(value='Le temps est beau aujourd\'hui.', description='Translation:', layout={'width': '500px'})

# Output area that collects everything the callback prints
output_area = widgets.Output()

# Function to evaluate similarity
def evaluate_similarity(button):
    src_text = [src_text_input.value]
    mt_text = [mt_text_input.value]
    src_lang = src_lang_input.value
    mt_lang = mt_lang_input.value

    # Route prints to the output widget so the input widgets stay in place
    with output_area:
        clear_output(wait=True)
        print("Evaluating similarity...\n")
        try:
            # Get embeddings and ensure they're on the same device
            src_embs = text_embedder.predict(src_text, source_lang=src_lang).to(device)
            mt_embs = text_embedder.predict(mt_text, source_lang=mt_lang).to(device)

            # Predict similarity score
            with torch.inference_mode():
                qe_score = blaser_qe(src=src_embs, mt=mt_embs).item()

            print(f"Source ({src_lang}): {src_text[0]}")
            print(f"Translation ({mt_lang}): {mt_text[0]}")
            print(f"\nQuality Estimation score: {qe_score:.3f} (scale: 1-5, higher is more similar)")
        except Exception as e:
            print(f"Error: {e}")

# Create evaluate button with callback
evaluate_button = widgets.Button(description="Evaluate Similarity", button_style='primary')
evaluate_button.on_click(evaluate_similarity)

# Group widgets in a vertical box for better layout
widget_box = widgets.VBox([
    widgets.HBox([src_lang_input]),
    src_text_input,
    widgets.HBox([mt_lang_input]),
    mt_text_input,
    evaluate_button,
    output_area
])

# Display widgets
display(widget_box)

Text(value='eng_Latn', description='Source Lang:')

Textarea(value='The weather is nice today.', description='Source Text:')

Text(value='fra_Latn', description='Trans Lang:')

Textarea(value="Le temps est beau aujourd'hui.", description='Translation:')

Button(description='Evaluate Similarity', style=ButtonStyle())

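The language fields accept free text, so a mistyped code only surfaces as an exception from the embedder. The codes used throughout this notebook (`eng_Latn`, `fra_Latn`, `deu_Latn`) share one shape: a three-letter language code, an underscore, and a four-letter script name. The sketch below checks that shape before any model call. It does not verify that SONAR actually supports the language, and `looks_like_lang_code` is a hypothetical helper.

```python
import re

# Three lowercase letters, an underscore, then a capitalized four-letter script
_LANG_CODE = re.compile(r"[a-z]{3}_[A-Z][a-z]{3}")


def looks_like_lang_code(code: str) -> bool:
    """Return True if `code` has the same shape as codes such as 'eng_Latn'."""
    return _LANG_CODE.fullmatch(code.strip()) is not None


for candidate in ("eng_Latn", "fra_Latn", "english", "eng_latn"):
    print(f"{candidate!r}: {looks_like_lang_code(candidate)}")
```

A callback could run this check first and print a short hint instead of passing a malformed code to the embedder.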
## Conclusion

BLASER 2.0 scores translation quality and cross-lingual sentence similarity in the SONAR embedding space, with or without a reference translation. It can be used in various scenarios:

- Machine translation quality evaluation
- Cross-lingual similarity assessment
- Comparing translation alternatives
- Evaluating speech-to-text translations

For more information, visit:
- [SONAR GitHub Repository](https://github.com/facebookresearch/SONAR)
- [BLASER 2.0 QE Model Card](https://huggingface.co/facebook/blaser-2.0-qe)
- [BLASER 2.0 Ref Model Card](https://huggingface.co/facebook/blaser-2.0-ref)