# BGE Reranker v2 m3 - Azure ML

This notebook demonstrates how to deploy the BGE Reranker v2 m3 model from the Hugging Face registry to an Azure Machine Learning managed online endpoint, and how to test it.

### 1. Connect to Azure ML Workspace

First, connect to your Azure ML workspace. Ensure you have the necessary credentials configured (e.g., via environment variables).

In [1]:
# Import necessary packages
import os
import time
import json
from urllib.request import Request, urlopen
from urllib.error import HTTPError
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
)

In [2]:
# Retrieve Azure ML workspace details
subscription_id = os.getenv("AZUREML_SUBSCRIPTION_ID")
resource_group = os.getenv("AZUREML_RESOURCE_GROUP")
workspace_name = os.getenv("AZUREML_WORKSPACE_NAME")
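
`os.getenv` returns `None` for a variable that is not set, so a missing value only surfaces later when the client is used. The following is a minimal sketch of an early check, assuming the three variable names used above; `missing_env_vars` and `REQUIRED_VARS` are hypothetical names introduced for illustration, not part of the Azure SDK.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names from `names` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Variable names taken from the cell above.
REQUIRED_VARS = (
    "AZUREML_SUBSCRIPTION_ID",
    "AZUREML_RESOURCE_GROUP",
    "AZUREML_WORKSPACE_NAME",
)

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Running this before creating the `MLClient` gives a clearer message than a failure further down the notebook.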

In [3]:
# Authenticate and create MLClient
try:
    credential = DefaultAzureCredential()
    ml_client = MLClient(credential, subscription_id, resource_group, workspace_name)
    print(f"Connected to workspace: {workspace_name}")
except Exception as ex:
    print("Could not connect to Azure ML. Please check your credentials and workspace details.")
    print(ex)

Connected to workspace: laziz_AML_SW


### 2. Find BGE Reranker Model in Hugging Face Registry

We will use the model directly from the Hugging Face registry within Azure ML.

In [4]:
# Define model details
registry_name = "HuggingFace"
model_name = "baai-bge-reranker-v2-m3"

In [5]:
# Construct model ID for Azure ML deployment
model_id = f"azureml://registries/{registry_name}/models/{model_name}/labels/latest"

print(f"Using model ID: {model_id}")

Using model ID: azureml://registries/HuggingFace/models/baai-bge-reranker-v2-m3/labels/latest
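
The `labels/latest` form follows whatever version the registry currently marks as latest. For a reproducible deployment it may be preferable to pin a version. The sketch below builds both forms; `registry_model_uri` is a hypothetical helper, the `versions/<n>` path is assumed to follow the usual Azure ML asset URI pattern, and the version `"1"` in the usage line is a placeholder rather than a known version of this model.

```python
def registry_model_uri(registry, model, version=None):
    """Build an azureml:// registry model URI, pinned to a version if one is given."""
    base = f"azureml://registries/{registry}/models/{model}"
    if version is None:
        # Same form as the cell above: follow the 'latest' label.
        return f"{base}/labels/latest"
    # Assumed pinned form; the version must exist in the registry.
    return f"{base}/versions/{version}"


print(registry_model_uri("HuggingFace", "baai-bge-reranker-v2-m3"))
print(registry_model_uri("HuggingFace", "baai-bge-reranker-v2-m3", version="1"))  # placeholder version
```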


### 3. Prepare Deployment Assets

Deploying a custom model typically requires a scoring script and an environment. For Hugging Face models from the registry, Azure ML provides both automatically, so nothing needs to be defined for this deployment. If customisation is needed, you can define them here.

In [6]:
# If you needed a custom environment or scoring script, you would define them here
# (this also requires importing Environment and CodeConfiguration from azure.ai.ml.entities):
# env = Environment(
#     image = "mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
#     conda_file = "./conda_env.yml"
# )
# code_config = CodeConfiguration(
#     code = "./", # Directory containing your scoring script
#     scoring_script = "score.py"
# )
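
For reference, a custom Azure ML scoring script exposes two functions: `init()`, called once when the container starts, and `run(raw_data)`, called for each request. The sketch below shows that shape only. The token-overlap scorer is a hypothetical stand-in so the example runs without a model, and the request and response fields are assumed to mirror the payload used later in this notebook; it is not how the registry-provided script works.

```python
import json

_scorer = None  # set once by init()


def _token_overlap(query, text):
    """Placeholder relevance score: fraction of query tokens that appear in text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def init():
    """Called once at startup; a real script would load the model here."""
    global _scorer
    _scorer = _token_overlap  # stand-in for a real reranker model


def run(raw_data):
    """Called per request with the raw JSON body; returns a JSON-serializable list."""
    payload = json.loads(raw_data)
    scored = [
        {"index": i, "score": _scorer(payload["query"], text)}
        for i, text in enumerate(payload["texts"])
    ]
    # Highest score first.
    return sorted(scored, key=lambda item: item["score"], reverse=True)
```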

### 4. Deploy BGE Model to Online Endpoint

Deploy the model to a managed online endpoint.

In [7]:
# Define unique endpoint name
endpoint_name = "bge-reranker-ep-" + str(int(time.time()))
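
Appending a Unix timestamp keeps the name unique across runs, but it also adds eleven characters. The sketch below wraps the same idea in a hypothetical helper, `make_endpoint_name`, and checks the result against a naming rule. The rule encoded here (starts with a letter, only letters, digits, and hyphens, 3 to 32 characters) is an assumption and should be checked against the current Azure ML documentation.

```python
import re
import time

# Assumed naming rule: starts with a letter; letters, digits, hyphens; 3-32 characters.
_NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9-]{2,31}$")


def make_endpoint_name(prefix, now=None):
    """Return '<prefix>-<unix timestamp>', raising ValueError if it breaks the assumed rule."""
    timestamp = int(time.time() if now is None else now)
    name = f"{prefix}-{timestamp}"
    if not _NAME_PATTERN.match(name):
        raise ValueError(f"Endpoint name {name!r} does not match the assumed naming rule")
    return name


print(make_endpoint_name("bge-reranker-ep"))
```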

In [8]:
# Create online endpoint
print(f"Creating endpoint: {endpoint_name}")
endpoint = ManagedOnlineEndpoint(
    name = endpoint_name,
    description = "Online endpoint for BGE Reranker v2 m3 model"
)
ml_client.begin_create_or_update(endpoint).wait()
print(f"Endpoint {endpoint_name} created.")

Creating endpoint: bge-reranker-ep-1748948005
Endpoint bge-reranker-ep-1748948005 created.


In [9]:
# Create the online deployment
deployment_name = "reranker-deployment"
instance_type = "Standard_NC4as_T4_v3"
instance_count = 1

print(f"Creating deployment: {deployment_name} on instance type {instance_type}")
deployment = ManagedOnlineDeployment(
    name = deployment_name,
    endpoint_name = endpoint_name,
    model = model_id,
    instance_type = instance_type,
    instance_count = instance_count,
    # If using custom code/environment, uncomment and configure these:
    # environment = env,
    # code_configuration = code_config
)

ml_client.online_deployments.begin_create_or_update(deployment).wait()
print(f"Deployment {deployment_name} created.")

Check: endpoint bge-reranker-ep-1748948005 exists


Creating deployment: reranker-deployment on instance type Standard_NC4as_T4_v3
..................................................................................................................Deployment reranker-deployment created.


In [10]:
# Set 100% traffic to the new deployment
endpoint.traffic = {deployment_name: 100}
ml_client.begin_create_or_update(endpoint).result()
print(f"Traffic assigned to deployment {deployment_name}.")

Traffic assigned to deployment reranker-deployment.


### 5. Test the Deployed Model

Send sample data to the deployed endpoint to test its functionality.

In [11]:
# Input data
data = {
    "query": "What is Deep Learning?",
    "texts": [
        "Deep learning is a subset of machine learning that uses neural networks with multiple layers to model and understand complex patterns in data.",
        "Machine learning is a method of data analysis that automates analytical model building using algorithms that iteratively learn from data.",
        "Python is a high-level programming language widely used for web development and data science applications.",
        "Artificial intelligence is the simulation of human intelligence processes by machines, especially computer systems."
    ]
}

In [12]:
# Helper function to process results
def process_reranker_results(original_data, reranker_response):
    """
    Process reranker results and combine with original texts
    
    Args:
        original_data (dict): Original query and texts
        reranker_response (bytes): Response from BGE reranker endpoint
    
    Returns:
        list: Ranked results with texts and scores
    """
    # Parse the reranker response
    scores = json.loads(reranker_response.decode('utf-8'))
    
    # Combine texts with their scores
    ranked_results = []
    for score_data in scores:
        index = score_data['index']
        score = score_data['score']
        text = original_data['texts'][index]
        
        ranked_results.append({
            'text': text,
            'score': score,
            'original_index': index
        })
    
    # Sort by score in descending order (highest relevance first)
    ranked_results.sort(key=lambda x: x['score'], reverse=True)
    
    return ranked_results
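
The parsing logic can be checked offline, before any request is sent. The sketch below is a compact variant of the same idea that keeps only the top `k` entries; `top_k_texts` is a hypothetical helper, the sample bytes have the response shape this endpoint returns (a JSON list of `index` and `score` pairs), and the short `doc A` to `doc D` strings are placeholders for real passages.

```python
import json


def top_k_texts(texts, response_bytes, k=2):
    """Return (index, score, text) tuples for the k highest-scoring entries."""
    scores = json.loads(response_bytes.decode("utf-8"))
    ordered = sorted(scores, key=lambda item: item["score"], reverse=True)
    return [(item["index"], item["score"], texts[item["index"]]) for item in ordered[:k]]


sample_texts = ["doc A", "doc B", "doc C", "doc D"]  # placeholder passages
sample_response = (
    b'[{"index":0,"score":0.9998785},{"index":1,"score":0.01374232},'
    b'{"index":3,"score":0.010944578},{"index":2,"score":0.00014202364}]'
)

for index, score, text in top_k_texts(sample_texts, sample_response):
    print(f"{index} {score:.6f} {text}")
```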

In [13]:
# Helper function to display results
def display_ranked_results(query, ranked_results):
    """
    Display the ranked results in a readable format
    
    Args:
        query (str): Original query
        ranked_results (list): Ranked results with scores
    """
    print(f"Query: {query}")
    print("=" * 80)
    print(f"{'Rank':<4} {'Score':<10} {'Text'}")
    print("=" * 80)
    
    for rank, result in enumerate(ranked_results, 1):
        score = f"{result['score']:.6f}"
        text = result['text'][:70] + "..." if len(result['text']) > 70 else result['text']
        print(f"{rank:<4} {score:<10} {text}")
    
    print("\n" + "=" * 80)
    print("Full ranked results:")
    print("=" * 80)
    
    for rank, result in enumerate(ranked_results, 1):
        print(f"\nRank {rank} (Score: {result['score']:.6f}):")
        print(f"Original Index: {result['original_index']}")
        print(f"Text: {result['text']}")

In [14]:
# Define API request body
body = json.dumps(data).encode("utf-8")

# Retrieve the scoring URI and primary key from the deployed endpoint
url = ml_client.online_endpoints.get(name=endpoint_name).scoring_uri
# Drop a trailing '/embed' route, if present (str.removesuffix requires Python 3.9+)
url = url.removesuffix('/embed')
api_key = ml_client.online_endpoints.get_keys(name=endpoint_name).primary_key
if not api_key:
    raise RuntimeError("An API key is required to invoke the endpoint")

# Set up headers for the request
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"Bearer {api_key}",
}
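
The body and headers can also be assembled in one place, which makes the request easy to inspect without touching the network. This is a minimal sketch using only `urllib` from the standard library; `build_rerank_request` is a hypothetical helper, and the URL and key in the usage lines are placeholders, not real endpoint values.

```python
import json
from urllib.request import Request


def build_rerank_request(url, api_key, query, texts):
    """Build a POST request carrying the query/texts payload and key-based auth headers."""
    body = json.dumps({"query": query, "texts": texts}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return Request(url, data=body, headers=headers, method="POST")


# Placeholder URL and key, for inspection only; nothing is sent here.
example = build_rerank_request("https://example.invalid/score", "placeholder-key", "q", ["a", "b"])
print(example.get_method(), example.full_url)
```

When the request is actually sent, passing a timeout such as `urlopen(req, timeout=30)` avoids waiting indefinitely on an unresponsive endpoint.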

In [15]:
# Submit API request
req = Request(url, body, headers)

try:
    response = urlopen(req)
    result = response.read()
    print("Raw API Response:")
    print(result)
    
    # Process and display ranked results
    print("\n" + "="*80)
    print("PROCESSED RERANKING RESULTS:")
    print("="*80)
    
    ranked_results = process_reranker_results(data, result)
    display_ranked_results(data['query'], ranked_results)
    
except HTTPError as error:
    print("The request failed with status code: " + str(error.code))
    print(error.info())
    print(error.read().decode("utf8", 'ignore'))

Raw API Response:
b'[{"index":0,"score":0.9998785},{"index":1,"score":0.01374232},{"index":3,"score":0.010944578},{"index":2,"score":0.00014202364}]'

PROCESSED RERANKING RESULTS:
Query: What is Deep Learning?
Rank Score      Text
1    0.999879   Deep learning is a subset of machine learning that uses neural network...
2    0.013742   Machine learning is a method of data analysis that automates analytica...
3    0.010945   Artificial intelligence is the simulation of human intelligence proces...
4    0.000142   Python is a high-level programming language widely used for web develo...

Full ranked results:

Rank 1 (Score: 0.999879):
Original Index: 0
Text: Deep learning is a subset of machine learning that uses neural networks with multiple layers to model and understand complex patterns in data.

Rank 2 (Score: 0.013742):
Original Index: 1
Text: Machine learning is a method of data analysis that automates analytical model building using algorithms that iteratively learn from data.

[... output truncated ...]

### 6. Clean up Resources (Optional)

If you no longer need the endpoint and deployment, you can delete them to avoid incurring costs.

In [16]:
# Clean up resources
print(f"Deleting endpoint {endpoint_name}...")
ml_client.online_endpoints.begin_delete(name=endpoint_name).wait()
print(f"Endpoint {endpoint_name} deleted.")

Deleting endpoint bge-reranker-ep-1748948005...
............................................................................Endpoint bge-reranker-ep-1748948005 deleted.
