## Using Document Boosting in Azure AI Search for enhanced retrieval quality

In [None]:
!pip install azure-identity
!pip install kaggle
!pip install python-dotenv
!pip install rich
!pip install azure-search-documents --pre
!pip install openai pandas numpy tqdm tenacity
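
The later cells read their configuration with `os.getenv`. Below is a minimal sketch of a pre-flight check, assuming the settings live in a local `.env` file that has already been loaded (for example with python-dotenv's `load_dotenv()`). The helper name `missing_env_vars` is hypothetical; the variable names are the ones this notebook reads.

```python
import os

# Environment variables that later cells in this notebook read via os.getenv.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME",
    "AZURE_SEARCH_SERVICE_ENDPOINT",
]

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing settings:", ", ".join(missing))
```

Running this before the authentication cells surfaces a missing endpoint early, rather than as a less obvious client error later on.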

## Download Data from Kaggle

This dataset is sourced from [Rishabh Misra's publications](https://rishabhmisra.github.io/publications).

If you're using this dataset for your work, please cite the following articles:

**Citation in text format:**

1. Misra, Rishabh. "News Category Dataset." arXiv preprint arXiv:2209.11429 (2022).
2. Misra, Rishabh and Jigyasa Grover. "Sculpting Data for ML: The first act of Machine Learning." ISBN 9798585463570 (2021).

**Citation in BibTex format:**

```bibtex
@article{misra2022news,
  title={News Category Dataset},
  author={Misra, Rishabh},
  journal={arXiv preprint arXiv:2209.11429},
  year={2022}
}
@book{misra2021sculpting,
  author = {Misra, Rishabh and Grover, Jigyasa},
  year = {2021},
  month = {01},
  pages = {},
  title = {Sculpting Data for ML: The first act of Machine Learning},
  isbn = {9798585463570}
}
```

In [2]:
! kaggle datasets download -d rmisra/news-category-dataset

Dataset URL: https://www.kaggle.com/datasets/rmisra/news-category-dataset
License(s): Attribution 4.0 International (CC BY 4.0)
news-category-dataset.zip: Skipping, found more recently modified local copy (use --force to force download)


In [3]:
import pandas as pd
import zipfile

# Unzip the downloaded file
with zipfile.ZipFile('news-category-dataset.zip', 'r') as zip_ref:
    zip_ref.extractall()

# Load the dataset into a pandas DataFrame
df = pd.read_json('News_Category_Dataset_v3.json', lines=True)

# Display the first few rows of the DataFrame
df.head()

Unnamed: 0,link,headline,category,short_description,authors,date
0,https://www.huffpost.com/entry/covid-boosters-...,Over 4 Million Americans Roll Up Sleeves For O...,U.S. NEWS,Health experts said it is too early to predict...,"Carla K. Johnson, AP",2022-09-23
1,https://www.huffpost.com/entry/american-airlin...,"American Airlines Flyer Charged, Banned For Li...",U.S. NEWS,He was subdued by passengers and crew when he ...,Mary Papenfuss,2022-09-23
2,https://www.huffpost.com/entry/funniest-tweets...,23 Of The Funniest Tweets About Cats And Dogs ...,COMEDY,"""Until you have a dog you don't understand wha...",Elyse Wanshel,2022-09-23
3,https://www.huffpost.com/entry/funniest-parent...,The Funniest Tweets From Parents This Week (Se...,PARENTING,"""Accidentally put grown-up toothpaste on my to...",Caroline Bologna,2022-09-23
4,https://www.huffpost.com/entry/amy-cooper-lose...,Woman Who Called Cops On Black Bird-Watcher Lo...,U.S. NEWS,Amy Cooper accused investment firm Franklin Te...,Nina Golgowski,2022-09-22


## Dataset Overview

The dataset contains news articles with their corresponding categories. Each record in the dataset has a `headline`, `short_description`, and `category`. 

The `headline` and `short_description` fields contain textual data that can be vectorized for further analysis or machine learning tasks. The `category` field can be used as a label for supervised learning tasks.

Before proceeding with vectorization, it's a good idea to check the maximum character length of the `headline` and `short_description` fields. This tells us whether each record fits within the embedding model's input limit and informs decisions about preprocessing steps, such as chunking.

In [4]:
max_headline_length = df['headline'].str.len().max()
max_short_description_length = df['short_description'].str.len().max()

print(max_headline_length)
print(max_short_description_length)

320
1472


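Let's plan to use OpenAI `text-embedding-3-large`, which accepts 8192 input tokens, or roughly 32K characters of English text. The longest headline and the longest description together are under 2,000 characters, so no chunking is needed and we are ready to vectorize.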
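
A rough way to sanity-check that claim is sketched below. It assumes the common rule of thumb of about four characters per token for English text, which is a heuristic and not the model's actual tokenizer; the helper name `estimated_tokens` is hypothetical.

```python
import math

MAX_INPUT_TOKENS = 8192   # limit quoted above for text-embedding-3-large
CHARS_PER_TOKEN = 4       # rough heuristic for English text, not exact

def estimated_tokens(num_chars, chars_per_token=CHARS_PER_TOKEN):
    """Estimate the token count for a text of the given character length."""
    return math.ceil(num_chars / chars_per_token)

# Worst case in this dataset: longest headline + one space + longest description.
worst_case_chars = 320 + 1 + 1472
print(estimated_tokens(worst_case_chars), "estimated tokens, limit is", MAX_INPUT_TOKENS)
```

For an exact count, a tokenizer library such as `tiktoken` would be needed; the estimate is only meant to show that the margin here is large.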

## Vectorize Headline and Short_Description

In [110]:
df['text_to_vectorize'] = df['headline'] + ' ' + df['short_description']

df.head()

Unnamed: 0,link,headline,category,short_description,authors,date,id,view_count,text_to_vectorize
0,https://www.huffpost.com/entry/covid-boosters-...,Over 4 Million Americans Roll Up Sleeves For O...,U.S. NEWS,Health experts said it is too early to predict...,"Carla K. Johnson, AP",2022-09-23T00:00:00.000000Z,0,87529,Over 4 Million Americans Roll Up Sleeves For O...
1,https://www.huffpost.com/entry/american-airlin...,"American Airlines Flyer Charged, Banned For Li...",U.S. NEWS,He was subdued by passengers and crew when he ...,Mary Papenfuss,2022-09-23T00:00:00.000000Z,1,61166,"American Airlines Flyer Charged, Banned For Li..."
2,https://www.huffpost.com/entry/funniest-tweets...,23 Of The Funniest Tweets About Cats And Dogs ...,COMEDY,"""Until you have a dog you don't understand wha...",Elyse Wanshel,2022-09-23T00:00:00.000000Z,2,62216,23 Of The Funniest Tweets About Cats And Dogs ...
3,https://www.huffpost.com/entry/funniest-parent...,The Funniest Tweets From Parents This Week (Se...,PARENTING,"""Accidentally put grown-up toothpaste on my to...",Caroline Bologna,2022-09-23T00:00:00.000000Z,3,67162,The Funniest Tweets From Parents This Week (Se...
4,https://www.huffpost.com/entry/amy-cooper-lose...,Woman Who Called Cops On Black Bird-Watcher Lo...,U.S. NEWS,Amy Cooper accused investment firm Franklin Te...,Nina Golgowski,2022-09-22T00:00:00.000000Z,4,52370,Woman Who Called Cops On Black Bird-Watcher Lo...


Now we can use `text_to_vectorize` as the input to the OpenAI embedding model and store the result in a new column called `vector`. First, let's make sure the `id` and `date` columns are in the format the index expects.

In [111]:
# Ensure the id field is a string or create it if it doesn't exist
if 'id' in df.columns:
    df["id"] = df["id"].astype(str)
else:
    df["id"] = df.index.astype(str)

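# Convert the 'date' field to ISO 8601 with millisecond precision.
# Use the .str accessor: slicing the Series directly would drop rows, not characters.
df["date"] = pd.to_datetime(df["date"]).dt.strftime('%Y-%m-%dT%H:%M:%S.%f').str[:-3] + 'Z'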
df.head()

Unnamed: 0,link,headline,category,short_description,authors,date,id,view_count,text_to_vectorize
0,https://www.huffpost.com/entry/covid-boosters-...,Over 4 Million Americans Roll Up Sleeves For O...,U.S. NEWS,Health experts said it is too early to predict...,"Carla K. Johnson, AP",2022-09-23T00:00:00.000000Z,0,87529,Over 4 Million Americans Roll Up Sleeves For O...
1,https://www.huffpost.com/entry/american-airlin...,"American Airlines Flyer Charged, Banned For Li...",U.S. NEWS,He was subdued by passengers and crew when he ...,Mary Papenfuss,2022-09-23T00:00:00.000000Z,1,61166,"American Airlines Flyer Charged, Banned For Li..."
2,https://www.huffpost.com/entry/funniest-tweets...,23 Of The Funniest Tweets About Cats And Dogs ...,COMEDY,"""Until you have a dog you don't understand wha...",Elyse Wanshel,2022-09-23T00:00:00.000000Z,2,62216,23 Of The Funniest Tweets About Cats And Dogs ...
3,https://www.huffpost.com/entry/funniest-parent...,The Funniest Tweets From Parents This Week (Se...,PARENTING,"""Accidentally put grown-up toothpaste on my to...",Caroline Bologna,2022-09-23T00:00:00.000000Z,3,67162,The Funniest Tweets From Parents This Week (Se...
4,https://www.huffpost.com/entry/amy-cooper-lose...,Woman Who Called Cops On Black Bird-Watcher Lo...,U.S. NEWS,Amy Cooper accused investment firm Franklin Te...,Nina Golgowski,2022-09-22T00:00:00.000000Z,4,52370,Woman Who Called Cops On Black Bird-Watcher Lo...
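
The `date` field is later indexed as `Edm.DateTimeOffset`, which expects an ISO 8601 timestamp with an offset such as `Z`. Here is a standard-library sketch of the same conversion for a single value, assuming the source dates are plain `YYYY-MM-DD` strings interpreted as midnight UTC; the helper name `to_search_datetime` is hypothetical.

```python
from datetime import datetime, timezone

def to_search_datetime(date_string):
    """Convert 'YYYY-MM-DD' into an ISO 8601 UTC timestamp with milliseconds."""
    parsed = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    # %f yields six microsecond digits; drop the last three to keep milliseconds.
    return parsed.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"

print(to_search_datetime("2022-09-23"))
```

Here the `[:-3]` slice acts on a plain string. On a pandas Series the equivalent slice has to go through the `.str` accessor.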


Let's also create an artificial column called `view_count` so we can demo magnitude-based document boosting later.

In [113]:
import numpy as np

df['view_count'] = np.random.randint(0, 100001, size=len(df), dtype=np.int32)

## Generate Embeddings 

### Authenticate Azure OpenAI

In [114]:
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
import json
import os

# User-specified parameter
USE_AAD_FOR_AOAI = True

def authenticate_openai(api_key=None, use_aad_for_aoai=False):
    from azure.identity import get_bearer_token_provider
    from openai import AzureOpenAI

    if use_aad_for_aoai:
        print("Using AAD for authentication.")
        credential = DefaultAzureCredential()
        token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
        client = AzureOpenAI(
            azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
            api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
            azure_ad_token_provider=token_provider,
        )
    else:
        print("Using API keys for authentication.")
        if api_key is None:
            raise ValueError("API key must be provided if not using AAD for authentication.")
        client = AzureOpenAI(
            api_key=api_key,
            azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
            api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
        )
    return client

openai_client = authenticate_openai(api_key=os.getenv("AZURE_OPENAI_API_KEY"), use_aad_for_aoai=USE_AAD_FOR_AOAI)

Using AAD for authentication.


In [8]:
from tqdm import tqdm
from tenacity import retry, stop_after_attempt, wait_exponential
import json

@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, min=4, max=60))
def get_embeddings(openai_client, texts):
    response = openai_client.embeddings.create(
        input=texts,
        model=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME")
    )
    # Each item in response.data holds the embedding for one input, in input order
    return [item.embedding for item in response.data]

def add_embeddings_to_df(df, text_column, vector_column, batch_size=1000):
    embeddings = []
    for i in tqdm(range(0, len(df[text_column]), batch_size)):
        batch_texts = df[text_column][i:i+batch_size].tolist()
        batch_embeddings = get_embeddings(openai_client, batch_texts)
        embeddings.extend(batch_embeddings)
    df[vector_column] = embeddings
    return df

df_vectors = add_embeddings_to_df(df, "text_to_vectorize", "vector")
print(df_vectors.head())


100%|██████████| 210/210 [45:02<00:00, 12.87s/it]

                                                link  \
0  https://www.huffpost.com/entry/covid-boosters-...   
1  https://www.huffpost.com/entry/american-airlin...   
2  https://www.huffpost.com/entry/funniest-tweets...   
3  https://www.huffpost.com/entry/funniest-parent...   
4  https://www.huffpost.com/entry/amy-cooper-lose...   

                                            headline   category  \
0  Over 4 Million Americans Roll Up Sleeves For O...  U.S. NEWS   
1  American Airlines Flyer Charged, Banned For Li...  U.S. NEWS   
2  23 Of The Funniest Tweets About Cats And Dogs ...     COMEDY   
3  The Funniest Tweets From Parents This Week (Se...  PARENTING   
4  Woman Who Called Cops On Black Bird-Watcher Lo...  U.S. NEWS   

                                   short_description               authors  \
0  Health experts said it is too early to predict...  Carla K. Johnson, AP   
1  He was subdued by passengers and crew when he ...        Mary Papenfuss   
2  "Until you have a dog y
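
The embedding cell above sends the texts in fixed-size batches, one request per batch, and relies on the results coming back in input order so that they line up with the data frame rows. The slicing logic on its own can be sketched as follows; the helper name `iter_batches` and the sample data are hypothetical.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items, preserving order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

texts = [f"doc {i}" for i in range(2500)]
sizes = [len(batch) for batch in iter_batches(texts, 1000)]
print(sizes)
```

The final batch is simply shorter than the others, so no padding or special casing is needed.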




We no longer need the `text_to_vectorize` column, since the vectors were already created from the concatenation of the `headline` and `short_description` fields. Let's first save the data frame to disk so the embeddings don't have to be regenerated, then drop the column.

In [None]:
df_vectors.to_json('df_vectors.json', orient='records')

In [None]:
df_vectors.drop(columns=['text_to_vectorize'], inplace=True)

## Create Azure AI Search index

### Authenticate to Azure AI Search

In [123]:
from azure.search.documents import SearchClient
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.core.credentials import AzureKeyCredential  
import os

INDEX_NAME="news-category"

# User-specified parameter
USE_AAD_FOR_SEARCH = True  

def authenticate_azure_search(api_key=None, use_aad_for_search=False):
    if use_aad_for_search:
        print("Using AAD for authentication.")
        credential = DefaultAzureCredential()
    else:
        print("Using API keys for authentication.")
        if api_key is None:
            raise ValueError("API key must be provided if not using AAD for authentication.")
        credential = AzureKeyCredential(api_key)
    return credential

azure_search_credential = authenticate_azure_search(api_key=os.getenv("AZURE_SEARCH_ADMIN_KEY"), use_aad_for_search=USE_AAD_FOR_SEARCH)


Using AAD for authentication.


In [129]:
import os
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration,
    HnswParameters,
    SearchField,
    SearchableField,
    SearchFieldDataType,
    SearchIndex,
    SemanticConfiguration,
    SemanticField,
    SemanticPrioritizedFields,
    SemanticSearch,
    SimpleField,
    VectorSearch,
    VectorSearchAlgorithmKind,
    VectorSearchAlgorithmMetric,
    VectorSearchProfile,
    AzureOpenAIModelName,
    AzureOpenAIParameters,
    AzureOpenAIVectorizer,
    ScoringProfile,
    MagnitudeScoringFunction,
    MagnitudeScoringParameters,
    FreshnessScoringFunction,
    FreshnessScoringParameters,
    DistanceScoringFunction,
    TagScoringFunction,
    TagScoringParameters,
    ScoringFunctionInterpolation,
    ScoringFunctionAggregation,
    TextWeights,
)

# Initialize the SearchIndexClient
index_client = SearchIndexClient(
    endpoint=os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT"),
    credential=azure_search_credential,  # reuse the credential chosen in the previous cell
)

# Define the fields
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SimpleField(name="link", type=SearchFieldDataType.String),
    SearchableField(name="headline", type=SearchFieldDataType.String),
    SearchableField(
        name="category",
        type=SearchFieldDataType.String,
        filterable=True,
        facetable=True,
    ),
    SearchableField(
        name="short_description",
        type=SearchFieldDataType.String,
    ),
    SearchableField(name="authors", type=SearchFieldDataType.String),
    SearchField(
        name="date",
        type=SearchFieldDataType.DateTimeOffset,
        filterable=True,
        sortable=True,
    ),
    SimpleField(name="view_count", type=SearchFieldDataType.Int32, filterable=True, sortable=True),  
    SearchField(
        name="vector",
        type="Collection(Edm.Single)",
        vector_search_dimensions=3072,
        vector_search_profile_name="my-vector-config",
    )
]

# Define the vector search
vector_search = VectorSearch(
    profiles=[
        VectorSearchProfile(
            name="my-vector-config",
            algorithm_configuration_name="my-hnsw",
            vectorizer="my-vectorizer",
        )
    ],
    algorithms=[
        HnswAlgorithmConfiguration(
            name="my-hnsw",
            kind=VectorSearchAlgorithmKind.HNSW,
            parameters=HnswParameters(metric=VectorSearchAlgorithmMetric.COSINE),
        )
    ],
    vectorizers=[
        AzureOpenAIVectorizer(
            name="my-vectorizer",
            azure_open_ai_parameters=AzureOpenAIParameters(
                resource_uri=os.getenv("AZURE_OPENAI_ENDPOINT"),
                deployment_id=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME"),
                model_name=AzureOpenAIModelName.TEXT_EMBEDDING3_LARGE,
            ),
        )
    ],
)

# Configure the semantic search configuration
semantic_search = SemanticSearch(
    configurations=[
        SemanticConfiguration(
            name="my-semantic-config",
            prioritized_fields=SemanticPrioritizedFields(
                title_field=SemanticField(field_name="headline"),
                keywords_fields=[SemanticField(field_name="category")],
                content_fields=[SemanticField(field_name="short_description")],
            ),
        )
    ]
)

# Define scoring profiles
scoring_profiles = [
    ScoringProfile(
        name="boostCategory",
        text_weights=TextWeights(
            weights={
                "category": 10.0,
            }
        ),
    ),
    ScoringProfile(
        name="boostRecency",
        functions=[
            FreshnessScoringFunction(
                field_name="date",
                boost=2.0,
                parameters=FreshnessScoringParameters(
                    boosting_duration="P1095D",
                ),
                interpolation=ScoringFunctionInterpolation.LINEAR,
            )
        ],
    ),
    ScoringProfile(
        name="boostByTag",
        functions=[
            TagScoringFunction(
                field_name="category",
                boost=2.0,
                parameters=TagScoringParameters(
                    tags_parameter="tags",
                ),
            )
        ],
    ),
    ScoringProfile(
        name="boostViewCount",
        functions=[
            MagnitudeScoringFunction(
                field_name="view_count",
                boost=10.0,
                parameters=MagnitudeScoringParameters(
                    boosting_range_start=0,
                    boosting_range_end=10000,
                ),
                interpolation=ScoringFunctionInterpolation.LINEAR,
            )
        ],
    ),
]

# Define the index
index = SearchIndex(
    name=INDEX_NAME,
    fields=fields,
    scoring_profiles=scoring_profiles,
    vector_search=vector_search,
    semantic_search=semantic_search,
)

# Create or update the index
result = index_client.create_or_update_index(index)
print(f"{result.name} created")

news-category created


## Upload documents

Prior to uploading to AI Search, we need to convert the pandas data frame to a list of dictionaries. 

In [13]:
# Convert the DataFrame to a list of dictionaries
documents = df_vectors.to_dict(orient="records")

In [8]:
import json

# Load the JSON file
with open('df_vectors.json', 'r') as file:
    documents = json.load(file)

In [9]:
# Remove the 'text_to_vectorize' field from each document
documents = [{key: value for key, value in doc.items() if key != 'text_to_vectorize'} for doc in documents]

if documents:  # Ensure there are documents to inspect after modification
    first_document = documents[0]
    fields = list(first_document.keys())
    print("Fields in the first document after removal:", fields)
else:
    print("No documents found.")

Fields in the first document after removal: ['link', 'headline', 'category', 'short_description', 'authors', 'date', 'id', 'vector']


To optimize upload performance, we will upload the documents in batches of 250.

In [None]:
import os
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import HttpResponseError

# Check if documents are loaded
if not documents:
    print("No documents found to upload.")
else:
    print(f"Loaded {len(documents)} documents to upload.")

def create_search_client(index_name):
    return SearchClient(
        endpoint=os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT"),
        index_name=index_name,
        credential=azure_search_credential
    )

def upload_documents_to_index(client, documents):
    BATCH_SIZE = 250
    for start_idx in range(0, len(documents), BATCH_SIZE):
        end_idx = start_idx + BATCH_SIZE
        documents_to_upload = documents[start_idx:end_idx]
        try:
            client.merge_or_upload_documents(documents=documents_to_upload)
            print(f"Uploaded documents {start_idx} to {start_idx + len(documents_to_upload)}")
        except HttpResponseError as e:
            print(f"Failed to upload documents {start_idx} to {end_idx}: {e}")

# Create the search client
search_client = create_search_client(INDEX_NAME)

# Upload documents to the index
upload_documents_to_index(search_client, documents)
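
The upload loop above only catches request-level errors. The SDK's upload methods also return one `IndexingResult` per document, so individual failures can be inspected. The sketch below assumes each result exposes `key`, `succeeded`, and `error_message` attributes and uses stand-in objects in place of real results; the helper name `failed_keys` is hypothetical.

```python
from types import SimpleNamespace

def failed_keys(results):
    """Return (key, error_message) pairs for documents that were not indexed."""
    return [(r.key, r.error_message) for r in results if not r.succeeded]

# Stand-in objects mimicking the key/succeeded/error_message attributes.
sample = [
    SimpleNamespace(key="0", succeeded=True, error_message=None),
    SimpleNamespace(key="1", succeeded=False, error_message="hypothetical failure"),
]
print(failed_keys(sample))
```

In the notebook, the return value of `client.merge_or_upload_documents(...)` could be passed to such a helper after each batch.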

## Perform a vector search 

Let's run the same vector query with and without a scoring profile attached and compare the relevance of the results.

In [135]:
import os
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.models import VectorizableTextQuery

# Initialize the search client
search_client = SearchClient(
    endpoint=os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT"),
    index_name=INDEX_NAME,
    credential=azure_search_credential,
)

def search_and_print_results(scoring_profile=None):
    query = "top business trends 2022"
    vector_query = VectorizableTextQuery(
        text=query, k_nearest_neighbors=50, fields="vector", exhaustive=True
    )
    results = search_client.search(
        search_text=None,
        vector_queries=[vector_query],
        scoring_profile=scoring_profile,
        # scoring_parameters={"tags-BUSINESS": tags},
        # filter="category eq 'TRAVEL' or category eq 'BUSINESS'",  # Adjust category filter as needed
        top=3,
    )

    profile_name = scoring_profile if scoring_profile else 'Vanilla (No Scoring Profile)'
    print(f"\nResults for {profile_name} Scoring Profile:")
    for result in results:
        print(f"headline: {result['headline']}")
        print(f"score: {result['@search.score']}")
        print(f"description: {result['short_description']}")
        print(f"views: {result['view_count']}")
        print(f"category: {result['category']}")
        print(f"date: {result['date']}")
        print(f"link: {result['link']}\n")

# Perform searches with and without the view-count scoring profile
search_and_print_results("boostViewCount")
search_and_print_results()  # Vanilla query without any scoring profile



Results for boostViewCount Scoring Profile:
headline: My Top 10 Predictions for 2013
score: 6.119715213775635
description: Those are my predictions, and I'm sticking to them.         Photo credit: vestman Since then, I've continued to predict trends
views: 9720
category: WELLNESS
date: 2013-01-09T00:00:00Z
link: https://www.huffingtonpost.com/entry/my-top-10-predictions-for_us_5b9cb3f0e4b03a1dcc81057e

headline: 10 Food Trends to Watch
score: 5.499270915985107
description: We're going to live in a world with Coke robots, apparently.
views: 8419
category: FOOD & DRINK
date: 2013-12-04T00:00:00Z
link: https://www.huffingtonpost.com/entry/10-food-trends-to-watch_us_5b9dabb8e4b03a1dcc8b2d1d

headline: The Biggest Food Trends Of 2015
score: 4.834758281707764
description: 
views: 7338
category: TASTE
date: 2014-12-01T00:00:00Z
link: https://www.huffingtonpost.com/entry/2014-food-trends_n_5792598.html


Results for Vanilla (No Scoring Profile) Scoring Profile:
headline: Disrupt Yourself Firs

### Freshness Boosting

In [99]:
from rich.console import Console
from rich.table import Table
from rich.text import Text

# Initialize a Rich console
console = Console()

def search_and_print_results(scoring_profile=None):
    query = "latest news on airlines"
    vector_query = VectorizableTextQuery(
        text=query, k_nearest_neighbors=50, fields="vector", exhaustive=True
    )
    results = search_client.search(
        search_text=None,
        vector_queries=[vector_query],
        scoring_profile=scoring_profile,
        top=3,
    )

    profile_name = scoring_profile if scoring_profile else 'Vanilla (No Scoring Profile)'
    console.print(f"\nResults for {profile_name} Scoring Profile:", style="bold blue")

    # Create a table for the results
    table = Table(show_header=True, header_style="bold magenta")
    table.add_column("Headline", style="dim", width=20)
    table.add_column("Score")
    table.add_column("Description", width=40)
    table.add_column("Category")
    table.add_column("Date")
    table.add_column("Link")

    for result in results:
        # Format the link as a clickable hyperlink
        link_text = Text(result['link'], style="link")
        link_text.stylize(f"link {result['link']}")

        table.add_row(
            result['headline'],
            str(result['@search.score']),
            result['short_description'],
            result['category'],
            result['date'],
            link_text  # Use the formatted link text here
        )

    # Print the table
    console.print(table)

# Perform searches with and without the freshness scoring profile
search_and_print_results()  # Vanilla query without any scoring profile
search_and_print_results("boostRecency")


When the boostRecency scoring profile is applied, the search results prioritize newer articles, as demonstrated by the higher ranking of the Alaska Airlines article dated 2022-04-01. In contrast, the vanilla query without any scoring profile returns results based on default relevance, where older articles from 2013 and 2012 are ranked higher. This illustrates the effectiveness of the freshness scoring profile in promoting more recent content.
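
The `boostRecency` profile sets `boosting_duration="P1095D"`, an ISO 8601 duration of 1,095 days (roughly three years). The sketch below is a simplified model of which documents fall inside that window. It does not reproduce the service's scoring formula, and the query time `now` as well as the helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

def parse_day_duration(duration):
    """Parse an ISO 8601 duration of the form 'P<n>D' into a timedelta."""
    match = re.fullmatch(r"P(\d+)D", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration}")
    return timedelta(days=int(match.group(1)))

def within_boost_window(doc_date, now, duration="P1095D"):
    """True if doc_date is in the past and inside the boosting window before now."""
    age = now - doc_date
    return timedelta(0) <= age <= parse_day_duration(duration)

now = datetime(2024, 6, 1, tzinfo=timezone.utc)    # hypothetical query time
recent = datetime(2022, 4, 1, tzinfo=timezone.utc)
old = datetime(2013, 1, 9, tzinfo=timezone.utc)
print(within_boost_window(recent, now), within_boost_window(old, now))
```

Because the window is measured back from the time of the query, the set of articles inside it shrinks as this static dataset ages.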

### Category Boosting

For this one, we will do a hybrid search so the category boost can be applied to our full-text search algorithm.

In [80]:
from rich.console import Console
from rich.table import Table
from rich.text import Text

# Initialize a Rich console
console = Console()

def search_and_print_results(scoring_profile=None):
    query = "Entertainment Industry Trends"
    vector_query = VectorizableTextQuery(
        text=query, k_nearest_neighbors=50, fields="vector", exhaustive=True
    )
    results = search_client.search(
        search_text=query, # passing in text query for hybrid search
        vector_queries=[vector_query],
        scoring_profile=scoring_profile,
        top=3,
    )

    profile_name = scoring_profile if scoring_profile else 'Vanilla (No Scoring Profile)'
    console.print(f"\nResults for {profile_name} Scoring Profile:", style="bold blue")

    # Create a table for the results
    table = Table(show_header=True, header_style="bold magenta")
    table.add_column("Headline", style="dim", width=20)
    table.add_column("Score")
    table.add_column("Description", width=40)
    table.add_column("Category")
    table.add_column("Date")
    table.add_column("Link")

    for result in results:
        # Format the link as a clickable hyperlink
        link_text = Text(result['link'], style="link")
        link_text.stylize(f"link {result['link']}")

        table.add_row(
            result['headline'],
            str(result['@search.score']),
            result['short_description'],
            result['category'],
            result['date'],
            link_text  # Use the formatted link text here
        )

    # Print the table
    console.print(table)

# Perform searches with and without the category scoring profile
search_and_print_results()  # Vanilla query without any scoring profile
search_and_print_results("boostCategory")

When the `boostCategory` scoring profile is applied, the search results show a slight increase in relevance for articles categorized under TRAVEL, as demonstrated by the inclusion of "Airfares Higher, But New Planes Take Out Some Sting" in the results. However, articles in the WORLD NEWS category are still prominent, indicating that the boost does not completely override the default relevance. In contrast, the vanilla query without any scoring profile returns results based on default relevance, where articles under the TRAVEL category still appear but without the specific boost applied. This illustrates how the category boosting profile, which gives term matches in the `category` field a weight of 10 in the full-text part of the query, subtly promotes articles in specific categories.
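
Hybrid queries in Azure AI Search merge the full-text and vector rankings with Reciprocal Rank Fusion (RRF), so a profile that reorders the text ranking can shift the fused result without dominating it. A minimal sketch of RRF follows, assuming the commonly used constant `k=60`; the document ids and both rankings are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids into one list ordered by RRF score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

text_ranking = ["a", "b", "c"]     # hypothetical full-text order
vector_ranking = ["b", "d", "a"]   # hypothetical vector order
print(reciprocal_rank_fusion([text_ranking, vector_ranking]))
```

Documents that appear near the top of both lists win, which is consistent with a text-side boost producing only a modest change in the final order.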

### Tag Boosting

In [107]:
def search_and_print_results(scoring_profile=None):
    query = "what are the hottest trends in the banking business industry"
    tags = "BUSINESS"  # Replace with your tags
    vector_query = VectorizableTextQuery(
        text=query, k_nearest_neighbors=50, fields="vector", exhaustive=True
    )

    # Prepare the search parameters
    search_params = {
        "search_text": None,  # vector-only query (no full-text component)
        "vector_queries": [vector_query],
        "scoring_profile": scoring_profile,
        "top": 3,
    }

    # Conditionally add scoring_parameters if a scoring_profile is specified.
    # Each entry uses the "name-values" format; "tags" matches tags_parameter in the profile.
    if scoring_profile:
        search_params["scoring_parameters"] = [f"tags-{tags}"]

    results = search_client.search(**search_params)

    profile_name = (
        scoring_profile if scoring_profile else "Vanilla (No Scoring Profile)"
    )
    console.print(f"\nResults for {profile_name} Scoring Profile:", style="bold blue")

    # Create a table for the results
    table = Table(show_header=True, header_style="bold magenta")
    table.add_column("Headline", style="dim", width=20)
    table.add_column("Score")
    table.add_column("Description", width=40)
    table.add_column("Category")
    table.add_column("Date")
    table.add_column("Link")

    for result in results:
        # Format the link as a clickable hyperlink
        link_text = Text(result["link"], style="link")
        link_text.stylize(f"link {result['link']}")

        table.add_row(
            result["headline"],
            str(result["@search.score"]),
            result["short_description"],
            result["category"],
            result["date"],
            link_text,  # Use the formatted link text here
        )

    # Print the table
    console.print(table)


# Perform searches with and without the tag scoring profile
search_and_print_results()  # Vanilla query without any scoring profile
search_and_print_results("boostByTag")

When the `boostByTag` scoring profile is applied, the search results prioritize articles whose `category` matches the tag values passed in the scoring parameter (here, `BUSINESS`), as demonstrated by the higher ranking of business-related articles. In contrast, the vanilla query without any scoring profile returns results based on default relevance, where articles might be relevant but lack the tag-based boost. This illustrates the effectiveness of the tag boosting scoring profile in promoting content that is aligned with the supplied tags, so that more contextually appropriate articles surface at the top of the search results.
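
The tag values reach the scoring profile through a scoring parameter string of the form `name-values`, where `name` is the profile's `tags_parameter` and the values are comma-separated. A small sketch of building that string follows; the helper name `build_scoring_parameter` is hypothetical, and values that themselves contain commas or quotes would need escaping that is not handled here.

```python
def build_scoring_parameter(name, values):
    """Format a scoring parameter as 'name-value1,value2' for a search request."""
    if not values:
        raise ValueError("at least one value is required")
    return f"{name}-{','.join(values)}"

# "tags" matches tags_parameter in the boostByTag profile defined earlier.
print(build_scoring_parameter("tags", ["BUSINESS"]))
```

Passing several values, for example `["BUSINESS", "TECH"]`, would boost documents whose `category` matches any of them.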

### Magnitude Boosting

In [143]:
from rich.console import Console
from rich.table import Table
from rich.text import Text

# Initialize a Rich console
console = Console()

def search_and_print_results(scoring_profile=None):
    query = "top business trends 2022"
    vector_query = VectorizableTextQuery(
        text=query, k_nearest_neighbors=50, fields="vector", exhaustive=True
    )
    results = search_client.search(
        search_text=None,
        vector_queries=[vector_query],
        scoring_profile=scoring_profile,
        top=3,
    )

    profile_name = scoring_profile if scoring_profile else 'Vanilla (No Scoring Profile)'
    console.print(f"\nResults for {profile_name} Scoring Profile:", style="bold blue")

    # Create a table for the results
    table = Table(show_header=True, header_style="bold magenta")
    table.add_column("Headline", style="dim", width=20)
    table.add_column("Score")
    table.add_column("Description", width=40)
    table.add_column("Category")
    table.add_column("Date")
    table.add_column("View Count")  
    table.add_column("Link")

    for result in results:
        # Format the link as a clickable hyperlink
        link_text = Text(result['link'], style="link")
        link_text.stylize(f"link {result['link']}")

        table.add_row(
            result['headline'],
            str(result['@search.score']),
            result['short_description'],
            result['category'],
            result['date'],
            str(result['view_count']),  
            link_text  
        )

    # Print the table
    console.print(table)

# Perform searches with and without the view-count scoring profile
search_and_print_results()  # Vanilla query without any scoring profile
search_and_print_results("boostViewCount")

When the `boostViewCount` scoring profile is applied, articles whose `view_count` falls inside the profile's boosting range (0 to 10,000) are promoted, as demonstrated by the higher scores of articles like "My Top 10 Predictions for 2013" and "10 Food Trends to Watch," despite being less relevant or older. Within that range, the linear interpolation gives a larger boost to higher view counts.

In contrast, the vanilla query without any scoring profile ranks results by vector similarity alone. Articles like "Disrupt Yourself First: Top 10 Game Changing Tech Trends" and "7 Human Resources Trends Your Small Business Needs to Know" rank higher there because of their relevance to the query. Their much higher view counts lie above the 10,000 upper bound of the boosting range, and by default a magnitude function does not boost values outside its range.

This comparison highlights how magnitude boosting can significantly alter the ranking of search results based on a numeric signal such as popularity. To promote the most-viewed articles in this dataset, the boosting range would need to extend to the maximum `view_count`, which is 100,000 here.
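
To make the effect of the boosting range concrete, here is an illustrative model of a linear magnitude function using the range from the `boostViewCount` profile (0 to 10,000). It is a simplified sketch and not the service's actual scoring formula; the function name and the `constant_beyond_range` flag are hypothetical stand-ins for the behavior described above.

```python
def magnitude_factor(value, start, end, constant_beyond_range=False):
    """Simplified linear interpolation: 0.0 at start, 1.0 at end, 0.0 outside."""
    if start <= value <= end:
        return (value - start) / (end - start)
    if constant_beyond_range and value > end:
        # Optionally keep the full boost for values above the range.
        return 1.0
    return 0.0

for views in (7338, 9720, 52370):
    print(views, round(magnitude_factor(views, 0, 10000), 3))
```

Under this model an article with 52,370 views receives no boost at all, while one with 9,720 views receives nearly the full boost, which matches the pattern seen in the results above.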