# Using Document Boosting in Azure AI Search for Enhanced Retrieval Quality

This notebook demonstrates how to enhance retrieval quality using document boosting capabilities with vector/hybrid search in Azure AI Search.

## Setup
We begin by installing necessary libraries and downloading the dataset from Kaggle.

```python
# Install required packages
!pip install azure-identity
!pip install kaggle
!pip install python-dotenv
!pip install rich
!pip install azure-search-documents --pre
# Packages imported later in this notebook
!pip install openai pandas numpy tqdm tenacity
```


## Download Data from Kaggle

This dataset is sourced from [Rishabh Misra's publications](https://rishabhmisra.github.io/publications).

If you're using this dataset for your work, please cite the following articles:

**Citation in text format:**

1. Misra, Rishabh. "News Category Dataset." arXiv preprint arXiv:2209.11429 (2022).
2. Misra, Rishabh and Jigyasa Grover. "Sculpting Data for ML: The first act of Machine Learning." ISBN 9798585463570 (2021).

**Citation in BibTeX format:**

```bibtex
@article{misra2022news,
  title={News Category Dataset},
  author={Misra, Rishabh},
  journal={arXiv preprint arXiv:2209.11429},
  year={2022}
}
@book{misra2021sculpting,
  author = {Misra, Rishabh and Grover, Jigyasa},
  year = {2021},
  month = {01},
  pages = {},
  title = {Sculpting Data for ML: The first act of Machine Learning},
  isbn = {9798585463570}
}
```

In [None]:
# Download the dataset
!kaggle datasets download -d rmisra/news-category-dataset

# Unzip the downloaded file
import pandas as pd
import zipfile

with zipfile.ZipFile('news-category-dataset.zip', 'r') as zip_ref:
    zip_ref.extractall()

# Load the dataset into a pandas DataFrame
df = pd.read_json('News_Category_Dataset_v3.json', lines=True)

# Display the first few rows of the DataFrame
df.head()

### Dataset Overview
The dataset contains news articles with their categories, authors, and links. Note that some articles have no `headline` or `short_description`. For this demo, we leave those values as they are rather than replacing them.


In [2]:
# Check the maximum length of characters in the headline and short_description fields
max_headline_length = df['headline'].str.len().max()
max_short_description_length = df['short_description'].str.len().max()

print(max_headline_length)
print(max_short_description_length)

320
1472


The longest headline is 320 characters and the longest short description is 1,472 characters. We plan to use the OpenAI `text-embedding-3-large` model, which accepts up to 8,192 input tokens (roughly 32K characters of English text), so every article fits in a single embedding input without chunking.
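
As a rough sanity check on that limit, the sketch below estimates token counts from character counts. It assumes the common rule of thumb of about 4 characters per token for English text; `CHARS_PER_TOKEN` and `estimate_tokens` are illustrative names, and a real tokenizer would give exact counts. The combined length is an upper bound, since the longest headline and the longest description may belong to different articles.

```python
import math

# Assumption: roughly 4 characters per token for English text (rule of thumb)
CHARS_PER_TOKEN = 4
MAX_INPUT_TOKENS = 8192  # input limit quoted for text-embedding-3-large


def estimate_tokens(num_chars: int) -> int:
    """Estimate the token count for a text of the given character length."""
    return math.ceil(num_chars / CHARS_PER_TOKEN)


# Upper bound on the text to embed: headline + one space + short description
longest_text_chars = 320 + 1 + 1472
estimated = estimate_tokens(longest_text_chars)
fits_in_one_request = estimated <= MAX_INPUT_TOKENS
print(estimated, fits_in_one_request)  # 449 True
```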


The next cell prepares the data for indexing in four steps: it concatenates the `headline` and `short_description` fields into the text to embed, ensures each document has a unique string `id`, converts `date` to an ISO 8601 string for freshness boosting, and generates a random `view_count` for magnitude boosting. Together these fields let the scoring profiles defined later weigh content relevance, recency, and popularity.

In [4]:
import numpy as np

# Concatenate headline and short_description into the text that will be embedded
df['text_to_vectorize'] = df['headline'] + ' ' + df['short_description']

# Ensure the id field is a string or create it if it doesn't exist
if 'id' in df.columns:
    df["id"] = df["id"].astype(str)
else:
    df["id"] = df.index.astype(str)

# Convert the 'date' field to ISO 8601 with millisecond precision.
# .str[:-3] trims each string; a plain [:-3] would slice rows off the Series instead.
df["date"] = pd.to_datetime(df["date"]).dt.strftime('%Y-%m-%dT%H:%M:%S.%f').str[:-3] + 'Z'

# Add a random view_count for each article
df['view_count'] = np.random.randint(0, 100001, size=len(df), dtype=np.int32)

# Display the updated DataFrame
df.head()

Unnamed: 0,link,headline,category,short_description,authors,date,text_to_vectorize,id,view_count
0,https://www.huffpost.com/entry/covid-boosters-...,Over 4 Million Americans Roll Up Sleeves For O...,U.S. NEWS,Health experts said it is too early to predict...,"Carla K. Johnson, AP",2022-09-23T00:00:00.000Z,Over 4 Million Americans Roll Up Sleeves For O...,0,87893
1,https://www.huffpost.com/entry/american-airlin...,"American Airlines Flyer Charged, Banned For Li...",U.S. NEWS,He was subdued by passengers and crew when he ...,Mary Papenfuss,2022-09-23T00:00:00.000Z,"American Airlines Flyer Charged, Banned For Li...",1,34665
2,https://www.huffpost.com/entry/funniest-tweets...,23 Of The Funniest Tweets About Cats And Dogs ...,COMEDY,"""Until you have a dog you don't understand wha...",Elyse Wanshel,2022-09-23T00:00:00.000Z,23 Of The Funniest Tweets About Cats And Dogs ...,2,3262
3,https://www.huffpost.com/entry/funniest-parent...,The Funniest Tweets From Parents This Week (Se...,PARENTING,"""Accidentally put grown-up toothpaste on my to...",Caroline Bologna,2022-09-23T00:00:00.000Z,The Funniest Tweets From Parents This Week (Se...,3,86021
4,https://www.huffpost.com/entry/amy-cooper-lose...,Woman Who Called Cops On Black Bird-Watcher Lo...,U.S. NEWS,Amy Cooper accused investment firm Franklin Te...,Nina Golgowski,2022-09-22T00:00:00.000Z,Woman Who Called Cops On Black Bird-Watcher Lo...,4,62263
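
The date conversion can be illustrated on a single value with the standard library. This is a minimal sketch, assuming the target is an ISO 8601 timestamp with millisecond precision and that naive datetimes are already in UTC; `to_search_datetime` is a hypothetical helper name. Here the slice acts on one formatted string, character by character. On a pandas Series the same slice has to go through the `.str` accessor, because a plain `[:-3]` on a Series selects rows rather than characters.

```python
from datetime import datetime


def to_search_datetime(value: datetime) -> str:
    """Format a datetime as ISO 8601 with millisecond precision and a 'Z' suffix."""
    # %f yields six microsecond digits; dropping the last three leaves milliseconds
    return value.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


formatted = to_search_datetime(datetime(2022, 9, 23))
print(formatted)  # 2022-09-23T00:00:00.000Z
```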


## Generate Embeddings 

### Authenticate Azure OpenAI

In [114]:
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
import json
import os

# User-specified parameter
USE_AAD_FOR_AOAI = True

def authenticate_openai(api_key=None, use_aad_for_aoai=False):
    """Return an AzureOpenAI client authenticated with AAD or an API key."""

    if use_aad_for_aoai:
        print("Using AAD for authentication.")
        credential = DefaultAzureCredential()
        token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")
        client = AzureOpenAI(
            azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
            api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
            azure_ad_token_provider=token_provider,
        )
    else:
        print("Using API keys for authentication.")
        if api_key is None:
            raise ValueError("API key must be provided if not using AAD for authentication.")
        client = AzureOpenAI(
            api_key=api_key,
            azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
            api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
        )
    return client

openai_client = authenticate_openai(api_key=os.getenv("AZURE_OPENAI_API_KEY"), use_aad_for_aoai=USE_AAD_FOR_AOAI)

Using AAD for authentication.
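
The authentication code reads its configuration from environment variables through `os.getenv`, which silently returns `None` for anything that is unset. A small pre-flight check, sketched below, can surface missing settings early. `missing_env_vars` is a hypothetical helper, the example endpoint is a placeholder, and the list only covers variables needed regardless of authentication mode (`AZURE_OPENAI_API_KEY` and `AZURE_SEARCH_ADMIN_KEY` matter only when the AAD flags are `False`). If the values live in a `.env` file, `load_dotenv()` from the `python-dotenv` package would need to run first.

```python
import os

# Variable names read via os.getenv elsewhere in this notebook
REQUIRED_ENV_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME",
    "AZURE_SEARCH_SERVICE_ENDPOINT",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping (placeholder value) so the check is easy to try
example_env = {"AZURE_OPENAI_ENDPOINT": "https://example.openai.azure.com"}
missing = missing_env_vars(REQUIRED_ENV_VARS, example_env)
print(missing)
```

Calling `missing_env_vars(REQUIRED_ENV_VARS)` without a mapping checks the real process environment.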


In [8]:
from tqdm import tqdm
from tenacity import retry, stop_after_attempt, wait_exponential
import json

@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, min=4, max=60))
def get_embeddings(openai_client, texts):
    response = openai_client.embeddings.create(
        input=texts,
        model=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME")
    )
    # Each item in response.data carries one embedding, in input order
    return [item.embedding for item in response.data]

def add_embeddings_to_df(df, text_column, vector_column, batch_size=1000):
    embeddings = []
    for i in tqdm(range(0, len(df[text_column]), batch_size)):
        batch_texts = df[text_column][i:i+batch_size].tolist()
        batch_embeddings = get_embeddings(openai_client, batch_texts)
        embeddings.extend(batch_embeddings)
    df[vector_column] = embeddings
    return df

df_vectors = add_embeddings_to_df(df, "text_to_vectorize", "vector")
print(df_vectors.head())


100%|██████████| 210/210 [45:02<00:00, 12.87s/it]

                                                link  \
0  https://www.huffpost.com/entry/covid-boosters-...   
1  https://www.huffpost.com/entry/american-airlin...   
2  https://www.huffpost.com/entry/funniest-tweets...   
3  https://www.huffpost.com/entry/funniest-parent...   
4  https://www.huffpost.com/entry/amy-cooper-lose...   

                                            headline   category  \
0  Over 4 Million Americans Roll Up Sleeves For O...  U.S. NEWS   
1  American Airlines Flyer Charged, Banned For Li...  U.S. NEWS   
2  23 Of The Funniest Tweets About Cats And Dogs ...     COMEDY   
3  The Funniest Tweets From Parents This Week (Se...  PARENTING   
4  Woman Who Called Cops On Black Bird-Watcher Lo...  U.S. NEWS   

                                   short_description               authors  \
0  Health experts said it is too early to predict...  Carla K. Johnson, AP   
1  He was subdued by passengers and crew when he ...        Mary Papenfuss   
2  "Until you have a dog y




Let's drop the `text_to_vectorize` column from the DataFrame. We no longer need it now that the vectors have been created from the concatenated `headline` and `short_description` fields.

In [None]:
df_vectors.drop(columns=['text_to_vectorize'], inplace=True)
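
The batching pattern used for the embedding requests (and again for the document upload later) can be isolated into a small generator. This is a minimal sketch; `iter_batches` is an illustrative name, not part of any library used in this notebook.

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = ["a", "b", "c", "d", "e"]
batches = list(iter_batches(texts, batch_size=2))
print(batches)  # [['a', 'b'], ['c', 'd'], ['e']]
```

The last batch is simply shorter when the input length is not a multiple of the batch size, which is also how the slicing in `add_embeddings_to_df` behaves.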

## Create Azure AI Search Index
Next, we create an Azure AI Search index to upload our data with vector embeddings.


### Authenticate to Azure AI Search

In [37]:
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.core.credentials import AzureKeyCredential  
import os

INDEX_NAME="news-category"

# User-specified parameter
USE_AAD_FOR_SEARCH = True  

def authenticate_azure_search(api_key=None, use_aad_for_search=False):
    if use_aad_for_search:
        print("Using AAD for authentication.")
        credential = DefaultAzureCredential()
    else:
        print("Using API keys for authentication.")
        if api_key is None:
            raise ValueError("API key must be provided if not using AAD for authentication.")
        credential = AzureKeyCredential(api_key)
    return credential

azure_search_credential = authenticate_azure_search(api_key=os.getenv("AZURE_SEARCH_ADMIN_KEY"), use_aad_for_search=USE_AAD_FOR_SEARCH)


Using AAD for authentication.


In [38]:
import os
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    AzureOpenAIModelName,
    AzureOpenAIParameters,
    AzureOpenAIVectorizer,
    FreshnessScoringFunction,
    FreshnessScoringParameters,
    HnswAlgorithmConfiguration,
    HnswParameters,
    MagnitudeScoringFunction,
    MagnitudeScoringParameters,
    ScoringFunctionInterpolation,
    ScoringProfile,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    SearchableField,
    SimpleField,
    TagScoringFunction,
    TagScoringParameters,
    TextWeights,
    VectorSearch,
    VectorSearchAlgorithmKind,
    VectorSearchAlgorithmMetric,
    VectorSearchProfile,
)

# Initialize the SearchIndexClient with the credential chosen above,
# so that USE_AAD_FOR_SEARCH is respected here as well
index_client = SearchIndexClient(
    endpoint=os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT"),
    credential=azure_search_credential,
)

# Define the fields
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SimpleField(name="link", type=SearchFieldDataType.String),
    SearchableField(name="headline", type=SearchFieldDataType.String),
    SearchableField(
        name="category",
        type=SearchFieldDataType.String,
        filterable=True,
        facetable=True,
    ),
    SearchableField(
        name="short_description",
        type=SearchFieldDataType.String,
    ),
    SearchableField(name="authors", type=SearchFieldDataType.String),
    SearchField(
        name="date",
        type=SearchFieldDataType.DateTimeOffset,
        filterable=True,
        sortable=True,
    ),
    SimpleField(
        name="view_count",
        type=SearchFieldDataType.Int32,
        filterable=True,
        sortable=True,
    ),
    SearchField(
        name="vector",
        type="Collection(Edm.Single)",
        vector_search_dimensions=3072,
        vector_search_profile_name="my-vector-config",
    ),
]

# Define the vector search
vector_search = VectorSearch(
    profiles=[
        VectorSearchProfile(
            name="my-vector-config",
            algorithm_configuration_name="my-hnsw",
            vectorizer="my-vectorizer",
        )
    ],
    algorithms=[
        HnswAlgorithmConfiguration(
            name="my-hnsw",
            kind=VectorSearchAlgorithmKind.HNSW,
            parameters=HnswParameters(metric=VectorSearchAlgorithmMetric.COSINE),
        )
    ],
    vectorizers=[
        AzureOpenAIVectorizer(
            name="my-vectorizer",
            azure_open_ai_parameters=AzureOpenAIParameters(
                resource_uri=os.getenv("AZURE_OPENAI_ENDPOINT"),
                deployment_id=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME"),
                model_name=AzureOpenAIModelName.TEXT_EMBEDDING3_LARGE,
            ),
        )
    ],
)

# Define scoring profiles
scoring_profiles = [
    ScoringProfile(
        name="boostCategory",
        text_weights=TextWeights(
            weights={
                "category": 10.0,
            }
        ),
    ),
    ScoringProfile(
        name="boostRecency",
        functions=[
            FreshnessScoringFunction(
                field_name="date",
                boost=10.0,
                parameters=FreshnessScoringParameters(
                    boosting_duration="P1095D",
                ),
                interpolation=ScoringFunctionInterpolation.LINEAR,
            )
        ],
    ),
    ScoringProfile(
        name="boostByTag",
        functions=[
            TagScoringFunction(
                field_name="category",
                boost=10.0,
                parameters=TagScoringParameters(
                    tags_parameter="tags",
                ),
            )
        ],
    ),
    ScoringProfile(
        name="boostViewCount",
        functions=[
            MagnitudeScoringFunction(
                field_name="view_count",
                boost=10.0,
                parameters=MagnitudeScoringParameters(
                    boosting_range_start=0,
                    boosting_range_end=10000,
                ),
                interpolation=ScoringFunctionInterpolation.LINEAR,
            )
        ],
    ),
]

# Define the index
index = SearchIndex(
    name=INDEX_NAME,
    fields=fields,
    scoring_profiles=scoring_profiles,
    vector_search=vector_search,
)

# Create or update the index
result = index_client.create_or_update_index(index)
print(f"{result.name} created")

news-category created


## Upload Documents
Convert the pandas DataFrame to a list of dictionaries and upload to the Azure AI Search index.


In [13]:
# Convert the DataFrame to a list of dictionaries
documents = df_vectors.to_dict(orient="records")

To keep each upload request small, we upload the documents in batches of 250.

In [None]:
import os
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import HttpResponseError

# Check if documents are loaded
if not documents:
    print("No documents found to upload.")
else:
    print(f"Loaded {len(documents)} documents to upload.")

def create_search_client(index_name):
    return SearchClient(
        endpoint=os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT"),
        index_name=index_name,
        credential=azure_search_credential
    )

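def upload_documents_to_index(client, documents):
    BATCH_SIZE = 250
    for start_idx in range(0, len(documents), BATCH_SIZE):
        end_idx = min(start_idx + BATCH_SIZE, len(documents))
        documents_to_upload = documents[start_idx:end_idx]
        try:
            results = client.merge_or_upload_documents(documents=documents_to_upload)
            # A request can succeed while individual documents in it fail
            failed = [r.key for r in results if not r.succeeded]
            if failed:
                print(f"{len(failed)} documents failed in batch {start_idx} to {end_idx}: {failed[:5]}")
            else:
                print(f"Uploaded documents {start_idx} to {end_idx}")
        except HttpResponseError as e:
            print(f"Failed to upload documents {start_idx} to {end_idx}: {e}")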

# Create the search client
search_client = create_search_client(INDEX_NAME)

# Upload documents to the index
upload_documents_to_index(search_client, documents)

## Evaluate Retrieval Quality with Different Document Boosting Techniques

### Freshness Boosting
Apply a scoring profile to prioritize newer articles.

In [34]:
# Initialize Search Client to query the index
def create_search_client(index_name):
    return SearchClient(
        endpoint=os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT"),
        index_name=index_name,
        credential=azure_search_credential
    )
search_client = create_search_client(INDEX_NAME)

In [32]:
from rich.console import Console
from rich.table import Table
from rich.text import Text
from azure.search.documents.models import VectorizableTextQuery

# Initialize a Rich console
console = Console()

def search_and_print_results(scoring_profile=None):
    query = "latest news on airlines"
    vector_query = VectorizableTextQuery(
        text=query, k_nearest_neighbors=50, fields="vector", exhaustive=True
    )
    results = search_client.search(
        search_text=None,
        vector_queries=[vector_query],
        scoring_profile=scoring_profile,
        top=3,
    )

    profile_name = scoring_profile if scoring_profile else 'Vanilla (No Scoring Profile)'
    console.print(f"\nResults for {profile_name} Scoring Profile:", style="bold blue")

    # Create a table for the results
    table = Table(show_header=True, header_style="bold magenta")
    table.add_column("Headline", style="dim", width=20)
    table.add_column("Score")
    table.add_column("Description", width=40)
    table.add_column("Date", width=15)
    table.add_column("Link")

    for result in results:
        # Format the link as a clickable hyperlink
        link_text = Text(result['link'], style="link")
        link_text.stylize(f"link {result['link']}")

        table.add_row(
            result['headline'],
            str(result['@search.score']),
            result['short_description'],
            result['date'],
            link_text  # Use the formatted link text here
        )

    # Print the table
    console.print(table)

# Perform searches with and without the freshness scoring profile
search_and_print_results()  # Vanilla query without any scoring profile
search_and_print_results("boostRecency")


The `boostRecency` scoring profile sets the boosting duration to `P1095D`, which boosts articles published within the past 1,095 days (approximately 3 years) of the current date. As a result, the search prioritizes newer articles. For example, the Alaska Airlines article dated April 1, 2022, ranks higher. In contrast, a vanilla query without any scoring profile ranks results by default relevance alone, and older articles from 2013 and 2012 rank higher. This demonstrates the effectiveness of the freshness scoring profile in promoting more recent content.

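
To make the boosting window concrete, the sketch below computes a toy freshness weight that falls linearly from 1.0 for a brand-new document to 0.0 at the end of the 1,095-day window. This is a simplified model of linear interpolation for illustration only, not the service's actual scoring formula; `freshness_weight` and the query time are hypothetical.

```python
from datetime import datetime, timedelta, timezone

BOOSTING_DURATION_DAYS = 1095  # the "P1095D" duration from the boostRecency profile


def freshness_weight(published: datetime, now: datetime,
                     duration_days: int = BOOSTING_DURATION_DAYS) -> float:
    """Toy linear weight: 1.0 for a brand-new document, 0.0 outside the window."""
    age = now - published
    window = timedelta(days=duration_days)
    if age < timedelta(0) or age >= window:
        return 0.0  # future-dated, or older than the boosting window
    return 1.0 - age / window


now = datetime(2024, 6, 1, tzinfo=timezone.utc)  # hypothetical query time
recent = freshness_weight(datetime(2022, 4, 1, tzinfo=timezone.utc), now)
old = freshness_weight(datetime(2013, 1, 1, tzinfo=timezone.utc), now)
print(round(recent, 3), old)
```

Under this model, an article from April 2022 still receives part of the boost at the assumed query time, while an article from 2013 receives none, which matches the direction of the reordering described above.
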
#### Use Cases for Freshness Boosting

- **News Websites:** Ensures that the latest news articles are prioritized, keeping users informed with up-to-date information.
- **E-commerce:** Highlights the newest products, helping customers find the latest arrivals and trends.
- **Social Media Platforms:** Promotes the most recent posts and updates, enhancing user engagement with current content.
- **Job Portals:** Displays the latest job postings, providing job seekers with the most recent opportunities.
- **Event Listings:** Prioritizes upcoming events, making it easier for users to find and attend future activities.
- **Blogs and Content Platforms:** Surfaces the most recent blog posts and articles, keeping content fresh and engaging for readers.
- **Customer Support:** Prioritizes the latest support documents and FAQs, ensuring users have access to the most current solutions and information.

#### Why Customers Might Use Freshness Boosting

- **Enhanced User Experience:** Users receive the most current and relevant information, improving satisfaction and engagement.
- **Timely Information:** Ensures that time-sensitive information is readily available, crucial for news and event-related queries.
- **Competitive Advantage:** Keeps the content dynamic and updated, which can be a key differentiator in competitive markets.
- **Increased Engagement:** Fresh content is more likely to attract and retain users, driving higher interaction rates.
- **Improved Relevance:** Aligns search results with user intent, especially for queries seeking the latest updates and trends.


### Category Boosting
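Text weights boost keyword matches found in the `category` field, so this example runs a hybrid search (keyword plus vector) with the `boostCategory` profile.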

In [31]:
def search_and_print_results(scoring_profile=None):
    query = "Entertainment Industry Trends"
    vector_query = VectorizableTextQuery(
        text=query, k_nearest_neighbors=50, fields="vector", exhaustive=True
    )
    results = search_client.search(
        search_text=query, # passing in text query for hybrid search
        vector_queries=[vector_query],
        scoring_profile=scoring_profile,
        top=3,
    )

    profile_name = scoring_profile if scoring_profile else 'Vanilla (No Scoring Profile)'
    console.print(f"\nResults for {profile_name} Scoring Profile:", style="bold blue")

    # Create a table for the results
    table = Table(show_header=True, header_style="bold magenta")
    table.add_column("Headline", style="dim", width=20)
    table.add_column("Score")
    table.add_column("Description", width=40)
    table.add_column("Category")
    table.add_column("Date")
    table.add_column("Link")

    for result in results:
        # Format the link as a clickable hyperlink
        link_text = Text(result['link'], style="link")
        link_text.stylize(f"link {result['link']}")

        table.add_row(
            result['headline'],
            str(result['@search.score']),
            result['short_description'],
            result['category'],
            result['date'],
            link_text  # Use the formatted link text here
        )

    # Print the table
    console.print(table)

# Perform searches with and without the category scoring profile
search_and_print_results()  # Vanilla query without any scoring profile
search_and_print_results("boostCategory")

When the `boostCategory` scoring profile is applied, keyword matches in the `category` field carry more weight, so the results favor articles whose category matches the query terms, as shown by the higher ranking of articles categorized under ENTERTAINMENT. For example, the top three results include articles like "Hollywood & Vine: The Entertainment Industry Seeks The Future In Viral Video" and "Is Music Dead? (Thoughts on the Music Industry After SXSW 2015)", which are directly related to the entertainment industry. In contrast, a vanilla query without any scoring profile returns results based on default relevance, where articles from unrelated categories like WEDDINGS also appear in the top results. This illustrates the effectiveness of the category boosting scoring profile in promoting content from the matching category.

#### Use Cases for Category Boosting

- **E-commerce Sites:** Highlight products within a specific category, helping customers find items relevant to their search more easily.
- **News Websites:** Prioritize articles within the user's preferred news categories, such as sports, politics, or entertainment.
- **Content Platforms:** Promote articles, blogs, or videos in a specific category, improving user engagement by showing content aligned with their interests.
- **Library and Database Searches:** Enhance the relevance of search results by prioritizing documents, books, or papers within a specified academic or professional field.
- **Streaming Services:** Surface movies, shows, or music within a specific genre or category, providing users with more tailored recommendations.
- **Customer Support:** Prioritize support articles and FAQs related to a specific product or issue, helping users find relevant solutions faster.

#### Why Customers Might Use Category Boosting

- **Improved User Experience:** Users receive more relevant search results, improving satisfaction and engagement.
- **Focused Content Delivery:** Ensures that users see content related to their interests or needs, enhancing relevance and usability.
- **Enhanced Discoverability:** Makes it easier for users to discover content within their areas of interest, potentially increasing time spent on the platform.
- **Targeted Marketing:** Allows businesses to highlight specific categories of products or services, driving targeted marketing efforts.
- **Efficient Information Retrieval:** Helps users quickly find relevant information within large datasets, improving efficiency and satisfaction.
- **Contextual Relevance:** Aligns search results with the user's context and preferences, ensuring a more personalized experience.


### Tag Boosting
Apply tag-based boosting to promote content aligned with specific tags.

In [107]:
def search_and_print_results(scoring_profile=None):
    query = "what are the hottest trends in the banking business industry"
    tags = "BUSINESS"  
    vector_query = VectorizableTextQuery(
        text=query, k_nearest_neighbors=50, fields="vector", exhaustive=True
    )

    # Prepare the search parameters
    search_params = {
        "search_text": None,  
        "vector_queries": [vector_query],
        "scoring_profile": scoring_profile,
        "top": 3,
    }

    # Conditionally add scoring_parameters if a scoring_profile is specified.
    # Each entry is a "name-values" string; "tags" matches tags_parameter in the profile.
    if scoring_profile:
        search_params["scoring_parameters"] = [f"tags-{tags}"]

    results = search_client.search(**search_params)

    profile_name = (
        scoring_profile if scoring_profile else "Vanilla (No Scoring Profile)"
    )
    console.print(f"\nResults for {profile_name} Scoring Profile:", style="bold blue")

    # Create a table for the results
    table = Table(show_header=True, header_style="bold magenta")
    table.add_column("Headline", style="dim", width=20)
    table.add_column("Score")
    table.add_column("Description", width=40)
    table.add_column("Category")
    table.add_column("Date")
    table.add_column("Link")

    for result in results:
        # Format the link as a clickable hyperlink
        link_text = Text(result["link"], style="link")
        link_text.stylize(f"link {result['link']}")

        table.add_row(
            result["headline"],
            str(result["@search.score"]),
            result["short_description"],
            result["category"],
            result["date"],
            link_text,  # Use the formatted link text here
        )

    # Print the table
    console.print(table)


# Perform searches with and without the tag scoring profile
search_and_print_results()  # Vanilla query without any scoring profile
search_and_print_results("boostByTag")

When the `boostByTag` scoring profile is applied with the tag "BUSINESS," the search results prioritize articles related to the specified tag. For example, articles such as "What's the Future of Retail Banking?" and "Banking Saves Health Care," which are directly related to the business category, rank higher. In contrast, a vanilla query without any scoring profile returns results based on default relevance, where articles from other categories like MONEY also appear in the top results. This demonstrates the effectiveness of the tag boosting scoring profile in promoting content that is more relevant to the specified tag.
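
The tag values reach the scoring function through a scoring parameter string in `name-values` form, such as the `tags-BUSINESS` string that appears in the query code, where `tags` is the `tags_parameter` name declared in the `boostByTag` profile. The sketch below builds such strings for one or more tags. `build_scoring_parameter` is a hypothetical helper, and it assumes values that contain no commas, which would need extra quoting.

```python
from typing import Iterable


def build_scoring_parameter(name: str, values: Iterable[str]) -> str:
    """Build a 'name-value1,value2' scoring parameter string."""
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    return f"{name}-{','.join(values)}"


single = build_scoring_parameter("tags", ["BUSINESS"])
multiple = build_scoring_parameter("tags", ["BUSINESS", "MONEY"])
print(single)    # tags-BUSINESS
print(multiple)  # tags-BUSINESS,MONEY
```

Passing several tags in one parameter would boost documents whose `category` matches any of them.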

#### Use Cases for Tag Boosting

- **E-commerce Sites:** Highlight products tagged with specific attributes, helping customers find items with desired features or characteristics.
- **News Websites:** Prioritize articles within specific tags such as economy, politics, or technology, improving user engagement by showing relevant content.
- **Content Platforms:** Promote content tagged with specific topics or keywords, ensuring users see articles, blogs, or videos that match their interests.
- **Knowledge Bases:** Enhance search results by prioritizing documents tagged with relevant topics, helping users find the most pertinent information quickly.
- **Streaming Services:** Surface media content tagged with specific genres or themes, providing users with tailored recommendations.
- **Marketing Campaigns:** Focus on promoting content tagged with campaign-specific keywords, ensuring targeted marketing efforts are more effective.

#### Why Customers Might Use Tag Boosting

- **Improved User Experience:** Users receive more relevant search results based on specific tags, improving satisfaction and engagement.
- **Focused Content Delivery:** Ensures that users see content related to their interests or needs, enhancing relevance and usability.
- **Enhanced Discoverability:** Makes it easier for users to discover content within their areas of interest, potentially increasing time spent on the platform.
- **Targeted Marketing:** Allows businesses to highlight specific categories of products or services, driving targeted marketing efforts.
- **Efficient Information Retrieval:** Helps users quickly find relevant information within large datasets, improving efficiency and satisfaction.
- **Contextual Relevance:** Aligns search results with the user's context and preferences, ensuring a more personalized experience.


### Magnitude Boosting
Apply magnitude boosting based on the view count field to promote popular content.


In [35]:
def search_and_print_results(scoring_profile=None):
    query = "Entertainment Industry Trends"
    vector_query = VectorizableTextQuery(
        text=query, k_nearest_neighbors=50, fields="vector", exhaustive=True
    )
    results = search_client.search(
        search_text=None,
        vector_queries=[vector_query],
        scoring_profile=scoring_profile,
        top=3,
    )

    profile_name = scoring_profile if scoring_profile else 'Vanilla (No Scoring Profile)'
    console.print(f"\nResults for {profile_name} Scoring Profile:", style="bold blue")

    # Create a table for the results
    table = Table(show_header=True, header_style="bold magenta")
    table.add_column("Headline", style="dim", width=20)
    table.add_column("Score")
    table.add_column("Category")
    table.add_column("Date")
    table.add_column("View Count")  
    table.add_column("Link")

    for result in results:
        # Format the link as a clickable hyperlink
        link_text = Text(result['link'], style="link")
        link_text.stylize(f"link {result['link']}")

        table.add_row(
            result['headline'],
            str(result['@search.score']),
            result['category'],
            result['date'],
            str(result['view_count']),  
            link_text  
        )

    # Print the table
    console.print(table)

# Perform searches with and without the view count scoring profile
search_and_print_results()  # Vanilla query without any scoring profile
search_and_print_results("boostViewCount")

When the `boostViewCount` scoring profile is applied, the search results prioritize articles with higher view counts, even if they are not the most topically relevant or recent. For example, articles such as "Millennials & The Music Business: Inverting the Hierarchy" and "The Biggest Food Trends Of 2015" with significant views rank higher despite not being directly related to entertainment industry trends. In contrast, a vanilla query without any scoring profile returns results based on default relevance, where more topically relevant articles like "Hollywood & Vine: The Entertainment Industry Seeks The Future In Viral Video" and "5 Entertainment Events We Want To See Happen In 2015" are ranked higher regardless of their view counts. This illustrates the effectiveness of the magnitude boosting scoring profile in promoting more popular content based on view count.
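
The effect of the boosting range can be sketched with a toy linear weight over the `view_count` field. This is a simplified model, not the service's actual scoring formula. It assumes that values beyond the range receive no boost unless a constant-beyond-range option is enabled, which is my reading of the service's `constantBoostBeyondRange` setting and is worth verifying against the documentation. `magnitude_weight` and its parameter names are hypothetical.

```python
def magnitude_weight(value: float, range_start: float, range_end: float,
                     constant_beyond_range: bool = False) -> float:
    """Toy linear magnitude weight in [0, 1] for a numeric field value."""
    if range_end <= range_start:
        raise ValueError("range_end must be greater than range_start")
    if value < range_start:
        return 0.0
    if value > range_end:
        # Assumption: beyond the range, the boost either stays at its maximum
        # or is not applied at all, depending on the flag
        return 1.0 if constant_beyond_range else 0.0
    return (value - range_start) / (range_end - range_start)


# Range taken from the boostViewCount profile; view counts were drawn from 0..100000
weights = {views: magnitude_weight(views, 0, 10_000) for views in (2_500, 10_000, 62_263)}
print(weights)  # {2500: 0.25, 10000: 1.0, 62263: 0.0}
```

If that assumption holds, documents with more than 10,000 views fall outside the boosted range, so it may be worth comparing `boosting_range_end` with the actual spread of `view_count` values when adapting this profile.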

#### Use Cases for Magnitude Boosting

- **Content Platforms:** Highlight the most viewed articles, videos, or posts, ensuring popular content is prominently displayed.
- **E-commerce:** Promote best-selling products, helping customers find popular items that others have purchased.
- **Social Media:** Surface posts with the highest engagement, increasing visibility for popular content.
- **News Websites:** Prioritize widely-read news articles, ensuring significant stories are highlighted.
- **Educational Platforms:** Feature courses or tutorials with the most enrollments, guiding users to popular learning resources.
- **Review Sites:** Emphasize products or services with the most reviews, helping users make informed decisions based on popular opinion.

#### Why Customers Might Use Magnitude Boosting

- **Increased Engagement:** Promoting popular content can drive higher user engagement as users tend to trust and follow the crowd.
- **Enhanced Discoverability:** Helps users find content that is widely recognized and appreciated, improving user satisfaction.
- **Social Proof:** Showcasing high-view content leverages social proof, influencing new users to explore and trust the platform.
- **Prioritizing Proven Content:** Ensures that well-received and validated content is easily accessible, enhancing content quality perception.
- **Driving Traffic:** Popular items or articles are likely to attract more traffic, leading to increased views and potential conversions.
- **Boosting Sales:** Highlighting best-selling products can encourage more purchases, leveraging the popularity of items to drive sales.
