# Vector search in Python (Azure AI Search)

This code demonstrates how to use Azure AI Search by using the push API to insert vectors into your search index:

+ Create an index schema
+ Load the sample data from a local folder
+ Embed the documents in-memory using Azure OpenAI's text-embedding-3-large model
+ Index the vector and nonvector fields on Azure AI Search
+ Run a series of vector and hybrid queries, including metadata filtering and hybrid (text + vectors) search. 

The code uses Azure OpenAI to generate embeddings for title and content fields. You'll need access to Azure OpenAI to run this demo.

The code reads the `articles_1000.json` file, which contains the input data for which embeddings need to be generated.

The output is a combination of human-readable text and embeddings that can be pushed into a search index.

## Prerequisites

+ An Azure subscription, with [access to Azure OpenAI](https://aka.ms/oai/access). You must have the Azure OpenAI service name and an API key.

+ A deployment of the text-embedding-3-large embedding model.

+ Azure AI Search, any tier, but choose a service that has sufficient capacity for your vector index. We recommend Basic or higher. [Enable semantic ranking](https://learn.microsoft.com/azure/search/semantic-how-to-enable-disable) if you want to run the hybrid query with semantic ranking.

We used Python 3.11, [Visual Studio Code with the Python extension](https://code.visualstudio.com/docs/python/python-tutorial), and the [Jupyter extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) to test this example.

### Set up a Python virtual environment in Visual Studio Code

1. Open the Command Palette (Ctrl+Shift+P).
1. Search for **Python: Create Environment**.
1. Select **Venv**.
1. Select a Python interpreter. Choose 3.10 or later.

It can take a minute to set up. If you run into problems, see [Python environments in VS Code](https://code.visualstudio.com/docs/python/environments).

### Install packages

In [2]:
! pip install -r azure-search-vector-python-sample-requirements.txt --quiet

In [3]:
!pip install openai pandas tqdm 



## Import required libraries and environment variables

In [4]:
from dotenv import load_dotenv
from azure.identity import DefaultAzureCredential
from azure.core.credentials import AzureKeyCredential
import os

load_dotenv(override=True) # take environment variables from .env.

# The following variables from your .env file are used in this notebook
endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]
credential = AzureKeyCredential(os.getenv("AZURE_SEARCH_ADMIN_KEY", "")) if len(os.getenv("AZURE_SEARCH_ADMIN_KEY", "")) > 0 else DefaultAzureCredential()
index_name = os.getenv("AZURE_SEARCH_INDEX", "vectest")
azure_openai_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
azure_openai_key = os.getenv("AZURE_OPENAI_KEY", "") if len(os.getenv("AZURE_OPENAI_KEY", "")) > 0 else None
azure_openai_embedding_deployment = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", "text-embedding-3-large")
azure_openai_embedding_dimensions = int(os.getenv("AZURE_OPENAI_EMBEDDING_DIMENSIONS", 1024))
embedding_model_name = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", "text-embedding-3-large")
azure_openai_api_version = os.getenv("AZURE_OPENAI_API_VERSION", "2024-10-21")

## Data Preparation 

In [None]:
# Load dataset as csv file 
df = pd.read_csv(r'C:\Users\lananoor\OneDrive - Microsoft\AI Agents\FilteredQueryAgent\ingestion\medium_data.csv')


In [None]:
#View data set 
df.head()

Unnamed: 0,id,url,title,subtitle,claps,responses,reading_time,publication,date
0,1,https://medium.datadriveninvestor.com/is-fasta...,Is FastAPI going to replace Django?,,226,4,4,Data Driven Investor,2020-05-24
1,2,https://medium.datadriveninvestor.com/whats-th...,What’s the Best Way to Buy a Reliable Luxury Car?,The full cost of ownership makes buying brand-...,186,3,10,Data Driven Investor,2020-05-24
2,3,https://medium.datadriveninvestor.com/credit-r...,Credit Risk Assessment,,76,0,7,Data Driven Investor,2020-05-24
3,4,https://medium.datadriveninvestor.com/cash-is-...,Cash is Trash or Cash is King? What´s it gonna...,,139,0,6,Data Driven Investor,2020-05-24
4,5,https://medium.datadriveninvestor.com/how-to-b...,How to be Flipin’ awesome for your fans,Flipboard magazines capitalize on marketing vi...,34,0,5,Data Driven Investor,2020-05-24


In [4]:
len(df)

11642

In [None]:
# Filter out any rows with null values
df_no_nulls = df.dropna()

# Select the first 1000 rows
df_1000 = df_no_nulls.head(1000)

In [7]:
df_1000.head()

Unnamed: 0,id,url,title,subtitle,claps,responses,reading_time,publication,date
1,2,https://medium.datadriveninvestor.com/whats-th...,What’s the Best Way to Buy a Reliable Luxury Car?,The full cost of ownership makes buying brand-...,186,3,10,Data Driven Investor,2020-05-24
4,5,https://medium.datadriveninvestor.com/how-to-b...,How to be Flipin’ awesome for your fans,Flipboard magazines capitalize on marketing vi...,34,0,5,Data Driven Investor,2020-05-24
6,7,https://medium.datadriveninvestor.com/biometri...,Biometrics for Authentication Security and Pri...,"Biometrics is a growing field, and…",172,0,5,Data Driven Investor,2020-05-24
10,11,https://medium.datadriveninvestor.com/what-mak...,What Makes Travel So Thrilling?,Photo by Rosa Diaz,93,0,3,Data Driven Investor,2020-05-24
11,12,https://medium.datadriveninvestor.com/lessons-...,Lessons Learned From Doing All The Productive ...,As advised by Google,87,0,5,Data Driven Investor,2020-05-24


### Generate new Content field using Azure OpenAI 

In [None]:
import os
from openai import AzureOpenAI

# Set up env variables 
endpoint = "https://.openai.azure.com/"
model_name = "gpt-4o"
deployment = "gpt-4o"

subscription_key = ""
api_version = "2024-12-01-preview"

client = AzureOpenAI(
    api_version=api_version,
    azure_endpoint=endpoint,
    api_key=subscription_key,
)

response = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "user",
            "content": "I am going to Paris, what should I see?",
        }
    ],
    max_tokens=4096,
    temperature=1.0,
    top_p=1.0,
    model=deployment
)

print(response.choices[0].message.content)

Paris is a city filled with iconic landmarks, world-class museums, charming neighborhoods, and delicious food. Here’s a list of must-see attractions and experiences to make the most of your trip:

### **Iconic Landmarks**
1. **Eiffel Tower**  
   Visit the most recognizable landmark in Paris. You can admire it from the Trocadéro Gardens, picnic on the Champ de Mars, or go up to the top for stunning views of the city.

2. **Louvre Museum**  
   Home to masterpieces like the Mona Lisa and the Venus de Milo, the Louvre is a must-see for art lovers. Even if you don’t go inside, the glass pyramid is a sight to enjoy.

3. **Notre-Dame Cathedral**  
   Despite ongoing restoration after the 2019 fire, Notre-Dame remains a Gothic architectural masterpiece. Stroll around its exterior and explore Île de la Cité.

4. **Sacré-Cœur Basilica**  
   Located at the highest point in Paris, Montmartre, this white basilica offers incredible views of the city.

5. **Arc de Triomphe and Champs-Élysées**  
 

In [51]:
# Generate summary for content field 
def generate_summary(title, subtitle):
    prompt = (
        f"Given the following article information, generate a concise summary (about 200 words) of what the article is about. "
        f"Base your summary ONLY on the title and subtitle. Do not add any extra information.\n\n"
        f"Title: {title}\nSubtitle: {subtitle}\n\nSummary:"
    )
    try:
        response = client.chat.completions.create(
            model=deployment,  # for Azure, use "engine"
            messages=[
                {"role": "system", "content": "You are a helpful assistant that summarizes articles based only on their title and subtitle."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=400,  # adjust if needed
            temperature=0.5,
        )
        return response.choices[0].message.content.strip()  # <-- Corrected here
    except Exception as e:
        print(f"Error: {e}")
        return ""

# Loop through the DataFrame and generate summaries
contents = []
for idx, row in tqdm(df_1000.iterrows(), total=len(df_1000)):
    title = row['title']
    subtitle = row['subtitle']
    summary = generate_summary(title, subtitle)
    contents.append(summary)

# Add the new column
df_1000['content'] = contents

# Preview
df_1000.head()


 12%|█▏        | 119/1000 [03:00<28:23,  1.93s/it]

Error: Error code: 400 - {'error': {'message': "The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766", 'type': None, 'param': 'prompt', 'code': 'content_filter', 'status': 400, 'innererror': {'code': 'ResponsibleAIPolicyViolation', 'content_filter_result': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': True, 'severity': 'medium'}, 'violence': {'filtered': False, 'severity': 'safe'}}}}}


100%|██████████| 1000/1000 [24:45<00:00,  1.49s/it]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_1000['content'] = contents


Unnamed: 0,id,url,title,subtitle,claps,responses,reading_time,publication,date,content
1,2,https://medium.datadriveninvestor.com/whats-th...,What’s the Best Way to Buy a Reliable Luxury Car?,The full cost of ownership makes buying brand-...,186,3,10,Data Driven Investor,2020-05-24,The article explores the optimal approach to p...
4,5,https://medium.datadriveninvestor.com/how-to-b...,How to be Flipin’ awesome for your fans,Flipboard magazines capitalize on marketing vi...,34,0,5,Data Driven Investor,2020-05-24,The article explores strategies for creating e...
6,7,https://medium.datadriveninvestor.com/biometri...,Biometrics for Authentication Security and Pri...,"Biometrics is a growing field, and…",172,0,5,Data Driven Investor,2020-05-24,The article explores the use of biometrics as ...
10,11,https://medium.datadriveninvestor.com/what-mak...,What Makes Travel So Thrilling?,Photo by Rosa Diaz,93,0,3,Data Driven Investor,2020-05-24,"The article, titled ""What Makes Travel So Thri..."
11,12,https://medium.datadriveninvestor.com/lessons-...,Lessons Learned From Doing All The Productive ...,As advised by Google,87,0,5,Data Driven Investor,2020-05-24,The article explores insights gained from impl...


In [52]:
# Count the number of null values in the 'content' column
null_count = df_1000['content'].isnull().sum()
print(f"Number of null values in 'content': {null_count}")


Number of null values in 'content': 0


### Convert CSV to JSON fields 

In [57]:
# Convert DataFrame to a list of dicts and write as JSON file
df_1000.to_json("df_seperate_doc.json", orient='records', lines=True)


In [58]:
df_1000.to_json("df_1000.json", orient='records', lines=False)

In [59]:
import json

# Convert DataFrame to a list of dictionaries (one dict per row)
records = df_1000.to_dict(orient='records')

# Save as a JSON array (list of objects, one per row)
with open("articles_1000.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)


In [60]:
# Check Data Type 
print(df_1000.dtypes)


id               int64
url             object
title           object
subtitle        object
claps            int64
responses        int64
reading_time     int64
publication     object
date            object
content         object
dtype: object


### Add empty embedding field 

## Create embeddings
Read your data, generate OpenAI embeddings and export to a format to insert your Azure AI Search index:

In [None]:
import os
import json
from openai import AzureOpenAI

# Azure OpenAI config
azure_openai_endpoint = "https://.openai.azure.com/"
azure_openai_key = ""      
azure_openai_embedding_deployment = "text-embedding-3-large"
azure_openai_api_version = "2023-05-15"  
embedding_model_name = azure_openai_embedding_deployment
azure_openai_embedding_dimensions = 3072 

client_embeddings = AzureOpenAI(
    azure_deployment=azure_openai_embedding_deployment,
    api_version=azure_openai_api_version,
    azure_endpoint=azure_openai_endpoint,
    api_key=azure_openai_key
)

# Load your articles JSON (articles_1000.json)
input_path = 'articles_1000.json'
with open(input_path, 'r', encoding='utf-8') as file:
    input_data = json.load(file)

# --- Prepare texts for embedding ---
titles_text = [
    (item.get('title', '') + '. ' + item.get('subtitle', '')).strip()
    for item in input_data
]
content_text = [item.get('content', '') for item in input_data]

# Optional: Replace empty strings with something (if you want to avoid blank embeddings)
titles_text = [txt if txt.strip() else "empty" for txt in titles_text]
content_text = [txt if txt.strip() else "empty" for txt in content_text]

# --- Batch embedding function ---
def batch(iterable, batch_size=16):
    """Yield successive batch_size-sized chunks from iterable."""
    for i in range(0, len(iterable), batch_size):
        yield iterable[i:i + batch_size]

# --- For titles_vector ---
titles_embeddings = []
for chunk in batch(titles_text, 16):
    response = client_embeddings.embeddings.create(
        input=chunk,
        model=embedding_model_name,
        dimensions=azure_openai_embedding_dimensions
    )
    titles_embeddings.extend([item.embedding for item in response.data])

# --- For content_vector ---
content_embeddings = []
for chunk in batch(content_text, 16):
    response = client_embeddings.embeddings.create(
        input=chunk,
        model=embedding_model_name,
        dimensions=azure_openai_embedding_dimensions
    )
    content_embeddings.extend([item.embedding for item in response.data])

# Assign embeddings to new fields in your documents
for i, item in enumerate(input_data):
    item['titlesVector'] = titles_embeddings[i]
    item['contentVector'] = content_embeddings[i]

# Output with new fields
output_path = os.path.join('output', 'articles_final.json')
os.makedirs(os.path.dirname(output_path), exist_ok=True)
with open(output_path, 'w', encoding='utf-8') as f:
    json.dump(input_data, f, ensure_ascii=False, indent=2)


### Clean Data Types in JSON 

In [11]:
# Check data type of fields 
from collections import defaultdict

types_per_field = defaultdict(set)

for doc in data:
    for k, v in doc.items():
        types_per_field[k].add(type(v))

for k, types in types_per_field.items():
    print(f"{k}: {types}")



id: {<class 'int'>}
url: {<class 'str'>}
title: {<class 'str'>}
subtitle: {<class 'str'>}
claps: {<class 'int'>}
responses: {<class 'int'>}
reading_time: {<class 'int'>}
publication: {<class 'str'>}
date: {<class 'str'>}
content: {<class 'str'>}
titlesVector: {<class 'list'>}
contentVector: {<class 'list'>}


In [18]:
# Convert id field from into to string 
import json

# Load your JSON
with open('output/articles_final.json', encoding='utf-8') as f:
    data = json.load(f)

# Convert all 'id' fields to string
for item in data:
    item['id'] = str(item['id'])

# Optionally, save it back
with open('output/articles_final.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

print("All IDs converted to strings.")


All IDs converted to strings.


In [16]:
# Make sure date field is in the right format for DateTimeOffset in index 

with open('output/articles_final.json', encoding='utf-8') as f:
    data = json.load(f)
for item in data[:5]:
    print(item['date'], type(item['date']))


2020-05-24 <class 'str'>
2020-05-24 <class 'str'>
2020-05-24 <class 'str'>
2020-05-24 <class 'str'>
2020-05-24 <class 'str'>


In [19]:
# Final - Check data type of fields 
from collections import defaultdict

types_per_field = defaultdict(set)

for doc in data:
    for k, v in doc.items():
        types_per_field[k].add(type(v))

for k, types in types_per_field.items():
    print(f"{k}: {types}")



id: {<class 'str'>}
url: {<class 'str'>}
title: {<class 'str'>}
subtitle: {<class 'str'>}
claps: {<class 'int'>}
responses: {<class 'int'>}
reading_time: {<class 'int'>}
publication: {<class 'str'>}
date: {<class 'str'>}
content: {<class 'str'>}
titlesVector: {<class 'list'>}
contentVector: {<class 'list'>}


## Create your search index

Create your search index schema and vector search configuration. If you get an error, check the search service for available quota and check the .env file to make sure you're using a unique search index name.

In [None]:
endpoint = "https://.search.windows.net"
api_key = ""
credential = AzureKeyCredential(api_key)
index_name = "medium-articles-date-index" # choose an index name 

In [79]:
index_name = "medium-articles-date-index"

from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SimpleField,
    SearchableField,
    SearchFieldDataType,
    SearchField,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticField,
    SemanticSearch,
    SearchIndex,
    AzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters
)

# Set your actual endpoint, credential, and index_name
index_client = SearchIndexClient(endpoint=endpoint, credential=credential)

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True, filterable=True, sortable=True, facetable=True),
    SearchableField(name="url", type=SearchFieldDataType.String, filterable=True),
    SearchableField(name="title", type=SearchFieldDataType.String, filterable=True),
    SearchableField(name="subtitle", type=SearchFieldDataType.String),
    SimpleField(name="claps", type=SearchFieldDataType.Int32, filterable=True, sortable=True, facetable=True),
    SimpleField(name="responses", type=SearchFieldDataType.Int32, filterable=True, sortable=True, facetable=True),
    SimpleField(name="reading_time", type=SearchFieldDataType.Int32, filterable=True, sortable=True, facetable=True),
    SearchableField(name="publication", type=SearchFieldDataType.String, filterable=True, facetable=True),
    SimpleField(name="date", type=SearchFieldDataType.DateTimeOffset, filterable=True, sortable=True),
    SearchableField(name="content", type=SearchFieldDataType.String),
    SearchField(name="titlesVector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, vector_search_dimensions=3072, vector_search_profile_name="myHnswProfile"),
    SearchField(name="contentVector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, vector_search_dimensions=3072, vector_search_profile_name="myHnswProfile"),
]

vector_search = VectorSearch(
    algorithms=[
        HnswAlgorithmConfiguration(
            name="myHnsw"
        )
    ],
    profiles=[
        VectorSearchProfile(
            name="myHnswProfile",
            algorithm_configuration_name="myHnsw",
            vectorizer_name="myVectorizer"
        )
    ],
    vectorizers=[
        AzureOpenAIVectorizer(
            vectorizer_name="myVectorizer",
            parameters=AzureOpenAIVectorizerParameters(
                resource_url=azure_openai_endpoint,
                deployment_name=azure_openai_embedding_deployment,
                model_name=embedding_model_name,
                api_key=azure_openai_key
            )
        )
    ]
)

semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="title"),
        keywords_fields=[SemanticField(field_name="subtitle"), SemanticField(field_name="publication")],
        content_fields=[SemanticField(field_name="content")]
    )
)

semantic_search = SemanticSearch(configurations=[semantic_config])

index = SearchIndex(
    name=index_name,
    fields=fields,
    vector_search=vector_search,
    semantic_search=semantic_search
)

result = index_client.create_or_update_index(index)
print(f'{result.name} created')


medium-articles-date-index created


## Insert text and embeddings into vector store
Add texts and metadata from the JSON data to the vector store:

### Split JSON into smaller chunks 

In [37]:
# Split data into smaller batches 
import json

input_path = r'C:\Users\lananoor\OneDrive - Microsoft\AI Agents\FilteredQueryAgent\ingestion\output\articles_final.json'
output_path = r'C:\Users\lananoor\OneDrive - Microsoft\AI Agents\FilteredQueryAgent\ingestion\output\articles_1.json'

# Read the full JSON file
with open(input_path, 'r', encoding='utf-8') as infile:
    docs = json.load(infile)

# Take the first 10 documents
docs_10 = docs[:10]

# Write them to the new file
with open(output_path, 'w', encoding='utf-8') as outfile:
    json.dump(docs_10, outfile, ensure_ascii=False, indent=2)

print("Created articles_1.json with 10 docs.")


Created articles_1.json with 10 docs.


In [67]:
import json

for i in range(1, 6):
    path = output_path = os.path.join('output', f'articles_{i}.json')
    try:
        with open(path, 'r', encoding='utf-8') as f:
            docs = json.load(f)
            print(f"{path}: {len(docs)} documents")
    except Exception as e:
        print(f"Error reading {path}: {e}")


output\articles_1.json: 10 documents
output\articles_2.json: 90 documents
output\articles_3.json: 100 documents
output\articles_4.json: 200 documents
output\articles_5.json: 200 documents


In [70]:
# Split data into smaller batches 
import json

input_path = r'C:\Users\lananoor\OneDrive - Microsoft\AI Agents\FilteredQueryAgent\ingestion\output\articles_final.json'
output_path = r'C:\Users\lananoor\OneDrive - Microsoft\AI Agents\FilteredQueryAgent\ingestion\output\articles_7.json'

# Read the full JSON file
with open(input_path, 'r', encoding='utf-8') as infile:
    docs = json.load(infile)

# Take the first 10 documents
docs_10 = docs[800:1000]

# Write them to the new file
with open(output_path, 'w', encoding='utf-8') as outfile:
    json.dump(docs_10, outfile, ensure_ascii=False, indent=2)

print("Created articles_7.json with 200 docs.")


Created articles_7.json with 200 docs.


### Ingest JSON to AI Search Index in Batches 

In [58]:
from azure.search.documents import SearchClient

output_path = os.path.join('output', 'articles_1.json') 

# Upload some documents to the index  
with open(output_path, 'r') as file:  
    documents = json.load(file)  
search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)
result = search_client.upload_documents(documents)

for res in result:
    if not res.succeeded:
        print(f"Failed to upload doc id={res.key}, error: {res.error_message}")


In [80]:
from azure.search.documents import SearchIndexingBufferedSender

output_path = os.path.join('output', 'articles_1.json') 

# Upload some documents to the index  
with open(output_path, 'r', encoding='utf-8') as file:  
    documents = json.load(file)  
  
# Use SearchIndexingBufferedSender to upload the documents in batches optimized for indexing  
with SearchIndexingBufferedSender(  
    endpoint=endpoint,  
    index_name=index_name,  
    credential=credential,  
) as batch_client:  
    # Add upload actions for all documents  
    batch_client.upload_documents(documents=documents)  
print(f"Uploaded {len(documents)} documents in total")  

Uploaded 10 documents in total


In [86]:
from azure.search.documents import SearchIndexingBufferedSender

output_path = os.path.join('output', 'articles_7.json') 

# Upload some documents to the index  
with open(output_path, 'r', encoding='utf-8') as file:  
    documents = json.load(file)  
  
# Use SearchIndexingBufferedSender to upload the documents in batches optimized for indexing  
with SearchIndexingBufferedSender(  
    endpoint=endpoint,  
    index_name=index_name,  
    credential=credential,  
) as batch_client:  
    # Add upload actions for all documents  
    batch_client.upload_documents(documents=documents)  
print(f"Uploaded {len(documents)} documents in total")  

Uploaded 200 documents in total


## Full Text Search  

### Filtering Queries 

In [87]:
## Filter claps and publication 

from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential

# Define the search client
search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)

# Define the filter query using search.ismatch for Brodheadsville
filter_query = "publication eq 'Better Humans' and claps gt 500"

# Perform the search query with the filter, limiting to top 5 results
results = search_client.search(
    search_text="*",  # Wildcard search to match all documents
    filter=filter_query,
    select=["id", "title", "publication", "claps"],
    top=5  # Limit to top 5 results
)

# Print the results
for result in results:
    print(f"ID: {result['id']}")
    print(f"Title: {result['title']}")
    print(f"Publication: {result['publication']}")
    print(f"Claps: {result['claps']}")
    print("-" * 40)


ID: 363
Title: How to Overcome Your Phone Addiction With Mindfulness
Publication: Better Humans
Claps: 1300
----------------------------------------
ID: 497
Title: How to Begin a Sugar-Free Life
Publication: Better Humans
Claps: 1200
----------------------------------------
ID: 743
Title: Simple Changes That Helped Me Lose 5 Kg and Feel Great in Just One Month
Publication: Better Humans
Claps: 1400
----------------------------------------
ID: 857
Title: How to Select the Best Exercises for Building Muscle
Publication: Better Humans
Claps: 523
----------------------------------------
ID: 974
Title: Hardware to Boost Your Productivity in 2020
Publication: Better Humans
Claps: 2200
----------------------------------------


In [88]:
from azure.search.documents import SearchClient

# Define the search client
search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)

# Define the filter query
filter_query = "publication eq 'UX Collective' and responses ge 5"

# Perform the search query with the filter, limiting to top 5 results
results = search_client.search(
    search_text="*",  # Wildcard search to match all documents
    filter=filter_query,
    select=["id", "title", "publication", "date", "claps", "responses"],  # <-- fixed comma
    top=5
)

# Print the results
for result in results:
    print(f"ID: {result['id']}")
    print(f"Title: {result['title']}")
    print(f"Publication: {result['publication']}")
    print(f"Claps: {result['claps']}")
    print(f"Responses: {result['responses']}")
    print(f"Date: {result['date']}")
    print("-" * 40)


ID: 27
Title: 10 eye-catching logo animations you’ll wish you made
Publication: UX Collective
Claps: 1700
Responses: 10
Date: 2020-05-24T00:00:00Z
----------------------------------------
ID: 30
Title: How to stop the battle between Product Managers and Designers
Publication: UX Collective
Claps: 366
Responses: 5
Date: 2020-05-24T00:00:00Z
----------------------------------------
ID: 876
Title: Loading: Neumorphism 2
Publication: UX Collective
Claps: 839
Responses: 9
Date: 2020-04-10T00:00:00Z
----------------------------------------
ID: 1202
Title: 3D Design is in — my journey into trying 3D for the first time
Publication: UX Collective
Claps: 292
Responses: 7
Date: 2020-02-21T00:00:00Z
----------------------------------------
ID: 1096
Title: Change in Google Search is killing it
Publication: UX Collective
Claps: 4700
Responses: 46
Date: 2020-02-16T00:00:00Z
----------------------------------------


In [91]:
## Filter for publication, date and responses 

# Define the search client
search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)

# Define the filter query with correct DateTimeOffset literals
filter_query = (
    "publication eq 'UX Collective' and "
    "date ge 2020-05-01T00:00:00Z and "
    "date le 2020-05-31T00:00:00Z and "
    "responses ge 5"
)

# Perform the search query with the filter, limiting to top 5 results
results = search_client.search(
    search_text="*",
    filter=filter_query,
    select=["id", "title", "publication", "date", "responses"],
    top=5
)

# Print the results
for result in results:
    print(f"ID: {result['id']}")
    print(f"Title: {result['title']}")
    print(f"Publication: {result['publication']}")
    print(f"Responses: {result['responses']}")
    print(f"Date: {result['date']}")
    print("-" * 40)


ID: 27
Title: 10 eye-catching logo animations you’ll wish you made
Publication: UX Collective
Responses: 10
Date: 2020-05-24T00:00:00Z
----------------------------------------
ID: 30
Title: How to stop the battle between Product Managers and Designers
Publication: UX Collective
Responses: 5
Date: 2020-05-24T00:00:00Z
----------------------------------------


In [92]:
## Filter for publication, date and responses 

# Define the search client
search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)

# Define the filter query with correct DateTimeOffset literals
filter_query = (
    "publication eq 'The Startup' and "
    "date ge 2019-05-01T00:00:00Z and "
    "date le 2021-05-31T00:00:00Z and "
    "responses ge 10"
)

# Perform the search query with the filter, limiting to top 5 results
results = search_client.search(
    search_text="*",
    filter=filter_query,
    select=["id", "title", "publication", "date", "responses"],
    top=5
)

# Print the results
for result in results:
    print(f"ID: {result['id']}")
    print(f"Title: {result['title']}")
    print(f"Publication: {result['publication']}")
    print(f"Responses: {result['responses']}")
    print(f"Date: {result['date']}")
    print("-" * 40)


ID: 389
Title: I Lost $40,000 in a Month and Learned a Valuable Lesson
Publication: The Startup
Responses: 30
Date: 2020-01-03T00:00:00Z
----------------------------------------
ID: 625
Title: Stop Checking for Nulls
Publication: The Startup
Responses: 29
Date: 2020-05-27T00:00:00Z
----------------------------------------
ID: 1208
Title: Your Brain Is Not an Indestructible Punching Bag
Publication: The Startup
Responses: 26
Date: 2020-02-21T00:00:00Z
----------------------------------------
ID: 386
Title: Biggest Startup Failure in History
Publication: The Startup
Responses: 15
Date: 2020-01-03T00:00:00Z
----------------------------------------
ID: 905
Title: How to Fold The Deck Chairs on The Titanic
Publication: The Startup
Responses: 11
Date: 2020-04-10T00:00:00Z
----------------------------------------


In [94]:
from azure.search.documents import SearchClient

# Define the search client
search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)

# Define the filter query
filter_query = "search.ismatch('Google', 'title') and reading_time gt 5"

# Perform the search query with the filter, limiting to top 5 results
results = search_client.search(
    search_text="*",  # Wildcard search to match all documents
    filter=filter_query,
    select=["id", "title", "subtitle", "publication", "reading_time"],  # <-- fixed comma
    top=5
)

# Print the results
for result in results:
    print(f"ID: {result['id']}")
    print(f"Title: {result['title']}")
    print(f"Subtitle: {result['subtitle']}")
    print(f"Publication: {result['publication']}")
    print(f"Reading Time: {result['reading_time']}")
    print("-" * 40)


ID: 1670
Title: Google Algorithm Update What You Need to Know
Subtitle: Remaining up-to-date with…
Publication: Data Driven Investor
Reading Time: 7
----------------------------------------
ID: 114
Title: How to Set Up and Track Goals in Google Analytics
Subtitle: Understanding if your marketing and content is…
Publication: The Startup
Reading Time: 6
----------------------------------------
ID: 984
Title: How to A/B Test With Google Optimize: A Complete Guide
Subtitle: Setup, install, test, and analyze multiple…
Publication: Better Marketing
Reading Time: 12
----------------------------------------


In [None]:
from azure.search.documents import SearchClient

# Define the search client
search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)

# Define the filter query
filter_query = "search.ismatch('Microsoft', 'content') and reading_time gt 5"

# Perform the search query with the filter, limiting to top 5 results
results = search_client.search(
    search_text="*",  # Wildcard search to match all documents
    filter=filter_query,
    select=["id", "title", "subtitle", "content", "reading_time"],  # <-- fixed comma
    top=5
)

# Print the results
for result in results:
    print(f"ID: {result['id']}")
    print(f"Title: {result['title']}")
    print(f"Subtitle: {result['subtitle']}")
    print(f"Content: {result['content']}")
    print(f"Reading Time: {result['reading_time']}")
    print("-" * 40)


ID: 158
Title: Who Rules the Cloud Service: AWS or Azure?
Subtitle: Comparison Between Amazon Web Service Vs Microsoft Azure
Content: The article explores the competition between Amazon Web Services (AWS) and Microsoft Azure, two leading cloud service providers. It presents a comparison of the features, capabilities, and offerings of each platform to determine which dominates the cloud service industry.
Reading Time: 10
----------------------------------------


In [101]:
from azure.search.documents import SearchClient

search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)

# Example filter:
# - "Azure" appears in content
# - reading_time more than 7
# - responses more than 3
# - claps at least 500

filter_query = (
    "search.ismatch('Azure', 'content') "
    "and reading_time gt 7 "
    "and responses lt 3 "
    "and claps lt 500"
)

results = search_client.search(
    search_text="*",  # Wildcard search
    filter=filter_query,
    select=["id", "title", "subtitle", "content", "reading_time", "responses", "claps"],  # typo fixed
    top=5
)

for result in results:
    print(f"ID: {result['id']}")
    print(f"Title: {result['title']}")
    print(f"Subtitle: {result['subtitle']}")
    print(f"Content: {result['content']}")
    print(f"Reading Time: {result['reading_time']}")
    print(f"Responses: {result['responses']}")
    print(f"Claps: {result['claps']}")
    print("-" * 40)


ID: 230
Title: Introduction to Azure Cache for Redis with .NET Core
Subtitle: Azure Cache for Redis provides us with a powerful…
Content: The article titled "Introduction to Azure Cache for Redis with .NET Core" explores the capabilities of Azure Cache for Redis and its integration with .NET Core applications. The subtitle suggests that Azure Cache for Redis is a powerful tool, likely emphasizing its utility in enhancing application performance and scalability. The article likely provides an overview of how developers can leverage this caching service within their .NET Core projects to optimize data retrieval and processing.
Reading Time: 9
Responses: 0
Claps: 96
----------------------------------------
ID: 158
Title: Who Rules the Cloud Service: AWS or Azure?
Subtitle: Comparison Between Amazon Web Service Vs Microsoft Azure
Content: The article explores the competition between Amazon Web Services (AWS) and Microsoft Azure, two leading cloud service providers. It presents a comparison