# Query Text Data

This notebook demonstrates how to use the Intelligence Toolkit library to answer queries about a collection of text documents.

See the [README](https://github.com/microsoft/intelligence-toolkit/blob/main/app/workflows/query_text_data/README.md) for more details.


In [1]:
import sys

sys.path.append("..")
import os

os.environ["MKL_THREADING_LAYER"] = "GNU"  # Avoids threadpoolctl error on Linux and macOS
from intelligence_toolkit.query_text_data.api import QueryTextData
from intelligence_toolkit.query_text_data.classes import (
    ProcessedChunks,
    ChunkSearchConfig
)

import intelligence_toolkit.query_text_data.prompts as prompts
from intelligence_toolkit.AI.openai_configuration import OpenAIConfiguration
from intelligence_toolkit.AI.openai_embedder import OpenAIEmbedder
from intelligence_toolkit.helpers.constants import CACHE_PATH
import nest_asyncio  # Necessary to run async code in ipynb
import pandas as pd

nest_asyncio.apply()

SyntaxError: invalid syntax (classes.py, line 65)

In [None]:
# Create the workflow object
qtd = QueryTextData()
# Set up the AI model and embedding model
ai_configuration = OpenAIConfiguration(
    {
        "api_type": "OpenAI",
        "api_key": os.environ["OPENAI_API_KEY"],
        "model": "gpt-4o",
    }
)
qtd.set_ai_config(ai_configuration=ai_configuration, embedding_cache=CACHE_PATH)
text_embedder = OpenAIEmbedder(
    configuration=ai_configuration,
)
qtd.set_embedder(text_embedder)
print("Created QueryTextData object")

Created QueryTextData object
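
The configuration cell reads `os.environ["OPENAI_API_KEY"]` directly, which raises a bare `KeyError` if the variable is unset. A minimal sketch of a friendlier check follows. `require_env` is a hypothetical helper introduced here for illustration; it is not part of the Intelligence Toolkit.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error.

    Hypothetical helper for illustration; not part of intelligence_toolkit.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it before running the notebook."
        )
    return value
```

Under that assumption, the configuration dictionary could use `"api_key": require_env("OPENAI_API_KEY")` in place of the direct lookup.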


In [None]:
# Load text inputs from a CSV file into a DataFrame
# Enter the path to your own data here
input_path = "../example_outputs/query_text_data/news_articles/news_articles_texts.csv"
file_name = input_path.split("/")[-1]
df = pd.read_csv(input_path)
text_to_chunks = qtd.process_data_from_df(df, file_name)
print("Processed data from df")

Processed data from df


In [None]:
# Process the chunks into the index data structures
processed_chunks: ProcessedChunks = qtd.process_text_chunks()
print("Processed chunks")

Processed chunks


In [None]:
# Embed the text chunks
cid_to_vector = await qtd.embed_text_chunks()
print("Embedded chunks")

  0%|          | 0/500 [00:00<?, ?it/s]

100%|██████████| 500/500 [00:06<00:00, 77.58it/s] 
100%|██████████| 1/1 [00:00<00:00,  3.64it/s]

Got 0 existing texts
Got 501 new texts
Embedded chunks
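
The log lines "Got 0 existing texts" and "Got 501 new texts" suggest that embeddings are looked up in a cache before any are requested from the model. The sketch below illustrates that general pattern with an in-memory dictionary keyed by a hash of the text. It is an assumption about the pattern, not the toolkit's implementation; `text_key`, `split_cached`, and the dictionary cache are hypothetical, and the vectors are placeholders.

```python
import hashlib


def text_key(text: str) -> str:
    """Stable cache key for a piece of text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def split_cached(texts: list[str], cache: dict[str, list[float]]):
    """Partition texts into those already embedded and those still to embed."""
    existing = [t for t in texts if text_key(t) in cache]
    new = [t for t in texts if text_key(t) not in cache]
    return existing, new


cache: dict[str, list[float]] = {}
texts = ["first chunk", "second chunk"]

# First pass: the cache is empty, so every text counts as new.
existing, new = split_cached(texts, cache)
print(f"Got {len(existing)} existing texts")
print(f"Got {len(new)} new texts")

# Pretend to embed the new texts (placeholder vectors), then store them.
for t in new:
    cache[text_key(t)] = [0.0, 0.0, 0.0]

# Second pass: every text is now found in the cache.
existing_again, new_again = split_cached(texts, cache)
```

With a cache of this kind, re-running the embedding step on unchanged data would need no new embedding requests.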





In [None]:
# Edit the query to be answered
query = "What events are discussed?"
expanded_query = await qtd.anchor_query_to_concepts(query=query, top_concepts=100)
print(f"Expanded query: {expanded_query}")

Expanded query: What events are discussed, such as those involving the tennis world, relief efforts by international organizations, or the culinary arts with renowned chefs?


In [None]:
# Mine chunks relevant to the query
chunk_search_config: ChunkSearchConfig = ChunkSearchConfig(
    # How many relevance tests are permitted per query. Higher values may provide higher quality results at higher cost
    relevance_test_budget=50,
    # How many chunks before and after each relevant chunk to test, once the relevance test budget is nearly exhausted or the search process has terminated
    adjacent_test_steps=1,
    # How many relevance tests to run on each community in turn
    community_relevance_tests=5,
    # How many relevance tests to run in parallel at a time
    relevance_test_batch_size=5,
    # How many chunks to use to rank communities by relevance
    community_ranking_chunks=5,
    # When to restart testing communities in relevance order
    irrelevant_community_restart=5,
    # Perform thematic analysis after how many relevance tests (0 to disable)
    analysis_update_interval=0
)
relevant_cids, search_summary = await qtd.detect_relevant_text_chunks(
    query=query, expanded_query=expanded_query, chunk_search_config=chunk_search_config
)
print("Mined relevant chunks")

100%|██████████| 5/5 [00:03<00:00,  1.63it/s]
100%|██████████| 5/5 [00:00<00:00,  6.42it/s]
100%|██████████| 5/5 [00:00<00:00,  7.55it/s]
100%|██████████| 5/5 [00:01<00:00,  4.91it/s]
100%|██████████| 5/5 [00:01<00:00,  4.17it/s]
100%|██████████| 5/5 [00:00<00:00,  7.72it/s]
100%|██████████| 5/5 [00:00<00:00,  8.24it/s]
100%|██████████| 5/5 [00:00<00:00,  8.41it/s]
100%|██████████| 5/5 [00:00<00:00,  9.12it/s]
100%|██████████| 5/5 [00:00<00:00,  9.34it/s]

Mined relevant chunks
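
With `relevance_test_budget=50` and `relevance_test_batch_size=5`, the budget divides into ten batches of five tests, which matches the ten progress bars above. The sketch below works through that arithmetic. `plan_batches` is a hypothetical helper and does not reproduce the toolkit's search logic: community ranking, adjacent-chunk tests, and restarts are all omitted.

```python
def plan_batches(budget: int, batch_size: int) -> list[int]:
    """Split a test budget into batch sizes; the last batch may be smaller."""
    if budget < 0 or batch_size <= 0:
        raise ValueError("budget must be >= 0 and batch_size must be > 0")
    full, remainder = divmod(budget, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


batches = plan_batches(budget=50, batch_size=5)
print(len(batches), sum(batches))  # 10 batches, 50 tests in total
```

Raising the budget increases the number of batches, and therefore the number of model calls, in direct proportion.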





In [8]:
# Generate an extended answer to the query, which could then be summarized into a shorter form
await qtd.answer_query_with_relevant_chunks(target_chunks_per_cluster=5)
print("Answered query")

Answering query with clustered ids: {'All relevant chunks': [159, 361, 243, 363, 438, 241, 91, 242, 219, 183, 120, 278, 371, 137, 431, 459]}


100%|██████████| 3/3 [00:14<00:00,  4.79s/it]


Extracted references: [91, 120, 137, 159, 183, 219, 241, 242, 243, 278, 361, 363, 431, 459]
Answered query
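
The log above lists 16 clustered chunk ids but only 14 extracted references, so two of the relevant chunks were not cited in the answer. A short sketch, using the ids copied from that output, to find which ones:

```python
clustered_ids = [159, 361, 243, 363, 438, 241, 91, 242,
                 219, 183, 120, 278, 371, 137, 431, 459]
extracted_refs = [91, 120, 137, 159, 183, 219, 241,
                  242, 243, 278, 361, 363, 431, 459]

# Chunks that were passed to the answer step but never referenced in it.
uncited = sorted(set(clustered_ids) - set(extracted_refs))
print(uncited)  # [371, 438]
```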


In [9]:
# Output the final extended answer
print(qtd.answer_object.extended_answer)

## Query

*What events are discussed?*

## Expanded Query

*What events are discussed, such as those involving the tennis world, relief efforts by international organizations, or the culinary arts with renowned chefs?*

## Answer

The events discussed include a variety of global gatherings and activities across culinary arts, health, economic, and sports sectors. In the culinary world, renowned chefs like Antonio Rossi and Marco Tanzi shared their expertise at events in Rome, such as the Italian Food Festival, showcasing innovative approaches to traditional Italian cuisine [source: [159](#source-159), [91](#source-91)]. Additionally, the Gastronomy Weekend and other culinary festivals featured chefs like Maria Lopez and John Smith, promoting community engagement and cultural exchange through food [source: [361](#source-361), [243](#source-243)].

In the realm of global health and economics, the World Health Organization convened in Geneva to address issues like infectious diseases and 

In [10]:
# Condense the answer
qtd.condense_answer(ai_instructions=prompts.user_prompt)
print("Condensed answer")

Condensed answer


In [11]:
# Output the final condensed answer
print(qtd.condensed_answer)

# Events Discussed

## Culinary Arts

### Italian Food Festival in Rome
Renowned chefs like Antonio Rossi and Marco Tanzi shared their expertise at events in Rome, such as the Italian Food Festival, showcasing innovative approaches to traditional Italian cuisine [source: news_articles_texts.csv_160 (1), news_articles_texts.csv_92 (1)].

### Gastronomy Weekend
The Gastronomy Weekend featured chefs like Maria Lopez and John Smith, promoting community engagement and cultural exchange through food [source: news_articles_texts.csv_362 (1)].

### Food Lovers Club Event
The Food Lovers Club celebrated chefs Maria Lopez and Ana Torres, who were praised for their innovative and traditional culinary techniques [source: news_articles_texts.csv_244 (1)].

### Wine and Dine Event
The Wine and Dine event at Riverfront Plaza featured chefs Maria Lopez and Ana Torres, who presented Mediterranean and Latin American dishes paired with wines [source: news_articles_texts.csv_364 (1)].

### Gourmet Guild's