# Tutorial on GraphRAG Local Search with Couchbase Vector Store

This notebook walks through setting up a search engine that combines Couchbase for storing embeddings, OpenAI models for generating embeddings and answers, and GraphRAG's local search engine for querying the indexed data.
This approach is useful when you need to search structured data using natural language queries, leveraging both machine learning and a database.

## Setting up Couchbase

Before running this notebook, set up the following in Couchbase:

1. Create a bucket named "graphrag-demo" (or as specified in COUCHBASE_BUCKET_NAME).
2. Within the bucket, create a scope named "shared" (or as specified in COUCHBASE_SCOPE_NAME).
3. Within the scope, create a collection named "entity_description_embeddings" (or as specified in COUCHBASE_COLLECTION_NAME).
4. In the Couchbase Full Text Search (FTS) index section, create a new index by importing the `graphrag_demo_index.json` file. This file contains the configuration for the vector search index.

These settings should match the environment variables defined in your .env file or the default values in the code. The index name must also match COUCHBASE_VECTOR_INDEX_NAME (default "graphrag_index"), because that is the name passed to the vector store.


# Importing Necessary Libraries
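In this section, we import the Python libraries needed to load data, interact with Couchbase, and use OpenAI models for generating text and embeddings.

The libraries used include:

- logging: For managing logs that help in debugging and monitoring the workflow.
- os and python-dotenv (`load_dotenv`): For reading configuration from environment variables and a .env file.
- pandas: For reading the Parquet files produced by the GraphRAG indexer.
- tiktoken: For tokenizing text, so that context sizes can be measured in tokens before being passed to a language model.
- couchbase: The Couchbase Python SDK, used here for authentication (`PasswordAuthenticator`) and cluster options (`ClusterOptions`).
- graphrag.query and graphrag.vector_stores: Modules from the GraphRAG package that load the indexer output, map queries to entities, run local search, and store vectors (including `CouchbaseVectorStore` from `graphrag.vector_stores.couchbasedb`).

No asyncio import is needed: the query cell at the end uses top-level `await`, which Jupyter supports directly.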

In [1]:
import logging
import os

import pandas as pd
import tiktoken
from dotenv import load_dotenv

from couchbase.auth import PasswordAuthenticator
from couchbase.options import ClusterOptions
from graphrag.query.context_builder.entity_extraction import EntityVectorStoreKey
from graphrag.query.indexer_adapters import (
    read_indexer_covariates,
    read_indexer_entities,
    read_indexer_relationships,
    read_indexer_reports,
    read_indexer_text_units,
)
from graphrag.query.input.loaders.dfs import store_entity_semantic_embeddings
from graphrag.query.llm.oai.chat_openai import ChatOpenAI
from graphrag.query.llm.oai.embedding import OpenAIEmbedding
from graphrag.query.llm.oai.typing import OpenaiApiType
from graphrag.query.structured_search.local_search.mixed_context import (
    LocalSearchMixedContext,
)
from graphrag.query.structured_search.local_search.search import LocalSearch
from graphrag.vector_stores.couchbasedb import CouchbaseVectorStore

# Configuring Environment Variables
Here, we configure various environment variables that define paths, API keys, and connection strings. These values are essential for connecting to Couchbase and OpenAI, loading data, and defining other constants.

- INPUT_DIR: The directory containing the Parquet files produced by the GraphRAG indexer. It has no default, so it must be set.
- COUCHBASE_CONNECTION_STRING: The connection string for the Couchbase cluster. It includes the protocol and host (default "couchbase://localhost").
- COUCHBASE_USERNAME and COUCHBASE_PASSWORD: Credentials for the Couchbase cluster (defaults "Administrator" and "password", suitable only for a local test setup).
- COUCHBASE_BUCKET_NAME, COUCHBASE_SCOPE_NAME, COUCHBASE_COLLECTION_NAME, and COUCHBASE_VECTOR_INDEX_NAME: Where the entity embeddings are stored and which vector search index is queried. These must match the Couchbase setup described above.
- OPENAI_API_KEY: Your API key for OpenAI's services, required to authenticate both chat and embedding requests. It has no default.
- LLM_MODEL: The OpenAI chat model used to generate answers (default "gpt-4o"; other chat models such as "gpt-4" can be substituted).
- EMBEDDING_MODEL: The model used to turn text into vectors that capture semantic meaning (default "text-embedding-ada-002"). It should be the same model that produced the stored entity embeddings; otherwise query vectors and stored vectors are not comparable.

Together, these variables configure data access, the database connection, and the model calls.
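Because `INPUT_DIR` and `OPENAI_API_KEY` have no defaults, a missing value only shows up later, as "file not found" warnings or an authentication error. A minimal fail-fast check is sketched below; `missing_env_vars` and `require_env` are hypothetical helpers, not part of GraphRAG or the Couchbase SDK.

```python
import os


def missing_env_vars(names: list[str]) -> list[str]:
    """Return the names of environment variables that are unset or empty."""
    return [name for name in names if not os.getenv(name)]


def require_env(names: list[str]) -> None:
    """Hypothetical helper: raise one error listing every missing variable."""
    missing = missing_env_vars(names)
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
```

Called right after `load_dotenv()`, for example `require_env(["INPUT_DIR", "OPENAI_API_KEY"])`, it would stop the notebook before any data is read.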

In [2]:
load_dotenv()

INPUT_DIR = os.getenv("INPUT_DIR")
COUCHBASE_CONNECTION_STRING = os.getenv(
    "COUCHBASE_CONNECTION_STRING", "couchbase://localhost"
)
COUCHBASE_USERNAME = os.getenv("COUCHBASE_USERNAME", "Administrator")
COUCHBASE_PASSWORD = os.getenv("COUCHBASE_PASSWORD", "password")
COUCHBASE_BUCKET_NAME = os.getenv("COUCHBASE_BUCKET_NAME", "graphrag-demo")
COUCHBASE_SCOPE_NAME = os.getenv("COUCHBASE_SCOPE_NAME", "shared")
COUCHBASE_COLLECTION_NAME = os.getenv(
    "COUCHBASE_COLLECTION_NAME", "entity_description_embeddings"
)
COUCHBASE_VECTOR_INDEX_NAME = os.getenv("COUCHBASE_VECTOR_INDEX_NAME", "graphrag_index")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
LLM_MODEL = os.getenv("LLM_MODEL", "gpt-4o")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "text-embedding-ada-002")

# Loading Data from Parquet Files
In this part, we load the GraphRAG indexer output from Parquet files into a dictionary named `data`. Each file corresponds to one table produced by the indexer, and `TABLE_NAMES` maps each table to its file name.

The helper functions `read_indexer_entities`, `read_indexer_relationships`, and so on come from `graphrag.query.indexer_adapters`. They convert the raw DataFrames into lists of GraphRAG model objects (entities, relationships, etc.). `COMMUNITY_LEVEL` controls which level of the community hierarchy is used for entities and reports.

We use pandas to read each file; if a file is not found, we log a warning, set that table to None, and continue.
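The same try/except pattern repeats for every table below. If you prefer to factor it out, a helper along these lines would work. `load_optional_table` is a hypothetical name, and in this notebook `reader` would be `pd.read_parquet`.

```python
import logging
from pathlib import Path
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def load_optional_table(
    input_dir: str, table_name: str, reader: Callable[[Path], T]
) -> Optional[T]:
    """Hypothetical helper: read `<input_dir>/<table_name>.parquet`, or return None."""
    path = Path(input_dir) / f"{table_name}.parquet"
    try:
        return reader(path)
    except FileNotFoundError:
        # Same behavior as the cells below: warn and continue without the table.
        logger.warning("%s not found. Setting it to None.", path)
        return None
```

Usage would look like `relationships_df = load_optional_table(INPUT_DIR, TABLE_NAMES["RELATIONSHIP_TABLE"], pd.read_parquet)`. The notebook keeps the steps inline instead, so each table's sample output appears in its own cell.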

In [3]:
# Set up logging
logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)
logger.info("Loading data from parquet files")
data = {}

# Constants
COMMUNITY_LEVEL = 2

# Table names
TABLE_NAMES = {
    "COMMUNITY_REPORT_TABLE": "create_final_community_reports",
    "ENTITY_TABLE": "create_final_nodes",
    "ENTITY_EMBEDDING_TABLE": "create_final_entities",
    "RELATIONSHIP_TABLE": "create_final_relationships",
    "COVARIATE_TABLE": "create_final_covariates",
    "TEXT_UNIT_TABLE": "create_final_text_units",
}

2024-09-19 12:10:20,405 - __main__ - INFO - Loading data from parquet files


## Entities table:
This table stores information about various entities in the system. Each entity has a unique ID, a short ID, a title, a type (e.g., PERSON), a description, and embeddings of the description. It may also include name embeddings, graph embeddings, community IDs, text unit IDs, document IDs, a rank, and additional attributes.

In [4]:
try:
    data["entities"] = pd.read_parquet(
        f"{INPUT_DIR}/{TABLE_NAMES['ENTITY_TABLE']}.parquet"
    )
    entity_embeddings = pd.read_parquet(
        f"{INPUT_DIR}/{TABLE_NAMES['ENTITY_EMBEDDING_TABLE']}.parquet"
    )
    data["entities"] = read_indexer_entities(
        data["entities"], entity_embeddings, COMMUNITY_LEVEL
    )
    print("Entities table sample:")
    print(data["entities"][0])
except FileNotFoundError:
    logger.warning("ENTITY_TABLE file not found. Setting entities to None.")
    data["entities"] = None

Entities table sample:
Entity(id='bc0e3f075a4c4ebbb7c7b152b65a5625', short_id='16', title='AGENTS', type='PERSON', description='Agents Alex Mercer, Jordan Hayes, Taylor Cruz, and Sam Rivera are the team members proceeding into Dulce base', description_embedding=[0.0020066287834197283, -0.026493584737181664, -0.01942051388323307, -0.03475676849484444, -0.02514118142426014, 0.05112085118889809, -0.0065084416419267654, -0.006623396184295416, 0.004919367842376232, -0.03559526056051254, 0.022395802661776543, 0.037596818059682846, -0.01751362532377243, -0.0053048026748001575, 0.01192143652588129, -0.012746402993798256, 0.0003514135896693915, -0.03619031608104706, -0.0123880160972476, -0.01390270795673132, 0.0030564318876713514, -0.013429366983473301, 0.01271259319037199, -0.023585917428135872, -0.00046700183884240687, -0.021949509158730507, 0.012502970173954964, -0.024356786161661148, 0.02913077175617218, 0.0017665770137682557, 0.00477398419752717, 0.0003277465293649584, -0.00338777084834873

## Relationships table:
This table represents relationships between entities. Each relationship has a unique ID, a short ID, a source entity, a target entity, a weight, a description, and potentially description embeddings. It also includes text unit IDs, document IDs, and may have additional attributes like rank.

In [5]:
try:
    data["relationships"] = pd.read_parquet(
        f"{INPUT_DIR}/{TABLE_NAMES['RELATIONSHIP_TABLE']}.parquet"
    )
    data["relationships"] = read_indexer_relationships(data["relationships"])
    print("Relationships table sample:")
    print(data["relationships"][0])
except FileNotFoundError:
    logger.warning("RELATIONSHIP_TABLE file not found. Setting relationships to None.")
    data["relationships"] = None

Relationships table sample:
Relationship(id='43c3390303c6476cb65f584e37c3e81c', short_id='0', source='ALEX MERCER', target='TAYLOR CRUZ', weight=139.0, description="Alex Mercer and Taylor Cruz are both agents in the Paranormal Military Squad, working together on various extraterrestrial data and alien signals. Their professional relationship is marked by a balance of curiosity and caution, with Alex Mercer often leading and Taylor Cruz providing a cautious and analytical perspective. Despite occasional conflicts between obedience and investigative zeal, Alex acknowledges Taylor's authority and respects her cautionary advice. \n\nTaylor Cruz, who often questions Mercer's commitment, relies on him to keep the team grounded and focused. They work together to ensure the team's efforts remain controlled and cautious, balancing innovative and defensive strategies as well as scientific and military perspectives. Their collaboration involves debating hypotheses and facts, with Cruz's commandin

## Covariate table:
This table stores covariates: additional variables attached to entities, relationships, or other elements in the system. In GraphRAG, covariates are typically claims extracted from the source text about entities, which add context to the answer.

In [6]:
try:
    covariate_df = pd.read_parquet(
        f"{INPUT_DIR}/{TABLE_NAMES['COVARIATE_TABLE']}.parquet"
    )
    # LocalSearchMixedContext expects covariates grouped by type,
    # i.e. {"claims": [Covariate, ...]}, not a flat list.
    data["covariates"] = {"claims": read_indexer_covariates(covariate_df)}
    print("Covariates table sample:")
    print(data["covariates"]["claims"][0])
except FileNotFoundError:
    logger.warning("COVARIATE_TABLE file not found. Setting covariates to None.")
    data["covariates"] = None



## Reports table:
This table contains community reports. Each report has an ID, a short ID, a title, a community ID, a summary, full content, a rank, and potentially embeddings for the summary and full content. It may also include additional attributes.

In [7]:
try:
    data["reports"] = pd.read_parquet(
        f"{INPUT_DIR}/{TABLE_NAMES['COMMUNITY_REPORT_TABLE']}.parquet"
    )
    entity_data = pd.read_parquet(f"{INPUT_DIR}/{TABLE_NAMES['ENTITY_TABLE']}.parquet")
    data["reports"] = read_indexer_reports(
        data["reports"], entity_data, COMMUNITY_LEVEL
    )
    print("Reports table sample:")
    print(data["reports"][0])
except FileNotFoundError:
    logger.warning("COMMUNITY_REPORT_TABLE file not found. Setting reports to None.")
    data["reports"] = None

Reports table sample:
CommunityReport(id='18', short_id='18', title='Paranormal Military Squad and Operation: Dulce', community_id='18', summary="The community centers around the Paranormal Military Squad, a clandestine governmental faction tasked with investigating and engaging with extraterrestrial phenomena. This elite group operates primarily from the Dulce military base, where they are deeply involved in Operation: Dulce, mediating Earth's contact with alien intelligence. Key figures within the squad include Alex Mercer, Dr. Jordan Hayes, Taylor Cruz, and Sam Rivera, each contributing significantly to the mission. The community's activities are focused on deciphering alien signals, ensuring humanity's safety, and preparing for potential first contact scenarios.", full_content="# Paranormal Military Squad and Operation: Dulce\n\nThe community centers around the Paranormal Military Squad, a clandestine governmental faction tasked with investigating and engaging with extraterrestrial

## Text units table:
This table stores text units: the chunks into which GraphRAG splits the source documents during indexing. Each text unit has an ID, a short ID, the text content, and potentially text embeddings. It also includes entity IDs, relationship IDs, covariate IDs, the number of tokens, document IDs, and possibly additional attributes.

In [8]:
try:
    data["text_units"] = pd.read_parquet(
        f"{INPUT_DIR}/{TABLE_NAMES['TEXT_UNIT_TABLE']}.parquet"
    )
    data["text_units"] = read_indexer_text_units(data["text_units"])
    print("Text units table sample:")
    print(data["text_units"][0])
except FileNotFoundError:
    logger.warning("TEXT_UNIT_TABLE file not found. Setting text_units to None.")
    data["text_units"] = None

print("Data loading completed")

Text units table sample:
TextUnit(id='5b2d21ec6fc171c30bdda343f128f5a6', short_id='0', text='# Operation: Dulce\n\n## Chapter 1\n\nThe thrumming of monitors cast a stark contrast to the rigid silence enveloping the group. Agent Alex Mercer, unfailingly determined on paper, seemed dwarfed by the enormity of the sterile briefing room where Paranormal Military Squad\'s elite convened. With dulled eyes, he scanned the projectors outlining their impending odyssey into Operation: Dulce.\n\n“I assume, Agent Mercer, you’re not having second thoughts?” It was Taylor Cruz’s voice, laced with an edge that demanded attention.\n\nAlex flickered a strained smile, still thumbing his folder\'s corner. "Of course not, Agent Cruz. Just trying to soak in all the details." The compliance in his tone was unsettling, even to himself.\n\nJordan Hayes, perched on the opposite side of the table, narrowed their eyes but offered a supportive nod. "Details are imperative. We’ll need your clear-headedness down the

# Setting Up the Couchbase Vector Store
Couchbase stores the semantic embeddings of the entity descriptions. In this step, we create a `CouchbaseVectorStore` and connect it to the cluster using the provided credentials.

- `CouchbaseVectorStore` stores, retrieves, and searches vector embeddings in the configured bucket, scope, and collection, using the vector search index named by COUCHBASE_VECTOR_INDEX_NAME.
- `connect()` opens the connection using the connection string and a `ClusterOptions` object that carries a `PasswordAuthenticator` built from the username and password.
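To make concrete what a vector store does, the sketch below is a hypothetical in-memory stand-in: documents are upserted by `id`, and a query vector is compared with every stored vector by cosine similarity. The method names echo operations that appear in the log output later in this notebook (loading documents, similarity search by vector), but this is not the `CouchbaseVectorStore` implementation; Couchbase runs the search server-side, using the vector search index and the similarity metric configured in it.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Document:
    """A stored record: an id, the original text, its vector, and extra fields."""

    id: str
    text: str
    vector: list[float]
    attributes: dict = field(default_factory=dict)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector store; documents live in a dict keyed by id."""

    def __init__(self) -> None:
        self._documents: dict[str, Document] = {}

    def load_documents(self, documents: list[Document]) -> None:
        # Upsert: loading a document with an existing id replaces the old one.
        for document in documents:
            self._documents[document.id] = document

    def similarity_search_by_vector(
        self, query_vector: list[float], k: int = 10
    ) -> list[tuple[Document, float]]:
        # Brute-force scan; a real vector index avoids comparing against every document.
        scored = [
            (document, cosine_similarity(query_vector, document.vector))
            for document in self._documents.values()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

    def __len__(self) -> int:
        return len(self._documents)
```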

In [9]:
logger.info("Setting up CouchbaseVectorStore")

try:
    couchbase_vector_store = CouchbaseVectorStore(
        collection_name=COUCHBASE_COLLECTION_NAME,
        bucket_name=COUCHBASE_BUCKET_NAME,
        scope_name=COUCHBASE_SCOPE_NAME,
        index_name=COUCHBASE_VECTOR_INDEX_NAME,
    )

    auth = PasswordAuthenticator(str(COUCHBASE_USERNAME), str(COUCHBASE_PASSWORD))
    cluster_options = ClusterOptions(auth)

    couchbase_vector_store.connect(
        connection_string=COUCHBASE_CONNECTION_STRING,
        cluster_options=cluster_options,
    )
    logger.info("CouchbaseVectorStore setup completed")
except Exception:
    logger.exception("Error setting up CouchbaseVectorStore")
    raise

2024-09-19 12:10:20,566 - __main__ - INFO - Setting up CouchbaseVectorStore
2024-09-19 12:10:20,568 - graphrag.vector_stores.couchbasedb - INFO - Connecting to Couchbase at couchbase://localhost
2024-09-19 12:10:20,609 - graphrag.vector_stores.couchbasedb - INFO - Successfully connected to Couchbase
2024-09-19 12:10:20,611 - __main__ - INFO - CouchbaseVectorStore setup completed


# Setting Up Language Models
In this section, we configure the language models using OpenAI's API. We initialize:

- ChatOpenAI: The chat model that generates answers to natural language queries.
- OpenAIEmbedding: The embedding model, used here to embed the user's query so it can be compared with the stored entity embeddings.
- tiktoken: The tokenizer used to count tokens, so the search context stays within its token budget. The `cl100k_base` encoding matches text-embedding-ada-002 and GPT-4; GPT-4o uses `o200k_base`, so counts for GPT-4o are approximate.
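The tokenizer matters because GraphRAG's context builders measure each piece of context in tokens and stop adding records once a budget is used up. The loop below is a simplified illustration of that idea, not GraphRAG's code: `fit_rows_to_budget` is hypothetical, and the token counter is passed in so that it can be `lambda s: len(token_encoder.encode(s))` with the tiktoken encoder created in the next cell.

```python
from typing import Callable


def fit_rows_to_budget(
    header: str,
    rows: list[str],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> str:
    """Hypothetical sketch: append table rows until the token budget is reached."""
    lines = [header]
    used = count_tokens(header + "\n")
    for row in rows:
        cost = count_tokens(row + "\n")
        if used + cost > max_tokens:
            break  # stop at the first row that would overflow the budget
        lines.append(row)
        used += cost
    return "\n".join(lines)
```

With a whitespace word count standing in for tiktoken, a 5-token budget keeps the header and the first two rows of a three-row entity table.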

In [10]:
logger.info("Setting up LLM and embedding models")

try:
    llm = ChatOpenAI(
        api_key=OPENAI_API_KEY,
        model=LLM_MODEL,
        api_type=OpenaiApiType.OpenAI,
        max_retries=20,
    )

    token_encoder = tiktoken.get_encoding("cl100k_base")

    text_embedder = OpenAIEmbedding(
        api_key=OPENAI_API_KEY,
        api_base=None,
        api_type=OpenaiApiType.OpenAI,
        model=EMBEDDING_MODEL,
        deployment_name=EMBEDDING_MODEL,
        max_retries=20,
    )

    logger.info("LLM and embedding models setup completed")
except Exception:
    logger.exception("Error setting up models")
    raise

2024-09-19 12:10:20,630 - __main__ - INFO - Setting up LLM and embedding models
2024-09-19 12:10:21,260 - __main__ - INFO - LLM and embedding models setup completed


# Storing Embeddings in Couchbase
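The entity description embeddings were already computed by the GraphRAG indexer and loaded from `create_final_entities` above; here we store them in Couchbase with the `store_entity_semantic_embeddings` function.

The cell first checks that `data['entities']` is a list, as the function expects. The function then turns each entity into a vector store document keyed by the entity's `id` (carrying the description text and its embedding) and loads the documents through the Couchbase vector store. An entity without an `id` attribute causes an `AttributeError`, which the cell reports with a specific message.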
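The conversion can be sketched as follows. This is a simplified, hypothetical re-creation based on the description above: `EntityRecord` and `VectorDocument` are stand-ins for GraphRAG's `Entity` model and vector store document type, and storing the entity `title` in the attributes is an assumption about what accompanies the text and vector.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class EntityRecord:
    """Hypothetical subset of the fields on GraphRAG's Entity model."""

    id: str
    title: str
    description: Optional[str] = None
    description_embedding: Optional[list[float]] = None
    attributes: Optional[dict[str, Any]] = None


@dataclass
class VectorDocument:
    """Hypothetical vector store document: key, text, vector, and metadata."""

    id: str
    text: Optional[str]
    vector: Optional[list[float]]
    attributes: dict[str, Any] = field(default_factory=dict)


def entities_to_documents(entities: list[EntityRecord]) -> list[VectorDocument]:
    """Map each entity to one document keyed by the entity id."""
    if not isinstance(entities, list):
        raise TypeError("entities must be a list")
    return [
        VectorDocument(
            id=entity.id,  # AttributeError if an item has no `id`
            text=entity.description,
            vector=entity.description_embedding,
            attributes={"title": entity.title, **(entity.attributes or {})},
        )
        for entity in entities
    ]
```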


In [11]:
logger.info("Storing entity embeddings")

try:
    if not isinstance(data["entities"], list):
        raise TypeError("data['entities'] must be a list")

    store_entity_semantic_embeddings(
        entities=data["entities"], vectorstore=couchbase_vector_store
    )
    logger.info("Entity semantic embeddings stored successfully")
except AttributeError:
    logger.exception(
        "Error storing entity semantic embeddings. Ensure all entities have an 'id' attribute"
    )
    raise
except TypeError:
    logger.exception(
        "Error storing entity semantic embeddings. Ensure data['entities'] is a list"
    )
    raise
except Exception:
    logger.exception("Error storing entity semantic embeddings")
    raise

2024-09-19 12:10:21,274 - __main__ - INFO - Storing entity embeddings
2024-09-19 12:10:21,276 - graphrag.vector_stores.couchbasedb - INFO - Loading 92 documents into vector storage
2024-09-19 12:10:21,501 - graphrag.vector_stores.couchbasedb - INFO - Successfully loaded all 92 documents
2024-09-19 12:10:21,504 - __main__ - INFO - Entity semantic embeddings stored successfully


# Building the Search Engine

In this section, we create a search engine that combines several components of **GraphRAG**. GraphRAG uses an LLM to extract a knowledge graph (entities, relationships, claims) from source text, groups the entities into communities, and summarizes each community in a report. Local search answers a **natural language** question by retrieving the entities most relevant to it and assembling their neighborhood in the graph into the context for the language model.

Here, we explain each component of the search engine and how it contributes within GraphRAG.

## Context Builder (LocalSearchMixedContext)

The `LocalSearchMixedContext` class is the core of the search engine. For each query, it builds the context passed to the language model by combining several types of data: **community reports, text units, entities, relationships, and covariates**. In this context:

- **Community Reports**: LLM-generated summaries of communities (clusters of closely related entities) in the knowledge graph. They give the answer a higher-level view of the topic.
- **Text Units**: Chunks of the source documents. They supply the original wording behind the extracted entities and relationships.
- **Entities**: The core subjects (people, organizations, places, etc.) extracted from the text. Each entity has attributes such as a title, type, and description, and its description embedding is stored in Couchbase.
- **Relationships**: The connections between entities, for example the working relationship between two agents in the sample data. Including them places the retrieved entities in context.
- **Covariates**: Additional information about entities and relationships; in GraphRAG these are typically claims extracted from the text.

All these elements are combined into the **context** from which the language model generates its answer.

- **entity_text_embeddings**: The vector store (here, Couchbase) holding the entity description embeddings, used to find entities semantically similar to the query.
- **text_embedder**: The **OpenAI embedding model** used to embed the user query into the same vector space as the stored entity embeddings, so the two can be compared.
- **token_encoder**: The tokenizer used to count tokens, so each part of the context fits its share of the token budget.
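The sketch below illustrates how these pieces meet at query time: the query is embedded, the stored entity vectors are ranked by similarity, and the hits are matched back to entities by ID (the `EntityVectorStoreKey.ID` choice). It is a hypothetical simplification, not GraphRAG's implementation. GraphRAG appears to oversample this search and then trim the results (the query log later in this notebook reports `k=20` while `top_k_mapped_entities` is 10), so the sketch fetches `top_k * oversample` candidates, drops any that are not in the loaded entity table, and keeps `top_k`.

```python
import math
from typing import Callable


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def map_query_to_entity_ids(
    query: str,
    embed: Callable[[str], list[float]],
    stored_vectors: dict[str, list[float]],
    known_entity_ids: set[str],
    top_k: int = 10,
    oversample: int = 2,
) -> list[str]:
    """Hypothetical sketch: ids of up to `top_k` entities best matching the query."""
    query_vector = embed(query)  # must use the same model as the stored vectors
    ranked_ids = sorted(
        stored_vectors,
        key=lambda entity_id: cosine_similarity(query_vector, stored_vectors[entity_id]),
        reverse=True,
    )
    candidates = ranked_ids[: top_k * oversample]  # oversampled similarity search
    # Keyed by ID: keep only hits that match an entity loaded from the index output.
    matched = [entity_id for entity_id in candidates if entity_id in known_entity_ids]
    return matched[:top_k]
```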

## Local Search Parameters

Once the context builder is set up, we define the parameters that control how it assembles the context for each query.

- **text_unit_prop**: The proportion of the context token budget (`max_tokens`) allocated to text units. Here, 50% of the budget goes to text units.
- **community_prop**: The proportion of the budget allocated to community reports. Here, 10%. The remaining 40% is used for the entity, relationship, and covariate tables.
- **conversation_history_max_turns**: How many turns of conversation history are kept when building the context. This matters in multi-turn conversations, where earlier questions may still be relevant. With **conversation_history_user_turns_only** set to `True`, only the user's turns are included.
- **top_k_mapped_entities**: How many of the entities most similar to the query are selected. Here, the top 10.
- **top_k_relationships**: Limits how many relationships of the selected entities are added to the context. Here, 10.
- **include_entity_rank**: Whether to include each entity's rank (its importance in the graph, based on its connections) in the context.
- **include_relationship_weight**: Whether to include relationship weights in the context, so the language model can tell stronger connections from weaker ones.
- **include_community_rank**: Whether to include community report ranks in the context (disabled here).
- **return_candidate_context**: Whether to also return the full set of candidate records considered, not only those that made it into the context (disabled here).
- **embedding_vectorstore_key**: Which entity field the vector store documents are keyed by. `EntityVectorStoreKey.ID` matches entities by ID, which is how the embeddings were stored in Couchbase.
- **max_tokens**: The total token budget for the context (12,000 here).
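With the values used in this notebook, the budget works out to 6,000 tokens for text units, 1,200 for community reports, and 4,800 for the entity, relationship, and covariate tables. The arithmetic is sketched below; it mirrors our reading of `LocalSearchMixedContext`, which also rejects proportions that add up to more than 1. `split_context_budget` is a hypothetical helper, not part of GraphRAG's API.

```python
def split_context_budget(
    max_tokens: int, text_unit_prop: float, community_prop: float
) -> dict[str, int]:
    """Hypothetical helper: split the context token budget into its three parts."""
    if text_unit_prop + community_prop > 1:
        raise ValueError("text_unit_prop + community_prop must not exceed 1")
    # Whatever is not reserved for reports or text units goes to the local tables.
    local_prop = 1 - community_prop - text_unit_prop
    return {
        "community_reports": max(int(max_tokens * community_prop), 0),
        "entities_relationships_covariates": max(int(max_tokens * local_prop), 0),
        "text_units": max(int(max_tokens * text_unit_prop), 0),
    }
```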


## Language Model Parameters

For answering the query, we use the **language model** (LLM) to generate the response. The parameters for the LLM are configured here:
- **max_tokens**: Limits the number of tokens (words or sub-words) in the generated answer.
- **temperature**: Controls the randomness of the output. Setting it to `0.0` makes the model’s answers more deterministic.


## Integrating Everything: Creating the Search Engine

Finally, all components are integrated into the `LocalSearch` class, which serves as the main search engine. This class is responsible for:
- **Accepting queries** in natural language.
- **Using the context builder** to form a detailed context based on the available structured data (entities, relationships, text, reports).
- **Passing the query and context** to the language model (LLM), which generates the final answer.

The search engine is now ready to process queries, using the GraphRAG index to provide context-aware and semantically rich answers.


## Summary

In this section, we have built a search engine for data indexed with **GraphRAG**. It combines the **structured data** produced by the indexer (entities, relationships, reports, etc.) with **semantic embeddings** stored in Couchbase, and uses OpenAI's language model to generate answers from that context.

Key steps include:
1. Setting up the **context builder** to combine different types of data.
2. Defining search parameters for handling text units, entities, relationships, and embedding similarities.
3. Integrating the **language model** to generate answers based on the context.

This approach is useful for querying large document collections in natural language: GraphRAG turns unstructured text into structured data (a knowledge graph with community summaries), and local search combines that structure with the original text to produce context-aware answers.

In [12]:
logger.info("Creating search engine")

context_builder = LocalSearchMixedContext(
    community_reports=data["reports"],
    text_units=data["text_units"],
    entities=data["entities"],
    relationships=data["relationships"],
    covariates=data["covariates"],
    entity_text_embeddings=couchbase_vector_store,
    embedding_vectorstore_key=EntityVectorStoreKey.ID,
    text_embedder=text_embedder,
    token_encoder=token_encoder,
)

local_context_params = {
    "text_unit_prop": 0.5,
    "community_prop": 0.1,
    "conversation_history_max_turns": 5,
    "conversation_history_user_turns_only": True,
    "top_k_mapped_entities": 10,
    "top_k_relationships": 10,
    "include_entity_rank": True,
    "include_relationship_weight": True,
    "include_community_rank": False,
    "return_candidate_context": False,
    "embedding_vectorstore_key": EntityVectorStoreKey.ID,
    "max_tokens": 12_000,
}

llm_params = {
    "max_tokens": 2_000,
    "temperature": 0.0,
}

search_engine = LocalSearch(
    llm=llm,
    context_builder=context_builder,
    token_encoder=token_encoder,
    llm_params=llm_params,
    context_builder_params=local_context_params,
    response_type="multiple paragraphs",
)

logger.info("Search engine created")

2024-09-19 12:10:21,527 - __main__ - INFO - Creating search engine
2024-09-19 12:10:21,535 - __main__ - INFO - Search engine created


# Running a Query
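Finally, we run a query on the search engine: "Give me a summary about the story". The search engine embeds the question, finds the most similar entity descriptions in Couchbase, builds the context from the related relationships, claims, community reports, and text units, and asks the language model for a summary.

asearch: An asynchronous method that takes a query and returns a result whose `response` field holds the language model's answer. Jupyter allows top-level `await`; in a plain Python script, the call has to be run through an event loop, for example with `asyncio.run()`.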
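Outside Jupyter, a top-level `await` is a syntax error, so the coroutine has to be driven by an event loop. The sketch below uses a hypothetical `StubSearchEngine` in place of `LocalSearch` so it runs without Couchbase or OpenAI; with the real engine you would pass `search_engine` instead.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StubResult:
    """Hypothetical result object exposing a `response` field like the real one."""

    response: str


class StubSearchEngine:
    """Hypothetical stand-in for LocalSearch with the same `asearch` method name."""

    async def asearch(self, query: str) -> StubResult:
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        return StubResult(response=f"(stub answer to: {query})")


async def ask(engine, question: str) -> str:
    result = await engine.asearch(question)
    return result.response


def ask_sync(engine, question: str) -> str:
    """Run the coroutine from plain Python code, where top-level await is not allowed."""
    return asyncio.run(ask(engine, question))
```

In a script, `print(ask_sync(search_engine, "Give me a summary about the story"))` would replace the `await` in the cell below. Note that `asyncio.run()` raises a `RuntimeError` when called from a running event loop, such as inside a Jupyter cell; there, keep using `await`.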

In [13]:
question = "Give me a summary about the story"
logger.info("Running query: '%s'", question)

try:
    result = await search_engine.asearch(question)
    print(f"Question: '{question}'")
    print(f"Answer: {result.response}")
    logger.info("Query completed successfully")
except Exception as e:
    logger.exception("An error occurred while processing the query")
    print(f"An error occurred while processing the query: {e}")

2024-09-19 12:10:21,568 - __main__ - INFO - Running query: 'Give me a summary about the story'
2024-09-19 12:10:21,572 - graphrag.vector_stores.couchbasedb - INFO - Performing similarity search by text with k=20
2024-09-19 12:10:22,137 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-09-19 12:10:22,144 - graphrag.vector_stores.couchbasedb - INFO - Performing similarity search by vector with k=20
2024-09-19 12:10:22,152 - graphrag.vector_stores.couchbasedb - INFO - Found 20 results in similarity search by vector
2024-09-19 12:10:22,279 - graphrag.query.structured_search.local_search.search - INFO - GENERATE ANSWER: 1726728021.5726213. QUERY: Give me a summary about the story
2024-09-19 12:10:23,179 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-09-19 12:10:34,053 - __main__ - INFO - Query completed successfully


Question: 'Give me a summary about the story'
Answer: # Summary of the Story

## Introduction to the Paranormal Military Squad

The narrative centers around the Paranormal Military Squad, a secretive governmental faction tasked with investigating and engaging with extraterrestrial phenomena. This elite group operates primarily from the Dulce military base, where they are deeply involved in Operation: Dulce. The mission's primary objective is to mediate Earth's contact with alien intelligence, ensuring humanity's safety and preparing for potential first contact scenarios [Data: Paranormal Military Squad and Operation: Dulce (18)].

## Key Figures and Their Roles

Several key figures play crucial roles within the Paranormal Military Squad. Alex Mercer provides leadership and strategic insights, guiding the team through high-stakes operations. Dr. Jordan Hayes focuses on deciphering alien codes and understanding their intent, contributing significantly to the team's mission. Taylor Cruz o
