# Weaviate Experiment Example

## Installations

In [1]:
# !pip install --quiet --force-reinstall prompttools
# !pip install weaviate-client

First, you need to connect to a Weaviate instance. One easy option is a local instance launched with Docker.

See the official Weaviate documentation for details: https://weaviate.io/developers/weaviate/tutorials/connect

In [2]:
import weaviate

client = weaviate.Client("http://localhost:8080/")  # This can be replaced by other clients
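
A freshly started Docker container can take a few seconds before it accepts queries. The sketch below is one way to wait for it, assuming the client object exposes an `is_ready()` method as the v3 Python client does. The helper name `wait_until_ready` and its parameters are hypothetical and not part of Weaviate or prompttools.

```python
import time


def wait_until_ready(client, retries: int = 5, delay_seconds: float = 1.0) -> bool:
    """Poll `client.is_ready()` until it returns True or the retries run out."""
    for _ in range(retries):
        try:
            if client.is_ready():
                return True
        except Exception:
            # The server may still be starting; treat connection errors as "not ready".
            pass
        time.sleep(delay_seconds)
    return False
```

Calling `wait_until_ready(client)` right after creating the client and stopping when it returns `False` avoids confusing errors later in the notebook.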

## Run an experiment

### Working with data that already exists in Weaviate

First, we will show how you can run an experiment if you already have data in Weaviate. You can test different queries to see how the responses differ.

You can also skip to the next section if you would like to insert data at runtime.

In [3]:
# Optional helper function to have us clear data in this example
def _clear_existing_class(class_name: str):
    # Clearing existing object and class
    try:
        client.batch.delete_objects(class_name=class_name, where={})
    except Exception:
        pass
    client.schema.delete_class(class_name)

In [4]:
# Can skip if you already have data in Weaviate
# Create class object
class_name = "Question"

_clear_existing_class(class_name)

class_obj = {
    "class": class_name,
    "vectorizer": "text2vec-contextionary",
    "moduleConfig": {
        "text2vec-contextionary": {
          "vectorizeClassName": "false"
        }
    },
}

client.schema.create_class(class_obj)

We insert data in the following cell.

In [5]:
# Can skip if you already have data in Weaviate
data = [{'doc_id': "1", 'Category': 'science', 'Question': 'How many hydrogens are in H2O?', 'Answer': '2'},
        {'doc_id': "2", 'Category': 'math', 'Question': '2+2', 'Answer': '4'},
        {'doc_id': "3", 'Category': 'math', 'Question': 'Is 13 a prime number?', 'Answer': 'Yes'},
        {'doc_id': "4", 'Category': 'geography', 'Question': 'Which continent is India in?', 'Answer': 'Asia'},
        {'doc_id': "5", 'Category': 'geography', 'Question': 'Which continent is China in?', 'Answer': 'Asia'},
        {'doc_id': "6", 'Category': 'geography', 'Question': 'Which continent is USA in?', 'Answer': 'North America'},
        {'doc_id': "7", 'Category': 'geography', 'Question': 'Which is the largest state in the USA?', 'Answer': 'Alaska'},
        {'doc_id': "8", 'Category': 'geography', 'Question': 'Which is the most populous state in the USA?', 'Answer': 'California'},]


def _insert_data(data, client):
    print("Inserting data into Weaviate before the experiment...")
    # The batch context manager sends any remaining objects when the block exits
    with client.batch(batch_size=100) as batch:
        for d in data:
            # Map the capitalized source keys to the lowercase property names
            properties = {
                "doc_id": d["doc_id"],
                "answer": d["Answer"],
                "question": d["Question"],
                "category": d["Category"],
            }
            batch.add_data_object(properties, "Question")

_insert_data(data, client)

Inserting data into Weaviate before the experiment...


#### Building different queries

In this first example, we will demonstrate how you can test different queries on the same existing dataset.

There are many ways to construct a near-text search operator and its filters, so we ask that you define query functions and pass them into the experiment as a dictionary `{name: callable_function}`, as shown below.

We have three examples of query functions:
1. Default query - performs a near text search based on your text query and properties
2. A custom query - similar to the default, but steers the results away from the "geography" topic
3. Hybrid search - combines `bm25` search and vector (near text) search

You can create your own custom query function, such as one that performs generative search.

Note: the default query function is available and is used by the experiment if you aren't interested in testing different querying methods.
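
One way to picture how the dictionary is used: every named builder is paired with every text query, and each pair becomes one row of results. The sketch below only enumerates those pairs with stand-in callables. The helper `enumerate_runs` is hypothetical, and whether prompttools builds its combinations exactly this way is an assumption.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def enumerate_runs(
    query_builders: Dict[str, Callable],
    text_queries: List[str],
) -> List[Tuple[str, str]]:
    """Return (builder_name, text_query) pairs, one per experiment row."""
    # Iterating a dict yields its keys in insertion order.
    return [(name, query) for name, query in product(query_builders, text_queries)]


# Stand-in callables (hypothetical); real builders return a Weaviate query object.
builders = {
    "default": lambda *args: None,
    "away": lambda *args: None,
    "hybrid": lambda *args: None,
}
runs = enumerate_runs(builders, ["Hydrogen", "USA"])
print(len(runs))
print(runs[:2])
```

With three builders and two text queries this gives six pairs, which is the number of rows in the results table of this example.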

In [6]:
def default_query_builder(
    client: weaviate.Client,
    class_name: str,
    property_names: list[str],
    text_query: str,
):
    near_text_search_operator = {"concepts": [text_query]}
    return client.query.get(class_name, property_names).with_additional(["id"]).with_near_text(near_text_search_operator).with_limit(limit=3)


def away_query_builder(
    client: weaviate.Client,
    class_name: str,
    property_names: list[str],
    text_query: str,
):
    near_text_search_operator = {
      "concepts": [text_query],
      "distance": 0.6,
      "moveAwayFrom": {
        "concepts": ["where geography "],
        "force": 0.45
      },
    }
    return client.query.get(class_name, property_names).with_additional(["id"]).with_near_text(near_text_search_operator).with_limit(limit=3)

def hybrid_query_builder(
    client: weaviate.Client,
    class_name: str,
    property_names: list[str],
    text_query: str,
):
    hybrid_kwargs = {"query": text_query, "properties": property_names, "vector": None}
    return client.query.get(class_name, property_names).with_additional(["id"]).with_hybrid(**hybrid_kwargs).with_limit(limit=3)


# A dictionary of the name of query builder and the corresponding callable function.
query_builders = {"default": default_query_builder, "away": away_query_builder, "hybrid": hybrid_query_builder}
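
Hybrid search has to merge a keyword ranking and a vector ranking into one list. The sketch below illustrates one common approach, a weighted reciprocal-rank sum. The function `fuse_rankings` and its parameters `alpha` and `k` are hypothetical and chosen for illustration; this is not claimed to match Weaviate's internal scoring.

```python
from typing import Dict, List


def fuse_rankings(
    keyword_ranked: List[str],
    vector_ranked: List[str],
    alpha: float = 0.5,
    k: int = 60,
) -> List[str]:
    """Combine two ranked ID lists using weighted reciprocal-rank scores."""
    scores: Dict[str, float] = {}
    # The keyword ranking contributes with weight (1 - alpha).
    for rank, doc_id in enumerate(keyword_ranked):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank + 1)
    # The vector ranking contributes with weight alpha.
    for rank, doc_id in enumerate(vector_ranked):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank + 1)
    # Higher fused score first; ties are broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


print(fuse_rankings(["7", "8", "6"], ["8", "7", "1"]))
print(fuse_rankings(["7", "8", "6"], ["8", "7", "1"], alpha=1.0))
```

Setting `alpha` to 1.0 makes the vector ranking decide the order, while 0.0 leaves it to the keyword ranking.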

We will define our queries and run the experiment!

In [7]:
from prompttools.experiment import WeaviateExperiment

property_names = ["category", "question", "answer"]  # Specify what property you would like to query
text_queries = ["Hydrogen", "USA"]  # Your text queries that will be passed in to query builder to create query


experiment = WeaviateExperiment(client=client,
                                class_name=class_name,
                                use_existing_data=True,
                                property_names=property_names,
                                text_queries=text_queries,
                                query_builders=query_builders)
experiment.run()



As you can see from the results, the "away" query tries to stay away from geography topics in its responses (unless they are highly relevant), whereas the default query is agnostic.

In [8]:
experiment.visualize()

Unnamed: 0,text_query,query_builder_name,top objs,latency
0,Hydrogen,default,"[{'_additional': {'id': '5a12d2bb-4eea-408d-b80c-bc730ea6f518'}, 'answer': '2', 'category': 'science', 'question': 'How many hydrogens are in H2O?'}, {'_additional': {'id': 'de282efa-f441-4dcb-a2bb-df8233664dc1'}, 'answer': 'Alaska', 'category': 'geography', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '5a691b7a-9d9f-401b-a2c9-208a84faed17'}, 'answer': 'Asia', 'category': 'geography', 'question': 'Which continent is India in?'}]",0.004654
1,USA,default,"[{'_additional': {'id': 'de282efa-f441-4dcb-a2bb-df8233664dc1'}, 'answer': 'Alaska', 'category': 'geography', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '1f21ccab-185a-417e-806f-2044077b22a7'}, 'answer': 'California', 'category': 'geography', 'question': 'Which is the most populous state in the USA?'}, {'_additional': {'id': '020c271d-ba75-4c27-a0ba-263df7e3d0d9'}, 'answer': 'North America', 'category': 'geography', 'question': 'Which continent is USA in?'}]",0.003442
2,Hydrogen,away,"[{'_additional': {'id': '5a12d2bb-4eea-408d-b80c-bc730ea6f518'}, 'answer': '2', 'category': 'science', 'question': 'How many hydrogens are in H2O?'}]",0.003543
3,USA,away,"[{'_additional': {'id': 'de282efa-f441-4dcb-a2bb-df8233664dc1'}, 'answer': 'Alaska', 'category': 'geography', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '1f21ccab-185a-417e-806f-2044077b22a7'}, 'answer': 'California', 'category': 'geography', 'question': 'Which is the most populous state in the USA?'}, {'_additional': {'id': '020c271d-ba75-4c27-a0ba-263df7e3d0d9'}, 'answer': 'North America', 'category': 'geography', 'question': 'Which continent is USA in?'}]",0.003603
4,Hydrogen,hybrid,"[{'_additional': {'id': '5a12d2bb-4eea-408d-b80c-bc730ea6f518'}, 'answer': '2', 'category': 'science', 'question': 'How many hydrogens are in H2O?'}, {'_additional': {'id': 'de282efa-f441-4dcb-a2bb-df8233664dc1'}, 'answer': 'Alaska', 'category': 'geography', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '5a691b7a-9d9f-401b-a2c9-208a84faed17'}, 'answer': 'Asia', 'category': 'geography', 'question': 'Which continent is India in?'}]",0.003401
5,USA,hybrid,"[{'_additional': {'id': 'de282efa-f441-4dcb-a2bb-df8233664dc1'}, 'answer': 'Alaska', 'category': 'geography', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '1f21ccab-185a-417e-806f-2044077b22a7'}, 'answer': 'California', 'category': 'geography', 'question': 'Which is the most populous state in the USA?'}, {'_additional': {'id': '020c271d-ba75-4c27-a0ba-263df7e3d0d9'}, 'answer': 'North America', 'category': 'geography', 'question': 'Which continent is USA in?'}]",0.004394


### Adding data into Weaviate during the experiment

If you want to test different vectorizers (embedding functions), you may want to add data into Weaviate during the experiment.

In the example below, we insert the same data as in the previous section.

Define each `vectorizer` and its corresponding `moduleConfig` as a tuple in a list. The example here has only one tuple, but you can add more to compare multiple vectorizers.

In [9]:
vectorizers_and_moduleConfigs = [
    ("text2vec-contextionary", {  # This runs on CPU; other modules such as `text2vec-huggingface` or `text2vec-openai` can be swapped in
        "text2vec-contextionary": {
          "vectorizeClassName": "false"
        }
    }),
]

We need to define the properties for the data that we are about to insert. You can customize how and whether each property is vectorized if you wish.

In [10]:
property_definitions = [
    {
        'name': 'category',
        'dataType': ['text'],
    },
    {
        'name': 'question',
        'dataType': ['text'],
    },
    {
        'name': 'answer',
        'dataType': ['text'],
    },
]
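
As a sketch of that customization, a property can carry its own `moduleConfig`. The `skip` and `vectorizePropertyName` keys below are property-level settings of Weaviate's text2vec modules. The variable name `custom_property_definitions` is hypothetical, and whether `WeaviateExperiment` passes these settings through unchanged is an assumption worth verifying.

```python
# Hypothetical variant of `property_definitions` with per-property vectorizer settings.
custom_property_definitions = [
    {
        "name": "category",
        "dataType": ["text"],
        "moduleConfig": {
            "text2vec-contextionary": {
                "skip": True,  # leave this property out of the object's vector
            }
        },
    },
    {
        "name": "question",
        "dataType": ["text"],
        "moduleConfig": {
            "text2vec-contextionary": {
                "vectorizePropertyName": False,  # embed the value only, not the property name
            }
        },
    },
    {"name": "answer", "dataType": ["text"]},
]
```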


By default, Weaviate uses cosine as the distance function. We can compare it with other metrics, such as squared L2.

In [11]:
distance_metrics = ["cosine", "l2-squared"]
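
It helps to know when these two metrics can disagree at all. For unit-length vectors, squared L2 distance equals exactly twice the cosine distance, so both produce the same ranking. The sketch below checks this on made-up vectors; whether the vectors produced by a given vectorizer are unit-length is an assumption you would need to verify.

```python
import math
from typing import List


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_squared(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Made-up example vectors, normalized to unit length.
query = normalize([1.0, 2.0, 0.5])
docs = {
    "a": normalize([1.0, 1.9, 0.4]),
    "b": normalize([0.1, 0.2, 3.0]),
    "c": normalize([2.0, 0.1, 0.1]),
}

by_cosine = sorted(docs, key=lambda name: cosine_distance(query, docs[name]))
by_l2 = sorted(docs, key=lambda name: l2_squared(query, docs[name]))
print(by_cosine, by_l2)
```

If the vectors are not normalized, the two rankings can differ, which is when comparing distance metrics becomes more informative.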

In [12]:
class_name = "Question"

# Clearing existing class object
_clear_existing_class(class_name)

In [13]:
property_names = ["question", "answer"]  # Specify what property you would like to query
text_queries = ["continent Asia", "hydrogen"]  # Your text queries that will be passed in to query builder to create query


experiment = WeaviateExperiment(client=client,
                                class_name=class_name,
                                use_existing_data=False,
                                property_names=property_names,
                                text_queries=text_queries,
                                vectorizers_and_moduleConfigs=vectorizers_and_moduleConfigs,
                                property_definitions=property_definitions,
                                data_objects=data,
                                distance_metrics=distance_metrics,
                               )
experiment.run()

In this particular case, both distance functions return the same results. In this run, the L2-squared queries were slightly faster than the cosine ones (see the latency column below), although differences this small may not be significant.

In [14]:
experiment.visualize()

Unnamed: 0,text_query,vectorIndexConfig,top objs,latency
0,continent Asia,{'vectorIndexConfig': {'distance': 'cosine'}},"[{'answer': 'Asia', 'question': 'Which continent is China in?'}, {'answer': 'Asia', 'question': 'Which continent is India in?'}, {'answer': 'North America', 'question': 'Which continent is USA in?'}]",0.004236
1,hydrogen,{'vectorIndexConfig': {'distance': 'cosine'}},"[{'answer': '2', 'question': 'How many hydrogens are in H2O?'}, {'answer': 'Alaska', 'question': 'Which is the largest state in the USA?'}, {'answer': 'Asia', 'question': 'Which continent is India in?'}]",0.004716
2,continent Asia,{'vectorIndexConfig': {'distance': 'l2-squared'}},"[{'answer': 'Asia', 'question': 'Which continent is China in?'}, {'answer': 'Asia', 'question': 'Which continent is India in?'}, {'answer': 'North America', 'question': 'Which continent is USA in?'}]",0.002707
3,hydrogen,{'vectorIndexConfig': {'distance': 'l2-squared'}},"[{'answer': '2', 'question': 'How many hydrogens are in H2O?'}, {'answer': 'Alaska', 'question': 'Which is the largest state in the USA?'}, {'answer': 'Asia', 'question': 'Which continent is India in?'}]",0.002025


## Evaluate the model response

We will circle back to the first example to demonstrate how we can evaluate the outputs from Weaviate.

We will compute the "ranking correlation" between the expected ranking and actual ranking returned from the queries.

Note that this is not the only way to evaluate a vector database; other examples showcase how you can evaluate with LLMs.

In [15]:
_clear_existing_class(class_name)
_insert_data(data, client)

property_names = ["doc_id", "category", "question", "answer"]  # Specify what property you would like to query
text_queries = ["Hydrogen", "USA"]  # Your text queries that will be passed in to query builder to create query


experiment = WeaviateExperiment(client=client,
                                class_name=class_name,
                                use_existing_data=True,
                                property_names=property_names,
                                text_queries=text_queries,
                                query_builders=query_builders)
experiment.run()
experiment.visualize()

Inserting data into Weaviate before the experiment...


Unnamed: 0,text_query,query_builder_name,top objs,latency
0,Hydrogen,default,"[{'_additional': {'id': 'd64dc836-3793-47d9-b5cd-157576e9efe8'}, 'answer': '2', 'category': 'science', 'doc_id': '1', 'question': 'How many hydrogens are in H2O?'}, {'_additional': {'id': '1f9bfa43-0b3a-47c0-85f5-12ee6e859617'}, 'answer': 'Alaska', 'category': 'geography', 'doc_id': '7', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '919798e4-7f06-4f2e-830b-8c39c0e3cc93'}, 'answer': 'Asia', 'category': 'geography', 'doc_id': '4', 'question': 'Which continent is India in?'}]",0.002544
1,USA,default,"[{'_additional': {'id': '1f9bfa43-0b3a-47c0-85f5-12ee6e859617'}, 'answer': 'Alaska', 'category': 'geography', 'doc_id': '7', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '9b6af54e-6d62-4674-8c08-22a5bb8e5904'}, 'answer': 'California', 'category': 'geography', 'doc_id': '8', 'question': 'Which is the most populous state in the USA?'}, {'_additional': {'id': '4a417618-cedd-422f-bdeb-fe0aadc8855d'}, 'answer': 'North America', 'category': 'geography', 'doc_id': '6', 'question': 'Which continent is USA in?'}]",0.002263
2,Hydrogen,away,"[{'_additional': {'id': 'd64dc836-3793-47d9-b5cd-157576e9efe8'}, 'answer': '2', 'category': 'science', 'doc_id': '1', 'question': 'How many hydrogens are in H2O?'}]",0.003149
3,USA,away,"[{'_additional': {'id': '1f9bfa43-0b3a-47c0-85f5-12ee6e859617'}, 'answer': 'Alaska', 'category': 'geography', 'doc_id': '7', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '9b6af54e-6d62-4674-8c08-22a5bb8e5904'}, 'answer': 'California', 'category': 'geography', 'doc_id': '8', 'question': 'Which is the most populous state in the USA?'}, {'_additional': {'id': '4a417618-cedd-422f-bdeb-fe0aadc8855d'}, 'answer': 'North America', 'category': 'geography', 'doc_id': '6', 'question': 'Which continent is USA in?'}]",0.002926
4,Hydrogen,hybrid,"[{'_additional': {'id': 'd64dc836-3793-47d9-b5cd-157576e9efe8'}, 'answer': '2', 'category': 'science', 'doc_id': '1', 'question': 'How many hydrogens are in H2O?'}, {'_additional': {'id': '1f9bfa43-0b3a-47c0-85f5-12ee6e859617'}, 'answer': 'Alaska', 'category': 'geography', 'doc_id': '7', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '919798e4-7f06-4f2e-830b-8c39c0e3cc93'}, 'answer': 'Asia', 'category': 'geography', 'doc_id': '4', 'question': 'Which continent is India in?'}]",0.002252
5,USA,hybrid,"[{'_additional': {'id': '1f9bfa43-0b3a-47c0-85f5-12ee6e859617'}, 'answer': 'Alaska', 'category': 'geography', 'doc_id': '7', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '9b6af54e-6d62-4674-8c08-22a5bb8e5904'}, 'answer': 'California', 'category': 'geography', 'doc_id': '8', 'question': 'Which is the most populous state in the USA?'}, {'_additional': {'id': '4a417618-cedd-422f-bdeb-fe0aadc8855d'}, 'answer': 'North America', 'category': 'geography', 'doc_id': '6', 'question': 'Which continent is USA in?'}]",0.002231


We can write a custom function to extract the top document IDs as an independent column.

In [16]:
def get_top_doc_ids(row):
    return [d["doc_id"] for d in row["top objs"]]

experiment.evaluate("top doc ids", get_top_doc_ids)
experiment.visualize()

Unnamed: 0,text_query,query_builder_name,top objs,latency,top doc ids
0,Hydrogen,default,"[{'_additional': {'id': 'd64dc836-3793-47d9-b5cd-157576e9efe8'}, 'answer': '2', 'category': 'science', 'doc_id': '1', 'question': 'How many hydrogens are in H2O?'}, {'_additional': {'id': '1f9bfa43-0b3a-47c0-85f5-12ee6e859617'}, 'answer': 'Alaska', 'category': 'geography', 'doc_id': '7', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '919798e4-7f06-4f2e-830b-8c39c0e3cc93'}, 'answer': 'Asia', 'category': 'geography', 'doc_id': '4', 'question': 'Which continent is India in?'}]",0.002544,"[1, 7, 4]"
1,USA,default,"[{'_additional': {'id': '1f9bfa43-0b3a-47c0-85f5-12ee6e859617'}, 'answer': 'Alaska', 'category': 'geography', 'doc_id': '7', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '9b6af54e-6d62-4674-8c08-22a5bb8e5904'}, 'answer': 'California', 'category': 'geography', 'doc_id': '8', 'question': 'Which is the most populous state in the USA?'}, {'_additional': {'id': '4a417618-cedd-422f-bdeb-fe0aadc8855d'}, 'answer': 'North America', 'category': 'geography', 'doc_id': '6', 'question': 'Which continent is USA in?'}]",0.002263,"[7, 8, 6]"
2,Hydrogen,away,"[{'_additional': {'id': 'd64dc836-3793-47d9-b5cd-157576e9efe8'}, 'answer': '2', 'category': 'science', 'doc_id': '1', 'question': 'How many hydrogens are in H2O?'}]",0.003149,[1]
3,USA,away,"[{'_additional': {'id': '1f9bfa43-0b3a-47c0-85f5-12ee6e859617'}, 'answer': 'Alaska', 'category': 'geography', 'doc_id': '7', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '9b6af54e-6d62-4674-8c08-22a5bb8e5904'}, 'answer': 'California', 'category': 'geography', 'doc_id': '8', 'question': 'Which is the most populous state in the USA?'}, {'_additional': {'id': '4a417618-cedd-422f-bdeb-fe0aadc8855d'}, 'answer': 'North America', 'category': 'geography', 'doc_id': '6', 'question': 'Which continent is USA in?'}]",0.002926,"[7, 8, 6]"
4,Hydrogen,hybrid,"[{'_additional': {'id': 'd64dc836-3793-47d9-b5cd-157576e9efe8'}, 'answer': '2', 'category': 'science', 'doc_id': '1', 'question': 'How many hydrogens are in H2O?'}, {'_additional': {'id': '1f9bfa43-0b3a-47c0-85f5-12ee6e859617'}, 'answer': 'Alaska', 'category': 'geography', 'doc_id': '7', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '919798e4-7f06-4f2e-830b-8c39c0e3cc93'}, 'answer': 'Asia', 'category': 'geography', 'doc_id': '4', 'question': 'Which continent is India in?'}]",0.002252,"[1, 7, 4]"
5,USA,hybrid,"[{'_additional': {'id': '1f9bfa43-0b3a-47c0-85f5-12ee6e859617'}, 'answer': 'Alaska', 'category': 'geography', 'doc_id': '7', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '9b6af54e-6d62-4674-8c08-22a5bb8e5904'}, 'answer': 'California', 'category': 'geography', 'doc_id': '8', 'question': 'Which is the most populous state in the USA?'}, {'_additional': {'id': '4a417618-cedd-422f-bdeb-fe0aadc8855d'}, 'answer': 'North America', 'category': 'geography', 'doc_id': '6', 'question': 'Which continent is USA in?'}]",0.002231,"[7, 8, 6]"
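
Other row-wise metrics can follow the same pattern as `get_top_doc_ids`. The sketch below computes the share of returned objects in the "geography" category. The function name `geography_share` is hypothetical, and a plain dict stands in for a result row so the snippet runs on its own.

```python
from typing import Any, Mapping


def geography_share(row: Mapping[str, Any]) -> float:
    """Fraction of the row's top objects whose category is 'geography'."""
    top_objs = row["top objs"]
    if not top_objs:
        return 0.0
    hits = sum(1 for obj in top_objs if obj.get("category") == "geography")
    return hits / len(top_objs)


# A plain dict stands in for one result row here.
example_row = {
    "top objs": [
        {"category": "science", "doc_id": "1"},
        {"category": "geography", "doc_id": "7"},
        {"category": "geography", "doc_id": "4"},
    ]
}
print(geography_share(example_row))
```

It could then be registered in the same way as above, for example `experiment.evaluate("geography share", geography_share)`.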


Then, we can compute the ranking correlation. A positive correlation means the actual ranking tends to agree with the expected one (1.0 is an exact match), while a negative correlation means it tends to disagree.

In [17]:
from prompttools.utils import ranking_correlation

EXPECTED_RANKING_LIST = [
    ["1", "0", "0"],  # Only doc 1 is expected; padded with "0" to match the three results returned
    ["7", "8", "6"],
    ["1"],
    ["7", "8", "6"],
    ["1", "0", "0"],  # Padded in the same way
    ["7", "8", "6"],
]

experiment.evaluate("ranking_correlation", ranking_correlation, expected_ranking=EXPECTED_RANKING_LIST)
experiment.visualize()

Unnamed: 0,text_query,query_builder_name,top objs,latency,top doc ids,ranking_correlation
0,Hydrogen,default,"[{'_additional': {'id': 'd64dc836-3793-47d9-b5cd-157576e9efe8'}, 'answer': '2', 'category': 'science', 'doc_id': '1', 'question': 'How many hydrogens are in H2O?'}, {'_additional': {'id': '1f9bfa43-0b3a-47c0-85f5-12ee6e859617'}, 'answer': 'Alaska', 'category': 'geography', 'doc_id': '7', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '919798e4-7f06-4f2e-830b-8c39c0e3cc93'}, 'answer': 'Asia', 'category': 'geography', 'doc_id': '4', 'question': 'Which continent is India in?'}]",0.002544,"[1, 7, 4]",-0.866025
1,USA,default,"[{'_additional': {'id': '1f9bfa43-0b3a-47c0-85f5-12ee6e859617'}, 'answer': 'Alaska', 'category': 'geography', 'doc_id': '7', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '9b6af54e-6d62-4674-8c08-22a5bb8e5904'}, 'answer': 'California', 'category': 'geography', 'doc_id': '8', 'question': 'Which is the most populous state in the USA?'}, {'_additional': {'id': '4a417618-cedd-422f-bdeb-fe0aadc8855d'}, 'answer': 'North America', 'category': 'geography', 'doc_id': '6', 'question': 'Which continent is USA in?'}]",0.002263,"[7, 8, 6]",1.0
2,Hydrogen,away,"[{'_additional': {'id': 'd64dc836-3793-47d9-b5cd-157576e9efe8'}, 'answer': '2', 'category': 'science', 'doc_id': '1', 'question': 'How many hydrogens are in H2O?'}]",0.003149,[1],1.0
3,USA,away,"[{'_additional': {'id': '1f9bfa43-0b3a-47c0-85f5-12ee6e859617'}, 'answer': 'Alaska', 'category': 'geography', 'doc_id': '7', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '9b6af54e-6d62-4674-8c08-22a5bb8e5904'}, 'answer': 'California', 'category': 'geography', 'doc_id': '8', 'question': 'Which is the most populous state in the USA?'}, {'_additional': {'id': '4a417618-cedd-422f-bdeb-fe0aadc8855d'}, 'answer': 'North America', 'category': 'geography', 'doc_id': '6', 'question': 'Which continent is USA in?'}]",0.002926,"[7, 8, 6]",1.0
4,Hydrogen,hybrid,"[{'_additional': {'id': 'd64dc836-3793-47d9-b5cd-157576e9efe8'}, 'answer': '2', 'category': 'science', 'doc_id': '1', 'question': 'How many hydrogens are in H2O?'}, {'_additional': {'id': '1f9bfa43-0b3a-47c0-85f5-12ee6e859617'}, 'answer': 'Alaska', 'category': 'geography', 'doc_id': '7', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '919798e4-7f06-4f2e-830b-8c39c0e3cc93'}, 'answer': 'Asia', 'category': 'geography', 'doc_id': '4', 'question': 'Which continent is India in?'}]",0.002252,"[1, 7, 4]",-0.866025
5,USA,hybrid,"[{'_additional': {'id': '1f9bfa43-0b3a-47c0-85f5-12ee6e859617'}, 'answer': 'Alaska', 'category': 'geography', 'doc_id': '7', 'question': 'Which is the largest state in the USA?'}, {'_additional': {'id': '9b6af54e-6d62-4674-8c08-22a5bb8e5904'}, 'answer': 'California', 'category': 'geography', 'doc_id': '8', 'question': 'Which is the most populous state in the USA?'}, {'_additional': {'id': '4a417618-cedd-422f-bdeb-fe0aadc8855d'}, 'answer': 'North America', 'category': 'geography', 'doc_id': '6', 'question': 'Which continent is USA in?'}]",0.002231,"[7, 8, 6]",1.0
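
The -0.866025 values in the padded rows may look surprising. The standard-library sketch below computes a Spearman rank correlation with averaged ranks for ties, and it reproduces that number for an expected ranking of `[1, 0, 0]` against an actual ranking of `[1, 7, 4]`. That `ranking_correlation` uses Spearman's coefficient is an assumption based on this match, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for idx in order[i : j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def spearman(expected: Sequence[float], actual: Sequence[float]) -> float:
    """Pearson correlation of the two rank vectors (needs at least two distinct values each)."""
    rx, ry = average_ranks(expected), average_ranks(actual)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


print(round(spearman([1, 0, 0], [1, 7, 4]), 6))
print(round(spearman([7, 8, 6], [7, 8, 6]), 6))
```

The padding value "0" sorts below every real ID, so the padded rows score negatively even though the top result is correct. A single-element ranking has no variance, so this sketch does not cover the `["1"]` rows.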
