KeyError: 'file_name' when running synthetic data generation example #444

@dbolotov

Describe the bug
When running the synthetic data generation example from https://docs.ragas.io/en/latest/getstarted/testset_generation.html#get-started-testset-generation, I get the error:

KeyError: 'file_name'

on the line:

testset = testsetgenerator.generate(documents, test_size=test_size)

Ragas version: 0.0.23.dev37+g041b20c
Python version: 3.11

Code to Reproduce

import os
os.environ["OPENAI_API_KEY"] = "KEY GOES HERE"

from llama_index import download_loader

SemanticScholarReader = download_loader("SemanticScholarReader")
loader = SemanticScholarReader()
# Narrow down the search space
query_space = "large language models"
# Increase the limit to obtain more documents
documents = loader.load_data(query=query_space, limit=10)


from ragas.testset import TestsetGenerator

testsetgenerator = TestsetGenerator.from_default()
test_size = 10
testset = testsetgenerator.generate(documents, test_size=test_size)

Error trace

testset = testsetgenerator.generate(documents, test_size=test_size)
  0%|          | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\DMBO\AppData\Local\Anaconda3\envs\docgpt_ragas_py311\Lib\site-packages\IPython\core\interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-62b08db69f64>", line 1, in <module>
    testset = testsetgenerator.generate(documents, test_size=test_size)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\DMBO\AppData\Local\Anaconda3\envs\docgpt_ragas_py311\Lib\site-packages\ragas\testset\testset_generator.py", line 427, in generate
    neighbor_nodes = doc_nodes_map[curr_node.metadata["file_name"]]
                                   ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'file_name'

Expected behavior
I expected the test set to be generated without error.
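
Possible cause and workaround (a sketch, not a confirmed fix): the trace shows `generate` indexing `curr_node.metadata["file_name"]`, so any document whose metadata has no `file_name` key would raise this error. Assuming the documents returned by `SemanticScholarReader` carry no such key and expose a mutable `metadata` dict, filling in a placeholder before calling `generate` may avoid the exception. The `FakeDocument` class and the `ensure_file_name` helper below are hypothetical stand-ins for illustration, not part of ragas or llama_index.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a loaded document; only `metadata` matters here."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def ensure_file_name(documents: List[FakeDocument], prefix: str = "doc") -> List[FakeDocument]:
    """Give every document a `file_name` metadata entry if it lacks one."""
    for i, doc in enumerate(documents):
        # setdefault leaves an existing file_name untouched
        doc.metadata.setdefault("file_name", f"{prefix}_{i}")
    return documents


docs = [
    # Assumed shape: metadata without file_name, as the trace suggests
    FakeDocument("paper A", {"title": "A"}),
    FakeDocument("paper B", {"file_name": "b.pdf"}),
]

# Reproduce the same failure mode as the lookup in the trace
try:
    docs[0].metadata["file_name"]
except KeyError as exc:
    reproduced = repr(exc)

# Apply the workaround; the same lookup now succeeds
ensure_file_name(docs)
print(reproduced, [d.metadata["file_name"] for d in docs])
```

With real documents, the same loop would run over `documents` before the `testsetgenerator.generate(...)` call. Each placeholder is unique per document, so under this assumption every document would land in its own group rather than sharing neighbors.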
