Unable to create Test data with TestsetGenerator #2274

@Dimpus

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
Unable to create test data with TestsetGenerator.
Internally, it passes themes as List[Tuple] where List[str] is expected, so Pydantic validation fails.

Ragas version: 0.3.4
Python version: 3.12.2

Code to Reproduce

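# Imports were omitted in the original report; these are the assumed sources.
from langchain_community.document_loaders import DirectoryLoader, UnstructuredWordDocumentLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.testset import TestsetGenerator

def test_dataCreation():
    # Wrap the LangChain LLM and embeddings so Ragas can use them
    llm = ChatOpenAI(model_name="gpt-4", temperature=0)
    langchain_llm = LangchainLLMWrapper(llm)
    embed = OpenAIEmbeddings()
    generate_embed = LangchainEmbeddingsWrapper(embed)

    # Load every .docx file under the given directory
    loader = DirectoryLoader(
        path="[**provided a valid path**]",
        glob="**/*.docx",
        loader_cls=UnstructuredWordDocumentLoader,
    )
    docs = loader.load()

    # This call raises the ValidationError shown below
    generator = TestsetGenerator(llm=langchain_llm, embedding_model=generate_embed)
    dataset = generator.generate_with_langchain_docs(docs, testset_size=5)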

Error trace

E                   pydantic_core._pydantic_core.ValidationError: 6 validation errors for ThemesPersonasInput
E                   themes.0
E                     Input should be a valid string [type=string_type, input_value=('Rest API Testing', 'REST API'), input_type=tuple]
E                       For further information visit https://errors.pydantic.dev/2.11/v/string_type
E                   themes.1
E                     Input should be a valid string [type=string_type, input_value=('postman', 'Postman'), input_type=tuple]
E                       For further information visit https://errors.pydantic.dev/2.11/v/string_type
E                   themes.2
E                     Input should be a valid string [type=string_type, input_value=('REST Assured API', 'REST Assured API'), input_type=tuple]
E                       For further information visit https://errors.pydantic.dev/2.11/v/string_type
E                   themes.3
E                     Input should be a valid string [type=string_type, input_value=('RestAssured', 'REST Assured API'), input_type=tuple]
E                       For further information visit https://errors.pydantic.dev/2.11/v/string_type
E                   themes.4
E                     Input should be a valid string [type=string_type, input_value=('POSTMAN', 'Postman'), input_type=tuple]
E                       For further information visit https://errors.pydantic.dev/2.11/v/string_type
E                   themes.5
E                     Input should be a valid string [type=string_type, input_value=('RESTAPI', 'REST API'), input_type=tuple]
E                       For further information visit https://errors.pydantic.dev/2.11/v/string_type

/opt/anaconda3/lib/python3.12/site-packages/ragas/testset/synthesizers/multi_hop/specific.py:95: ValidationError

Expected behavior
TestsetGenerator should pass themes to ThemesPersonasInput as a list of strings so that validation succeeds. Instead, the model receives a list of tuples.

Additional context
Is there a workaround? Should I try a different version of Ragas?
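
To make the mismatch concrete, here is a minimal sketch that uses the six theme values from the error trace. It mimics the List[str] check with a plain isinstance test rather than importing Pydantic, and it shows one possible way to flatten the tuples into unique strings. The helper names `find_invalid_themes` and `flatten_themes` are hypothetical and are not part of Ragas; where such a normalization would belong inside Ragas is not confirmed by this report.

```python
from typing import Iterable, List, Tuple, Union

Theme = Union[str, Tuple[str, ...]]

# Theme values copied from the error trace above.
raw_themes: List[Theme] = [
    ("Rest API Testing", "REST API"),
    ("postman", "Postman"),
    ("REST Assured API", "REST Assured API"),
    ("RestAssured", "REST Assured API"),
    ("POSTMAN", "Postman"),
    ("RESTAPI", "REST API"),
]


def find_invalid_themes(themes: Iterable[Theme]) -> List[int]:
    """Return the indices of entries that are not plain strings.

    Hypothetical helper: a stand-in for the List[str] validation,
    not the actual Pydantic or Ragas logic.
    """
    return [i for i, theme in enumerate(themes) if not isinstance(theme, str)]


def flatten_themes(themes: Iterable[Theme]) -> List[str]:
    """Flatten strings and tuples of strings into unique strings, keeping order.

    Hypothetical helper: one possible normalization, assumed for illustration.
    """
    flat: List[str] = []
    for theme in themes:
        parts = (theme,) if isinstance(theme, str) else theme
        for part in parts:
            if part not in flat:
                flat.append(part)
    return flat


invalid_indices = find_invalid_themes(raw_themes)
flat_themes = flatten_themes(raw_themes)

print(invalid_indices)
print(flat_themes)
```

Every entry in `raw_themes` is a tuple, which matches the six `themes.N` errors in the trace. After flattening, each element is a plain string, which is the shape a List[str] field accepts.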

Labels: bug (Something isn't working), module-testsetgen (Module testset generation)