[Bug]: Extracting entities with the nltk strategy fails with error: "Column(s) ['description', 'source_id'] do not exist" #1601

@HENScience

Description

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

When I set the entity extraction strategy to nltk, the following error occurs during index creation:
KeyError: "Column(s) ['description', 'source_id'] do not exist"
graphrag\index\operations\extract_entities\extract_entities.py", line 171, in _merge_entities
.agg(description=("description", list), text_unit_ids=("source_id", list))
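The failing line aggregates the `description` and `source_id` columns, so any entity frame that lacks them will trigger this pandas error. The report does not confirm what the nltk strategy emits. The sketch below assumes it produces entity rows with only a name and a type. It reproduces the KeyError with plain pandas and shows a hypothetical defensive variant that adds the missing columns before merging. The grouping keys `title`/`type`, the helper names `merge_entities`/`merge_entities_safe`, and the empty-string placeholders are illustrative assumptions, not GraphRAG's actual code.

```python
import pandas as pd

# ASSUMPTION: entity rows shaped like what the nltk strategy is suspected to
# produce here, i.e. only a name and a type, with no "description" or
# "source_id" columns. Not taken from GraphRAG's source.
nltk_like = pd.DataFrame(
    {"title": ["ACME", "Alice"], "type": ["organization", "person"]}
)


def merge_entities(entities: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for _merge_entities.

    The .agg(...) call copies the line shown in the traceback; the grouping
    keys ["title", "type"] are an assumption.
    """
    return (
        entities.groupby(["title", "type"], sort=False)
        .agg(description=("description", list), text_unit_ids=("source_id", list))
        .reset_index()
    )


def merge_entities_safe(entities: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical defensive variant: add any missing columns, then merge."""
    entities = entities.copy()
    for col in ("description", "source_id"):
        if col not in entities.columns:
            # Empty-string placeholder is an illustrative choice, not GraphRAG behavior.
            entities[col] = ""
    return merge_entities(entities)


try:
    merge_entities(nltk_like)
except KeyError as err:
    # pandas rejects named aggregations over columns that are not in the frame.
    print("KeyError:", err)

# Each merged entity now carries list-valued description/text_unit_ids columns.
print(merge_entities_safe(nltk_like))
```

Filling placeholders only silences the error. Downstream steps that expect real descriptions would still receive empty values, so the proper fix is for the strategy to emit the expected columns.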

Steps to reproduce

No response

Expected Behavior

No response

GraphRAG Config Used

entity_extraction:
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1
  strategy: 
    type: nltk

Logs and screenshots

No response

Additional Information

  • GraphRAG Version: v1.1.1
  • Operating System: Windows 11 Professional
  • Python Version: 3.10

Metadata

  • Assignees: No one assigned
  • Labels: bug (Something isn't working), triage (Default label assignment, indicates new issue needs reviewed by a maintainer)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests