[Bug]: Auto prompt tuning - ValueError: Single '}' encountered in format string #1912

@ashkan-software2

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the bug

Hello,

During auto prompt tuning, GraphRAG generates an extract_graph.txt prompt whose example knowledge graph output is invalid: it contains more } than {. Formatting the prompt then fails with a ValueError.

Steps to reproduce

  1. Initialize GraphRAG.
  2. Provide some paragraphs from this PDF as input: https://kpmg.com/kpmg-us/content/dam/kpmg/frv/pdf/2024/handbook-revenue-recognition-1224.pdf
  3. Run prompt tuning.

You will see this error:

Traceback (most recent call last):
  File ".../pypoetry/virtualenvs/service-vector-embedding-6NKDQ0ig-py3.11/lib/python3.11/site-packages/graphrag/index/operations/extract_graph/graph_extractor.py", line 127, in __call__
    result = await self._process_document(text, prompt_variables)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../pypoetry/virtualenvs/service-vector-embedding-6NKDQ0ig-py3.11/lib/python3.11/site-packages/graphrag/index/operations/extract_graph/graph_extractor.py", line 156, in _process_document
    self._extraction_prompt.format(**{
ValueError: Single '}' encountered in format string
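
The same exception can be reproduced outside GraphRAG with plain `str.format`. This is a minimal sketch, not GraphRAG's code: the `prompt` string is a shortened, hypothetical fragment modeled on the generated file, `try_format` is an illustrative helper, and `"<|>"` is an assumed delimiter value.

```python
# Hypothetical one-line prompt fragment with a stray "}" before the ")".
prompt = '("entity"{tuple_delimiter}HOSTING SERVICE FEES{tuple_delimiter}paid in advance})'


def try_format(template: str) -> str:
    """Format the template and return either the result or the error text."""
    try:
        # "<|>" is an assumed delimiter value, used only for illustration.
        return template.format(tuple_delimiter="<|>")
    except ValueError as exc:
        return f"ValueError: {exc}"


result = try_format(prompt)
print(result)  # ValueError: Single '}' encountered in format string
```

An unescaped `}` that does not close a replacement field is rejected by `str.format`, which matches the last line of the traceback.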

When I look at the generated extract_graph.txt, I can see the problem. In the excerpt below there are 15 { but 19 }; each description ends with a stray }, for example in "advance})".

extract_graph.txt

("entity"{tuple_delimiter}HOSTING SERVICE FEES{tuple_delimiter}cost types{tuple_delimiter}Fees for hosting services, charged at $100 per month, paid in advance})
{record_delimiter}
("entity"{tuple_delimiter}REMAINING TERM OF THE HOSTING ARRANGEMENT{tuple_delimiter}lease arrangements{tuple_delimiter}The duration left on the hosting arrangement from the go-live date, which is 5 years})
{record_delimiter}
("entity"{tuple_delimiter}GO-LIVE DATE{tuple_delimiter}implementation details{tuple_delimiter}The date when the cloud-based solution became operational, which is January 1, Year 3})
{record_delimiter}
("entity"{tuple_delimiter}CAPITALIZED IMPLEMENTATION COSTS – PAYROLL MODULE{tuple_delimiter}cost types{tuple_delimiter}The costs incurred to implement the payroll processing module, amounting to $400, which are capitalized})
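
To locate lines like these in a generated prompt, a per-line brace count is usually enough. The sketch below is a heuristic, not part of GraphRAG: `brace_report` is a hypothetical helper and `sample` is a shortened stand-in for the excerpt. It will also flag legitimate multi-line constructs whose braces open and close on different lines.

```python
def brace_report(text: str) -> list[tuple[int, int, int]]:
    """Return (line_number, open_count, close_count) for each unbalanced line."""
    report = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        n_open, n_close = line.count("{"), line.count("}")
        if n_open != n_close:
            report.append((line_number, n_open, n_close))
    return report


# Shortened, hypothetical sample modeled on the excerpt above.
sample = (
    '("entity"{tuple_delimiter}GO-LIVE DATE{tuple_delimiter}January 1, Year 3})\n'
    "{record_delimiter}\n"
)
unbalanced = brace_report(sample)
print(unbalanced)  # [(1, 2, 3)]
```

To check a real file, pass its contents instead of `sample`, for example the text read from prompts/extract_graph.txt.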

Expected Behavior

The generated extract_graph.txt should contain an equal number of { and } and be free of formatting errors.
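
Until the generated prompt is fixed at the source, one possible workaround is to escape stray braces before the file is used as a format string. This is a sketch under assumptions, not GraphRAG's behavior: `escape_stray_braces` is a hypothetical helper, placeholders are assumed to look like `{identifier}`, and the template is assumed to contain no intentionally escaped `{{` or `}}`, which this function would change.

```python
import re

# Assumption: valid placeholders are simple identifiers such as {tuple_delimiter}.
PLACEHOLDER = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*\}")


def _escape(literal: str) -> str:
    """Double every brace so str.format treats it as literal text."""
    return literal.replace("{", "{{").replace("}", "}}")


def escape_stray_braces(template: str) -> str:
    """Keep {identifier} placeholders and escape every other brace."""
    parts = []
    pos = 0
    for match in PLACEHOLDER.finditer(template):
        parts.append(_escape(template[pos:match.start()]))
        parts.append(match.group(0))
        pos = match.end()
    parts.append(_escape(template[pos:]))
    return "".join(parts)


broken = "fees{tuple_delimiter}paid in advance})"
fixed = escape_stray_braces(broken)
formatted = fixed.format(tuple_delimiter="<|>")
print(formatted)  # fees<|>paid in advance})
```

Escaping keeps the stray } in the rendered prompt text. Deleting it from the file by hand may be the better choice if the character is simply a generation mistake.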

GraphRAG Config Used

models:
  default_chat_model:
    type: openai_chat
    auth_type: api_key
    api_key: ${GRAPHRAG_API_KEY}
    model: gpt-4-turbo-preview
    model_supports_json: true
    concurrent_requests: 25
    async_mode: threaded
    retry_strategy: native
    max_retries: -1
    tokens_per_minute: 0
    requests_per_minute: 0
  default_embedding_model:
    type: openai_embedding
    auth_type: api_key
    api_key: ${GRAPHRAG_API_KEY}
    model: text-embedding-3-small
    model_supports_json: true
    concurrent_requests: 25
    async_mode: threaded
    retry_strategy: native
    max_retries: -1
    tokens_per_minute: 0
    requests_per_minute: 0
vector_store:
  default_vector_store:
    type: lancedb
    db_uri: output/lancedb
    container_name: default
    overwrite: true
embed_text:
  model_id: default_embedding_model
  vector_store_id: default_vector_store
input:
  type: file
  file_type: json
  base_dir: input
  text_column: page_content
  title_column: title
  metadata:
  - page
  - data_type
  - figures
chunks:
  size: 1200
  overlap: 100
  group_by_columns:
  - id
cache:
  type: file
  base_dir: cache
reporting:
  type: file
  base_dir: logs
output:
  type: file
  base_dir: output
extract_graph:
  model_id: default_chat_model
  prompt: prompts/extract_graph.txt
  entity_types:
  - organization
  - trademark
  - publication
  - standard
  max_gleanings: 1
summarize_descriptions:
  model_id: default_chat_model
  prompt: prompts/summarize_descriptions.txt
  max_length: 500
extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english
extract_claims:
  enabled: false
  model_id: default_chat_model
  prompt: prompts/extract_claims.txt
  description: Any claims or facts that could be relevant to information discovery.
  max_gleanings: 1
community_reports:
  model_id: default_chat_model
  graph_prompt: prompts/community_report_graph.txt
  text_prompt: prompts/community_report_text.txt
  max_length: 2000
  max_input_length: 8000
cluster_graph:
  max_cluster_size: 10
embed_graph:
  enabled: false
umap:
  enabled: false
snapshots:
  graphml: false
  embeddings: false
local_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: prompts/local_search_system_prompt.txt
global_search:
  chat_model_id: default_chat_model
  map_prompt: prompts/global_search_map_system_prompt.txt
  reduce_prompt: prompts/global_search_reduce_system_prompt.txt
  knowledge_prompt: prompts/global_search_knowledge_system_prompt.txt
drift_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: prompts/drift_search_system_prompt.txt
  reduce_prompt: prompts/drift_search_reduce_prompt.txt
basic_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: prompts/basic_search_system_prompt.txt

Logs and screenshots

Additional Information

  • GraphRAG Version: 2.1.0
  • Operating System: Linux
  • Python Version: 3.11.2
  • Related Issues:

Labels

  • autoresolved
  • awaiting_response: Maintainers or community have suggested solutions or requested info, awaiting filer response
  • bug: Something isn't working
  • stale: Used by auto-resolve bot to flag inactive issues
  • triage: Default label assignment, indicates new issue needs reviewed by a maintainer
