### Do you need to file an issue?
- I have searched the existing issues and this bug is not already filed.
- My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
- I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
### Describe the issue
It's a great project. I've run into a problem and would appreciate some help.
After running `graphrag init`, I configured the following completion and embedding models:
```
NAME                    ID              SIZE      MODIFIED
qwen3.5:cloud           a7bf6f7891c3    -         3 hours ago
qwen3-embedding:0.6b    ac6da0dfba84    639 MB    6 hours ago
```
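When a non-OpenAI embedding model is swapped in, one quick check is the width of the vectors it actually returns. A minimal stdlib sketch, assuming the local server exposes an OpenAI-compatible `POST /v1/embeddings` endpoint at the `api_base` used in the config; `fetch_embeddings` and `embedding_dimension` are hypothetical helpers, not GraphRAG APIs:

```python
import json
import os
import urllib.request

# Assumption: the same OpenAI-compatible endpoint and model as in the config.
API_BASE = "http://localhost:11434/v1"
MODEL = "qwen3-embedding:0.6b"


def fetch_embeddings(text: str, api_base: str = API_BASE, model: str = MODEL) -> dict:
    """POST one input to {api_base}/embeddings and return the parsed JSON body.

    Requires the local server to be running. The API key is sent as a Bearer
    token in case the server checks it.
    """
    payload = json.dumps({"model": model, "input": [text]}).encode("utf-8")
    request = urllib.request.Request(
        f"{api_base}/embeddings",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('GRAPHRAG_API_KEY', 'unused')}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def embedding_dimension(body: dict) -> int:
    """Return the vector width from an OpenAI-style embeddings response.

    Raises ValueError if the response holds no vectors or vectors of different lengths.
    """
    widths = {len(item["embedding"]) for item in body["data"]}
    if len(widths) != 1:
        raise ValueError(f"expected one embedding width, got {sorted(widths)}")
    return widths.pop()
```

With the server running, `print(embedding_dimension(fetch_embeddings("hello world")))` prints the vector width; comparing it with the dimension the vector store was created for is one way to narrow down the failure.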
Next, I converted my document to Markdown with `markitdown xxxx.docx -o input/report.md` and then ran:

```
graphrag index --verbose
```

It failed with the following output:
```
Starting workflow: create_community_reports
24 / 24 ..........................................................................................
Workflow complete: create_community_reports
WorkflowFunctionOutput(result= id human_readable_id community ... full_content_json period size
0 313abae31643604bbdae091eabbde0358ed3285425868a... 22 22 ... {\n "title": "Cloud Computing Platform and ... 2026-03-05 6
1 4baf1c686ca534cdbd588100f72312081c8604641a238f... 23 23 ... {\n "title": "Cloud Service Customer and Cl... 2026-03-05 5
2 90399b5b0eeff524152242ffba57f631d06e613e49205d... 8 8 ... {\n "title": "Access Control and Business A... 2026-03-05 4
3 851858884b531363dc6cb68287503988301a7e168d76f6... 9 9 ... {\n "title": "User Identity Authentication ... 2026-03-05 3
4 afa2a54f3124b1fe45804dccf12c3030b421e05d9dc400... 10 10 ... {\n "title": "Test Object and Data Leakage ... 2026-03-05 8
5 bc04d1054d54d3f81b802df2dc4dcfaf3e5dd8d70446c3... 12 12 ... {\n "title": "Cloud Service Customer and Co... 2026-03-05 11
6 9a7bacde60b499079b535fbd6959156304a67b14b826a5... 13 13 ... {\n "title": "Cloud Service Provider and Se... 2026-03-05 2
7 8a4ea1998da3eb55f4f8e254a4c8e3a49c3be2e8cc2d8e... 14 14 ... {\n "title": "Network Attack and Cloud Secu... 2026-03-05 4
8 a4eb34f7187e0391eaf6879357c0669facf8a37abe9da1... 15 15 ... {\n "title": "ATTACKER and ADMINISTRATOR Co... 2026-03-05 2
9 dc0b62f79106e536539cbb24ba3483c2a57d0a8c1a1eaf... 16 16 ... {\n "title": "XXX System Security Assessmen... 2026-03-05 6
10 74b77e558aab7035937e76826503bdbb81599e6a8661fc... 17 17 ... {\n "title": "Internal Network and Lateral ... 2026-03-05 2
11 555a214dc05fe3ff1822a2e5992e8aad5a22f33c9fece7... 19 19 ... {\n "title": "Big Data Application and Thir... 2026-03-05 2
12 25d56bb34d7c29e5f16fead576fa3b94a7cfffc05ee97c... 20 20 ... {\n "title": "Big Data Platform and Network... 2026-03-05 6
13 d3b9f5ee75d5b4887b5909162cf7d708e1306801536173... 21 21 ... {\n "title": "Level Assessment Project and ... 2026-03-05 4
14 ef803d0b763a46b49100e26345fcda83ab3ba1ec072a5b... 0 0 ... {\n "title": "Test Object Security Assessme... 2026-03-05 17
15 f1d586879d81036c634696479e6c1eb43ad83a7ff0ff36... 1 1 ... {\n "title": "Cloud Service Customer and Co... 2026-03-05 17
16 4488006bfdd32ec4fb0c4f542e2d514ebedd82902b4a57... 2 2 ... {\n "title": "Security Level Assessment and... 2026-03-05 5
17 67808773301a407e71796d18a1ec0a47000c0a65cac345... 3 3 ... {\n "title": "Assessment Agency and Single ... 2026-03-05 7
18 3aecdbb3428e06eda06062693e497b903e487c12393c33... 4 4 ... {\n "title": "Assessment Institution and Se... 2026-03-05 5
19 0b7c74a9c19b72021d63beb91779b3b52950852c9d2e36... 5 5 ... {\n "title": "XXX System Security Assessmen... 2026-03-05 13
20 ce95d2fea415f99cc1804bcf5d8ff130db3209b457490f... 6 6 ... {\n "title": "Big Data Platform Security As... 2026-03-05 12
21 42212a6744bd770346b291e710804a7681a4efde69c80d... 7 7 ... {\n "title": "Unauthorized Personnel and Id... 2026-03-05 2
[22 rows x 15 columns], stop=False)
Starting workflow: generate_text_embeddings
Pipeline error: The length of the values Array needs to be a multiple of the list_size..............
Pipeline complete
```
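The error text matches the validity check Apache Arrow applies when building a fixed-size list array (for example via `pyarrow.FixedSizeListArray.from_arrays(values, list_size)`), which is the Arrow type LanceDB uses for vector columns: the flattened values must divide evenly into rows of `list_size`. The sketch below reproduces that check in plain Python; `pack_fixed_size` is a hypothetical helper, and the 3072/1024 widths are assumed for illustration rather than read from this setup:

```python
from typing import List, Sequence


def pack_fixed_size(vectors: Sequence[Sequence[float]], list_size: int) -> List[List[float]]:
    """Flatten `vectors` and regroup the values into rows of exactly `list_size`.

    Mirrors the check behind Arrow's fixed-size list construction: the total
    number of flattened values must be a multiple of `list_size`.
    """
    if list_size <= 0:
        raise ValueError("list_size must be positive")
    flat = [value for vector in vectors for value in vector]
    if len(flat) % list_size != 0:
        raise ValueError(
            "The length of the values Array needs to be a multiple of the list_size"
        )
    return [flat[i : i + list_size] for i in range(0, len(flat), list_size)]


# Hypothetical widths: a store assumed to expect 3072-wide vectors receiving 1024-wide ones.
STORE_WIDTH = 3072
MODEL_WIDTH = 1024

# Three 1024-wide vectors happen to fill exactly one 3072-wide row: no error, but wrong rows.
assert len(pack_fixed_size([[0.0] * MODEL_WIDTH] * 3, STORE_WIDTH)) == 1

# Five 1024-wide vectors (5120 values) do not divide evenly by 3072, so the check fails.
try:
    pack_fixed_size([[0.0] * MODEL_WIDTH] * 5, STORE_WIDTH)
except ValueError as err:
    print(err)
```

One hypothesis consistent with this (not confirmed here) is a mismatch between the vector width the LanceDB store expects and the width returned by `qwen3-embedding:0.6b`; depending on how many rows are embedded, such a mismatch either fails this check or silently produces wrongly sized rows.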
### Steps to reproduce
_No response_
### GraphRAG Config Used
```yaml
### This config file contains required core defaults that must be set, along with a handful of common optional settings.
### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/

### LLM settings ###
## There are a number of settings to tune the threading and token limits for LLM calls - check the docs.

completion_models:
  default_completion_model:
    model_provider: openai
    model: qwen3.5:cloud
    api_base: http://localhost:11434/v1
    auth_method: api_key # or azure_managed_identity
    api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file, or remove if managed identity
    retry:
      type: exponential_backoff

embedding_models:
  default_embedding_model:
    model_provider: openai
    model: qwen3-embedding:0.6b
    api_base: http://localhost:11434/v1
    auth_method: api_key
    api_key: ${GRAPHRAG_API_KEY}
    retry:
      type: exponential_backoff

### Document processing settings ###

input:
  type: text # [csv, text, json, jsonl]

chunking:
  type: tokens
  size: 1200
  overlap: 100
  encoding_model: o200k_base

### Storage settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

input_storage:
  type: file # [file, blob, cosmosdb]
  base_dir: "input"

output_storage:
  type: file # [file, blob, cosmosdb]
  base_dir: "output"

reporting:
  type: file # [file, blob]
  base_dir: "logs"

cache:
  type: json # [json, memory, none]
  storage:
    type: file # [file, blob, cosmosdb]
    base_dir: "cache"

vector_store:
  type: lancedb
  db_uri: output/lancedb

### Workflow settings ###

embed_text:
  embedding_model_id: default_embedding_model

extract_graph:
  completion_model_id: default_completion_model
  prompt: "prompts/extract_graph.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  completion_model_id: default_completion_model
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english # [regex_english, syntactic_parser, cfg]

cluster_graph:
  max_cluster_size: 10

extract_claims:
  enabled: false
  completion_model_id: default_completion_model
  prompt: "prompts/extract_claims.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  completion_model_id: default_completion_model
  graph_prompt: "prompts/community_report_graph.txt"
  text_prompt: "prompts/community_report_text.txt"
  max_length: 2000
  max_input_length: 8000

snapshots:
  graphml: false
  embeddings: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  completion_model_id: default_completion_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  completion_model_id: default_completion_model
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  completion_model_id: default_completion_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/drift_search_system_prompt.txt"
  reduce_prompt: "prompts/drift_search_reduce_prompt.txt"

basic_search:
  completion_model_id: default_completion_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/basic_search_system_prompt.txt"
```
### Logs and screenshots
_No response_
### Additional Information
- GraphRAG Version:
- Operating System:
- Python Version:
- Related Issues: