
[Feature Request]: Enhance LLM usage logging in indexing workflows #2103

@june616

Description


Do you need to file an issue?

  • I have searched the existing issues and this feature is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate feature request, not just a question. If this is a question, please use the Discussions area.

Is your feature request related to a problem? Please describe.

Hi team, I’ve been applying GraphRAG in a production use case (with Azure OpenAI) and noticed that while the query phase has detailed LLM usage logging, the indexing workflows currently lack similar observability. This makes it hard to track LLM call counts and token consumption during indexing, especially when optimizing for cost and latency.

Describe the solution you'd like

I’m implementing this enhancement by adding logging for LLM calls during the indexing phase, aligned with the existing logging structure (e.g., via --verbose). Here is an example of the resulting stats.json file after running the index command:

{
    "total_runtime": 98.39666604995728,
    "num_documents": 1,
    "update_documents": 0,
    "input_load_time": 0,
    "workflows": {
        "load_input_documents": {
            "overall": 0.05307793617248535
        },
        "create_base_text_units": {
            "overall": 0.05640888214111328
        },
        "create_final_documents": {
            "overall": 0.054161787033081055
        },
        "extract_graph": {
            "overall": 44.75299096107483
        },
        "finalize_graph": {
            "overall": 17.086341857910156
        },
        "extract_covariates": {
            "overall": 0.0010209083557128906
        },
        "create_communities": {
            "overall": 0.12611031532287598
        },
        "create_final_text_units": {
            "overall": 0.06224370002746582
        },
        "create_community_reports": {
            "overall": 14.221134662628174
        },
        "generate_text_embeddings": {
            "overall": 21.974961280822754
        }
    },
    # newly added fields (comment is illustrative only; not valid JSON)
    "total_llm_calls": 20,
    "total_prompt_tokens": 104652,
    "total_completion_tokens": 9691,
    "llm_usage_by_workflow": {
        "extract_graph": {
            "llm_calls": 5,
            "prompt_tokens": 66766,
            "completion_tokens": 5757
        },
        "create_community_reports": {
            "llm_calls": 10,
            "prompt_tokens": 29149,
            "completion_tokens": 3934
        },
        "generate_text_embeddings": {
            "llm_calls": 5,
            "prompt_tokens": 8737,
            "completion_tokens": 0
        }
    }
}
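
To make the direction more concrete, here is a rough sketch of how the per-workflow counters could be accumulated and then merged into the stats output. Everything below is my own illustration; names like LLMUsageTracker and record_usage are placeholders, not existing GraphRAG APIs:

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class WorkflowUsage:
    llm_calls: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0


class LLMUsageTracker:
    """Accumulates LLM usage per indexing workflow (hypothetical sketch)."""

    def __init__(self) -> None:
        self._by_workflow: dict[str, WorkflowUsage] = defaultdict(WorkflowUsage)

    def record_usage(self, workflow: str, prompt_tokens: int, completion_tokens: int) -> None:
        # Called once per LLM (or embedding) request issued by a workflow.
        usage = self._by_workflow[workflow]
        usage.llm_calls += 1
        usage.prompt_tokens += prompt_tokens
        usage.completion_tokens += completion_tokens

    def as_stats(self) -> dict:
        # Produces the extra top-level fields shown in the stats.json example above.
        workflows = {name: vars(u) for name, u in self._by_workflow.items()}
        return {
            "total_llm_calls": sum(u.llm_calls for u in self._by_workflow.values()),
            "total_prompt_tokens": sum(u.prompt_tokens for u in self._by_workflow.values()),
            "total_completion_tokens": sum(u.completion_tokens for u in self._by_workflow.values()),
            "llm_usage_by_workflow": workflows,
        }

The idea is that each workflow's LLM and embedding calls report their token counts to a shared tracker, and as_stats() is merged into the existing stats dictionary right before it is written to stats.json.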

If anyone has tips, design thoughts, or prior work in this area, I’d love to hear your feedback. Thanks!

Additional context

No response

Labels: enhancement (New feature or request)