Skip to content

bug(backend): stale LLMService method calls across 5+ files — chat silently broken (#6983 follow-up) #7047

@mrveiss

Description

@mrveiss

Discovery during #6983 / PR #7036 sweep

The #3185 LLMInterface retirement migrated 47 boot-time imports. PR #7036 (#6983) finished the import-migration cleanup for 3 missed callers. A wider sweep reveals method-call drift on LLMService instances that #3185 didn't migrate — code that boots fine but crashes / silently degrades at runtime.

Confirmed runtime bugs

1. api/chat.py:106 — broken module path

def get_llm_service(request: Request) -> Any:
    from llm_service import LLMService                # ← module doesn't exist
    return lazy_init_singleton(..., "llm_service", LLMService)

There is no top-level llm_service.py; the canonical path is services.llm_service. Import is inside the function body, so it fires at first call — silent until exercised.

2. api/chat.py:543 — feature-degradation guard around stale method

if hasattr(llm_service, "generate_response"):
    return await llm_service.generate_response(...)
else:
    return {"content": "I'm currently unable to generate a response. Please try again.", ...}

LLMService doesn't expose generate_response (that was an LLMInterface method). The hasattr guard takes the else-branch on every call — chat always returns the canned "currently unable" message. Major user-facing functional regression that probably ships today.

3. orchestration/workflow_documentation.py:175 (#7042 already filed)

self.llm_interface.chat_completion(...) on an LLMService-typed slot. AttributeError when documentation path runs.

4. Other suspected stale method calls (need per-site verification)

$ grep -rn '\.chat_completion(\|\.generate_response(' autobot-backend \
    --include='*.py' | grep -v 'test_\|__pycache__\|llm_service.py\|llm_interface_pkg\|llm_providers'
Site Pattern Verdict needed
async_chat_workflow.py:344 llm.chat_completion(messages, stream=False) likely stale (variable name llm is service-typed)
modern_ai_integration.py:673 self._llm_interface.chat_completion(llm_req) likely stale (slot named _llm_interface)
utils/graceful_degradation.py:456 strategy.generate_response(request, context) strategy object — different class, may be OK
api/knowledge_search_scoped.py:203 rag_agent.generate_response(...) RAG agent, separate class — likely OK
services/nl_database_service.py:549 self._llm.generate_response(...) likely stale (slot named _llm)
performance_benchmarks.performance_test.py:122/161 self.llm.generate_response(...) covered by #7041

Provider-pattern callers (provider.chat_completion in llm_multi_provider.py, llm_providers/*, api/openai_compat.py) are calling provider-class methods — those have legitimate chat_completion surfaces, NOT LLMService's. Out of scope.

Acceptance criteria

  • Fix api/chat.py:106 import path → services.llm_service
  • Migrate api/chat.py:543 to LLMService.chat() — chat endpoint stops returning canned fallback for every request
  • Verify each row in the table above; migrate stale ones, document the OK ones
  • Add a pre-commit lint rule for the wider pattern: <var>: LLMService (or <var> = get_llm_service()) followed by .chat_completion(/.generate_response(/.check_ollama_connection(/.cleanup( is a violation. Builds on discovery(llm): add static check preventing dict-style access on LLMResponse — 7 production files had latent bugs #6940's hook approach.
  • CI smoke test for the chat path that would have caught (2) — assert non-canned response for a known prompt

Severity

P1 — chat is one of the most used surfaces; bug (2) silently degrades it for every user. Discovered only because I was sweeping for stale imports; no pre-existing test caught it.

Related

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions