feat: add LangWatch as optional observability backend alongside Langfuse #10

Open
tusharjadhav3302 wants to merge 3 commits into forge-sdlc:main from tusharjadhav3302:feat/add-langwatch-integration

Conversation

@tusharjadhav3302

Summary

Adds LangWatch as an optional, parallel observability backend alongside Langfuse.

Full proposal with motivation, design, alternatives, and risks:
proposals/009-langwatch-integration.md

Changes

  • New src/forge/integrations/langwatch/ module (2 files, 143 lines)
  • Config additions: LANGWATCH_ENABLED, LANGWATCH_API_KEY, LANGWATCH_ENDPOINT
  • Callback merging in agent, API server startup, worker startup, container env passthrough
  • Updated .env.example and developer guide with LangWatch setup docs
  • Proposal 009 document
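A minimal sketch of what the callback merging could look like, assuming each backend exposes a factory that returns a callback handler. The names `ObservabilitySettings`, `merge_callbacks`, and the two factory parameters are hypothetical and are not taken from the PR's code:

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ObservabilitySettings:
    # Hypothetical settings object; field names are illustrative only.
    langfuse_enabled: bool = True
    langwatch_enabled: bool = False  # mirrors LANGWATCH_ENABLED=false by default


def merge_callbacks(
    settings: ObservabilitySettings,
    langfuse_factory: Callable[[], Any],
    langwatch_factory: Callable[[], Any],
) -> list[Any]:
    """Build one callback list so both backends can receive the same events."""
    callbacks: list[Any] = []
    if settings.langfuse_enabled:
        callbacks.append(langfuse_factory())
    if settings.langwatch_enabled:
        # The factory is only called when the flag is on, so a disabled
        # backend never constructs a client.
        callbacks.append(langwatch_factory())
    return callbacks
```

Passing factories rather than handler instances keeps the disabled path free of any LangWatch client construction, which is one way to leave the existing Langfuse path untouched.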

No Breaking Changes

  • Langfuse integration is untouched
  • LangWatch is disabled by default (LANGWATCH_ENABLED=false)
  • No new required dependencies
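The disabled-by-default behaviour could be expressed roughly as follows. This is a sketch under assumptions: only the three `LANGWATCH_*` variable names come from the PR description, while `LangWatchConfig`, `load_langwatch_config`, the accepted truthy strings, and the rule that an API key is required when the flag is on are illustrative choices:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed set of values treated as "enabled"; the PR does not specify this.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class LangWatchConfig:
    enabled: bool
    api_key: Optional[str]
    endpoint: Optional[str]


def load_langwatch_config(env: Mapping[str, str] = os.environ) -> LangWatchConfig:
    """Read the LANGWATCH_* variables, defaulting to disabled when unset."""
    enabled = env.get("LANGWATCH_ENABLED", "false").strip().lower() in _TRUTHY
    api_key = env.get("LANGWATCH_API_KEY") or None
    endpoint = env.get("LANGWATCH_ENDPOINT") or None
    if enabled and not api_key:
        # Assumption: fail fast instead of silently dropping traces.
        raise ValueError("LANGWATCH_API_KEY is required when LANGWATCH_ENABLED is true")
    return LangWatchConfig(enabled=enabled, api_key=api_key, endpoint=endpoint)
```

With an empty environment this yields a disabled config and never touches the LangWatch package, which is consistent with keeping it an optional dependency.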

Test Plan

  • uv run forge-serve starts cleanly with LANGWATCH_ENABLED=false
  • uv run forge worker starts cleanly with LANGWATCH_ENABLED=false
  • With LangWatch enabled + valid API key, traces appear in dashboard
  • Langfuse traces still work when both are enabled
  • uv run pytest tests/unit/ passes


@gryf left a comment

One small issue with an imported module. Other than that, this looks right to me.

Comment thread src/forge/main.py Outdated
@eshulman2
Collaborator

Thanks for the PR and the write-up, but I do have some concerns before we can move forward.

On the motivation: A few of the features presented as LangWatch-exclusive actually exist in Langfuse today. For example, Langfuse supports live evaluation and LLM-as-a-judge methods (https://langfuse.com/docs/evaluation/evaluation-methods/llm-as-a-judge), which cover a meaningful part of the evaluation story. Beyond that, Forge's general evaluation philosophy is real-world evaluation (does the code pass CI, does the PR get merged), so some of the evaluators mentioned (BLEU, ROUGE) aren't directly relevant to what we're trying to measure. Before we add a second tracing backend, I'd like to see a precise list of: (a) which specific Langfuse capabilities are genuinely missing, (b) how you plan to use the LangWatch equivalents in practice, and (c) how they connect to Forge's actual quality goals.

On technical gaps: From an initial review, there are a few things missing from this PR:

  • No annotations or tagging on the traced data, which would make the traces hard to work with in the dashboard
  • The container support adds the env flags but doesn't include real handling inside the container entrypoint, so the container agent won't actually emit traces
  • No unit tests, despite the test plan in the PR description
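To illustrate the container gap, here is a rough sketch of what both halves might involve: forwarding the variables from the host, and acting on them inside the entrypoint. The helper names `passthrough_env` and `entrypoint_callbacks`, the `make_handler` parameter, and the `"true"` string comparison are all hypothetical; only the `LANGWATCH_*` variable names come from the PR description.

```python
from typing import Any, Callable, Mapping, Optional

LANGWATCH_VARS = ("LANGWATCH_ENABLED", "LANGWATCH_API_KEY", "LANGWATCH_ENDPOINT")


def passthrough_env(host_env: Mapping[str, str]) -> dict[str, str]:
    """Host side: select the LangWatch variables to forward into the container."""
    return {name: host_env[name] for name in LANGWATCH_VARS if name in host_env}


def entrypoint_callbacks(
    container_env: Mapping[str, str],
    make_handler: Callable[[Optional[str], Optional[str]], Any],
) -> list[Any]:
    """Container side: turn the forwarded variables into an actual handler.

    Without a step like this, the variables are present in the container
    but nothing reads them, so no traces are emitted.
    """
    enabled = container_env.get("LANGWATCH_ENABLED", "false").strip().lower() == "true"
    if not enabled:
        return []
    handler = make_handler(
        container_env.get("LANGWATCH_API_KEY"),
        container_env.get("LANGWATCH_ENDPOINT"),
    )
    return [handler]
```

Because both functions take plain mappings and an injected factory, they can be covered by unit tests without a running container or a LangWatch account.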

On the broader principle: Adding a second tracing system is a meaningful decision. It adds operational complexity, another service to run, another set of credentials to manage, and another surface to maintain. That cost needs to be justified by a clear capability gap, not just "LangWatch has more features." Having additional features doesn't mean we need them or have a good enough reason to use them. My strong preference is to stay with a single tracing system unless there's a concrete, specific capability that Langfuse provably cannot deliver for our use case.

As things stand, I'm inclined to reject this on the grounds raised above: the PR adds meaningful code complexity without a sufficiently clear reason. If you can provide a complete breakdown of the missing Langfuse capabilities, how you intend to use the LangWatch equivalents, and how they support Forge's tracing and evaluation goals, I'm happy to revisit.

3 participants