feat(ai): add AI gateway metrics #3087

Merged
rickstaa merged 11 commits into ai-video from ai-gateway-metrics-poc
Jul 17, 2024

Conversation

rickstaa
Contributor

@rickstaa rickstaa commented Jun 26, 2024

What does this pull request do? Explain your changes. (required)

This pull request introduces new Gateway AI metrics to the ai-video branch:

  • ticket_value_sent: The value of the AI tickets that are sent to the Orchestrators.
  • tickets_sent: The total number of tickets sent to the Orchestrators.
  • ai_models_requested: Tracks the number of requests per capability and model.
  • ai_request_latency_score: Measures latency scores per model request per orchestrator.
  • ai_request_price: Records the price paid per unit for each model request.
  • ai_request_errors: Logs AI request errors per orchestrator. If no orchestrator is specified, this indicates a general request scenario where no orchestrators were found.
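To make the labelling described above concrete, here is a minimal, standard-library-only sketch of a counter keyed by capability, model, and orchestrator. It is not the code this PR adds to census.go; the type and function names (`labelKey`, `LabeledCounter`, `Inc`, `Get`) and the sample label values are hypothetical. It only illustrates how an empty orchestrator label can stand for the "no orchestrators were found" case mentioned for ai_request_errors.

```go
package main

import (
	"fmt"
	"sync"
)

// labelKey identifies one time series of a labeled counter.
// The field names are illustrative and are not go-livepeer's tag keys.
type labelKey struct {
	Capability   string
	Model        string
	Orchestrator string // empty when no orchestrator was found
}

// LabeledCounter is a small concurrency-safe counter keyed by labels.
type LabeledCounter struct {
	mu     sync.Mutex
	counts map[labelKey]int64
}

// NewLabeledCounter returns an empty counter.
func NewLabeledCounter() *LabeledCounter {
	return &LabeledCounter{counts: make(map[labelKey]int64)}
}

// Inc adds one to the series identified by the given labels.
func (c *LabeledCounter) Inc(capability, model, orch string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[labelKey{capability, model, orch}]++
}

// Get returns the current value of one series (0 if it was never incremented).
func (c *LabeledCounter) Get(capability, model, orch string) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[labelKey{capability, model, orch}]
}

func main() {
	errs := NewLabeledCounter()
	// One error attributed to a specific (made-up) orchestrator address.
	errs.Inc("text-to-image", "example-model", "0xabc")
	// Two errors where no orchestrator could be found at all.
	errs.Inc("text-to-image", "example-model", "")
	errs.Inc("text-to-image", "example-model", "")
	fmt.Println(errs.Get("text-to-image", "example-model", "")) // prints 2
}
```

In a dashboard, filtering on the empty orchestrator label would then separate "no capacity" failures from failures tied to a specific orchestrator.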

Specific updates (required)

  • Updates census.go to include the new Gateway metrics.
  • Updates server/ai_process.go to record these metrics.

How did you test each of these updates (required)

I set up both an on-chain and off-chain gateway to validate the metrics. I verified their visibility at http://localhost:5935/metrics and ensured they were correctly visualized in Grafana.

Does this pull request close any open issues?

This implements the functionality outlined in https://livepeer-ai.productlane.com/roadmap?id=58c9cd5d-1c64-4fb3-b8d0-a7e20b7865a2.

Checklist:

How to test

  1. Check out this pull request.
  2. Spin up an on-chain gateway with attached orchestrators.
  3. Clone the repository https://github.com/rickstaa/livepeer-monitor-test.
  4. Execute the Dockerfile in that repository to launch Prometheus and Grafana servers.
  5. Navigate to http://localhost:5935/metrics to view the new AI Gateway metrics.
  6. Visit http://localhost:3000 to inspect these metrics in Grafana.

@rickstaa rickstaa changed the base branch from master to ai-video June 26, 2024 17:29
@rickstaa rickstaa force-pushed the ai-gateway-metrics-poc branch 2 times, most recently from 7302537 to a456c04 Compare July 8, 2024 13:48
@rickstaa rickstaa mentioned this pull request Jul 8, 2024
5 tasks
@rickstaa rickstaa changed the title from "[POC] Ai gateway metrics" to "Ai gateway metrics" Jul 8, 2024
server/ai_process.go Outdated (review thread resolved)
@rickstaa rickstaa changed the title from "Ai gateway metrics" to "feat(ai): add AI gateway metrics" Jul 14, 2024
@rickstaa rickstaa requested a review from eliteprox July 14, 2024 10:37
eliteprox and others added 10 commits July 17, 2024 15:53
This commit adds the initial AI gateway metrics so that they can be
reviewed by others. The code still needs to be cleaned up and the buckets
adjusted.
This commit improves the AI metrics so that they are easier to work
with.
This commit ensures that an error is logged when the Gateway could not
find orchestrators for a given model and capability.
This commit ensures that the `ticket_value_sent` and `tickets_sent`
metrics are also created for an AI Gateway.
This commit ensures that the AI gateway metrics contain the orch address
label.
This commit ensures that the AI job pricing is calculated correctly and
cleans up the codebase.
This commit removes the Orch label from the ai_request_price metrics
since that information is better to be retrieved from another endpoint.
@rickstaa rickstaa merged commit f60c0c5 into ai-video Jul 17, 2024
6 of 8 checks passed
@rickstaa rickstaa deleted the ai-gateway-metrics-poc branch July 17, 2024 15:51
eliteprox added a commit to eliteprox/go-livepeer that referenced this pull request Jul 26, 2024
* Add gateway metric for roundtrip ai times by model and pipeline

* Rename metrics and add unique manifest

* Fix name mismatch

* modelsRequested not working correctly

* feat: add initial POC AI gateway metrics

This commit adds the initial AI gateway metrics so that they can be
reviewed by others. The code still needs to be cleaned up and the buckets
adjusted.

* feat: improve AI metrics

This commit improves the AI metrics so that they are easier to work
with.

* feat(ai): log no capacity error to metrics

This commit ensures that an error is logged when the Gateway could not
find orchestrators for a given model and capability.

* feat(ai): add TicketValueSent and TicketsSent metrics

This commit ensures that the `ticket_value_sent` and `tickets_sent`
metrics are also created for an AI Gateway.

* fix(ai): ensure that AI metrics have orch address label

This commit ensures that the AI gateway metrics contain the orch address
label.

* fix(ai): fix incorrect Gateway pricing metric

This commit ensures that the AI job pricing is calculated correctly and
cleans up the codebase.

* refactor(ai): remove Orch label from ai_request_price metric

This commit removes the Orch label from the ai_request_price metrics
since that information is better to be retrieved from another endpoint.

---------

Co-authored-by: Elite Encoder <john@eliteencoder.net>
@pwilczynskiclearcode
Contributor

#3185
