From 5cc1bb1fd87d621a00696650bd7a0d1d306a6686 Mon Sep 17 00:00:00 2001
From: Sean Handley
Date: Wed, 29 Oct 2025 12:05:38 +0000
Subject: [PATCH 1/2] Add usage instructions for EIS.

---
 explore-analyze/elastic-inference/eis.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/explore-analyze/elastic-inference/eis.md b/explore-analyze/elastic-inference/eis.md
index f8294b6b24..728e5f3678 100644
--- a/explore-analyze/elastic-inference/eis.md
+++ b/explore-analyze/elastic-inference/eis.md
@@ -68,6 +68,10 @@ Tokens are the fundamental units that language models process for both input and
 
 For example, the sentence "It was the best of times, it was the worst of times." contains 52 characters but would tokenize into approximately 14 tokens with a typical word-based approach, though the exact count varies by tokenizer.
 
+### Checking Usage
+
+You can see your token usage by [checking your overall cloud usage](https://cloud.elastic.co/billing/usage) and looking for items that have "Inference" set as the Billing Dimension.
+
 ## Rate Limits
 
 The service enforces rate limits on an ongoing basis. Exceeding a limit will result in HTTP 429 responses from the server until the sliding window moves on further and parts of the limit resets.

From 5a939971139f13499a4a101db29cd77771fc5ea0 Mon Sep 17 00:00:00 2001
From: Sean Handley
Date: Wed, 29 Oct 2025 13:36:26 +0000
Subject: [PATCH 2/2] Update explore-analyze/elastic-inference/eis.md

Co-authored-by: Liam Thompson
---
 explore-analyze/elastic-inference/eis.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/explore-analyze/elastic-inference/eis.md b/explore-analyze/elastic-inference/eis.md
index 728e5f3678..7dd9570c74 100644
--- a/explore-analyze/elastic-inference/eis.md
+++ b/explore-analyze/elastic-inference/eis.md
@@ -68,11 +68,14 @@ Tokens are the fundamental units that language models process for both input and
 
 For example, the sentence "It was the best of times, it was the worst of times." contains 52 characters but would tokenize into approximately 14 tokens with a typical word-based approach, though the exact count varies by tokenizer.
 
-### Checking Usage
+### Monitor your token usage
 
-You can see your token usage by [checking your overall cloud usage](https://cloud.elastic.co/billing/usage) and looking for items that have "Inference" set as the Billing Dimension.
+To track your token consumption:
 
-## Rate Limits
+1. Navigate to [**Billing and subscriptions > Usage**](https://cloud.elastic.co/billing/usage) in the {{ecloud}} Console
+2. Look for line items where the **Billing dimension** is set to "Inference"
+
+## Rate limits
 
 The service enforces rate limits on an ongoing basis. Exceeding a limit will result in HTTP 429 responses from the server until the sliding window moves on further and parts of the limit resets.
 
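
The character and token figures quoted in the patched paragraph can be sanity-checked with a short Python sketch. The regex-based splitter below is an assumption made purely for illustration (one token per word or punctuation mark); it is not the tokenizer EIS uses, and real model tokenizers will generally give different counts.

```python
import re

# Example sentence quoted in the documentation paragraph.
SENTENCE = "It was the best of times, it was the worst of times."


def word_level_tokens(text: str) -> list[str]:
    """Split text into words and standalone punctuation marks.

    Hypothetical helper for illustration only; not the EIS tokenizer.
    """
    return re.findall(r"\w+|[^\w\s]", text)


char_count = len(SENTENCE)
tokens = word_level_tokens(SENTENCE)

# Report the character count and the naive word-level token count.
print(f"characters: {char_count}, word-level tokens: {len(tokens)}")
```

Under this naive split the sentence comes to 52 characters and 14 tokens (12 words plus the comma and the full stop), which agrees with the "approximately 14" estimate in the text.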