Metrics Docs Updates #2560

sekyondaMeta · 2023-08-31T15:50:03Z

Updates to metrics docs to make them easier to follow. Information re-organized and emphasized.

Fixes #2495

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
New feature (non-breaking change which adds functionality)
This change requires a documentation update

Feature/Issue validation/testing

Pages built locally to test them.

Checklist:

Did you have fun?
Have you added tests that prove your fix is effective or that this feature works?
Has code been commented, particularly in hard-to-understand areas?
Have you made corresponding changes to the documentation?

Updates to metrics docs to make them easier to follow

Something weird with this link. It is possible the site itself rejects the lint check

sekyondaMeta · 2023-08-31T16:04:45Z

@namannandan & @agunapal Could you take a look at this? I want to make sure I did not change the meaning of the docs. I made some changes to the structure of the metrics docs to try to make them clearer, per issue #2495.

msaroufim · 2023-08-31T16:07:29Z

docs/metrics_api.md

 For details refer [Torchserve config](configuration.md) docs.

+**Note** This is not to be confused with the [custom metrics API](metrics.md) which is the API used in the backend handler to emit metrics.


Could make this clearer for folks. It doesn't have to fit into a short note either

codecov · 2023-08-31T16:19:06Z

Codecov Report

Merging #2560 (aed33ec) into master (b04f6de) will not change coverage.
The diff coverage is n/a.

❗ Current head aed33ec differs from pull request most recent head 63fdd8e. Consider uploading reports for the commit 63fdd8e to get more accurate results

@@           Coverage Diff           @@
##           master    #2560   +/-   ##
=======================================
  Coverage   70.87%   70.87%           
=======================================
  Files          83       83           
  Lines        3839     3839           
  Branches       58       58           
=======================================
  Hits         2721     2721           
  Misses       1114     1114           
  Partials        4        4


lxning · 2023-09-08T15:07:50Z

docs/metrics.md

+Dynamic updates between the frontend and backend are _not_ currently being handled.
+


Can we reword this a little to express: "For performance reasons, currently only metrics defined in the metrics configuration file can be published to Prometheus" (i.e., made available via the metrics API endpoint)?

lxning · 2023-09-08T15:13:14Z

docs/metrics.md


-## Frontend Metrics
+
+Note that **only** the metrics defined in the **metrics configuration file** can be emitted to logs or made available via the [metrics API endpoint](metrics_api.md). This is done to ensure that the metrics configuration file serves as a central inventory of all the metrics that Torchserve can emit.


"can be emitted to logs" => "can be emitted to model_metrics.log"; otherwise the metric is dumped to model_log.log.
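To make the distinction concrete, here is a minimal sketch of the routing rule described in this comment. It is not TorchServe code: `METRICS_CONFIG`, `is_defined`, and `destination` are hypothetical names, the dictionary is an in-memory stand-in for the parsed metrics configuration file, and the metric names are illustrative only.

```python
# Hypothetical stand-in for the parsed metrics configuration file.
# The section keys mirror the ones named in this thread (ts_metrics and
# model_metrics); the metric names are illustrative, not an inventory.
METRICS_CONFIG = {
    "ts_metrics": {"counter": ["Requests2XX"]},
    "model_metrics": {"gauge": ["HandlerTime", "PredictionTime"]},
}


def is_defined(config: dict, section: str, name: str) -> bool:
    """Return True if `name` is listed under any metric type in `section`."""
    return any(name in names for names in config.get(section, {}).values())


def destination(config: dict, name: str) -> str:
    """Pick the log file a backend metric would land in, per the comment above."""
    if is_defined(config, "model_metrics", name):
        return "model_metrics.log"
    # Metrics missing from the configuration file fall through to the model log.
    return "model_log.log"
```

Under these assumptions, a metric declared under `model_metrics` resolves to `model_metrics.log`, and an undeclared one resolves to `model_log.log`.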

Metrics docs updates based off comments.

…into metricsUpdate

sekyondaMeta · 2023-09-08T18:41:14Z

@msaroufim @lxning My GitHub is acting up, but I made some updates based off your comments. Let me know if other changes are needed.

msaroufim · 2023-09-11T00:02:27Z

@namannandan could you please review and merge this?

namannandan · 2023-09-12T20:25:45Z

docs/metrics.md

+
+TorchServe defines metrics in a [yaml](https://github.com/pytorch/serve/blob/master/ts/configs/metrics.yaml) file, including both frontend metrics (i.e. `ts_metrics`) and backend metrics (i.e. `model_metrics`).
+When TorchServe is started, the metrics definition is loaded in the frontend and backend cache separately.
+The backend flushes the metrics cache once a load model or inference request is completed.


This line can be reworded as follows:

The backend emits metrics logs as and when they are updated. The frontend parses these logs and makes the corresponding metrics available either as logs or via the metrics API endpoint based on the metrics_mode configuration.

namannandan · 2023-09-12T20:31:16Z

docs/metrics.md

+When TorchServe is started, the metrics definition is loaded in the frontend and backend cache separately.
+The backend flushes the metrics cache once a load model or inference request is completed.
+
+Dynamic updates between the frontend and backend are _not_ currently being handled.


This line can be reworded as follows:

Dynamic updates to the metrics configuration file are currently not supported. For updates made to the metrics configuration file to take effect, TorchServe will need to be restarted.

namannandan · 2023-09-12T20:34:54Z

docs/metrics.md

+
+Note that **only** the metrics defined in the **metrics configuration file** can be emitted to model_metrics.log or made available via the [metrics API endpoint](metrics_api.md). This is done to ensure that the metrics configuration file serves as a central inventory of all the metrics that Torchserve can emit.
+
+Default metrics are provided in the [metrics.yaml](https://github.com/pytorch/serve/blob/master/ts/configs/metrics.yaml) file, but the user can either delete them to their liking / ignore them altogether, because these metrics will not be emitted unless they are edited.\


Nit:
..... because these metrics will not be emitted unless they are updated.
instead of
..... because these metrics will not be emitted unless they are edited.

namannandan · 2023-09-12T20:37:44Z

docs/metrics.md

+
+### Starting TorchServe Metrics
+
+Whenever torchserve starts, the [backend worker](https://github.com/pytorch/serve/blob/master/ts/model_service_worker.py) initializes `service.context.metrics` with the [MetricsCache](https://github.com/pytorch/serve/blob/master/ts/metrics/metric_cache_yaml_impl.py) object. The `model_metrics` (backend metrics) section within the specified yaml file will be parsed, and Metric objects will be created based on the parsed section and added  that are added to the cache.


Nit: ..... will be created based on the parsed section and added to the cache.

namannandan · 2023-09-12T20:52:43Z

docs/metrics.md

+metric1 = metrics.add_metric("GenericMetric", unit=unit, dimension_names=["name1", "name2", ...], metric_type=MetricTypes.GAUGE)
 metric.add_or_update(value, dimension_values=["value1", "value2", ...])


Replace the above two lines with the following:

metrics.add_metric("GenericMetric", value, unit=unit, dimension_names=["name1", "name2", ...], metric_type=MetricTypes.GAUGE)

namannandan · 2023-09-12T22:59:02Z

docs/metrics.md

@@ -317,35 +338,31 @@ dimN= Dimension(name_n, value_n)

 One can add metrics with generic units using the following function.

-#### Function API to add generic metrics without default dimensions
+Function API


Please retain this section about the `add_metric_to_cache` method, since it is part of the API.
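For readers following the thread, the register-once, update-many pattern that this section documents can be sketched with stand-in classes. Nothing below is the TorchServe implementation: `StandInMetric` and `StandInMetricsCache` are hypothetical, they only mirror the call shapes quoted in the diff (`add_metric_to_cache(...)` followed by `add_or_update(...)`), and the overwrite-on-update behavior and the dimension names and values are assumptions made for illustration.

```python
class StandInMetric:
    """Hypothetical stand-in for the object returned by add_metric_to_cache."""

    def __init__(self, name, unit, dimension_names):
        self.name = name
        self.unit = unit
        self.dimension_names = list(dimension_names)
        # Maps a tuple of dimension values to the latest recorded value.
        self.values = {}

    def add_or_update(self, value, dimension_values):
        # One value is expected for every registered dimension name.
        if len(dimension_values) != len(self.dimension_names):
            raise ValueError("expected one value per dimension name")
        self.values[tuple(dimension_values)] = value


class StandInMetricsCache:
    """Hypothetical stand-in for the metrics cache used in a handler."""

    def __init__(self):
        self._metrics = {}

    def add_metric_to_cache(self, metric_name, unit, dimension_names=None):
        # In this sketch, registering an existing name returns the same object.
        if metric_name not in self._metrics:
            self._metrics[metric_name] = StandInMetric(
                metric_name, unit, dimension_names or []
            )
        return self._metrics[metric_name]


metrics = StandInMetricsCache()
metric = metrics.add_metric_to_cache(
    "DistanceInKM", unit="km", dimension_names=["ModelName", "Level"]
)
metric.add_or_update(12.3, dimension_values=["my_model", "Model"])
metric.add_or_update(15.0, dimension_values=["my_model", "Model"])
```

The point of the two-step form is that the metric object is created once and then reused for every subsequent update.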

namannandan · 2023-09-12T23:00:19Z

docs/metrics.md

@@ -370,52 +387,10 @@ One can add metrics with generic units using the following function.
 # Add Distance as a metric
 # dimensions = [dim1, dim2, dim3, ..., dimN]
 # Assuming batch size is 1 for example
-metric = metrics.add_metric_to_cache('DistanceInKM', unit='km', dimension_names=[...])
+metric = metrics.add_metric('DistanceInKM', unit='km', dimension_names=[...])
 metric.add_or_update(distance, dimension_values=[...])
 ```



Please retain the following section on add_metric.

namannandan · 2023-09-12T23:08:32Z

docs/metrics.md

@@ -425,15 +400,15 @@ Add time-based by invoking the following method:
 Function API

 ```python
-    def add_time(self, name: str, value: int or float, idx=None, unit: str = 'ms', dimensions: list = None,
+    def add_time(self, metric_name: str, value: int or float, idx=None, unit: str = 'ms', dimensions: list = None,


The argument name defined in the source code is `name`, not `metric_name`.
The same applies to the other functions of the API below.
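A small sketch of why the documented parameter name matters: callers who copy the docs and pass the argument by keyword will fail if the name does not match the source. The function below is a hypothetical free-function stand-in that only borrows the parameter names quoted in the diff. It is not the TorchServe method, and the metric name `InferenceTime` is illustrative.

```python
def add_time(name: str, value: float, idx=None, unit: str = "ms",
             dimensions: list = None) -> dict:
    """Stand-in using the parameter names quoted in the diff above."""
    return {
        "name": name,
        "value": value,
        "idx": idx,
        "unit": unit,
        "dimensions": dimensions or [],
    }


# Matches the source: the parameter is called `name`.
ok = add_time(name="InferenceTime", value=12.5)

# Follows the mistaken docs: `metric_name` is not a parameter,
# so Python rejects the call with a TypeError.
try:
    add_time(metric_name="InferenceTime", value=12.5)
    keyword_error = None
except TypeError as exc:
    keyword_error = str(exc)
```

Positional calls would be unaffected, which is why a mismatch like this can go unnoticed until someone uses keyword arguments.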

namannandan · 2023-09-12T23:09:48Z

docs/metrics.md

                 metric_type: MetricTypes = MetricTypes.GAUGE):
        """
        Add a time based metric like latency, default unit is 'ms'
            Default metric type is gauge

        Parameters
        ----------
-        name : str
+        metric_name : str


Same as above

It looks like for these changes, I might have been using a slightly older metrics.md before you made your changes. I have updated this.

namannandan · 2023-09-12T23:16:51Z

docs/metrics_api.md

@@ -1,10 +1,13 @@
 # Metrics API

-Metrics API is listening on port 8082 and only accessible from localhost by default. To change the default setting, see [TorchServe Configuration](configuration.md). The default metrics endpoint returns Prometheus formatted metrics when [metrics_mode](https://github.com/pytorch/serve/blob/master/docs/metrics.md) configuration is set to `prometheus`. You can query metrics using curl requests or point a [Prometheus Server](#prometheus-server) to the endpoint and use [Grafana](#grafana) for dashboards.
+Metrics API is a http API that is used to fetch metrics in the prometheus format. It is listening on port 8082 and only accessible from localhost by default. To change the default setting, see [TorchServe Configuration](configuration.md). The metrics endpoint is on by default and returns Prometheus formatted metrics when [metrics_mode](https://github.com/pytorch/serve/blob/master/docs/metrics.md) configuration is set to `prometheus`. You can query metrics using curl requests or point a [Prometheus Server](#prometheus-server) to the endpoint and use [Grafana](#grafana) for dashboards.


Nit: ..... The metrics endpoint is enabled by default and .....
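As a complement to the curl-based querying mentioned in the paragraph under review, here is a minimal Python sketch for fetching and parsing the endpoint's output. Several things are assumptions rather than facts from this PR: the `/metrics` path in `METRICS_URL`, the helper names `parse_prometheus_text` and `fetch_metrics`, and the simplified regular expressions, which handle only plain `name{labels} value` sample lines and skip comments.

```python
import re
import urllib.request

# Assumption: the endpoint path is /metrics on the default port named above.
METRICS_URL = "http://127.0.0.1:8082/metrics"

# Simplified patterns for Prometheus text-format sample lines.
_SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_prometheus_text(text: str) -> list:
    """Return (name, labels, value) tuples, skipping blank and comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> list:
    """Fetch the endpoint and parse its body; requires a running server."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_prometheus_text(response.read().decode("utf-8"))
```

`fetch_metrics()` only works against a running server with `metrics_mode` set to `prometheus`; `parse_prometheus_text` can be exercised offline on any text in that format.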

Updates per Naman's comments

…into metricsUpdate

sekyondaMeta · 2023-09-13T16:41:28Z

@namannandan Thanks for the feedback. I have made the updates. Let me know if any other updates are needed.

sekyondaMeta added 2 commits August 31, 2023 11:39

Metrics Docs Updates

230475e

Updates to metrics docs to make them easier to follow

Update metrics.md

202a884

sekyondaMeta requested a review from namannandan August 31, 2023 15:50

sekyondaMeta requested a review from msaroufim as a code owner August 31, 2023 15:50

Update README.md

0434cbc

Something weird with this link. It is possible the site itself rejects the lint check

sekyondaMeta requested review from jagadeeshi2i and agunapal as code owners August 31, 2023 15:58

msaroufim requested changes Aug 31, 2023

View reviewed changes

Update index.rst

131f5e3

lxning reviewed Sep 8, 2023

View reviewed changes

sekyondaMeta added 3 commits September 8, 2023 13:31

Metric docs updates

34bef7b

Metrics docs updates based off comments.

Merge branch 'pytorch:master' into metricsUpdate

ec6a58d

Merge branch 'metricsUpdate' of https://github.com/sekyondaMeta/serve …

f2dbe3c

…into metricsUpdate

Merge branch 'pytorch:master' into metricsUpdate

08872f9

namannandan requested changes Sep 12, 2023

View reviewed changes

sekyondaMeta added 4 commits September 13, 2023 10:58

Merge branch 'pytorch:master' into metricsUpdate

fa86e84

Metrics Updates

d24748c

Updates per Naman's comments

Update metrics.md

70ec486

Merge branch 'metricsUpdate' of https://github.com/sekyondaMeta/serve …

eb57cef

…into metricsUpdate

namannandan approved these changes Sep 13, 2023

View reviewed changes

namannandan enabled auto-merge September 13, 2023 20:25

Merge branch 'master' into metricsUpdate

63fdd8e

msaroufim self-requested a review September 15, 2023 19:06

msaroufim approved these changes Sep 15, 2023

View reviewed changes

namannandan added this pull request to the merge queue Sep 15, 2023

Merged via the queue into pytorch:master with commit b3eced5 Sep 15, 2023
11 of 13 checks passed


