[ML] cache_miss_count is highly misleading for PyTorch models #157385

Closed
droberts195 opened this issue May 11, 2023 · 2 comments · Fixed by #160265 or #160599
Labels: bug, Feature:3rd Party Models, :ml, v8.9.0

Comments

@droberts195
Contributor

Kibana version: 8.8.0
Elasticsearch version: 8.8.0

This screenshot shows cache_miss_count: 0. For PyTorch models it would be better not to show this.

[Image: Screenshot 2023-05-11 at 08 39 55]

We have two types of trained models:

  1. Models that run inside the Elasticsearch JVM
  2. PyTorch models that need to be deployed and run in external processes outside of the JVM

Both types of models have a "cache" associated with them, but it means something completely different for the two cases:

  1. Models that run within the JVM are loaded into memory when used, and then cached in memory to avoid repeatedly loading from an index. So in this case the "cache" is a model cache.
  2. PyTorch models also have a cache, but it is per allocation and caches responses for given inputs. So in this case the "cache" is a response cache. Showing a sensible figure at the top level would mean summing the per-allocation cache hit counts and displaying the total.
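
As a rough illustration of the summing idea in point 2, here is a minimal TypeScript sketch. The `AllocationStats` and `PyTorchDeploymentStats` shapes, the `allocations` property, and the `cache_hit_count` field name are assumptions made for this example; they are not taken from the Kibana code base, and the real stats response may be structured differently.

```typescript
// Hypothetical, trimmed-down shapes: only the field used below is modelled.
interface AllocationStats {
  // Response-cache hits recorded by this allocation (assumed field name).
  cache_hit_count?: number;
}

interface PyTorchDeploymentStats {
  allocations: AllocationStats[];
}

// Sum the response-cache hits across every allocation of a deployment.
// An allocation that reports no count is treated as zero.
function totalCacheHits(deployment: PyTorchDeploymentStats): number {
  return deployment.allocations.reduce(
    (sum, allocation) => sum + (allocation.cache_hit_count ?? 0),
    0
  );
}

const example: PyTorchDeploymentStats = {
  allocations: [{ cache_hit_count: 3 }, { cache_hit_count: 5 }, {}],
};

console.log(totalCacheHits(example)); // 8
```

Treating a missing count as zero keeps the total usable while some allocations are still starting, although it could also hide a stats gap, so that choice is debatable.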

In the short term, to avoid confusion, the easiest fix is probably to not show cache_miss_count for PyTorch models.
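
The short-term fix could look something like the sketch below. The `displayableInferenceStats` helper and the `InferenceStats` and `ModelType` types are hypothetical names invented for this example; only the stat field names come from this thread, and the actual UI code may be organised quite differently.

```typescript
// Hypothetical types for illustration; not the actual Kibana definitions.
type ModelType = 'pytorch' | 'tree_ensemble' | 'lang_ident';

interface InferenceStats {
  cache_miss_count: number;
  failure_count: number;
  inference_count: number;
  missing_all_fields_count: number;
  timestamp: number;
}

// Return the stats that should be rendered for a model.
// For PyTorch models cache_miss_count is dropped, because the top-level
// figure describes the JVM model cache rather than the response cache.
function displayableInferenceStats(
  modelType: ModelType,
  stats: InferenceStats
): Partial<InferenceStats> {
  if (modelType !== 'pytorch') {
    return stats;
  }
  // Copy first so the caller's object is left untouched.
  const visible: Partial<InferenceStats> = { ...stats };
  delete visible.cache_miss_count;
  return visible;
}
```

Filtering in one helper, rather than in the rendering code, would keep the rule "no model-cache figure for PyTorch" in a single place.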

droberts195 added the bug, :ml, and Feature:3rd Party Models labels May 11, 2023
@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

darnautov added a commit that referenced this issue Jun 22, 2023
## Summary

Resolves #157385

<img width="1355" alt="image"
src="https://github.com/elastic/kibana/assets/5236598/cb779fb5-f9d5-4222-af38-e636724867a3">

### Checklist

- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
@davidkyle
Member

Consider removing the Inference Stats section entirely for PyTorch models. The salient information (inference_count, timestamp) repeats what is already displayed in the Deployment Stats section. missing_all_fields_count is also a little confusing, as PyTorch models take a single input field rather than multiple fields as DFA models do.

failure_count is still of interest. The deployment stats have an error_count field; perhaps that can be displayed in the Deployment Stats section instead.

davidkyle reopened this Jun 26, 2023
darnautov added a commit that referenced this issue Jun 28, 2023
## Summary

Resolves #157385

Hides inference stats for the PyTorch models. 

- The salient information (`inference_count`, `timestamp`) is a repeat
of what is already displayed in the Deployment Stats section.
- `missing_all_fields_count` is confusing as the PyTorch models take a
single input field rather than multiple fields as DFA models do, hence
omitted.
- The deployment stats have an
[error_count](https://www.elastic.co/guide/en/elasticsearch/reference/current/get-trained-models-stats.html)
field, hence it has been added to the Deployment Stats and
`failure_count` has been removed.
- Displays the stats tab by default for expanded rows if the model has
started deployments
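
The decisions listed above can be sketched roughly as follows. This is not the code from the commit: `ModelRow`, `DeploymentStats`, `TabId`, the tab ids, and the three helper functions are hypothetical names chosen for illustration, and the set of deployment states is an assumption.

```typescript
// Hypothetical shapes and names; the real expanded-row logic may differ.
interface DeploymentStats {
  state: 'starting' | 'started' | 'stopping' | 'failed'; // assumed state values
  error_count?: number;
}

interface ModelRow {
  model_type: string;
  deployment_stats?: DeploymentStats[];
}

type TabId = 'details' | 'stats'; // assumed tab ids

// Inference stats are hidden for PyTorch models and shown for all others.
function showInferenceStats(model: ModelRow): boolean {
  return model.model_type !== 'pytorch';
}

// error_count is surfaced with the deployment stats, replacing failure_count.
// A deployment that reports no value is shown as zero errors.
function deploymentErrorCount(deployment: DeploymentStats): number {
  return deployment.error_count ?? 0;
}

// Open the stats tab by default only if at least one deployment has started.
function defaultTab(model: ModelRow): TabId {
  const hasStartedDeployment = (model.deployment_stats ?? []).some(
    (deployment) => deployment.state === 'started'
  );
  return hasStartedDeployment ? 'stats' : 'details';
}
```

Keeping each rule in its own small function makes the three behaviours independently testable.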
rshen91 pushed a commit that referenced this issue Jun 28, 2023
darnautov added a commit to darnautov/kibana that referenced this issue Jun 29, 2023
(cherry picked from commit 4064e2b)

# Conflicts:
#	x-pack/plugins/ml/public/application/model_management/expanded_row.tsx
darnautov added a commit that referenced this issue Jun 29, 2023
# Backport

This will backport the following commits from `main` to `8.9`:
- [[ML] Hide inference stats for PyTorch models
(#160599)](#160599)

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)
