Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Unified Observability] Metrics overview table is not showing RX or TX metrics although they're available #131152

Closed
formgeist opened this issue Apr 28, 2022 · 24 comments · Fixed by #134471
Assignees
Labels
bug Fixes for quality problems that affect the customer experience Feature:Metrics UI Metrics UI feature Feature:Observability Overview Relaunch of the Observability Overview Page Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services Team:Unified observability v8.4.0

Comments

@formgeist
Copy link
Contributor

Kibana version:

8.3.0-SNAPSHOT

Description of the problem including expected versus actual behavior:

When viewing the Observability overview page panel for Metrics (Hosts) we display the list of available hosts for the selected time range, but the RX and TX metrics are missing. When reviewing the individual host details, we can see the inbound and outbound metrics as expected.

CleanShot 2022-04-28 at 15 23 07@2x

CleanShot 2022-04-28 at 15 22 52@2x

Steps to reproduce:

  1. Go to the Observability overview page and review the metrics shown in the hosts panel
  2. Click on one of the displayed hosts
  3. Perhaps extend the time range from the initial 15 minutes to 1 hour, and there should be metrics displayed for inbound and outbound traffic.
  4. Review the Overview panel again with the new time range, and there are no inbound or outbound metrics displayed in the table.
@formgeist formgeist added bug Fixes for quality problems that affect the customer experience Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services Team:Unified observability Feature:Observability Overview Relaunch of the Observability Overview Page v8.3.0 labels Apr 28, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

@elasticmachine
Copy link
Contributor

Pinging @elastic/unified-observability (Team:Unified observability)

@ajoliveira
Copy link

Also seeing this in 8.2 currently - for request being made from browser to Kibana (/api/metrics/overview/top), browser Dev Tools confirms timeseries object is there and includes values for mini chart, but the top level RX/TX values are zeroes.
Screen Shot 2022-05-11 at 8 32 23 AM

@miltonhultgren miltonhultgren added the Feature:Metrics UI Metrics UI feature label May 12, 2022
@jasonrhodes jasonrhodes added v8.4.0 and removed v8.3.0 labels May 23, 2022
@jasonrhodes
Copy link
Member

We didn't have this prioritized into a cycle so this won't make 8.3 FF -- I've moved it to 8.4 and we'll try to prioritize for the next available cycle. If it makes it in time, we can consider adding to 8.3.1 as well.

@miltonhultgren miltonhultgren self-assigned this May 23, 2022
@miltonhultgren
Copy link
Contributor

Some notes from me:
The MetricWithSparkline component that is rendered in that cell does an early return if the value is 0, even if the timeseries has values so that's correct with what the API returns.

Looking at the API, the query grabs the rx/tx values from the fields host.network.in.bytes and host.network.out.bytes but then does the timeseries aggregation on the fields host.network.ingress.bytes and host.network.egress.bytes.

Further, the Node Details page uses the fields system.network.in.bytes and system.network.out.bytes.
So I guess host.network.in.bytes should have been system.network.in.bytes, but that doesn't answer for why the timeseries uses a different field (host.network.ingress.bytes).

@jasonrhodes The fix is mostly just changing a few strings to pick the right field but what is the right field? :)

@miltonhultgren miltonhultgren removed their assignment May 30, 2022
@jasonrhodes
Copy link
Member

Thanks, @miltonhultgren! I think in.bytes and out.bytes are "old" / "legacy" fields. The ECS schema doesn't mention them at all anymore. https://www.elastic.co/guide/en/ecs/current/ecs-host.html

For the other fields mentioned, here's what I found:

ECS field ECS description
host.network.ingress.bytes The number of bytes received (gauge) on all network interfaces by the host since the last metric collection.
host.network.egress.bytes The number of bytes (gauge) sent out on all network interfaces by the host since the last metric collection.
system.network.in.bytes not part of core ECS
system.network.out.bytes not part of core ECS

If I remember correctly, the switch from host.network.in to host.network.ingress was changing the value to a gauge, which may have affected our ability to do the aggregation on it, but I'm not sure, so I'll need to look deeper in.

@kaiyan-sheng / @simianhacker I remember conversations about this migration from long ago, but do either of you have more context on (a) which fields we should be using now, and (b) whether we should be doing anything to accommodate users with older data that may use other fields?

@simianhacker
Copy link
Member

We should be moving to the new host.network fields. The advantage is that they are gauges so we don't have to do the max/derivative calculations anymore.

@kaiyan-sheng
Copy link
Contributor

Yep host.network fields are the ones that should be used.

@neptunian neptunian self-assigned this Jun 7, 2022
@neptunian
Copy link
Contributor

I was able to swap out the non ECS field for the ECS fields which got things working again. I'm not sure whether we need to support either field types for users with older data. I think we would need to create separate inventory and TSVB models to support either field when one doesn't exist. However, we couldn't support users with both fields (old and new data) as the fields have different metric types (cumulative counter vs gauge), so I am leaning towards just supporting the ECS field, but will defer to @jasonrhodes . I guess there is also the option of only using the non ECS fields.

The ECS fields were introduced in 7.14 elastic/beats#24312 .

@jasonrhodes
Copy link
Member

jasonrhodes commented Jun 7, 2022

Yeah I had the same question above. I don't know how we can support more than a tight window of versions from the collectors if we ever want to make progress. But will we be okay telling users that parts of the curated UI simply don't work if they don't use a certain version of Metricbeat?

@neptunian I vote for moving forward and making progress, i.e. only supporting the ECS field types. If we do that, what happens to the chart for someone collecting these data with MB <= 7.13? RX and TX would read as "NA" like they do in the original ticket's screenshot, right?

@neptunian
Copy link
Contributor

Yeah I had the same question above. I don't know how we can support more than a tight window of versions from the collectors if we ever want to make progress. But will we be okay telling users that parts of the curated UI simply don't work if they don't use a certain version of Metricbeat?

Is there a certain version difference where in Kibana we don't have to support "old" fields indexed by older versions of beats? If so I'm tempted to just hold off on using these ECS fields until we know we don't need to look at the "old" fields and they are very likely to have the new ecs ones. I guess I'm assuming most people running Kibana 8 are going to be running metricbeat 8 and not looking at data that old, which makes me just want to use the new ECS fields, but I'm not sure.

@neptunian I vote for moving forward and making progress, i.e. only supporting the ECS field types. If we do that, what happens to the chart for someone collecting these data with MB <= 7.13? RX and TX would read as "NA" like they do in the original ticket's screenshot, right?

Yes it would show N/A. Though this would also affect the Metrics app where I changed the fields as well. Same with the Metrics Inventory Host Details page.

7.13.4 beats data:
Screen Shot 2022-06-07 at 5 48 26 PM

@jasonrhodes
Copy link
Member

Is there a certain version difference where in Kibana we don't have to support "old" fields indexed by older versions of beats?

No, I don't think so. Not that I know of anyway.

I guess I'm assuming most people running Kibana 8 are going to be running metricbeat 8 and not looking at data that old, which makes me just want to use the new ECS fields, but I'm not sure.

I don't know the answer to this, we could possibly inspect telemetry but we'd need support from someone on the beats side of things, likely. Even so, with make it minor in flux, I don't know if we have a concrete policy with 8.0+ on this kind of thing beyond "just try to support everything forever".

Yes it would show N/A. Though this would also affect the Metrics app where I changed the fields as well. Same with the Metrics Inventory Host Details page.

My preference would be to move forward with new fields and use UI helpers to educate users, but we'd need some design help to figure out if that's possible/ok. Small "i" icons next to table headers, for example, that explain the field used to power a given column or row in any metrics presentation, would likely be enough to help avoid SDHs and entice people to upgrade or, at the very least, ignore the NA / "-" values since they understand why the have them.

In the future, if we make our components "Kibana-embeddable" and allow users to swap out visualizations inside our curated UI with clones that they modify, a user could clone our chart, modify the field it uses, and use their own version for older data. We're a long way from there but it's another reason to move in that direction, imo.

@miltonhultgren
Copy link
Contributor

miltonhultgren commented Jun 8, 2022

I wonder if it would make sense to make "Metrics" in the inventory model a configurable item (like another saved object).
So that we can say things like "This object represents the CPU metric for Hosts, anytime in the UI we need that metric we'll use this definition" (not sure what that definition would include beyond field or how to avoid a massive explosion of such objects).
We can display this info in the little info icons saying "If you want to use a different field for this metric, change this object".
Maybe that's very similar to the embeddable idea but I just hope to find a place to embed more of the domain knowledge.
But, tangentially, I also wonder how that relates to things like having the same field in multiple documents from different sources and if that can be treated as the same "value" or not. MetricValueResolverStrategy TM (once a Java programer, always a Java programer)

I also feel like maybe we should think about moving away from the TSDB things to make it easier to swap which field we're using?

@neptunian
Copy link
Contributor

Yes it would show N/A. Though this would also affect the Metrics app where I changed the fields as well. Same with the Metrics Inventory Host Details page.

My preference would be to move forward with new fields and use UI helpers to educate users, but we'd need some design help to figure out if that's possible/ok. Small "i" icons next to table headers, for example, that explain the field used to power a given column or row in any metrics presentation, would likely be enough to help avoid SDHs and entice people to upgrade or, at the very least, ignore the NA / "-" values since they understand why the have them.

I'm not sure this would be easy to accomplish with the popover on the waffle map. I don't think there is room there. I imagine there might be other reasons in these various areas why there could be "N/A", "-", or "0" (they all seem to be have different values in this same scenario) so I think we'd need to update them to all say N/A when the field does not exist and not just because there's no data in the field or its zero and only show the icon in that situation if that's what we choose to do.

This solution doesn't address the problem that the value is not really correct, though. If my time window is a couple of months and I have data that's in the "old" field but also date in the new ECS field it will not take the average of any of the old field data. If we're okay with that I think the messaging should probably just be generic and always there whether or not they have "N/A" so they are aware the value may not be accurate. Since in some cases the chart will be there to show where the data has started, perhaps this isn't necessary.

As an alternative, what if we don't do the UI helper stuff and just rely on the docs? After these docs are updated: https://www.elastic.co/guide/en/observability/current/host-metrics.html with the new field, when an SDH comes in we could refer them here saying 8.4 uses a different field with an added note of what version of metricbeat it was introduced. Could also add to the Inventory page docs. Having the added UI seems like it could be messy if we started doing this in every scenario like this or even just this one.

@katefarrar Any thoughts?

@jasonrhodes
Copy link
Member

If we're okay with that I think the messaging should probably just be generic and always there whether or not they have "N/A" so they are aware the value may not be accurate.

Yeah, that's what I was thinking re: the "i" icon idea: that it would always be there to explain the field that drives that data. It's not really meant to explain the "N/A" but rather to explain the field's data source, which can then be used to track down why data may seem inaccurate, missing, etc. in many different cases.

As an alternative, what if we don't do the UI helper stuff and just rely on the docs?

This makes sense to me, too, and ties in with the idea of just providing information to the user about the fields in use from within the UI. That may not be possible in every place, depending on the UI elements used, but it'd be nice to develop conventions for sharing that information with users as we continue to grow these UIs.

@katefarrar
Copy link
Contributor

As an alternative, what if we don't do the UI helper stuff and just rely on the docs?

This sounds good to me. Let me know if I can help with anything related to this. Thanks!

@neptunian
Copy link
Contributor

@kaiyan-sheng I notice system.network.in.bytes and system.network.out.bytes continue to be sent by metricbeat. Will these not be supported at some point in favor of the new ECS fields? Since they are currently still in use it seems like one option is to just continue using these fields in order to avoid the missing data of switching to the different ECS field.

@neptunian
Copy link
Contributor

Changing of the field also affects the Inventory Threshold Rule for Inbound/Outbound traffic
Screen Shot 2022-06-15 at 2 28 32 PM

@kaiyan-sheng
Copy link
Contributor

@neptunian I'm not sure about when to remove these fields. We suppose to remove these fields at 8.0 but we missed it. @fearful-symmetry Do you know if we can remove these fields for 8.4?

@jasonrhodes
Copy link
Member

@kaiyan-sheng I notice system.network.in.bytes and system.network.out.bytes continue to be sent by metricbeat. Will these not be supported at some point in favor of the new ECS fields? Since they are currently still in use it seems like one option is to just continue using these fields in order to avoid the missing data of switching to the different ECS field.

I think this would basically mean never getting to take advantage of the new optimized fields that we spent a lot of time designing and implementing. A user can create their own dashboard with the old data since it's still there (as long as MB continues to ship it), but we probably don't want to continue to penalize new users. @kaiyan-sheng I think it's likely even more important that MB continue to ship them than it is for the UI to continue to use them, because the existence of the fields guarantees that old alerts continue to work, user-defined dashboards continue to work, etc.

We could create a complicated fork in the curated UI code for every field change like this, but I don't see how that will be sustainable. I think the 80% case is that users don't need to compare these data in our charts back across long periods of time, so I'd prefer for us to move forward with these kinds of changes when they become available. Does that make sense?

@neptunian
Copy link
Contributor

neptunian commented Jun 16, 2022

Agreed. I think I was more curious as to why these non ECS fields continue to exist, but that explanation makes sense. Thanks!

@kaiyan-sheng
Copy link
Contributor

@jasonrhodes Yep I agree. Removing the old fields will be a breaking change so we probably will have to wait for 9.0. But in the mean time, if we can get the UI to start using the new fields soon, that would be awesome.

@neptunian
Copy link
Contributor

neptunian commented Jun 23, 2022

@kaiyan-sheng Should we also be replacing docker.network.outbound.bytes and docker.network.inbound.bytes with this same field? host.network.ingress.bytes and host.network.egress.bytes? We handle them similarly to system.network.inbound.bytes and system.network.outbound.bytes. I noticed the docs for the ECS fields say "Host types include hardware, virtual machines, Docker containers, and Kubernetes nodes."

@kaiyan-sheng
Copy link
Contributor

@neptunian Hmm good point, we are planning on adding a separate group of fields for containers and that will include docker there. Here is the issue for that: elastic/beats#22179

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes for quality problems that affect the customer experience Feature:Metrics UI Metrics UI feature Feature:Observability Overview Relaunch of the Observability Overview Page Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services Team:Unified observability v8.4.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants