Prometheus graphs traffic drops regularly #169
Comments
I would be surprised if this is coming from the routers if it's happening on all devices. Could you share the prometheus output config used for gNMIc? I would also look at the promQL expression used in Grafana; I assume it uses the …
I'd agree they are every 50 minutes. My output config is largely the default.
My promQL expression is using the example query.
Maybe worth checking the raw collected metrics over the window where the dip is seen, to see what gnmic actually received from the NE.
Could you try with …? The second set of graphs shows that the dips are not in sync across devices while maintaining the ~50 minute dip interval. I would then manually check the values sent by the device for potential lower counter values, as @hellt pointed out above.
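A quick way to do that manual check is to scan the collected samples for values that go backwards. This is a hypothetical sketch (the `find_counter_drops` helper and the sample data are made up for illustration, not part of gnmic):

```python
# Sketch: scan a series of exported counter samples for values lower than
# their predecessor, which would explain dips in a rate-style graph.

def find_counter_drops(samples):
    """Return (index, previous, current) for every sample lower than its predecessor."""
    drops = []
    for i in range(1, len(samples)):
        if samples[i] < samples[i - 1]:
            drops.append((i, samples[i - 1], samples[i]))
    return drops

octets = [1000, 1100, 1250, 900, 1400]  # illustration data; 900 is a suspicious lower value
print(find_counter_drops(octets))  # -> [(3, 1250, 900)]
```

Running this over the raw values received from each device during a dip window would show whether the device itself sent a lower counter value.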
Are the timestamps sent by the devices in nanoseconds?
They are sent by the device in milliseconds - example timestamp 1688578740000. I confirmed that both the server running gnmic and the server running prometheus are sync'd to the same time source, and the reported time between those servers and the Juniper devices is correct.
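The unit mismatch is easy to see from the magnitude of the value alone: interpreting a millisecond timestamp as nanoseconds puts the date back near the 1970 epoch. A small sketch using the timestamp quoted above (the `interpret` helper is illustrative, not a gnmic function):

```python
import datetime

def interpret(ts):
    """Decode the same raw value as milliseconds and as nanoseconds."""
    as_ms = datetime.datetime.fromtimestamp(ts / 1_000, tz=datetime.timezone.utc)
    as_ns = datetime.datetime.fromtimestamp(ts / 1_000_000_000, tz=datetime.timezone.utc)
    return as_ms, as_ns

raw = 1688578740000  # the timestamp reported by the device
as_ms, as_ns = interpret(raw)
print(as_ms.year)  # 2023 when read as milliseconds
print(as_ns.year)  # 1970 when read as nanoseconds
```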
gNMIc's Prometheus output converts timestamps from nanoseconds to milliseconds. If the device sends timestamps in ms, there is a processor to convert them to ns:

```yaml
processors:
  ts-ms-to-ns:
    event-starlark:
      source: |
        def apply(*events):
          for e in events:
            # change timestamp to ns
            e.timestamp = int(e.timestamp * 1000 * 1000)
          return events
```

Then add the processor name under the prometheus output:

```yaml
outputs:
  prometheus:
    type: prometheus
    listen: :9804
    path: /metrics
    event-processors:
      - ts-ms-to-ns # <-- new processor
      - interface-descriptions
    service-registration:
      address: 10.249.0.250:8500
      service-address: 10.249.1.215
```
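Since Starlark is largely a subset of Python, the processor's arithmetic can be sanity-checked locally. The `Event` class below is a stand-in for gnmic's event type, used only to exercise the `apply` function:

```python
# Stand-in for gnmic's event object; only the timestamp field matters here.
class Event:
    def __init__(self, timestamp):
        self.timestamp = timestamp

# Same logic as the ts-ms-to-ns Starlark processor above.
def apply(*events):
    for e in events:
        # change timestamp to ns
        e.timestamp = int(e.timestamp * 1000 * 1000)
    return events

ev = Event(1688578740000)  # milliseconds, as reported by the device
apply(ev)
print(ev.timestamp)  # -> 1688578740000000000 (nanoseconds)
```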
I'm using gnmic to monitor various Juniper MX devices. I'm subscribing to interface metrics, and the sample interval is configured as 10s. My output is prometheus, which is configured to scrape every 20s.
Once I graph my data, there is a routine drop in traffic across every device and every interface roughly every 60 minutes. At this point I'm unsure whether it's something the Juniper device is doing, something gnmic is doing, or where to begin troubleshooting.
Example graphs from 2 different devices, different hardware (MX10k vs MX204), running different Junos software versions.