
adding freertr to exporters.md as it just got a built-in, user-configurable exporter for all the show commands it has #1759

Closed
mc36 wants to merge 1 commit into prometheus:master from mc36:patch-1



@mc36 mc36 commented Oct 8, 2020

No description provided.

@mc36
Author

mc36 commented Oct 8, 2020

the exporter itself can be found here: http://sources.nop.hu/src/serv/servPrometheus.java
it's a generic exporter for freertr, the user can decide what he wants to expose from his router and how.
http://sources.nop.hu/misc/prometheus/ is an initial seed of things we find interesting to expose, and dashboards will also arrive here sooner or later...

@brian-brazil
Contributor

I can't find any Prometheus instrumentation in this code base from a quick look, nor any mention of the Java client which Java code should be using for direct instrumentation/exposition, nor the appropriate content-type if you were exposing by hand.

Can you explain more about what this is doing, and how you would use it to get Prometheus metrics exposed?

@mc36
Author

mc36 commented Oct 8, 2020

hi,
i've written a mail to the prom-users list explaining what's going on here,
let me copy-paste it here from https://groups.google.com/g/prometheus-users/c/oRw0ibfzV9M
maybe it did not appear as i'm a new member of that list... please read the details below....
thanks,
cs

hi,

i'm proud to announce the availability of the freeRouter (1) exporter.
it is integrated (2) into freeRouter itself as a regular server,
so we can say it's a native implementation.

the idea behind the implementation is to be as generic as it can.
it can export any show command available in the router, anywhere,
and the operator of the router can configure what he wants to
expose, as in the nrpe (3) responder case, but with metrics.

we already have an initial seed (4) of configurations in those
txt files that produce interesting graphs with the json files...

to see it in action, i'm showing you a simple example with
the sys.txt from (4). as you can see (5) the configuration
is applied by the operator. that particular command produces
the (6) output for him if he's looking at it from the router
cli. (7) shows you what will be sent out on the wire
(without the first "result:" line) when prometheus
asks for that particular set of metrics.

i'm writing to you to get your opinion about the implementation,
and if you see something that i should change/improve in order
to better integrate with prometheus.

i know that it could be hard to imagine the outputs that
are generated without having a running freerouter instance
and having all the protocols configured to it, so if you're
interested to see most of the (4) configs applied to a live
network, then i joined your irc channel with mc36 nick,
feel free to ask for an endpoint and i'll expose one.

also if you spot any generic issue in this small
example i just provided here, please reply right away!

i'm waiting for your opinions!

best regards,
cs

ps: i also submitted a pull request docs/exporters.md :)

1: http://freerouter.nop.hu/ , https://wiki.geant.org/display/RARE , https://rare-freertr.mp.ls/
2: http://sources.nop.hu/src/serv/servPrometheus.java
3: http://sources.nop.hu/src/serv/servNrpe.java
4: http://sources.nop.hu/misc/prometheus/

5:
services>show startup-config prom | include sys
metric gc prepend system_gc_
metric sys command sho watchdog sys | exc name
metric sys prepend system_
metric sys replace \s _
metric sys column 1 name _cnt

services>

6:
services>terminal tablemode fancy
services>show watchdog sys | exclude name
|~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~~~~~~~~~~~~~~~~~|
| category                   | value                |
| CommittedVirtualMemorySize | 7393935360           |
| FreePhysicalMemorySize     | 8156749824           |
| ProcessCpuLoad             | 0.007719799857040743 |
| ProcessCpuTime             | 250260000000         |
| SystemCpuLoad              | 0.09049320943531093  |
| SystemLoadAverage          | 1.84                 |
|____________________________|______________________|

services>

7:
services>show prometheus home sys | begin result
result:
# HELP system_CommittedVirtualMemorySize_cnt column 1 of sho watchdog sys | exc name
# TYPE system_CommittedVirtualMemorySize_cnt gauge
system_CommittedVirtualMemorySize_cnt 7389724672
# HELP system_FreePhysicalMemorySize_cnt column 1 of sho watchdog sys | exc name
# TYPE system_FreePhysicalMemorySize_cnt gauge
system_FreePhysicalMemorySize_cnt 8070684672
# HELP system_ProcessCpuLoad_cnt column 1 of sho watchdog sys | exc name
# TYPE system_ProcessCpuLoad_cnt gauge
system_ProcessCpuLoad_cnt 0.2727272727272727
# HELP system_ProcessCpuTime_cnt column 1 of sho watchdog sys | exc name
# TYPE system_ProcessCpuTime_cnt gauge
system_ProcessCpuTime_cnt 266800000000
# HELP system_SystemCpuLoad_cnt column 1 of sho watchdog sys | exc name
# TYPE system_SystemCpuLoad_cnt gauge
system_SystemCpuLoad_cnt 0.16666666666666666
# HELP system_SystemLoadAverage_cnt column 1 of sho watchdog sys | exc name
# TYPE system_SystemLoadAverage_cnt gauge
system_SystemLoadAverage_cnt 1.71

services>
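
The transformation described in (5) to (7) can be sketched roughly as follows. This is only an illustration in Python, not the code in servPrometheus.java: the function name `render_metrics`, its parameters, and the assumption that table cells are separated by two or more spaces are all hypothetical. It mimics the `prepend`, `replace` and `column ... name` settings from the configuration in (5).

```python
import re

def render_metrics(table_text, command, prepend, replace, column, suffix):
    """Turn rows of a show-command table into Prometheus text exposition lines.

    Hypothetical helper: cells are assumed to be separated by 2+ spaces,
    which is a guess made for this sketch, not freeRouter's actual parsing.
    """
    pattern, replacement = replace
    lines = []
    for row in table_text.strip().splitlines():
        cells = re.split(r"\s{2,}", row.strip())
        if len(cells) <= column:
            continue  # skip rows that lack the requested column
        # prepend + cleaned-up first cell + suffix gives the metric name
        name = prepend + re.sub(pattern, replacement, cells[0]) + suffix
        lines.append(f"# HELP {name} column {column} of {command}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {cells[column]}")
    return "\n".join(lines) + "\n"

# Two rows taken from the table in (6), re-spaced for this sketch.
SAMPLE = """\
ProcessCpuTime     250260000000
SystemLoadAverage  1.84
"""

print(render_metrics(SAMPLE, "sho watchdog sys | exc name",
                     "system_", (r"\s", "_"), 1, "_cnt"))
```

The point of the sketch is that the metric names are derived entirely from operator-supplied settings and the first column of the table, which is why the naming ends up in the operator's hands.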

@brian-brazil
Contributor

That's not exposing data in a form that Prometheus can scrape; to be part of this list it'd need to expose it over HTTP.

The _cnt on all those metrics is also confusing.
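
For readers wondering what "exposing by hand" over HTTP with the appropriate content-type could look like, here is a minimal sketch using only the Python standard library. It is not the freeRouter implementation; the handler class, the `build_payload` helper and the sample metric are made up for illustration. The content type shown is the one used by the Prometheus text exposition format, version 0.0.4.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Content type of the Prometheus text exposition format.
CONTENT_TYPE = "text/plain; version=0.0.4; charset=utf-8"

def build_payload(samples):
    """Render a {name: value} dict as gauge samples (hypothetical helper)."""
    lines = []
    for name, value in samples.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return ("\n".join(lines) + "\n").encode("utf-8")

class MetricsHandler(BaseHTTPRequestHandler):
    # Illustrative static sample; a real exporter would collect fresh values.
    samples = {"system_SystemLoadAverage": 1.84}

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = build_payload(self.samples)
        self.send_response(200)
        self.send_header("Content-Type", CONTENT_TYPE)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet

def make_server(port=0):
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), MetricsHandler)

# To try it: make_server(9001).serve_forever(), then curl http://127.0.0.1:9001/metrics
```

Port 9001 in the usage comment simply mirrors the port that appears in the curl commands in this thread.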

@mc36
Author

mc36 commented Oct 8, 2020

hi,
yess, this is just a show output from the router for the operator of the router,
exactly the same as what is sent over the http stream, you can spot it in the servPrometheus.java mentioned above...
so that java class constructs a valid reply and my local prometheus instance scrapes all the stats i expose
from all my routers around, as you can see below. it's a small sized network, all exposing some metrics...
there is a public playground: "ssh dl.nop.hu" with any user/pass will bring you to a menu, press L in there.
it takes you to a live looking glass server ( https://en.wikipedia.org/wiki/Looking_Glass_server ) which has been scraped for days now.
you can issue any cisco-like commands on that router, but here are my suggestions, as you're interested not in the
routing metrics of my dn42.net peerings but in how i expose them to my prometheus.
show startup-config promet - the applied, network-specific configs from the http://sources.nop.hu/misc/prometheus/ samples.
show prometheus home - the above list without the configs, but with the running stats on them

show prometheus home | begin result
the response i send back to prometheus, before the http headers are applied.
that is, the payload of the http response for the given metric.
there is one exception: via the router cli you cannot get /metrics,
which includes everything from the list except what the operator configured
to exclude.

and finally, thanks for spotting the _cnt suffix in the small sample,
these came from the initial experiments and i had hoped that i removed them all.
now i did.... :)

regards,
cs

Endpoint                          State  Labels                                     Last Scrape  Scrape Duration  Error
http://10.1.11.198:9001/metrics   up     instance="10.1.11.198:9001" job="router"   6.147s ago   374ms
http://10.1.11.1:9001/metrics     up     instance="10.1.11.1:9001" job="router"     29.45s ago   245.3ms
http://10.10.10.10:9001/metrics   up     instance="10.10.10.10:9001" job="router"   23.828s ago  225.1ms
http://10.10.10.11:9001/metrics   up     instance="10.10.10.11:9001" job="router"   4.746s ago   266.6ms
http://10.10.10.15:9001/metrics   up     instance="10.10.10.15:9001" job="router"   10.19s ago   1.799s
http://10.10.10.17:9001/metrics   up     instance="10.10.10.17:9001" job="router"   1.892s ago   524.9ms
http://10.10.10.180:9001/metrics  up     instance="10.10.10.180:9001" job="router"  25.17s ago   1.115s
http://10.10.10.18:9001/metrics   up     instance="10.10.10.18:9001" job="router"   4.346s ago   716ms
http://10.10.10.199:9001/metrics  up     instance="10.10.10.199:9001" job="router"  13.33s ago   443.8ms
http://10.10.10.1:9001/metrics    up     instance="10.10.10.1:9001" job="router"    9.046s ago   203.4ms
http://10.10.10.20:9001/metrics   up     instance="10.10.10.20:9001" job="router"   6.02s ago    158.1ms
http://10.10.10.227:9001/metrics  down   instance="10.10.10.227:9001" job="router"  9.152s ago   4.954ms          Get "http://10.10.10.227:9001/metrics": dial tcp 10.10.10.227:9001: connect: no route to host
http://10.10.10.26:9001/metrics   up     instance="10.10.10.26:9001" job="router"   17.752s ago  419.8ms
http://10.10.10.28:9001/metrics   up     instance="10.10.10.28:9001" job="router"   7.588s ago   196.8ms
http://10.10.10.2:9001/metrics    up     instance="10.10.10.2:9001" job="router"    9.508s ago   149.8ms
http://10.10.10.4:9001/metrics    up     instance="10.10.10.4:9001" job="router"    26.085s ago  333.5ms
http://10.10.10.5:9001/metrics    up     instance="10.10.10.5:9001" job="router"    16.221s ago  178.2ms
http://10.10.10.8:9001/metrics    up     instance="10.10.10.8:9001" job="router"    14.894s ago  468.4ms
http://10.26.26.2:9001/metrics    up     instance="10.26.26.2:9001" job="router"    18.682s ago  471.1ms
http://10.5.1.10:9001/metrics     up     instance="10.5.1.10:9001" job="router"     18.948s ago  198.3ms

@mc36
Author

mc36 commented Oct 8, 2020

typo:
show prometheus home | begin result
is
show prometheus home pick-one-from-the-list | begin result

@brian-brazil
Contributor

I'm not following, does this expose metrics over HTTP or not? Is there documentation explaining how to use this?

@mc36
Author

mc36 commented Oct 8, 2020

yesss for sure it replies via http and prometheus scrapes it correctly!
and yess there is some documentation available at https://wiki.geant.org/display/RARE
or https://rare-freertr.mp.ls/ about the project, and https://wiki.geant.org/pages/viewrecentblogposts.action?key=RARE
will have a detailed technical blog post on how to use it with prometheus.
i've uploaded the output of the
curl http://10.10.10.1:9001/metrics >zzz 2>&1
to https://filebin.net/7qog72xcxm7mged7
that ip is a core router here, so it exposes a lot of different metrics.
i know that it's easy to get lost but please-please reread this conversation carefully!

@brian-brazil
Contributor

Is there any documentation about how a user would enable this and use it with Prometheus? Those links contain nothing about this topic.

Many of those metrics appear to have what should be labels in metric names, for example bfd6_2001_db8_57__2_state, iface_hwcntr_hairpin71_state, and routing_bgp4_65535_neighbors. This limits the usefulness of these metrics, so should be moved to labels.

@mc36
Author

mc36 commented Oct 8, 2020

hi,
first of all, thank you again for spotting the label issue. we went through our current configs and found some measurements
that can be labeled: int[sw|hw].* now exports one metric per interface with label dir=[rx|tx|dr|st] according to the direction.
we found another one: vrf.[v4|v6] now exports label afi=[uni|multi|flow|label|conn|ifc] according to the address family.
but the routing_.[0-9]+ neigh|ifaces|_computed_unicast|_computed_multicast|_computed_flowspec|_computed_changed|_redisted_unicast|_redisted_multicast|_redisted_flowspec|_redisted_changed
proposal was rejected by the guy who does the dashboards in the project, as he was not able to get the required outputs with the labels applied, so we reverted that one to distinct metrics.
the rest of the current exports export one column per table so not suitable for labeling.
according to the above, please find the new output of
curl http://10.10.10.1:9001/metrics >zzz 2>&1
at https://filebin.net/1ifp2b4tx10h02dh
to get all of this, i just committed these changes:
mc36/freeRtr@fa2c495
unfortunately it's not a clean commit, as the last 3 files (src/user/user*) belong to another change of the day...
and a planned change: i noticed that you support gzip, so i'll add it to the exporter as a configurable option, with slower wan links in mind....
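
As a rough illustration of the planned gzip option, the sketch below compresses a scrape response only when the client advertises gzip in its Accept-Encoding header. It is written in Python for brevity and is not taken from freeRouter; `encode_body` is a hypothetical helper, and it deliberately ignores q-values such as `gzip;q=0`.

```python
import gzip

def encode_body(body: bytes, accept_encoding: str):
    """Return (payload, extra_headers) for a scrape response.

    Simplification for this sketch: any mention of "gzip" in the
    Accept-Encoding header is treated as acceptance; q-values are ignored.
    """
    offered = [token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")]
    if "gzip" in offered:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}

# Repetitive exposition text compresses well, which is the point on slow links.
body = b'testing_iface_rx{ifc="bundle9_11"} 5635627\n' * 500
payload, headers = encode_body(body, "gzip, deflate")
print(len(body), "->", len(payload), headers)
```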

regarding the documentation, currently we have no user's guide on the above mentioned tech blog, but there
will be one soon, as this development happens within a https://geant.org/ working group, and it's a requirement from
their side to have it documented, to be immediately usable by the research & education institutions europe-wide.
thanks
cs

@mc36
Author

mc36 commented Oct 8, 2020

typo:
the rest of the current exports export one column per table so not suitable for labeling.
-->
the rest of the other exports expose one column per table so not suitable for labeling.

@brian-brazil
Contributor

> now exports one metric per interface with label dir=[rx|tx|dr|st] according to the direction.

Actually for direction the metric name is usually the right thing to do, as it's rare you'd want to aggregate across both directions as almost everything is duplex these days.
I was more pointing at the interface names, IP addresses and ASNs. To be usable those pretty much have to be labels.

@mc36
Author

mc36 commented Oct 9, 2020

hi,
thank you again for the clarifications, with the
mc36/freeRtr@d947c49
and
mc36/freeRtr@619221f
clean commits i made the exporter a bit more configurable; hopefully i now cover all the
possibilities we could have.

we already have some sample configs at http://sources.nop.hu/misc/prometheus/ in the txt files
which use the same syntax as (1), but these need to be aligned to the given asn's needs.
at the least, operators must put their own asn in the show bgp commands, use the igp.txt for the igp
they have internally, and so on... and yess, the prometheus exporter is natively part of the router,
but it cannot guess which of the configured routing protocols, peers, vrfs and so
on need to be exported and which are not interesting at that node. for example every
router in an asn keeps a connection toward the route reflectors ( https://en.wikipedia.org/wiki/Border_Gateway_Protocol#Route_reflectors )
and it's enough to check rr consistency on some selected client nodes, because the rrs replicate
the same information toward all of their clients and it's an expensive operation to compare
all the attributes of 1m routes from the 2 rrs. in the case of igp, we can monitor the whole network
from a single node, as link-state routing protocols ( https://en.wikipedia.org/wiki/Link-state_routing_protocol ) provide us the
topology table, that is, each node describes all of its neighbors. this is easier to have and also works
in a mixed-vendor environment, but if we're talking about a mostly rare/freertr network
then a per-node igp export can expose more details that are hidden from the topology
table, for example the per-interface neighbor count.
anyway iface, vrf and igp already have useful grafana dashboards for a noc
( https://en.wikipedia.org/wiki/Network_operations_center ) for both mixed and mostly rare/freertr
environments, and the others will follow. when all is done, the tech blog will have an entry
demonstrating the usage, but these are just samples and users must alter their configs according
to their networks.

but. i've cooked some examples (1) for you that produce human-friendly amount of metrics
and demonstrate the possibilities of the exporter. and the rest is just a question of the
router configuration, which is always network specific as described above.
so the examples (1) expose 4 new groups of metrics to prometheus under /metrics
(optionally they could be scraped one by one under /test0..3 and can be excluded
from the /metrics scrapes if needed...)
i limited the router show command in router config to produce small enough (2)
amount of metrics. this output will be transformed and sent out over the wire by the
java class, but could be locally queried so allow me to show you the first 2 via
router cli (3) (4) query and 2 via curl (5) (6). as you can see, this is just some
user configurable text transformation, applied to the same router show commands
that are also subject to some other transformations according to the router cli needs.
so if i understood you correctly, your preference would be (5) here, and we have
no opposing opinion on that, so we'll update the dashboards and the samples.

and we'll take your very useful clarifications into account about which fields should be labels or part
of the metric name, but as you can see that is outside the scope of the exporter java class. please
let me introduce you to @frederic-loui , he is dealing with the dashboards for the rare project, and he's
already part of the https://nmaas.eu/ project as well. please continue the label vs metric name conversation
with him, and allow me to continue talking with you exclusively about the java exporter itself.

we're waiting for your useful comments!

thanks,
cs

1:

sid#show config-differences                                                                         
2020-10-09 05:47:08
server prometheus home
 metric test0 command sho inter summ | inc bundle9
 metric test0 prepend testing_iface_
 metric test0 replace \. _
 metric test0 column 2 name _rx
 metric test0 column 3 name _tx
 metric test1 command sho inter summ | inc bundle9
 metric test1 prepend testing_iface_
 metric test1 replace \. _
 metric test1 column 2 name * dir="rx"
 metric test1 column 3 name * dir="tx"
 metric test2 command sho inter summ | inc bundle9
 metric test2 prepend testing_iface_
 metric test2 name 0 ifc=
 metric test2 replace \. _
 metric test2 column 2 name rx
 metric test2 column 3 name tx
 metric test3 command sho inter summ | inc bundle9
 metric test3 prepend testing_iface
 metric test3 name 0 ifc=
 metric test3 replace \. _
 metric test3 column 2 name * dir="rx"
 metric test3 column 3 name * dir="tx"
 exit

sid#     

2:

sid#show interfaces summary | include bundle9                                                       
2020-10-09 05:48:18
bundle9        up     6451894  5050121  592
bundle9.11     up     5059155  4470699  0
bundle9.12     up     1218749  414804   0

sid#  
sid#terminal tablemode fancy                                                                        
2020-10-09 05:55:24
sid#show interfaces summary | include bundle9                                                       
2020-10-09 05:55:31
 | bundle9       | up    | 8330640 | 6387173 | 592    |
 | bundle9.11    | up    | 6539658 | 5678005 | 0      |
 | bundle9.12    | up    | 1566338 | 497904  | 0      |

sid#

3:

sid#show prometheus home test0 | begin result                                                       
2020-10-09 05:50:32
result:
# HELP testing_iface_bundle9_11_rx column 2 of sho inter summ | inc bundle9
# TYPE testing_iface_bundle9_11_rx gauge
testing_iface_bundle9_11_rx 5526793
# HELP testing_iface_bundle9_11_tx column 3 of sho inter summ | inc bundle9
# TYPE testing_iface_bundle9_11_tx gauge
testing_iface_bundle9_11_tx 4832959
# HELP testing_iface_bundle9_12_rx column 2 of sho inter summ | inc bundle9
# TYPE testing_iface_bundle9_12_rx gauge
testing_iface_bundle9_12_rx 1329064
# HELP testing_iface_bundle9_12_tx column 3 of sho inter summ | inc bundle9
# TYPE testing_iface_bundle9_12_tx gauge
testing_iface_bundle9_12_tx 442281

4:

sid#show prometheus home test1 | begin result                                                       
2020-10-09 05:50:37
result:
# HELP testing_iface_bundle9_11 column 2 of sho inter summ | inc bundle9
# TYPE testing_iface_bundle9_11 gauge
testing_iface_bundle9_11{dir="rx"} 5530425
testing_iface_bundle9_11{dir="tx"} 4843310
# HELP testing_iface_bundle9_12 column 2 of sho inter summ | inc bundle9
# TYPE testing_iface_bundle9_12 gauge
testing_iface_bundle9_12{dir="rx"} 1336110
testing_iface_bundle9_12{dir="tx"} 443335

sid# 

5:

mc36@acer:~$ curl http://10.10.10.227:9001/test2
# HELP testing_iface_rx column 2 of sho inter summ | inc bundle9
# TYPE testing_iface_rx gauge
testing_iface_rx{ifc="bundle9_11"} 5635627
# HELP testing_iface_tx column 3 of sho inter summ | inc bundle9
# TYPE testing_iface_tx gauge
testing_iface_tx{ifc="bundle9_11"} 4935853
testing_iface_rx{ifc="bundle9_12"} 1360412
testing_iface_tx{ifc="bundle9_12"} 449185

6:

mc36@acer:~$ curl http://10.10.10.227:9001/test3
# HELP testing_iface column 2 of sho inter summ | inc bundle9
# TYPE testing_iface gauge
testing_iface{ifc="bundle9_11",dir="rx"} 5637267
testing_iface{ifc="bundle9_11",dir="tx"} 4941068
testing_iface{ifc="bundle9_12",dir="rx"} 1361870
testing_iface{ifc="bundle9_12",dir="tx"} 449601
mc36@acer:~$ 
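
The label-based layout of (5) can be sketched as follows. This is an illustrative Python helper, not the exporter's code: `render_labeled` and its arguments are hypothetical, and the sample rows reuse the values seen in the curl output of (5). One deliberate difference from that output is that all samples of one metric family are emitted together under a single TYPE line, which is how the Prometheus text exposition format expects families to be grouped.

```python
def render_labeled(rows, prefix, columns, label="ifc"):
    """Render table rows as one metric family per column, keyed by a label.

    rows:    list of cell lists; cell 0 is the interface name
    columns: {column_index: metric_suffix}, e.g. {2: "rx", 3: "tx"}
    """
    out = []
    for index, suffix in columns.items():
        metric = prefix + suffix
        out.append(f"# TYPE {metric} gauge")
        for cells in rows:
            ifc = cells[0].replace(".", "_")  # same effect as 'replace \. _'
            out.append(f'{metric}{{{label}="{ifc}"}} {cells[index]}')
    return "\n".join(out) + "\n"

# Hypothetical snapshot of 'show interfaces summary' rows for two subinterfaces.
ROWS = [
    ["bundle9.11", "up", "5635627", "4935853", "0"],
    ["bundle9.12", "up", "1360412", "449185", "0"],
]

print(render_labeled(ROWS, "testing_iface_", {2: "rx", 3: "tx"}))
```

With the interface in a label, a query can aggregate or filter across interfaces without matching on metric names.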

@mc36
Author

mc36 commented Oct 9, 2020

hi,
just a quick update on the last paragraph: finally floui decided to comply with your decision,
so our default export format now uses labels for ip addresses and interfaces. it was again a clean commit:
mc36/freeRtr@131832c
scraping works as expected, we'll be happy with the new format soon..
attaching hereby a sample output of some of the metrics we expose
now using the http://sources.nop.hu/misc/prometheus/ skeletons.
zzz.txt
awaiting your opinion!
thanks,
cs

@brian-brazil
Contributor

I'm concerned that there's no docs on how to use this, and even then it seems to require far more work on the part of the user than it should as this is more exposing a metrics conversion framework than metrics.

The most configuration a user should have to do for typical metrics is enabling/disabling specific sets of metrics if they're expensive. They shouldn't have to decide how the metrics are named, or what goes in labels or not - all of that should be determined by the developers and hardcoded. Doing otherwise is not only unnecessary complexity for users, but also prevents sharing dashboards and rules by users.

5 is the closest to what the metrics should look like label wise, though testing_iface_tx is not very informative as a metric name. Whether that is bytes, packets, or something else should also be mentioned in the name.

@beorn7 beorn7 added the "exporters and integrations" label Oct 12, 2020
@mc36
Author

mc36 commented Oct 13, 2020

hi,

first of all, i would like to thank you for spotting the interface
related metric names, now we expose both packets and bytes and
this info is in the metric names, as per your suggestion.

finally we got the other missing pieces you mentioned:
here is the initial set of dashboards we have now:
https://grafana.com/grafana/dashboards?search=freeRouter
and a post on how to configure the metric exposition
in rare/freertr for prometheus to be able to collect them:
https://wiki.geant.org/pages/viewpage.action?pageId=154995651

you can use the same dashboard for all the link state routing protocols
(ospf, isis, lsrp) as they expose the same set of metrics but from different
protocols. this typically covers the network core. another dashboard
covers the distance vector protocols, typically used for provider to
customer edge routing. bgp is one and indivisible in this context. :)
so we can say the dashboards are generic. each has the required
freertr config on its grafana.com page, which in most cases is a simple
copy-paste to apply to freertr. in some cases you need to put in your
own asn or so, but before anything can be exposed, you have to configure
the given protocol in freertr to match your existing network. i mean,
we cannot expose bgp metrics until bgp is up and running on freertr.
as you can see below, the prometheus config is a small portion of a freertr
configuration. obviously the routing protocols are the bigger part;
copy-pasting the prometheus configs afterwards, and maybe putting in the appropriate
network-dependent ids, is easy-peasy and needs to be figured out once for
a given network. afterwards the operator can use the same freertr prometheus
config for all the network nodes he administers.

services>show running-config all | count
6266 lines, 19547 words, 162823 characters

services>show running-config prometheus | count
285 lines, 1660 words, 10051 characters

services>

so imho this exporter does not put more configuration work
on the operator than, for example, your official snmp exporter.
https://github.com/prometheus/snmp_exporter
for that to work, you have to tell the exporter which
mibs to harvest and with what credentials. even a bit more, as
it has a dedicated config file generator, which also has a
config file with the above mentioned options, and some more...
and on the other hand, all this info also needs to be configured
on the polled router. even more, as snmp stacks have snmp
users and views, that is, the oids you want to poll need to be
explicitly allowed for the credentials on the polled router.

and we're doing similar things here except that we're the router,
so there is no intermediate station who exposes to prometheus.

your opinion that it's a metric exposer framework in freertr is correct.
if we exposed all the counters directly from the classes, we would lose
the user's choice of which parts of freertr he's interested in, and we
would need to uncover the classes' internals to the prometheus exporter,
and every time we introduced a new thing, the exporter itself would need to be
touched as well.
instead, we have this exporter framework, and almost the same freertr configs
can drive streaming telemetry too, and if the need arises, we can add others.
(and i have plans to remove the counter->text->binary conversion steps)

and similarly in the snmp case, you won't poll all the oids at once, as most
vendor implementations strictly rate limit snmp so that won't work at all,
and most of those metrics are uninteresting to the operator anyway.

and an additional reason for the framework is that most of the show commands in
freertr simply display the counters, but some perform computations. for example
we have a specific show command to display only those routes that are advertised
by the remote router as directly connected. in this way we can perform interface up/down
checks on remote nodes from a single freertr instance. this is a cheap calculation,
but this specific list is not maintained all the time in freertr, because it is
completely useless for its normal operation. this list is generated from the learned
prefixes when the show command is executed. (by the way, for prometheus, we export
the size of the above list, nrpe can report the missing ones to icinga...)

similarly, the bgp consistency check is an expensive calculation that is performed by
the show command, and that result is also not maintained all the time, and this
one cannot be, as the router alone cannot guess which peers to compare and which
path attributes to ignore during comparison.

and link state routing protocols also have a trick: we can run the dijkstra
algorithm with the root set to another router, in this way we can discover that
router's equal-cost multi-path possibilities, spot unidirectional links and so on...
this is also a specific show command expecting the foreign router id as input,
which is also unguessable from freertr alone.

and as we talked about streaming telemetry, those metrics are described in yang files,
for example here is a list of what a cisco router can expose now:
https://github.com/YangModels/yang/tree/master/vendor/cisco/xr/721
not all are metrics, just the _oper files, but it's still a huge list.
and telemetry also does not expose all the metrics, just as you do not snmp-walk
a router regularly. if you want the metrics, you need to subscribe
to a leaf and you'll get those. but first, the operator needs to
explicitly configure a list of allowed leafs, similarly to snmp views.
more about these topics here:
https://www.cisco.com/c/en/us/td/docs/iosxr/asr9000/telemetry/b-telemetry-cg-asr9000-61x/b-telemetry-cg-asr9000-61x_chapter_010.html#id_36166
https://www.cisco.com/c/en/us/td/docs/routers/ncs4000/software/configure/guide/configurationguide/configurationguide_chapter_0110110.html#con_1052651

so in my opinion, we're doing similar things here. that is, exposing
a user configurable set of metrics from an ocean...
a one rack unit router usually has 30+ interfaces. if it's a peering router, on each
interface we have at least one bgp peer, most of them telling us 800k routes nowadays...
and each route has some counters attached: how many packets and bytes were received,
sent and dropped on that route, both in hardware and software, how many times,
when and by which protocol the route was updated. just to name the most obvious ones.
below you can see an in-production freertr getting the feed from 2 in-production
route reflectors of the hungarian
https://en.wikipedia.org/wiki/National_research_and_education_network .
as you can see it deals with 1.6m routes. i'm sending these directly to my lab
(in the next example), where i send these routes over 6 hairpin peerings, finally
reaching 10m routes. this number is quite normal in the field, but from different asns,
each route having those counters on it. imho it's not a good idea to send them all
to a collector, as sending those routes directly via bgp takes 15 seconds!
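
To put a rough number on that argument, the sketch below multiplies the bgp database line counts quoted in this comment by the traffic counters named above. The figure of 12 counters per path (packets and bytes, each received/sent/dropped, each in hardware and software) is one reading of the sentence and should be treated as an assumption; the line counts also include a header line, so the results are approximations.

```python
# Assumed reading of the counters listed above:
# (packets, bytes) x (received, sent, dropped) x (hardware, software)
COUNTERS_PER_PATH = 2 * 3 * 2

def series_estimate(paths, counters_per_path=COUNTERS_PER_PATH):
    """Approximate number of time series if every path exported every counter."""
    return paths * counters_per_path

PRODUCTION_PATHS = 1_632_680  # 'show ipv4 bgp 1955 unicast database | count'
LAB_PATHS = 9_786_859         # 'show ipv4 bgp 2 unicast database | count'

for label, paths in (("production", PRODUCTION_PATHS), ("lab", LAB_PATHS)):
    print(f"{label}: {paths:,} paths -> {series_estimate(paths):,} time series")
```

Even under this conservative count, a single router would produce tens of millions of series per scrape, which is the reason given here for exporting only an operator-selected subset.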

thanks,
cs

kaputt.debrecen3#show ipv4 bgp 1955 unicast summary
as learn accept will done neighbor uptime
1955 0 0 0 0 195.111.97.91 never
1955 815560 815560 3 3 195.111.97.92 08:59:23
1955 1941 1941 3 3 195.111.97.93 08:59:23
1955 0 0 815562 815562 195.111.97.178 01:30:54
1955 815560 815560 3 3 195.111.97.179 08:59:23

kaputt.debrecen3#show ipv4 bgp 1955 unicast database | count
1632680 lines, 11318068 words, 105270058 characters

kaputt.debrecen3#show ipv4 bgp 1955 unicast database | first 15
prefix hop metric aspath
0.0.0.0/0 195.111.97.108 255/19999/0/0
0.0.0.0/0 195.111.97.108 255/19999/0/0
0.0.0.0/0 195.111.97.108 255/19999/0/0
1.0.0.0/24 195.111.97.108 255/150/0/0 21320 13335
1.0.0.0/24 195.111.97.108 255/150/0/0 21320 13335
1.0.4.0/22 195.111.97.108 255/160/0/1554 6939 4826 38803
1.0.4.0/22 195.111.97.108 255/160/0/1554 6939 4826 38803
1.0.4.0/24 195.111.97.108 255/150/0/0 21320 6461 6461 6461 6461 4637 4637 4637 4637 4637 4637 4637 4637 4637 4637 4637 1221 38803 38803 38803
1.0.4.0/24 195.111.97.108 255/150/0/0 21320 6461 6461 6461 6461 4637 4637 4637 4637 4637 4637 4637 4637 4637 4637 4637 1221 38803 38803 38803
1.0.5.0/24 195.111.97.108 255/150/0/0 21320 6461 6461 6461 6461 4637 4637 4637 4637 4637 4637 4637 4637 4637 4637 4637 1221 38803 38803 38803 38803
1.0.5.0/24 195.111.97.108 255/150/0/0 21320 6461 6461 6461 6461 4637 4637 4637 4637 4637 4637 4637 4637 4637 4637 4637 1221 38803 38803 38803 38803
1.0.6.0/24 195.111.97.108 255/150/0/0 21320 6461 6461 6461 6461 4637 4637 4637 4637 4637 4637 4637 4637 4637 4637 4637 1221 38803 38803 38803 38803
1.0.6.0/24 195.111.97.108 255/150/0/0 21320 6461 6461 6461 6461 4637 4637 4637 4637 4637 4637 4637 4637 4637 4637 4637 1221 38803 38803 38803 38803
1.0.7.0/24 195.111.97.108 255/150/0/0 21320 6461 6461 6461 6461 4637 4637 4637 4637 4637 4637 4637 4637 4637 4637 4637 1221 38803 38803 38803

sid#show ipv4 bgp 1955 unicast summary
as learn accept will done neighbor uptime
1955 0 0 815563 815563 2.2.2.2 00:34:43
1955 0 0 815563 815563 2.2.2.6 00:34:43
1955 0 0 815563 815563 2.2.2.10 00:34:43
1955 0 0 815563 815563 2.2.2.14 00:01:28
1955 0 0 815563 815563 2.2.2.18 00:01:20
1955 0 0 815563 815563 2.2.2.22 00:13:41
1955 815563 815563 0 0 10.10.10.25 01:30:56
1955 815563 815563 0 0 10.10.10.250 01:30:56

sid#show ipv4 bgp 2 unicast summary
as learn accept will done neighbor uptime
1955 815563 815563 0 0 2.2.2.1 00:34:45
1955 815563 815563 0 0 2.2.2.5 00:34:45
1955 815563 815563 0 0 2.2.2.9 00:34:45
1955 815563 815563 0 0 2.2.2.13 00:01:30
1955 815563 815563 0 0 2.2.2.17 00:01:22
1955 815563 815563 0 0 2.2.2.21 00:13:43

sid#show ipv4 bgp 2 unicast database | count
9786859 lines, 67878622 words, 631202178 characters

sid#show ipv4 bgp 2 unicast database | first 30
prefix hop metric aspath
0.0.0.0/0 195.111.97.108 255/19999/0/0
0.0.0.0/0 195.111.97.108 255/19999/0/0
0.0.0.0/0 195.111.97.108 255/19999/0/0
0.0.0.0/0 195.111.97.108 255/19999/0/0
0.0.0.0/0 195.111.97.108 255/19999/0/0
0.0.0.0/0 195.111.97.108 255/19999/0/0
0.0.0.0/0 195.111.97.108 255/19999/0/0
0.0.0.0/0 195.111.97.108 255/19999/0/0
0.0.0.0/0 195.111.97.108 255/19999/0/0
0.0.0.0/0 195.111.97.108 255/19999/0/0
0.0.0.0/0 195.111.97.108 255/19999/0/0
0.0.0.0/0 195.111.97.108 255/19999/0/0
1.0.0.0/24 195.111.97.108 255/150/0/0 21320 13335
1.0.0.0/24 195.111.97.108 255/150/0/0 21320 13335
1.0.0.0/24 195.111.97.108 255/150/0/0 21320 13335
1.0.0.0/24 195.111.97.108 255/150/0/0 21320 13335
1.0.0.0/24 195.111.97.108 255/150/0/0 21320 13335
1.0.0.0/24 195.111.97.108 255/150/0/0 21320 13335
1.0.0.0/24 195.111.97.108 255/150/0/0 21320 13335
1.0.0.0/24 195.111.97.108 255/150/0/0 21320 13335
1.0.0.0/24 195.111.97.108 255/150/0/0 21320 13335
1.0.0.0/24 195.111.97.108 255/150/0/0 21320 13335
1.0.0.0/24 195.111.97.108 255/150/0/0 21320 13335
1.0.0.0/24 195.111.97.108 255/150/0/0 21320 13335
1.0.4.0/22 195.111.97.108 255/160/0/1554 6939 4826 38803
1.0.4.0/22 195.111.97.108 255/160/0/1554 6939 4826 38803
1.0.4.0/22 195.111.97.108 255/160/0/1554 6939 4826 38803
1.0.4.0/22 195.111.97.108 255/160/0/1554 6939 4826 38803
1.0.4.0/22 195.111.97.108 255/160/0/1554 6939 4826 38803

sid#show ipv4 bgp 2 unicast database 1.0.4.0/22
vrf = v2:4
ipver = 4
rd = 0:0
prefix = 1.0.4.0/118
prefix network = 1.0.4.0
prefix broadcast = 1.0.7.255
prefix wildcard = ::3ff
prefix netmask = ffff:ffff:ffff:ffff:ffff:ffff:ffff:fc00
alternates = 12
alternate #0 attributes: ecmp=true best=false
type = bgp4 2
source = 2.2.2.1
validity = 0
segment routing index = 0
segment routing old base = 0
segment routing base = 0
segment routing size = 0
segment routing prefix = null
bier index = 0
bier old base = 0
bier base = 0
bier range = 0
bier size = 0-32
updated = 2020-10-13 05:28:34 (00:41:31 ago)
version = 1214
distance = 255
metric = 1554
ident = 377634743
hops = 0
interface = null
table = null
nexthop = 195.111.97.108
original nexthop = null
tag = 0
origin type = 0
local preference = 160
evpn label16 = 0
attribute as = 0
attribute value = n/a
tunnel type = 0
tunnel value = n/a
pmsi type = 0
pmsi label16 = 0
pmsi tunnel = n/a
accumulated igp = 0
bandwidth = 0
atomic aggregator = false
aggregator as = 0
aggregator router = null
originator = 195.111.97.108
cluster list = 195.111.97.179
as path (len=3) = 6939 4826 38803
standard community = 1955:10 1955:30 1955:31
extended community =
large community =
internal source = 1
local label = null
remote label =
alternate #1 attributes: ecmp=true best=false
type = bgp4 2
source = 2.2.2.1
validity = 0
segment routing index = 0
segment routing old base = 0
segment routing base = 0
segment routing size = 0
segment routing prefix = null
bier index = 0
bier old base = 0
bier base = 0
bier range = 0
bier size = 0-32
updated = 2020-10-13 05:28:34 (00:41:31 ago)
version = 1214
distance = 255
metric = 1554
ident = 377634744
hops = 0
interface = null
table = null
nexthop = 195.111.97.108
original nexthop = null
tag = 0
origin type = 0
local preference = 160
evpn label16 = 0
attribute as = 0
attribute value = n/a
tunnel type = 0
tunnel value = n/a
pmsi type = 0
pmsi label16 = 0
pmsi tunnel = n/a
accumulated igp = 0
bandwidth = 0
atomic aggregator = false
aggregator as = 0
aggregator router = null
originator = 195.111.97.108
cluster list = 195.111.97.179
as path (len=3) = 6939 4826 38803
standard community =
extended community =
large community =
internal source = 1
local label = null
remote label =
alternate #2 attributes: ecmp=true best=false
type = bgp4 2
source = 2.2.2.5
validity = 0
segment routing index = 0
segment routing old base = 0
segment routing base = 0
segment routing size = 0
segment routing prefix = null
bier index = 0
bier old base = 0
bier base = 0
bier range = 0
bier size = 0-32
updated = 2020-10-13 05:28:34 (00:41:31 ago)
version = 1214
distance = 255
metric = 1554
ident = 917880966
hops = 0
interface = null
table = null
nexthop = 195.111.97.108
original nexthop = null
tag = 0
origin type = 0
local preference = 160
evpn label16 = 0
attribute as = 0
attribute value = n/a
tunnel type = 0
tunnel value = n/a
pmsi type = 0
pmsi label16 = 0
pmsi tunnel = n/a
accumulated igp = 0
bandwidth = 0
atomic aggregator = false
aggregator as = 0
aggregator router = null
originator = 195.111.97.108
cluster list = 195.111.97.179
as path (len=3) = 6939 4826 38803
standard community = 1955:10 1955:30 1955:31
extended community =
large community =
internal source = 1
local label = null
remote label =
alternate #3 attributes: ecmp=true best=false
type = bgp4 2
source = 2.2.2.5
validity = 0
segment routing index = 0
segment routing old base = 0
segment routing base = 0
segment routing size = 0
segment routing prefix = null
bier index = 0
bier old base = 0
bier base = 0
bier range = 0
bier size = 0-32
updated = 2020-10-13 05:28:34 (00:41:31 ago)
version = 1214
distance = 255
metric = 1554
ident = 917880967
hops = 0
interface = null
table = null
nexthop = 195.111.97.108
original nexthop = null
tag = 0
origin type = 0
local preference = 160
evpn label16 = 0
attribute as = 0
attribute value = n/a
tunnel type = 0
tunnel value = n/a
pmsi type = 0
pmsi label16 = 0
pmsi tunnel = n/a
accumulated igp = 0
bandwidth = 0
atomic aggregator = false
aggregator as = 0
aggregator router = null
originator = 195.111.97.108
cluster list = 195.111.97.179
as path (len=3) = 6939 4826 38803
standard community =
extended community =
large community =
internal source = 1
local label = null
remote label =
alternate #4 attributes: ecmp=true best=false
type = bgp4 2
source = 2.2.2.9
validity = 0
segment routing index = 0
segment routing old base = 0
segment routing base = 0
segment routing size = 0
segment routing prefix = null
bier index = 0
bier old base = 0
bier base = 0
bier range = 0
bier size = 0-32
updated = 2020-10-13 05:28:34 (00:41:31 ago)
version = 1214
distance = 255
metric = 1554
ident = 1120954400
hops = 0
interface = null
table = null
nexthop = 195.111.97.108
original nexthop = null
tag = 0
origin type = 0
local preference = 160
evpn label16 = 0
attribute as = 0
attribute value = n/a
tunnel type = 0
tunnel value = n/a
pmsi type = 0
pmsi label16 = 0
pmsi tunnel = n/a
accumulated igp = 0
bandwidth = 0
atomic aggregator = false
aggregator as = 0
aggregator router = null
originator = 195.111.97.108
cluster list = 195.111.97.179
as path (len=3) = 6939 4826 38803
standard community = 1955:10 1955:30 1955:31
extended community =
large community =
internal source = 1
local label = null
remote label =
alternate #5 attributes: ecmp=true best=true
type = bgp4 2
source = 2.2.2.9
validity = 0
segment routing index = 0
segment routing old base = 0
segment routing base = 0
segment routing size = 0
segment routing prefix = null
bier index = 0
bier old base = 0
bier base = 0
bier range = 0
bier size = 0-32
updated = 2020-10-13 05:28:34 (00:41:31 ago)
version = 1214
distance = 255
metric = 1554
ident = 1120954401
hops = 0
interface = null
table = null
nexthop = 195.111.97.108
original nexthop = null
tag = 0
origin type = 0
local preference = 160
evpn label16 = 0
attribute as = 0
attribute value = n/a
tunnel type = 0
tunnel value = n/a
pmsi type = 0
pmsi label16 = 0
pmsi tunnel = n/a
accumulated igp = 0
bandwidth = 0
atomic aggregator = false
aggregator as = 0
aggregator router = null
originator = 195.111.97.108
cluster list = 195.111.97.179
as path (len=3) = 6939 4826 38803
standard community =
extended community =
large community =
internal source = 1
local label = null
remote label =
alternate #6 attributes: ecmp=true best=false
type = bgp4 2
source = 2.2.2.13
validity = 0
segment routing index = 0
segment routing old base = 0
segment routing base = 0
segment routing size = 0
segment routing prefix = null
bier index = 0
bier old base = 0
bier base = 0
bier range = 0
bier size = 0-32
updated = 2020-10-13 06:01:48 (00:08:16 ago)
version = 0
distance = 255
metric = 1554
ident = 1591735643
hops = 0
interface = null
table = null
nexthop = 195.111.97.108
original nexthop = null
tag = 0
origin type = 0
local preference = 160
evpn label16 = 0
attribute as = 0
attribute value = n/a
tunnel type = 0
tunnel value = n/a
pmsi type = 0
pmsi label16 = 0
pmsi tunnel = n/a
accumulated igp = 0
bandwidth = 0
atomic aggregator = false
aggregator as = 0
aggregator router = null
originator = 195.111.97.108
cluster list = 195.111.97.179
as path (len=3) = 6939 4826 38803
standard community = 1955:10 1955:30 1955:31
extended community =
large community =
internal source = 1
local label = null
remote label =
alternate #7 attributes: ecmp=true best=false
type = bgp4 2
source = 2.2.2.13
validity = 0
segment routing index = 0
segment routing old base = 0
segment routing base = 0
segment routing size = 0
segment routing prefix = null
bier index = 0
bier old base = 0
bier base = 0
bier range = 0
bier size = 0-32
updated = 2020-10-13 06:01:48 (00:08:16 ago)
version = 0
distance = 255
metric = 1554
ident = 1591735644
hops = 0
interface = null
table = null
nexthop = 195.111.97.108
original nexthop = null
tag = 0
origin type = 0
local preference = 160
evpn label16 = 0
attribute as = 0
attribute value = n/a
tunnel type = 0
tunnel value = n/a
pmsi type = 0
pmsi label16 = 0
pmsi tunnel = n/a
accumulated igp = 0
bandwidth = 0
atomic aggregator = false
aggregator as = 0
aggregator router = null
originator = 195.111.97.108
cluster list = 195.111.97.179
as path (len=3) = 6939 4826 38803
standard community =
extended community =
large community =
internal source = 1
local label = null
remote label =
alternate #8 attributes: ecmp=true best=false
type = bgp4 2
source = 2.2.2.17
validity = 0
segment routing index = 0
segment routing old base = 0
segment routing base = 0
segment routing size = 0
segment routing prefix = null
bier index = 0
bier old base = 0
bier base = 0
bier range = 0
bier size = 0-32
updated = 2020-10-13 06:01:57 (00:08:07 ago)
version = 0
distance = 255
metric = 1554
ident = 535677467
hops = 0
interface = null
table = null
nexthop = 195.111.97.108
original nexthop = null
tag = 0
origin type = 0
local preference = 160
evpn label16 = 0
attribute as = 0
attribute value = n/a
tunnel type = 0
tunnel value = n/a
pmsi type = 0
pmsi label16 = 0
pmsi tunnel = n/a
accumulated igp = 0
bandwidth = 0
atomic aggregator = false
aggregator as = 0
aggregator router = null
originator = 195.111.97.108
cluster list = 195.111.97.179
as path (len=3) = 6939 4826 38803
standard community = 1955:10 1955:30 1955:31
extended community =
large community =
internal source = 1
local label = null
remote label =
alternate #9 attributes: ecmp=true best=false
type = bgp4 2
source = 2.2.2.17
validity = 0
segment routing index = 0
segment routing old base = 0
segment routing base = 0
segment routing size = 0
segment routing prefix = null
bier index = 0
bier old base = 0
bier base = 0
bier range = 0
bier size = 0-32
updated = 2020-10-13 06:01:57 (00:08:07 ago)
version = 0
distance = 255
metric = 1554
ident = 535677468
hops = 0
interface = null
table = null
nexthop = 195.111.97.108
original nexthop = null
tag = 0
origin type = 0
local preference = 160
evpn label16 = 0
attribute as = 0
attribute value = n/a
tunnel type = 0
tunnel value = n/a
pmsi type = 0
pmsi label16 = 0
pmsi tunnel = n/a
accumulated igp = 0
bandwidth = 0
atomic aggregator = false
aggregator as = 0
aggregator router = null
originator = 195.111.97.108
cluster list = 195.111.97.179
as path (len=3) = 6939 4826 38803
standard community =
extended community =
large community =
internal source = 1
local label = null
remote label =
alternate #10 attributes: ecmp=true best=false
type = bgp4 2
source = 2.2.2.21
validity = 0
segment routing index = 0
segment routing old base = 0
segment routing base = 0
segment routing size = 0
segment routing prefix = null
bier index = 0
bier old base = 0
bier base = 0
bier range = 0
bier size = 0-32
updated = 2020-10-13 05:49:34 (00:20:30 ago)
version = 1214
distance = 255
metric = 1554
ident = 1994872707
hops = 0
interface = null
table = null
nexthop = 195.111.97.108
original nexthop = null
tag = 0
origin type = 0
local preference = 160
evpn label16 = 0
attribute as = 0
attribute value = n/a
tunnel type = 0
tunnel value = n/a
pmsi type = 0
pmsi label16 = 0
pmsi tunnel = n/a
accumulated igp = 0
bandwidth = 0
atomic aggregator = false
aggregator as = 0
aggregator router = null
originator = 195.111.97.108
cluster list = 195.111.97.179
as path (len=3) = 6939 4826 38803
standard community = 1955:10 1955:30 1955:31
extended community =
large community =
internal source = 1
local label = null
remote label =
alternate #11 attributes: ecmp=true best=false
type = bgp4 2
source = 2.2.2.21
validity = 0
segment routing index = 0
segment routing old base = 0
segment routing base = 0
segment routing size = 0
segment routing prefix = null
bier index = 0
bier old base = 0
bier base = 0
bier range = 0
bier size = 0-32
updated = 2020-10-13 05:49:34 (00:20:30 ago)
version = 1214
distance = 255
metric = 1554
ident = 1994872708
hops = 0
interface = null
table = null
nexthop = 195.111.97.108
original nexthop = null
tag = 0
origin type = 0
local preference = 160
evpn label16 = 0
attribute as = 0
attribute value = n/a
tunnel type = 0
tunnel value = n/a
pmsi type = 0
pmsi label16 = 0
pmsi tunnel = n/a
accumulated igp = 0
bandwidth = 0
atomic aggregator = false
aggregator as = 0
aggregator router = null
originator = 195.111.97.108
cluster list = 195.111.97.179
as path (len=3) = 6939 4826 38803
standard community =
extended community =
large community =
internal source = 1
local label = null
remote label =
counter = tx=0(0) rx=0(0) drp=0(0)
hardware counter = tx=0(0) rx=0(0) drp=0(0)

sid#

@brian-brazil
Contributor

maybe putting the appropriate network dependent ids is easy-peasy, and need to be figured out once for a given network,

My point is that none of this should need to be configured by the user at all, it should Just Work.

so imho this exporter does not put more configuration task on the operator than for example your official snmp exporter.

The SNMP exporter is a generic exporter. It is designed to work with the vast majority of SNMP devices out there from a wide variety of vendors, including multiple interesting interpretations of the RFCs. The goal of an exporter should be to require as little configuration as possible, which in the happy case for the SNMP exporter is just knowing which OID trees you care about. The SNMP exporter does not require you to configure all of your metric and label names, as we can automatically determine these from the MIBs. An ideal exporter doesn't need any per-metric configuration at all; you only need to tell it what to talk to.

This, however, is not a generic exporter; it is the built-in metrics exposition of a single vendor's application, where the vendor has full knowledge and control of the code. The user configuration required here should be at most opting in/out of specific sets of metrics.

if we expose all the counters directly from the classes, we would lose user's decision on what parts of the freertr he's interested,

By default you should expose everything; Prometheus is quite efficient, after all. The option to enable or disable metrics should only come up for high-cardinality metrics.
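
As a rough illustration of "everything on by default, high cardinality behind an opt-in", here is a minimal Python sketch. The metric set names (`interfaces`, `bgp_peers`, `bgp_prefixes`) and the `enabled_sets` helper are hypothetical and are not taken from freertr or from any Prometheus client library; the only point is that the vendor hardcodes the sets and the user's configuration shrinks to opting in or out.

```python
# Hypothetical sketch: set names and this helper are invented for illustration;
# they are not part of freertr or any Prometheus client library.
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSet:
    name: str
    high_cardinality: bool  # True means "off unless the user opts in"


# The vendor decides which sets exist; the user never names metrics or labels.
METRIC_SETS = [
    MetricSet("interfaces", high_cardinality=False),
    MetricSet("bgp_peers", high_cardinality=False),
    MetricSet("bgp_prefixes", high_cardinality=True),
]


def enabled_sets(opt_in=(), opt_out=()):
    """Return the names of the metric sets to expose.

    Everything is on by default except high-cardinality sets, which need
    an explicit opt-in. Any set can be switched off through opt_out.
    """
    result = []
    for metric_set in METRIC_SETS:
        if metric_set.name in opt_out:
            continue
        if metric_set.high_cardinality and metric_set.name not in opt_in:
            continue
        result.append(metric_set.name)
    return result
```

With no configuration at all, `enabled_sets()` returns the two low-cardinality sets; passing `opt_in=["bgp_prefixes"]` adds the per-prefix set on request.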

would need to uncover the classes' internals to the prometheus exporter, and every time we introduce a new thing, the exporter itself need to be touched also.

That's a pretty normal part of providing metrics to your users; they rarely come for free.

That's also only one way to implement it. The issue I see with your current approach is that it requires user configuration for things that should be hardcoded by the vendor.

and similarly to the snmp case, you won't use all the oids at once as most vendor implementations strictly rate limit snmp so that won't work at all,

You are the vendor here, so you can ensure your code is efficient enough that you don't have to artificially limit the usefulness of the exposed metrics.

if it's a peering router, on each interface we have at least one bgp peer, most of them telling us 800k routes nowadays...

That would be the sort of thing to disable by default, as that's very high cardinality.
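
To make the cardinality difference concrete, here is a small Python sketch that turns a `show ipv4 bgp ... summary` table like the one pasted earlier in this thread into the Prometheus text exposition format. The metric names (`freertr_bgp_peer_*_prefixes`), the `peer_as` label, and the `render` function are assumptions made for this example, not what freertr actually exposes, and the column names are reused as-is without claiming to know their exact meaning. It produces one series per peer per column, so the series count grows with the number of peers rather than with the roughly 800k prefixes each peer sends.

```python
# Hypothetical sketch: metric and label names are invented for illustration.
# Two rows copied from the summary output shown earlier in the thread.
SUMMARY = """\
as learn accept will done neighbor uptime
1955 0 0 815563 815563 2.2.2.2 00:34:43
1955 815563 815563 0 0 10.10.10.25 01:30:56
"""

# Numeric columns of the summary table that become per-peer gauges.
COUNT_COLUMNS = ("learn", "accept", "will", "done")


def render(summary):
    """Render per-peer prefix counts in the Prometheus text format."""
    lines = summary.strip().splitlines()
    header = lines[0].split()
    rows = [dict(zip(header, line.split())) for line in lines[1:]]
    out = []
    for column in COUNT_COLUMNS:
        metric = f"freertr_bgp_peer_{column}_prefixes"
        out.append(f"# TYPE {metric} gauge")
        for row in rows:
            # Labels identify the peer; the prefix itself is never a label.
            labels = f'neighbor="{row["neighbor"]}",peer_as="{row["as"]}"'
            out.append(f"{metric}{{{labels}}} {row[column]}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(render(SUMMARY), end="")
```

Exporting the per-prefix `database` output the same way would need a label per prefix and per alternate, which is the part that belongs behind an explicit opt-in.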

we have a specific show command to display you only those routes that are advertised
by the remote router as directly connected. in this way we can perform interface up/down
checks on remote nodes from a single freertr instance.

That sounds like blackbox monitoring, which should not be tied into whitebox metrics exposition.

Metrics exposition from an application should focus on what the application already knows internally, not reaching out to 3rd party systems or performing expensive calculations.

I'd suggest focusing on getting the basic and simple metrics working easily out of the box, rather than over-generalising the problem and requiring users to copy and paste boilerplate configuration to get any useful metrics.

@roidelapluie roidelapluie deleted the branch prometheus:master October 6, 2021 20:47

Labels

exporters and integrations Requests for new entries in the list of exporters and integrations
