mesos input: Collect framework_offers and allocator metrics #5719

Merged
merged 4 commits into from
Aug 10, 2019


branden
Contributor

@branden branden commented Apr 12, 2019

This PR causes the mesos input plugin to collect allocator metrics, and additional metrics for frameworks. These metrics are documented here:

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

@glinton glinton added the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label Apr 12, 2019
@philipnrmn

@danielnelson it'd be great to get this merged! Is there something I can do to help it along?

@russorat russorat added this to the 1.12.0 milestone Jul 23, 2019
@russorat russorat requested a review from glinton July 23, 2019 23:09
@glinton
Contributor

glinton commented Jul 24, 2019

I started reviewing this and it looked fine. However, once I built and ran it, it turned out not to be backwards compatible, though I do prefer the more generic, cleaned-up metrics this creates (plus the ability to properly filter/collect). It doesn't break writes, as inserts still work when using this branch after having used the original plugin, but it removes fields that were previously gathered.

For example, someone may have a query that depends on allocator/mesos/roles/<ROLE>/shares/dominant, which is now allocator/roles/shares/dominant, or on master/frameworks/<ENCODED_FRAMEWORK_NAME>/<FRAMEWORK_ID>/calls, which is now master/frameworks/calls_total.

It seems like this needs to be trimmed down to simply add the ability to filter the collection of allocators and framework_offers, while preserving the measurements currently collected.

Config (running docker mesos/mesos-mini):

[[inputs.mesos]]
  timeout = 100
  masters = ["http://localhost:5050"]
  master_collections = ["allocator"]

Output from master (same output when master_collections only has an undocumented value (allocators, framework_offers, etc...)):

> mesos,role=master,server=localhost,state=leader,url=http://localhost:5050 allocator/event_queue_dispatches=0,allocator/mesos/allocation_run_latency_ms=0.038912,allocator/mesos/allocation_run_latency_ms/count=1000,allocator/mesos/allocation_run_latency_ms/max=0.212992,allocator/mesos/allocation_run_latency_ms/min=0.006912,allocator/mesos/allocation_run_latency_ms/p50=0.02304,allocator/mesos/allocation_run_latency_ms/p90=0.031078400000000023,allocator/mesos/allocation_run_latency_ms/p95=0.03584,allocator/mesos/allocation_run_latency_ms/p99=0.04892927999999997,allocator/mesos/allocation_run_latency_ms/p999=0.1748861439999991,allocator/mesos/allocation_run_latency_ms/p9999=0.20918141440000249,allocator/mesos/allocation_run_ms=0.205056,allocator/mesos/allocation_run_ms/count=1000,allocator/mesos/allocation_run_ms/max=0.425984,allocator/mesos/allocation_run_ms/min=0.013824,allocator/mesos/allocation_run_ms/p50=0.146944,allocator/mesos/allocation_run_ms/p90=0.2041088,allocator/mesos/allocation_run_ms/p95=0.22506239999999997,allocator/mesos/allocation_run_ms/p99=0.31811071999999985,allocator/mesos/allocation_run_ms/p999=0.4129410559999997,allocator/mesos/allocation_run_ms/p9999=0.42467970560000085,allocator/mesos/allocation_runs=1902,allocator/mesos/event_queue_dispatches=8,allocator/mesos/offer_filters/roles/marathon/active=1,allocator/mesos/resources/cpus/offered_or_allocated=0,allocator/mesos/resources/cpus/total=8,allocator/mesos/resources/disk/offered_or_allocated=0,allocator/mesos/resources/disk/total=411571,allocator/mesos/resources/mem/offered_or_allocated=0,allocator/mesos/resources/mem/total=14760,allocator/mesos/roles/marathon/shares/dominant=0,frameworks/marathon/messages_processed=136,frameworks/marathon/messages_received=136,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls=136,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls/accept=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls
/accept_inverse_offers=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls/acknowledge=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls/acknowledge_operation_status=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls/decline=10,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls/decline_inverse_offers=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls/kill=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls/message=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls/reconcile=123,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls/reconcile_operations=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls/request=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls/revive=3,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls/shutdown=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls/subscribe=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls/suppress=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls/teardown=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/events=130,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/events/error=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/events/failure=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/events/heartbeat=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/events/inverse_offers=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/events/message=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/events/offers=10,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/events/rescind=0,master/frameworks/marathon/dad8d6f6-8
b48-4712-8655-58bd5fdc1682-0000/events/rescind_inverse_offer=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/events/subscribed=1,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/events/update=119,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/events/update_operation_status=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/offers/accepted=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/offers/declined=10,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/offers/rescinded=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/offers/sent=10,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/operations=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/operations/create=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/operations/create_disk=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/operations/destroy=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/operations/destroy_disk=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/operations/grow_volume=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/operations/launch=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/operations/launch_group=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/operations/reserve=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/operations/shrink_volume=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/operations/unreserve=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/roles/marathon/suppressed=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/subscribed=1,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/tasks/active/task_killing=0,master/frameworks/marathon/dad
8d6f6-8b48-4712-8655-58bd5fdc1682-0000/tasks/active/task_running=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/tasks/active/task_staging=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/tasks/active/task_starting=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/tasks/active/task_unknown=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/tasks/active/task_unreachable=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/tasks/terminal/task_dropped=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/tasks/terminal/task_error=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/tasks/terminal/task_failed=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/tasks/terminal/task_finished=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/tasks/terminal/task_gone=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/tasks/terminal/task_gone_by_operator=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/tasks/terminal/task_killed=0,master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/tasks/terminal/task_lost=0,master/invalid_operation_status_update_acknowledgements=0,master/messages_operation_status_update_acknowledgement=0,master/messages_reconcile_operations=0,master/messages_suppress_offers=0,master/operator_event_stream_subscribers=0,master/slave_unreachable_canceled=0,master/slave_unreachable_completed=0,master/slave_unreachable_scheduled=0,master/slaves_unreachable=0,master/tasks_dropped=0,master/tasks_gone=0,master/tasks_gone_by_operator=0,master/tasks_killing=0,master/tasks_unreachable=0,master/valid_operation_status_update_acknowledgements=0,registrar/log/ensemble_size=1,registrar/log/recovered=1,registrar/queued_operations=0,registrar/registry_size_bytes=287,registrar/state_store_ms/count=2 1563969134000000000

Output from this branch (allocator only):

> mesos,role=master,server=localhost,state=leader,url=http://localhost:5050 allocator/event_queue_dispatches=0,allocator/mesos/allocation_run_latency_ms=0.037888,allocator/mesos/allocation_run_latency_ms/count=1000,allocator/mesos/allocation_run_latency_ms/max=0.174848,allocator/mesos/allocation_run_latency_ms/min=0.006912,allocator/mesos/allocation_run_latency_ms/p50=0.02304,allocator/mesos/allocation_run_latency_ms/p90=0.032,allocator/mesos/allocation_run_latency_ms/p95=0.03584,allocator/mesos/allocation_run_latency_ms/p99=0.04892927999999997,allocator/mesos/allocation_run_latency_ms/p999=0.14211276799999922,allocator/mesos/allocation_run_latency_ms/p9999=0.17157447680000215,allocator/mesos/allocation_run_ms=0.221184,allocator/mesos/allocation_run_ms/count=1000,allocator/mesos/allocation_run_ms/max=0.425984,allocator/mesos/allocation_run_ms/min=0.013824,allocator/mesos/allocation_run_ms/p50=0.147968,allocator/mesos/allocation_run_ms/p90=0.20592640000000004,allocator/mesos/allocation_run_ms/p95=0.22608639999999997,allocator/mesos/allocation_run_ms/p99=0.31811071999999985,allocator/mesos/allocation_run_ms/p999=0.4129410559999997,allocator/mesos/allocation_run_ms/p9999=0.42467970560000085,allocator/mesos/allocation_runs=1926,allocator/mesos/event_queue_dispatches=2,allocator/mesos/resources/cpus/offered_or_allocated=0,allocator/mesos/resources/cpus/total=8,allocator/mesos/resources/disk/offered_or_allocated=0,allocator/mesos/resources/disk/total=411571,allocator/mesos/resources/mem/offered_or_allocated=0,allocator/mesos/resources/mem/total=14760,frameworks/marathon/messages_processed=137,frameworks/marathon/messages_received=137,master/invalid_operation_status_update_acknowledgements=0,master/messages_operation_status_update_acknowledgement=0,master/messages_reconcile_operations=0,master/messages_suppress_offers=0,master/operator_event_stream_subscribers=0,master/slave_unreachable_canceled=0,master/slave_unreachable_completed=0,master/slave_unreachable_scheduled=0,ma
ster/slaves_unreachable=0,master/tasks_dropped=0,master/tasks_gone=0,master/tasks_gone_by_operator=0,master/tasks_killing=0,master/tasks_unreachable=0,master/valid_operation_status_update_acknowledgements=0,registrar/log/ensemble_size=1,registrar/log/recovered=1,registrar/queued_operations=0,registrar/registry_size_bytes=287,registrar/state_store_ms/count=2 1563969158000000000
> mesos,role=master,role_name=marathon,server=localhost,state=leader,url=http://localhost:5050 allocator/offer_filters/roles/active=1,allocator/roles/shares/dominant=0 1563969158000000000

Output from this branch (framework_offers only):

> mesos,role=master,server=localhost,state=leader,url=http://localhost:5050 frameworks/marathon/messages_processed=161,frameworks/marathon/messages_received=161,master/invalid_operation_status_update_acknowledgements=0,master/messages_operation_status_update_acknowledgement=0,master/messages_reconcile_operations=0,master/messages_suppress_offers=0,master/operator_event_stream_subscribers=0,master/slave_unreachable_canceled=0,master/slave_unreachable_completed=0,master/slave_unreachable_scheduled=0,master/slaves_unreachable=0,master/tasks_dropped=0,master/tasks_gone=0,master/tasks_gone_by_operator=0,master/tasks_killing=0,master/tasks_unreachable=0,master/valid_operation_status_update_acknowledgements=0,registrar/log/ensemble_size=1,registrar/log/recovered=1,registrar/queued_operations=0,registrar/registry_size_bytes=287,registrar/state_store_ms/count=2 1563969514000000000
> mesos,call_type=acknowledge_operation_status,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls=0 1563969514000000000
> mesos,call_type=shutdown,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls=0 1563969514000000000
> mesos,call_type=subscribe,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls=0 1563969514000000000
> mesos,framework_name=marathon,role=master,server=localhost,state=leader,task_state=task_finished,url=http://localhost:5050 master/frameworks/tasks/terminal=0 1563969514000000000
> mesos,call_type=message,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls=0 1563969514000000000
> mesos,call_type=teardown,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls=0 1563969514000000000
> mesos,call_type=accept_inverse_offers,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls=0 1563969514000000000
> mesos,call_type=accept,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls=0 1563969514000000000
> mesos,framework_name=marathon,operation_type=create_disk,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/operations=0 1563969514000000000
> mesos,call_type=decline_inverse_offers,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls=0 1563969514000000000
> mesos,framework_name=marathon,role=master,server=localhost,state=leader,task_state=task_error,url=http://localhost:5050 master/frameworks/tasks/terminal=0 1563969514000000000
> mesos,event_type=rescind,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/events=0 1563969514000000000
> mesos,call_type=kill,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls=0 1563969514000000000
> mesos,framework_name=marathon,role=master,server=localhost,state=leader,task_state=task_killing,url=http://localhost:5050 master/frameworks/tasks/active=0 1563969514000000000
> mesos,framework_name=marathon,role=master,server=localhost,state=leader,task_state=task_staging,url=http://localhost:5050 master/frameworks/tasks/active=0 1563969514000000000
> mesos,event_type=update_operation_status,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/events=0 1563969514000000000
> mesos,framework_name=marathon,role=master,role_name=marathon,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/roles/suppressed=0 1563969514000000000
> mesos,framework_name=marathon,role=master,server=localhost,state=leader,task_state=task_gone,url=http://localhost:5050 master/frameworks/tasks/terminal=0 1563969514000000000
> mesos,framework_name=marathon,operation_type=reserve,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/operations=0 1563969514000000000
> mesos,framework_name=marathon,role=master,server=localhost,state=leader,task_state=task_dropped,url=http://localhost:5050 master/frameworks/tasks/terminal=0 1563969514000000000
> mesos,framework_name=marathon,role=master,server=localhost,state=leader,task_state=task_gone_by_operator,url=http://localhost:5050 master/frameworks/tasks/terminal=0 1563969514000000000
> mesos,event_type=failure,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/events=0 1563969514000000000
> mesos,framework_name=marathon,role=master,server=localhost,state=leader,task_state=task_unknown,url=http://localhost:5050 master/frameworks/tasks/active=0 1563969514000000000
> mesos,framework_name=marathon,role=master,server=localhost,state=leader,task_state=task_unreachable,url=http://localhost:5050 master/frameworks/tasks/active=0 1563969514000000000
> mesos,event_type=error,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/events=0 1563969514000000000
> mesos,call_type=request,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls=0 1563969514000000000
> mesos,framework_name=marathon,operation_type=launch,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/operations=0 1563969514000000000
> mesos,call_type=acknowledge,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls=0 1563969514000000000
> mesos,framework_name=marathon,operation_type=destroy_disk,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/operations=0 1563969514000000000
> mesos,framework_name=marathon,operation_type=destroy,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/operations=0 1563969514000000000
> mesos,framework_name=marathon,role=master,server=localhost,state=leader,task_state=task_killed,url=http://localhost:5050 master/frameworks/tasks/terminal=0 1563969514000000000
> mesos,event_type=heartbeat,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/events=0 1563969514000000000
> mesos,event_type=subscribed,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/events=1 1563969514000000000
> mesos,event_type=rescind_inverse_offer,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/events=0 1563969514000000000
> mesos,framework_name=marathon,operation_type=launch_group,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/operations=0 1563969514000000000
> mesos,framework_name=marathon,operation_type=grow_volume,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/operations=0 1563969514000000000
> mesos,framework_name=marathon,role=master,server=localhost,state=leader,task_state=task_lost,url=http://localhost:5050 master/frameworks/tasks/terminal=0 1563969514000000000
> mesos,event_type=update,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/events=143 1563969514000000000
> mesos,call_type=reconcile,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls=147 1563969514000000000
> mesos,framework_name=marathon,role=master,server=localhost,state=leader,task_state=task_running,url=http://localhost:5050 master/frameworks/tasks/active=0 1563969514000000000
> mesos,framework_name=marathon,operation_type=unreserve,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/operations=0 1563969514000000000
> mesos,framework_name=marathon,role=master,server=localhost,state=leader,task_state=task_failed,url=http://localhost:5050 master/frameworks/tasks/terminal=0 1563969514000000000
> mesos,framework_name=marathon,role=master,server=localhost,state=leader,task_state=task_starting,url=http://localhost:5050 master/frameworks/tasks/active=0 1563969514000000000
> mesos,event_type=offers,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/events=11 1563969514000000000
> mesos,call_type=reconcile_operations,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls=0 1563969514000000000
> mesos,framework_name=marathon,operation_type=shrink_volume,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/operations=0 1563969514000000000
> mesos,call_type=suppress,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls=0 1563969514000000000
> mesos,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls_total=161,master/frameworks/events_total=155,master/frameworks/offers/accepted=0,master/frameworks/offers/declined=11,master/frameworks/offers/rescinded=0,master/frameworks/offers/sent=11,master/frameworks/operations_total=0,master/frameworks/subscribed_total=1 1563969514000000000
> mesos,event_type=inverse_offers,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/events=0 1563969514000000000
> mesos,framework_name=marathon,operation_type=create,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/operations=0 1563969514000000000
> mesos,call_type=revive,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls=3 1563969514000000000
> mesos,call_type=decline,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/calls=11 1563969514000000000
> mesos,event_type=message,framework_name=marathon,role=master,server=localhost,state=leader,url=http://localhost:5050 master/frameworks/events=0 1563969514000000000

Contributor

@glinton glinton left a comment

The following appears sufficient to allow filtering on allocators and framework_offers. Let me know what you think or if I'm way off the mark.

diff --git a/plugins/inputs/mesos/mesos.go b/plugins/inputs/mesos/mesos.go
index 8b322b84..0cf6588c 100644
--- a/plugins/inputs/mesos/mesos.go
+++ b/plugins/inputs/mesos/mesos.go
@@ -380,6 +380,10 @@ func getMetrics(role Role, group string) []string {
                        "master/slaves_connected",
                        "master/slaves_disconnected",
                        "master/slaves_inactive",
+                       "master/slave_unreachable_canceled",
+                       "master/slave_unreachable_completed",
+                       "master/slave_unreachable_scheduled",
+                       "master/slaves_unreachable",
                }
 
                m["frameworks"] = []string{
@@ -405,6 +409,11 @@ func getMetrics(role Role, group string) []string {
                        "master/tasks_running",
                        "master/tasks_staging",
                        "master/tasks_starting",
+                       "master/tasks_dropped",
+                       "master/tasks_gone",
+                       "master/tasks_gone_by_operator",
+                       "master/tasks_killing",
+                       "master/tasks_unreachable",
                }
 
                m["messages"] = []string{
@@ -444,12 +453,18 @@ func getMetrics(role Role, group string) []string {
                        "master/task_lost/source_master/reason_slave_removed",
                        "master/task_lost/source_slave/reason_executor_terminated",
                        "master/valid_executor_to_framework_messages",
+                       "master/invalid_operation_status_update_acknowledgements",
+                       "master/messages_operation_status_update_acknowledgement",
+                       "master/messages_reconcile_operations",
+                       "master/messages_suppress_offers",
+                       "master/valid_operation_status_update_acknowledgements",
                }
 
                m["evqueue"] = []string{
                        "master/event_queue_dispatches",
                        "master/event_queue_http_requests",
                        "master/event_queue_messages",
+                       "master/operator_event_stream_subscribers",
                }
 
                m["registrar"] = []string{
@@ -463,6 +478,11 @@ func getMetrics(role Role, group string) []string {
                        "registrar/state_store_ms/p99",
                        "registrar/state_store_ms/p999",
                        "registrar/state_store_ms/p9999",
+                       "registrar/log/ensemble_size",
+                       "registrar/log/recovered",
+                       "registrar/queued_operations",
+                       "registrar/registry_size_bytes",
+                       "registrar/state_store_ms/count",
                }
        } else if role == SLAVE {
                m["resources"] = []string{
@@ -683,65 +703,22 @@ func (m *Mesos) gatherMainMetrics(u *url.URL, role Role, acc telegraf.Accumulato
                }
        }
 
-       taggedFields := map[string][]TaggedField{}
-       extraTags := map[string]fieldTags{}
-
-       for metricName, val := range jf.Fields {
-               if !strings.HasPrefix(metricName, "master/frameworks/") && !strings.HasPrefix(metricName, "allocator/") {
+       for metricName := range jf.Fields {
+               if !strings.HasPrefix(metricName, "master/frameworks/") && !strings.HasPrefix(metricName, "frameworks/") && !strings.HasPrefix(metricName, "allocator/") {
                        continue
                }
 
                // filter out framework offers/allocator metrics if necessary
-               if (!includeFrameworkOffers && strings.HasPrefix(metricName, "master/frameworks/")) ||
+               if !includeFrameworkOffers &&
+                       (strings.HasPrefix(metricName, "master/frameworks/") || strings.HasPrefix(metricName, "frameworks/")) ||
                        (!includeAllocator && strings.HasPrefix(metricName, "allocator/")) {
                        delete(jf.Fields, metricName)
                        continue
                }
-
-               parts := strings.Split(metricName, "/")
-               if (parts[0] == "master" && len(parts) < 5) || (parts[0] == "allocator" && len(parts) <= 5) {
-                       // All framework offers metrics have at least 5 parts.
-                       // All allocator metrics with <= 5 parts can be sent as is and does not pull
-                       // any params out into tags.
-                       // (e.g. allocator/mesos/allocation_run_ms/count vs allocator/mesos/roles/<role>/shares/dominant)
-                       continue
-               }
-
-               tf := generateTaggedField(parts)
-               tf.Value = val
-
-               if len(tf.tags()) == 0 {
-                       // indicates no extra tags were added
-                       continue
-               }
-
-               tfh := tf.hash()
-               if _, ok := taggedFields[tfh]; !ok {
-                       taggedFields[tfh] = []TaggedField{}
-               }
-               taggedFields[tfh] = append(taggedFields[tfh], tf)
-
-               if _, ok := extraTags[tfh]; !ok {
-                       extraTags[tfh] = tf.tags()
-               }
-
-               delete(jf.Fields, metricName)
        }
 
        acc.AddFields("mesos", jf.Fields, tags)
 
-       for tfh, tfs := range taggedFields {
-               fields := map[string]interface{}{}
-               for _, tf := range tfs {
-                       fields[tf.FieldName] = tf.Value
-               }
-               for k, v := range tags {
-                       extraTags[tfh][k] = v
-               }
-
-               acc.AddFields("mesos", fields, extraTags[tfh])
-       }
-
        return nil
 }

This restores framework and allocator metrics to their original names,
without extracting parts into tags.
@branden
Contributor Author

branden commented Aug 2, 2019

Thanks for taking a look, @glinton. You're right, this would break existing queries. I pushed commits that restore the original metric names, per your suggestion.

However, it's been very useful to parse framework names, etc. out of these metrics into tags. Is there a way to accomplish that without breaking backwards compatibility? This seems like a good fit for a processor plugin, though I don't think there's an existing one I can use for this purpose. I think I'll write a small processor tailored for this, if you don't have a better idea.
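To make the idea of pulling name parts into tags concrete, here is a minimal Go sketch. It is not the code from this PR: the `taggedField` type, the `typeTagKeys` table, and `splitFrameworkMetric` are hypothetical names, and the sketch only handles the `calls`, `events`, and `operations` families, mirroring the field and tag names visible in the branch output earlier in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// taggedField is a hypothetical type for this sketch; it is not the
// TaggedField type from the PR.
type taggedField struct {
	Field string
	Tags  map[string]string
}

// typeTagKeys maps a metric family to the tag key that carries its subtype.
// The keys follow the tags seen in the branch output (call_type, event_type,
// operation_type); the table itself is an assumption of this sketch.
var typeTagKeys = map[string]string{
	"calls":      "call_type",
	"events":     "event_type",
	"operations": "operation_type",
}

// splitFrameworkMetric converts a name of the form
//   master/frameworks/<name>/<id>/<family>[/<subtype>]
// into a generic field name plus tags. It reports false for any name this
// sketch does not handle.
func splitFrameworkMetric(name string) (taggedField, bool) {
	parts := strings.Split(name, "/")
	if len(parts) < 5 || parts[0] != "master" || parts[1] != "frameworks" {
		return taggedField{}, false
	}
	family, rest := parts[4], parts[5:]
	tagKey, known := typeTagKeys[family]
	if !known || len(rest) > 1 {
		return taggedField{}, false
	}
	tf := taggedField{Tags: map[string]string{"framework_name": parts[2]}}
	if len(rest) == 0 {
		// The bare family counter becomes the "_total" field.
		tf.Field = "master/frameworks/" + family + "_total"
		return tf, true
	}
	tf.Field = "master/frameworks/" + family
	tf.Tags[tagKey] = rest[0]
	return tf, true
}

func main() {
	name := "master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls/decline"
	if tf, ok := splitFrameworkMetric(name); ok {
		fmt.Println(tf.Field, tf.Tags)
	}
}
```

A processor or separate plugin built along these lines could leave the original field names untouched and emit the tagged form only when asked to, which is the compatibility concern raised above.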

Contributor

@glinton glinton left a comment

I agree with you on the usefulness of the metrics this originally introduced. I'm not sure whether that's going to be best in a processor or in a plugin of its own. I'd imagine a plugin of its own, as processors should be pretty generic.
I can imagine an option to add them to this plugin may work, but we need to be careful with that. We've learned from other plugins that making different measurements based on a config flag can cause more trouble than it solves.

Any thoughts @danielnelson?

// based on presence of "framework_offers"/"allocator" in MasterCols.
// These lines are included to prevent the "unknown" info log below.
m["framework_offers"] = []string{}
m["allocator"] = []string{}
Contributor

Is there a reason why we don't list the metrics here and remove the logic in gatherMainMetrics?

Contributor Author

Allocator and framework metric names may include values that can't be predicted, such as framework ID and role (docs here and here). That prevents us from listing all the possible allocator and framework metrics here.
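As an illustration of why a static list cannot cover these names, the following Go sketch classifies a metric by name prefix instead. The prefixes are the ones checked in the filtering condition under review; the `dynamicPrefixes` table and the `collectionFor` helper are hypothetical names introduced only for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// dynamicPrefixes lists, per collection, the name prefixes of metrics whose
// full names cannot be enumerated ahead of time. The table is an assumption
// of this sketch, built from the prefixes used in the PR's condition.
var dynamicPrefixes = map[string][]string{
	"framework_offers": {"master/frameworks/", "frameworks/"},
	"allocator":        {"allocator/"},
}

// collectionFor returns the collection that a dynamically named metric
// belongs to, or false if the name matches none of the known prefixes.
func collectionFor(metric string) (string, bool) {
	for collection, prefixes := range dynamicPrefixes {
		for _, p := range prefixes {
			if strings.HasPrefix(metric, p) {
				return collection, true
			}
		}
	}
	return "", false
}

func main() {
	names := []string{
		"allocator/mesos/roles/marathon/shares/dominant",
		"master/frameworks/marathon/dad8d6f6-8b48-4712-8655-58bd5fdc1682-0000/calls",
		"frameworks/marathon/messages_processed",
		"master/tasks_running",
	}
	for _, n := range names {
		collection, ok := collectionFor(n)
		fmt.Println(n, collection, ok)
	}
}
```

Because the role and framework ID appear in the middle of the name, only the leading segment is stable, so a prefix test is the simplest check that still works for frameworks registered after the plugin was written.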

// filter out framework offers/allocator metrics if necessary
if !includeFrameworkOffers &&
(strings.HasPrefix(metricName, "master/frameworks/") || strings.HasPrefix(metricName, "frameworks/")) ||
(!includeAllocator && strings.HasPrefix(metricName, "allocator/")) {
Contributor

This logic ended up a bit too complex for my taste. Can you move the filtering into func (m *Mesos) filterMetrics(? I think we could wedge in a switch, something like this:

	for _, k := range metricsDiff(role, selectedMetrics) {
		fmt.Println("group:", k)
		switch k {
		case "allocator":
			// TODO remove all allocator metrics
		case "framework_offers":
			// TODO remove all framework_offers metrics
		default:
			// Remove the metrics of every other unselected group
			for _, v := range getMetrics(role, k) {
				if _, ok = (*metrics)[v]; ok {
					delete((*metrics), v)
				}
			}
		}
	}
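One way the TODO branches above could be filled in is sketched below as a standalone Go program. This is an assumption about the shape of the solution, not the code that was eventually pushed: `removeByPrefix` and `filterFields` are hypothetical helpers, and the `staticGroups` parameter stands in for the plugin's `getMetrics(role, k)` lookup.

```go
package main

import (
	"fmt"
	"strings"
)

// removeByPrefix deletes every field whose name starts with one of the
// given prefixes. Deleting from a map while ranging over it is safe in Go.
func removeByPrefix(fields map[string]interface{}, prefixes ...string) {
	for name := range fields {
		for _, p := range prefixes {
			if strings.HasPrefix(name, p) {
				delete(fields, name)
				break
			}
		}
	}
}

// filterFields drops the metrics of each unselected group. Groups with
// unpredictable names are matched by prefix; all other groups are removed
// by exact name using staticGroups (a stand-in for getMetrics).
func filterFields(fields map[string]interface{}, unselected []string, staticGroups map[string][]string) {
	for _, group := range unselected {
		switch group {
		case "allocator":
			removeByPrefix(fields, "allocator/")
		case "framework_offers":
			removeByPrefix(fields, "master/frameworks/", "frameworks/")
		default:
			for _, name := range staticGroups[group] {
				delete(fields, name)
			}
		}
	}
}

func main() {
	fields := map[string]interface{}{
		"allocator/mesos/allocation_runs":         1902.0,
		"frameworks/marathon/messages_processed":  136.0,
		"master/frameworks/marathon/abc-0000/calls": 136.0,
		"master/tasks_running":                    0.0,
	}
	staticGroups := map[string][]string{"tasks": {"master/tasks_running"}}

	// Only "allocator" is selected, so the other two groups are removed.
	filterFields(fields, []string{"framework_offers", "tasks"}, staticGroups)
	fmt.Println(len(fields), fields)
}
```

Keeping the prefix matching inside the filter function means the gather path no longer needs its own compound condition, which is the simplification this review comment asks for.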

Contributor Author

@branden branden Aug 8, 2019

Pushed 1ee1789. This required a refactor to the test module: I had to move masterMetricNames out of generateMetrics() and make it a global, so that TestMasterFilter() could know which metrics to expect. I did the same with slaveMetricNames just for consistency. LMK what you think!

@danielnelson danielnelson merged commit f5a4d72 into influxdata:master Aug 10, 2019
@branden
Contributor Author

branden commented Aug 10, 2019

Thanks for reviewing and merging this, @danielnelson @glinton. Please let me know if you have suggestions for how to add the more useful metrics that this PR originally introduced, while maintaining backwards compatibility. I'm eager to contribute that improvement.

@branden branden deleted the mesos-input-add-metrics branch August 10, 2019 02:01
@danielnelson
Contributor

I'm sure we can find a path forward on it. I didn't look much at the original PR. Would you be able to write a new feature request issue that lists the changes you would make now if you had full license to break things? This will give me a better idea of how we should go about the changes.

bitcharmer pushed a commit to bitcharmer/telegraf that referenced this pull request Oct 18, 2019
athoune pushed a commit to bearstech/telegraf that referenced this pull request Apr 17, 2020
idohalevi pushed a commit to idohalevi/telegraf that referenced this pull request Sep 29, 2020