Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Marathon input plugin #2369

Closed
wants to merge 4 commits into from
Closed

Conversation

smolse
Copy link
Contributor

@smolse smolse commented Feb 6, 2017

Added an input plugin for gathering metrics from Marathon using REST API provided by it.

Required for all PRs:

  • CHANGELOG.md updated (we recommend not updating this until the PR has been approved by a maintainer)
  • Sign CLA (if not already signed)
  • README.md updated (if adding a new plugin)

@sparrc sparrc added this to the Future Milestone milestone Feb 6, 2017
wg.Add(1)
go func(c string) {
defer wg.Done()
errorChannel <- m.gatherMetrics(c, ":8080", acc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify the code, and make the errors logs easier to read by doing acc.AddError(...).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @phemmer, thanks for valuable comments. I am not very experienced in Go and Telegraf yet, so tried to implement the plugin as close to existing plugins as possible. Currently used approach seems to be the most prevalent among the other plugins.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

acc.AddError() is relatively new. It was added because there was a lot of inconsistency among the error handling within plugins, and many of them had problems.
For example, with joining on newline, the problem is that only the first line will contain the timestamp and plugin name, the other lines will not. This can make the logs harder to read.

I really should go through and clean up all the plugins to fix all the error handling now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks for the intro. I will update the PR to use acc.AddError() for error handling.

return false
}

func (m *Marathon) filterMetrics(metrics *map[string]interface{}) {
Copy link
Contributor

@phemmer phemmer Feb 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this receiving a pointer to a map? Maps are already a pointer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Fixed.

if contains(m.MetricTypes, k) == false {
delete(*metrics, k)
}
}
Copy link
Contributor

@phemmer phemmer Feb 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


```
for _, mt := range m.MetricTypes {
  delete(*metrics, mt)
}
```

Edit: Nevermind, I misread the code.

@vladshub
Copy link
Contributor

Any updates on this?

@smolse
Copy link
Contributor Author

smolse commented Mar 15, 2017

From my point of view it is ready to be merged.

@deric
Copy link
Contributor

deric commented May 10, 2017

Any progress on this?

@juggie
Copy link

juggie commented May 15, 2017

+1 This would be awesome.

@danielnelson danielnelson modified the milestones: 1.4.0, Future Milestone Jun 1, 2017
@danielnelson
Copy link
Contributor

The current metrics, while matching what Marathon provides, don't fit with the existing metrics produced by Telegraf. We need to think about how we can leverage fields and tags to reduce the number of points and better allow the use of functions when we query.

@smolse Do you think you can take a pass at this?

@nmische
Copy link

nmische commented Jul 25, 2017

+1 We would really like to see this plugin in telegraf.

@danielnelson
Copy link
Contributor

Maybe we can do something similar to the graphite input to transform the metrics to be more in line with our data model. If anyone begins work on this please add a comment here.

@smolse
Copy link
Contributor Author

smolse commented Jul 27, 2017

@danielnelson I have updated the pull request to store each metric in a separate measurement, how does it look now?

@danielnelson
Copy link
Contributor

This is a step in the right direction, but there is still more to do. You may want to read through the schema and data layout documentation first.

I'll try to give a few examples to illustrate some changes that would be desirable. Here is one of the current points, in line protocol format minus the timestamp:

jvm_memory_total_committed value=42
jvm_memory_total_init value=1
jvm_memory_total_max value=100
jvm_memory_total_used value=20

With this model you can't easily write, for instance, a query to calculate the percentage of used memory. This is because each value is part of a different measurement so you cannot use more than one in an InfluxQL function.

Another issue is that the measurement name won't work well with other inputs. If you are collecting from both Marathon and Jolokia, there would be confusion about which service a JVM measurement is from. Usually an input plugin has no more than a handful of measurement names, and I recommend having them all start with marathon, perhaps for this one it would be marathon_jvm_memory?

So after these two transformations we would have all the values as fields with the same measurement name:

marathon_jvm_memory total_committed=42,total_init=1,total_max=100,total_used=20

That is better but some of the metrics can be improved with good use of tags. These two measurement names should make for a good example:

service.mesosphere.chaos.http.ServiceStatusServlet.init
service.mesosphere.marathon.core.task.update.impl.ThrottlingTaskStatusUpdateProcessor.processing
service.mesosphere.marathon.core.task.update.impl.ThrottlingTaskStatusUpdateProcessor.queued

When a measurement has a deeply nested structure like this, usually it contains multiple independent dimensions. Having them as a single item makes it difficult to group by and slow to select subsets of.

I think here it might make sense to have two tags, one for marathon|chaos, and one for the resource. You can probably decompose and name the tags better than I can since you are familiar with Marathon, so don't necessarily use my exact suggestions:

marathon,service=chaos,resource=http.ServiceStatusServlet count=1i,max=1i
marathon,service=marathon,resource=core.task.update.impl.ThrottlingTaskStatusUpdateProcessor processing=1i,queued=2i

One thing that is tricky about this plugin is the large number of metrics. This necessitates determining the measurement name and tags programmatically, which can be difficult if there are any inconsistencies.

@danielnelson
Copy link
Contributor

One thing I should add to my clarify my last few comments. If the format of the marathon stats are inconsistent, it may be impossible to extract tags as I mentioned above automatically. This is where my comment about the graphite parser comes in. It has a template variable which can be used to unpack metrics, perhaps it can be useful. Here is a config file example:

templates = [
    "*.*.* region.region.measurement", # <- all 3-part measurements will match this one.
    "*.*.*.* region.region.host.measurement", # <- all 4-part measurements will match this one.
]

@danielnelson danielnelson added feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin and removed feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin plugin request labels Aug 12, 2017
@danielnelson danielnelson modified the milestones: 1.4.0, 1.5.0 Aug 14, 2017
@jcmartins
Copy link

+1 I really need this plugin in my telegraf.

@danielnelson
Copy link
Contributor

Let's use the dropwizard parser once it is complete to deal with the metric tranformations in this plugin.

@jcmartins
Copy link

Do you have plans to implement Marathon service discovery ?

@danielnelson
Copy link
Contributor

No plans specifically with Marathon, though you might want to read through #272. We will be adding a plugin system for configuration and we could potentially write a plugin that creates input plugins based on what is discovered in Marathon.

@danielnelson danielnelson removed this from the 1.5.0 milestone Nov 29, 2017
@danielnelson
Copy link
Contributor

@smolse: We have merged @atzoum's dropwizard parser on master, we should update this plugin to use it internally for handling the incoming metrics.

@russorat
Copy link
Contributor

russorat commented Apr 9, 2018

@smolse i know this has been open for a while, but are you interested in bringing this across the finish line or should we look for someone to take it over?

@smolse
Copy link
Contributor Author

smolse commented Apr 10, 2018

Unfortunately most likely I will not have time to work on this in the nearest future, please proceed with someone else.

@sjwang90 sjwang90 added the help wanted Request for community participation, code, contribution label Nov 11, 2020
@sjwang90
Copy link
Contributor

If anyone is interested in taking this over from @smolse please go for it!
We can probably get this through even faster as an external plugin that can be used with execd to run seamlessly with Telegraf. Telegraf users would be able to use the plugin sooner by having it as an external plugin since it wouldn't have to go through the typical review process.

Feel free to respond with any questions you may have.

@sjwang90 sjwang90 removed the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label Nov 18, 2020
@sjwang90 sjwang90 added the plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins label Jan 26, 2021
@sjwang90
Copy link
Contributor

sjwang90 commented Feb 3, 2021

Closing this PR due to inactivity.

If you would still like this plugin, please submit it as an external plugin that can be used with execd to run seamlessly with Telegraf.

@sjwang90 sjwang90 closed this Feb 3, 2021
@sjwang90 sjwang90 added closed/external-candidate external plugin Plugins that would be ideal external plugin and expedite being able to use plugin w/ Telegraf labels Feb 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external plugin Plugins that would be ideal external plugin and expedite being able to use plugin w/ Telegraf help wanted Request for community participation, code, contribution new plugin plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins
Projects
None yet
Development

Successfully merging this pull request may close these issues.