This repository has been archived by the owner on Sep 2, 2020. It is now read-only.

Add in support for elasticsearch path store #35

Merged: 3 commits, merged May 7, 2014

Conversation

@addisonj (Contributor) commented May 5, 2014

This adds in support for an elasticsearch path store.

As far as I can tell, it's working pretty well. I am going to test this against my cluster today to see how much it helps performance, and I will update the PR with the results.

There is currently one minor issue: when you select a path for retrieval, any metric that has that path as a prefix is also included (selecting foo.bar.baz also returns foo.bar.baz2). This should be easy to fix, but I wanted to get feedback first.

Implementation notes:
Elastisch is used for the implementation, and both the native and REST interfaces are supported via different implementations of the Pathstore protocol.

Paths are stored in elasticsearch as:

{
  "path" : "foo.bar.baz",
  "depth" : 3,
  "leaf": false
}

All paths, both leaf and tree paths, are stored. This allows for quick querying, and a wildcard search is used that closely matches the current behavior.
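Under that scheme, a Graphite query such as foo.* can be answered with a single filtered search. The exact query body is not shown in the PR; a plausible sketch in the Elasticsearch 1.x query DSL, combining a depth filter with the wildcard, would be:

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "and": [
          { "range": { "depth": { "from": 2, "to": 2 } } },
          { "wildcard": { "path": "foo.*" } }
        ]
      }
    }
  }
}
```

The depth filter is what rules out deeper paths that merely share the prefix.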

Also, this is by far the most substantial piece of Clojure I have written, so feel free to pull it apart; I am looking for that feedback as well!

scrollfn (partial esrd/scroll-seq conn)
queryfn  (partial esrd/search conn index)]
(if (not (esri/exists? conn "cyanite_paths"))
  (esri/create conn "cyanite_paths"
               :mappings {"path" {:properties {:path {:type "string"
                                                      :index "not_analyzed"}}}}))
Review comment (Contributor):
Looks like this doesn't use the configured index name if one is present.

(let [depth (path-depth path)
      p     (str/replace (str/replace path "." "\\.") "*" ".*")
      f     (vector
             {:range {:depth {:from depth :to depth}}}
Reply (Contributor, Author):

Now using a regexp filter with caching. As far as I can tell there is not much performance difference, but it seems to be working well!
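In the ES 1.x filter DSL, a regexp filter can be cached by setting _cache on the filter. Combined with the depth range above, the filter for a pattern like foo.bar.* (escaped dots, * rewritten to .*) might look like this; the exact body is an assumption, not copied from the patch:

```json
{
  "and": [
    { "range": { "depth": { "from": 3, "to": 3 } } },
    { "regexp": { "path": "foo\\.bar\\..*", "_cache": true } }
  ]
}
```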

@addisonj (Contributor, Author) commented May 6, 2014

@pyr addressed those changes and also fixed the bug with drawing multiple graphs.

I have been using this in my cluster since I opened the PR and the difference is pretty drastic. Iterating paths and drawing graphs is now much faster.

There are still some possible improvements that could be made here (such as using bulk updates for paths), but I think this is pretty solid.

One minor note that @brutasse might be interested in: brutasse/graphite-cyanite makes one call to get the paths for a query and then an individual call for each full path. The Elasticsearch store supports the same searches for the lookup operation as well, so if the memory store were refactored to do the same, graphite-cyanite could make just a single call, which should mean faster render/response times.

Edit: scratch that last bit; that is apparently how it already works, and I misinterpreted some logs.

@pyr (Owner) commented May 7, 2014

Thanks a lot @addisonj and @brutasse for the comments.

pyr added a commit that referenced this pull request May 7, 2014
Add in support for elasticsearch path store
@pyr pyr merged commit 4efce29 into pyr:master May 7, 2014
@brutasse (Contributor) commented May 7, 2014

👍

@brutasse (Contributor) commented May 8, 2014

I just deployed this and realized it is rather aggressive with Elasticsearch… For every datapoint received, it does a GET for every prefix of the path.

GET /cyanite-paths/path/collectd
GET /cyanite-paths/path/collectd.georges
GET /cyanite-paths/path/collectd.georges.memory
GET /cyanite-paths/path/collectd.georges.memory.memory-cached
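Those four GETs correspond to every dot-separated prefix of the incoming metric path. A minimal sketch of that expansion (the helper name is illustrative, not from the code):

```python
def path_prefixes(path):
    """Return every dot-separated prefix of a metric path, shortest first."""
    parts = path.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]

prefixes = path_prefixes("collectd.georges.memory.memory-cached")
# One GET is issued per entry, so an n-segment path costs n requests.
```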

Cyanite went from 1.5% active CPU to 9%.

Not sure of the best way to solve this:

  • combine this with an in-memory cache
  • find another structure for paths to avoid O(n) ES queries
  • batch updates every X minutes or so

Ideas?
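For the first idea, an in-memory cache in front of the path writes could be as simple as a "seen paths" set, so Elasticsearch is only hit once per distinct path. A sketch (in Python for illustration rather than the project's Clojure; all names here are hypothetical):

```python
class PathCache:
    """Remember which paths were already written, skipping repeat ES writes."""

    def __init__(self, write_fn):
        self._seen = set()
        self._write_fn = write_fn  # called once per previously unseen path

    def register(self, path):
        if path not in self._seen:
            self._write_fn(path)   # only unseen paths reach Elasticsearch
            self._seen.add(path)

# Usage: count how many writes a stream of repeated datapoints triggers.
writes = []
cache = PathCache(writes.append)
for p in ["collectd", "collectd.georges", "collectd", "collectd.georges"]:
    cache.register(p)
# Only the two distinct paths are written.
```

The trade-off is that paths written by another Cyanite instance are re-written once per process, and the set grows with the number of distinct paths.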

@addisonj (Contributor, Author) commented May 8, 2014

I actually started implementing an in-memory cache for this same reason, but then decided to leave it as a later optimization. I don't think it would be overly complex, but I wanted to get the basics working first. I may take a crack at it tomorrow.

One thing to note: it only really seems to matter when the REST interface is used. With the native client, the overhead was much lower.

@brutasse (Contributor) commented May 8, 2014

Indeed, es-native performs better. Who would have thought :) It's similar to the in-memory store now.

Still, I don't want to put too much pressure on ES, so I'm back to the default path store.
