
Create a circuit breaker to prevent searches from bringing down a node #2929

Closed

maioriel opened this issue Apr 23, 2013 · 20 comments

@maioriel

One of the fears that I have when using ElasticSearch is that expensive queries can bring down nodes in my cluster.

It would be really nice if ElasticSearch could detect this kind of node-killing event by triggering a circuit breaker that kills the offending query and leaves the node intact. For example, if a search takes more than X% of the heap, ElasticSearch would kill the query. It would be useful to expose the X% of heap_size as a configurable value, since the level of concurrency varies from one ES installation to another.

Another helpful feature would be that, when the circuit breaker is tripped, ElasticSearch returns a response saying that the query was killed for using excess memory.
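
To make the idea concrete, here is a conceptual sketch (illustrative only; the class name, the exception, and the 60% figure are made up, not how ES implements its breakers) of a breaker that accounts for memory as a query reserves it and aborts once the total crosses a configurable fraction of the heap:

```java
// Conceptual sketch only: a memory-accounting breaker that rejects work once
// reserved bytes exceed a configurable fraction of the JVM heap.
// The name HeapCircuitBreaker is illustrative, not an Elasticsearch class.
import java.util.concurrent.atomic.AtomicLong;

public class HeapCircuitBreaker {
    private final long limitBytes;
    private final AtomicLong reservedBytes = new AtomicLong();

    public HeapCircuitBreaker(double heapFraction) {
        // e.g. heapFraction = 0.60 trips the breaker at 60% of the max heap
        this.limitBytes = (long) (Runtime.getRuntime().maxMemory() * heapFraction);
    }

    /** Reserve an estimate for a query; throws instead of letting the node OOM. */
    public void addEstimate(long bytes) {
        long total = reservedBytes.addAndGet(bytes);
        if (total > limitBytes) {
            reservedBytes.addAndGet(-bytes);          // roll back the reservation
            throw new IllegalStateException("circuit breaker tripped: estimated "
                    + total + " bytes, limit is " + limitBytes + " bytes");
        }
    }

    /** Release the reservation when the query finishes or is aborted. */
    public void release(long bytes) {
        reservedBytes.addAndGet(-bytes);
    }
}
```

The point is that the request fails with a visible error the caller can act on, rather than the node dying with an OutOfMemoryError.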

@tmkujala

This is exactly what I'm looking for as well! One of my requirements is to provide open API access to our ElasticSearch data for developers to run ad hoc queries. There is a very real possibility that one of them may execute a bad query, bringing down a single node or, much worse, multiple nodes in my cluster.

What would make this feature even better is additional performance monitoring: which queries are running at any given time, which queries have already run, and performance metrics for them.

@tlieblfs

+1

@s1monw
Contributor

s1monw commented Apr 26, 2013

Hey folks, I want to jump in here and tell you that this is pretty high on our wish-list as well. With the foundations 0.90 will bring, we can approach things like this much more easily and, maybe more importantly, more reliably. I might have a first cut at this pretty soon.

@ghost ghost assigned s1monw Apr 26, 2013
@rore

rore commented May 8, 2013

+1

2 similar comments
@btiernay

👍

@lmenezes
Contributor

lmenezes commented Jun 5, 2013

+1

@nik9000
Member

nik9000 commented Aug 7, 2013

+1
Certainly it'd be cool to get a list of running queries and be able to kill them if they are running wild. That'd be a wonderful first step for anything along these lines.
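
Purely as an illustration of how such a kill switch could work (none of these names are real Elasticsearch APIs), each running search could register a cancellation flag that its collection loop polls between batches:

```java
// Illustrative sketch of cooperative cancellation for running searches.
// RunningSearchRegistry and its methods are hypothetical, not ES APIs.
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

public class RunningSearchRegistry {
    private final Map<Long, AtomicBoolean> cancelFlags = new ConcurrentHashMap<>();

    /** Called when a search starts; returns the flag the search loop should poll. */
    public AtomicBoolean register(long searchId) {
        AtomicBoolean cancelled = new AtomicBoolean(false);
        cancelFlags.put(searchId, cancelled);
        return cancelled;
    }

    /** "Kill" a runaway search by flipping its flag; it takes effect the next time the search checks. */
    public void cancel(long searchId) {
        AtomicBoolean flag = cancelFlags.get(searchId);
        if (flag != null) {
            flag.set(true);
        }
    }

    /** Listing the registered ids gives the "what is running right now" view. */
    public Set<Long> runningSearches() {
        return cancelFlags.keySet();
    }

    /** Called when a search finishes or is aborted. */
    public void unregister(long searchId) {
        cancelFlags.remove(searchId);
    }
}
```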

@avleen

avleen commented Nov 20, 2013

@s1monw any update on this? We have some really large indices, and big searches over terabytes of data can bring down the cluster right now because the searches just keep going forever :-(

@dakrone
Member

dakrone commented Nov 20, 2013

@avleen we are actively developing this, so hopefully soon!

@dakrone
Member

dakrone commented Nov 26, 2013

Related: #4261

@lukas-vlcek
Contributor

Interesting!

BTW, is there any impact on bulk operations, like bulk update? Meaning that once the circuit breaks, the bulk operation will still go on, but all remaining updates targeting the particular shard will not make it?
(Also might impact #2230 if implemented in the future?)

@dakrone
Member

dakrone commented Jan 28, 2014

Closing this issue since #4261 landed.

@dakrone dakrone closed this as completed Jan 28, 2014
@roncemer

I'd love to see ES automatically detect when a query is going to use more than a certain percentage of the heap, and automatically use temporary files to do its sorting, merging and so on. That would give it the ability to run arbitrary queries (like MySQL) without bringing down the node. The query would just take a long time to run. And in many cases, that's absolutely fine -- especially when doing aggregations and similar analytic queries.
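
As a rough sketch of the general spill-to-disk idea (illustrative only, not anything Elasticsearch actually does): buffer values up to a memory budget, write sorted runs to temporary files when the budget is hit, and k-way merge the runs at the end.

```java
// Minimal external-sort sketch: buffer values in memory, spill sorted runs to
// temporary files when the buffer is full, then k-way merge the runs.
// Illustrative only; not Elasticsearch code.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class SpillingSorter {
    private final int maxInMemory;                  // stand-in for a heap budget
    private final List<Long> buffer = new ArrayList<>();
    private final List<Path> runs = new ArrayList<>();

    public SpillingSorter(int maxInMemory) { this.maxInMemory = maxInMemory; }

    public void add(long value) throws IOException {
        buffer.add(value);
        if (buffer.size() >= maxInMemory) spill();  // budget reached: go to disk
    }

    private void spill() throws IOException {
        if (buffer.isEmpty()) return;
        Collections.sort(buffer);
        Path run = Files.createTempFile("sort-run", ".tmp");
        try (BufferedWriter w = Files.newBufferedWriter(run)) {
            for (long v : buffer) { w.write(Long.toString(v)); w.newLine(); }
        }
        runs.add(run);
        buffer.clear();
    }

    /** Merges the spilled runs; the merge itself only holds one value per run in memory. */
    public List<Long> sortedResult() throws IOException {
        spill();
        List<BufferedReader> readers = new ArrayList<>();
        PriorityQueue<long[]> heap = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        for (int i = 0; i < runs.size(); i++) {
            BufferedReader r = Files.newBufferedReader(runs.get(i));
            readers.add(r);
            String line = r.readLine();
            if (line != null) heap.add(new long[] { Long.parseLong(line), i });
        }
        List<Long> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            long[] head = heap.poll();
            out.add(head[0]);
            String next = readers.get((int) head[1]).readLine();
            if (next != null) heap.add(new long[] { Long.parseLong(next), head[1] });
        }
        for (BufferedReader r : readers) r.close();
        return out;
    }
}
```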

@avleen

avleen commented Oct 29, 2014

I wouldn't say many cases. Maybe in some cases :-)
The problem with using disk for this is that you can increase the IO and also hurt other queries, and again bring down a node. Elasticsearch is quite sensitive to IO bandwidth. But it would certainly be nice to have the option.


@kimchy
Member

kimchy commented Oct 30, 2014

Just putting a note here that, though not "on demand", doc values are an option (using on-disk storage for certain memory-expensive fields that are used for aggs and/or sorting). A lot of progress has been made in both Lucene and ES to make them faster; 1.4 will be a huge step forward, and the following ES version, which will work with Lucene 5, will be even better. We are heavily investing in both Lucene and ES to make this a performant and viable option.
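
For anyone wanting to try this, here is a minimal sketch (the field name and index layout are just examples) of a 1.x mapping built with XContentBuilder that stores a date field as doc values, so sorting and aggregations on it don't load fielddata onto the heap:

```java
// A minimal sketch of a mapping that stores a field as doc values. The
// "timestamp" field name is just an example.
import org.elasticsearch.common.xcontent.XContentBuilder;
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

public class DocValuesMappingExample {
    public static XContentBuilder timestampMapping() throws Exception {
        return jsonBuilder()
            .startObject()
              .startObject("properties")
                .startObject("timestamp")
                  .field("type", "date")
                  .field("doc_values", true)   // 1.x opt-in; later versions enable it by default
                .endObject()
              .endObject()
            .endObject();
    }
}
```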

@avleen

avleen commented Oct 30, 2014

Shay, I think we'd noticed a significant I/O impact (probably caused by more writes?) with doc values. Do the recent changes improve that situation?


@xelldran1

+1

1 similar comment
@diannamcallister

+1

@vipul-mykaarma

Any update on this?

@rebeccahum

Hiya, following up on this too ^^
