Internal: Stuck on java.util.HashMap.get? #7478

Closed · maf23 opened this issue Aug 27, 2014 · 10 comments · 3 participants

@maf23 commented Aug 27, 2014

We seem to have a problem with stuck threads in an Elasticsearch cluster. It appears at random, but once a thread is stuck it stays stuck until Elasticsearch on that node is restarted. The threads get stuck in a busy loop; the stack trace of one is:

Thread 3744: (state = IN_JAVA)
 - java.util.HashMap.getEntry(java.lang.Object) @bci=72, line=446 (Compiled frame; information may be imprecise)
 - java.util.HashMap.get(java.lang.Object) @bci=11, line=405 (Compiled frame)
 - org.elasticsearch.search.scan.ScanContext$ScanFilter.getDocIdSet(org.apache.lucene.index.AtomicReaderContext, org.apache.lucene.util.Bits) @bci=8, line=156 (Compiled frame)
 - org.elasticsearch.common.lucene.search.ApplyAcceptedDocsFilter.getDocIdSet(org.apache.lucene.index.AtomicReaderContext, org.apache.lucene.util.Bits) @bci=6, line=45 (Compiled frame)
 - org.apache.lucene.search.FilteredQuery$1.scorer(org.apache.lucene.index.AtomicReaderContext, boolean, boolean, org.apache.lucene.util.Bits) @bci=34, line=130 (Compiled frame)
 - org.apache.lucene.search.IndexSearcher.search(java.util.List, org.apache.lucene.search.Weight, org.apache.lucene.search.Collector) @bci=68, line=618 (Compiled frame)
 - org.elasticsearch.search.internal.ContextIndexSearcher.search(java.util.List, org.apache.lucene.search.Weight, org.apache.lucene.search.Collector) @bci=225, line=173 (Compiled frame)
 - org.apache.lucene.search.IndexSearcher.search(org.apache.lucene.search.Query, org.apache.lucene.search.Collector) @bci=11, line=309 (Interpreted frame)
 - org.elasticsearch.search.scan.ScanContext.execute(org.elasticsearch.search.internal.SearchContext) @bci=54, line=52 (Interpreted frame)
 - org.elasticsearch.search.query.QueryPhase.execute(org.elasticsearch.search.internal.SearchContext) @bci=174, line=119 (Compiled frame)
 - org.elasticsearch.search.SearchService.executeScan(org.elasticsearch.search.internal.InternalScrollSearchRequest) @bci=49, line=233 (Interpreted frame)
 - org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(org.elasticsearch.search.internal.InternalScrollSearchRequest, org.elasticsearch.transport.TransportChannel) @bci=8, line=791 (Interpreted frame)
 - org.elasticsearch.search.action.SearchServiceTransportAction$SearchScanScrollTransportHandler.messageReceived(org.elasticsearch.transport.TransportRequest, org.elasticsearch.transport.TransportChannel) @bci=6, line=780 (Interpreted frame)
 - org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run() @bci=12, line=270 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=724 (Interpreted frame)

It looks very much like the known problem of using the non-synchronized HashMap class in a threaded environment, see http://stackoverflow.com/questions/17070184/hashmap-stuck-on-get. Unfortunately I'm not familiar enough with the ES code to know whether this could be the issue.

The solution mentioned at the link is to use ConcurrentHashMap instead.
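
For illustration, here is a minimal stand-alone sketch (plain JDK code, not Elasticsearch code) of the hazard: several threads writing to an unsynchronized HashMap can corrupt its bucket chains during a resize, after which get() can spin forever, which matches the busy loop in the stack trace above. The race is timing dependent, so it may take several runs on a JDK 7 JVM to reproduce; switching the field to a ConcurrentHashMap removes the hazard.

import java.util.HashMap;
import java.util.Map;

public class HashMapRaceDemo {
    // Change to new java.util.concurrent.ConcurrentHashMap<Integer, Integer>()
    // and the loop below no longer gets stuck.
    static final Map<Integer, Integer> map = new HashMap<Integer, Integer>();

    public static void main(String[] args) throws InterruptedException {
        for (int t = 0; t < 4; t++) {
            Thread writer = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 1000000; i++) {
                        int key = (int) (Math.random() * 100000);
                        map.put(key, key); // unsynchronized writes race on resize
                        map.get(key);      // may never return on a corrupted chain
                    }
                }
            });
            writer.setDaemon(true);
            writer.start();
        }
        Thread.sleep(10000L);
        System.out.println("main thread done; a stuck writer thread never finishes its loop");
    }
}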

@s1monw (Contributor) commented Aug 27, 2014

This does look like the bug you are referring to. Thanks for reporting it!

@maf23 (Author) commented Aug 27, 2014

An additional note: we have noticed that this seems to happen (or at least happens more frequently) when we post multiple parallel scan queries, which seems to make sense given what I can see in the stack trace.

@martijnvg (Member) commented Aug 27, 2014

@maf23 Were you using the same scroll id multiple times in the parallel scan queries?

@maf23 (Author) commented Aug 27, 2014

The scroll id should be different, but we will check our code to make sure this is actually the case.

@martijnvg (Member) commented Aug 27, 2014

I can see how this situation can occur if multiple scroll requests are scrolling in parallel with the same scroll id (or the same scroll id prefix); the scroll API was never designed to support this. I think we need proper validation for when two search requests try to access the same scan context that is open on a node.
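
To make the failure mode concrete, here is a rough sketch of client-side usage; the ScrollClient interface and its methods are hypothetical placeholders, not the actual Elasticsearch API. Two threads continuing the same scroll id end up hitting the same server-side scan context, while the safe pattern gives each worker its own scroll.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ScrollUsageSketch {
    // Hypothetical client facade; stands in for whatever code issues the
    // initial scan search and the follow-up scroll requests.
    interface ScrollClient {
        String startScan(String index);          // returns a scroll id
        boolean fetchNextPage(String scrollId);  // false when the scroll is exhausted
    }

    // UNSAFE: two threads continue the SAME scroll id in parallel, so both hit
    // the same scan context (and its shared HashMap) on the node.
    static void unsafe(final ScrollClient client) {
        final String scrollId = client.startScan("my-index");
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 2; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    while (client.fetchNextPage(scrollId)) { /* consume hits */ }
                }
            });
        }
        pool.shutdown();
    }

    // SAFE: parallelism comes from independent scrolls; each scroll id is
    // consumed by exactly one thread.
    static void safe(final ScrollClient client, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    String scrollId = client.startScan("my-index");
                    while (client.fetchNextPage(scrollId)) { /* consume hits */ }
                }
            });
        }
        pool.shutdown();
    }
}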

@martijnvg (Member) commented Aug 27, 2014

Running the clear scroll API during a scroll session can also cause this bug.

@martijnvg (Member) commented Aug 28, 2014

@maf23 Can you share which JVM version and vendor you're using?

@maf23 (Author) commented Aug 28, 2014

Sure, Oracle JVM 1.7.0_25

@martijnvg (Member) commented Aug 28, 2014

OK, thanks. As you mentioned, a ConcurrentHashMap should be used here, since the map in question is accessed by different threads during the entire scroll.
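
Roughly, the shape of the change is sketched below. This is a sketch only: the field name readerStates comes from the commit message, but the key and value types and the helper method are approximations rather than the exact Elasticsearch source.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import org.apache.lucene.index.IndexReader;

class ScanContextSketch {
    static final class ReaderState { /* e.g. the last doc seen for this segment */ }

    // before (unsafe): new HashMap<IndexReader, ReaderState>()
    // after: a concurrent map, safe for the different threads that serve
    // successive scroll requests against the same scan context
    private final ConcurrentMap<IndexReader, ReaderState> readerStates =
            new ConcurrentHashMap<IndexReader, ReaderState>();

    ReaderState stateFor(IndexReader reader) {
        ReaderState state = readerStates.get(reader);
        if (state == null) {
            ReaderState fresh = new ReaderState();
            state = readerStates.putIfAbsent(reader, fresh); // atomic insert-if-missing
            if (state == null) {
                state = fresh; // this thread won the race
            }
        }
        return state;
    }
}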

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Aug 28, 2014

Scan: Use ConcurrentHashMap instead of HashMap, because the readerStates is accessed by multiple threads during the entire scroll session.

Closes elastic#7499
Closes elastic#7478

martijnvg closed this in 4c690fa on Aug 28, 2014

@martijnvg (Member) commented Aug 28, 2014

@maf23 Pushed a fix for this bug, which will be included in the next release. Thanks for reporting this!

clintongormley changed the title from "Stuck on java.util.HashMap.get?" to "Internal: Stuck on java.util.HashMap.get?" on Sep 8, 2014

martijnvg added a commit that referenced this issue Sep 8, 2014

Scan: Use ConcurrentHashMap instead of HashMap, because the readerStates is accessed by multiple threads during the entire scroll session.

Closes #7499
Closes #7478

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

Scan: Use ConcurrentHashMap instead of HashMap, because the readerStates is accessed by multiple threads during the entire scroll session.

Closes elastic#7499
Closes elastic#7478
