
Infinite Loop in scala.collection.mutable.HashTable #10436

Closed
huang-hf opened this issue Jul 28, 2017 · 10 comments
huang-hf commented Jul 28, 2017

Scala version: 2.10.4
Java version: 1.7.0_79 (64-bit)

When I run my unit tests with Maven, they fall into an infinite loop every third or fourth run.
I have made sure the HashMap (HashTable) is only read and updated from a single thread.
My unit test is essentially:

val hashmap = scala.collection.mutable.HashMap[String, Double]()
hashmap("key") = 1.0 

I have searched and found that others hit the same problem a few years ago.
Has it already been fixed? If so, please point me to the bug URL. Thank you.

The jstack output is shown below. The two threads nid=0x39d9 and nid=0x39a2 were consistently at nearly 100% CPU usage in `top` on Linux.

2017-07-27 16:02:57
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.72-b15 mixed mode):

"Executor task launch worker-6" #1260 daemon prio=5 os_prio=0 tid=0x00007fc8903ea000 nid=0x39d9 runnable [0x00007fc87f8f7000]
java.lang.Thread.State: RUNNABLE
at scala.collection.mutable.HashTable$class.elemEquals(HashTable.scala:347)
at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:39)
at scala.collection.mutable.HashTable$class.scala$collection$mutable$HashTable$$findEntry0(HashTable.scala:134)
at scala.collection.mutable.HashTable$class.findOrAddEntry(HashTable.scala:162)
at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.put(HashMap.scala:75)
at scala.collection.mutable.HashMap.update(HashMap.scala:80)
# business code
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
# business code
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
# business code
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

"Executor task launch worker-5" #1257 daemon prio=5 os_prio=0 tid=0x00007fc8922fc800 nid=0x39a2 runnable [0x00007fc880400000]
java.lang.Thread.State: RUNNABLE
at scala.collection.mutable.HashTable$class.elemEquals(HashTable.scala:347)
at scala.collection.mutable.HashMap.elemEquals(HashMap.scala:39)
at scala.collection.mutable.HashTable$class.scala$collection$mutable$HashTable$$findEntry0(HashTable.scala:134)
at scala.collection.mutable.HashTable$class.findOrAddEntry(HashTable.scala:162)
at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.put(HashMap.scala:75)
at scala.collection.mutable.HashMap.update(HashMap.scala:80)
# business code
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
# business code
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
# business code
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

    # other info omitted

JNI global references: 314

@SethTisue
Member

I don't see anything obvious about this in the git history or the ticket database, but maybe I missed it.

Perhaps someone else will recognize it and comment. Regardless, I am closing the ticket unless the bug can be demonstrated in a current version of Scala.

@ediweissmann

ediweissmann commented Jan 28, 2018

I think I bumped into this using Scala 2.12.4.

Thread dump stacktrace:

java.lang.Thread.State: RUNNABLE
        at scala.collection.mutable.HashTable.findEntry0(HashTable.scala:139)
        at scala.collection.mutable.HashTable.findEntry(HashTable.scala:135)
        at scala.collection.mutable.HashTable.findEntry$(HashTable.scala:134)
        at scala.collection.mutable.HashMap.findEntry(HashMap.scala:40)
        at scala.collection.mutable.HashMap.contains(HashMap.scala:61)
        at code.util.Functions$.executeOnce(Functions.scala:12)

Relevant code snippet in Functions.scala:

import scala.collection.mutable
[...]
val executed = mutable.Map[String, Boolean]()
[...]
if (!executed.contains(key)) {
  executed.put(key, true)
  fn()
}

I'm not sure how to reproduce this; I think it happened due to multithreading.
I've since replaced the mutable.Map with a set backed by ConcurrentHashMap.newKeySet() to make my code thread safe.
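A thread-safe version of the snippet above might look like the following sketch. It reuses the `executeOnce` and `fn` names from the snippet; everything else is an assumption about the surrounding code:

```scala
import java.util.concurrent.ConcurrentHashMap

// A concurrent set: add() returns true only for the first insertion of a key,
// so the contains/put check-then-act race collapses into one atomic call.
val executed = ConcurrentHashMap.newKeySet[String]()

def executeOnce(key: String)(fn: () => Unit): Unit =
  if (executed.add(key)) fn()
```

Using `add`'s boolean result avoids the separate `contains` check entirely, so there is no window in which two threads can both decide to run `fn()`.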

@SethTisue
Member

SethTisue commented Jan 28, 2018

scala.collection.mutable.HashTable isn't thread-safe, so strange behavior (including hangs) when it is accessed by multiple threads at the same time is expected behavior.
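For code that genuinely needs concurrent mutation, the standard library does ship a lock-free, thread-safe mutable map; a minimal sketch:

```scala
import scala.collection.concurrent.TrieMap

// TrieMap is safe for concurrent reads and writes, unlike mutable.HashMap.
val safe = TrieMap.empty[String, Double]
safe.put("key", 1.0)
safe.putIfAbsent("key", 2.0)  // atomic: keeps the existing value
safe.update("other", 3.0)
```

Wrapping a Java `ConcurrentHashMap` is the other common choice; either avoids the unsynchronized resize that produces these hangs.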

@bryce-anderson

I've also run into a similar issue on 2.11.11: an infinite loop with this stack trace:

scala.collection.mutable.HashTable$class.resize(HashTable.scala:268)
scala.collection.mutable.HashTable$class.scala$collection$mutable$HashTable$$addEntry0(HashTable.scala:157)
scala.collection.mutable.HashTable$class.findOrAddEntry(HashTable.scala:169)
scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:40)
scala.collection.mutable.HashMap.put(HashMap.scala:107)
scala.collection.mutable.HashMap.update(HashMap.scala:112)

This is likely due to concurrent use. I'd have expected, and been happy with, an exception being thrown instead.

@NthPortal

Honestly, it is very difficult to make collections detect concurrent modification, and doing so has some overhead cost as well. Just look through the source code of java.util.AbstractList for modCount, expectedModCount, and checkForComodification: there is a lot of tracking going on behind the scenes. Combine that with the fact that Scala's collections are implemented to a significant degree through mixins, and such tracking becomes quite difficult to implement and maintain.
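The fail-fast pattern those java.util classes use can be sketched in Scala like this. `FailFastBuffer` is a hypothetical illustration, not a real library class:

```scala
import scala.collection.mutable.ArrayBuffer
import java.util.ConcurrentModificationException

// Sketch of the modCount/expectedModCount scheme from java.util.AbstractList.
final class FailFastBuffer[A] {
  private val buf = ArrayBuffer.empty[A]
  private var modCount = 0 // bumped on every structural change

  def add(a: A): Unit = { modCount += 1; buf += a }

  def iterator: Iterator[A] = new Iterator[A] {
    private val expectedModCount = modCount // snapshot at creation time
    private var i = 0
    def hasNext: Boolean = i < buf.length
    def next(): A = {
      // Fail fast instead of looping forever on corrupted internal state.
      if (modCount != expectedModCount)
        throw new ConcurrentModificationException()
      val a = buf(i); i += 1; a
    }
  }
}
```

Even this single counter only detects modification best-effort, and threading it through every mixin that can mutate a Scala collection is where the maintenance cost lies.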

@SethTisue
Member

Agreed... I half-wanted to reopen this in response to Bryce's report, but I'd be reluctant to get anybody's hopes up that this is likely to change or improve; I don't think it is.

@bryce-anderson

I think this is a known problem with a number of hash map implementations. I haven't looked into it, but if there were an easy, performant way to avoid entering an infinite loop, that would be great; as already noted, it may not be trivial.
That said, for a server an infinite loop is almost the worst-case scenario, and depending on the tolerance for corruption, it may be exactly that.

@bingbai0912

bingbai0912 commented Aug 7, 2018

I've also run into a similar issue on 2.11.8: an infinite loop with this stack trace:

Thread 9756: (state = IN_JAVA)
 - scala.collection.mutable.HashTable$class.scala$collection$mutable$HashTable$$findEntry0(scala.collection.mutable.HashTable, java.lang.Object, int) @bci=41, line=136 (Compiled frame; information may be imprecise)
 - scala.collection.mutable.HashTable$class.findEntry(scala.collection.mutable.HashTable, java.lang.Object) @bci=15, line=132 (Interpreted frame)
 - scala.collection.mutable.HashMap.findEntry(java.lang.Object) @bci=2, line=40 (Interpreted frame)
 - scala.collection.mutable.HashMap.get(java.lang.Object) @bci=2, line=70 (Interpreted frame)
 - scala.collection.mutable.MapLike$class.getOrElseUpdate(scala.collection.mutable.MapLike, java.lang.Object, scala.Function0) @bci=2, line=192 (Interpreted frame)
 - scala.collection.mutable.AbstractMap.getOrElseUpdate(java.lang.Object, scala.Function0) @bci=3, line=80 (Interpreted frame)
 - scala.util.parsing.combinator.syntactical.StdTokenParsers$class.keyword(scala.util.parsing.combinator.syntactical.StdTokenParsers, java.lang.String) @bci=16, line=37 (Interpreted frame)
 - scala.util.parsing.json.Parser.keyword(java.lang.String) @bci=2, line=113 (Interpreted frame)
 - scala.util.parsing.json.Parser$$anonfun$jsonObj$2.apply() @bci=6, line=135 (Interpreted frame)
 - scala.util.parsing.json.Parser$$anonfun$jsonObj$2.apply() @bci=1, line=135 (Interpreted frame)
 - scala.util.parsing.combinator.Parsers$Parser.p$lzycompute$4(scala.Function0, scala.runtime.ObjectRef, scala.runtime.VolatileByteRef) @bci=18, line=295 (Interpreted frame)
 - scala.util.parsing.combinator.Parsers$Parser.scala$util$parsing$combinator$Parsers$Parser$$p$5(scala.Function0, scala.runtime.ObjectRef, scala.runtime.VolatileByteRef) @bci=15, line=295 (Interpreted frame)
 - scala.util.parsing.combinator.Parsers$Parser$$anonfun$$less$tilde$1.apply(java.lang.Object) @bci=16, line=296 (Interpreted frame)
 - scala.util.parsing.combinator.Parsers$Parser$$anonfun$$less$tilde$1.apply(java.lang.Object) @bci=2, line=296 (Interpreted frame)
 - scala.util.parsing.combinator.Parsers$Success.flatMapWithNext(scala.Function1) @bci=5, line=143 (Compiled frame)
 - scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(scala.util.parsing.input.Reader) @bci=12, line=234 (Interpreted frame)
 - scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(java.lang.Object) @bci=5, line=234 (Interpreted frame)
 - scala.util.parsing.combinator.Parsers$$anon$3.apply(scala.util.parsing.input.Reader) @bci=5, line=217 (Interpreted frame)
 - scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(scala.util.parsing.input.Reader) @bci=5, line=237 (Interpreted frame)
 - scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(java.lang.Object) @bci=5, line=237 (Interpreted frame)
 - scala.util.parsing.combinator.Parsers$$anon$3.apply(scala.util.parsing.input.Reader) @bci=5, line=217 (Interpreted frame)
 - scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(scala.util.parsing.input.Reader) @bci=5, line=249 (Interpreted frame)
 - scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(java.lang.Object) @bci=5, line=249 (Interpreted frame)
 - scala.util.parsing.combinator.Parsers$$anon$3.apply(scala.util.parsing.input.Reader) @bci=5, line=217 (Interpreted frame)
 - scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply() @bci=11, line=882 (Interpreted frame)
 - scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply() @bci=1, line=882 (Interpreted frame)
 - scala.util.DynamicVariable.withValue(java.lang.Object, scala.Function0) @bci=14, line=58 (Interpreted frame)
 - scala.util.parsing.combinator.Parsers$$anon$2.apply(scala.util.parsing.input.Reader) @bci=21, line=881 (Interpreted frame)
 - scala.util.parsing.json.JSON$.parseRaw(java.lang.String) @bci=20, line=51 (Interpreted frame)
 - scala.util.parsing.json.JSON$.parseFull(java.lang.String) @bci=2, line=65 (Interpreted frame)

with this code in Spark:

val readFromLzo = (path: String) =>
  sc.textFile(path).map { row =>
    JSON.parseFull(row).getOrElse(Map()).asInstanceOf[Map[String, Any]]
  }

@Jasper-M

Jasper-M commented Aug 7, 2018

@bingbai0912 You're using, in a distributed and multi-threaded environment, a class that is both deprecated and makes no claim of thread safety. I think that's the issue you need to solve. At minimum, make sure the same parser isn't shared between threads; better yet, use a JSON parser that isn't deprecated.

@SethTisue
Member

SethTisue commented Aug 7, 2018

@bingbai0912 background on scala.util.parsing.combinator being deprecated: scala/scala-parser-combinators#99

HeartSaVioR pushed a commit to apache/spark that referenced this issue Jan 18, 2021
### What changes were proposed in this pull request?
`EventLoggingListener.codecMap` change `mutable.HashMap` to `ConcurrentHashMap`

### Why are the changes needed?
In the 2.x version of the history server, `EventLoggingListener.codecMap` is a `mutable.HashMap`, which is not thread safe.
This can cause the history server to suddenly get stuck and stop working.
In the 3.x version, `EventLogFileReader.codecMap` was changed to a `ConcurrentHashMap`, so the problem does not occur there (SPARK-28869).

Multiple threads call `openEventLog`, so `codecMap` is updated concurrently, and `mutable.HashMap` may fall into an infinite loop during `resize`, leaving the history server unable to work.
scala/bug#10436
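The fix pattern the PR describes — replacing a shared `mutable.HashMap` cache with a `ConcurrentHashMap` — looks roughly like this sketch. `newCodec` and `lookupCodec` are hypothetical stand-ins for the real Spark code:

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.function.{Function => JFunction}

// Shared cache keyed by codec short name; computeIfAbsent is atomic per key,
// so concurrent callers never race on an unsynchronized resize.
val codecMap = new ConcurrentHashMap[String, String]()

// Stand-in for the real codec constructor.
def newCodec(name: String): String = s"codec:$name"

def lookupCodec(shortName: String): String =
  codecMap.computeIfAbsent(shortName, new JFunction[String, String] {
    def apply(n: String): String = newCodec(n)
  })
```

The explicit `java.util.function.Function` instance keeps the sketch compatible with pre-2.12 Scala; on 2.12+ a plain lambda converts automatically.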

PID 117049 0x1c939
![image](https://user-images.githubusercontent.com/3898450/104753904-9239c280-5793-11eb-8a2d-89324ccfb92c.png)

![image](https://user-images.githubusercontent.com/3898450/104753921-9534b300-5793-11eb-99e6-51ac66051d2a.png)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing unit tests.

Closes #31194 from cxzl25/SPARK-34125.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>

7 participants