Denial of service when parsing JSON object with keys that have the same hash code #277
Sub-quadratic decrease of throughput as the number of JSON object fields (with keys that have the same hash code) increases
On contemporary CPUs, parsing such a JSON object (with a sequence of 100000 fields like the one below, ~1.6 MB) can take more than 160 seconds:
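For reference, one well-known way to construct this kind of input (a sketch on my side, not necessarily the exact generator behind the numbers above) exploits the fact that the Java/Scala strings `"Aa"` and `"BB"` share the same `String.hashCode`, so any same-length combination of the two fragments collides as well:

```java
import java.util.ArrayList;
import java.util.List;

class CollidingJson {
    // "Aa".hashCode() == "BB".hashCode() == 2112, and String.hashCode is a
    // polynomial in the characters, so every same-length mix of the two
    // fragments produces the same hash code.
    static List<String> collidingKeys(int pairs) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < pairs; i++) {
            List<String> next = new ArrayList<>(keys.size() * 2);
            for (String k : keys) {
                next.add(k + "Aa");
                next.add(k + "BB");
            }
            keys = next;
        }
        return keys; // 2^pairs distinct keys, all with one shared hash code
    }

    // Render the keys as a single adversarial JSON object.
    static String toJson(List<String> keys) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < keys.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(keys.get(i)).append("\":").append(i);
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        List<String> keys = collidingKeys(10); // 1024 colliding keys
        System.out.println(keys.size() + " keys, shared hash " + keys.get(0).hashCode());
    }
}
```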
Below are the results of the benchmark, where
Reproducible Test Case
To run these benchmarks on your JDK:
See scala/bug#11203 for the underlying issue in Scala HashMaps.
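The underlying issue amounts to colliding keys being kept in a flat list, so each insert scans all previously inserted colliding keys. A toy sketch (Java here, counting key comparisons instead of wall-clock time, and not Scala's actual HashMap implementation) shows the quadratic blow-up:

```java
import java.util.ArrayList;
import java.util.List;

class CollisionCost {
    // Minimal separate-chaining bucket: when every key hashes to the same
    // bucket, each insert must scan the entire existing chain, similar to
    // the collision path reported in scala/bug#11203.
    static long comparisonsForNColliding(int n) {
        List<String> chain = new ArrayList<>();
        long comparisons = 0;
        for (int i = 0; i < n; i++) {
            String key = "key-" + i; // assume all keys land in one bucket
            for (String existing : chain) {
                comparisons++; // one equals() check per chain entry
                if (existing.equals(key)) break;
            }
            chain.add(key);
        }
        return comparisons; // n*(n-1)/2 for n distinct colliding keys
    }

    public static void main(String[] args) {
        // 10x more keys costs ~100x more comparisons: quadratic growth.
        for (int n : new int[]{1_000, 10_000, 100_000}) {
            System.out.println(n + " keys -> " + comparisonsForNColliding(n) + " comparisons");
        }
    }
}
```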
One simple remedy could be to limit the number of fields per object for cases where the parser is run on untrusted data. One hurdle is that we currently don't support any configuration, so we need to work out how to introduce configuration for the parser.
Btw. thanks a lot, @plokhotnyuk for the report.
For me it's, with collisions:
(didn't run with 100000)
vs without collisions:
i.e. the slowdowns are
What are the options for an immutable map with wrapping? Would that mean that if a user updates the map, the complete wrapped mutable Java map has to be copied? That would be one possibility but it would introduce another performance surprise (though, updating JSON objects might not be a primary use case).
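To make the trade-off concrete, here is a sketch of the wrapping idea (hypothetical names, not a proposal for the actual API): an immutable facade over a `java.util.HashMap` whose update operation copies the whole backing map, which is exactly the performance surprise mentioned above.

```java
import java.util.HashMap;

// Hypothetical copy-on-write wrapper: reads delegate to the fast mutable
// Java map, but every update pays a full O(n) copy to stay immutable.
final class WrappedMap<K, V> {
    private final HashMap<K, V> backing;

    WrappedMap() { this(new HashMap<>()); }
    private WrappedMap(HashMap<K, V> backing) { this.backing = backing; }

    V get(K key) { return backing.get(key); }
    int size() { return backing.size(); }

    // The surprise: a single "update" copies the entire backing map, so a
    // sequence of n updates costs O(n^2) overall.
    WrappedMap<K, V> updated(K key, V value) {
        HashMap<K, V> copy = new HashMap<>(backing);
        copy.put(key, value);
        return new WrappedMap<>(copy);
    }
}
```

Reads stay as fast as the Java map, and parsing (which can build the mutable map once, then wrap it) is unaffected; only users who update parsed JSON objects pay the copying cost.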
I adapted the benchmark to compare different map implementations under the colliding and a "simple" scenario:
Here's another run just comparing HashMap and TreeMap on non-colliding input:
That would mean that for reasonably sized objects (size < 100), a TreeMap is only ~6% slower than a HashMap. So switching to a TreeMap seems to be a good and simple fix for this particular case for now.
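The reason a TreeMap sidesteps the attack: it orders keys with `compareTo` and never consults `hashCode`, so even fully colliding keys cost O(log n) per operation instead of a linear bucket scan. A quick check with four keys that all share one hash code:

```java
import java.util.List;
import java.util.TreeMap;

class TreeMapCheck {
    public static void main(String[] args) {
        // "Aa" and "BB" have the same String.hashCode, so all four keys
        // below collide. TreeMap ignores hash codes entirely: it compares
        // key contents, keeping each operation O(log n) on adversarial input.
        List<String> colliding = List.of("AaAa", "AaBB", "BBAa", "BBBB");
        TreeMap<String, Integer> map = new TreeMap<>();
        for (int i = 0; i < colliding.size(); i++) {
            map.put(colliding.get(i), i);
        }
        System.out.println(map.firstKey() + " .. " + map.lastKey() + ", size=" + map.size());
    }
}
```

The sorted-order requirement is the one catch: keys must be `Comparable` (true for JSON field names, which are strings), and iteration order changes from hash order to lexicographic order.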