It is not possible to use JavaSymbolSolver in a multithreading context #2671

corradomio · 2020-05-14T03:23:43Z

The current implementation of JavaSymbolSolver CANNOT be used in a multithreaded application. It is necessary to reimplement some classes and to control access to some internal data structures.

The main problem is that it uses too much memory: a type solver is a dictionary, and it is necessary to create the hierarchy of type solvers for each source file analyzed. BUT the content of this dictionary is the same for ALL source files analyzed in parallel.

It is necessary to separate the dictionary from the typesolver: multiple type solvers, same dictionary. For example the JDK is the same for ALL.
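To make the "multiple type solvers, same dictionary" idea concrete, here is a minimal sketch. It does not use the JavaParser API: `TypeInfo`, `SharedTypeDictionary` and `LightweightTypeSolver` are hypothetical names invented for illustration, and the loader simply wraps the name instead of reading a real class or source file.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a resolved type declaration.
final class TypeInfo {
    final String qualifiedName;
    TypeInfo(String qualifiedName) { this.qualifiedName = qualifiedName; }
}

// One shared, thread-safe dictionary: qualified name -> declaration.
final class SharedTypeDictionary {
    private final Map<String, TypeInfo> types = new ConcurrentHashMap<>();

    // The loader runs at most once per name, even under concurrent access.
    TypeInfo getOrLoad(String name, Function<String, TypeInfo> loader) {
        return types.computeIfAbsent(name, loader);
    }

    int size() { return types.size(); }
}

// A lightweight solver: it holds only a reference to the shared dictionary.
final class LightweightTypeSolver {
    private final SharedTypeDictionary dictionary;

    LightweightTypeSolver(SharedTypeDictionary dictionary) {
        this.dictionary = dictionary;
    }

    TypeInfo solve(String qualifiedName) {
        // Assumption: a real loader would read bytecode or parse a source file.
        return dictionary.getOrLoad(qualifiedName, TypeInfo::new);
    }
}
```

With this shape, creating one solver per analyzed source costs a single object reference, while the JDK and library definitions exist once in memory.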

Another problem is JavaParserFacade.get(): by synchronizing this method we introduce a point of synchronization that could be avoided.

This is only the list of the major problems found.

MysterAitch · 2020-05-14T09:16:15Z

I see that the goal is to deprecate and remove JavaParserFacade in favour of using : #1377

More than happy to have improvements made until that happens though :)

Modules and multi-threading are two of my weakest areas, so some extra support from you on these would be very gratefully received.

Personally I'm a fan of the idea of supporting multi-threading, I'm just not sure what is needed to make it happen! Hopefully then we can also run the test suites in parallel and thus much faster!! :)

#2671
#2668

MysterAitch · 2020-05-14T23:09:11Z

A reply I got on chat when I asked for comment about this issue is that we would usually advise not using JP across threads whenever similar questions are asked (somewhat paraphrasing here -- note that I'm still quite new and @matozoid has been around a lot longer than I have, so I'm inclined to trust him more than I trust my own views here!).

My presumption is that it would be a substantive amount of work to do well before we could have "thread-safe" being something that could actually be used to describe JavaParser. Or maybe this is an easy/trivial thing to do for someone who knows what they're doing? At the risk of being annoyingly open (and repetitive 😉) about pointing this out, personally I don't know enough to say one way or the other.

Given that you have clearly gone further down this route than I have and it sounds like you are already using a modified/patched/extended version that is able to be used across threads (pending #2668), would you mind sharing some details about this and what would be involved if we were to actively pursue a thread-safe JavaParser?

Perhaps there is a distinction to be made between making JP thread-safe, versus something that actively blocks/breaks other people trying to add a thread-safe wrapper/extension to JP?

corradomio · 2020-05-15T01:34:37Z

There are some "philosophical observations" to consider:

1) If you must process 30,000 or 100,000 sources (my project) and you have a 64-core+HT machine (or 10-core+HT) and ONLY 32 GB of RAM, it is not smart to leave the available computing power unused -> sequential analysis is not a solution!
1.1) It is possible, and simple, to create the AST (parsing the code) in parallel.
2) To resolve a symbol you need:
2.1) the source code
2.2) the external libraries
2.3) the rules to follow for the resolution
Suppose you remove every "secondary caching" mechanism used ONLY to improve the speed. 2.1) and 2.2) are READONLY objects, which means they can be used in parallel without problems. If they have NOT already been analyzed, the analysis can be done in a synchronized way and the result put in a "primary cache".
3) The analysis of each source file needs a TypeSolver tree. This tree is the same for each source (this is not strictly correct, but for now it is enough), and each solver tree (assigned to each source file) refers to the SAME source code and external libraries. This means that it is NOT the responsibility of the type solver to maintain the dictionary of the symbols (otherwise you must keep 64*2 copies of the same information in memory); instead it must use a reference to a centralized repository that contains the definitions of 2.1) and 2.2) (a single copy used by ALL solvers).
4) Using this approach, you can analyze each source in parallel. To speed up the analysis, it is possible to add some extra "caching" mechanisms. There are two contexts:
4.1) at global level, shared by ALL threads
4.2) at thread level (single source)
But this is only a "speedup" solution.
5) It is also necessary to introduce another minimal piece of infrastructure. More or less every "medium/big project" is composed of 2 or more "modules". Each module contains a list of sources and a list of dependencies on other modules and external libraries. This means that to solve the symbols in sources in different modules you must use different type solver trees. It also means that it is not the responsibility of the parser to assign a type solver to the compilation unit, because the parser is the same. OK, it is possible to use multiple parsers configured in different ways, BUT it is possible that the place where the source is parsed doesn't have the information needed to create the solver tree.

Now you have two objects:

1) a type solver tree
2) the AST to analyze

You can assign 1) to 2) (current library implementation) or assign 2) to 1) (my current implementation). Or you can use another object that contains a reference to the AST and a reference to the solver tree. In this way you can use the same tree multiple times.

This is the list of rules I followed to create a "parallelizable" version of JavaSymbolSolver.

I use a "custom" version of JavaParserTypeSolver and JarTypeSolvers, plus some "module" type solvers:

1) I use a "ByteCodeAnalzer" that caches the content of ALL jars, and a ByteCodeTypeSolver and ByteCodeCollectionTypeSolver as type solvers.
2) I split JavaParserTypeSolver into two parts: a JavaParserPool, used to compile the sources of a single module (it is responsible for caching the AST and collecting the list of source roots), and the "new" JavaParserTypeSolver, which contains a link to the pool.

You can do ALL this WITHOUT touching the library. BUT the library USES "JavaParserFacade.get(typeSolver)" (in several places), which uses a WeakHashMap that is not thread-safe. A simple "synchronized" on the method "get" resolves 99% of the problems. It has no impact on the current implementation, or when used in a single-threaded application.

I am not sure that I solved ALL the problems, but I am sure that this is the correct direction. And it is the only direction that I can take.
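As an illustration of the third option in this comment (a separate object that references both the AST and the solver tree), the following sketch pairs the two and analyzes many such pairs on a thread pool. All names (`Ast`, `SolverTree`, `AnalysisUnit`, `ParallelAnalysis`) are hypothetical placeholders, not JavaParser classes, and `analyze()` only builds a string where real symbol resolution would happen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical placeholder for a parsed source file.
final class Ast {
    final String file;
    Ast(String file) { this.file = file; }
}

// Hypothetical placeholder for the solver tree of one module.
final class SolverTree {
    final String module;
    SolverTree(String module) { this.module = module; }
}

// Pairs an AST with its module's solver tree; neither object owns the other,
// so the same tree can be reused for every source in the module.
final class AnalysisUnit {
    final Ast ast;
    final SolverTree solverTree;

    AnalysisUnit(Ast ast, SolverTree solverTree) {
        this.ast = ast;
        this.solverTree = solverTree;
    }

    // Assumption: real code would resolve symbols here.
    String analyze() {
        return ast.file + " resolved with " + solverTree.module;
    }
}

final class ParallelAnalysis {
    // Runs every unit on a fixed-size pool and returns results in input order.
    static List<String> analyzeAll(List<AnalysisUnit> units, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (AnalysisUnit unit : units) {
                Callable<String> task = unit::analyze;
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("analysis failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the design is that the parser never needs to know which solver tree applies; the pairing can be made later, wherever the module information is available.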

corradomio · 2020-05-15T01:54:13Z

At the moment, "JavaParserFacade.get()" IS "the problem" because

it is a GLOBAL/STATIC method
it caches ALL type solvers
it is used in several places inside the library
it is NOT synchronized

I tried to remove it, but I have not implemented this part.
The current "possible" solution is

to create a JavaParserFacadeFactory that contains the map typesolver -> JavaParserFacade (the current static part of JavaParserFacade moved into a separate class)
to extend TypeSolver with a method "getFactory" that returns a JavaParserFacadeFactory

The call (again, used throughout the library implementation)

JavaParserFacade.get(typeSolver)

would be converted into

public static JavaParserFacade get(TypeSolver typeSolver) {
    // getFactory() is the proposed new TypeSolver method
    return typeSolver.getRoot().getFactory().get(typeSolver);
}

In this way, the root typeSolver contains the "thread-local" cache of type solvers, with minimal impact on the current code. The problem is that this changes the current TypeSolver interface.
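A rough sketch of this factory proposal follows. It deliberately avoids the real JavaParser types: `SolverSketch`, `FacadeSketch`, `FacadeFactorySketch` and `FacadeLookup` are hypothetical stand-ins for TypeSolver, JavaParserFacade, the proposed JavaParserFacadeFactory and the static `get` method. One assumption made here is to create a factory in every solver and consult only the root's, which is simple but not necessarily how it would be done in the library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for JavaParserFacade.
final class FacadeSketch {
    final SolverSketch typeSolver;
    FacadeSketch(SolverSketch typeSolver) { this.typeSolver = typeSolver; }
}

// Holds the typeSolver -> facade map that is currently static state.
final class FacadeFactorySketch {
    private final Map<SolverSketch, FacadeSketch> facades = new ConcurrentHashMap<>();

    FacadeSketch get(SolverSketch typeSolver) {
        return facades.computeIfAbsent(typeSolver, FacadeSketch::new);
    }
}

// Stand-in for TypeSolver, extended with the proposed getFactory() method.
final class SolverSketch {
    private final SolverSketch parent;
    private final FacadeFactorySketch factory = new FacadeFactorySketch();

    SolverSketch(SolverSketch parent) { this.parent = parent; }

    SolverSketch getRoot() {
        return parent == null ? this : parent.getRoot();
    }

    FacadeFactorySketch getFactory() { return factory; }
}

final class FacadeLookup {
    // Same call shape as before, but the cache now lives in the root solver.
    static FacadeSketch get(SolverSketch typeSolver) {
        return typeSolver.getRoot().getFactory().get(typeSolver);
    }
}
```

Because the cache belongs to the root solver, two independent solver trees never touch the same map, and the cache becomes unreachable together with its tree. A ConcurrentHashMap is used here only so the sketch stays safe if one tree is shared between threads.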

But I am not sure.

MysterAitch · 2020-05-15T08:25:49Z

It'll take me a few more read-throughs before I can properly understand, and then intelligently reply to, what you have written...

First though, I wanted to drop a note sooner rather than later to say that I really appreciate the detailed and clearly well thought out comments -- thank you!

maartenc · 2020-05-19T21:27:04Z

What is the problem you encounter in a multithreaded environment?
I'm using JSS to analyse all our projects (the biggest one has 60000 source files) with 3 concurrent threads. I haven't noticed any threading issue yet.

Maarten

corradomio · 2020-05-19T22:31:17Z

I have already described the problems in the previous posts.
SymbolSolver NOT JavaParser

memory used by type solvers
WeakHashMap (used in JavaParserFacade.get(...), which is USED INSIDE JSS): when updated in parallel, it creates loops in the internal linked list handled by "WeakHashMap.table". Method: "WeakHashMap.get(...)". In the Java 8 source code, file WeakHashMap.java, line 403.

public V get(Object key) {
    ...
    Entry<K,V> e = tab[index];
    while (e != null) {
        if (e.hash == h && eq(k, e.get()))
            return e.value;
        e = e.next;              // <======== can spin forever if the chain contains a loop
    }
    return null;
}

JavaParser/SymbolSolver 3.15.21 & 3.15.22. Obviously I have no idea whether previous versions of the solver have the same problem or not.

I am using 8, 20 or 64 threads (it depends on the hardware used).
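For readers who want to see the hazard and the simplest guard side by side, here is a small sketch of a static cache backed by a WeakHashMap where every access goes through a `synchronized` method. It follows the idea of adding `synchronized` to `get`, but it is not the library's code: `CachedFacade` and `SynchronizedFacadeCache` are hypothetical names, and the cached value is a trivial placeholder.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical placeholder for the cached facade object.
final class CachedFacade {
    final int id;
    CachedFacade(int id) { this.id = id; }
}

final class SynchronizedFacadeCache {
    // WeakHashMap is not thread-safe: unsynchronized concurrent updates
    // can corrupt its internal bucket chains.
    private static final Map<Object, CachedFacade> INSTANCES = new WeakHashMap<>();
    private static int nextId = 0;

    // synchronized: only one thread at a time may read or update the map.
    static synchronized CachedFacade get(Object typeSolver) {
        CachedFacade facade = INSTANCES.get(typeSolver);
        if (facade == null) {
            facade = new CachedFacade(nextId++);
            INSTANCES.put(typeSolver, facade);
        }
        return facade;
    }
}
```

The cost is one global lock on every lookup, which is the synchronization point mentioned at the top of this issue; wrapping the map with `Collections.synchronizedMap` would give similar protection for single calls, but not for the check-then-put sequence as a whole.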

MysterAitch added the Improvement Not a bug, but a way that JP can be enhanced to work better. label May 14, 2020

corradomio changed the title ~~It is not possible to use JavaSymbolResolver in a multithreading context~~ It is not possible to use JavaSymbolSolver in a multithreading context May 15, 2020

This was referenced May 22, 2020

JavaSymbolResolver used in parallel #2668

Closed

Add the modifier synchronized to JavaParserFacade#get #2694

Merged

MysterAitch added the Thread safety / Multithreading label Mar 24, 2021
