add some comments on common pitfalls #4240
Lol, I'm always happy to provide the example of code for people to learn what not to do 😉
Perhaps we can make an entry in the Joern documentation? I suppose a general "Developer Guidelines" section so that it doesn't get hidden and forgotten.
In fact, seems like there is no developer guidelines page on the doc website, only some on the Joern README
@@ -43,6 +43,7 @@ object ProgramSummary {
   /** Combines two namespace-to-type maps.
     */
   def combine[T <: TypeLike[_, _]](a: Map[String, Set[T]], b: Map[String, Set[T]]): Map[String, Set[T]] = {
+    // fixme: Use mutable datatypes, otherwise folds are quadratic
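To make the fixme concrete, here is a minimal sketch of a combine step that merges the smaller map into the larger one in place, so its cost depends only on the smaller operand. The names `CombineSketch` and `Summary` are hypothetical and types are simplified to `String`; this is not the code in `ProgramSummary`, only an illustration under those assumptions.

```scala
import scala.collection.mutable

object CombineSketch {
  // Hypothetical simplified summary: namespace -> mutable set of type names.
  type Summary = mutable.HashMap[String, mutable.HashSet[String]]

  /** Merges the smaller summary into the larger one and returns the larger.
    * Both arguments must be treated as consumed afterwards, because the
    * result may share (and have mutated) sets from either of them.
    */
  def combine(a: Summary, b: Summary): Summary = {
    // Number of namespaces is an O(1) proxy for "which side is smaller".
    val (large, small) = if (a.size >= b.size) (a, b) else (b, a)
    small.foreach { case (namespace, types) =>
      large.get(namespace) match {
        case Some(existing) if existing.size >= types.size =>
          // Copy the smaller set into the larger one.
          existing ++= types
        case Some(existing) =>
          // The incoming set is larger: grow it and let `large` point at it.
          types ++= existing
          large.update(namespace, types)
        case None =>
          // New namespace: reuse the set instead of copying it.
          large.update(namespace, types)
      }
    }
    large
  }
}
```

With an immutable `Map[String, Set[T]]`, a left fold over many small parts would repeatedly pay for the ever-growing accumulator; here a fold such as `parts.reduce(CombineSketch.combine)` only pays for the side being absorbed at each step.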
Is this suggesting that we'd want something more like DiffGraphBuilder.absorb?
Yes!
The reason is described in the generalBestPractices.md: we parallelize in map-reduce / fold style, with a combine function.
To avoid quadratic runtime with arbitrary folds, the cost of combining may only scale linearly (or log-linearly) in the smaller of the two summaries being merged.
Alternatively we can (and maybe should?) do something simpler: use a parallel map to construct all the ProgramSummaries, and then a sequential foldl to combine them.
If we do that, we can use a simpler pattern: a ProgramSummaryBuilder that uses mutable structures and has two functions, summaryBuilder.addSubSummary(summary: ProgramSummary): this.type and summaryBuilder.build(): ProgramSummary.
This simpler pattern doesn't work well for a general CpgPass, because some of our passes parallelize over a generateParts() that creates millions of parts; but it works well if we only parallelize over e.g. files or methods.
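A rough sketch of the simpler pattern described above: a parallel map builds the per-file summaries, then a sequential fold feeds them into a mutable builder. `ProgramSummaryBuilder`, `addSubSummary` and `build` follow the names proposed in the comment; the simplified `ProgramSummary` case class, `SummaryDemo` and `summarize` are hypothetical stand-ins, and `Future.traverse` is just one dependency-free way to get a parallel map.

```scala
import scala.collection.mutable
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical simplified summary: namespace -> set of type names.
final case class ProgramSummary(namespaceToTypes: Map[String, Set[String]])

// Hypothetical builder that accumulates into mutable structures.
final class ProgramSummaryBuilder {
  private val acc = mutable.HashMap.empty[String, mutable.HashSet[String]]

  // Cost is linear in the size of `summary`, not in what was accumulated so far.
  def addSubSummary(summary: ProgramSummary): this.type = {
    summary.namespaceToTypes.foreach { case (namespace, types) =>
      acc.getOrElseUpdate(namespace, mutable.HashSet.empty[String]) ++= types
    }
    this
  }

  // A single linear pass that freezes the result into immutable collections.
  def build(): ProgramSummary =
    ProgramSummary(acc.iterator.map { case (ns, ts) => ns -> ts.toSet }.toMap)
}

object SummaryDemo {
  // Hypothetical per-file work: the directory prefix plays the role of the namespace.
  def summarize(file: String): ProgramSummary =
    ProgramSummary(Map(file.takeWhile(_ != '/') -> Set(file)))

  def run(files: List[String]): ProgramSummary = {
    // Parallel map: one summary per file.
    val partials = Await.result(Future.traverse(files)(f => Future(summarize(f))), Duration.Inf)
    // Sequential fold into one mutable builder, so total cost stays linear.
    partials.foldLeft(new ProgramSummaryBuilder)((b, s) => b.addSubSummary(s)).build()
  }
}
```

The sequential fold is acceptable here because the number of parts is bounded by the number of files or methods; with millions of parts, as with generateParts(), the combine step itself would need to run in parallel.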
Maybe some of the discussion on time complexity of collection classes can be replaced with a link to the Scala docs on the topic? (I found these very useful when I finally decided I should actually know where each of these types is appropriate.) https://docs.scala-lang.org/overviews/collections-2.13/concrete-immutable-collection-classes.html (These links go to the Scala 2.13 docs, because the Scala 3 book still links there, too.)
* Migrated Developer Guide from Joern README here
* Added @bbrehm's PR content from joernio/joern#4240
* Added an entry for standalone-ext
Before this branch gets ignored into oblivion, I've made a candidate entry on the Joern docs: joernio/website#109
Don't feel bad about it -- concurrency is hard, lazy iterators are a really scary API, and I am also giving examples from the Java standard library like
Yeah, let's put it in the docs. The chances of it being read there are higher.
@bbrehm then are you happy with joernio/website#109? I'll give it another comb-through first, then add you as a reviewer once I feel it's ready.
It's been on my backlog for a while, but I will be implementing the mutable program summary maps and the concurrent util stream update today.
As pointed out in #4240, combining this nested immutable map-like structure has quadratic performance, and the more performant strategy is to merge using nested mutable data structures. For now, I've decided not to opt for a builder pattern, but rather to keep the underlying structure mutable and have the accessor methods return immutable structures.
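As an illustration of that choice, the sketch below keeps a mutable nested map private and only hands out immutable snapshots. The class name `MutableProgramSummary` and its methods `add`, `absorb` and `typesIn` are hypothetical and simplified to `String` types; they are not taken from the actual change in #4620.

```scala
import scala.collection.mutable

// Hypothetical summary with mutable internals and immutable accessors.
final class MutableProgramSummary {
  private val namespaceToTypes =
    mutable.HashMap.empty[String, mutable.HashSet[String]]

  // Registers a single type under a namespace.
  def add(namespace: String, typeName: String): this.type = {
    namespaceToTypes.getOrElseUpdate(namespace, mutable.HashSet.empty[String]) += typeName
    this
  }

  // Merges `other` into this summary in time linear in the size of `other`.
  // Elements are copied, so `other` remains usable afterwards.
  def absorb(other: MutableProgramSummary): this.type = {
    other.namespaceToTypes.foreach { case (namespace, types) =>
      namespaceToTypes.getOrElseUpdate(namespace, mutable.HashSet.empty[String]) ++= types
    }
    this
  }

  // Accessor returns an immutable snapshot, so callers cannot mutate internals.
  def typesIn(namespace: String): Set[String] =
    namespaceToTypes.get(namespace).fold(Set.empty[String])(_.toSet)
}
```

The trade-off compared with a builder is that each accessor call pays for a copy, so this shape suits summaries that are written in bulk and read comparatively rarely.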
Closed by #4620 and joernio/website#109
This adds some fixme notes on fundamental architecture/API/performance issues I spotted.
I also started to write down some of the general pitfalls. How do we want these general pitfalls documented, @fabsx00?