Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add some comments on common pitfalls #4240

Closed
wants to merge 2 commits into from
Closed

Conversation

bbrehm
Copy link
Contributor

@bbrehm bbrehm commented Feb 27, 2024

This adds some fixme notes on fundamental architecture/API/performance issues I spotted.

I also started to write down some of the general pitfalls. How do we want these general pitfalls documented @fabsx00 ?

Copy link
Collaborator

@DavidBakerEffendi DavidBakerEffendi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol, I'm always happy to provide the example of code for people to learn what not to do 😉

Perhaps we can make an entry in the Joern documentation? I suppose a general "Developer Guidelines" section so that it doesn't get hidden and forgotten.

In fact, seems like there is no developer guidelines page on the doc website, only some on the Joern README

@@ -43,6 +43,7 @@ object ProgramSummary {
/** Combines two namespace-to-type maps.
*/
def combine[T <: TypeLike[_, _]](a: Map[String, Set[T]], b: Map[String, Set[T]]): Map[String, Set[T]] = {
//fixme: Use mutable datatypes, otherwise folds are quadratc
Copy link
Collaborator

@DavidBakerEffendi DavidBakerEffendi Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this suggesting that we'd want something more like DiffGraphBuilder.absorb?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

The reason is described in the generalBestPractices.md: We parallelize via map-reduce / fold style, with a combine function.

In order to avoid quadratic runtime with arbitrary folds, the cost of combining is only allowed to scale linearly (or log-linearly) in the smaller of the two to-be-merged guys.

Alternatively we can (and maybe should?) do something simpler: Use a parallel map to construct all the ProgrammSummaries, and then use a sequential foldl to combine.

If we do that, we can use a simpler pattern like having a ProgramSummaryBuilder that uses mutable structures and has two functions: summaryBuilder.addSubSummary(summary: ProramSummary): this.type and summaryBuilder.build(): ProgramSummary.

This simpler pattern doesn't work well for general CpgPass, because some of our passes parallelize over a generateParts() that create millions of parts; but it works well if we only parallelize over e.g. files or methods.

@DavidBakerEffendi DavidBakerEffendi added the documentation Concerns documenting Joern's code or README label Feb 28, 2024
@maltek
Copy link
Contributor

maltek commented Feb 28, 2024

Maybe some of the discussion on time complexity of collection classes can be replaced with a link to the Scala docs on the topic? (I found these very useful when I finally decided I should actually know where each of these types is appropriate.)

https://docs.scala-lang.org/overviews/collections-2.13/concrete-immutable-collection-classes.html
https://docs.scala-lang.org/overviews/collections-2.13/performance-characteristics.html

(These links go to the Scala 2.13 docs, because the Scala 3 book still links there, too.)

DavidBakerEffendi added a commit to joernio/website that referenced this pull request Feb 29, 2024
* Migrade Developer Guide from Joern README here
* Added @bbrehm's PR content from joernio/joern#4240
* Added an entry for standalone-ext
@DavidBakerEffendi
Copy link
Collaborator

Before this branch gets ignored into oblivion, I've made a candidate entry on the Joern docs joernio/website#109

@bbrehm
Copy link
Contributor Author

bbrehm commented Mar 1, 2024

Lol, I'm always happy to provide the example of code for people to learn what not to do 😉

Don't feel bad about it -- concurrency is hard, lazy iterators are really scary API, and I am also giving examples from the java standard library like Collectors.toList(), so you're in very good company ;)

@bbrehm bbrehm requested a review from mpollmeier March 1, 2024 15:28
@fabsx00
Copy link
Contributor

fabsx00 commented Mar 7, 2024

Yeah, let's put it in the docs. The chances of it being read there are higher.

@DavidBakerEffendi
Copy link
Collaborator

Yeah, let's put it in the docs. The chances of it being read there are higher.

@bbrehm then are you happy with joernio/website#109? I'll give it another comb through first then add you as a reviewer once I feel it's ready

DavidBakerEffendi added a commit to joernio/website that referenced this pull request Mar 15, 2024
* Migrade Developer Guide from Joern README here
* Added @bbrehm's PR content from joernio/joern#4240
* Added an entry for standalone-ext
@DavidBakerEffendi
Copy link
Collaborator

It's been on my backlog for a while, but will be implementing the mutable program summary maps and concurrent util stream update today

DavidBakerEffendi added a commit that referenced this pull request May 30, 2024
As pointed out in #4240, combining this nested immutable map-like structure has a quadratic performance, and the more performant strategy would be to use nested data-structures to merge.

For now, I've decided not to opt for a builder pattern, but rather keep the underlying structure mutable, and accessor methods return immutable structures.
DavidBakerEffendi added a commit that referenced this pull request May 30, 2024
As pointed out in #4240, combining this nested immutable map-like structure has a quadratic performance, and the more performant strategy would be to use nested data-structures to merge.

For now, I've decided not to opt for a builder pattern, but rather keep the underlying structure mutable, and accessor methods return immutable structures.
DavidBakerEffendi added a commit that referenced this pull request May 30, 2024
As pointed out in #4240, combining this nested immutable map-like structure has a quadratic performance, and the more performant strategy would be to use nested data-structures to merge.

For now, I've decided not to opt for a builder pattern, but rather keep the underlying structure mutable, and accessor methods return immutable structures.
@DavidBakerEffendi
Copy link
Collaborator

DavidBakerEffendi commented May 30, 2024

Closed by #4620 and joernio/website#109

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Concerns documenting Joern's code or README
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants