-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Paper review: lessons learned from building static analysis tools at google #233
Conversation
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
Review of "Lessons from building static analysis tools at Google" Signed-off-by: Alexander Bezzubov <bzz@apache.org>
762d65d
to
10fe848
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, could you break the long lines into shorter easier to review chunks?
Otherwise I end up having too many comments per line to understand anything.
Same for the other PR, please.
|
||
{{% center %}} … {{% /center %}} | ||
|
||
[“Lessons from Building Static Analysis Tools at Google](https://cacm.acm.org/magazines/2018/4/226371-lessons-from-building-static-analysis-tools-at-google/fulltext)” by Caitlin Sadowski, Edward Aftandilian, [Alex Eagle](undefined), Liam Miller-Cushon, Ciera Jaspan presents 2 stories: history of failed attempts of integrating [FindBugs](https://github.com/findbugsproject/findbugs), a static analysis tool for Java at [Google](http://twitter.com/Google), and lessons learned from success story of incorporating extensible analysis framework, a [Tricorder project](https://research.google.com/pubs/pub43322.html), to development workflow at Google. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
learned from the success story
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, a Tricorder project
|
||
## Source code analysis recap | ||
|
||
Before digging deeper into the paper, a bit of the context on what kind of analysis in general is applicable to programs. There are two main type of program analysis: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two main types
|
||
Before digging deeper into the paper, a bit of the context on what kind of analysis in general is applicable to programs. There are two main type of program analysis: | ||
|
||
* *Static code analysis*, looks only at the source code, without running the program |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks only at considers only
Before digging deeper into the paper, a bit of the context on what kind of analysis in general is applicable to programs. There are two main type of program analysis: | ||
|
||
* *Static code analysis*, looks only at the source code, without running the program | ||
* *Dynamic code analysis*, when (potentially customized) program is executed and results are analyzed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dynamic code analysis studies how a program executes over time, without paying much attention to the source code
"Results" here can be confused with analyzing stdout and written files.
|
||
* *Static code analysis*, looks only at the source code, without running the program | ||
* *Dynamic code analysis*, when (potentially customized) program is executed and results are analyzed | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about mentioning a few examples of both types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great suggestion!
But as this is a cross-post (and not a completely new draft), I would at first try keep the factual content intact as much as possible. But will take another look after all other concerns are addressed
|
||
It was a bunch of static HTML files, that later became a database for a web application, produced as a part of nightly build using Maven that were copied and served from a well known URL inside the company. HTMLs would contain tables of potential software defects, obtained by running multiple existing static analysis tools — [FindBugs](https://github.com/findbugsproject/findbugs), [PMD](https://pmd.github.io/) and [CheckStyle](https://github.com/checkstyle/checkstyle). Each defect was attributed to the latest change in the codebase using “git blame” and “assigned” to a particular engineer who introduced the change. | ||
|
||
It’s internal adoption numbers, although driven top-down by the management decision, were *very low* and aligned with the paper — needless to say that not many engineers were motivated enough to go to a separate [http://code-quality.company.com](http://code-quality.company.com) every day, only to find that they are #N by the amount of bugs introduced to the codebase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s internal adoption numbers?..
Finally understood that you mean the Personalized Quality Dashboard. I suggest to avoid "it" here - very confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
go to the separate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PAIN!!! I feel sorry for your first job Alex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored
|
||
It’s internal adoption numbers, although driven top-down by the management decision, were *very low* and aligned with the paper — needless to say that not many engineers were motivated enough to go to a separate [http://code-quality.company.com](http://code-quality.company.com) every day, only to find that they are #N by the amount of bugs introduced to the codebase. | ||
|
||
Curiously enough, one can see some open source projects like i.e Git going though the similar stage right now i.e [https://larsxschneider.github.io/git-scan](https://larsxschneider.github.io/git-scan/) with contributors introducing language-specific analysis tools to the build profiles and publishing a dashboards with the results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"like" or "i.e." - but not both :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
second "i.e." ...
|
||
Curiously enough, one can see some open source projects like i.e Git going though the similar stage right now i.e [https://larsxschneider.github.io/git-scan](https://larsxschneider.github.io/git-scan/) with contributors introducing language-specific analysis tools to the build profiles and publishing a dashboards with the results. | ||
|
||
Despite challenges in adopting such solutions, one can also see companies, i.e [https://scan.coverity.com](https://scan.coverity.com) — a closed-sourced static analysis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite the challenges
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait. Is it "i.e." or "e.g."? https://www.grammarly.com/blog/know-your-latin-i-e-vs-e-g/
so actually you should change all "i.e."-s to "e.g."-s.
Despite challenges in adopting such solutions, one can also see companies, i.e [https://scan.coverity.com](https://scan.coverity.com) — a closed-sourced static analysis | ||
solution for Java, C/C++, C#, JavaScript, Ruby and Python [founded in 2006](https://scan.coverity.com/about) jointly with U.S. Department of Homeland Security, being gradually adopted by some OSS projects. | ||
|
||
Companies building rule-based analysis platforms like [https://lgtm.com](https://lgtm.com), offspring of University of Oxford-based [https://semmle.com](https://semmle.com/) founded in 2007, are following this adoption path. Theri success, in my opinion, can be attributed to the fact that both support “hard” native languages like [C++](https://lgtm.com/blog/how_lgtm_builds_cplusplus). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like lgtm - an offspring of the University of Oxford-based semmle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theri -> Their
Companies building rule-based analysis platforms like [https://lgtm.com](https://lgtm.com), offspring of University of Oxford-based [https://semmle.com](https://semmle.com/) founded in 2007, are following this adoption path. Theri success, in my opinion, can be attributed to the fact that both support “hard” native languages like [C++](https://lgtm.com/blog/how_lgtm_builds_cplusplus). | ||
|
||
### 2. 2009 Filing bugs/Fixit | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point I ran out of my time, will continue reviewing soon.
|
||
### 2. 2009 Filing bugs/Fixit | ||
|
||
Next attempt introducing static analysis tools for Java, documented in the paper was filing the results of analysis as bugs in the project bug-tracking system. Then, a companywide dedicated effort was made though a “Fixit” week for all engineers to have a time to clean up those issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The next attempt to integrate static analysis tools for Java which was documented in the paper was filing ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then , the company-wide ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
though -> through
... was made in the format of a "Fix-it" week so that engineers had preallocated time to clean up ...
|
||
This approach has some advantages: | ||
|
||
* it is valid scientific approach as it allows to quantify the results very well: how many of reported issues were actually fixed by developers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is a valid proven
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that "Proven" in this context does not seem to be used very often and also may rise unnecessary questions like "proven by whom? where is the proof?", etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with "valid" is that it assumes that there is an invalid scientific approach, which is ugly too. How about "good" or "reliable"?
* other researchers use similar approach i.e in early “[Learning Natural Coding Conventions](https://arxiv.org/abs/1402.4182)” paper by [Miltos Allamanis](undefined) and [https://ml4code.github.io](https://ml4code.github.io/) group | ||
> We demonstrate that coding conventions are important to software teams, by showing that 1) empirically, programmers enforce conventions heavily through code review feedback and corrective commits, and 2) **patches that were based on NATURALIZE suggestions have been incorporated** into 5 of the most popular open source Java projects on GitHub — of the 18 patches that we submitted, 14 were accepted | ||
|
||
For organization it has a huge disadvantage though — it’s laborious and hard to scale. If conducted without a proper care, results will not only be ignored by developers, but can also contribute to overall issue-tracker value depreciation for the project. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has a huge con/disadvantage for the organization though
* other researchers use similar approach i.e in early “[Learning Natural Coding Conventions](https://arxiv.org/abs/1402.4182)” paper by [Miltos Allamanis](undefined) and [https://ml4code.github.io](https://ml4code.github.io/) group | ||
> We demonstrate that coding conventions are important to software teams, by showing that 1) empirically, programmers enforce conventions heavily through code review feedback and corrective commits, and 2) **patches that were based on NATURALIZE suggestions have been incorporated** into 5 of the most popular open source Java projects on GitHub — of the 18 patches that we submitted, 14 were accepted | ||
|
||
For organization it has a huge disadvantage though — it’s laborious and hard to scale. If conducted without a proper care, results will not only be ignored by developers, but can also contribute to overall issue-tracker value depreciation for the project. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without a proper care
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
contribute to the overall
|
||
For organization it has a huge disadvantage though — it’s laborious and hard to scale. If conducted without a proper care, results will not only be ignored by developers, but can also contribute to overall issue-tracker value depreciation for the project. | ||
|
||
Despite that, one can see this approach been used by companies in this field i.e [American Software Safety Reliability Company](http://www.assrc.us), Atlanta-based enterprise that seems to have deep roots in software verifications and somehow [supported by DARPA](http://www.qbitlogic.com/darpa-bigcode/), to do archive the same — test some of their products like [https://www.mycode.ai](https://www.mycode.ai/) solution, that is planned to deploy across all of the U.S. Department of Defense software development divisions, i.e on [Git, popular OSS project](https://public-inbox.org/git/CAGm8dMApDdLEzeKU-h16G0NSpnuk9LMTWA29t4MxO1qcNpUvhA@mail.gmail.com/). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"to do archive the same"? did you mean achieve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
divisions -> departments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
second one taken from http://www.qbitlogic.com/darpa-bigcode/
* presence of false-positives in FindBugs results made developers to lose confidence in the tool as a whole | ||
* customization of results view per-developer lead to an inconsistent view of analysis outcome | ||
|
||
## What worked & Lessons learned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lessons -> lessons ? or do you refer to the paper's headers
|
||
## What worked & Lessons learned | ||
|
||
As opposed to integrating each particular analysis tool in a different way, an internal “platform” — easily extensible and with support for many different kinds of program-analysis tools, including static and dynamic analyses, was built, known as [Tricorder project](https://research.google.com/pubs/pub43322.html). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an internal -> the internal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
known as -> named as
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I belive it's still correct to keep it as "an internal platform ... known as Ticoder project"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW how about rephrasing:
"platform" was built which is easily extensible and supports many different kinds of program-analysis tools including static and dynamic analyses. It is known as Tricorder
|
||
As opposed to integrating each particular analysis tool in a different way, an internal “platform” — easily extensible and with support for many different kinds of program-analysis tools, including static and dynamic analyses, was built, known as [Tricorder project](https://research.google.com/pubs/pub43322.html). | ||
|
||
As it was taking into account all the lessons learned from the history above, it managed to re-gain the trust of users and proved to be a success inside Google. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it took into account
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
history -> stories
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-gain -> recover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the users
|
||
As it was taking into account all the lessons learned from the history above, it managed to re-gain the trust of users and proved to be a success inside Google. | ||
|
||
Paper contains few lessons like *Developer happiness is key* and *Crowdsource analysis development* that are nice, but I would rather to highlight instead a few key takeaways that seems to drive rest of the technical decisions, responsible for success of new analysis platform. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few lessons
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather highlight
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seond time a few, consider a synonym
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems -> seem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive the rest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decisions which are responsible for success of the new
Paper contains few lessons like *Developer happiness is key* and *Crowdsource analysis development* that are nice, but I would rather to highlight instead a few key takeaways that seems to drive rest of the technical decisions, responsible for success of new analysis platform. | ||
|
||
There are two main takeaways that drove the overall tooling design: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This time I stopped here.
### 1. Best way to **measure a success of analysis** | ||
> by number of bugs fixed (or prevented), not the number of issues identified | ||
|
||
This way of measuring success have few notable implications |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have -> has a few
a few -> several?
|
||
This way of measuring success have few notable implications | ||
|
||
* If a tool that finds a bug, also suggests a fix - it will be much more successful using this metrics. This, by necessity, constraints the scope of possible analysis and a tooling required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the tool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, also
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these metrics or this metric
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This of course/consequently constraints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scope of the possible
the tooling
|
||
* If a tool that finds a bug, also suggests a fix - it will be much more successful using this metrics. This, by necessity, constraints the scope of possible analysis and a tooling required | ||
|
||
* It also means that repairing programs is important. For a discussion of tooling available for code transformation see **Technical Details** section below. Learning such modification from examples, instead of manual coding by engineers is also a bleeding edge research in academia [https://github.com/KTH/learning4repair](https://github.com/KTH/learning4repair) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See Technical Details section below for the discussion of tooling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modification -> modifications
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is a bleeding edge research topic
|
||
This immediately implies that **reporting issues sooner is better.** | ||
|
||
That leads to conclusion that the best bet is to integrate checks either ***directly into compilers*** - familiar tools on who’s feedback as errors and warnings developers are already relaying day to day. Or, if that is not possible, _**code review** is a good time_ for new changes — before they are commited to the version control system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To conclude, ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"on who’s feedback"?..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commited -> committed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
That leads to conclusion that the best bet is to integrate checks either ***directly into compilers*** - familiar tools on who’s feedback as errors and warnings developers are already relaying day to day. Or, if that is not possible, _**code review** is a good time_ for new changes — before they are commited to the version control system. | ||
|
||
Criterias that must hold for a ***compile time*** checks: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a
|
||
Although it is not being disclosed, but an attentive reader might have noticed that **Compilation Index** part of the pipeline is very similar to something called [Compilation Database](https://kythe.io/docs/kythe-compilation-database.html) in open source Kythe project. | ||
|
||
It might be interesting to take a close look at example of API for AST query and of transformation for C++. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
close -> closer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the example
the API
{{% grid-cell %}} | ||
This callback will generate a code transformation: for the matched nodes it will replace the matching text of the function name with the “Baz”. | ||
|
||
For code transformations in Java, **Error-Prone** has a similar low-level [patching API](http://errorprone.info/docs/patching), that is very close to native AST manipulation API. Same as for the Clang, it was found to have step learning curve and thus pose a high entry barrier — even an experience engineer would need few weeks, before one can be productive creating a fix suggestions or refactorings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Regarding code transformations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, that is very close
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was found to ... similar to Clang, and thus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a fix suggestions
{{% /grid-cell %}} | ||
{{% /grid %}} | ||
|
||
That is why a higher-level API was built for Java first as [Refaster](https://research.google.com/pubs/pub41876.html) project, that was [integrated in Error-Prone](http://errorprone.info/docs/refaster) later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the higher level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the Refaster project
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, that which
|
||
That is why a higher-level API was built for Java first as [Refaster](https://research.google.com/pubs/pub41876.html) project, that was [integrated in Error-Prone](http://errorprone.info/docs/refaster) later. | ||
|
||
So a usual workflow would look like — after running all the checks and emitting a collection of suggested fixes, shard diffs to a smaller patches, run all the tests over the changes and if passed, submit for code review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a the usual workflow is: after running ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a smaller patches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if they are passed
{{% center %}} … {{% /center %}} | ||
|
||
{{% center %}} | ||
##### Thank you for reading, stay tuned and keep you codebase healthy! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YES!!!
|
😮 Thank you so much for a thoughtful and thorough review! |
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
Thank you again, @vmarkovtsev feedback addressed in e9127ee
Done, although I understand the rationale, for the plain text this requirement seems quite un-friendly to different screen/font sizes, modern editors, semantics of the text and tools like Grammarly, no? 😕 Ready for another pass. |
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
Sorry about the inconvenience, @bzz. Any other improvements to the current flow are welcome. |
draft: true | ||
--- | ||
|
||
A recent paper with empirical research on the application of static code analysis tool caught my attention: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/tool/tools/ ?
and [The Morning Paper](https://blog.acolyer.org/) — I always wanted to experiment with publishing notes. | ||
This will be the first attempt. | ||
|
||
{{% center %}} … {{% /center %}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider using <hr>
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using of HTML tags in blog posts is strictly forbidden, suggestion on adding a shortcode for <hr>
is already tracked under #230
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bzz for keeping yourself following the rules ;)
I'll try find some time next week to develop a {{% separator %}}
shortcode, replacing old {{% center %}} … {{% /center %}}
with the new proposed feature :D
{{% center %}} … {{% /center %}} | ||
|
||
[“Lessons from Building Static Analysis Tools at Google](https://cacm.acm.org/magazines/2018/4/226371-lessons-from-building-static-analysis-tools-at-google/fulltext)” | ||
by Caitlin Sadowski, Edward Aftandilian, [Alex Eagle](undefined), Liam Miller-Cushon, Ciera Jaspan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
undefined?
According to the paper, Google only runs simpler, *intra-procedural* type of analysis — the only one feasible | ||
to run at the scale of 2 billion lines of code. | ||
|
||
Interesting enough, this is somehow different from the approach taken by Facebook’s project |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly
the paper: | ||
|
||
* *Large investment.* Although theoretically better and more complex analysis exists, it will require | ||
the non-trivial engineering effort to scale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finish every sentence with a .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
||
And those are ClangMR and JavacFlume — projects that are only briefly mentioned in this insightful paper. | ||
|
||
*That is it, thank you for reading. We will post more on papers in this field soon.* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the end of the blog post?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it is :)
|
||
{{% center %}} … {{% /center %}} | ||
|
||
Now I will take a liberty and cover a few technical details that were not in the scope of the original paper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be against splitting this into a second blog post?
I think the topic is by itself interesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to #241
|
||
Project [Error-Prone](https://github.com/google/error-prone) is a compiler extension that is able to perform | ||
arbitrary analysis on the *fully typed AST*. One thing to notice is that one can not get such input by using | ||
only a parser even as advanced as [https://doc.bblf.sh](https://doc.bblf.sh/). Running a full build would be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the name of the project, not the link itself [babelfish](https://doc.bblf.sh)
Same for all other links
application of those fixes to the whole codebase, called JavacFlume — which I would guess looks something like | ||
an Apache Spark job that applies patches in some generic format. | ||
|
||
Here is an example of how a full pipeline looks for C++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:
at the end
alex: | ||
name: Alexander Bezzubov | ||
thumbnail: https://avatars1.githubusercontent.com/u/5582506?s=460&v=4 | ||
bio: "Data engineer" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain a bit more in here 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the simplicity of the description
Looking good, @bzz! |
Thank you very much @campoy, appreciate your feedback! All suggestions make sense to me, will start applying. As soon as it's done will ping you and push it to staging for preview. |
All feedback has been addressed, deployed on staging to https://blog-staging.srcd.run/post/review-building-static-analysis-tools/ @campoy ready for another pass |
@campoy friendly ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bzz Really good post ! I think in general we should try to summarize as much as possible and keep these posts on the shorter side for people who are interested but don't really have that much time to read. I would also add images and add some call to actions at the end i.e Sign up for our weekly newsletter. check out awesome mloncode list, etc
a static analysis tool for Java at [Google](http://twitter.com/Google), and lessons learned from the success | ||
story of incorporating extensible analysis framework, [Tricorder](https://research.google.com/pubs/pub43322.html), | ||
to development workflow at Google. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have an image to illustrate one or both of these stories ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, not really
the paper: | ||
|
||
* *Large investment.* Although theoretically better and more complex analysis exists, it will require | ||
the non-trivial engineering effort to scale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
||
## History of integrating FindBugs at Google | ||
|
||
### 0. IDE/Editor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should explicit that with an introduction sentence.
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
LGTM I think it's ready to be published early next week. |
@bzz scheduled for next Thursday as you can see in the "Blog & Press" google calendar I created |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from a technical pov, it LGTM 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Excited to see this published today. |
Me to! When it's done will also submit a medium post to the source{d} publication. |
First "paper review" blog post. Cross-post from medium
Closes #213