Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paper review: lessons learned from building static analysis tools at google #233

Merged
merged 10 commits into from
Sep 27, 2018

Conversation

bzz
Copy link
Contributor

@bzz bzz commented Jul 27, 2018

First "paper review" blog post. Cross-post from medium

Closes #213

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
@bzz bzz requested review from campoy and dpordomingo July 27, 2018 08:20
Review of "Lessons from building static analysis tools at Google"

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
Copy link
Contributor

@campoy campoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, could you break the long lines into shorter easier to review chunks?

Otherwise I end up having too many comments per line to understand anything.

Same for the other PR, please.


{{% center %}} … {{% /center %}}

[“Lessons from Building Static Analysis Tools at Google](https://cacm.acm.org/magazines/2018/4/226371-lessons-from-building-static-analysis-tools-at-google/fulltext)” by Caitlin Sadowski, Edward Aftandilian, [Alex Eagle](undefined), Liam Miller-Cushon, Ciera Jaspan presents 2 stories: history of failed attempts of integrating [FindBugs](https://github.com/findbugsproject/findbugs), a static analysis tool for Java at [Google](http://twitter.com/Google), and lessons learned from success story of incorporating extensible analysis framework, a [Tricorder project](https://research.google.com/pubs/pub43322.html), to development workflow at Google.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

learned from the success story

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, a Tricorder project


## Source code analysis recap

Before digging deeper into the paper, a bit of the context on what kind of analysis in general is applicable to programs. There are two main type of program analysis:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two main types


Before digging deeper into the paper, a bit of the context on what kind of analysis in general is applicable to programs. There are two main type of program analysis:

* *Static code analysis*, looks only at the source code, without running the program
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks only at considers only

Before digging deeper into the paper, a bit of the context on what kind of analysis in general is applicable to programs. There are two main type of program analysis:

* *Static code analysis*, looks only at the source code, without running the program
* *Dynamic code analysis*, when (potentially customized) program is executed and results are analyzed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dynamic code analysis studies how a program executes over time, without paying much attention to the source code

"Results" here can be confused with analyzing stdout and written files.


* *Static code analysis*, looks only at the source code, without running the program
* *Dynamic code analysis*, when (potentially customized) program is executed and results are analyzed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about mentioning a few examples of both types?

Copy link
Contributor Author

@bzz bzz Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great suggestion!

But as this is a cross-post (and not a completely new draft), I would at first try keep the factual content intact as much as possible. But will take another look after all other concerns are addressed


It was a bunch of static HTML files, that later became a database for a web application, produced as a part of nightly build using Maven that were copied and served from a well known URL inside the company. HTMLs would contain tables of potential software defects, obtained by running multiple existing static analysis tools — [FindBugs](https://github.com/findbugsproject/findbugs), [PMD](https://pmd.github.io/) and [CheckStyle](https://github.com/checkstyle/checkstyle). Each defect was attributed to the latest change in the codebase using “git blame” and “assigned” to a particular engineer who introduced the change.

It’s internal adoption numbers, although driven top-down by the management decision, were *very low* and aligned with the paper — needless to say that not many engineers were motivated enough to go to a separate [http://code-quality.company.com](http://code-quality.company.com) every day, only to find that they are #N by the amount of bugs introduced to the codebase.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s internal adoption numbers?..

Finally understood that you mean the Personalized Quality Dashboard. I suggest to avoid "it" here - very confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go to the separate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PAIN!!! I feel sorry for your first job Alex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored


It’s internal adoption numbers, although driven top-down by the management decision, were *very low* and aligned with the paper — needless to say that not many engineers were motivated enough to go to a separate [http://code-quality.company.com](http://code-quality.company.com) every day, only to find that they are #N by the amount of bugs introduced to the codebase.

Curiously enough, one can see some open source projects like i.e Git going though the similar stage right now i.e [https://larsxschneider.github.io/git-scan](https://larsxschneider.github.io/git-scan/) with contributors introducing language-specific analysis tools to the build profiles and publishing a dashboards with the results.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"like" or "i.e." - but not both :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

second "i.e." ...


Curiously enough, one can see some open source projects like i.e Git going though the similar stage right now i.e [https://larsxschneider.github.io/git-scan](https://larsxschneider.github.io/git-scan/) with contributors introducing language-specific analysis tools to the build profiles and publishing a dashboards with the results.

Despite challenges in adopting such solutions, one can also see companies, i.e [https://scan.coverity.com](https://scan.coverity.com) — a closed-sourced static analysis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite the challenges

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait. Is it "i.e." or "e.g."? https://www.grammarly.com/blog/know-your-latin-i-e-vs-e-g/

so actually you should change all "i.e."-s to "e.g."-s.

Despite challenges in adopting such solutions, one can also see companies, i.e [https://scan.coverity.com](https://scan.coverity.com) — a closed-sourced static analysis
solution for Java, C/C++, C#, JavaScript, Ruby and Python [founded in 2006](https://scan.coverity.com/about) jointly with U.S. Department of Homeland Security, being gradually adopted by some OSS projects.

Companies building rule-based analysis platforms like [https://lgtm.com](https://lgtm.com), offspring of University of Oxford-based [https://semmle.com](https://semmle.com/) founded in 2007, are following this adoption path. Theri success, in my opinion, can be attributed to the fact that both support “hard” native languages like [C++](https://lgtm.com/blog/how_lgtm_builds_cplusplus).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like lgtm - an offspring of the University of Oxford-based semmle

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theri -> Their

Companies building rule-based analysis platforms like [https://lgtm.com](https://lgtm.com), offspring of University of Oxford-based [https://semmle.com](https://semmle.com/) founded in 2007, are following this adoption path. Theri success, in my opinion, can be attributed to the fact that both support “hard” native languages like [C++](https://lgtm.com/blog/how_lgtm_builds_cplusplus).

### 2. 2009 Filing bugs/Fixit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point I ran out of my time, will continue reviewing soon.


### 2. 2009 Filing bugs/Fixit

Next attempt introducing static analysis tools for Java, documented in the paper was filing the results of analysis as bugs in the project bug-tracking system. Then, a companywide dedicated effort was made though a “Fixit” week for all engineers to have a time to clean up those issues.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next attempt to integrate static analysis tools for Java which was documented in the paper was filing ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then , the company-wide ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though -> through

... was made in the format of a "Fix-it" week so that engineers had preallocated time to clean up ...


This approach has some advantages:

* it is valid scientific approach as it allows to quantify the results very well: how many of reported issues were actually fixed by developers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is a valid proven

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that "Proven" in this context does not seem to be used very often and also may rise unnecessary questions like "proven by whom? where is the proof?", etc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with "valid" is that it assumes that there is an invalid scientific approach, which is ugly too. How about "good" or "reliable"?

* other researchers use similar approach i.e in early “[Learning Natural Coding Conventions](https://arxiv.org/abs/1402.4182)” paper by [Miltos Allamanis](undefined) and [https://ml4code.github.io](https://ml4code.github.io/) group
> We demonstrate that coding conventions are important to software teams, by showing that 1) empirically, programmers enforce conventions heavily through code review feedback and corrective commits, and 2) **patches that were based on NATURALIZE suggestions have been incorporated** into 5 of the most popular open source Java projects on GitHub — of the 18 patches that we submitted, 14 were accepted

For organization it has a huge disadvantage though — it’s laborious and hard to scale. If conducted without a proper care, results will not only be ignored by developers, but can also contribute to overall issue-tracker value depreciation for the project.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has a huge con/disadvantage for the organization though

* other researchers use similar approach i.e in early “[Learning Natural Coding Conventions](https://arxiv.org/abs/1402.4182)” paper by [Miltos Allamanis](undefined) and [https://ml4code.github.io](https://ml4code.github.io/) group
> We demonstrate that coding conventions are important to software teams, by showing that 1) empirically, programmers enforce conventions heavily through code review feedback and corrective commits, and 2) **patches that were based on NATURALIZE suggestions have been incorporated** into 5 of the most popular open source Java projects on GitHub — of the 18 patches that we submitted, 14 were accepted

For organization it has a huge disadvantage though — it’s laborious and hard to scale. If conducted without a proper care, results will not only be ignored by developers, but can also contribute to overall issue-tracker value depreciation for the project.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without a proper care

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contribute to the overall


For organization it has a huge disadvantage though — it’s laborious and hard to scale. If conducted without a proper care, results will not only be ignored by developers, but can also contribute to overall issue-tracker value depreciation for the project.

Despite that, one can see this approach been used by companies in this field i.e [American Software Safety Reliability Company](http://www.assrc.us), Atlanta-based enterprise that seems to have deep roots in software verifications and somehow [supported by DARPA](http://www.qbitlogic.com/darpa-bigcode/), to do archive the same — test some of their products like [https://www.mycode.ai](https://www.mycode.ai/) solution, that is planned to deploy across all of the U.S. Department of Defense software development divisions, i.e on [Git, popular OSS project](https://public-inbox.org/git/CAGm8dMApDdLEzeKU-h16G0NSpnuk9LMTWA29t4MxO1qcNpUvhA@mail.gmail.com/).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"to do archive the same"? did you mean achieve?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

divisions -> departments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* presence of false-positives in FindBugs results made developers to lose confidence in the tool as a whole
* customization of results view per-developer lead to an inconsistent view of analysis outcome

## What worked & Lessons learned
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lessons -> lessons ? or do you refer to the paper's headers


## What worked & Lessons learned

As opposed to integrating each particular analysis tool in a different way, an internal “platform” — easily extensible and with support for many different kinds of program-analysis tools, including static and dynamic analyses, was built, known as [Tricorder project](https://research.google.com/pubs/pub43322.html).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an internal -> the internal

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

known as -> named as

Copy link
Contributor Author

@bzz bzz Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I belive it's still correct to keep it as "an internal platform ... known as Ticoder project"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW how about rephrasing:

"platform" was built which is easily extensible and supports many different kinds of program-analysis tools including static and dynamic analyses. It is known as Tricorder


As opposed to integrating each particular analysis tool in a different way, an internal “platform” — easily extensible and with support for many different kinds of program-analysis tools, including static and dynamic analyses, was built, known as [Tricorder project](https://research.google.com/pubs/pub43322.html).

As it was taking into account all the lessons learned from the history above, it managed to re-gain the trust of users and proved to be a success inside Google.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it took into account

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

history -> stories

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-gain -> recover?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the users


As it was taking into account all the lessons learned from the history above, it managed to re-gain the trust of users and proved to be a success inside Google.

Paper contains few lessons like *Developer happiness is key* and *Crowdsource analysis development* that are nice, but I would rather to highlight instead a few key takeaways that seems to drive rest of the technical decisions, responsible for success of new analysis platform.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few lessons

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather highlight

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seond time a few, consider a synonym

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems -> seem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive the rest

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decisions which are responsible for success of the new

Paper contains few lessons like *Developer happiness is key* and *Crowdsource analysis development* that are nice, but I would rather to highlight instead a few key takeaways that seems to drive rest of the technical decisions, responsible for success of new analysis platform.

There are two main takeaways that drove the overall tooling design:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This time I stopped here.

### 1. Best way to **measure a success of analysis**
> by number of bugs fixed (or prevented), not the number of issues identified

This way of measuring success have few notable implications
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have -> has a few

a few -> several?


This way of measuring success have few notable implications

* If a tool that finds a bug, also suggests a fix - it will be much more successful using this metrics. This, by necessity, constraints the scope of possible analysis and a tooling required
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the tool

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, also

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these metrics or this metric

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This of course/consequently constraints

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scope of the possible

the tooling


* If a tool that finds a bug, also suggests a fix - it will be much more successful using this metrics. This, by necessity, constraints the scope of possible analysis and a tooling required

* It also means that repairing programs is important. For a discussion of tooling available for code transformation see **Technical Details** section below. Learning such modification from examples, instead of manual coding by engineers is also a bleeding edge research in academia [https://github.com/KTH/learning4repair](https://github.com/KTH/learning4repair)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See Technical Details section below for the discussion of tooling

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modification -> modifications

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is a bleeding edge research topic


This immediately implies that **reporting issues sooner is better.**

That leads to conclusion that the best bet is to integrate checks either ***directly into compilers*** - familiar tools on who’s feedback as errors and warnings developers are already relaying day to day. Or, if that is not possible, _**code review** is a good time_ for new changes — before they are commited to the version control system.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To conclude, ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"on who’s feedback"?..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commited -> committed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


That leads to conclusion that the best bet is to integrate checks either ***directly into compilers*** - familiar tools on who’s feedback as errors and warnings developers are already relaying day to day. Or, if that is not possible, _**code review** is a good time_ for new changes — before they are commited to the version control system.

Criterias that must hold for a ***compile time*** checks:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a


Although it is not being disclosed, but an attentive reader might have noticed that **Compilation Index** part of the pipeline is very similar to something called [Compilation Database](https://kythe.io/docs/kythe-compilation-database.html) in open source Kythe project.

It might be interesting to take a close look at example of API for AST query and of transformation for C++.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

close -> closer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the example
the API

{{% grid-cell %}}
This callback will generate a code transformation: for the matched nodes it will replace the matching text of the function name with the “Baz”.

For code transformations in Java, **Error-Prone** has a similar low-level [patching API](http://errorprone.info/docs/patching), that is very close to native AST manipulation API. Same as for the Clang, it was found to have step learning curve and thus pose a high entry barrier — even an experience engineer would need few weeks, before one can be productive creating a fix suggestions or refactorings.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Regarding code transformations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, that is very close

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was found to ... similar to Clang, and thus

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, before

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a fix suggestions

{{% /grid-cell %}}
{{% /grid %}}

That is why a higher-level API was built for Java first as [Refaster](https://research.google.com/pubs/pub41876.html) project, that was [integrated in Error-Prone](http://errorprone.info/docs/refaster) later.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the higher level

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Refaster project

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, that which


That is why a higher-level API was built for Java first as [Refaster](https://research.google.com/pubs/pub41876.html) project, that was [integrated in Error-Prone](http://errorprone.info/docs/refaster) later.

So a usual workflow would look like — after running all the checks and emitting a collection of suggested fixes, shard diffs to a smaller patches, run all the tests over the changes and if passed, submit for code review.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a the usual workflow is: after running ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a smaller patches

Copy link
Contributor

@vmarkovtsev vmarkovtsev Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if they are passed

{{% center %}} … {{% /center %}}

{{% center %}}
##### Thank you for reading, stay tuned and keep you codebase healthy!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YES!!!

@vmarkovtsev
Copy link
Contributor

, means removing a comma

@bzz
Copy link
Contributor Author

bzz commented Aug 8, 2018

😮 Thank you so much for a thoughtful and thorough review!
Will address asap 🚀

bzz added 2 commits August 8, 2018 20:05
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
@bzz
Copy link
Contributor Author

bzz commented Aug 8, 2018

Thank you again, @vmarkovtsev feedback addressed in e9127ee

@campoy

Hey, could you break the long lines into shorter easier to review chunks?

Done, although I understand the rationale, for the plain text this requirement seems quite un-friendly to different screen/font sizes, modern editors, semantics of the text and tools like Grammarly, no? 😕

Ready for another pass.

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
@campoy
Copy link
Contributor

campoy commented Aug 13, 2018

Sorry about the inconvenience, @bzz.
I agree it's a pain to limit the line length but otherwise reviews become basically impossible.

Any other improvements to the current flow are welcome.

draft: true
---

A recent paper with empirical research on the application of static code analysis tool caught my attention:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/tool/tools/ ?

and [The Morning Paper](https://blog.acolyer.org/) — I always wanted to experiment with publishing notes.
This will be the first attempt.

{{% center %}} … {{% /center %}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider using <hr> instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using of HTML tags in blog posts is strictly forbidden, suggestion on adding a shortcode for <hr> is already tracked under #230

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bzz for keeping yourself following the rules ;)
I'll try find some time next week to develop a {{% separator %}} shortcode, replacing old {{% center %}} … {{% /center %}} with the new proposed feature :D

{{% center %}} … {{% /center %}}

[“Lessons from Building Static Analysis Tools at Google](https://cacm.acm.org/magazines/2018/4/226371-lessons-from-building-static-analysis-tools-at-google/fulltext)”
by Caitlin Sadowski, Edward Aftandilian, [Alex Eagle](undefined), Liam Miller-Cushon, Ciera Jaspan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undefined?

According to the paper, Google only runs simpler, *intra-procedural* type of analysis — the only one feasible
to run at the scale of 2 billion lines of code.

Interesting enough, this is somehow different from the approach taken by Facebook’s project
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly

the paper:

* *Large investment.* Although theoretically better and more complex analysis exists, it will require
the non-trivial engineering effort to scale
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finish every sentence with a .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


And those are ClangMR and JavacFlume — projects that are only briefly mentioned in this insightful paper.

*That is it, thank you for reading. We will post more on papers in this field soon.*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the end of the blog post?

Copy link
Contributor Author

@bzz bzz Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it is :)


{{% center %}} … {{% /center %}}

Now I will take a liberty and cover a few technical details that were not in the scope of the original paper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be against splitting this into a second blog post?
I think the topic is by itself interesting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to #241


Project [Error-Prone](https://github.com/google/error-prone) is a compiler extension that is able to perform
arbitrary analysis on the *fully typed AST*. One thing to notice is that one can not get such input by using
only a parser even as advanced as [https://doc.bblf.sh](https://doc.bblf.sh/). Running a full build would be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the name of the project, not the link itself [babelfish](https://doc.bblf.sh)

Same for all other links

application of those fixes to the whole codebase, called JavacFlume — which I would guess looks something like
an Apache Spark job that applies patches in some generic format.

Here is an example of how a full pipeline looks for C++
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

: at the end

alex:
name: Alexander Bezzubov
thumbnail: https://avatars1.githubusercontent.com/u/5582506?s=460&v=4
bio: "Data engineer"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain a bit more in here 😄

Copy link
Contributor Author

@bzz bzz Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the simplicity of the description

@campoy
Copy link
Contributor

campoy commented Aug 13, 2018

Looking good, @bzz!

@bzz
Copy link
Contributor Author

bzz commented Aug 17, 2018

Thank you very much @campoy, appreciate your feedback! All suggestions make sense to me, will start applying.

As soon as it's done will ping you and push it to staging for preview.

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
@bzz
Copy link
Contributor Author

bzz commented Aug 29, 2018

All feedback has been addressed, deployed on staging to https://blog-staging.srcd.run/post/review-building-static-analysis-tools/

@campoy ready for another pass

@bzz
Copy link
Contributor Author

bzz commented Sep 3, 2018

@vcoisne @eiso Since Francesc went on vacations, maybe you could review this instead please?

all previous comments were addressed

@bzz
Copy link
Contributor Author

bzz commented Sep 5, 2018

@campoy friendly ping

Copy link
Contributor

@vcoisne vcoisne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bzz Really good post ! I think in general we should try to summarize as much as possible and keep these posts on the shorter side for people who are interested but don't really have that much time to read. I would also add images and add some call to actions at the end i.e Sign up for our weekly newsletter. check out awesome mloncode list, etc

a static analysis tool for Java at [Google](http://twitter.com/Google), and lessons learned from the success
story of incorporating extensible analysis framework, [Tricorder](https://research.google.com/pubs/pub43322.html),
to development workflow at Google.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an image to illustrate one or both of these stories ?

Copy link
Contributor Author

@bzz bzz Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, not really

the paper:

* *Large investment.* Although theoretically better and more complex analysis exists, it will require
the non-trivial engineering effort to scale
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


## History of integrating FindBugs at Google

### 0. IDE/Editor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should explicit that with an introduction sentence.

Signed-off-by: Alexander Bezzubov <bzz@apache.org>
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
Signed-off-by: Alexander Bezzubov <bzz@apache.org>
@bzz
Copy link
Contributor Author

bzz commented Sep 17, 2018

@campoy @vcoisne appreciate you feedback, it's been addressed, re-deployed on staging, ready for another round.

As it's been almost 2 months we are working on this, it would be nice to try to wrap it up soon.

@vcoisne
Copy link
Contributor

vcoisne commented Sep 19, 2018

LGTM I think it's ready to be published early next week.

@vcoisne
Copy link
Contributor

vcoisne commented Sep 20, 2018

@bzz scheduled for next Thursday as you can see in the "Blog & Press" google calendar I created

Copy link
Contributor

@dpordomingo dpordomingo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from a technical pov, it LGTM 👍

Copy link
Contributor

@campoy campoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@eiso
Copy link
Member

eiso commented Sep 27, 2018

Excited to see this published today.

@bzz
Copy link
Contributor Author

bzz commented Sep 27, 2018

Me to! :shipit:

When it's done will also submit a medium post to the source{d} publication.

@bzz bzz merged commit c23a0e2 into src-d:master Sep 27, 2018
@bzz bzz deleted the review-static-analysis branch September 27, 2018 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants