Serverless GitHub webhook for project history analysis

This is a demo serverless application for handling GitHub webhooks.

About

This application uses the Java library JGit to detect hot-spots among the changed files.

Hot-spots are the files which are the most frequently edited in the project history.
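The definition above can be illustrated with a minimal sketch. It assumes the per-commit lists of changed paths have already been extracted (in this service that would be JGit's job); the `HotSpots` class and its `top` method are hypothetical names used only for illustration, not the project's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: ranks files by how many commits touched them. */
final class HotSpots {

    /**
     * @param commits one entry per commit, each holding the paths changed by that commit
     * @param limit   maximum number of hot-spots to return
     * @return paths ordered by descending change count, ties broken alphabetically
     */
    static List<String> top(List<List<String>> commits, int limit) {
        // Count how many times each path appears across all commits.
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> changedPaths : commits) {
            for (String path : changedPaths) {
                counts.merge(path, 1, Integer::sum);
            }
        }

        // Sort by count (highest first), then by path for a stable order.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        // Keep only the first `limit` paths.
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            if (result.size() == limit) {
                break;
            }
            result.add(entry.getKey());
        }
        return result;
    }
}
```

Calling `HotSpots.top(commits, 10)` would give the ten most frequently edited paths, which can then be compared with the files changed in a pull request.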

Modifying such files carries a higher risk of introducing a bug. Frequent edits also signal that the file may violate good design practices such as the single responsibility principle, especially if it's a large file.

Analysing a project's history of changes may lead to many more interesting discoveries than static source code analysis alone. The subject is called "code forensics", and more about it can be found in the great book Your Code as a Crime Scene by Adam Tornhill.

How it works

The service is built with the Serverless Framework.

A GitHub webhook is configured to send an HTTP request to an AWS API Gateway endpoint.

API Gateway invokes a Lambda function (implementation in com.serverless.ApiGatewayHandler).

All of the analysis could be done in a single Lambda function, but it may take much longer than API Gateway's maximum timeout of roughly 30 seconds. The first function therefore invokes another Lambda function asynchronously (implementation in com.serverless.Job) and responds to API Gateway immediately.

The second function is limited only by the maximum Lambda execution time, which is currently 15 minutes.

It does the analysis and posts results in pull request comments.
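The hand-off between the two functions can be sketched as follows. This is a simplified illustration, not the code in com.serverless.ApiGatewayHandler: `AsyncInvoker`, `Response` and `WebhookHandler` are hypothetical types, and the AWS SDK call is hidden behind the interface. In a real deployment the invoker would presumably wrap a Lambda invoke call with the `Event` invocation type.

```java
/** Hypothetical abstraction over an asynchronous ("fire and forget") Lambda invocation. */
interface AsyncInvoker {
    void invokeAsync(String functionName, String payload);
}

/** Minimal stand-in for the response returned to API Gateway. */
final class Response {
    final int statusCode;
    final String body;

    Response(int statusCode, String body) {
        this.statusCode = statusCode;
        this.body = body;
    }
}

/** Sketch of the first function: pass the webhook payload on and answer right away. */
final class WebhookHandler {
    private final AsyncInvoker invoker;
    private final String jobFunctionName;

    WebhookHandler(AsyncInvoker invoker, String jobFunctionName) {
        this.invoker = invoker;
        this.jobFunctionName = jobFunctionName;
    }

    Response handle(String webhookPayload) {
        // Schedule the long-running analysis; do not wait for its result.
        invoker.invokeAsync(jobFunctionName, webhookPayload);
        // 202 Accepted: the request was received, the work continues elsewhere.
        return new Response(202, "Analysis scheduled");
    }
}
```

Keeping the invocation behind an interface also makes the handler easy to test with a fake invoker that only records its arguments.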

Why JGit

An important constraint when working with git on Lambda is that no git executable is installed.

So we either have to install it programmatically or use a Java git API that does not require a git installation. One such API is the JGit library.

You can check my other post where I explained how we can use it to analyse git history. I reused the ideas from that post here to fetch the list of hot files.

(Side note: if you use Node.js, there is a module you can use to install the git executable: https://github.com/pimterry/lambda-git)

Feedback to git users

There are a few ways to give feedback after the git history analysis:

  1. Reviews API

The API is used to publish comments on each PR update (creation or subsequent commits to the PR), e.g.:

curl -X POST -H 'Authorization: token PERSONAL_API_TOKEN' \
  -d '{"event" : "COMMENT", "body" : "Be careful with\n ```notes.txt``` file"}' \
  https://api.github.com/repos/piczmar/git-code-stats/pulls/3/reviews

where PERSONAL_API_TOKEN is a token generated for a GitHub user (see the GitHub documentation for more about Personal API Tokens).
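The same request can be built from Java. The sketch below uses the JDK's built-in HTTP client (Java 11 or newer) and mirrors the curl call above; the `ReviewRequests` class and its methods are hypothetical helpers for illustration, and the small JSON escaper is only a stand-in for a proper JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds a "COMMENT" review request for a pull request. */
final class ReviewRequests {

    /** Wraps text in quotes and escapes it so it can be embedded as a JSON string. */
    static String toJsonString(String text) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Builds (but does not send) the POST request for a review comment. */
    static HttpRequest commentReview(String owner, String repo, int pullNumber,
                                     String token, String comment) {
        String json = "{\"event\":\"COMMENT\",\"body\":" + toJsonString(comment) + "}";
        URI uri = URI.create("https://api.github.com/repos/" + owner + "/" + repo
                + "/pulls/" + pullNumber + "/reviews");
        return HttpRequest.newBuilder()
                .uri(uri)
                .header("Authorization", "token " + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The token is best read from configuration or an environment variable rather than hard-coded.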

The advantage of this solution is that individual users can integrate their repository on their own, without needing an organization admin to install a GitHub App.

The drawback is that the PR page in GitHub shows no indicator that the analysis is running, in contrast to the Checks API mentioned below.

An example review comment looks like this:

[Image: example review comment (example_review.png)]

  2. Checks API

An alternative would be to register a new GitHub App and use the Checks API, which is currently available only to GitHub Apps.

It cannot be used with the Personal API Token authorization method.

This service takes the first approach, the Reviews API.