License detection

lisa-noeth edited this page Jan 19, 2022 · 3 revisions

Introduction

This page summarizes the first version of the implementation and integration of the QMSTR build graphs into FASTEN's toolchain.

What is Quartermaster

Developed by project partner Endocode AG, Quartermaster (QMSTR) is an Open Source license compliance solution that aims to establish industry standards for documenting Open Source license information across the supply chain. The command-line tool integrates into the build system to learn about the software product, its sources, and its dependencies, and then analyzes the gathered information. Its goal is to reduce risk and friction in the reuse of Open Source code. Within FASTEN, QMSTR's role is to check license compliance over a bidirectional connection: it collects information about the dependencies from the FASTEN call graphs and reports license and compliance information back to the FASTEN Knowledge Base.

The first step in this process is generating the concrete build graph, which holds information about all the generated artifacts that will be distributed, together with the necessary source code and dependency information.

Although building the graph isn’t trivial, constructing and analyzing it is vital for complex projects. It improves the accuracy of the license and compliance analysis, because the only files that matter for that analysis are the ones shipped within the package.

QMSTR before the integration with FASTEN

QMSTR was born as a command-line tool to be launched locally.
This, however, did not align with the ultimate purpose of integrating FASTEN in CI/CD pipelines (§§ 4.1 and 4.3.2, D6.3).

QMSTR/FASTEN integration: first version (build graph)

QMSTR has been integrated into FASTEN's toolchain as a dependency and can be launched through its dedicated plugin.
For the first version of D4.1, we concentrate on showcasing the build graph of any Maven project.

To make this integration happen, QMSTR moved to the cloud: the build graph is constructed in the cloud while the Maven project is being built. These tasks are performed by separate containers. A fully distributed multi-pod architecture is currently under development.

The FASTEN QMSTR plugin is triggered by the FASTEN server; however, it can also be launched as a standalone plugin for debugging purposes (step-by-step guide, video).

QMSTR/FASTEN integration: second version (analysis)

The License and Compliance plugin now also analyzes Maven projects using scancode.
As a result of this phase, our graph database is augmented with license and compliance information.

Generated Build Graph example

The left part of the graph consists of the usual build graph, having, in this case, a single (Java) package node in green as the central node. License and compliance information is on the right, having the analyzer node in pink right in the middle.

This was part of deliverable 4.2 "Detection of license obligations and metadata and application to the call graphs".

QMSTR/FASTEN integration: third version (report)

As a result of the analysis phase, the License and Compliance plugin produces a Kafka message having this format.
More specifically, a custom QMSTR reporter queries the internal graph database to fetch license information, formats the result into a message in that format, and sends it back to Kafka.

This was part of deliverable 4.3 "Implementation of a license compliance and compatibility solver operating on the call graphs - Version 1".

Technical details

QMSTR is a modular application composed of three main phases: build, analysis, and report.

To produce the report, we developed a custom reporter (a phase 3 module) that fetches the data of interest and formats it according to this format. More specifically, this version returns the licenses and SHA-1 hashes of *.java and *.class files.
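
The hashing part of that step can be illustrated with a short sketch. This is not the QMSTR reporter itself (which is a QMSTR module); it only shows, under stated assumptions, how SHA-1 hashes of *.java and *.class files under a project directory could be collected. The function names `sha1_of_file` and `hash_java_artifacts` are hypothetical.

```python
import hashlib
from pathlib import Path

# File types the reporter is described as covering.
SUFFIXES = {".java", ".class"}


def sha1_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_java_artifacts(root: Path) -> dict[str, str]:
    """Map each *.java / *.class file under root (relative path) to its SHA-1."""
    return {
        p.relative_to(root).as_posix(): sha1_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.suffix in SUFFIXES
    }
```

Calling `hash_java_artifacts(Path("my-project"))` would return a dictionary keyed by relative path; how those hashes are paired with licenses in the actual message is defined by the format referenced above and is not reproduced here.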

So far, QMSTR has been using classic gRPC calls to orchestrate its different modules, but to achieve better maintainability, robustness, and (horizontal) scalability, it is progressively moving towards a message broker-based solution. The custom reporter waits for a RabbitMQ message before starting its execution: this makes sure that reporting starts only once the build and analysis phases are over.

From Quartermaster to the license detector & feeder plugins

The role of Quartermaster in FASTEN

FASTEN’s original plan was to detect licenses with Quartermaster, the open-source license compliance tool developed by Endocode AG. As a first step, Quartermaster builds the software project in order to extrapolate build information and store it in a so-called “build graph”. It then scans the entire project looking for license text in all files and augments the build graph accordingly. As the last step, Quartermaster queries the build graph so that only those licenses that actually end up in the final package can be considered for a compliance check.
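
The last step, keeping only the licenses that actually end up in the final package, amounts to a reachability query on the build graph. The sketch below illustrates the idea on a toy graph; the node names, the `BUILT_FROM` and `LICENSES` dictionaries, and the `shipped_licenses` function are all hypothetical and do not reflect Quartermaster's real data model.

```python
from collections import deque

# Hypothetical toy build graph: each node maps to the nodes it was built from.
BUILT_FROM = {
    "app.jar": ["Main.class", "Util.class"],
    "Main.class": ["Main.java"],
    "Util.class": ["Util.java"],
    "UtilTest.class": ["UtilTest.java"],  # test code, never packaged
}

# Hypothetical scan results: license text found per source file.
LICENSES = {
    "Main.java": "Apache-2.0",
    "Util.java": "MIT",
    "UtilTest.java": "GPL-3.0-only",
}


def shipped_licenses(package, built_from, licenses):
    """Collect licenses of every node reachable from the package node."""
    seen = {package}
    queue = deque([package])
    found = set()
    while queue:
        node = queue.popleft()
        if node in licenses:
            found.add(licenses[node])
        for dep in built_from.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return found
```

In this toy graph, `shipped_licenses("app.jar", BUILT_FROM, LICENSES)` leaves out the license of `UtilTest.java`, because that file is scanned but never reachable from the packaged artifact.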

Abandoning Quartermaster

Apart from the technical difficulty of readapting a distributed batch job like Quartermaster into a self-contained plugin to be run in a streaming application (a problem arising from an intrinsic architectural incompatibility), letting Quartermaster build the software project violates the separation of concerns principle. That is to say, FASTEN’s OPAL plugin already generates call graphs. A license detector plugin should only detect licenses and augment the call graph with such findings. OPAL should be the only plugin responsible for the creation of call graphs, not the one intended to detect licenses.

The new license detector & feeder plugins

Accordingly, a new, streamlined plugin simply called “license detector plugin” only takes care of running a license scanner and reporting its findings back to Kafka; there is no need to build projects anymore. The “license feeder” plugin subsequently consumes that message and augments the call graph.

Technical details

Pull Request #301 contains the source code of the two previously-mentioned plugins.

Its description lists the steps that have been necessary for the two plugins to accomplish a successful license detection, as well as their progress.

Overview

First, the license detector plugin consumes a Kafka record belonging to a joint topic that combines:

• fasten.RepoCloner.out, meaning that the repository has been cloned, and

• fasten.MetadataDBJavaExtension.out, issued as soon as the call graph has been stored into the database.

The detector then proceeds to scan the entire project, looking for license text inside files. Those findings are formatted into a new Kafka record, fasten.LicenseDetector.out.

Licenses are detected both at the file and at the package level. For the latter category, the detector scans the main pom.xml file. In case the developer hasn’t specified any license in the pom.xml file and the repository is hosted on GitHub, the detector contacts the GitHub API to retrieve the so-called “outbound license”.

The license feeder then consumes the fasten.LicenseDetector.out record and inserts the license findings into the call graph, only for those files that are present in the database.
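
The package-level lookup described above (pom.xml first, remote lookup only as a fallback) can be sketched as follows. This is an illustration and not the plugin's code, which lives in the pull request mentioned earlier. It assumes the POM declares the standard Maven 4.0.0 namespace, and it replaces the GitHub API call with a caller-supplied `fallback` function so that no network access is involved. The names `licenses_from_pom` and `package_licenses` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional

# Standard namespace of Maven POM files (assumed to be declared in the POM).
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def licenses_from_pom(pom_xml: str) -> List[str]:
    """Return the license names declared under <licenses> in a pom.xml."""
    root = ET.fromstring(pom_xml)
    names = root.findall("m:licenses/m:license/m:name", POM_NS)
    return [n.text.strip() for n in names if n.text and n.text.strip()]


def package_licenses(
    pom_xml: str,
    fallback: Optional[Callable[[], Optional[str]]] = None,
) -> List[str]:
    """Prefer licenses declared in the POM; otherwise try the fallback.

    `fallback` is a hypothetical stand-in for the remote lookup of the
    outbound license; it is only called when the POM declares nothing.
    """
    declared = licenses_from_pom(pom_xml)
    if declared:
        return declared
    if fallback is not None:
        remote = fallback()
        if remote:
            return [remote]
    return []
```

Passing the remote lookup in as a function keeps the decision logic testable offline; a real detector would also have to check that the repository is hosted on GitHub before attempting the lookup.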

Next steps