Understanding Accuracy #8
spalt08 announced in Useful Information
Since the package matching algorithm is based on a probabilistic approach, it is important to understand how we measure its accuracy.
Testing Methodology
A single test suite for GradeJS consists of several JS files and module mapping metadata. A module mapping is a relation between a webpack-bundled module ID and an NPM package file (usually the file name, package name, and version). By running the package detection pipeline on the test JS files independently and comparing the output with the original module metadata, we can mark each final decision as a successful match, a semi-match (package version mismatch), a false positive, or a false negative.
For instance, a false positive is a positive algorithm output about an NPM package that is not present in the bundle. A false negative, conversely, is a null output for a package that is present in the bundle.
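The per-decision classification described above can be sketched as follows. This is an illustrative model, not the actual GradeJS code: the `PackageGuess` shape and `classifyDecision` helper are assumptions made for the example.

```typescript
type Outcome = "match" | "semi-match" | "false-positive" | "false-negative";

interface PackageGuess {
  name: string;
  version: string;
}

// expected: the ground-truth package from the module mapping (null if none)
// detected: the pipeline's output for the same module (null if nothing detected)
function classifyDecision(
  expected: PackageGuess | null,
  detected: PackageGuess | null
): Outcome | null {
  if (expected && detected) {
    if (expected.name === detected.name) {
      // Same package: exact version agreement is a match,
      // a version disagreement is a semi-match.
      return expected.version === detected.version ? "match" : "semi-match";
    }
    return "false-positive"; // wrong package reported
  }
  if (!expected && detected) return "false-positive"; // package not in bundle
  if (expected && !detected) return "false-negative"; // package missed
  return null; // nothing expected, nothing detected
}
```

For example, detecting `react@16.8.0` where the mapping says `react@17.0.2` would count as a semi-match, while detecting a package absent from the mapping counts as a false positive.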
Currently we have two independent test subsets:
Accuracy
Accuracy is defined as the percentage of matches and semi-matches over all NPM packages in the bundle. It varies considerably, between 60% and 85%, depending on multiple factors such as the number of packages, the build mode, and the webpack or terser version. We also see a 5-10% false positive rate.
The real average accuracy we see is ~70%, with a ~5% false positive rate.
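Putting the definitions together, the two figures can be computed from the per-decision counts like this. This is a hedged sketch: the exact denominators (all expected packages for accuracy, all reported packages for the false-positive rate) are a reasonable reading of the text, not a confirmed GradeJS formula, and the counts are invented for illustration.

```typescript
interface Counts {
  matches: number;        // exact name + version agreement
  semiMatches: number;    // right package, wrong version
  falsePositives: number; // reported but not in the bundle
  falseNegatives: number; // in the bundle but not reported
}

// Accuracy: matches + semi-matches over everything actually in the bundle.
function accuracy(c: Counts): number {
  const expected = c.matches + c.semiMatches + c.falseNegatives;
  return (c.matches + c.semiMatches) / expected;
}

// False positive rate: wrong reports over everything the pipeline reported.
function falsePositiveRate(c: Counts): number {
  const reported = c.matches + c.semiMatches + c.falsePositives;
  return c.falsePositives / reported;
}

// Illustrative counts: 60 matches, 10 semi-matches, 5 FPs, 30 misses.
const c: Counts = { matches: 60, semiMatches: 10, falsePositives: 5, falseNegatives: 30 };
console.log(accuracy(c).toFixed(2));          // 70 / 100 = "0.70"
console.log(falsePositiveRate(c).toFixed(3)); // 5 / 75  = "0.067"
```

With these example counts, the pipeline would report an accuracy of 70% and a false-positive rate of roughly 7%, in the same ballpark as the figures quoted above.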