Understanding Accuracy #8
spalt08 announced in Useful Information
Since the package matching algorithm is based on a probabilistic approach, it is important to understand how we measure its accuracy.
Testing Methodology
A single test suite for GradeJS consists of several JS files and module mapping metadata. A module mapping is a relation between a webpack-bundled module ID and an NPM package file (usually the file name, package name, and version). By running the package detection pipeline on the test JS files independently and comparing the output with the original module metadata, we can mark each final decision as a successful match, a semi-match (package version mismatch), a false positive, or a false negative.
For instance, a false positive is a positive algorithm output about an NPM package that is not present in the bundle. A false negative, conversely, is a null output for a package that is present in the bundle.
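The per-decision classification described above can be sketched as follows. This is an illustrative model, not the actual GradeJS code: the `PackageGuess` shape and `classifyDecision` helper are assumptions made for the example.

```typescript
type Outcome = "match" | "semi-match" | "false-positive" | "false-negative";

interface PackageGuess {
  name: string;
  version: string;
}

// expected: the ground-truth package from the module mapping (null if none)
// detected: the pipeline's output for the same module (null if nothing detected)
function classifyDecision(
  expected: PackageGuess | null,
  detected: PackageGuess | null
): Outcome | null {
  if (expected && detected) {
    if (expected.name === detected.name) {
      // Same package: exact version agreement is a match,
      // a version disagreement is a semi-match.
      return expected.version === detected.version ? "match" : "semi-match";
    }
    return "false-positive"; // wrong package reported
  }
  if (!expected && detected) return "false-positive"; // package not in bundle
  if (expected && !detected) return "false-negative"; // package missed
  return null; // nothing expected, nothing detected
}
```

For example, detecting `react@16.8.0` where the mapping says `react@17.0.2` would count as a semi-match, while detecting a package absent from the mapping counts as a false positive.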
Currently we have two independent test subsets:
Accuracy
Accuracy is defined as the percentage of matches and semi-matches over all NPM packages in the bundle. It varies considerably, between 60% and 85%, depending on multiple factors such as the number of packages, the build mode, and the webpack or terser version. We also see a 5-10% false positive rate.
The real average accuracy we see is ~70%, with a ~5% false positive rate.
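Putting the definitions together, the two figures can be computed from the per-decision counts like this. This is a hedged sketch: the exact denominators (all expected packages for accuracy, all reported packages for the false-positive rate) are a reasonable reading of the text, not a confirmed GradeJS formula, and the counts are invented for illustration.

```typescript
interface Counts {
  matches: number;        // exact name + version agreement
  semiMatches: number;    // right package, wrong version
  falsePositives: number; // reported but not in the bundle
  falseNegatives: number; // in the bundle but not reported
}

// Accuracy: matches + semi-matches over everything actually in the bundle.
function accuracy(c: Counts): number {
  const expected = c.matches + c.semiMatches + c.falseNegatives;
  return (c.matches + c.semiMatches) / expected;
}

// False positive rate: wrong reports over everything the pipeline reported.
function falsePositiveRate(c: Counts): number {
  const reported = c.matches + c.semiMatches + c.falsePositives;
  return c.falsePositives / reported;
}

// Illustrative counts: 60 matches, 10 semi-matches, 5 FPs, 30 misses.
const c: Counts = { matches: 60, semiMatches: 10, falsePositives: 5, falseNegatives: 30 };
console.log(accuracy(c).toFixed(2));          // 70 / 100 = "0.70"
console.log(falsePositiveRate(c).toFixed(3)); // 5 / 75  = "0.067"
```

With these example counts, the pipeline would report an accuracy of 70% and a false-positive rate of roughly 7%, in the same ballpark as the figures quoted above.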