What is Scancode-Results-Analyzer
ScanCode Toolkit detects licenses, copyrights, package manifests, direct dependencies, and more in both source code and binary files.
ScanCode license detection uses multiple techniques, based on automatons, inverted indexes, and multiple sequence alignments, to detect licenses accurately. Even so, the detection is not always accurate enough. The goal of this project is to improve the accuracy of license detection by leveraging the ClearlyDefined data set, in which ScanCode is used to scan millions of packages.
This project aims to:
- Write tools and create models to analyze the accuracy of license detection at scale.
- Detect areas where the accuracy could be improved.
- Write reusable tools and models to assist in the semi-automated review of scan results.
- Create new license detection rules semi-automatically to fix the detected anomalies.
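As a rough illustration of the first two aims, the sketch below flags license matches whose score falls under a cutoff, so they can be queued for review. It is not the project's actual implementation. The JSON shape (a top-level `files` list whose entries carry `path` and a `licenses` list with `key` and `score`), the sample data, the helper name `flag_low_score_matches`, and the `threshold` value are all assumptions made for this example.

```python
import json

# Assumed, simplified shape of a ScanCode JSON scan result: a top-level
# "files" list whose entries have a "path" and a "licenses" list, each
# license match carrying a "key" and a 0-100 "score". Sample data is made up.
SAMPLE_SCAN = json.loads("""
{
  "files": [
    {"path": "pkg/LICENSE", "licenses": [{"key": "mit", "score": 100.0}]},
    {"path": "pkg/README.md", "licenses": [{"key": "apache-2.0", "score": 42.5}]},
    {"path": "pkg/setup.py", "licenses": []}
  ]
}
""")


def flag_low_score_matches(scan, threshold=90.0):
    """Return (path, license key, score) for every match scoring below threshold.

    Hypothetical helper: the threshold of 90.0 is an arbitrary illustration,
    not a value taken from the project.
    """
    flagged = []
    for scanned_file in scan.get("files", []):
        for match in scanned_file.get("licenses", []):
            score = match.get("score", 0.0)
            if score < threshold:
                flagged.append((scanned_file["path"], match["key"], score))
    return flagged


for path, key, score in flag_low_score_matches(SAMPLE_SCAN):
    print(f"{path}: {key} (score {score})")
# prints: pkg/README.md: apache-2.0 (score 42.5)
```

Matches flagged this way are only candidates for review: a low score hints at a possible detection problem but does not prove one.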
Quickstart - Local Machine
Download and install Anaconda.
Navigate to the
Create the Conda Environment
conda env create -f env_files/load_into_dataframes/environment.yml
Activate the Conda Environment
conda activate results-analyzer-load
Open Jupyter Lab in this conda environment
Navigate to the .ipynb file you want to open on the left, and click to open it.
Run the Cells using
Quickstart - Google Colab
Clicking the Colab link opens the Jupyter Notebook in Google Colab. Then run the first two groups of cells, which do the following tasks:
Cloning the scancode-results-analyzer GitHub repository so that the classes and data can be loaded into the Jupyter Notebook environment.
Installing conda and some additional requirements from the
Everything is now set up and the code is ready to execute.
GSoC Project Details
- Name: Improve ScanCode License detection accuracy, by leveraging the ClearlyDefined dataset of Scans
- Year: 2020
- Link to Project in GSoC website
- Link to Proposal
- Link to Project Kanban Board for GSoC Phase #1
- Mentors and Reviewers:
- Author: @AyanSinhaMahapatra