RFC: categorize codebase files to prioritize license-related risk analysis #2945
johnmhoran added 26 commits that referenced this issue: 13 between May 9 and May 23, 2022, and 13 on Jun 12, 2023, each signed off by John M. Horan <johnmhoran@gmail.com>. The last of these adds postscan plugin entry points.
A codebase can have thousands or more files of a vast range of types, from those that are clearly code (.py, .cpp, .so) to those that clearly are not code (.txt, .xlsx), and many that are somewhere in between (.jsp, .html, .erb). Our goal: automate the identification of which files are more or less important in the risk analysis process so that the more important files can be analyzed first, and those that need not be analyzed can be identified and omitted from analysis.

To accomplish this goal, we intend to add a feature that will (1) apply a set of rules -- definitions, really -- to a subset of each file's attributes and (2) add three fields to the ScanCode Toolkit scan output metadata that identify the file by category and sub-category and rank each file for license-risk-related importance, i.e., analysis priority.
Relevant file attributes
Define each of the rules in terms of the contents of one or more of these attributes:
extension
name
mime_type
file_type
programming_language
Categorization fields
analysis_priority ==> a scale from 1-3, with 1 being most relevant for license-risk-analysis purposes
file_category ==> e.g., archive, binary, source, manifest, doc, media, script
file_subcategory ==> e.g., c++, python, make, json, license, audio, data
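As a rough illustration of how rules defined over the attributes above could produce these three fields, here is a minimal sketch. It is not the proposed implementation: the `Rule` class, the `categorize` function, the rule table, the `.mp3` extension, and the priority values assigned to each rule are all assumptions made for this example. Only the attribute and field names come from the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: every attribute set on the rule must match the file."""
    file_category: str
    file_subcategory: str
    analysis_priority: int  # 1 (most relevant) to 3, per the proposal
    extensions: tuple = ()
    names: tuple = ()
    mime_types: tuple = ()

    def matches(self, resource: dict) -> bool:
        checks = (
            (self.extensions, "extension"),
            (self.names, "name"),
            (self.mime_types, "mime_type"),
        )
        # Only attributes the rule actually constrains take part in matching.
        active = [(allowed, key) for allowed, key in checks if allowed]
        if not active:
            return False
        return all(
            (resource.get(key) or "").lower() in allowed
            for allowed, key in active
        )


# Illustrative rules only; the categories come from the proposal's examples,
# the priorities are assumed for the sake of the sketch.
RULES = (
    Rule("manifest", "make", 1, names=("makefile",)),
    Rule("source", "python", 1, extensions=(".py",)),
    Rule("source", "c++", 1, extensions=(".cpp",)),
    Rule("media", "audio", 3, extensions=(".mp3",)),
)


def categorize(resource: dict) -> dict:
    """Return the three categorization fields for one scanned file.

    The first matching rule wins; unmatched files get None for every field.
    """
    for rule in RULES:
        if rule.matches(resource):
            return {
                "analysis_priority": rule.analysis_priority,
                "file_category": rule.file_category,
                "file_subcategory": rule.file_subcategory,
            }
    return {
        "analysis_priority": None,
        "file_category": None,
        "file_subcategory": None,
    }
```

In this sketch the first matching rule wins, so rule order matters; a real rule set would need an explicit policy for files that match more than one definition.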
Output format
We anticipate that this file categorization data will be available as three new fields in all of ScanCode Toolkit's output formats, e.g., .json, .xlsx, et al.
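To make the intended output concrete, the sketch below shows what a single per-file entry might look like in JSON once the three fields are added. The `add_categorization` helper, the sample path, and the sample values are hypothetical; the real output layout is not specified in this proposal.

```python
import json


def add_categorization(file_entry: dict, categorization: dict) -> dict:
    """Return a copy of a per-file scan entry with the three new fields added."""
    return {**file_entry, **categorization}


# Hypothetical per-file entry using attribute names listed in this proposal.
entry = {
    "path": "src/app.py",
    "name": "app.py",
    "extension": ".py",
    "programming_language": "Python",
}

# Assumed categorization result for this file.
fields = {
    "analysis_priority": 1,
    "file_category": "source",
    "file_subcategory": "python",
}

print(json.dumps(add_categorization(entry, fields), indent=2))
```

The helper returns a new dictionary instead of mutating the entry, so the original scan data stays unchanged if categorization is optional.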