
License scanner fails on Python packages without the License metadata key #1729

@pcastellazzi

Description

I ran osv-scanner scan --licenses=MIT . on a simple project to test it out.
As the report below shows, well-known libraries are reported as "UNKNOWN".

╭──────────────────────────────────────────────────────────────┬───────────┬───────────────────────────┬─────────────────┬─────────╮
│ LICENSE VIOLATION                                            │ ECOSYSTEM │ PACKAGE                   │ VERSION         │ SOURCE  │
├──────────────────────────────────────────────────────────────┼───────────┼───────────────────────────┼─────────────────┼─────────┤
│ UNKNOWN                                                      │ PyPI      │ attrs                     │ 25.3.0          │ uv.lock │
│ non-standard                                                 │ PyPI      │ binaryornot               │ 0.4.4           │ uv.lock │
│ BSD-2-Clause                                                 │ PyPI      │ boolean-py                │ 4.0             │ uv.lock │
│ non-standard                                                 │ PyPI      │ chardet                   │ 5.2.0           │ uv.lock │
│ UNKNOWN                                                      │ PyPI      │ click                     │ 8.1.8           │ uv.lock │
│ UNKNOWN                                                      │ PyPI      │ colorama                  │ 0.4.6           │ uv.lock │
│ Apache-2.0                                                   │ PyPI      │ coverage                  │ 7.7.0           │ uv.lock │
│ UNKNOWN                                                      │ PyPI      │ foss-flame                │ 0.21.1          │ uv.lock │
│ UNKNOWN                                                      │ PyPI      │ iniconfig                 │ 2.1.0           │ uv.lock │
│ UNKNOWN                                                      │ PyPI      │ jinja2                    │ 3.1.6           │ uv.lock │
│ UNKNOWN                                                      │ PyPI      │ jsonschema-specifications │ 2024.10.1       │ uv.lock │
│ Apache-2.0                                                   │ PyPI      │ license-expression        │ 30.4.1          │ uv.lock │
│ non-standard                                                 │ PyPI      │ markupsafe                │ 3.0.2           │ uv.lock │
│ UNKNOWN                                                      │ PyPI      │ osadl-matrix              │ 2024.5.22.10535 │ uv.lock │
│ UNKNOWN                                                      │ PyPI      │ packaging                 │ 24.2            │ uv.lock │
│ BSD-3-Clause                                                 │ PyPI      │ psutil                    │ 7.0.0           │ uv.lock │
│ GPL-2.0-or-later                                             │ PyPI      │ python-debian             │ 1.0.1           │ uv.lock │
│ UNKNOWN                                                      │ PyPI      │ referencing               │ 0.36.2          │ uv.lock │
│ Apache-2.0 AND CC-BY-SA-4.0 AND CC0-1.0 AND GPL-3.0-or-later │ PyPI      │ reuse                     │ 5.0.2           │ uv.lock │
│ UNKNOWN                                                      │ PyPI      │ tomli                     │ 2.2.1           │ uv.lock │
│ UNKNOWN                                                      │ PyPI      │ typing-extensions         │ 4.12.2          │ uv.lock │
│ UNKNOWN                                                      │ PyPI      │ utools                    │ 0.1.0           │ uv.lock │
╰──────────────────────────────────────────────────────────────┴───────────┴───────────────────────────┴─────────────────┴─────────╯

I did a little digging into why this may be happening, and I think it is related to how osv-scanner reads licenses.
My understanding is that osv-scanner reads a package's information from PyPI. That would explain why utools, my own package, is reported as UNKNOWN: it is not published there.

Checking the output of https://pypi.org/pypi/attrs/json, I found that attrs does not use the info.license field but info.license_expression instead. According to https://packaging.python.org/en/latest/specifications/core-metadata/#license, the license expression should take priority when present.
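To illustrate the precedence described above, here is a minimal sketch of how a tool could pick a declared license from the "info" object returned by PyPI's JSON API. The function names fetch_pypi_info and declared_license are hypothetical, and the order (license_expression first, then the legacy license field, then "UNKNOWN") is an assumption based on the PyPA reading above; this is not how osv-scanner is actually implemented.

```python
import json
import urllib.request

# Same endpoint as https://pypi.org/pypi/attrs/json, parameterized by name.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_pypi_info(name: str) -> dict:
    """Download the "info" object from PyPI's JSON API for a package.

    Raises urllib.error.HTTPError (404) for unpublished packages.
    """
    with urllib.request.urlopen(PYPI_JSON_URL.format(name=name)) as resp:
        return json.load(resp)["info"]


def declared_license(info: dict) -> str:
    """Pick the declared license from a PyPI "info" dict.

    Assumed precedence (hypothetical, not osv-scanner's logic):
    1. info["license_expression"] (the newer License-Expression field)
    2. info["license"] (the legacy free-text License field)
    3. "UNKNOWN" when neither is set
    """
    for key in ("license_expression", "license"):
        value = info.get(key)
        # PyPI may return null or an empty string for unset fields.
        if value and value.strip():
            return value.strip()
    return "UNKNOWN"
```

Calling declared_license(fetch_pypi_info("attrs")) would then return the expression attrs declares instead of falling through to an empty info.license.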

The package click provides its license as a classifier, which is the oldest method. According to PyPA (again), when the project's license is already registered as a valid classifier, that classifier must be used, and the info.license field should be used only for variations when needed.
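As a sketch of the classifier fallback described above, the function below pulls license names out of a package's Trove classifiers (the info.classifiers list in PyPI's JSON). The function name and the CLASSIFIER_TO_SPDX table are hypothetical and deliberately partial: only classifiers that name one specific license are mapped to SPDX identifiers, while ambiguous ones such as "BSD License" keep their plain name so a human can review them rather than being reported as UNKNOWN.

```python
# Hypothetical, deliberately partial mapping from Trove license classifiers
# to SPDX identifiers. Only unambiguous classifiers are listed; e.g.
# "BSD License" does not say which BSD variant, so it is left unmapped.
CLASSIFIER_TO_SPDX = {
    "License :: OSI Approved :: MIT License": "MIT",
    "License :: OSI Approved :: ISC License (ISCL)": "ISC",
    "License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)": "MPL-2.0",
}

LICENSE_PREFIX = "License :: "


def licenses_from_classifiers(classifiers: list[str]) -> list[str]:
    """Return license names found in a package's Trove classifiers.

    Mapped classifiers become SPDX IDs; unmapped license classifiers keep
    their last segment (e.g. "BSD License") for manual review.
    """
    found = []
    for classifier in classifiers:
        if not classifier.startswith(LICENSE_PREFIX):
            continue  # skip e.g. "Programming Language :: Python :: 3"
        spdx = CLASSIFIER_TO_SPDX.get(classifier)
        found.append(spdx if spdx else classifier.split(" :: ")[-1])
    return found
```

A scanner could consult this only when both license_expression and license are empty, which matches the order PyPA describes.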

Metadata

Assignees: No one assigned
Labels: autoclosed (Closed by automation), stale (The issue or PR is stale and pending automated closure)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests

    Issue actions