Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rules with "unqualified" license names are references, not notices #2759

Merged
merged 3 commits into from
Nov 26, 2021

Conversation

petergardfjall
Copy link
Contributor

@petergardfjall petergardfjall commented Nov 17, 2021

This PR changes a number of rules with unqualified license names (such as Apache 2.0 License) to be classified as (the weaker) reference rather than (the stronger) notice.

To be precise, this PR changes the is_license_notice property to an is_license_reference property for the aforementioned rules.

I'm sure there are more occurrences like these. I just picked a few that I found. Also, I'm not entirely sure if this is the correct classification (please let know if you disagree). I am also not entirely sure if this will have any real impact (besides a more strict adherence to the nomenclature as described here)

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in uniquely-named feature branch and has no merge conflicts 📁

@pombredanne
Copy link
Member

Whoa! This is an awesome find!
You think you could pull a quick list of the rule name and texts to paste in a comment here? That would help a lot with the review since the texts are not changed!

How did you find all theses quirks?

@@ -1,3 +1,3 @@
license_expression: agpl-3.0-plus
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

@petergardfjall petergardfjall Nov 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AGPL 3 or later

@@ -1,5 +1,5 @@
license_expression: apache-2.0
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

@petergardfjall petergardfjall Nov 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Apache License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.txt

@@ -1,5 +1,5 @@
license_expression: apache-2.0
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Apache License version 2](http://www.apache.org/licenses/

@@ -1,3 +1,3 @@
license_expression: apache-2.0
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Apache 2.0 license.

@@ -1,3 +1,3 @@
license_expression: apache-2.0
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apache 2.0 license.

@@ -1,3 +1,3 @@
license_expression: apache-2.0
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Apache license.

@@ -1,5 +1,5 @@
license_expression: apache-2.0
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Apache License version 2](https://www.apache.org/licenses/

@@ -1,3 +1,3 @@
license_expression: bitstream
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Bitstream Vera License

@@ -1,3 +1,3 @@
license_expression: bitstream
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bitstream Vera License

@@ -1,3 +1,3 @@
license_expression: bitstream
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bitstream License

@@ -1,4 +1,4 @@
license_expression: bsd-new
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BSD3 License

@@ -1,3 +1,3 @@
license_expression: lgpl-2.0-plus
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"LGPL version 2 or later",

@@ -1,3 +1,3 @@
license_expression: lgpl-2.1-plus
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lesser General Public License

@@ -1,3 +1,3 @@
license_expression: lgpl-2.1
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lesser General Public License Version 2.1

@@ -1,3 +1,3 @@
license_expression: lgpl-3.0
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GNU lesser General Public License Version 3.0

@@ -1,3 +1,3 @@
license_expression: lgpl-3.0
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lesser General Public License Version 3.0

@@ -1,5 +1,5 @@
license_expression: mit
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MIT License (MIT) - http://www.opensource.org/licenses/mit-license.php

@@ -1,5 +1,5 @@
license_expression: mit
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MIT License (MIT) - https://www.opensource.org/licenses/mit-license.php

@@ -1,5 +1,5 @@
license_expression: mit
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MIT License (MIT): http://opensource.org/licenses/MIT

@@ -1,5 +1,5 @@
license_expression: odbl-1.0
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open Database License http://www.opendatacommons.org/licenses/odbl/

@@ -1,3 +1,3 @@
license_expression: public-domain OR apache-2.0
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Public Domain or Apache 2.0 license

@@ -1,3 +1,3 @@
license_expression: redis-source-available-1.0
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redis Source Available License Agreement

@@ -1,5 +1,5 @@
license_expression: unlicense
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Public domain - http://unlicense.org

@petergardfjall
Copy link
Contributor Author

petergardfjall commented Nov 17, 2021

Whoa! This is an awesome find! You think you could pull a quick list of the rule name and texts to paste in a comment here? That would help a lot with the review since the texts are not changed!

I should've done that from the start, but I've commented on each change what the matching rule contains.

How did you find all theses quirks?

Basically I first discovered it by coincidence while looking at a result file, and decided to make a small script to look for more such "wrongly classified" rules.

I used this (admittedly hacky 🙈) script. One still has to go through the output since it still contains quite a few rules correctly labelled as notices.

#!/bin/bash


rules_dir="src/licensedcode/data/rules"
rule_files=$(ls -1 ${rules_dir}/*.RULE)
for rule_file in ${rule_files}; do
    rule_name="${rule_file%.RULE}"
    rule_yaml="${rule_name}.yml"

    if egrep -E 'is_license_notice: yes' ${rule_yaml} > /dev/null; then
        lines=$(wc -l ${rule_file} | cut -d' ' -f 1)
        # only consider single-line rules
        if [[ ${lines} -eq 0 ]]; then

            #
            # filter out rules that have "qualifying text" that highly indicate
            # an actual license notice.
            #
            qualfiying_words=("provided" "under" "unter" "terms" "licensed" "license is" "available under" "released" "distributed" "this" "these" "applicable" "distribution" "license:" "licensing:" "available" "covered by" "subject to" "uses" "project" "project uses" "use the" "applicable" "includes" "inherits" "governed by" "applies" "@license" "is copyrighted by")

            for qualifying_word in "${qualfiying_words[@]}"; do
                if cat ${rule_file} | grep -i "${qualifying_word}" > /dev/null; then
                    continue 2 # continue the outer loop
                fi
            done

            echo "*** ${rule_name}: ${rule_file} ${rule_yaml} ***"
            echo "single-line notice:";
            cat ${rule_file}
            echo
            echo
        fi
    fi
done

Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great... I have a few nitpickings for your consideration:

  1. Can you shorten the first line of your commit message, so it is under 50 chars? May be like "Requalify some notice license rules as references"

  2. Do you mind to squash the commits? This should also take care of the DCO bot complaints as 4152ea7 is missing a signoff

  3. there are two failing tests that used to have some scan details with license detection results using some of the rules that you are re-qualifying. You could regenerate the expected results with this:

$ sed -i -e "s|regen=False|regen=True|g" tests/summarycode/test_score.py tests/packagedcode/test_debian_copyright.py

$ pytest -vvs tests/summarycode/test_score.py::TestLicenseScore::test_license_clarity_score_spdx_licenses tests/packagedcode/test_debian_copyright.py::TestDebianCopyrightLicenseDetection::test_debian_parse_copyright_file_detailed_main_g_glib2_0_stable_copyright

$ sed -i -e "s|regen=True|regen=False|g" tests/summarycode/test_score.py tests/packagedcode/test_debian_copyright.py

@@ -1,5 +1,5 @@
license_expression: apache-2.0
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Apache License, Version 2.0 https://www.apache.org/licenses/LICENSE-2.0.txt

@@ -1,5 +1,5 @@
license_expression: bsd-2-clause-views
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BSD 3-Clause License : http://www.freebsd.org/copyright/freebsd-license.html

@@ -1,4 +1,4 @@
license_expression: secret-labs-2011
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PIL Software License

@@ -1,3 +1,3 @@
license_expression: ecosrh-1.1
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Red Hat eCos Public License v1.1

@petergardfjall petergardfjall force-pushed the change-notices-to-references branch 2 times, most recently from cb51363 to 6ea7036 Compare November 18, 2021 13:00
@@ -1,3 +1,3 @@
license_expression: gpl-2.0-plus
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GNU General Public License v2.0 or later

@@ -1,8 +1,9 @@
license_expression: gpl-2.0
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ftp://prep.ai.mit.edu/pub/gnu/GPL

@@ -1,3 +1,3 @@
license_expression: gpl-2.0
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GPL v2 License

@@ -1,3 +1,3 @@
license_expression: gpl-1.0-plus
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General Public License

@@ -1,3 +1,3 @@
license_expression: gpl-3.0-plus
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GNU GPL v3 or above

@@ -1,3 +1,3 @@
license_expression: gpl-3.0-plus
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GNU General Public License version 3, or any later version

@@ -1,3 +1,3 @@
license_expression: lgpl-2.0
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GNU Library General Public License Version 2.

@@ -1,4 +1,4 @@
license_expression: lgpl-2.1-plus
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GNU Lesser General Public License (LGPLv2.1+)

@@ -1,3 +1,3 @@
license_expression: lgpl-3.0-plus
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"LGPL version 3 or later",

@@ -1,5 +1,5 @@
license_expression: proprietary-license
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http://greensock.com/standard-license

@@ -1,5 +1,5 @@
license_expression: proprietary-license
is_license_notice: yes
is_license_reference: yes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http://www.greensock.com

@petergardfjall
Copy link
Contributor Author

I added some more corrections and tried to address your comments.

  1. Can you shorten the first line of your commit message, so it is under 50 chars? May be like "Requalify some notice license rules as references"

✔️

  1. Do you mind to squash the commits? This should also take care of the DCO bot complaints as 4152ea7 is missing a signoff

✔️

  1. there are two failing tests that used to have some scan details with license detection results using some of the rules that you are re-qualifying. You could regenerate the expected results with this:

A few local test case failures. Let's see if they show up here as well. Perhaps you can direct me @pombredanne ?

This changes a number of rules with unqualified license names (such as "Apache 2.0
License") to be classified as (the weaker) "reference" rather than (the
stronger) notice.

In practice, this means that the `is_license_notice` property is changed to
`is_license_reference` for a number of rules.

Signed-off-by: Peter Gardfjäll <peter.gardfjall.work@gmail.com>
Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great!
Almost merge ready: for the failing tests you would need to run the tests once with the regen=True flags locally, then restore the regen=False and commit only the updated JSON expected files.

@@ -26,7 +26,7 @@
def check_expected_parse_copyright_file(
test_loc,
expected_loc,
regen=False,
regen=True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to revert this to "regen=False", the regen True was only for local regeneration ... otherwise the tests are testing nothing. And you would need to commit the updated expected JSON test results too

@@ -31,7 +31,7 @@
test_env.test_data_dir = os.path.join(os.path.dirname(__file__), 'data')


def make_test_function(test_name, test_dir, expected_file, regen=False):
def make_test_function(test_name, test_dir, expected_file, regen=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: You need to revert this to "regen=False", the regen True is only for local regeneration and cannot be committed... otherwise the tests are testing nothing.
And you would need to commit the updated expected JSON test results too

@pombredanne
Copy link
Member

pombredanne commented Nov 19, 2021

So just to make sure you did not miss this, this would be what to do and it is important to do all these steps before a commit:

$ sed -i -e "s|regen=False|regen=True|g" tests/summarycode/test_score.py tests/packagedcode/test_debian_copyright.py

$ pytest -vvs tests/summarycode/test_score.py::TestLicenseScore::test_license_clarity_score_spdx_licenses tests/packagedcode/test_debian_copyright.py::TestDebianCopyrightLicenseDetection::test_debian_parse_copyright_file_detailed_main_g_glib2_0_stable_copyright

$ sed -i -e "s|regen=True|regen=False|g" tests/summarycode/test_score.py tests/packagedcode/test_debian_copyright.py

The commit would contains the updated, regenerated JSON or YAML fixtures for each of the two previously failing tests:

  • for test_license_clarity_score_spdx_licenses these files are:

    • tests: tests/summarycode/data/score/spdx_licenses
    • expected: tests/summarycode/data/score/spdx_licenses-expected.json <--- this will be regenerated with the procedure above and needs to be reviewed and committed
  • for test_debian_parse_copyright_file_detailed_main_g_glib2_0_stable_copyright these files are:

    • tests: tests/packagedcode/data/debian/copyright/debian-2019-11-15/main/g/glib2.0/stable_copyright
    • expected: tests/packagedcode/data/debian/copyright/debian-2019-11-15/main/g/glib2.0/stable_copyright-detailed.expected.yml <--- this will be regenerated with the procedure above and needs to be reviewed and committed

@petergardfjall
Copy link
Contributor Author

So just to make sure you did not miss this, this would be what to do and it is important to do all these steps before a commit:

$ sed -i -e "s|regen=False|regen=True|g" tests/summarycode/test_score.py tests/packagedcode/test_debian_copyright.py

$ pytest -vvs tests/summarycode/test_score.py::TestLicenseScore::test_license_clarity_score_spdx_licenses tests/packagedcode/test_debian_copyright.py::TestDebianCopyrightLicenseDetection::test_debian_parse_copyright_file_detailed_main_g_glib2_0_stable_copyright

$ sed -i -e "s|regen=True|regen=False|g" tests/summarycode/test_score.py tests/packagedcode/test_debian_copyright.py

The commit would contains the updated, regenerated JSON or YAML fixtures for each of the two previously failing tests:

  • for test_license_clarity_score_spdx_licenses these files are:

    • tests: tests/summarycode/data/score/spdx_licenses
    • expected: tests/summarycode/data/score/spdx_licenses-expected.json <--- this will be regenerated with the procedure above and needs to be reviewed and committed
  • for test_debian_parse_copyright_file_detailed_main_g_glib2_0_stable_copyright these files are:

    • tests: tests/packagedcode/data/debian/copyright/debian-2019-11-15/main/g/glib2.0/stable_copyright
    • expected: tests/packagedcode/data/debian/copyright/debian-2019-11-15/main/g/glib2.0/stable_copyright-detailed.expected.yml <--- this will be regenerated with the procedure above and needs to be reviewed and committed

I've tried to correct this but I have to admit that I don't fully understand the test structure nor what you are suggesting above. Some tests are still failing.

@pombredanne
Copy link
Member

pombredanne commented Nov 23, 2021

@petergardfjall fair enough... I reckon this is not clear!
Let me push a commit to your branch and list exactly the commands I ran here

@pombredanne
Copy link
Member

@petergardfjall in 72a6d9c

I ran this sequence after pulling your branch:

sed -i -e "s|regen=False|regen=True|g" tests/summarycode/test_score.py
pytest -vvs --test-suite=validate tests/summarycode/test_score.py::TestLicenseScore::test_license_clarity_score_spdx_licenses
sed -i -e "s|regen=True|regen=False|g" tests/summarycode/test_score.py

Then I doubled checked there were no leftover regens: grep -r tests/ -e "regen=True"

git status
On branch change-notices-to-references
Your branch is up to date with 'petergardfjall/change-notices-to-references'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   tests/summarycode/data/score/spdx_licenses-expected.json

then committed the updated tests/summarycode/data/score/spdx_licenses-expected.json

I guess what tripped you is the --test-suite test markers which are not super well documented (that's an understatement)

Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost ready to merge!
If could could just amend the commit message for 162b2ae with your DCO signoff we are done.
Thanks!

Signed-off-by: Peter Gardfjäll <peter.gardfjall.work@gmail.com>
Signed-off-by: Peter Gardfjäll <peter.gardfjall.work@gmail.com>
@petergardfjall
Copy link
Contributor Author

Almost ready to merge! If could could just amend the commit message for 162b2ae with your DCO signoff we are done. Thanks!

Thank you so much for the help! I've made the changes you suggested, and updated the DCO. The tests pass. 🎉

Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@pombredanne pombredanne merged commit 1fdf40f into nexB:develop Nov 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants