Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add license category-change detection and scoring #86

Merged
merged 3 commits into from
Mar 16, 2018

Conversation

johnmhoran
Copy link
Member

No description provided.

  * Modify DeltaCode.license_diff() to detect and identify the addition
    of certain categories of licenses.
  * Refactor scoring structure and methods to discard use of score for
    delta identification.
  * Modify sorting of delta results: by score descending, then
    alphabetically ascending.
  * Modify utils.deltas() to adjust to refactored scoring structure.
  * Add File.licenses_is_empty() and File.copyrights_is_empty() methods.
  * Add Delta.is_modified() and Delta.is_unmodified() methods.
  * Fix failing tests, add tests.

Signed-off-by: John M. Horan <johnmhoran@gmail.com>
  * Clarify Delta sorting.
  * Simplify set creation.
  * Simplify implementation of category-change detection.
  * Rename methods.
  * Fix failing tests, add new test codebases and tests.

Signed-off-by: John M. Horan <johnmhoran@gmail.com>
delta.update(20, 'license info added')
# no license ==> 'Copyleft Limited'or higher
for item in sorted(unique_categories):
if item in new_categories:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic should be reversed for line 221-222:

for category in new_categories:
    if category in unique_categories:

return

if delta.new_file.licenses == [] and len(delta.old_file.licenses) > 0:
if not delta.new_file.has_licenses() and delta.old_file.has_licenses():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should put this block (226-228) near the top, as it is the least logically complex code block.

@@ -211,6 +232,13 @@ def license_diff(self):

if new_keys != old_keys:
delta.update(10, 'license change')
for item in sorted(new_categories - old_categories):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we need to sort here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaJuRG I sorted here (and earlier, line 221, mentioned in your comment above) to display the license category-change factors alphabetically -- i.e., in a consistent order -- thinking that would make it easier for a user to compare and analyze factors.

Removing the sort in both locations works fine -- I just need to modify a few failing tests in which I assert [list A] == [list B], which I can replace with assert set([list A]) == set([list B]).

Copy link
Contributor

@steven-esser steven-esser Mar 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets remove it from both; if anything this will incur a performance cost.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using set() works here since there are no duplicates, but perhaps it's better to use sorted() instead. Tested, works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does for item in new_categories - old_categories: not work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it works -- I was referring to the modifications needed to fix the failing related tests -- there, we can use either set() or sorted() to compare the factors list comprehension with the expected Delta.factors list and not worry about the order of the lists' elements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, in the test cases it is up to you; sorting test results is always fine.


unique_commercial_categories = set([
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not have a separate distinction/process for commercial licenses.

meaning forget about my mentioning "anything ==> Commercial" in tickets or otherwise.

if len(old_categories & unique_categories) == 0 and item in unique_categories:
delta.update(20, item.lower() + ' added')
# anything ==> 'Proprietary Free' or 'Commercial'
elif item in unique_commercial_categories:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not have a separate distinction/process for commercial licenses.

meaning forget about my mentioning "anything ==> Commercial" in tickets or otherwise.

  * Stop sorting in category-change process, and modify related tests to
    compare lists that may have dissimilar ordering.
  * Remove rule: anything ==> 'Proprietary Free' or 'Commercial'.

Signed-off-by: John M. Horan <johnmhoran@gmail.com>
@steven-esser steven-esser merged commit b375834 into develop Mar 16, 2018
@steven-esser steven-esser deleted the 71-account-for-license-category branch March 23, 2018 04:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants