Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid license key change when dependency license text hasn't changed #21

Merged
merged 7 commits into from Apr 5, 2018

Conversation

jonabc
Copy link
Contributor

@jonabc jonabc commented Apr 3, 2018

Resolves #2.

This PR affects the cache command, attempting to reuse the license metadata value when possible to cut down on noise.

licensed cache now compares the license available in a dependency's cached file to the content found from dependency.detect_license!. When no changes are found, licensed cache will reuse the cached license value. License#license_text_match? uses Licensee::ContentHelper to compare normalized license content instead of the raw data.

A side effect of this PR is a slight behavior change that should be entirely backwards compatible - a licensed text section will always be written to the cached file. We used to strip out nil license text and legal notices which means that licensed couldn't know if the first (or only) text in a cached file is the license text or a legal notice.

Ensuring that Dependency#detect_license! always includes section for license text in self.text means that we can reliably use that section to compare cached license text against the license text found from Dependency#detect_license!.

This makes the assumption that it is 馃憤 to only compare license text to determine that the license key can be reused. If Dependency#notices should also be compared, then https://github.com/github/licensed/tree/less-license-chattiness should be considered instead of this PR.

/cc @lildude FYI as github/linguist was asking for this. Please let me know if you'd like me to ping someone else for linguists attention 馃檱

removes `compact` which would remove nil `license_text`
also sorts the notice paths for output consistency
content is the license text without legal notices
- handles text without separator 馃憤
- won't throw errors with cached files from previous versions 馃憤
- previous versions won't throw errors with cached files from these changes 馃憤
- keeps all other detected metadata
- keeps detected license text and legal notices
@mlinksva
Copy link
Contributor

mlinksva commented Apr 3, 2018

This makes the assumption that it is 馃憤 to only compare license text to determine that the license key can be reused. If Dependency#notices should also be compared, then https://github.com/github/licensed/tree/less-license-chattiness should be considered instead of this PR.

Changed notices don't have any behavior impact when license text can be detected, so I don't think they should for this case either, so long as any changed notices actually get updated in the cache.

It might be useful for licensed to report on changed notices or optionally require review of same, but that's a different feature.

Copy link

@benbalter benbalter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

馃憤 Comparing content_normalized should make the comparison more robust across versions as normalization changes/improves.

# based on whether the normalized license content matches
def license_text_match?(other)
return false unless other.is_a?(License)
self.content_normalized == other.content_normalized

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

馃憤

@lildude
Copy link
Member

lildude commented Apr 4, 2018

/cc @lildude FYI as github/linguist was asking for this. Please let me know if you'd like me to ping someone else for linguists attention 馃檱

I'm it so this is fiiiiine.

@jonabc
Copy link
Contributor Author

jonabc commented Apr 4, 2018

@mlinksva @benbalter do you know if there are any legal issues with using placeholder text when licensed can't find/determine license text? Or any opinion on the topic?

I'd imagine it's fine since we're not creating, modifying or misrepresenting any data from the source, but I'm hardly an expert on the topic 馃槂

It seems like it would be more human-friendly to be explicit that licensed/licensee was unable to find or determine the license text content.

Without notices, in the current state of the PR

---
<metadata>
---

When notices are found, in the current state of the PR

---
<metadata>
---
------------------------------------------------
<notice text>

Proposing something like

---
<metadata>
---
License text is not available

and

---
<metadata>
---
License text is not available
--------------------------------------------------
<notice text>

@mlinksva
Copy link
Contributor

mlinksva commented Apr 4, 2018

I'd rather not output "License text is not available" -- that may not be the case -- licensed just may not have found it. If it's meant to be an informative message, maybe something like "Licensed did not find license text; see <link?> for information on where licensed looks."

@jonabc
Copy link
Contributor Author

jonabc commented Apr 4, 2018

@mlinksva gotcha. I used "available" because Dependency#license_text is nil when multiple license files are found, so "not found" didn't seem fully appropriate either.

Linking to documentation is helpful but makes the message dependent on an outside resource that could change and mess with content matching.

I'm not sure this is worth pursuing TBH - missing license text is an error case for licensed status and users shouldn't allow cached data with empty text.

It was just a thought I had and since the PR deals with what to cache when license text isn't found, it seemed like the right time to bring up.

@mlinksva
Copy link
Contributor

mlinksva commented Apr 4, 2018

Dependency#license_text is nil when multiple license files are found

Hmm, it's a different issue, but I suspect it would be helpful to cache all of the license texts in this case rather than none of them.

@jonabc
Copy link
Contributor Author

jonabc commented Apr 4, 2018

Hmm, it's a different issue, but I suspect it would be helpful to cache all of the license texts in this case rather than none of them.

Agreed, I looked into that but wasn't sure how it would affect, or if it would cause a weird mismatch scenario with, Dependency#license_key a.k.a. the license metadata value. I'll open up an issue for that.

@jonabc
Copy link
Contributor Author

jonabc commented Apr 5, 2018

This makes the assumption that it is 馃憤 to only compare license text to determine that the license key can be reused. If Dependency#notices should also be compared, then https://github.com/github/licensed/tree/less-license-chattiness should be considered instead of this PR.

Changed notices don't have any behavior impact when license text can be detected, so I don't think they should for this case either, so long as any changed notices actually get updated in the cache.

@mlinksva sorry I only noticed this now. What is the behavior impact from notices when license text can't be detected?

@mlinksva
Copy link
Contributor

mlinksva commented Apr 5, 2018

@jonabc I was saying the behavior of the code in this PR's branch is fine. Notices should be cached (and updated if changed) for any license state, whether same or changed, detected or not, but notice changes shouldn't invalidate reuse of license key.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants