Include Copyright owner/date in HTML license report; show multiple licenses #114

Merged
merged 9 commits into jaredsburrows:master
Nov 16, 2019

DonnKey
Contributor

@DonnKey DonnKey commented Oct 24, 2019

This change addresses issue #109 (née #50) about including Copyright text in the report.
It also handles the rare (but real -- see "Checker Qual") case where there are multiple license kinds by including all the licenses. It also makes some minor changes to improve readability of both the raw HTML and the final result.

The raw information in the POM is often incomplete in this regard, so the report makes a "best guess" when that happens. Those guesses are easily discovered and edited in the raw HTML if needed.

Incidentally, it also adds another spelling of the MIT license URL (with a .php suffix). I didn't check whether the other "standard" licenses have that form.

@jaredsburrows
Owner

Can you resolve the conflicts?

@DonnKey
Contributor Author

DonnKey commented Oct 26, 2019

I resolved the conflicts. The edit in GitHub dropped a }, which I've fixed locally but not pushed yet.

However... I also realized I hadn't run the regressions, and expected that they would fail because of my changes. I tried to fix that, and ran up against test harness problems. (Several.)

Details follow below.

Before getting into the test harness issues, there are a couple of cosmetic issues that I'm stuck on. I don't like to blame the tools, but in this case I'm stumped and it really does look like the tools to me.  (I've been doing this stuff for a very long time, so that statement is not lightly made, but I also freely admit I could be wrong.)

The resulting HTML has some problems with whitespace/indentations around the <a> entries (and a <br>, but that might be a cascade); it looks to me as if the code to adjust indentation is simply losing track of where it is.  I've spent several days causing the code to generate "wrong" html trying to determine if there's some alternate pattern I should be following, and the <a> simply sticks at the left margin. I'm open to suggestions, but I lean toward living with it.  (The html is correct and as intended except for the whitespace.)

Here's a snippet. Note where the <a> entries align.

    <ul>
      <li><a href="#-1748871456">Android SDK Fabric (1.4.8)</a>
        <dl>
          <dt>Copyright &copy; 20xx The original author or authors</dt>
        </dl>
      </li>
<a name="-1748871456"></a>
      <pre>Fabric Software and Services Agreement
<a href="https://fabric.io/terms">https://fabric.io/terms</a></pre>
<br>
      <hr>

The problems I've run across in the test harness are not yours, I don't think, but I'm not sure how to hack around them. (Only the first of the errors is reported at any time, so you have to unmask them one at a time.) Let me know how you'd like to proceed. (My preference is to check the "right answer" in "as intended", and try to fix the test harness over time - but that's messy for you!)

  1. Fatal error in assertHtml because the resulting HTML contains &copy; : it's being reported as an "undefined entity" even though it is official HTML. (Yeah, using (c) would work, but the copyright symbol is available and a better choice.)
  2. Fatal error in assertHtml because there's no matching </hr> for the <hr>. Except that <hr> doesn't require a matching </hr>! (Some HTML dialects apparently do, but at least w3schools says <hr> has no end tag for straight HTML.) The <hr> really makes the result more readable, and I haven't a workaround in mind.
  3. Fatal error like the above for <br>. Here there's no ambiguity: <br> has never had an end tag, and it makes no sense to have one. The <br> is really needed, and I don't see an alternative.
  4. I'm using Windows, and it looks as if there's a problem (another one) with respect to Windows path names. I'm seeing the following:
result.output.find("Wrote HTML report to .*${reportFolder}/${taskName}.html.")
|      |      |                             |               |
|      |      |                             |               licenseDebugReport
|      |      |                             C:\Users\donnt\AppData\Local\Temp\junit2089520722103704024/build/reports/licenses
|      |      java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 26
|      |      Wrote HTML report to .*C:\Users\donnt\AppData\Local\Temp\junit2089520722103704024/build/reports/licenses/licenseDebugReport.html.
|      |                                ^
|       

Note the caret under the \U - it looks as if it's trying to treat the Windows path separator as an escape. I'm inclined to fix everything else however we decide to do it, and rely on the automated testing to deal with that (give up testing on Windows). (Yeah, I know, "Windows"...).
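
The exception above comes from splicing the raw Windows path into the regex, so `\U` is parsed as an escape sequence. A minimal sketch of one possible fix, assuming the path is quoted with `Pattern.quote` before being inserted into the pattern. It is written in plain Java rather than the Groovy of the test, and the class name, method names, and sample path are hypothetical, not taken from the project:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper; not part of the plugin or its tests.
final class ReportPathPattern {

    // Builds the expected-output pattern with the path treated as a literal,
    // so backslashes in a Windows path are not read as regex escapes.
    static Pattern reportLinePattern(String reportFolder, String taskName) {
        String literalPath = reportFolder + "/" + taskName + ".html";
        return Pattern.compile(
            "Wrote HTML report to .*" + Pattern.quote(literalPath) + "\\.");
    }

    // Reproduces the failing form from the test: the path is spliced in unquoted.
    static boolean compilesUnquoted(String reportFolder, String taskName) {
        try {
            Pattern.compile(
                "Wrote HTML report to .*" + reportFolder + "/" + taskName + ".html.");
            return true;
        } catch (PatternSyntaxException e) {
            return false; // e.g. "\U" is an illegal/unsupported escape sequence
        }
    }

    public static void main(String[] args) {
        // Assumed sample path with mixed separators, similar in shape to the one above.
        String folder = "C:\\Users\\example\\Temp\\junit123/build/reports/licenses";
        String output = "Wrote HTML report to file:///" + folder + "/licenseDebugReport.html.";

        System.out.println("unquoted compiles: "
            + compilesUnquoted(folder, "licenseDebugReport"));
        System.out.println("quoted matches: "
            + reportLinePattern(folder, "licenseDebugReport").matcher(output).find());
    }
}
```

In the Groovy test the equivalent would be wrapping the interpolated values in `Pattern.quote(...)`; whether the PR ended up doing this is not stated in the thread.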

Lastly, for the record if you haven't seen it, I'm seeing a gripe about Groovy internals. Presumably someone is working on it, but for the record:

  • An illegal reflective access operation has occurred
  • Illegal reflective access by org.codehaus.groovy.vmplugin.v7.Java7$1 (file:/C:/Users/donnt/.gradle/wrapper/dists/gradle-5.4.1-all/3221gyojl5jsh0helicew7rwx/gradle-5.4.1/lib/groovy-all-1.0-2.5.4.jar) to constructor java.lang.invoke.MethodHandles$Lookup(java.lang.Class,int)
  • Please consider reporting this to the maintainers of org.codehaus.groovy.vmplugin.v7.Java7$1
  • Use --illegal-access=warn to enable warnings of further illegal reflective access operations
  • All illegal access operations will be denied in a future release

@DonnKey
Contributor Author

DonnKey commented Oct 27, 2019

Overnight it occurred to me to look at exactly what assertHtml did (I had assumed it was a built-in). The problem is that it's comparing XML when the input is HTML, and it matters in this case. I didn't see any HTML comparison tools similar to DiffBuilder, and I suspect that <hr> and <br> can't be dealt with at all because it's assuming well-formed (properly closed) XML. Either a preliminary translation phase to convert the HTML into legal XML (e.g. <hr> -> <hr/>) or a completely different comparison strategy will be needed. As final owner (and arbiter) I think at least the outline of the solution should be your call.

@jaredsburrows
Owner

@DonnKey If you have a better library or way to compare the HTML, please let me know.

@DonnKey
Contributor Author

DonnKey commented Oct 27, 2019

I don't know of anything that would work directly, and didn't find anything on the web that looked at all promising. Pre-transforming the strings to make them into something the XML comparison will accept seems possible (subject to actually trying it). Of course, straight string comparison would work but be less resilient to unintended changes (such as the indentation of <a> elements). If you want me to give pre-transformation a try, I can, if it would be something you'd accept if it worked.
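
A rough sketch of the pre-transformation idea discussed here, in plain Java. The three replacements mirror the ones that appear later in this PR's review; the well-formedness check with the JDK's `DocumentBuilder` is only a stand-in for the XML comparison used by assertHtml, and the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; not the project's actual test code.
final class HtmlPreTransform {

    // Rewrites the HTML constructs that an XML parser rejects.
    // String.replace is literal, so no regex escaping is involved.
    static String toXmlFriendly(String html) {
        return html
            .replace("<br>", "<br/>")   // void element -> self-closed
            .replace("<hr>", "<hr/>")   // void element -> self-closed
            .replace("&copy;", "(c)");  // entity not predefined in XML
    }

    // Returns true when the text parses as well-formed XML.
    static boolean isWellFormedXml(String text) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler keeps the parser from printing to stderr;
            // fatal errors are still thrown as SAXException.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String raw = "<ul><li>Copyright &copy; 20xx<br><hr></li></ul>";
        System.out.println("raw parses: " + isWellFormedXml(raw));
        System.out.println("transformed parses: " + isWellFormedXml(toXmlFriendly(raw)));
    }
}
```

Applying the same transformation to both the expected and the actual string keeps the comparison fair, at the cost of no longer distinguishing `<br>` from `<br/>` in the output.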

@DonnKey
Contributor Author

DonnKey commented Oct 29, 2019

The remaining failure is in comparing some Unicode characters that appear to be changed or escaped differently on Windows and (presumably) Linux. It works fine for me locally. I'll do something to "neutralize" the difference.

@jaredsburrows
Owner

jaredsburrows commented Oct 30, 2019 via email

@jaredsburrows
Owner

If we are updating the output, should we also update the README.md?

@DonnKey
Contributor Author

DonnKey commented Nov 1, 2019

README is mostly done. I left the indentation problems with the HTML as is. Let me know (either way, please) if you'd prefer that to be corrected in the README for readability reasons.

The CHANGELOG doesn't seem to need anything from me.

@jaredsburrows
Owner

@DonnKey It would be nice to keep the README.md up to date with the code. This way people know what they are getting when they use the plugin.

// this exact case, so update as needed.
text = text.replaceAll('<br>', '<br/>')
text = text.replaceAll('<hr>', '<hr/>')
text = text.replaceAll('&copy;', '(c)')
Owner

Sorry, one last thing before we merge, why replace &copy; with (c)?

Contributor Author

@DonnKey DonnKey Nov 10, 2019

The XML comparison recognizes that &copy; is an "entity" (ampersand-semicolon), but doesn't have the copyright entity built-in, yielding a complaint about "undefined entity" and failing the test. The replacement doesn't really matter, but that seemed obvious. (The message the XML comparison generates is obvious only after you've figured out what it means!)
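
For illustration only, and not what this PR does: XML predefines just five named entities (`&lt;`, `&gt;`, `&amp;`, `&apos;`, `&quot;`), but it accepts numeric character references without any declaration. A sketch, assuming the test-side substitution used `&#169;` instead of `(c)`; the class and method names are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper; an alternative to the (c) replacement, not the PR's code.
final class CopyrightEntity {

    // Swaps the HTML named entity for the equivalent numeric character reference.
    static String useNumericReference(String html) {
        return html.replace("&copy;", "&#169;");
    }

    // Parses the fragment as XML and returns the root element's text content.
    static String parsedText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("not well-formed XML: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String xml = useNumericReference(
            "<dt>Copyright &copy; 20xx The original author or authors</dt>");
        System.out.println(xml);
        System.out.println(parsedText(xml));
    }
}
```

This would keep the copyright symbol visible to the comparison, whereas the `(c)` replacement simply sidesteps it; either is adequate for a test that only needs the two sides to parse and match.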

<title>Open source licenses</title>
</head>
<body>
<h3>Notice for packages:</h3>
<ul>
<li>
<a href="#314129783">appcompat-v7</a>
<li><a href="#1934118923">appcompat-v7 (26.1.0)</a>
Owner

Sorry, one last thing before we merge, can we remove the parentheses around the version?

Contributor Author

@DonnKey DonnKey Nov 10, 2019

I did that explicitly so that if there wasn't a version (theoretically possible, but, granted, unlikely) there'd be an obvious placeholder that could be edited. I don't mind making the change, but there was a reason for doing it this way. If you still want a change, would you prefer no placeholder or some other placeholder?

@jaredsburrows jaredsburrows merged commit c5d71d7 into jaredsburrows:master Nov 16, 2019
@jaredsburrows
Owner

Thanks!


2 participants