Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: lint output per line #956

Open
wants to merge 21 commits into
base: main
Choose a base branch
from
Open

Conversation

nicorikken
Copy link
Member

@nicorikken nicorikken commented Apr 10, 2024

Adds an '-line' or '-l' option to the 'lint' command. Prints a line for
each error, starting with the file to which the error belongs.

This output can be a starting point for some parsers, in particular ones
that implement something similar to Vim errorformat parsing.

Related to #925, needed for #578

Additional work needed:

  • Needs tests
  • Error messages might have to be improved
  • Not all errors are found, as some issues aren't in the
    FileReports. This requires additional investigation.

Signed-off-by: Nico Rikken nico@nicorikken.eu

@nicorikken
Copy link
Member Author

nicorikken commented Apr 20, 2024

Current version works based on ProjectReport to ensure also license errors are included. A structural improvement might be to create structural reports for license errors, even as FileReports.
And for file reports ideally the line number is kept to attribute errors to lines where possible.

@nicorikken
Copy link
Member Author

nicorikken commented Apr 22, 2024

Two remaining suggestions for the future:

  • Include source code line numbers of the finding. Requires keeping track of this information when detecting
  • How best to deal with edge-case filenames that include colons and spaces?

Current output:

/tmp/pytest-of-nico/pytest-44/fake_repository14/invalid-license.py: bad license: invalid
LICENSES/invalid-license-text: bad license: invalid-license-text
LICENSES/Nokia-Qt-exception-1.1.txt: deprecated license
LICENSES/MIT: license without file extension
/tmp/pytest-of-nico/pytest-44/fake_repository14/invalid-license.py: missing license: invalid
LICENSES/MIT: unused license
LICENSES/Nokia-Qt-exception-1.1.txt: unused license
LICENSES/invalid-license-text: unused license
/tmp/pytest-of-nico/pytest-44/fake_repository14/restricted.py: read error
/tmp/pytest-of-nico/pytest-44/fake_repository14/file with spaces.py: without license
/tmp/pytest-of-nico/pytest-44/fake_repository14/file:with:colons.py: without license
/tmp/pytest-of-nico/pytest-44/fake_repository14/no-license.py: without license
/tmp/pytest-of-nico/pytest-44/fake_repository14/file with spaces.py: without copyright
/tmp/pytest-of-nico/pytest-44/fake_repository14/file:with:colons.py: without copyright
/tmp/pytest-of-nico/pytest-44/fake_repository14/invalid-license.py: without copyright

Note: I have now committed a change to remove colons from the error messages to aid in parsing.

@nicorikken nicorikken marked this pull request as ready for review April 22, 2024 05:27
@nicorikken
Copy link
Member Author

nicorikken commented Apr 22, 2024

I'm now trying to validate with the errorformat library: errorformat "%f: %m"
It seems it cannot succesfully parse the output of files with spaces and colons:

/tmp/pytest-of-nico/pytest-44/fake_repository14/invalid-license.py|| bad license invalid
LICENSES/invalid-license-text|| bad license: invalid-license-text
LICENSES/Nokia-Qt-exception-1.1.txt|| deprecated license
LICENSES/MIT|| license without file extension
/tmp/pytest-of-nico/pytest-44/fake_repository14/invalid-license.py|| missing license invalid
LICENSES/MIT|| unused license
LICENSES/Nokia-Qt-exception-1.1.txt|| unused license
LICENSES/invalid-license-text|| unused license
/tmp/pytest-of-nico/pytest-44/fake_repository14/restricted.py|| read error
|| /tmp/pytest-of-nico/pytest-44/fake_repository14/file with spaces.py: without license
/tmp/pytest-of-nico/pytest-44/fake_repository14/file:with:colons.py|| without license
/tmp/pytest-of-nico/pytest-44/fake_repository14/no-license.py|| without license
|| /tmp/pytest-of-nico/pytest-44/fake_repository14/file with spaces.py: without copyright
/tmp/pytest-of-nico/pytest-44/fake_repository14/file:with:colons.py|| without copyright
/tmp/pytest-of-nico/pytest-44/fake_repository14/invalid-license.py|| without copyright

So probably filenames will have to be escaped.

I tried rendering output for SARIF, but it seems that it requires a built-in format and that all compatible built-in formats require line numbers and column numbers.

Adds an '-line' or '-l' option to the 'lint' command. Prints a line for
each error, starting with the file to which the error belongs.

This output can be a starting point for some parsers, in particular ones
that implement something similar to Vim errorformat parsing.

Related to #925

Additional work needed:

- [ ] Needs tests
- [ ] Error messages might have to be improved
- [ ] Not all errors are found, as some issues aren't in the
  FileReports. This requires additional investigation.

Signed-off-by: Nico Rikken <nico@nicorikken.eu>
Update to the format_lines variant, now based on the ProjectReport to include
errors that relate to the licenses.

Add a property to the Project that resembles the LICENSES/ directory to prevent
duplication of resolving that.

No additional ordering is done as it would only slow things down and it can
easily be done by the user.

Signed-off-by: Nico Rikken <nico@nicorikken.eu>
ENure a relative path, rather than an absolute path.

Signed-off-by: Nico Rikken <nico@nicorikken.eu>
To prevent code duplication.

Signed-off-by: Nico Rikken <nico@nicorikken.eu>
Signed-off-by: Nico Rikken <nico@nicorikken.eu>
A complete integration test.
Creates a repository that includes all the error types.
Output is validated according to a regex.
Validation written in a way that is isn't dependent on ordering as ordering
isn't guaranteed.

Signed-off-by: Nico Rikken <nico@nicorikken.eu>
Add notes in the line format output to reflect current thoughts.

Signed-off-by: Nico Rikken <nico@nicorikken.eu>
Use report.licenses to resolve license paths.

Signed-off-by: Nico Rikken <nico@nicorikken.eu>
Signed-off-by: Nico Rikken <nico@nicorikken.eu>
Signed-off-by: Nico Rikken <nico@nicorikken.eu>
To prevent parsing errors, remove colons from messages.

Signed-off-by: Nico Rikken <nico@nicorikken.eu>
Used syntax was too new.

Signed-off-by: Nico Rikken <nico@nicorikken.eu>
As the approach in lint module no longer use the license_directory, the change
can be reverted.

Signed-off-by: Nico Rikken <nico@nicorikken.eu>
Remove test for filename with colons, as this is unrealistic for files on
Windows and atypical for files on Unix systems even though it is allowed.

Signed-off-by: Nico Rikken <nico@nicorikken.eu>
@nicorikken
Copy link
Member Author

I originally had a test with a filename with colons, but this was causing errors on Windows. I removed it.

Somehow line numbers don't add up on Windows. Debug it.

Signed-off-by: Nico Rikken <nico@nicorikken.eu>
As the permission error doesn't work on Windows, move it to posix only, just as
the more general permission test.

Signed-off-by: Nico Rikken <nico@nicorikken.eu>
@nicorikken
Copy link
Member Author

The issue of using files with spaces seems to be an issue with the errorformat library: reviewdog/errorformat#97 I don't think reuse-tool needs to escape filename output as there is already a clear separator in this output.
This leaves the edge-case of filenames with colons, but this is also rare in a development environment that will have to cater to Windows which doesn't allow colons in filenames.

Copy link
Member

@carmenbianca carmenbianca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this feature @nicorikken ! Two very small nitpicks which I will patch up.

src/reuse/lint.py Outdated Show resolved Hide resolved
src/reuse/lint.py Outdated Show resolved Hide resolved
Signed-off-by: Carmen Bianca BAKKER <carmenbianca@fsfe.org>
Signed-off-by: Carmen Bianca BAKKER <carmenbianca@fsfe.org>
The general structure of this function is that all per-license lines go
first, and all per-file lines go after. 'Missing licenses' is,
deceptively, in the per-file category.

Signed-off-by: Carmen Bianca BAKKER <carmenbianca@fsfe.org>
Signed-off-by: Carmen Bianca BAKKER <carmenbianca@fsfe.org>
@carmenbianca
Copy link
Member

I also spent a while looking at escaping options. Thank you @nicorikken for your analysis. I think there are only two characters that really concern us here:

  • Newlines in filenames (yes...)
  • Colons in filenames

I gave it a spin with Pylint to see how other tools handle it. Here's the result:

$ pylint src/reuse
************* Module reuse.hello
world
src/reuse/hello
world.py:1:0: C0103: Module name "hello
world" doesn't conform to snake_case naming style (invalid-name)
************* Module reuse.hello:world
src/reuse/hello:world.py:1:0: C0103: Module name "hello:world" doesn't conform to snake_case naming style (invalid-name)

The answer is: don't.

I think we can be lazy about this and not bother.

Signed-off-by: Carmen Bianca BAKKER <carmenbianca@fsfe.org>
@carmenbianca carmenbianca self-requested a review April 27, 2024 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants