BSD-2-clause isFsfLibre needs fixing. #77

Closed
uniqx opened this issue Oct 2, 2019 · 25 comments · May be fixed by wking/fsf-api#24

Comments

@uniqx

uniqx commented Oct 2, 2019

BSD-2-clause is not marked as isFsfLibre: json/licenses.json#L471

According to FSF this license is free: https://www.gnu.org/licenses/license-list.html#FreeBSD
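For anyone wanting to reproduce the check, here is a minimal sketch. It assumes the `licenses.json` layout of a top-level `licenses` array whose entries carry `licenseId` and an optional `isFsfLibre` key. The inline sample data and the helper name `fsf_libre_flags` are made up for illustration and are not taken from the repo.

```python
import json

# Tiny stand-in for json/licenses.json; real entries have many more fields,
# and the flag values here are illustrative only.
SAMPLE = """
{
  "licenses": [
    {"licenseId": "BSD-2-Clause", "isOsiApproved": true},
    {"licenseId": "BSD-2-Clause-FreeBSD", "isOsiApproved": false, "isFsfLibre": true},
    {"licenseId": "MIT", "isOsiApproved": true, "isFsfLibre": true}
  ]
}
"""


def fsf_libre_flags(data, prefix=""):
    """Map each license ID starting with `prefix` to its isFsfLibre flag.

    A missing key is treated as False (an assumption about the data).
    """
    return {
        entry["licenseId"]: bool(entry.get("isFsfLibre", False))
        for entry in data["licenses"]
        if entry["licenseId"].startswith(prefix)
    }


flags = fsf_libre_flags(json.loads(SAMPLE), prefix="BSD-2")
for license_id, libre in sorted(flags.items()):
    print(f"{license_id}: isFsfLibre={libre}")
```

Pointing `json.loads` at the real file instead of `SAMPLE` should list every BSD variant together with its flag.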

goneall referenced this issue in goneall/fsf-api Oct 2, 2019
…g guidelines, BSD-2-Clause-FreeBSD and BSD-2-Clause are equiv. licenses. Resolves https://github.com/spdx/license-list-data/issues/52

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>
@goneall
Member

goneall commented Oct 2, 2019

I just added a PR to resolve this issue: wking/fsf-api#24

Once this PR is merged and the license-list-data is regenerated, the JSON data should reflect the BSD-2-clause as isFsfLibre=true

@lassik

lassik commented Oct 4, 2019

Here are all the SPDX license IDs that start with BSD:

  • BSD-1-Clause
  • BSD-2-Clause
  • BSD-2-Clause-FreeBSD
  • BSD-2-Clause-NetBSD
  • BSD-2-Clause-Patent
  • BSD-3-Clause
  • BSD-3-Clause-Attribution
  • BSD-3-Clause-Clear
  • BSD-3-Clause-LBNL
  • BSD-3-Clause-No-Nuclear-License
  • BSD-3-Clause-No-Nuclear-License-2014
  • BSD-3-Clause-No-Nuclear-Warranty
  • BSD-3-Clause-Open-MPI
  • BSD-4-Clause
  • BSD-4-Clause-UC
  • BSD-Protection
  • BSD-Source-Code

Of those, at least FreeBSD and NetBSD are probably FSF libre.

Is 4-clause the one with the advertising clause?

Not much idea about the others. No-Nuclear-Warranty 😂

@goneall
Member

goneall commented Oct 4, 2019

@lassik Thanks for the additional analysis - saved us one round trip with the tools updates. I just added NetBSD to the pull request. FreeBSD is already marked as FSF libre.

For the other licenses, we typically only add FSF Libre to licenses which match the text of licenses referenced on the FSF website. For the matching we are pretty strict about using the License Matching Guidelines. I use the SPDX license check tool to find if licenses are equivalent.

@goneall
Member

goneall commented Sep 9, 2020

Transferring issue to the LicenseListPublisher to track the related issue against wking/fsf-api.

@goneall goneall transferred this issue from spdx/license-list-data Sep 9, 2020
@CAM-Gerlach

Hey @goneall, what's the current status of this issue? BSD-2-Clause-FreeBSD was deprecated and replaced with BSD-2-Clause-Views in spdx/license-list-XML#1078, and the FreeBSD developers confirmed that the extra text was never actually part of the license text. The license is listed in the FSF license list, which furthermore explicitly defines it as the BSD 3-clause with one clause removed and states that it is also known as the BSD 2-clause, making no mention of the extra text.

The Python packaging ecosystem is planning to switch to SPDX identifiers for licensing, and there are proposals to eventually restrict PyPI to only allow projects with OSI and/or FSF approved licenses. If PyPI goes this route and uses the SPDX list for validation, and either FSF approval alone or both OSI and FSF approval is required, then without an explicit exception this would prevent such packages, which use the second-most popular license on PyPI, from being uploaded. Could you clarify the path forward here? Thanks!
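To illustrate the concern, here is a rough sketch of how such a validation rule could behave. The policy names and the `is_allowed` helper are hypothetical; no actual PyPI rule or API is implied. The flags for BSD-2-Clause mirror what this issue reports (OSI approved, but not marked FSF libre).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseFlags:
    osi_approved: bool
    fsf_libre: bool


# Hypothetical policy names; no actual PyPI rule is implied.
POLICIES = {
    "osi": lambda f: f.osi_approved,
    "fsf": lambda f: f.fsf_libre,
    "osi-or-fsf": lambda f: f.osi_approved or f.fsf_libre,
    "osi-and-fsf": lambda f: f.osi_approved and f.fsf_libre,
}


def is_allowed(flags, policy):
    """Return True if a license with these flags passes the named policy."""
    return POLICIES[policy](flags)


# BSD-2-Clause as currently recorded: OSI approved, FSF flag missing.
bsd_2_clause = LicenseFlags(osi_approved=True, fsf_libre=False)
for policy in POLICIES:
    print(policy, is_allowed(bsd_2_clause, policy))
```

Under these assumptions the license passes only the policies that accept OSI approval on its own, which is why the missing flag matters.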

@goneall
Member

goneall commented Mar 10, 2021

@CAM-Gerlach We're pulling the data from a different repo maintained by @wking. I submitted a PR a long time ago and it has not been merged. I am assuming @wking is no longer maintaining the repo.

What would be ideal is if someone in the FSF would maintain this data. If the FSF doesn't maintain a machine readable format and @wking doesn't maintain the library, I'll see about moving this into the SPDX repo for future maintenance.

One reason I hesitate to maintain this myself is that I tend to focus more on the Java code - the FSF tool is written in Python. @CAM-Gerlach if you would like to clone/maintain this to help move it forward I can create a clone in the SPDX repo.

@wking Let me know if you would like me to move this over or if you plan on maintaining. If I don't hear anything in a week, I'll assume it is not being maintained.

@jlovejoy @swinslow If we don't have anyone volunteering to maintain the Python scripts, I'm wondering if we should just move this into the license list and maintain it the same way we do the OSI flags. I'm not sure maintaining a tool to scrape the FSF website is worth it for the amount of changes we see on that website. Let me know your thoughts.

@CAM-Gerlach

Yep, thanks; I did notice that, which is actually why I replied here since I figured I had a better chance of a response.

As you say, ideally the FSF, as the canonical source, would host their own data in some sort of unambiguous, proper machine-readable form, or @wking would be able to maintain their work. However, if neither of those two preferred alternatives ends up working out, I'd be willing to step up and help maintain this: Python is my primary language, the code is reasonably well organized, relatively modern and not overly complex, the changes requested are fairly minimal, and, IANAL, but I have some experience working on licensing-related issues for open source projects. Just lmk. Thanks!

@CAM-Gerlach

(BTW, your very own @pombredanne is the author of the aforementioned proposal to add a SPDX license field, and eventually PyPI validation, to the Python package metadata standards.)

@pombredanne
Member

@goneall re:

I'm not sure maintaining a tool to scrape the FSF website is worth it for the amount of changes we see on that website. Let me know your thoughts.

@CAM-Gerlach hey!

IMHO this is small enough that a tool may be nice to have but could be overkill. The pace of change at the FSF web site is modest, to say the least.

@goneall
Member

goneall commented Mar 11, 2021

@CAM-Gerlach Thanks for offering to help! If we don't hear back from @wking within a week and @jlovejoy / @swinslow don't volunteer the legal team to maintain the data in the license list XML repo (see below for details), I think I'll take you up on the offer.

More details on the possible solutions:

I agree with @pombredanne that the tool may be overkill if it is only used for the License List metadata. It actually has more data than we use in the License List, so it could be useful for other projects. However, I'm not aware of any such projects, and the fact that no one else has submitted any issues or maintained it implies we may be the only user. Originally, we were hoping the FSF would take the utility or just publish a machine readable file - but as the U.S. midwestern saying goes, "you can lead a horse to water, but you can't make it drink" ;)

To resolve this particular issue, I can think of four approaches (ordered from least effort to most effort):

  1. @wking accepts the PR and re-publishes the data
  2. We fork the fsf-api repo to the SPDX organization repo or another repo with @CAM-Gerlach as a maintainer to merge in the changes and republish. I'll need to do a very small change to the LicenseListPublisher to pick up the new source
  3. We take the current output of the fsf-api and maintain the file directly. In looking at the file, it is a bit complex but could be edited directly. I would suggest we put the file in the LicenseListXML repo along with the source for the other license metadata. If the legal team led by @jlovejoy and @swinslow agree to this, it may be a simpler solution compared to maintaining the tool.
  4. We add a field to the LicenseListXML schema for fsfLibre and update all of the license XML files with this attribute. Since there are a lot of license updates, we would probably want some script to go through and make the initial changes. After that, any updates to fsfLibre would be done in the License XML documents the same way we maintain the OSI flags. This would probably be the most comfortable for the SPDX legal team, but the most work to set up.

@CAM-Gerlach

CAM-Gerlach commented Mar 11, 2021

Hey @pombredanne ! Awesome work on PEP 639; I've been looking forward to it and hopefully I can help do my small part to help make it a reality.

I agree the tool may be a bit overengineered just for the SPDX license list, but since it offers the lowest friction to use, keeps things modular, provides a strict superset of the data needed by SPDX, and may be useful for other applications as well as a template for other license data source APIs, it would seem a pity to abandon it given it doesn't need that much work to be kept up to date, unless the FSF makes a major breaking change to their site and the script isn't fixable, in which case we'd be no worse off than now.

As for options 1 and 2, the script actually needs a few changes to work properly, in particular because the license page is now HTML5 instead of XHTML. I tried parsing it with lxml.html instead of lxml.etree, but while the script ran without error, it didn't actually find the licenses when I did a naive substitution of the former for the latter; I'll probably need to play around with the parameters and methods a bit to get that working, and can also try the standard library html parser instead.

However, since the actual structure of the page was basically unchanged, all I needed to do to get things to work perfectly was to replace the HTML named entities in the source that XML doesn't recognize with their standard UTF-8 equivalents. A bit of a dirty hack, for sure, but it was the simplest fix to get things working, and will suffice at least until they actually significantly change the syntax to be HTML-only, and it will hopefully be fully ported to lxml.html or html.parser soon.
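For illustration, the general idea of that workaround can be sketched as below. This is not the code on the branch: it uses the standard library's `xml.etree.ElementTree` rather than lxml, the helper name `replace_html_entities` is made up, and the sample markup is invented rather than copied from the FSF page.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities XML itself predefines; these must stay escaped.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_html_entities(markup):
    """Swap HTML-only named entities for the characters they stand for."""

    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML entities and unknown names alone
        return chr(name2codepoint[name])

    return ENTITY_RE.sub(substitute, markup)


# Invented snippet; a strict XML parser rejects &nbsp; and &mdash; as undefined.
page = '<ul><li id="FreeBSD">FreeBSD&nbsp;license &mdash; free &amp; simple</li></ul>'
root = ET.fromstring(replace_html_entities(page))
print(root.find("li").text)
```

Leaving the five predefined XML entities untouched matters, since expanding `&amp;` or `&lt;` before parsing would corrupt the markup.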

Also, the parameters it uses are incompatible with Python's built-in elementtree, so I removed that broken fallback; however, I may be able to use the built-in html parser instead per the above, restoring support for running without dependencies, though it's not vital.

You can check it out at the fix-xml-parsing branch on my CAM-Gerlach/fsf-api fork; if @wking comes back, I can submit it as a PR upstream. See the gh-pages-new branch there to preview the built result and view the changes over the previous.

Aside from fixing the existing known issues with the data, first priority for me, if I were to help maintain it, would be to slap on an actual CI (i.e. GitHub Actions) to test PRs and re-build the gh_pages branch when changes are merged to master; given how easy it is to do these days, it's a no-brainer. This would make maintenance drastically simpler and avoid things breaking or getting out of sync, aside from major changes to the upstream site.

We take the current output of the fsf-api and maintain the file directly.

Maintaining it manually isn't as easy as I thought, since it's more than just one file but rather a whole bunch, with info repeated in multiple places, and we wouldn't want it to get out of sync. However, it would be possible; the Wayback Machine diffs view of the page would be very helpful for that. There haven't been a ton of changes, but a number of mostly nonfree licenses have been added, as well as many links and some names updated, which would be quite tedious to do manually.

@CAM-Gerlach

Also, as a sidenote, there has been some discussion of developing a mapping from PyPI Trove classifiers to SPDX identifiers as needed for that work, where unambiguously possible (which in fact was the topic of the initial discussion on pypa/trove-classifiers#17 that eventually sparked @pombredanne to propose the PEP in the first place). While much of the code would be unnecessary for that application, the basic structure and schema from the similar fsf-api could be a useful basis for building a simple, canonical PyPA-hosted package or API (or part of an existing one, e.g. trove, packaging, etc.) that served a similar purpose as a bridge between our ecosystems.
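As a very rough sketch of what "where unambiguously possible" could look like, consider the table below. The entries are illustrative choices on my part, not a vetted or agreed mapping, and the names `TROVE_TO_SPDX` and `trove_to_spdx` are hypothetical.

```python
from typing import Optional

# Illustrative only: classifiers that plausibly name one license map to an
# SPDX ID, while classifiers covering several licenses map to None.
TROVE_TO_SPDX = {
    "License :: OSI Approved :: MIT License": "MIT",
    "License :: OSI Approved :: ISC License (ISCL)": "ISC",
    "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)":
        "GPL-3.0-or-later",
    # Ambiguous: could be BSD-2-Clause, BSD-3-Clause, or another variant.
    "License :: OSI Approved :: BSD License": None,
    # Ambiguous: does not say which Apache license version applies.
    "License :: OSI Approved :: Apache Software License": None,
}


def trove_to_spdx(classifier: str) -> Optional[str]:
    """Return the SPDX ID for a classifier, or None if unknown or ambiguous."""
    return TROVE_TO_SPDX.get(classifier)


for classifier in TROVE_TO_SPDX:
    print(f"{classifier!r} -> {trove_to_spdx(classifier)}")
```

A real mapping would probably want to distinguish "ambiguous" from "unknown" so that tools can ask the author for an explicit SPDX expression.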

@goneall
Member

goneall commented Mar 12, 2021

Also, as a sidenote, there has been some discussion of developing a mapping from PyPI Trove classifiers to SPDX identifiers as needed for that work

One area I am extremely interested in is making it easier to convert package data between different ecosystems (e.g. Python, Maven and NPM). Having a bias towards SPDX, I tend to think of the package managers for these ecosystems producing SPDX formats natively, however, it would be great if we had a well defined translation to SPDX based on the standards of the community. I'm not terribly familiar with Trove and the Python packaging environment, but from a first glance it looks quite doable. Making an API available online would be a great step forward.

We happen to have an opportunity with the Google Summer of Code program to get some student help if we had a project in mind and mentor bandwidth.

We did have a student work on generating SPDX as part of PIP: https://github.com/spdx/spdx-py-build-tool

@goneall
Member

goneall commented Sep 5, 2021

@CAM-Gerlach I'm cleaning up some of the issues for the license list publisher and realized I haven't resolved this issue yet.

I just forked the wking/fsf-api repo into the SPDX repo and duplicated the PR that should resolve this: spdx/fsf-api#1

If you could take a look and let me know if there are any other changes we should make to the fsf-api code before running it and re-generating the JSON file, I'll try to include the fix in the next release of the license list.

@CAM-Gerlach

Hey @goneall, sorry for losing track of this myself. I'll take a look now and open issues/PRs for any significant issues.

@CAM-Gerlach

CAM-Gerlach commented Sep 6, 2021

There are also a few minor and seemingly trivial-to-resolve issues with the license data that are still open at the original upstream. These can likely be cleaned up by anyone familiar enough with SPDX policies to sign off on them (like yourself), and there is one simple PR that can be ported over. Given none of them really involve Python, which seems to be more the area you were requesting my input in, I'll focus the changes I have on any maintenance issues with the code and docs. Also, since issues are not enabled for the repo, I guess I'll make them as a PR.

Also, if this is now considered the official SPDX upstream (probably a good idea to host it under the org from now on, given it prevents the current maintainer abandonment issue that started this whole mess and minimizes bus factor), you could consider de-forking it so that users viewing it know it is now the main home of the code, get contribution credit, avoid issues if something happens to the original, and a few other UX things.

@CAM-Gerlach

It seems I fixed several breaking issues with the XML parsing a while back and then completely forgot about it. I updated that, fixed a number of other issues and limitations, and submitted it as PR spdx/fsf-api#2. Aside from what was previously discussed, the further immediate changes I have to suggest are more maintainability/usability/refactoring related rather than directly related to the output, so I suggest something like the following plan:

  1. Enable issues for that repo, so we can move this discussion over there and break it into more appropriately scoped issues
  2. Merge my fix PR so that it actually runs
  3. Merge your PR as well as those fixing the other easy to resolve output-relevant issues/PRs over on the original upstream issue
  4. Update the copyright dates, GH pages links and docs accordingly
  5. Release the rebuilt version of the API output
  6. Work on long-term maintainability and UX improvements

For the record, here's what I would suggest on that (and can potentially help with):

  • Make it an installable package for easier installation and use
  • Separate the data from the code, e.g. into JSON files, for easier maintenance
  • Use pathlib for path munging for easier and more robust handling than string paths
  • Add basic pre-commit checks
  • Add at least very basic smoke tests and run them in CIs
  • Build and deploy to GitHub pages via CI
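To make the pathlib and smoke-test suggestions a bit more concrete, here is a rough sketch. The function names, the output layout, and the sample record are all hypothetical and do not reflect the current fsf-api code.

```python
import json
import tempfile
from pathlib import Path


def write_license_files(licenses, output_dir):
    """Write one JSON file per license plus an index, using pathlib throughout."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for license_id, record in sorted(licenses.items()):
        path = output_dir / f"{license_id}.json"
        path.write_text(
            json.dumps(record, indent=2, sort_keys=True) + "\n", encoding="utf-8"
        )
        paths.append(path)
    index = output_dir / "licenses.json"
    index.write_text(json.dumps(sorted(licenses)) + "\n", encoding="utf-8")
    return paths


def smoke_test():
    """Very basic check that the writer runs and round-trips its input."""
    sample = {"FreeBSD": {"name": "FreeBSD License", "tags": ["libre"]}}
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "api"
        paths = write_license_files(sample, out)
        assert [p.name for p in paths] == ["FreeBSD.json"]
        assert json.loads(paths[0].read_text(encoding="utf-8")) == sample["FreeBSD"]
        assert json.loads((out / "licenses.json").read_text(encoding="utf-8")) == ["FreeBSD"]
    return True


if __name__ == "__main__":
    print("smoke test passed:", smoke_test())
```

A check like this is cheap enough to run on every PR in CI, which would catch the kind of breakage that went unnoticed when the upstream page changed format.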

@goneall
Member

goneall commented Sep 7, 2021

Thanks @CAM-Gerlach - I enabled issues and reviewed the PR. I'll work on some of the remaining items and update this PR later today or tomorrow.

@goneall
Member

goneall commented Sep 7, 2021

@CAM-Gerlach I (mostly) completed 1-5 plus a few other things - you can look through the closed PRs in the repo for more info.

I copied over issues which I did not resolve and I thought could be resolved. If you see any remaining issues not copied, feel free to add to the issues list.

Would you be OK being one of the maintainer/contributors? We could use someone with more Python skills than I have. I'll send you an invite to the repo.

@CAM-Gerlach

Thanks @goneall ! Looking over it all and reviewing your PR now.

Sure, happy to. It's a small and self-contained project and while I can't promise the bandwidth to make sweeping changes, I can certainly at least help out with reviewing PRs and keeping things running. Biggest short-term priority, if you agree, will be automating most of that by adding basic functional tests, running them in CI and using GitHub Actions for deployment.

@goneall
Member

goneall commented Sep 8, 2021

Biggest short-term priority, if you agree, will be automating most of that by adding basic functional tests, running them in CI and using GitHub Actions for deployment.

Completely agree - I was thinking the same thing

@CAM-Gerlach

I will do that right now, in fact, to obviate the need for the convoluted and maintenance-intensive API update process to be documented in the Contributing guide, as well as keep the API content in sync with the code and serve as a basic test of PRs. I have existing GitHub Actions workflows to do all that, so it's mostly a copy/paste job.

@goneall
Member

goneall commented Sep 10, 2021

Resolved with PR #119

@goneall goneall closed this as completed Sep 10, 2021
@goneall
Member

goneall commented Dec 5, 2021

@CAM-Gerlach - Just FYI - I mentioned to Philippe the request for support on the PEP/peps#2164 on the SPDX general call on Thursday and he mentioned that he was aware of the request but has been quite busy

@CAM-Gerlach

Thanks @goneall ! He responded there a few days ago, gave me the go-ahead to add myself as a co-author and said he'd give it a review. I'm sure he's very busy and I'm happy to continue taking care of it for him, though I'd love his feedback if and when he gets the chance. Cheers!
