Problems creating a package from a SourceForge download URL #26

Open
DennisClark opened this issue Jan 2, 2024 · 13 comments

@DennisClark
Member

Perhaps this is a user "pilot" error, but when I create a Package in DejaCode from a SourceForge download URL, I get strange results. A recent Add Package using
https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download
resulted in a Package with a filename of download rather than scribus-1.6.0.tar.gz.
It also resulted in the rather verbose PURL value of
pkg:generic/download?download_url=https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download

I scanned the package, using the same download URL, directly in SCIO v32.0.8, and it returned a PURL value of
pkg:autotools/scribus-1.6.0
in the key_files_packages section

So it appears that the rather eccentric download conventions of SourceForge are messing things up a bit.

  • Can we improve DejaCode to interpret the results of such a scan differently?
  • Does such an improvement rather belong in SCIO?
  • Or should we prompt the DejaCode user with instructions on how to provide a different, better, less eccentric download URL when processing a SourceForge package?
DennisClark added the help wanted, question, and integration labels Jan 2, 2024
@pombredanne
Member

The problem stems from the fact that https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download is not the actual direct download URL: it is followed by several URL redirects that end up at a mirror.

The final destination is something like the following, where the first segment of the host name changes from mirror to mirror:
https://kumisystems.dl.sourceforge.net/project/scribus/scribus/1.6.0/scribus-1.6.0.tar.gz

The stable final URL would be https://master.dl.sourceforge.net/project/scribus/scribus/1.6.0/scribus-1.6.0.tar.gz
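
As a rough illustration of the relationship between the browsable URL and the stable one, here is a minimal Python sketch. The helper name `to_master_url` is hypothetical, and the mapping rule is inferred only from the single scribus example in this comment, so it is an assumption rather than documented SourceForge behavior.

```python
from urllib.parse import urlsplit


def to_master_url(browse_url: str) -> str:
    """Map a 'projects/<project>/files/<path>/download' URL to its master.dl form.

    Hypothetical helper: the rule is inferred from the scribus example only.
    """
    parts = urlsplit(browse_url)
    segments = [s for s in parts.path.split("/") if s]
    # Expected shape: projects/<project>/files/<one or more path segments>/download
    if (
        parts.netloc != "sourceforge.net"
        or len(segments) < 5
        or segments[0] != "projects"
        or segments[2] != "files"
        or segments[-1] != "download"
    ):
        raise ValueError(f"Not a recognized SourceForge download URL: {browse_url}")
    project = segments[1]
    file_path = "/".join(segments[3:-1])
    return f"https://master.dl.sourceforge.net/project/{project}/{file_path}"


print(to_master_url(
    "https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download"
))
```

For the scribus URL this yields the master.dl.sourceforge.net URL quoted above. Whether every SourceForge project follows the same layout is not established here.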

None of these are practically visible and accessible. Therefore we should IMHO do these:

  • Convert SourceForge download URL to PURL.
    Update the code to properly translate a SourceForge URL to a PURL, either here or in the Python packageurl library, or both places.
  • Consider updating "legacy" SourceForge URLs to a canonical URL.
    This should be the one that is visible when browsing, ignoring redirections: https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download
  • Update MineCode SourceForge miners to handle and store download URLs correctly

@DennisClark
Member Author

thanks @pombredanne your proposed solution looks good to me!

@tdruez
Member

tdruez commented Jan 4, 2024

Note that we have support for the https://*.sourceforge.net/project/scribus/scribus/1.6.0/scribus-1.6.0.tar.gz URLs in the packageurl library, returning pkg:sourceforge/scribus/scribus@1.6.0

We simply have to add support for this URL syntax: https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download
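
To make the idea concrete, the following is a minimal sketch of how this URL syntax could be mapped to a PURL with a regular expression. It is not the packageurl library's implementation: the pattern `SF_DOWNLOAD_RE` and the function `sourceforge_download_to_purl` are hypothetical names, and the target PURL shape is assumed to mirror the `pkg:sourceforge/scribus/scribus@1.6.0` value quoted above.

```python
import re

# Hypothetical pattern for:
#   https://sourceforge.net/projects/<project>/files/<name>/<version>/<filename>/download
SF_DOWNLOAD_RE = re.compile(
    r"^https?://sourceforge\.net/projects/(?P<project>[^/]+)/files/"
    r"(?P<name>[^/]+)/(?P<version>[^/]+)/(?P<filename>[^/]+)/download/?$"
)


def sourceforge_download_to_purl(url):
    """Return a PURL string for the URL shape above, or None if it does not match."""
    match = SF_DOWNLOAD_RE.match(url)
    if not match:
        return None
    # The filename group is captured but not used in the PURL itself.
    return "pkg:sourceforge/{project}/{name}@{version}".format(**match.groupdict())


print(sourceforge_download_to_purl(
    "https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download"
))
```

A pattern this strict only recognizes URLs with exactly three path segments between `files/` and `/download`. Shorter or longer layouts return None and would need their own handling.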

tdruez added a commit that referenced this issue Jan 4, 2024
Signed-off-by: tdruez <tdruez@nexb.com>
@tdruez
Member

tdruez commented Jan 4, 2024

@DennisClark I've added support for those types of URLs in the purl library, see package-url/packageurl-python#139
Also, as @pombredanne suggested, we are now using the final redirect URL to extract the proper filename.
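
A minimal sketch of the "use the final redirect URL for the filename" idea, using only the Python standard library. The function names are hypothetical and this is not the DejaCode code; it only shows why the last path segment of the original URL is `download` while the last segment of the resolved URL is the real filename.

```python
import posixpath
from urllib.parse import urlsplit
from urllib.request import urlopen


def resolve_final_url(url, timeout=10):
    """Follow HTTP redirects and return the URL that finally served the response."""
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring any query string."""
    return posixpath.basename(urlsplit(url).path)


# No network access needed to see the difference between the two URL forms:
print(filename_from_url(
    "https://sourceforge.net/projects/scribus/files/scribus/1.6.0/scribus-1.6.0.tar.gz/download"
))
print(filename_from_url(
    "https://master.dl.sourceforge.net/project/scribus/scribus/1.6.0/scribus-1.6.0.tar.gz"
))
```

In practice `filename_from_url(resolve_final_url(url))` would combine the two steps. The network call is kept out of the example so it can run offline.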

With those changes, we now generate a proper PURL and filename:
[Image: Screenshot 2024-01-04 at 14 08 37]

@DennisClark
Member Author

DennisClark commented Jan 4, 2024

Hi @tdruez, I'm getting mixed results in Staging. My original scribus case went just fine, but I then tried another package from SourceForge, turbovnc-3.1.tar.gz, on staging with a download URL of

https://sourceforge.net/projects/turbovnc/files/3.1/turbovnc-3.1.tar.gz/download

and it all went fine, including a scan, except that it did not assign any PURL values. See attached.

[Image: turbovnc-3.1.tar.gz test on staging]

tdruez added a commit that referenced this issue Jan 5, 2024
Signed-off-by: tdruez <tdruez@nexb.com>
@DennisClark
Member Author

@tdruez I tested the 3 you identified in your comment, plus the scribus package, and they all look rather good, with one small issue.

When I simply click on the download link for the ventoy package, it downloads a file named Ventoy 1.0.96 release source code.tar.gz, which I think is correct and is what they call it on the web site. In DejaCode, however, the filename is shown as Ventoy%201.0.96%20release%20source%20code.tar.gz, with the spaces percent-encoded as %20. If we simply don't allow spaces in the DejaCode filename field, I guess that's ok, but it does look kind of strange. See attached.

[Image: ventoy package in staging]

@DennisClark
Member Author

@tdruez one other observation, which is not directly related to this issue but is somewhat perplexing. DejaCode found the existing scans that I created yesterday for the 4 packages (good), and apparently they did not get re-scanned (fine, I think), but it did not perform any of the auto-updates to fields on the package (not so good), such as the license-expression, even though 3 of the 4 scans have a declared license. See attached.

[Image: Screenshot 2024-01-05 at 09 25 54]

@DennisClark
Member Author

In the example above, the geoserver does not have a detected license anyway, so that's not a big deal, but the other 3 all have declared licenses.

@DennisClark
Member Author

@tdruez Sorry I did not catch this one yesterday, but the results from creating a package with

https://sourceforge.net/projects/spacesniffer/files/spacesniffer_1_3_0_2.zip/download

do not look so great. See attached.

[Image: spacesniffer in staging]

@DennisClark
Member Author

It appears that there is an unknown number of (arbitrary) variations in the SourceForge download URLs, which suggests we do not have a satisfactory way to determine whether we have covered them all. I'm sure you would like to finish this one, but it is possibly an unmanageable task. I'm ok if we go with "good enough" once we have fixed the ones we have actually discovered.

tdruez added a commit that referenced this issue Jan 8, 2024
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Jan 8, 2024
Signed-off-by: tdruez <tdruez@nexb.com>
tdruez added a commit that referenced this issue Jan 9, 2024
Improve the support for SourgeForge download URLs #26
@tdruez
Member

tdruez commented Jan 9, 2024

@DennisClark changes available for review:

  • Ventoy%201.0.96%20release%20source%20code.tar.gz is now properly unquoted
  • Added support for https://sourceforge.net/projects/spacesniffer/files/spacesniffer_1_3_0_2.zip/download
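
For reference, the unquoting mentioned in the first item can be sketched with the standard library's `urllib.parse.unquote`. The wrapper name `display_filename` is hypothetical and not taken from the DejaCode code base.

```python
from urllib.parse import unquote


def display_filename(raw_filename):
    """Decode percent-escapes (such as %20 for a space) in a URL-derived filename."""
    return unquote(raw_filename)


print(display_filename("Ventoy%201.0.96%20release%20source%20code.tar.gz"))
```

Names without percent-escapes pass through unchanged, so applying this to every URL-derived filename should be harmless for the cases seen in this thread.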

> one other observation, which is not directly related to this issue, but something that is somewhat perplexing. DejaCode found the existing scans that I created yesterday for the 4 packages (good) and apparently they did not get re-scanned (fine I think) but it did not perform any of the auto-updates to fields on the package (not so good), such as the license-expression, even though 3 of the 4 scans have a declared license. See attached.

Entered as #30

@DennisClark
Member Author

@tdruez The spacesniffer package creation works great now. The Ventoy package creation issue is fixed, although it was very slow to complete the Add Package step, with the cursor spinning for more than 2 minutes; I tested it with a different Ventoy version and had the same slow response. So it all appears to be working fine, but you might want to check on the performance problem.
