Archive-Any 0.0940 includes copyrighted code without a license in t/ #1

Closed
real-dam opened this Issue Oct 21, 2013 · 10 comments



@real-dam

While updating the Debian package of Archive-Any to the latest available release, I noticed the addition of several archive files under t/, which very much seem like somebody else's creation, but which lack any license for distributing.

These include: LoadHtml.5_0.tar.gz, naughty.tar.gz and your-0.01.tar.gz.

My only current option is to remove these from the Debian sources, and build the package without them. My assumption is that this would degrade the test suite, which is not good.

Could you replace the non-licensed contents with something else?

@oalders
Owner

I'll have a look at those files. To be honest, I just got maint to apply a one-line patch, so I have no idea what's in those files, but I can go ahead and fix this too. The current release already has some issues, since it turns out I only got maint on some of the modules in the distribution. I'm trying to get that fixed. In the meantime, I've scheduled this upload for deletion while I get the perms sorted.

@book

I have looked at the archive files in t/, and tracked where they came from.

Here's a test script for those:

#!/bin/sh
BACKPAN=http://backpan.perl.org

# check files with the same name
for URL in \
    $BACKPAN/authors/id/K/KA/KANE/Acme-POE-Knee-1.10.zip \
    $BACKPAN/authors/id/T/TU/TURNERJW/LoadHtml.5_0.tar.gz \
    $BACKPAN/authors/id/M/MS/MSCHWERN/your-0.01.tar.gz \
; do
    FILE=t/$(basename "$URL")
    echo "# Checking source for $FILE"
    sha1sum "$FILE"
    wget -O- -q "$URL" | sha1sum | sed -e "s|-|$URL|"
    echo
done

# check "doctored" archives
FILE=t/naughty.tar.gz
URL=$BACKPAN/authors/id/K/KJ/KJALB/File-Spec-0.6.tar.gz
echo "# Checking source for $FILE"
tar xzf "$FILE" -O | sha1sum | sed -e "s|-|$FILE|"
wget -O- -q "$URL" | tar xzf - -O | sha1sum | sed -e "s|-|$URL|"
echo

And its output when run:

# Checking source for t/Acme-POE-Knee-1.10.zip
a3bfd4aed31421e577d7594d42802d2d6a590774  t/Acme-POE-Knee-1.10.zip
a3bfd4aed31421e577d7594d42802d2d6a590774  http://backpan.perl.org/authors/id/K/KA/KANE/Acme-POE-Knee-1.10.zip

# Checking source for t/LoadHtml.5_0.tar.gz
20b841079312157ac32ec448090574a40c312a25  t/LoadHtml.5_0.tar.gz
20b841079312157ac32ec448090574a40c312a25  http://backpan.perl.org/authors/id/T/TU/TURNERJW/LoadHtml.5_0.tar.gz

# Checking source for t/your-0.01.tar.gz
7880ec041404564c0dd3d56d14d1590f6553b43a  t/your-0.01.tar.gz
7880ec041404564c0dd3d56d14d1590f6553b43a  http://backpan.perl.org/authors/id/M/MS/MSCHWERN/your-0.01.tar.gz

# Checking source for t/naughty.tar.gz
tar: Removing leading `/' from member names
1644ae74b50ecc7b52f5f18605cf8da155351bbc  t/naughty.tar.gz
1644ae74b50ecc7b52f5f18605cf8da155351bbc  http://backpan.perl.org/authors/id/K/KJ/KJALB/File-Spec-0.6.tar.gz

Since these are distributions released on CPAN, I would assume they are proper open source packages, and that distribution is therefore allowed.

Would a t/README or t/LICENSE file containing the list of archives and their CPAN download URLs be sufficient to conform to the Debian rules?

@oalders
Owner

@real-dam Would that work for you? ^^^

@oalders
Owner

@book++ for doing the heavy lifting.

@real-dam

> Would a t/README or t/LICENSE file containing the list of archives and their CPAN download URLs be sufficient to conform to the Debian rules?

Not really. Having a distribution downloaded from CPAN doesn't mean much in terms of licensing. One can assume the intent of the author, yes, but when it comes to legal stuff, assuming doesn't work. As an example, we still review copyright/licensing on Archive-Any, despite it being downloaded from CPAN :)

What could work is downloading the archives during tests, skipping if they can't be downloaded. The latter is needed since Debian packages are built in a restricted environment without network access.

Thanks for caring.

@book

Downloading stuff during tests assumes more than can be expected (e.g. local CPAN mirror + restricted Internet access). Let's not do that.

I see two options here:

  1. Manually verify the license for those 4 archives (based on their content or later versions of the software), and explicitly document it (with links to the later version if necessary) in an additional file (t/LICENSE?).

    For example, your-0.01.tar.gz does not include any license information, but the later version your-1.00 does. Adding proper licensing information was most probably one of the reasons for that release. (And the license is "This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself", yay!)

    Or does the license information have to apply to the included archives (in which case your-0.01.tar.gz does not have any) and/or be part of the archives themselves?

  2. Given that the tests do not assume any specific content from the archives (other than the file listings), another option at this stage is to produce new archives with the same file listings but dummy content. A README file could explain that the test archives do not have any significant content.

Which option would work better? (Option 1 requires less work if all that's needed is to find out whether the archive or a later version has an adequate license, I think. My assumption is that Debian requires us to do option 2.)
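To make option 2 concrete, here is a minimal sketch of how a gzipped tar archive could be rebuilt with the same file listing but empty members. It is only an illustration: the function name `anonymize_tgz` is hypothetical, it is not the support program that was later committed for this issue, and it assumes relative member names (an archive with absolute paths such as t/naughty.tar.gz would need separate handling), no newlines in names, and a tar that supports `--no-recursion` and `-T` (GNU tar does). Symlinks and other special members would come out as empty regular files.

```shell
#!/bin/sh
# Hypothetical helper, not the program committed for this issue.
# Rebuild a .tar.gz so that it keeps the member names but every file is empty.
# Assumes relative member names without embedded newlines.
anonymize_tgz() {
    src=$1
    dst=$2
    work=$(mktemp -d) || return 1

    # Record the member names in archive order.
    tar tzf "$src" > "$work/list" || { rm -rf "$work"; return 1; }

    # Recreate each member as an empty directory or an empty file.
    mkdir "$work/tree"
    while IFS= read -r name; do
        case $name in
            */) mkdir -p "$work/tree/$name" ;;
            *)  mkdir -p "$work/tree/$(dirname "$name")"
                : > "$work/tree/$name" ;;
        esac
    done < "$work/list"

    # Pack exactly the listed names, in the same order,
    # without descending into directories a second time.
    tar czf "$dst" -C "$work/tree" --no-recursion -T "$work/list"
    status=$?
    rm -rf "$work"
    return $status
}
```

It could be called as `anonymize_tgz t/your-0.01.tar.gz /tmp/your-0.01.tar.gz`, after which `tar tzf` on both files should print the same listing.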

@book added a commit to book/archive-any that referenced this issue Jan 28, 2015:
a support program to anonymize the archives, per oalders/archive-any#1 (d132e1a)

@pwr22

Hmm, I've packaged for Debian before, and it should have been fine to check the licenses on CPAN and then add them to the LICENSE, assuming they are all free.

@oalders closed this in #2 Jan 28, 2015

@book

@pwr22, I went looking for the archives on CPAN, and the issue was that most had no explicit license information, neither in the docs nor in a LICENSE file.

That is why I went with replacing the content of the archive members, and learnt a few things about tar and Archive::Tar in the process.

@real-dam

@book, sorry for not replying earlier. Going with option 2 (purging the actual content, leaving the file lists) would indeed address all my concerns. No content - no copyrights, no license needed.
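For anyone who wants to confirm that a replacement archive really has "no content", a rough check might look like the sketch below. The function name `check_purged` is hypothetical and not part of Archive-Any; it assumes gzipped tar archives and only compares member names and total extracted bytes, so it says nothing about metadata such as owners or timestamps.

```shell
#!/bin/sh
# Hypothetical check, not part of the Archive-Any distribution.
# Succeeds only if both archives list the same member names in the same order
# and the second archive carries no file content at all.
check_purged() {
    original=$1
    purged=$2
    [ "$(tar tzf "$original")" = "$(tar tzf "$purged")" ] || return 1
    # Extract every member to stdout; empty files contribute no bytes.
    [ $(tar xzf "$purged" -O | wc -c) -eq 0 ]
}
```

A call such as `check_purged original.tar.gz purged.tar.gz && echo ok` prints `ok` only when the listing is preserved and nothing but empty members remains.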

@book

@real-dam, and this is what was done in PR #2.
