Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CIFAR10 dataset #2

Merged
merged 4 commits into from
Feb 16, 2015
Merged

Add CIFAR10 dataset #2

merged 4 commits into from
Feb 16, 2015

Conversation

vdumoulin
Copy link
Contributor

Based off MNIST, uses data that can currently be obtained through Pylearn2's download_cifar10.sh script.

@bartvm
Copy link
Member

bartvm commented Feb 15, 2015

Tests are just failing right now because the data is missing on Travis. I think we can add that as with MNIST (we can store it in the cache in Travis's new container environment so that it doesn't need to be redownloaded each time).

I'll make a ticket though, because I feel that there is a lot of code duplication between this and MNIST, and there's a variety of things that could be factored out for use by future datasets as well.

@vdumoulin
Copy link
Contributor Author

@bartvm Even though I think I've properly added the CIFAR10 files to cache, the tests fail. I'm not very Travis-savvy, is there something I left out?

@bartvm
Copy link
Member

bartvm commented Feb 16, 2015

On my phone, but the log says:

tar: Old option f' requires an argument. Trytar --help' or `tar --usage' for more information.

Maybe there's something wrong with the Tar command so that it doesn't actually unpack?

1 similar comment
@bartvm
Copy link
Member

bartvm commented Feb 16, 2015

On my phone, but the log says:

tar: Old option f' requires an argument. Trytar --help' or `tar --usage' for more information.

Maybe there's something wrong with the Tar command so that it doesn't actually unpack?

@vdumoulin
Copy link
Contributor Author

I think I found what the issue was, but I can't fix it since I don't think I can clear the cache by myself.

@bartvm
Copy link
Member

bartvm commented Feb 16, 2015

I think the bash command still doesn't fully work. The problem is that curl -O doesn't write to stdout, but to a file instead. Also, tar f requires you to give a filename to read from, but in this case you want to read from stdin for which you need to pass -. I think the final command should look like this:

curl http://www.cs.utoronto.ca/~kriz/cifar-10-python.tar.gz | tar xzf -

@vdumoulin
Copy link
Contributor Author

Yes, I think you're right. I gleaned the command from the Pylearn2 script, but I missed the dash at the end of the tar command. There's also apparently a way to use the same trick with curl -O which is present in the original command:

curl -O http://www.cs.utoronto.ca/~kriz/cifar-10-python.tar.gz - | tar xvzf -

I'll try to push a commit to force resetting the cache properly, and if it works I'll squash all commits related to fixing my mistake.

@vdumoulin
Copy link
Contributor Author

@bartvm I wasn't able to repair my cache mistake. I put the right curl command and cleaned up the commits, but I think removing cifar10 from the cache will require manual intervention.

Sorry about that!

@bartvm
Copy link
Member

bartvm commented Feb 16, 2015

No problem! I cleared the cache manually and restarted the build

@bartvm
Copy link
Member

bartvm commented Feb 16, 2015

Python 2 passes now, but you should use from six.moves import cPickle for Python 3 compatibility.

@vdumoulin
Copy link
Contributor Author

@bartvm Yes! For xrange as well. I'm on it.

@bartvm
Copy link
Member

bartvm commented Feb 16, 2015

Try opening it with mode 'rb', Python 3 refuses to cast automatically between string- and byte-encoded data.

@bartvm
Copy link
Member

bartvm commented Feb 16, 2015

Sorry, my bad, I only just realised you were trying to unpickle the file that you downloaded, which I guess was pickled with Python 2... They use different encoding schemes, but this could work:

cPickle.load(f, encoding='latin1')

@vdumoulin
Copy link
Contributor Author

That will fail under Python 2, since cPickle doesn't accept the encoding argument.

@vdumoulin
Copy link
Contributor Author

One option would be this:

try:
    data = cPickle.load(f, encoding='latin1')
except TypeError:
    data = cPickle.load(f)

but it isn't very readable in my opinion.

@bartvm
Copy link
Member

bartvm commented Feb 16, 2015

I agree, but you can use if six.PY3: which is okay I think.

@vdumoulin
Copy link
Contributor Author

@bartvm Thanks for the tip. The tests now pass.

bartvm added a commit that referenced this pull request Feb 16, 2015
@bartvm bartvm merged commit 85af481 into mila-iqia:master Feb 16, 2015
@vdumoulin vdumoulin deleted the cifar10 branch February 16, 2015 18:31
vdumoulin pushed a commit to vdumoulin/fuel that referenced this pull request Dec 2, 2015
Fix failing test_prepare_hdf5_file test
basveeling pushed a commit to basveeling/fuel that referenced this pull request Apr 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants