get_sample_data still broken on v.1.1.x #498

@jdh2358
Owner

It appears that cbook.get_sample_data is still broken, at least in some configurations. I start with an empty cache (custom MPLCONFIGDIR) and run an example that requires the sample data once, and it pulls from github. All is well. When I run the example a second time, I get a failure (traceback below).

Running in v.1.1.x branch with commit 0c7f83d on opensuse python2.7 64 bit. Here is the --verbose-debug output:

remove the sample_data cache

johnh@lettuce:doc> rm -rf /export/home/johnh/.matplotlib.linux/sample_data

first run is OK

johnh@lettuce:doc> MPLCONFIGDIR=/export/home/johnh/.matplotlib.linux PYTHONPATH=/export/home/johnh/devlinux/lib64/python2.7/site-packages/ python ../examples/pylab_examples/image_demo3.py -dGTKAgg --verbose-debug
$HOME=/home/titan/johnh
matplotlib data path /export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/mpl-data
loaded rc file /export/home/johnh/matplotlib.matplotlib/doc/matplotlibrc
matplotlib version 1.1.0
verbose.level debug
interactive is False
platform is linux2
loaded modules: ['heapq', snip...]
CONFIGDIR=/export/home/johnh/.matplotlib.linux
Using fontManager instance from /export/home/johnh/.matplotlib.linux/fontList.cache
backend GTKAgg version 2.22.0
ViewVCCachedServer: files listed in cache.pck: set([])
ViewVCCachedServer: files in cache directory: set([])
ViewVCCachedServer: retrieving https://raw.github.com/matplotlib/sample_data/master/lena.jpg
ViewVCCachedServer: received response 200: OK

second run crashes

johnh@lettuce:doc> MPLCONFIGDIR=/export/home/johnh/.matplotlib.linux PYTHONPATH=/export/home/johnh/devlinux/lib64/python2.7/site-packages/ python ../examples/pylab_examples/image_demo3.py -dGTKAgg --verbose-debug
$HOME=/home/titan/johnh
matplotlib data path /export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/mpl-data
loaded rc file /export/home/johnh/matplotlib.matplotlib/doc/matplotlibrc
matplotlib version 1.1.0
verbose.level debug
interactive is False
platform is linux2
loaded modules: ['heapq', snip...]
CONFIGDIR=/export/home/johnh/.matplotlib.linux
Using fontManager instance from /export/home/johnh/.matplotlib.linux/fontList.cache
backend GTKAgg version 2.22.0
ViewVCCachedServer: files listed in cache.pck: set(['/export/home/johnh/.matplotlib.linux/sample_data/lena.jpg'])
ViewVCCachedServer: files in cache directory: set(['/export/home/johnh/.matplotlib.linux/sample_data/lena.jpg', '/export/home/johnh/.matplotlib.linux/sample_data/cache.pck'])
ViewVCCachedServer: retrieving https://raw.github.com/matplotlib/sample_data/master/lena.jpg
Traceback (most recent call last):
File "../examples/pylab_examples/image_demo3.py", line 10, in <module>
datafile = cbook.get_sample_data('lena.jpg')
File "/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/cbook.py", line 688, in get_sample_data
return myserver.get_sample_data(fname, asfileobj=asfileobj)
File "/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/cbook.py", line 617, in get_sample_data
response = self.opener.open(url)
File "/usr/lib64/python2.7/urllib2.py", line 391, in open
response = self._open(req, data)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _open
'_open', req)
File "/usr/lib64/python2.7/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib64/python2.7/urllib2.py", line 1197, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "/usr/lib64/python2.7/urllib2.py", line 1158, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/usr/lib64/python2.7/httplib.py", line 946, in request
self._send_request(method, url, body, headers)
File "/usr/lib64/python2.7/httplib.py", line 986, in _send_request
self.putheader(hdr, value)
File "/usr/lib64/python2.7/httplib.py", line 924, in putheader
str = '%s: %s' % (header, '\r\n\t'.join(values))
TypeError: sequence item 0: expected string, NoneType found
johnh@lettuce:doc>

@jkseppan jkseppan was assigned
@jkseppan
Collaborator

Cannot replicate, but I have a guess. Could you send me your /export/home/johnh/.matplotlib.linux/sample_data/cache.pck file?

@jdh2358
Owner

Hey Jouni, I will email you the cache.pck file momentarily. But I want to clarify how I am using this and the undesirable behavior (to me) that I am seeing. Perhaps I am abusing the code and there is no sane way to do what I am doing, but here goes. I have my sample_data in my local MPLCONFIGDIR. I frequently build the docs etc. with examples.download=False and point to this directory. I usually get this directory as a github checkout from matplotlib.sample_data. But sometimes I am running as a normal user, e.g. not building the docs, and I still have my MPLCONFIGDIR pointing to the same place and examples.download=True. When I run in that environment, the sample_data code removes everything that is not under its control, e.g. it wipes the github checkout clean, including the .git directory. This seems a bit heavy-handed to me.

I'm including another shell session below which shows the workflow with debug verbosity. Let me know if you think this is the right behavior.

flush sample_data, get a clean github checkout for my local copy

lettuce:/export/home/johnh/.matplotlib.linux $ rm -rf sample_data/
lettuce:/export/home/johnh/.matplotlib.linux $ git clone git://github.com/matplotlib/sample_data.git
Cloning into sample_data...
remote: Counting objects: 69, done.
remote: Compressing objects: 100% (65/65), done.
remote: Total 69 (delta 15), reused 0 (delta 0)
Receiving objects: 100% (69/69), 4.17 MiB | 1.73 MiB/s, done.
Resolving deltas: 100% (15/15), done.
lettuce:/export/home/johnh/.matplotlib.linux $ cd ../matplotlib.matplotlib/examples/pylab_examples/
lettuce:/export/home/johnh/matplotlib.matplotlib/examples/pylab_examples $ ls /export/home/johnh/.matplotlib.linux/sample_data/
aapl.csv goog.npy msft_nasdaq.npy
AAPL.dat INTC.dat README.txt
aapl.npy lena.jpg s1045.ima
axes_grid lena.png screenshots
ct.raw logo2.png setup.py
data_x_x2_x3.csv MANIFEST.in testdata.csv
demodata.csv membrane.dat testdir
eeg.dat Minduka_Present_Blue_Pack.png
embedding_in_wx3.xrc msft.csv

run the image demo that pulls lena.jpg for the first time; note how it wipes the dir clean

lettuce:/export/home/johnh/matplotlib.matplotlib/examples/pylab_examples $ MPLCONFIGDIR=/export/home/johnh/.matplotlib.linux python image_demo3.test.py --verbose-debug -dagg
$HOME=/home/titan/johnh
matplotlib data path /export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/mpl-data
loaded rc file /export/home/johnh/matplotlib.matplotlib/examples/pylab_examples/matplotlibrc
matplotlib version 1.1.0
verbose.level debug
interactive is False
platform is linux2
loaded modules: ['heapq', ...snip...]
CONFIGDIR=/export/home/johnh/.matplotlib.linux
Using fontManager instance from /export/home/johnh/.matplotlib.linux/fontList.cache
backend agg version v2.2
ViewVCCachedServer: files listed in cache.pck: set([])
ViewVCCachedServer: files in cache directory: set(['/export/home/johnh/.matplotlib.linux/sample_data/screenshots/etopo20data.gz', '/export/home/johnh/.matplotlib.linux/sample_data/.git/objects/pack/pack-1a002eaf9d001208ee83f99b706ad73098e74105.pack', '/export/home/johnh/.matplotlib.linux/sample_data/aapl.csv', '/export/home/johnh/.matplotlib.linux/sample_data/lena.png', '/export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/commit-msg.sample', '/export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/post-commit.sample', '/export/home/johnh/.matplotlib.linux/sample_data/ct.raw', '/export/home/johnh/.matplotlib.linux/sample_data/testdir/subdir/testsub.csv', '/export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/post-update.sample', '/export/home/johnh/.matplotlib.linux/sample_data/.git/packed-refs', '/export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/pre-rebase.sample', '/export/home/johnh/.matplotlib.linux/sample_data/screenshots/intc.csv', '/export/home/johnh/.matplotlib.linux/sample_data/INTC.dat', '/export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/applypatch-msg.sample', '/export/home/johnh/.matplotlib.linux/sample_data/screenshots/500hgtdata.gz', '/export/home/johnh/.matplotlib.linux/sample_data/README.txt', '/export/home/johnh/.matplotlib.linux/sample_data/screenshots/s1045.ima', '/export/home/johnh/.matplotlib.linux/sample_data/screenshots/hst.zdat', '/export/home/johnh/.matplotlib.linux/sample_data/AAPL.dat', '/export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/post-receive.sample', '/export/home/johnh/.matplotlib.linux/sample_data/MANIFEST.in', '/export/home/johnh/.matplotlib.linux/sample_data/membrane.dat', '/export/home/johnh/.matplotlib.linux/sample_data/aapl.npy', '/export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/prepare-commit-msg.sample', '/export/home/johnh/.matplotlib.linux/sample_data/screenshots/etopo20lons.gz', '/export/home/johnh/.matplotlib.linux/sample_data/.git/info/exclude', 
'/export/home/johnh/.matplotlib.linux/sample_data/setup.py', '/export/home/johnh/.matplotlib.linux/sample_data/logo2.png', '/export/home/johnh/.matplotlib.linux/sample_data/.git/objects/pack/pack-1a002eaf9d001208ee83f99b706ad73098e74105.idx', '/export/home/johnh/.matplotlib.linux/sample_data/.git/logs/HEAD', '/export/home/johnh/.matplotlib.linux/sample_data/s1045.ima', '/export/home/johnh/.matplotlib.linux/sample_data/screenshots/500hgtlats.gz', '/export/home/johnh/.matplotlib.linux/sample_data/demodata.csv', '/export/home/johnh/.matplotlib.linux/sample_data/data_x_x2_x3.csv', '/export/home/johnh/.matplotlib.linux/sample_data/lena.jpg', '/export/home/johnh/.matplotlib.linux/sample_data/axes_grid/bivariate_normal.npy', '/export/home/johnh/.matplotlib.linux/sample_data/screenshots/eeg.dat', '/export/home/johnh/.matplotlib.linux/sample_data/screenshots/500hgtlons.gz', '/export/home/johnh/.matplotlib.linux/sample_data/goog.npy', '/export/home/johnh/.matplotlib.linux/sample_data/.git/index', '/export/home/johnh/.matplotlib.linux/sample_data/.git/description', '/export/home/johnh/.matplotlib.linux/sample_data/.git/HEAD', '/export/home/johnh/.matplotlib.linux/sample_data/msft.csv', '/export/home/johnh/.matplotlib.linux/sample_data/.git/refs/heads/master', '/export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/pre-commit.sample', '/export/home/johnh/.matplotlib.linux/sample_data/screenshots/etopo20lats.gz', '/export/home/johnh/.matplotlib.linux/sample_data/screenshots/chandra.dat', '/export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/pre-applypatch.sample', '/export/home/johnh/.matplotlib.linux/sample_data/screenshots/msft.csv', '/export/home/johnh/.matplotlib.linux/sample_data/msft_nasdaq.npy', '/export/home/johnh/.matplotlib.linux/sample_data/embedding_in_wx3.xrc', '/export/home/johnh/.matplotlib.linux/sample_data/.git/logs/refs/heads/master', '/export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/update.sample', 
'/export/home/johnh/.matplotlib.linux/sample_data/.git/config', '/export/home/johnh/.matplotlib.linux/sample_data/Minduka_Present_Blue_Pack.png', '/export/home/johnh/.matplotlib.linux/sample_data/.git/refs/remotes/origin/HEAD', '/export/home/johnh/.matplotlib.linux/sample_data/eeg.dat', '/export/home/johnh/.matplotlib.linux/sample_data/testdata.csv'])
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/screenshots/etopo20data.gz
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/objects/pack/pack-1a002eaf9d001208ee83f99b706ad73098e74105.pack
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/aapl.csv
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/lena.png
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/commit-msg.sample
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/INTC.dat
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/ct.raw
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/testdir/subdir/testsub.csv
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/post-update.sample
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/packed-refs
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/pre-rebase.sample
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/screenshots/intc.csv
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/post-commit.sample
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/applypatch-msg.sample
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/screenshots/500hgtdata.gz
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/README.txt
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/screenshots/s1045.ima
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/screenshots/hst.zdat
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/AAPL.dat
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/post-receive.sample
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/MANIFEST.in
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/membrane.dat
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/aapl.npy
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/prepare-commit-msg.sample
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/screenshots/etopo20lons.gz
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/info/exclude
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/setup.py
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/logo2.png
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/objects/pack/pack-1a002eaf9d001208ee83f99b706ad73098e74105.idx
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/logs/HEAD
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/s1045.ima
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/screenshots/500hgtlats.gz
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/demodata.csv
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/data_x_x2_x3.csv
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/lena.jpg
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/axes_grid/bivariate_normal.npy
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/screenshots/eeg.dat
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/screenshots/500hgtlons.gz
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/goog.npy
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/index
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/description
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/HEAD
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/msft.csv
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/refs/heads/master
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/pre-commit.sample
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/screenshots/etopo20lats.gz
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/screenshots/chandra.dat
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/pre-applypatch.sample
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/screenshots/msft.csv
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/msft_nasdaq.npy
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/embedding_in_wx3.xrc
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/logs/refs/heads/master
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/hooks/update.sample
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/config
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/Minduka_Present_Blue_Pack.png
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/.git/refs/remotes/origin/HEAD
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/eeg.dat
ViewVCCachedServer:remove_stale_files: removing /export/home/johnh/.matplotlib.linux/sample_data/testdata.csv
ViewVCCachedServer: retrieving https://raw.github.com/matplotlib/sample_data/master/lena.jpg
ViewVCCachedServer: received response 200: OK

now rerun a second time to reproduce error

lettuce:/export/home/johnh/matplotlib.matplotlib/examples/pylab_examples $ ls /export/home/johnh/.matplotlib.linux/sample_data/
axes_grid cache.pck lena.jpg screenshots testdir
lettuce:/export/home/johnh/matplotlib.matplotlib/examples/pylab_examples $ MPLCONFIGDIR=/export/home/johnh/.matplotlib.linux python image_demo3.test.py --verbose-debug -dagg
$HOME=/home/titan/johnh
matplotlib data path /export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/mpl-data
loaded rc file /export/home/johnh/matplotlib.matplotlib/examples/pylab_examples/matplotlibrc
matplotlib version 1.1.0
verbose.level debug
interactive is False
platform is linux2
loaded modules: ['heapq', ...snip...]
CONFIGDIR=/export/home/johnh/.matplotlib.linux
Using fontManager instance from /export/home/johnh/.matplotlib.linux/fontList.cache
backend agg version v2.2
ViewVCCachedServer: files listed in cache.pck: set(['/export/home/johnh/.matplotlib.linux/sample_data/lena.jpg'])
ViewVCCachedServer: files in cache directory: set(['/export/home/johnh/.matplotlib.linux/sample_data/lena.jpg', '/export/home/johnh/.matplotlib.linux/sample_data/cache.pck'])
ViewVCCachedServer: retrieving https://raw.github.com/matplotlib/sample_data/master/lena.jpg
Traceback (most recent call last):
File "image_demo3.test.py", line 12, in <module>
datafile = cbook.get_sample_data('lena.jpg')
File "/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/cbook.py", line 689, in get_sample_data
return myserver.get_sample_data(fname, asfileobj=asfileobj)
File "/export/home/johnh/devlinux/lib64/python2.7/site-packages/matplotlib/cbook.py", line 617, in get_sample_data
response = self.opener.open(url)
File "/usr/lib64/python2.7/urllib2.py", line 391, in open
response = self._open(req, data)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _open
'_open', req)
File "/usr/lib64/python2.7/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib64/python2.7/urllib2.py", line 1197, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "/usr/lib64/python2.7/urllib2.py", line 1158, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/usr/lib64/python2.7/httplib.py", line 946, in request
self._send_request(method, url, body, headers)
File "/usr/lib64/python2.7/httplib.py", line 986, in _send_request
self.putheader(hdr, value)
File "/usr/lib64/python2.7/httplib.py", line 924, in putheader
str = '%s: %s' % (header, '\r\n\t'.join(values))
TypeError: sequence item 0: expected string, NoneType found
lettuce:/export/home/johnh/matplotlib.matplotlib/examples/pylab_examples $

@jdh2358
Owner
@jkseppan
Collaborator

I think you accidentally sent the pck file to the github email address of this issue instead of to me, and github doesn't show the attachment.

In any case, which exact version of Python is this? I have Python 2.7.1 on OS X, and the httplib line numbers in the traceback don't match.

Update: I installed 2.7.2, and the line numbers still don't match, and I still can't reproduce your error. Does the OpenSUSE version have some patches that are not in standard Python?

Update 2: I don't see any patches in python-2.7.2-7.1.src.rpm that would explain this, but I have no idea if that is the same rpm that you have.

@jkseppan
Collaborator

About your usage: it's not how this code was intended to be used. The original idea was to be able to download example files from Sourceforge without packaging all of them in the tarball and without requiring the user to have a Subversion client. How I implemented it was to use HTTP instead of the Subversion-specific protocol, since any reasonable HTTP server has support for caching and Python has useful HTTP-related packages in its standard library. HTTP caching works by having the server send one or two optional response headers (ETag based on content, and Last-Modified based on modification date) that the client can later use to ask the server if the file has been changed. In particular, Github seems to only send ETag, not Last-Modified, and ETag is a completely opaque identifier that the server is allowed to generate in any way.
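As a rough illustration of the conditional request described above, here is a minimal sketch in Python 3 using urllib.request. It is not the code in cbook.py; the function names `conditional_headers` and `build_request` are made up for this example. It also leaves out any validator whose value is None, since the TypeError in the traceback earlier in this thread suggests that a header with a None value was being passed to httplib.

```python
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build validator headers for a conditional GET, skipping missing ones.

    Hypothetical helper: etag and last_modified are the values saved from
    an earlier response, or None if the server never sent them.
    """
    headers = {}
    if etag is not None:
        # Ask the server to reply 304 if the content still has this ETag.
        headers["If-None-Match"] = etag
    if last_modified is not None:
        # Ask the server to reply 304 if the file has not changed since then.
        headers["If-Modified-Since"] = last_modified
    return headers


def build_request(url, etag=None, last_modified=None):
    """Return a Request that asks whether the cached copy is still current."""
    return urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
```

When such a request is opened with the default urllib opener, a "304 Not Modified" reply surfaces as a urllib.error.HTTPError with code 304, which the caller can treat as "use the cached file".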

Now when you get files in some other way (by pointing the cache directory at a git checkout) you don't get the cache-related HTTP headers, so the retrieval code has no way of asking the HTTP server if this file is up-to-date. That's why it decides that all the files in that directory are out-of-date.

We could add a check to see if your sample_data directory is actually a git checkout, and in that case just do a git pull, but that leads to a whole bunch of other problems. (What if you have modified some files in the checkout? What if the pull fails with a merge conflict? etc.) I'm not sure that it would be worth the added complication.

@jkseppan jkseppan referenced this pull request from a commit in jkseppan/matplotlib
@jkseppan jkseppan Don't set http request headers with content None
Attempts to fix issue #498. I can't reproduce the issue myself, so
I don't know if this is the real culprit, but it shouldn't do any
harm.
1133e83
@jkseppan
Collaborator

Pull request #501 attempts to fix this, but since I can't reproduce this, I don't know if it really helps.

@jkseppan
Collaborator

I'm starting to think we should make httplib2 a dependency and use it for get_sample_data. HTTP is somewhat tricky to get right (see issue #478: we don't handle redirections right), and using a library specialized for that purpose would make more sense than having a half-baked implementation of our own. But I'm hesitant to make that big a change this close to the release.

@jdh2358
Owner

I just tested your pull request #501 and it appears to be working fine.

On the issue of trying to detect a git repo, I agree this is probably not a good idea because of the additional complexity. But what do you think about not removing files in the directory? It seems like our managed files could live beside files they know nothing about, which would allow my use case to work reasonably well. Again, blowing away all the files feels a bit heavy-handed and may lead to unhappy surprises.
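A minimal sketch of that idea, assuming the cache index (cache.pck in the logs above) can be reduced to a set of paths: only files that the index once tracked and no longer trusts are deleted, and anything the index never knew about is left alone. The function name and its parameters are hypothetical and are not part of matplotlib.

```python
import os


def remove_stale_managed_files(cache_dir, managed, still_valid):
    """Delete only files the cache index tracks but no longer considers valid.

    cache_dir:   directory that holds the cached sample data
    managed:     set of paths previously recorded in the cache index
    still_valid: subset of those paths that the index still trusts
    Files that were never in the index (for example a .git directory)
    are not touched.
    """
    root = os.path.abspath(cache_dir)
    removed = []
    for path in sorted(managed - still_valid):
        full = os.path.abspath(path)
        # Refuse to delete anything that lies outside the cache directory.
        if os.path.commonpath([root, full]) != root:
            continue
        if os.path.isfile(full):
            os.remove(full)
            removed.append(path)
    return removed
```

With this approach a git checkout in the same directory would survive, at the cost that files nobody tracks are never cleaned up.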

On the issue of httplib2, I agree it should wait, and we would probably have to distribute it ourselves, which causes its own problems.

Alternatively, we could consider moving the sample data back into the main tree. It's <11M at this point, and we could simply distribute it. But let's see how well things work with the new fixes and not upset the apple cart at this point.

Thanks for the fixes.

@jkseppan
Collaborator

I removed the release_critical label since the immediate problem got fixed, but let's not close this yet since we should fix this properly on master. Not deleting files is doable as you say, but shipping the sample data as part of the matplotlib package might be the simplest solution.

@pelson
Collaborator

@jkseppan: Any idea where this ticket is at? Anything left to be done?

@jkseppan
Collaborator

I would like to suggest that we drop the current get_sample_data mechanism and start including the sample data as part of the downloadable matplotlib packages.

The root cause of problems like #478 is that we get the files from github (or sourceforge, which we used previously) and don't really control the server side: the hoster can move the files elsewhere and leave a redirect behind, or perhaps only a "404 Not Found" and a human-readable explanation.

The original rationale for get_sample_data was that the gallery can include new examples with new data, and you could use them with older versions of matplotlib that didn't come with that data file. In practice it doesn't seem that we get a lot of new data in the repository. There are no commits from 2012; three commits from 2011 that add data files (and some more that increment version numbers or similarly modify the infrastructure); no commits from 2010; and a lot of activity in 2009. My guess is that there is sufficient sample data there for demoing the various plot types matplotlib has, and almost all new examples can use the existing data.

I seem to recall that the Debian packager wanted a self-contained package that doesn't download data during the documentation build. I guess there's some special patch for Debian to handle this.

@mdboom mdboom Turn get_sample_data into a much simpler function that merely returns…
… files from an installed sample_data directory. Include the sample data locally. Remove sample data that is no longer used.
6c5e961
@mdboom
Owner

I've attached a commit that includes all of the sample data locally and makes get_sample_data a much simpler function. This seems like the right thing to do -- apologies to whoever implemented this slick piece of code in the first place -- it was handy in the past but is probably no longer necessary.
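For readers who want a picture of what "a much simpler function" could look like, here is a sketch under stated assumptions. It is not the code from commit 6c5e961: the name `get_local_sample_data` and the `sample_dir` parameter are invented for illustration, and only the `asfileobj` argument is taken from the tracebacks above.

```python
import os


def get_local_sample_data(fname, asfileobj=True, sample_dir="sample_data"):
    """Return a sample data file from a local directory, with no network access.

    fname:      file name relative to sample_dir
    asfileobj:  if True, return an open binary file object; else the path
    sample_dir: hypothetical location of the installed sample data
    """
    path = os.path.join(sample_dir, fname)
    if not os.path.isfile(path):
        raise IOError("sample data file not found: %s" % path)
    if asfileobj:
        # The caller is responsible for closing the returned file.
        return open(path, "rb")
    return path
```

Because nothing is downloaded, there is no cache index to keep consistent and no stale-file cleanup to run.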

@pelson
Collaborator

@mdboom: Looks good. I would be prepared to merge it. One concern however: how big is the sample data being added?

@efiring efiring merged commit 6c5e961 into matplotlib:master
@efiring
Owner

@pelson, it is only 1.4 MB.

@mdboom mdboom deleted the mdboom:issue498 branch
Commits on Aug 29, 2012
  1. @mdboom

    Turn get_sample_data into a much simpler function that merely returns…

    mdboom authored
    … files from an installed sample_data directory. Include the sample data locally. Remove sample data that is no longer used.