Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow URL strings to be passed to imread #4256

Merged
merged 7 commits into from Jun 9, 2015

Conversation

rnelsonchem
Copy link
Contributor

These modifications allow valid URL strings to be passed directly into imread. URL recognition is done using the standard library function urlparse. If the string contains a valid url scheme, it will be used to download and process the data using a combination of urlopen and BytesIO (StringIO in python 2). (I followed methodology in the pandas.read_* functions, which also allow valid URL strings.)

Things that I haven't done:

  • No tests
  • Update documentation
    I'll do these things if this code addition seems acceptable.

One other thing that I've noticed, the matplotlib.pyplot.imread function signature does not match the matplotlib.image.imread signature. It seems like the doc string would be a little easier to follow if the function signatures matched. Maybe that is not something to address in this PR.

These modifications allow valid URL strings to be passed directly into imread.
URL recognition is done using the standard library function urlparse. If the
string contains a valid url scheme, it will be used to download and process
the data using a combination of urlopen and BytesIO (StringIO in python 2).
@rnelsonchem
Copy link
Contributor Author

I should have mentioned, the following code is now possible in Python 2.7 and 3.4 with this PR:

import matplotlib.pyplot as plt

url = 'http://www.libpng.org/pub/png/img_png/pngnow.png'
image = plt.imread(url) 
plt.imshow(image)
plt.show()

url = 'http://www.jpeg.org/images/jpeg-home.jpg'
image = plt.imread(url, format='jpeg')
plt.imshow(image)
plt.show()

@WeatherGod
Copy link
Member

While it would be a very neat feature to make it very easy to read images
over the internet, I am a bit hesitant. Could this be an attack vector with
mal-formed images? I am not a security expert, so I really have no clue
what the security implications are.

On Sat, Mar 21, 2015 at 10:48 AM, Ryan Nelson notifications@github.com
wrote:

I should have mentioned, the following code is now possible in Python 2.7
and 3.4 with this PR:

import matplotlib.pyplot as plt

url = 'http://www.libpng.org/pub/png/img_png/pngnow.png'
image = plt.imread(url)
plt.imshow(image)
plt.show()

url = 'http://www.jpeg.org/images/jpeg-home.jpg'
image = plt.imread(url, format='jpeg')
plt.imshow(image)
plt.show()


Reply to this email directly or view it on GitHub
#4256 (comment)
.

@rnelsonchem
Copy link
Contributor Author

I suppose it is possible... But then you could download a malicious image from the internet by hand and try to load it with this function. In that case, the security issue is not with this addition, but with Pillow and MPLs internal PNG reader. Yes?

@rnelsonchem
Copy link
Contributor Author

I suppose I also should have commented on the motivation here. In Python 2, you could do something like the following to download an image from the internet:
plt.imread( urllib2.urlopen(url))
I noticed that for some reason with my Linux-based Python installations, the following call signature would break in Python 3:
plt.imread(urllib.request.urlopen(url))
However, the following does work in Python 3:
plt.imread(io.BytesIO(urllib.request.urlopen(url).read()))
It seems like rather than having the users tinker with trying to get the proper call signature for their system and python version, it might be more straightforward to implement the logic in imread to handle URLs directly.

@tacaswell tacaswell added this to the next point release milestone Mar 21, 2015
@tacaswell
Copy link
Member

Closing this due to the discussion in #4252 where this feature was added upstream to pillow (python-pillow/Pillow#1151 and python-pillow/Pillow#1157).

@rnelsonchem Thank you and please do not be discouraged! If I am confused, ping me to have this re-addressed.

@rnelsonchem
Copy link
Contributor Author

No problem @tacaswell
The only issue is that the MPL uses it's own handler for png files, so changes to pillow won't affect the png behavior.
It's not a big deal, though. This was only a small feature request, and it's not critical.

@tacaswell
Copy link
Member

I thought that the png handler did do the right thing with urls...

On Sun, Apr 12, 2015, 08:21 Ryan Nelson notifications@github.com wrote:

No problem @tacaswell https://github.com/tacaswell
The only issue is that the MPL uses it's own handler for png files, so
changes to pillow won't affect the png behavior.
It's not a big deal, though. This was only a small feature request, and
it's not critical.


Reply to this email directly or view it on GitHub
#4256 (comment)
.

@rnelsonchem
Copy link
Contributor Author

That might be the case; I didn't try that. But the current code checks to see if fname is string-like, and if that check passes, the first thing it does is try to open(fname, 'rb'), which of course fails with URLs. See:
https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/image.py#L1228
There's a couple other spots like that as well.

@rnelsonchem
Copy link
Contributor Author

Sorry, I should have referenced this line:
https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/image.py#L1260

@tacaswell
Copy link
Member

Ah, I understand, sorry about that.

Can you remove the PIL related changes?

The Pillow docs state that Image.open can handle either a filename string or a
file object. That simplifies the pilread function substantially.
@tacaswell
Copy link
Member

This is now failing on both py2k flavors.

@tacaswell
Copy link
Member


======================================================================
ERROR: matplotlib.tests.test_image.test_imread_pil_uint16
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/testing/decorators.py", line 51, in failer
    result = f(*args, **kwargs)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/tests/test_image.py", line 95, in test_imread_pil_uint16
    'baseline_images', 'test_image', 'uint16.tif'))
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/pyplot.py", line 2215, in imread
    return _imread(*args, **kwargs)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/image.py", line 1269, in imread
    fd = BytesIO(urlopen(fname).read())
  File "/opt/python/2.7.9/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/python/2.7.9/lib/python2.7/urllib2.py", line 423, in open
    protocol = req.get_type()
  File "/opt/python/2.7.9/lib/python2.7/urllib2.py", line 285, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: /home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/tests/baseline_images/test_image/uint16.tif

======================================================================
ERROR: test suite for <class 'matplotlib.tests.test_png.test_pngsuite'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/nose/plugins/multiprocess.py", line 788, in run
    self.setUp()
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/nose/suite.py", line 292, in setUp
    self.setupContext(ancestor)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/nose/plugins/multiprocess.py", line 770, in setupContext
    super(NoSharedFixtureContextSuite, self).setupContext(context)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/nose/suite.py", line 315, in setupContext
    try_run(context, names)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/nose/util.py", line 470, in try_run
    return func()
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/testing/decorators.py", line 134, in setup_class
    cls._func()
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/tests/test_png.py", line 28, in test_pngsuite
    data = plt.imread(fname)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/pyplot.py", line 2215, in imread
    return _imread(*args, **kwargs)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/image.py", line 1269, in imread
    fd = BytesIO(urlopen(fname).read())
  File "/opt/python/2.7.9/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/python/2.7.9/lib/python2.7/urllib2.py", line 423, in open
    protocol = req.get_type()
  File "/opt/python/2.7.9/lib/python2.7/urllib2.py", line 285, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: /home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/tests/baseline_images/pngsuite/basn0g01.png

======================================================================
ERROR: matplotlib.tests.test_quiver.test_quiver_animate.test
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/testing/decorators.py", line 51, in failer
    result = f(*args, **kwargs)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/testing/decorators.py", line 186, in do_test
    self._tol, in_decorator=True)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/testing/compare.py", line 326, in compare_images
    rms = calculate_rms(expectedImage, actualImage)
  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/testing/compare.py", line 246, in calculate_rms
    abs_diff_image = abs(expectedImage - actualImage)
ValueError: operands could not be broadcast together with shapes (600,800,3) (200,1500,3) 

----------------------------------------------------------------------

@rnelsonchem
Copy link
Contributor Author

@tacaswell This is very confusing... I do not see those errors on my Py2 installation. Also, the traceback for the first two errors says they occur in line 1269 of the image module. However, the line that is printed in the traceback and the actual line in the file are not equivalent.

Also, if I run the following in Py2:

from urlparse import urlparse
urlparse('/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/tests/baseline_images/test_image/uint16.tif')

the scheme == '', so I don't understand how the code gets past line 1267, which says:
if parsed.scheme is not ''

I'll look a bit more at the third failure, but I don't see that on my local install either...

@tacaswell
Copy link
Member

This may be a pil vs pillow or a version issue. You can look at the .Travis.yml file to see exactly what we install.

@WeatherGod
Copy link
Member

when lines in a traceback do not match up with the actual file, that is
usually a sign that an install is messed up somehow. Usually that the .py
and the .pyc files are not in sync.

On Thu, Apr 16, 2015 at 8:16 AM, Thomas A Caswell notifications@github.com
wrote:

This may be a pil vs pillow or a version issue. You can look at the
.Travis.yml file to see exactly what we install.


Reply to this email directly or view it on GitHub
#4256 (comment)
.

It seems that the identity test for the parsed URL scheme was not sufficient.
I changed this to an equality test, and things are now working properly.
@rnelsonchem
Copy link
Contributor Author

Okay. So it turns out that the identity testing is not was not working properly. I converted that to a equality test != and things work fine now in Py2.

@WeatherGod
Copy link
Member

yeah, that would be the correct change, even in python 3. There is no guarantee that an empty string in one variable is the same empty string elsewhere.

@jenshnielsen
Copy link
Member

@WeatherGod since strings are immutable I think they should be. I think the issue in Python2 is that:

"" is u""

is not true. And we are typically using from __future__ import unicode_literals

However using != is the right solution IMHO

@tacaswell
Copy link
Member

If you are using is, even with immutable types, you are taking the gamble that python bothered to determine if it already had an object with that value:

In [104]: x =  1544656783524

In [105]: y =  1544656783524

In [106]: x is y

Out[106]: False

In [107]: x == y

Out[107]: True

in this case != is definitely the correct solution.

@rnelsonchem
Copy link
Contributor Author

Shoot. I misunderstood the Pillow PRs for which @tacaswell provided links earlier. It seems that the changes that have been made to that library allow you to do something like the following:

PIL.Image.open(urlopen(url))

In this case, the current version of that open function will do the read call and conversion to BytesIO under the hood. Unfortunately, that does not help us here, because I am adding logic to check if a string argument is a URL.

If we are going to implement this functionality, then we need something that is a combination of my first and last commit. Right now the PNG version works fine. Is this something that has a chance to get merged? If so, I can redo my initial commit, which seems like the best course of action. I should also add some tests as well. What would be a good PNG and JPG (for example) URLs that we could ping every time someone runs the test suite?

@tacaswell
Copy link
Member

We can use files from the matplotlib.org website, just make sure to wrap it
so that lack of network is not a failure.

I would prefer to merge just the png version and leave the everything else
up to pillow doing it right (so we blindly pass the string down and pray),
unless I Ann also misunderstanding what those prs do.

On Thu, Apr 16, 2015, 16:17 Ryan Nelson notifications@github.com wrote:

Shoot. I misunderstood the Pillow PRs for which @tacaswell
https://github.com/tacaswell provided links earlier. It seems that the
changes that have been made to that library allow you to do something like
the following:

In this case, the current version of that open function will do the read call and conversion to BytesIO under the hood. Unfortunately, that does not help us here, because I am adding logic to check if a string argument is a URL.

If we are going to implement this functionality, then we need something that is a combination of my first and last commit. Right now the PNG version works fine. Is this something that has a chance to get merged? If so, I can redo my initial commit, which seems like the best course of action. I should also add some tests as well. What would be a good PNG and JPG (for example) URLs that we could ping every time someone runs the test suite?


Reply to this email directly or view it on GitHub
#4256 (comment)
.

@rnelsonchem
Copy link
Contributor Author

Ok. That's fine. As things currently stand, though, I have made two separate changes essentially in a single PR.

  1. Allow URL strings pointing to PNG files to be passed into the imread function. These will automatically be downloaded and then converted into the appropriate format.
  2. Removed a lot of the logic from the pilread function that is defined inside of imread. This is still okay because the Pillow Image.open function can accept filename string objects, file objects, and HTTPResponse objects (from urlopen). But it will not like URL string objects that point to remote files.

These two changes are not exactly related, and I just want to make sure that it will be okay to leave them as is in a single PR.

One potential problem. Pillow open accepts HTTPResponse objects, but imread will not for PNG files... I guess that I could add another check if folks think this is going to be a problem.

parsed = urlparse(fname)
# If fname is a URL, download the data
if parsed.scheme != '':
print('Scheme: %s' % parsed.scheme)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove debugging print please.

@tacaswell
Copy link
Member

I am 👍 on this going in modulo removing that print statement and adding an entry in https://github.com/matplotlib/matplotlib/tree/master/doc/users/whats_new

@rnelsonchem
Copy link
Contributor Author

@tacaswell Thanks for the reminder email... Sorry for the delay. I removed that print statement and added a note to the What's New docs. In addition, I updated the docstring and added a single test case, which downloads the MPL logo from the website. Let me know if you see any other necessary changes.

@WeatherGod
Copy link
Member

Uhm, that test case might be problematic. I remember having to do some
changes once due to Debian packaging rules. Anyone remember details?
On May 28, 2015 12:11 PM, "Ryan Nelson" notifications@github.com wrote:

@tacaswell https://github.com/tacaswell Thanks for the reminder
email... Sorry for the delay. I removed that print statement and added a
note to the What's New docs. In addition, I updated the docstring and added
a single test case, which downloads the MPL logo from the website. Let me
know if you see any other necessary changes.


Reply to this email directly or view it on GitHub
#4256 (comment)
.

@tacaswell
Copy link
Member

Good point attn @sandrotosi , does this run afoul of debian rules?

@rnelsonchem
Copy link
Contributor Author

@WeatherGod @tacaswell I'm a little confused. Could you provide a little more detail into the problem with my test. I'm happy to change it if there is an issue.

@tacaswell
Copy link
Member

As part of the debian packaging process the test suite gets run and they
have rules about what is allowed to be done during the test, and
@WeatherGod is suggesting that one of their rules is no reaching out to the
internet.

Given the importance of debian (and it's derivatives) making sure we stay
with in their guide lines is in our best interest :)

On Fri, May 29, 2015 at 9:43 AM Ryan Nelson notifications@github.com
wrote:

@WeatherGod https://github.com/WeatherGod @tacaswell
https://github.com/tacaswell I'm a little confused. Could you provide a
little more detail into the problem with my test. I'm happy to change it if
there is an issue.


Reply to this email directly or view it on GitHub
#4256 (comment)
.

@rnelsonchem
Copy link
Contributor Author

@tacaswell Gotcha. It looks like one alternative would be to create a separate http server (only listening on localhost) to serve local files, and then just request the local picture file through the local web server. I could create that in a new process or using threads. Might be a bit of extra work for one test. I could also just remove the test entirely.

@tacaswell
Copy link
Member

Lets go with just removing the test from this PR (and putting in a new one
with the test pending feed back from the debian folks).

The other place we have this issue is in the tests for styles which maybe
should also be removed?

On Fri, May 29, 2015 at 12:33 PM Ryan Nelson notifications@github.com
wrote:

@tacaswell https://github.com/tacaswell Gotcha. It looks like one
alternative would be to create a separate http server
https://docs.python.org/3/library/http.server.html#http.server.SimpleHTTPRequestHandler
(only listening on localhost) to serve local files, and then just request
the local picture file through the local web server. I could create that in
a new process or using threads
https://docs.python.org/3.4/library/socketserver.html#asynchronous-mixins.
Might be a bit of extra work for one test. I could also just remove the
test entirely.


Reply to this email directly or view it on GitHub
#4256 (comment)
.

@rnelsonchem
Copy link
Contributor Author

I removed the unit test and added a PR #4485 for the test in question.

tacaswell added a commit that referenced this pull request Jun 9, 2015
ENH: Allow URL strings to be passed to imread
@tacaswell tacaswell merged commit 31929e6 into matplotlib:master Jun 9, 2015
@tacaswell
Copy link
Member

@rnelsonchem Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants