test_urllib2 fails - urlopen error file not on local host #49875

Closed
ned-deily opened this issue Mar 31, 2009 · 10 comments
Assignees
Labels
stdlib Python modules in the Lib dir

Comments

@ned-deily
Member

BPO 5625
Nosy @csernazs, @orsenthil, @ned-deily
Files
  • patch-nad0017-trunk-26.txt
  • patch-nad0017-py3k-30.txt
  • test_urllib2.py.diff
  • unnamed
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/orsenthil'
    closed_at = <Date 2009-12-27.10:17:25.269>
    created_at = <Date 2009-03-31.15:48:07.908>
    labels = ['library']
    title = 'test_urllib2 fails - urlopen error file not on local host'
    updated_at = <Date 2010-12-16.10:48:17.898>
    user = 'https://github.com/ned-deily'

    bugs.python.org fields:

    activity = <Date 2010-12-16.10:48:17.898>
    actor = 'orsenthil'
    assignee = 'orsenthil'
    closed = True
    closed_date = <Date 2009-12-27.10:17:25.269>
    closer = 'orsenthil'
    components = ['Library (Lib)']
    creation = <Date 2009-03-31.15:48:07.908>
    creator = 'ned.deily'
    dependencies = []
    files = ['13514', '13515', '20050', '20051']
    hgrepos = []
    issue_num = 5625
    keywords = ['patch']
    message_count = 10.0
    messages = ['84806', '94163', '96900', '96901', '124008', '124014', '124016', '124017', '124019', '124123']
    nosy_count = 4.0
    nosy_names = ['csernazs', 'orsenthil', 'dmorr', 'ned.deily']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue5625'
    versions = ['Python 2.6', 'Python 3.1', 'Python 2.7', 'Python 3.2']

    @ned-deily
    Member Author

    [NOTE: applies to 2.x urllib2 and similar code in merged 3.x urllib]

    test_urllib2 can fail because urllib2.FileHandler assumes incorrectly
    that the local host has only a single IP address. It is not uncommon
    to have host IP configurations where a host has more than one network
    interface and the same IP host name is associated with each address.

    Both the urllib module and test_urllib2 use
    socket.gethostbyname(socket.gethostname())
    to find "the" host IP address. But, as can be seen here,
    consecutive calls may produce different addresses depending on the
    network configuration and the underlying OS implementation:

    Python 2.6.1 (r261:67515, Dec 17 2008, 23:27:50) 
    [GCC 4.0.1 (Apple Inc. build 5490)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import socket
    >>> socket.gethostbyname(socket.gethostname())
    '10.52.12.105'
    >>> socket.gethostbyname(socket.gethostname())
    '10.52.12.105'
    >>> socket.gethostbyname(socket.gethostname())
    '10.52.12.205'
    >>>

    This leads to predictable test failures when the calls in test_urllib2
    and urllib2.FileHandler return different addresses:

    test_urllib2
    test test_urllib2 failed -- Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/test/test_urllib2.py", line 621, in test_file
        r = h.file_open(Request(url))
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1229, in file_open
        return self.open_local_file(req)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1266, in open_local_file
        raise URLError('file not on local host')
    URLError: <urlopen error file not on local host>

    The simplest way to avoid the test failure is to modify
    urllib2.FileHandler to use socket.gethostbyname_ex which returns all
    of the IPv4 addresses associated with a hostname:
    >>> socket.gethostbyname_ex(socket.gethostname())
    ('myhost.net', [], ['10.52.12.205', '10.52.12.105'])
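
    A minimal sketch of the idea, not the attached patch itself: treat an
    address as local when it is any of the addresses gethostbyname_ex
    reports for 'localhost' or for the local host name. The helper names
    local_ipv4_addresses and is_local_address, and the injectable resolver
    parameter, are hypothetical and exist only so the logic can be
    exercised without a real network lookup.

```python
import socket


def local_ipv4_addresses(resolver=socket.gethostbyname_ex, hostname=None):
    """Return every IPv4 address the resolver reports for this host.

    Hypothetical helper for illustration; not the code from the patches.
    """
    if hostname is None:
        hostname = socket.gethostname()
    addresses = set()
    for name in ("localhost", hostname):
        try:
            # gethostbyname_ex returns (canonical_name, aliases, ip_list).
            addresses.update(resolver(name)[2])
        except OSError:
            # socket.gaierror and socket.herror are OSError subclasses;
            # skip a name that does not resolve instead of failing.
            continue
    return tuple(sorted(addresses))


def is_local_address(address, resolver=socket.gethostbyname_ex, hostname=None):
    """Return True if address is one of the local host's IPv4 addresses."""
    return address in local_ipv4_addresses(resolver, hostname)
```

    With this shape, it no longer matters which of several addresses a
    single gethostbyname call happens to return, because the check is a
    membership test against the full list.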

    Attached patches for 2.x urllib2 and 3.x urllib do that. Note that
    there remain other issues in this area:

    • when urllib2 is enhanced to support IPv6, code is needed to return
      all of the host's IPv6 addresses as well (-> adding a note to open
      bpo-1675455)
    • the merged 3.0 urllib has two nearly identical functions named
      open_local_file, one each from 2.x urllib.URLopener and
      urllib2.FileHandler. Both use similarly flawed
      socket.gethostbyname(socket.gethostname()) tests, but the tests for
      local vs remote file URLs are somewhat different in each.
      (The patches here do not attempt to address this other than to add
      a comment.)
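
    As a rough illustration of the IPv6 point in the first bullet, the
    standard socket.getaddrinfo call can report addresses of both families
    for a host name. This is only a sketch under that assumption: the
    helper name all_host_addresses and its injectable getaddrinfo parameter
    are hypothetical, and nothing here is taken from the patches or from
    bpo-1675455.

```python
import socket


def all_host_addresses(hostname, getaddrinfo=socket.getaddrinfo):
    """Collect the IPv4 and IPv6 addresses getaddrinfo reports for hostname.

    Hypothetical helper for illustration; urllib does not define it.
    """
    addresses = set()
    # AF_UNSPEC asks for both address families; a port of None means "any".
    results = getaddrinfo(hostname, None, socket.AF_UNSPEC, socket.SOCK_STREAM)
    for family, _socktype, _proto, _canonname, sockaddr in results:
        if family in (socket.AF_INET, socket.AF_INET6):
            # sockaddr is (host, port) for IPv4 and
            # (host, port, flowinfo, scope_id) for IPv6.
            addresses.add(sockaddr[0])
    return addresses
```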

    @ned-deily ned-deily added the stdlib Python modules in the Lib dir label Mar 31, 2009
    @ned-deily
    Member Author

    While you're poking around in urllib2, perhaps I can interest you in
    looking at these patches.

    @orsenthil orsenthil self-assigned this Oct 18, 2009
    @orsenthil
    Member

    Thanks for the patch, Ned. Fixed in the trunk revision 77058.

    @orsenthil
    Member

    Merged the fixes in r77059, r77060 and r77061.
    I fixed the thishost function to return all IP addresses in py3k.

    @csernazs
    Mannequin

    csernazs mannequin commented Dec 15, 2010

    Could you please add this change to test_urllib2.py as well?

    It has the following line:
    localaddr = socket.gethostbyname(socket.gethostname())

    But urllib2.py already has the change related to this bug.
    That makes test_urllib2 fail when gethostbyname reports a different IP than gethostbyname_ex:

    (Pdb) socket.gethostbyname_ex(socket.gethostname())[2]
    ['172.31.92.26']
    (Pdb) socket.gethostbyname(socket.gethostname())
    '172.31.72.206'

    @orsenthil
    Member

    Zsolt,

    The change in urllib2 was at a place where a tuple of all local IPs
    was required.
    In test_urllib2, which test case failed?
    Also, can you make this change and see if it helps in your case?

    -        localaddr = socket.gethostbyname(socket.gethostname())
    +        localaddr = socket.gethostbyname('localhost')

    If this is sufficient, this change can be made in the trunk.

    @csernazs
    Mannequin

    csernazs mannequin commented Dec 15, 2010

    The test which failed was HandlerTests.test_file, and I'm using python 2.7.1.

    socket.gethostbyname('localhost') returns "127.0.0.1" which is ok, but in the unittest it's already tested (line 671).

    The problem is that my /etc/hosts file contains a different IP than the DNS (I cannot change this behaviour as I'm not the administrator of the host) and that's the difference between gethostbyname and gethostbyname_ex.

    The unittest creates a URL which is not local (from urllib2's point of view). I'm attaching a patch which has fixed my problem.

    @orsenthil
    Member

    + localaddr = socket.gethostbyname_ex(socket.gethostname())[2][0]

    This may not be a generic solution, because on another system the other
    IP could be first in the list. Because the failure was in test_file,
    which was basically exercising file://'localhost' in the URL, I
    suggested that you replace it with 'localhost'. I think that solution is
    okay, even though localhost has been exercised in another test.

    @csernazs
    Mannequin

    csernazs mannequin commented Dec 15, 2010

    The order of the IP addresses doesn't matter as urllib2 is flexible enough to handle all local IP addresses as local (that was the original bug - it handled only one IP returned by gethostbyname which returned a random IP if there were more than one).

    So picking up the first IP is ok I think as the order of the IP addresses doesn't matter - urllib2 will handle all of them as local.
    See urllib2.FileHandler.get_names().

    The problem is that gethostbyname doesn't guarantee that it returns one IP address from the set returned by gethostbyname_ex as gethostbyname looks up the name in /etc/hosts file first (or as configured in NSS).
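
    A small diagnostic sketch for the mismatch described above: it checks
    whether the single address from gethostbyname appears in the list from
    gethostbyname_ex. The function name compare_resolvers and its
    injectable by_name / by_name_ex parameters are hypothetical, added so
    the check can be run against canned data as well as the real resolver.

```python
import socket


def compare_resolvers(hostname=None,
                      by_name=socket.gethostbyname,
                      by_name_ex=socket.gethostbyname_ex):
    """Report whether gethostbyname's answer is in gethostbyname_ex's list.

    Hypothetical diagnostic helper; not part of urllib2 or its tests.
    """
    if hostname is None:
        hostname = socket.gethostname()
    single = by_name(hostname)
    # gethostbyname_ex returns (canonical_name, aliases, ip_list).
    _canonical, _aliases, addresses = by_name_ex(hostname)
    return {
        "gethostbyname": single,
        "gethostbyname_ex": list(addresses),
        "consistent": single in addresses,
    }
```

    Calling compare_resolvers() with no arguments queries the real
    resolver for the current host name. A result with "consistent" set to
    False corresponds to the situation in the Pdb session earlier in this
    thread.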

    @orsenthil
    Member

    Well, ignore my comment on the order of IP addresses. It definitely does not matter in this case for test_urllib2.

    However, readability does matter. As per my previous explanation, since http://localhost/ was being exercised in test_file, gethostbyname('localhost') is much better than taking the [2][0] element of that return value.

    I overlooked one thing in your first message, namely gethostbyname and gethostbyname_ex()[2] returning completely different IPs that turn out to be mutually exclusive. This should not be the case: gethostbyname_ex()[2] should include the IP returned by gethostbyname. If it did, the test would not have failed.

    And btw, both of these are supposed to have similar behavior (the default action is to query named(8), followed by /etc/hosts); the only difference is that gethostbyname_ex uses the reentrant C function call and is thread-safe.

    (You may want to identify the cause of the difference in output there.)

    And for this bug report, I am still inclined toward using 'localhost' for readability purposes, or leaving it as is, because the problem seems to be elsewhere.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022