Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urllib.splitport -- is it official or not? #71672

Closed
gvanrossum opened this issue Jul 11, 2016 · 19 comments
Closed

urllib.splitport -- is it official or not? #71672

gvanrossum opened this issue Jul 11, 2016 · 19 comments
Labels
3.8 (EOL) end of life

Comments

@gvanrossum
Copy link
Member

BPO 27485
Nosy @brettcannon, @jaraco, @orsenthil, @vstinner, @ambv, @vadmium, @serhiy-storchaka, @csabella
PRs
  • bpo-27485: Rename and deprecate undocumented functions in urllib.parse #2205
  • bpo-27485: Change urlparse tests to use private methods #7070
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-06-03.14:47:49.714>
    created_at = <Date 2016-07-11.18:02:03.453>
    labels = ['3.8']
    title = 'urllib.splitport -- is it official or not?'
    updated_at = <Date 2021-09-01.22:07:44.498>
    user = 'https://github.com/gvanrossum'

    bugs.python.org fields:

    activity = <Date 2021-09-01.22:07:44.498>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-06-03.14:47:49.714>
    closer = 'serhiy.storchaka'
    components = []
    creation = <Date 2016-07-11.18:02:03.453>
    creator = 'gvanrossum'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 27485
    keywords = ['patch']
    message_count = 19.0
    messages = ['270193', '270200', '270215', '270218', '270253', '270536', '295962', '295965', '295966', '295967', '295983', '296049', '315765', '315769', '317354', '317391', '318554', '334794', '400875']
    nosy_count = 8.0
    nosy_names = ['brett.cannon', 'jaraco', 'orsenthil', 'vstinner', 'lukasz.langa', 'martin.panter', 'serhiy.storchaka', 'cheryl.sabella']
    pr_nums = ['2205', '7070']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue27485'
    versions = ['Python 3.8']

    @gvanrossum
    Copy link
    Member Author

    I've seen and written some code that uses urllib.splitport() [1], but it's not in the export list, nor in the docs. However I see no easy other way to perform the same function. Should we make it official, or get rid of it? It's used internally in urllib/request.py [2]. There's a test for it in test_urlparse.py [3], but another test [4] also acknowledges that it's "undocumented" (which suggests that the author of that test didn't know what to do with it either).

    Same question for the others in that list [4]:
    'splitattr', 'splithost', 'splitnport', 'splitpasswd',
    'splitport', 'splitquery', 'splittag', 'splittype', 'splituser',
    'splitvalue',
    'Quoter', 'ResultBase', 'clear_cache', 'to_bytes', 'unwrap',

    References:
    [1] https://hg.python.org/cpython/file/tip/Lib/urllib/parse.py#l956
    [2] https://hg.python.org/cpython/file/tip/Lib/urllib/request.py#l106
    [3] https://hg.python.org/cpython/file/tip/Lib/test/test_urlparse.py#l1015
    [4] https://hg.python.org/cpython/file/tip/Lib/test/test_urlparse.py#l946

    @serhiy-storchaka
    Copy link
    Member

    splitport() doesn't work with IPv6 ("[::1]", see bpo-18191), nor with authority ("user:password@example.com"). Note that there is a almost duplicate function splitnport(). The existence of two similar functions that behave differently in corner cases looks confusing. And seems splitport() and splitnport() not always used correctly internally (see bpo-20271).

    @vadmium
    Copy link
    Member

    vadmium commented Jul 11, 2016

    Previous discussion: bpo-1722, bpo-11009.

    In Python 2, most of the split- functions _have_ been in urllib.__all__ since revision 5d68afc5227c (2.1). Also, since revision c3656dca65e7 (bpo-1722, 2.7.4), the RST documentation does mention that at least some of them are deprecated in favour of the “urlparse” module. However there are no index entries, and splitport() is not mentioned by name.

    In Python 3, these functions wandered into urllib.parse. There is no RST documentation, and the functions are not in __all__ (which was added for bpo-13287 in 3.3).

    I think you can use the documented urllib.parse API instead of splitport(), but it is borderline unwieldy:

    >>> netloc = "[::1]:80"
    >>> urllib.parse.splitport(netloc)  # [Brackets] kept!
    ('[::1]', '80')
    >>> split = urlsplit("//" + netloc); (split.hostname, split.port)
    ('::1', 80)
    >>> split = SplitResult("", netloc, path="", query="", fragment=""); (split.hostname, split.port)
    ('::1', 80)

    I opened bpo-23416 with a suggestion that would make SplitResult a bit simpler to use here. But maybe it makes the implementation too complicated.

    I don’t think the non-split-names (Quoter, etc) are in much doubt. They were never in __all__.

    @gvanrossum
    Copy link
    Member Author

    Aha. I see you are referring to this note in the 2.7 docs for urllib:

    urllib also exposes certain utility functions like splittype, splithost and
    others parsing URL into various components. But it is recommended to use
    :mod:`urlparse` for parsing URLs rather than using these functions directly.
    Python 3 does not expose these helper functions from :mod:`urllib.parse`
    module.
    

    This is somewhat ironic because those functions still exist in urllib.parse.

    I've rewritten my code using your suggestions of using urllib.parse.urlparse().

    Shall we just close this issue or is there still an action item? (Maybe actually delete those functions whose deletion has been promised so long ago, or at least rename them to _splitport() etc.?)

    @brettcannon
    Copy link
    Member

    Probably a rename is good. Question then becomes whether the old names should raise an DeprecationWarning for a release?

    @orsenthil
    Copy link
    Member

    I think that we use encourage everyone to use the higher level functions like urlparse() or urlsplit() and then get the .port from the named tuple result.

    Two things to do.

    1. Update that Note the documentation which states a false statement that those helper functions are not exposed in urllib.parse

    2. Raise a DeprecationWarning and rename them to _splitport()

    @csabella
    Copy link
    Contributor

    Would it be OK for me to work on this?

    @brettcannon
    Copy link
    Member

    Go for it, Cheryl!

    @gvanrossum
    Copy link
    Member Author

    Skimming the issue I can't even figure out what the task is -- Cheryl, I suppose you have, could you post a brief summary of your plan here?

    @csabella
    Copy link
    Contributor

    Thank you.

    From my understanding, urllib didn't officially supported the split* functions (splittype, splithost, splitport, splinport, splituser, splitpasswd, splitattr, splitquery, splitvalue, splittag) even though they were migrated to urllib.parse. The 2.7 documentation recommended using urlparse and stated that these split* functions would not be exposed in Python 3, but they are.

    The proposal would be as Senthil suggested - to raise a DeprecationWarning if the current names are used and to rename them with a single underscore (to _split*).

    However, I did have some questions.

    1. With the DeprecationWarning for the current function names, should the return value be a call to the _split* function or should it call urlparse/urlsplit?
    2. Some of the return values from these split* functions can be different than urlsplit, as Martin showed in msg270215. It seems that the return values should remain the same for now, but would the differences need to be documented?
    3. These functions are used in requests.py. Should they be changed to use the _split* functions or changed to use urlparse?

    @vadmium
    Copy link
    Member

    vadmium commented Jun 14, 2017

    I don't think it is worth changing the implementations to be in terms of urlsplit or urlparse. This is proposed for splithost in <https://github.com/python/cpython/pull/1849\>, but I suspect it would change the behaviour in some corner cases. See bpo-22852 for some deficiencies with urlsplit.

    1. Change existing usage to the internal _split functions, unless there is a reason (bug or security problem) to make further changes.

    @csabella
    Copy link
    Contributor

    Martin, thank you for the information and for pointing out those other related issues. It makes sense to separate the security or bug issues from this change.

    @ambv
    Copy link
    Contributor

    ambv commented Apr 25, 2018

    New changeset 0250de4 by Łukasz Langa (Cheryl Sabella) in branch 'master':
    bpo-27485: Rename and deprecate undocumented functions in urllib.parse (GH-2205)
    0250de4

    @ambv
    Copy link
    Contributor

    ambv commented Apr 26, 2018

    Thanks! This is now fixed for Python 3.8 \o/

    @ambv ambv added the 3.8 (EOL) end of life label Apr 26, 2018
    @ambv ambv closed this as completed Apr 26, 2018
    @serhiy-storchaka
    Copy link
    Member

    This change made test_urlparse failing when ran with -We. Or just producing a lot of warnings in the default mode.

    ======================================================================
    ERROR: test_splitattr (test.test_urlparse.Utility_Tests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython-gc/Lib/test/test_urlparse.py", line 1113, in test_splitattr
        self.assertEqual(splitattr('/path;attr1=value1;attr2=value2'),
      File "/home/serhiy/py/cpython-gc/Lib/urllib/parse.py", line 1103, in splitattr
        DeprecationWarning, stacklevel=2)
    DeprecationWarning: urllib.parse.splitattr() is deprecated as of 3.8, use urllib.parse.urlparse() instead

    ======================================================================
    ERROR: test_splithost (test.test_urlparse.Utility_Tests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython-gc/Lib/test/test_urlparse.py", line 1006, in test_splithost
        self.assertEqual(splithost('//www.example.org:80/foo/bar/baz.html'),
      File "/home/serhiy/py/cpython-gc/Lib/urllib/parse.py", line 977, in splithost
        DeprecationWarning, stacklevel=2)
    DeprecationWarning: urllib.parse.splithost() is deprecated as of 3.8, use urllib.parse.urlparse() instead

    ======================================================================
    ERROR: test_splitnport (test.test_urlparse.Utility_Tests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython-gc/Lib/test/test_urlparse.py", line 1077, in test_splitnport
        self.assertEqual(splitnport('parrot:88'), ('parrot', 88))
      File "/home/serhiy/py/cpython-gc/Lib/urllib/parse.py", line 1049, in splitnport
        DeprecationWarning, stacklevel=2)
    DeprecationWarning: urllib.parse.splitnport() is deprecated as of 3.8, use urllib.parse.urlparse() instead

    ======================================================================
    ERROR: test_splitpasswd (test.test_urlparse.Utility_Tests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython-gc/Lib/test/test_urlparse.py", line 1050, in test_splitpasswd
        self.assertEqual(splitpasswd('user:ab'), ('user', 'ab'))
      File "/home/serhiy/py/cpython-gc/Lib/urllib/parse.py", line 1013, in splitpasswd
        DeprecationWarning, stacklevel=2)
    DeprecationWarning: urllib.parse.splitpasswd() is deprecated as of 3.8, use urllib.parse.urlparse() instead

    ======================================================================
    ERROR: test_splitport (test.test_urlparse.Utility_Tests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython-gc/Lib/test/test_urlparse.py", line 1066, in test_splitport
        self.assertEqual(splitport('parrot:88'), ('parrot', '88'))
      File "/home/serhiy/py/cpython-gc/Lib/urllib/parse.py", line 1026, in splitport
        DeprecationWarning, stacklevel=2)
    DeprecationWarning: urllib.parse.splitport() is deprecated as of 3.8, use urllib.parse.urlparse() instead

    ======================================================================
    ERROR: test_splitquery (test.test_urlparse.Utility_Tests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython-gc/Lib/test/test_urlparse.py", line 1091, in test_splitquery
        self.assertEqual(splitquery('http://python.org/fake?foo=bar'),
      File "/home/serhiy/py/cpython-gc/Lib/urllib/parse.py", line 1073, in splitquery
        DeprecationWarning, stacklevel=2)
    DeprecationWarning: urllib.parse.splitquery() is deprecated as of 3.8, use urllib.parse.urlparse() instead

    ======================================================================
    ERROR: test_splittag (test.test_urlparse.Utility_Tests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython-gc/Lib/test/test_urlparse.py", line 1101, in test_splittag
        self.assertEqual(splittag('http://example.com?foo=bar#baz'),
      File "/home/serhiy/py/cpython-gc/Lib/urllib/parse.py", line 1088, in splittag
        DeprecationWarning, stacklevel=2)
    DeprecationWarning: urllib.parse.splittag() is deprecated as of 3.8, use urllib.parse.urlparse() instead

    ======================================================================
    ERROR: test_splittype (test.test_urlparse.Utility_Tests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython-gc/Lib/test/test_urlparse.py", line 998, in test_splittype
        self.assertEqual(splittype('type:opaquestring'), ('type', 'opaquestring'))
      File "/home/serhiy/py/cpython-gc/Lib/urllib/parse.py", line 956, in splittype
        DeprecationWarning, stacklevel=2)
    DeprecationWarning: urllib.parse.splittype() is deprecated as of 3.8, use urllib.parse.urlparse() instead

    ======================================================================
    ERROR: test_splituser (test.test_urlparse.Utility_Tests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython-gc/Lib/test/test_urlparse.py", line 1035, in test_splituser
        self.assertEqual(splituser('User:Pass@www.python.org:080'),
      File "/home/serhiy/py/cpython-gc/Lib/urllib/parse.py", line 1000, in splituser
        DeprecationWarning, stacklevel=2)
    DeprecationWarning: urllib.parse.splituser() is deprecated as of 3.8, use urllib.parse.urlparse() instead

    ======================================================================
    ERROR: test_splitvalue (test.test_urlparse.Utility_Tests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython-gc/Lib/test/test_urlparse.py", line 1124, in test_splitvalue
        self.assertEqual(splitvalue('foo=bar'), ('foo', 'bar'))
      File "/home/serhiy/py/cpython-gc/Lib/urllib/parse.py", line 1117, in splitvalue
        DeprecationWarning, stacklevel=2)
    DeprecationWarning: urllib.parse.splitvalue() is deprecated as of 3.8, use urllib.parse.parse_qsl() instead

    ======================================================================
    ERROR: test_to_bytes (test.test_urlparse.Utility_Tests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython-gc/Lib/test/test_urlparse.py", line 1131, in test_to_bytes
        result = urllib.parse.to_bytes('http://www.python.org')
      File "/home/serhiy/py/cpython-gc/Lib/urllib/parse.py", line 920, in to_bytes
        DeprecationWarning, stacklevel=2)
    DeprecationWarning: urllib.parse.to_bytes() is deprecated as of 3.8

    ======================================================================
    ERROR: test_unwrap (test.test_urlparse.Utility_Tests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/serhiy/py/cpython-gc/Lib/test/test_urlparse.py", line 1137, in test_unwrap
        url = urllib.parse.unwrap('<URL:type://host/path>')
      File "/home/serhiy/py/cpython-gc/Lib/urllib/parse.py", line 940, in unwrap
        DeprecationWarning, stacklevel=2)
    DeprecationWarning: urllib.parse.unwrap() is deprecated as of 3.8

    @csabella
    Copy link
    Contributor

    Serhiy,

    Thanks for finding this. I've submitted a PR to fix the tests.

    @serhiy-storchaka
    Copy link
    Member

    New changeset 867b825 by Serhiy Storchaka (Cheryl Sabella) in branch 'master':
    bpo-27485: Change urlparse tests to use private methods. (GH-7070)
    867b825

    @jaraco
    Copy link
    Member

    jaraco commented Feb 3, 2019

    Please refer to bpo-35891 for a description of an important use-case broken by the planned removal of splituser.

    @vstinner
    Copy link
    Member

    vstinner commented Sep 1, 2021

    Follow-up: I created bpo-45084 to remove these undocumented and deprecated functions in Python 3.11.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 (EOL) end of life
    Projects
    None yet
    Development

    No branches or pull requests

    9 participants