mimetypes.guess_type("//example.com") misinterprets host name as file name #66543

Open
vadmium opened this issue Sep 6, 2014 · 15 comments
Labels
3.7 only security fixes 3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@vadmium
Member

vadmium commented Sep 6, 2014

BPO 22347
Nosy @ned-deily, @vadmium, @maxking, @corona10, @miss-islington
PRs
  • bpo-22347: Update mimetypes.guess_type to allow proper parsing of URLs #15522
  • [3.8] bpo-22347: Update mimetypes.guess_type to allow proper parsing of URLs (GH-15522) #15685
  • [3.7] bpo-22347: Update mimetypes.guess_type to allow proper parsing of URLs (GH-15522) #15687
  • bpo-38449: Revert "bpo-22347: Update mimetypes.guess_type to allow oper parsing of URLs (GH-15522)" #16724
  • [3.7] bpo-38449: Revert "bpo-22347: Update mimetypes.guess_type to allow oper parsing of URLs (GH-15522)" (GH-16724) #16725
  • [3.7] bpo-38449: Revert "bpo-22347: Update mimetypes.guess_type to allow oper parsing of URLs (GH-15685)" (GH-16724) #16727
  • [3.8] bpo-38449: Revert "bpo-22347: Update mimetypes.guess_type to allow oper parsing of URLs (GH-15685)" (GH-16724) #16728
Dependencies
  • bpo-35939: Remove urllib.parse._splittype from mimetypes.guess_type
Files
  • mimetypes-host.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2014-09-06.02:52:37.225>
    labels = ['3.7', '3.8', 'type-bug', 'library']
    title = 'mimetypes.guess_type("//example.com") misinterprets host name as file name'
    updated_at = <Date 2019-10-15.07:30:25.613>
    user = 'https://github.com/vadmium'

    bugs.python.org fields:

    activity = <Date 2019-10-15.07:30:25.613>
    actor = 'ned.deily'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2014-09-06.02:52:37.225>
    creator = 'martin.panter'
    dependencies = ['35939']
    files = ['38222']
    hgrepos = []
    issue_num = 22347
    keywords = ['patch']
    message_count = 15.0
    messages = ['226467', '236479', '335123', '351156', '351157', '351158', '351162', '351164', '351167', '354471', '354521', '354535', '354538', '354543', '354697']
    nosy_count = 5.0
    nosy_names = ['ned.deily', 'martin.panter', 'maxking', 'corona10', 'miss-islington']
    pr_nums = ['15522', '15685', '15687', '16724', '16725', '16727', '16728']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue22347'
    versions = ['Python 3.7', 'Python 3.8']

    @vadmium
    Member Author

    vadmium commented Sep 6, 2014

    The documentation says that guess_type() takes a URL, but:

    >>> mimetypes.guess_type("http://example.com")
    ('application/x-msdownload', None)

    I suspect the MS download is a reference to *.com files (like DOS's command.com). My current workaround is to strip out the host name from the URL, since I cannot imagine it would be useful for determining the content type. I am also stripping the fragment part. An argument could probably be made for stripping the “;parameters” and “?query” parts as well.

    >>> # Workaround for mimetypes.guess_type("//example.com")
    ... # interpreting host name as file name
    ... url = urlparse("http://example.com")
    >>> url = net.url_replace(url, netloc="", fragment="")
    >>> url
    'http://'
    >>> mimetypes.guess_type(url, strict=False)
    (None, None)
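
The `net.url_replace` helper in the transcript above is not part of the standard library, so the workaround cannot be run as posted. A minimal sketch of the same idea using only `urllib.parse`, assuming the helper simply replaces URL components and re-serialises the result (the name `strip_host_and_fragment` is hypothetical):

```python
import mimetypes
from urllib.parse import urlsplit, urlunsplit


def strip_host_and_fragment(url):
    """Hypothetical stand-in for the reporter's net.url_replace() helper.

    Blanks the netloc (host) and fragment so that neither can be
    mistaken for a file name with an extension.
    """
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc="", fragment=""))


# The host "example.com" is gone before guess_type() ever sees the URL,
# so its ".com" suffix can no longer be treated as a file extension.
stripped = strip_host_and_fragment("http://example.com")
print(mimetypes.guess_type(stripped, strict=False))
```

This keeps the query and parameters intact, matching the workaround as described; stripping those as well would be a separate decision.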

    @vadmium vadmium added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Sep 6, 2014
    @vadmium
    Member Author

    vadmium commented Feb 24, 2015

    Posting a patch to fix this. It passes the URL through a urlsplit() → urlunsplit() stage, while removing the scheme://netloc parts.
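
The attached patch is not reproduced in this thread. As a rough sketch of the approach it describes, assuming the URL is split, the scheme and netloc are dropped, and the remainder is reassembled before guessing (`guess_type_ignoring_host` is an illustrative name, not taken from mimetypes-host.patch):

```python
import mimetypes
from urllib.parse import urlsplit, urlunsplit


def guess_type_ignoring_host(url, strict=True):
    """Illustrative only; not the code from mimetypes-host.patch."""
    parts = urlsplit(url)
    # Reassemble the URL with an empty scheme and netloc, so only the
    # path, query and fragment remain for the extension lookup.
    local = urlunsplit(("", "", parts.path, parts.query, parts.fragment))
    return mimetypes.guess_type(local, strict=strict)


print(guess_type_ignoring_host("http://example.com"))
print(guess_type_ignoring_host("http://example.com/archive.tar.gz"))
```

Because the query and fragment are kept in this sketch, a URL such as `http://example.com/a.html?x=1` may still fail to match on its extension; whether the patch handles that case is not stated in the thread.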

    @corona10
    Member

    corona10 commented Feb 9, 2019

    The patch I proposed on bpo-35939 also solves the above situation.

    Python 3.8.0a1+ (heads/bpo-12317:96d37dbcd2, Feb  8 2019, 12:03:40)
    [Clang 9.1.0 (clang-902.0.39.1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import mimetypes
    >>> mimetypes.guess_type("http://example.com")
    (None, None)
    >>> mimetypes.guess_type("example.com")
    ('application/x-msdownload', None)
    >>>

    I've also added the unit tests from mimetypes-host.patch. It works well.
    I think we can close this issue as well once bpo-35939 is closed.

    Thanks a lot!

    @corona10 corona10 added 3.7 only security fixes 3.8 only security fixes labels Feb 9, 2019
    @miss-islington
    Contributor

    New changeset 87bd207 by Miss Islington (bot) (Dong-hee Na) in branch 'master':
    bpo-22347: Update mimetypes.guess_type to allow proper parsing of URLs (GH-15522)
    87bd207

    @miss-islington
    Contributor

    New changeset 6d7a786 by Miss Islington (bot) in branch '3.8':
    bpo-22347: Update mimetypes.guess_type to allow proper parsing of URLs (GH-15522)
    6d7a786

    @miss-islington
    Contributor

    New changeset 8873bff by Miss Islington (bot) (Dong-hee Na) in branch '3.7':
    [3.7] bpo-22347: Update mimetypes.guess_type to allow proper parsing of URLs (GH-15522) (GH-15687)
    8873bff

    @corona10
    Member

    corona10 commented Sep 5, 2019

    @vstinner (my mentor) @maxking
    This issue is now solved.
    I'd like to close it. Is that okay?

    @maxking
    Contributor

    maxking commented Sep 5, 2019

    I think so, yes.

    Also, while you are at it, can you also close bpo-35939 with a comment that points to this issue and the right PR for the fix?

    @corona10
    Member

    corona10 commented Sep 5, 2019

    Great! I will close bpo-35939 also.

    @corona10 corona10 closed this as completed Sep 5, 2019
    @ned-deily
    Member

    This change introduces a potential 3.7 regression; see bpo-38449.

    @miss-islington
    Contributor

    New changeset 19a3d87 by Miss Islington (bot) (Abhilash Raj) in branch 'master':
    bpo-38449: Revert "bpo-22347: Update mimetypes.guess_type to allow oper parsing of URLs (GH-15522)" (GH-16724)
    19a3d87

    @maxking
    Contributor

    maxking commented Oct 12, 2019

    I am going to re-open this since the fixes were reverted in all the branches.

    @maxking maxking reopened this Oct 12, 2019
    @maxking
    Contributor

    maxking commented Oct 12, 2019

    New changeset 5a638a8 by Abhilash Raj in branch '3.8':
    [3.8] bpo-38449: Revert "bpo-22347: Update mimetypes.guess_type to allow oper parsing of URLs" (GH-16724) (GH-16728)
    5a638a8

    @miss-islington
    Contributor

    New changeset 164bee2 by Miss Islington (bot) (Abhilash Raj) in branch '3.7':
    [3.7] bpo-38449: Revert "bpo-22347: Update mimetypes.guess_type to allow oper parsing of URLs (GH-15685)" (GH-16724) (GH-16727)
    164bee2

    @ned-deily
    Member

    New changeset 2a40559 by Ned Deily (Abhilash Raj) in branch '3.7':
    [3.7] bpo-38449: Revert "bpo-22347: Update mimetypes.guess_type to allow oper parsing of URLs (GH-15685)" (GH-16724) (GH-16727)
    2a40559

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022