Make use of content-type to detect format #252
tabulator/helpers.py
@@ -17,6 +18,19 @@

# Module API

# Maps mime-type to format
Why not use https://docs.python.org/3.6/library/mimetypes.html ?
tabulator/helpers.py
req = requests.head(source, allow_redirects=True)
if req.status_code == requests.codes.ok:
    content_type = req.headers.get('Content-type')
    if not content_type is None:
if content_type is not None
is more readable, I think.
tabulator/helpers.py
if req.status_code == requests.codes.ok:
    content_type = req.headers.get('Content-type')
    if not content_type is None:
        format = CONTENT_TYPE_FORMAT.get(content_type)
Here you can use mimetypes.guess_extension(...).
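A minimal sketch of this suggestion, under two assumptions not in the PR: the header value is normalized first (a raw value such as 'text/csv; charset=utf-8' would miss both a dict lookup and guess_extension), and guesses are filtered through a small allow-list so a generic type does not turn into a non-tabular format. The function name format_from_mimetypes and the SUPPORTED_FORMATS set are hypothetical, not tabulator API.

```python
import mimetypes

# Hypothetical allow-list: only extensions that name a tabular format pass.
SUPPORTED_FORMATS = {'csv', 'tsv', 'xls', 'xlsx', 'ods', 'json'}


def format_from_mimetypes(content_type):
    """Guess a tabular format from a Content-Type header value via mimetypes."""
    if not content_type:
        return None
    # Drop parameters and fold case: 'Text/CSV; charset=utf-8' -> 'text/csv'
    mime_type = content_type.split(';', 1)[0].strip().lower()
    # guess_extension returns e.g. '.csv', or None for unknown types
    extension = mimetypes.guess_extension(mime_type)
    if extension is None:
        return None
    fmt = extension.lstrip('.')
    # Reject guesses that are not tabular formats (e.g. plain-text extensions)
    return fmt if fmt in SUPPORTED_FORMATS else None
```

When a MIME type maps to several extensions, which one guess_extension picks can depend on the Python version and the system's mime.types files, which is one reason for the allow-list. With it, 'text/plain' (the GitHub raw-file case raised later in this thread) yields None instead of a wrong guess.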
tabulator/helpers.py
@@ -58,6 +72,14 @@ def detect_scheme_and_format(source):
if query_string_format is not None and len(query_string_format) == 1:
    format = query_string_format[0]

# Test if format info can be extracted from Content-type header
elif source.startswith('http'):
    req = requests.head(source, allow_redirects=True)
Some exception handling is needed here.
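A stdlib-only sketch of the HEAD request with its failure modes handled, so that format detection returns None instead of raising when the server is unreachable, times out, or answers with an error status. It swaps requests for urllib.request to stay dependency-free; with requests, the same idea is passing timeout= to requests.head and catching requests.RequestException. The helper name head_content_type and the 5-second default timeout are illustrative assumptions.

```python
import http.client
import urllib.request
from urllib.parse import urlparse


def head_content_type(source, timeout=5.0):
    """Return the Content-Type of a HEAD response, or None on any failure.

    Hypothetical helper; the default timeout is an assumption.
    """
    # Only attempt a request for HTTP(S) URLs
    if urlparse(source).scheme not in ('http', 'https'):
        return None
    request = urllib.request.Request(source, method='HEAD')
    try:
        # urlopen follows redirects and raises HTTPError on 4xx/5xx statuses
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.headers.get('Content-Type')
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers URLError/HTTPError, DNS and connection failures,
        # and socket timeouts; HTTPException covers malformed responses
        return None
```

Treating every failure as "no information" keeps detection falling through to its other strategies rather than breaking a read that might have succeeded anyway.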
@pierredittgen Looks good, I added some comments.
@akariv If I remember correctly, we already had this implemented at some point (it should be possible to find the implementation in the history), but I disabled it because it was really inaccurate; the main problem was with GitHub MIME types. Probably we need a flag to make this feature optional (disabled by default), or something like this.
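A sketch of the opt-in behaviour suggested here, under stated assumptions: the keyword name detect_by_content_type and the injected lookup_format callable are hypothetical (the real option name is not settled in this thread), and the function is a simplification of the detection order described in the PR, not tabulator's detect_scheme_and_format. With the flag off, which is the default, no HEAD request is made.

```python
import posixpath
from urllib.parse import parse_qs, urlparse


def detect_format(source, detect_by_content_type=False, lookup_format=None):
    """Infer a format from a URL, consulting the server only when opted in.

    Hypothetical sketch: lookup_format(source) stands in for the HEAD request
    plus Content-Type mapping and returns a format string or None.
    """
    parsed = urlparse(source)
    # 1. File suffix in the URL path: /data.csv -> 'csv'
    suffix = posixpath.splitext(parsed.path)[1].lstrip('.').lower()
    if suffix:
        return suffix
    # 2. A single ?format=... query parameter: ?format=xls -> 'xls'
    values = parse_qs(parsed.query).get('format')
    if values is not None and len(values) == 1:
        return values[0]
    # 3. Content-Type lookup, disabled by default to avoid extra network
    #    requests and unreliable server MIME types
    if detect_by_content_type and lookup_format is not None:
        return lookup_format(source)
    return None
```

Because the server is consulted only as a last resort, an explicit suffix or format parameter always wins over a misleading Content-Type.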
I think that GitHub serves all raw files as text to avoid abuse by third parties (such as using them as a hosting service for images and videos). 'Regular' files from the internet should have more accurate MIME types, I think.
But controlling this behaviour using a flag is not a bad idea.
Has this pull request been reviewed and accepted? What could @pierredittgen and I do to move it forward? The parameter
@cbenz Happy to review, but I'm not sure what the status of the PR is (the flag, tests).
What is missing to merge this PR?
Sorry for the really late reply. I'm probably slightly out of sync and could be wrong. But I have a few questions:
In fact, in our code, we directly call the
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@roll It seems a lot of work and progress was made here, but it was never completely finished. What's left to do in order to merge this PR? On our side, @pierredittgen is still available to help close this. :)
The main problem was that it'd been intended to update the
We need to add a publicly available option to the And probably copy this code to something like
Currently, when using tabulator with URL sources, the format is inferred from the format URL parameter (e.g. https://example.com/foobar?format=xls&param=baz -> xls).

One step beyond: when a URL has neither a format suffix nor a format URL parameter, tabulator can send a HEAD request and infer the format from the Content-type header. E.g. the content type 'text/csv' means csv format.

The Content-type to format mapping is defined as a dict (CONTENT_TYPE_FORMAT) that can be extended as needed to support more MIME types.
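A minimal sketch of such an extensible mapping, assuming two details the PR text does not spell out: header values are normalized (parameters such as '; charset=utf-8' removed, case folded) before the lookup, and callers can register extra MIME types. The entries below and the register_content_type and format_from_content_type helpers are illustrative, not the PR's actual table or API.

```python
# Illustrative Content-Type -> format table; the PR's CONTENT_TYPE_FORMAT
# plays this role, but these particular entries are assumptions.
CONTENT_TYPE_FORMAT = {
    'text/csv': 'csv',
    'text/tab-separated-values': 'tsv',
    'application/vnd.ms-excel': 'xls',
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': 'xlsx',
    'application/vnd.oasis.opendocument.spreadsheet': 'ods',
    'application/json': 'json',
}


def normalize_mime_type(content_type):
    """Strip parameters and fold case: 'Text/CSV; charset=utf-8' -> 'text/csv'."""
    return content_type.split(';', 1)[0].strip().lower()


def register_content_type(content_type, fmt):
    """Hypothetical helper for extending the table at runtime."""
    CONTENT_TYPE_FORMAT[normalize_mime_type(content_type)] = fmt


def format_from_content_type(content_type):
    """Map a Content-Type header value to a format, or None if unknown."""
    if not content_type:
        return None
    return CONTENT_TYPE_FORMAT.get(normalize_mime_type(content_type))
```

For example, format_from_content_type('text/csv; charset=utf-8') returns 'csv', while 'text/plain' returns None because it is deliberately left out of the table, so generic text responses do not override other detection strategies.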