inaccurate doc string in utils.python.isbinarytext #1389

GregoryVigoTorres · 2015-07-28T12:53:55Z

It says:
"""Return True if the given text is considered binary, or False
otherwise, by looking for binary bytes at their chars
"""
However, instead of returning False, a TypeError is raised if text is not bytes.

context:
Porting responsetypes to Python 3
on branch tmp-py3, just pulled a few minutes ago
running
<tests.test_responsetypes.ResponseTypesTest testMethod=test_from_body>

kmike · 2015-07-28T13:12:20Z

Yeah, the text argument name is very confusing. This function checks whether data (a bytes object) looks like text or like binary data. So text must be bytes, and the function should raise TypeError, but the argument names, docstring and even the function name are confusing.

Any suggestions?

GregoryVigoTorres · 2015-07-28T13:28:25Z

It's totally confusing.

How about changing the docstring to:
"""Looks for binary bytes in chars and raises a TypeError if text is not bytes
"""

kmike · 2015-07-28T20:34:59Z

I wouldn't focus the docstring on raising this TypeError - it is just the expected type of an argument, not the function's main purpose. What about this?

def isbinarytext(text):
    """Return True if the given ``text`` argument (a ``bytes`` object) 
    contains bytes that are uncommon in textual data.
    """

//cc @dangra @eliasdorneles

eliasdorneles · 2015-07-28T20:50:56Z

Wouldn't isbinarydata be a better name?

kmike · 2015-07-28T20:53:01Z

isbinarydata sounds good to me, it is better than isbinarytext.

nyov · 2015-07-29T05:33:32Z

I stumbled over the same thing while making this change (691b7f3),
and had to go confirm it wouldn't just consider everything binary in py3 now.
It's just checking whether there are control characters in it, i.e. whether it is sane to convert it to a string.

So my vote for more clarity, reversing the expectation:

def binary_is_text(data):
    """Returns True if the given ``data`` argument (a ``bytes`` object) 
    does not contain unprintable control characters.
    """

to write

if binary_is_text(data):
    text = data.decode(encoding)

though it breaks the naming convention of is....

kmike · 2015-07-29T07:48:32Z

I like @nyov's suggestion.

GregoryVigoTorres · 2015-07-29T11:39:14Z

Could isinstance(text, six.binary_type) be used instead?

Could isbinarytext be renamed isbinary or isbinarytype?

I also think @nyov's doc string is OK.

kmike · 2015-07-29T11:55:23Z

Could isinstance(text, six.binary_type) be used instead?
Could isbinarytext be renamed isbinary or isbinarytype?

No, isbinarytext is different. It does not check the type of the argument, it checks its contents. Scrapy receives bytes from the network and tries to detect how to decode them; usually there are some clues like HTTP headers, but if conventional methods fail Scrapy uses a heuristic: if the contents look like text (i.e. they don't contain control characters) then Scrapy tries to decode them as text. This is what isbinarytext is checking. If used correctly, it should only receive bytes; TypeError is just a sanity check.

This detection method is not perfect, and this function was intended to be internal, an implementation detail.
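
For readers following along, the heuristic described above can be sketched roughly as follows. This is a minimal illustration for Python 3, not Scrapy's actual code: the exact set of bytes treated as "control characters" (here 0-31 except tab, LF and CR) is an assumption, and `decode_if_text` is a hypothetical helper added only to show the call pattern.

```python
# Bytes 0-31 are ASCII control characters. Tab, line feed and carriage
# return are exempted here because they are common in ordinary text
# (this exact exemption list is an assumption of the sketch).
_CONTROL_BYTES = frozenset(range(32)) - {0x09, 0x0A, 0x0D}


def binary_is_text(data):
    """Return True if ``data`` (a ``bytes`` object) contains no
    unprintable control characters."""
    if not isinstance(data, bytes):
        # Sanity check on the argument type, not a "binary" verdict.
        raise TypeError("data must be bytes, got %r" % type(data).__name__)
    # Iterating over a bytes object yields ints on Python 3.
    return not any(byte in _CONTROL_BYTES for byte in data)


def decode_if_text(data, encoding="utf-8"):
    """Hypothetical helper: decode only when the heuristic says 'text'."""
    if binary_is_text(data):
        # errors="replace" keeps the sketch from raising on odd bytes.
        return data.decode(encoding, errors="replace")
    return None
```

Note that the function inspects the contents of the bytes, so an `isinstance` check alone could not replace it; the `isinstance` call only guards the input type.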

GregoryVigoTorres · 2015-07-29T12:13:48Z

yeah, I'm seeing that six.binary_type is not the same.
I guess my real issue is that isbinarytext raises an exception instead of returning False.

nyov · 2015-07-29T13:30:24Z

If you feed it the correct input, it doesn't raise any exception as @kmike already noted.
You can't feed a complex datatype to a function which expects an integer argument, either, without getting an exception.
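
As a small illustration of that point, a built-in that expects an integer behaves the same way: a wrong argument type raises TypeError rather than returning some "negative" answer. The helper name `show_type_error` below is made up for this sketch.

```python
def show_type_error(func, arg):
    """Call func(arg) and report whether a TypeError was raised."""
    try:
        func(arg)
    except TypeError as exc:
        return "TypeError: %s" % exc
    return "no error"


# chr() expects an integer; a complex number is rejected with TypeError.
bad = show_type_error(chr, 1 + 2j)
# A valid integer argument goes through without any exception.
good = show_type_error(chr, 65)
```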

GregoryVigoTorres · 2015-07-29T14:42:33Z

I see this now. I was confused about something.
I still think it's a little unclear, but it's not really a bug.

Closes scrapy#1389

Closes #1389

redapple added the docs label Jan 27, 2016

nyov added a commit to nyov/scrapy that referenced this issue Mar 6, 2016

Rename isbinarytext function to binary_is_text for clarity

cfecdbf

Closes scrapy#1389

nyov mentioned this issue Mar 6, 2016

[MRG+1] Rename isbinarytext function to binary_is_text for clarity #1851

Merged

nyov added a commit to nyov/scrapy that referenced this issue Mar 7, 2016

Rename isbinarytext function to binary_is_text for clarity

7e86bf5

Closes scrapy#1389

nyov added a commit to nyov/scrapy that referenced this issue Mar 7, 2016

Rename isbinarytext function to binary_is_text for clarity

99c48b5

Closes scrapy#1389

nyov added a commit to nyov/scrapy that referenced this issue Mar 8, 2016

Rename isbinarytext function to binary_is_text for clarity

4a12c00

Closes scrapy#1389

nyov added a commit to nyov/scrapy that referenced this issue Mar 17, 2016

Rename isbinarytext function to binary_is_text for clarity

ebf0efc

Closes scrapy#1389

redapple closed this as completed in e8ca467 Mar 31, 2016

redapple pushed a commit that referenced this issue Mar 31, 2016

Rename isbinarytext function to binary_is_text for clarity

5ae8863

Closes #1389
