Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inaccurate doc string in utils.python.isbinarytext #1389

Closed
GregoryVigoTorres opened this issue Jul 28, 2015 · 12 comments
Closed

inaccurate doc string in utils.python.isbinarytext #1389

GregoryVigoTorres opened this issue Jul 28, 2015 · 12 comments
Labels

Comments

@GregoryVigoTorres
Copy link
Contributor

It says:
"""Return True if the given text is considered binary, or False
otherwise, by looking for binary bytes at their chars
"""
However, instead of returning False, a TypeError is raised if text is not bytes.

context:
Porting responsetypes to Python 3
on branch tmp-py3, just pulled a few minutes ago
running
<tests.test_responsetypes.ResponseTypesTest testMethod=test_from_body>

@kmike
Copy link
Member

kmike commented Jul 28, 2015

Yeah, text argument name is very confusing. This function checks if data (a bytes object) looks like text or like a binary data. So text must be bytes, and the function should raise TypeError, but argument names, docstring and even function name are confusing.

Any suggestions?

@GregoryVigoTorres
Copy link
Contributor Author

It's totally confusing.

How about changing the docstring to:
"""Looks for binary bytes in chars and raises a TypeError if text is not bytes
"""

@kmike
Copy link
Member

kmike commented Jul 28, 2015

I wouldn't focus the docstring on raising this TypeError - it is just an expected type of an argument, not the main function purpose. What about this?

def isbinarytext(text):
    """Return True if the given ``text`` argument (a ``bytes`` object) 
    contains bytes that are uncommon in textual data.
    """

//cc @dangra @eliasdorneles

@eliasdorneles
Copy link
Member

Wouldn't isbinarydata be a better name?

@kmike
Copy link
Member

kmike commented Jul 28, 2015

isbinarydata sounds good to me, it is better than isbinarytext.

@nyov
Copy link
Contributor

nyov commented Jul 29, 2015

I stumbled about the same, while doing this change 691b7f3
and had to go confirm it wouldn't just consider it all binary in py3 now.
It's just checking if there are control characters in it or if it is sane to convert it to a string.

So my vote for more clarity, reversing the expectation:

def binary_is_text(data):
    """Returns True if the given ``data`` argument (a ``bytes`` object) 
    does not contain unprintable control characters.
    """

to write

if binary_is_text(data):
    text = data.decode(encoding)

though it breaks the naming convention of is....

@kmike
Copy link
Member

kmike commented Jul 29, 2015

I like @nyov's suggestion.

@GregoryVigoTorres
Copy link
Contributor Author

Could isinstance(text, six.binary_type) be used instead?

Could isbinarytextbe renamed isbinary or isbinarytype?

I also think @nyov's doc string is OK.

@kmike
Copy link
Member

kmike commented Jul 29, 2015

Could isinstance(text, six.binary_type) be used instead?
Could isbinarytextbe renamed isbinary or isbinarytype?

No, isbinarytext is different. It is not checking the type of argument, it checks its contents. Scrapy receives bytes from network and tries to detect how to decode them; usually there are some clues like HTTP headers, but if conventional methods fail Scrapy uses a heuristic: if contents looks like text (i.e. it doesn't contain control characters) then Scrapy tries to decode it as text. This is what isbinarytext is checking. If used correctly, it should only receive bytes; TypeError is just a sanity check.

This detection method is not perfect, and this function was intended to be internal, an implementation detail.

@GregoryVigoTorres
Copy link
Contributor Author

yeah, I'm seeing that six.binary_typeis not the same.
I guess my real issue is that isbinarytext raises an exception instead of returning False.

@nyov
Copy link
Contributor

nyov commented Jul 29, 2015

If you feed it the correct input, it doesn't raise any exception as @kmike already noted.
You can't feed a complex datatype to a function which expects an integer argument, either, without getting an exception.

@GregoryVigoTorres
Copy link
Contributor Author

I see this now. I was confused about something.
I still think it's a little unclear, but it's not really a bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants