inaccurate doc string in utils.python.isbinarytext #1389
Comments
Yeah, any suggestions?
It's totally confusing. How about changing the docstring to:
I wouldn't focus the docstring on raising this TypeError - it is just an expected type of an argument, not the main function purpose. What about this?
//cc @dangra @eliasdorneles
Wouldn't
isbinarydata sounds good to me; it is better than isbinarytext.
I stumbled over the same thing while making this change (691b7f3). So my vote is for more clarity, reversing the expectation:

    def binary_is_text(data):
        """Returns True if the given ``data`` argument (a ``bytes`` object)
        does not contain unprintable control characters.
        """

to write

    if binary_is_text(data):
        text = data.decode(encoding)

though it breaks the naming convention of
I like @nyov's suggestion.
Could Could I also think @nyov's doc string is OK.
No, isbinarytext is different. It does not check the type of its argument; it checks its contents. Scrapy receives bytes from the network and tries to detect how to decode them. Usually there are clues such as HTTP headers, but if the conventional methods fail, Scrapy falls back to a heuristic: if the contents look like text (i.e. they contain no control characters), Scrapy tries to decode them as text. This is what isbinarytext checks. Used correctly, it should only ever receive bytes; the TypeError is just a sanity check. This detection method is not perfect, and the function was intended to be internal, an implementation detail.
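For anyone skimming the thread, the heuristic described above could be sketched roughly as follows. This is not Scrapy's actual code: the name `looks_like_text` is made up for illustration, and the set of bytes treated as control characters (everything below 0x20 except tab, line feed and carriage return) is an assumption, so the real function may classify some bytes differently.

```python
# Hypothetical helper for illustration only; not Scrapy's implementation.
# Assumption: bytes below 0x20, other than tab (0x09), line feed (0x0A)
# and carriage return (0x0D), are treated as "binary" control characters.
_CONTROL_BYTES = frozenset(range(32)) - {0x09, 0x0A, 0x0D}


def looks_like_text(data):
    """Return True if ``data`` (a ``bytes`` object) contains no
    control characters from ``_CONTROL_BYTES``.
    """
    # Sanity check: the heuristic inspects contents, so it needs bytes.
    if not isinstance(data, bytes):
        raise TypeError("data must be bytes, got %r" % type(data).__name__)
    # Iterating over bytes yields integers in Python 3.
    return not any(byte in _CONTROL_BYTES for byte in data)
```

With this sketch, bytes input always produces True or False, and the TypeError only appears when something other than bytes is passed in, which matches the "sanity check" description.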
Yeah, I'm seeing that
If you feed it the correct input, it doesn't raise any exception, as @kmike already noted.
I see this now. I was confused about something.
It says:
"""Return True if the given text is considered binary, or False
otherwise, by looking for binary bytes at their chars
"""
However, instead of returning False, a TypeError is raised if text is not bytes.
context:
Porting responsetypes to Python 3
on branch tmp-py3, just pulled a few minutes ago
running
<tests.test_responsetypes.ResponseTypesTest testMethod=test_from_body>