New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix string length computation for issue_command function #33

Open
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@M157q
Copy link

M157q commented Jun 28, 2017

When arg is not a ascii char,
len(arg) may be differrent from len(arg.encode("utf-8")).
For example, a Chinese character '哈',
len('哈') is 1, but len('哈'.encode("utf-8")) is 3,
which will raise an InvalidResponseError because of the wrong length.

This PR should fix this kind of problem.

@M157q

This comment has been minimized.

Copy link
Author

M157q commented Jun 28, 2017

hmm... looks like #28 and #31 are also dealing with this problem.
I should add a patch which can solve it once for all and this is related to the annoyed problem of unicode and bytes of Python 2 and Python 3.

A quick, trivial yet not beautiful patch for issue_command
+ Determine the Python version of current environment
+ Deal with the args according to py2 or py3

@M157q M157q force-pushed the Tagtoo:fix-issue-command branch from e4f3c9c to e0d9072 Jun 28, 2017

@M157q

This comment has been minimized.

Copy link
Author

M157q commented Jun 29, 2017

@quique0194, @pjrobertson:
Could both of you help me testing if this patch works for both of you in your scenario?
I've tested this patch in both Python 2 and Python 3 with this simple script and it worked for me,
I would be appreciate if you can help testing this patch,
since one of you is using Python 2 and another is using Python 3.
Thank you.
(Click Details to see the script)

# -*- coding: utf-8 -*-

import sys
import dryscrape


dryscrape.start_xvfb()
br = dryscrape.Session()
br.visit("https://google.com")


if sys.version_info[0] == 2:
    PY2 = True
    PY3 = False
if sys.version_info[0] == 3:
    PY2 = False
    PY3 = True


def test_py2():
    q = br.at_xpath('//*[@name="q"]')
    q.set(u"canción")
    q.form().submit()
    br.render("py2-unicode.png")

    q = br.at_xpath('//*[@name="q"]')
    q.set("北京")
    q.form().submit()
    br.render("py2-str.png")

    q = br.at_xpath('//*[@name="q"]')
    q.set(b"123")
    q.form().submit()
    br.render("py2-bytes.png")


def test_py3():
    q = br.at_xpath('//*[@name="q"]')
    q.set("北京")
    q.form().submit()
    br.render("py3-unicode.png")

    q = br.at_xpath('//*[@name="q"]')
    q.set("canción".encode("utf-8"))
    q.form().submit()
    br.render("py3-bytes.png")


if __name__ == "__main__":
    if PY2:
        test_py2()
    if PY3:
        test_py3()

    print("Done")
""" Deal with the unicode and bytes problem in Python 2 and Python 3 for socket.sendall """

if PY2:
if type(arg) != unicode:

This comment has been minimized.

@M157q

M157q Jun 29, 2017

Author

I am not sure if this raise an NameError in Python 3 since there's no unicode type in Python 3.
But I've tested it in Python 3 and it worked.
Would be appreciate if anyone has some advices on this.

This comment has been minimized.

@PhilippVerpoort

PhilippVerpoort Jul 19, 2017

While this would be a problem in other languages such as C++, where identifiers need to be defined at compilation time even if they are never used at runtime, this should not be an issue for Python, since here definitions of identifiers are only checked when encountered, and not during the syntax check.

In Python, it is also common to import packages only when they are required, and therefore import statements can be nested into if statements, indicating that an identifier does not have to be defined until it is encountered by the interpreter.

Therefore, I do not expect to find any problems with this line of code. In addition, I have successfully tested the suggested changes while observing no error messages or warnings.

arg = str(arg)
self._writeline(str(len(arg)))
self._sock.sendall(arg.encode("utf-8"))
self.send_arg(arg)

This comment has been minimized.

@PhilippVerpoort

PhilippVerpoort Jul 19, 2017

This file seems to be using an indentation of 2 spaces rather than 4 consistently throughout the file. I suggest correcting the indentation of line 523.

This comment has been minimized.

@M157q

M157q Jul 20, 2017

Author

Thanks, I've fixed it in this commit 62ad2e9.

@PhilippVerpoort

This comment has been minimized.

Copy link

PhilippVerpoort commented Jul 19, 2017

I came across the same problem, and found a very similar solution. I can confirm that the fundamental problem is the computed length of each argument as len(arg) in line 517 of the changed file.

I have reviewed the solution presented in this pull request. I find the updated code well written, and have successfully tested the updated version with both Python2 and Python3 using the provided example script from Dryscrape, as well as a script that I had written myself for a different purpose. In response to @M157q's comment on the use of the unicode identifier after checking for Python2, I believe that this should not be a problem (see my reply to the comment). I have added one comment to the changed file requesting a fix for the indentation.

I suggest to the maintainers of this repository to merge this pull request #33, to reject pull requests #28 and #31, and to mark issues #27 and #30 as fixed (or duplicates where appropriate).

Change the indentation from 4 to 2 for line 523
To fit the coding style of this project.

ref: #33 (comment)
@niklasb

This comment has been minimized.

Copy link
Owner

niklasb commented Jul 26, 2017

@M157q thank you for the patch and @PhilippVeerport thank you for your detailed analysis. I will try to look at this ASAP.

@niklasb

This comment has been minimized.

Copy link
Owner

niklasb commented Aug 22, 2017

@M157q @PhilippVerpoort Thank you for your contribution. Unfortunately I have decided to no longer maintain dryscrape or webkit-server, since properly doing so would require rewriting webkit-server using QtWebEngine. Consider using a different package for your scraping if you do not want to be vulnerable to old WebKit bugs.

@PhilippVerpoort

This comment has been minimized.

Copy link

PhilippVerpoort commented Aug 22, 2017

@niklasb Thanks for your reply. This is completely fair, but would you consider implementing this change and filing a new release with this bug fix anyway? At least people who do consider for some reason to use this piece of software will not have to go through the same process of manually applying this patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment