This repository has been archived by the owner on Dec 9, 2018. It is now read-only.

Method Not Allowed (error code 202) #8

Closed
voxsim opened this issue Apr 23, 2012 · 9 comments

voxsim commented Apr 23, 2012

This is my code:

import dryscrape


# set up a web scraping session
sess = dryscrape.Session(base_url = 'http://www.udacity.com/')

# there are some failing HTTP requests, so we need to enter
# a more error-resistant mode (like real browsers do)
sess.set_error_tolerant(True)

# we don't need images
sess.set_attribute('auto_load_images', False)

# visit homepage and log in
print "Logging in..."
sess.visit('/')

email_field = sess.at_xpath('//input[@name="email"]')
print email_field
password_field = sess.at_xpath('//input[@name="password"]')
print password_field

email_field.set(USERNAME)
password_field.set(PASSWORD)
email_field.form().submit()

And this is the output:

Logging in...
<Node #/html/body/div[@id='not-footer']/div[@id='top_bin']/div[@id='top_content']/div/div[@id='user-topbar-button-overlay']/form[@id='signin-form']/div[1]/input[1]>
<Node #/html/body/div[@id='not-footer']/div[@id='top_bin']/div[@id='top_content']/div/div[@id='user-topbar-button-overlay']/form[@id='signin-form']/div[1]/input[2]>
<Node #/html/body/div[@id='not-footer']/div[@id='top_bin']/div[@id='top_content']/div/div[@id='user-topbar-button-overlay']/form[@id='signin-form']>
Traceback (most recent call last):
  File "prova.py", line 30, in <module>
    email_field.form().submit()
  File "/home/simon/projects/udacity_downloader/dryscrape/driver/webkit_server/__init__.py", line 97, in submit
    self.client.wait()
  File "/home/simon/projects/udacity_downloader/dryscrape/driver/webkit_server/__init__.py", line 224, in wait
    self.conn.issue_command("Wait")
  File "/home/simon/projects/udacity_downloader/dryscrape/driver/webkit_server/__init__.py", line 429, in issue_command
    return self._read_response()
  File "/home/simon/projects/udacity_downloader/dryscrape/driver/webkit_server/__init__.py", line 438, in _read_response
    raise InvalidResponseError, self._read_message()
dryscrape.driver.webkit_server.InvalidResponseError: Error while loading URL http://www.udacity.com/: Error downloading http://www.udacity.com/ - server replied: Method Not Allowed (error code 202)

Any suggestions on how to resolve this problem?

voxsim (Author) commented Apr 24, 2012

I found the problem: submitting the form directly isn't the right approach, because the site hides how the data is actually submitted. I tried simulating a click on the "GO" button, but that doesn't do anything.

voxsim closed this as completed Apr 24, 2012
voxsim reopened this Apr 24, 2012

niklasb (Owner) commented Apr 25, 2012

You're not the first to request more error tolerance to make these server-side errors non-fatal. I will try and find a solution as soon as I find the time.

voxsim (Author) commented Apr 25, 2012

It's not your fault! The fault lies with Udacity and its use of jQuery.ajax(). I solved it by injecting JavaScript into the page (https://github.com/voxsim/udacity_downloader/blob/master/udacity.py)
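
For readers who want a picture of what such an injection could look like, here is a minimal sketch. It is not taken from the linked udacity.py, which may do something quite different. The field names and the form id come from the output earlier in this thread; the helper name `build_login_script` is hypothetical, and it assumes that the page loads jQuery and binds its login logic to the form's submit event.

```python
import json


def build_login_script(email, password, form_id="signin-form"):
    """Return JavaScript that fills the sign-in form and fires its submit event.

    Hypothetical helper: it assumes the page defines jQuery and handles
    the login inside a jQuery submit handler on the form.
    """
    # json.dumps produces valid JavaScript string literals, so quotes or
    # backslashes in the credentials cannot break out of the script.
    return (
        "(function () {\n"
        "  var form = document.getElementById(%s);\n"
        "  form.querySelector('input[name=\"email\"]').value = %s;\n"
        "  form.querySelector('input[name=\"password\"]').value = %s;\n"
        "  // assumption: the site's own handler is bound through jQuery\n"
        "  jQuery(form).trigger('submit');\n"
        "})();"
    ) % (json.dumps(form_id), json.dumps(email), json.dumps(password))
```

If the session exposes webkit_server's script execution command (assumed here to be `exec_script`), the result could be passed to it as `sess.exec_script(build_login_script(USERNAME, PASSWORD))`.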

niklasb (Owner) commented Apr 25, 2012

@voxsim: It's nice that you could fix it on your side in this case, but dryscrape is specifically designed to be able to scrape real-world web pages, and those have bugs (which you usually can't fix on the server side).

voxsim (Author) commented Apr 25, 2012

@niklasb: Maybe you're right. Right now I really don't understand how to debug dryscrape and webkit_server; so far I have only sniffed the packet traffic with Wireshark. I intend to use dryscrape in several of my projects, so maybe I can help fix something.

niklasb (Owner) commented Apr 25, 2012

@voxsim: I usually just use cout/cerr for C++ debugging, especially because in the case of Qt, a lot of multi-threading is going on. What we need in particular is a way to make failures on intermediate requests (CSS resources, JavaScript files etc.) non-fatal, while still finishing loading the page. Without looking into it myself, I can't tell you what the actual caveats might be here. I remember that the SetErrorTolerant command was doing something similar, but it was quite a hack (and doesn't seem to work as expected in many cases).
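
Until something like that exists on the C++ side, a script can at least keep running by catching the error in Python. The sketch below is only a stopgap under that assumption: it does not make the page finish loading, and `call_tolerantly` is a hypothetical helper name, not part of dryscrape.

```python
def call_tolerantly(action, error_type=Exception):
    """Run action() and report an error of error_type instead of raising it.

    Returns (result, None) on success and (None, error) on failure.
    """
    try:
        return action(), None
    except error_type as exc:
        # The caller decides whether the failure matters; the page may
        # be only partially loaded at this point.
        return None, exc
```

With the exception class from the traceback above, the failing call could be wrapped as `_, err = call_tolerantly(lambda: email_field.form().submit(), dryscrape.driver.webkit_server.InvalidResponseError)`, after which `err` holds the "Method Not Allowed" error or `None`.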

voxsim (Author) commented Apr 25, 2012

@niklasb: OK, I understand. I now have forks of dryscrape and webkit-server; if I find a way to fix some of the problems, I can send a pull request so that you can merge my patch, OK? (I'm new to GitHub, but I've used git for a long time XD)
I saw that many developers are working on capybara webkit-server; I'll check whether they have already solved some of these problems.

niklasb (Owner) commented Apr 25, 2012

@voxsim: Yes, that's the way it works best :) It's best to create a topic branch for work like this. Pull requests are basically just notifications to the original author about changes on a branch. I actually have to check the current status of the "real" webkit_server myself. I have contributed quite a bit to it already, but they are constantly adding new features.

Thanks for your interest and participation by the way :) It's highly appreciated!

voxsim (Author) commented Apr 25, 2012

@niklasb: Good :D I sent you an email so we can talk about webkit-server and dryscrape when I have news, instead of continuing the discussion here XD

niklasb closed this as completed Apr 25, 2012