Simulating browsing to a page and then logging in #591

Closed
jeremyquinton opened this issue Apr 3, 2017 · 7 comments

Comments

@jeremyquinton

jeremyquinton commented Apr 3, 2017

I currently use Scrapy to scrape JavaScript sites and it works extremely well.

However, I have a website where I need to:

  1. Request the login page and let the page render and the JavaScript execute.
  2. Log in to the website by simulating a click on the login form, since certain JavaScript methods are called.

Is this possible with Scrapy? If so, can someone point me to an example or the correct documentation? I have looked in the documentation and tried a couple of things, but with no luck.

I can't seem to set the label on this issue to Question.

@kmike
Member

kmike commented Apr 3, 2017

Hey @jeremyquinton,

Are you using Scrapy or Splash or scrapy-splash?

@jeremyquinton
Author

@kmike Thanks for the reply. I'm just using Splash, and I have Go code to drive it.

@kmike
Member

kmike commented Apr 3, 2017

@jeremyquinton I think there are two options: 1) log in with each request, or 2) log in once, store the cookies, and then use them for all requests.

Either way, to log in you need to go to the login page and submit the login form. One way to do this is to use Splash Lua scripts (see http://splash.readthedocs.io/en/stable/scripting-tutorial.html): use splash:go to visit the page, and then element:fill and element:submit to submit the form; see examples here.
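As a rough sketch of the approach described above, the Go snippet below embeds a Lua script that visits the login page, fills the form and submits it, and then posts that script to Splash's /execute endpoint. The form selector, the field names username and password, the wait durations, and the Go helper names loginArgs and execute are assumptions made for illustration; they need to be adapted to the target site.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// loginScript is a Splash Lua script: open the login page, fill the form,
// submit it, wait a little, and return the resulting HTML.
// The field names "username" and "password" are assumptions about the form.
const loginScript = `
function main(splash)
  assert(splash:go(splash.args.url))
  assert(splash:wait(1))
  local form = assert(splash:select(splash.args.form_selector), "login form not found")
  assert(form:fill({
    username = splash.args.username,
    password = splash.args.password,
  }))
  assert(form:submit())
  assert(splash:wait(2))
  return {html = splash:html()}
end
`

// loginArgs builds the arguments that the script reads through splash.args.
func loginArgs(loginURL, formSelector, user, pass string) map[string]string {
	return map[string]string{
		"lua_source":    loginScript,
		"url":           loginURL,
		"form_selector": formSelector,
		"username":      user,
		"password":      pass,
	}
}

// execute POSTs the arguments as JSON to <splashURL>/execute and returns
// the raw response body.
func execute(splashURL string, args map[string]string) ([]byte, error) {
	payload, err := json.Marshal(args)
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(splashURL+"/execute", "application/json", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("splash returned status %d", resp.StatusCode)
	}
	return io.ReadAll(resp.Body)
}
```

With a Splash instance on its default port, a call could look like execute("http://localhost:8050", loginArgs("http://example.com/login", "form", "alice", "secret")); the URL and credentials here are placeholders.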

After logging in you'll likely have to wait for some element to appear (see http://stackoverflow.com/questions/41075257/adding-a-wait-for-element-while-performing-a-splashrequest-in-python-scrapy), or just wait for some time using splash:wait.
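One possible way to wait for an element, sketched under assumptions: the Lua helper below polls splash:select in a loop, pausing briefly with splash:wait between attempts, and gives up after a fixed number of tries. The helper name wait_for_element, the 0.2 second polling interval, and the Go function waitArgs are hypothetical and not part of Splash; the linked answer may use a different technique.

```go
package main

// waitScript is a Splash Lua script that loads a page and polls until a
// CSS selector matches, instead of sleeping for a fixed time.
// wait_for_element is a hypothetical helper, not a built-in Splash method.
const waitScript = `
local function wait_for_element(splash, selector, max_attempts)
  for _ = 1, max_attempts do
    if splash:select(selector) then
      return true
    end
    assert(splash:wait(0.2))
  end
  return false
end

function main(splash)
  assert(splash:go(splash.args.url))
  if not wait_for_element(splash, splash.args.selector, splash.args.max_attempts) then
    error("timed out waiting for " .. splash.args.selector)
  end
  return {html = splash:html()}
end
`

// waitArgs builds the arguments for waitScript. The returned map is meant
// to be marshalled to JSON and POSTed to Splash's /execute endpoint.
func waitArgs(pageURL, selector string, maxAttempts int) map[string]interface{} {
	return map[string]interface{}{
		"lua_source":   waitScript,
		"url":          pageURL,
		"selector":     selector,
		"max_attempts": maxAttempts,
	}
}
```

With 50 attempts the script waits roughly ten seconds before reporting a timeout, so the Splash request timeout has to be at least that long.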

If you're logging in with each request, you can just go to the next page using splash:go after logging in.

A better way is to log in once and preserve the cookies: use splash:get_cookies to get the cookies after the login, return them to your Go client, then send them back with each request and use splash:init_cookies to initialize them.
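The cookie round trip could be sketched roughly as follows. It assumes the login script ends with something like return {cookies = splash:get_cookies()}, so that the /execute response is a JSON object with a cookies key; that key name, along with the Go names followUpScript, followUpRequest, extractCookies and newFollowUp, is an assumption for illustration only.

```go
package main

import (
	"encoding/json"
	"errors"
)

// followUpScript restores cookies saved from an earlier login and then
// loads the requested page. It returns the cookies again so the client
// can keep them up to date.
const followUpScript = `
function main(splash)
  splash:init_cookies(splash.args.cookies)
  assert(splash:go(splash.args.url))
  return {html = splash:html(), cookies = splash:get_cookies()}
end
`

// followUpRequest is the JSON body sent to /execute for later requests.
type followUpRequest struct {
	LuaSource string          `json:"lua_source"`
	URL       string          `json:"url"`
	Cookies   json.RawMessage `json:"cookies"`
}

// extractCookies pulls the raw "cookies" value out of the login response.
// The cookies are kept as raw JSON so they are sent back unchanged.
func extractCookies(loginResponse []byte) (json.RawMessage, error) {
	var parsed struct {
		Cookies json.RawMessage `json:"cookies"`
	}
	if err := json.Unmarshal(loginResponse, &parsed); err != nil {
		return nil, err
	}
	if len(parsed.Cookies) == 0 {
		return nil, errors.New("login response has no cookies field")
	}
	return parsed.Cookies, nil
}

// newFollowUp builds a request that reuses the stored cookies.
func newFollowUp(pageURL string, cookies json.RawMessage) followUpRequest {
	return followUpRequest{LuaSource: followUpScript, URL: pageURL, Cookies: cookies}
}

// body serializes the request for POSTing to <splash>/execute.
func (r followUpRequest) body() ([]byte, error) {
	return json.Marshal(r)
}
```

Keeping the cookies as json.RawMessage avoids modelling the cookie format in Go; the client only stores what Splash returned and hands it back.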

@jeremyquinton
Author

@kmike Thanks for the detailed reply, much appreciated. I'm going to try the above and see if I can get it going. If I get it working, I can perhaps do a PR to the docs as an extra example. I will keep you posted.

@jeremyquinton
Author

jeremyquinton commented Apr 5, 2017

@kmike I got the login part working correctly. I'm just not sure I understand the second part of your answer.

Once I have returned the cookies using splash:get_cookies, I can inject them into my Go client.
After logging in, my next request is a GET request. If I make that GET request to the /render.html endpoint from the Go client with the cookies set, does Splash forward the cookies as part of the request? Will this work, or do I have to continue with the execute endpoint? Lastly, is there a way to see the request that Splash sends to the actual website, with some sort of debug mode?

The "Edit on GitHub" link at https://splash.readthedocs.io/en/stable/ is not working. I have some contributions to the documentation that I would like to suggest.

@wenxzhen

wenxzhen commented May 3, 2017

@kmike Does the render.html API of Splash support pages with JS logging? If so, how can I make it work?

Thanks

@gypapp

gypapp commented Aug 8, 2017

@kmike Is there any plan, or chance, for the splash object to get a wait method like the one you described in http://stackoverflow.com/questions/41075257/adding-a-wait-for-element-while-performing-a-splashrequest-in-python-scrapy?

Such a built-in check would make our scripts smaller and more readable. I admit one can write one's own Lua module and import it from the script.
