Simulating browsing to a page and then logging in #591
Hey @jeremyquinton, are you using Scrapy, Splash, or scrapy-splash?
@kmike Thanks for the reply. I'm just using Splash, and I have Go code to drive it.
@jeremyquinton I think there are two options: 1) log in with each request, or 2) log in once, store the cookies, and then use them for all subsequent requests. Either way, to log in you need to go to the login page and submit the login form; one way to do this is to use Splash Lua scripts (see http://splash.readthedocs.io/en/stable/scripting-tutorial.html). After logging in you'll likely have to wait for some element to appear (see http://stackoverflow.com/questions/41075257/adding-a-wait-for-element-while-performing-a-splashrequest-in-python-scrapy), or just wait for a fixed time using splash:wait. If you're logging in with each request, you can simply go to the next page using splash:go after logging in. A better way is to log in once and preserve the cookies: use splash:get_cookies to get the cookies after the login, return them to your Go client, then send them back with each request and use splash:init_cookies to initialize them.
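A minimal sketch of option 1 (log in with each request), assuming a Go client that posts a Lua script to Splash's `/execute` endpoint over HTTP. The Splash address `http://localhost:8050`, the login and target URLs, the `form#login` selector and the `username`/`password` field names are hypothetical placeholders. The Lua part uses `splash:select`, `element:fill` and `element:submit`, so it needs a Splash version that has the element API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// loginScript logs in and then opens the target page in the same Splash
// run, so the session cookies are still present. The selector 'form#login'
// and the field names are hypothetical placeholders.
const loginScript = `
function main(splash)
  assert(splash:go(splash.args.login_url))
  splash:wait(1)

  local form = splash:select('form#login')
  assert(form, 'login form not found')
  assert(form:fill({username = splash.args.username,
                    password = splash.args.password}))
  assert(form:submit())
  splash:wait(2) -- fixed delay; waiting for an element is more robust

  assert(splash:go(splash.args.target_url))
  splash:wait(1)
  return {html = splash:html()}
end
`

// buildExecutePayload builds the JSON body for Splash's /execute endpoint.
// Extra keys are exposed to the Lua script through splash.args.
func buildExecutePayload(script string, args map[string]any) map[string]any {
	payload := make(map[string]any, len(args)+1)
	for k, v := range args {
		payload[k] = v
	}
	payload["lua_source"] = script // set last so an arg cannot overwrite it
	return payload
}

// runScript POSTs the payload to <splashURL>/execute and returns the raw
// JSON response body.
func runScript(splashURL string, payload map[string]any) ([]byte, error) {
	body, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(splashURL+"/execute", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("splash: %s: %s", resp.Status, data)
	}
	return data, nil
}

func main() {
	// Hypothetical URLs and credentials; replace them with real ones.
	payload := buildExecutePayload(loginScript, map[string]any{
		"login_url":  "https://example.com/login",
		"target_url": "https://example.com/account",
		"username":   "user",
		"password":   "secret",
	})
	out, err := runScript("http://localhost:8050", payload)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	var result struct {
		HTML string `json:"html"`
	}
	if err := json.Unmarshal(out, &result); err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("got %d bytes of HTML\n", len(result.HTML))
}
```

Because login and navigation happen inside one script run, the cookies never leave Splash; the trade-off is a full login for every page fetched.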
@kmike Thanks for the detailed reply, much appreciated. I'm going to try the above and see if I can get it working. If I do, I can perhaps open a PR to the docs with an extra example; I'll keep you posted.
@kmike I got the login part working correctly; I'm just not sure I understand the second part of your answer. Once I have returned the cookies using splash:get_cookies, I can inject them into my Go client. Also, the "Edit on GitHub" link at https://splash.readthedocs.io/en/stable/ is not working; I have some contributions to the documentation I would like to suggest.
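The "log in once, reuse cookies" approach can be sketched as two Lua scripts driven from Go: the first returns `splash:get_cookies()` after logging in, and the second receives that list back as an argument and passes it to `splash:init_cookies()` before calling `splash:go`. This is a hedged sketch: the argument names `cookies`, `url` and `login_url`, the Splash address, the URLs and the form selector are assumptions. Keeping the cookies as `json.RawMessage` is a design choice so the Go client forwards them unchanged instead of modelling every cookie field.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// loginOnce logs in and returns only the cookies (hypothetical selector).
const loginOnce = `
function main(splash)
  assert(splash:go(splash.args.login_url))
  splash:wait(1)
  local form = splash:select('form#login')
  assert(form, 'login form not found')
  assert(form:fill({username = splash.args.username,
                    password = splash.args.password}))
  assert(form:submit())
  splash:wait(2)
  return {cookies = splash:get_cookies()}
end
`

// fetchWithCookies restores the cookies, loads a page, and returns the HTML plus the current cookies.
const fetchWithCookies = `
function main(splash)
  splash:init_cookies(splash.args.cookies)
  assert(splash:go(splash.args.url))
  splash:wait(1)
  return {html = splash:html(), cookies = splash:get_cookies()}
end
`

// splashResult keeps cookies as opaque JSON so they are sent back unchanged.
type splashResult struct {
	HTML    string          `json:"html"`
	Cookies json.RawMessage `json:"cookies"`
}

// parseResult decodes an /execute response and requires a cookie list.
func parseResult(body []byte) (r splashResult, err error) {
	if err = json.Unmarshal(body, &r); err == nil && len(r.Cookies) == 0 {
		err = fmt.Errorf("response contains no cookies")
	}
	return r, err
}

// pagePayload builds the /execute body for a page fetch that reuses cookies.
func pagePayload(pageURL string, cookies json.RawMessage) ([]byte, error) {
	return json.Marshal(map[string]any{
		"lua_source": fetchWithCookies,
		"url":        pageURL,
		"cookies":    cookies,
	})
}

// execute POSTs a JSON body to Splash's /execute endpoint.
func execute(splashURL string, body []byte) ([]byte, error) {
	resp, err := http.Post(splashURL+"/execute", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("splash returned %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func check(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	const splash = "http://localhost:8050" // assumed Splash address
	loginBody, _ := json.Marshal(map[string]any{
		"lua_source": loginOnce,
		"login_url":  "https://example.com/login", // hypothetical
		"username":   "user",
		"password":   "secret",
	})
	out, err := execute(splash, loginBody)
	check(err)
	res, err := parseResult(out)
	check(err)
	// Reuse the cookies for later pages instead of logging in again.
	for _, page := range []string{"https://example.com/a", "https://example.com/b"} {
		body, err := pagePayload(page, res.Cookies)
		check(err)
		out, err := execute(splash, body)
		check(err)
		res, err = parseResult(out) // keeps any cookies the site refreshed
		check(err)
		fmt.Printf("%s: %d bytes of HTML\n", page, len(res.HTML))
	}
}
```

If the same cookies should also be used by Go's own `net/http` client, they would need to be decoded into a struct instead; passing them through untouched is simpler when Splash is the only consumer.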
@kmike Does Splash's render.html API support pages that require a JavaScript-based login? If so, how would I do it? Thanks.
@kmike Is there any plan, or any chance, for the splash object to get the wait-for-element method you described in http://stackoverflow.com/questions/41075257/adding-a-wait-for-element-while-performing-a-splashrequest-in-python-scrapy? Such smart checking would make our scripts smaller and more readable. I admit one can write one's own Lua module and import it from the script.
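Without such a built-in method, a helper can be prepended to each script from the Go side. This sketch is a simpler polling variant of the wait-for-element idea, not the exact code from the linked answer: it loops on `splash:select` (which returns `nil` when nothing matches) and `splash:wait`. The helper name `wait_for_element`, the 0.2 s poll interval, the 10 s default timeout and the `#account-menu` selector are assumptions.

```go
package main

import "fmt"

// waitForElement is a hypothetical Lua helper: it polls until the CSS
// selector matches an element, or raises an error after maxwait seconds.
const waitForElement = `
function wait_for_element(splash, css, maxwait)
  maxwait = maxwait or 10
  local waited = 0
  while not splash:select(css) do
    if waited >= maxwait then
      error('timed out waiting for ' .. css)
    end
    splash:wait(0.2)
    waited = waited + 0.2
  end
end
`

// withHelpers prepends the shared Lua helpers to a script, so its main
// function can call wait_for_element as if it were built in.
func withHelpers(script string) string {
	return waitForElement + "\n" + script
}

func main() {
	script := withHelpers(`
function main(splash)
  assert(splash:go(splash.args.url))
  wait_for_element(splash, '#account-menu', 15) -- hypothetical selector
  return {html = splash:html()}
end
`)
	// The combined script would be sent as lua_source to /execute.
	fmt.Println(script)
}
```

Shipping the helper as a Lua module instead avoids re-sending it with every request, but the module then has to be made available to the Splash server.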
I currently use Scrapy to scrape JavaScript sites, and it works extremely well.
However, I have a website where I need to
Is this possible to do with Scrapy, and if so, can someone point me toward an example or the relevant documentation? I have looked in the documentation and tried a couple of things, but with no luck.
I can't seem to set the label on this issue to Question.