Closed
Labels
bug: Something isn't working
Description
This is a support query. Echoing a number of posts (such as #119), I am writing to seek help from the community with a sign-in issue. I am trying to access a database behind a single sign-on system, which redirects all authentication requests to a centralized system.
Playwright is able to handle the task easily, but Scrapy struggles to move past the first URL redirect. When I instruct it to follow the redirected link manually (i.e. by clicking on "If your browser does not continue automatically, click here."), it leads to a stale request and fails to reach the log-in page.
What explains the differences in outcome? Am I missing something? Please see below for reproducible code.
Original Playwright
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.firefox.launch(headless=False, slow_mo=100)
    context = browser.new_context()
    page = context.new_page()
    page.goto('http://ezproxy-prd.bodleian.ox.ac.uk:10224/', wait_until='networkidle')
    # Interact with login form
    page.get_by_placeholder("someone@example.com").fill('test@ox.ac.uk')
    page.get_by_text("Next").click()
    page.wait_for_load_state('domcontentloaded')
With scrapy-playwright
import scrapy
from scrapy_playwright.page import PageMethod
from scrapy.shell import inspect_response


class CcrdSpiderSpider(scrapy.Spider):
    name = 'ccrd_spider'

    def start_requests(self):
        login_url = "http://ezproxy-prd.bodleian.ox.ac.uk:10224"
        yield scrapy.Request(
            login_url,
            meta=dict(
                playwright=True,
                playwright_context="new",
                playwright_include_page=True,
                playwright_page_methods=[
                    PageMethod('wait_for_load_state', state="domcontentloaded"),
                ],
            ),
            callback=self.parse,
        )

    def parse(self, response):
        inspect_response(response, self)
        # pass
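One difference between the two snippets is that the Playwright script waits for 'networkidle' and then fills in the form, while the spider only waits for 'domcontentloaded'. For comparison, here is a minimal sketch of how the same interaction might be expressed as scrapy-playwright page methods. It is untested against the SSO page, so whether it gets past the redirect is an open question. The names `EMAIL_SELECTOR`, `NEXT_SELECTOR`, `login_steps` and `build_meta` are hypothetical, and the CSS/text selectors are assumed equivalents of `get_by_placeholder` and `get_by_text` from the original script.

```python
# Assumed selector equivalents of get_by_placeholder(...) and get_by_text("Next").
EMAIL_SELECTOR = 'input[placeholder="someone@example.com"]'
NEXT_SELECTOR = "text=Next"


def login_steps(email):
    """Return the login interaction as (page method, args, kwargs) tuples."""
    return [
        # Mirror wait_until='networkidle' from the plain Playwright script.
        ("wait_for_load_state", ("networkidle",), {}),
        ("fill", (EMAIL_SELECTOR, email), {}),
        ("click", (NEXT_SELECTOR,), {}),
        ("wait_for_load_state", ("domcontentloaded",), {}),
    ]


def build_meta(email):
    """Build a Request.meta dict that replays login_steps() on the page."""
    # Imported here so login_steps() can be inspected without scrapy-playwright.
    from scrapy_playwright.page import PageMethod

    return {
        "playwright": True,
        "playwright_page_methods": [
            PageMethod(name, *args, **kwargs)
            for name, args, kwargs in login_steps(email)
        ],
    }
```

In the spider, this would be passed as `scrapy.Request(login_url, meta=build_meta('test@ox.ac.uk'), callback=self.parse)`, so that the response handed to `parse` reflects the page state after the last step.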