
Single Sign-On (SSO) Authentication #143

@yilu1015


This is a support query. Echoing a number of posts (such as #119), I am writing to ask the community for help with sign-in issues. I am trying to access a database behind a single sign-on (SSO) system, which redirects all authentication requests to a centralized system.

Playwright handles the task easily, but Scrapy struggles to get past the first URL redirect. When I tell it to follow the redirect link manually (i.e. clicking "If your browser does not continue automatically, click here."), it produces a stale request and never reaches the login page.

What explains the differences in outcome? Am I missing something? Please see below for reproducible code.

Original Playwright

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Headed Firefox, slowing each Playwright operation down by 100 ms
    browser = p.firefox.launch(headless=False, slow_mo=100)
    context = browser.new_context()
    page = context.new_page()
    # 'networkidle': wait until there are no network connections for at least 500 ms
    page.goto('http://ezproxy-prd.bodleian.ox.ac.uk:10224/', wait_until='networkidle')

    # Interact with login form
    page.get_by_placeholder("someone@example.com").fill('test@ox.ac.uk')
    page.get_by_text("Next").click()
    page.wait_for_load_state('domcontentloaded')

    browser.close()

With scrapy-playwright

import scrapy
from scrapy.shell import inspect_response
from scrapy_playwright.page import PageMethod


class CcrdSpiderSpider(scrapy.Spider):
    name = 'ccrd_spider'

    def start_requests(self):
        login_url = "http://ezproxy-prd.bodleian.ox.ac.uk:10224"
        yield scrapy.Request(
            login_url,
            meta=dict(
                playwright=True,
                playwright_context="new",
                playwright_include_page=True,
                # Waits only for DOMContentLoaded (the plain Playwright script uses wait_until='networkidle')
                playwright_page_methods=[
                    PageMethod('wait_for_load_state', state="domcontentloaded"),
                ],
            ),
            callback=self.parse,
        )

    def parse(self, response):
        # Open an interactive Scrapy shell on the rendered response
        inspect_response(response, self)


Labels: bug (Something isn't working)
