Optionally disable content security policies #114

Closed
jamesking opened this issue Jul 4, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@jamesking

Problem

I have been following this TIL to run Readability.js on a page with Shot Scraper.

https://til.simonwillison.net/shot-scraper/readability

This worked fine for pages with liberal content security policies. However, when I tried to scrape a page with a stronger CSP, I ran into this error:

Refused to load the script 'https://cdn.skypack.dev/@mozilla/readability' because it violates the following Content Security Policy directive: …

A strong CSP like this limits Shot Scraper's ability to run JavaScript on a page before processing it.

Suggestion

The Playwright Python tools have an optional bypass_csp argument that can be passed to the new_context method.
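For anyone unfamiliar with the option, here is a minimal sketch of how `bypass_csp` can be passed to `new_context` using Playwright's sync Python API. The helper names `build_context_args` and `load_page_with_script` are hypothetical and exist only for illustration; this is not Shot Scraper's code.

```python
def build_context_args(bypass_csp=False):
    """Build keyword arguments for browser.new_context() (hypothetical helper)."""
    context_args = {}
    if bypass_csp:
        # Ask Playwright to ignore the page's Content-Security-Policy
        context_args["bypass_csp"] = True
    return context_args


def load_page_with_script(url, script_url, bypass_csp=False):
    """Open url, inject an external script, and return the page title."""
    # Imported lazily so build_context_args() works without Playwright installed
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context(**build_context_args(bypass_csp))
        page = context.new_page()
        page.goto(url)
        # With a strict CSP and bypass_csp=False, this injection may be refused
        page.add_script_tag(url=script_url)
        title = page.title()
        browser.close()
        return title
```

Calling `load_page_with_script(url, script_url, bypass_csp=True)` should let the external script load even when the page's CSP would otherwise block it.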

As a test I monkey-patched shot_scraper/cli.py with the following:

# cli.py, line 353
...
context_args["bypass_csp"] = True # <-- Line added
context = browser_obj.new_context(**context_args)
...

And now the Readability.js script executes without a problem. :)

It would be really useful to give Shot Scraper a CLI option such as --bypass-csp that, when set, passes this argument to Playwright. That would make it possible to run JavaScript on pages like this.
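A rough sketch of how such a flag could be wired through to the context arguments. It uses the standard-library `argparse` purely for illustration, not Shot Scraper's actual option handling, and the names `build_parser`, `context_args_from`, and `example-scraper` are hypothetical.

```python
import argparse


def build_parser():
    """Create a parser with an opt-in --bypass-csp flag (illustrative only)."""
    parser = argparse.ArgumentParser(prog="example-scraper")
    parser.add_argument("url")
    parser.add_argument(
        "--bypass-csp",
        action="store_true",
        help="Bypass the page's Content-Security-Policy",
    )
    return parser


def context_args_from(args):
    """Translate parsed CLI options into new_context() keyword arguments."""
    context_args = {}
    if args.bypass_csp:
        context_args["bypass_csp"] = True
    return context_args


if __name__ == "__main__":
    args = build_parser().parse_args(["https://example.com/", "--bypass-csp"])
    print(context_args_from(args))  # {'bypass_csp': True}
```

Keeping the flag off by default means existing invocations keep honouring each page's CSP unless the user opts out.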

Thank you for a great tool!

@sesh
Contributor

sesh commented Jul 30, 2023

I just ran into this today while testing Simon's TIL about running axe-core with shot-scraper.

I've taken @jamesking's suggestion above and implemented it in a PR. The --bypass-csp option is added to all commands that allow you to execute JavaScript. See #116.

@simonw simonw added the enhancement New feature or request label Nov 1, 2023
@simonw
Owner

simonw commented Nov 1, 2023

This is a really smart feature request, and #116 looks like a good implementation.

@simonw simonw closed this as completed in 3d14b03 Nov 1, 2023
simonw added a commit that referenced this issue Nov 1, 2023
simonw added a commit that referenced this issue Nov 1, 2023
@simonw
Owner

simonw commented Nov 1, 2023


3 participants