Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interactive screenshots revert to original page state before saving #125

Closed
mhalle opened this issue Nov 1, 2023 · 3 comments
Closed

Interactive screenshots revert to original page state before saving #125

mhalle opened this issue Nov 1, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@mhalle
Copy link
Contributor

mhalle commented Nov 1, 2023

The --interactive flag allows the user to interact with the page before saving. However, it seems that shot-scraper reloads the URL after enter is pressed, reverting the page back to its original state. This limitation or defect reduces the value of --interactive.

For example, load https://maps.google.com/ with --interactive. Drag the map around, then hit enter. The original state of the map is saved.

This is true for any controls as well. For example, try https://mui.com/material-ui/react-slider/ and drag some sliders around in interactive mode, then save. The sliders are in their original default state.

If the user changes the URL interactively to a new page, the old URL is reloaded and saved.

@simonw simonw added the bug Something isn't working label Nov 1, 2023
@simonw
Copy link
Owner

simonw commented Nov 1, 2023

Huh... yeah I just recreated that myself using shot-scraper 'https://maps.google.com/' --interactive - how weird, that's certainly not intended!

Here's what's supposed to happen:

if interactive or devtools:
use_existing_page = True
page = context.new_page()
page.goto(url)
context = page
click.echo(
"Hit <enter> to take the shot and close the browser window:", err=True
)
input()
try:
if output == "-":
shot = take_shot(
context,
shot,
return_bytes=True,
use_existing_page=use_existing_page,
log_requests=log_requests,
log_console=log_console,
silent=silent,
)
sys.stdout.buffer.write(shot)
else:
shot["output"] = str(output)
shot = take_shot(
context,
shot,
use_existing_page=use_existing_page,
log_requests=log_requests,
log_console=log_console,
skip=skip,
fail=fail,
silent=silent,
)

Note that in interactive mode it opens up the browser and then waits for the user to hit <enter> (that input() line)... but then still calls the take_shot() function. BUT it does call that with use_existing_page=True.

And here's where that takes effect:

if not use_existing_page:
page = context_or_page.new_page()
if log_requests:
def on_response(response):
try:
body = response.body()
size = len(body)
except Error:
size = None
log_requests.write(
json.dumps(
{
"method": response.request.method,
"url": response.url,
"status": response.status,
"size": size,
"timing": response.request.timing,
}
)
+ "\n"
)
page.on("response", on_response)
else:
page = context_or_page

Reading the code it looks to me like it should be doing the right thing - taking the screenshot based on the state of the page after the user has interacted with it, rather than creating a new page.

Needs more digging.

@simonw
Copy link
Owner

simonw commented Nov 1, 2023

Spotted it! Further down that take_shot() function:

page.set_viewport_size(viewport)
if shot.get("height"):
full_page = False
response = page.goto(url)
# Check if page was a 404 or 500 or other error
if str(response.status)[0] in ("4", "5"):
if skip:
click.echo(
"{} error for {}, skipping".format(response.status, url), err=True
)
return
elif fail:
raise click.ClickException("{} error for {}".format(response.status, url))

That response = page.goto(url) line is throwing away our previous state. That shouldn't happen if use_existing_page is set.

@simonw
Copy link
Owner

simonw commented Nov 1, 2023

Tested manually and this fixes the issue.

@simonw simonw closed this as completed Nov 1, 2023
simonw added a commit that referenced this issue Nov 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants