Fails to save screen shots when url has query string #25

Open
ReevesL opened this issue Feb 11, 2019 · 1 comment · May be fixed by #26

ReevesL commented Feb 11, 2019

The site I was trying to crawl uses query strings as part of its navigation, which causes the script to fail when it tries to save the screenshot on Windows (this may or may not reproduce on other platforms). It appears that slugify doesn't strip out all of the characters that are illegal in Windows file names.

Example error sequence (sorting by a table header adds to the query string):
Loading: https://example.org/index/99984?sort=NAME&order=asc
(node:19708) UnhandledPromiseRejectionWarning: Error: ENOENT: no such file or directory, open 'C:\Users\Reeves\source\repos\puppeteer\output\https___example.org\https___example.org\index_99984?sort=NAME&order=asc'
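For illustration, here is a minimal sketch that lists which characters in the failing file name Windows would reject. The helper name `findReservedChars` is hypothetical and not part of the crawl script, and the character set (`< > : " / \ | ? *`) is an assumption based on the commonly documented Windows reserved characters.

```javascript
// Hypothetical helper for illustration; not part of the crawl script.
// Assumed set of characters Windows does not allow in a file name.
const WINDOWS_RESERVED = /[<>:"\/\\|?*]/g;

function findReservedChars(name) {
  // String.prototype.match with a global regex returns every match,
  // or null when there are none.
  return name.match(WINDOWS_RESERVED) || [];
}

// Final path segment taken from the error message above.
const failingName = 'index_99984?sort=NAME&order=asc';
console.log(findReservedChars(failingName)); // [ '?' ]
```

Under that assumption, only the `?` that starts the query string is a problem here; `&` and `=` are accepted in Windows file names.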

I corrected this in my script by adding the 'sanitize-filename' module and applying it in the screenshots section of the code (line 146 at the time of writing).
const path = `./${OUT_DIR}/${slugify(sanitize(page.url))}.png`;

The slugify in this context may be redundant.

ReevesL commented Feb 11, 2019

I overlooked that you are using a custom slugify function and not the module. I extended the custom slugify function to cover all characters that shouldn't appear in a file path (the character list is based on the reserved characters list from the Wikipedia file path article).

Here's my proposed fix:

// Replaces characters from the URL that are illegal in a file path, for the working dir and for saving screenshots.
function slugify(str) {
  return str.replace(/[\/:?*%|"<>. ]/g, '_');
}
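As a quick check, the sketch below applies the proposed function to the URL from the error report. The variable names `url` and `fileName` are only for illustration.

```javascript
// The proposed slugify from this comment, repeated so the snippet runs on its own.
function slugify(str) {
  return str.replace(/[\/:?*%|"<>. ]/g, '_');
}

// URL taken from the error report in this issue.
const url = 'https://example.org/index/99984?sort=NAME&order=asc';
const fileName = slugify(url);

console.log(fileName);
// https___example_org_index_99984_sort=NAME&order=asc
```

The `?` is replaced, so the name no longer contains the character that triggered the ENOENT error. Note that the character class does not include a backslash, which Windows also reserves; URLs rarely contain one, so this may not matter in practice.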

Thanks,
Reeves

ReevesL added a commit to ReevesL/puppeteer-examples that referenced this issue Feb 11, 2019
Added more characters to slugify function to fix issue puppeteer#25, querystrings in URLs break file-related functions.
ReevesL linked a pull request Feb 11, 2019 that will close this issue