New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Separate rewritten srcset image urls; handle urls with commas #269
Separate rewritten srcset image urls; handle urls with commas #269
Conversation
…fused browser requests
Part 2: per offline discussion, address #238 as well, insofar as possible. Perfect splitting of URL lists in srcset is likely impossible due to ambiguities in the spec: URL paths may contain commas, but in certain circumstances, URLs in srcsets may be joined by simple commas, rather than commas and spaces. "/foo,bar/baz" is therefore ambiguous: a single url, or two relative urls (/foo,bar and /baz") joined by a comma? This commit represents our best effort so far to split via a regex, rather than implementing the token-based parsing recommended in the spec (see "parse the value of the element’s srcset attribute as follows" in http://w3c.github.io/html/semantics-embedded-content.html#element-attrdef-img-srcset) |
The tests pass locally. The failure in Travis appears unrelated to this PR. I suspect it's a package version issue; investigating. |
Confirmed: a fresh clone of the repo + a fresh virtualenv fails in the same way as the travis ci build for this PR. |
aae5920
to
077e397
Compare
Revised per offline code review. Thanks @ikreymer! |
Looks great, thanks @rebeccacremona for this thorough PR! |
Part 1: This PR offers a tiny tweak, the result of investigating why certain images at https://lil.law.harvard.edu, like the logo, aren't currently being captured/played back.
It turns out that websites using
srcset
to implement responsive images normally define a list of sources separated by both commas and whitespace. The whitespace is necessary since URLs can contain commas in the path section without being percent-encoded (see related issue #238). Without the whitespace, browsers will/may interpret the list of sources as one giant url.The LIL src html:
currently results in the rewritten html:
which results in the request (404 Not Found):
By joining with ", " instead of "," these images are captured and played back as expected.