You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Mar 7, 2021. It is now read-only.
srcset URLs are not always correctly terminated:
not passing a (optional) descriptor to the source string (see srcset) results in an additional discovered URL containing the next character which can be a , , a %22 (") or %27 (').
As a result such image URLs will not be ignored as they don't match a file extension.
What should have happened?
The returned URLs should match the input in count and characters (see examples below).
Steps to reproduce the problem
I got a fresh clone and added these tests to test/discovery.js:
appended comma
it("should not append comma",function(){discover("<img srcset='https://example.com/image1.png, https://example.com/image2.png'>",{url: "https://example.com/"}).should.eql(["https://example.com/image1.png","https://example.com/image2.png"]);});
Result is ["https://example.com/image1.png,", "https://example.com/image2.png", "example.com/image1.png"]. Note however that the result was okay with relative URLs.
appended quote
it("should work with sourcesets",function(){varlinks=discover("<picture>"+"<source srcset='https://example.com/image1.svg, https://example.com/image2.svg' media='(min-width: 123px)'>"+"<source srcset='https://example.com/image3.svg' media='(min-width: 456px)'>"+"<img src='https://example.com/image4.svg'>"+"</picture>");links.should.be.an("array");links.should.eql(["https://example.com/image1.svg","https://example.com/image2.svg","https://example.com/image3.svg","https://example.com/image4.svg"]);});
Result is ["https://example.com/image4.svg", "https://example.com/image1.svg,", "https://example.com/image2.svg", "https://example.com/image3.svg", "https://example.com/image1.svg", "https://example.com/image2.svg%27"]. Using relative URLs in this example only returns image 4, 1 and 2%27. While this example of course is semantically wrong, using two images in the first source only serves to illustrate both described errors. Removing one source from the srcset still adds a %27-appended URL though.
A non-greedy matcher in crawler.js:341 seems to resolve that problem.
The text was updated successfully, but these errors were encountered:
Hi @PRGfx, it looks like this was due to the regexp on this line. When simplecrawler was being more actively maintained, our standard answer used to be that the default discovery mechanism would rather grab a few false positives than be too prescriptive about which resources were discovered. Since this was such a simple fix though, I went ahead and pushed a fix.
What happened?
srcset
URLs are not always correctly terminated:not passing a (optional) descriptor to the source string (see srcset) results in an additional discovered URL containing the next character which can be a
,
, a%22
("
) or%27
('
).As a result such image URLs will not be ignored as they don't match a file extension.
What should have happened?
The returned URLs should match the input in count and characters (see examples below).
Steps to reproduce the problem
I got a fresh clone and added these tests to
test/discovery.js
:appended comma
Result is
["https://example.com/image1.png,", "https://example.com/image2.png", "example.com/image1.png"]
. Note however that the result was okay with relative URLs.appended quote
Result is
["https://example.com/image4.svg", "https://example.com/image1.svg,", "https://example.com/image2.svg", "https://example.com/image3.svg", "https://example.com/image1.svg", "https://example.com/image2.svg%27"]
. Using relative URLs in this example only returns image 4, 1 and 2%27. While this example of course is semantically wrong, using two images in the first source only serves to illustrate both described errors. Removing one source from the srcset still adds a%27
-appended URL though.A non-greedy matcher in crawler.js:341 seems to resolve that problem.
The text was updated successfully, but these errors were encountered: