Links in HTML source followed by "), ">, ') or '> return a wrong end index

I tried to extract links from https://fr.yahoo.com/?p=us for a test.

Some of the returned links contained extra characters at the end, for example:
- https://s.yimg.com/dh/ap/default/130909/y_200_a.png"/>
- https://s.yimg.com/rq/darla/2-9-1/js/g-r-min.js'></script>
- https://s.yimg.com/os/uh-icons/0.1.16/uh/fonts/uh.eot?);src:url(https://s.yimg.com/os/uh-icons/0.1.16/uh/fonts/uh.eot?#iefix
- https://s.yimg.com/dh/ap/default/150720/pc_icons_btns_sprite_0720_330pm.png");
- https://s.yimg.com/dh/ap/default/130908/SFL_Purple_reg.png');
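
To make the expected result concrete, here is a minimal standalone sketch of one way the end index could be chosen. It is not UrlScanner's code: the class `LinkEndTrimmer` and its methods are hypothetical names. It assumes a link should stop at the first quote or angle bracket, and at the first closing parenthesis that has no matching opening parenthesis inside the link.

```java
// Hypothetical helper, not part of the library: picks the end of a link candidate.
final class LinkEndTrimmer {
    private LinkEndTrimmer() {}

    /**
     * Scans input[begin, end) and returns the exclusive end index of the link.
     * Assumption: quotes and angle bractets always terminate a link in HTML/CSS
     * source, and so does a ')' with no '(' opened earlier in the link.
     */
    static int findEnd(CharSequence input, int begin, int end) {
        int openParens = 0;
        for (int i = begin; i < end; i++) {
            char c = input.charAt(i);
            switch (c) {
                case '"':
                case '\'':
                case '<':
                case '>':
                    return i; // delimiter from the surrounding markup
                case '(':
                    openParens++;
                    break;
                case ')':
                    if (openParens == 0) {
                        return i; // unbalanced ')', e.g. the end of a CSS url(...)
                    }
                    openParens--;
                    break;
                default:
                    break;
            }
        }
        return end; // nothing to trim
    }

    /** Convenience wrapper that trims a whole candidate string. */
    static String trim(String candidate) {
        return candidate.substring(0, findEnd(candidate, 0, candidate.length()));
    }
}
```

With this sketch, `LinkEndTrimmer.trim` applied to the first example above gives `https://s.yimg.com/dh/ap/default/130909/y_200_a.png`, and a link with balanced parentheses in its path is left unchanged. A single quote can legitimately appear in a URL, so stopping at `'` is a trade-off that favours HTML and CSS input.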

I'm trying to understand UrlScanner's code, but I'm not sure I'll be able to fix it.

I'll send a pull request later if I manage to fix it.


Links in HTML source followed by "), ">, ') or '> returns a wrong end index #3
