I tried to extract links from https://fr.yahoo.com/?p=us as a test. Some of the returned links had extra characters at the end, for example:

- https://s.yimg.com/dh/ap/default/130909/y_200_a.png"/>
- https://s.yimg.com/rq/darla/2-9-1/js/g-r-min.js'></script>
- https://s.yimg.com/os/uh-icons/0.1.16/uh/fonts/uh.eot?);src:url(https://s.yimg.com/os/uh-icons/0.1.16/uh/fonts/uh.eot?#iefix
- https://s.yimg.com/dh/ap/default/150720/pc_icons_btns_sprite_0720_330pm.png");
- https://s.yimg.com/dh/ap/default/130908/SFL_Purple_reg.png');

I'm trying to understand UrlScanner's code, but I'm not sure I'll be able to fix it. I'll send a pull request later if I manage to.
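
To make the expected behavior concrete, here is a rough Python sketch of the trimming I would expect. It is not UrlScanner's code: the helper name `trim_url` and the set of stop characters are my own assumptions. It cuts a candidate link at the first quote or angle bracket, or at a closing parenthesis that has no matching opening parenthesis inside the link.

```python
# Hypothetical helper for illustration only; not part of UrlScanner.
# Assumption: quotes and angle brackets never belong to a link found in HTML/CSS.
STOP_CHARS = {'"', "'", "<", ">"}


def trim_url(candidate: str) -> str:
    """Cut a candidate URL at the first character that likely ends it."""
    depth = 0  # number of "(" opened inside the URL and not yet closed
    for i, ch in enumerate(candidate):
        if ch in STOP_CHARS:
            return candidate[:i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                # A ")" with no opener belongs to the surrounding text,
                # e.g. the end of a CSS url(...) expression.
                return candidate[:i]
            depth -= 1
    return candidate


print(trim_url('https://s.yimg.com/dh/ap/default/130909/y_200_a.png"/>'))
print(trim_url("https://s.yimg.com/dh/ap/default/130908/SFL_Purple_reg.png');"))
```

With this sketch, the first, second, fourth and fifth examples lose everything from the quote onward, and the third one stops at the unbalanced `)`, leaving the trailing `?` in place. Treating `'` as a stop character is a simplification, since a single quote is technically allowed in a URL.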