Urls with a trailing / are not found by --sany #102

Closed
AndreiUlmeyda opened this issue Dec 5, 2016 · 2 comments

Comments

@AndreiUlmeyda

Greetings,

The following search returns nothing:

buku --sany http://www.something.com/

while this one finds the bookmark:

buku --sany http://www.something.com

even though the former is the URL exactly as buku displays it.

@jarun
Owner

jarun commented Dec 6, 2016

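The reason is that --sany and --sall match keywords only at word boundaries (using a regex). A word boundary cannot occur between a trailing / and the end of the URL, because neither side is a word character, so a keyword ending in a slash is never matched. You have to add --deep as well. For example, the following work: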

buku --sany http://www.something.com/ --deep
buku --sall http://www.something.com/ --deep
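A minimal sketch of the word-boundary behaviour described above, assuming --sany/--sall wrap each escaped keyword in \b...\b and that --deep amounts to a plain case-insensitive substring match. The helpers matches_word_bounded and matches_substring are hypothetical illustrations, not buku's actual code.

```python
import re


def matches_word_bounded(keyword: str, text: str) -> bool:
    # Hypothetical helper: wrap the escaped keyword in \b anchors, assumed to
    # approximate how --sany/--sall restrict matches to word boundaries.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def matches_substring(keyword: str, text: str) -> bool:
    # Hypothetical helper: plain substring match, roughly what --deep allows.
    return keyword.lower() in text.lower()


url = "http://www.something.com/"

# 'm' -> '/' is a word/non-word transition, so \b matches after "com".
print(matches_word_bounded("http://www.something.com", url))   # True
# After the final '/' comes end-of-string: no word char on either side, no \b.
print(matches_word_bounded("http://www.something.com/", url))  # False
# A substring search has no boundary requirement, so the slash is harmless.
print(matches_substring("http://www.something.com/", url))     # True
```

The trailing \b is what makes the slash fatal: it needs a word character on exactly one side, and the end of the string after / provides none.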

If you search for "hello/" (including the double quotes) on Google, you'd notice the same behaviour: it searches for hello. If you search for "www.hellomagazine.com/", the results include the following:

www.hellomagazine.com/
https://en.wikipedia.org/wiki/Hellomagazine.com

So Google ignores the trailing slash altogether!

I'd have to trim the trailing slashes to make it behave just like Google. Alternatively, I can document this more clearly in the operational notes section and leave it be.

What do you say?

@jarun
Owner

jarun commented Dec 6, 2016

Another note:

Searching for "hello//// world" returns the same results as hello world, so Google strips all trailing slashes from every token.

My fix will be along the same lines.
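A minimal sketch of the Google-style normalization described above: strip every trailing / from each search token before building the word-boundary regex. This is not the code from commit 41deb14; normalize_keywords and matches_word_bounded are hypothetical helpers, the \b...\b pattern is an assumption about how --sany/--sall match, and dropping tokens made only of slashes is an illustrative design choice.

```python
import re


def normalize_keywords(keywords):
    # Hypothetical helper: remove all trailing '/' from each token,
    # so "hello////" is searched as "hello".
    cleaned = []
    for kw in keywords:
        kw = kw.rstrip("/")
        if kw:  # assumption: drop tokens that were nothing but slashes
            cleaned.append(kw)
    return cleaned


def matches_word_bounded(keyword, text):
    # Hypothetical helper: assumed word-boundary match used by --sany/--sall.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


print(normalize_keywords(["hello////", "world"]))  # ['hello', 'world']

url = "http://www.something.com/"
keywords = normalize_keywords(["http://www.something.com/"])
print(all(matches_word_bounded(kw, url) for kw in keywords))  # True
```

Trimming the token rather than the stored URL keeps bookmarks untouched while letting a pasted URL with a trailing slash match at a word boundary again.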

@jarun jarun closed this as completed in 41deb14 Dec 6, 2016
jarun added a commit that referenced this issue Dec 6, 2016
The behaviour is adapted from Google's behaviour.
Please see the notes in the bug log for more details.
@github-actions github-actions bot locked and limited conversation to collaborators Jun 18, 2020