[RFC] Been thinking about removing Tokenization #355

Closed
krisk opened this issue Mar 9, 2020 · 7 comments

@krisk
Owner

krisk commented Mar 9, 2020

I've been thinking of removing the tokenization option, mainly because (1) it makes searching and the setup a little counter-intuitive, (2) tokenization could be done by preprocessing the list, and (3) I'm working on adding the ability to use filters, like in fzf, which could possibly make this redundant.
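As a rough illustration of point (2), tokenization can move out of the library and into a preprocessing step. That step splits a text field into words once and stores them as an array. The sketch below assumes plain whitespace splitting plus lowercasing. `tokenize`, `addTokens` and the `tokens` field are hypothetical names, not part of Fuse.js.

```javascript
// Hypothetical helper: split a text field into lowercase word tokens.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/\s+/)      // split on runs of whitespace
    .filter(Boolean);  // drop empty strings left by leading/trailing whitespace
}

// Hypothetical helper: return copies of the records with a `tokens` array added,
// so a search index can treat each word as a separate value.
function addTokens(list, field) {
  return list.map((item) => ({ ...item, tokens: tokenize(item[field]) }));
}

const books = [
  { title: "The Lock Artist" },
  { title: "Old Man's War" },
];

const prepared = addTokens(books, "title");
console.log(prepared[0].tokens.join(" | ")); // the | lock | artist
```

The resulting array could then be searched through something like `keys: ["tokens"]`, so each word is a separate value to match against. A real tokenizer might also strip punctuation or stop words.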

But before I even think about that, I'd love to learn more from the users:

  1. Does the option make sense to have, especially for fuzzy searching?
  2. Do people actually use the option? Looking at the analytics from fusejs.io, it's one of the least used options.

Thanks!

@petemolinero

petemolinero commented Mar 9, 2020

What do you mean by "filters" exactly? I was just looking through the fzf documentation and don't see that exact terminology used. Do you mean the enhanced search syntax, such as `^music .mp3$ sbtrkt !fire`? If so, I think that would be great!
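For reference, fzf's extended syntax treats each whitespace-separated term as its own condition, and all of them must hold. In that example, `^music` is a prefix match, `.mp3$` a suffix match, `sbtrkt` a fuzzy match and `!fire` an inverse exact match. The following is a simplified, self-contained matcher for that syntax. It assumes case-insensitive matching and uses an in-order character (subsequence) check as the fuzzy step. It does not reproduce how fzf or Fuse.js actually score results, and it skips the `|` OR operator and the `^...$` whole-string form. `parseQuery` and `matchesQuery` are hypothetical names.

```javascript
// Simplified sketch of fzf-style extended search; not fzf's or Fuse.js's code.
// Each whitespace-separated term is one condition, and all must hold (AND).

// Fuzzy stand-in: every query character appears in order in the text.
function isSubsequence(needle, haystack) {
  let i = 0;
  for (const ch of haystack) {
    if (i < needle.length && ch === needle[i]) i++;
  }
  return i === needle.length;
}

// Classify one term: "^x" prefix, "x$" suffix, "'x" exact substring, "!x" negated.
function parseTerm(term) {
  let inverse = false;
  if (term.startsWith("!")) {
    inverse = true;
    term = term.slice(1);
  }
  if (term.startsWith("^")) return { type: "prefix", text: term.slice(1), inverse };
  if (term.endsWith("$")) return { type: "suffix", text: term.slice(0, -1), inverse };
  if (term.startsWith("'")) return { type: "exact", text: term.slice(1), inverse };
  // A bare negated term ("!fire") is an inverse exact match, as in fzf.
  return { type: inverse ? "exact" : "fuzzy", text: term, inverse };
}

function parseQuery(query) {
  return query.trim().split(/\s+/).filter(Boolean).map(parseTerm);
}

// Test one parsed term against a candidate string (case-insensitive).
function termMatches(candidate, { type, text, inverse }) {
  const s = candidate.toLowerCase();
  const q = text.toLowerCase();
  let hit;
  if (type === "prefix") hit = s.startsWith(q);
  else if (type === "suffix") hit = s.endsWith(q);
  else if (type === "exact") hit = s.includes(q);
  else hit = isSubsequence(q, s);
  return inverse ? !hit : hit;
}

function matchesQuery(candidate, query) {
  return parseQuery(query).every((term) => termMatches(candidate, term));
}

const files = [
  "music/sbtrkt - wildfire.mp3",
  "music/sbtrkt - hold on.mp3",
  "music/sbtrkt - hold on.flac",
];
// Keeps only "music/sbtrkt - hold on.mp3": the others end in .flac or contain "fire".
console.log(files.filter((f) => matchesQuery(f, "^music .mp3$ sbtrkt !fire")));
```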

Honestly, I don't find myself using the tokenize setting much. Allowing every word to match individually returns too broad of results. What I would love to have would be a native way to separate tokens by commas. That way I (or the user) could keep the strictness of some tokens while still specifying multiple tokens (e.g. "John Smith, Instructor" would combine the results for "John Smith" and "Instructor", and would not match text containing only "John"). Currently the only way I know of to do that kind of search is to run two separate searches and combine/sort the results. That's okay, but when thinking about tokenization, it seems like a pretty common thing to want it to work this way.
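The comma-separated behaviour described above amounts to running one search per phrase and merging the hits, keeping each item's best score. This is a minimal sketch of that merge. It assumes a `search(phrase)` function that returns `{ item, score }` objects where a lower score is better, which is how Fuse.js reports scores when `includeScore` is enabled. `searchCommaSeparated` and `toySearch` are hypothetical. `toySearch` is only a whole-phrase substring stand-in for a real fuzzy search.

```javascript
// Hypothetical: `search(phrase)` returns [{ item, score }], lower score = better
// (Fuse.js uses this convention with includeScore: true; 0 is a perfect match).
function searchCommaSeparated(query, search) {
  const phrases = query.split(",").map((p) => p.trim()).filter(Boolean);
  const best = new Map(); // item -> lowest score seen for it across all phrases
  for (const phrase of phrases) {
    for (const { item, score } of search(phrase)) {
      if (!best.has(item) || score < best.get(item)) best.set(item, score);
    }
  }
  // Union of all phrase results, best match first.
  return [...best.entries()]
    .map(([item, score]) => ({ item, score }))
    .sort((a, b) => a.score - b.score);
}

// Toy stand-in for a real search: whole-phrase, case-insensitive substring
// match, scored by how early the phrase appears (0 = at the very start).
function toySearch(list) {
  return (phrase) =>
    list
      .map((item) => ({ item, pos: item.toLowerCase().indexOf(phrase.toLowerCase()) }))
      .filter(({ pos }) => pos >= 0)
      .map(({ item, pos }) => ({ item, score: pos / item.length }));
}

const people = ["John Smith, Instructor", "John Doe", "Jane Roe, Instructor"];
const results = searchCommaSeparated("John Smith, Instructor", toySearch(people));
// "John Doe" is left out: it contains "John" but neither full phrase.
console.log(results.map((r) => r.item));
```

Each phrase stays strict on its own, while the commas widen the overall result set. That is the trade-off the comment asks for.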

@krisk
Owner Author

krisk commented Mar 9, 2020

Yes, I mean "enhanced search syntax". Couldn't think of a word for it 😄

> Allowing every word to match individually returns too broad of results

Agreed.

> What I would love to have would be a native way to separate tokens by commas.

Interesting feature!

@ralf57

ralf57 commented Mar 10, 2020

@krisk I am possibly one of the few people using tokenization, but I find it very useful, especially in conjunction with tags.

@krisk
Owner Author

krisk commented Mar 11, 2020

@ralf57 - makes sense. I do wonder, though, whether for the tags use case you could solve it by storing the tags as an array in the list:

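```javascript
import Fuse from "fuse.js";

// Store each item's tags as an array so every tag is matched on its own.
const list = [{
  name: "something",
  tags: ["tag1", "tag2", "tag3"]
}];

// Fuse.js expects the searchable fields under `keys` (plural).
const fuse = new Fuse(list, {
  keys: ["tags"]
});
```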

@ralf57

ralf57 commented Mar 11, 2020

@krisk I have a very similar data structure, with tags as a simple array, and tokenization really makes a difference with multi-word queries.
The only drawback is that it's no longer possible to control location, distance and threshold.
But I somehow overlooked https://fusejs.io/#extended-search so I will check that out too.

@danielfdickinson

I'm perfectly happy to see tokenization go by the wayside. I think it results in a lot of not-so-helpful hits when used in conjunction with fuzzy search. OTOH, having a way to search things like arrays of tags with tokenization while at the same time searching 'content' fields (regular text) with fuzzy search would be 'nice to have'. Not a priority though.
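Here is one way to get that combination, sketched under stated assumptions. The query is split into words, and each word is compared exactly (case-insensitively) against the tags. The `content` field gets only a loose fuzzy check. `hybridSearch` and the `{ title, tags, content }` document shape are hypothetical. The fuzzy step is a simple in-order character check standing in for a real fuzzy scorer such as Fuse.js's.

```javascript
// Fuzzy stand-in: every query character appears in order in the text.
function isSubsequence(needle, haystack) {
  let i = 0;
  for (const ch of haystack) {
    if (i < needle.length && ch === needle[i]) i++;
  }
  return i === needle.length;
}

// Hypothetical hybrid search over docs shaped like { title, tags, content }.
function hybridSearch(docs, query) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  const squashed = words.join(""); // query without spaces, for the fuzzy check
  return docs
    .map((doc) => {
      const tags = new Set(doc.tags.map((t) => t.toLowerCase()));
      // Tokenized side: count query words that equal a tag exactly.
      const tagHits = words.filter((w) => tags.has(w)).length;
      // Fuzzy side: the query's characters appear in order in the content.
      const contentHit = squashed.length > 0 && isSubsequence(squashed, doc.content.toLowerCase());
      return { doc, tagHits, contentHit };
    })
    .filter((r) => r.tagHits > 0 || r.contentHit)
    // Exact tag matches rank first; fuzzy content matches fill in below.
    .sort((a, b) => b.tagHits - a.tagHits || Number(b.contentHit) - Number(a.contentHit));
}

const docs = [
  { title: "A", tags: ["javascript", "search"], content: "Notes on indexing." },
  { title: "B", tags: ["css"], content: "A fuzzy search walkthrough." },
  { title: "C", tags: ["rust"], content: "Nothing relevant here." },
];
// A comes first (tag "search"), then B (fuzzy match in its content).
console.log(hybridSearch(docs, "fuzzy search").map((r) => r.doc.title));
```

Tag hits are ranked ahead of content hits, so the precise tag matches stay at the top. The broader fuzzy matches only fill in below them.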

@krisk
Owner Author

krisk commented Mar 15, 2020

@cshoredaniel Yes, I think I’ll annihilate it. One good thing is that you can still search individual tokens via extended search, which I think is much more powerful and produces better results. Its one disadvantage is that the query syntax may not be intuitive to an everyday user.
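Since extended search is the suggested replacement, an app could soften the usability concern by building the extended query for its users. This sketch relies on the operators described in the Fuse.js extended-search documentation: whitespace as AND, a single `|` as OR, and a leading `'` for an include match. Those operators require `useExtendedSearch: true`, and it is worth checking that your Fuse.js version supports `|`. `toAnyWordQuery` is a hypothetical helper. It roughly mimics the old any-token behaviour by OR-ing the user's words.

```javascript
// Hypothetical helper: build a Fuse.js extended-search string that matches
// items containing ANY of the user's words (words joined with the "|" OR operator).
function toAnyWordQuery(input, { fuzzy = false } = {}) {
  return input
    .trim()
    .split(/\s+/)
    // Drop operator characters the user typed so they can't change the meaning.
    .map((w) => w.replace(/\|/g, "").replace(/^[=!^']+/, "").replace(/\$+$/, ""))
    .filter(Boolean)
    // "'word" is an include match; a bare word is a fuzzy match.
    .map((w) => (fuzzy ? w : `'${w}`))
    .join(" | ");
}

console.log(toAnyWordQuery("tag1 tag2")); // 'tag1 | 'tag2
console.log(toAnyWordQuery("  ^tag1 !tag2$ ", { fuzzy: true })); // tag1 | tag2
```

The returned string would then be passed unchanged to `fuse.search(...)` on an instance created with extended search enabled.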

4 participants