[RFC] Been thinking about removing Tokenization #355

krisk · 2020-03-09T17:24:58Z

I've been thinking of removing the tokenization option, mainly because (1) it makes searching and the setup a little counter-intuitive, (2) tokenization could be done by preprocessing the list, and (3) I'm working on adding the ability to use filters, like in fzf, which could possibly make this redundant.

But before I even think about that, I'd love to learn more from the users:

Does the option make sense to have, especially for fuzzy searching?
Do people actually use the option? Looking at the analytics from fusejs.io, it's one of the least used options.

Thanks!
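
Point (2) above, tokenizing by preprocessing the list, could look roughly like the sketch below. The helper name `addTokens` and the `tokens` key are hypothetical and not part of Fuse.js; this is just one way to split a field into words before indexing.

```javascript
// Hypothetical helper (not part of Fuse.js): split one field of each record
// into lowercase word tokens so the tokens can be indexed as their own key.
function addTokens(list, field) {
  return list.map((item) => ({
    ...item,
    tokens: String(item[field] ?? "")
      .toLowerCase()
      .split(/\s+/)
      .filter(Boolean),
  }));
}

const books = [
  { title: "Old Man's War" },
  { title: "The Lock Artist" },
];

const prepared = addTokens(books, "title");
// prepared[0].tokens is ["old", "man's", "war"]

// The prepared list could then be indexed on the new key, for example:
// const fuse = new Fuse(prepared, { keys: ["tokens"] });
```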

petemolinero · 2020-03-09T17:47:01Z

What do you mean by "filters" exactly? I just looked through the fzf documentation and don't see that exact terminology used. Do you mean the enhanced search syntax, such as ^music .mp3$ sbtrkt !fire? If so, I think that would be great!

Honestly, I don't find myself using the tokenize setting much. Allowing every word to match individually returns too broad of results. What I would love to have would be a native way to separate tokens by commas. That way I (or the user) could keep the strictness of some tokens while still specifying multiple tokens (e.g. "John Smith, Instructor" combines the results for "John Smith" and "Instructor", and doesn't match text containing only "John"). Currently the only way I know of to do that kind of search is to run two separate searches and combine/sort the results. That is okay, but when thinking about tokenization, it seems like a pretty common desire to have it work like this.
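
The "separate searches, then combine/sort" workaround could be sketched as follows. `searchByCommas` is a hypothetical helper, and it assumes the search function returns `{ item, score }` objects where a lower score is a better match (the shape Fuse produces with `includeScore: true`). A stub search stands in for `fuse.search` so the example runs on its own.

```javascript
// Hypothetical helper: run one search per comma-separated phrase and merge the
// hits, keeping the best (lowest) score for any item matched more than once.
// `search` maps a phrase to an array of { item, score } objects.
function searchByCommas(search, query) {
  const best = new Map(); // keyed by the item object itself
  const phrases = query.split(",").map((p) => p.trim()).filter(Boolean);
  for (const phrase of phrases) {
    for (const hit of search(phrase)) {
      const seen = best.get(hit.item);
      if (!seen || hit.score < seen.score) best.set(hit.item, hit);
    }
  }
  return [...best.values()].sort((a, b) => a.score - b.score);
}

const people = [
  { name: "John Smith", role: "Student" },
  { name: "Jane Doe", role: "Instructor" },
  { name: "John Jones", role: "Janitor" },
];

// Stand-in for fuse.search: case-insensitive substring match with score 0.
const stubSearch = (phrase) =>
  people
    .filter((p) => `${p.name} ${p.role}`.toLowerCase().includes(phrase.toLowerCase()))
    .map((item) => ({ item, score: 0 }));

const hits = searchByCommas(stubSearch, "John Smith, Instructor");
// hits contains John Smith and Jane Doe, but not John Jones
```

With a real Fuse instance, the first argument would be something like `(phrase) => fuse.search(phrase)`.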

krisk · 2020-03-09T18:02:49Z

Yes, I mean "enhanced search syntax". Couldn't think of a word for it 😄

> Allowing every word to match individually returns too broad of results

Agreed.

> What I would love to have would be a native way to separate tokens by commas.

Interesting feature!

ralf57 · 2020-03-10T13:33:57Z

@krisk I am possibly one of the few people using tokenization but I find it very useful, especially in conjunction with tags.

krisk · 2020-03-11T01:57:59Z

@ralf57 - makes sense. I do wonder whether for the tags use-case you could solve it by having the tags as an array in the list though:

const list = [{
  name: "something",
  tags: ["tag1", "tag2", "tag3"]
}]

// Search only within the tags array
const fuse = new Fuse(list, {
  keys: ["tags"]
})

ralf57 · 2020-03-11T05:54:02Z

@krisk I have a very similar data structure, with tags as a simple array, and tokenization really makes a difference with multi-word queries.
The only drawback is that it's no longer possible to control location, distance and threshold.
But I somehow overlooked https://fusejs.io/#extended-search so I will check that out too.

danielfdickinson · 2020-03-14T19:34:09Z

I'm perfectly happy to see tokenization go by the wayside. I think it results in a lot of not so helpful hits when used in conjunction with fuzzy search. OTOH having a way to search things like arrays of tags with tokenization while at the same time searching 'content' fields (regular text) with fuzzy search would be 'nice to have'. Not a priority though.

krisk · 2020-03-15T04:48:39Z

@cshoredaniel yes I think I’ll annihilate it. One good thing is that you could still search individual tokens via the use of extended searching. I think it’s much more powerful and produces better results. Its one disadvantage is that the search query may not be too intuitive to an everyday user.
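
As a rough illustration of searching individual tokens through extended search: the sketch below builds a pattern in which every word is an include-match term (`'word`) joined by OR (`|`). `toAnyTokenPattern` is a hypothetical helper, and the operator syntax is assumed from the Fuse.js extended-search documentation; it may differ in the v4.0.0 beta discussed here.

```javascript
// Hypothetical helper: turn a free-text query into an extended-search pattern
// where each word is an include-match term ('word) and terms are OR-ed (|).
function toAnyTokenPattern(query) {
  return query
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => `'${word}`)
    .join(" | ");
}

const pattern = toAnyTokenPattern("old  war");
// pattern is "'old | 'war"

// Assumed usage with a Fuse instance created with extended search enabled:
// const fuse = new Fuse(list, { keys: ["title"], useExtendedSearch: true });
// fuse.search(pattern);
```

Building the pattern in code keeps the operator syntax away from everyday users, who can keep typing plain words.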

krisk added the bug and discussion labels and removed the bug label on Mar 9, 2020

krisk mentioned this issue Mar 11, 2020

[RFC] Extended searching in v4.0.0-beta #356

Closed

krisk closed this as completed Mar 18, 2020

mfranzke mentioned this issue May 13, 2021

refactor: updated even further dependencies pattern-lab/patternlab-node#1320

Merged

78 tasks
