Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

Added support for fuzzy word search #1037

Open
wants to merge 2 commits into from

8 participants

@eliasdorneles

Implemented "fuzzy" word search as presented in issue #858

@RafiKueng RafiKueng referenced this pull request in RafiKueng/SpaghettiLens
Closed

add fuzzy word search for lens selection #34

README.md
@@ -1,5 +1,8 @@
# Chosen
+**Note:** This is an experimental fork implementing "fuzzy" word search (as introduced in [this issue](https://github.com/harvesthq/chosen/issues/858)).
+For the original Chosen plugin, check out https://github.com/harvesthq/chosen.
@stof Collaborator
stof added a note

This shoud be removed

done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Makefile
@@ -0,0 +1,4 @@
+# simple makefile, because CoffeeScript cake isn't working on my system :/
+all:
+ cake build
+ uglifyjs -o chosen/chosen.jquery.min.js chosen/chosen.jquery.js
@stof Collaborator
stof added a note

cake build is already minifying

yeah, it isn't working in my system for a reason.
I'll remove it, in order to get the changesapplied.

done removed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@stof
Collaborator

the Prototype version should be updated as well. Both implementations should be kept in sync feature-wise

@eliasdorneles

@stof hey man, I've just updated the code with your recommendations! :)

@p-j

I tried your jquery file and on many case I had <em> parts appearing randomly in the text. especially when I erased part of my search. You may want to strip <em>s at the beginning of the function and add them back when you do the highlighting.

@eliasdorneles

@p-j hey, man! I'm not being able to reproduce your problem. Could you provide an example?

@p-j

@eliasdorneles, I whas not able to repro with your script only. Sorry for the false alarm.
I'm actually unable to repro in the exact same condition where yesterday I was 100% of the time.. I don't get it...

@eliasdorneles

@stof hey, man! Is there something else that is missing for this to go upstream?

@Olical

Just thought I should mention that I've been using this branch for a while now and it works really well. I, personally, haven't noticed anything wrong with it. It would be nice to have this in the core repository although I would imagine it would have to be behind a flag for backwards compatibility.

@pfiller pfiller referenced this pull request
Closed

Advanced search #1205

@pavlovt

This version is working great!
I think in some cases chosen is not usable without fuzzy search.
Please add it to the master branch!

@tjschuck
Owner

I can't say when or if this would be merged, but can you please at least a.) rebase against master, and b.) squash this into one (or a few) logical commit(s) to remove a lot of the leftover dev-cycle cruft in the commits? Things like 5e809cb and d806b34 and d6f3a81 should be squashed out of the history.

Directions here if you're unfamiliar.

@eliasdorneles

@tjschuck thank you for the directions -- I'm not familiar with rebase yet, I'll give a try.

@eliasdorneles

@tjschuck well, I was able to squash it into one commit only -- that should make the revision a bit easier.
I'm not sure if I'm the right person to merge them into the current master, as there has been a lot of new commits in the recent months and I haven't had time to keep up.

@tjschuck
Owner

@eliasdorneles If you rebase against master, you should then be up to date with what's happened there. Looking at https://github.com/eliasdorneles/chosen/commits/master, it appears that you're still quite a bit behind.

@eliasdorneles

Well, I meant to say that it was not that simple to rebase because the upstream code had several conflicting changes that were made after my initial pull request, and I had't been able to keep up.
So, today I redid the work against the current code base and cleaned up the code a little bit, I believe now it is easier to review it. :)

@eliasdorneles

Can someone please take a second look at this patch?
Or otherwise, le me know if there is a way I could package it so people can use it as an extension of vanilla Chosen?
Thanks, guys!

coffee/lib/abstract-chosen.coffee
((13 lines not shown))
- return true
+ search_string_match: (search_string, words) ->
+ for word in words
+ if search_string.toLowerCase().indexOf(word) < 0
+ return false
+ return true
+
+ highlight_search_text: (text, words) ->
+ # sort the query words to highlight the longest first
+ words.sort (a, b) -> b.length - a.length
+ highlight_offset = 10 # 1 + '<em></em>'.length
+ for word in words
+ startpos = text.toLowerCase().indexOf word
+ while startpos >= 0
+ text = text.substr(0, startpos) + '<em>' + text.substr(startpos, word.length) + '</em>' + text.substr(startpos + word.length)
+ startpos = text.toLowerCase().indexOf(word, startpos + highlight_offset)
@stof Collaborator
stof added a note

shouldn't you use startpos + highlight_offset + word.length to start matching only at the end of the occurrence we just found ?

The problem with that is that it would require the word parts to be in the same order. You see, it supports queries like "state uni" giving as results "United States" and also "Florida State University" -- this is useful when the user is not much familiar with the option names.

@stof Collaborator
stof added a note

this startpos defined on this line is about searching the same word again later in the text, not about searching the next word.

oh, you're absolutely right -- sorry for my confusion! That makes sense, it should add word.length to get the next match.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@stof
Collaborator

I think the highlighting logic is flawed if you are searching for foo foobar as highlighting foo will find places already highlighted for foobar.
And I think it gets even worse when searching for r</e foobar (and as the search string comes from the user, there is no guarantee about what you get).

coffee/lib/abstract-chosen.coffee
@@ -130,10 +130,7 @@ class AbstractChosen
results = 0
searchText = this.get_search_text()
- escapedSearchText = searchText.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&")
- regexAnchor = if @search_contains then "" else "^"
- regex = new RegExp(regexAnchor + escapedSearchText, 'i')
- zregex = new RegExp(escapedSearchText, 'i')
+ words = searchText.toLowerCase().split(' ')
@stof Collaborator
stof added a note

You need to filter empty strings out of the list of words (in case of several consecutive spaces) to avoid weird stuff in the highlighting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
coffee/lib/abstract-chosen.coffee
((8 lines not shown))
- #TODO: replace this substitution of /\[\]/ with a list of characters to skip.
- parts = search_string.replace(/\[|\]/g, "").split(" ")
- if parts.length
- for part in parts
- if regex.test part
- return true
+ search_string_match: (search_string, words) ->
+ for word in words
+ if search_string.toLowerCase().indexOf(word) < 0
+ return false
+ return true
+
+ highlight_search_text: (text, words) ->
+ # sort the query words to highlight the longest first
+ words.sort (a, b) -> b.length - a.length
+ highlight_offset = 10 # 1 + '<em></em>'.length
@stof Collaborator
stof added a note

From a semantic PoV, the markup used to highlight the word should not be <em>word</em> but <mark>word</mark>: http://www.whatwg.org/specs/web-apps/current-work/multipage/text-level-semantics.html#the-mark-element

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@stof stof referenced this pull request
Open

Use word boundary matching #1483

0 of 3 tasks complete
@eliasdorneles

@stof thanks for the review, man -- I'll try to fix these.
I'm not sure about the problem with the search for r</e foobar, because that string only gets replaced if it matches with the option label. Considering this, do you think it is a problem still?

@eliasdorneles

@stof hey, man! I've added a new commit fixing the stuff you pointed out.

Also, I've tested the behavior of the highlighting with the search queries you mentioned and they seemed to work fine.
I've added to my local tests an option like this:

<option value="foobar">r&lt;/mark&gt; foo foobar</option>

What I found was:

  1. Searching for </mark> does not causes an issue because get_search_text() returns the search text already escaped. :)
  2. Searching for foobar foo ends up highlighting foo inside foobar indeed, but the nested <mark> tags do not affect the behavior.

If you really think it's worthwhile, I can try coming up with some algorithm to avoid the nested highlighting but as it doesn't affect the behavior, I hesitate to add more code to this with potentially more bugs -- it is simpler as it is.

@koenpunt
Collaborator

I tried implementing fuzzy search using a single regex, which works but then its not possible to highlight the matches.
Anyway, I do think we're going to need unit tests for this functionality. Maybe you can use the tests in my branch as a start: https://github.com/koenpunt/chosen/commits/jasmine-tests

@stof
Collaborator

@koenpunt Before adding specs, it would be easier to merge the PR adding the runner first so that they can be used

@stringfellow

bump - what's up with this? Over a year old, lots of people want fuzzy search done right - will it appear in chosen master do you think, folks?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Aug 17, 2013
  1. @eliasdorneles
Commits on Sep 4, 2013
  1. @eliasdorneles
This page is out of date. Refresh to see the latest.
Showing with 26 additions and 18 deletions.
  1. +26 −18 coffee/lib/abstract-chosen.coffee
View
44 coffee/lib/abstract-chosen.coffee
@@ -130,10 +130,7 @@ class AbstractChosen
results = 0
searchText = this.get_search_text()
- escapedSearchText = searchText.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&")
- regexAnchor = if @search_contains then "" else "^"
- regex = new RegExp(regexAnchor + escapedSearchText, 'i')
- zregex = new RegExp(escapedSearchText, 'i')
+ words = this.split_search_text(searchText)
for option in @results_data
@@ -154,14 +151,12 @@ class AbstractChosen
unless option.group and not @group_search
option.search_text = if option.group then option.label else option.html
- option.search_match = this.search_string_match(option.search_text, regex)
+ option.search_match = this.search_string_match(option.search_text, words)
results += 1 if option.search_match and not option.group
if option.search_match
if searchText.length
- startpos = option.search_text.search zregex
- text = option.search_text.substr(0, startpos + searchText.length) + '</em>' + option.search_text.substr(startpos + searchText.length)
- option.search_text = text.substr(0, startpos) + '<em>' + text.substr(startpos)
+ option.search_text = this.highlight_search_text(option.search_text, words)
results_group.group_match = true if results_group?
@@ -177,16 +172,29 @@ class AbstractChosen
this.update_results_content this.results_option_build()
this.winnow_results_set_highlight()
- search_string_match: (search_string, regex) ->
- if regex.test search_string
- return true
- else if @enable_split_word_search and (search_string.indexOf(" ") >= 0 or search_string.indexOf("[") == 0)
- #TODO: replace this substitution of /\[\]/ with a list of characters to skip.
- parts = search_string.replace(/\[|\]/g, "").split(" ")
- if parts.length
- for part in parts
- if regex.test part
- return true
+ search_string_match: (search_string, words) ->
+ for word in words
+ if search_string.toLowerCase().indexOf(word) < 0
+ return false
+ return true
+
+ split_search_text: (text) ->
+ words = []
+ for word in text.toLowerCase().split(' ')
+ if word.trim() != ""
+ words.push(word)
+ return words
+
+ highlight_search_text: (text, words) ->
+ # sort the query words to highlight the longest first
+ words.sort (a, b) -> b.length - a.length
+ highlight_offset = 14 # 1 + '<mark></mark>'.length
+ for word in words
+ startpos = text.toLowerCase().indexOf word
+ while startpos >= 0
+ text = text.substr(0, startpos) + '<mark>' + text.substr(startpos, word.length) + '</mark>' + text.substr(startpos + word.length)
+ startpos = text.toLowerCase().indexOf(word, startpos + highlight_offset + word.length)
+ return text
choices_count: ->
return @selected_option_count if @selected_option_count?
Something went wrong with that request. Please try again.