Google's "Did you mean" feature is very useful. Would be awesome if ES could implement this.
Lucene has pulled in the SpellChecker contrib. Maybe ES could expose that?
Ex. if I specify suggestSimilar with some optional parameters in my search object I could get back an array with some suggestions.
You can implement this yourself by keeping an index of search terms, probably analyzed with ngrams, and then sorting the matches by popularity.
Can you give an example?
something like this: http://sujitpal.blogspot.com/2007/12/spelling-checker-with-lucene.html
But I also see that Lucene has pulled in the SpellChecker contrib: http://lucene.apache.org/java/3_1_0/api/all/org/apache/lucene/search/spell/SpellChecker.html so I guess ES could expose that.
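For anyone wondering what the "search term index with ngrams, sorted by popularity" idea could look like, here is a minimal in-memory sketch in Python. It is not the approach from the linked article and not anything ES or Lucene ships. The class `SearchTermIndex`, the trigram size, and the `min_overlap` threshold are all made up for illustration.

```python
from collections import Counter, defaultdict


def ngrams(term, n=3):
    """Return the set of character n-grams of a term, padded at both ends."""
    padded = "^" + term.lower() + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


class SearchTermIndex:
    """Toy index of past search terms, keyed by n-gram (hypothetical helper)."""

    def __init__(self):
        self.popularity = Counter()      # term -> how often it was searched
        self.by_gram = defaultdict(set)  # n-gram -> terms containing it

    def record(self, term):
        """Register one search for `term`."""
        term = term.lower()
        self.popularity[term] += 1
        for gram in ngrams(term):
            self.by_gram[gram].add(term)

    def suggest(self, query, min_overlap=0.4, limit=3):
        """Return similar known terms, most popular first."""
        query = query.lower()
        grams = ngrams(query)
        shared = Counter()
        for gram in grams:
            for term in self.by_gram.get(gram, ()):
                shared[term] += 1
        candidates = []
        for term, count in shared.items():
            if term == query:
                continue
            # Jaccard overlap between the two n-gram sets (assumed threshold).
            overlap = count / len(grams | ngrams(term))
            if overlap >= min_overlap:
                candidates.append((term, overlap))
        # Popularity first, then n-gram overlap, then the term for stable order.
        candidates.sort(key=lambda c: (-self.popularity[c[0]], -c[1], c[0]))
        return [term for term, _ in candidates[:limit]]


index = SearchTermIndex()
for term in ["elasticsearch"] * 3 + ["elastic"] + ["lucene"] * 2:
    index.record(term)
print(index.suggest("elasticsaerch"))
```

In a real setup the ngram lookup would be an ES index with an ngram analyzer and the popularity would be a stored counter, but the ranking idea is the same.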
@keteracel Read the article you linked. Looks interesting, but is probably more than I can handle at the moment. I really think something as useful as this should be in ES by default. I've updated the issue with a better description.
The current spell checker requires building an auxiliary index in order to support it (and moreover, requires reindexing the data periodically). In Lucene 4.0, since fuzzy queries are much faster, spell checking can be done on the main index. So, the logic is that it makes little sense to incorporate a feature that is currently quite heavyweight, rather than simply waiting and implementing it easily once 4.0 is out.
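To illustrate what "spell checking on the main index" means, here is a rough Python sketch: take the terms that are already in the index (with their document frequencies) and look for ones within a small edit distance of the misspelled word. This is a brute-force scan for illustration only, not how Lucene implements it. The function names, the `doc_freq` dictionary, and the `max_edits` default are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest_from_terms(word, doc_freq, max_edits=2, limit=3):
    """Suggest indexed terms close to `word`.

    `doc_freq` maps each term of the (hypothetical) main index to the
    number of documents containing it. No auxiliary index is needed.
    """
    candidates = []
    for term, freq in doc_freq.items():
        if term == word:
            continue
        distance = edit_distance(word, term)
        if distance <= max_edits:
            candidates.append((distance, -freq, term))
    # Closest terms first; ties broken by higher document frequency.
    candidates.sort()
    return [term for _, _, term in candidates[:limit]]


doc_freq = {"search": 120, "serach": 1, "screech": 4, "lucene": 50}
print(suggest_from_terms("seach", doc_freq))
```

The point is that the suggestions come straight from the terms that are already indexed, so there is nothing extra to build or to keep in sync.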
Agreed, that's the best solution. Any idea when 4.0 will be out?
No, no due date yet. It seems like the pace is being picked up towards a release, but it will take a few months I think.
Ok, thanks ;) Looking forward to it.
We would very much like this feature too.
Is there any news on this? Tired of running around with ASpell :(
We would like to use spellchecker too. Thank you.
Apologies for the +1, but this is way up my wishlist too.
Yep, me too! +1
This would be an awesome feature, for an already awesome product! Thank you so much :)
ping @kimchy It's been almost a year! :) Any status on this? Tonnnnns of +1's up in here!
Guys, I think @kimchy gets it... we all want this. However, Lucene 4.0 hasn't been released yet, and last update from him mentioned that that release would make this feature much easier. Maybe we should be pressuring the Lucene team to hurry up? There's been talk of a 4.0 release forever.
@mhluongo, it is understood that it's a better "Lucene 4.0" feature, but there seem to be other options for spell checking, for example #646. A lot of open source software doesn't wait over a year for a feature that the community wants... A bridge could be built for searching, and when Lucene supports it directly, it can be BC to a temporary/secondary solution (e.g. hunspell). For example, the Symfony2 PHP framework builds functionality for PHP4.0 to get the minor optimization, but has a backup strategy for PHP versions of 3.x.
My two cents is that this is a huge feature in memory based searching... and would def. set elasticsearch apart from anything else out there right now.
Just my two cents IMO. :)
@jstout24 I know that waiting for Lucene 4 is just the path of least resistance, but there are a ton of other awesome features that we could use, as well, and that could be written/maintained in the time saved. At some point one of these +1's needs to start coding themselves if we want this feature, or be okay with waiting (I'm guilty of this too, obviously).
Just trying to be understanding of an embattled OSS developer :)
To people "+1"ing, take a look over here: https://issues.apache.org/jira/browse/LUCENE/fixforversion/12314025. That's the progress of Lucene 4.0.
Heya fellows, understood, this feature is highly important. The only thing that can be done currently (aside from other ways of solving it, like using a custom-built index with ngrams and the like) is to possibly write a plugin (and probably new extension points) for the current Lucene spell checking behavior. But it's not really good... (as I explained in my first comment here).
Sorry but i have to +1 this issue too ^^
But now that Lucene 4.0 is out, is it possible in any way, or do we need an implementation in ES?
Lucene 4.0 is not out, only the beta. Final release probably will not happen until October.
I think we all know now that many people are interested in this feature, can we stop with the +1 please ?
They serve little to no purpose and spam anyone who is watching this thread for real information.
4.0 is Out! :)
I agree with schmurfy, enough with +1s. If you want to subscribe to this issue, you can change your notification settings below. Look for the dropdown that says "Not watching thread" and change it to "Watch".
Shay commented on spellchecking and Lucene 4.0 last week. In case you missed it, here is the thread:
"The plan is to first get Lucene 4.0 integrated with elasticsearch, and then expose all the new features. We will take it feature by feature, but to your points, there will be a spellcheck builtin using the new "direct" spellcheck feature, you will be able to configure codecs in the mapping, and write a plugin that introduces new codecs, and so on..."
I'd particularly like to use it when it's deployed on StackOverflow
Seriously, can we stop with the +1? There is a watch thread button at the bottom of the page if you want to be notified of any changes here, and using it won't send a notification to anyone watching this.
See the suggest API added in 0.90
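For reference, a request using the term suggester looks roughly like the body built below. This is a sketch based on my reading of the suggest docs, so check the documentation for your version. The suggestion name `did_you_mean`, the field `title`, and the helper function are made up for illustration, and the code only builds and prints the JSON; it does not talk to a cluster.

```python
import json


def search_with_suggest(text, field, name="did_you_mean"):
    """Build a search body that also asks for term suggestions.

    `name` is an arbitrary label for the suggestion block and `field`
    is the (hypothetical) field whose indexed terms are used as candidates.
    """
    return {
        "query": {"match": {field: text}},
        "suggest": {
            name: {
                "text": text,
                "term": {"field": field},
            }
        },
    }


body = search_with_suggest("elasticsaerch", field="title")
print(json.dumps(body, indent=2))
```

The response should then carry a `suggest` section keyed by the same name, with candidate corrections for each token of the input text.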