"Did you mean" spellchecking #911

Closed
sindresorhus opened this Issue · 52 comments

36 participants

Sindre Sorhus keteracel Shay Banon richardsyeo Daniel Fort Jordan Stout voltaire-in Conrad Chu Matt Luongo tfreitas Florent Bécart alexis779 David Stendardi Adam Warski Julius Schorzman Bryan C Green Augusto Becciu Nick Dunn Mario Uher Sebastian Seilund dunkee Diego Plentz Gerard Matthew Kevin McBride and others
Sindre Sorhus

Google's "Did you mean" feature is very useful. Would be awesome if ES could implement this.

Lucene has pulled in the SpellChecker contrib. Maybe ES could expose that?

Ex. if I specify suggestSimilar with some optional parameters in my search object I could get back an array with some suggestions.

keteracel

you can implement this yourself by having a search term index, probably using ngram and then sorted by popularity.
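
One way to read the suggestion above is the rough sketch below. It is not an Elasticsearch or Lucene API, just a toy in-memory search term index in Python. The names `trigrams`, `SuggestionIndex`, and the `min_overlap` threshold are made up for illustration, and the popularity counts are assumed to come from your own query logs.

```python
from collections import defaultdict


def trigrams(term):
    """Return the set of character trigrams of a term, padded with '$' on both ends."""
    padded = f"$${term.lower()}$$"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class SuggestionIndex:
    """Toy search term index (hypothetical): trigram -> terms, plus a popularity count."""

    def __init__(self):
        self.popularity = defaultdict(int)
        self.by_gram = defaultdict(set)

    def add(self, term, count=1):
        """Record that `term` was searched `count` times."""
        term = term.lower()
        self.popularity[term] += count
        for gram in trigrams(term):
            self.by_gram[gram].add(term)

    def suggest(self, query, limit=3, min_overlap=0.3):
        """Return known terms that share enough trigrams with `query`, most popular first."""
        query = query.lower()
        grams = trigrams(query)
        candidates = set()
        for gram in grams:
            candidates |= self.by_gram.get(gram, set())
        scored = []
        for term in candidates:
            if term == query:
                continue
            term_grams = trigrams(term)
            # Jaccard overlap between the two trigram sets
            overlap = len(grams & term_grams) / len(grams | term_grams)
            if overlap >= min_overlap:
                scored.append((term, overlap))
        # Popularity decides the order; trigram overlap only breaks ties
        scored.sort(key=lambda pair: (-self.popularity[pair[0]], -pair[1]))
        return [term for term, _ in scored[:limit]]
```

You would call `add()` for every search term users enter, then call `suggest()` with a query that returned few or no hits. In a real setup the trigram lookup would probably live in an ES index with an ngram analyzer rather than in a Python dict.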

Sindre Sorhus

Can you give an example?

keteracel

But I also see that Lucene has pulled in the SpellChecker contrib: http://lucene.apache.org/java/3_1_0/api/all/org/apache/lucene/search/spell/SpellChecker.html so I guess ES could expose that.

Sindre Sorhus

@keteracel Read the article you linked. Looks interesting, but is probably more than I can handle at the moment. I really think something as useful as this should be in ES by default. I've updated the issue with a better description.

Shay Banon
Owner

The current spell checker requires building an auxiliary index in order to support it (and moreover, requires reindexing the data periodically). In Lucene 4.0, since fuzzy queries are much faster, spell checking can be done on the main index. So, the logic is that it makes little sense to incorporate a feature that is currently quite heavyweight, rather than simply waiting and implementing it easily once 4.0 is out.
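
To illustrate the idea of spell checking against the main index, here is a rough sketch in Python. It is not how Lucene 4.0 implements it (Lucene avoids scanning every term); it only shows the principle of looking up terms within a small edit distance and ranking them by how often they occur. `doc_freq` is a hypothetical mapping from indexed term to document frequency, and `direct_suggest` is a made-up name.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def direct_suggest(word, doc_freq, max_edits=2, limit=3):
    """Suggest terms from the main index's term dictionary.

    doc_freq is an assumption for this sketch: term -> number of documents
    containing it. No auxiliary index is built; the existing terms are used.
    """
    candidates = []
    for term, freq in doc_freq.items():
        if term == word:
            continue
        distance = edit_distance(word, term)
        if distance <= max_edits:
            # Closest terms first, more frequent terms first among equals
            candidates.append((distance, -freq, term))
    candidates.sort()
    return [term for _, _, term in candidates[:limit]]
```

The linear scan over all terms is the naive part; the point of waiting for faster fuzzy queries is that this lookup no longer needs a separately maintained index.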

Sindre Sorhus

Agreed, that's the best solution. Any idea when 4.0 will be out?

Shay Banon
Owner

No, no due date yet. It seems like the pace is being picked up towards a release, but it will take a few months I think.

Sindre Sorhus

Ok, thanks ;) Looking forward to it.

richardsyeo

We would very much like this feature too.

Daniel Fort

Hi.

Is there any news on this? Tired of running around with ASpell :(

Jordan Stout

+1

voltaire-in

We would like to use spellchecker too. Thank you.

Florent Bécart

+1

tfreitas

+1

Adam Warski

+1

Julius Schorzman

+1

Augusto Becciu

+1

Nick Dunn

Apologies for the +1, but this is way up my wishlist too.

Mario Uher

Yep, me too! +1

Sebastian Seilund

+1
This would be an awesome feature, for an already awesome product! Thank you so much :)

dunkee

+1

Diego Plentz

+1

Jordan Stout

ping @kimchy It's been almost a year! :) Any status on this? Tonnnnns of +1's up in here!

Matt Luongo

Guys, I think @kimchy gets it... we all want this. However, Lucene 4.0 hasn't been released yet, and last update from him mentioned that that release would make this feature much easier. Maybe we should be pressuring the Lucene team to hurry up? There's been talk of a 4.0 release forever.

Jordan Stout

@mhluongo, it is understood that this is a better fit for Lucene 4.0, but there seem to be other options for spell checking, for example #646. A lot of open source projects don't wait over a year for a feature that the community wants. A bridge could be built for now, and when Lucene supports it directly, it can stay backward compatible with the temporary/secondary solution (e.g. hunspell). For example, the Symfony2 PHP framework builds functionality for PHP4.0 to get the minor optimization, but has a backup strategy for PHP versions of 3.x.

My two cents is that this is a huge feature in memory based searching... and would def. set elasticsearch apart from anything else out there right now.

Just my two cents IMO. :)

Matt Luongo

@jstout24 I know that waiting for Lucene 4 is just the path of least resistance, but there are a ton of other awesome features that we could use, as well, and that could be written/maintained in the time saved. At some point one of these +1's needs to start coding themselves if we want this feature, or be okay with waiting (I'm guilty of this too, obviously).

Just trying to be understanding of an embattled OSS developer :)

Brad Beattie

To people "+1"ing, take a look over here: https://issues.apache.org/jira/browse/LUCENE/fixforversion/12314025. That's the progress of Lucene 4.0.

Shay Banon
Owner

Heya fellows, understood, this feature is highly important. The only thing that can be done currently (aside from other ways of solving it, like using a custom-built index using ngrams and the like) is to possibly write a plugin (and probably new extension points) for the current Lucene spell checking behavior. But it's not really good... (as I explained in my first comment here).

DeeJayPee

Hello,

Sorry, but I have to +1 this issue too ^^
But now that Lucene 4.0 is out, is it possible in any way, or do we need an implementation in ES?

Regards,

Ivan Brusic

Lucene 4.0 is not out, only the beta. Final release probably will not happen until October.

louman

+1

Michael Elfassy

+1

tfreitas

+1

Julien Ammous

I think we all know now that many people are interested in this feature, can we stop with the +1s please?
They serve little to no purpose and spam anyone who is watching this thread for real information.

kul

4.0 is Out! :)

Ivan Brusic

I agree with schmurfy, enough with +1s. If you want to subscribe to this issue, you can change your notification settings below. Look for the dropdown that says "Not watching thread" and change it to "Watch".

Shay commented on spellchecking and Lucene 4.0 last week. In case you missed it, here is the thread:
https://groups.google.com/d/topic/elasticsearch/p2mu0Tv3VPI/discussion

"The plan is to first get Lucene 4.0 integrated with elasticsearch, and then expose all the new features. We will take it feature by feature, but to your points, there will be a spellcheck builtin using the new "direct" spellcheck feature, you will be able to configure codecs in the mapping, and write a plugin that introduces new codecs, and so on..."

Bruno Bowden

+1
I'd particularly like to use it when it's deployed on StackOverflow

Julien Ammous

Seriously, can we stop with the +1s? There is a watch thread button at the bottom of the page if you want to be notified of any changes here, and using it won't send a notification to everyone watching this.

Clinton Gormley

See the suggest API added in 0.90
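
For anyone landing here, a minimal sketch of what a term suggester request might look like, written in Python so it only builds and reads JSON without touching a cluster. The request and response shapes follow the suggest API docs as I understand them, so check them against the docs for your version. The field name `body`, the suggestion name `did_you_mean`, and both helper functions are made up for illustration.

```python
import json


def term_suggest_body(text, field, name="did_you_mean"):
    """Build a search request body carrying one term suggestion.

    `name` is an arbitrary label chosen by the caller; `field` is the
    indexed field whose terms are used as the dictionary.
    """
    return {"suggest": {name: {"text": text, "term": {"field": field}}}}


def best_corrections(response, name="did_you_mean"):
    """Rebuild the query text using the first option offered for each token.

    Assumes the response holds, per token, an entry with the original
    "text" and a list of "options"; tokens without options are kept as typed.
    """
    corrected = []
    for entry in response.get("suggest", {}).get(name, []):
        options = entry.get("options", [])
        corrected.append(options[0]["text"] if options else entry["text"])
    return " ".join(corrected)


if __name__ == "__main__":
    # Print the JSON you would POST to the _search endpoint of your index
    print(json.dumps(term_suggest_body("elasticsaerch spellchek", "body"), indent=2))
```

If `best_corrections()` returns something different from what the user typed, that string is your "Did you mean" candidate.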
