Analysis: Synonym Token Filter #900

Closed
abtris opened this issue May 3, 2011 · 2 comments

Comments

@abtris
commented May 3, 2011

Add a synonym token filter. The synonym token filter can be configured in the following manner:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "synonym" : {
                    "tokenizer" : "whitespace",
                    "filter" : ["synonym"]
                }
            },
            "filter" : {
                "synonym" : {
                    "type" : "synonym",
                    "synonym_path" : "analysis/synonym.txt"
                }
            }
        }
    }
}

The above configures a synonym filter with a path of analysis/synonym.txt (relative to the config location). The synonym analyzer is then configured to use that filter. Additional settings are ignore_case (defaults to false) and expand (defaults to true).
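As a rough illustration of the two optional settings, the sketch below builds the same settings as a Python dict and prints them as JSON. It assumes that ignore_case and expand sit next to type in the filter definition. The key names and path are copied from the issue text as written, and the non-default values are chosen only to show where the settings would go.

```python
import json

# Filter definition from the issue, with the optional settings spelled out.
# Assumption: ignore_case and expand are placed alongside "type".
filter_settings = {
    "type": "synonym",
    "synonym_path": "analysis/synonym.txt",  # key and path as written in the issue
    "ignore_case": True,   # illustrative value; the stated default is false
    "expand": False,       # illustrative value; the stated default is true
}

settings = {
    "index": {
        "analysis": {
            "analyzer": {
                "synonym": {
                    "tokenizer": "whitespace",
                    "filter": ["synonym"],
                }
            },
            "filter": {"synonym": filter_settings},
        }
    }
}

# json.dumps turns Python True/False into JSON true/false.
print(json.dumps(settings, indent=4))
```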

Here is a sample of the file format (the same format Solr uses):

# blank lines and lines starting with pound are comments.

#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS.  These types of mappings
#ignore the expand parameter in the schema.
#Examples:
i-pod, i pod => ipod,
sea biscuit, sea biscit => seabiscuit

#Equivalent synonyms may be separated with commas and give
#no explicit mapping.  In this case the mapping behavior will
#be taken from the expand parameter in the schema.  This allows
#the same synonym file to be used in different synonym handling strategies.
#Examples:
ipod, i-pod, i pod
foozball , foosball
universe , cosmos

# If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
# If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
ipod, i-pod, i pod => ipod

#multiple synonym mapping entries are merged.
foo => foo bar
foo => baz
#is equivalent to
foo => foo bar, baz
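
The rules in the sample file can be sketched in a few lines of Python. This is only a simplified reading of the format, not the Lucene or Solr implementation: the function names parse_synonyms and _split are hypothetical, escaping is ignored, and phrases are kept as plain strings rather than token sequences.

```python
from collections import defaultdict


def _split(part):
    """Split a comma-separated side of a rule, dropping empty entries."""
    phrases = (" ".join(p.split()) for p in part.split(","))
    return [p for p in phrases if p]


def parse_synonyms(text, expand=True, ignore_case=False):
    """Parse synonym rules into {input phrase: [output phrases]}."""
    mapping = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and lines starting with '#' are comments.
        if not line or line.startswith("#"):
            continue
        if ignore_case:
            line = line.lower()
        if "=>" in line:
            # Explicit mapping: the expand setting is ignored.
            lhs, rhs = line.split("=>", 1)
            inputs, outputs = _split(lhs), _split(rhs)
        else:
            # Equivalent synonyms: behavior depends on expand.
            inputs = _split(line)
            outputs = inputs if expand else inputs[:1]
        # Multiple entries for the same input are merged.
        for phrase in inputs:
            for out in outputs:
                if out not in mapping[phrase]:
                    mapping[phrase].append(out)
    return dict(mapping)


print(parse_synonyms("foo => foo bar\nfoo => baz"))
# {'foo': ['foo bar', 'baz']}
```

With expand=False, a line such as "ipod, i-pod, i pod" maps every entry to the first one, which matches the explicit mapping shown in the sample file.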

Original Request

I'd like to see Synonym analyzer added into ElasticSearch. The Solr equivalent is the SynonymFilterFactory.

My example use case is searching for jobs:

When I search for "ruby developer", I'd also like documents matching "ruby programmer", "ruby engineer", or "ruby coder" to be returned.

When I search for "ZF developer", I'd also like documents matching "Zend Framework programmer" or "ZF programmer" to be returned.

The synonym list could be passed similarly to the StopAnalyzer.

Is something like this possible to do with a custom analyzer in ES currently?
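
For illustration, the job-search use case could be written as equivalent-synonym lines in the file format shown at the top of this issue. The short Python sketch below only generates such lines. The grouping is an assumption based on the examples in the request, to_synonym_lines is a hypothetical helper, and how a multi-word entry like "Zend Framework" behaves at search time depends on the analyzer and is not covered here.

```python
# Synonym groups inferred from the examples in the request (an assumption).
groups = [
    ["developer", "programmer", "engineer", "coder"],
    ["ZF", "Zend Framework"],
]


def to_synonym_lines(groups):
    """Render each group as one comma-separated equivalent-synonym line."""
    return [", ".join(group) for group in groups]


synonym_file = "\n".join(to_synonym_lines(groups)) + "\n"
print(synonym_file, end="")
# developer, programmer, engineer, coder
# ZF, Zend Framework
```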

@karmi

Member

commented May 3, 2011

+1 I'd enjoy synonym analyzer as well...

@rmuir

Contributor

commented May 6, 2011

this has been refactored into the lucene analysis module in 4.0... might be a good start, you just have to construct the synonym map somehow from however you want the input file to look:

http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/synonym/

@kimchy kimchy closed this in 15d8f0b May 10, 2011

ofavre pushed a commit to yakaz/elasticsearch that referenced this issue Jul 18, 2011
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015