Updated readme and version 1.0 released.
samekmichal committed Aug 1, 2013
1 parent 68fc9f9 commit 6a0b202
Showing 2 changed files with 50 additions and 26 deletions.
74 changes: 49 additions & 25 deletions README.md
@@ -15,6 +15,8 @@ This plugin provides analyzer `AnnotationAnalyzer` as well as filter
`InlineAnnotationFilter`.
`AnnotationAnalyzer` is composed of `WhitespaceTokenizer`, `LowerCaseFilter` and
`InlineAnnotationFilter` (with default settings).
More sophisticated analyzers (equivalent to StandardAnalyzer or SnowballAnalyzer)
can be configured via the configuration file `elasticsearch.yml` or the web API.


Example
@@ -24,48 +26,65 @@ Let's say we have this documents
"Mozart[artist] was born[lifeEvent] in Salzburg[city;Austria]"
```

If we parse this with a StandardAnalyzer equivalent that has annotation analysis added to it,
we get these tokens (some are omitted because of the StopFilter).
```
"Beethoven[artist] died in Vienna[city]"
| [austria]
[artist] | | [lifeevent] | | [city]
mozart | | born | | salzburg
```

could result in these token streams (depends on the tokenizers and other filters used)
If we use StandardAnalyzer the result would be
```
| <[austria]>
<[artist]> | | <[lifeEvent]> | | <[city]>
<mozart> | <was> | <born> | <in> | <salzburg>
mozart | artist | | born | lifeevent | | salzburg | city | austria
```


```
<[artist]> | | <[city]>
<beethoven> | <died> | <in> | <vienna>
```
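
The positional layout shown above can be sketched in a few lines of Python. This is only an illustration, not the plugin's Java implementation: the function `annotate` and its parameters are hypothetical, and the sketch assumes whitespace tokenization, lowercasing, and annotation tokens that keep their brackets.

```python
from typing import List, Tuple


def annotate(text: str, start: str = "[", end: str = "]",
             delimiter: str = ";") -> List[Tuple[str, int]]:
    """Return (token, position) pairs.

    Hypothetical helper: every annotation is emitted at the same
    position as the word it is attached to.
    """
    tokens: List[Tuple[str, int]] = []
    for position, raw in enumerate(text.split()):
        word, sep, rest = raw.partition(start)
        if sep and word and rest.endswith(end):
            # The plain word comes first ...
            tokens.append((word.lower(), position))
            # ... followed by each annotation label at the same position.
            body = rest[: -len(end)]
            for label in body.split(delimiter):
                if label:
                    tokens.append((start + label.lower() + end, position))
        else:
            # No well-formed annotation: keep the token as it is.
            tokens.append((raw.lower(), position))
    return tokens


for token, position in annotate(
        "Mozart[artist] was born[lifeEvent] in Salzburg[city;Austria]"):
    print(position, token)
```

Tokens that share a position number correspond to tokens stacked in the same column of the diagrams above. No stop words are removed in this sketch.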



Installation
------------
This plugin follows conventions for elasticsearch plugins, thus can be installed
in standard manner - see http://www.elasticsearch.org/guide/reference/modules/plugins/
in a standard manner - see http://www.elasticsearch.org/guide/reference/modules/plugins/


Using this plugin
-----------------
To use these custom analyzers/filters you need to modify the `elasticsearch.yml`
configuration file - see http://www.elasticsearch.org/guide/reference/index-modules/analysis/

example configuration:
The following example configuration contains definitions for analyzers based on the behaviour of
StandardAnalyzer and SnowballAnalyzer.

Please note that the standard_annotation and snowball_annotation analyzers use the standard tokenizer,
which removes all non-alphanumeric characters and thus makes it impossible to process inline
annotations marked with `[`, `]` and `;` (the defaults used by InlineAnnotationFilter).

For this purpose we need to use a mapping char filter, which remaps those special characters to
replacement strings that the standard tokenizer accepts as part of a token.

```
index :
    analysis :
        analyzer :
            annotation :
                type : annotation
            annotation_filter :
                type : custom
                tokenizer : whitespace
                filter : [lowercase,annotation_filter]
index :
    analysis :
        char_filter :
            annotation_remap :
                type : mapping
                mappings : ["[=>__annotation_start__", "]=>__annotation_end__",";=>__annotation_delimiter__"] # substituted strings
        analyzer :
            standard_annotation :
                type : custom
                tokenizer : standard
                char_filter : annotation_remap
                filter : [standard, lowercase, annotation_filter, stop] # filter set used by StandardAnalyzer
            snowball_annotation :
                type : custom
                tokenizer : standard
                char_filter : annotation_remap
                filter : [standard, lowercase, annotation_filter, stop, snowball] # filter set used by SnowballAnalyzer
        filter :
            annotation_filter :
                type : annotation_filter
                start : __annotation_start__
                end : __annotation_end__
                delimiter : __annotation_delimiter__
```
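
To see why the remapping is needed, here is a rough Python sketch. It is not how Elasticsearch implements the mapping char filter or the standard tokenizer: `remap` and `simple_tokenize` are hypothetical helpers, and the regular expression `\w+` is only a crude stand-in for the standard tokenizer, assuming that letters, digits and underscores stay together in one token.

```python
import re
from typing import List

# Same substitutions as the `mappings` setting in the configuration above.
MAPPINGS = {
    "[": "__annotation_start__",
    "]": "__annotation_end__",
    ";": "__annotation_delimiter__",
}


def remap(text: str) -> str:
    """Hypothetical stand-in for the mapping char filter."""
    for source, replacement in MAPPINGS.items():
        text = text.replace(source, replacement)
    return text


def simple_tokenize(text: str) -> List[str]:
    """Crude stand-in for the standard tokenizer: keeps runs of
    letters, digits and underscores, drops everything else."""
    return re.findall(r"\w+", text)


# Without remapping, the brackets are dropped and the annotation
# becomes an ordinary, separate token.
print(simple_tokenize("Vienna[city]"))

# With remapping, word and annotation survive as a single token that
# the annotation filter can split on the configured markers later.
print(simple_tokenize(remap("Vienna[city]")))
```

The replacement strings contain only letters and underscores, which is why they pass through the tokenizer intact in this sketch.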

To test the analyzer you can query the following
@@ -74,7 +93,7 @@ To test the analyzer you can query the following

Customization
-------------
Both AnnotationAnalyzer and InlineAnnotationFilter can be slightly customized.
The InlineAnnotationFilter can be slightly customized.

List of supported options
+ `start` - start delimiter for inline annotation
@@ -98,3 +117,8 @@ index :
token-type: synonym
delimiter : ;
```


Elasticsearch version
---------------------
This plugin was successfully tested on elasticsearch version 0.90.2
2 changes: 1 addition & 1 deletion pom.xml
@@ -6,7 +6,7 @@
<modelVersion>4.0.0</modelVersion>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-analysis-annotation</artifactId>
<version>1.0-BETA</version>
<version>1.0</version>
<packaging>jar</packaging>
<inceptionYear>2013</inceptionYear>
<licenses>