elasticsearch.carrot2

An Elasticsearch plugin that integrates Carrot2 to cluster your search results into topics.

License: Apache 2.0

Plugin version | Elasticsearch version
---------------|----------------------
master         | 0.90.0 → master
1.2.0          | 0.90.0
1.1.1          | 0.20.2

The demo page is here:
http://s.medcl.net/?query=Search+API++Search+Type

A detailed tutorial is here:
http://log.medcl.net/item/2013/06/tutorial-clustering-search-result-with-plugin-tools-carrot2/

1. Download the lexical files (https://github.com/downloads/medcl/elasticsearch-carrot2/config.zip) and put them into the config folder.
2. Run bin/plugin install medcl/elasticsearch-carrot2/1.1.1

Alternatively, you can get this plugin from the RTF project (https://github.com/medcl/elasticsearch-rtf):
https://github.com/medcl/elasticsearch-rtf/tree/master/elasticsearch/plugins/tools.carrot2

Have fun.

curl -XPOST 'http://localhost:9200/elasticsearch_resources/_carrot2?carrot2.language=ENGLISH&carrot2.title_fields=title&carrot2.summary_fields=snippet&carrot2.url_field=url&carrot2.attach_detail=true&carrot2.cluster_count_base=10&carrot2.cluster_phrase_label_boost=2.0' \
-d '
{
    "query": {
        "bool": {
            "should": [
                {
                    "match_all": {}
                }
            ]
        }
    },
    "from": 0,
    "size": 500
}
'
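
For readers calling the endpoint from a script rather than the shell, the curl call above can be sketched in Python using only the standard library. This is a minimal sketch, not part of the plugin: the helper name build_carrot2_request is hypothetical, the JSON Content-Type header is an assumption, and the host, index and parameter values are simply copied from the curl example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_carrot2_request(base_url, index, params, query, start=0, size=500):
    """Build (but do not send) a POST request to the _carrot2 endpoint.

    Hypothetical helper: it mirrors the curl example, nothing more.
    """
    url = "{}/{}/_carrot2?{}".format(base_url.rstrip("/"), index, urlencode(params))
    body = json.dumps({"query": query, "from": start, "size": size}).encode("utf-8")
    # The Content-Type header is an assumption; the curl example sends none.
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


# Same parameters and query as the curl example.
params = {
    "carrot2.language": "ENGLISH",
    "carrot2.title_fields": "title",
    "carrot2.summary_fields": "snippet",
    "carrot2.url_field": "url",
    "carrot2.attach_detail": "true",
    "carrot2.cluster_count_base": "10",
    "carrot2.cluster_phrase_label_boost": "2.0",
}
query = {"bool": {"should": [{"match_all": {}}]}}
request = build_carrot2_request(
    "http://localhost:9200", "elasticsearch_resources", params, query
)
print(request.full_url)

# Sending needs a running node with the plugin installed:
# with urlopen(request) as response:
#     result = json.load(response)
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything reaches the cluster.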

Response sample:
https://gist.github.com/2184894

carrot2.language=ENGLISH                [the clustering language; see the list of supported languages at the end]
carrot2.title_fields                    [which field in the document's source is used as the title for clustering]
carrot2.summary_fields                  [which field in the document's source is used as the summary for clustering]
carrot2.url_field                       [which field in the document's source is used as the URL for clustering]
carrot2.attach_hits=false               [set to false to reduce the response size; the original search hits are removed]
carrot2.attach_detail                   [set to false to return only the id; title, summary and url are not included in the response]
carrot2.max_cluster_size=100            [the maximum number of clusters returned]
carrot2.max_doc_per_cluster=10          [the maximum number of documents returned per cluster]
carrot2.cluster_count_base=30           [http://download.carrot2.org/head/manual/index.html#section.attribute.LingoClusteringAlgorithm.desiredClusterCountBase]
carrot2.cluster_phrase_label_boost=1.5  [http://download.carrot2.org/head/manual/index.html#section.attribute.LingoClusteringAlgorithm.phraseLabelBoost]
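
A mistyped parameter name is easy to miss in a long query string. The sketch below builds the query string from keyword arguments and rejects names that are not in the list above. It is illustrative only: the helper carrot2_query_string is hypothetical, and how the plugin itself treats unknown parameters is not documented here.

```python
from urllib.parse import urlencode

# Parameter names taken from the list above, without the "carrot2." prefix.
KNOWN_PARAMS = (
    "language", "title_fields", "summary_fields", "url_field",
    "attach_hits", "attach_detail", "max_cluster_size",
    "max_doc_per_cluster", "cluster_count_base", "cluster_phrase_label_boost",
)


def carrot2_query_string(**settings):
    """Return a URL query string for the given carrot2 settings.

    Hypothetical helper: raises ValueError for names not in KNOWN_PARAMS.
    """
    unknown = sorted(set(settings) - set(KNOWN_PARAMS))
    if unknown:
        raise ValueError("unknown carrot2 parameter(s): " + ", ".join(unknown))
    pairs = []
    for name in KNOWN_PARAMS:  # emit in the documented order
        if name in settings:
            value = settings[name]
            if isinstance(value, bool):
                # The examples above use lowercase true/false.
                value = "true" if value else "false"
            pairs.append(("carrot2." + name, value))
    return urlencode(pairs)


print(carrot2_query_string(title_fields="title", attach_hits=False,
                           max_cluster_size=20))
```

Calling carrot2_query_string(title_field="title") raises a ValueError, because the documented name is title_fields.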

Supported algorithm:
LingoClusteringAlgorithm

TODO:
STCClusteringAlgorithm
BisectingKMeansClusteringAlgorithm
ByFieldClusteringAlgorithm
ByUrlClusteringAlgorithm

Supported languages:
ARABIC, BULGARIAN, CZECH, CHINESE_SIMPLIFIED, DANISH, DUTCH, ENGLISH, ESTONIAN, FINNISH, FRENCH, GERMAN, GREEK, HUNGARIAN, ITALIAN, IRISH, KOREAN, LATVIAN, LITHUANIAN, MALTESE, NORWEGIAN, POLISH, PORTUGUESE, ROMANIAN, RUSSIAN, SLOVAK, SLOVENE, SPANISH, SWEDISH, THAI, TURKISH
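
Since carrot2.language takes one of the upper-case constants above, a client may want to normalize user input before sending it. The following is a small sketch under that assumption; normalize_language is a hypothetical helper, and whether the plugin accepts other spellings is not stated in this README.

```python
# The constants listed above.
SUPPORTED_LANGUAGES = frozenset("""
    ARABIC BULGARIAN CZECH CHINESE_SIMPLIFIED DANISH DUTCH ENGLISH ESTONIAN
    FINNISH FRENCH GERMAN GREEK HUNGARIAN ITALIAN IRISH KOREAN LATVIAN
    LITHUANIAN MALTESE NORWEGIAN POLISH PORTUGUESE ROMANIAN RUSSIAN SLOVAK
    SLOVENE SPANISH SWEDISH THAI TURKISH
""".split())


def normalize_language(name):
    """Map input such as 'chinese simplified' to 'CHINESE_SIMPLIFIED'.

    Hypothetical helper: raises ValueError if the result is not in the list.
    """
    candidate = name.strip().upper().replace(" ", "_").replace("-", "_")
    if candidate not in SUPPORTED_LANGUAGES:
        raise ValueError("unsupported carrot2 language: " + repr(name))
    return candidate


print(normalize_language("chinese simplified"))
```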