elasticsearch.carrot2

An Elasticsearch plugin that integrates Carrot2 to cluster your search results into topics.

License: Apache 2

Version
-------------
Plugin | Elasticsearch
master | 0.90.0 → master
1.2.0 | 0.90.0
1.1.1 | 0.20.2

The demo page is here:
http://s.medcl.net/?query=Search+API++Search+Type

A detailed tutorial is here:
http://log.medcl.net/item/2013/06/tutorial-clustering-search-result-with-plugin-tools-carrot2/

Installation:

1. Download the lexical files (https://github.com/downloads/medcl/elasticsearch-carrot2/config.zip) and put them into the config folder.
2. Install the plugin: bin/plugin install medcl/elasticsearch-carrot2/1.1.1

As an alternative to step 2, you can get this plugin from the RTF project (https://github.com/medcl/elasticsearch-rtf):
https://github.com/medcl/elasticsearch-rtf/tree/master/elasticsearch/plugins/tools.carrot2

Have fun.


Request sample:
curl -XPOST 'http://localhost:9200/elasticsearch_resources/_carrot2?carrot2.language=ENGLISH&carrot2.title_fields=title&carrot2.summary_fields=snippet&carrot2.url_field=url&carrot2.attach_detail=true&carrot2.cluster_count_base=10&carrot2.cluster_phrase_label_boost=2.0' -d '
{
    "query": {
        "bool": {
            "should": [
                {
                    "match_all": {}
                }
            ]
        }
    },
    "from": 0,
    "size": 500
}
'

Response sample:
https://gist.github.com/2184894
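
The same request can be issued from a script. The following is a minimal Python sketch, not part of the plugin: the host, index name, field names, and parameter values are copied from the curl sample, while the function names `build_request` and `cluster_search` are made up for illustration. Only the standard library is used.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Host and index name are copied from the curl sample above.
BASE_URL = "http://localhost:9200/elasticsearch_resources/_carrot2"


def build_request(base_url=BASE_URL):
    """Return (url, body_bytes) equivalent to the curl sample."""
    params = {
        "carrot2.language": "ENGLISH",
        "carrot2.title_fields": "title",
        "carrot2.summary_fields": "snippet",
        "carrot2.url_field": "url",
        "carrot2.attach_detail": "true",
        "carrot2.cluster_count_base": "10",
        "carrot2.cluster_phrase_label_boost": "2.0",
    }
    query = {
        "query": {"bool": {"should": [{"match_all": {}}]}},
        "from": 0,
        "size": 500,
    }
    url = base_url + "?" + urlencode(params)
    return url, json.dumps(query).encode("utf-8")


def cluster_search(base_url=BASE_URL):
    """POST the query and return the decoded JSON response.

    Needs a running Elasticsearch node with the plugin installed.
    """
    url, body = build_request(base_url)
    with urlopen(Request(url, data=body, method="POST")) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping `build_request` separate from `cluster_search` lets you inspect the URL and body without a running node.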

Parameters:
carrot2.language=ENGLISH                [language used for clustering; see the list of supported languages below]
carrot2.title_fields                    [which fields in the doc's source are used as the title for clustering]
carrot2.summary_fields                  [which fields in the doc's source are used as the summary for clustering]
carrot2.url_field                       [which field in the doc's source is used as the url for clustering]
carrot2.attach_hits=false               [set to false to remove the original search hits and reduce the size of the response]
carrot2.attach_detail                   [set to false to return only the ids; title/summary/url are then not included in the response]
carrot2.max_cluster_size=100            [the maximum number of clusters returned]
carrot2.max_doc_per_cluster=10          [the maximum number of docs returned within each cluster]
carrot2.cluster_count_base=30           [http://download.carrot2.org/head/manual/index.html#section.attribute.LingoClusteringAlgorithm.desiredClusterCountBase]
carrot2.cluster_phrase_label_boost=1.5  [http://download.carrot2.org/head/manual/index.html#section.attribute.LingoClusteringAlgorithm.phraseLabelBoost]
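
Because some names are plural (title_fields, summary_fields) and one is singular (url_field), a misspelled parameter is easy to miss. The sketch below shows one way to catch that on the client side before the request is sent. `carrot2_params` is a hypothetical helper, not something the plugin provides. The option names come from the list above; rendering booleans as lowercase "true"/"false" follows the curl sample.

```python
# Option names taken from the parameter list above, without the "carrot2." prefix.
KNOWN_OPTIONS = {
    "language", "title_fields", "summary_fields", "url_field",
    "attach_hits", "attach_detail", "max_cluster_size",
    "max_doc_per_cluster", "cluster_count_base", "cluster_phrase_label_boost",
}


def carrot2_params(**options):
    """Hypothetical helper: turn keyword options into carrot2.* query parameters."""
    params = {}
    for name, value in options.items():
        if name not in KNOWN_OPTIONS:
            raise ValueError("unknown carrot2 option: " + name)
        if isinstance(value, bool):
            # Booleans appear as lowercase strings in the curl sample.
            value = "true" if value else "false"
        params["carrot2." + name] = str(value)
    return params
```

For example, `carrot2_params(attach_hits=False, max_doc_per_cluster=10)` yields a dict that can be passed to `urllib.parse.urlencode`, while `carrot2_params(title_field="title")` raises `ValueError`.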

Supported algorithms:
LingoClusteringAlgorithm

TODO (not yet supported):
STCClusteringAlgorithm
BisectingKMeansClusteringAlgorithm
ByFieldClusteringAlgorithm
ByUrlClusteringAlgorithm

Supported languages (values for carrot2.language):
ARABIC, BULGARIAN, CZECH, CHINESE_SIMPLIFIED, DANISH, DUTCH, ENGLISH, ESTONIAN, FINNISH, FRENCH, GERMAN, GREEK, HUNGARIAN, ITALIAN, IRISH, KOREAN, LATVIAN, LITHUANIAN, MALTESE, NORWEGIAN, POLISH, PORTUGUESE, ROMANIAN, RUSSIAN, SLOVAK, SLOVENE, SPANISH, SWEDISH, THAI, TURKISH
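
A client can check the carrot2.language value against this list before sending a request. The following is a small Python sketch under stated assumptions: `normalize_language` is a made-up name, and whether the plugin itself accepts lowercase or otherwise differently spelled names is not documented here, so the helper always returns the exact upper-case spelling from the list.

```python
# Language names copied verbatim from the list above.
SUPPORTED_LANGUAGES = frozenset("""
    ARABIC BULGARIAN CZECH CHINESE_SIMPLIFIED DANISH DUTCH ENGLISH ESTONIAN
    FINNISH FRENCH GERMAN GREEK HUNGARIAN ITALIAN IRISH KOREAN LATVIAN
    LITHUANIAN MALTESE NORWEGIAN POLISH PORTUGUESE ROMANIAN RUSSIAN SLOVAK
    SLOVENE SPANISH SWEDISH THAI TURKISH
""".split())


def normalize_language(name):
    """Return the listed spelling of a language name, or raise ValueError."""
    key = name.strip().upper().replace(" ", "_").replace("-", "_")
    if key not in SUPPORTED_LANGUAGES:
        raise ValueError("language not in the supported list: " + name)
    return key
```

The returned value can be used directly as the value of carrot2.language.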
