
leoh/ElasticSearch.pm

 
 


NAME

ElasticSearch - An API for communicating with ElasticSearch

VERSION

Version 0.62, tested against ElasticSearch server version 0.20.2.

DESCRIPTION

ElasticSearch is an Open Source (Apache 2 license), distributed, RESTful Search Engine based on Lucene, and built for the cloud, with a JSON API.

Check out its features: http://www.elasticsearch.org/

This module is a thin API which makes it easy to communicate with an ElasticSearch cluster.

It maintains a list of all servers/nodes in the ElasticSearch cluster, and spreads the load across these nodes in round-robin fashion. If the current active node disappears, then it attempts to connect to another node in the list.

Forking a process triggers a server list refresh, and a new connection to a randomly chosen node in the list.

SYNOPSIS

use ElasticSearch;
my $es = ElasticSearch->new(
    servers      => 'search.foo.com:9200',  # default '127.0.0.1:9200'
    transport    => 'http'                  # default 'http'
                    | 'httplite'
                    | 'httptiny'
                    | 'curl'
                    | 'aehttp'
                    | 'aecurl'
                    | 'thrift',
    max_requests => 10_000,                 # default 10_000
    trace_calls  => 'log_file',
    no_refresh   => 0 | 1,
);

$es->index(
    index => 'twitter',
    type  => 'tweet',
    id    => 1,
    data  => {
        user        => 'kimchy',
        post_date   => '2009-11-15T14:12:12',
        message     => 'trying out Elastic Search'
    }
);

$data = $es->get(
    index => 'twitter',
    type  => 'tweet',
    id    => 1
);

# native elasticsearch query language
$results = $es->search(
    index => 'twitter',
    type  => 'tweet',
    query => {
        text => { user => 'kimchy' }
    }
);

# ElasticSearch::SearchBuilder Perlish query language
$results = $es->search(
    index  => 'twitter',
    type   => 'tweet',
    queryb => {
        message   => 'Perl API',
        user      => 'kimchy',
        post_date => {
            '>'   => '2010-01-01',
            '<='  => '2011-01-01',
        }
    }
);


$dodgy_qs = "foo AND AND bar";
$results = $es->search(
    index => 'twitter',
    type  => 'tweet',
    query => {
        query_string => {
            query => $es->query_parser->filter($dodgy_qs)
        },
    }
);

See the examples/ directory for a simple working example.

GETTING ElasticSearch

You can download the latest released version of ElasticSearch from http://www.elasticsearch.org/download/.

See here for setup instructions: http://www.elasticsearch.org/tutorials/2010/07/01/setting-up-elasticsearch.html

CALLING CONVENTIONS

I've tried to follow the same terminology as used in the ElasticSearch docs when naming methods, so it should be easy to tie the two together.

Some methods require a specific index and a specific type, while others allow a list of indices or types, or allow you to specify all indices or types. I distinguish between them as follows:

$es->method( index => multi, type => single, ...)

single values must be a scalar, and are required parameters

type  => 'tweet'

multi values can be:

index   => 'twitter'          # specific index
index   => ['twitter','user'] # list of indices
index   => undef              # (or not specified) = all indices

multi_req values work like multi values, but at least one value is required, so:

index   => 'twitter'          # specific index
index   => ['twitter','user'] # list of indices
index   => '_all'             # all indices

index   => []                 # error
index   => undef              # error

Also, see "use_index()/use_type()".
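As a rough illustration of the multi_req rule, the following standalone sketch normalises a value to a list and rejects empty input. The helper name normalize_multi_req is hypothetical and is not part of this module; it only mirrors the rules listed above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ElasticSearch.pm): applies the
# multi_req rules described above to a single parameter value.
sub normalize_multi_req {
    my ( $name, $value ) = @_;

    # Accept a scalar or an ARRAY ref; undef becomes an empty list
    my @values
        = ref($value) eq 'ARRAY' ? @$value
        : defined $value         ? ($value)
        :                          ();

    die "Param '$name' requires at least one value\n" unless @values;
    return @values;
}

my @one  = normalize_multi_req( index => 'twitter' );
my @many = normalize_multi_req( index => [ 'twitter', 'user' ] );
my @all  = normalize_multi_req( index => '_all' );

# An empty ARRAY ref (or undef) is rejected with an exception
my $ok          = eval { normalize_multi_req( index => [] ); 1 };
my $empty_error = $ok ? '' : $@;
```

A multi parameter would follow the same normalisation but skip the die, treating an empty list as "all indices".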

as_json

If you pass as_json => 1 to any request to the ElasticSearch server, it will return the raw UTF8-decoded JSON response, rather than a Perl datastructure.
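A raw response obtained this way can be decoded later with any JSON module. The sketch below uses the core JSON::PP module; the $raw string is a hand-written stand-in for a server response, not real output.

```perl
use strict;
use warnings;
use JSON::PP;

# Stand-in for the string returned by a request made with as_json => 1
my $raw = '{"_index":"twitter","_type":"tweet","_id":"1",'
        . '"_source":{"user":"kimchy"}}';

# The string is already UTF8-decoded (characters, not bytes), so the
# decoder is created without the utf8 option
my $doc = JSON::PP->new->decode($raw);

print $doc->{_source}{user}, "\n";
```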

RETURN VALUES AND EXCEPTIONS

Methods that query the ElasticSearch cluster return the raw data structure that the cluster returns. This may change in the future, but as these data structures are still in flux, I thought it safer not to try to interpret.

Anything that is known to be an error throws an exception, eg trying to delete a non-existent index.

INTEGRATION WITH ElasticSearch::SearchBuilder

ElasticSearch::SearchBuilder provides a concise Perlish SQL::Abstract-style query language, which gets translated into the native Query DSL that ElasticSearch uses.

For instance:

{
    content => 'search keywords',
    -filter => {
        tags        => ['perl','ruby'],
        date        => {
            '>'     => '2010-01-01',
            '<='    => '2011-01-01'
        },
    }
}

Would be translated to:

{ query => {
    filtered => {
        query  => { text => { content => "search keywords" } },
        filter => {
            and => [
                { terms => { tags => ["perl", "ruby"] } },
                { numeric_range => {
                    date => {
                        gt => "2010-01-01",
                        lte => "2011-01-01"
                }}},
            ],
        }
}}}

All you have to do to start using ElasticSearch::SearchBuilder is to change your query or filter parameter to queryb or filterb (where the extra b stands for builder):

$es->search(
    queryb => { content => 'keywords' }
)

If you want to see what your SearchBuilder-style query is being converted into, you can either use "trace_calls()" or access it directly with:

$native_query  = $es->builder->query( $query )
$native_filter = $es->builder->filter( $filter )

See the ElasticSearch::SearchBuilder docs for more information about the syntax.

METHODS

Creating a new ElasticSearch instance

new()

$es = ElasticSearch->new(
        transport    =>  'http',
        servers      =>  '127.0.0.1:9200'                   # single server
                          | ['es1.foo.com:9200',
                             'es2.foo.com:9200'],           # multiple servers
        trace_calls  => 1 | '/path/to/log/file' | $fh,
        timeout      => 30,
        max_requests => 10_000,                             # refresh server list
                                                            # after max_requests

        no_refresh   => 0 | 1                               # don't retrieve the live
                                                            # server list. Instead, use
                                                            # just the servers specified
 );

servers can be either a single server or an ARRAY ref with a list of servers. If not specified, then it defaults to localhost and the port for the specified transport (eg 9200 for http* or 9500 for thrift).

These servers are used in a round-robin fashion. If any server fails to connect, then the other servers in the list are tried, and if any succeeds, then a list of all servers/nodes currently known to the ElasticSearch cluster is retrieved and stored.

After every max_requests requests (default 10,000), this list of known nodes is refreshed automatically. To disable this automatic refresh, set max_requests to 0.

To force a lookup of live nodes, you can do:

$es->refresh_servers();

no_refresh()

Regardless of the max_requests setting, a list of live nodes will still be retrieved on the first request. This may not be desirable behaviour if, for instance, you are connecting to remote servers which use internal IP addresses, or which don't allow remote nodes() requests.

If you want to disable this behaviour completely, set no_refresh to 1, in which case the transport module will round robin through the servers list only. Failed nodes will be removed from the list (but added back in every max_requests or when all nodes have failed).
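The round-robin behaviour with no_refresh set can be pictured with the standalone sketch below. It is an illustration only, not the module's transport code: make_pool, next_server and mark_failed are hypothetical names, and the periodic re-adding of failed nodes after max_requests is left out.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the module's implementation): round-robin
# over a fixed server list, dropping failed nodes and restoring the
# full list once every node has failed.
sub make_pool {
    my @configured = @_;
    my @live       = @configured;
    my $next       = 0;

    return {
        next_server => sub {
            @live = @configured unless @live;    # all failed: start over
            my $server = $live[ $next % @live ];
            $next++;
            return $server;
        },
        mark_failed => sub {
            my $failed = shift;
            @live = grep { $_ ne $failed } @live;
        },
    };
}

my $pool   = make_pool( 'es1.foo.com:9200', 'es2.foo.com:9200' );
my @picked = map { $pool->{next_server}->() } 1 .. 3;

# After es1 is marked as failed, only es2 is handed out
$pool->{mark_failed}->('es1.foo.com:9200');
my $after_failure = $pool->{next_server}->();
```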

Transport Backends

There are various transport backends that ElasticSearch can use: http (the default, based on LWP), httplite (based on HTTP::Lite), httptiny (based on HTTP::Tiny), curl (based on WWW::Curl), aehttp (based on AnyEvent::HTTP), aecurl (based on AnyEvent::Curl::Multi) and thrift (which uses the Thrift protocol).

Although the thrift interface has the right buzzwords (binary, compact, sockets), the generated Perl code is very slow. Until that is improved, I recommend one of the http backends instead.

The httplite backend is about 30% faster than the default http backend, and will probably become the default after more testing in production.

The httptiny backend is 1% faster again than httplite.

See also: ElasticSearch::Transport, "timeout()", "trace_calls()", http://www.elasticsearch.org/guide/reference/modules/http.html and http://www.elasticsearch.org/guide/reference/modules/thrift.html

Document-indexing methods

index()

$result = $es->index(
    index       => single,
    type        => single,
    id          => $document_id,        # optional, otherwise auto-generated
    data        => {
        key => value,
        ...
    },

    # optional
    consistency  => 'quorum' | 'one' | 'all',
    create       => 0 | 1,
    parent       => $parent,
    percolate    => $percolate,
    refresh      => 0 | 1,
    replication  => 'sync' | 'async',
    routing      => $routing,
    timeout      => eg '1m' or '10s',
    version      => int,
    version_type => 'internal' | 'external',
);

eg:

$result = $es->index(
    index   => 'twitter',
    type    => 'tweet',
    id      => 1,
    data    => {
        user        => 'kimchy',
        post_date   => '2009-11-15T14:12:12',
        message     => 'trying out Elastic Search'
    },
);

Used to add a document to a specific index as a specific type with a specific id. If the index/type/id combination already exists, then that document is updated, otherwise it is created.

Note:

  • If the id is not specified, then ElasticSearch autogenerates a unique ID and a new document is always created.

  • If version is passed, and the current version in ElasticSearch is different, then a Conflict error will be thrown.

  • data can also be a raw JSON encoded string (but ensure that it is correctly encoded, otherwise you will see errors when trying to retrieve it from ElasticSearch).

    $es->index(
        index   => 'foo',
        type    =>  'bar',
        id      =>  1,
        data    =>  '{"foo":"bar"}'
    );

  • timeout for all CRUD methods and "search()" is a query timeout, specifying the amount of time ElasticSearch will spend (roughly) processing a query. Units can be concatenated with the integer value, e.g., 500ms or 1s.

    See also: http://www.elasticsearch.org/guide/reference/api/search/request-body.html

    Note: this is distinct from the transport timeout, see "timeout()".

See also: http://www.elasticsearch.org/guide/reference/api/index_.html, "bulk()" and "put_mapping()"

set()

set() is a synonym for "index()"

create()

$result = $es->create(
    index       => single,
    type        => single,
    id          => $document_id,        # optional, otherwise auto-generated
    data        => {
        key => value,
        ...
    },

    # optional
    consistency  => 'quorum' | 'one' | 'all',
    parent       => $parent,
    percolate    => $percolate,
    refresh      => 0 | 1,
    replication  => 'sync' | 'async',
    routing      => $routing,
    timeout      => eg '1m' or '10s',
    version      => int,
    version_type => 'internal' | 'external',
);

eg:

$result = $es->create(
    index   => 'twitter',
    type    => 'tweet',
    id      => 1,
    data    => {
        user        => 'kimchy',
        post_date   => '2009-11-15T14:12:12',
        message     => 'trying out Elastic Search'
    },
);

Used to add a NEW document to a specific index as a specific type with a specific id. If the index/type/id combination already exists, then a Conflict error is thrown.

If the id is not specified, then ElasticSearch autogenerates a unique ID.

If you pass a version parameter to create, then it must be 0 unless you also set version_type to external.

See also: "index()"

update()

$result = $es->update(
    index             => single,
    type              => single,
    id                => single,

    # required
    script            => $script,
  | doc               => $doc

    # optional
    params            => { params },
    upsert            => { new_doc },
    consistency       => 'quorum' | 'one' | 'all',
    fields            => ['_source'],
    ignore_missing    => 0 | 1,
    parent            => $parent,
    percolate         => $percolate,
    retry_on_conflict => 2,
    routing           => $routing,
    timeout           => '10s',
    replication       => 'sync' | 'async'
)

The update() method accepts a script to update, or a doc to be merged with, an existing doc, without having to retrieve and reindex the doc yourself, eg:

$es->update(
    index   => 'test',
    type    => 'foo',
    id      => 123,
    script  => 'ctx._source.tags+=[tag]',
    params  => { tag => 'red' }
);

You can also pass a new doc which will be inserted if the doc does not already exist, via the upsert parameter.

See http://www.elasticsearch.org/guide/reference/api/update.html for more.

get()

$result = $es->get(
    index   => single,
    type    => single or blank,
    id      => single,

    # optional
    fields          => 'field' or ['field1',...],
    preference      => '_local' | '_primary' | '_primary_first' | $string,
    refresh         => 0 | 1,
    routing         => $routing,
    parent          => $parent,
    ignore_missing  => 0 | 1,

);

Returns the document stored at index/type/id or throws an exception if the document doesn't exist.

Example:

$es->get( index => 'twitter', type => 'tweet', id => 1)

Returns:

{
  _id     => 1,
  _index  => "twitter",
  _source => {
               message => "trying out Elastic Search",
               post_date=> "2009-11-15T14:12:12",
               user => "kimchy",
             },
  _type   => "tweet",
}

By default the _source field is returned. Use fields to specify a list of (stored) fields to return instead, or [] to return no fields.

Pass a true value for refresh to force an index refresh before performing the get.

If the requested index, type or id is not found, then a Missing exception is thrown, unless ignore_missing is true.

See also: "bulk()", http://www.elasticsearch.org/guide/reference/api/get.html

exists()

$bool = $es->exists(
    index           => single,
    type            => single,
    id              => single,

    preference      => '_local' | '_primary' | '_primary_first' | $string,
    refresh         => 0 | 1,
    routing         => $routing,
    parent          => $parent,
);

Returns true or false depending on whether the doc exists.

mget()

$docs = $es->mget(
    index          => single,
    type           => single or blank,
    ids            => \@ids,
    fields         => ['field_1','field_2'],
    filter_missing => 0 | 1
);

$docs = $es->mget(
    index          => single or blank,
    type           => single or blank,
    docs           => \@doc_info,
    fields         => ['field_1','field_2'],
    filter_missing => 0 | 1
);

mget or "multi-get" returns multiple documents at once. There are two ways to call mget():

If all docs come from the same index (and potentially the same type):

$docs = $es->mget(
    index => 'myindex',
    type  => 'mytype',   # optional
    ids   => [1,2,3],
)

Alternatively you can specify each doc separately:

$docs = $es->mget(
    docs => [
        { _index => 'index_1', _type => 'type_1', _id => 1 },
        { _index => 'index_2', _type => 'type_2', _id => 2 },
    ]
)

Or:

$docs = $es->mget(
    index  => 'myindex',                    # default index
    type   => 'mytype',                     # default type
    fields => ['field_1','field_2'],        # default fields
    docs => [
        { _id => 1 },                       # uses defaults
        { _index => 'index_2',
          _type  => 'type_2',
          _id    => 2,
          fields => ['field_2','field_3'],
        },
    ]
);

If docs or ids is an empty array ref, then mget() will just return an empty array ref.

Returns an array ref containing all of the documents requested. If a document is not found, then its entry will include {exists => 0}. If you would rather filter these missing docs, pass filter_missing => 1.
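If you prefer to keep the full result and separate the missing docs yourself, a client-side filter along these lines should work. The $docs value below is a hand-written stand-in for an mget() result, with the missing entry flagged as described above.

```perl
use strict;
use warnings;

# Hand-written stand-in for the array ref returned by mget()
my $docs = [
    { _index => 'myindex', _type => 'mytype', _id => 1, _source => { title => 'one' } },
    { _index => 'myindex', _type => 'mytype', _id => 2, exists  => 0 },
    { _index => 'myindex', _type => 'mytype', _id => 3, _source => { title => 'three' } },
];

# An entry counts as missing when it carries a false 'exists' flag
my $is_missing = sub { exists $_[0]{exists} && !$_[0]{exists} };

my @found       = grep { !$is_missing->($_) } @$docs;
my @missing_ids = map  { $_->{_id} } grep { $is_missing->($_) } @$docs;
```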

See http://www.elasticsearch.org/guide/reference/api/multi-get.html

delete()

$result = $es->delete(
    index           => single,
    type            => single,
    id              => single,

    # optional
    consistency     => 'quorum' | 'one' | 'all',
    ignore_missing  => 0 | 1,
    refresh         => 0 | 1,
    parent          => $parent,
    routing         => $routing,
    replication     => 'sync' | 'async',
    version         => int
);

Deletes the document stored at index/type/id or throws a Missing exception if the document doesn't exist and ignore_missing is not true.

If you specify a version and the current version of the document is different (or if the document is not found), a Conflict error will be thrown.

If refresh is true, an index refresh will be forced after the delete has completed.

Example:

$es->delete( index => 'twitter', type => 'tweet', id => 1);

See also: "bulk()", http://www.elasticsearch.org/guide/reference/api/delete.html

bulk()

$result = $es->bulk( [ actions ] )

$result = $es->bulk(
    actions     => [ actions ],                 # required

    index       => 'foo',                       # optional
    type        => 'bar',                       # optional
    consistency => 'quorum' |  'one' | 'all',   # optional
    refresh     => 0 | 1,                       # optional
    replication => 'sync' | 'async',            # optional

    on_conflict => sub {...} | 'IGNORE',        # optional
    on_error    => sub {...} | 'IGNORE'         # optional
);

Perform multiple index, create and delete actions in a single request. This is about 10x as fast as performing each action in a separate request.

Each action is a HASH ref with a key indicating the action type (index, create or delete), whose value is another HASH ref containing the associated metadata.

The index and type parameters can be specified for each individual action, or inherited from the top level index and type parameters, as shown above.

NOTE: bulk() also accepts the _index, _type, _id, _source, _parent, _routing and _version parameters so that you can pass search results directly to bulk().
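For example, search hits could be turned into index actions roughly as follows. This is a sketch with hand-written sample hits; it copies only four of the underscore-prefixed keys listed in the note, and the final bulk() call is shown as a comment because it needs a live cluster.

```perl
use strict;
use warnings;

# Hand-written stand-in for $results->{hits}{hits} from a search
my @hits = (
    { _index => 'twitter', _type => 'tweet', _id => 1, _source => { user => 'kimchy' } },
    { _index => 'twitter', _type => 'tweet', _id => 2, _source => { user => 'clint' } },
);

my @actions;
for my $hit (@hits) {

    # Copy just the metadata keys that bulk() is documented to accept
    my %meta = map { $_ => $hit->{$_} } qw(_index _type _id _source);
    push @actions, { index => \%meta };
}

# With a connected $es object:
#   my $result = $es->bulk( \@actions );
```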

index and create actions

{ index  => {
    index           => 'foo',
    type            => 'bar',
    id              => 123,
    data            => { text => 'foo bar'},

    # optional
    routing         => $routing,
    parent          => $parent,
    percolate       => $percolate,
    timestamp       => $timestamp,
    ttl             => $ttl,
    version         => $version,
    version_type    => 'internal' | 'external'
}}

{ create  => { ... same options as for 'index' }}

The index and type parameters, if not specified, are inherited from the top level bulk request.

data can also be a raw JSON encoded string (but ensure that it is correctly encoded, otherwise you will see errors when trying to retrieve it from ElasticSearch).

actions => [{
    index => {
        index   => 'foo',
        type    =>  'bar',
        id      =>  1,
        data    =>  '{"foo":"bar"}'
    }
}]

delete action

{ delete  => {
    index           => 'foo',
    type            => 'bar',
    id              => 123,

    # optional
    routing         => $routing,
    parent          => $parent,
    version         => $version,
    version_type    => 'internal' | 'external'
}}

The index and type parameters, if not specified, are inherited from the top level bulk request.

Error handlers

The on_conflict and on_error parameters accept either a coderef or the string 'IGNORE'. Normally, any errors are returned under the errors key (see "Return values").

The logic works as follows:

  • If the error is a versioning conflict error, or if you try to create a doc whose ID already exists, and there is an on_conflict handler, then call the handler and move on to the next document

  • If the error is still unhandled, and we have an on_error handler, then call it and move on to the next document.

  • If no handler exists, then add the error to the @errors array which is returned by "bulk()"

Setting on_conflict or on_error to 'IGNORE' is the equivalent of passing an empty no-op handler.

The handler callbacks are called as:

$handler->( $action, $document, $error, $req_no );

For instance:

$action
"index"
$document
{ id => 1, data => { count => "foo" }}
$error
"MapperParsingException[Failed to parse [count]]; ... etc ... "
$req_no
0

The $req_no is the array index of the current $action from the original array of @actions.
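A handler that simply records failures might look like the sketch below. The argument order follows the call signature shown above; the @failed array and the record layout are illustrative choices, and the handler is invoked by hand here so that the example runs without a cluster.

```perl
use strict;
use warnings;

my @failed;

# Callback using the documented argument order
my $on_error = sub {
    my ( $action, $document, $error, $req_no ) = @_;
    push @failed, {
        req_no => $req_no,
        action => $action,
        id     => $document->{id},
        error  => $error,
    };
};

# With a live cluster the handler would be passed as:
#   $es->bulk( actions => \@actions, on_error => $on_error, on_conflict => 'IGNORE' );
# Here it is called directly with the sample values shown above.
$on_error->(
    'index',
    { id => 1, data => { count => 'foo' } },
    'MapperParsingException[Failed to parse [count]]; ... etc ... ',
    0,
);

printf "request %d (%s, id %s) failed: %s\n",
    @{ $failed[0] }{qw(req_no action id error)};
```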

Return values

The "bulk()" method returns a HASH ref containing:

{
    actions => [ the list of actions you passed in ],
    results => [ the result of each of the actions ],
    errors  => [ a list of any errors              ]
}

The results ARRAY ref contains the same values that would be returned for individual index/create/delete statements, eg:

results => [
     { create => { _id => 123, _index => "foo", _type => "bar", _version => 1 } },
     { index  => { _id => 123, _index => "foo", _type => "bar", _version => 2 } },
     { delete => { _id => 123, _index => "foo", _type => "bar", _version => 3 } },
]

The errors key is only present if an error has occurred and has not been handled by an on_conflict or on_error handler, so you can do:

$results = $es->bulk(\@actions);
if ($results->{errors}) {
    # handle errors
}

Each error element contains the error message plus the action that triggered the error. Each result element for a failed action will also contain the error message, eg:

$result = {
    actions => [

        ## NOTE - num is numeric
        {   index => { index => 'bar', type  => 'bar', id => 123,
                       data  => { num => 123 } } },

        ## NOTE - num is a string
        {   index => { index => 'bar', type  => 'bar', id => 123,
                       data  => { num => 'foo bar' } } },
    ],
    errors => [
        {
            action => {
                index => { index => 'bar', type  => 'bar', id => 123,
                           data  => { num => 'foo bar' } }
            },
            error => "MapperParsingException[Failed to parse [num]]; ...",
        },
    ],
    results => [
        { index => { _id => 123, _index => "bar", _type => "bar", _version => 1 }},
        {   index => {
                error => "MapperParsingException[Failed to parse [num]];...",
                id    => 123, index => "bar", type  => "bar",
            },
        },
    ],

};

See http://www.elasticsearch.org/guide/reference/api/bulk.html for more details.

bulk_index(), bulk_create(), bulk_delete()

These are convenience methods which allow you to pass just the metadata, without the index, create or delete action for each record.

These methods accept the same parameters as the "bulk()" method, except that the actions parameter is replaced by docs, eg:

$result = $es->bulk_index( [ docs ] );

$result = $es->bulk_index(
    docs        => [ docs ],                    # required

    index       => 'foo',                       # optional
    type        => 'bar',                       # optional
    consistency => 'quorum' |  'one' | 'all',   # optional
    refresh     => 0 | 1,                       # optional
    replication => 'sync' | 'async',            # optional

    on_conflict => sub {...} | 'IGNORE',        # optional
    on_error    => sub {...} | 'IGNORE'         # optional
);

For instance:

$es->bulk_index(
    index   => 'foo',
    type    => 'bar',
    refresh => 1,
    docs    => [
        { id => 123,                data => { text=>'foo'} },
        { id => 124, type => 'baz', data => { text=>'bar'} },
    ]
);
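For large document sets it can help to send the docs in several smaller requests. The sketch below splits a list into batches; in_batches is a hypothetical helper, the batch size of 1000 is an arbitrary choice, and the bulk_index() call is left as a comment because it requires a live cluster.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list into ARRAY refs of at most $size items
sub in_batches {
    my ( $size, @items ) = @_;
    die "Batch size must be positive\n" unless $size > 0;

    my @batches;
    push @batches, [ splice @items, 0, $size ] while @items;
    return @batches;
}

# Sample docs in the shape accepted by bulk_index()
my @docs = map { +{ id => $_, data => { text => "doc $_" } } } 1 .. 2500;

my @batches = in_batches( 1000, @docs );

for my $batch (@batches) {

    # With a connected $es object:
    #   $es->bulk_index( index => 'foo', type => 'bar', docs => $batch );
}
```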

reindex()

$es->reindex(
    source      => $scrolled_search,

    # optional
    bulk_size   => 1000,
    dest_index  => $index,
    quiet       => 0 | 1,
    transform   => sub {....},

    on_conflict => sub {...} | 'IGNORE',
    on_error    => sub {...} | 'IGNORE'
)

reindex() is a utility method which can be used for reindexing data from one index to another (eg if the mapping has changed), or copying data from one cluster to another.

Params

  • source is a required parameter, and should be an instance of ElasticSearch::ScrolledSearch.

  • dest_index is the name of the destination index, ie where the docs are indexed to. If you are indexing your data from one cluster to another, and you want to use the same index name in your destination cluster, then you can leave this blank.

  • bulk_size - the number of docs that will be indexed at a time. Defaults to 1,000

  • Set quiet to 1 if you don't want any progress information to be printed to STDOUT

  • transform should be a sub-ref which will be called for each doc, allowing you to transform some element of the doc, or to skip the doc by returning undef.

  • See "Error handlers" for an explanation of on_conflict and on_error.

Examples:

To copy the ElasticSearch website index locally, you could do:

my $local = ElasticSearch->new(
    servers => 'localhost:9200'
);
my $remote = ElasticSearch->new(
    servers    => 'search.elasticsearch.org:80',
    no_refresh => 1
);

my $source = $remote->scrolled_search(
    search_type => 'scan',
    scroll      => '5m'
);
$local->reindex(source=>$source);

To copy one local index to another, converting the title to upper case, excluding docs of type boring, and preserving the version numbers from the original index:

my $source = $es->scrolled_search(
    index       => 'old_index',
    search_type => 'scan',
    scroll      => '5m',
    version     => 1
);

$es->reindex(
    source      => $source,
    dest_index  => 'new_index',
    transform   => sub {
        my $doc = shift;
        return if $doc->{_type} eq 'boring';
        $doc->{_source}{title} = uc( $doc->{_source}{title} );
        return $doc;
    }
);

NOTE: If some of your docs have parent/child relationships, and you want to preserve this relationship, then you should add this to your scrolled search parameters: fields => ['_source','_parent'].

For example:

my $source = $es->scrolled_search(
    index       => 'old_index',
    search_type => 'scan',
    scroll      => '5m',
    fields      => ['_source','_parent'],
    version     => 1
);

$es->reindex(
    source      => $source,
    dest_index  => 'new_index',
);

See also "scrolled_search()", ElasticSearch::ScrolledSearch, and "search()".

analyze()

$result = $es->analyze(
  text          =>  $text_to_analyze,           # required
  index         =>  single,                     # optional

  # either
  field         =>  'type.fieldname',           # requires index

  analyzer      =>  $analyzer,

  tokenizer     => $tokenizer,
  filters       => \@filters,

  # other options
  format        =>  'detailed' | 'text',
  prefer_local  =>  1 | 0
);

The analyze() method allows you to see how ElasticSearch is analyzing the text that you pass in, eg:

$result = $es->analyze( text => 'The Man' )

$result = $es->analyze(
    text        => 'The Man',
    analyzer    => 'simple'
);

$result = $es->analyze(
    text        => 'The Man',
    tokenizer   => 'keyword',
    filters     => ['lowercase'],
);

$result = $es->analyze(
    text        => 'The Man',
    index       => 'my_index',
    analyzer    => 'my_custom_analyzer'
);

$result = $es->analyze(
    text        => 'The Man',
    index       => 'my_index',
    field       => 'my_type.my_field',
);

See http://www.elasticsearch.org/guide/reference/api/admin-indices-analyze.html for more.

Query methods

search()

$result = $es->search(
    index           => multi,
    type            => multi,

    # optional
    query           => { native query },
    queryb          => { searchbuilder query },

    filter          => { native filter },
    filterb         => { searchbuilder filter },

    explain         => 1 | 0,
    facets          => { facets },
    fields          => [$field_1,$field_n],
    partial_fields  => { my_field => { include => 'foo.bar.*' }},
    from            => $start_from,
    highlight       => { highlight },
    ignore_indices  => 'none' | 'missing',
    indices_boost   => { index_1 => 1.5,... },
    min_score       => $score,
    preference      => '_local' | '_primary' | '_primary_first' | $string,
    routing         => [$routing, ...],
    script_fields   => { script_fields },
    search_type     => 'dfs_query_then_fetch'
                       | 'dfs_query_and_fetch'
                       | 'query_then_fetch'
                       | 'query_and_fetch'
                       | 'count'
                       | 'scan',
    size            => $no_of_results,
    sort            => ['_score',$field_1],
    scroll          => '5m' | '30s',
    stats           => ['group_1','group_2'],
    track_scores    => 0 | 1,
    timeout         => '10s',
    version         => 0 | 1
);

Searches for all documents matching the query, with a request-body search. Documents can be matched against multiple indices and multiple types, eg:

$result = $es->search(
    index   => undef,                           # all
    type    => ['user','tweet'],
    query   => { term => {user => 'kimchy' }}
);

You can provide either the query parameter, which uses the native ElasticSearch Query DSL, or the queryb parameter, which uses the more concise ElasticSearch::SearchBuilder query syntax.

Similarly, use filterb instead of filter. SearchBuilder can also be used in facets, for instance, instead of:

$es->search(
    facets  => {
        wow_facet => {
            query        => { text => { content => 'wow'  }},
            facet_filter => { term => {status => 'active' }},
        }
    }
)

You can use:

$es->search(
    facets  => {
        wow_facet => {
            queryb        => { content => 'wow'   },  # note the extra 'b'
            facet_filterb => { status => 'active' },  # note the extra 'b'
        }
    }
)

See "INTEGRATION WITH ElasticSearch::SearchBuilder" for more.

For all of the options that can be included in the native query parameter, see http://www.elasticsearch.org/guide/reference/api/search, http://www.elasticsearch.org/guide/reference/api/search/request-body.html and http://www.elasticsearch.org/guide/reference/query-dsl

searchqs()

$result = $es->searchqs(
    index                    => multi,
    type                     => multi,

    # optional
    q                        => $query_string,
    analyze_wildcard         => 0 | 1,
    analyzer                 => $analyzer,
    default_operator         => 'OR' | 'AND',
    df                       => $default_field,
    explain                  => 1 | 0,
    fields                   => [$field_1,$field_n],
    from                     => $start_from,
    ignore_indices           => 'none' | 'missing',
    lenient                  => 0 | 1,
    lowercase_expanded_terms => 0 | 1,
    preference               => '_local' | '_primary' | '_primary_first' | $string,
    quote_analyzer           => $analyzer,
    quote_field_suffix       => '.unstemmed',
    routing                  => [$routing, ...],
    search_type              => $search_type,
    size                     => $no_of_results,
    sort                     => ['_score:asc','last_modified:desc'],
    scroll                   => '5m' | '30s',
    stats                    => ['group_1','group_2'],
    timeout                  => '10s',
    version                  => 0 | 1
);

Searches for all documents matching the q query_string, with a URI request. Documents can be matched against multiple indices and multiple types, eg:

$result = $es->searchqs(
    index   => undef,                           # all
    type    => ['user','tweet'],
    q       => 'john smith'
);

For all of the options that can be included in the query parameter, see http://www.elasticsearch.org/guide/reference/api/search and http://www.elasticsearch.org/guide/reference/api/search/uri-request.html.

scroll()

$result = $es->scroll(
    scroll_id => $scroll_id,
    scroll    => '5m' | '30s',
);

If a search has been executed with a scroll parameter, then the returned scroll_id can be used like a cursor to scroll through the rest of the results.

If a further scroll request will be issued, then the scroll parameter should be passed again with each request. For instance:

my $result = $es->search(
                query=>{match_all=>{}},
                scroll => '5m'
             );

while (1) {
    my $hits = $result->{hits}{hits};
    last unless @$hits;                 # if no hits, we're finished

    do_something_with($hits);

    $result = $es->scroll(
        scroll_id   => $result->{_scroll_id},
        scroll      => '5m'
    );
}

See http://www.elasticsearch.org/guide/reference/api/search/scroll.html

scrolled_search()

scrolled_search() returns a convenience iterator for scrolled searches. It accepts the standard search parameters that would be passed to "search()" and requires a scroll parameter, eg:

$scroller = $es->scrolled_search(
                query  => {match_all=>{}},
                scroll => '5m'               # keep the scroll request
                                             # live for 5 minutes
            );

See ElasticSearch::ScrolledSearch, "search()", "searchqs()" and "scroll()".
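
As a rough sketch of how the iterator might be consumed: the helper below assumes that the object returned by scrolled_search() has a next() method which returns one hit at a time and a false value once the results are exhausted, as described in ElasticSearch::ScrolledSearch. The helper name collect_ids is hypothetical and not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a scroller and return the _id of every hit.
# $scroller is assumed to be the iterator returned by scrolled_search().
sub collect_ids {
    my ($scroller) = @_;
    my @ids;

    # next() is assumed to return a single hit, or false when finished
    while ( my $hit = $scroller->next ) {
        push @ids, $hit->{_id};
    }
    return \@ids;
}

# Usage (assumes $es is a connected ElasticSearch object):
# my $ids = collect_ids(
#     $es->scrolled_search( query => { match_all => {} }, scroll => '5m' )
# );
```

Collecting every ID in memory is only reasonable for small result sets; for large ones it is probably better to process each hit inside the loop.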

count()

$result = $es->count(
    index           => multi,
    type            => multi,

    # optional
    routing         => [$routing,...],
    ignore_indices  => 'none' | 'missing',

    # one of:
    query           => { native query },
    queryb          => { search builder query },
);

Counts the number of documents matching the query. Documents can be matched against multiple indices and multiple types, eg

$result = $es->count(
    index   => undef,               # all
    type    => ['user','tweet'],
    queryb  => { user  => 'kimchy' }
);

Note: count() supports ElasticSearch::SearchBuilder-style queries via the queryb parameter. See "INTEGRATION WITH ElasticSearch::SearchBuilder" for more details.

query defaults to {match_all=>{}} unless specified.

DEPRECATION: count() previously took query types at the top level, eg $es->count( term=> { ... }). This form still works, but is deprecated. Instead use the queryb or query parameter as you would in "search()".

See also "search()", http://www.elasticsearch.org/guide/reference/api/count.html and http://www.elasticsearch.org/guide/reference/query-dsl

msearch()

$results = $es->msearch(
    index       => multi,
    type        => multi,
    queries     => \@queries | \%queries,
    search_type => $search_type,
);

With "msearch()" you can run multiple searches in parallel. queries can contain either an array of queries, or a hash of named queries. $results will return either an array or hash of results, depending on what you pass in.

The top-level index, type and search_type parameters define default values which will be used for each query, although these can be overridden in the query parameters:

$results = $es->msearch(
    index   => 'my_index',
    type    => 'my_type',
    queries => {
        first   => {
            query => { match_all => {} }    # my_index/my_type
        },
        second  => {
            index => 'other_index',
            query => { match_all => {} }    # other_index/my_type
        },
    }
);

In the above example, $results would look like:

{
    first  => { hits => ... },
    second => { hits => ... }
}

A query can contain the following options:

{
      index          => 'index_name' | ['index_1',...],
      type           => 'type_name'  | ['type_1',...],

      query          => { native query },
      queryb         => { search_builder query },
      filter         => { native filter },
      filterb        => { search_builder filter },

      facets         => { facets },
      from           => 0,
      size           => 10,
      sort           => { sort },
      highlight      => { highlight },
      fields         => [ 'field1', ... ],

      explain        => 0 | 1,
      indices_boost  => { index_1 => 5, ... },
      ignore_indices => 'none' | 'missing',
      min_score      => 2,
      partial_fields => { partial fields },
      preference     => '_local' | '_primary' | '_primary_first' | $string,
      routing        => 'routing' | ['route_1',...],
      script_fields  => { script fields },
      search_type    => $search_type,
      stats          => 'group_1' | ['group_1','group_2'],
      timeout        => '30s',
      track_scores   => 0 | 1,
      version        => 0 | 1,
}

See http://www.elasticsearch.org/guide/reference/api/multi-search.html.
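
The named-queries form can be sketched as follows. This is a minimal illustration, not part of the module: the helper name hits_per_user is hypothetical, the index and type names are borrowed from the examples in this document, and it assumes each result carries the usual hits => { total => ... } structure of a search response.

```perl
use strict;
use warnings;

# Hypothetical helper: count the tweets of several users in one round trip.
# Returns a hashref of user name => total number of matching documents.
sub hits_per_user {
    my ( $es, @users ) = @_;

    # One named query per user; the names become the keys of the result
    my %queries;
    for my $user (@users) {
        $queries{$user} = {
            query => { term => { user => $user } },
            size  => 0,    # only the totals are needed, not the hits
        };
    }

    my $results = $es->msearch(
        index   => 'twitter',
        type    => 'tweet',
        queries => \%queries,
    );

    # Assumes each result looks like { hits => { total => $n, ... } }
    return { map { ( $_ => $results->{$_}{hits}{total} ) } keys %$results };
}
```

Passing a hash of queries rather than an array keeps each result associated with the user it belongs to.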

delete_by_query()

$result = $es->delete_by_query(
    index           => multi,
    type            => multi,

    # optional
    consistency     => 'quorum' | 'one' | 'all',
    replication     => 'sync' | 'async',
    routing         => [$routing,...],

    # one of:
    query           => { native query },
    queryb          => { search builder query },

);

Deletes any documents matching the query. Documents can be matched against multiple indices and multiple types, eg

$result = $es->delete_by_query(
    index   => undef,               # all
    type    => ['user','tweet'],
    queryb  => {user => 'kimchy' },
);

Note: delete_by_query() supports ElasticSearch::SearchBuilder-style queries via the queryb parameter. See "INTEGRATION WITH ElasticSearch::SearchBuilder" for more details.

DEPRECATION: delete_by_query() previously took query types at the top level, eg $es->delete_by_query( term=> { ... }). This form still works, but is deprecated. Instead use the queryb or query parameter as you would in "search()".

See also "search()", http://www.elasticsearch.org/guide/reference/api/delete-by-query.html and http://www.elasticsearch.org/guide/reference/query-dsl

mlt()

# mlt == more_like_this

$results = $es->mlt(
    index               => single,              # required
    type                => single,              # required
    id                  => $id,                 # required

    # optional more-like-this params
    boost_terms          =>  float
    mlt_fields           =>  'scalar' or ['scalar_1', 'scalar_n']
    max_doc_freq         =>  integer
    max_query_terms      =>  integer
    max_word_len         =>  integer
    min_doc_freq         =>  integer
    min_term_freq        =>  integer
    min_word_len         =>  integer
    pct_terms_to_match   =>  float
    stop_words           =>  'scalar' or ['scalar_1', 'scalar_n']

    # optional search params
    explain              =>  {explain}
    facets               =>  {facets}
    fields               =>  {fields}
    filter               =>  { native filter },
    filterb              =>  { search builder filter },
    from                 =>  {from}
    indices_boost        =>  { index_1 => 1.5,... }
    min_score            =>  $score
    preference           =>  '_local' | '_primary' | '_primary_first' | $string
    routing              =>  [$routing,...]
    script_fields        =>  { script_fields }
    search_scroll        =>  '5m' | '10s',
    search_indices       =>  ['index1','index2'],
    search_from          =>  integer,
    search_size          =>  integer,
    search_type          =>  $search_type
    search_types         =>  ['type1','type2'],
    size                 =>  {size}
    sort                 =>  {sort}
    scroll               =>  '5m' | '30s'
    timeout              =>  '10s'
)

More-like-this (mlt) finds related/similar documents. It is possible to run a search query with a more_like_this clause (where you pass in the text you're trying to match), or to use this method, which uses the text of the document referred to by index/type/id.

This gets transformed into a search query, so all of the search parameters are also available.

Note: mlt() supports ElasticSearch::SearchBuilder-style filters via the filterb parameter. See "INTEGRATION WITH ElasticSearch::SearchBuilder" for more details.

See http://www.elasticsearch.org/guide/reference/api/more-like-this.html and http://www.elasticsearch.org/guide/reference/query-dsl/mlt-query.html

explain()

$result = $es->explain(
    index                      =>  single,
    type                       =>  single,
    id                         =>  single,


    query                      => { native query}
  | queryb                     => { search builder query }
  | q                          => $query_string,

    analyze_wildcard           => 1 | 0,
    analyzer                   => $string,
    default_operator           => 'OR' | 'AND',
    df                         => $default_field,
    fields                     => ['_source'],
    lenient                    => 1 | 0,
    lowercase_expanded_terms   => 1 | 0,
    preference                 => '_local' | '_primary' | '_primary_first' | $string,
    routing                    => $routing
);

The explain() method is very useful for debugging queries. It will run the query on the specified document and report whether the document matches the query or not, and why.

See http://www.elasticsearch.org/guide/reference/api/search/explain.html

validate_query()

$bool = $es->validate_query(
    index          => multi,
    type           => multi,

    query          => { native query }
  | queryb         => { search builder query }
  | q              => $query_string

    explain        => 0 | 1,
    ignore_indices => 'none' | 'missing',
);

Returns a hashref with { valid => 1 } if the passed-in query (native ES query), queryb (SearchBuilder-style query) or q (Lucene query string) is valid. Otherwise valid is false. Set explain to 1 to include an explanation of why the query is invalid.

See http://www.elasticsearch.org/guide/reference/api/validate.html

Index Admin methods

index_status()

$result = $es->index_status(
    index           => multi,
    recovery        => 0 | 1,
    snapshot        => 0 | 1,
    ignore_indices  => 'none' | 'missing',
);

Returns the status of all indices, or of the specified index or indices, eg:

$result = $es->index_status();                               # all
$result = $es->index_status( index => ['twitter','buzz'] );
$result = $es->index_status( index => 'twitter' );

Throws a Missing exception if the specified indices do not exist.

See http://www.elasticsearch.org/guide/reference/api/admin-indices-status.html

index_stats()

$result = $es->index_stats(
    index           => multi,
    types           => multi,

    docs            => 1|0,
    store           => 1|0,
    indexing        => 1|0,
    get             => 1|0,

    all             => 0|1,  # returns all stats
    clear           => 0|1,  # clears default docs,store,indexing,get,search

    flush           => 0|1,
    merge           => 0|1,
    refresh         => 0|1,

    level           => 'shards',
    ignore_indices  => 'none' | 'missing',
);

Throws a Missing exception if the specified indices do not exist.

See http://www.elasticsearch.org/guide/reference/api/admin-indices-stats.html

index_segments()

$result = $es->index_segments(
    index           => multi,
    ignore_indices  => 'none' | 'missing',
);

Returns low-level Lucene segments information for the specified indices.

Throws a Missing exception if the specified indices do not exist.

See http://www.elasticsearch.org/guide/reference/api/admin-indices-segments.html

create_index()

$result = $es->create_index(
    index       => single,

    # optional
    settings    => {...},
    mappings    => {...},
    warmers     => {...},
);

Creates a new index, optionally passing index settings, mappings and warmers, eg:

$result = $es->create_index(
    index   => 'twitter',
    settings => {
        number_of_shards      => 3,
        number_of_replicas    => 2,
        analysis => {
            analyzer => {
                default => {
                    tokenizer   => 'standard',
                    char_filter => ['html_strip'],
                    filter      => [qw(standard lowercase stop asciifolding)],
                }
            }
        }
    },
    mappings => {
        tweet   => {
            properties  => {
                user    => { type => 'string' },
                content => { type => 'string' },
                date    => { type => 'date'   }
            }
        }
    },
    warmers => {
        warmer_1 => {
            types  => ['tweet'],
            source => {
                queryb => { date    => { gt => '2012-01-01' }},
                facets => {
                    content => {
                        terms => {
                            field=>'content'
                        }
                    }
                }
            }
        }
    }
);

Throws an exception if the index already exists.

See http://www.elasticsearch.org/guide/reference/api/admin-indices-create-index.html

delete_index()

$result = $es->delete_index(
    index           => multi_req,
    ignore_missing  => 0 | 1        # optional
);

Deletes one or more existing indices, or throws a Missing exception if a specified index doesn't exist and ignore_missing is not true:

$result = $es->delete_index( index => 'twitter' );

See http://www.elasticsearch.org/guide/reference/api/admin-indices-delete-index.html

index_exists()

$result = $es->index_exists(
    index => multi
);

Returns {ok => 1} if all specified indices exist, or an empty list if they don't.

See http://www.elasticsearch.org/guide/reference/api/admin-indices-indices-exists.html
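
Because create_index() throws an exception when the index already exists, index_exists() can be used as a guard. The following is a minimal sketch under that assumption; the helper name ensure_index is hypothetical and not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: create $index only if it does not exist yet.
# Returns 1 if the index was created, 0 if it was already there.
# Any extra arguments (settings, mappings, ...) go to create_index().
sub ensure_index {
    my ( $es, $index, %create_args ) = @_;

    # index_exists() returns a true value only when the index exists
    return 0 if $es->index_exists( index => $index );

    $es->create_index( index => $index, %create_args );
    return 1;
}

# Usage (assumes $es is a connected ElasticSearch object):
# ensure_index( $es, 'twitter', settings => { number_of_shards => 3 } );
```

The check and the creation are two separate requests, so another process could still create the index in between; callers that care may want to trap the exception from create_index() as well.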

index_settings()

$result = $es->index_settings(
    index           => multi,
);

Returns the current settings for all, one or many indices.

$result = $es->index_settings( index=> ['index_1','index_2'] );

See http://www.elasticsearch.org/guide/reference/api/admin-indices-get-settings.html

update_index_settings()

$result = $es->update_index_settings(
    index           => multi,
    settings        => { ... settings ...},
);

Update the settings for all, one or many indices. Currently only the number_of_replicas is exposed:

$result = $es->update_index_settings(
    settings    => {  number_of_replicas => 1 }
);

Throws a Missing exception if the specified indices do not exist.

See http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html

aliases()

$result = $es->aliases( actions => [actions] | {actions} )

Adds or removes an alias for an index, eg:

$result = $es->aliases( actions => [
            { remove => { index => 'foo', alias => 'bar' }},
            { add    => { index => 'foo', alias => 'baz'  }}
          ]);

actions can be a single HASH ref, or an ARRAY ref containing multiple HASH refs.

Note: aliases() supports ElasticSearch::SearchBuilder-style filters via the filterb parameter. See "INTEGRATION WITH ElasticSearch::SearchBuilder" for more details.

$result = $es->aliases( actions => [
    { add    => {
        index           => 'foo',
        alias           => 'baz',
        index_routing   => '1',
        search_routing  => '1,2',
        filterb => { foo => 'bar' }
    }}
]);

See http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html
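
A common use of multiple actions is to repoint an alias from one index to another in a single request. The sketch below only builds the actions structure; the helper name alias_swap_actions and the index names in the usage comment are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build the actions that move $alias from
# $old_index to $new_index, for use in a single aliases() call.
sub alias_swap_actions {
    my ( $alias, $old_index, $new_index ) = @_;
    return [
        { remove => { index => $old_index, alias => $alias } },
        { add    => { index => $new_index, alias => $alias } },
    ];
}

# Usage (assumes $es is a connected ElasticSearch object):
# $es->aliases(
#     actions => alias_swap_actions( 'tweets', 'tweets_v1', 'tweets_v2' )
# );
```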

get_aliases()

$result = $es->get_aliases( index => multi )

Returns a hashref listing all indices and their corresponding aliases, eg:

{
   "foo" : {
      "aliases" : {
         "foo_1" : {
            "search_routing" : "1,2",
            "index_routing" : "1",
            "filter" : {
               "term" : {
                  "foo" : "bar"
               }
            }
         },
         "foo_2" : {}
      }
   }
}

If you pass in the optional index argument, which can be an index name or an alias name, then it will only return the indices related to that argument.

See http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html

open_index()

$result = $es->open_index( index => single);

Opens a closed index.

The open and close index APIs allow you to close an index, and later on open it.

A closed index has almost no overhead on the cluster (except for maintaining its metadata), and is blocked for read/write operations. A closed index can be opened which will then go through the normal recovery process.

See http://www.elasticsearch.org/guide/reference/api/admin-indices-open-close.html for more.

close_index()

$result = $es->close_index( index => single);

Closes an open index. See http://www.elasticsearch.org/guide/reference/api/admin-indices-open-close.html for more.

create_index_template()

$result = $es->create_index_template(
    name     => single,
    template => $template,  # required
    mappings => {...},      # optional
    settings => {...},      # optional
    warmers  => {...},      # optional
    order    => $order,     # optional
);

Index templates allow you to define templates that will automatically be applied to newly created indices. You can specify both settings and mappings, and a simple pattern template that controls whether the template will be applied to a new index.

For example:

$result = $es->create_index_template(
    name        => 'my_template',
    template    => 'small_*',
    settings    =>  { number_of_shards => 1 }
);

See http://www.elasticsearch.org/guide/reference/api/admin-indices-templates.html for more.

index_template()

$result = $es->index_template(
    name    => single
);

Retrieves the named index template.

See http://www.elasticsearch.org/guide/reference/api/admin-indices-templates.html#GETting_a_Template

delete_index_template()

$result = $es->delete_index_template(
    name            => single,
    ignore_missing  => 0 | 1    # optional
);

Deletes the named index template.

See http://www.elasticsearch.org/guide/reference/api/admin-indices-templates.html#Deleting_a_Template

flush_index()

$result = $es->flush_index(
    index           => multi,
    full            => 0 | 1,
    refresh         => 0 | 1,
    ignore_indices  => 'none' | 'missing',
);

Flushes one or more indices, which frees memory from the index by flushing data to the index storage and clearing the internal transaction log. By default, ElasticSearch uses memory heuristics to trigger flush operations automatically as required.

Example:

$result = $es->flush_index( index => 'twitter' );

Throws a Missing exception if the specified indices do not exist.

See http://www.elasticsearch.org/guide/reference/api/admin-indices-flush.html

refresh_index()

$result = $es->refresh_index(
    index           => multi,
    ignore_indices  => 'none' | 'missing',
);

Explicitly refreshes one or more indices, making all operations performed since the last refresh available for search. The (near) real-time capabilities depend on the index engine used. For example, the robin engine requires refresh to be called, but by default a refresh is scheduled periodically.

Example:

$result = $es->refresh_index( index => 'twitter' );

Throws a Missing exception if the specified indices do not exist.

See http://www.elasticsearch.org/guide/reference/api/admin-indices-refresh.html

optimize_index()

$result = $es->optimize_index(
    index               => multi,
    only_deletes        => 0 | 1,  # only_expunge_deletes
    flush               => 0 | 1,  # flush after optimization
    refresh             => 0 | 1,  # refresh after optimization
    wait_for_merge      => 1 | 0,  # wait for merge to finish
    max_num_segments    => int,    # number of segments to optimize to
    ignore_indices      => 'none' | 'missing',
)

Throws a Missing exception if the specified indices do not exist.

See http://www.elasticsearch.org/guide/reference/api/admin-indices-optimize.html

gateway_snapshot()

$result = $es->gateway_snapshot(
    index           => multi,
    ignore_indices  => 'none' | 'missing',
);

Explicitly performs a snapshot through the gateway of one or more indices (backs them up). By default, each index gateway periodically snapshots changes, though this can be disabled and controlled completely through this API.

Example:

$result = $es->gateway_snapshot( index => 'twitter' );

Throws a Missing exception if the specified indices do not exist.

See http://www.elasticsearch.org/guide/reference/api/admin-indices-gateway-snapshot.html and http://www.elasticsearch.org/guide/reference/modules/gateway

snapshot_index()

snapshot_index() is a synonym for "gateway_snapshot()"

clear_cache()

$result = $es->clear_cache(
    index           => multi,
    bloom           => 0 | 1,
    field_data      => 0 | 1,
    filter          => 0 | 1,
    id              => 0 | 1,
    fields          => 'field1' | ['field1','fieldn',...],
    ignore_indices  => 'none' | 'missing',
);

Clears the caches for the specified indices. By default, clears all caches, but if any of id, filter, field_data or bloom are true, then it clears just the specified caches.

Throws a Missing exception if the specified indices do not exist.

See http://www.elasticsearch.org/guide/reference/api/admin-indices-clearcache.html

Mapping methods

put_mapping()

$result = $es->put_mapping(
    index               => multi,
    type                => single,
    mapping             => { ... },     # required
    ignore_conflicts    => 0 | 1
);

A mapping is the data definition of a type. If no mapping has been specified, then ElasticSearch tries to infer the type of each field in a document by looking at its contents, eg

'foo'       => string
123         => integer
1.23        => float

However, these heuristics can be confused, so it is safer (and much more powerful) to specify an explicit mapping instead, eg:

$result = $es->put_mapping(
    index   => ['twitter','buzz'],
    type    => 'tweet',
    mapping => {
        _source => { compress => 1 },
        properties  =>  {
            user        =>  {type  =>  "string", index      =>  "not_analyzed"},
            message     =>  {type  =>  "string", null_value =>  "na"},
            post_date   =>  {type  =>  "date"},
            priority    =>  {type  =>  "integer"},
            rank        =>  {type  =>  "float"}
        }
    }
);

See also: http://www.elasticsearch.org/guide/reference/api/admin-indices-put-mapping.html and http://www.elasticsearch.org/guide/reference/mapping

DEPRECATION: put_mapping() previously took the mapping parameters at the top level, eg $es->put_mapping( properties=> { ... }). This form still works, but is deprecated. Instead use the mapping parameter.

delete_mapping()

$result = $es->delete_mapping(
    index           => multi_req,
    type            => single,
    ignore_missing  => 0 | 1,
);

Deletes a mapping/type in one or more indices. See also http://www.elasticsearch.org/guide/reference/api/admin-indices-delete-mapping.html

Throws a Missing exception if the indices or type don't exist and ignore_missing is false.

mapping()

$mapping = $es->mapping(
    index       => single,
    type        => multi
);

Returns the mappings for all types in an index, or the mapping for the specified type(s), eg:

$mapping = $es->mapping(
    index       => 'twitter',
    type        => 'tweet'
);

$mappings = $es->mapping(
    index       => 'twitter',
    type        => ['tweet','user']
);
# { twitter => { tweet => {mapping}, user => {mapping}} }

Note: the index name which is used in the results is the actual index name. If you pass an alias name as the index name, then this key will be the index (or indices) that the alias points to.

See also: http://www.elasticsearch.org/guide/reference/api/admin-indices-get-mapping.html

type_exists()

$result = $es->type_exists(
    index          => multi,             # optional
    type           => multi,             # required
    ignore_indices => 'none' | 'missing',
);

Returns {ok => 1} if all specified types exist in all specified indices, or an empty list if they don't.

See http://www.elasticsearch.org/guide/reference/api/admin-indices-types-exists.html

Warmer methods

Index warming allows you to run typical search requests to "warm up" new segments before they become available for search. Warmup searches typically include requests that require heavy loading of data, such as faceting or sorting on specific fields.

create_warmer()

$es->create_warmer(
    warmer        => $warmer,
    index         => multi,
    type          => multi,

    # optional

    query         => { raw query }
  | queryb        => { search builder query },

    filter        => { raw filter }
  | filterb       => { search builder filter},

    facets        => { facets },
    script_fields => { script fields },
    sort          => { sort },
);

Create an index warmer called $warmer: a search which is run whenever a matching index/type segment is about to be brought online.

See elastic/elasticsearch#1913 for more.

warmer()

$result = $es->warmer(
    index          => multi,       # optional
    warmer         => $warmer,     # optional

    ignore_missing => 0 | 1
);

Returns any matching registered warmers. The $warmer can be blank, the name of a particular warmer, or use wildcards, eg "warmer_*". Throws an error if no matching warmer is found, and ignore_missing is false.

See elastic/elasticsearch#1913 for more.

delete_warmer()

$result = $es->delete_warmer(
    index          => multi,       # required
    warmer         => $warmer,     # required

    ignore_missing => 0 | 1
);

Deletes any matching registered warmers. The index parameter is required and can be set to _all to match all indices. The $warmer can be the name of a particular warmer, or use wildcards, eg "warmer_*" or "*" for any warmer. Throws an error if no matching warmer is found, and ignore_missing is false.

See elastic/elasticsearch#1913 for more.

River admin methods

See http://www.elasticsearch.org/guide/reference/river/ and http://www.elasticsearch.org/guide/reference/river/twitter.html.

create_river()

$result = $es->create_river(
    river   => $river_name,     # required
    type    => $type,           # required
    $type   => {...},           # depends on river type
    index   => {...},           # depends on river type
);

Creates a new river with name $river_name, eg:

$result = $es->create_river(
    river   => 'my_twitter_river',
    type    => 'twitter',
    twitter => {
        user        => 'user',
        password    => 'password',
    },
    index   => {
        index       => 'my_twitter_index',
        type        => 'status',
        bulk_size   => 100
    }
)

get_river()

$result = $es->get_river(
    river           => $river_name,
    ignore_missing  => 0 | 1        # optional
);

Returns the river details, eg:

$result = $es->get_river ( river => 'my_twitter_river' )

Throws a Missing exception if the river doesn't exist and ignore_missing is false.

delete_river()

$result = $es->delete_river( river => $river_name );

Deletes the corresponding river, eg:

$result = $es->delete_river ( river => 'my_twitter_river' )

See http://www.elasticsearch.org/guide/reference/river/.

river_status()

$result = $es->river_status(
    river           => $river_name,
    ignore_missing  => 0 | 1        # optional
);

Returns the status doc for the named river.

Throws a Missing exception if the river doesn't exist and ignore_missing is false.

Percolate methods

See also: http://www.elasticsearch.org/guide/reference/api/percolate.html and http://www.elasticsearch.org/blog/2011/02/08/percolator.html

create_percolator()

$es->create_percolator(
    index           =>  single,
    percolator      =>  $percolator,

    # one of queryb or query is required
    query           =>  { native query },
    queryb          =>  { search builder query },

    # optional
    data            =>  {data}
)

Create a percolator, eg:

$es->create_percolator(
    index           => 'myindex',
    percolator      => 'mypercolator',
    queryb          => { field => 'foo'  },
    data            => { color => 'blue' }
)

Note: create_percolator() supports ElasticSearch::SearchBuilder-style queries via the queryb parameter. See "INTEGRATION WITH ElasticSearch::SearchBuilder" for more details.

get_percolator()

$es->get_percolator(
    index           =>  single,
    percolator      =>  $percolator,
    ignore_missing  =>  0 | 1,
)

Retrieves a percolator, eg:

$es->get_percolator(
    index           => 'myindex',
    percolator      => 'mypercolator',
)

Throws a Missing exception if the specified index or percolator does not exist, and ignore_missing is false.

delete_percolator()

$es->delete_percolator(
    index           =>  single,
    percolator      =>  $percolator,
    ignore_missing  =>  0 | 1,
)

Deletes a percolator, eg:

$es->delete_percolator(
    index           => 'myindex',
    percolator      => 'mypercolator',
)

Throws a Missing exception if the specified index or percolator does not exist, and ignore_missing is false.

percolate()

$result = $es->percolate(
    index           => single,
    type            => single,
    doc             => { doc to percolate },

    # optional
    query           => { query to filter percolators },
    prefer_local    => 1 | 0,
)

Check for any percolators which match a document, optionally filtering which percolators could match by passing a query param, for instance:

$result = $es->percolate(
    index           => 'myindex',
    type            => 'mytype',
    doc             => { text => 'foo' },
    query           => { term => { color => 'blue' }}
);

Returns:

{
    ok      => 1,
    matches => ['mypercolator']
}
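
To illustrate how the result might be consumed, here is a minimal sketch that returns just the names of the matching percolators. The helper name matching_percolators is hypothetical, and it assumes the matches key holds an array ref as in the result shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of the percolators matching $doc.
sub matching_percolators {
    my ( $es, $index, $type, $doc ) = @_;

    my $result = $es->percolate(
        index => $index,
        type  => $type,
        doc   => $doc,
    );

    # 'matches' is assumed to be an array ref; fall back to an empty list
    return @{ $result->{matches} || [] };
}

# Usage (assumes $es is a connected ElasticSearch object):
# my @names = matching_percolators( $es, 'myindex', 'mytype', { text => 'foo' } );
```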

Cluster admin methods

cluster_state()

$result = $es->cluster_state(
     # optional
     filter_blocks          => 0 | 1,
     filter_nodes           => 0 | 1,
     filter_metadata        => 0 | 1,
     filter_routing_table   => 0 | 1,
     filter_indices         => [ 'index_1', ... 'index_n' ],
);

Returns cluster state information.

See http://www.elasticsearch.org/guide/reference/api/admin-cluster-state.html

cluster_health()

$result = $es->cluster_health(
    index                         => multi,
    level                         => 'cluster' | 'indices' | 'shards',
    timeout                       => $seconds,
    wait_for_status               => 'red' | 'yellow' | 'green',
    | wait_for_relocating_shards  => $number_of_shards,
    | wait_for_nodes              => eg '>=2',
);

Returns the status of the cluster, or index|indices or shards, where the returned status means:

red: Data not allocated
yellow: Primary shard allocated
green: All shards allocated

It can block to wait for a particular status (or better), or can block to wait until the specified number of shards have been relocated (where 0 means all) or the specified number of nodes have been allocated.

If waiting, then a timeout can be specified.

For example:

$result = $es->cluster_health( wait_for_status => 'green', timeout => '10s')

See: http://www.elasticsearch.org/guide/reference/api/admin-cluster-health.html
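
A blocking health check might be wrapped as in the sketch below. It assumes that the returned hashref contains status and timed_out keys, as in the ElasticSearch cluster health API response; depending on the server version, a timeout may instead surface as an exception. The helper name cluster_is_green is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: wait up to $timeout for the cluster to turn green.
# Returns 1 if the cluster is green, 0 if it is not or the wait timed out.
sub cluster_is_green {
    my ( $es, $timeout ) = @_;

    my $health = $es->cluster_health(
        wait_for_status => 'green',
        timeout         => $timeout || '10s',
    );

    # Assumed response keys: 'timed_out' (boolean) and 'status' (string)
    return 0 if $health->{timed_out};
    return $health->{status} eq 'green' ? 1 : 0;
}

# Usage (assumes $es is a connected ElasticSearch object):
# die "Cluster not ready" unless cluster_is_green( $es, '30s' );
```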

cluster_settings()

$result = $es->cluster_settings()

Returns any cluster wide settings that have been set with "update_cluster_settings".

See http://www.elasticsearch.org/guide/reference/api/admin-cluster-update-settings.html

update_cluster_settings()

$result = $es->update_cluster_settings(
    persistent  => {...},
    transient   => {...},
)

For example:

$result = $es->update_cluster_settings(
    persistent  => {
        "discovery.zen.minimum_master_nodes" => 2
    },
)

persistent settings will survive a full cluster restart. transient settings won't.

See http://www.elasticsearch.org/guide/reference/api/admin-cluster-update-settings.html

nodes()

$result = $es->nodes(
    nodes       => multi,
    settings    => 0 | 1,
    http        => 0 | 1,
    jvm         => 0 | 1,
    network     => 0 | 1,
    os          => 0 | 1,
    process     => 0 | 1,
    thread_pool => 0 | 1,
    transport   => 0 | 1
);

Returns information about one or more nodes or servers in the cluster.

See: http://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-info.html

nodes_stats()

$result = $es->nodes_stats(
    node    => multi,

    indices     => 1 | 0,
    clear       => 0 | 1,
    all         => 0 | 1,
    fs          => 0 | 1,
    http        => 0 | 1,
    jvm         => 0 | 1,
    network     => 0 | 1,
    os          => 0 | 1,
    process     => 0 | 1,
    thread_pool => 0 | 1,
    transport   => 0 | 1,

);

Returns various statistics about one or more nodes in the cluster.

See: http://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-stats.html

cluster_reroute()

$result = $es->cluster_reroute(
    commands => [
        { move => {
              index     => 'test',
              shard     => 0,
              from_node => 'node1',
              to_node   => 'node2',
        }},
        { allocate => {
              index         => 'test',
              shard         => 1,
              node          => 'node3',
              allow_primary => 0 | 1
        }},
        { cancel => {
              index         => 'test',
              shard         => 2,
              node          => 'node4',
              allow_primary => 0 | 1
        }},
    ],
    dry_run  => 0 | 1
);

The "cluster_reroute" command allows you to explicitly affect shard allocation within a cluster. For example, a shard can be moved from one node to another, an allocation can be cancelled, or an unassigned shard can be explicitly allocated on a specific node.

NOTE: after executing the commands, the cluster will automatically rebalance itself if it is out of balance. Use the dry_run parameter to see what the final outcome will be after automatic rebalancing, before executing the real "cluster_reroute" call.

Without any \@commands, the current cluster routing will be returned.

See http://www.elasticsearch.org/guide/reference/api/admin-cluster-reroute.html
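The dry_run advice in the NOTE can be wrapped in a small helper. This is a sketch under assumptions: reroute_with_preview() and My::FakeES are hypothetical, and the fake client only records its arguments so that the example runs without a cluster. The response it returns is a placeholder, not the real response format.

```perl
use strict;
use warnings;

# Stand-in for a connected ElasticSearch object (hypothetical); it records
# each call instead of talking to a cluster.
{
    package My::FakeES;

    sub new {
        my $class = shift;
        return bless { calls => [] }, $class;
    }

    sub cluster_reroute {
        my ( $self, %args ) = @_;
        push @{ $self->{calls} }, +{%args};
        return { placeholder => 1 };    # not the real response format
    }
}

# Hypothetical wrapper: preview with dry_run => 1, then apply the same
# commands for real only if the caller's check accepts the preview.
sub reroute_with_preview {
    my ( $es, $commands, $accept ) = @_;
    my $preview = $es->cluster_reroute( commands => $commands, dry_run => 1 );
    return unless $accept->($preview);
    return $es->cluster_reroute( commands => $commands, dry_run => 0 );
}

my $es       = My::FakeES->new;
my $commands = [
    { move => { index => 'test', shard => 0, from_node => 'node1', to_node => 'node2' } },
];

my $result = reroute_with_preview( $es, $commands, sub { 1 } );
print scalar( @{ $es->{calls} } ), " calls made\n";
```

In real use the $accept callback would inspect the previewed routing and return false if the outcome after rebalancing is not what was intended.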

shutdown()

$result = $es->shutdown(
    node        => multi,
    delay       => '5s' | '10m'        # optional
);

Shuts down one or more nodes (or the whole cluster if no nodes specified), optionally with a delay.

node can also have the values _local, _master or _all.

See: http://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-shutdown.html

restart()

$result = $es->restart(
    node        => multi,
    delay       => '5s' | '10m'        # optional
);

Restarts one or more nodes (or the whole cluster if no nodes specified), optionally with a delay.

node can also have the values _local, _master or _all.

See: "KNOWN ISSUES"

current_server_version()

$version = $es->current_server_version()

Returns a HASH containing the version number string and whether or not the current server is a snapshot_build.

Other methods

use_index()/use_type()

use_index() and use_type() can be used to set default values for any index or type parameter. The default value can be overridden by passing a parameter (including undef) to any request.

$es->use_index('one');
$es->use_type(['foo','bar']);

$es->index(                         # index: one, type: foo,bar
    data=>{ text => 'my text' }
);

$es->index(                         # index: two, type: foo,bar
    index=>'two',
    data=>{ text => 'my text' }
);

$es->search( type => undef );       # index: one, type: all
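The override rule (a passed parameter wins, even when it is undef) comes down to testing for the presence of the key rather than the definedness of its value. The following is an illustration only, not the module's implementation; resolve() is a hypothetical name.

```perl
use strict;
use warnings;

# Illustration only: a parameter that is present in the request wins over
# the default, even when its value is undef.
sub resolve {
    my ( $defaults, %params ) = @_;
    my %out;
    for my $key (qw(index type)) {
        $out{$key} = exists $params{$key} ? $params{$key} : $defaults->{$key};
    }
    return \%out;
}

my $defaults = { index => 'one', type => [ 'foo', 'bar' ] };

my $r1 = resolve($defaults);                     # both defaults apply
my $r2 = resolve( $defaults, index => 'two' );   # index overridden
my $r3 = resolve( $defaults, type  => undef );   # type cleared: all types

printf "index=%s type=%s\n", $_->{index},
    defined $_->{type} ? join( ',', @{ $_->{type} } ) : '(all)'
    for $r1, $r2, $r3;
```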

trace_calls()

$es->trace_calls(1);            # log to STDERR
$es->trace_calls($filename);    # log to $filename.$PID
$es->trace_calls(\*STDOUT);     # log to STDOUT
$es->trace_calls($fh);          # log to given filehandle
$es->trace_calls(0 | undef);    # disable logging

trace_calls() is used for debugging. All requests to the cluster are logged either to STDERR, or the specified filehandle, or the specified filename, with the current $PID appended, in a form that can be rerun with curl.

The cluster response will also be logged, and commented out.

Example: $es->cluster_health is logged as:

# [Tue Oct 19 15:32:31 2010] Protocol: http, Server: 127.0.0.1:9200
curl -XGET 'http://127.0.0.1:9200/_cluster/health'

# [Tue Oct 19 15:32:31 2010] Response:
# {
#    "relocating_shards" : 0,
#    "active_shards" : 0,
#    "status" : "green",
#    "cluster_name" : "elasticsearch",
#    "active_primary_shards" : 0,
#    "timed_out" : false,
#    "initializing_shards" : 0,
#    "number_of_nodes" : 1,
#    "unassigned_shards" : 0
# }

query_parser()

$qp = $es->query_parser(%opts);

Returns an ElasticSearch::QueryParser object for tidying up query strings so that they won't cause an error when passed to ElasticSearch.

See ElasticSearch::QueryParser for more information.

transport()

$transport = $es->transport

Returns the Transport object, eg ElasticSearch::Transport::HTTP.

timeout()

$timeout = $es->timeout($timeout)

Convenience method which does the same as:

$es->transport->timeout($timeout)

refresh_servers()

$es->refresh_servers()

Convenience method which does the same as:

$es->transport->refresh_servers()

This tries to retrieve a list of all known live servers in the ElasticSearch cluster by connecting to each of the last known live servers (and the initial list of servers passed to new()) until it succeeds.

This list of live servers is then used in a round-robin fashion.

refresh_servers() is called on the first request, and then again after every max_requests requests. This automatic refresh can be disabled by setting max_requests to 0:

$es->transport->max_requests(0)

Or:

$es = ElasticSearch->new(
        servers         => '127.0.0.1:9200',
        max_requests    => 0,
);
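The interaction between round-robin selection and max_requests can be sketched as follows. This is an illustration only and not the transport's actual code: make_picker(), the callback and the host names are hypothetical, and the refresh on the very first request is left out for brevity.

```perl
use strict;
use warnings;

# Illustration only: hand out servers in round-robin order and invoke a
# refresh callback once $max_requests requests have been served.
# A $max_requests of 0 disables the periodic refresh.
sub make_picker {
    my ( $servers, $max_requests, $on_refresh ) = @_;
    my ( $next, $count ) = ( 0, 0 );
    return sub {
        if ( $max_requests && $count >= $max_requests ) {
            $on_refresh->();    # a real client would re-fetch the node list here
            $count = 0;
        }
        $count++;
        return $servers->[ $next++ % @$servers ];
    };
}

my $refreshes = 0;
my $pick      = make_picker( [ 'es1:9200', 'es2:9200', 'es3:9200' ],
    2, sub { $refreshes++ } );

my @used = map { $pick->() } 1 .. 5;
print "@used (refreshes: $refreshes)\n";
```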

builder_class() | builder()

The builder_class is set to ElasticSearch::SearchBuilder by default. This can be changed, eg:

$es = ElasticSearch->new(
        servers         => '127.0.0.1:9200',
        builder_class   => 'My::Builder'
);

builder() will require the module set in builder_class(), create an instance, and store that instance for future use. The builder_class should implement the filter() and query() methods.

camel_case()

$bool = $es->camel_case($bool)

Gets/sets the camel_case flag. If true, then all JSON keys returned by ElasticSearch are in camelCase, instead of with_underscores. This flag does not apply to the source document being indexed or fetched.

Defaults to false.

error_trace()

$bool = $es->error_trace($bool)

If the ElasticSearch server is returning an error, setting error_trace to true will return some internal information about where the error originates. Mostly useful for debugging.

GLOBAL VARIABLES

$ElasticSearch::DEBUG = 0 | 1;

If $ElasticSearch::DEBUG is set to true, then ElasticSearch exceptions will include a stack trace.

AUTHOR

Clinton Gormley, <drtech at cpan.org>

KNOWN ISSUES

"get()"

The _source key that is returned from a "get()" contains the original JSON string that was used to index the document initially. ElasticSearch parses JSON more leniently than JSON::XS, so if invalid JSON is used to index the document (eg unquoted keys) then $es->get(....) will fail with a JSON exception.

Any documents indexed via this module will not be susceptible to this problem.
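The failure is easy to reproduce with any strict JSON decoder. The sketch below uses the core JSON::PP module as a stand-in for JSON::XS (an assumption made so that the example runs without extra dependencies); both reject unquoted keys by default.

```perl
use strict;
use warnings;
use JSON::PP ();

my $strict = JSON::PP->new;

# Valid JSON decodes without complaint.
my $ok = eval { $strict->decode('{"user":"kimchy"}') };

# Unquoted keys are not valid JSON, so a strict decoder throws.
my $bad   = eval { $strict->decode('{user:"kimchy"}') };
my $error = $@;

print "quoted keys:   ", ( $ok  ? "decoded" : "failed" ), "\n";
print "unquoted keys: ", ( $bad ? "decoded" : "failed: $error" ), "\n";
```

Wrapping $es->get(...) in a similar eval is one way to detect documents that were indexed by another client with lenient JSON.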

"restart()"

restart() is currently disabled in ElasticSearch as it doesn't work correctly. Instead you can "shutdown()" one or all nodes and then start them up from the command line.

BUGS

This is a stable API, but it will evolve as the API of ElasticSearch itself changes.

If you have any suggestions for improvements, or find any bugs, please report them to http://github.com/clintongormley/ElasticSearch.pm/issues. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc ElasticSearch

TEST SUITE

The full test suite requires a live ElasticSearch cluster to run, which CPAN Testers does not support. You can see full test results here: http://travis-ci.org/#!/clintongormley/ElasticSearch/builds.

To run the full test suite locally, run it as:

perl Makefile.PL
ES_HOME=/path/to/elasticsearch make test

ACKNOWLEDGEMENTS

Thanks to Shay Banon, the ElasticSearch author, for producing an amazingly easy to use search engine.

LICENSE AND COPYRIGHT

Copyright 2010 - 2011 Clinton Gormley.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.
