elastic · seang-es · Apr 1, 2014 · Apr 1, 2014 · Apr 4, 2014 · Apr 5, 2014
diff --git a/README.textile b/README.textile
@@ -86,7 +86,7 @@ We can also use the JSON query language Elasticsearch provides instead of a quer
 curl -XGET 'http://localhost:9200/twitter/tweet/_search?pretty=true' -d '
 { 
     "query" : { 
-        "text" : { "user": "kimchy" }
+        "match" : { "user": "kimchy" }
     } 
 }'
 </pre>
@@ -206,6 +206,10 @@ The distribution will be created under @target/releases@.
 See the "TESTING":TESTING.asciidoc file for more information about
 running the Elasticsearch test suite.
 
+h3. Upgrading to Elasticsearch 1.x?
+
+In order to ensure a smooth upgrade process from earlier versions of Elasticsearch (< 1.0.0), it is recommended to perform a full cluster restart. Please see the "Upgrading" section of the "setup reference":http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html.
+
 h1. License
 
 <pre>

diff --git a/TESTING.asciidoc b/TESTING.asciidoc
@@ -186,22 +186,18 @@ mvn test -Dtests.class=org.elasticsearch.test.rest.ElasticsearchRestTests
 `ElasticsearchRestTests` is the executable test class that runs all the
 yaml suites available within the `rest-api-spec` folder.
 
-The following are the options supported by the REST tests runner:
+The REST tests support all the options provided by the randomized runner, plus the following:
 
-* `tests.rest[true|false|host:port]`: determines whether the REST tests need
-to be run and if so whether to rely on an external cluster (providing host
-and port) or fire a test cluster (default). It's possible to provide a
-comma separated list of addresses to send requests in a round-robin fashion.
+* `tests.rest[true|false]`: determines whether the REST tests need to be run (default) or not.
 * `tests.rest.suite`: comma separated paths of the test suites to be run
 (by default loaded from /rest-api-spec/test). It is possible to run only a subset
 of the tests providing a sub-folder or even a single yaml file (the default
 /rest-api-spec/test prefix is optional when files are loaded from classpath)
 e.g. -Dtests.rest.suite=index,get,create/10_with_id
-* `tests.rest.section`: regex that allows to filter the test sections that
-are going to be run. If provided, only the section names that match (case
-insensitive) against it will be executed
 * `tests.rest.spec`: REST spec path (default /rest-api-spec/api)
-* `tests.iters`: runs multiple iterations
-* `tests.seed`: seed to base the random behaviours on
-* `tests.appendseed[true|false]`: enables adding the seed to each test
-section's description (default false)
+
+Note that the REST tests, like all the integration tests, can be run against an external
+cluster by specifying the `tests.cluster` property, which if present needs to contain a
+comma separated list of nodes to connect to (e.g. localhost:9300). A transport client will
+be created based on that and used for all the before|after test operations, and to extract
+the http addresses of the nodes so that REST requests can be sent to them.
diff --git a/dev-tools/build_release.py b/dev-tools/build_release.py
@@ -388,7 +388,7 @@ def smoke_test_release(release, files, expected_hash, plugins):
           if version['build_hash'].strip() !=  expected_hash:
             raise RuntimeError('HEAD hash does not match expected [%s] but got [%s]' % (expected_hash, version['build_hash']))
           print('  Running REST Spec tests against package [%s]' % release_file)
-          run_mvn('test -Dtests.rest=%s -Dtests.class=*.*RestTests' % ("127.0.0.1:9200"))
+          run_mvn('test -Dtests.cluster=%s -Dtests.class=*.*RestTests' % ("127.0.0.1:9300"))
           print('  Verify if plugins are listed in _nodes')
           conn.request('GET', '/_nodes?plugin=true&pretty=true')
           res = conn.getresponse()

diff --git a/docs/community/clients.asciidoc b/docs/community/clients.asciidoc
@@ -39,15 +39,15 @@ See the {client}/ruby-api/current/index.html[official Elasticsearch Ruby client]
 * http://github.com/karmi/tire[Tire]:
   Ruby API & DSL, with ActiveRecord/ActiveModel integration.
 
-* http://github.com/grantr/rubberband[rubberband]:
-  Ruby client.
-
 * https://github.com/PoseBiz/stretcher[stretcher]:
   Ruby client.
 
 * https://github.com/wireframe/elastic_searchable/[elastic_searchable]:
   Ruby client + Rails integration.
 
+* https://github.com/ddnexus/flex[Flex]:
+  Ruby client.
+
 
 [[community-php]]
 === PHP
@@ -62,6 +62,8 @@ See the {client}/php-api/current/index.html[official Elasticsearch PHP client].
 * http://github.com/polyfractal/Sherlock[Sherlock]:
   PHP client, one-to-one mapping with query DSL, fluid interface.
 
+* https://github.com/nervetattoo/elasticsearch[elasticsearch]
+  PHP 5.3 client
 
 [[community-java]]
 === Java
@@ -184,3 +186,7 @@ See the {client}/javascript-api/current/index.html[official Elasticsearch JavaSc
 * https://github.com/jasonfill/ColdFusion-ElasticSearch-Client[ColdFusion-Elasticsearch-Client]
   Cold Fusion client for Elasticsearch
 
+[[community-nodejs]]
+=== NodeJS
+* https://github.com/phillro/node-elasticsearch-client[Node-Elasticsearch-Client]
+  A node.js client for elasticsearch
diff --git a/docs/community/misc.asciidoc b/docs/community/misc.asciidoc
@@ -1,15 +1,12 @@
 [[misc]]
 == Misc
 
-* https://github.com/electrical/puppet-elasticsearch[Puppet]:
+* https://github.com/elasticsearch/puppet-elasticsearch[Puppet]:
   Elasticsearch puppet module.
 
 * http://github.com/elasticsearch/cookbook-elasticsearch[Chef]:
   Chef cookbook for Elasticsearch
 
-* https://github.com/tavisto/elasticsearch-rpms[elasticsearch-rpms]:
-  RPMs for elasticsearch.
-
 * http://www.github.com/neogenix/daikon[daikon]:
   Daikon Elasticsearch CLI
 

diff --git a/docs/reference/cluster/nodes-info.asciidoc b/docs/reference/cluster/nodes-info.asciidoc
@@ -40,7 +40,6 @@ plugins per node:
 * `site`: `true` if the plugin is a site plugin
 * `jvm`: `true` if the plugin is a plugin running in the JVM
 * `url`: URL if the plugin is a site plugin
-* `isolation`: whether the plugin is loaded in isolation (`true`) or not (`false`)
 
 The result will look similar to:
 

diff --git a/docs/reference/cluster/update-settings.asciidoc b/docs/reference/cluster/update-settings.asciidoc
@@ -65,22 +65,32 @@ There is a specific list of settings that can be updated, those include:
 
 [float]
 ===== Balanced Shards
+All these values are relative to one another.  The first three are used to
+compose three separate weighting functions into one.  The cluster is balanced
+when no allowed action can bring the weights of each node closer together by
+more than the fourth setting.  Actions might not be allowed, for instance,
+due to forced awareness or allocation filtering.
 
 `cluster.routing.allocation.balance.shard`::
-     Defines the weight factor for shards allocated on a node 
-     (float). Defaults to `0.45f`.
+     Defines the weight factor for shards allocated on a node
+     (float). Defaults to `0.45f`.  Raising this raises the tendency to
+     equalize the number of shards across all nodes in the cluster.
 
 `cluster.routing.allocation.balance.index`::
-     Defines a factor to the number of shards per index allocated 
-      on a specific node (float). Defaults to `0.5f`.
+     Defines a factor to the number of shards per index allocated
+      on a specific node (float). Defaults to `0.5f`.  Raising this raises the
+      tendency to equalize the number of shards per index across all nodes in
+      the cluster.
 
 `cluster.routing.allocation.balance.primary`::
-      defines a weight factor for the number of primaries of a specific index 
-      allocated on a node (float). `0.05f`.
+     Defines a weight factor for the number of primaries of a specific index
+      allocated on a node (float). Defaults to `0.05f`.  Raising this raises the tendency
+      to equalize the number of primary shards across all nodes in the cluster.
 
 `cluster.routing.allocation.balance.threshold`::
-      minimal optimization value of operations that should be performed (non 
-      negative float). Defaults to `1.0f`.
+     Minimal optimization value of operations that should be performed (non
+      negative float). Defaults to `1.0f`.  Raising this will cause the cluster
+      to be less aggressive about optimizing the shard balance.
 
 [float]
 ===== Concurrent Rebalance

diff --git a/docs/reference/index-modules/fielddata.asciidoc b/docs/reference/index-modules/fielddata.asciidoc
@@ -124,6 +124,41 @@ field data format.
 `doc_values`::
     Computes and stores field data data-structures on disk at indexing time.
 
+[float]
+==== Global ordinals
+
+coming[1.2.0]
+
+Global ordinals are a data structure on top of field data that maintains an
+incremental numbering for all the terms in field data in lexicographic order.
+Each term has a unique number, and the number of term 'A' is lower than the number
+of term 'B'. Global ordinals are only supported on string fields.
+
+Field data on string fields also has ordinals, which are a unique numbering for all terms
+in a particular segment and field. Global ordinals build on top of this
+by providing a mapping between the segment ordinals and the global ordinals,
+the latter being unique across the entire shard.
+
+Global ordinals can improve the execution time of search features that already use
+segment ordinals, such as the terms aggregator. Often these search features
+need to merge the segment ordinal results to a cross segment terms result. With
+global ordinals this mapping happens during field data load time instead of during each
+query execution. With global ordinals, search features only need to resolve the actual
+term when building the (shard) response. During execution there is no need
+to use the actual terms at all: the unique numbering that global ordinals provide is
+sufficient and improves the execution time.
+
+Global ordinals for a specified field are tied to all the segments of a shard (Lucene index),
+whereas field data for a specific field is tied to a single segment.
+For this reason global ordinals need to be rebuilt in their entirety once new segments
+become visible. This one time cost would happen anyway without global ordinals, but
+then it would happen for each search execution instead!
+
+The loading time of global ordinals depends on the number of terms in a field, but in general
+it is low, since the source field data has already been loaded. The memory overhead of global
+ordinals is small because they are very efficiently compressed. Eager loading of global ordinals
+can move the loading time from the first search request to the refresh itself.
+
 [float]
 === Fielddata loading
 
@@ -147,6 +182,23 @@ It is possible to force field data to be loaded and cached eagerly through the
 }
 --------------------------------------------------
 
+Global ordinals can also be eagerly loaded:
+
+[source,js]
+--------------------------------------------------
+{
+    category: {
+        type:      "string",
+        fielddata: {
+            loading: "eager_global_ordinals"
+        }
+    }
+}
+--------------------------------------------------
+
+With the above setting both field data and global ordinals for a specific field
+are eagerly loaded.
+
 [float]
 ==== Disabling field data loading
 

diff --git a/docs/reference/index-modules/similarity.asciidoc b/docs/reference/index-modules/similarity.asciidoc
@@ -121,6 +121,31 @@ based model] . This similarity has the following options:
 
 Type name: `IB`
 
+[float]
+[[lm_dirichlet]]
+==== LM Dirichlet similarity.
+
+http://lucene.apache.org/core/4_7_1/core/org/apache/lucene/search/similarities/LMDirichletSimilarity.html[LM
+Dirichlet similarity] . This similarity has the following options:
+
+[horizontal]
+`mu`::  Defaults to `2000`.
+
+Type name: `LMDirichlet`
+
+[float]
+[[lm_jelinek_mercer]]
+==== LM Jelinek Mercer similarity.
+
+http://lucene.apache.org/core/4_7_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
+Jelinek Mercer similarity] . This similarity has the following options:
+
+[horizontal]
+`lambda`::  The optimal value depends on both the collection and the query. The optimal value is around `0.1`
+for title queries and `0.7` for long queries. Defaults to `0.1`.
+
+Type name: `LMJelinekMercer`
+
 [float]
 [[default-base]]
 ==== Default and Base Similarities

diff --git a/docs/reference/mapping/types/core-types.asciidoc b/docs/reference/mapping/types/core-types.asciidoc
@@ -446,6 +446,7 @@ Defaults to the property/field name.
 |`store` |Set to `true` to store actual field in the index, `false` to not
 store it. Defaults to `false` (note, the JSON document itself is stored,
 and it can be retrieved from it).
+|`doc_values` |Set to `true` to store field values in a column-stride fashion.
 |=======================================================================
 
 [float]

diff --git a/docs/reference/modules/advanced-scripting.asciidoc b/docs/reference/modules/advanced-scripting.asciidoc
@@ -177,7 +177,7 @@ return score;
 === Term vectors:
 
 The `_index` variable can only be used to gather statistics for single terms. If you want to use information on all terms in a field, you must store the term vectors (set `term_vector` in the mapping as described in the <<mapping-core-types,mapping documentation>>). To access them, call
-`_index.getTermVectors()` to get a
+`_index.termVectors()` to get a
 https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/Fields.html[Fields]
 instance. This object can then be used as described in https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/Fields.html[lucene doc] to iterate over fields and then for each field iterate over each term in the field.
 The method will return null if the term vectors were not stored.

diff --git a/docs/reference/modules/plugins.asciidoc b/docs/reference/modules/plugins.asciidoc
@@ -142,20 +142,6 @@ bin/plugin --install mobz/elasticsearch-head --timeout 1m
 bin/plugin --install mobz/elasticsearch-head --timeout 0
 -----------------------------------
 
-added[1.1.0]
-[float]
-==== Plugins isolation
-
-Since Elasticsearch 1.1, by default, each plugin is loaded in _isolation_ (in its dedicated `ClassLoader`) to avoid class clashes between the various plugins and their associated libraries. The default can be changed through the `plugins.isolation` property in `elasticsearch.yml`, by setting it to `false`:
-
-[source,js]
---------------------------------------------------
-plugins.isolation: false
---------------------------------------------------
-
-Do note that each plugin can specify its _mandatory_ isolation through the `isolation` property in its `es-plugin.properties` configuration. In this (rare) case, the plugin setting is used, overwriting whatever default used by Elasticsearch.
-
-
 [float]
 [[known-plugins]]
 === Known Plugins

diff --git a/docs/reference/search/aggregations/bucket/global-aggregation.asciidoc b/docs/reference/search/aggregations/bucket/global-aggregation.asciidoc
@@ -28,7 +28,7 @@ Example:
 <1> The `global` aggregation has an empty body
 <2> The sub-aggregations that are registered for this `global` aggregation
 
-The above aggregation demonstrates how one would compute aggregations (`avg_price` in this example) on all the documents in the search context, regardless of the query (in our example, it will compute the the average price over all products in our catalog, not just on the "shirts").
+The above aggregation demonstrates how one would compute aggregations (`avg_price` in this example) on all the documents in the search context, regardless of the query (in our example, it will compute the average price over all products in our catalog, not just on the "shirts").
 
 The response for the above aggreation:
 
@@ -48,4 +48,4 @@ The response for the above aggreation:
 }
 --------------------------------------------------
 
-<1> The number of documents that were aggregated (in our case, all documents within the search context)
+<1> The number of documents that were aggregated (in our case, all documents within the search context)
diff --git a/docs/reference/search/aggregations/bucket/terms-aggregation.asciidoc b/docs/reference/search/aggregations/bucket/terms-aggregation.asciidoc
@@ -310,12 +310,15 @@ http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNIX_LINES
 
 ==== Execution hint
 
-There are two mechanisms by which terms aggregations can be executed: either by using field values directly in order to aggregate
-data per-bucket (`map`), or by using ordinals of the field values instead of the values themselves (`ordinals`). Although the
-latter execution mode can be expected to be slightly faster, it is only available for use when the underlying data source exposes
-those terms ordinals. Moreover, it may actually be slower if most field values are unique. Elasticsearch tries to have sensible
-defaults when it comes to the execution mode that should be used, but in case you know that one execution mode may perform better
-than the other one, you have the ability to "hint" it to Elasticsearch:
+coming[1.2.0] The `global_ordinals` execution mode
+
+There are three mechanisms by which terms aggregations can be executed: by using field values directly in order to aggregate
+data per-bucket (`map`), by using ordinals of the field values instead of the values themselves (`ordinals`), or by using global
+ordinals of the field (`global_ordinals`). The last is faster, especially for fields with many unique
+values. However, it can be slower if only a few documents match, for example when a terms aggregator is nested in another
+aggregator; this applies to both the `ordinals` and `global_ordinals` execution modes. Elasticsearch tries to have sensible
+defaults when it comes to the execution mode that should be used, but in case you know that one execution mode may
+perform better than the others, you have the ability to "hint" it to Elasticsearch:
 
 [source,js]
 --------------------------------------------------
@@ -331,6 +334,6 @@ than the other one, you have the ability to "hint" it to Elasticsearch:
 }
 --------------------------------------------------
 
-<1> the possible values are `map` and `ordinals`
+<1> the possible values are `map`, `ordinals` and `global_ordinals`
 
 Please note that Elasticsearch will ignore this execution hint if it is not applicable.
diff --git a/docs/reference/search/request/search-type.asciidoc b/docs/reference/search/request/search-type.asciidoc
@@ -109,7 +109,7 @@ curl -XGET 'localhost:9200/_search?search_type=scan&scroll=10m&size=50' -d '
 '
 --------------------------------------------------
 
-The `scroll` parameter control the keep alive time of the scrolling
+The `scroll` parameter controls the keep alive time of the scrolling
 request and initiates the scrolling process. The timeout applies per
 round trip (i.e. between the previous scan scroll request, to the next).
 

diff --git a/docs/reference/setup.asciidoc b/docs/reference/setup.asciidoc
@@ -61,3 +61,5 @@ include::setup/as-a-service-win.asciidoc[]
 include::setup/dir-layout.asciidoc[]
 
 include::setup/repositories.asciidoc[]
+
+include::setup/upgrade.asciidoc[]
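
Note on the "Global ordinals" section added to `fielddata.asciidoc` in this diff: the mapping it describes (segment ordinals to shard-wide global ordinals) can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not how Elasticsearch or Lucene implement it: segments are modelled as plain sorted term lists, and the helper names `build_global_ordinals` and `terms_agg` are hypothetical.

```python
from collections import Counter


def build_global_ordinals(segments):
    """Build a shard-wide numbering from per-segment term lists.

    `segments` is a list of sorted, de-duplicated term lists; the index of a
    term inside its list plays the role of the segment ordinal (assumption).
    """
    # Global ordinals number every distinct term in lexicographic order.
    global_terms = sorted({term for seg in segments for term in seg})
    term_to_global = {term: i for i, term in enumerate(global_terms)}
    # One lookup table per segment: segment ordinal -> global ordinal.
    seg_to_global = [[term_to_global[t] for t in seg] for seg in segments]
    return global_terms, seg_to_global


def terms_agg(docs_per_segment, seg_to_global, global_terms):
    """Count documents per term using ordinals only during collection."""
    counts = Counter()
    for seg_idx, doc_ords in enumerate(docs_per_segment):
        mapping = seg_to_global[seg_idx]
        for seg_ord in doc_ords:
            counts[mapping[seg_ord]] += 1
    # Terms are resolved only once, when the response is built.
    return {global_terms[g]: n for g, n in counts.items()}


# Two segments of one shard; each document holds one segment ordinal.
segments = [["apple", "cherry"], ["banana", "cherry"]]
docs_per_segment = [[0, 1, 1], [0, 1]]

global_terms, seg_to_global = build_global_ordinals(segments)
result = terms_agg(docs_per_segment, seg_to_global, global_terms)
```

The lookup tables are built once per set of visible segments, which mirrors the point made in the added documentation: the merge cost is paid at load time rather than on every query, and it must be paid again when new segments appear.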