diff --git a/docs/src/reference/asciidoc/core/arch.adoc b/docs/src/reference/asciidoc/core/arch.adoc
index 4ca6acdb8..0d4ec58d4 100644
--- a/docs/src/reference/asciidoc/core/arch.adoc
+++ b/docs/src/reference/asciidoc/core/arch.adoc
@@ -9,7 +9,7 @@ At the core, {eh} integrates two _distributed_ systems: *Hadoop*, a distributed
 [float]
 === {mr} and Shards
 
-A critical component for scalability is parallelism or splitting a task into multiple, smaller ones that execute at the same time, on different nodes in the cluster. The concept is present in both Hadoop through its `splits` (the number of parts in which a source or input can be divided) and {es} through {ref}/glossary.html#glossary-shard[`shards`] (the number of parts in which a index is divided into).
+A critical component for scalability is parallelism, or splitting a task into multiple smaller ones that execute at the same time on different nodes in the cluster. The concept is present in both Hadoop through its `splits` (the number of parts into which a source or input can be divided) and {es} through {glossary}/terms.html#glossary-shard[`shards`] (the number of parts into which an index is divided).
 
 In short, roughly speaking more input splits means more tasks that can read at the same time, different parts of the source. More shards means more 'buckets' from which to read an index content (at the same time).
 
diff --git a/docs/src/reference/asciidoc/core/mr.adoc b/docs/src/reference/asciidoc/core/mr.adoc
index aa23121cc..80b19ff3b 100644
--- a/docs/src/reference/asciidoc/core/mr.adoc
+++ b/docs/src/reference/asciidoc/core/mr.adoc
@@ -43,7 +43,7 @@ Simply use the configuration object when constructing the Hadoop job and you are
 [float]
 === Writing data to {es}
 
-With {eh}, {mr} jobs can write data to {es} making it searchable through {ref}/glossary.html#glossary-index[indexes]. {eh} supports both (so-called) http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapred/package-use.html['old'] and http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapreduce/package-use.html['new'] Hadoop APIs.
+With {eh}, {mr} jobs can write data to {es}, making it searchable through {glossary}/terms.html#glossary-index[indexes]. {eh} supports both the (so-called) http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapred/package-use.html['old'] and http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapreduce/package-use.html['new'] Hadoop APIs.
 
 `EsOutputFormat` expects a `Map` representing a _document_ value that is converted internally into a JSON document and indexed in {es}. Hadoop `OutputFormat` requires implementations to expect a key and a value however, since for {es} only the document (that is the value) is necessary, `EsOutputFormat`
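
For reviewers, the `EsOutputFormat` behavior described in the mr.adoc hunk can be sketched with the 'old' (`mapred`) API roughly as follows. `EsOutputFormat`, `es.nodes`, and `es.resource` are real es-hadoop names; the host, index name, and surrounding class are illustrative assumptions, and the job will only actually run against a live Hadoop and {es} setup:

```java
import java.io.IOException;

import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class EsWriteSketch {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf();
        // Speculative execution would re-run tasks and duplicate writes to {es}
        conf.setSpeculativeExecution(false);
        // Assumed cluster location and target index/type (illustrative values)
        conf.set("es.nodes", "localhost:9200");
        conf.set("es.resource", "radio/artists");
        // EsOutputFormat ignores the key; each MapWritable value is converted
        // to a JSON document and indexed
        conf.setOutputFormat(EsOutputFormat.class);
        conf.setMapOutputValueClass(MapWritable.class);
        // A mapper emitting (ignored key, MapWritable) pairs would be set here
        JobClient.runJob(conf);
    }
}
```

This is a configuration sketch only, not part of the patch; it mirrors the point the changed paragraph makes, namely that only the value (the document) matters to `EsOutputFormat` while the key is discarded.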