docs/src/reference/asciidoc/core/arch.adoc (2 changes: 1 addition & 1 deletion)

@@ -9,7 +9,7 @@ At the core, {eh} integrates two _distributed_ systems: *Hadoop*, a distributed
[float]
=== {mr} and Shards

-A critical component for scalability is parallelism or splitting a task into multiple, smaller ones that execute at the same time, on different nodes in the cluster. The concept is present in both Hadoop through its `splits` (the number of parts in which a source or input can be divided) and {es} through {ref}/glossary.html#glossary-shard[`shards`] (the number of parts in which a index is divided into).
+A critical component for scalability is parallelism, or splitting a task into multiple smaller ones that execute at the same time on different nodes in the cluster. The concept is present in both Hadoop through its `splits` (the number of parts into which a source or input can be divided) and {es} through {glossary}/terms.html#glossary-shard[`shards`] (the number of parts into which an index is divided).

In short, roughly speaking, more input splits means more tasks that can read different parts of the source at the same time. More shards means more 'buckets' from which to read an index's content (at the same time).

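To make the split/shard parallelism above concrete, here is a minimal sketch of a read-side job setup using {eh}'s `EsInputFormat`; the node address `localhost:9200` and the resource `my-index/doc` are illustrative assumptions, not values taken from this patch:

[source,java]
----
import org.apache.hadoop.mapred.JobConf;
import org.elasticsearch.hadoop.mr.EsInputFormat;

public class ReadSetupSketch {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.setInputFormat(EsInputFormat.class);
        conf.set("es.nodes", "localhost:9200");   // {es} node to connect to (assumed)
        conf.set("es.resource", "my-index/doc");  // index/type to read from (assumed)
        // If "my-index" has N shards, the source offers up to N parallel
        // reads, one 'bucket' per shard, as described above.
    }
}
----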
docs/src/reference/asciidoc/core/mr.adoc (2 changes: 1 addition & 1 deletion)

@@ -43,7 +43,7 @@ Simply use the configuration object when constructing the Hadoop job and you are
[float]
=== Writing data to {es}

-With {eh}, {mr} jobs can write data to {es} making it searchable through {ref}/glossary.html#glossary-index[indexes]. {eh} supports both (so-called) http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapred/package-use.html['old'] and http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapreduce/package-use.html['new'] Hadoop APIs.
+With {eh}, {mr} jobs can write data to {es}, making it searchable through {glossary}/terms.html#glossary-index[indexes]. {eh} supports both the (so-called) http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapred/package-use.html['old'] and http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapreduce/package-use.html['new'] Hadoop APIs.

`EsOutputFormat` expects a `Map<Writable, Writable>` representing a _document_ value that is converted internally into a JSON document and indexed in {es}.
Hadoop `OutputFormat` requires implementations to expect a key and a value; however, since for {es} only the document (that is, the value) is necessary, `EsOutputFormat` ignores the key.
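To illustrate that contract, a mapper feeding `EsOutputFormat` could look roughly like the sketch below; the class name, input types, and the `message` field are assumptions made for the example, while the `MapWritable` document and the ignored key follow from the text above:

[source,java]
----
import java.io.IOException;

import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch of a mapper emitting documents for EsOutputFormat ('old' API).
public class DocMapper extends MapReduceBase
        implements Mapper<Object, Text, NullWritable, MapWritable> {

    @Override
    public void map(Object key, Text value,
                    OutputCollector<NullWritable, MapWritable> output,
                    Reporter reporter) throws IOException {
        // Build the document as a Writable map; it is converted internally
        // to JSON and indexed in {es}.
        MapWritable doc = new MapWritable();
        doc.put(new Text("message"), value);  // "message" is an assumed field name
        // The key is ignored by EsOutputFormat, so NullWritable is enough.
        output.collect(NullWritable.get(), doc);
    }
}
----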