Merge pull request #702 from pachyderm/derekchiang-patch-3

Update README.md
pachyderm · Aug 5, 2016 · 9a683a7
2 parents 2bd83e6 + d7651bb
Showing 1 changed file with 37 additions and 5 deletions.
diff --git a/README.md b/README.md
@@ -2,25 +2,57 @@
 [![GitHub release](https://img.shields.io/github/release/pachyderm/pachyderm.svg?style=flat-square)](https://github.com/pachyderm/pachyderm/releases)
 [![GitHub license](https://img.shields.io/github/license/pachyderm/pachyderm.svg?style=flat-square)](https://github.com/pachyderm/pachyderm/blob/master/LICENSE)
 
-* [News](#news)
 * [Getting Started](http://pachyderm.readthedocs.io/)
+* [What is Pachyderm?](#what-is-pachyderm)
+* [What's new about Pachyderm? (How is it different from Hadoop?)](#whats-new-about-pachyderm-how-is-it-different-from-hadoop)
 * [Contributing](#contributing)
+* [Join Us](#join-us)
 * [Usage Metrics](#usage-metrics)
 
-### News
-
-WE'RE HIRING! Love Docker, Go and distributed systems? Learn more about [our team](http://www.pachyderm.io/jobs.html) and email us at jobs@pachyderm.io.
-
 ### Getting Started
 
 Refer to our [developer docs](http://pachyderm.readthedocs.io) to get started.
 
+### What is Pachyderm?
+
+Pachyderm is a software platform that supports the storage and processing of large data sets.
+Pachyderm is inspired by the Hadoop ecosystem but _shares no code_ with it.
+Instead, we leverage the container ecosystem to provide the broad functionality
+of Hadoop with the ease of use of Docker.
+
+### What's new about Pachyderm? (How is it different from Hadoop?)
+
+There are two bold new ideas in Pachyderm:
+
+- Containers as the core processing primitive
+- Version Control for data
+
+These ideas lead directly to a system that's much more powerful, flexible and easy to use.
+
+To process data, you simply create a containerized program that reads from and writes to the **local filesystem**. You can use _any_ tools you want because it's all just going in a container! Pachyderm will take your container and inject data into it by way of a FUSE volume. It then automatically replicates your container, showing each copy a different chunk of the data. With this technique, Pachyderm can scale any code you write to process up to petabytes of data (Example: [distributed grep](https://github.com/pachyderm/pachyderm/tree/master/examples/fruit_stand)).
+
+Pachyderm also version controls all data using a commit-based distributed
+filesystem (PFS), similar to what git does with code. Version control for data
+has far-reaching consequences in a distributed filesystem. You get the full
+history of your data, it's much easier to collaborate with teammates, and if
+anything goes wrong you can revert _the entire cluster_ with one click!
+
+Version control also fits naturally with our containerized processing
+engine. Because Pachyderm understands how your data changes, it can run your
+workload on just the _diff_ of the data as new data is ingested, rather than
+on the whole thing. This means there's no difference between a batch job and
+a streaming job: the same code will work for both!
+
 ### Contributing
 
 To get started, sign the [Contributor License Agreement](https://pachyderm.wufoo.com/forms/pachyderm-contributor-license-agreement).
 
 Send us PRs, we would love to see what you do!
 
+### Join Us
+
+WE'RE HIRING! Love Docker, Go and distributed systems? Learn more about [our team](http://www.pachyderm.io/jobs.html) and email us at jobs@pachyderm.io.
+
 ### Usage Metrics
 
 Pachyderm automatically reports anonymized usage metrics. These metrics help us