init

0 parents · commit 270511a7f8272105fc517a3ac1862a0002de01bb · @mhausenblas committed Nov 2, 2012
@@ -0,0 +1 @@
+.DS_Store
@@ -0,0 +1,43 @@
+# Apache Big Data projects
+
+Based on and motivated by the following resources:
+
+* Apache [project list](http://projects.apache.org/indexes/category.html)
+* Edd Dumbill's [What is Apache Hadoop?](http://strata.oreilly.com/2012/02/what-is-apache-hadoop.html)
+* Edd Dumbill's [The SMAQ stack for big data](http://strata.oreilly.com/2010/09/the-smaq-stack-for-big-data.htm)
+* My [Interactive analysis of large-scale datasets](http://webofdata.wordpress.com/2012/09/02/large-scale-interactive-analysis/) post
+
+## Top-level
+
+* Accumulo, http://accumulo.apache.org/ - a sorted, distributed key/value store
+* Cassandra, http://cassandra.apache.org/ - column-oriented database
+* Cayenne, http://cayenne.apache.org/ - object-relational mapping (ORM) and remoting services
+* CouchDB, http://couchdb.apache.org/ - NoSQL document-oriented datastore
+* Gora, http://gora.apache.org/ - provides an in-memory data model and persistence for big data
+* Hadoop, http://hadoop.apache.org/ - a distributed computing platform:
+ * HDFS - distributed redundant file system for Hadoop
+ * MapReduce - parallel computation on server clusters
+* HBase, http://hbase.apache.org/ - column-oriented database on top of Hadoop
+* Hive, http://hive.apache.org/ - data warehouse with SQL-like access
+* Flume, http://flume.apache.org/ - collection and import of log and event data
+* Lucene, http://lucene.apache.org/ - full-text indexing and search library
+* Mahout, http://mahout.apache.org/ - library of machine learning and data mining algorithms on top of Hadoop
+* Pig, http://pig.apache.org/ - high-level programming language for Hadoop computations
+* Oozie, http://oozie.apache.org/ - orchestration and workflow management for Hadoop
+* Solr, http://lucene.apache.org/solr/ - Lucene-based enterprise search platform
+* Sqoop, http://sqoop.apache.org/ - imports data from relational databases into Hadoop
+* Whirr, http://whirr.apache.org/ - cloud-agnostic deployment of clusters
+* ZooKeeper, http://zookeeper.apache.org/ - configuration management and coordination
+
+## Incubator
+
+* Ambari, http://incubator.apache.org/ambari/ - deployment, configuration and monitoring of Hadoop clusters
+* Blur, http://incubator.apache.org/blur/ - search platform for searching massive amounts of data in a cloud computing environment
+* Chukwa, http://incubator.apache.org/chukwa/ - log collection and analysis framework for Apache Hadoop clusters
+* Crunch, http://incubator.apache.org/crunch/ - a Java library for writing, testing, and running pipelines of MapReduce jobs
+* Drill, http://incubator.apache.org/drill/ - interactive analysis of large-scale data
+* HCatalog, http://incubator.apache.org/hcatalog/ - schema and data type sharing over Pig, Hive and MapReduce
+* Kafka, http://incubator.apache.org/kafka/ - distributed publish-subscribe messaging system
+* Mesos, http://incubator.apache.org/mesos/ - a cluster manager that provides resource sharing and isolation across cluster applications
+* S4, http://incubator.apache.org/s4/ - distributed platform for processing continuous unbounded streams of data
+* Tashi, http://incubator.apache.org/tashi/ - infrastructure for service providers to build applications harnessing cluster computing resources to efficiently access repositories of rich data