HIVE-2598. Update README.txt file to use description from wiki
(Carl Steinbach via jvs)



git-svn-id: https://svn.apache.org/repos/asf/hive/trunk@1203885 13f79535-47bb-0310-9956-ffa450edef68
John Sichi committed Nov 18, 2011
1 parent 4910f33 commit 8a19652
Showing 1 changed file with 24 additions and 11 deletions.
35 changes: 24 additions & 11 deletions README.txt
@@ -1,14 +1,27 @@
-Apache Hive @VERSION@
-=================
-
-Apache Hive is a data warehouse system for Hadoop that facilitates
-easy data summarization, ad-hoc querying and analysis of large
-datasets stored in Hadoop compatible file systems. Hive provides a
-mechanism to put structure on this data and query the data using a
-SQL-like language called HiveQL. At the same time this language also
-allows traditional map/reduce programmers to plug in their custom
-mappers and reducers when it is inconvenient or inefficient to express
-this logic in HiveQL.
+Apache Hive (TM) @VERSION@
+======================
+
+The Apache Hive (TM) data warehouse software facilitates querying and
+managing large datasets residing in distributed storage. Built on top
+of Apache Hadoop (TM), it provides:
+
+* Tools to enable easy data extract/transform/load (ETL)
+
+* A mechanism to impose structure on a variety of data formats
+
+* Access to files stored either directly in Apache HDFS (TM) or in other
+  data storage systems such as Apache HBase (TM)
+
+* Query execution via MapReduce
+
+Hive defines a simple SQL-like query language, called QL, that enables
+users familiar with SQL to query the data. At the same time, this
+language also allows programmers who are familiar with the MapReduce
+framework to be able to plug in their custom mappers and reducers to
+perform more sophisticated analysis that may not be supported by the
+built-in capabilities of the language. QL can also be extended with
+custom scalar functions (UDF's), aggregations (UDAF's), and table
+functions (UDTF's).
 
 Please note that Hadoop is a batch processing system and Hadoop jobs
 tend to have high latency and incur substantial overheads in job
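Both the old and new README text say MapReduce programmers can plug custom mappers and reducers into QL. In HiveQL this is commonly done with the `TRANSFORM ... USING` clause, which streams rows to an external script over stdin and reads result rows back from its stdout. The following is a minimal sketch of such a mapper in Python, assuming Hive's default text serialization for `TRANSFORM` (one row per line, tab-separated columns, NULL written as `\N`). The script name `word_mapper.py`, the function `map_rows`, and the choice of the last column as the text column are hypothetical.

```python
import sys
from typing import Iterable, Iterator

# Assumption: Hive's default TRANSFORM serialization writes NULL as \N.
HIVE_NULL = "\\N"


def map_rows(lines: Iterable[str]) -> Iterator[str]:
    """Turn each input row into one (word, 1) output row per word."""
    for line in lines:
        # Each row arrives as a single line with tab-separated columns.
        fields = line.rstrip("\n").split("\t")
        # Hypothetical layout: the free-text column is the last one.
        text = fields[-1]
        if text == HIVE_NULL:
            continue  # skip NULL text values
        for word in text.split():
            # Output columns are also tab-separated, one row per line.
            yield f"{word.lower()}\t1"


def main() -> None:
    # Hive pipes rows to stdin and reads result rows from stdout.
    for row in map_rows(sys.stdin):
        sys.stdout.write(row + "\n")


if __name__ == "__main__":
    main()
```

It could be wired into a query roughly as `ADD FILE word_mapper.py;` followed by `SELECT TRANSFORM (doc_id, body) USING 'python word_mapper.py' AS (word, cnt) FROM docs;`, where `docs`, `doc_id`, and `body` are hypothetical names. Keeping the per-row logic in a generator separate from the stdin/stdout loop lets it be tested without a Hive cluster.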