Mirror of Apache Hive
Branch: 0.7.0-pentaho
This branch is 38 commits ahead, 12712 commits behind apache:master.
Name Latest commit message Commit time
ant HIVE-1867 Add mechanism for disabling tests with intermittent failure… Mar 15, 2011
bin HIVE-1817. Remove Hive dependency on unreleased commons-cli 2.0 Snapshot Feb 16, 2011
checkstyle HIVE-1198. When checkstyle is activated for Hive in Eclipse environme… May 29, 2010
cli HIVE-1817. Remove Hive dependency on unreleased commons-cli 2.0 Snapshot Feb 16, 2011
common HIVE-2059. Add datanucleus.identifierFactory property HiveConf to avoid Mar 17, 2011
conf HIVE-1998. Update README.txt and add missing ASF headers Feb 20, 2011
contrib HIVE-1867 Add mechanism for disabling tests with intermittent failure… Mar 15, 2011
data Fixing test cases Jun 14, 2012
docs HIVE-1415: add CLI command for executing a SQL script Dec 9, 2010
eclipse-templates HIVE-1817. Remove Hive dependency on unreleased commons-cli 2.0 Snapshot Feb 16, 2011
hbase-handler HIVE-1867 Add mechanism for disabling tests with intermittent failure… Mar 15, 2011
hwi HIVE-1998. Update README.txt and add missing ASF headers Feb 20, 2011
ivy [ENGOPS-503] shared file update from distribute.groovy Aug 26, 2014
jdbc [ENGOPS-503] shared file update from distribute.groovy Aug 26, 2014
lib HIVE-1817. Remove Hive dependency on unreleased commons-cli 2.0 Snapshot Feb 16, 2011
metastore HIVE-2011. upgrade-0.6.0.mysql.sql script attempts to increase size of Mar 19, 2011
odbc HIVE-1526. Hive should depend on release version of Thrift (Carl Stei… Dec 9, 2010
ql Fixed build issue for hive-exec where it pulls in it's own json code' Jun 14, 2012
serde HIVE-1517 Ability to select across a database (Siying Dong and Carl S… Feb 25, 2011
service HIVE-2007 Executing queries using Hive Server is not logging to the l… Mar 6, 2011
shims HIVE-2064. Make call to SecurityUtil.getServerPrincipal unambiguous (… Mar 19, 2011
testlibs HIVE-1601. Hadoop 0.17 ant test broken by HIVE-1523 Aug 27, 2010
testutils HADOOP-4230. Fix for serde2 interface, limit operator, select * opera… Oct 21, 2008
.checkstyle HIVE-1198. When checkstyle is activated for Hive in Eclipse environme… May 29, 2010
.gitignore It's now possible to build and publish just the JDBC jar against any … Jun 14, 2012
LICENSE Oops, forgot to add files for HIVE-1729 Oct 19, 2010
NOTICE HIVE-1998. Update README.txt and add missing ASF headers Feb 20, 2011
README.txt HIVE-1998. Update README.txt and add missing ASF headers Feb 20, 2011
RELEASE_NOTES.txt HIVE-BUILD. Update RELEASE_NOTES.txt Mar 20, 2011
build-common.xml HIVE-1867 Add mechanism for disabling tests with intermittent failure… Mar 15, 2011
build.properties added scripts for building and publishing with CI Jun 14, 2012
build.xml HIVE-1998. Update README.txt and add missing ASF headers Feb 20, 2011
hive-jdbc-pom.template added scripts for building and publishing with CI Jun 14, 2012
ivy.xml HIVE-1135. Use Anakia for version controlled documentation Jul 1, 2010
publish-artifacts.sh Hadoop dependency should be to vanilla Apache Hadoop Jun 14, 2012

README.txt

Apache Hive @VERSION@
=================

Hive is a data warehouse system for Hadoop that facilitates
easy data summarization, ad-hoc querying and analysis of large
datasets stored in Hadoop compatible file systems. Hive provides a
mechanism to put structure on this data and query the data using a
SQL-like language called HiveQL. At the same time, this language also
allows traditional map/reduce programmers to plug in their custom
mappers and reducers when it is inconvenient or inefficient to express
this logic in HiveQL.
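
As a rough illustration of the plug-in mechanism described above, a
custom mapper can be an ordinary script that a HiveQL TRANSFORM clause
streams rows through: Hive writes each input row to the script's
standard input as tab-separated columns, one row per line, and reads
tab-separated output rows back. The sketch below is an assumption-laden
example, not part of the Hive distribution: the script name
host_mapper.py, the table web_logs and its columns url and ts are all
hypothetical.

```python
# Hypothetical HiveQL that would invoke this script (names are made up):
#   ADD FILE host_mapper.py;
#   SELECT TRANSFORM (url, ts) USING 'python host_mapper.py'
#          AS (host, hits) FROM web_logs;
import io
import sys


def map_row(line):
    """Turn one tab-separated input row (url, ts) into an output row (host, 1)."""
    fields = line.rstrip("\n").split("\t")
    url = fields[0]
    # Strip an optional scheme, then keep everything before the first slash.
    without_scheme = url.split("://", 1)[-1]
    host = without_scheme.split("/", 1)[0]
    return "\t".join([host, "1"])


def main(stdin, stdout):
    """Map every non-empty input line and write one output row per line."""
    for line in stdin:
        if line.strip():
            stdout.write(map_row(line) + "\n")


if __name__ == "__main__":
    # Demo on in-memory rows so the sketch runs without a Hive cluster.
    demo = io.StringIO(
        "http://hive.apache.org/docs\t1300000000\n"
        "example.com/a\t1300000001\n"
    )
    main(demo, sys.stdout)
```

When used as a real TRANSFORM script, the demo call would be replaced
by main(sys.stdin, sys.stdout). The per-host counts would then
typically be summed by a reducer script or by a GROUP BY in HiveQL.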

Please note that Hadoop is a batch processing system and Hadoop jobs
tend to have high latency and incur substantial overheads in job
submission and scheduling. Consequently, the average latency for Hive
queries is generally very high (minutes) even when data sets involved
are very small (say a few hundred megabytes). As a result, it cannot be
compared with systems such as Oracle where analyses are conducted on a
significantly smaller amount of data but the analyses proceed much
more iteratively with the response times between iterations being less
than a few minutes. Hive aims to provide acceptable (but not optimal)
latency for interactive data browsing, queries over small data sets or
test queries.

Hive is not designed for online transaction processing and does not
support real-time queries or row-level inserts/updates. It is best used
for batch jobs over large sets of immutable data (like web logs). What
Hive values most is scalability (scaling out with more machines added
dynamically to the Hadoop cluster), extensibility (with the MapReduce
framework and UDF/UDAF/UDTF), fault tolerance, and loose coupling with
its input formats.


General Info
============

For the latest information about Hive, please visit our website at:

  http://hive.apache.org/


Getting Started
===============

- Installation Instructions and a quick tutorial:
  http://wiki.apache.org/hadoop/Hive/GettingStarted

- A longer tutorial that covers more features of HiveQL:
  http://wiki.apache.org/hadoop/Hive/Tutorial

- The HiveQL Language Manual:
  http://wiki.apache.org/hadoop/Hive/LanguageManual


Requirements
============

- Java 1.6

- Hadoop 0.20.x (x >= 1)
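
The Hadoop requirement above can be read as "major.minor is 0.20 and
the third version component is at least 1". A minimal sketch of that
check follows. The function name is hypothetical, and it assumes the
caller already has a bare version string such as "0.20.2".

```python
import re


def meets_hadoop_requirement(version):
    """Return True for Hadoop 0.20.x version strings with x >= 1."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        # Not in major.minor.patch form, so it cannot satisfy 0.20.x.
        return False
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor) == (0, 20) and patch >= 1


if __name__ == "__main__":
    for candidate in ("0.20.0", "0.20.2", "0.21.0"):
        print(candidate, meets_hadoop_requirement(candidate))
```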


Upgrading from older versions of Hive
=====================================

- Hive @VERSION@ includes changes to the MetaStore schema. If
  you are upgrading from an earlier version of Hive it is
  imperative that you upgrade the MetaStore schema by
  running the appropriate schema upgrade scripts located in
  the scripts/metastore/upgrade directory.

  We have provided upgrade scripts for Derby and MySQL databases. If
  you are using a different database for your MetaStore you will need
  to provide your own upgrade script.

- Hive @VERSION@ includes new configuration properties. If you
  are upgrading from an earlier version of Hive it is imperative
  that you replace all of the old copies of the hive-default.xml
  configuration file with the new version located in the conf/
  directory.
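
One way to locate old copies of hive-default.xml before replacing them
is to compare every file of that name against the new version shipped
in conf/. The helper below is only a sketch: the function name and the
directories passed to it are assumptions, and it reports differing
files without modifying anything.

```python
import filecmp
import os


def find_stale_copies(new_default, search_roots, name="hive-default.xml"):
    """Return paths of files called `name` whose bytes differ from `new_default`."""
    stale = []
    new_default = os.path.abspath(new_default)
    for root in search_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if name not in filenames:
                continue
            candidate = os.path.abspath(os.path.join(dirpath, name))
            if candidate == new_default:
                continue  # skip the reference copy itself
            # shallow=False forces a byte-by-byte comparison.
            if not filecmp.cmp(new_default, candidate, shallow=False):
                stale.append(candidate)
    return sorted(stale)
```

A call such as find_stale_copies("conf/hive-default.xml", ["/etc/hive"])
(both paths hypothetical) would return the copies that still need to
be replaced.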


Useful mailing lists
====================

1. user@hive.apache.org - To discuss and ask usage questions. Send an
   empty email to user-subscribe@hive.apache.org in order to subscribe
   to this mailing list.

2. dev@hive.apache.org - For discussions about code, design and features.
   Send an empty email to dev-subscribe@hive.apache.org in order to subscribe
   to this mailing list.

3. commits@hive.apache.org - In order to monitor commits to the source
   repository. Send an empty email to commits-subscribe@hive.apache.org
   in order to subscribe to this mailing list.