jorgemarsal/orc

ORC is a self-describing, type-aware columnar file format designed for Hadoop workloads. It is optimized for large streaming reads, but with integrated support for finding required rows quickly. Storing data in a columnar format lets the reader read, decompress, and process only the values that are required for the current query. Because ORC files are type-aware, the writer chooses the most appropriate encoding for the type and builds an internal index as the file is written. Predicate pushdown uses those indexes to determine which stripes in a file need to be read for a particular query, and the row indexes can narrow the search to a particular set of 10,000 rows. ORC supports the complete set of types in Hive, including the complex types: structs, lists, maps, and unions.
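To see why the row indexes help, consider the min/max statistics kept for each row group (by default 10,000 rows): a reader can discard any group whose statistics cannot satisfy the query predicate. The sketch below illustrates the idea only; the struct and function names are hypothetical and not part of the library's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration: per-row-group min/max statistics,
// as recorded in ORC's row indexes.
struct RowGroupStats {
    int64_t min;
    int64_t max;
};

// Return the indices of the row groups that might contain rows where
// value == target; every other group can be skipped without being
// read or decompressed.
std::vector<size_t> groupsToRead(const std::vector<RowGroupStats>& stats,
                                 int64_t target) {
    std::vector<size_t> keep;
    for (size_t i = 0; i < stats.size(); ++i) {
        if (target >= stats[i].min && target <= stats[i].max) {
            keep.push_back(i);
        }
    }
    return keep;
}
```

The same check is applied at stripe granularity first, so a selective predicate can eliminate most of a file before any column data is touched.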

ORC File C++ Library

This library allows C++ programs to read and write the Optimized Row Columnar (ORC) file format.
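Among the type-aware encodings the writer uses is run-length encoding for integer columns. The following is a deliberately simplified sketch of that idea, not the library's actual codec (ORC's real integer encoding also handles deltas and literal runs).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified run-length encoding: each (value, count) pair stands for
// `count` consecutive repetitions of `value`. Repetitive columns --
// common in sorted or low-cardinality data -- collapse to a handful
// of runs.
std::vector<std::pair<int64_t, uint32_t>>
rleEncode(const std::vector<int64_t>& column) {
    std::vector<std::pair<int64_t, uint32_t>> runs;
    for (int64_t v : column) {
        if (!runs.empty() && runs.back().first == v) {
            ++runs.back().second;
        } else {
            runs.push_back({v, 1});
        }
    }
    return runs;
}
```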


Building

To compile:
% export TZ=America/Los_Angeles
% mkdir build
% cd build
% cmake ..
% make
% make test-out

ORC File Format

To build the project, use Maven (3.0.x) from http://maven.apache.org/.

You'll also need to install the protobuf compiler (2.4.x) from https://code.google.com/p/protobuf/.

Finally, install the jdo2 jar in your local Maven repository:

  • download jdo2-api-2.3-ec.jar to your working directory
  • mvn install:install-file -DgroupId=javax.jdo -DartifactId=jdo2-api -Dversion=2.3-ec -Dpackaging=jar -Dfile=jdo2-api-2.3-ec.jar

Building the jar and running the unit tests:

% mvn package

About

ORC File working repo
