NexR RHive 2.0

RHive is an R extension that facilitates distributed computing via Hive queries. It lets you use HQL (Hive SQL) from R, and use R objects and R functions from within Hive.

Before installing RHive, you must install Hadoop and Hive.

Install Hadoop

  1. Single Node
  2. Cluster Node
  3. Set HADOOP_HOME on the local machine on which R runs.

Install Hive

  1. Install Hive on the local machine and on the remote machine on which the NameNode or Hive Server runs.
  2. Installation Guide
  3. Set HIVE_HOME on the local machine on which R runs.
  4. Launch Hive Server on the remote machine with the following command. It should run as a background process.
    • $HIVE_HOME/bin/hive --service hiveserver
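One way to keep Hive Server running in the background is to start it under `nohup` and record its PID. This is a minimal sketch assuming a POSIX shell on the Hive Server host; the function name `start_hiveserver` and the default log and PID file names are hypothetical, not part of Hive or RHive:

```shell
#!/bin/sh
# Hypothetical helper: start Hive Server detached from the terminal.
# Usage: start_hiveserver [log_file] [pid_file]
start_hiveserver() {
  if [ -z "${HIVE_HOME:-}" ]; then
    echo "error: HIVE_HOME is not set" >&2
    return 1
  fi
  log_file=${1:-hiveserver.log}   # assumed default log location
  pid_file=${2:-hiveserver.pid}   # assumed default PID file location
  # nohup keeps the server alive after logout; '&' puts it in the background.
  nohup "$HIVE_HOME/bin/hive" --service hiveserver > "$log_file" 2>&1 &
  # $! is the PID of the background process, kept so it can be stopped later.
  echo $! > "$pid_file"
}
```

To stop the server later, `kill "$(cat hiveserver.pid)"` can be used with the same PID file.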

Install R and Packages

  1. Install R
    • R must be installed on all tasktracker nodes.
  2. Install rJava
    • rJava only needs to be installed on the local machine.
  3. Install Rserve
    • Rserve must be installed on all tasktracker nodes.
    • Create the configuration file /etc/Rserv.conf on all tasktracker nodes and add the line 'remote enable' to it to allow remote connections.
    • Launch Rserve on all tasktracker nodes.
      • e.g. R CMD Rserve
  4. Configure the tasktracker nodes
    • Add R_HOME to $HADOOP_HOME/conf/hadoop-env.sh.
      • e.g. export R_HOME=/usr/lib/R
  5. Install RUnit
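The per-node Rserve and Hadoop settings can be applied with a small script run on each tasktracker node. This is a sketch only: the helper names `ensure_line`, `configure_rserve` and `configure_hadoop_env` are hypothetical, and `/usr/lib/R` is just the example R_HOME used above. Each line is appended only if it is not already present, so re-running the script is harmless:

```shell
#!/bin/sh
# Hypothetical helpers for tasktracker node setup (not part of RHive).
# Writing to /etc/Rserv.conf typically requires root.

# Append LINE to FILE unless an identical line already exists.
ensure_line() {
  file=$1
  line=$2
  grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Allow remote connections to Rserve via its config file.
configure_rserve() {
  ensure_line "${1:-/etc/Rserv.conf}" "remote enable"
}

# Export R_HOME for Hadoop tasks through hadoop-env.sh.
configure_hadoop_env() {
  ensure_line "${1:-$HADOOP_HOME/conf/hadoop-env.sh}" "export R_HOME=${2:-/usr/lib/R}"
}
```

On a node, `configure_rserve && configure_hadoop_env && R CMD Rserve` would then apply both settings and start Rserve.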

Install RHive

  1. Requirements
    • ant (to build the Java files)
  2. Installing RHive
    1. Download the source code: git clone https://github.com/nexr/RHive.git
    2. Change your working directory: cd RHive
    3. Set the environment variables HIVE_HOME and HADOOP_HOME:
      • export HIVE_HOME=/path/to/your/hive/directory
      • export HADOOP_HOME=/path/to/your/hadoop/directory
    4. Build the Java files using ant: ant build
    5. Build RHive: R CMD build RHive
    6. Install RHive: R CMD INSTALL RHive_.tar.gz
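The build steps above can be chained into one function that fails early when the required variables are missing. This is a hypothetical sketch (`require_env` and `build_rhive` are not part of RHive); it assumes git, ant and R are on the PATH, and it uses the glob `RHive_*.tar.gz` on the assumption that the built tarball name carries a version number:

```shell
#!/bin/sh
# Hypothetical helpers wrapping the documented build steps.

# Return non-zero, with a message, if any named variable is unset or empty.
require_env() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Parentheses run the body in a subshell, so 'cd' does not affect the caller.
build_rhive() (
  require_env HIVE_HOME HADOOP_HOME || exit 1
  # Assumption: the tarball name includes a version, hence the glob.
  git clone https://github.com/nexr/RHive.git &&
    cd RHive &&
    ant build &&
    R CMD build RHive &&
    R CMD INSTALL RHive_*.tar.gz
)
```

Each step runs only if the previous one succeeded, so a failed clone or ant build stops before R CMD INSTALL.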

Loading RHive and connecting to Hive

  1. Set the environment variables HIVE_HOME, HADOOP_HOME and HADOOP_CONF_DIR:
    • In the shell:
      • export HIVE_HOME=/path/to/your/hive/directory
      • export HADOOP_HOME=/path/to/your/hadoop/directory
      • export HADOOP_CONF_DIR=/path/to/your/hadoop/conf/directory
    • Or, add them to Renviron:
      • HIVE_HOME=/path/to/your/hive/directory
      • HADOOP_HOME=/path/to/your/hadoop/directory
      • HADOOP_CONF_DIR=/path/to/your/hadoop/conf/directory
  2. Launch R, load RHive and connect to Hive:
library(RHive)
rhive.connect(host, port, hiveServer2)
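R reads user-level environment variables from `~/.Renviron` at startup, so the Renviron option can be scripted. This is a sketch assuming the per-user `~/.Renviron` file; `set_renviron` is a hypothetical helper that replaces any earlier definition of a key instead of adding a duplicate:

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in an Renviron-style file.
# Usage: set_renviron FILE KEY VALUE
set_renviron() {
  file=$1
  key=$2
  value=$3
  touch "$file"
  # Drop any previous definition of KEY (grep -v exits 1 if nothing remains).
  grep -v "^$key=" "$file" > "$file.tmp" || true
  mv "$file.tmp" "$file"
  printf '%s=%s\n' "$key" "$value" >> "$file"
}
```

For example, `set_renviron "$HOME/.Renviron" HIVE_HOME /path/to/your/hive/directory`, repeated for HADOOP_HOME and HADOOP_CONF_DIR, before launching R.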

Tutorials

Requirements

  • Java 1.6
  • R 2.13.0
  • Rserve 0.6-0
  • rJava 0.9-0
  • Hadoop 0.20.x (x >= 1)
  • Hive 0.8.x (x >= 0)
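Installed versions can be compared against these minimums with version-aware sorting. This sketch assumes GNU coreutils `sort -V`; the helpers `version_ge` and `check_min` are hypothetical and treat each listed version as a lower bound:

```shell
#!/bin/sh
# Hypothetical helpers for checking minimum versions (needs GNU sort -V).

# Succeed if version $1 >= version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether component NAME at version HAVE meets minimum NEED.
check_min() {
  name=$1
  have=$2
  need=$3
  if version_ge "$have" "$need"; then
    echo "ok: $name $have (>= $need)"
  else
    echo "too old: $name $have (need >= $need)" >&2
    return 1
  fi
}
```

For R, the installed version can be taken from the first line of `R --version`, e.g. `check_min R "$(R --version | head -n 1 | awk '{print $3}')" 2.13.0`.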
