
Deployment

flow edited this page Jun 26, 2017 · 21 revisions


System Requirements

Note - Please make sure you have all these requirements installed correctly before deploying IndexR.

Deploy C++ libs

  • Copy the correct lib file in indexr-<version>/lib to /usr/local/lib/ on all cluster nodes, including those nodes where you may run Hive or indexr-tool scripts.

e.g. on Linux you should use the libbhcompress.so file.

  • Edit ${HADOOP_HOME}/etc/hadoop/mapred-site.xml and add /usr/local/lib to LD_LIBRARY_PATH in the mapred.child.env parameter. e.g.
<property>
        <name>mapred.child.env</name>
        <value>LD_LIBRARY_PATH=/usr/local/lib</value>
</property>
  • Edit ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh, add /usr/local/lib to LD_LIBRARY_PATH. e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
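Once the library is copied and both files are edited, a quick sanity check can confirm that /usr/local/lib ends up on LD_LIBRARY_PATH exactly once. This is a plain-shell sketch, independent of IndexR itself:

```shell
# Append /usr/local/lib to LD_LIBRARY_PATH only if it is not already there,
# then count how many times it appears (should print 1).
case ":${LD_LIBRARY_PATH}:" in
  *:/usr/local/lib:*) ;;   # already present, nothing to do
  *) export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/local/lib" ;;
esac
printf '%s\n' "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -cx '/usr/local/lib'
```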

Setup Hive IndexR Plugin

  • Copy IndexR Hive aux jars indexr-<version>/indexr-hive/aux/* to Hive's HIVE_AUX_JARS_PATH. HIVE_AUX_JARS_PATH can be set in ${HIVE_HOME}/conf/hive-env.sh. e.g.
cp -r indexr-<version>/indexr-hive/aux/* /usr/local/hive/aux/
  • [Optional] Sometimes you will need to upload those Hive aux jars to HDFS at the same path. e.g.
hdfs dfs -put /usr/local/hive/aux/* /usr/local/hive/aux/
  • Restart HiveServer2 if you run one.
  • Now you should be able to create an IndexR hive table via Hive console. e.g.
hive (default)> CREATE EXTERNAL TABLE IF NOT EXISTS test (
  `date` int,
  `d1` string,
  `m1` int,
  `m2` bigint,
  `m3` float,
  `m4` double
) 
PARTITIONED BY (`dt` string)
ROW FORMAT SERDE 'io.indexr.hive.IndexRSerde' 
STORED AS INPUTFORMAT 'io.indexr.hive.IndexRInputFormat' 
OUTPUTFORMAT 'io.indexr.hive.IndexROutputFormat' 
LOCATION '/indexr/segment/test' 
;
hive (default)> insert into table test partition (dt=20160701) values(20160701,'mac',100,192444,1.55,-331.43555);
hive (default)> select * from test limit 10;

Deploy indexr-tool

indexr-tool is a toolbox for managing IndexR. It only needs to be deployed on one node, usually your management node.

  • Copy indexr-<version>/indexr-tool to a path, like /usr/local/indexr-tool

  • Copy ${HADOOP_CONF}/core-site.xml and ${HADOOP_CONF}/hdfs-site.xml to the conf folder.

  • Modify the configurations in the conf folder. In particular, make sure the indexr.fs.connection setting in indexr.config.properties is set to the same value as fs.defaultFS in core-site.xml.

     env.sh
     indexr.config.properties
     log4j.xml
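For example, if core-site.xml sets fs.defaultFS to hdfs://namenode:9000 (the hostname and port here are hypothetical, substitute your own), indexr.config.properties should contain a matching entry:

```properties
# conf/indexr.config.properties - must match fs.defaultFS in core-site.xml
indexr.fs.connection=hdfs://namenode:9000
```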
    

Setup Drill Plugin

  • Copy all files in indexr-<version>/indexr-drill/* to the Drill installation home dir ${DRILL_HOME}/, for example /usr/local/drill. Do this on all Drill nodes in the cluster.
  • Copy drill-indexr-storage-<version>.jar to ${DRILL_HOME}/jars/.
  • Modify the configuration ${DRILL_HOME}/conf/indexr.config.properties. It should be kept in sync with indexr-tool and across all Drillbit nodes.
  • Copy ${HADOOP_CONF}/core-site.xml and ${HADOOP_CONF}/hdfs-site.xml to the ${DRILL_HOME}/conf folder if they do not exist there yet.
  • Modify ${DRILL_HOME}/conf/drill-env.sh, add /usr/local/lib to LD_LIBRARY_PATH. e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
  • Synchronize ${DRILL_HOME}/conf to all Drillbit nodes and restart them.

  • Go to the Drill Web Console, create a new storage plugin called indexr, input the following text, and click Create. You only need to do this once, from any Drill Web Console in the cluster.

{
  "type": "indexr",
  "enabled": true
}

Note that the IndexR plugin can only create one storage plugin in a Drill cluster, so you should always use the name indexr.

  • Now you can create an IndexR table and start querying.

Create IndexR table by indexr-tool

cd ${INDEXR-TOOL_HOME}
bin/tools.sh -cmd settb -t test -c test_schema.json

test_schema.json:

{
    "schema":{
        "columns":
        [
            {"name": "date", "dataType": "int"},
            {"name": "d1", "dataType": "string"},
            {"name": "m1", "dataType": "int"},
            {"name": "m2", "dataType": "bigint"},
            {"name": "m3", "dataType": "float"},
            {"name": "m4", "dataType": "double"}
        ]
    }
}

Run some queries via the Drill console

cd ${DRILL_HOME}
bin/drill-conf
0: jdbc:drill:> select * from indexr.test limit 10;

Deploy IndexR Spark

Note: Only Spark 2.1.0+ is supported.

Simply copy all jars in indexr-<version>/indexr-spark/jars/* to ${SPARK_HOME}/jars/ and you are good to go.
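The copy step can be sketched as below. The directory layout is created inside a scratch directory purely so the example is self-contained and runnable anywhere; in a real deployment, indexr-release/ stands in for your unpacked indexr-<version>/ directory and $SPARK_HOME for your actual Spark installation:

```shell
work=$(mktemp -d)                         # scratch area for this sketch
SPARK_HOME="$work/spark"                  # stands in for your real Spark home
mkdir -p "$SPARK_HOME/jars" "$work/indexr-release/indexr-spark/jars"
touch "$work/indexr-release/indexr-spark/jars/indexr-spark.jar"   # stand-in jar
# The actual deployment step: drop every IndexR Spark jar into Spark's jars dir.
cp "$work"/indexr-release/indexr-spark/jars/* "$SPARK_HOME/jars/"
ls "$SPARK_HOME/jars"                     # prints: indexr-spark.jar
```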