
Deployment

flow edited this page Jun 26, 2017 · 21 revisions


System Requirements

Note - Please make sure you have all these requirements installed correctly before deploying IndexR.

Deploy C++ libs

  • Copy the correct lib file in indexr-<version>/lib to /usr/local/lib/ on all cluster nodes, including those nodes where you may run Hive or indexr-tool scripts.

e.g. on Linux you should use the libbhcompress.so file.

  • Edit ${HADOOP_HOME}/etc/hadoop/mapred-site.xml and add /usr/local/lib to LD_LIBRARY_PATH in the mapred.child.env parameter. e.g.
<property>
        <name>mapred.child.env</name>
        <value>LD_LIBRARY_PATH=/usr/local/lib</value>
</property>
  • Edit ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh, add /usr/local/lib to LD_LIBRARY_PATH. e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
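Once the library is copied and both files are edited, a quick sanity check can confirm that /usr/local/lib ends up on LD_LIBRARY_PATH exactly once. This is a plain-shell sketch, independent of IndexR itself:

```shell
# Append /usr/local/lib to LD_LIBRARY_PATH only if it is not already there,
# then count how many times it appears (should print 1).
case ":${LD_LIBRARY_PATH}:" in
  *:/usr/local/lib:*) ;;   # already present, nothing to do
  *) export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/local/lib" ;;
esac
printf '%s\n' "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -cx '/usr/local/lib'
```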

Setup Hive IndexR Plugin

  • Copy IndexR Hive aux jars indexr-<version>/indexr-hive/aux/* to Hive's HIVE_AUX_JARS_PATH. HIVE_AUX_JARS_PATH can be set in ${HIVE_HOME}/conf/hive-env.sh. e.g.
cp -r indexr-<version>/indexr-hive/aux/* /usr/local/hive/aux/
  • [Optional] Sometimes you will need to upload those Hive aux jars to HDFS at the same path. e.g.
hdfs dfs -put /usr/local/hive/aux/* /usr/local/hive/aux/
  • Restart HiveServer2 if you run one.
  • Now you should be able to create an IndexR hive table via Hive console. e.g.
hive (default)> CREATE EXTERNAL TABLE IF NOT EXISTS test (
  `date` int,
  `d1` string,
  `m1` int,
  `m2` bigint,
  `m3` float,
  `m4` double
) 
PARTITIONED BY (`dt` string)
ROW FORMAT SERDE 'io.indexr.hive.IndexRSerde' 
STORED AS INPUTFORMAT 'io.indexr.hive.IndexRInputFormat' 
OUTPUTFORMAT 'io.indexr.hive.IndexROutputFormat' 
LOCATION '/indexr/segment/test' 
;
hive (default)> insert into table test partition (dt=20160701) values(20160701,'mac',100,192444,1.55,-331.43555);
hive (default)> select * from test limit 10;

Deploy indexr-tool

indexr-tool is a toolbox for managing IndexR. It only needs to be deployed on one node, usually your management node.

  • Copy indexr-<version>/indexr-tool to a path, like /usr/local/indexr-tool

  • Copy ${HADOOP_CONF}/core-site.xml and ${HADOOP_CONF}/hdfs-site.xml to the conf folder.

  • Modify the configurations in the conf folder. In particular, make sure the indexr.fs.connection setting in indexr.config.properties is set to the same value as fs.defaultFS in core-site.xml.

     env.sh
     indexr.config.properties
     log4j.xml
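For example, if core-site.xml sets fs.defaultFS to hdfs://namenode:9000 (the hostname and port here are hypothetical, substitute your own), indexr.config.properties should contain a matching entry:

```properties
# conf/indexr.config.properties - must match fs.defaultFS in core-site.xml
indexr.fs.connection=hdfs://namenode:9000
```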
    

Setup Drill Plugin

  • Copy all files in indexr-<version>/indexr-drill/* to the Drill installation home dir ${DRILL_HOME}/, for example /usr/local/drill. Do this on all Drill nodes in the cluster.
  • Copy drill-indexr-storage-<version>.jar to ${DRILL_HOME}/jars/.
  • Modify the configuration ${DRILL_HOME}/conf/indexr.config.properties. It should be kept in sync with indexr-tool and across all Drillbit nodes.
  • Copy ${HADOOP_CONF}/core-site.xml and ${HADOOP_CONF}/hdfs-site.xml to the ${DRILL_HOME}/conf folder if they do not exist there yet.
  • Modify ${DRILL_HOME}/conf/drill-env.sh, add /usr/local/lib to LD_LIBRARY_PATH. e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
  • Synchronize ${DRILL_HOME}/conf to all Drillbit nodes and restart them.

  • Go to the Drill Web Console, create a new storage plugin called indexr, input the following text, and click Create. You only need to do this once, from any Drill Web Console in the cluster.

{
  "type": "indexr",
  "enabled": true
}

Note that the IndexR plugin can only create one storage plugin in a Drill cluster, so you should always use the name indexr.

  • Now you can create an IndexR table and start querying.

Create IndexR table by indexr-tool

cd ${INDEXR-TOOL_HOME}
bin/tools.sh -cmd settb -t test -c test_schema.json

test_schema.json:

{
    "schema":{
        "columns":
        [
            {"name": "date", "dataType": "int"},
            {"name": "d1", "dataType": "string"},
            {"name": "m1", "dataType": "int"},
            {"name": "m2", "dataType": "bigint"},
            {"name": "m3", "dataType": "float"},
            {"name": "m4", "dataType": "double"}
        ]
    }
}

Run some queries via the Drill console

cd ${DRILL_HOME}
bin/drill-conf
0: jdbc:drill:> select * from indexr.test limit 10;

Deploy IndexR Spark

Note: Only Spark 2.1.0+ is supported.

Simply copy all jars in indexr-<version>/indexr-spark/jars/* to ${SPARK_HOME}/jars/ and you are good to go.
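The copy step can be sketched as below. The directory layout is created inside a scratch directory purely so the example is self-contained and runnable anywhere; in a real deployment, indexr-release/ stands in for your unpacked indexr-<version>/ directory and $SPARK_HOME for your actual Spark installation:

```shell
work=$(mktemp -d)                         # scratch area for this sketch
SPARK_HOME="$work/spark"                  # stands in for your real Spark home
mkdir -p "$SPARK_HOME/jars" "$work/indexr-release/indexr-spark/jars"
touch "$work/indexr-release/indexr-spark/jars/indexr-spark.jar"   # stand-in jar
# The actual deployment step: drop every IndexR Spark jar into Spark's jars dir.
cp "$work"/indexr-release/indexr-spark/jars/* "$SPARK_HOME/jars/"
ls "$SPARK_HOME/jars"                     # prints: indexr-spark.jar
```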