This topic describes installing and configuring Splice Machine on a Cloudera-managed cluster. Follow these steps:
- Verify Prerequisites
- Install the Splice Machine Parcel
- Stop Hadoop Services
- Make Cluster Modifications for Splice Machine
- Configure Hadoop Services
- Make any needed Optional Configuration Modifications
- Deploy the Client Configuration
- Restart the Cluster
- Verify your Splice Machine Installation
Before starting your Splice Machine installation, please make sure that your cluster contains the prerequisite software components:
- A cluster running Cloudera Data Hub (CDH) with Cloudera Manager (CM)
- HBase installed
- HDFS installed
- YARN installed
- ZooKeeper installed
- Spark2 installed (2.2 Release 2 recommended)
NOTE: The specific versions of these components that you need depend on your operating environment, and are called out in detail in the Requirements topic of our Getting Started Guide.
Follow these steps to install CDH, Hadoop, Hadoop services, and Splice Machine on your cluster:
- Copy your parcel URL to the clipboard for use in the next step.

  Which Splice Machine parcel URL you need depends upon which Splice Machine version you're installing (for example, Release 2.7 or 2.5) and which version of CDH you are using.

  NOTE: To be sure that you have the latest URL, please check the Splice Machine Community site or contact your Splice Machine representative.

- Add the parcel repository:

  a. Make sure the Use Parcels (Recommended) option and the Matched release option are both selected.

  b. Click the Continue button to land on the More Options screen.

  c. Click the + button for the Remote Parcel Repository URLs field, then paste your Splice Machine repository URL into this field.

- Use Cloudera Manager to install the parcel.

- Verify that the parcel has been distributed and activated.

  The Splice Machine parcel is identified as SPLICEMACHINE in the Cloudera Manager user interface. Make sure that this parcel has been downloaded, distributed, and activated on your cluster.

- Restart and redeploy any client changes when Cloudera Manager prompts you.
As a first step, we stop cluster services to allow our installer to make changes that require the cluster to be temporarily inactive.
- Select your cluster in Cloudera Manager.

  Click the drop-down arrow next to the name of the cluster on which you are installing Splice Machine.

- Stop the cluster.

  Click the Stop button.
Now it's time to make a few modifications in the Hadoop services configurations:
- Configure and Restart the Management Service
- Configure ZooKeeper
- Configure HDFS
- Configure YARN
- Configure HBASE
- Select the Configuration tab in CM.

- Change the value of the Alerts: Listen Port to 10110.

- Save changes and restart the Management Service.
To edit the ZooKeeper configuration, click ZooKeeper in the Cloudera Manager (CM) home screen, then click the Configuration tab and follow these steps:

- Select the Service-Wide category.

- Make the following changes:

  Maximum Client Connections = 0
  Maximum Session Timeout = 120000

- Click the Save Changes button.
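Outside of Cloudera Manager, the same two limits map onto the stock ZooKeeper configuration file. The fragment below is for reference only, assuming the standard zoo.cfg property names; CM writes this file for you, so there is no need to edit it by hand:

```properties
# zoo.cfg equivalents of the CM settings above (reference only; CM manages this file)
maxClientCnxns=0          # 0 = unlimited concurrent client connections per host
maxSessionTimeout=120000  # maximum negotiable session timeout, in milliseconds
```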
To edit the HDFS configuration, click HDFS in the Cloudera Manager home screen, then click the Configuration tab and make these changes:

- Verify that the HDFS data directories for your cluster are set up to use your data disks.

- Change the values of these settings:

  | Setting | New Value |
  | ------- | --------- |
  | Handler Count | 20 |
  | Maximum Number of Transfer Threads | 8192 |
  | NameNode Handler Count | 64 |
  | NameNode Service Handler Count | 60 |
  | Replication Factor | 2 or 3 * |
  | Java Heap Size of DataNode in Bytes | 2 GB |

- Click the Save Changes button.
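For reference, if you manage HDFS outside of CM, the settings above correspond to standard hdfs-site.xml properties. The mapping from CM labels to property names is our assumption (verify against your distribution's documentation); CM normally sets these for you:

```xml
<!-- Assumed hdfs-site.xml equivalents of the CM settings above (reference only) -->
<property><name>dfs.datanode.handler.count</name><value>20</value></property>          <!-- Handler Count -->
<property><name>dfs.datanode.max.transfer.threads</name><value>8192</value></property> <!-- Maximum Number of Transfer Threads -->
<property><name>dfs.namenode.handler.count</name><value>64</value></property>          <!-- NameNode Handler Count -->
<property><name>dfs.namenode.service.handler.count</name><value>60</value></property>  <!-- NameNode Service Handler Count -->
<property><name>dfs.replication</name><value>3</value></property>                      <!-- Replication Factor: 2 or 3 -->
```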
To edit the YARN configuration, click YARN in the Cloudera Manager home screen, then click the Configuration tab and make these changes:

- Verify that the following directories are set up to use your data disks:

  - NodeManager Local Directories
  - NameNode Data Directories
  - HDFS Checkpoint Directories

- Change the values of these settings:

  | Setting | New Value |
  | ------- | --------- |
  | Heartbeat Interval | 100 ms |
  | MR Application Classpath | $HADOOP_MAPRED_HOME/* $HADOOP_MAPRED_HOME/lib/* $MR2_CLASSPATH |
  | YARN Application Classpath | $HADOOP_CLIENT_CONF_DIR $HADOOP_CONF_DIR $HADOOP_COMMON_HOME/* $HADOOP_COMMON_HOME/lib/* $HADOOP_HDFS_HOME/* $HADOOP_HDFS_HOME/lib/* $HADOOP_YARN_HOME/* $HADOOP_YARN_HOME/lib/* $HADOOP_MAPRED_HOME/* $HADOOP_MAPRED_HOME/lib/* $MR2_CLASSPATH |
  | Localized Dir Deletion Delay | 86400 |
  | JobHistory Server Max Log Size | 1 GB |
  | NodeManager Max Log Size | 1 GB |
  | ResourceManager Max Log Size | 1 GB |
  | Container Memory | 30 GB (based on node specs) |
  | Container Memory Maximum | 30 GB (based on node specs) |
  | Container Virtual CPU Cores | 19 (based on node specs) |
  | Container Virtual CPU Cores Maximum | 19 (based on node specs) |

- Add property values.

  You need to add the same two property values to each of four YARN advanced configuration settings.

  Add these properties:

  | XML Property Name | XML Property Value |
  | ----------------- | ------------------ |
  | yarn.nodemanager.aux-services.spark_shuffle.class | org.apache.spark.network.yarn.YarnShuffleService |
  | yarn.nodemanager.aux-services | mapreduce_shuffle,spark_shuffle |

  To each of these YARN settings:

  - Yarn Service Advanced Configuration Snippet (Safety Valve) for yarn-site.xml
  - Yarn Client Advanced Configuration Snippet (Safety Valve) for yarn-site.xml
  - NodeManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml
  - ResourceManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml

- Click the Save Changes button.
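If you prefer to paste XML directly, the two YARN properties above expand to the following snippet, which goes verbatim into each of the four safety-valve fields listed:

```xml
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
```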
To edit the HBASE configuration, click HBASE in the Cloudera Manager home screen, then click the Configuration tab and make these changes:

- Change the values of these settings:

  | Setting | New Value |
  | ------- | --------- |
  | HBase Client Scanner Caching | 100 ms |
  | Graceful Shutdown Timeout | 600 seconds |
  | HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml | The property list for this snippet is shown below |
  | SplitLog Manager Timeout | 5 minutes |
  | Maximum HBase Client Retries | 40 |
  | RPC Timeout | 20 minutes (or 1200000 milliseconds) |
  | HBase Client Pause | 90 |
  | ZooKeeper Session Timeout | 120000 |
  | HBase Master Web UI Port | 16010 |
  | HBase Master Port | 16000 |
  | Java Configuration Options for HBase Master | The options list is shown below |
  | HBase Coprocessor Master Classes | com.splicemachine.hbase.SpliceMasterObserver |
  | Java Heap Size of HBase Master in Bytes | 5 GB |
  | HStore Compaction Threshold | 5 |
  | HBase RegionServer Web UI port | 16030 |
  | HStore Blocking Store Files | 20 |
  | Java Configuration Options for HBase RegionServer | The options list is shown below |
  | HBase Memstore Block Multiplier | 4 |
  | Maximum Number of HStoreFiles Compaction | 7 |
  | HBase RegionServer Lease Period | 20 minutes (or 1200000 milliseconds) |
  | HFile Block Cache Size | 0.25 |
  | Java Heap Size of HBase RegionServer in Bytes | 24 GB |
  | HBase RegionServer Handler Count | 200 |
  | HBase RegionServer Meta-Handler Count | 200 |
  | HBase Coprocessor Region Classes | com.splicemachine.hbase.MemstoreAwareObserver com.splicemachine.derby.hbase.SpliceIndexObserver com.splicemachine.derby.hbase.SpliceIndexEndpoint com.splicemachine.hbase.RegionSizeEndpoint com.splicemachine.si.data.hbase.coprocessor.TxnLifecycleEndpoint com.splicemachine.si.data.hbase.coprocessor.SIObserver com.splicemachine.hbase.BackupEndpointObserver |
  | Maximum number of Write-Ahead Log (WAL) files | 48 |
  | RegionServer Small Compactions Thread Count | 4 |
  | HBase RegionServer Port | 16020 |
  | Per-RegionServer Number of WAL Pipelines | 16 |
- Set the value of HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml:

  ```xml
  <property><name>dfs.client.read.shortcircuit.buffer.size</name><value>131072</value></property>
  <property><name>hbase.balancer.period</name><value>60000</value></property>
  <property><name>hbase.client.ipc.pool.size</name><value>10</value></property>
  <property><name>hbase.client.max.perregion.tasks</name><value>100</value></property>
  <property><name>hbase.coprocessor.regionserver.classes</name><value>com.splicemachine.hbase.RegionServerLifecycleObserver</value></property>
  <property><name>hbase.hstore.defaultengine.compactionpolicy.class</name><value>com.splicemachine.compactions.SpliceDefaultCompactionPolicy</value></property>
  <property><name>hbase.hstore.defaultengine.compactor.class</name><value>com.splicemachine.compactions.SpliceDefaultCompactor</value></property>
  <property><name>hbase.htable.threads.max</name><value>96</value></property>
  <property><name>hbase.ipc.warn.response.size</name><value>-1</value></property>
  <property><name>hbase.ipc.warn.response.time</name><value>-1</value></property>
  <property><name>hbase.master.loadbalance.bytable</name><value>true</value></property>
  <property><name>hbase.master.balancer.stochastic.regionCountCost</name><value>1500</value></property>
  <property><name>hbase.mvcc.impl</name><value>org.apache.hadoop.hbase.regionserver.SIMultiVersionConsistencyControl</value></property>
  <property><name>hbase.regions.slop</name><value>0</value></property>
  <property><name>hbase.regionserver.global.memstore.size.lower.limit</name><value>0.9</value></property>
  <property><name>hbase.regionserver.global.memstore.size</name><value>0.25</value></property>
  <property><name>hbase.regionserver.maxlogs</name><value>48</value></property>
  <property><name>hbase.regionserver.wal.enablecompression</name><value>true</value></property>
  <property><name>hbase.rowlock.wait.duration</name><value>0</value></property>
  <property><name>hbase.status.multicast.port</name><value>16100</value></property>
  <property><name>hbase.wal.disruptor.batch</name><value>true</value></property>
  <property><name>hbase.wal.provider</name><value>multiwal</value></property>
  <property><name>hbase.wal.regiongrouping.numgroups</name><value>16</value></property>
  <property><name>hbase.zookeeper.property.tickTime</name><value>6000</value></property>
  <property><name>hfile.block.bloom.cacheonwrite</name><value>true</value></property>
  <property><name>io.storefile.bloom.error.rate</name><value>0.005</value></property>
  <property><name>splice.client.numConnections</name><value>1</value></property>
  <property><name>splice.client.write.maxDependentWrites</name><value>60000</value></property>
  <property><name>splice.client.write.maxIndependentWrites</name><value>60000</value></property>
  <property><name>splice.compression</name><value>snappy</value></property>
  <property><name>splice.marshal.kryoPoolSize</name><value>1100</value></property>
  <property><name>splice.olap_server.clientWaitTime</name><value>900000</value></property>
  <property><name>splice.ring.bufferSize</name><value>131072</value></property>
  <property><name>splice.splitBlockSize</name><value>67108864</value></property>
  <property><name>splice.timestamp_server.clientWaitTime</name><value>120000</value></property>
  <property><name>splice.txn.activeTxns.cacheSize</name><value>10240</value></property>
  <property><name>splice.txn.completedTxns.concurrency</name><value>128</value></property>
  <property><name>splice.txn.concurrencyLevel</name><value>4096</value></property>
  <property><name>hbase.hstore.compaction.min.size</name><value>136314880</value></property>
  <property><name>hbase.hstore.compaction.min</name><value>3</value></property>
  <property><name>hbase.regionserver.thread.compaction.large</name><value>4</value></property>
  <property><name>splice.authentication.native.algorithm</name><value>SHA-512</value></property>
  <property><name>splice.authentication</name><value>NATIVE</value></property>
  <property><name>splice.olap_server.memory</name><value>8196</value></property>
  <property><name>splice.olap.log4j.configuration</name><value>file:/opt/cloudera/parcels/SPLICEMACHINE/conf/olap-log4j.properties</value></property>
  ```
- Set the value of HBase Client Advanced Configuration Snippet (Safety Valve) for hbase-site.xml:

  ```xml
  <property><name>hbase.client.ipc.pool.size</name><value>10</value></property>
  <property><name>hbase.zookeeper.property.tickTime</name><value>6000</value></property>
  <property><name>hfile.block.cache.size</name><value>.1</value></property>
  <property><name>splice.compression</name><value>snappy</value></property>
  <property><name>splice.txn.activeCacheSize</name><value>10240</value></property>
  ```
- Set the value of Java Configuration Options for HBase Master:

  ```
  -XX:MaxPermSize=512M -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=2g -XX:+AlwaysPreTouch -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=10101 -Dsplice.spark.enabled=true -Dsplice.spark.app.name=SpliceMachine -Dsplice.spark.master=yarn-client -Dsplice.spark.logConf=true -Dsplice.spark.yarn.maxAppAttempts=1 -Dsplice.spark.driver.maxResultSize=1g -Dsplice.spark.driver.cores=2 -Dsplice.spark.yarn.am.memory=1g -Dsplice.spark.dynamicAllocation.enabled=true -Dsplice.spark.dynamicAllocation.executorIdleTimeout=120 -Dsplice.spark.dynamicAllocation.cachedExecutorIdleTimeout=120 -Dsplice.spark.dynamicAllocation.minExecutors=0 -Dsplice.spark.dynamicAllocation.maxExecutors=12 -Dsplice.spark.io.compression.lz4.blockSize=32k -Dsplice.spark.kryo.referenceTracking=false -Dsplice.spark.kryo.registrator=com.splicemachine.derby.impl.SpliceSparkKryoRegistrator -Dsplice.spark.kryoserializer.buffer.max=512m -Dsplice.spark.kryoserializer.buffer=4m -Dsplice.spark.locality.wait=100 -Dsplice.spark.memory.fraction=0.5 -Dsplice.spark.scheduler.mode=FAIR -Dsplice.spark.serializer=org.apache.spark.serializer.KryoSerializer -Dsplice.spark.shuffle.compress=false -Dsplice.spark.shuffle.file.buffer=128k -Dsplice.spark.shuffle.service.enabled=true -Dsplice.spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native -Dsplice.spark.yarn.am.waitTime=10s -Dsplice.spark.yarn.executor.memoryOverhead=2048 -Dsplice.spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/etc/spark/conf/log4j.properties -Dsplice.spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native -Dsplice.spark.driver.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase/conf:/opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar -Dsplice.spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hbase/conf:/opt/cloudera/parcels/CDH/lib/hadoop/lib/native -Dsplice.spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase/conf:/opt/cloudera/parcels/SPLICEMACHINE/lib/*:/opt/cloudera/parcels/SPARK2/lib/spark2/jars/*:/opt/cloudera/parcels/CDH/lib/hbase/lib/* -Dsplice.spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native -Dsplice.spark.ui.retainedJobs=100 -Dsplice.spark.ui.retainedStages=100 -Dsplice.spark.worker.ui.retainedExecutors=100 -Dsplice.spark.worker.ui.retainedDrivers=100 -Dsplice.spark.streaming.ui.retainedBatches=100 -Dsplice.spark.executor.cores=4 -Dsplice.spark.executor.memory=8g -Dspark.compaction.reserved.slots=4 -Dsplice.spark.local.dir=/tmp -Dsplice.spark.yarn.jars=/opt/cloudera/parcels/SPARK2/lib/spark2/jars/*
  ```
- Set the value of Java Configuration Options for HBase RegionServer:

  ```
  -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=2g -XX:MaxPermSize=512M -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxNewSize=4g -XX:InitiatingHeapOccupancyPercent=60 -XX:ParallelGCThreads=24 -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=5000 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=10102
  ```
- Click the Save Changes button.
There are a few configuration modifications you might want to make:
- Modify the Authentication Mechanism if you want to authenticate users with something other than the default native authentication mechanism.
- Modify the Log Location if you want your Splice Machine log entries stored somewhere other than in the logs for your region servers.
Splice Machine installs with Native authentication configured; native authentication uses the sys.sysusers table in the splice schema for configuring user names and passwords. You can disable authentication or change the authentication mechanism that Splice Machine uses to LDAP by following the simple instructions in Configuring Splice Machine Authentication.
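For reference, the native-authentication defaults are already present in the HBase Service safety-valve snippet configured earlier; the two relevant properties are:

```xml
<property><name>splice.authentication</name><value>NATIVE</value></property>
<property><name>splice.authentication.native.algorithm</name><value>SHA-512</value></property>
```

Switching mechanisms (for example, to LDAP) amounts to replacing the splice.authentication value, as described in Configuring Splice Machine Authentication.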
You can use Cloudera's Kerberos Wizard to enable Kerberos mode on a CDH5.8.x cluster. If you're enabling Kerberos, you need to add this option to your HBase Master Java Configuration Options:
-Dsplice.spark.hadoop.fs.hdfs.impl.disable.cache=true
Splice Machine logs all SQL statements by default, storing the log entries in your region server's logs, as described in our Using Logging topic. You can modify where Splice Machine stores logs by adding the following snippet to the RegionServer Logging Advanced Configuration Snippet (Safety Valve) section of your HBase configuration:
```properties
log4j.appender.spliceDerby=org.apache.log4j.FileAppender
log4j.appender.spliceDerby.File=${hbase.log.dir}/splice-derby.log
log4j.appender.spliceDerby.layout=org.apache.log4j.EnhancedPatternLayout
log4j.appender.spliceDerby.layout.ConversionPattern=%d{EEE MMM d HH:mm:ss,SSS} Thread[%t] %m%n
log4j.appender.spliceStatement=org.apache.log4j.FileAppender
log4j.appender.spliceStatement.File=${hbase.log.dir}/splice-statement.log
log4j.appender.spliceStatement.layout=org.apache.log4j.EnhancedPatternLayout
log4j.appender.spliceStatement.layout.ConversionPattern=%d{EEE MMM d HH:mm:ss,SSS} Thread[%t] %m%n
log4j.logger.splice-derby=INFO, spliceDerby
log4j.additivity.splice-derby=false
# Uncomment to log statements to a different file:
#log4j.logger.splice-derby.statement=INFO, spliceStatement
# Uncomment to not replicate statements to the spliceDerby file:
#log4j.additivity.splice-derby.statement=false
```
Splice Machine uses log4j to configure the OLAP server's logging. A default configuration file is included in the conf/ directory of the Splice Machine parcel; by default, logs are written to /var/log/hadoop-yarn. To change the OLAP server's logging behavior, set splice.olap.log4j.configuration in hbase-site.xml to the log4j.properties file you want to use. This file must be available on the HBase master server.

Now that you've updated your configuration information, you need to deploy it throughout your cluster. You should see a small notification in the upper right corner of your screen.
To deploy your configuration:

- Click the notification.
- Click the Deploy Client Configuration button.
- When the deployment completes, click the Finish button.
Finally, restart the services you just configured, from the Cloudera Manager home screen:
- Restart ZooKeeper

  Select Start from the Actions menu in the upper right corner of the ZooKeeper Configuration tab to restart ZooKeeper.

- Restart HDFS

  Click the HDFS Actions drop-down arrow (to the right of HDFS in the cluster summary section of the Cloudera Manager home screen), and then click Start to restart HDFS.

  Use your terminal window to create these directories (if they are not already available in HDFS):

  ```shell
  sudo -iu hdfs hadoop fs -mkdir -p hdfs:///user/hbase hdfs:///user/splice/history
  sudo -iu hdfs hadoop fs -chown -R hbase:hbase hdfs:///user/hbase hdfs:///user/splice
  sudo -iu hdfs hadoop fs -chmod 1777 hdfs:///user/splice hdfs:///user/splice/history
  ```
- Restart YARN

  Click the YARN Actions drop-down arrow (to the right of YARN in the cluster summary section of the Cloudera Manager home screen), and then click Start to restart YARN.

- Restart HBase

  Click the HBASE Actions drop-down arrow (to the right of HBASE in the cluster summary section of the Cloudera Manager home screen), and then click Start to restart HBase.

Now start using the Splice Machine command line interpreter, which is referred to as the splice prompt or simply splice>, by launching the sqlshell.sh script on any node in your cluster that is running an HBase region server.

The command line interpreter defaults to connecting on port 1527 on localhost, with username splice and password admin. You can override these defaults when starting the interpreter, as described in the Command Line (splice>) Reference topic in our Developer's Guide.

Now try entering a few sample commands to verify that everything is working with your Splice Machine installation.
| Operation | Command to perform operation |
| --------- | ---------------------------- |
| Display tables | splice> show tables; |
| Create a table | splice> create table test (i int); |
| Add data to the table | splice> insert into test values 1,2,3,4,5; |
| Query data in the table | splice> select * from test; |
| Drop the table | splice> drop table test; |
| List available commands | splice> help; |
| Exit the command line interpreter | splice> exit; |

Make sure you end each command with a semicolon (;), followed by the Enter key or Return key.

See the Command Line (splice>) Reference section of our Developer's Guide for information about our commands and command syntax.