Once you have followed the steps in the getting_started/installation.adoc section to install the operator and its dependencies, you can deploy an HBase cluster and its dependencies. Afterwards you can verify that it works by creating tables and data in HBase using the REST API and Apache Phoenix (an SQL layer used to interact with HBase).
To deploy a ZooKeeper cluster, create a file called `zk.yaml`:
link:example$getting_started/zk.yaml[role=include]
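For reference, a minimal `ZookeeperCluster` manifest might look like the following sketch. The resource name `simple-zk` and the exact field layout are illustrative assumptions; the linked example file is authoritative:

[source,yaml]
----
# Sketch of a ZookeeperCluster resource (field names are assumptions,
# check the linked zk.yaml for the exact schema)
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk          # referenced later by the ZNode definitions
spec:
  servers:
    roleGroups:
      default:
        replicas: 3        # a small ensemble is enough for testing
----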
We also need to define a ZNode that will be used by the HDFS and HBase clusters to reference ZooKeeper. Create another file called `znode.yaml` and define a separate ZNode for each service:
link:example$getting_started/znode.yaml[role=include]
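Each ZNode resource essentially just points at the ZooKeeper cluster it should live on. A sketch of what `znode.yaml` might contain, assuming the ZooKeeper cluster is called `simple-zk` (all names here are illustrative; the linked example file is authoritative):

[source,yaml]
----
# Sketch of two ZookeeperZnode resources, one per dependent service
# (names and fields are assumptions, see the linked znode.yaml)
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-hdfs-znode   # used by the HDFS cluster
spec:
  clusterRef:
    name: simple-zk
---
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-hbase-znode  # used by the HBase cluster
spec:
  clusterRef:
    name: simple-zk
----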
Apply both of these files:
link:example$getting_started/getting_started.sh[role=include]
The state of the ZooKeeper cluster can be tracked with `kubectl`:
link:example$getting_started/getting_started.sh[role=include]
An HDFS cluster has three components: the `namenode`, the `datanode` and the `journalnode`. Create a file named `hdfs.yaml` defining two `namenodes` and one `datanode` and `journalnode` each:
link:example$getting_started/hdfs.yaml[role=include]
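As a rough sketch, such a manifest could be shaped as follows, assuming the CRD follows the role/role-group layout used by other Stackable resources (every name and version label below is illustrative; the linked example file is authoritative):

[source,yaml]
----
# Sketch of an HdfsCluster resource (fields and the version label are
# assumptions, see the linked hdfs.yaml for the exact schema)
apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple-hdfs
spec:
  version: 3.3.4-stackable0.2.0       # hypothetical Hadoop + Stackable label
  zookeeperConfigMapName: simple-hdfs-znode
  nameNodes:
    roleGroups:
      default:
        replicas: 2                   # two namenodes
  dataNodes:
    roleGroups:
      default:
        replicas: 1
  journalNodes:
    roleGroups:
      default:
        replicas: 1
----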
Where:

- `metadata.name` contains the name of the HDFS cluster
- the label of the Docker image provided by Stackable must be set in `spec.version`
[NOTE]
====
Please note that the version you need to specify for `spec.version` is not only the version of Hadoop which you want to roll out, but has to be amended with a Stackable version as shown. This Stackable version is the version of the underlying container image which is used to execute the processes. For a list of available versions please check our image registry. It should generally be safe to simply use the latest image version that is available.
====
Create the actual HDFS cluster by applying the file:
link:example$getting_started/getting_started.sh[role=include]
Track the progress with `kubectl` as this step may take a few minutes:
link:example$getting_started/getting_started.sh[role=include]
To test the cluster you will use the REST API to check its version and status, and to create and inspect a new table. You will also use Phoenix to create, populate and query a second new table, before listing all non-system tables in HBase. These actions will be carried out from one of the HBase components, the REST server.
First, check the cluster version with this call:
link:example$getting_started/getting_started.sh[role=include]
This will return the version that was specified in the HBase cluster definition:
[source,json]
----
{"Version":"2.4.17"}
----
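Since the response is plain JSON, it can also be post-processed on the command line. A small sketch, with the response from above pasted inline, that extracts just the version string:

[source,shell]
----
# Pull the "Version" field out of the REST response shown above
echo '{"Version":"2.4.17"}' \
  | python3 -c 'import sys, json; print(json.load(sys.stdin)["Version"])'
# prints: 2.4.17
----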
The cluster status can be checked and formatted like this:
link:example$getting_started/getting_started.sh[role=include]
which will display cluster metadata that looks like this (only the first region is included for the sake of readability):
[source,json]
----
{
  "DeadNodes" : [],
  "LiveNodes" : [
    {
      "Region" : [
        {
          "currentCompactedKVs" : 0,
          "memStoreSizeMB" : 0,
          "name" : "U1lTVEVNLkNBVEFMT0csLDE2NjExNjA0NDM2NjcuYmYwMzA1YmM4ZjFmOGIwZWMwYjhmMGNjMWI5N2RmMmUu",
          "readRequestsCount" : 104,
          "rootIndexSizeKB" : 1,
          "storefileIndexSizeKB" : 1,
          "storefileSizeMB" : 1,
          "storefiles" : 1,
          "stores" : 1,
          "totalCompactingKVs" : 0,
          "totalStaticBloomSizeKB" : 0,
          "totalStaticIndexSizeKB" : 1,
          "writeRequestsCount" : 360
        },
        ...
      ],
      "heapSizeMB" : 351,
      "maxHeapSizeMB" : 11978,
      "name" : "simple-hbase-regionserver-default-0.simple-hbase-regionserver-default.default.svc.cluster.local:16020",
      "requests" : 395,
      "startCode" : 1661156787704
    }
  ],
  "averageLoad" : 43,
  "regions" : 43,
  "requests" : 1716
}
----
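The `name` fields in the `Region` array are base64-encoded. Decoding the one from the output above reveals the underlying region name, which follows the `<table>,<start key>,<timestamp>.<encoded id>.` pattern:

[source,shell]
----
# Decode the base64 region name from the status output above
echo 'U1lTVEVNLkNBVEFMT0csLDE2NjExNjA0NDM2NjcuYmYwMzA1YmM4ZjFmOGIwZWMwYjhmMGNjMWI5N2RmMmUu' \
  | base64 -d
# prints: SYSTEM.CATALOG,,1661160443667.bf0305bc8f1f8b0ec0b8f0cc1b97df2e.
----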
You can now create a table like this:
link:example$getting_started/getting_started.sh[role=include]
This will create a table `users` with a single column family `cf`. Its creation can be verified by listing it:
link:example$getting_started/getting_started.sh[role=include]
[source,json]
----
{
  "table" : [
    {
      "name" : "users"
    }
  ]
}
----
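If you only need the table names rather than the full JSON, the listing can be post-processed the same way as the version response. A sketch, with the response from above pasted inline:

[source,shell]
----
# Print one table name per line from the REST table listing shown above
echo '{"table":[{"name":"users"}]}' \
  | python3 -c 'import sys, json; [print(t["name"]) for t in json.load(sys.stdin)["table"]]'
# prints: users
----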
An alternative way to interact with HBase is to use the Phoenix library that is pre-installed on the Stackable HBase image (in the `/stackable/phoenix` directory). Use the Python utility `psql.py` (found in `/stackable/phoenix/bin`) to create, populate and query a table called `WEB_STAT`:
link:example$getting_started/getting_started.sh[role=include]
The final command will display some grouped data like this:
----
HO TOTAL_ACTIVE_VISITORS
-- ----------------------------------------
EU 150
NA 1
Time: 0.017 sec(s)
----
Check the tables again with:
link:example$getting_started/getting_started.sh[role=include]
This time the list includes not just `users` (created above with the REST API) and `WEB_STAT`, but several other tables too:
[source,json]
----
{
  "table" : [
    {
      "name" : "SYSTEM.CATALOG"
    },
    {
      "name" : "SYSTEM.CHILD_LINK"
    },
    {
      "name" : "SYSTEM.FUNCTION"
    },
    {
      "name" : "SYSTEM.LOG"
    },
    {
      "name" : "SYSTEM.MUTEX"
    },
    {
      "name" : "SYSTEM.SEQUENCE"
    },
    {
      "name" : "SYSTEM.STATS"
    },
    {
      "name" : "SYSTEM.TASK"
    },
    {
      "name" : "WEB_STAT"
    },
    {
      "name" : "users"
    }
  ]
}
----
This is because Phoenix requires these `SYSTEM.` tables for its own internal mapping mechanism, and they are created the first time that Phoenix is used on the cluster.
Look at the usage-guide/index.adoc to find out more about configuring your HBase cluster.