Reason for numRegion < 3 condition in HBaseRelation #76

sachinjain024 opened this Issue Jan 10, 2017 · 3 comments





I see the following code in HBaseRelation.scala:

if (catalog.numReg > 3) {
  val tName = TableName.valueOf(...) // argument cut off in the excerpt
  val cfs = catalog.getColumnFamilies

  val connection = HBaseConnectionCache.getConnection(hbaseConf)
  // Initialize the HBase table if necessary
  val admin = connection.getAdmin

  // The names of tables created by the examples have the prefix "shcExample"
  if (admin.isTableAvailable(tName) && tName.toString.startsWith("shcExample")) {
    // ... (body omitted in the excerpt)
  }

  if (!admin.isTableAvailable(tName)) {
    val tableDesc = new HTableDescriptor(tName)
    cfs.foreach { x =>
      val cf = new HColumnDescriptor(x.getBytes())
      logDebug(s"add family $x to ...") // table name elided in the excerpt
      // ... (rest of the loop omitted in the excerpt)
    }
    val startKey = Bytes.toBytes("aaaaaaa")
    val endKey = Bytes.toBytes("zzzzzzz")
    val splitKeys = Bytes.split(startKey, endKey, catalog.numReg - 3)
    // ... (table creation omitted in the excerpt)
  }
}
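HBase's `Bytes.split(a, b, num)` returns `num + 2` keys: `a` itself, `num` keys in between, and `b`. The sketch below imitates that shape so the effect of `catalog.numReg - 3` can be seen without an HBase cluster. It is a stand-in, not HBase's code: the object `SplitSketch`, the helper `splitRange`, and the example value `numReg = 5` are hypothetical, and it assumes equal-length keys treated as unsigned big-endian integers (HBase pads arbitrary byte arrays and may produce different intermediate keys).

```scala
// Hypothetical stand-in for HBase's Bytes.split(a, b, num): returns a, then
// num evenly spaced intermediate keys, then b (num + 2 keys in total).
// Assumes equal-length keys, compared as unsigned big-endian integers.
object SplitSketch {
  def splitRange(a: Array[Byte], b: Array[Byte], num: Int): Vector[Array[Byte]] = {
    require(num > 0, "num must be positive")
    require(a.length == b.length, "sketch assumes equal-length keys")
    val start = BigInt(1, a)
    val stop = BigInt(1, b)
    require(start < stop, "a must sort before b")
    val interval = (stop - start) / (num + 1)
    val inner = (1 to num).map(i => toKey(start + interval * i, a.length))
    (a +: inner.toVector) :+ b
  }

  // Convert back to a fixed-width byte array (dropping BigInt's sign byte if present).
  private def toKey(v: BigInt, width: Int): Array[Byte] = {
    val raw = v.toByteArray.takeRight(width)
    Array.fill[Byte](width - raw.length)(0) ++ raw
  }

  def main(args: Array[String]): Unit = {
    val numReg = 5 // hypothetical requested region count
    if (numReg > 3) {
      val keys = splitRange("aaaaaaa".getBytes("UTF-8"), "zzzzzzz".getBytes("UTF-8"), numReg - 3)
      keys.foreach(k => println(k.map(x => f"${x & 0xff}%02x").mkString))
      println(s"${keys.size} split keys -> ${keys.size + 1} regions")
    }
  }
}
```

With `numReg = 5` the sketch asks for 2 intermediate keys, gets 4 split keys back (the two boundaries included), and those 4 keys delimit 5 regions.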

I am curious about the reason for this if condition. I also checked this in the HBase shell: by default, HBase creates 3 extra regions. Why is that?

weiqingy commented Jan 11, 2017

My recollection is that the "3" is related to fitting some requirement of region server creation on the HBase side. I will come back here once I figure it out. @tedyu probably remembers this better.


@weiqingy I figured out that part later. Here are the calculations:

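Number of regions = number of split keys + 1

Since we have defined two boundaries, 'aaaaaaa' and 'zzzzzzz', and both become split keys, there will be two regions for keys < 'aaaaaaa' and keys >= 'zzzzzzz'. So the number of splits to request between the boundaries should be

NumRegions - 1 - 2 = NumRegions - 3

This is also why the guard is numReg > 3: it ensures that at least one split is requested between the two boundaries.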
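The counting above can be checked with a small sketch that treats regions as the half-open ranges `[start, end)` between sorted split keys, with an unbounded first and last region. The object `RegionCountSketch`, its helpers, and the placeholder keys `k1`, `k2`, ... are hypothetical illustrations, not SHC or HBase code.

```scala
// Sketch: regions are the half-open key ranges [start, end) delimited by
// sorted split keys; None marks the unbounded side of the first/last region.
object RegionCountSketch {
  def regions(splitKeys: Seq[String]): Seq[(Option[String], Option[String])] = {
    val starts: Seq[Option[String]] = None +: splitKeys.map(k => Some(k))
    val ends: Seq[Option[String]] = splitKeys.map(k => Some(k)) :+ None
    starts.zip(ends)
  }

  // Intermediate splits requested between the two fixed boundaries.
  def intermediateSplits(numRegions: Int): Int = numRegions - 3

  def main(args: Array[String]): Unit = {
    for (numRegions <- 4 to 8) {
      // Placeholder keys; they sort between "aaaaaaa" and "zzzzzzz".
      val inner = (1 to intermediateSplits(numRegions)).map(i => s"k$i")
      val splitKeys = ("aaaaaaa" +: inner) :+ "zzzzzzz"
      println(s"numRegions=$numRegions splitKeys=${splitKeys.size} regions=${regions(splitKeys).size}")
    }
  }
}
```

For every requested count, the two boundaries plus `NumRegions - 3` intermediate keys give `NumRegions - 1` split keys, and therefore exactly `NumRegions` regions.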

Closing this ticket.


Thanks, @sachinjain024
