This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Gather and maintain stats for HBase tables in a designated HBase table #64

Open
jtaylor-sfdc opened this issue Feb 22, 2013 · 4 comments

@jtaylor-sfdc
Contributor

Our current stats gathering is far too simplistic: it only keeps a cache, per client connection to a cluster, of the min and max key for a table. Instead, we should:

  1. have a system table that stores the stats
  2. create a coprocessor that updates the stats during compaction (i.e. using the preCompactSelection, postCompactSelection, preCompact, postCompact methods)
  3. keep a kind of histogram - the key boundary of every N bytes within a region. Perhaps we can do a delta update on minor compaction and a complete update on major compaction.
  4. keep the min key/max key of a table in the stats table too
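
As a rough illustration of items 3 and 4, the sketch below records a row-key boundary roughly every N bytes while cells are visited in key order, and tracks the min and max key along the way. It is only a sketch under stated assumptions: `GuidepostCollector`, `bytesPerGuidepost`, and `add` are hypothetical names, and the code deliberately avoids the HBase coprocessor API. In a real implementation, something like this would presumably be fed by the compaction scanner and its output written to the stats table.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: records a key boundary every N bytes within a region. */
final class GuidepostCollector {
    private final long bytesPerGuidepost;              // the "N" from item 3
    private final List<byte[]> guideposts = new ArrayList<>();
    private long bytesSinceLastGuidepost = 0;
    private byte[] minKey;                             // first key seen
    private byte[] maxKey;                             // last key seen

    GuidepostCollector(long bytesPerGuidepost) {
        if (bytesPerGuidepost <= 0) {
            throw new IllegalArgumentException("bytesPerGuidepost must be positive");
        }
        this.bytesPerGuidepost = bytesPerGuidepost;
    }

    /** Call once per cell, in row-key order, as a compaction pass would visit them. */
    void add(byte[] rowKey, long cellSizeInBytes) {
        if (minKey == null) {
            minKey = rowKey.clone();
        }
        maxKey = rowKey.clone();
        bytesSinceLastGuidepost += cellSizeInBytes;
        if (bytesSinceLastGuidepost >= bytesPerGuidepost) {
            // Enough bytes have gone by: remember this key as a boundary and start over.
            guideposts.add(rowKey.clone());
            bytesSinceLastGuidepost = 0;
        }
    }

    List<byte[]> guideposts() { return guideposts; }
    byte[] minKey() { return minKey; }
    byte[] maxKey() { return maxKey; }
}
```

Because the input is assumed to arrive sorted, the first and last keys seen are the min and max, so no key comparisons are needed.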
@tonyhuang
Contributor

Hi Jesse, when you finish an rc for this ticket, could you inform me?

Thanks
Tony

@testn

testn commented Mar 13, 2013

Do you think we could optimize queries better if we had cardinality information for the table? If so, HyperLogLog might be a good choice.

@jtaylor-sfdc
Contributor Author

Wow, that HyperLogLog is pretty interesting - thanks for the pointer. For stats, we're calculating them at major compaction, where a full pass is made through the data anyway, so I don't think it'll help there. But for COUNT DISTINCT and SELECT DISTINCT, it could definitely be useful.
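
For readers unfamiliar with HyperLogLog, here is a minimal sketch of the idea as it might apply to an approximate COUNT DISTINCT. It is an illustration only, not anything from this project: `HyperLogLogSketch` and its methods are hypothetical names, the hash (FNV-1a followed by a 64-bit finalizer mix) is an assumption chosen to keep the example dependency-free, and it uses the basic estimator with the small-range linear-counting correction rather than any later refinement.

```java
/** Hypothetical sketch of a basic HyperLogLog cardinality estimator. */
final class HyperLogLogSketch {
    private final int p;             // number of hash bits used as the register index
    private final int m;             // number of registers, 2^p
    private final byte[] registers;  // max "rank" observed per register

    HyperLogLogSketch(int p) {
        if (p < 4 || p > 16) {
            throw new IllegalArgumentException("p must be in [4, 16]");
        }
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    void add(byte[] value) {
        long h = hash64(value);
        int idx = (int) (h >>> (64 - p));   // top p bits pick the register
        long rest = h << p;                 // remaining bits, left-aligned
        // Rank = position of the first 1 bit in the remaining 64 - p bits.
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    /** Register-wise max: the result equals a sketch built from both inputs. */
    void merge(HyperLogLogSketch other) {
        if (other.p != p) {
            throw new IllegalArgumentException("precision mismatch");
        }
        for (int i = 0; i < m; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }

    double estimate() {
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) zeros++;
        }
        double alpha = m == 16 ? 0.673 : m == 32 ? 0.697 : m == 64 ? 0.709
                : 0.7213 / (1 + 1.079 / m);
        double raw = alpha * m * m / sum;
        if (raw <= 2.5 * m && zeros > 0) {
            return m * Math.log((double) m / zeros);  // linear counting for small counts
        }
        return raw;
    }

    /** Assumed hash: FNV-1a over the bytes, then a 64-bit finalizer mix. */
    private static long hash64(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }
}
```

With `p = 12` the sketch uses 4096 one-byte registers and has a typical relative error of about 1.6%. Since sketches merge by taking the register-wise max, partial sketches built separately could be combined without revisiting the data.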

@testn

testn commented Mar 13, 2013

It will only give the cardinality, not the unique values themselves. I'm wondering whether we could combine HyperLogLog and a Bloom filter on the column values themselves to decide on the strategy for aggregating the data. If so, that would be awesome.
