Add support for avoiding overlap between distributed and non-distributed data #5
It is advised to avoid mixing original data (old data written without the help of HBaseWD) and data with distributed keys in one HBase table. The basic distributor implementations do not guarantee that the data will not overlap (the focus was on minimizing the effect on data size rather than on separating the data from the other "junk" in the table).
There are cases where the switch to the distributed approach happens after some data has already been written to the HBase table, so the two kinds of data have to coexist. There is no out-of-the-box solution for avoiding overlap. The easiest workaround in such a case is to add an extra prefix to the keys when writing and reading the data, one you know never occurs in the "old" data.
When HBaseWD is used, the distributor configuration is usually externalized (extracted into config files, etc.), so it would help a lot if this could be done at the distributor configuration level rather than in custom Java code.
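For reference, the custom-code workaround described above could look roughly like the sketch below. It is only an illustration in plain Java with no HBase or HBaseWD dependency. The class name `MarkedKeys`, its methods, and the marker bytes are all hypothetical. It assumes the marker is added to the original key before the key is handed to the distributor, and that no "old" row key contains these bytes at that position.

```java
import java.util.Arrays;

// Hypothetical helper, not part of HBaseWD: tags keys with a marker that is
// assumed never to occur in data written before the switch to distributed keys.
class MarkedKeys {
    // Assumed marker bytes; pick something known to be absent from old keys.
    static final byte[] MARKER = {0x00, 'w', 'd', ':'};

    // Returns MARKER followed by the original key (use on write and on scan bounds).
    static byte[] addMarker(byte[] originalKey) {
        byte[] marked = new byte[MARKER.length + originalKey.length];
        System.arraycopy(MARKER, 0, marked, 0, MARKER.length);
        System.arraycopy(originalKey, 0, marked, MARKER.length, originalKey.length);
        return marked;
    }

    // True if the key starts with MARKER, i.e. it was written via addMarker.
    static boolean hasMarker(byte[] key) {
        if (key.length < MARKER.length) {
            return false;
        }
        for (int i = 0; i < MARKER.length; i++) {
            if (key[i] != MARKER[i]) {
                return false;
            }
        }
        return true;
    }

    // Removes MARKER from a key read back from the table.
    static byte[] stripMarker(byte[] key) {
        if (!hasMarker(key)) {
            throw new IllegalArgumentException("key does not start with the marker");
        }
        return Arrays.copyOfRange(key, MARKER.length, key.length);
    }

    public static void main(String[] args) {
        byte[] original = "user42".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        byte[] marked = addMarker(original);
        System.out.println(hasMarker(marked));
        System.out.println(Arrays.equals(original, stripMarker(marked)));
    }
}
```

With this approach every writer and reader has to call `addMarker` and `stripMarker` itself, which is the kind of custom code this issue proposes moving into the distributor configuration.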