Add support for avoiding distributed and non-distributed data overlapping #5

abaranau opened this Issue Aug 15, 2011 · 0 comments



abaranau commented Aug 15, 2011

It is advisable to avoid mixing original keys (old data written without the help of HBaseWD) and distributed keys in one HBase table. The basic distributor implementations do not guarantee that the two kinds of data will not overlap (the focus was on keeping the effect on data size as small as possible rather than on separating the data from the other "junk" in the table).

There are cases when the switch to the distributed approach happens after some data has already been written to the HBase table, and both kinds of data have to coexist. There is no out-of-the-box solution for avoiding data overlap. The easiest workaround in such a case is to add an extra prefix, one that you know never occurs in the "old" data, to the keys when writing and reading the data.
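A minimal sketch of the extra-prefix workaround, in plain Java. `ExtraPrefixKeys` and its methods are hypothetical names used for illustration and are not part of HBaseWD. The sketch assumes the prefix is added to the original key before it is handed to the distributor and stripped again after the distributor has restored the original key on read.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of HBaseWD): tags keys with a fixed extra prefix. */
final class ExtraPrefixKeys {
    private final byte[] extraPrefix;

    ExtraPrefixKeys(byte[] extraPrefix) {
        // Defensive copy so later changes to the caller's array do not affect us.
        this.extraPrefix = extraPrefix.clone();
    }

    /** Returns extraPrefix + originalKey; apply before passing the key to the distributor. */
    byte[] addPrefix(byte[] originalKey) {
        byte[] result = new byte[extraPrefix.length + originalKey.length];
        System.arraycopy(extraPrefix, 0, result, 0, extraPrefix.length);
        System.arraycopy(originalKey, 0, result, extraPrefix.length, originalKey.length);
        return result;
    }

    /** True if the key starts with the extra prefix, i.e. it is not an "old" key. */
    boolean hasPrefix(byte[] key) {
        if (key.length < extraPrefix.length) {
            return false;
        }
        for (int i = 0; i < extraPrefix.length; i++) {
            if (key[i] != extraPrefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Removes the extra prefix; apply after the distributor has restored the key. */
    byte[] stripPrefix(byte[] key) {
        if (!hasPrefix(key)) {
            throw new IllegalArgumentException("key does not start with the extra prefix");
        }
        return Arrays.copyOfRange(key, extraPrefix.length, key.length);
    }
}
```

Start and stop keys of a scan would need the same prefix, so that the scanned ranges cover only keys written through this helper.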

When using HBaseWD, the distributor configuration is usually externalized (extracted into config files, etc.), so it would help a lot if this were doable at the distributor configuration level rather than in custom Java code.
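To illustrate what "on the configuration level" could look like, here is a small sketch that reads an extra key prefix from externalized properties. The property name `rowkey.extra.prefix` and the class `ExtraPrefixConfig` are assumptions made up for this example; neither is an existing HBaseWD setting.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Hypothetical config reader (not part of HBaseWD). */
final class ExtraPrefixConfig {
    // Assumed property name, chosen only for illustration.
    static final String EXTRA_PREFIX_PROPERTY = "rowkey.extra.prefix";

    private ExtraPrefixConfig() {
    }

    /** Returns the configured prefix bytes, or an empty array if the property is not set. */
    static byte[] readExtraPrefix(Properties props) {
        String value = props.getProperty(EXTRA_PREFIX_PROPERTY, "");
        return value.getBytes(StandardCharsets.UTF_8);
    }
}
```

With something like this, an empty or missing property would leave keys unchanged, so existing setups would behave as before.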
