Reverse timestamp issue with cell timestamps #2193

Open
thileesf opened this issue Aug 2, 2019 · 3 comments
Labels
api: bigtable type: docs

Comments

@thileesf

thileesf commented Aug 2, 2019

This is a request to document some limitations we came across with Bigtable, and a question about the use of bigtable-beam-import to copy data from HBase to Bigtable and vice-versa.

Documentation requests

In our HBase table we do Put operations with a reversed timestamp, i.e.,

Put p = new Put(row);
p.addColumn(family, qualifier, Long.MAX_VALUE - timeNowMillis, value);
table.put(p);

We do this to enforce a particular ordering, and this has worked fine with HBase.

When we used the bigtable-hbase-1.x client to do the same Put in Bigtable, the subsequent Get results all contained Long.MAX_VALUE as the timestamp. We traced it to com.google.cloud.bigtable.hbase.util.TimestampConverter, which converts the timestamp internally and does not handle a Put timestamp that exceeds TimestampConverter.HBASE_EFFECTIVE_MAX_TIMESTAMP. We are exploring ways to fix this.

I think it will be useful if you document this issue in CBT Docs, especially considering the HBase suggestions around this which others may be following.

Questions

Will you make TimestampConverter.HBASE_EFFECTIVE_MAX_TIMESTAMP public? One fix we are exploring is to use HBASE_EFFECTIVE_MAX_TIMESTAMP - timeNowMillis instead of Long.MAX_VALUE - timeNowMillis in our Puts; we are wary of computing it ourselves in case TimestampConverter.FACTOR changes.

Also, how does this work while using bigtable-beam-import:

a) Say, I have an HBase table with cells with timestamp in milliseconds. And I export this to sequencefiles using HBase's Export. When I import this sequencefile into Bigtable using bigtable-beam-import, will it do the hbase2bigtable() translation on the sequencefile cell timestamps?

b) If I export the data in Bigtable using bigtable-beam-import export, will it do the reverse translation using bigtable2hbase()? i.e., can I expect millisecond timestamps in the exported sequencefile or will it be microsecond timestamps?

c) How do these work if the HBase table had cells created with reverse timestamps, i.e., Long.MAX_VALUE - timeNowMillis?

@yoshi-automation yoshi-automation added the triage me label Aug 2, 2019
@rahulKQL rahulKQL added type: docs type: question and removed triage me labels Aug 5, 2019
@igorbernstein2
Collaborator

igorbernstein2 commented Aug 7, 2019

Hi,

You are absolutely right that we need to document this behavior. The issue stems from the fact that Bigtable stores timestamps as microseconds while HBase uses milliseconds. The HBase adapter tries to adjust for the discrepancy by multiplying the HBase value by 1000. However, this conversion narrows the range of values that we can store. The HBase adapter did not consider reverse timestamps and assumed that if the timestamp is higher than the highest storable value, we can just use the highest possible value. When reading, it converts that value back to just Long.MAX_VALUE.
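
To make the narrowing concrete, here is a minimal sketch of the clamping described above. It is not the adapter's source: the class name, method names, and constant definitions are assumptions chosen to match the description in this thread (multiply by 1000 on write, clamp to the largest storable value, read the clamped value back as Long.MAX_VALUE).

```java
public class TimestampClampSketch {
    // Assumption: milliseconds-to-microseconds factor, as described above.
    static final long FACTOR = 1000L;
    // Assumption: largest microsecond value that is a whole number of milliseconds.
    static final long MAX_MICROS = Long.MAX_VALUE - (Long.MAX_VALUE % FACTOR);
    // Assumption: largest millisecond value that survives the multiplication.
    static final long EFFECTIVE_MAX_MILLIS = MAX_MICROS / FACTOR;

    // Write path: scale up, clamping anything that would not fit.
    static long toMicros(long millis) {
        return millis < EFFECTIVE_MAX_MILLIS ? millis * FACTOR : MAX_MICROS;
    }

    // Read path: the clamped value comes back as Long.MAX_VALUE.
    static long toMillis(long micros) {
        return micros >= MAX_MICROS ? Long.MAX_VALUE : micros / FACTOR;
    }

    static long roundTrip(long millis) {
        return toMillis(toMicros(millis));
    }

    public static void main(String[] args) {
        long now = 1_565_000_000_000L;        // a wall-clock time in August 2019, in ms
        long reversed = Long.MAX_VALUE - now; // the reverse-timestamp pattern from the report

        System.out.println(roundTrip(now) == now);                 // prints true
        System.out.println(roundTrip(reversed) == Long.MAX_VALUE); // prints true
    }
}
```

Under these assumptions, every reversed timestamp above the effective maximum collapses to the same stored value, which is why all the Gets in the original report returned Long.MAX_VALUE.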

The sequence file importer/exporter will use the hbase client under the hood. So any value higher than HBASE_EFFECTIVE_MAX_TIMESTAMP will be read back as Long.MAX_VALUE.

Other than documenting the behavior, there is no easy solution to this. As it stands, the Long.MAX_VALUE - timeNowMillis pattern doesn't work with Bigtable.

I'd like to understand your use case better, since this is the first time this has come up as an issue. Can you describe the use case for reverse timestamps in cell values?

@JustinBeckwith JustinBeckwith removed the type: question label Oct 16, 2019
@google-cloud-label-sync google-cloud-label-sync bot added the api: bigtable label Jan 31, 2020
@scordata

scordata commented Feb 5, 2022

We're getting hit by this as well. Are there any workarounds at the moment?

The use case is we need to store a "First Seen" field, and a "Last Seen" field.
By writing (Long.MAX_VALUE - timestamp.now()) to "First Seen" and timestamp.now() to "Last Seen", in tandem with garbage collection policies, we can have Bigtable retain these values for us without computing them.
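
The mechanism can be sketched with a small in-memory model. This is only an illustration under stated assumptions: NewestCellOnly is a hypothetical stand-in for a column whose garbage collection policy keeps just the cell with the highest timestamp, and the base to subtract from is a parameter because which base is safe on Bigtable is the subject of this thread.

```java
public class FirstLastSeenSketch {
    /** Hypothetical model of a column that retains only its highest-timestamp cell. */
    static final class NewestCellOnly {
        long timestamp = Long.MIN_VALUE;
        long value;

        void write(long ts, long v) {
            if (ts >= timestamp) {
                timestamp = ts;
                value = v;
            }
        }
    }

    /** Returns {firstSeen, lastSeen} for the given event times, in any order. */
    static long[] firstAndLastSeen(long base, long... eventTimes) {
        NewestCellOnly first = new NewestCellOnly();
        NewestCellOnly last = new NewestCellOnly();
        for (long t : eventTimes) {
            first.write(base - t, t); // earliest event gets the highest reversed timestamp
            last.write(t, t);         // latest event gets the highest plain timestamp
        }
        return new long[] {first.value, last.value};
    }

    public static void main(String[] args) {
        long[] seen = firstAndLastSeen(Long.MAX_VALUE, 300L, 100L, 200L);
        System.out.println(seen[0]); // prints 100
        System.out.println(seen[1]); // prints 300
    }
}
```

The model only shows why the reversed column ends up holding the earliest event; it says nothing about how a real client converts or clamps the timestamps it is given.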

Any info would be greatly appreciated.

Thanks.

@igorbernstein2
Collaborator

igorbernstein2 commented Jun 14, 2022

This came up again, and the silent truncation is causing a lot of confusion, especially when it's combined with the HBase behavior of treating Long.MAX_VALUE as the current timestamp.

In the next minor release we will start logging warnings when this happens and add an opt-in configuration to throw errors instead of truncating. Eventually we would like to make throwing errors the default behavior.

The only workaround is to use a smaller timestamp to subtract from, so instead of:

long rev_ts = Long.MAX_VALUE - ts;

Use:

final long HBASE_EFFECTIVE_MAX_TIMESTAMP = Long.MAX_VALUE / 1000 - 1;
long rev_ts = HBASE_EFFECTIVE_MAX_TIMESTAMP - ts;
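
A slightly fuller sketch of the same workaround follows. The class and the reverse/restore helpers are hypothetical names for illustration; only the constant's value comes from the comment above, copied locally because the adapter's own constant is not public.

```java
public class ReverseTimestampWorkaround {
    // Value quoted in the comment above, duplicated locally.
    static final long HBASE_EFFECTIVE_MAX_TIMESTAMP = Long.MAX_VALUE / 1000 - 1;

    /** Maps a millisecond timestamp to a reversed one that still fits after scaling by 1000. */
    static long reverse(long tsMillis) {
        if (tsMillis < 0 || tsMillis > HBASE_EFFECTIVE_MAX_TIMESTAMP) {
            throw new IllegalArgumentException("timestamp out of range: " + tsMillis);
        }
        return HBASE_EFFECTIVE_MAX_TIMESTAMP - tsMillis;
    }

    /** Recovers the original millisecond timestamp from a reversed one. */
    static long restore(long reversedMillis) {
        return HBASE_EFFECTIVE_MAX_TIMESTAMP - reversedMillis;
    }

    public static void main(String[] args) {
        long earlier = 1_565_000_000_000L;
        long later = earlier + 1;

        // Ordering is still inverted: the earlier event gets the larger value.
        System.out.println(reverse(earlier) > reverse(later)); // prints true
        // The original timestamp is recoverable.
        System.out.println(restore(reverse(earlier)) == earlier); // prints true
        // Scaling to microseconds no longer overflows (multiplyExact would throw if it did).
        Math.multiplyExact(reverse(0L), 1000L);
    }
}
```

Cells already written with Long.MAX_VALUE as the base are not fixed by this; the sketch only covers new writes.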
