Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move set_timestamp out of Query/PreparedStatement #262

Open
piodul opened this issue Jul 4, 2021 · 7 comments
Open

Move set_timestamp out of Query/PreparedStatement #262

piodul opened this issue Jul 4, 2021 · 7 comments
Assignees
Labels
API-breaking This might introduce incompatible API changes cpp-rust-driver-p2 Functionality required by cpp-rust-driver
Milestone

Comments

@piodul
Copy link
Collaborator

piodul commented Jul 4, 2021

The driver lacks the ability to use client-side timestamps.

While, of course, it is possible to set a custom timestamp for a statement via the CQL USING TIMESTAMP clause, the Datastax Java driver also allows to:

  • Programmatically set the timestamp for the query - Statement.setQueryTimestamp(),
  • Use a client-side timestamp generator which automatically generates a new timestamp just before sending the statement.

Source: https://docs.datastax.com/en/developer/java-driver/4.10/manual/core/query_timestamps/

@havaker havaker mentioned this issue Sep 28, 2021
6 tasks
@krzysztofgal
Copy link

I have issue with set_timestamp. When constructing batch statements and each statement must have timestamp to force correct order of queries.

When i do query without USING TIMESTAMP
UPDATE lesser_jank SET version = ?, config = ? WHERE id = ?"
And setup timestamp like:

self.timestamp = AtomicI64::new(chrono::Local::now().timestamp_nanos() / 1000);
...

pub fn add_batch_stmt(&mut self, mut prepared_statement: PreparedStatement) {
        let timestamp = self.timestamp.fetch_add(1, Ordering::SeqCst);
        prepared_statement.set_timestamp(Some(timestamp));
        self.batch.append_statement(prepared_statement);
}

Then I get invalid end state which resembles timestamp collision - according to scylla documentation
"If it happens, e.g. two INSERTS have the same timestamp, conflicting cell values are compared and the cells with the lexicographically bigger value prevail."

When i do query with USING TIMESTAMP:
UPDATE lesser_jank USING TIMESTAMP ? SET version = ?, config = ? WHERE id = ?"
Using same timestamps as above then i get correct end state.

So I am doing something completely wrong, or driver have some bug and set_timestamp() does not set timestamp.

@psarna
Copy link
Contributor

psarna commented May 30, 2022

@krzysztofgal I think our API is a little inconsistent here. What will most likely work for you (please try) is calling set_timestamp() directly on self.batch, not on individual statements. Based on code inspection, I think that our code for batches does not check whether the statements inside the batch have a timestamp set (which is a bug), but instead we only check if somebody called set_timestamp() on the batch itself. USING TIMESTAMP, on the other hand, will work regardless because it's treated with higher priority than timestamps set with set_timestamp().

@krzysztofgal
Copy link

Thanks for quick reply. I just tried batch.set_timestamp and it produces same invalid state. That is event sourcing system so correct order of queries inside batch is crucial (there can be multiple same queries on same PK in one BATCH).
So I will have to do USING TIMESTAMP on each query, which is more room for mistakes 😅 or change my api to be able to detect duplicate queries and add to BATCH just last of it what actually makes sense anyway 😃.

@psarna
Copy link
Contributor

psarna commented May 30, 2022

Oh wait, if you need multiple timestamp values inside a batch statement, then I think that CQL does not really allow expressing it without using USING TIMESTAMP in each query. I read through https://github.com/apache/cassandra/blob/c7e7984008062aa2cf2de5a1fc080674bb2139ff/doc/native_protocol_v4.spec#L414-L474 and the frame only accepts a single timestamp - as if all queries inside the batch use a single timestamp.
When you call execute() on a single prepared statement, then it also allows setting the timestamp via query_parameters:
https://github.com/apache/cassandra/blob/c7e7984008062aa2cf2de5a1fc080674bb2139ff/doc/native_protocol_v4.spec#L403-L409
https://github.com/apache/cassandra/blob/c7e7984008062aa2cf2de5a1fc080674bb2139ff/doc/native_protocol_v4.spec#L337-L343
, but these parameters are part of the EXECUTE (or QUERY) frame, and the BATCH frame does not have any room for per-query timeouts.

So unless I misunderstand the docs, USING TIMESTAMP ? is the correct way of expressing what you want to do, according to the CQL standard. /cc @piodul, I'd appreciate a second opinion.

If I'm right, what we should do in the driver is emitting a warning if somebody set a timestamp via set_timestamp() on a query that exists only inside of a batch - because it will not work as expected and it's a pitfall.

@piodul
Copy link
Collaborator Author

piodul commented May 30, 2022

@psarna Your explanation looks correct. I don't see anything about per-query timestamps in a batch in the protocol spec either - so, USING TIMESTAMP ? seems to be the only option.

As a side note, I think we should consider separating the timestamp parameter from the Query/PreparedStatement. This sounds like a parameter you would choose every time you issue a query - like e.g. paging state for which we already have special versions of Session::execute/query. On the other hand, query/prepared statement is a long-lived object so it doesn't make sense to add such parameters to it. Such separation would also reduce confusion like in @krzysztofgal's case as the queries/stmts you add to a batch wouldn't have timestamps.

Perhaps we should tidy up our interface a little - we should consider doing it along with #354.

@psarna psarna closed this as completed Aug 16, 2022
@psarna psarna reopened this Aug 16, 2022
@piodul piodul added this to the 0.9.0 milestone Mar 24, 2023
@Lorak-mmk Lorak-mmk self-assigned this Nov 15, 2023
@avelanarius
Copy link

avelanarius commented Feb 20, 2024

Renaming the issue so that it better describes the current state of this issue, which is:

I think we should consider separating the timestamp parameter from the Query/PreparedStatement. This sounds like a parameter you would choose every time you issue a query - like e.g. paging state for which we already have special versions of Session::execute/query. On the other hand, query/prepared statement is a long-lived object so it doesn't make sense to add such parameters to it. Such separation would also reduce confusion like in @krzysztofgal's case as the queries/stmts you add to a batch wouldn't have timestamps.

@avelanarius avelanarius changed the title Client-side timestamps Move set_timestamp out of Query/PreparedStatement Feb 20, 2024
@avelanarius avelanarius added the API-breaking This might introduce incompatible API changes label Feb 20, 2024
@wprzytula wprzytula self-assigned this Apr 9, 2024
@wprzytula
Copy link
Collaborator

The option to specify timestamp for a single execution only could be included in the Execute trait design (#713).
As that option should be available for any kind of statement (Unprepared, Prepared, Bound, Batch), it could be part of the trait, as in:

trait Executable {
    async fn execute(...) { ...}
    
    async fn execute_with_timestamp(...) { ... }
} 

@avelanarius avelanarius modified the milestones: 1.0.0, 0.15.0 Apr 30, 2024
@wprzytula wprzytula added the cpp-rust-driver-p2 Functionality required by cpp-rust-driver label Jul 9, 2024
@Lorak-mmk Lorak-mmk removed their assignment Jul 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API-breaking This might introduce incompatible API changes cpp-rust-driver-p2 Functionality required by cpp-rust-driver
Projects
None yet
Development

No branches or pull requests

6 participants