Use `prepare(String)` for Prepared Statement preparation to prevent bind values from being cached #1213

sbied · 2022-01-24T13:04:10Z

When updating to spring.data.cassandra 3.3.1 from an older version (that one "bundled" with spring-boot 2.3.8) I realized a poor write performance due to the fact that each write eats about 7k memory which then later results in painful full-gc's. After a full-gc, memory is reclaimed and the same procedure begins...

So, this being a typical cache-going-postal pattern, I searched for found the cache in Cassandra java driver class CQLPrepareAsyncProcessor responsible for this behavior. This cache holds PrepareRequests as key and CompletableFuture as values.

Since the write paths of spring.data.cassandra (*CassandraRepository, *CassandraTemplate) call session.prepare with fully filled-out SimpleStatements (not as one might expect with punched-out holes for the actual values to be filled in later during the binding process), a correct prepared statement is delivered, but each individual SimpleStatement (via DefaultPrepareRequest) is stored in the cache, yielding the "memory leak". The cache works with weak references, so a full-gc can reclaim memory.

For a small fix, it should be possible override the createPreparedStatement-Methods in the PreparedStatementHandler classes (themselves inner classes of *CassandraTemplate). Perhaps a SimpleStatementWrapper can do the job.

As a WORKAROUND I found out, that setting usePreparedStatement=false in *CassandraTemplate circumvents the problem (but leads to a performance loss of about 20% in my use case).

I hope, you can help here!

Best regards,

Sven

mp911de · 2022-01-24T13:37:39Z

call session.prepare with fully filled-out SimpleStatements

Do you have an actual example or reproducer? Spring Data Cassandra 3.3.x uses positional bind markers to prepare and run statements. We had once a bug #1172 that created issues for statements using IN keywords.

sbied · 2022-01-24T14:42:06Z

I think, use can produce the issue by writing any entity with

(Reactive)CassandraTemplate::insert, ::update

or using a *Repository on the entity with ::save oder ::insert.

The positional bind markers you mentioned, are still contained in the prepare request (a DefaultPrepareRequest will be created with the DefaultSimpleStatement inside it, which contains the markers). So, the preparing will see different PrepareRequests (see hashCode-methods), where you probably intended having the same PrepareRequest.

I think, by wrapping the SimpleStatement and leaving out the markers for the PrepareRequest, the problem could be fixed.
{of course not the most beautiful solution}.

sbied · 2022-01-24T14:52:30Z

PrepareRequest is inside the Cassandra-Driver, so the wrapping (punching-out holes) must occur before calling session.prepareAsync.

mp911de · 2022-01-24T14:57:11Z

With the investigation from your side, would you mind creating a pull request where we can explore the possibilities?

sbied · 2022-01-24T15:08:56Z

My environment is locked in in a vm with no access to the outside. But perhaps we can have a call, both sitting in Germany?

samueldlightfoot · 2022-02-09T10:40:36Z

One way to get the behaviour we expect is to ensure equals/hashCode for PrepareRequest only includes properties that will actually be used as part of the PreparedStatement. This would include dropping positional values and keeping consistency level, for example.

DefaultPrepareRequest does contain a String accepting constructor, of which we could pass in the parameterized (with bind markers) query String, however certain execution details of the query would be dropped that are usually contained on the SimpleStatement (consistency level etc).

The DataStax docs do say if you want to customize the current behaviour then you can implement your own PrepareRequest, so maybe this is the way forward However, I would be very keen to hear their thoughts on the current behaviour.

I think it should be also noted that due to the cache misses for each request, there will be an unnecessary amount of prepare requests being sent to the Cassandra nodes, which is waste of resources.

mp911de · 2022-02-09T14:34:59Z

Paging @adutra

sbied · 2022-02-14T09:15:07Z

As another solution, I think it might be worth considering to create the PreparedStatement once and store it with the associated CassandraPersistentEntity or a similar place. This might cause problems with consistency level and other options, of course.

Rednas92-spec · 2022-02-21T11:09:28Z

Why not remove the values before preparing?

    public PreparedStatement createPreparedStatement(CqlSession session) throws DriverException { 
        return session.prepare(simpleStatement.setPositionalValues(Collections.emptyList()));
    }

at org.springframework.data.cassandra.core.CassandraTemplate.PreparedStatementHandler#createPreparedStatement

In the javadocs of com.datastax.oss.driver.api.core.cql.SimpleStatement#setPositionalValues

The driver's built-in implementation is immutable, and returns a new instance from this
method. However custom implementations may choose to be mutable and return the same instance.

I don't think this is the most elegant solution as it says that custom implementations may choose to be mutable.

Let me know if I can contribute.

mp911de · 2022-02-21T12:19:39Z

I'm not sure that we want to host a fix/workaround for a driver issue.

Rednas92-spec · 2022-02-21T12:41:39Z

I don't think it's a driver issue. It is working as documented:

At https://docs.datastax.com/en/developer/java-driver/4.13/manual/core/statements/prepared/

Note that caching is based on:

the query string exactly as you provided it: the driver does not perform any kind of trimming or sanitizing.

all other execution parameters: for example, preparing two statements with identical query strings but different consistency levels will yield two distinct prepared statements (that each produce bound statements with their respective consistency level).

I do think it's strange behavior from the driver.

adutra · 2022-02-21T13:34:31Z

One way to get the behaviour we expect is to ensure equals/hashCode for PrepareRequest only includes properties that will actually be used as part of the PreparedStatement. This would include dropping positional values and keeping consistency level, for example.

I agree, the cache shouldn't take into account any actual bound values present in the statement being prepared.

It's difficult to tell whether this should be considered a flaw in the driver or in SDC because implicitly, it is expected that what you are preparing should be a template statement. There is no sense in preparing an actual statement that you would like to see executed as is. IOW, it is expected that when calling session.prepare, you are going to pass a statement containing bind markers, but no bound values. Makes sense?

That said, this constraint is not clearly stated anywhere in our docs, so I don't mind opening a ticket on our side to explore the idea of fixing/improving that. There you go: https://datastax-oss.atlassian.net/browse/JAVA-2996.

But frankly, this looks to me as a misusage, so something to fix on your side. And I do like very much @Rednas92-spec suggestion, as this is exactly what I said above: use a statement template, not the actual statement:

    public PreparedStatement createPreparedStatement(CqlSession session) throws DriverException { 
        return session.prepare(simpleStatement
                .setPositionalValues(Collections.emptyList())
                .setNamedValues(Collections.emptyMap()));
    }

Paging @absurdfarce for awareness.

mp911de · 2022-02-21T14:17:11Z

Thanks @adutra for providing more context. Does it actually make a difference (during prepare) when specifying consistency levels or routing keys? (CqlSession.prepare(String) vs. CqlSession.prepate(SimpleStatement))

I'm wondering whether it would make sense to just prepare plain CQL (session.prepare(statement.getQuery())) as we apply statement settings to BoundStatement anyways in Spring Data.

adutra · 2022-02-21T15:31:45Z

The consistency level would become the default consistency level for all bound statements.

For routing keys and tokens, it doesn't make sense to provide those, as these are precisely what Cassandra computes for you when you prepare a statement. I will add a note to JAVA-2966.

If you apply all desired settings to the bound statements themselves, then yes, you could – and probably should :-) – prepare just the CQL query.

mp911de · 2022-02-21T15:52:54Z

Thanks a lot. Then I'd switch our prepare calls to just the CQL query as that approach is much simpler than resetting the provided parameter maps/lists.

…d values from being cached. We now also apply query options from the initial statement to the bound statement by copying these if query options are set. Closes #1213

mp911de · 2022-03-04T14:07:36Z

We've decided to address the issue in two steps: First, we introduced with #1237 a customization hook for allowing customization of the preparation. Existing applications can override the creation of PreparedStatementHandler to call prepare(String) themselves without us introducing such a change in a maintenance release.

Please ensure that you apply any statement options (page size, paging state, consistency level) as these are not retained into the PreparedStatement when preparing plain CQL.

Version 3.4 and 4.0 of Spring Data Cassandra are going to use prepare(String) preparation method to address the caching issue. Similar as with #1237, you can revert in a subclass to use prepare(Statement) if that is more convenient.

spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Jan 24, 2022

mp911de added the status: waiting-for-feedback We need additional information before we can continue label Jan 24, 2022

spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jan 24, 2022

Rednas92-spec mentioned this issue Feb 18, 2022

Prepared Statements do not get cached #1229

Closed

mp911de added type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged status: feedback-provided Feedback has been provided labels Feb 21, 2022

mp911de changed the title ~~Writes flood Prepared-Statement-Cache~~ Use prepare(String) for Prepared Statement preparation to prevent bind values from being cached Feb 21, 2022

mp911de added this to the 3.3.3 (2021.1.3) milestone Feb 21, 2022

mp911de modified the milestones: 3.3.3 (2021.1.3), 3.4 M4 (2021.2.0) Mar 4, 2022

mp911de mentioned this issue Mar 4, 2022

Introduce customization hook for PreparedStatementHandler #1237

Closed

mp911de closed this as completed in 1cd392e Mar 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use `prepare(String)` for Prepared Statement preparation to prevent bind values from being cached #1213

Use `prepare(String)` for Prepared Statement preparation to prevent bind values from being cached #1213

sbied commented Jan 24, 2022

mp911de commented Jan 24, 2022

sbied commented Jan 24, 2022

sbied commented Jan 24, 2022

mp911de commented Jan 24, 2022

sbied commented Jan 24, 2022

samueldlightfoot commented Feb 9, 2022 •

edited

Loading

mp911de commented Feb 9, 2022

sbied commented Feb 14, 2022 •

edited

Loading

Rednas92-spec commented Feb 21, 2022

mp911de commented Feb 21, 2022

Rednas92-spec commented Feb 21, 2022 •

edited

Loading

adutra commented Feb 21, 2022

mp911de commented Feb 21, 2022

adutra commented Feb 21, 2022

mp911de commented Feb 21, 2022

mp911de commented Mar 4, 2022

Use prepare(String) for Prepared Statement preparation to prevent bind values from being cached #1213

Use prepare(String) for Prepared Statement preparation to prevent bind values from being cached #1213

Comments

sbied commented Jan 24, 2022

mp911de commented Jan 24, 2022

sbied commented Jan 24, 2022

sbied commented Jan 24, 2022

mp911de commented Jan 24, 2022

sbied commented Jan 24, 2022

samueldlightfoot commented Feb 9, 2022 • edited Loading

mp911de commented Feb 9, 2022

sbied commented Feb 14, 2022 • edited Loading

Rednas92-spec commented Feb 21, 2022

mp911de commented Feb 21, 2022

Rednas92-spec commented Feb 21, 2022 • edited Loading

adutra commented Feb 21, 2022

mp911de commented Feb 21, 2022

adutra commented Feb 21, 2022

mp911de commented Feb 21, 2022

mp911de commented Mar 4, 2022

Use `prepare(String)` for Prepared Statement preparation to prevent bind values from being cached #1213

Use `prepare(String)` for Prepared Statement preparation to prevent bind values from being cached #1213

samueldlightfoot commented Feb 9, 2022 •

edited

Loading

sbied commented Feb 14, 2022 •

edited

Loading

Rednas92-spec commented Feb 21, 2022 •

edited

Loading