Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use prepare(String) for Prepared Statement preparation to prevent bind values from being cached #1213

Closed
sbied opened this issue Jan 24, 2022 · 16 comments
Labels
type: enhancement A general enhancement

Comments

@sbied
Copy link

sbied commented Jan 24, 2022

When updating to spring.data.cassandra 3.3.1 from an older version (that one "bundled" with spring-boot 2.3.8) I realized a poor write performance due to the fact that each write eats about 7k memory which then later results in painful full-gc's. After a full-gc, memory is reclaimed and the same procedure begins...

So, this being a typical cache-going-postal pattern, I searched for found the cache in Cassandra java driver class CQLPrepareAsyncProcessor responsible for this behavior. This cache holds PrepareRequests as key and CompletableFuture as values.

Since the write paths of spring.data.cassandra (*CassandraRepository, *CassandraTemplate) call session.prepare with fully filled-out SimpleStatements (not as one might expect with punched-out holes for the actual values to be filled in later during the binding process), a correct prepared statement is delivered, but each individual SimpleStatement (via DefaultPrepareRequest) is stored in the cache, yielding the "memory leak". The cache works with weak references, so a full-gc can reclaim memory.

For a small fix, it should be possible override the createPreparedStatement-Methods in the PreparedStatementHandler classes (themselves inner classes of *CassandraTemplate). Perhaps a SimpleStatementWrapper can do the job.

As a WORKAROUND I found out, that setting usePreparedStatement=false in *CassandraTemplate circumvents the problem (but leads to a performance loss of about 20% in my use case).

I hope, you can help here!

Best regards,

Sven

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Jan 24, 2022
@mp911de
Copy link
Member

mp911de commented Jan 24, 2022

call session.prepare with fully filled-out SimpleStatements

Do you have an actual example or reproducer? Spring Data Cassandra 3.3.x uses positional bind markers to prepare and run statements. We had once a bug #1172 that created issues for statements using IN keywords.

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Jan 24, 2022
@sbied
Copy link
Author

sbied commented Jan 24, 2022

I think, use can produce the issue by writing any entity with

(Reactive)CassandraTemplate::insert, ::update

or using a *Repository on the entity with ::save oder ::insert.

The positional bind markers you mentioned, are still contained in the prepare request (a DefaultPrepareRequest will be created with the DefaultSimpleStatement inside it, which contains the markers). So, the preparing will see different PrepareRequests (see hashCode-methods), where you probably intended having the same PrepareRequest.

I think, by wrapping the SimpleStatement and leaving out the markers for the PrepareRequest, the problem could be fixed.
{of course not the most beautiful solution}.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jan 24, 2022
@sbied
Copy link
Author

sbied commented Jan 24, 2022

PrepareRequest is inside the Cassandra-Driver, so the wrapping (punching-out holes) must occur before calling session.prepareAsync.

@mp911de
Copy link
Member

mp911de commented Jan 24, 2022

With the investigation from your side, would you mind creating a pull request where we can explore the possibilities?

@sbied
Copy link
Author

sbied commented Jan 24, 2022

My environment is locked in in a vm with no access to the outside. But perhaps we can have a call, both sitting in Germany?

@samueldlightfoot
Copy link
Contributor

samueldlightfoot commented Feb 9, 2022

One way to get the behaviour we expect is to ensure equals/hashCode for PrepareRequest only includes properties that will actually be used as part of the PreparedStatement. This would include dropping positional values and keeping consistency level, for example.

DefaultPrepareRequest does contain a String accepting constructor, of which we could pass in the parameterized (with bind markers) query String, however certain execution details of the query would be dropped that are usually contained on the SimpleStatement (consistency level etc).

The DataStax docs do say if you want to customize the current behaviour then you can implement your own PrepareRequest, so maybe this is the way forward However, I would be very keen to hear their thoughts on the current behaviour.

I think it should be also noted that due to the cache misses for each request, there will be an unnecessary amount of prepare requests being sent to the Cassandra nodes, which is waste of resources.

@mp911de
Copy link
Member

mp911de commented Feb 9, 2022

Paging @adutra

@sbied
Copy link
Author

sbied commented Feb 14, 2022

As another solution, I think it might be worth considering to create the PreparedStatement once and store it with the associated CassandraPersistentEntity or a similar place. This might cause problems with consistency level and other options, of course.

@Rednas92-spec
Copy link

Why not remove the values before preparing?

    public PreparedStatement createPreparedStatement(CqlSession session) throws DriverException { 
        return session.prepare(simpleStatement.setPositionalValues(Collections.emptyList()));
    }

at org.springframework.data.cassandra.core.CassandraTemplate.PreparedStatementHandler#createPreparedStatement

In the javadocs of com.datastax.oss.driver.api.core.cql.SimpleStatement#setPositionalValues

The driver's built-in implementation is immutable, and returns a new instance from this
method. However custom implementations may choose to be mutable and return the same instance.

I don't think this is the most elegant solution as it says that custom implementations may choose to be mutable.

Let me know if I can contribute.

@mp911de
Copy link
Member

mp911de commented Feb 21, 2022

I'm not sure that we want to host a fix/workaround for a driver issue.

@Rednas92-spec
Copy link

Rednas92-spec commented Feb 21, 2022

I don't think it's a driver issue. It is working as documented:

At https://docs.datastax.com/en/developer/java-driver/4.13/manual/core/statements/prepared/

Note that caching is based on:

  • the query string exactly as you provided it: the driver does not perform any kind of trimming or sanitizing.
  • all other execution parameters: for example, preparing two statements with identical query strings but different consistency levels will yield two distinct prepared statements (that each produce bound statements with their respective consistency level).

I do think it's strange behavior from the driver.

@adutra
Copy link

adutra commented Feb 21, 2022

One way to get the behaviour we expect is to ensure equals/hashCode for PrepareRequest only includes properties that will actually be used as part of the PreparedStatement. This would include dropping positional values and keeping consistency level, for example.

I agree, the cache shouldn't take into account any actual bound values present in the statement being prepared.

It's difficult to tell whether this should be considered a flaw in the driver or in SDC because implicitly, it is expected that what you are preparing should be a template statement. There is no sense in preparing an actual statement that you would like to see executed as is. IOW, it is expected that when calling session.prepare, you are going to pass a statement containing bind markers, but no bound values. Makes sense?

That said, this constraint is not clearly stated anywhere in our docs, so I don't mind opening a ticket on our side to explore the idea of fixing/improving that. There you go: https://datastax-oss.atlassian.net/browse/JAVA-2996.

But frankly, this looks to me as a misusage, so something to fix on your side. And I do like very much @Rednas92-spec suggestion, as this is exactly what I said above: use a statement template, not the actual statement:

    public PreparedStatement createPreparedStatement(CqlSession session) throws DriverException { 
        return session.prepare(simpleStatement
                .setPositionalValues(Collections.emptyList())
                .setNamedValues(Collections.emptyMap()));
    }

Paging @absurdfarce for awareness.

@mp911de
Copy link
Member

mp911de commented Feb 21, 2022

Thanks @adutra for providing more context. Does it actually make a difference (during prepare) when specifying consistency levels or routing keys? (CqlSession.prepare(String) vs. CqlSession.prepate(SimpleStatement))

I'm wondering whether it would make sense to just prepare plain CQL (session.prepare(statement.getQuery())) as we apply statement settings to BoundStatement anyways in Spring Data.

@adutra
Copy link

adutra commented Feb 21, 2022

The consistency level would become the default consistency level for all bound statements.

For routing keys and tokens, it doesn't make sense to provide those, as these are precisely what Cassandra computes for you when you prepare a statement. I will add a note to JAVA-2966.

If you apply all desired settings to the bound statements themselves, then yes, you could – and probably should :-) – prepare just the CQL query.

@mp911de
Copy link
Member

mp911de commented Feb 21, 2022

Thanks a lot. Then I'd switch our prepare calls to just the CQL query as that approach is much simpler than resetting the provided parameter maps/lists.

@mp911de mp911de added type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged status: feedback-provided Feedback has been provided labels Feb 21, 2022
@mp911de mp911de changed the title Writes flood Prepared-Statement-Cache Use prepare(String) for Prepared Statement preparation to prevent bind values from being cached Feb 21, 2022
@mp911de mp911de added this to the 3.3.3 (2021.1.3) milestone Feb 21, 2022
mp911de added a commit that referenced this issue Mar 4, 2022
…d values from being cached.

We now also apply query options from the initial statement to the bound statement by copying these if query options are set.

Closes #1213
@mp911de mp911de closed this as completed in 1cd392e Mar 4, 2022
@mp911de
Copy link
Member

mp911de commented Mar 4, 2022

We've decided to address the issue in two steps: First, we introduced with #1237 a customization hook for allowing customization of the preparation. Existing applications can override the creation of PreparedStatementHandler to call prepare(String) themselves without us introducing such a change in a maintenance release.

Please ensure that you apply any statement options (page size, paging state, consistency level) as these are not retained into the PreparedStatement when preparing plain CQL.

Version 3.4 and 4.0 of Spring Data Cassandra are going to use prepare(String) preparation method to address the caching issue. Similar as with #1237, you can revert in a subclass to use prepare(Statement) if that is more convenient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

6 participants