Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache common PersistentProperty attributes #1082

Conversation

FrankSpitulski
Copy link
Contributor

@FrankSpitulski FrankSpitulski commented Jan 15, 2021

Moving these to the constructor means the annotation evaluation only needs to happen once per type rather than for every mapped row.
This should result in a ~4% improvement under some use cases as shown.
image

  • You have read the Spring Data contribution guidelines.
  • You use the code formatters provided here and have them applied to your changes. Don’t submit any formatting related changes.
  • You submit test cases (unit or integration tests) that back your changes.
  • You added yourself as author in the headers of the classes you touched. Amend the date range in the Apache license header if needed. For new types, add the license header (copy from another file and set the current year only).

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Jan 15, 2021
…onstructor means there only needs to be one evaluation
@FrankSpitulski FrankSpitulski force-pushed the fix/performance/property-annotation-eval branch from bf3aebf to d5813c7 Compare January 15, 2021 05:52
@mp911de
Copy link
Member

mp911de commented Jan 15, 2021

Thanks a lot. Whenever we experience performance issues, we'd rather apply some caching in the sense of adding that aspect on top of the class, not inside of it. That being said, let's introduce a caching variant of CassandraPersistentProperty, similar to CachingMongoPersistentProperty.

@mp911de mp911de added status: waiting-for-feedback We need additional information before we can continue type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged labels Jan 15, 2021
@FrankSpitulski
Copy link
Contributor Author

Sure, I’ll implement that. By the way, I’ve noticed that there is a whole bunch of type checking done for every row that seems like it could be done once on the first query of that type. The datastax driver also skips most of these type checks in a similar manner. Does that sound reasonable? Do you have any plans along those lines? The reason I ask is because I’m observing 3-5x worse performance compared to the datastax mapper.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jan 15, 2021
@mp911de
Copy link
Member

mp911de commented Jan 15, 2021

By the way, I’ve noticed that there is a whole bunch of type checking done for every row that seems like it could be done once on the first query of that type

Do you have any pointers you're referring to? We are always happy to remove performance bottlenecks.

@FrankSpitulski
Copy link
Contributor Author

Sure, I'd be happy to go further into that.

From my limited knowledge of the library it seems there is a lot of time spent in the MappingCassandraConverter validating the types particularly under the getReadValue method. For instance in the following screenshot the hasProperty function (invoked on line 941) is only useful on the first invocation of that type of entity. After that, all of the execution time is wasted validating that the property is still there.

image

Above that under the convertReadValue function, I'm not sure exactly what to do there but it seems like that typing information could also be cached.

As for getting the value itself, I haven't dug too deep since this crosses the boundary into the datastax portion but this lib seems to be reading the values in a way that isn't as quick as it was before.
Comparing this first image to the second which uses raw datastax reading you can see a whole lot less typing. Note that the first image is using persistence constructors. Additionally the second image uses the datastax cassandra 3 driver.
image
image
Comparisons of percentages and runtimes between the two images aren't valid since they were recorded for different lengths of time, but it does illustrate the difference in typing validations between the two methods.

@mp911de
Copy link
Member

mp911de commented Jan 15, 2021

ClassTypeInformation makes heavy usage of caching internally. I'm actually surprised that obtaining a collection (getSet()) is reported expensive.

It's worth trying to skip external type information and call getObject(…) to see how performance changes.

High self-times in getPotentiallyConvertedSimpleRead(…) indicate that the Cassandra value must be converted into the target type because we have early returns that check the assignability of the value. That being said, feel free to submit another PR with your converter findings/improvements.

@mp911de mp911de removed the status: feedback-provided Feedback has been provided label Jan 15, 2021
@mp911de mp911de self-assigned this Jan 15, 2021
@FrankSpitulski
Copy link
Contributor Author

FrankSpitulski commented Jan 15, 2021

I had a look at CachingMongoPersistentProperty but I think there may be some multithreading issues if those fields are not guarded or at least volatile when used in a potential CachingCassandraPersistentProperty. Since the property is stored once, two queries at the very start before the fields have been initialised could end up colliding and causing double lookups at best or data corruption at worst. Is there some other way these fields are guarded that I missed?

There's also the issue of pushing the primitive booleans to wrapper Booleans and the need for an additional object wrapper over Ordering since it is nullable which would cause increased complexity and performance hits.

@FrankSpitulski
Copy link
Contributor Author

I tried out your suggestion of removing the getCollection special case in the RowReader and letting it fall to getObject. It seems to perform better. I was at ~8% cpu after vs ~10% before the change.

@mp911de mp911de changed the title eager annotation evaluation Cache common PersistentEntity and PersistentProperty attributes Jan 29, 2021
@mp911de
Copy link
Member

mp911de commented Jan 29, 2021

Is there any update here? We'd like to continue with that change. Unless you can/want to provide an updated variant of the code, we'd move forward and apply the caching changes ourselves.

@FrankSpitulski
Copy link
Contributor Author

Sorry, I got pulled into other projects. Go ahead with your changes please. I’ll be sure to re-investigate after they’ve gone in to look for more hot points.

@mp911de mp911de changed the title Cache common PersistentEntity and PersistentProperty attributes Cache common PersistentProperty attributes Jan 29, 2021
@mp911de mp911de closed this in 72129e7 Jan 29, 2021
mp911de added a commit that referenced this pull request Jan 29, 2021
mp911de added a commit that referenced this pull request Jan 29, 2021
We now use CachingCassandraPersistentProperty to pre-compute primary key and embedded flags to reduce computation load when using these properties.

Resolves #1082
mp911de added a commit that referenced this pull request Jan 29, 2021
@mp911de
Copy link
Member

mp911de commented Jan 29, 2021

We've fixed this one without merging the actual pull request.

@mp911de mp911de added this to the 3.1.4 (2020.0.4) milestone Jan 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement A general enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants