Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of chained queries into contained resources #3312

Merged
merged 8 commits into from
Jan 20, 2022

Conversation

JasonRoberts-smile
Copy link
Collaborator

No description provided.

@codecov
Copy link

codecov bot commented Jan 19, 2022

Codecov Report

Merging #3312 (b0a5b7b) into master (7374099) will increase coverage by 0.00%.
The diff coverage is 93.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##             master    #3312    +/-   ##
==========================================
  Coverage     82.82%   82.83%            
- Complexity    20377    20399    +22     
==========================================
  Files          1366     1367     +1     
  Lines         73172    73337   +165     
  Branches      11010    11033    +23     
==========================================
+ Hits          60608    60748   +140     
- Misses         8343     8367    +24     
- Partials       4221     4222     +1     
Impacted Files Coverage Δ
...uilder/predicate/ResourceLinkPredicateBuilder.java 85.30% <ø> (-7.67%) ⬇️
...ava/ca/uhn/fhir/jpa/search/builder/QueryStack.java 90.03% <93.11%> (+0.84%) ⬆️
...n/fhir/jpa/config/HapiFhirHibernateJpaDialect.java 75.00% <0.00%> (-18.75%) ⬇️
...or/TransactionConcurrencySemaphoreInterceptor.java 79.41% <0.00%> (-4.42%) ⬇️
...java/ca/uhn/fhir/jpa/dao/TransactionProcessor.java 91.97% <0.00%> (-0.54%) ⬇️
...r-base/src/main/java/ca/uhn/fhir/util/UrlUtil.java 73.93% <0.00%> (-0.43%) ⬇️
...n/fhir/rest/api/IVersionSpecificBundleFactory.java 0.00% <0.00%> (ø)
...va/ca/uhn/fhir/jpa/model/entity/ResourceTable.java 89.73% <0.00%> (ø)
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7374099...b0a5b7b. Read the comment docs.

Copy link
Collaborator

@katiesmilecdr katiesmilecdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me, just curious why this way improves the performance?

@JasonRoberts-smile JasonRoberts-smile enabled auto-merge (squash) January 20, 2022 20:24
Copy link
Contributor

@michaelabuckley michaelabuckley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with some questions. Happy to walk through this with you in the morning.

return createPredicateComposite(theSourceJoinColumn, theResourceName, theSpnamePrefix, theParamDef, theNextAnd, theRequestPartitionId, mySqlBuilder);
}

private Condition createPredicateComposite(@Nullable DbColumn theSourceJoinColumn, String theResourceName, String theSpnamePrefix, RuntimeSearchParam theParamDef, List<? extends IQueryParameterType> theNextAnd, RequestPartitionId theRequestPartitionId, SearchQueryBuilder theSqlBuilder) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh. It would be nice to move these out to a new class, but I understand why you didn't. Maybe some Javadoc explaining they are only used for chained recursion?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, I made this change to make them easier to use for subselects, and they could also be used to construct subselects outside of the chained recursion algorithm. But I agree that pulling these createPredicateX methods out should be one step in decomposing this monster class.


String paramName = getParamNameWithPrefix(theSpnamePrefix, theSearchParam.getName());

if (theList.get(0).getMissing() != null) {
QuantityBasePredicateBuilder join = createOrReusePredicateBuilder(PredicateBuilderTypeEnum.QUANTITY, theSourceJoinColumn, theSearchParam.getName(), () -> mySqlBuilder.addQuantityPredicateBuilder(theSourceJoinColumn)).getResult();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How confident are you that you got every type (quantity, date, etc) of clause? The tests look like they are more focused on the chaining than on exercising the different types as leaf nodes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other tests that assert the correctness of the leaf clauses. They didn't change as part of this MR, but they did catch a few bugs for me.

import java.util.function.Supplier;
import java.util.stream.Collectors;

import static org.apache.commons.lang3.ObjectUtils.defaultIfNull;
import static org.apache.commons.lang3.StringUtils.isBlank;
import static org.apache.commons.lang3.StringUtils.isNotBlank;
import static org.apache.commons.lang3.StringUtils.split;

public class QueryStack {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wowza. This is a big class. I'm impressed by your wrangling.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should certainly decompose and refactor this monster before attempting any further enhancements.

}

// restore the state of this collection to turn caching back on before we exit
myReusePredicateBuilderTypes.addAll(cachedReusePredicateBuilderTypes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be cleared first like above? Or is the re-use barrier only one direction?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

myReusePredicateBuilderTypes is only populated on class initialization. Its contents don't change during processing. Other than the lines I added in this method, it's only accessed for reads elsewhere in the class. When I cleared it at the top of the method, it should still be empty at the bottom, where I restore its previous contents. That being said, in hindsight, I might have preferred to disable the caching with an explicit boolean switch rather than messing with the contents of the collection at all.

@@ -368,10 +383,15 @@ public Condition createPredicateCoords(@Nullable DbColumn theSourceJoinColumn,
public Condition createPredicateDate(@Nullable DbColumn theSourceJoinColumn, String theResourceName,
String theSpnamePrefix, RuntimeSearchParam theSearchParam, List<? extends IQueryParameterType> theList,
SearchFilterParser.CompareOperation theOperation, RequestPartitionId theRequestPartitionId) {
return createPredicateDate(theSourceJoinColumn, theResourceName, theSpnamePrefix, theSearchParam, theList, theOperation, theRequestPartitionId, mySqlBuilder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am uncomfortable having both of these in the same class. The risk of calling the wrong one seems very high.

I don't see a double for createPredicateCoords(), or createPredicateHas(). Should I?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not every type of predicate is supported for use in a chain. I only converted the ones I needed to use. One way or another, when we pull these methods out to their own class during decomposition, the doubles will probably go away anyhow.

String theResourceName, String theParamName, List<String> theQualifiers, RuntimeSearchParam theSearchParam,
List<? extends IQueryParameterType> theList, SearchFilterParser.CompareOperation theOperation,
RequestDetails theRequest, RequestPartitionId theRequestPartitionId) {
private class ChainElement {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these all need to be in QueryStack?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. This class badly needs to be decomposed, and promoting all these inner classes to top level will be an important part of that.

updateMapOfReferenceLinks(referenceLinks, Lists.newArrayList(),
leafNodes
.stream()
.map(t -> t.withPathPrefix(nextChain.get(0).getResourceType(), nextChain.get(0).getSearchParam().getName() + "." + nextChain.get(1).getSearchParam().getName()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is the only literal "." here. Is there a reason?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, literal "." also occurs on lines 1042 and 1048. They're correct, but the code would probably be more readable if I pulled the name merge out into a helper method like I did for the path merge.

.map(t -> t.withPathPrefix(nextChain.get(2).getResourceType(), nextChain.get(2).getSearchParam().getName()))
.collect(Collectors.toSet()));
// discrete -> discrete -> contained -> discrete
updateMapOfReferenceLinks(referenceLinks, Lists.newArrayList(nextChain.get(0).getSearchParam().getPath(), mergePaths(nextChain.get(1).getSearchParam().getPath(), nextChain.get(2).getSearchParam().getPath())), leafNodes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of helpers on ChainElement could make this more readable:

  • static String mergePaths(ChainElement ...)
    Also: every call to LeafNodeDefinition.withPathPrefix(String, String) seems to be with a disassembled ChainElement. Maybe simplify to .withPathPrefix(ChainElement)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will have to take a List of ChainElements, but I like this idea. Also, since we always take the elements in sequence, I could probably also replace all the Lists.newArrayList(...) with nextChain.subList().

return createPredicateUri(theSourceJoinColumn, theResourceName, theSpnamePrefix, theSearchParam, theList, theOperation, theRequestDetails, theRequestPartitionId, mySqlBuilder);
}

public Condition createPredicateUri(@Nullable DbColumn theSourceJoinColumn, String theResourceName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see chained versions of createIndexPredicate(), or createPredicateResourceId() either. Should I?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resource IDs aren't supported in chains, so it isn't needed. createIndexPredicate() is a method that I extracted from createPredicateReferenceForContainedResource() to improve readability. It isn't used anywhere else, so it only needs one signature.

@JasonRoberts-smile JasonRoberts-smile merged commit 48caff3 into master Jan 20, 2022
@JasonRoberts-smile JasonRoberts-smile deleted the 20220110-jr-chained-search-performance branch January 20, 2022 21:22
tadgh pushed a commit that referenced this pull request Jan 21, 2022
* base restructuring of query

* fix unit tests

* suppress unnecessary resource type parameter

* pass the resource type used to fetch the search param as part of the chain, so later we do not need to guess what it was

* add query structure tests

* changelog

* fix test failures

* got one of the branches wrong in the 3-reference case
tadgh added a commit that referenced this pull request Apr 5, 2022
* Add versionenum, bump maven version

* failing tests

* more logs

* Add more test cases, not passing

* fixed the double chain with contained resource second case

* clean up a bit

* add tests for qualifiers

* force the resource table to be the root of the query if we might be traversing a contained reference

* tidy up

* tests for longer chains

* changelog

* code review feedback

* backport changelog

* Add version.yaml

* [2935] Escape "%" in like expression

* Add backport to changelog

* Bumping version number. Update VersionEnum to support 5.5.3

* Jr 20211013 chained references (#3079)

* Create index entries for outbound references of contained resources

* build query for chained reference

* fix case where the contained reference is an explicit id rather than a continued chain

* fix contained index to use path names not search param names

* make qualified search work

* cleanup and changelog

* code review

* fix broken tests

* add imports from bad merge

* Add forgotten import from merge

* Jr 20211018 chained references 2 (#3099)

* Create index entries for outbound references of contained resources

* build query for chained reference

* fix case where the contained reference is an explicit id rather than a continued chain

* fix contained index to use path names not search param names

* make qualified search work

* cleanup and changelog

* recurse while creating indexes on contained resources

* double link both contained

* longer contained subchains

* adding some failing test cases to illustrate the limitations of qualified searches

* clean up merge cruft

* changelog

* Jr 20211021 chained references 3 (#3107)

* Create index entries for outbound references of contained resources

* build query for chained reference

* fix case where the contained reference is an explicit id rather than a continued chain

* fix contained index to use path names not search param names

* make qualified search work

* cleanup and changelog

* recurse while creating indexes on contained resources

* double link both contained

* longer contained subchains

* adding some failing test cases to illustrate the limitations of qualified searches

* clean up merge cruft

* changelog

* create recursive resource links

* add test coverage for a more complicated case

* changelog

* remove unnecessary check for _contained flag

* fix broken tests

* Add backport information

* Bump version, add version enum

* begin with failing test

* fix bug

* change log

* change log

* Add backport information

* Improve performance of chained queries into contained resources (#3312)

* base restructuring of query

* fix unit tests

* suppress unnecessary resource type parameter

* pass the resource type used to fetch the search param as part of the chain, so later we do not need to guess what it was

* add query structure tests

* changelog

* fix test failures

* got one of the branches wrong in the 3-reference case

* Add backport info

* Bump version

* Fix errors due to merge

* Bump license year

* Add folder for release version

* Version bump

* Jr 20220210 handle common search params in contained searches (#3377)

* fix handling of common search parameters

* add support for reference search parameters with multiple paths

* Add backport info

* add new folder for version

* Add missing message

* New version enum, and folder

* Version bump

* Bump spring lib version

* Tidy versions

* Remove test

Co-authored-by: Ken Stevens <khstevens@gmail.com>
Co-authored-by: Jason Roberts <jason.roberts@smilecdr.com>
Co-authored-by: katie_smilecdr <katie@smilecdr.com>
Co-authored-by: JasonRoberts-smile <85363818+JasonRoberts-smile@users.noreply.github.com>
jkiddo pushed a commit to trifork/hapi-fhir that referenced this pull request Apr 19, 2022
* Add versionenum, bump maven version

* failing tests

* more logs

* Add more test cases, not passing

* fixed the double chain with contained resource second case

* clean up a bit

* add tests for qualifiers

* force the resource table to be the root of the query if we might be traversing a contained reference

* tidy up

* tests for longer chains

* changelog

* code review feedback

* backport changelog

* Add version.yaml

* [2935] Escape "%" in like expression

* Add backport to changelog

* Bumping version number. Update VersionEnum to support 5.5.3

* Jr 20211013 chained references (hapifhir#3079)

* Create index entries for outbound references of contained resources

* build query for chained reference

* fix case where the contained reference is an explicit id rather than a continued chain

* fix contained index to use path names not search param names

* make qualified search work

* cleanup and changelog

* code review

* fix broken tests

* add imports from bad merge

* Add forgotten import from merge

* Jr 20211018 chained references 2 (hapifhir#3099)

* Create index entries for outbound references of contained resources

* build query for chained reference

* fix case where the contained reference is an explicit id rather than a continued chain

* fix contained index to use path names not search param names

* make qualified search work

* cleanup and changelog

* recurse while creating indexes on contained resources

* double link both contained

* longer contained subchains

* adding some failing test cases to illustrate the limitations of qualified searches

* clean up merge cruft

* changelog

* Jr 20211021 chained references 3 (hapifhir#3107)

* Create index entries for outbound references of contained resources

* build query for chained reference

* fix case where the contained reference is an explicit id rather than a continued chain

* fix contained index to use path names not search param names

* make qualified search work

* cleanup and changelog

* recurse while creating indexes on contained resources

* double link both contained

* longer contained subchains

* adding some failing test cases to illustrate the limitations of qualified searches

* clean up merge cruft

* changelog

* create recursive resource links

* add test coverage for a more complicated case

* changelog

* remove unnecessary check for _contained flag

* fix broken tests

* Add backport information

* Bump version, add version enum

* begin with failing test

* fix bug

* change log

* change log

* Add backport information

* Improve performance of chained queries into contained resources (hapifhir#3312)

* base restructuring of query

* fix unit tests

* suppress unnecessary resource type parameter

* pass the resource type used to fetch the search param as part of the chain, so later we do not need to guess what it was

* add query structure tests

* changelog

* fix test failures

* got one of the branches wrong in the 3-reference case

* Add backport info

* Bump version

* Fix errors due to merge

* Bump license year

* Add folder for release version

* Version bump

* Jr 20220210 handle common search params in contained searches (hapifhir#3377)

* fix handling of common search parameters

* add support for reference search parameters with multiple paths

* Add backport info

* add new folder for version

* Add missing message

* New version enum, and folder

* Version bump

* Bump spring lib version

* Tidy versions

* Remove test

Co-authored-by: Ken Stevens <khstevens@gmail.com>
Co-authored-by: Jason Roberts <jason.roberts@smilecdr.com>
Co-authored-by: katie_smilecdr <katie@smilecdr.com>
Co-authored-by: JasonRoberts-smile <85363818+JasonRoberts-smile@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants