Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform resource existence checks only once in Mongo connector [HZ-2673] #24953

Merged

Conversation

TomaszGaweda
Copy link
Contributor

@TomaszGaweda TomaszGaweda commented Jul 4, 2023

Instead of performing check on every processor, do it once.

Additionally, make it possible to disable the checks in SQL connector.
Possible value of new flag "checkExistence":

  • only-initial: db/collection existence will be performed once, during mapping creation.
  • once-per-job: as above + during every query
  • on-each-connect - similar to above, but in case of reconnection (e.g. restore after failure) it will perform the checks again.
  • never - well... never.

Fixes performance issue found during benchmarking, where checks were 1/3 of execution time.

Breaking changes:

  • throwOnNonExisting flags were changed to checkResourceExistence with enum values instead of true/false
  • SQL parameter forceMongoReadParallelismOne was changed to forceReadTotalParallelismOne

Checklist:

  • Labels (Team:, Type:, Source:, Module:) and Milestone set
  • Label Add to Release Notes or Not Release Notes content set
  • Request reviewers if possible
  • Send backports/forwardports if fix needs to be applied to past/future releases

@TomaszGaweda TomaszGaweda changed the title Perform resource existence checks only once in Mongo connector Perform resource existence checks only once in Mongo connector [HZ-2673] Jul 5, 2023
@hz-devops-test
Copy link

The job Hazelcast-pr-EE-compiler of your PR failed. (Hazelcast internal details: build log, artifacts).
Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log file
--------------------------
---------SUMMARY----------
--------------------------
[ERROR] COMPILATION ERROR : 
--------------------------
[ERROR] /home/jenkins/jenkins_slave/workspace/Hazelcast-pr-EE-compiler/hazelcast-enterprise/hazelcast-enterprise/src/test/java/com/hazelcast/jet/sql/impl/index/HDIndexRangeFilterIteratorTest.java:[19,52] error: cannot find symbol
--------------------------
---------ERRORS-----------
--------------------------
[ERROR] COMPILATION ERROR : 
--------------------------
[ERROR] /home/jenkins/jenkins_slave/workspace/Hazelcast-pr-EE-compiler/hazelcast-enterprise/hazelcast-enterprise/src/test/java/com/hazelcast/jet/sql/impl/index/HDIndexRangeFilterIteratorTest.java:[19,52] error: cannot find symbol
--------------------------
[ERROR] /home/jenkins/jenkins_slave/workspace/Hazelcast-pr-EE-compiler/hazelcast-enterprise/hazelcast-enterprise/src/test/java/com/hazelcast/jet/sql/impl/index/HDIndexInFilterIterationTest.java:[19,50] error: cannot find symbol
--------------------------
[ERROR] /home/jenkins/jenkins_slave/workspace/Hazelcast-pr-EE-compiler/hazelcast-enterprise/hazelcast-enterprise/src/test/java/com/hazelcast/jet/sql/impl/index/HDIndexEqualsFilterIterationTest.java:[19,54] error: cannot find symbol
--------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.11.0:testCompile (default-testCompile) on project hazelcast-enterprise: Compilation failure: Compilation failure: 
--------------------------
[ERROR] /home/jenkins/jenkins_slave/workspace/Hazelcast-pr-EE-compiler/hazelcast-enterprise/hazelcast-enterprise/src/test/java/com/hazelcast/jet/sql/impl/index/HDIndexRangeFilterIteratorTest.java:[19,52] error: cannot find symbol
--------------------------
[ERROR]   symbol: class IndexRangeFilterIteratorTest
--------------------------
[ERROR] /home/jenkins/jenkins_slave/workspace/Hazelcast-pr-EE-compiler/hazelcast-enterprise/hazelcast-enterprise/src/test/java/com/hazelcast/jet/sql/impl/index/HDIndexInFilterIterationTest.java:[19,50] error: cannot find symbol
--------------------------
[ERROR]   symbol: class IndexCompositeFilterIterationTest
--------------------------
[ERROR] /home/jenkins/jenkins_slave/workspace/Hazelcast-pr-EE-compiler/hazelcast-enterprise/hazelcast-enterprise/src/test/java/com/hazelcast/jet/sql/impl/index/HDIndexEqualsFilterIterationTest.java:[19,54] error: cannot find symbol
--------------------------
[ERROR]   symbol: class IndexEqualsFilterIterationTest
--------------------------
[ERROR] -> [Help 1]
--------------------------
[ERROR] 
--------------------------
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
--------------------------
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
--------------------------
[ERROR] 
--------------------------
[ERROR] For more information about the errors and possible solutions, please read the following articles:
--------------------------
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
--------------------------
[ERROR] 
--------------------------
[ERROR] After correcting the problems, you can resume the build with the command
--------------------------
[ERROR]   mvn  -rf :hazelcast-enterprise
--------------------------

@hazelcast hazelcast deleted a comment from hz-devops-test Sep 19, 2023
@hazelcast hazelcast deleted a comment from hz-devops-test Sep 19, 2023
@@ -200,59 +200,29 @@ public static MongoSourceBuilder.Stream<Document> stream(
return stream("MongoStreamSource(" + dataConnectionRef.getName() + ")", dataConnectionRef);
}

private abstract static class Base<T> {
@SuppressWarnings("unchecked")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: usually done with

SELF self() { return (SELF)this; }

@Nonnull SupplierEx<? extends MongoClient> clientSupplier
) {
checkSerializable(clientSupplier, "clientSupplier");
this.name = name;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: some of that code could be moved to superclass constructor

@@ -412,10 +410,20 @@ public BatchSource<T> build() {
checkNotNull(params.getMapItemFn(), "mapFn must be set");

final ReadMongoParams<T> localParams = params;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the point of having localParams variable? ReadMongoParams is mutable and ReadMongoParams.setXXX methods do not return new instance.
I can see that it is used in lambda in withProcessorSupplier but all sources created from the same builder will share the same instance of params. If the builder is modified after build() you can have problems (local variable fixes compilation problem but not the sharing problem)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

treating builder as a prototype might not be a best idea but someone might try do that

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usage of localParams throughout this methods makes me think that maybe you wanted to clone them

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have similar situation in MongoSinkBuilder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Referencing fields of the builder inside lambda will catch the reference to the builder, meaning that whole builder would need to become serializable. Referencing local variable instead solves the issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but it does not solve problem with builder as prototype. We do not control lifecycle of the builder and the time when PMS will be serialized.

Copy link
Collaborator

@k-jamroz k-jamroz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, we might however do something with a trap of using MongoSource/SinkBuilder as a prototype.

@TomaszGaweda TomaszGaweda merged commit 4fb0658 into hazelcast:master Sep 19, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants