
[SW-2372] Expose option used for waiting before the clouding starts in internal backend #2238

Merged: jakubhava merged 8 commits into master from jh/SW-2372 on Jul 23, 2020

Conversation

@jakubhava (Contributor) commented Jul 21, 2020

@mn-mikke (Collaborator)

Any reason for executing the SpreadRDD after expiration of the timeout? Couldn't the SpreadRDD be executed in the loop during the period specified by the timeout value?

@jakubhava (Contributor Author)

Yup, the reason is that the SpreadRDD is influenced by options such as the number of retries, subsequent tries, etc. During the loop, we might end earlier if these counters hit zero.
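For readers following along, here is a minimal, hypothetical Scala sketch of the shape of such a retry-driven build loop (none of these names come from the actual Sparkling Water code base); it only illustrates why the loop can finish on its own once the retry counter hits zero, independently of any outer timeout.

```scala
object SpreadRddLoopSketch {
  // Hypothetical stand-in for one round of executor discovery via the SpreadRDD.
  def runSpreadRDDOnce(): Seq[String] = Seq.empty

  // The loop is driven by a retry counter (cf. spark.ext.h2o.spreadrdd.retries),
  // so it terminates once the counter reaches zero, regardless of any timeout.
  def discoverExecutors(numRetries: Int): Seq[String] = {
    var retriesLeft = numRetries
    var executors = Seq.empty[String]
    while (retriesLeft > 0 && executors.isEmpty) {
      executors = runSpreadRDDOnce()
      retriesLeft -= 1
    }
    executors
  }
}
```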

@jakubhava (Contributor Author)

@mn-mikke What do you think about this change, please?

@mn-mikke (Collaborator) left a comment

IMHO, it's better to execute the SpreadRDD during the timeout and finish as soon as the number of executors equals spark.ext.h2o.cluster.size. If the new timeout option is set, it could override the settings of spark.ext.h2o.spreadrdd.retries.

This is just my opinion, making a decision is up to you.
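A minimal sketch of the alternative proposed above, under assumed helper names (only the property names spark.ext.h2o.cluster.size and the timeout come from this thread): keep running the SpreadRDD until either the expected number of executors is reached or the timeout elapses, whichever comes first.

```scala
object TimeoutBoundedDiscoverySketch {
  // Hypothetical stand-in for one SpreadRDD round; returns the executors seen in that round.
  def runSpreadRDDOnce(): Set[String] = Set.empty

  // Loop until the expected cluster size (spark.ext.h2o.cluster.size) is reached
  // or the timeout expires, whichever comes first.
  def discoverUntil(expectedClusterSize: Int, timeoutMs: Long): Set[String] = {
    val deadline = System.currentTimeMillis() + timeoutMs
    var executors = Set.empty[String]
    while (executors.size < expectedClusterSize && System.currentTimeMillis() < deadline) {
      executors = executors ++ runSpreadRDDOnce()
    }
    executors
  }
}
```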

@jakubhava (Contributor Author)

@mn-mikke Thanks for your input! I'll give it another thought.

@jakubhava (Contributor Author)

@mn-mikke I gave it a thought and tried to incorporate the timeout into the main build loop. Can you please have a look at the PR one more time?

@@ -264,6 +264,16 @@ Internal backend configuration properties
| | | | settings and other HDFS-related |
| | | | configurations. |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.internal.clouding.timeout`` | ``0`` | ``setInternalBackendCloudingTimeout(Int)`` | Specifies how long the clouding should |
@mn-mikke (Collaborator)

Specifies how long the clouding should last. -> Specifies how long the discovery of Spark executors should last. The clouding of the H2O cluster will happen afterwards, won't it?

@jakubhava (Contributor Author)

Yup, good catch!

@@ -115,4 +119,7 @@ object InternalBackendConf {

/** Path to whole Hadoop configuration serialized into XML readable by org.hadoop.Configuration class */
val PROP_HDFS_CONF: (String, None.type) = ("spark.ext.h2o.hdfs_conf", None)

/** Timeout for the clouding, unit is milliseconds */
val PROP_CLOUDING_TIMEOUT: (String, Int) = ("spark.ext.h2o.internal.clouding.timeout", 0)
@mn-mikke (Collaborator)

WDYT about spark.ext.h2o.spreadrdd.retries.timeout?

@jakubhava (Contributor Author)

👍
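For context, a hedged usage sketch of how the resulting option could be set from user code. The property name follows the rename agreed just above, the 60000 ms value is purely illustrative, and the setter mentioned in the comment is the one listed in the docs table in this diff.

```scala
import org.apache.spark.SparkConf

object TimeoutConfigSketch {
  // Set the timeout (in milliseconds) as a plain Spark property. The property name
  // follows the rename agreed in this thread; 60000 is only an illustrative value.
  val conf: SparkConf = new SparkConf()
    .set("spark.ext.h2o.spreadrdd.retries.timeout", "60000")

  // The documentation table in this PR also lists a programmatic setter,
  // setInternalBackendCloudingTimeout(Int), on the Sparkling Water configuration object.
}
```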

@jakubhava jakubhava merged commit 2f60ffe into master Jul 23, 2020
@jakubhava jakubhava deleted the jh/SW-2372 branch July 23, 2020 18:05
jakubhava added commits that referenced this pull request on Jul 23, 2020:

[SW-2372] Expose option used for waiting before the clouding starts in internal backend (#2238)

(cherry picked from commit 2f60ffe)
(cherry picked from commit 5c8935d)
2 participants: jakubhava, mn-mikke