[SW-2372] Expose option used for waiting before the clouding starts in internal backend #2238
Conversation
Any reason for executing the
Yup, the reason is that the spread RDD is influenced by options such as the number of retries, subsequent tries, etc. During the loops, we might end earlier if these counters hit zero.
@mn-mikke What do you think about this change, please?
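The retry-bounded loop described in this comment could be sketched roughly as follows. This is illustrative Scala only, not the actual SpreadRDD implementation; the `probe` function is a hypothetical stand-in for one spread-RDD pass that reports how many executors have been seen so far.

```scala
object SpreadRetriesSketch {
  // Illustrative only: each pass decrements a retry counter, so discovery
  // may stop before all expected executors have been found.
  def spreadWithRetries(expected: Int, maxRetries: Int, probe: () => Int): Int = {
    var retries = maxRetries
    var found = probe()
    while (found < expected && retries > 0) {
      retries -= 1 // counters like this can hit zero and end the loop early
      found = probe()
    }
    found
  }
}
```

With a probe that discovers one executor per pass, the loop either reaches the expected count or runs out of retries, which is the early-exit behavior the comment describes.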
IMHO, it's better to execute SpreadRDD during the timeout and finish as soon as the number of executors is equal to spark.ext.h2o.cluster.size. If the new timeout option is set, it could override the settings of spark.ext.h2o.spreadrdd.retries.
This is just my opinion; making the decision is up to you.
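The behavior suggested in this review comment could be sketched like this. Again illustrative Scala only, not Sparkling Water's actual code; `probe` is a hypothetical stand-in for a SpreadRDD-based executor-discovery pass.

```scala
object CloudingTimeoutSketch {
  // Illustrative only: keep probing until enough executors are seen or the
  // timeout elapses, finishing early when the expected cluster size is reached.
  def discoverExecutors(expectedClusterSize: Int,
                        timeoutMs: Long,
                        probe: () => Int): Int = {
    val deadline = System.currentTimeMillis() + timeoutMs
    var found = probe()
    while (found < expectedClusterSize && System.currentTimeMillis() < deadline) {
      Thread.sleep(10) // brief back-off between discovery passes
      found = probe()
    }
    found
  }
}
```

The key difference from a retry-counter loop is that the exit condition is wall-clock time, so a single timeout setting can subsume the retry-count knobs.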
@mn-mikke thanks for your input! I'll give it another thought.
@mn-mikke I gave it a thought and tried to incorporate the timeout into the main build loop. Can you please have a look at the PR one more time?
@@ -264,6 +264,16 @@ Internal backend configuration properties
 | |                                                  |                |                                                 | settings and other HDFS-related        |
 | |                                                  |                |                                                 | configurations.                        |
 +----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
+| ``spark.ext.h2o.internal.clouding.timeout``        | ``0``          | ``setInternalBackendCloudingTimeout(Int)``      | Specifies how long the clouding should |
Specifies how long the clouding should last.
-> Specifies how long the discovering of Spark executors should last.
Clouding up of the H2O cluster will happen afterwards, won't it?
Yup, good catch!
@@ -115,4 +119,7 @@ object InternalBackendConf {

   /** Path to whole Hadoop configuration serialized into XML readable by org.hadoop.Configuration class */
   val PROP_HDFS_CONF: (String, None.type) = ("spark.ext.h2o.hdfs_conf", None)

+  /** Timeout for the clouding, unit is milliseconds */
+  val PROP_CLOUDING_TIMEOUT: (String, Int) = ("spark.ext.h2o.internal.clouding.timeout", 0)
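As a rough illustration of how such a `(key, default)` pair might be resolved: the helper name and the Map-based conf below are hypothetical, not Sparkling Water's actual API.

```scala
object CloudingConfSketch {
  // Mirrors the property declared in the diff above.
  val PROP_CLOUDING_TIMEOUT: (String, Int) = ("spark.ext.h2o.internal.clouding.timeout", 0)

  // Hypothetical helper: look the key up in a conf map, falling back to the
  // declared default (0, i.e. no explicit clouding timeout).
  def cloudingTimeout(conf: Map[String, String]): Int =
    conf.get(PROP_CLOUDING_TIMEOUT._1).map(_.toInt).getOrElse(PROP_CLOUDING_TIMEOUT._2)
}
```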
WDYT about spark.ext.h2o.spreadrdd.retries.timeout?
👍
CC: @FengAtH2O