
[SW-2372] Expose option used for waiting before the clouding starts in internal backend #2238

Merged: jakubhava merged 8 commits into master from jh/SW-2372 on Jul 23, 2020

Conversation

@jakubhava (Contributor) commented Jul 21, 2020

@mn-mikke (Collaborator)

Any reason for executing the SpreadRDD after expiration of the timeout? Couldn't the SpreadRDD be executed in the loop during the period specified by the timeout value?

@jakubhava (Contributor Author)

Yup, the reason is that the SpreadRDD is influenced by options such as the number of retries, subsequent tries, etc. During the loop, we might end earlier if these counters hit zero.
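For readers following along, here is a minimal, hypothetical Scala sketch of the shape of such a retry-driven build loop (none of these names come from the actual Sparkling Water code base); it only illustrates why the loop can finish on its own once the retry counter hits zero, independently of any outer timeout.

```scala
object SpreadRddLoopSketch {
  // Hypothetical stand-in for one round of executor discovery via the SpreadRDD.
  def runSpreadRDDOnce(): Seq[String] = Seq.empty

  // The loop is driven by a retry counter (cf. spark.ext.h2o.spreadrdd.retries),
  // so it terminates once the counter reaches zero, regardless of any timeout.
  def discoverExecutors(numRetries: Int): Seq[String] = {
    var retriesLeft = numRetries
    var executors = Seq.empty[String]
    while (retriesLeft > 0 && executors.isEmpty) {
      executors = runSpreadRDDOnce()
      retriesLeft -= 1
    }
    executors
  }
}
```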

@jakubhava (Contributor Author)

@mn-mikke What do you think about this change, please?

@mn-mikke (Collaborator) left a comment

IMHO, it's better to execute the SpreadRDD during the timeout and finish as soon as the number of executors equals spark.ext.h2o.cluster.size. If the new timeout option is set, it could override the settings of spark.ext.h2o.spreadrdd.retries.

This is just my opinion, making a decision is up to you.
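A minimal sketch of the alternative proposed above, under assumed helper names (only the property names spark.ext.h2o.cluster.size and the timeout come from this thread): keep running the SpreadRDD until either the expected number of executors is reached or the timeout elapses, whichever comes first.

```scala
object TimeoutBoundedDiscoverySketch {
  // Hypothetical stand-in for one SpreadRDD round; returns the executors seen in that round.
  def runSpreadRDDOnce(): Set[String] = Set.empty

  // Loop until the expected cluster size (spark.ext.h2o.cluster.size) is reached
  // or the timeout expires, whichever comes first.
  def discoverUntil(expectedClusterSize: Int, timeoutMs: Long): Set[String] = {
    val deadline = System.currentTimeMillis() + timeoutMs
    var executors = Set.empty[String]
    while (executors.size < expectedClusterSize && System.currentTimeMillis() < deadline) {
      executors = executors ++ runSpreadRDDOnce()
    }
    executors
  }
}
```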

@jakubhava (Contributor Author)

@mn-mikke Thanks for your input! I'll give it another thought.

@jakubhava (Contributor Author)

@mn-mikke I gave it a thought and tried to incorporate the timeout into the main build loop. Can you please have a look at the PR one more time?

@@ -264,6 +264,16 @@ Internal backend configuration properties
| | | | settings and other HDFS-related |
| | | | configurations. |
+----------------------------------------------------+----------------+-------------------------------------------------+----------------------------------------+
| ``spark.ext.h2o.internal.clouding.timeout`` | ``0`` | ``setInternalBackendCloudingTimeout(Int)`` | Specifies how long the clouding should |
@mn-mikke (Collaborator)

Specifies how long the clouding should last. -> Specifies how long the discovery of Spark executors should last. The clouding of the H2O cluster will happen afterwards, won't it?

@jakubhava (Contributor Author)

Yup, good catch!

@@ -115,4 +119,7 @@ object InternalBackendConf {

/** Path to whole Hadoop configuration serialized into XML readable by org.hadoop.Configuration class */
val PROP_HDFS_CONF: (String, None.type) = ("spark.ext.h2o.hdfs_conf", None)

/** Timeout for the clouding, unit is milliseconds */
val PROP_CLOUDING_TIMEOUT: (String, Int) = ("spark.ext.h2o.internal.clouding.timeout", 0)
@mn-mikke (Collaborator)

WDYT about spark.ext.h2o.spreadrdd.retries.timeout?

@jakubhava (Contributor Author)

👍
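For context, a hedged usage sketch of how the resulting option could be set from user code. The property name follows the rename agreed just above, the 60000 ms value is purely illustrative, and the setter mentioned in the comment is the one listed in the docs table in this diff.

```scala
import org.apache.spark.SparkConf

object TimeoutConfigSketch {
  // Set the timeout (in milliseconds) as a plain Spark property. The property name
  // follows the rename agreed in this thread; 60000 is only an illustrative value.
  val conf: SparkConf = new SparkConf()
    .set("spark.ext.h2o.spreadrdd.retries.timeout", "60000")

  // The documentation table in this PR also lists a programmatic setter,
  // setInternalBackendCloudingTimeout(Int), on the Sparkling Water configuration object.
}
```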

@jakubhava jakubhava merged commit 2f60ffe into master Jul 23, 2020
@jakubhava jakubhava deleted the jh/SW-2372 branch July 23, 2020 18:05
jakubhava added commits that referenced this pull request on Jul 23, 2020:

[SW-2372] Expose option used for waiting before the clouding starts in internal backend (#2238)

(cherry picked from commit 2f60ffe)
(cherry picked from commit 5c8935d)
2 participants: jakubhava, mn-mikke