Finish work that started as part of the PoC PUBDEV-6862
Execute XGBoost on an external cluster.
TODO SECURITY
** cluster with https
** cluster with auth
TODO STEAM - mechanism for deciding whether to use remote execution, and for starting the remote cluster (via Steam)
** it may not be possible to establish connection H2O->Steam
** think about being usable on k8s
Data exchange
-the H2O Frame is converted to DMatrix on each node and the DMatrix is written to the file system; the execution cluster then loads one DMatrix part on each node-
** -TODO- -think about transferring the DMatrix data directly over TCP/HTTP, maybe only for smaller matrices, but maybe we could do a 1:1 transfer from each regular node to an XGB node-
** -TODO- -when loading the DMatrix on the executor cluster we first load it all into memory, then dump it into a local file, and then load it into DMatrix (native memory); it may be unnecessary to have it in memory twice-
-used single Schema classes for HTTP req/resp, actual data is passed as Base64 encoded binary data in JSON-
-the binary data in JSON is either a Java serialized object or raw XGBoost booster bytes-
-TODO- -the above is fine except when transmitting a large booster; this may lead to HTTP timeouts and we might want to use a streaming req/resp and move away from the SchemaV3-based API-
-TODO- -think about connection pooling to have faster HTTP turnaround (comment from Pavel on PR #4344 "[PUBDEV-6862] XGBoost off cluster POC", files#r401372608)-
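
For reference, a minimal sketch of the Base64-in-JSON exchange described above, to make the payload overhead concrete. The class name, the `data` field, and the hand-built JSON are assumptions for illustration only; the actual Schema classes and endpoints are not shown in this issue.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical sketch: not the actual H2O schema classes.
class BoosterPayloadSketch {
    private static final String PREFIX = "{\"data\":\"";
    private static final String SUFFIX = "\"}";

    // Wrap raw bytes (e.g. booster bytes) as Base64 text in a one-field JSON object.
    // The standard Base64 alphabet contains no quotes or backslashes, so no JSON escaping is needed.
    static String toJson(byte[] raw) {
        return PREFIX + Base64.getEncoder().encodeToString(raw) + SUFFIX;
    }

    // Extract the Base64 field and decode it back to the original bytes.
    static byte[] fromJson(String json) {
        if (!json.startsWith(PREFIX) || !json.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("unexpected payload shape");
        }
        String encoded = json.substring(PREFIX.length(), json.length() - SUFFIX.length());
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] booster = new byte[3000]; // stand-in for real booster bytes
        for (int i = 0; i < booster.length; i++) {
            booster[i] = (byte) i;
        }
        String json = toJson(booster);
        byte[] decoded = fromJson(json);
        System.out.println("round trip ok: " + Arrays.equals(booster, decoded));
        System.out.println("raw bytes: " + booster.length + ", JSON chars: " + json.length());
    }
}
```

Base64 emits 4 characters for every 3 input bytes, so the JSON body is roughly a third larger than the booster itself and is built fully in memory before it is sent. That is one reason a streaming req/resp may be preferable for large boosters.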