[ERROR] waiting for heartbeat. master will be shutdowned! #5

Closed
marshalWS opened this issue Dec 27, 2017 · 1 comment

Comments

@marshalWS

I have about 1.9 million lines (190万) of training data and about 400,000 lines (40万) of test data. What does this error mean, and how can I resolve it?

2017.12.27 13:54:10 com.fenbi.mp4j.comm.CommMaster - slave num:1, port:65534
2017.12.27 13:54:10 org.apache.hadoop.ipc.CallQueueManager - Using callQueue class java.util.concurrent.LinkedBlockingQueue
2017.12.27 13:54:10 org.apache.hadoop.ipc.Server - Starting Socket Reader #1 for port 65534
2017.12.27 13:54:11 org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017.12.27 13:54:11 com.fenbi.mp4j.comm.CommMaster - rpc server started!, rpcport=65534
2017.12.27 13:54:11 com.fenbi.ytklearn.worker.TrainWorker - configFile:config/model/flt_gbdt.conf
2017.12.27 13:54:11 com.fenbi.ytklearn.worker.TrainWorker - configPath:config/model/flt_gbdt.conf
2017.12.27 13:54:11 com.fenbi.ytklearn.worker.TrainWorker - pyTransformScript:
2017.12.27 13:54:11 com.fenbi.ytklearn.worker.TrainWorker - loginName:user
2017.12.27 13:54:11 com.fenbi.ytklearn.worker.TrainWorker - hostName:BOAXGLNJW0FEFII, hostPort:65534
2017.12.27 13:54:11 com.fenbi.ytklearn.worker.TrainWorker - threadNum:6
2017.12.27 13:54:11 com.fenbi.ytklearn.worker.TrainWorker - modelName:gbdt
2017.12.27 13:54:11 org.apache.hadoop.ipc.Server - IPC Server listener on 65534: starting
2017.12.27 13:54:11 org.apache.hadoop.ipc.Server - IPC Server Responder: starting
2017.12.27 13:54:11 com.fenbi.mp4j.comm.ProcessCommSlave - master host:BOAXGLNJW0FEFII, master port:65534
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - connecting:BOAXGLNJW0FEFII###62585, connected count:1
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - host names before sort:[BOAXGLNJW0FEFII###62585]
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - host names after sort:[BOAXGLNJW0FEFII###62585]
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - current slave's rank:0, address:BOAXGLNJW0FEFII###62585
2017.12.27 13:54:11 com.fenbi.mp4j.comm.ProcessCommSlave - this slave recv data port:62585
2017.12.27 13:54:11 com.fenbi.mp4j.comm.ProcessCommSlave - slave num:1
2017.12.27 13:54:11 com.fenbi.mp4j.comm.ProcessCommSlave - slave rank:0
2017.12.27 13:54:11 com.fenbi.mp4j.comm.ProcessCommSlave - Pid is:7748
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - Pid is:7748
2017.12.27 13:54:11 com.fenbi.mp4j.comm.ProcessCommSlave - slaves addresses:
2017.12.27 13:54:11 com.fenbi.mp4j.comm.ProcessCommSlave - BOAXGLNJW0FEFII:62585
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - this slave init finished!
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - ################ parameters ################
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - data.delim.feature_name_val_delim=ConfigString(":")
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - model.dict_path=ConfigString("config/model/feat_dict")
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.min_split_loss=ConfigInt(0)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.max_leaf_cnt=ConfigInt(16)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - model.need_dict=ConfigBoolean(false)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - model.continue_train=ConfigBoolean(false)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - data.max_feature_dim=ConfigInt(40)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - data.delim.x_delim=ConfigString("###")
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - model.feature_importance_path=ConfigString("config/model/feature_importance")
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - data.test.max_error_tol=ConfigInt(0)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.min_split_samples=ConfigInt(-1)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.watch_test=ConfigBoolean(false)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - data.delim.y_delim=ConfigString(",")
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.min_child_hessian_sum=ConfigInt(1)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.watch_train=ConfigBoolean(false)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - data.delim.features_delim=ConfigString(" ")
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - data.y_sampling=SimpleConfigList([])
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.regularization.l1=ConfigInt(0)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.regularization.l2=ConfigInt(1)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.feature_sample_rate=ConfigDouble(0.8)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.max_depth=ConfigInt(7)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.tree_grow_policy=ConfigString("loss")
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.sample_dependent_base_prediction=ConfigBoolean(false)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.regularization.learning_rate=ConfigDouble(0.1)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.silent=ConfigInt(1)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - model.dump_freq=ConfigInt(-1)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - fs_scheme=ConfigString("local")
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.tree_maker=ConfigString("data")
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - feature.approximate=SimpleConfigList([{"cols":"default","type":"no_sample"}])
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - feature.missing_value=ConfigString("value")
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.histogram_pool_capacity=ConfigInt(-1)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - data.train.data_path=ConfigString("data/flt/train.ytklearn")
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - data.train.max_error_tol=ConfigInt(0)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - model.data_path=ConfigString("config/model/gbdt.model")
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.loss_function=ConfigString("sigmoid")
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.instance_sample_rate=ConfigDouble(0.8)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - feature.filter_threshold=ConfigInt(0)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - data.unassigned_mode=ConfigString("lines_avg")
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.uniform_base_prediction=ConfigDouble(0.5)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - data.assigned=ConfigBoolean(false)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.just_evaluate=ConfigBoolean(false)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - data.test.data_path=ConfigString("data/flt/test.ytklearn")
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - feature.split_type=ConfigString("mean")
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.eval_metric=SimpleConfigList(["confusion_matrix","auc"])
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.round_num=ConfigInt(300)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - verbose=ConfigBoolean(false)
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - optimization.max_abs_leaf_val=ConfigInt(-1)
2017.12.27 13:54:11 com.fenbi.ytklearn.worker.TrainWorker - file system uri:local, URI:local, URI tostring:local
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - commonParams:GBDTCommonParams(verbose=false, dataParams=DataParams(train=DataParams.Train(data_path=data/flt/train.ytklearn, max_error_tol=0), test=DataParams.Test(data_path=data/flt/test.ytklearn, max_error_tol=0), delim=DataParams.Delim(x_delim=###, y_delim=,, features_delim= , feature_name_val_delim=:), y_sampling=[], assigned=false, unassigned_mode=lines_avg), max_feature_dim=40, modelParams=GBDTModelParams(data_path=config/model/gbdt.model, need_dict=false, dict_path=config/model/feat_dict, dump_freq=2147483647, continue_train=false, feature_importance_path=config/model/feature_importance), featureParams=GBDTFeatureParams(split_Type=MEAN, enable_missing_value=true, featureMissingParams=value, needFeaAppro=true, feaApproConfList=[Config(SimpleConfigObject({"cols":"default","type":"no_sample"}))], featureApproximateParamList=null, verbose=false, filter_threshold=0), optimizationParams=GBDTOptimizationParams(learn_type=gradient_boosting, tree_maker_type=DATA_PARALLEL, round_num=300, max_depth=7, min_child_hessian_sum=1.0, max_leaf_cnt=16, min_split_loss=0.0, min_split_samples=-1, objective=sigmoid, sigmoid_zmax=0.0, max_abs_leaf_val=-1.0, lad_refine_appr=false, tree_grow_policy=LOSSCHG_WISE, histogram_pool_capacity=-1.0, regularization=GBDTOptimizationParams.Regularization(l1=0.0, l2=1.0, learningRate=0.1), uniform_base_prediction=0.5, sample_dependent_base_prediction=false, subsample=0.8, feature_sample_rate=0.8, class_num=1, just_evaluate=false, eval_metrics=[confusion_matrix, auc], watch_train=false, watch_test=false, verbose=false))
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - have no dict, we will collect feature dict...
2017.12.27 13:54:11 com.fenbi.mp4j.rpc.Server - #########read train data############
2017.12.27 13:55:05 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=1] has readed lines:10000
2017.12.27 13:55:05 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=2] has readed lines:10000
2017.12.27 13:55:05 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=3] has readed lines:10000
2017.12.27 13:55:05 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=5] has readed lines:10000
2017.12.27 13:55:05 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=0] has readed lines:10000
2017.12.27 13:55:05 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=4] has readed lines:10000
2017.12.27 13:55:47 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=4] has readed lines:20000
2017.12.27 13:55:47 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=1] has readed lines:20000
2017.12.27 13:55:47 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=0] has readed lines:20000
2017.12.27 13:55:47 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=2] has readed lines:20000
2017.12.27 13:55:47 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=3] has readed lines:20000
2017.12.27 13:55:47 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=5] has readed lines:20000
2017.12.27 13:56:25 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=1] has readed lines:30000
2017.12.27 13:56:25 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=4] has readed lines:30000
2017.12.27 13:56:25 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=3] has readed lines:30000
2017.12.27 13:56:25 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=0] has readed lines:30000
2017.12.27 13:56:30 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=2] has readed lines:30000
2017.12.27 13:56:30 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=5] has readed lines:30000
2017.12.27 13:57:07 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=4] has readed lines:40000
2017.12.27 13:57:07 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=3] has readed lines:40000
2017.12.27 13:57:10 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=0] has readed lines:40000
2017.12.27 13:57:10 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=2] has readed lines:40000
2017.12.27 13:57:10 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=1] has readed lines:40000
2017.12.27 13:57:10 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=5] has readed lines:40000
2017.12.27 13:57:58 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=4] has readed lines:50000
2017.12.27 13:57:58 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=2] has readed lines:50000
2017.12.27 13:57:58 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=3] has readed lines:50000
2017.12.27 13:58:01 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=1] has readed lines:50000
2017.12.27 13:58:01 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=5] has readed lines:50000
2017.12.27 13:58:01 com.fenbi.mp4j.rpc.Server - [rank=0] [threadId=0] has readed lines:50000
2017.12.27 14:19:28 com.fenbi.mp4j.rpc.Server - [ERROR] waiting for heartbeat timeout > 600000, master will be shutdowned!
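
A note for reading the log above: the last progress line is stamped 13:58:01 and the error is stamped 14:19:28. The minimal sketch below only measures that silent gap and compares it with the 600000 printed in the error message. It assumes the value is in milliseconds, which the log does not state, and the helper name `parse_log_time` is made up for illustration. It does not explain why the heartbeat stopped.

```python
from datetime import datetime

# Timestamp layout used by the log lines above, e.g. "2017.12.27 13:58:01".
LOG_TIME_FORMAT = "%Y.%m.%d %H:%M:%S"


def parse_log_time(line: str) -> datetime:
    """Parse the leading 19-character timestamp of a log line (hypothetical helper)."""
    return datetime.strptime(line[:19], LOG_TIME_FORMAT)


# Two lines copied verbatim from the log in this issue.
last_progress = (
    "2017.12.27 13:58:01 com.fenbi.mp4j.rpc.Server - "
    "[rank=0] [threadId=0] has readed lines:50000"
)
error_line = (
    "2017.12.27 14:19:28 com.fenbi.mp4j.rpc.Server - "
    "[ERROR] waiting for heartbeat timeout > 600000, master will be shutdowned!"
)

# Value printed in the error message; the unit (milliseconds) is an assumption.
HEARTBEAT_TIMEOUT_MS = 600000

# Seconds between the last logged read progress and the heartbeat error.
silent_seconds = (parse_log_time(error_line) - parse_log_time(last_progress)).total_seconds()
exceeded = silent_seconds * 1000 > HEARTBEAT_TIMEOUT_MS

print(f"silent gap: {silent_seconds:.0f} s, timeout: {HEARTBEAT_TIMEOUT_MS / 1000:.0f} s")
print(f"gap exceeds timeout: {exceeded}")
```

Under that assumption, the log shows no output for roughly 21 minutes before the master gives up, which is longer than the 10-minute timeout.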

@marshalWS
Author

Closed.
