Dart - very poor accuracy #126
Comments
What is working in XGB is probably not just drop_rate but also skip_drop: http://xgboost.readthedocs.io/en/latest/tutorials/dart.html Can you please support this as well?
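For reference, a minimal sketch of the xgboost DART configuration being referenced; parameter values here are illustrative placeholders, not from this thread, while booster, rate_drop, skip_drop, sample_type and normalize_type are the names documented in the tutorial linked above:

import xgboost as xgb

# Illustrative DART setup in xgboost; values are placeholders, not tuned.
params = {
    'booster': 'dart',
    'eta': 0.01,               # learning rate
    'rate_drop': 0.1,          # fraction of trees dropped each iteration
    'skip_drop': 0.33,         # probability of skipping the dropout entirely
    'sample_type': 'uniform',  # how trees to drop are sampled
    'normalize_type': 'tree',  # how the new tree is normalized
    'objective': 'reg:squarederror',
}
# dtrain = xgb.DMatrix(X, label=y)
# model = xgb.train(params, dtrain, num_boost_round=1000)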
Can you check the code of the implementation in XGBoost (https://github.com/dmlc/xgboost/blob/master/src/gbm/gbtree.cc#L547-L711) and figure out why?
@guolinke I read that code before. It's not hard to add, but I have no idea where he came upon those parameters.
@wxchan, I am not familiar with XGBoost.
@guolinke I can add that option. I will do some investigation on this.
@wxchan ad 2] Actually, dart for xgboost was done via my issue there. ad 3] For all methods I did some random search of parameters, and the methods should be comparable in the sense of RMSE. Speed is best for deepnet, but that is a different algorithm (and it also depends on settings and hardware).
@gugatr0n1c It's not so difficult to add.
@wxchan ad 2] I can do this tomorrow from work.
@marugari where did you get the idea for those parameters? Another question: when ...
@wxchan Regarding ...
@marugari sorry, what I meant is: is there some paper regarding skip_rate and normalize_type? They are not in the original dart paper.
@wxchan sorry, they are my extensions, not published.
@marugari I see, thanks.
If you want to use skip_rate, you can arrange a callback that changes drop_rate. Just my opinion on this: I think the parameters I add here should be tested. I am not saying they are wrong; they could be wonderful ideas, but I want to be convinced before I add them here.
@wxchan If I set learning_rate to 1 in xgboost, accuracy is very bad as well, even worse than here...
@gugatr0n1c in the original paper, they only use shrinkage_rate = 1/(1+num_drop_trees); as marugari said, he sets shrinkage_rate_ = learning_rate_ / (drop_index_.size() + learning_rate_). I think this is the main difference, rather than skip_rate or sample_type. But according to the code in xgboost:
If num_drop is a reasonable integer, num_drop+lr should not differ much between lr=1 and lr=0.1. There is a lot of math involved in deciding these weights; I actually don't fully get it. Another difference: in the original paper, their strategy is to drop at least one tree each round, i.e. num_drop >= 1, which is what I implemented too. xgboost doesn't have this 'at least one' part (meanwhile, they have skip_drop, which makes num_drop=0 directly). That means, with lr = 0.1 for example, the weight of a new tree can be very large (1/(0+0.1) = 10). I think that's not correct.
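To make the difference concrete, here is a small numeric comparison of the normalization expressions quoted in this thread (the paper's form, marugari's variant, and the 1/(num_drop+lr) form above); this is pure arithmetic on those expressions, nothing from the implementations themselves:

def paper(k):            # original paper: 1 / (1 + num_drop_trees)
    return 1.0 / (1 + k)

def shrinkage(k, lr):    # marugari: shrinkage_rate_ = lr / (num_drop + lr)
    return lr / (k + lr)

def unshrunk(k, lr):     # the form quoted above: 1 / (num_drop + lr)
    return 1.0 / (k + lr)

for lr in (1.0, 0.1):
    for k in (0, 1, 10):
        print(f"lr={lr:<4} k={k:<3} paper={paper(k):.3f} "
              f"lr/(k+lr)={shrinkage(k, lr):.3f} 1/(k+lr)={unshrunk(k, lr):.3f}")

# With k=0 and lr=0.1, 1/(k+lr) = 10: the 'very large weight' case above.
# Multiplying by lr cancels it: lr/(k+lr) = 1 for any lr when k=0, which is
# the cancellation pointed out a few comments below.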
As I wrote, I did not use the 'sample_type' parameter. I only used 'drop_rate' and 'skip_rate' (together with learning_rate << 1) in xgboost, and it was working nicely. I cannot say now what the influence of 'skip_rate' is, but when I used random search for hyperparameter tuning, it always wanted to set 'skip_rate' to something non-zero. To be honest, if I can use a callback to simulate 'skip_rate', then it is not necessary to add this parameter; maybe just mention it as an example of a callback? Actually it is an even better solution, because I can then randomize drop_rate together with skip_rate during the learning process by some logic in generating the drop_rate list (this is not even in xgboost) --> simulate a 0.33 skip_rate. So I believe there are nice things to do: 1] allow changing drop_rate in a callback (should work now, right?), 2] change shrinkage_rate as marugari wrote. A sketch of 1] is below.
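A minimal sketch of that callback idea, assuming drop_rate can be reset per iteration via lightgbm's reset_parameter callback (whether such a reset actually takes effect in dart mode is exactly what is being discussed here, and base_drop_rate / skip_rate are illustrative names, not real parameters):

import numpy as np
import lightgbm as lg

num_rounds = 10000
base_drop_rate = 0.1
skip_rate = 0.33  # fraction of iterations where dropout is skipped

# Pre-generate one drop_rate per iteration; 0.0 simulates a skipped dropout.
rng = np.random.RandomState(42)
drop_rate_list = [0.0 if rng.rand() < skip_rate else base_drop_rate
                  for _ in range(num_rounds)]

# reset_parameter accepts a list with one value per boosting round.
# model = lg.train(params, train_set, num_boost_round=num_rounds,
#                  callbacks=[lg.reset_parameter(drop_rate=drop_rate_list)])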
@wxchan Although ...
@marugari oh sorry, I didn't see that. Then I think it's reasonable to change shrinkage_rate to see if it works. It is actually a combination of the shrinkage rate in gbdt and the normalization weight in dart.
@gugatr0n1c can you test again with the latest code?
@guolinke I tried, but got an error: Segmentation fault. I tried recompiling from github twice. boosting_type = 'gbdt' is OK.
@wxchan can you take a look at this?
@guolinke it seems to be caused by c_api, because the cmd-line version is fine. I took a look at the segfault log: it happened in Dart::Init; it seems gbdt_config_ is still null after GBDT::ResetTrainingData.
@gugatr0n1c @wxchan fixed, can you try it again?
@guolinke training seems to be working; I'll let you know about accuracy.
@guolinke it seems to be not working properly. It almost doesn't converge (even worse than before). It seems to me that the first tree is built with a very large learning_rate, and then the additional trees have a huge problem converging. Tried with dart_rate 0.3, 0.05, 0.001 and with learning_rate = 0.004 and 0.1: almost the same result on all settings.
@gugatr0n1c OK, I see.
@gugatr0n1c can you try xgboost with the latest code? @wxchan just fixed a bug in xgboost.
@guolinke but the weight of the first tree is 1/(num_drop+lr) = 1/lr, which cancels out the lr.
@wxchan, I see.
I think: ...
will be better. In this case, tree normalization is the same as before; the new tree just shrinks by lr.
New trees and the sum of dropped trees have similar leaf values.
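If I read the proposal in the last two comments correctly, it keeps the paper's normalization and additionally shrinks the new tree by lr; a sketch of the resulting scale factors (this is one reading of the comments above, not code from the repository):

def proposed_scales(k, lr):
    # New tree: shrunk by lr on top of the paper's 1/(1+k) normalization;
    # dropped trees: rescaled by k/(1+k) exactly as in the original paper.
    new_tree = lr / (1.0 + k)
    dropped = k / (1.0 + k)
    return new_tree, dropped

print(proposed_scales(k=3, lr=0.1))  # -> (0.025, 0.75)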
@marugari if trees are dropped, add new_tree with ... I think ... BTW, in the original paper, the delta weight is: ... Do you think we should let the delta weight be approximately ...?
This is a little inaccurate, I'm sorry... New trees and the sum of dropped trees scaled by their previous weights have similar leaf values. If the weights of the dropped trees are small, new trees also have small leaf values.
@marugari I think I understand your idea now. It seems when ... My idea is to let the total weight increase at every iteration. I think this may save half of the iterations while achieving the same accuracy.
@guolinke You are right.
I just pushed a commit to the dart branch: 518bafd. It adds ... @gugatr0n1c can you try this with your data?
Love to test this, but I have some xmas travelling now, so it will take some time...
@gugatr0n1c take your time, and merry xmas! @wxchan do you have any comments?
It's better now. I think you can merge it into master and update the docs for it.
@wxchan, I am not sure whether we should keep xgboost_dart_mode. And should we give it a better name?
@guolinke merry xmas to you as well. I did some tests, but here both xDart and Dart train very slowly: 15 min (gbdt) vs. several hours for xDart. Some iterations are very fast, ~100 ms, but some take about 15-20 s (related to skip_drop?). DART in xgboost is not that much slower than 'gbdt'; is there room to speed it up?
@gugatr0n1c Really?
@gugatr0n1c what values of your ...?
Hi everyone, I have recently trained dart and gbdt models using the same training data with some noise. For predictions, in general, the dart model is a little better than the gbdt model. I set all the same parameters, learning_rate=0.0112. I didn't try xgboost, so I can't say anything about it.
@guolinke my setting is this: ... The first ~100 iterations are fast, but I can still see that some iterations are very fast and some are slower. @marugari I haven't tested it in detail in xgboost, but I never had the feeling that the difference later in training was so big... @anddelu this is very off-topic here... send me your email via my twitter PM...
@gugatr0n1c ...
@wxchan if xgboost_dart_mode is better, I think we can just keep it and remove our version.
@guolinke It may depend on the task. Our version is based on the original paper. I think it's better to keep it for now.
@gugatr0n1c I added a parameter ...
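For anyone landing here later, a hedged sketch of the DART-related settings this thread converged on, using the parameter names in current LightGBM (drop_rate, skip_drop, xgboost_dart_mode); values are illustrative, and since these options were being added around the time of this thread, check the docs for your version:

params_dart = {
    'boosting_type': 'dart',
    'objective': 'regression',
    'metric': 'l2',
    'learning_rate': 0.01,
    'drop_rate': 0.1,           # fraction of trees dropped per iteration
    'skip_drop': 0.33,          # probability of skipping dropout entirely
    'xgboost_dart_mode': True,  # xgboost-style weighting instead of the
                                # paper-style one discussed above
}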
@guolinke thx, I will run some tests after New Year.
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Original issue description:
When I use dart as the booster, I always get very poor performance in terms of the l2 result on a regression task,
even if I use a small drop_rate = 0.01 or a big one like 0.3.
When I use dart in xgboost on the same dataset with similar settings (same learning rate, similar num_trees), dart always gives me an accuracy boost (small, but always).
But here accuracy is badly poor, as if there is a bug, not just that dart is unsuitable for my task.
Can anyone confirm that dart is working for regression tasks in terms of better accuracy?
My setting is as follows (part of the Python code for random search of params):
import numpy as np
import lightgbm as lg

# matrix_learn/target_learn, matrix_test/target_test, nthread and
# only_testing are defined elsewhere in the script.

lr = np.random.choice([0.01, 0.005, 0.0025])
list_count = np.random.choice([250, 500, 750, 1000])
min_in_leaf = np.random.choice([25, 50, 100])
subF = np.random.choice([0.15, 0.22, 0.3, 0.5, 0.66, 0.75])
subR = np.random.choice([0.66, 0.75, 0.83, 0.9])
max_depth = np.random.choice([9, 11, 15, 25, 45, 100, -1])
dart_rate = np.random.choice([0, 0, 0, 0.01, 0.03, 0.1])  # 0 -> plain gbdt
max_bin = np.random.choice([63, 127, 255, 511])
lambda_l1 = np.random.choice([0, 1., 10., 100.])
lambda_l2 = np.random.choice([0, 1., 10., 100.])
iterace = 10000

if only_testing:
    min_in_leaf = 25
    iterace = 10

boost_type = 'gbdt'  # default; the original only set this in the testing branch
if dart_rate > 0:
    boost_type = 'dart'

params = {
    'task' : 'train',
    'boosting_type' : boost_type,
    'objective' : 'regression',
    'metric' : 'l2',
    'max_depth' : int(max_depth),
    'num_leaves' : int(list_count),
    'min_data_in_leaf' : int(min_in_leaf),
    'learning_rate' : lr,
    'feature_fraction' : subF,
    'bagging_fraction' : subR,
    'bagging_freq': 1,
    'verbose' : 0,
    'nthread' : nthread,
    'drop_rate': dart_rate,
    'max_bin': int(max_bin),
    'lambda_l1' : lambda_l1,
    'lambda_l2' : lambda_l2
}

# Train/validation data wrapped in Dataset objects; the original passed raw
# (matrix, target) tuples to the old valid_datas argument.
train_set = lg.Dataset(matrix_learn, label=target_learn)
valid_set = lg.Dataset(matrix_test, label=target_test, reference=train_set)
model = lg.train(
    params,
    train_set,
    num_boost_round = iterace,
    valid_sets = [valid_set],
    early_stopping_rounds = 50  # on LightGBM >= 4.0 use callbacks=[lg.early_stopping(50)]
)