Description
Hi folks,
The problem is that when I set fobj to my custom objective function, the prediction error I compute after training differs from what feval reports at the last iteration. If I remove fobj, the two errors are identical, as expected.
```python
import numpy as np
import lightgbm as lgb
from sklearn import metrics

def my_logistic_obj(y_hat, dtrain):
    # grad/hess of -(5*y*log(p) + (1-y)*log(1-p)) w.r.t. the raw score,
    # treating y_hat as the probability p
    y = dtrain.get_label()
    p = y_hat
    grad = 4 * p * y + p - 5 * y
    hess = (4 * y + 1) * (p * (1.0 - p))
    return grad, hess
```
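For reference, the grad/hess above are what I get by differentiating the 5:1-weighted log loss with respect to the raw score, under the assumption p = sigmoid(score). A minimal numerical check of that derivation (toy values, not from my data):

```python
# Numerical check: the derivative of L(s) = -(5*y*log(p) + (1-y)*log(1-p)),
# with p = sigmoid(s), matches grad = 4*p*y + p - 5*y.
def weighted_logloss(s, y):
    p = 1.0 / (1.0 + np.exp(-s))
    return -(5 * y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

s, y, eps = 0.3, 1.0, 1e-6
p = 1.0 / (1.0 + np.exp(-s))
num_grad = (weighted_logloss(s + eps, y) - weighted_logloss(s - eps, y)) / (2 * eps)
print(num_grad, 4 * p * y + p - 5 * y)  # both ~ -2.128
```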
```python
def my_err_rate(y_hat, dtrain):
    y = dtrain.get_label()
    y_hat = np.clip(y_hat, 10e-7, 1 - 10e-7)
    loss_fn = y * np.log(y_hat)
    loss_fp = (1.0 - y) * np.log(1.0 - y_hat)
    return 'myloss', np.sum(-(5 * loss_fn + loss_fp)) / len(y), False
```
```python
def calc_loss(y, yp):  # standalone version of my_err_rate
    yp = np.clip(yp, 10e-7, 1.0 - 10e-7)
    loss_fn = y * np.log(yp)
    loss_fp = (1.0 - y) * np.log(1.0 - yp)
    return np.sum(-(5 * loss_fn + loss_fp)) / y.shape[0]
```
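As a quick sanity check that calc_loss really reproduces my_err_rate's formula, with made-up values:

```python
# Made-up values, just to show the two loss implementations agree.
y_true = np.array([1.0, 0.0, 1.0])
y_prob = np.array([0.9, 0.2, 0.6])
print(calc_loss(y_true, y_prob))  # ~> 1.1014
```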
```python
params = {
    'task': 'train',
    'objective': 'regression',
    'boosting': 'gbdt',
    'metric': 'auc',
    'train_metric': '+',
    'num_leaves': 260,
    'learning_rate': 0.0245,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'max_depth': 15,
    'max_bin': 512
}
```
```python
categoricals = None
lgb_train = lgb.Dataset(X_trn, y_trn, feature_name=features,
                        categorical_feature=categoricals, free_raw_data=True)
lgb_eval = lgb.Dataset(X_val, y_val, feature_name=features,
                       categorical_feature=categoricals, reference=lgb_train,
                       free_raw_data=True)
evals_result = {}
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=5,
                valid_sets=[lgb_eval, lgb_train],
                # early_stopping_rounds=2,
                # fobj=my_logistic_obj,  # <~~ first run without fobj
                feval=my_err_rate,
                evals_result=evals_result,
                verbose_eval=1)
```
This gives me:
```
...
[4] training's auc: 0.725577  training's myloss: 0.921026  valid_0's auc: 0.664285  valid_0's myloss: 0.826622
[5] training's auc: 0.726053  training's myloss: 0.916696  valid_0's auc: 0.665518  valid_0's myloss: 0.824463
```
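For completeness, the same per-iteration values can also be read back from the evals_result dict after training:

```python
# evals_result is keyed by valid-set name, then metric name,
# holding one value per boosting round.
print(evals_result['valid_0']['myloss'][-1])  # ~> 0.824463
print(evals_result['valid_0']['auc'][-1])     # ~> 0.665518
```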
And if I evaluate both metrics on predictions, it correctly gives the same results:
```python
y_pred = gbm.predict(X_val)
calc_loss(y_val, y_pred)  # ~> 0.824463 -- shows calc_loss() is identical to my_err_rate()
fpr, tpr, thresholds = metrics.roc_curve(y_val, y_pred)
metrics.auc(fpr, tpr)     # ~> 0.6655177490096028
```
But the problem is that if I enable my custom objective function, the AUC stays the same while my own loss is different!
With fobj enabled, I get:

```
...
[4] training's auc: 0.724176  training's myloss: 0.638512  valid_0's auc: 0.663375  valid_0's myloss: 0.620981
[5] training's auc: 0.727059  training's myloss: 0.635095  valid_0's auc: 0.666306  valid_0's myloss: 0.61966
```
and then:

```python
y_pred = gbm.predict(X_val)
calc_loss(y_val, y_pred)  # ~> 0.644286
fpr, tpr, thresholds = metrics.roc_curve(y_val, y_pred)
metrics.auc(fpr, tpr)     # ~> 0.6663059118977099
```
Now, while the AUC is identical to what verbose-mode training reports for the last iteration, my loss (0.644286) is significantly worse than the reported value (0.61966).
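My only guess, which I have not been able to confirm, is that with a custom fobj, predict() might return raw scores rather than probabilities; that would leave the rank-based AUC untouched while shifting my log-loss-style metric. A sketch of the check, under that assumption:

```python
# Assumption (unconfirmed): predict() returns raw scores when fobj is set,
# so convert them to probabilities before computing the loss.
y_raw = gbm.predict(X_val)
y_prob = 1.0 / (1.0 + np.exp(-y_raw))
print(calc_loss(y_val, y_prob))  # should match valid_0's myloss if the guess is right
```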
Environment info

- Operating System: Linux (Ubuntu Server 16.04)
- CPU: x86_64, E5-2697 v2 (48 cores)
- Python version: 2.7.12