
Custom objective and evaluation functions #1230

@nimamox

Description


Hi folks,
The problem is that when I set fobj to my custom objective function, the error I compute on the predictions after training differs from what feval reports at the last iteration. If I remove fobj, the two errors are identical, as expected.

import numpy as np
import lightgbm as lgb
from sklearn import metrics

# Custom objective: grad/hess of a log-loss that weights the positive class 5x.
def my_logistic_obj(y_hat, dtrain):
    y = dtrain.get_label()
    p = y_hat  # treated directly as a probability
    grad = 4 * p * y + p - 5 * y
    hess = (4 * y + 1) * (p * (1.0 - p))
    return grad, hess
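For context, this is how I derived grad and hess: they are the first and second derivatives of the 5x-weighted log-loss (the same loss my_err_rate computes below) with respect to the raw score z, under the assumption that z maps to a probability via p = sigmoid(z). A quick finite-difference sanity check on toy values (the _weighted_logloss helper and the test points are only for illustration):

def _weighted_logloss(z, y):
    # 5x-weighted log-loss as a function of the raw score z, with p = sigmoid(z)
    p = 1.0 / (1.0 + np.exp(-z))
    return -(5.0 * y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

eps = 1e-5
for z, y in [(0.3, 1.0), (-0.7, 0.0)]:
    p = 1.0 / (1.0 + np.exp(-z))
    grad = 4 * p * y + p - 5 * y              # same formulas as in my_logistic_obj
    hess = (4 * y + 1) * (p * (1.0 - p))
    num_grad = (_weighted_logloss(z + eps, y) - _weighted_logloss(z - eps, y)) / (2 * eps)
    num_hess = (_weighted_logloss(z + eps, y) - 2 * _weighted_logloss(z, y)
                + _weighted_logloss(z - eps, y)) / eps ** 2
    assert np.isclose(grad, num_grad)
    assert np.isclose(hess, num_hess, rtol=1e-3)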

# Custom eval: the same 5x-weighted log-loss, reported as 'myloss'.
# Returns (metric_name, value, is_higher_better).
def my_err_rate(y_hat, dtrain):
    y = dtrain.get_label()
    y_hat = np.clip(y_hat, 10e-7, 1-10e-7)
    loss_fn = y*np.log(y_hat)
    loss_fp = (1.0 - y)*np.log(1.0 - y_hat)
    return 'myloss', np.sum(-(5*loss_fn+loss_fp))/len(y), False

def calc_loss(y, yp): # same as my_err_rate, but takes plain arrays instead of a Dataset
    yp = np.clip(yp, 10e-7, 1.0-10e-7)
    loss_fn = y*np.log(yp)
    loss_fp = (1.0-y)*np.log(1.0-yp)
    return np.sum(-(5*loss_fn+loss_fp))/y.shape[0]
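
Just to show the two helpers really compute the same thing, here is a toy check (the _FakeDataset stand-in is only for illustration; all my_err_rate needs from the Dataset is get_label()):

class _FakeDataset(object):
    # Minimal stand-in for lgb.Dataset: just exposes get_label().
    def __init__(self, label):
        self._label = np.asarray(label, dtype=float)
    def get_label(self):
        return self._label

y_toy = np.array([0.0, 1.0, 1.0, 0.0])
p_toy = np.array([0.2, 0.7, 0.9, 0.4])
name, value, _ = my_err_rate(p_toy, _FakeDataset(y_toy))
assert name == 'myloss'
assert np.isclose(value, calc_loss(y_toy, p_toy))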

params = {
    'task': 'train',
    'objective': 'regression',
    'boosting': 'gbdt',
    'metric': 'auc',
    'train_metric': '+',
    'num_leaves': 260,
    'learning_rate': 0.0245,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'max_depth': 15,
    'max_bin': 512
}

categoricals = None
lgb_train = lgb.Dataset(X_trn, y_trn, feature_name=features, categorical_feature=categoricals, free_raw_data=True) 
lgb_eval = lgb.Dataset(X_val, y_val, feature_name=features, categorical_feature=categoricals, reference=lgb_train, free_raw_data=True)

evals_result = {}
gbm = lgb.train(params,
                lgb_train,
                num_boost_round = 5,
                valid_sets=[lgb_eval, lgb_train],
#                 early_stopping_rounds = 2,
#                 fobj = my_logistic_obj, # <~~ First without fobj
                feval = my_err_rate,
                evals_result=evals_result,
                verbose_eval=1
               )

Running first without fobj (as in the call above), this gives me:

...
[4]	training's auc: 0.725577	training's myloss: 0.921026	valid_0's auc: 0.664285	valid_0's myloss: 0.826622
[5]	training's auc: 0.726053	training's myloss: 0.916696	valid_0's auc: 0.665518	valid_0's myloss: 0.824463

And if I evaluate both metrics on the predictions myself, I correctly get the same results:

y_pred = gbm.predict(X_val)
calc_loss(y_val, y_pred) # ~> 0.824463   --  this is to show calc_loss() is identical to my_err_rate()
fpr, tpr, thresholds = metrics.roc_curve(y_val, y_pred)
metrics.auc(fpr, tpr) # ~> 0.6655177490096028

But the problem is that if I enable my custom objective function, the AUC is still consistent but my own loss is not!
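For clarity, the fobj run is the same lgb.train call with only the custom-objective line uncommented (everything else unchanged):

gbm = lgb.train(params,
                lgb_train,
                num_boost_round = 5,
                valid_sets=[lgb_eval, lgb_train],
                fobj = my_logistic_obj,  # <~~ now with the custom objective
                feval = my_err_rate,
                evals_result=evals_result,
                verbose_eval=1
               )

With fobj enabled, the last iterations look like this: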

...
[4]	training's auc: 0.724176	training's myloss: 0.638512	valid_0's auc: 0.663375	valid_0's myloss: 0.620981
[5]	training's auc: 0.727059	training's myloss: 0.635095	valid_0's auc: 0.666306	valid_0's myloss: 0.61966

And evaluating on the predictions I get:

y_pred = gbm.predict(X_val)
calc_loss(y_val, y_pred) # ~> 0.644286 
fpr, tpr, thresholds = metrics.roc_curve(y_val, y_pred)
metrics.auc(fpr, tpr) # ~> 0.6663059118977099

Now, while the AUC matches what the verbose training log shows for the last iteration, my loss (0.644286) comes out significantly worse than the reported valid_0's myloss (0.61966).
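
One guess I would like to rule out (purely an assumption on my side, not something I have verified): maybe with a custom fobj the values returned by predict() are raw scores rather than probabilities, in which case they would need to go through a sigmoid before being fed to the loss, something like:

raw_pred = gbm.predict(X_val)
prob_pred = 1.0 / (1.0 + np.exp(-raw_pred))  # assumption: map raw scores back to probabilities
calc_loss(y_val, prob_pred)                  # does this then match valid_0's myloss (0.61966)?

Is that the expected behaviour when fobj is set, or am I misusing fobj/feval?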

Environment info

Operating System: Linux (Ubuntu Server 16.04)
CPU: x86_64 E5-2697 v2 (48 cores)
Python version: 2.7.12
