
How observations with sample_weight of zero influence the fit of LGBMRegressor #5553

Closed
JoaquinAmatRodrigo opened this issue Oct 22, 2022 · 3 comments

@JoaquinAmatRodrigo

Description

Hello,

I am trying to exclude some training observations by giving them a weight of zero through the "sample_weight" argument.

As I understand it, observations with a weight of 0 should not influence the training, so changes in their feature values should not affect the resulting model. However, that is not what I observe: if I train two models that differ only in the feature values of the zero-weight samples, the predictions differ.

Does anyone know how exactly the algorithm implements sample weight inside the code?

Thanks a lot!

Reproducible example

import numpy as np
from lightgbm import LGBMRegressor

rng = np.random.default_rng(12345)
X_train = rng.normal(size=(100, 3))
y_train = rng.normal(loc=10, size=(100, 1)).ravel()
X_test  = rng.normal(size=(5, 3))
weights = np.repeat([0, 1], repeats=[10, 90]) # First 10 samples have zero weight

regressor = LGBMRegressor(random_state=123)
regressor.fit(X=X_train, y=y_train, sample_weight=weights)
regressor.predict(X=X_test)

array([10.19829764, 10.72541724, 9.65296611, 9.29206054, 9.77377765])

X_train_2 = X_train.copy()
X_train_2[:10, :] = 5000 # First 10 samples are modified
regressor = LGBMRegressor(random_state=123)
regressor.fit(X=X_train_2, y=y_train, sample_weight=weights)
regressor.predict(X=X_test)

array([10.3309465 , 10.5325829 , 9.54558229, 9.47538141, 9.83769447])

Environment info

import lightgbm
lightgbm.__version__

'3.3.2'

Command(s) you used to install LightGBM

pip install lightgbm
@jmoralez
Collaborator

Hi @JoaquinAmatRodrigo, thanks for using LightGBM. The weights are used when computing the gradients and hessians, so by setting a zero weight you're basically ignoring the errors for those samples. However, the feature values of those samples are still considered when building the feature histograms, which is why you end up with different models.
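In NumPy terms, the role of the weights can be sketched like this. This is an illustrative reconstruction, not LightGBM's actual C++ implementation: for the default L2 objective the per-sample gradient is pred - y and the hessian is 1, and both are scaled by the sample weight before split gains are computed.

```python
import numpy as np

y_true = np.array([10.0, 9.0, 11.0])
y_pred = np.array([9.5, 9.5, 9.5])
weights = np.array([0.0, 1.0, 1.0])  # first sample has zero weight

# Weighted gradient/hessian of the L2 objective: zero-weight rows
# contribute nothing to the loss part of the split-gain computation.
grad = weights * (y_pred - y_true)
hess = weights * np.ones_like(y_true)

print(grad)  # first entry is 0.0: the sample's error is ignored
print(hess)  # first entry is 0.0: the sample carries no curvature either
```

The feature values of the zero-weight row, however, still enter the histogram-binning step, which is the part the gradients do not control.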

If you instead modified the target values, you should get the same results, e.g.:

import numpy as np
from lightgbm import LGBMRegressor

rng = np.random.default_rng(12345)
X_train = rng.normal(size=(100, 3))
y_train = rng.normal(loc=10, size=(100, 1)).ravel()
X_test  = rng.normal(size=(5, 3))
weights = np.repeat([0, 1], repeats=[10, 90]) # First 10 samples have zero weight

params = {'random_state': 123, 'verbose': -1}
regressor = LGBMRegressor(**params)
regressor.fit(X=X_train, y=y_train, sample_weight=weights)
print(regressor.predict(X=X_test))  # [10.19829764 10.72541724  9.65296611  9.29206054  9.77377765]

y_train2 = y_train.copy()
y_train2[:10] = 1_000 * y_train[:10]
regressor2 = LGBMRegressor(**params)
regressor2.fit(X=X_train, y=y_train2, sample_weight=weights)
print(regressor2.predict(X=X_test))  # [10.19829764 10.72541724  9.65296611  9.29206054  9.77377765]

Please let us know if you have any further questions.

@JoaquinAmatRodrigo
Author

Hi @jmoralez
Now I understand it. Thanks a lot for your explanation!

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023