[R-package] Fix error when passing categorical features to lightgbm() (fixes #6000) #6003

Merged
jameslamb merged 2 commits into microsoft:master on Aug 4, 2023

david-cortes
Contributor

fixes #6000

This PR fixes an error raised when dataset parameters such as categorical_feature are supplied to lightgbm(). Before this PR, the dataset was constructed with free_raw_data=TRUE, which prevented it from applying parameters that need the raw data after the dataset has been created.

Collaborator

@jameslamb jameslamb left a comment

Thanks for identifying this issue and taking the time to submit a PR! But I think we should pursue a different fix... setting free_raw_data=FALSE this way means unnecessarily storing a copy of the passed-in data throughout training, which might cause out-of-memory issues for users.

Did you explore simply moving this call

# Write categorical features
if (!is.null(categorical_feature)) {
  data$set_categorical_feature(categorical_feature)
}

up further in lgb.train(), so it's run prior to the Dataset being constructed here?

data$construct()

I haven't tested that yet, but I think it should solve the issue without requiring changes to lightgbm() or holding an extra copy of the training data in memory. Could you please try that?
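
To illustrate why the order of the two calls matters, here is a minimal base-R sketch. It is not LightGBM's code: `new_toy_dataset()` and its fields are hypothetical stand-ins that only mimic the behavior described in this thread, namely that construction can free the raw data, and that setting categorical features afterwards needs that raw data.

```r
# Hypothetical toy model of a lazily constructed dataset (NOT LightGBM internals).
new_toy_dataset <- function(raw_data, free_raw_data = TRUE) {
  self <- new.env()
  self$raw_data <- raw_data
  self$categorical_feature <- NULL
  self$constructed <- FALSE

  self$set_categorical_feature <- function(features) {
    # After construction, changing this setting would need the raw data again.
    if (self$constructed && is.null(self$raw_data)) {
      stop("cannot set categorical features: raw data has been freed")
    }
    self$categorical_feature <- features
    self$constructed <- FALSE
    invisible(self)
  }

  self$construct <- function() {
    if (self$constructed) {
      return(invisible(self))
    }
    self$constructed <- TRUE
    # Mimic free_raw_data = TRUE: drop the raw copy once constructed.
    if (free_raw_data) {
      self$raw_data <- NULL
    }
    invisible(self)
  }

  self
}

x <- cbind(cyl = c(4, 6, 8), hp = c(93, 110, 175))

# Construct first, then set the categorical feature: fails in this toy model.
ds_old <- new_toy_dataset(x)
ds_old$construct()
old_result <- tryCatch(
  {
    ds_old$set_categorical_feature("cyl")
    "ok"
  },
  error = function(e) "error"
)

# Set the categorical feature first, then construct: works, and the raw
# data can still be freed afterwards.
ds_new <- new_toy_dataset(x)
ds_new$set_categorical_feature("cyl")
ds_new$construct()
```

In this toy model the second ordering succeeds without keeping `raw_data` around, which is the memory advantage over setting free_raw_data=FALSE.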

@jameslamb
Collaborator

Also don't worry about the CUDA CI job failures... we have a repo-wide issue with those jobs right now: #6001

Sorry for the inconvenience.

@david-cortes
Contributor Author

david-cortes commented Jul 24, 2023

Yes, that also seems to fix the issue. Updated.

@jameslamb jameslamb changed the title from "[R-package] Fix error when passing categorical features to lightgbm()" to "[R-package] Fix error when passing categorical features to lightgbm() (fixes #6000)" on Aug 4, 2023
Collaborator

@jameslamb jameslamb left a comment

Confirmed that this test produces the error from #6000 on latest master, and that it's resolved here.

Thanks for the help as always!

@jameslamb jameslamb merged commit 170a930 into microsoft:master Aug 4, 2023
41 checks passed
github-actions bot commented Nov 8, 2023

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 8, 2023
Successfully merging this pull request may close these issues.

[R-package] Cannot pass named categorical features to lightgbm()
2 participants