Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange parameter validation for interaction constraints in HistGradientBoosting* #24899

Closed
betatim opened this issue Nov 11, 2022 · 1 comment · Fixed by #24915
Closed

Strange parameter validation for interaction constraints in HistGradientBoosting* #24899

betatim opened this issue Nov 11, 2022 · 1 comment · Fixed by #24915

Comments

@betatim
Copy link
Member

betatim commented Nov 11, 2022

HistGradientBoostingClassifier and HistGradientBoostingRegressor both have interaction_cst parameter that is specified as Iterable in the parameter constraints. I discovered two related weirdnesses with this while working on #24849.

  1. no everything that "is a" Iterable can actually be passed as a value. For example h = HistGradientBoostingClassifier(interaction_cst=((1, i) for i in range(2))) leads to check_interaction_cst returning [{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}].
  2. when you add a StrOptions({"foo", "bar"}) as an additional constraint for that parameter you do not receive an error when calling HistGradientBoostingClassifier(interaction_cst="fizzbuzz").

I think (1) happens because the code in _check_interaction_cst consumes the generator with its calls to all and the like, before then trying to use it again.

I think (2) happens because a string "is a" Iterable. So "fizzbuzz" does not satisfy the StrOptions, but passes because it is an Iterable.

Before proposing a solution I'd like to know a bit more about what the intention is behind the argument and its constraint.


I don't think we actually want users to pass an Iterable, because things like dicts and strings are Iterables as well. I don't think there is a good "type" for "something that is kinda like a list or tuple or so, but not a string or dict that contains something that is kinda like a list or tuple or so, but not a string or dict". If we want to also allow generators (maybe because of worries about memory consumption), then we need to have an even smarter parameter constraint and need to re-write the code so that it does not unintentionally consume the generator.

@lorentzenchr
Copy link
Member

For the time being, we could restrict it to str, and list/tuple of lists/tuples/sets, i.e. no generators.
As to memory consumption: Internally, a list of sets is created anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants