You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
HistGradientBoostingClassifier and HistGradientBoostingRegressor both have interaction_cst parameter that is specified as Iterable in the parameter constraints. I discovered two related weirdnesses with this while working on #24849.
no everything that "is a" Iterable can actually be passed as a value. For example h = HistGradientBoostingClassifier(interaction_cst=((1, i) for i in range(2))) leads to check_interaction_cst returning [{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}].
when you add a StrOptions({"foo", "bar"}) as an additional constraint for that parameter you do not receive an error when calling HistGradientBoostingClassifier(interaction_cst="fizzbuzz").
I think (1) happens because the code in _check_interaction_cst consumes the generator with its calls to all and the like, before then trying to use it again.
I think (2) happens because a string "is a" Iterable. So "fizzbuzz" does not satisfy the StrOptions, but passes because it is an Iterable.
Before proposing a solution I'd like to know a bit more about what the intention is behind the argument and its constraint.
I don't think we actually want users to pass an Iterable, because things like dicts and strings are Iterables as well. I don't think there is a good "type" for "something that is kinda like a list or tuple or so, but not a string or dict that contains something that is kinda like a list or tuple or so, but not a string or dict". If we want to also allow generators (maybe because of worries about memory consumption), then we need to have an even smarter parameter constraint and need to re-write the code so that it does not unintentionally consume the generator.
The text was updated successfully, but these errors were encountered:
For the time being, we could restrict it to str, and list/tuple of lists/tuples/sets, i.e. no generators.
As to memory consumption: Internally, a list of sets is created anyway.
HistGradientBoostingClassifier
andHistGradientBoostingRegressor
both haveinteraction_cst
parameter that is specified asIterable
in the parameter constraints. I discovered two related weirdnesses with this while working on #24849.h = HistGradientBoostingClassifier(interaction_cst=((1, i) for i in range(2)))
leads tocheck_interaction_cst
returning[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}]
.StrOptions({"foo", "bar"})
as an additional constraint for that parameter you do not receive an error when callingHistGradientBoostingClassifier(interaction_cst="fizzbuzz")
.I think (1) happens because the code in
_check_interaction_cst
consumes the generator with its calls toall
and the like, before then trying to use it again.I think (2) happens because a string "is a" Iterable. So "fizzbuzz" does not satisfy the
StrOptions
, but passes because it is anIterable
.Before proposing a solution I'd like to know a bit more about what the intention is behind the argument and its constraint.
I don't think we actually want users to pass an
Iterable
, because things likedict
s and strings areIterable
s as well. I don't think there is a good "type" for "something that is kinda like a list or tuple or so, but not a string or dict that contains something that is kinda like a list or tuple or so, but not a string or dict". If we want to also allow generators (maybe because of worries about memory consumption), then we need to have an even smarter parameter constraint and need to re-write the code so that it does not unintentionally consume the generator.The text was updated successfully, but these errors were encountered: