feature creation: create new features by combining variables with decision trees #107
Comments
In my opinion, the right grouping will be domain-specific. When users have an idea of how the features are related, they can combine them to improve the performance of their models or, for example, to better understand the interactions between features in the model. Either way, if we simply provide an interface that is independent of the number of features, users can choose the grouping using pipelines. This will keep the technique within the scope of feature-engine. |
Will pick this up the week of Sept 7. At the moment I am without my computer. Thanks! |
I've just read in the article that they add the discretised features to the dataset as additional features instead of replacing the existing ones. So we could extend the decision tree discretiser to return the additional features instead of overwriting the original ones. |
Hola @solegalli, are you still looking for help w/ this issue? I'm happy to tackle this one. A couple of questions:
|
Hi @Morgan-Sell This class needs a bit of thinking and brainstorming of the design. But before developing this one, I would like to have this other PR #189 resolved. Maybe you would like to crack on with that one instead? |
Sounds good, @solegalli. I thought the task was already "assigned" to someone. I'm happy to work on it. |
Hola @solegalli, I see this one is now a priority. It seems a bit much to autogenerate all unique feature combinations and fit a decision tree on each of them. Like @SuryaThiru, I think the combinations are subject-matter specific; therefore, we should allow the user to specify the features. This will also avoid unnecessary computational costs. On the other hand, we could create an init param called I'm happy to work on this one. |
@solegalli, should we work on this task now? |
Hi @Morgan-Sell, you have volunteered for many issues now. Which one do you like the most? |
I'll work on this one! |
fyi #454 |
New variables are created by combining user-indicated variables with decision trees. Example: if the user passes 3 variables to the transformer, a new feature will be created by fitting a decision tree with these three variables and the target.
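As a rough illustration of the idea above (not the proposed transformer itself), the sketch below fits a shallow scikit-learn tree on three variables and the target, then appends the tree's prediction as a new column. The toy data, the choice of `DecisionTreeRegressor`, and `max_depth=3` are assumptions made only for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy data (assumed for illustration): three variables and a target
# that depends on an interaction between them.
X = rng.normal(size=(200, 3))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

# Fit one shallow tree on the three user-indicated variables and the target.
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)

# The tree's prediction is the new feature. It is appended as an extra
# column, so the original variables are left untouched.
new_feature = tree.predict(X)
X_new = np.column_stack([X, new_feature])

print(X_new.shape)
```

With a depth-3 tree the new feature takes at most 8 distinct values (one per leaf), which keeps it easy to inspect.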
To think about:
Should we make the transformer so that it combines variables in groups of 2, 3, etc.? Say the user passes 5 variables: should we create features combining all possible groups of 2, all possible groups of 3, all possible groups of 4, and all 5?
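To put a number on the brute-force option, here is a small sketch using only the standard library that enumerates every group of size 2 and above. The variable names and the helper `n_groups` are hypothetical and exist only for this example; each group would mean one extra tree to fit.

```python
from itertools import combinations
from math import comb

# Hypothetical variable names, for illustration only.
variables = ["var_a", "var_b", "var_c", "var_d", "var_e"]

# Every group of size 2, 3, ..., up to all variables at once.
groups = [
    group
    for size in range(2, len(variables) + 1)
    for group in combinations(variables, size)
]

# 10 pairs + 10 triples + 5 groups of four + 1 group of five
print(len(groups))  # 26


def n_groups(n_vars):
    """Number of groups of size >= 2 that can be formed from n_vars variables."""
    return sum(comb(n_vars, k) for k in range(2, n_vars + 1))


print(n_groups(10))  # 1013
```

The count is 2^n - n - 1, so it roughly doubles with every variable added, which is the computational cost concern in concrete terms.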
Need to think a bit. I know that we do combine a few variables with trees to create new ones, particularly for use in linear models. But this brute-force approach of combining everything with everything for the sake of combining is something I have not seen in organisations where models are used to score customers. So maybe not ideal. It also increases computational cost, which is not in the spirit of feature-engine.
We want simple, understandable features, for solid, understandable models. Thoughts?