
feature creation: create new features by combining variables with decision trees #107

Closed
solegalli opened this issue Aug 13, 2020 · 11 comments · Fixed by #766
Labels
new transformer (New feature or request) · priority (need to be looked at next)

Comments

@solegalli
Collaborator

New variables are created by combining user-indicated variables with decision trees. Example: if the user passes 3 variables to the transformer, a new feature will be created by fitting a decision tree on these three variables and the target.
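A minimal sketch of the idea (not a settled feature-engine API; the column names, tree depth and helper name are illustrative only):

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor


def tree_combination_feature(X, y, variables):
    """Fit a decision tree on the indicated variables and return its
    predictions as a single new feature."""
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X[variables], y)
    new_name = "_x_".join(variables)
    return pd.Series(tree.predict(X[variables]), index=X.index, name=new_name)


# e.g. X["a_x_b_x_c"] = tree_combination_feature(X, y, ["a", "b", "c"])
```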

To think about:
Should we make the transformer so that it combines variables in groups of 2s, 3s, etc.? Say the user passes 5 variables: should we create features combining all possible groups of 2, all possible groups of 3, all possible groups of 4, and all 5?
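For scale, a quick count (a sketch; the variable names are placeholders) of what that would imply for 5 variables:

```python
# Count the groups implied by "all 2s, 3s, 4s and all 5" for 5 variables.
from itertools import combinations

variables = ["v1", "v2", "v3", "v4", "v5"]
groups = [g for size in range(2, len(variables) + 1)
          for g in combinations(variables, size)]
print(len(groups))  # 10 + 10 + 5 + 1 = 26 groups, i.e. 26 tree fits
```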

Need to think about this a bit. I know that we do combine a few variables with trees to create new ones, particularly for use in linear models. But I have not seen this brute-force approach of combining everything with everything, for the sake of combining, in organisations where models will be used to score customers. So maybe it is not ideal. It also increases the computational cost, which is not in the spirit of feature-engine.

We want simple, understandable features, for solid, understandable models. Thoughts?

@solegalli solegalli added new transformer New feature or request help wanted labels Aug 13, 2020
@SuryaThiru
Contributor

In my opinion, the right grouping will be domain-specific. When users have an idea of how the features are related, they can combine the features to improve the performance of their models, or to better understand the interaction between features in the model (for example). Either way, if we simply provide an interface independent of the number of features, the users can choose the grouping using pipelines. This will keep the technique within the scope of feature-engine.
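A rough sketch of the usage pattern this implies. The class name `TreeCombiner` and its parameters are hypothetical, not an existing feature-engine API: each instance handles one user-chosen group, and a scikit-learn Pipeline composes them.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeRegressor


class TreeCombiner(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: one new feature from one group of variables."""

    def __init__(self, variables):
        self.variables = variables

    def fit(self, X, y):
        self.tree_ = DecisionTreeRegressor(max_depth=3, random_state=0)
        self.tree_.fit(X[self.variables], y)
        return self

    def transform(self, X):
        X = X.copy()
        X["_x_".join(self.variables)] = self.tree_.predict(X[self.variables])
        return X


# The user, not the transformer, decides the groupings:
pipe = Pipeline([
    ("combine_ab", TreeCombiner(variables=["var_a", "var_b"])),
    ("combine_cde", TreeCombiner(variables=["var_c", "var_d", "var_e"])),
])
```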

@solegalli
Collaborator Author

Will pick this up the week of Sept 7. At the moment I am without my computer. Thanks!

@solegalli solegalli changed the title create new features by combining variables with decision trees feature creation: create new features by combining variables with decision trees Oct 2, 2020
@solegalli
Collaborator Author

I've just read in the article that they add the discretised features to the dataset as additional features, instead of replacing the existing ones. So we can expand the decision tree discretiser to return the additional features instead of overwriting the original ones.
We can probably combine this with the above in one class.
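A minimal sketch of the "add instead of replace" behaviour, assuming one tree per variable; the helper name and the `_tree` suffix are illustrative, not an agreed API:

```python
from sklearn.tree import DecisionTreeRegressor


def add_tree_discretised(X, y, variables):
    """Append tree-derived columns, keeping the original variables in place."""
    X = X.copy()
    for var in variables:
        tree = DecisionTreeRegressor(max_depth=3, random_state=0)
        tree.fit(X[[var]], y)
        X[f"{var}_tree"] = tree.predict(X[[var]])  # original column is untouched
    return X
```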

@solegalli solegalli added the priority need to be looked at next label Jul 18, 2021
@Morgan-Sell
Collaborator

Hola @solegalli, are you still looking for help w/ this issue? I'm happy to tackle this one.

A couple of questions:

  • Is the plan to create a new class that will be the parent class of both the DecisionTreeDiscretiser and a new child class that uses DecisionTreeRegressor to create new continuous features? (A rough sketch of one possible layout follows this list.)

  • You brought up an excellent point about the cost associated with generating all possible permutations, and that not all permutations may be relevant. What's the plan for producing a limited number of desired new features? Do we develop params that allow the user to place constraints on the permutations used to create a new feature?
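One possible layout for the hierarchy asked about above, as a sketch only; the class and method names are placeholders rather than decisions:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor


class BaseDecisionTreeTransformer:
    """Shared logic: fit a tree per variable (or per group of variables)."""

    def _make_tree(self, regression=True):
        return DecisionTreeRegressor() if regression else DecisionTreeClassifier()


class DecisionTreeDiscretiser(BaseDecisionTreeTransformer):
    """Existing behaviour: replace each variable with the tree's output."""


class DecisionTreeFeatures(BaseDecisionTreeTransformer):
    """New behaviour: append tree outputs derived from variable combinations."""
```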

@solegalli
Collaborator Author

Hi @Morgan-Sell

This class needs a bit of thinking and brainstorming of the design.

But before developing this one, I would like to have this other PR #189 resolved.

Maybe you would like to crack on with that one instead?

@Morgan-Sell
Collaborator

Morgan-Sell commented Dec 27, 2021

Sounds good, @solegalli. I thought the task was already "assigned" to someone. I'm happy to work on it.

@Morgan-Sell
Collaborator

Hola @solegalli, I see this one is now a priority.

It seems a bit much to autogenerate all unique feature permutations and apply them all to a decision tree. Like @SuryaThiru, I think the combinations are subject-matter specific; therefore, we should allow the user to specify the features.

This will also avoid unnecessary computational costs.

On the other hand, we could create an init param called all_permutations, with a default value of False. If the user sets it to True, we'll apply all the unique feature permutations to decision trees, returning a new feature for each permutation.
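A sketch of how that init param could look (names are tentative; since the order of variables does not matter when fitting a tree, the code below generates combinations rather than permutations):

```python
from itertools import combinations


class TreeFeatureCreator:
    """Hypothetical transformer illustrating the all_permutations idea."""

    def __init__(self, variables, all_permutations=False):
        self.variables = variables
        self.all_permutations = all_permutations

    def _variable_groups(self):
        if self.all_permutations:
            # every unique group of 2 or more variables -> one new feature each
            return [list(g) for size in range(2, len(self.variables) + 1)
                    for g in combinations(self.variables, size)]
        # default: treat the user's variables as a single group
        return [list(self.variables)]
```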

I'm happy to work on this one.

@Morgan-Sell
Collaborator

@solegalli, should we work on this task now?

@solegalli
Collaborator Author

Hi @Morgan-Sell

You've volunteered for many issues now. I'm not sure which one you like the most?

@Morgan-Sell
Collaborator

@solegalli,

I'll work on this one!

@Morgan-Sell
Collaborator

Morgan-Sell commented May 22, 2022

fyi #454
