
feature creation: create new features by combining variables with decision trees #107

Closed
solegalli opened this issue Aug 13, 2020 · 11 comments · Fixed by #766
Labels
new transformer (New feature or request) · priority (need to be looked at next)

Comments

@solegalli
Collaborator

New variables are created by combining user-indicated variables with decision trees. Example: if the user passes 3 variables to the transformer, a new feature will be created by fitting a decision tree on these three variables and the target.
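A minimal sketch of the idea (not a settled feature-engine API; the column names, tree depth and helper name are illustrative only):

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor


def tree_combination_feature(X, y, variables):
    """Fit a decision tree on the indicated variables and return its
    predictions as a single new feature."""
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X[variables], y)
    new_name = "_x_".join(variables)
    return pd.Series(tree.predict(X[variables]), index=X.index, name=new_name)


# e.g. X["a_x_b_x_c"] = tree_combination_feature(X, y, ["a", "b", "c"])
```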

To think about:
Should we make the transformer so that it combines variables in groups of 2s, 3s, etc.? Say the user passes 5 variables: should we create features combining all possible groups of 2, all possible groups of 3, all possible groups of 4, and all 5?
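For scale, a quick count (a sketch; the variable names are placeholders) of what that would imply for 5 variables:

```python
# Count the groups implied by "all 2s, 3s, 4s and all 5" for 5 variables.
from itertools import combinations

variables = ["v1", "v2", "v3", "v4", "v5"]
groups = [g for size in range(2, len(variables) + 1)
          for g in combinations(variables, size)]
print(len(groups))  # 10 + 10 + 5 + 1 = 26 groups, i.e. 26 tree fits
```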

Need to think about this a bit. I know that we do combine a few variables with trees to create new ones, particularly for use in linear models. But I have not seen this brute-force approach of combining everything with everything, for the sake of combining, in organisations where models will be used to score customers. So maybe it is not ideal. It also increases the computational cost, which is not in the spirit of feature-engine.

We want simple, understandable features, for solid, understandable models. Thoughts?

@solegalli solegalli added new transformer New feature or request help wanted labels Aug 13, 2020
@SuryaThiru
Contributor

In my opinion, the right grouping will be domain-specific. When users have an idea of how the features are related, they can combine the features to improve the performance of their models, or to better understand the interaction between features in the model (for example). Either way, if we simply provide an interface independent of the number of features, the users can choose the grouping using pipelines. This will keep the technique within the scope of feature-engine.
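A rough sketch of the usage pattern this implies. The class name `TreeCombiner` and its parameters are hypothetical, not an existing feature-engine API: each instance handles one user-chosen group, and a scikit-learn Pipeline composes them.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeRegressor


class TreeCombiner(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: one new feature from one group of variables."""

    def __init__(self, variables):
        self.variables = variables

    def fit(self, X, y):
        self.tree_ = DecisionTreeRegressor(max_depth=3, random_state=0)
        self.tree_.fit(X[self.variables], y)
        return self

    def transform(self, X):
        X = X.copy()
        X["_x_".join(self.variables)] = self.tree_.predict(X[self.variables])
        return X


# The user, not the transformer, decides the groupings:
pipe = Pipeline([
    ("combine_ab", TreeCombiner(variables=["var_a", "var_b"])),
    ("combine_cde", TreeCombiner(variables=["var_c", "var_d", "var_e"])),
])
```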

@solegalli
Collaborator Author

Will pick this up the week of Sept 7. At the moment I am without my computer. Thanks!

@solegalli solegalli changed the title create new features by combining variables with decision trees feature creation: create new features by combining variables with decision trees Oct 2, 2020
@solegalli
Collaborator Author

I've just read in the article that they add the discretised features to the dataset as additional features, instead of replacing the existing ones. So we can expand the decision tree discretiser to return the additional features instead of overwriting the original ones.
We can probably combine this with the above in one class.
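A minimal sketch of the "add instead of replace" behaviour, assuming one tree per variable; the helper name and the `_tree` suffix are illustrative, not an agreed API:

```python
from sklearn.tree import DecisionTreeRegressor


def add_tree_discretised(X, y, variables):
    """Append tree-derived columns, keeping the original variables in place."""
    X = X.copy()
    for var in variables:
        tree = DecisionTreeRegressor(max_depth=3, random_state=0)
        tree.fit(X[[var]], y)
        X[f"{var}_tree"] = tree.predict(X[[var]])  # original column is untouched
    return X
```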

@solegalli solegalli added the priority need to be looked at next label Jul 18, 2021
@Morgan-Sell
Collaborator

Hola @solegalli, are you still looking for help w/ this issue? I'm happy to tackle this one.

A couple of questions:

  • Is the plan to create a new class that will be the parent class of both the DecisionTreeDiscretiser and a new child class that uses DecisionTreeRegressor to create new continuous features? (A rough sketch of one possible layout follows this list.)

  • You brought up an excellent point about the cost associated with generating all possible permutations, and that not all permutations may be relevant. What's the plan for producing a limited number of desired new features? Do we develop params that allow the user to place constraints on the permutations used to create a new feature?
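One possible layout for the hierarchy asked about above, as a sketch only; the class and method names are placeholders rather than decisions:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor


class BaseDecisionTreeTransformer:
    """Shared logic: fit a tree per variable (or per group of variables)."""

    def _make_tree(self, regression=True):
        return DecisionTreeRegressor() if regression else DecisionTreeClassifier()


class DecisionTreeDiscretiser(BaseDecisionTreeTransformer):
    """Existing behaviour: replace each variable with the tree's output."""


class DecisionTreeFeatures(BaseDecisionTreeTransformer):
    """New behaviour: append tree outputs derived from variable combinations."""
```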

@solegalli
Collaborator Author

Hi @Morgan-Sell

This class needs a bit of thinking and brainstorming of the design.

But before developing this one, I would like to have this other PR #189 resolved.

Maybe you would like to crack on with that one instead?

@Morgan-Sell
Collaborator

Morgan-Sell commented Dec 27, 2021

Sounds good, @solegalli. I thought the task was already "assigned" to someone. I'm happy to work on it.

@Morgan-Sell
Collaborator

Hola @solegalli, I see this one is now a priority.

It seems a bit much to autogenerate all unique feature permutations and apply them all to a decision tree. Like @SuryaThiru, I think the combinations are subject-matter specific; therefore, we should allow the user to specify the features.

This will also avoid unnecessary computational costs.

On the other hand, we could create an init param called all_permutations, with a default value of False. If the user sets it to True, we'll apply all the unique feature permutations to decision trees, returning a new feature for each permutation.
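A sketch of how that init param could look (names are tentative; since the order of variables does not matter when fitting a tree, the code below generates combinations rather than permutations):

```python
from itertools import combinations


class TreeFeatureCreator:
    """Hypothetical transformer illustrating the all_permutations idea."""

    def __init__(self, variables, all_permutations=False):
        self.variables = variables
        self.all_permutations = all_permutations

    def _variable_groups(self):
        if self.all_permutations:
            # every unique group of 2 or more variables -> one new feature each
            return [list(g) for size in range(2, len(self.variables) + 1)
                    for g in combinations(self.variables, size)]
        # default: treat the user's variables as a single group
        return [list(self.variables)]
```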

I'm happy to work on this one.

@Morgan-Sell
Collaborator

@solegalli, should we work on this task now?

@solegalli
Collaborator Author

Hi @Morgan-Sell

You've volunteered for many issues now. I'm not sure which one you like the most?

@Morgan-Sell
Collaborator

@solegalli,

I'll work on this one!

@Morgan-Sell
Collaborator

Morgan-Sell commented May 22, 2022

fyi #454
