[Feature Creation] Decision tree creates a new feature by combining numerous variables #454
halo @solegalli, A few questions:
An idea would be: These parameters in the init: So if I pass three variables in the list: [var1, var2, var3] and:
If I pass a list, say [1,2], then we return the output of 1 and 2 as above. Alternatively, the user can pass a tuple of tuples, (var1, (var1, var2), (var1, var2, var3)), indicating exactly how to combine the variables.
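To make the proposal above concrete, here is a minimal sketch of how such a parameter could be expanded into groups of variables. The helper name `resolve_combinations` and the argument name `how` are hypothetical and not part of feature-engine. The sketch also assumes that an integer k means "all groups of exactly k variables", which the thread does not confirm.

```python
from itertools import combinations


def resolve_combinations(variables, how):
    """Expand `how` into a list of variable tuples (hypothetical helper)."""
    if isinstance(how, int):
        # Assumption: an integer k means every group of exactly k variables.
        return list(combinations(variables, how))
    if isinstance(how, list):
        # A list of integers: concatenate the groups for each size, in order.
        return [c for k in how for c in combinations(variables, k)]
    if isinstance(how, tuple):
        # A tuple spells out the groups; a bare name is a one-variable group.
        return [(g,) if isinstance(g, str) else tuple(g) for g in how]
    raise TypeError("how must be an int, a list of ints, or a tuple")


variables = ["var1", "var2", "var3"]
print(resolve_combinations(variables, 2))
print(resolve_combinations(variables, [1, 2]))
print(resolve_combinations(variables, ("var1", ("var1", "var2"), ("var1", "var2", "var3"))))
```

Each resulting tuple would then correspond to one decision tree fitted on those variables, so the length of the returned list is the number of new features.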
Hi @solegalli, I hope you're enjoying your vacation! When and why would someone apply the decision-tree transformer to a single variable?
hallo @solegalli, The transformer is generating new variables. I have created a few unit tests. I've written some of the docstrings. Before I progress, would you please review/counsel me? We both know I need it ;) A few questions:
Lastly, I included a couple of [...] Thanks!
…inations_ attribute and not the _create_variable_combinations() method
…ss. need to create more tests.
Hi @Morgan-Sell, I've seen that you made a lot of commits. Is this a work in progress? Do you still need to update the tests? They are all failing :_( I am on holiday from Thursday until August, so if you don't hear from me... you know why ;) Cheers
hi @solegalli. Ahh.... a month-long vacation! Hopefully, the US will adopt such traditions one day ;) I'm still working on this class. I do have one question. The following test is failing:
Do you know if sklearn's [...] If so, then DecisionTreeFeatures should raise an error. Other feature-engine classes skip certain sklearn checks, which makes sense, since not all of sklearn's checks are likely to be appropriate for every class. How do we choose which tests to omit?
Hi @solegalli, I'm embarrassed to say this, but I'm stumped by these errors. Hopefully, we can discuss the errors when you return.
Closes #107
Notes from #107:
New variables are created by combining user-indicated variables with decision trees. Example: if the user passes 3 variables to the transformer, a new feature will be created by fitting a decision tree with these three variables and the target.
To think about:
Should we make the transformer so that it combines variables in groups of 2s and 3s, etc? Say the user passes 5 variables, should we create features combining all possible groups of 2s, all possible groups of 3s, all possible groups of 4s and all 5?
Need to think a bit. I know that we do combine a few variables with trees to create new ones, particularly for use in linear models. But I have not seen this brute-force approach, combining everything with everything for the sake of combining, in organisations where models are used to score customers, so it may not be ideal. It also increases computational cost, which is not in the spirit of feature-engine.
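As a rough illustration of the computational-cost concern, the sketch below counts how many trees the brute-force option would fit if every group of 2 up to n variables were combined. The function name `n_group_features` is hypothetical. For 5 variables the count is 10 + 10 + 5 + 1 = 26, and in general it is 2^n - n - 1, so it roughly doubles with each added variable.

```python
from math import comb


def n_group_features(n_variables, min_size=2):
    """Count the groups of size min_size..n_variables (one tree per group)."""
    return sum(comb(n_variables, k) for k in range(min_size, n_variables + 1))


for n in (3, 5, 10, 20):
    # Each group would need its own decision tree fitted against the target.
    print(n, n_group_features(n))
```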