PR for packaging, Python3 Support, and Issues #3 and #4 #5
Hey there, |
Hi Daniel, My pleasure, glad it could help. I re-added Boruta_py and applied the Python3, #3, and #4 fixes to it as well. I kept the 'reference' implementation as class BorutaPy and made boruta_py2.py's class BorutaPyPlus (for lack of a better name). That way the user can have the module installed and instantiate either. I can update the doc to reflect this as well if you'd like and are satisfied with that name. |
Hey Mike, Thanks a lot, this is really great! I like your name suggestion as well! Will |
Hi guys, Automatic tree number selection is actually a part of the wrapped importance source, so in this regard it is already doable in R Boruta. Arbitrary perc and MC correction could be back-ported quite easily, making the R and Python implementations fully compatible again. |
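For readers unfamiliar with `perc`: as I understand the option, the importance threshold is taken as a user-chosen percentile of the shadow-feature importances rather than their maximum. The snippet below is only a rough sketch of that idea in plain Python; the helper names `percentile` and `hits` and the importance values are made up for illustration and are not taken from either implementation.

```python
def percentile(values, perc):
    """Percentile with linear interpolation; perc is in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * perc / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def hits(real_importances, shadow_importances, perc=100):
    """Mark each real feature that beats the shadow threshold.

    perc=100 uses the maximum shadow importance as the threshold;
    lower values of perc give a more lenient threshold.
    """
    threshold = percentile(shadow_importances, perc)
    return [imp > threshold for imp in real_importances]


# Hypothetical importances, for illustration only.
shadow = [0.01, 0.02, 0.03, 0.04, 0.10]
real = [0.05, 0.20, 0.02]

strict = hits(real, shadow, perc=100)
lenient = hits(real, shadow, perc=75)
```

With these made-up numbers, lowering `perc` from 100 to 75 lets the first feature register a hit as well, which is the loosening effect the option is meant to provide.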
Hey Miron, Thanks a lot for joining in. If you're happy with it like that, having BorutaPlus be the one and only Boruta_py would be a great simplification as Mike already pointed out. I'll do this tomorrow and change the docs accordingly. Cheers! |
Er, why call it Plus then? |
Sorry I wasn't clear. I meant that the functionality in BorutaPyPlus will be the only one in this repo, and it will be called BorutaPy, just like Mike suggested 2 days ago. Will do this tonight. |
Great. |
Thanks Miron, I agree. @danielhomola when you make that change, remember to update `__init__.py` as well. It's out of sync at the moment, from the last refactor. :) |
Hi Miron, I just noticed that in boruta.R, at line 165, you have this: my R is getting rusty as I use more and more Python, but doesn't this part Am I missing something? Sorry if this is stupid and I misunderstood something.. |
This is intentional; a larger p makes all selected attributes more likely to be false positives, regardless of the iteration in which they were selected. (In other words, the correction applies to the whole Boruta run, not a single iteration; this is also why the Bonferroni method is the only correction which is not somewhat fishy to use.) BTW, it strikes me that you find it too strict; wherever I apply it, I feel it selects too much... |
Oh OK, if that's intentional, then I'll leave the Python version like that as well. I understand your logic; it's just that in biological datasets we often test 10-20 thousand features and Bonferroni is always overly harsh, which is why FDR is widely used instead. |
So have I, but also the p-values from Boruta easily go extremely low (like log p required low) and there are tentatives to cover the grey area. Anyway, proper FDR has landed on my Boruta research problems list. |
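To make the Bonferroni versus FDR trade-off discussed above concrete, here is a small sketch in plain Python comparing Bonferroni with the Benjamini-Hochberg procedure on a single set of p-values. It is not how either Boruta implementation applies its correction (as noted above, that correction spans the whole run), and the p-values are invented for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            reject[idx] = True
    return reject


# Hypothetical p-values, for illustration only.
p_values = [0.001, 0.008, 0.012, 0.03, 0.2]

bonf = bonferroni_reject(p_values)
bh = benjamini_hochberg_reject(p_values)
```

On these five made-up p-values Bonferroni keeps two features while Benjamini-Hochberg keeps four; the gap tends to widen as the number of tested features grows, which is the harshness described above for datasets with tens of thousands of features.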
First of all, I apologize for writing such a big PR all at once. I was struggling to get Boruta functioning on my problem and these were the things I needed to fix to get it working for me.
I hope you find this useful. Happy to make any changes you'd like. Thanks for implementing Boruta in Python!