[PRE REVIEW]: RENT: A Python Package for Repeated Elastic Net Feature Selection #3234

Closed
whedon opened this issue Apr 30, 2021 · 23 comments

@whedon

whedon commented Apr 30, 2021

Submitting author: @annajenul (Anna Jenul)
Repository: https://github.com/NMBU-Data-Science/RENT
Version: 0.0.1
Editor: @mikldk
Reviewers: @maximtrp, @arunmano121
Managing EiC: Kevin M. Moerman

⚠️ JOSS reduced service mode ⚠️

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Author instructions

Thanks for submitting your paper to JOSS @annajenul. Currently, there isn't a JOSS editor assigned to your paper.

@annajenul if you have any suggestions for potential reviewers, then please mention them here in this thread (without tagging them with an @). In addition, the people in this list have already agreed to review for JOSS and may be suitable for this submission (please start at the bottom of the list).

Editor instructions

The JOSS submission bot @whedon is here to help you find and assign reviewers and start the main review. To find out what @whedon can do for you, type:

@whedon commands
@whedon

whedon commented Apr 30, 2021

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

@whedon

whedon commented Apr 30, 2021

Failed to discover a Statement of need section in paper

@whedon

whedon commented Apr 30, 2021

Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.53 s (41.7 files/s, 389620.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                           6            446            826           1321
Jupyter Notebook                 3              0         202161            325
Markdown                         2             27              0             73
reStructuredText                 6             59            113             56
TeX                              1              3              0             36
DOS Batch                        1              8              1             27
YAML                             1              5              9             11
make                             1              4              6             10
INI                              1              1              0              9
-------------------------------------------------------------------------------
SUM:                            22            553         203116           1868
-------------------------------------------------------------------------------


Statistical information for the repository '211e886260f7a8ac3a5bbec0' was
gathered on 2021/04/30.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Anna Jenul                      35          4473           1999           58.90
Ngoc Huynh                       3            62              4            0.60
Oliver Tomic                    18           429            150            5.27
annajenul                       12          1944           1927           35.23

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Ngoc Huynh                   38           61.3          2.6               15.79
Oliver Tomic                188           43.8          1.8               30.85
annajenul                  2367          121.8          2.9                7.14

@whedon

whedon commented Apr 30, 2021

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@whedon

whedon commented Apr 30, 2021

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.21105/joss.00980 is OK

MISSING DOIs

- None

INVALID DOIs

- None

Kevin-Mattheus-Moerman added the waitlisted label (Submissions in the JOSS backlog due to reduced service mode) Apr 30, 2021
@Kevin-Mattheus-Moerman

@annajenul Thanks for this submission. I have been looking for editors to handle this submission but unfortunately none of our editors in this domain are currently available to handle this work. Hence I've labelled this issue as waitlisted, which means we will resume the handling/reviewing of this work once one of our related editors becomes available. Thanks for your patience.

@annajenul

Submitting author: @annajenul (Anna Jenul)
Repository: https://github.com/NMBU-Data-Science/RENT
Version: 0.0.1
Editor: Pending
Reviewer: Pending
Managing EiC: Kevin M. Moerman

Author instructions

Thanks for submitting your paper to JOSS @annajenul. Currently, there isn't a JOSS editor assigned to your paper.

@annajenul if you have any suggestions for potential reviewers, then please mention them here in this thread (without tagging them with an @). In addition, the people in this list have already agreed to review for JOSS and may be suitable for this submission (please start at the bottom of the list).

Editor instructions

The JOSS submission bot @whedon is here to help you find and assign reviewers and start the main review. To find out what @whedon can do for you, type:

@whedon commands

From your list of potential reviewers, I suggest yxoos, arunmano121 and maximtrp as reviewers for our submission.

@kyleniemeyer

@whedon invite @mikldk as editor

Hi @mikldk, can you edit this submission? The author has provided a few reviewer recommendations above.

@whedon

whedon commented May 21, 2021

@mikldk has been invited to edit this submission.

@mikldk

mikldk commented May 25, 2021

@whedon assign me as editor

@whedon

whedon commented May 25, 2021

OK, the editor is @mikldk

@mikldk

mikldk commented May 25, 2021

@annajenul Thanks for your submission. (I've actually been to NMBU in Ås a few times -- lovely place!)

I have a few questions. They are more related to the method than the software as such, but they may be relevant for the documentation and paper, so I pose them now before finding reviewers.

  • The paper and README mention 'unique subsets of the data' -- is this some sort of bagging?
  • The paper says 'trains K independent elastic net regularized models on distinct subsets of the train dataset'.
    • I am not sure how these subsets are selected, can you mention this briefly somewhere (maybe both in paper and README)?
    • I would doubt that the K subsets are (statistically) independent as you write. Can you support this statement somehow?
  • Have you considered also using feature bagging? If using both bagging and feature bagging, the method will be similar to that of random forests, except using elastic nets instead of decision trees. I am not aware whether something like this has been tried before, nor whether there are any advantages/disadvantages. I was just curious that you do not seem to mention 'bagging' or 'random forests' in the paper (or in the arXiv paper).

mikldk removed the waitlisted label (Submissions in the JOSS backlog due to reduced service mode) May 25, 2021
@annajenul

@mikldk Thank you for your interesting inputs. Glad to hear that you liked your stay at NMBU. The campus is really nice now in the spring / summertime.

Regarding the selection of the K subsets from the training data: we use the scikit-learn train_test_split function K times, each time with a different value for random_state. This gives us K unique subsets of the training data, without duplicating any samples within a subset (unlike bootstrapping, which can draw the same sample more than once). We use the term 'unique' in the sense that no two of the K subsets contain exactly the same training samples.
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split

In other words, RENT does not use bagging or any type of bootstrap. Instead, each training subset is drawn independently and without replacement from the full training dataset, which means that each sample can appear at most once in a single subset. Nevertheless, the same sample can appear in more than one subset. Therefore, the K subsets are independent and identically distributed draws from the training dataset. We will revise the README and the paper accordingly to make this easier to understand.
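The sampling scheme described above can be sketched with scikit-learn. This is a minimal illustration, not RENT's code: the helper name `draw_subsets`, the 80% subset size, and using the loop index `k` as `random_state` are assumptions made for the example. For contrast, it also draws one bootstrap sample with `sklearn.utils.resample`, which samples with replacement and therefore typically contains duplicates.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample


def draw_subsets(n_samples, K=10, train_size=0.8):
    """Return K index arrays, each drawn without replacement.

    Hypothetical helper for illustration; K and train_size are assumed
    values, not RENT defaults.
    """
    indices = np.arange(n_samples)
    subsets = []
    for k in range(K):
        # A different random_state per draw yields a different subset;
        # within one subset every index appears at most once.
        idx_sub, _ = train_test_split(indices, train_size=train_size, random_state=k)
        subsets.append(np.sort(idx_sub))
    return subsets


subsets = draw_subsets(n_samples=100, K=5)

# Bootstrap sample for comparison: drawn with replacement,
# so some indices are repeated and others are missing.
bootstrap = resample(np.arange(100), replace=True, n_samples=100, random_state=0)
n_unique_bootstrap = len(np.unique(bootstrap))
```

A given index can still occur in several of the K subsets, which matches the description above; only repetition within a single subset is ruled out.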

Further, we are considering a bootstrap option as an alternative sampling strategy for future work, along with other adaptations, such as using classifiers other than logistic regression. Another option is to apply RENT as a data preprocessing step for tree-based methods, which often improved their performance when noisy high-cardinality features were present in the data. In this way, we avoided introducing bias into the tree-based models by removing those noisy high-cardinality features (https://explained.ai/rf-importance/index.html).
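The preprocessing idea can be illustrated with a scikit-learn pipeline. As a simplification, a single elastic net regularized logistic regression wrapped in `SelectFromModel` stands in for RENT here; the synthetic data, `C`, `l1_ratio`, and the `1e-5` threshold are illustrative assumptions. The random forest then only sees the features that the linear model kept.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 20 features, only 4 of which carry signal (assumed setup).
X, y = make_classification(n_samples=300, n_features=20, n_informative=4,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Stand-in for RENT: one elastic net model; features with |coef| > 1e-5 are kept.
selector = SelectFromModel(
    LogisticRegression(penalty="elasticnet", solver="saga", C=0.5,
                       l1_ratio=0.5, max_iter=5000),
    threshold=1e-5,
)

pipeline = make_pipeline(
    StandardScaler(),
    selector,
    RandomForestClassifier(n_estimators=100, random_state=0),
)
pipeline.fit(X_train, y_train)

n_kept = int(pipeline.named_steps["selectfrommodel"].get_support().sum())
accuracy = pipeline.score(X_test, y_test)
```

Placing the selector inside the pipeline keeps feature selection within the training split, so the test data does not influence which features the forest receives.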

Although interesting, we have not considered feature bagging, since one of our selection criteria (tau_1) is based on counting how often elastic net selects each feature from the full set of features. But this could be an interesting path to follow in future work.
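The counting idea behind tau_1 can be sketched as follows, assuming elastic net regularized logistic regression fitted on K subsets drawn with `train_test_split`. The function name `selection_frequency`, the hyperparameters, and the 0.9 cut-off are hypothetical, and RENT's own implementation and parameter names may differ. The sketch also shows why feature bagging would interfere: the frequency is only meaningful if every model sees the full feature set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def selection_frequency(X, y, K=10, train_size=0.8, C=0.1, l1_ratio=0.5):
    """Fraction of K elastic net models in which each feature has a non-zero weight.

    Hypothetical helper; all defaults are assumptions for illustration.
    """
    counts = np.zeros(X.shape[1])
    for k in range(K):
        # Each model is trained on a different subsample, but on ALL features.
        X_sub, _, y_sub, _ = train_test_split(X, y, train_size=train_size, random_state=k)
        model = LogisticRegression(penalty="elasticnet", solver="saga", C=C,
                                   l1_ratio=l1_ratio, max_iter=5000)
        model.fit(X_sub, y_sub)
        # A feature counts as "selected" when its coefficient is exactly non-zero.
        counts += model.coef_[0] != 0
    return counts / K


X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)
X = StandardScaler().fit_transform(X)

freq = selection_frequency(X, y, K=10)
selected = np.flatnonzero(freq >= 0.9)  # hypothetical tau_1 cut-off
```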

@mikldk

mikldk commented May 31, 2021

@arunmano121, @maximtrp: Would you be interested in reviewing this submission to The Journal of Open Source Software? Reviews are open and based on a checklist. The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions or concerns please let me know.

@maximtrp

@mikldk Yeah, I'd be glad to! Thank you!

@mikldk

mikldk commented May 31, 2021

@whedon assign @maximtrp as reviewer

@whedon

whedon commented May 31, 2021

OK, @maximtrp is now a reviewer

@arunmano121

arunmano121 commented May 31, 2021 via email

@arunmano121

arunmano121 commented May 31, 2021 via email

@mikldk
Copy link

mikldk commented Jun 1, 2021

@whedon add @arunmano121 as reviewer

whedon assigned arunmano121, maximtrp and mikldk and unassigned maximtrp Jun 1, 2021
@whedon

whedon commented Jun 1, 2021

OK, @arunmano121 is now a reviewer

@mikldk

mikldk commented Jun 1, 2021

@whedon start review

@whedon

whedon commented Jun 1, 2021

OK, I've started the review over in #3323.

whedon closed this as completed Jun 1, 2021