[PRE REVIEW]: RENT: A Python Package for Repeated Elastic Net Feature Selection #3234

whedon · 2021-04-30T21:18:32Z

Submitting author: @annajenul (Anna Jenul)
Repository: https://github.com/NMBU-Data-Science/RENT
Version: 0.0.1
Editor: @mikldk
Reviewers: @maximtrp, @arunmano121
Managing EiC: Kevin M. Moerman

⚠️ JOSS reduced service mode ⚠️

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Author instructions

Thanks for submitting your paper to JOSS @annajenul. Currently, there isn't a JOSS editor assigned to your paper.

@annajenul if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). In addition, the people on this list have already agreed to review for JOSS and may be suitable for this submission (please start at the bottom of the list).

Editor instructions

The JOSS submission bot @whedon is here to help you find and assign reviewers and start the main review. To find out what @whedon can do for you type:

@whedon commands


whedon · 2021-04-30T21:18:35Z

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks.


For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

whedon · 2021-04-30T21:18:52Z

Failed to discover a Statement of need section in paper

whedon · 2021-04-30T21:19:02Z

Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.53 s (41.7 files/s, 389620.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                           6            446            826           1321
Jupyter Notebook                 3              0         202161            325
Markdown                         2             27              0             73
reStructuredText                 6             59            113             56
TeX                              1              3              0             36
DOS Batch                        1              8              1             27
YAML                             1              5              9             11
make                             1              4              6             10
INI                              1              1              0              9
-------------------------------------------------------------------------------
SUM:                            22            553         203116           1868
-------------------------------------------------------------------------------


Statistical information for the repository '211e886260f7a8ac3a5bbec0' was
gathered on 2021/04/30.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Anna Jenul                      35          4473           1999           58.90
Ngoc Huynh                       3            62              4            0.60
Oliver Tomic                    18           429            150            5.27
annajenul                       12          1944           1927           35.23

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Ngoc Huynh                   38           61.3          2.6               15.79
Oliver Tomic                188           43.8          1.8               30.85
annajenul                  2367          121.8          2.9                7.14

whedon · 2021-04-30T21:19:16Z

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

whedon · 2021-04-30T21:19:30Z

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.21105/joss.00980 is OK

MISSING DOIs

- None

INVALID DOIs

- None

Kevin-Mattheus-Moerman · 2021-04-30T21:50:17Z

@annajenul Thanks for this submission. I have been looking for editors to handle this submission but unfortunately none of our editors in this domain are currently available to handle this work. Hence I've labelled this issue as waitlisted, which means we will resume the handling/reviewing of this work once one of our related editors becomes available. Thanks for your patience.

annajenul · 2021-05-01T09:47:03Z

Submitting author: @annajenul (Anna Jenul)
Repository: https://github.com/NMBU-Data-Science/RENT
Version: 0.0.1
Editor: Pending
Reviewer: Pending
Managing EiC: Kevin M. Moerman


From your list of potential reviewers, I suggest yxoos, arunmano121 and maximtrp as reviewers for our submission.

kyleniemeyer · 2021-05-21T19:39:39Z

@whedon invite @mikldk as editor

Hi @mikldk, can you edit this submission? The author has provided a few reviewer recommendations above.

whedon · 2021-05-21T19:39:41Z

@mikldk has been invited to edit this submission.

mikldk · 2021-05-25T12:15:37Z

@whedon assign me as editor

whedon · 2021-05-25T12:15:41Z

OK, the editor is @mikldk

mikldk · 2021-05-25T12:29:34Z

@annajenul Thanks for your submission. (I've actually been to NMBU in Ås a few times -- lovely place!)

I have a few questions. They are more related to the method than the software as such, but they may be relevant for the documentation and paper, so I pose them now before finding reviewers.

The paper and README mention 'unique subsets of the data' -- is this some sort of bagging?
The paper says 'trains K independent elastic net regularized models on distinct subsets of the train dataset'.
- I am not sure how these subsets are selected; can you mention this briefly somewhere (maybe in both the paper and the README)?
- I would doubt that the K subsets are (statistically) independent as you write. Can you support this statement somehow?
Have you considered also using feature bagging? If using both bagging and feature bagging, the method will be similar to that of random forests, except using elastic nets instead of decision trees. I am not aware whether something like this has been tried before, nor whether there are any advantages/disadvantages. I was just curious that you do not seem to mention 'bagging' or 'random forests' in the paper (nor in the arXiv paper).
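For concreteness, the bagging-plus-feature-bagging variant asked about here could be sketched with scikit-learn's `BaggingClassifier` wrapping an elastic-net logistic regression. This is a hypothetical illustration, not part of RENT: the synthetic data, the hyperparameters (`C`, `l1_ratio`, `n_estimators`, `max_features`) and the choice to divide selection counts by how often each feature was offered are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=4,
                           n_redundant=0, random_state=0)

# Elastic-net logistic regression used as the base learner instead of a tree.
base = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=0.5, max_iter=5000)

# bootstrap=True resamples rows with replacement (bagging);
# max_features=0.5 gives each model a random half of the features.
ensemble = BaggingClassifier(base, n_estimators=30, max_features=0.5,
                             bootstrap=True, random_state=0).fit(X, y)

offered = np.zeros(X.shape[1])   # how many models saw each feature
selected = np.zeros(X.shape[1])  # how many of those gave it a nonzero weight
for est, feats in zip(ensemble.estimators_, ensemble.estimators_features_):
    offered[feats] += 1
    selected[feats] += est.coef_.ravel() != 0

# Selection frequency among only the models that were offered the feature.
freq = np.divide(selected, offered, out=np.zeros_like(selected), where=offered > 0)
```

Because each feature is offered to a different, random number of models, the raw counts in `selected` are not directly comparable across features; normalizing by `offered` is one possible way around that.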

annajenul · 2021-05-27T14:18:49Z

@mikldk Thank you for your interesting inputs. Glad to hear that you liked your stay at NMBU. The campus is really nice now in the spring / summertime.

Regarding the selection of the K subsets from the training data: we use the scikit-learn train_test_split function K times, each time with a different value for random_state. This gives us K unique subsets of the training data, without any duplicated samples (unlike what you might get from bootstrapping). We use the term 'unique' in the sense that no two of the K subsets contain exactly the same training samples.
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split

In other words, RENT does not use bagging, nor any type of bootstrap. Instead, each training subsample is drawn independently and without replacement from the full training dataset, which means that each sample can appear at most once in a single subset. Nevertheless, the same sample can appear in more than one subset. Therefore, each subset is an iid sample from the training dataset. We will clarify this in the README and the paper.
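A minimal sketch of the sampling scheme described above, assuming the subsets are taken as row indices and that `n_samples`, `K` and `train_size` have the illustrative values shown (they are not RENT's defaults). It checks the two properties stated: no duplicates inside a subset, but overlap across subsets; a bootstrap draw is included only for contrast.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples, K, train_size = 100, 10, 0.8  # illustrative values, not RENT defaults
indices = np.arange(n_samples)

# One train_test_split call per model, each with its own random_state.
# Rows are drawn WITHOUT replacement, so no row repeats inside a subset.
subsets = [
    train_test_split(indices, train_size=train_size, random_state=k)[0]
    for k in range(K)
]

# Across subsets the same row can reappear: count how many subsets hold each row.
appearances = np.zeros(n_samples, dtype=int)
for subset in subsets:
    appearances[subset] += 1  # safe because a subset has no duplicate indices

# For contrast, a bootstrap sample (as in bagging) draws WITH replacement.
bootstrap = np.random.default_rng(0).choice(n_samples, size=n_samples, replace=True)
```

With 80 of 100 rows per subset, every subset is duplicate-free, while most rows appear in several of the K subsets; the bootstrap sample, by contrast, contains repeated rows.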

Further, we are considering including a bootstrap option as an alternative sampling strategy in future work, along with other adaptations, such as additional classifiers in place of logistic regression. Another option is to apply RENT as a data preprocessing step for tree-based methods, which often improved their performance in the presence of noisy, high-cardinality features in the data. In this way, we avoided introducing bias into the tree-based models by removing those noisy, high-cardinality features (https://explained.ai/rf-importance/index.html).
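The preprocessing idea could be wired up roughly as follows. This sketch does not use RENT itself: a single elastic-net logistic regression inside scikit-learn's `SelectFromModel` stands in for RENT's feature selection, and the data and hyperparameters are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for RENT: one elastic-net logistic regression whose nonzero
# coefficients decide which features are passed on to the forest.
selector = SelectFromModel(
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=0.5, max_iter=5000),
    threshold=1e-5,  # keep only features with a (numerically) nonzero weight
)
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", selector),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])

X, y = make_classification(n_samples=300, n_features=40, n_informative=5,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
pipe.fit(X_train, y_train)
kept = pipe.named_steps["select"].get_support()  # boolean mask of kept features
```

Putting the selector inside the `Pipeline` means it is fitted only on the training split, so the features removed before the forest is trained are chosen without looking at the test data.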

Although interesting, we have not considered feature bagging, since one of our selection criteria (tau_1) is based on counting how often elastic net selects each feature from the full set of features. That said, this could be an interesting path to follow in future work.
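As a rough illustration of the counting behind tau_1, the sketch below fits K elastic-net logistic regressions on K subsets and records the fraction of models in which each feature receives a nonzero coefficient. It is not RENT's code or API: the function name `selection_frequency`, the hyperparameters and the 0.9 cutoff are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def selection_frequency(X, y, K=20, train_size=0.8, C=0.5, l1_ratio=0.5):
    """Hypothetical helper: fraction of K elastic-net models selecting each feature."""
    counts = np.zeros(X.shape[1])
    for k in range(K):
        # A different random_state per model gives K different subsets.
        X_k, _, y_k, _ = train_test_split(X, y, train_size=train_size, random_state=k)
        X_k = StandardScaler().fit_transform(X_k)
        model = LogisticRegression(penalty="elasticnet", solver="saga",
                                   l1_ratio=l1_ratio, C=C, max_iter=5000)
        model.fit(X_k, y_k)
        # Every model sees the full feature set, so counts share the same base K.
        counts += model.coef_.ravel() != 0
    return counts / K


X, y = make_classification(n_samples=200, n_features=30, n_informative=4,
                           n_redundant=0, random_state=0)
tau_1 = selection_frequency(X, y)
selected = np.flatnonzero(tau_1 >= 0.9)  # hypothetical cutoff
```

Because every model is offered all features, each entry of `tau_1` is a proportion out of the same K. With feature bagging, features would be offered to different numbers of models, so the raw counts would no longer be directly comparable.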

mikldk · 2021-05-31T10:48:59Z

@arunmano121, @maximtrp: Would you be interested in reviewing this submission to The Journal of Open Source Software? Reviews are open and based on a checklist. The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions or concerns please let me know.

maximtrp · 2021-05-31T11:01:45Z

@mikldk Yeah, I'd be glad to! Thank you!

mikldk · 2021-05-31T11:03:42Z

@whedon assign @maximtrp as reviewer

whedon · 2021-05-31T11:03:46Z

OK, @maximtrp is now a reviewer

arunmano121 · 2021-05-31T14:55:32Z

@mikldk - yes, I will be happy to review.

arunmano121 · 2021-05-31T14:58:33Z

@mikldk - yes, I will be happy to review. Thanks.

mikldk · 2021-06-01T08:06:10Z

@whedon add @arunmano121 as reviewer

whedon · 2021-06-01T08:06:15Z

OK, @arunmano121 is now a reviewer

mikldk · 2021-06-01T08:06:51Z

@whedon start review

whedon · 2021-06-01T08:06:56Z

OK, I've started the review over in #3323.

whedon added the pre-review label Apr 30, 2021

whedon added Python TeX labels Apr 30, 2021

Kevin-Mattheus-Moerman added the waitlisted label (Submissions in the JOSS backlog due to reduced service mode) Apr 30, 2021

whedon assigned mikldk May 25, 2021

mikldk removed the waitlisted label (Submissions in the JOSS backlog due to reduced service mode) May 25, 2021

whedon unassigned mikldk May 31, 2021

whedon assigned maximtrp and mikldk May 31, 2021

whedon unassigned mikldk Jun 1, 2021

whedon assigned arunmano121, maximtrp and mikldk and unassigned maximtrp Jun 1, 2021

whedon closed this as completed Jun 1, 2021
