
astartes #120

Closed
19 of 30 tasks
JacksonBurns opened this issue Jul 10, 2023 · 39 comments
Comments

@JacksonBurns

JacksonBurns commented Jul 10, 2023

Submitting Author: Jackson Burns (@JacksonBurns)
All current maintainers: (@kspieks, @himaghna)
Package Name: astartes
One-Line Description of Package: Better Data Splits for Machine Learning
Repository Link: https://github.com/JacksonBurns/astartes
Version submitted: v1.1.2
Editor: @cmarmo
Reviewer 1: @BerylKanali
Reviewer 2: @du-phan
Archive: DOI
Version accepted: v1.1.3
JOSS DOI: DOI
Date accepted (month/day/year): 10/15/2023


Code of Conduct & Commitment to Maintain Package

Description

  • Include a brief paragraph describing what your package does:

note: this is a selection from the abstract of the JOSS paper

Machine Learning (ML) has become an increasingly popular tool to accelerate traditional workflows. Critical to the use of ML is the process of splitting datasets into training, validation, and testing subsets that are used to develop and evaluate models. Common practice in the literature is to assign these subsets randomly. Although this approach is fast and efficient, it only measures a model's capacity to interpolate. Testing errors from random splits may be overly optimistic if given new data that is dissimilar to the scope of the training set; thus, there is a growing need to easily measure performance for extrapolation tasks. To address this issue, we report astartes, an open-source Python package that implements many similarity- and distance-based algorithms to partition data into more challenging splits. Separate from astartes, users can then use these splits to better assess out-of-sample performance with any ML model of choice.
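As a concrete illustration of such a distance-based splitter, the Kennard-Stone algorithm (one of the samplers astartes implements) can be sketched in a few lines of numpy. This is an independent, minimal sketch for intuition only, not the package's actual implementation:

```python
import numpy as np

def kennard_stone_split(X, train_size=0.75):
    # Kennard & Stone (1969): seed with the two most distant points,
    # then repeatedly add the point farthest from the selected set,
    # yielding a training set that spans the feature space.
    n = len(X)
    n_train = int(round(train_size * n))
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # distance of each candidate to its nearest already-selected point
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining.pop(int(np.argmax(min_d))))
    return np.array(selected), np.array(remaining)

X = np.random.default_rng(0).normal(size=(20, 3))
train_idx, test_idx = kennard_stone_split(X, train_size=0.75)
```

Because the most mutually distant points land in the training set, the held-out points tend to fall inside the training set's coverage; other samplers invert this idea to create more extrapolative splits.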

Scope

  • Please indicate which category or categories.
    Check out our package scope page to learn more about our
    scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization¹
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

Domain Specific & Community Partnerships

- [ ] Geospatial
- [ ] Education
- [ ] Pangeo

Community Partnerships

If your package is associated with an
existing community please check below:

  • For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):

    • Who is the target audience and what are scientific applications of this package?

The target audience is data scientists, machine learning scientists, and domain scientists using machine learning. The applications of astartes include rigorous ML model validation, automated featurization of chemical data (with flexibility to add others, and instructions for doing so), and reproducibility.

  • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

We position astartes as a replacement for scikit-learn's train_test_split function, offering greater flexibility in sampling algorithms and the additional train_val_test_split function for more rigorous validation.
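For context on the call shape, a random three-way split (the baseline behavior that astartes' samplers generalize) can be written in plain numpy as follows. This is a stand-alone sketch of the interface idea only; astartes' actual function signatures may differ:

```python
import numpy as np

def train_val_test_split(X, y, train_size=0.8, val_size=0.1, random_state=0):
    # Random three-way split in one call -- the interface idea that
    # astartes extends with non-random samplers. Returns six arrays.
    rng = np.random.default_rng(random_state)
    idx = rng.permutation(len(X))
    n_train = int(train_size * len(X))
    n_val = int(val_size * len(X))
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return X[train], X[val], X[test], y[train], y[val], y[test]

X = np.arange(100).reshape(50, 2)
y = np.arange(50)
X_tr, X_va, X_te, y_tr, y_va, y_te = train_val_test_split(X, y)
```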

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

N/A

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • uses an OSI approved license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a tutorial with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration setup, such as GitHub Actions, CircleCI, and/or others.

Publication Options

JOSS Checks
  • The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
  • The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
  • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    on a separate joss-paper branch
  • The package is deposited in a long-term repository with the DOI: 10.5281/zenodo.7884532

Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PRs, rather than submitting a denser text-based review. It will also allow you to demonstrate addressing the issues via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Confirm each of the following by checking the box.

  • I have read the author guide.
  • I expect to maintain this package for at least 2 years and can help find a replacement for the maintainer (team) if needed.

Please fill out our survey

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

Footnotes

  1. Please fill out a pre-submission inquiry before submitting a data visualization package.

@cmarmo
Member

cmarmo commented Jul 11, 2023

Hi @JacksonBurns,
Thanks for submitting astartes to pyOpenSci and welcome to our submission procedure!
I'm Chiara and I will take care of the process as Editor.

After checking with the editorial board, I'm glad to confirm that your package is considered in scope with a pyOpenSci submission.
I will perform the editor checks by the end of the week, and start looking for reviewers.

In the meantime I have a first question for you, if you don't mind.
While the statistical side of the package is clear and answers a well-known need of the scikit-learn user community, I would like to read a bit more about the package's connections with chemical engineering. I guess your scientific application was the motivation for this development in the first place, but in the documentation the two aspects look quite unrelated. Perhaps a small pedagogical section about chemical applications in the documentation would also be helpful? Thanks!

@cmarmo cmarmo self-assigned this Jul 11, 2023
@JacksonBurns
Author

@cmarmo thanks for the quick response and intro!

Thank you for sharing the update about scope and reviewers.

We will add a section to the documentation about applications to chemical engineering problems in particular - thank you for this suggestion.

@cmarmo
Member

cmarmo commented Jul 13, 2023

Hello @JacksonBurns,

Below are the basic checks that your package needs to pass to begin our review.

Please check our Python packaging guide for more information on the elements below.

  • Installation The package can be installed from a community repository such as PyPI (preferred), and/or a community channel on conda (e.g. conda-forge, bioconda).
    • The package imports properly into a standard Python environment import package-name.
  • Fit The package meets criteria for fit and overlap.
  • Documentation The package has sufficient online documentation to allow us to evaluate package function and scope without installing the package. This includes:
    • User-facing documentation that overviews how to install and start using the package.
    • Short tutorials that help a user understand how to use the package and what it can do for them.
    • API documentation (documentation for your code's functions, classes, methods and attributes): this includes clearly written docstrings with variables defined using a standard docstring format.
  • Core GitHub repository Files
    • README The package has a README.md file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.
    • Contributing File The package has a CONTRIBUTING.md file that details how to install and contribute to the package.
    • Code of Conduct The package has a Code of Conduct file.
    • License The package has an OSI approved license.
      NOTE: We prefer that you have development instructions in your documentation too.
  • Issue Submission Documentation All of the information is filled out in the YAML header of the issue (located at the top of the issue template).
  • Automated tests Package has a testing suite and is tested via a Continuous Integration service.
  • Repository The repository link resolves correctly.
  • Package overlap The package doesn't entirely overlap with the functionality of other packages that have already been submitted to pyOpenSci.
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly.
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

  • Initial onboarding survey was filled out
    We appreciate each maintainer of the package filling out this survey individually. 🙌
    Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. 🙌


Editor comments

The package is already in excellent shape, congratulations and thanks to all the co-authors for your hard work!
I have some comments, they are not meant to be blockers for the review process, just suggestions for possible improvements.

  • It seems to me that the introduction about splitting algorithms is missing some references for those, like me, not really into chemical engineering. In particular I looked for the original work defining the Kennard-Stone algorithm (Kennard, R.W., and Stone, L.A., 1969. Computer aided design of experiments. Technometrics 11, 137-148) and I would like to see it cited in your documentation. The other algorithms also would benefit from some more references, when possible openly accessible, as unfortunately the links to most of the articles are pay-walled.
  • Some external links are not correctly rendered ... I'm afraid I couldn't find a solution for backquoted text with external
    link in sphinx... but I think having the link rendered is more readable. (see for example
    ``AIMSim` <https://vlachosgroup.github.io/AIMSim/README.html>`_ can be replaced by `AIMSim <https://vlachosgroup.github.io/AIMSim/README.html>`_)
  • Having jupyter notebooks in the documentation helps the user to understand how to use the package: however, I wasn't able to run the binder links (because of the initialization failing with the error "no space left on device").
    I know Binder is downsizing, if you don't have alternatives perhaps it would be safer to just remove the links, in order to not unnecessarily pressure the system.
  • The CONTRIBUTING.md file is clear and complete. As users are explicitly asked to provide the package version when opening an issue, I suggest defining an astartes.__version__ attribute and detailing how to obtain this information when debugging.
  • The API section of the documentation would benefit from more examples: the user discovers in it that the package contains a lot more than the train-test-split functions, but it is not always clear how to exploit all these tools.

I think this package may be relevant for the scikit-learn user community, perhaps you might be interested in linking astartes in the related project page among the preprocessor tools, once the review process has come to an end.


I'm going to look for the two reviewers now (good news, I already found one! :) ).
I will be back to you in a week (hopefully sooner), to check if we are able to start the process.

@JacksonBurns
Author

Hi @cmarmo thanks for the speedy work!

  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

I have just turned on the Zenodo automatic packaging releases (which I also just learned was a thing) so hopefully this doesn't fall behind again! I have also uploaded the current version (1.1.1) to Zenodo manually.

The package is already in excellent shape, congratulations and thanks to all the co-authors for your hard work! I have some comments, they are not meant to be blockers for the review process, just suggestions for possible improvements.

Thank you for the kind words!

  • It seems to me that the introduction about splitting algorithms is missing some references for those, like me, not really into chemical engineering. In particular I looked for the original work defining the Kennard-Stone algorithm (Kennard, R.W., and Stone, L.A., 1969. Computer aided design of experiments. Technometrics 11, 137-148) and I would like to see it cited in your documentation. The other algorithms also would benefit from some more references, when possible openly accessible, as unfortunately the links to most of the articles are pay-walled.

The reference for the Kennard-Stone algorithm is in the Implemented Sampling Algorithms table but it is not at all obvious what the links in the References column actually are. We will clarify this, as well as add references where missing.

As far as accessible articles, we can also try and find open-access versions of those which are pay-walled.

  • Some external links are not correctly rendered ... I'm afraid I couldn't find a solution for backquoted text with external
    link in sphinx... but I think having the link rendered is more readable. (see for example
    ``AIMSim` <https://vlachosgroup.github.io/AIMSim/README.html>`_ can be replaced by `AIMSim <https://vlachosgroup.github.io/AIMSim/README.html>`_)

👍 will do.

  • Having jupyter notebooks in the documentation helps the user to understand how to use the package: however, I wasn't able to run the binder links (because of the initialization failing with the error "no space left on device").
    I know Binder is downsizing, if you don't have alternatives perhaps it would be safer to just remove the links, in order to not unnecessarily pressure the system.

We can definitely do that. We will perhaps try to migrate these to target Google Colab.

  • The CONTRIBUTING.md file is clear and complete. As users are explicitly asked to provide the package version when opening an issue, I suggest defining an astartes.__version__ attribute and detailing how to obtain this information when debugging.

👍 sounds good!

  • The API section of the documentation would benefit from more examples: the user discovers in it that the package contains a lot more than the train-test-split functions, but it is not always clear how to exploit all these tools.

👍 will do.

I think this package may be relevant for the scikit-learn user community, perhaps you might be interested in linking astartes in the related project page among the preprocessor tools, once the review process has come to an end.

Didn't know this existed, thanks for sharing! We will do that.

I'm going to look for the two reviewers now (good news, I already found one! :) ). I will be back to you in a week (hopefully sooner), to check if we are able to start the process.

🎉

@JacksonBurns
Author

@cmarmo quick question - we are getting these suggested changes made now (JacksonBurns/astartes#147). Since we submitted astartes v1.1.1, can we go ahead and merge into main, or will that interfere with the review process?

@cmarmo
Member

cmarmo commented Jul 14, 2023

Since we submitted astartes v1.1.1, can we go ahead and merge into main, or will that interfere with the review process?

Would you be OK with making a quick release of the changes before the review starts and updating the description? As long as the version does not change during the review process, I think this is not an issue. Thanks for addressing the comments already!

@JacksonBurns
Author

That sounds good - we will get the changes in and make a release for the reviewers (and update the description) ASAP. Thanks!

@JacksonBurns
Author

Hi @cmarmo we have completed the edits mentioned above and pushed a new version (v1.1.2) and updated the description for this PR. Thanks again!

@cmarmo
Member

cmarmo commented Jul 20, 2023

Hi @JacksonBurns , thank you for your new release: I've still spotted some small issues in the documentation, but I think it's better to move forward with the review.
I'm still looking for the second reviewer. I hope to find someone by the end of the week. Thank you for your patience.

@cmarmo
Member

cmarmo commented Aug 1, 2023

Hello @JacksonBurns , I'm finally back! Thank you for your patience!
Sorry it took more time than expected.

@BerylKanali, @du-phan welcome! 👋
Thank you for volunteering to review for pyOpenSci!

Please take some time to introduce yourself here before starting the review.

Please fill out our pre-review survey

Before beginning your review, please fill out our pre-review survey. This helps us improve all aspects of our review and better understand our community. No personal data will be shared from this survey - it will only be used in an aggregated format by our Executive Director to improve our processes and programs.

  • Beryl Kanali survey completed.
  • Du Phan survey completed.

The following resources will help you complete your review:

  1. Here is the reviewers guide. This guide contains all of the steps and information needed to complete your review.
  2. Here is the review template that you will need to fill out and submit
    here as a comment, once your review is complete.

Note that we ask reviewers to complete reviews in three weeks; the review for astartes is therefore due August 22nd. Please let me know if you have any major issue preventing you from meeting this deadline.
Please get in touch with any questions or concerns!

Thanks again for your commitment!

@du-phan

du-phan commented Aug 7, 2023

Hi team,
Happy to help with this review! I'm a research scientist at Dataiku, a software company developing a data science platform. We are part of the scikit-learn consortium and have collaborated with them for quite some time. We have also developed and maintained our own open-source packages for specific ML use cases, for example Cardinal for active learning.

@BerylKanali

Hi everyone,

My name is Beryl Kanali, I am a Data Scientist and Open Source Advocate. I am also currently a graduate student. I like contributing to open source and I am happy to be a reviewer for astartes.

@cmarmo I have completed the pre-review survey.

@JacksonBurns
Author

Thanks @du-phan and @BerylKanali for volunteering your time for this review, we appreciate it! Continued thanks to @cmarmo as well for orchestrating the review process.

We look forward to seeing your reviews.

p.s. @du-phan I hear about Dataiku all the time - you are one of the key sponsors to my hometown National Public Radio Station WVXU!

@BerylKanali

Hi team,
I am almost done with the review; I will finish by the end of tomorrow, which is the deadline.

@du-phan

du-phan commented Aug 24, 2023

Hi Jackson,

Sorry that my review is taking a while; August is always a slow month.

Thank you for your contribution! It is a nice package.

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README:
  • Installation instructions: for the development version of the package and any non-standard dependencies in README.
  • Vignette(s) demonstrating major functionality that runs successfully locally.
  • Function Documentation: for all user-facing functions.
  • Examples for all user-facing functions.
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for:
    • Continuous integration and test coverage,
    • Docs building (if you have a documentation website),
    • A repostatus.org badge,
    • Python versions supported,
    • Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than it is high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)

  • Short description of package goals.
  • Package installation instructions
  • Any additional setup required to use the package (authentication tokens, etc.)
  • Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
    • Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  • Link to your documentation website.
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
  • Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

  • Package documentation is clear and easy to find and use.
  • The need for the package is clear
  • All functions have documentation and associated examples for use
  • The package is easy to install

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests:
    • All tests pass on the reviewer's local machine for the package version submitted by the author. Ideally this should be a tagged version making it easy for reviewers to install.
    • Tests cover essential functions of the package and a reasonable range of inputs and conditions.
  • Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.
    A few notable highlights to look at:
    • Package supports modern versions of Python and not End of life versions.
    • Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 4-5


Review Comments:

Overview

The astartes package tackles an important step in the machine learning pipeline that is usually overlooked: generating robust splits for training and validating model performance. It provides several strategies to go beyond classical random sampling and thus allows data practitioners to evaluate their models under different scenarios of interpolation/extrapolation.

Usage
  • The two main functions are train_test_split() and train_val_test_split(), which are intuitive enough for users, as they just need to swap out sklearn's function for these. That's great!
  • However, the devil is in the details: depending on the kind of sampler used (extrapolative vs. interpolative), the number of returned arrays is not the same (cluster ids can be returned too). As this point is not clear in the docs, users can get confused (I was).
  • Some methods do not support categorical features, which is understandable once one knows how the samplers work, but maybe those details can be added somewhere in the docs.
  • Being able to use the samplers directly is a nice addition!
  • The generate_regression_results_dict is a helpful utility function, but it looks a bit "experimental" to me, as 1) it only supports regression models and 2) users cannot choose the desired metrics. It feels like with a bit of work it could become something more generic.
Documentation
  • The statement of need is lacking in the README; maybe you can add the one from the paper?
  • In the doc, the difference between extrapolative and interpolative sampling is discussed quite briefly. It was by working through the split_comparison notebook that I understood the "weird" behaviors of train_test_split() that I observed above. Maybe we can make this information easier for users to find.

@JacksonBurns
Author

Hi @du-phan thank you for the review!

  • However, the devil is in the details: depending on the kind of sampler used (extrapolative vs. interpolative), the number of returned arrays is not the same (cluster ids can be returned too). As this point is not clear in the docs, users can get confused (I was).

We have clarified this in the README, and in doing so referenced the demonstration notebook which you mentioned was useful as an explainer.

  • Some methods do not support categorical features, which is understandable once one knows how the samplers work, but maybe those details can be added somewhere in the docs.

👍 added details to the docs!

  • The generate_regression_results_dict is a helpful utility function, but it looks a bit "experimental" to me, as 1) it only supports regression models and 2) users cannot choose the desired metrics. It feels like with a bit of work it could become something more generic.

We have added the ability to pass custom metrics to the function, and have ideas about adding a more generic generate_results_dict function which would be wrapped by generate_regression_results_dict and generate_classification_results_dict - although we tend to focus on regression.
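A generic helper of the kind described could look roughly like the following. The function name and signature here are purely illustrative, not astartes' actual API:

```python
def generate_results_dict(y_true, y_pred, metrics):
    # Hypothetical generic results helper: apply each user-supplied
    # metric callable to the true/predicted values and collect the
    # scores under the user's chosen names.
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}

# example with a hand-rolled mean-absolute-error metric
mae = lambda t, p: sum(abs(a - b) for a, b in zip(t, p)) / len(t)
results = generate_results_dict([1.0, 2.0], [1.5, 2.0], {"mae": mae})
```

Task-specific wrappers (regression, classification) could then just pre-populate the `metrics` mapping with sensible defaults.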

  • The statement of needs is lacking in the README, maybe you can add the one in the paper to it?

👍 added!

The changes I have described here are in this draft pull request on astartes if you would like to take a look to see if they satisfy your comments here. After the review from @BerylKanali we will convert this into a normal Pull Request to finish the review process.

Thanks again!

@BerylKanali

BerylKanali commented Sep 6, 2023

Hi Jackson,

Thank you for your patience, I am now settled and here is my review.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README.
  • Installation instructions: for the development version of the package and any non-standard dependencies in README.
  • Vignette(s) demonstrating major functionality that runs successfully locally.
  • Function Documentation: for all user-facing functions.
  • Examples for all user-facing functions.
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for:
    • Continuous integration and test coverage,
    • Docs building (if you have a documentation website),
    • A repostatus.org badge,
    • Python versions supported,
    • Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than it is high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)

  • Short description of package goals.
  • Package installation instructions
  • Any additional setup required to use the package (authentication tokens, etc.)
  • Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
    • Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  • Link to your documentation website.
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
  • Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

  • Package documentation is clear and easy to find and use.
  • The need for the package is clear
  • All functions have documentation and associated examples for use
  • The package is easy to install

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests:
    • All tests pass on the reviewer's local machine for the package version submitted by the author. Ideally this should be a tagged version making it easy for reviewers to install.
    • Tests cover essential functions of the package and a reasonable range of inputs and conditions.
  • Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.
    A few notable highlights to look at:
    • Package supports modern versions of Python and not End of life versions.
    • Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 4 hours 40 minutes


Review Comments

As a person who has used sklearn many times, I am impressed by the astartes package. Being able to quickly analyze the impact of sampling algorithms and to evaluate a model even further under different interpolation/extrapolation cases is an important step that will help in machine learning.
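For readers unfamiliar with distance-based splitting, a minimal pure-Python sketch of a Kennard-Stone-style greedy maximin selection may help illustrate the idea. This is a simplified illustration only, not astartes' actual implementation, and the function name is hypothetical:

```python
def kennard_stone_indices(points, n_select):
    """Greedy maximin selection: a simplified Kennard-Stone-style sketch."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    n = len(points)
    # Seed the selection with the two mutually most-distant points.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist(points[ij[0]], points[ij[1]]),
    )
    selected = [i0, j0]
    while len(selected) < n_select:
        remaining = [k for k in range(n) if k not in selected]
        # Add the point farthest from its nearest already-selected neighbor.
        nxt = max(
            remaining,
            key=lambda k: min(dist(points[k], points[s]) for s in selected),
        )
        selected.append(nxt)
    return selected
```

Points selected first form a maximally diverse subset, which is what makes such splits a harder, extrapolation-like test for a model.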

Documentation

  • The statement of need is missing. A user should be able to know why they need the package.
    However, I have gone through the draft pull request as well and I believe the need has been addressed; I can see two clear needs, which is good.
  • I also think it is better to have a clear target audience. With that statement of need, we should be able to answer whether the package is useful in machine learning generally or only for those applying it to chemical data. This is mostly helpful to beginners when they are trying to decide if the package is useful to them. That is not clear yet.
  • Any plans on adding vignettes?
  • Suggestion: In the future you could add small sections on installation for Windows, macOS, and Linux, like what you have done with pip and conda in the documentation. This has proven helpful, especially in promoting the inclusion of underrepresented persons whose first language is not English: it goes directly to what they might need, reducing what they have to read through.

Functionality
The package installation was easy and as per the instructions. The functional and performance claims are satisfied.

Usability
The functions are easy to use and remember, especially if you have used sklearn before.

  • Do all methods support categorical data? This has been addressed in the draft PR and it is easier now to know which methods support it and how to use them.
  • Are there any suggestions or ongoing work to improve generate_regression_results_dict to make it more generic? Also, is there adequate information on what it can presently do?
  • From the draft PR I can see the issue on number of returned arrays as mentioned by @du-phan.

@cmarmo
Copy link
Member

cmarmo commented Sep 6, 2023

Thank you Du and Beryl for your work!
I have changed the status of the issue as "awaiting changes".
@JacksonBurns do you think a three weeks deadline will allow you to address the comments?

@JacksonBurns
Copy link
Author

Hmm, I see. Yes, since you import all samplers in that util, and that util is used by main, it will slow down most public imports of this package. It might be best to leave it as is. There is a workaround, where the factory only does the imports needed within the get_sampler method, rather than taking that hit up front for all possibilities when the required samplers aren't yet known. But that might be too confusing to maintain. Plus, doing an import during instance creation will slow it down a lot (still faster overall execution time, but the specific factory calls could be noticeably slow).

So the alternative would be something like:

class SamplerFactory:
    def get_sampler_1(self):
        from sampler_source import sampler_1
        return sampler_1()

I think my auto-formatters would be mad at that - I wonder why the default suggestion is to always have imports at the top? Agreed that regardless, this would have some maintenance and runtime costs. I've learned something, though!
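A registry-based variant of the deferred-import factory discussed above can keep auto-formatters happy, since the only top-level import is importlib. This is a sketch of the general pattern, not astartes' code; stdlib classes stand in here for the (hypothetical) sampler classes:

```python
import importlib

# name -> (module path, class name); nothing is imported until requested.
# The real entries would point at sampler modules; stdlib stand-ins are
# used here purely so the sketch is runnable.
_REGISTRY = {
    "random": ("random", "Random"),
    "counter": ("collections", "Counter"),
}

def get_sampler(name):
    """Import the requested class's module only on first use."""
    module_path, class_name = _REGISTRY[name]
    module = importlib.import_module(module_path)
    return getattr(module, class_name)()
```

The import cost is paid on the first call for a given sampler rather than at package import time, which is the trade-off described in the comment above.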

@JacksonBurns
Copy link
Author

> Having smaller and focused pull requests will simplify the work of the reviewers looking specifically at the modifications addressing their comments. Do you think this is doable in a week? Thank you very much for your time!

Happy to do that - will go about it now.

@JacksonBurns
Copy link
Author

JacksonBurns commented Sep 22, 2023

@cmarmo, I have split the review responses into two small, easier to review pull requests. All updates to the documentation are in JacksonBurns/astartes#154 (statement of need, example updates, etc.) and fixes to the source code (single-source package version, remove pointless statements, etc.) are in JacksonBurns/astartes#155. I hope this is better - I can subdivide further if it would be helpful.

@cmarmo
Copy link
Member

cmarmo commented Sep 24, 2023

Thank you @JacksonBurns , this is already more reviewer friendly... :)
Let's see if @du-phan and @BerylKanali are happy with the changes then.
Dear reviewers, do you mind checking the PRs and letting us know in the next two weeks whether they address your comments?
Thank you!

@cmarmo
Copy link
Member

cmarmo commented Oct 8, 2023

Hello @du-phan and @BerylKanali , do you mind letting us know if Jackson addressed all your concerns in JacksonBurns/astartes#154 and JacksonBurns/astartes#155?
Thank you so much for your collaboration! 🙏

@BerylKanali
Copy link

BerylKanali commented Oct 11, 2023

Hi @cmarmo and @JacksonBurns thanks for the reminder.
I have looked at both PRs.
Documentation
Good job on addressing the concerns and naming the commits in a way that is easy to identify what was being worked on. I have also gone through the README and the documentation page.
I believe all my concerns have been addressed.

Code
I have gone through the commits and the issues they address and I believe you have addressed and explained everything well. I have no additional concerns on the code or functionality.

I have updated the checklist above.

Good job and congratulations on a job well done!

@JacksonBurns
Copy link
Author

Thank you @BerylKanali!

@cmarmo I will wait for approval from @du-phan before merging any changes!

@du-phan
Copy link

du-phan commented Oct 11, 2023

Hi @JacksonBurns,

The new document and code update are in great shape, thank you for your time and contribution!

I have updated the check list in my initial comment!

@JacksonBurns
Copy link
Author

Thanks @du-phan!

@cmarmo can I go ahead and merge the two PRs?

@cmarmo
Copy link
Member

cmarmo commented Oct 11, 2023

can I go ahead and merge the two PRs?

Sure! go ahead and congratulations!
I'm going to finalize the editor's checks in the next days.

JacksonBurns added a commit to JacksonBurns/astartes that referenced this issue Oct 11, 2023
This pull request contains updates to the documentation as part of the
PyOpenSci review (see
pyOpenSci/software-submission#120).

Each commit message contains additional clarifying details.
JacksonBurns added a commit to JacksonBurns/astartes that referenced this issue Oct 11, 2023
This pull request contains updates to the source code as part of the
PyOpenSci review (see
pyOpenSci/software-submission#120).

Each commit message contains additional clarifying details.
@cmarmo
Copy link
Member

cmarmo commented Oct 15, 2023

Dear @JacksonBurns,

🎉 astartes has been approved by pyOpenSci! Thank you for your submission and many thanks to @BerylKanali and @du-phan for reviewing this package! 😸
Maintainer and reviewers, do you mind filling out the post-review survey?

The actions needed to finalize this review are detailed below.

Author Wrap Up Tasks

There are a few things left to do to wrap up this submission:

  • Activate Zenodo watching the repo if you haven't already done so.
  • Tag and create a release to create a Zenodo version and DOI.
  • Add the badge for pyOpenSci peer-review to the README.md of . The badge should be [![pyOpenSci](https://tinyurl.com/y22nb8up)](https://github.com/pyOpenSci/software-review/issues/issue-number).
  • Please fill out the post-review survey. All maintainers and reviewers should fill this out.

It looks like you would like to submit this package to JOSS. Here are the next steps:

  • Once the JOSS issue is opened for the package, we strongly suggest that you subscribe to issue updates. This will allow you to continue to update the issue labels on this review as it goes through the JOSS process.
  • Login to the JOSS website and fill out the JOSS submission form using your Zenodo DOI. When you fill out the form, be sure to mention and link to the approved pyOpenSci review. JOSS will tag your package for expedited review if it is already pyOpenSci approved.
  • Wait for a JOSS editor to approve the presubmission (which includes a scope check).
  • Once the package is approved by JOSS, you will be given instructions by JOSS about updating the citation information in your README file.
  • When the JOSS review is complete, add a comment to your review in the pyOpenSci software-review repo here that it has been approved by JOSS. An editor will then add the JOSS-approved label to this issue.

Editor Final Checks

Please complete the final steps to wrap up this review. Editor, please do the following:

  • Make sure that the maintainers filled out the post-review survey
  • Invite the maintainers to submit a blog post highlighting their package. Feel free to use / adapt language found in this comment to help guide the author.
  • Change the status tag of the issue to 6/pyOS-approved 🚀🚀🚀.
  • If the author submits to JOSS, please continue to update the labels for JOSS on this issue until the author is accepted (do not remove the 6/pyOS-approved label). Once accepted add the label 9/joss-approved to the issue. Skip this check if the package is not submitted to JOSS.

If you have any feedback for us about the review process please feel free to share it here. We are always looking to improve our process and documentation in the peer-review-guide.

@JacksonBurns
Copy link
Author

Dear @JacksonBurns,

🎉 astartes has been approved by pyOpenSci! Thank you for your submission and many thanks to @BerylKanali and @du-phan for reviewing this package! 😸 Maintainer and reviewers, do you mind filling out the post-review survey?

🎉! @kspieks, @himaghna, and I are filling out the survey!

* [ ]  Activate [Zenodo](https://zenodo.org/) watching the repo if you haven't already done so.

Already done 👍

* [ ]  Tag and create a release to create a Zenodo version and DOI.

* [ ]  Add the badge for pyOpenSci peer-review to the README.md of . The badge should be `[![pyOpenSci](https://tinyurl.com/y22nb8up)](https://github.com/pyOpenSci/software-review/issues/issue-number)`.

Will do!

It looks like you would like to submit this package to JOSS. Here are the next steps:

* [ ]   Once the JOSS issue is opened for the package, we strongly suggest that you subscribe to issue updates. This will allow you to continue to update the issue labels on this review as it goes through the JOSS process.

Will do!

* [ ]  Login to the JOSS website and fill out the JOSS submission form using your Zenodo DOI. **When you fill out the form, be sure to mention and link to the approved pyOpenSci review.** JOSS will tag your package for expedited review if it is already pyOpenSci approved.

👍

I will keep this issue up-to-date as the reviews progress.

Thanks again to @cmarmo, @BerylKanali, and @du-phan for volunteering your time!

@cmarmo
Copy link
Member

cmarmo commented Nov 12, 2023

@JacksonBurns congratulations! astartes has been accepted on JOSS!
I have announced it "officially" on our Discourse too... ;)
If you and your fellow maintainers have been happy with the pyOpenSci review process, feel free to let the world know via a blog post or any kind of testimonial... :)

@BerylKanali and @du-phan, thank you once more for your work as reviewers: if you have some time to fill out the post-review survey, this will help us a lot.

I am going to close the issue now.
Let's go celebrate ! 🥳

@cmarmo cmarmo closed this as completed Nov 12, 2023
@BerylKanali
Copy link

Congratulations! @JacksonBurns
Thank you @cmarmo @lwasser and @ucodery for the opportunity and mentorship as a first time reviewer.

@cmarmo I have responded to the post-review survey.
