
[REVIEW]: An open learning resource on Reproducible Data Science with Open-Source Python Tools and Real-World Data #156

Closed
44 tasks done
whedon opened this issue Jan 12, 2022 · 77 comments
Labels
accepted · CSS · HTML · Jupyter Notebook · published (Papers published in JOSE) · recommend-accept (Papers recommended for acceptance in JOSE) · review

Comments

@whedon

whedon commented Jan 12, 2022

Submitting author: @valdanchev (Valentin Danchev)
Repository: https://github.com/valdanchev/reproducible-data-science-python
Version: v2.1.2
Editor: @ShanEllis
Reviewer: @TomDonoghue, @lechten
Archive: 10.5281/zenodo.6895578

⚠️ JOSE reduced service mode ⚠️

Due to the challenges of the COVID-19 pandemic, JOSE is currently operating in a "reduced service mode".

Status


Status badge code:

HTML: <a href="https://jose.theoj.org/papers/3e1de7c74161a5b2c4ce74e536ef6898"><img src="https://jose.theoj.org/papers/3e1de7c74161a5b2c4ce74e536ef6898/status.svg"></a>
Markdown: [![status](https://jose.theoj.org/papers/3e1de7c74161a5b2c4ce74e536ef6898/status.svg)](https://jose.theoj.org/papers/3e1de7c74161a5b2c4ce74e536ef6898)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@TomDonoghue & @lechten, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/jose-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions or concerns, please let @ShanEllis know.

Please start on your review when you are able, and be sure to complete it within the next six weeks at the very latest.

Review checklist for @TomDonoghue

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source for this learning module available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of a standard license? (OSI-approved for code, Creative Commons for content)
  • Version: v2.1.2
  • Authorship: Has the submitting author (@valdanchev) made visible contributions to the module? Does the full list of authors seem appropriate and complete?

Documentation

  • A statement of need: Do the authors clearly state the need for this module and who the target audience is?
  • Installation instructions: Is there a clearly stated list of dependencies?
  • Usage: Does the documentation explain how someone would adopt the module, and include examples of how to use it?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the module 2) Report issues or problems with the module 3) Seek support

Pedagogy / Instructional design (Work-in-progress: reviewers, please comment!)

  • Learning objectives: Does the module make the learning objectives plainly clear? (We don't require explicitly written learning objectives; only that they be evident from content and design.)
  • Content scope and length: Is the content substantial for learning a given topic? Is the length of the module appropriate?
  • Pedagogy: Does the module seem easy to follow? Does it observe guidance on cognitive load? (working memory limits of 7 +/- 2 chunks of information)
  • Content quality: Is the writing of good quality, concise, engaging? Are the code components well crafted? Does the module seem complete?
  • Instructional design: Is the instructional design deliberate and apparent? For example, exploit worked-example effects; effective multi-media use; low extraneous cognitive load.

JOSE paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Does the paper clearly state the need for this module and who the target audience is?
  • Description: Does the paper describe the learning materials and sequence?
  • Does it describe how it has been used in the classroom or other settings, and how someone might adopt it?
  • Could someone else teach with this module, given the right expertise?
  • Does the paper tell the "story" of how the authors came to develop it, or what their expertise is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

Review checklist for @lechten

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source for this learning module available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of a standard license? (OSI-approved for code, Creative Commons for content)
  • Version: v2.1.2
  • Authorship: Has the submitting author (@valdanchev) made visible contributions to the module? Does the full list of authors seem appropriate and complete?

Documentation

  • A statement of need: Do the authors clearly state the need for this module and who the target audience is?
  • Installation instructions: Is there a clearly stated list of dependencies?
  • Usage: Does the documentation explain how someone would adopt the module, and include examples of how to use it?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the module 2) Report issues or problems with the module 3) Seek support

Pedagogy / Instructional design (Work-in-progress: reviewers, please comment!)

  • Learning objectives: Does the module make the learning objectives plainly clear? (We don't require explicitly written learning objectives; only that they be evident from content and design.)
  • Content scope and length: Is the content substantial for learning a given topic? Is the length of the module appropriate?
  • Pedagogy: Does the module seem easy to follow? Does it observe guidance on cognitive load? (working memory limits of 7 +/- 2 chunks of information)
  • Content quality: Is the writing of good quality, concise, engaging? Are the code components well crafted? Does the module seem complete?
  • Instructional design: Is the instructional design deliberate and apparent? For example, exploit worked-example effects; effective multi-media use; low extraneous cognitive load.

JOSE paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Does the paper clearly state the need for this module and who the target audience is?
  • Description: Does the paper describe the learning materials and sequence?
  • Does it describe how it has been used in the classroom or other settings, and how someone might adopt it?
  • Could someone else teach with this module, given the right expertise?
  • Does the paper tell the "story" of how the authors came to develop it, or what their expertise is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
@whedon

whedon commented Jan 12, 2022

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @TomDonoghue it looks like you're currently assigned to review this paper 🎉.

⚠️ JOSS reduced service mode ⚠️

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

⭐ Important ⭐

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/jose-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that with GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/jose-reviews:

  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

@whedon

whedon commented Jan 12, 2022

Wordcount for paper.md is 1603

@whedon

whedon commented Jan 12, 2022

Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=1.30 s (88.3 files/s, 108947.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
HTML                            22           1735            128          37365
SVG                              6              0             13           9323
JavaScript                      16           2342           2391           8757
Jupyter Notebook                32              0          71448           2037
Python                          13           1433            546           1963
CSS                             14            237             96           1321
Markdown                         7             82              0            251
TeX                              2             23              0            238
YAML                             2             16             16             52
JSON                             1              0              0              1
-------------------------------------------------------------------------------
SUM:                           115           5868          74638          61308
-------------------------------------------------------------------------------


Statistical information for the repository '75c25ff7f7e841b795a1a850' was
gathered on 2022/01/12.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
valdanchev                       5         17466             34          100.00

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Valentin Danchev          17432          100.0          0.0               16.85

@whedon

whedon commented Jan 12, 2022

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@whedon

whedon commented Jan 12, 2022

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1162/99608f92.cb0fa8d2 is OK
- 10.17226/25104 is OK
- 10.1145/3408877.3432586 is OK
- 10.1371/journal.pcbi.1008770 is OK
- 10.1080/09332480.2019.1579578 is OK
- 10.21105/joss.03021 is OK
- 10.18148/srm/2020.v14i2.7746 is OK
- 10.1371/journal.pcbi.1007007 is OK

MISSING DOIs

- 10.1038/d41586-018-07196-1 may be a valid DOI for title: Why Jupyter is data scientists’ computational notebook of choice
- 10.25080/majora-92bf1922-00a may be a valid DOI for title: Data Structures for Statistical Computing in Python
- 10.25080/majora-92bf1922-011 may be a valid DOI for title: statsmodels: Econometric and statistical modeling with python

INVALID DOIs

- None

@TomDonoghue

TomDonoghue commented Jan 25, 2022

I've worked through my review, including exploring the content and checking that I can run some examples through Binder. As a note, I did skim the notebooks and read sections, but did not do a full read of all the content. I've checked off most items above, with the remainder left open for the small potential updates detailed here.

I have some practical suggestions that I have described in detail in issues on the resource repository.

Other notes:

  • In terms of requirements, I think there is an implicit dependency whereby some notebooks only work on Colab (not Binder), because they load the google module. This is captured in a pre-existing issue. I don't think this is a huge issue per se, but some notes in the notebooks about this kind of extra dependency (what works where and/or what would need to be installed) could be added somewhere; see the sketch after this list.
  • I did not find a version identifier, though I admit I'm not entirely sure how versioning is supposed to work for this kind of resource.
  • Though it's perhaps implicit, for Community Guidelines an explicit note in the README and/or on the textbook landing page about how to make a suggestion or report a problem (open an issue, presumably) might be useful, especially for students who may not be familiar with standard GitHub practices.
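
To make the Colab/Binder dependency note concrete (see the first bullet above), here is a minimal sketch of the kind of guard a notebook could add, assuming the Colab-only code is the usual google.colab import; the fallback data path is hypothetical and for illustration only:

# Minimal sketch (a reviewer-style suggestion, not the authors' code): guard
# the Colab-only import so the same notebook also runs on Binder or locally.
try:
    from google.colab import drive  # available only inside Google Colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    drive.mount("/content/drive")  # Colab-specific data access
else:
    data_path = "data/"  # hypothetical: use data shipped with the repository

A short note like this at the top of each affected notebook would make explicit what works where and what would need to be installed.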

In terms of the literal content, I don't have as much to say. Based on skimming it all, the content (ideas and concepts) looks good to me. It's a bit tricky for me to know exactly how an unfamiliar student would feel about the progression. My sense is that a lot is introduced quite quickly, and one might need a basic sense of coding to really be able to jump in, but taken as a "crash course" I think it broadly works and is a useful resource to have available.

My one more practical content note is about end_to_end_data_science_project, which in particular feels like jumping in at the deep end. It could be more explicit that it skims through topics as an introduction (it effectively overviews the course), that these topics are dealt with in more detail in later notebooks, and that it might make sense for students to "jump ahead" and practice those topics there first (at first I even thought this notebook was perhaps supposed to come at the end).

Overall, I think this looks like an interesting resource, its strengths being that it:

  • introduces a lot of topics in data science, with a focus on practical elements (working with data, reproducibility, ethics, etc)
  • has a lot of links to other resources to map out other places to learn things
  • uses some real data to demonstrate these things in practice, and is re-runnable by students
  • is focused on a particular area / group of students (social science / sociology) for whom there may be fewer dedicated materials

Conclusion: if the relatively minor notes I've mentioned can be fixed up, my overall impression is that this is a useful resource, responsive to JOSE's aims and requirements, and after the review edits I'd be happy to sign off on it as a reviewer.

@whedon

whedon commented Jan 26, 2022

👋 @TomDonoghue, please update us on how your review is going (this is an automated reminder).

@valdanchev

Thank you so much for these very helpful comments, @TomDonoghue. I will go through all of your notes and issues and write back when I have a revised version that addresses them.

@lechten

lechten commented Feb 14, 2022

I agree with all points raised by @TomDonoghue above. In summary, this is a valuable and ambitious course/resource.

I’d like to emphasize that I also believe the first notebook needs more description: How might it be presented to students? What is expected of them? What should they be able to do?

For other comments, I opened issues in the source repository.

@valdanchev

Thanks so much, @lechten, for your very helpful comments and suggestions for improvements. The pointer to nbqa is extremely useful. My teaching-heavy term will be over in a few weeks, and I am looking forward to then revising the resource based on your and @TomDonoghue's comments.
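
For readers unfamiliar with nbqa: it runs standard Python code-quality tools on the code cells of Jupyter notebooks. A hedged sketch of typical usage, following nbqa's documented invocation pattern nbqa <tool> <notebook> (the notebook filename here is hypothetical):

pip install -U nbqa flake8 black
nbqa flake8 exploring_data.ipynb   # lint the notebook's code cells
nbqa black exploring_data.ipynb    # auto-format the notebook's code cells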

@ShanEllis

Thanks to both reviewers!

@valdanchev Just checking in on your current expected timeline? (I totally understand the chaos that is the end of a teaching-heavy term... so no rush; I just wanted to get a possible timeline established.)

@valdanchev

Totally understand, @ShanEllis. I'll plan to finalise all the revisions by the end of April, if that sounds good.

@valdanchev

valdanchev commented May 18, 2022

@ShanEllis thank you for organising the review process, @TomDonoghue and @lechten thank you for your very helpful and constructive feedback and suggestions. The review process was very valuable for improving the resource, repository, and paper. Sorry for my delayed revisions.

I have now made the revisions and addressed all the feedback and comments above, as well as issues #5, #6, #7, #8, #9, and #10. Specifically,

I've shortened the title of the resource to "Reproducible Data Science with Python" and would like to also shorten the title of the paper to "Reproducible Data Science with Python: An open learning resource".

Thank you again for the very constructive and transparent review process and feedback.

@lechten

lechten commented May 19, 2022

@whedon generate pdf

@whedon

whedon commented May 19, 2022

PDF failed to compile for issue #156 with the following error:

 pandoc: paper.bib: openBinaryFile: does not exist (No such file or directory)
Looks like we failed to compile the PDF

@ShanEllis

@whedon set 10.5281/zenodo.6895578 as archive

@whedon

whedon commented Jul 24, 2022

OK. 10.5281/zenodo.6895578 is the archive.

@ShanEllis

@whedon recommend-accept

@whedon added the recommend-accept label Jul 24, 2022
@whedon

whedon commented Jul 24, 2022

Attempting dry run of processing paper acceptance...

@labarba

labarba commented Aug 10, 2022

@whedon generate pdf

@valdanchev

@whedon generate pdf

@whedon

whedon commented Aug 10, 2022

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@valdanchev

@labarba & @ShanEllis — I think all should work now. There was a problem similar to the one Arfon identified before (a redundant paper.md file was generated after an update just before archiving the repository on Zenodo), which is now corrected. The PDF can be generated again, and hopefully the other commands will work too.

@ShanEllis

@whedon recommend-accept

@whedon

whedon commented Aug 23, 2022

Attempting dry run of processing paper acceptance...

@whedon

whedon commented Aug 23, 2022

👋 @openjournals/jose-eics, this paper is ready to be accepted and published.

Check final proof 👉 openjournals/jose-papers#101

If the paper PDF and Crossref deposit XML look good in openjournals/jose-papers#101, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true

@whedon

whedon commented Aug 23, 2022

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1162/99608f92.cb0fa8d2 is OK
- 10.17226/25104 is OK
- 10.1145/3408877.3432586 is OK
- 10.5281/zenodo.3233853 is OK
- 10.1371/journal.pcbi.1008770 is OK
- 10.3233/978-1-61499-649-1-87 is OK
- 10.1038/d41586-018-07196-1 is OK
- 10.1080/09332480.2019.1579578 is OK
- 10.48550/arXiv.1201.0490 is OK
- 10.25080/Majora-92bf1922-00a is OK
- 10.21105/joss.03021 is OK
- 10.25080/Majora-92bf1922-011 is OK
- 10.18148/srm/2020.v14i2.7746 is OK
- 10.48550/arXiv.2004.04145 is OK
- 10.1093/comjnl/27.2.97 is OK
- 10.1371/journal.pcbi.1007007 is OK
- 10.25080/Majora-4af1f417-011 is OK
- 10.1201/9781003080978 is OK

MISSING DOIs

- None

INVALID DOIs

- None

@labarba

labarba commented Oct 23, 2022

hi @valdanchev — it looks like this one slipped through the cracks, sorry (I have total email overload). On a quick browse, I noticed that the repo's latest tag is v2.1.1, while the version here is noted as v2.0.0. Meanwhile, you have a lot of newer commits. (Unfortunately, the commit messages are not meaningful, as all are titled "Updates" — I do urge you to revisit this practice!)

Can you make a tagged release with all the latest changes, corresponding to the reviewed and revised JOSE submission? Then report the version number here and we will update it.
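
For anyone following along, a tagged release of this kind uses standard git commands; a minimal sketch, with an illustrative tag message:

git tag -a v2.1.2 -m "JOSE publication release: review revisions"
git push origin v2.1.2

On GitHub, the pushed tag can then be published as a release from the repository's Releases page; if the GitHub–Zenodo integration is enabled, Zenodo archives the release automatically.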

@valdanchev

@labarba - many thanks, this is very helpful! I have now made a new tagged release corresponding to the JOSE publication (with informative commit messages added) and updated the Zenodo archive; details are below. Let me know if any additional changes are needed. Thank you again.

Latest repo version: v2.1.2
Zenodo DOI: 10.5281/zenodo.7244097

@valdanchev

@whedon generate pdf

@whedon

whedon commented Oct 24, 2022

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@labarba

labarba commented Oct 24, 2022

@whedon set v2.1.2 as version

@whedon

whedon commented Oct 24, 2022

OK. v2.1.2 is the version.

@labarba

labarba commented Oct 24, 2022

hi @valdanchev — One last thing: we request that authors edit the metadata of the Zenodo deposit so that the title and author list match the JOSE paper. It's just cleaner that way, as readers see these as part of the "same scholarly object." Could you do that?

@valdanchev

hi @labarba — thank you, more than happy to do that; I've just updated the metadata of the deposit.

@labarba

labarba commented Oct 24, 2022

@whedon accept deposit=true

@whedon

whedon commented Oct 24, 2022

Doing it live! Attempting automated processing of paper acceptance...

@whedon added the accepted and published labels Oct 24, 2022
@whedon

whedon commented Oct 24, 2022

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSE! 🚨🚨🚨

Here's what you must now do:

  1. Check final PDF and Crossref metadata that was deposited 👉 Creating pull request for 10.21105.jose.00156 jose-papers#107
  2. Wait a couple of minutes, then verify that the paper DOI resolves https://doi.org/10.21105/jose.00156
  3. If everything looks good, then close this review issue.
  4. Party like you just published a paper! 🎉🌈🦄💃👻🤘

Any issues? Notify your editorial technical team...

@labarba

labarba commented Oct 24, 2022

Congratulations, @valdanchev, your JOSE paper is accepted! 🚀

Huge thanks to our editor: @ShanEllis and the reviewers: @TomDonoghue, @lechten — your contribution makes this adventure possible 🙏

@labarba closed this as completed Oct 24, 2022
@whedon

whedon commented Oct 24, 2022

🎉🎉🎉 Congratulations on your paper acceptance! 🎉🎉🎉

If you would like to include a link to your paper from your README, use the following code snippets:

Markdown:
[![DOI](https://jose.theoj.org/papers/10.21105/jose.00156/status.svg)](https://doi.org/10.21105/jose.00156)

HTML:
<a style="border-width:0" href="https://doi.org/10.21105/jose.00156">
  <img src="https://jose.theoj.org/papers/10.21105/jose.00156/status.svg" alt="DOI badge" >
</a>

reStructuredText:
.. image:: https://jose.theoj.org/papers/10.21105/jose.00156/status.svg
   :target: https://doi.org/10.21105/jose.00156

This is how it will look in your documentation:

[DOI badge]

We need your help!

Journal of Open Source Education is a community-run journal and relies upon volunteer effort. If you'd like to support us, please consider doing either one (or both) of the following:

@valdanchev

Great!! Thank you so much @labarba, @ShanEllis, @TomDonoghue, and @lechten for all of your work — the review process has really improved the learning resource!
