[REVIEW]: SmartEDA: An R Package for Automated Exploratory Data Analysis #1509

Open
whedon opened this issue Jun 18, 2019 · 66 comments

Comments

@whedon
Collaborator

commented Jun 18, 2019

Submitting author: @sayanddude (Sayan Putatunda)
Repository: https://github.com/daya6489/SmartEDA
Version: 0.3.2
Editor: @mgymrek
Reviewer: @nhejazi, @terrytangyuan
Archive: 10.5281/zenodo.3383824

Status


Status badge code:

HTML: <a href="http://joss.theoj.org/papers/e56dad3d192cfeddd10fcc1550505ceb"><img src="http://joss.theoj.org/papers/e56dad3d192cfeddd10fcc1550505ceb/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/e56dad3d192cfeddd10fcc1550505ceb/status.svg)](http://joss.theoj.org/papers/e56dad3d192cfeddd10fcc1550505ceb)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@nhejazi & @terrytangyuan, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist, please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @mgymrek know.

Please try and complete your review in the next two weeks

Review checklist for @nhejazi

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: 0.3.2
  • Authorship: Has the submitting author (@sayanddude) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

Review checklist for @terrytangyuan

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: 0.3.2
  • Authorship: Has the submitting author (@sayanddude) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
@whedon

Collaborator Author

commented Jun 18, 2019

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @nhejazi, @terrytangyuan it looks like you're currently assigned to review this paper 🎉.

⭐️ Important ⭐️

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' at https://github.com/openjournals/joss-reviews
  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands
@whedon

Collaborator Author

commented Jun 18, 2019

Attempting PDF compilation. Reticulating splines etc...
@whedon

Collaborator Author

commented Jun 18, 2019

@terrytangyuan

Member

commented Jun 19, 2019

I saw that you submitted for review to JSS already https://arxiv.org/pdf/1903.04754.pdf. Is this submission to JOSS still necessary?

@sayanddude

commented Jun 20, 2019

@terrytangyuan Hi! We initially submitted the paper (given that it's a CRAN package) to JSS a couple of months back but it didn't work out there. We assure you that currently the paper is not under consideration for submission at any other journal or conference. We are hoping that this paper goes through the rigorous review and gets published at JOSS!

@terrytangyuan

Member

commented Jul 11, 2019

@sayanddude I see. There are many other tools for (automated) exploratory data analysis in R. What makes this package useful/different?

@sayanddude

commented Jul 12, 2019

@whedon generate pdf

@whedon

Collaborator Author

commented Jul 12, 2019

Attempting PDF compilation. Reticulating splines etc...
@whedon

Collaborator Author

commented Jul 12, 2019

@sayanddude

commented Jul 12, 2019

@whedon generate pdf

@whedon

Collaborator Author

commented Jul 12, 2019

Attempting PDF compilation. Reticulating splines etc...
@whedon

Collaborator Author

commented Jul 12, 2019

@sayanddude

commented Jul 12, 2019

> @sayanddude I see. There are many other tools for (automated) exploratory data analysis in R. What makes this package useful/different?

@terrytangyuan Hi! We have added a new section, “Comparison with other R Packages”, to the updated version of the paper PDF generated above. This section compares the capabilities of SmartEDA with those of some competing R packages (such as dlookr, explorer, DataExplorer, etc.) and highlights its advantages. Figure 9 in the paper summarizes the comparison and shows how SmartEDA is better than most of the available R packages for automated exploratory data analysis (the figure is attached below).

[Figure 9: comparison of SmartEDA with other R packages for automated exploratory data analysis]

To summarize, some of the key benefits of SmartEDA are:
• There is no need to remember the names of many different R packages, as SmartEDA bundles most of the exploratory functions and their dependencies
• There is no need to write lengthy R scripts; SmartEDA performs the exploratory analysis in a single line of R code
• It cuts down the time needed for exploratory data analysis
• SmartEDA extends data.table to build customized summary statistics and cross tables
• SmartEDA functions can generate hundreds of ggplot2 plots (scatter, bar, stacked bar, box, density, QQ, and parallel coordinate plots) at a time, with customized themes from the ggthemes package
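
As a rough illustration of the "one line" claim above, the sketch below calls two SmartEDA functions on the built-in mtcars dataset. The dataset choice and the argument values are assumptions made for illustration; argument names should be checked against the documentation of the installed SmartEDA version.

```r
# Requires the SmartEDA package from CRAN: install.packages("SmartEDA")
library(SmartEDA)

# mtcars ships with base R and stands in for a real analysis dataset.
# One call returns a data-level overview of the data frame.
overview <- ExpData(data = mtcars, type = 1)
print(overview)

# One call returns summary statistics for the numeric variables
# (by = "A" requests statistics over all rows, without grouping).
num_stats <- ExpNumStat(mtcars, by = "A", round = 2)
print(num_stats)
```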

Also, SmartEDA is covered in the study by Staniak and Biecek (2019), who reviewed the landscape of R packages for automated exploratory data analysis. Some of the distinguishing features of SmartEDA that the authors point out when comparing it with other R packages are:
• The SmartEDA package reports skewness and displays QQ plots against the normal distribution
• The SmartEDA package provides a method for visualizing multivariate relationships: the parallel coordinate plot
• SmartEDA gives reasonable insight into variable distributions and simple relationships
• The parallel coordinate plot (PCP) in SmartEDA is unique and very well done
The paper by Staniak and Biecek (2019) is available on arXiv: https://arxiv.org/pdf/1904.02101.pdf

Reference:
Mateusz Staniak and Przemyslaw Biecek (2019), “The Landscape of R Packages for Automated Exploratory Data Analysis”, arXiv:1904.02101 [stat.CO]
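
For readers unfamiliar with the skewness and QQ-plot diagnostics mentioned above, here is a minimal base R sketch. It is not SmartEDA's implementation: the helper `sample_skewness` is hypothetical, and the moment-based definition of skewness is an assumption chosen for simplicity.

```r
# Moment-based sample skewness: m3 / m2^(3/2).
# Hypothetical helper for illustration; not SmartEDA's internal code.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  m2 <- mean(centered^2)  # second central moment
  m3 <- mean(centered^3)  # third central moment
  m3 / m2^1.5
}

symmetric <- c(-2, -1, 0, 1, 2)  # symmetric about its mean
set.seed(1)
right_skewed <- rexp(500)        # exponential sample with a long right tail

sample_skewness(symmetric)       # zero: the third central moment vanishes
sample_skewness(right_skewed)    # positive for a right-skewed sample

# QQ plot against the normal distribution: points that bend away from
# the reference line indicate departure from normality.
qqnorm(right_skewed)
qqline(right_skewed)
```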

@sayanddude

commented Jul 12, 2019

👉 Check article proof 📄 👈

@nhejazi @terrytangyuan Requesting the reviewers to kindly consider the latest version of the pdf generated above (10.21105.joss.01509.pdf). We have added a section on “Comparison with other R Packages” and corrected some formatting issues that were there in the earlier version.

@terrytangyuan

Member

commented Jul 12, 2019

@sayanddude Thanks. The table looks great. Some feedback:

  • I see lots of long functions with >150 lines of code and the code style is not consistent (I suggest running a lintr check). Similarly, in the roxygen docs, the style is not consistent. I see both ##' and #'. The indentation levels and the roxygen syntax are sometimes incorrect. Please double check.
  • There isn't any unit test for the package.
  • Have you tried running this package on large datasets?
  • Please make the paper more concise - I see many pages where each page only has one giant picture.
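
The unit-test and lintr suggestions above could look roughly like the following sketch. The function `count_missing` is a hypothetical stand-in for a package function, not part of SmartEDA; the test uses the testthat package, and the lintr call is shown as a comment because it has to be run from a package root.

```r
library(testthat)

# Hypothetical helper standing in for a package function under test.
count_missing <- function(df) {
  vapply(df, function(col) sum(is.na(col)), integer(1))
}

test_that("count_missing reports per-column NA counts", {
  df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
  expect_equal(count_missing(df), c(a = 1L, b = 1L))
})

# Style check, run from the package root (requires the lintr package):
# lintr::lint_package(".")
```

In a package, tests like this one usually live under tests/testthat/ so that they run as part of R CMD check.
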
@sayanddude

commented Jul 12, 2019

@terrytangyuan Thanks for the feedback! I will work on your comments and will get back to you with the updated paper and the required code changes as soon as possible.

@labarba

Member

commented Aug 4, 2019

@sayanddude, can you give us a status update? If you will need considerably more time, it would help if you let us know so that we can add a "paused" label here.

@sayanddude

commented Aug 4, 2019

@labarba Hi! We are almost done addressing all of the reviewer's comments and are now in the final stages of creating the unit tests for the package. Please give us a couple more days; we will update the code repository and the paper by end of day Tuesday (6th Aug, 2019). Thanks!

@sayanddude

commented Aug 6, 2019

@whedon generate pdf

@whedon

Collaborator Author

commented Aug 6, 2019

Attempting PDF compilation. Reticulating splines etc...
@whedon

Collaborator Author

commented Aug 6, 2019

@sayanddude

commented Aug 6, 2019

@terrytangyuan We have worked on addressing all of your comments. Please find below our response to each action item:

> I see lots of long functions with >150 lines of code and the code style is not consistent (I suggest running a lintr check). Similarly, in the roxygen docs, the style is not consistent. I see both ##' and #'. The indentation levels and the roxygen syntax are sometimes incorrect. Please double check.

Thanks for your comment! We have updated the GitHub repository and ensured that almost all of the functions are now below 150 lines of code. We have also run a lintr check and corrected all the issues related to style inconsistencies, indentation, and incorrect syntax.

> There isn't any unit test for the package.

Thanks a lot for the comment! We have now implemented unit tests for the package (available at https://github.com/daya6489/SmartEDA).

> Have you tried running this package on large datasets?

Yes, the package works well on large datasets. Recently, we applied the SmartEDA package to the Microsoft malware prediction data on Kaggle (available at https://www.kaggle.com/ajithvallabai/microsoft-malware-prediction/data). This dataset is considerably large: it has 8,900,000 rows and 82 columns. The SmartEDA package worked seamlessly on this dataset.

> Please make the paper more concise - I see many pages where each page only has one giant picture.

Thanks for the comment! We have considerably reduced the size of the paper (please refer to the updated version of the paper, i.e. 10.21105.joss.01509.pdf) by removing some verbose content and by consolidating most of the images into a single figure, i.e. Figure 2.

Thanks and Regards,
Sayan

@sayanddude

commented Aug 6, 2019

@labarba Hi! As discussed, we have finished addressing the comments and have updated the GitHub code repository along with the paper.

@labarba

Member

commented Aug 6, 2019

The handling editor, @mgymrek, will take it from here.

@mgymrek

commented Aug 8, 2019

@sayanddude thanks for making these changes

@nhejazi, @terrytangyuan can you now go over the revision? If your comments have been sufficiently addressed please finish filling out the checklist

@whedon

Collaborator Author

commented Aug 19, 2019

Attempting PDF compilation. Reticulating splines etc...
@whedon

Collaborator Author

commented Aug 19, 2019

@sayanddude

commented Aug 19, 2019

@nhejazi We have worked on addressing all of your comments. Please find below our response to each action item:

Major (cannot be checked off yet)

  • References: I believe some of the references are missing DOIs. As ...........
    Thanks for your comment! We have added the DOI for almost all the articles for which DOIs are available in the updated version of the paper. Only for the cited R packages and for the Exploratory Data Analysis book by Tukey (1977), there's no DOI available.

  • Community Guidelines: While the package repository contains a CODE_OF_CONDUCT.md file, a "Contributions" section ..........................
    Thanks for pointing it out! We have added an "Issues" section and a "Contributions" section in the README.md file of the package (please refer to https://github.com/daya6489/SmartEDA). We have also added a "Contribution Guidelines" as suggested by you.

Minor (things that will not hold up the review):

  • Unit tests: it looks like .............
    Thanks for your valuable comment! Yes, we agree that more tests can be written for the summary statistics, and we will keep adding unit tests as more users start using our package. In fact, we plan to write more unit tests for the next release of the package.

  • JOSS paper: From a quick read, there are some .................
    Thanks for your comment! We have corrected the required markdown syntax (i.e. @eda:1). We have also cited the R core development team for "R" and now refer to R as a "programming environment" in the text (pg. 1, paragraph 3, Introduction section) instead of "a statistical computing package", as it was called in the earlier version of the paper. We have also corrected some grammatical mistakes in the paper and used tools such as "Grammarly" to check the entire manuscript for grammatical errors.

@sayanddude

commented Aug 19, 2019

@mgymrek @terrytangyuan @nhejazi Hi! We have addressed all the comments mentioned above and have also made the required changes to the code repository. Kindly let us know if there's any pending action item from our end.

Thanks and Regards,
Sayan

@nhejazi

Collaborator

commented Aug 20, 2019

@sayanddude Thank you for quickly addressing our concerns, including the addition of the DOIs to all references (where possible). I've gone ahead and completed the reviewer checklist available to me and am ready to recommend the software paper for acceptance into JOSS. There may yet be other concerns to address but I do think the paper and R package are close.

@sayanddude

commented Aug 21, 2019

@nhejazi Thank you so much!

@mgymrek

commented Sep 1, 2019

Thanks!

@sayanddude see next steps below.

Some minor comments on the paper:

  • "EDA can be categorized into Descriptive statistical techniques and graphical techniques": uncapitalize descriptive
  • You can reference just Figure 2 rather than spelling out (a) through (f)
  • There is a reference to a Figure 9 that I believe should be to Figure 3 instead

After fixing those typos, can you please make a zenodo archive, being sure the title and author list match those on the paper, and report the DOI here?

@terrytangyuan

Member

commented Sep 1, 2019

@sayanddude Thanks for addressing the comments. The paper looks good to me now and I recommend it for publication.

@sayanddude

commented Sep 2, 2019

@terrytangyuan Thank you so much!

@sayanddude

commented Sep 2, 2019

@whedon generate pdf

@whedon

Collaborator Author

commented Sep 2, 2019

Attempting PDF compilation. Reticulating splines etc...
@whedon

Collaborator Author

commented Sep 2, 2019

@sayanddude

commented Sep 2, 2019

> Thanks!
>
> @sayanddude see next steps below.
>
> Some minor comments on the paper:
>
>   • "EDA can be categorized into Descriptive statistical techniques and graphical techniques": uncapitalize descriptive
>   • You can reference just Figure 2 rather than spelling out (a) through (f)
>   • There is a reference to a Figure 9 that I believe should be to Figure 3 instead
>
> After fixing those typos, can you please make a zenodo archive, being sure the title and author list match those on the paper, and report the DOI here?

@mgymrek Thanks for the feedback! I have corrected all of the typos in the updated version of the PDF.
I have also created the Zenodo archive for the SmartEDA package (version 0.3.2) with DOI 10.5281/zenodo.3383824; the URL is https://doi.org/10.5281/zenodo.3383824.

I have ensured that the title and the author names are the same as those in the paper. Please let me know if I have missed anything. Thank you so much!

@sayanddude

commented Sep 2, 2019

@whedon generate pdf

@whedon

Collaborator Author

commented Sep 2, 2019

Attempting PDF compilation. Reticulating splines etc...
@whedon

Collaborator Author

commented Sep 2, 2019

@mgymrek

commented Sep 4, 2019

@whedon set 10.5281/zenodo.3383824 as archive

@whedon

Collaborator Author

commented Sep 4, 2019

OK. 10.5281/zenodo.3383824 is the archive.

@mgymrek

commented Sep 4, 2019

@whedon set 0.3.2 as version

@whedon

Collaborator Author

commented Sep 4, 2019

OK. 0.3.2 is the version.

@mgymrek

commented Sep 4, 2019

Thanks @sayanddude!
@openjournals/joss-eics this paper is ready to be accepted!

@arfon

Member

commented Sep 4, 2019

@whedon accept

@whedon

Collaborator Author

commented Sep 4, 2019

Attempting dry run of processing paper acceptance...
@whedon

Collaborator Author

commented Sep 4, 2019

Check final proof 👉 openjournals/joss-papers#946

If the paper PDF and Crossref deposit XML look good in openjournals/joss-papers#946, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true
@arfon

Member

commented Sep 4, 2019

@whedon accept deposit=true

@whedon

Collaborator Author

commented Sep 4, 2019

Doing it live! Attempting automated processing of paper acceptance...

@whedon added the accepted label Sep 4, 2019

@whedon

Collaborator Author

commented Sep 4, 2019

🐦🐦🐦 👉 Tweet for this paper 👈 🐦🐦🐦

@whedon

Collaborator Author

commented Sep 4, 2019

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨

Here's what you must now do:

  1. Check final PDF and Crossref metadata that was deposited 👉 openjournals/joss-papers#947
  2. Wait a couple of minutes to verify that the paper DOI resolves https://doi.org/10.21105/joss.01509
  3. If everything looks good, then close this review issue.
  4. Party like you just published a paper! 🎉🌈🦄💃👻🤘

Any issues? notify your editorial technical team...

@sayanddude

commented Sep 4, 2019

Thank you @mgymrek, @nhejazi and @terrytangyuan for your guidance, support, and patience! It is much appreciated!
It has been a great experience for us, and we look forward to submitting our next software package to JOSS :)
