
[REVIEW]: Local clustering #960

Closed
18 tasks done
whedon opened this issue Sep 17, 2018 · 32 comments
Labels: accepted, published, recommend-accept, review

whedon commented Sep 17, 2018

Submitting author: @volfpeter (Péter Volf)
Repository: https://github.com/volfpeter/localclustering
Version: v0.10
Editor: @pjotrp
Reviewer: @DannyArends
Archive: 10.5281/zenodo.1443551


Status badge code:

HTML: <a href="http://joss.theoj.org/papers/c6859cfde227c60d4d1836bc24ade7b4"><img src="http://joss.theoj.org/papers/c6859cfde227c60d4d1836bc24ade7b4/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/c6859cfde227c60d4d1836bc24ade7b4/status.svg)](http://joss.theoj.org/papers/c6859cfde227c60d4d1836bc24ade7b4)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@DannyArends, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @pjotrp know.

Please try to complete your review within the next two weeks.

Review checklist for @DannyArends

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: Does the release version given match the GitHub release (v0.10)?
  • Authorship: Has the submitting author (@volfpeter) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

whedon commented Sep 17, 2018

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @DannyArends it looks like you're currently assigned as the reviewer for this paper 🎉.

⭐ Important ⭐

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that with GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' at https://github.com/openjournals/joss-reviews

  2. You may also like to change your default settings for watching repositories in your GitHub profile: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands


whedon commented Sep 17, 2018

Attempting PDF compilation. Reticulating splines etc...


whedon commented Sep 17, 2018


pjotrp commented Sep 17, 2018

@volfpeter to expedite the review process, do you mind going through the above list of checkboxes and making sure they can be ticked (you can't tick them yourself)? Ping us here when you are done.

@volfpeter

@pjotrp I've gone through the checklist. I think everything is fine, but here are a couple of notes:

  • Version: The latest release is v0.10 and the code itself has not changed since that release, but obviously the paper is not included in that release. If the published release must contain the paper, then I will have to update the version.
  • Automated tests: There are only manual "tests" in the form of the IPython notebook and demo.py. I hope that will be enough.
  • References: Do I need to look up the DOI of all citations? If so, I will get to it.


pjotrp commented Sep 18, 2018

Just create a new release when we are ready. Tests should be fine. DOIs, please add.

@DannyArends you can start review. Thanks!


DannyArends commented Sep 18, 2018

Ticked all the boxes I could; in general the software is in a good state. However, how to use the library, what its advantages are, and the use cases of this clustering algorithm compared to already available methods remain a mystery to me.

I would advise the author to additionally take a close look at the language used in the paper (although this is outside the scope of the review); I added some comments in the additional section below.

Furthermore I was not able to install the software yet, I will do that after the issues listed below are taken care of.

Functionality - Installation

The software only provides a short list of dependencies; no clear install guide is provided. Please provide an install guide outlining the steps a new user needs to take to install the software, even if it is as simple as:

    pip install graphscraper
    git clone https://github.com/volfpeter/localclustering.git

Then what do I need to do? Do I need to include the library? Does it need compilation/installation? Do I need to update my path variable?

Additionally, it is not specified whether this requires a certain OS: is it Windows, Linux, and macOS compatible? Has it been tested on multiple platforms?
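As a sketch, the requested install guide might look something like the steps below. These exact commands are an assumption (whether `pip install .` works depends on the repository's setup.py) and would need to be confirmed against the repository:

```shell
# Hypothetical install walkthrough (steps are assumptions; confirm
# against the repository's setup.py before adopting them).
pip install graphscraper                                   # dependency
git clone https://github.com/volfpeter/localclustering.git
cd localclustering
pip install .                                              # the library itself
python demo.py                                             # smoke-test the demo
```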

Documentation - A statement of need

No statement of need is provided; please make clear in the readme for whom the software in this repository is useful: e.g. is it aimed at biologists, mathematicians, or physicists?

Documentation - Installation instructions

There is no clear list of dependencies, and the dependencies for the library, the demo and the Python notebook are different. Furthermore, it is unclear which version of the "GraphScraper library" is required, since the readme file links both to the PyPI version (https://pypi.python.org/pypi/graphscraper) and to the author's fork of the same software (https://github.com/volfpeter/graphscraper).

A solution would be to split the dependencies section into three parts outlining the requirements for the library, the demo and the IPython notebook, and then give instructions on how to install the dependencies.

Documentation - Automated tests

No automated tests are provided, and no instructions are given in the readme on how to run manual tests of the library.

Documentation - Community guidelines

Point 2) Report issues or problems with the software, and point 3) Seek support, still need to be addressed.

Software paper - Authors

The author is listed, but the affiliation is given as "None"; if no affiliation exists, it would be more professional to mark the author as "independent developer".

Software paper - A statement of need

Same comment as before in the documentation section: please provide a statement of need.

Software paper - References

No DOIs are provided for the references

Additional

The text in the paper is of low quality in some areas, and the author might want to improve the text before publication, for example:

"Here are some ideas for future work:"
"These are the main resources besides the source code:"

Furthermore, no comparison is made between the current clustering algorithm and known clustering algorithms: how is this algorithm different from K-Means, Mean-Shift, DBSCAN, EM Clustering with Gaussian Mixture Models (GMM) or Agglomerative Hierarchical Clustering?

To me it is not clear what advantages this clustering algorithm has, and it is not clear what
the author means by improving the "cluster's quality". How is this measured?

Furthermore, the phrase "they define a set of requirements cluster definitions must fulfill" is
very vague, and no explanation (or link to an example) is given how the user must do this.

@volfpeter

Thanks for the comments! I'll start fixing these issues and get back to you when it's done.


pjotrp commented Sep 26, 2018

Hi @volfpeter, you had great momentum. Don't let it slip :). I think @DannyArends has some good points, but they are not intrusive. I think most of the work is to provide a clear install path (we require that) and a bit more contextual information on other clustering algorithms. I don't think a full comparison is required. Also tests are optional.

@volfpeter

Hi @pjotrp, I didn't have time last week, but I started working on it this week. I decided to completely rewrite and significantly extend the readme of the project, because it was really lacking in some areas and only this review made that obvious to me (it was OK for someone who knew how to use the software, i.e. me, but it was not very useful for others)...
Regarding the comparison, I haven't made a decision on that part yet. All the algorithms mentioned in the review (and the ones referenced in the paper) are global clustering algorithms that group all the nodes of a graph into clusters, while this one is local in the sense that it aims to find the cluster of a small set of nodes (usually one) without even looking at the whole graph. It's a bit like comparing apples and oranges. Anyway, I'll think of something, maybe clustering the Zachary karate club graph, which is well-known and analyzed by global algorithms.
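To make the local-vs-global distinction concrete, here is a minimal illustrative sketch of greedy cluster expansion from a seed node. This is not the localclustering library's algorithm (its engine and cluster definitions differ); it only shows how a cluster can be grown around a seed without visiting the whole graph:

```python
# Illustrative greedy *local* clustering: grow a cluster from a seed by
# repeatedly absorbing the frontier node with the most edges into the
# cluster, stopping when the cluster's edge density would drop.
# NOTE: a toy sketch, NOT the algorithm implemented in localclustering.

def density(adj, cluster):
    """Edge density of the induced subgraph (1.0 for trivial clusters)."""
    n = len(cluster)
    if n < 2:
        return 1.0
    internal = sum(1 for u in cluster for v in adj[u] if v in cluster) // 2
    return internal / (n * (n - 1) / 2)

def local_cluster(adj, seed, max_size=10):
    cluster = {seed}
    frontier = set(adj[seed])
    while frontier and len(cluster) < max_size:
        # Candidate with the most edges into the cluster (ties: lowest id).
        best = max(sorted(frontier),
                   key=lambda n: sum(1 for m in adj[n] if m in cluster))
        if density(adj, cluster | {best}) < density(adj, cluster):
            break  # absorbing the best candidate would dilute the cluster
        cluster.add(best)
        frontier |= set(adj[best]) - cluster
        frontier.discard(best)
    return cluster

# Two triangles joined by a single bridge edge (2-3): the expansion
# stays on the seed's side of the bridge.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
print(local_cluster(adj, 0))  # cluster grown around node 0
```

Note how only the seed's neighbourhood is ever inspected, which is the sense in which such algorithms are "local".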


pjotrp commented Sep 26, 2018

Excellent. That is what review is for - to improve the paper :)

@volfpeter

@DannyArends I've updated the readme. Hopefully it's OK now, although I'm not sure you will be satisfied with the "statement of need" paragraph.
I'll update the paper soon; it's not ready yet.


DannyArends commented Oct 1, 2018

@volfpeter Looks a lot better already. I checked all the boxes, except "automated tests", since @pjotrp mentioned that "Also tests are optional." I could just tick it if needed.

A couple of minor things:
In the README.md, please specify the language for the example code blocks, so that GitHub does code highlighting, by specifying the language after the opening ``` fence, i.e. changing ``` to ```python:

from localclustering.definitions.connectivity import ConnectivityClusterDefinition
from localclustering.localengine import LocalClusterEngine

compared with (not the best example, but I think you'll get the idea):

```python
from localclustering.definitions.connectivity import ConnectivityClusterDefinition
from localclustering.localengine import LocalClusterEngine
```

There is a missing space in the README.md: "a detailed descriptionand". There is also a minor wording error: change 'in' to 'by' in the following sentence: "contact the repository owner in email".

Anyway, it looks good to me, just update the paper (if you want I could proofread it for you) and if @pjotrp confirms his "tests are optional" statement I will check the last box.

Danny

P.S. I noticed there is still one reference without a DOI; however, this paper from 2000 does not have a DOI assigned to it. Is this OK, @pjotrp?

@volfpeter

I've fixed the readme.

The reference that doesn't have a DOI is a technical report. I have found DOIs for related papers, but I wanted to cite the technical report because that's what I read back in the day.

Regarding the tests, the only related thing in the repository is the demo module. It's easy to run if you went through the Getting started section of the readme. You will receive a very detailed error message saying you need to install the cairo library and its Python binding (pycairo); then it will run.

One more thing: I've released a new version (0.11) that fixes the setup.py file, so that's the latest version right now.

Hopefully I will have time to finalize and commit the paper today.


DannyArends commented Oct 1, 2018

@volfpeter I think it's more about regression/CI tests, e.g. when using Travis CI.

These kinds of tests let you know when a change to the code breaks the results compared to a previous version. I use them for the CTLmapping repository: every time I push to the master branch, the code is checked to make sure it can still install, and tests are run to see if the mapping algorithm still produces the same output as before. See for example:

https://github.com/DannyArends/CTLmapping/blob/master/Rctl/tests/test_openmp.R

In this code my own OpenMP correlation algorithm is compared to the built-in correlation algorithm from R. If my algorithm produces different results than the R version:

if(!all(res.ref == res.cor)) stop("ref != cor\n") # When results differ, stop

the test is stopped (stop is R lingo for raising an error), and Travis CI will email me that the latest changes have broken the algorithm.

You could do something like this for the clustering, in which you calculate a base case, which you then test.
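In Python, such a regression test could be sketched along these lines; `run_clustering` here is a hypothetical stand-in, not the localclustering API, and the baseline would be recorded from a known-good run:

```python
# Sketch of a regression test: compare the clustering output against a
# stored baseline from a known-good run. `run_clustering` is a
# hypothetical stand-in, NOT the localclustering API.

def run_clustering(graph):
    # Placeholder for the real library call: label every node with the
    # smallest node id of its (precomputed) component.
    return {node: min(component) for node, component in graph.items()}

def test_clustering_matches_baseline():
    graph = {0: {0, 1}, 1: {0, 1}, 2: {2}}
    baseline = {0: 0, 1: 0, 2: 2}  # recorded from a known-good run
    result = run_clustering(graph)
    # Fail loudly when the output drifts from the baseline, so CI
    # (e.g. Travis) flags the breaking change.
    assert result == baseline, "clustering output changed: %r" % result

test_clustering_matches_baseline()
```

Hooked into CI, the assertion failure is what turns "the algorithm silently changed" into a red build.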

@pjotrp Is this a requirement or a nice-to-have feature for JOSS?

Additionally, it gives you the opportunity to add the nice green "build passing" icon to your repository.


pjotrp commented Oct 1, 2018

tests are optional.

@DannyArends

OK, thanks for the confirmation; I checked all the boxes.
After the paper is updated, everything should be good to go.

Danny


pjotrp commented Oct 1, 2018

Yup. If there is no DOI there is none ;)

@volfpeter

@DannyArends I've committed the updated version of the paper and regenerated the PDF. Please review it and let me know if further updates or fixes are needed. Thanks!

@DannyArends

Submitted some minor changes to the text; seems good to me.
@pjotrp After regenerating the paper, it's ready for publication in my opinion.

@volfpeter

@DannyArends Thanks a lot for the fixes! I've merged the pull request.
@DannyArends @pjotrp Thanks for all the help during the pre-review and review process! It was a much better experience than I had expected.


pjotrp commented Oct 3, 2018

@whedon generate pdf


whedon commented Oct 3, 2018

Attempting PDF compilation. Reticulating splines etc...


whedon commented Oct 3, 2018


pjotrp commented Oct 3, 2018

Excellent. The submission is slightly over 1000 words at 1200 words, but in this case I think it serves its purpose (@arfon is that OK?).

We need you to deposit a copy of your software repository (including any revisions made during the JOSS review process) with a data-archiving service. To do so:

  1. Create a GitHub release of the current version of your software repository
  2. Deposit that release with Zenodo, figshare, or a similar DOI issuer.
  3. Post a comment here with the DOI for the release.


arfon commented Oct 3, 2018

Excellent. The submission is slightly over 1000 words at 1200 words, but in this case I think it serves its purpose (@arfon is that OK?).

👍 LGTM

@volfpeter

I've submitted the repository to Zenodo:

  • DOI for "paper" release: 10.5281/zenodo.1443551
  • DOI for all releases: 10.5281/zenodo.1443550

The repository version of the "paper" release is 0.12.1.


pjotrp commented Oct 3, 2018

@whedon set 10.5281/zenodo.1443551 as archive


whedon commented Oct 3, 2018

OK. 10.5281/zenodo.1443551 is the archive.


pjotrp commented Oct 3, 2018

@arfon ready to R&R

@arfon arfon added the accepted label Oct 3, 2018

arfon commented Oct 3, 2018

@DannyArends - many thanks for your review here and to @pjotrp for editing this submission ✨

@volfpeter - your paper is now accepted into JOSS and your DOI is https://doi.org/10.21105/joss.00960 ⚡ 🚀 💥

@arfon arfon closed this as completed Oct 3, 2018

whedon commented Oct 3, 2018

🎉🎉🎉 Congratulations on your paper acceptance! 🎉🎉🎉

If you would like to include a link to your paper from your README use the following code snippets:

Markdown:
[![DOI](http://joss.theoj.org/papers/10.21105/joss.00960/status.svg)](https://doi.org/10.21105/joss.00960)

HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.00960">
  <img src="http://joss.theoj.org/papers/10.21105/joss.00960/status.svg" alt="DOI badge" >
</a>

reStructuredText:
.. image:: http://joss.theoj.org/papers/10.21105/joss.00960/status.svg
   :target: https://doi.org/10.21105/joss.00960

This is how it will look in your documentation:


We need your help!

Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us, please consider doing either one (or both) of the following:

@whedon whedon added published Papers published in JOSS recommend-accept Papers recommended for acceptance in JOSS. labels Mar 2, 2020