Python-graphblas: high-performance sparse linear algebra for scalable graph analytics #81

Closed
11 of 24 tasks
eriknw opened this issue Feb 4, 2023 · 54 comments

Comments

@eriknw

eriknw commented Feb 4, 2023

Submitting Author: Erik Welch (@eriknw)
All current maintainers: (@eriknw, @jim22k, @SultanOrazbayev)
Package Name: Python-graphblas
One-Line Description of Package: Python library for GraphBLAS: high-performance sparse linear algebra for scalable graph analytics
Repository Link: https://github.com/python-graphblas/python-graphblas
Version submitted: 2023.1.0
Editor: @tomalrussell
Reviewer 1: @sneakers-the-rat
Reviewer 2: @szhorvat
Archive: DOI
JOSS DOI: N/A
Version accepted: 2023.7.0
Date accepted (month/day/year): 07/14/2023


Description

Python-graphblas is like a faster, more capable scipy.sparse that can implement NetworkX. It is a Python library for GraphBLAS: high-performance sparse linear algebra for scalable graph analytics. Python-graphblas mimics the math notation, making it the most natural way to learn, use, and think about GraphBLAS. In contrast to other high level GraphBLAS bindings, Python-graphblas can fully and cleanly support any implementation of the GraphBLAS C API specification thereby allowing us to be vendor-agnostic.

Scope

  • Please indicate which category or categories this package falls under:
    • Data retrieval
    • Data extraction
    • Data munging
    • Data deposition
    • Reproducibility
    • Geospatial
    • Education
    • Data visualization*
    • Scientific software wrappers (added from here)

Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

  • For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):

    • Who is the target audience and what are scientific applications of this package?

Audience: anybody who works with sparse data or graphs. We are also implementing a backend to NetworkX (which supports dispatching in version 3.0) written in Python-graphblas called graphblas-algorithms, so we are quite literally targeting NetworkX users!

Python-graphblas provides a faster, easier, more flexible, and more scalable way to operate on sparse data, including for graph algorithms. The scientific applications are too many to list, spanning neuroscience, genomics, biology, and more. It may be useful wherever scipy.sparse or NetworkX are used. Although GraphBLAS was designed to build graph algorithms, it is flexible enough to be used in other applications. Anecdotally, most of our current users that I know about are from research groups in universities and laboratories.

We are also targeting applications that need very large distributed graphs. We have experimented with Dask-ifying python-graphblas here, and we get regular interest from people who want e.g. distributed PageRank or connected components.

  • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

pygraphblas, which hasn't been updated in more than 16 months. There are many differences in syntax, functionality, philosophy, architecture, and (I would argue) robustness and maturity. python-graphblas syntax targets the math syntax, whereas pygraphblas is much closer to C. python-graphblas handles dtypes much more robustly, has efficient conversions to/from numpy and other formats, is architected to handle additional GraphBLAS implementations (more are on the way!), has exceptional error messages, has many more tests and functionality, supports Windows, and much, much more. We have also been growing our team, because sustainability is very important to us.

Although we have/had irreconcilable differences (which is why we decided to create python-graphblas), the authors have always been cordial. We all believe strongly in the ethos of open source, and I would describe our relationship as having "radical generosity". For example, we have an outstanding agreement that each library is welcome to "borrow" from the other (with credit). We may "borrow" some of their documentation :)

We also worked together to create and maintain the C binding to SuiteSparse:GraphBLAS:
https://github.com/GraphBLAS/python-suitesparse-graphblas/
We could use help automatically generating wheels for this library on major platforms via cibuildwheel.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

Limited prior discussion in this issue: pyOpenSci/python-package-guide#21 (comment)

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has an OSI approved license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
    • We're working on this! The C API and SuiteSparse:GraphBLAS C library are both well documented. We have a very large API surface area to cover, so "documentation with examples for all functions" is a really, really high bar, but one I hope we achieve someday :)
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

Publication options

  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • Undecided. We don't have a paper, but in principle I would like for us to submit a paper to JOSS someday.
JOSS Checks
  • The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
  • The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
  • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
  • The package is deposited in a long-term repository with the DOI:

Note: Do not submit your package separately to JOSS

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Code of conduct

Other comments (manually added)

Given a product mindset, we believe that Python-graphblas is a great product, but I think our go-to-market strategy has been lacking. We have been very engineering-heavy, and even our goal of targeting NetworkX users is engineering-heavy via creating graphblas-algorithms. I hope this peer-review process can help us prioritize our efforts (such as a plan to improve documentation) as well as a place to write a blog post or two.

Please fill out our survey

P.S. *Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

@NickleDave
Contributor

Hi @eriknw!

We're very glad to see you all have gone ahead with a full submission, after discussion in pyOpenSci/python-package-guide#21 (comment) as you linked above.

I just want to welcome you and let you know we are working on this.

I will finish the initial editor checks by the end of this week. @lwasser is traveling and we will need to co-ordinate about editors, but I expect to get back to you about that by the middle of next week at the latest.

Thank you for providing all the detail in the submission. The context you've provided will be helpful for the review (and will definitely help me with editor checks!). It sounds like you've anticipated some points @lwasser brought up when we discussed in Slack. Looking forward to helping you improve the docs and giving you some blog post material! 😁

@NickleDave
Contributor

Editor in Chief checks

These are the basic checks that the package needs to pass to begin review.
Please check our Python packaging guide for more information on the elements below.

  • Installation The package can be installed from a community repository such as PyPI (preferred), and/or a community channel on conda (e.g. conda-forge, bioconda).
    • The package imports properly into a standard Python environment import package-name.
      • was able to do both pip install python-graphblas in a python 3.10 venv on Linux
      • and install with conda on Mac OS, python 3.10
  • Fit The package meets criteria for fit and overlap.
  • Documentation The package has sufficient online documentation to allow us to evaluate package function and scope without installing the package. This includes:
    • User-facing documentation that overviews how to install and start using the package.
    • Short tutorials that help a user understand how to use the package and what it can do for them.
      • but see comments below
    • API documentation (documentation for your code's functions, classes, methods and attributes): this includes clearly written docstrings with variables defined using a standard docstring format. We suggest using the Numpy docstring format.
  • Core GitHub repository Files
    • README The package has a README.md file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.
      • but see comments below
    • Contributing File The package has a CONTRIBUTING.md file that details how to install and contribute to the package.
    • Code of Conduct The package has a Code of Conduct file.
    • License The package has an OSI approved license.
      • Apache 2.0
        NOTE: We prefer that you have development instructions in your documentation too.
  • Issue Submission Documentation All of the information is filled out in the YAML header of the issue (located at the top of the issue template).
  • Automated tests Package has a testing suite and is tested via GitHub actions or another Continuous Integration service.
  • Repository The repository link resolves correctly.
  • Package overlap The package doesn't entirely overlap with the functionality of other packages that have already been submitted to pyOpenSci.
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly.
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

  • Initial onboarding survey was filled out
    We appreciate each maintainer of the package filling out this survey individually. 🙌
    Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. 🙌
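
A note on the installation check above: the distribution is installed as python-graphblas, while the importable module is named graphblas. A minimal sketch of a scripted check follows, using only the standard library; the helper name `installed_version` is hypothetical and not part of the package.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# The PyPI distribution is "python-graphblas"; the importable module is "graphblas".
version = installed_version("python-graphblas")
if version is None:
    print("python-graphblas is not installed; try: pip install python-graphblas")
else:
    print(f"python-graphblas {version} is installed")
```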


Editor comments

This package passes all checks; we can begin review.

It's obvious the developers and maintainers have done a ton of work.
Our goal here should be to help make sure everyone can appreciate how much work they have done, and how much functionality is packed into python-graphblas.

Along those lines, some notes for the review (none of these need to be addressed before we start):

  • the description section of the README needs a high level, non-technical description so that the use of the package would be clear to anyone with a general scientific background
    • Specifically there should be language that defines sparse data - just a simple definition like "data with lots of 0's" and then link to a good description of sparse data from scipy or some other resource
    • Language the submitting authors use above is very close to a good high level description: "python-graphblas ... is a Python library for ... high-performance sparse linear algebra for scalable graph analytics. [Think of it like a faster, more capable scipy.sparse that can implement NetworkX.] Python-graphblas mimics the math notation, making it the most natural way to learn, use, and think about GraphBLAS. In contrast to other high level GraphBLAS bindings, Python-graphblas can fully and cleanly support any implementation of the GraphBLAS C API specification thereby allowing us to be vendor-agnostic." This could be the first thing someone reads both in the README description and on the index of the docs
  • the description section of the README should also clarify why someone would use python-graphblas in place of, or in conjunction with, the scipy sparse module and/or networkx, but in very easy to understand language
  • ideally the README itself should have a short, beginning to end tutorial to get someone started, that really demonstrates the power of the package
    • for example this snippet appears to be almost stand-alone, and the set-up lines will be understandable by most scientific Python users: https://python-graphblas.readthedocs.io/en/latest/getting_started/primer.html#sssp-in-python-graphblas
    • as opposed to the examples that are currently in the README that are meant to give an impression of the package but are not stand-alone runnable snippets; these examples may be useful in API documentation. But the README as currently written may be missing the opportunity to recruit users without a quick friendly walkthrough
  • as noted by submitting author, there is an existing tutorial, the python-graphblas primer, but there could be additional vignettes;
    • what are typical use cases, perhaps analyses from papers or packages that already use python-graphblas could be added in a how-tos section of the docs? Submitting authors state there may be examples in pygraphblas documentation
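
To illustrate the kind of stand-alone snippet suggested in the notes above, here is a minimal pure-Python sketch of the idea behind single-source shortest paths expressed as sparse linear algebra: store only the entries that exist (the "sparse" part), then repeat a min-plus vector-matrix product until the distances stop changing. It deliberately does not use the python-graphblas API; the example graph and the names `min_plus_vxm` and `sssp` are hypothetical and chosen only for illustration.

```python
import math

# Sparse adjacency "matrix": only existing edges are stored.
# edges[i][j] = weight of edge i -> j (hypothetical example graph).
edges = {
    0: {1: 2.0, 2: 5.0},
    1: {2: 1.0, 3: 4.0},
    2: {3: 1.0},
    3: {},
}


def min_plus_vxm(dist, edges):
    """One min-plus vector-matrix product: out[j] = min_i(dist[i] + edges[i][j])."""
    out = {}
    for i, d in dist.items():
        for j, w in edges.get(i, {}).items():
            candidate = d + w
            if candidate < out.get(j, math.inf):
                out[j] = candidate
    return out


def sssp(edges, source):
    """Single-source shortest paths by repeated min-plus products (Bellman-Ford style)."""
    dist = {source: 0.0}
    for _ in range(len(edges)):
        step = min_plus_vxm(dist, edges)
        # Keep the better of the old and new distance for every reached node.
        updated = dict(dist)
        for j, d in step.items():
            if d < updated.get(j, math.inf):
                updated[j] = d
        if updated == dist:
            break  # converged: no distance improved
        dist = updated
    return dist


print(sssp(edges, 0))
```

Replacing ordinary (plus, times) arithmetic with (min, plus) is what turns a matrix product into a path-relaxation step; a GraphBLAS library applies the same idea with optimized sparse data structures.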

@NickleDave
Contributor

@eriknw @jim22k, @SultanOrazbayev the tl;dr is that python-graphblas passed editor checks 🙂 🎉

Like I said above, I'll need to co-ordinate with @lwasser who is traveling about an editor for this review, but would expect us to reply back here by middle of next week at the latest

@eriknw
Author

eriknw commented Feb 9, 2023

Hooray! 🎉

Thanks @NickleDave. We appreciate the attention y'all are giving us, and thanks for telling us what (and when) to expect next. We're in no particular rush--it's more important to give the right people the right amount of time to do things right :)

@NickleDave
Contributor

it's more important to give the right people the right amount of time to do things right

🙌

finding the right people now! 🙂

@NickleDave
Contributor

Hi again @eriknw, @jim22k, @SultanOrazbayev -- just letting you know that I did have a chance to talk with @lwasser now that she has returned, and we are in the process of finding an editor

When you have a chance could you please (all) fill out the pre-review survey?
It's here: https://forms.gle/F9mou7S3jhe8DMJ16

We appreciate each maintainer of the package filling out this survey individually. 🙌 Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. 🙌

(I know it's easy to miss in the template)

@NickleDave
Contributor

Hi @eriknw, @jim22k, @SultanOrazbayev, brief update:
very happy to inform you that @tomalrussell will be guest editor for this review! 🎉 NetworkX contributor, developer of spatial & network tools like snkit.
I will let @tomalrussell take it from here!

@tomalrussell
Collaborator

Hi @eriknw, @jim22k, @SultanOrazbayev, and thanks to @NickleDave for the introduction.

I've reached out to potential reviewers, and incidentally look forward to taking a closer look at python-graphblas myself. I'll update here as soon as I can, definitely within a week.

@tomalrussell
Collaborator

👋 Hi @sneakers-the-rat and @szhorvat! Thank you for volunteering to review for pyOpenSci!

The following resources will help you complete your review:

  1. Here is the reviewers guide. This guide contains all of the steps and information needed to complete your review.
  2. Here is the review template that you will need to fill out and submit here as a comment, once your review is complete.

Please get in touch with any questions or concerns! Your review is due in three weeks: 29th March 2023

@szhorvat

szhorvat commented Mar 7, 2023

Hello everyone 👋

I'm excited to do this review and learn more about the GraphBLAS approach in general.

I plan to do the review gradually, and through continuous communication with the authors. I will make it clear when I consider the review to be completed. Feel free to respond to anything I might bring up before then. The same applies to the review checklist: I will post it below today, and will check off boxes gradually.

Any issues I open for python-graphblas will have a title prefixed with [pyos] and a link back here. You'll probably see me in the discussion forum as well, as I will likely need a bit of help while trying to solve a few toy problems with the library.

Expect comments mostly on mathematical aspects, correctness, docs, and usability from me. Hopefully other reviewers will cover the more technical aspects of Python.

I will aim to complete the review by April 2nd. Since the authors will have the opportunity to address concerns before then, I hope this little delay over the 3 week deadline will be fine. I will not be available during the week of the 20th.

Let me know if you'd like any changes to this arrangement—I can be flexible.

For transparency, I should note that I am involved with the igraph project (https://igraph.org/). igraph is not a competitor to python-graphblas, but it does have similar aims to NetworkX and by extension to graphblas-algorithms. I think this review will be a good opportunity for us to learn from each other.

@szhorvat

szhorvat commented Mar 7, 2023

Review is now complete.


Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README.
    • Improvements needed, see main review. Pure Python users vs C users already familiar with GraphBLAS? Explain difference from graph theory / network analysis packages.
  • Installation instructions: for the development version of the package and any non-standard dependencies in README.
  • Vignette(s) demonstrating major functionality that runs successfully locally.
    • There are some notebooks, but it is difficult to learn from them. They are sparsely commented, several draw parallels with C code, requiring both C and GraphBLAS knowledge. Discoverability: they are not linked from the docs.
  • Function Documentation: for all user-facing functions.
    • Not all functions appear in docs, e.g. visualization functions are missing, and details of the GraphBLAS API are not well covered.
  • Examples for all user-facing functions.
    • IMO it makes no sense to take this requirement literally, as functions are not independent from each other. Multiple useful examples are present in the user guide, which is more than what some other packages provide. That said, improvements are definitely possible (compare with typical R or Mathematica docs). In particular, more examples showing the implementations of simple well-known algorithms would be welcome.
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for:
    • Continuous integration and test coverage,
    • Docs building (if you have a documentation website),
    • A repostatus.org badge,
    • Python versions supported,
    • Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)

  • Short description of package goals.
  • Package installation instructions
  • Any additional setup required to use the package (authentication tokens, etc.)
  • Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
    • Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
      • Links to notebooks are not present. It would be useful to have a thoroughly commented walkthrough for implementing a couple of well-known algorithms.
  • Link to your documentation website.
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
  • Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

  • Package documentation is clear and easy to find and use.
  • The need for the package is clear
  • All functions have documentation and associated examples for use
  • The package is easy to install

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.
    A few notable highlights to look at:
    • Package supports modern versions of Python and not End of life versions.
    • [?] Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Review Comments

@szhorvat

szhorvat commented Mar 7, 2023

@tomalrussell Could you please clarify if the authors are expected to follow all the above checklist items to the letter? (E.g., all mentioned badges mandatory, links to all "vignettes" from the README, all functions have examples of use, etc.)


@eriknw :

includes documentation with examples for all functions.

  • We're working on this! The C API and SuiteSparse:GraphBLAS C library are both well documented. We have a very large API surface area to cover, so "documentation with examples for all functions" is a really, really high bar, but one I hope we achieve someday :)

Indeed, the documentation of most Python projects won't give usage examples for every single function, and doing so would definitely be a lot of work. But just to show that it is possible, and often tremendously useful to users, I wanted to point to Mathematica's documentation where each function has not one, but many examples. See e.g. LinearSolve. I tried to follow the same with my IGraph/M package for Mathematica, but it's still a work in progress. R packages also often have at least one example for each function.

@sneakers-the-rat
Collaborator

Indeed, the documentation of most Python projects won't give usage examples for every single function, and doing so would definitely be a lot of work.

From what I recall we had a conversation at some point about allowing the author to define what is intended as the public interface of the package and what isn't? but ya for packages that wrap another library it seems like a lot of extra work if, eg. there are examples from the main library that are trivially different (ie. could be inferred) from the wrapper's API.

@sneakers-the-rat
Collaborator

sneakers-the-rat commented Mar 8, 2023

I'll also be doing this JOSS-style, leaving this here and editing/raising issues as I go. I like @szhorvat 's idea of using an issue tag, so i'll also prefix mine with [pyos].

I'm happy to focus on more of the python implementation side of things, glad to have someone who's more adept with the math :).

I don't have any conflicts to declare, except that I'm going to be writing some triplet store code soon, but that doesn't really relate or create a conflict imo.

I don't have an expected completion date, but i have this in my calendar as a daily todo item and welcome being relentlessly pinged if i am the one holding us up :)

Review status

  • 23-03-20: approaching from the docs as a naïve user first, then will dive into the source after that first pass. raising some issues for docs clarity and organization. Pausing midway through the README requirements for the day.
  • 23-04-09: making progress working my way through the package and docs, trying to understand the general architecture so that I can evaluate high level claims. definitely a lot of work here and so I'm not trying to "fine tooth comb" things as much as get to a point where I can reasonably explain how the package works for the sake of a review and also to help out with docs if I can. Stopped at the "Operators" section of the user guide after raising issue 429.
  • 23-04-17: I have completed the checklist and gotten enough of a sense of the package where I feel comfortable writing my final review, but will wait to do so until the authors can see some of the issues I raised today and we close some of them (not all are time sensitive or important to me to consider the review completed, i'll communicate with the authors on the issues)
  • 23-06-21: review completed! thanks for your patience <3

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README.
  • Installation instructions: for the development version of the package and any non-standard dependencies in README.
    • PyPI install seems to work! And the examples in the notebooks run, so i assume the installation is successful
  • Vignette(s) demonstrating major functionality that runs successfully locally.
  • Function Documentation: for all user-facing functions.
  • Examples for all user-facing functions.
    • Similarly, that is not currently the case, but is imo impractical/not necessarily desirable for this package. What would make more sense to me is conceptual examples that demonstrate the different categories of user-facing functions, and I have yet to evaluate that.
    • Really nice examples in the user guide, would like to see some of those make their way into docstrings/API docs, but again don't think that's necessary here because of the structure as a wrapper. Raised some issues on some examples that didn't work for me, but appreciated the amount of work that clearly was done here.
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
    • Yes, both code of conduct and brief contribution docs, but have raised issue about need for developer docs separately. this requirement is met, tho.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for:
    • Continuous integration and test coverage, - 99% coverage, nice lmao.
    • Docs building (if you have a documentation website),
    • A repostatus.org badge - not present, but i'm not a huge fan of this granularity of badge requirements.
    • Python versions supported,
    • Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)

  • Short description of package goals. - present, but this could be clearer, the description is written for people who already know what graphBLAS is, getting pretty quickly into the weeds about the mapping between python and C API syntax before I even know what the heck the package is for at a high level. I think it would be good to restructure this in the README and the docs to be like "Hey what up this is python-graphblas, a wrapper around graphBLAS that lets you do graph math with linear algebra. With it you can do stuff like this... (one or two line example). Previous things work like this (example) but python-graphblas works like this instead (example of your syntax) which is good because (two sentences)"
  • Package installation instructions
  • Any additional setup required to use the package (authentication tokens, etc.)
  • Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file. - There is a link to the docs, which have examples. There should also be a link to the notebooks (even though it is relatively obvious they are there by virtue of there being a notebooks folder in the repo root) in the README.
    • Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear) - present but similar to previous comment. The README seems like it has gotten a little bit confused about its role in the documentation and does a lot of things that i would expect to just go in the full on docs. So there are some examples of the syntax, but not an example of actually using the package, which is what I would expect in a README.
  • Link to your documentation website.
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem. - not present in README, but could be done by just linking to https://python-graphblas.readthedocs.io/en/stable/getting_started/faq.html#what-is-the-relationship-between-python-graphblas-and-pygraphblas
  • Citation information - there is a DOI and a zenodo repo, but I would also strongly recommend adding a CITATION file: https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.

In trying to summarize what the functional claims of the software were, I think it basically boils down to

For packages also submitting to JOSS

N/A

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:
(final estimate)
~14h


Review

General Comments

python-graphblas is an exemplary package among scientific python packages, with excellent docs, tests, code quality, examples, and contributor engagement. The package is an opinionated wrapper and interface to GraphBLAS, which is well justified and differentiated from prior wrappers. Throughout the review process, the authors have been receptive to feedback and I have all faith that they will address any continuing suggestions I have in future development work. I have no hesitation recommending this package to anyone who wants to use GraphBLAS from Python.

My outstanding recommendations in the remaining open issues are all future suggestions that the authors can take or leave, none are mission-critical. I want to compliment the authors on this excellent work, I'm glad to have had a reason to have read it. I would be happy to respond to any questions the authors have about this review and otherwise continue to engage on open issues.

Code Quality

Library wrappers have their own sets of challenges and idioms, and python-graphblas handles them well with some room for future improvement. Wrapping code needs to make decisions about how to map the underlying API between languages and interact with the C library. The strategy used to abstract and expose the underlying library API can have large impacts on the maintainability and readability of the code, as can strategies for keeping the version of the wrapping package in sync with the changing API of the wrapped library.

python-graphblas further complicates these choices by admirably taking on the challenge of having multiple backend implementations (numpy and SuiteSparse:GraphBLAS), and multiple export formats (NetworkX, Scipy.Sparse, PyData.Sparse, numpy, SuiteSparse, Matrix Market).

The python-graphblas team has chosen a bifurcated abstraction style which fits with the design of their wrapper. Rather than a transparent wrapper, the package introduces its own calling syntax to logically separate the i/o parts of a GraphBLAS call from the computation, which is well described and justified in the documentation.

At the time of reviewing,

  • the basic container types are implemented as classes within the core module (eg core.vector.Vector)
  • operators implemented as separate modules within the main module, each with separate submodules for numpy and suitesparse implementations (eg. graphblas.unary)
    • ss implementations are actually implemented in the core.operator module, which then makes the symbols available in the top-level module namespace as a side effect of importing, eg. (graphblas.unary.ss)
  • the core module also implements the other syntax elements like infix operators that are used throughout the package.

This structure makes for a nice user-facing interface, being able to refer to classes and methods in a natural way (eg. gb.Vector, gb.unary.abs), obscuring the underlying code structure.

The tradeoff is the significant amount of indirection and implicitness in the code which presents a relatively high barrier to entry for new contributors. The mappings between the suitesparse GraphBLAS implementation and Python are programmed in several places, eg. within a method and then again in the tests, which the authors describe as being useful as a consistency check. Having some parts of the library dependent on import time side-effects is less than optimal and makes the code more difficult to reason about, but is certainly not fatal to the usability of the package. Again, the nature of a wrapping package requires making decisions about abstraction, so the only concern I have for the current code structure is the impact on maintainability. The current maintainers seem to have no trouble reasoning about the package, and I believe they are aware of these challenges and are actively working on them (they refactored a formerly massive operator file during the course of the review (python-graphblas/python-graphblas#420)). The maintainers can do themselves and future contributors a favor by writing some additional developer docs that explain the structure of the library in greater detail, and I believe they will!
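To make the import-time side-effect concern concrete, here is a toy sketch of the general pattern. It is not python-graphblas's actual code: the `toy_unary` namespace and the `register` helper are hypothetical names made up for illustration.

```python
import types

# Hypothetical stand-in for a public namespace such as a package's
# "unary" module. None of these names come from python-graphblas.
toy_unary = types.ModuleType("toy_unary")


def register(namespace, name, func):
    """Attach func to namespace under name and return it."""
    setattr(namespace, name, func)
    return func


# In a real package, calls like these would run when some other "core"
# module is imported, so the attributes exist only after that import.
register(toy_unary, "abs", abs)
register(toy_unary, "negate", lambda x: -x)

abs_result = toy_unary.abs(-3)
negate_result = toy_unary.negate(2)
```

A reader who opens the source of the public namespace will not find `abs` defined there, because it was attached from elsewhere. That is the kind of indirection that additional developer docs could explain.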

The other question I had was about the relationship between versioning in the API implementation in Python and in the underlying C library. GraphBLAS is a relatively mature API and seems to be only rarely changed, with care for backwards compatibility, so this is less of a concern than for a wrapper around a more actively evolving API. The authors have chosen to formally support a specific version of GraphBLAS ( python-graphblas/python-graphblas#441 ) rather than build version-compatibility infrastructure within the package, which seems like an appropriate decision to me, given their other comments on how refactoring their backend system is on their development roadmap.

I want to also emphasize several of the "nice to have" features in the package, including a very impressive Recorder class that can keep a record of all the GraphBLAS calls made for debugging and reproducibility, automatic compilation of user-supplied functions including lambdas using numba, and the excellent i/o module. These indicate the authors are actively invested in user experience above and beyond the tidy API they expose.

Aside from the above comments, the fundamentals of the package design are strong: modern packaging with setuptools, automated code quality checks, CI, and coverage reports. The decisions made in code structure seem principled, responsive to the constraints of the problem, and result in a very usable user interface - my compliments to the Authors.

Docs

The package is well documented, from an introduction to the problem that GraphBLAS attempts to solve, through package design decisions, to practical examples. It is suitable for a general audience with some exposure to graph operations and programming in Python, which is impressive given the highly specialized nature of the library.

Future suggestions for the authors include embedding their example notebooks in the documentation, and improving the API documentation. Currently, I assume due to some of the abstraction decisions made in the rest of the package, there is not comprehensive documentation of every operation available to the user, as might otherwise be accomplished by calling autodoc over a class or a module, but some common operators are listed here and the list of available operators is present in the GraphBLAS documentation as well as through object inspection within an interactive Python session. Given the nature of wrapping code where the underlying operations are well documented elsewhere, this is less of a problem than it would be in other packages.

Altogether the docs are excellent with several clear points of improvement, but far above average in the landscape of scientific python packages.

Tests

Who among us can claim 100% test coverage? https://coveralls.io/github/python-graphblas/python-graphblas

The tests are well organized and comprehensive, and I was able to find a corresponding test for every package feature I looked at easily. In more security-sensitive contexts one would want to do more adversarial input fuzzing, but I don't think that's all that relevant here since I've never seen graph data analysis libraries used as a malware vector. I have no notes on the tests, this is good work.

Issues Opened:

@tomalrussell
Collaborator

Thanks both, great start 💯 - I'll aim to check in occasionally or as needed.

Could you please clarify if the authors are expected to follow all the above checklist items to the letter? (E.g., all mentioned badges mandatory, links to all "vignettes" from the README, all functions have examples of use, etc.)

I would comment on anything you notice and let the authors respond, we can always exercise judgment if "to the letter" seems unhelpful.

@lwasser
Member

lwasser commented Mar 8, 2023

hey y'all. We normally prefer that reviews happen "all at once" in the sense that we prefer the text of the review to NOT change once submitted and the conversation to happen after. I have reached out to JOSS about how they implement their reviews but I don't want to change our policy on how reviews happen (submitted all at once) until i've spoken to JOSS. @sneakers-the-rat @szhorvat it's fine if you want to leave the review text and check things off and open issues as you go but i prefer that the text of the review that you add be added all at once to avoid any confusion regarding when your review is complete and what the maintainer of the package should focus on. Many thanks for understanding. I will update once i hear back from JOSS but i don't want to modify our process on the fly until we've thought things through more completely.

many thanks for your time y'all!

@lwasser
Member

lwasser commented Mar 8, 2023

one other note - i think it's great to open issues as you go but one other element that is important is documentation of what changed and why so there is a full record of the review so please if you open issues be sure to reference them in the text of the review in the context of why you opened them. that will allow the editor (and us as an organization) to keep track of the review in one place. again many thanks! we are learning as an organization in this process

@sneakers-the-rat
Collaborator

fair enough @lwasser :)
typically the way it works at JOSS is that you'll be opening issues on the repo and then linking to the review issue (this one) so that basically the review issue serves as a timeline of changes and discussion for the review (all edits to the review checklist comment are also logged). Happy to also do the inverse (link back to opened issues) as well.

So - for this review, will only post text of review when checklist finished, is that what you had in mind?

@szhorvat

szhorvat commented Jun 26, 2023

Thanks for the patience everyone.

tl;dr This is a very nice package, technically sound, with a well-thought-out Pythonic interface. My perception was that in order to bring GraphBLAS to Python users (and thus fully realize its promise of implementing graph algorithms in terms of high-level building blocks), what we need the most is Python-centric documentation and training materials.

Introduction

First of all, let me say that python-graphblas is a well thought out and technically solid package. It provides a Pythonic interface to the C-based GraphBLAS API, currently supporting SuiteSparse:GraphBLAS (which is the only usable GraphBLAS implementation available at the moment).

The review should concern python-graphblas, but it's impossible to talk about it without also discussing GraphBLAS itself. And here I must make it clear that I am a newcomer to GraphBLAS. GraphBLAS is a standardized C API aiming to provide high-level yet general building blocks for graph algorithms, similarly to how BLAS and LAPACK provide building blocks for linear algebra. It does this through some very elegant math, employing the language of linear algebra and abstract algebra, generalizing the usual (+, *) operations to a wider class of semirings. For example, with the usual (+, *) matrix product the kth power of a graph's adjacency matrix gives the number of walks of length k between any two vertices. Replacing + by min and * by + will give us shortest path lengths using up to k edges instead. What is not yet clear to me is where the limits of this approach are: Are there common graph concepts that cannot be expressed in this framework? Are there some which can be expressed, but the most efficient algorithms to compute them can't be expressed with GraphBLAS?
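The semiring swap described above can be sketched in plain Python. This is only an illustration of the math, not the python-graphblas API: the helper `semiring_matmul` and the small example graph are made up here, and a real GraphBLAS implementation would use sparse storage rather than nested lists.

```python
import math


def semiring_matmul(A, B, add, mul, zero):
    """Multiply square matrices A and B over the semiring (add, mul).

    `zero` is the additive identity of the semiring: 0 for (+, *),
    infinity for (min, +).
    """
    n = len(A)
    C = [[zero] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = zero
            for k in range(n):
                acc = add(acc, mul(A[i][k], B[k][j]))
            C[i][j] = acc
    return C


INF = math.inf

# Directed graph with edges 0 -> 1, 1 -> 2 and a shortcut 0 -> 2.
adjacency = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

# Same graph with edge weights 1, 1 and 5. INF marks a missing edge.
# The zero diagonal lets a path "stay put", which is what turns the
# result into shortest paths using UP TO k edges.
weights = [
    [0, 1, 5],
    [INF, 0, 1],
    [INF, INF, 0],
]

# Usual (+, *) product: entry (i, j) counts walks of length 2.
walks2 = semiring_matmul(
    adjacency, adjacency, lambda a, b: a + b, lambda a, b: a * b, 0
)

# (min, +) product: entry (i, j) is the shortest path with <= 2 edges.
dist2 = semiring_matmul(weights, weights, min, lambda a, b: a + b, INF)

print(walks2[0][2])  # 1: the single two-step walk 0 -> 1 -> 2
print(dist2[0][2])   # 2: going via vertex 1 beats the direct edge of weight 5
```

The same triple loop serves both computations; only the pair of operations and the identity element change. That is the sense in which GraphBLAS generalizes the matrix product.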

Documentation and target audience

python-graphblas is realizing one of the major advantages of the GraphBLAS approach: it is now possible to implement graph algorithms in a convenient and high-level language like Python without the need for explicit loops, and thus without a significant loss of performance. Indeed, the python-graphblas authors have also created the graphblas-algorithms package, which achieves performance comparable to C/C++ libraries while being written in pure Python. (To be fair, it's not clear to me how much of the performance comes from parallelization, but the benchmarks are impressive.)

This brings me to a point about how python-graphblas is presented. This is the first paragraph in the docs:

python-graphblas is a Pythonic interface to the highly performant SuiteSparse:GraphBLAS library for performing graph analytics in the language of linear algebra.

This gives the impression that python-graphblas is just an interface to a C library, perhaps even aimed at people who already understand that library. I think this is backwards: if one of the main GraphBLAS advantages is usability from high-level languages, then the target audience should be users of such languages. It seems to me that a project like this should spend at least as much effort on good documentation and training materials as on technical bindings. And this is precisely the weak point of python-graphblas. It is impossible to learn the system without referring to external material that describes the C API. In fact the competing package pygraphblas seems to be doing a little bit better on this front, though still not well enough for users to be able to avoid C-based documentation.

My main recommendation---and I realize that this is a long-term project---is making significant improvements to the documentation with the pure-Python user in mind.

In the intro:

  • Explain how this package is different from graph libraries like igraph or NetworkX. Make it clear that it is not at all an alternative to them.
  • Who is the target audience? Algorithm implementors, not people looking for existing algorithms.

More in depth:

  • Explain the basics of GraphBLAS. There's a very good starting point here.
  • Sufficient documentation for people to learn to work with GraphBLAS without ever leaving Python.
  • This includes better API docs as well, e.g. the discoverability of available operators and semirings.

Interoperability

In addition to the technically sound and well thought out GraphBLAS interface, the package contains functionality for interoperability with graph theory / network analysis tools, as well as some visualization tools. The quality of these is in need of some improvement:

I opened some visualization issues, python-graphblas/python-graphblas#473 python-graphblas/python-graphblas#474 python-graphblas/python-graphblas#475

All this said, interoperability and visualization are not core functionality. As I see it, they are for convenience, and the package should not be judged based on these.

@tomalrussell
Collaborator

Thanks for the above, @sneakers-the-rat and @szhorvat !

@eriknw @jim22k @SultanOrazbayev - recognising that you've been engaged with the review already, and that some things may be pushed to a longer timeframe, are you happy to respond and make any priority changes by the end of the week (30th June)?

@eriknw
Author

eriknw commented Jul 11, 2023

I applaud the reviewers 👏 . I never expected such thorough and honest reviews. Thank you for the positive feedback and the criticisms. I agree with all of it--I think you nailed both the strengths and the weaknesses. The reviews are valuable and I suspect will help shape our vision and effort for the next couple of years.

In particular, there are two main areas we need to give more attention:

  • documentation
  • maintainability

Heh, this is probably true for many projects. My main focus in the near and medium term will be maintainability.

Now if I may ramble on for a bit...

It's interesting that reviews occur at a specific moment in time. I know the history of python-graphblas, and, trust me, it has evolved significantly in every 6 month period of its 3.5 year lifetime. If we had been writing pristine user-facing documentation from the beginning, it would have needed to be rewritten and revised endlessly. Functionality would probably be 1.5-2 years behind where we are today. If you're curious, go back and look at versions around 1.3.8 to 1.3.14. It's recognizable, but so different and missing so, so much!

Anyway. I want to highlight this comment:

if one of the main GraphBLAS advantages is usability from high-level languages, then the target audience should be users of such languages

Absolutely! I agree 100%. We aspire to this. It will take time.

From a product perspective, I/we wanted to get the syntax and functionality stable enough to begin writing graphblas-algorithms in earnest. We are targeting networkx users and are adding dispatching to networkx.

Oh, and if you're curious why our test coverage is so high, it's for multiple reasons:

  • I was considering Cythonizing everything and moving it to cygraphblas, and thorough coverage would give me more confidence
    • I eventually dropped this idea, b/c it would be too great a barrier for maintainability (also, coverage now supports Cython)
  • Similarly, we still plan to further develop a parallel, Dask-ified version of this library that will use the same tests.
  • For greater confidence when adding a new implementation, or to help test a new implementation of GraphBLAS
  • It's simply a good idea, especially for a package as complicated as ours!

Wrapping up... I think we have replied to all open issues from the review. They will keep us busy, that's for sure.

@jim22k @SultanOrazbayev want to say anything else? I don't think it's necessary for you to comment here.

Thanks again all, hope to see you around ❤️ !

@tomalrussell
Collaborator

Thanks @eriknw (also for the lovely tone and for the background and bit of history)!

@szhorvat and @sneakers-the-rat for complete clarity, can you confirm you're happy with responses?

@szhorvat

Yes. The suggestions I made are mostly for the long term.

@sneakers-the-rat
Collaborator

Same, full approval from me :)

@tomalrussell
Collaborator

tomalrussell commented Jul 14, 2023

Thanks all, time and effort very much appreciated. The review process is done!

All that's left is to wrap up, publish the version of record and acknowledge all your contributions.


🎉 python-graphblas has been approved by pyOpenSci! Thank you @eriknw for submitting python-graphblas and many thanks to @szhorvat and @sneakers-the-rat for reviewing this package! 😸

Author and Reviewer Wrap-Up Tasks

There are just a few things left to do to wrap up this submission, @eriknw, @jim22k, @SultanOrazbayev:

  • Activate Zenodo watching the repo if you haven't already done so.
  • Tag and create a release to create a Zenodo version and DOI.
  • Add the badge for pyOpenSci peer-review to the README.md of python-graphblas. The badge should be [![pyOpenSci](https://tinyurl.com/y22nb8up)](https://github.com/pyOpenSci/software-review/issues/81).
  • Add python-graphblas to the pyOpenSci website. Please open a PR to update this file to add your package and name to the list of contributors.

Both reviewers and maintainers (@sneakers-the-rat, @szhorvat too):

  • Please fill out the post-review survey. All maintainers and reviewers should fill this out.
  • Reviewers and maintainers, if you have time and are open to being listed on our website, please add yourselves to this file via a PR so we can list you on our website as contributors!

Editor Final Checks

Please complete the final steps to wrap up this review. @tomalrussell, please do the following:

  • Make sure that the maintainers filled out the post-review survey
  • Invite the maintainers to submit a blog post highlighting their package. Feel free to use / adapt language found in this comment to help guide the author.
  • Change the status tag of the issue to 6/pyOS-approved6 🚀🚀🚀.

If you have any feedback for us about the review process please feel free to share it here. We are always looking to improve our process and documentation in the peer-review-guide.

@tomalrussell
Collaborator

@eriknw and team, I'd also like to invite you to write a blog post on your package for us to promote your work. If you are interested - here are a few examples of other blog posts:

This can be a really high-level motivation for the package, for a slightly-scientific-Python-user-audience, or could draw on your introductory tutorial material to get straight to what the package does.

This is totally optional and not a requirement, but if you have time, we'd love to spread the word about python-graphblas to pyOpenSci blog readers ☺️

@sneakers-the-rat
Collaborator

did the post-review survey and submitted contributors.yml patch :)

@lwasser
Member

lwasser commented Jul 27, 2023

Friends - I believe this issue can be closed!! If it should be opened please just reopen or let me know! Congratulations on a successful review and thank you everyone for participating in our pyOpenSci review process! I am so appreciative of you all!! ✨

@lwasser lwasser closed this as completed Jul 27, 2023
@eriknw
Author

eriknw commented Jul 27, 2023

w00t! Also, we'd be happy to write a short blog post :)

SultanOrazbayev added a commit to SultanOrazbayev/pyopensci.github.io that referenced this issue Jul 27, 2023
This adds the package details for python-graphblas. This to-do was marked as completed in the main thread here: pyOpenSci/software-submission#81, but I didn't see the update in yaml. Did I miss it? If so, kindly disregard.

@eriknw
@sneakers-the-rat sneakers-the-rat mentioned this issue Mar 8, 2024
30 tasks
Projects
Status: pyos-accepted

7 participants