Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BioCypher submission #110

Closed
20 of 30 tasks
slobentanzer opened this issue May 4, 2023 · 49 comments
Closed
20 of 30 tasks

BioCypher submission #110

slobentanzer opened this issue May 4, 2023 · 49 comments

Comments

@slobentanzer
Copy link

slobentanzer commented May 4, 2023

Submitting Author: Sebastian Lobentanzer (@slobentanzer)
All current maintainers: (@slobentanzer)
Package Name: biocypher
One-Line Description of Package: framework for creating biomedical knowledge graphs
Repository Link: https://github.com/biocypher/biocypher
Version submitted: 0.5.14
Editor: @arianesasso
Reviewer 1: @Zethson
Reviewer 2: @Midnighter
Reviewer 3: Sinna
Archive: DOI
Version accepted: v0.5.33
JOSS DOI: N/A
Date accepted (month/day/year): 11/14/2023


Code of Conduct & Commitment to Maintain Package

Description

  • Include a brief paragraph describing what your package does:
    BioCypher is a flexible and modular framework for the creation of knowledge graphs on the basis of ontologies, yielding a number of representations of the input knowledge sources. Its aim is to increase accessibility of biomedical knowledge graphs to a wide range of biomedical researchers.

Scope

  • Please indicate which category or categories.
    Check out our package scope page to learn more about our
    scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization1
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

Domain Specific & Community Partnerships

- [ ] Geospatial
- [ ] Education
- [ ] Pangeo

Community Partnerships

If your package is associated with an
existing community please check below:

  • For all submissions, explain how the and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):

    • Who is the target audience and what are scientific applications of this package?

    • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

    • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • uses an OSI approved license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a tutorial with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration setup, such as GitHub Actions CircleCI, and/or others.

Publication Options

JOSS Checks
  • The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
  • The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
  • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
  • The package is deposited in a long-term repository with the DOI:

Note: Do not submit your package separately to JOSS

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Confirm each of the following by checking the box.

  • I have read the author guide.
  • I expect to maintain this package for at least 2 years and can help find a replacement for the maintainer (team) if needed.

Please fill out our survey

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

Footnotes

  1. Please fill out a pre-submission inquiry before submitting a data visualization package.

@lwasser
Copy link
Member

lwasser commented May 12, 2023

hi @slobentanzer thank you for this submission and welcome to pyOpenSci! I just wanted to say hello 👋 and to let you know that we see this submission. We will get back to you with the start of our initial checks next week! In the meantime, have a wonderful weekend. we will be in touch. ✨

@NickleDave
Copy link
Contributor

NickleDave commented May 16, 2023

Welcome to pyOpenSci @slobentanzer and thank you for your patience.

I'm happy to report that biocypher passes almost all of the initial editor checks.

There is one thing we need you to take care of before we start the review:

  • README The package has a README.md file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.

Please add the following to the README:

  • a clear explanation of what the package does, such that it could be understood by a general scientific audience.
    • What is a knowledge graph? How are they used? This can be brief with a link to a review or wikipedia article to supplement, that is understandable even to people that don't already work with knowledge graphs
    • It would really help to have a precise, succinct definition of what the package does and how it does it; use verbs. "Allows users to construct knowledge graphs from heterogeneous biomedical data, using adapters written in Python". This should ideally be the first thing in the README. The description that comes a bit lower has a helpful figure, but the language makes it sound very abstract without any detail on what or how, "this is our proposal for a framework..."
  • instructions on how to install: this can be simple, pip install biocypher and if there is a package on a community conda channel, conda install biocypher -c conda-forge (or bioconda or whatever)

Please let me know once you have addressed this requirement.

The filled-out template with checks follows.

Editor in Chief checks

Hi there! Thank you for submitting your package for pyOpenSci
review. Below are the basic checks that your package needs to pass
to begin our review. If some of these are missing, we will ask you
to work on them before the review process begins.

Please check our Python packaging guide for more information on the elements below.

  • Installation The package can be installed from a community repository such as PyPI (preferred), and/or a community channel on conda (e.g. conda-forge, bioconda).
    • pip install biocypher works. There is not a conda-forge package as far as I can tell
    • The package imports properly into a standard Python environment import package-name.
  • Fit The package meets criteria for fit and overlap.
  • Documentation The package has sufficient online documentation to allow us to evaluate package function and scope without installing the package. This includes:
    • User-facing documentation that overviews how to install and start using the package.
    • Short tutorials that help a user understand how to use the package and what it can do for them.
    • API documentation (documentation for your code's functions, classes, methods and attributes): this includes clearly written docstrings with variables defined using a standard docstring format. We suggest using the Numpy docstring format.
  • Core GitHub repository Files
    • README The package has a README.md file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.
      • There is a README but it does not have an explanation of what the package does or install instructions8
    • Contributing File The package has a CONTRIBUTING.md file that details how to install and contribute to the package.
    • Code of Conduct The package has a Code of Conduct file.
    • License The package has an OSI approved license.
      NOTE: We prefer that you have development instructions in your documentation too.
  • Issue Submission Documentation All of the information is filled out in the YAML header of the issue (located at the top of the issue template).
  • Automated tests Package has a testing suite and is tested via GitHub actions or another Continuous Integration service.
  • Repository The repository link resolves correctly.
  • Package overlap The package doesn't entirely overlap with the functionality of other packages that have already been submitted to pyOpenSci.
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly.
    • there is already a pre-print associated with the package, see link in README
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

  • Initial onboarding survey was filled out
    We appreciate each maintainer of the package filling out this survey individually. 🙌
    Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. 🙌


Editor comments

None of these are required but are just things I noted during the checks:

  • In addition to adding the required clear explanation and install instructions to the README, you might consider adding more info from the index of the docs; there is a lot of very useful info / context there. For more on good practices for READMEs, see https://www.pyopensci.org/python-package-guide/documentation/repository-files/readme-file-best-practices.html
  • It's a bit confusing that the install instructions (1) recommend a specific development tool and show adding the package as a dependency with that tool, and (2) use pip inside a conda environment. Typically one might show pip install package for the first and conda install package -c conda-forge for the second
  • I noticed the package defaults to logging at the INFO level and writing a log file on import. Not sure that everyone will want that by default, especially if they are on a shared system

@slobentanzer
Copy link
Author

Hi @NickleDave,
thanks for the feedback, I added the requested details to the README.

Regarding your comments, here's my opinion:

  • I prefer not having dual maintenance of content, i.e., I do not want to keep a readme and the docs up to date simultaneously; so I am not in favour of packing things from the docs into the readme, because they need to be kept in sync in this case. What is the benefit of having it in the README, if users can just read the docs? BTW, that was also the reason that the docs were the first section of the README, as opposed to the description as suggested by you.
  • I always use pip inside conda envs and did not know this was unusual. Happy for suggestions as to how to make that section clearer or more aligned with good practise.
  • I am new to Python and find logging quite confusing; happy for suggestions (or a pointer to a best practise resource) here as well.

@NickleDave
Copy link
Contributor

Thank you @slobentanzer for adding those details to the README. Looks great!

What is the benefit of having it in the README, if users can just read the docs?

I hear you. I also do not like the idea of duplicating content in the README that can go out of sync.
One way around this is to have the single source of truth be your README, and then get that into your docs by using an include directive in RST or in MyST

What is the benefit of having it in the README, if users can just read the docs?

There is an argument to be made that developers hyper-fixate on "Awesome" READMEs, mainly to impress other developers. Still, you don't want to lose a user because they found your GitHub repo first, and they couldn't figure out what your project really does, so they gave up without bothering to read the docs. What is the killer feature and what does it look like in code? If you can express that in one sentence and a single snippet with a few lines of code, at the top of both your README and on the index page of your docs, then you know you're making the best case for your package to new users. Of course that's not possible for every package, but seems like a good general goal to aim for.

I always use pip inside conda envs and did not know this was unusual. Happy for suggestions as to how to make that section clearer or more aligned with good practise.

I am not a conda expert but my understanding is that it's good practice to prefer conda install over pip install, to be able to reproduce environments. Basically because pip does things that conda doesn't do, or that conda doesn't know about. Maybe @jjhelmus can let us know if the advice in this blog post is still up-to-date:
https://www.anaconda.com/blog/using-pip-in-a-conda-environment

I am new to Python and find logging quite confusing; happy for suggestions (or a pointer to a best practise resource) here as well.

I am less new to Python but I still find logging confusing 😭

I'd suggest a quote from Hitchhiker's guide to Python is relevant here:

the user, not the library, should dictate what happens when a logging event occurs

https://docs.python-guide.org/writing/logging/#logging-in-a-library

For that reason I would avoid forcing a logging file and messages on your users.

I have looked several times but have never found a guide to logging that I felt answered my questions.

You probably know these already but:

The main question I had as a library developer is how to use the logger in my modules, and how should I configure it. Through trial and error I have found it's "good enough" practice to make a new logger instance in each module, assigning it __name__ as its name (as you will read in most guides). I have also found that if I do want to configure those logger instances, it's easiest to achieve this through a function I call that checks if there are already existing Handlers attached, so that I don't stomp on a user's existing configuration. See here: vocalpy/vak#535

In my experience is that most scientific Python libraries do not include logging. There are exceptions I can think of:

You might find reading the code of those libraries helpful

@NickleDave
Copy link
Contributor

Hope I did not just overwhelm you with advice.
I will find an editor and we will proceed with the review.
I know I got off to a bit of a slow start--thank you for your patience.

@NickleDave
Copy link
Contributor

Hi again @slobentanzer, happy to inform you that @arianesasso will be editor for this review.
We are seeking appropriate reviewers now and she will tag them in once we have found them.

@arianesasso arianesasso self-assigned this Jun 1, 2023
@arianesasso
Copy link

Hello @slobentanzer 👋🏼 . Thanks for the intro @NickleDave :). It is my pleasure to review BioCypher! We are now in the process of finding reviewers for the package. Hopefully, depending on their response, this should take one to two weeks. I will keep you posted on this thread, and if you have any questions, feel free to reach out!

@arianesasso
Copy link

Dear @slobentanzer, finding reviewers is taking a bit longer than expected. I hope that is ok. We might have someone already but he would only be able to start the review in a few weeks. Would that be acceptable?

@slobentanzer
Copy link
Author

Hi @arianesasso,
sure, I have no immediate time pressures.

@arianesasso
Copy link

@slobentanzer Great! Sorry for the delay, but now we have reviewers for BioCypher 🥳 .

@Zethson will be our first reviewer and should be starting his review soon.

@Zethson Thank you so much for your contribution! Please, check our reviewer's guide, the reviewer template, and the Python packaging standards detailed in the packaging guide. Also, please reach out in case you have any questions. Happy reviewing!

@arianesasso
Copy link

Dear @slobentanzer, we now have our second reviewers 🥳 !

@Midnighter will be the second reviewer, together with his colleague Sinna. She will cover the domain knowledge while he will look more on the Python side. We ask you for a bit of patience since now people are mainly on Summer vacation, so the review might take a bit longer.

@Midnighter Thank you so much to both of you for agreeing to review BioCypher! Please, check our reviewer's guide, the reviewer template, and the Python packaging standards detailed in the packaging guide. Also, please don't hesitate to reach out if you have any questions. Happy reviewing!

@arianesasso
Copy link

arianesasso commented Jul 13, 2023

@Zethson @Midnighter (and Sinna), before starting to review, please fill out our pre-review survey. This helps us improve all aspects of our review and better understand our community. No personal data will be shared from this survey - it will only be used in an aggregated format by our Executive Director to improve our processes and programs.

The following resources will help you complete your review:

  1. Here is the reviewers guide. This guide contains all of the steps and information needed to complete your review.
  2. Here is the review template that you will need to fill out and submit
    here as a comment once your review is complete.

Please get in touch with any questions or concerns! Your review is due: 31 August 2023.


Reviewers: @Zethson and @Midnighter (and Sinna)
Due date: 31 August 2023

@Zethson
Copy link

Zethson commented Jul 13, 2023

Completed the survey.

@slobentanzer
Copy link
Author

Thanks @Zethson and @Midnighter / Sinna for reviewing! Looking forward to your comments. :)

@Midnighter
Copy link

FYI I will be on vacation until Aug 8. So I won't be active until then.

@Zethson
Copy link

Zethson commented Jul 21, 2023

Lukas (zethson) review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work.

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README.
  • Installation instructions: for the development version of the package and any non-standard dependencies in README.
  • Vignette(s) demonstrating major functionality that runs successfully locally.
  • Function Documentation: for all user-facing functions.
  • Examples for all user-facing functions.
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for:
    • [] Continuous integration and test coverage,
    • [] Docs building (if you have a documentation website),
    • A repostatus.org badge,
    • Python versions supported,
    • Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

  • Short description of package goals.
  • Package installation instructions
  • Any additional setup required to use the package (authentication tokens, etc.)
  • Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
    • Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  • Link to your documentation website.
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
  • Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

  • Package documentation is clear and easy to find and use.
  • The need for the package is clear
  • All functions have documentation and associated examples for use
  • The package is easy to install

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests:
    • All tests pass on the reviewer's local machine for the package version submitted by the author. Ideally this should be a tagged version making it easy for reviewers to install.
    • Tests cover essential functions of the package and a reasonable range of inputs and conditions.
  • Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.
    A few notable highlights to look at:
    • Package supports modern versions of Python and not End of life versions.
    • Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:


Review Comments

Hey @slobentanzer !

Really amazing work with Biocypher and it really shows that you're trying hard and succeeding with creating a package that is maintainable, well documented and works. I have a few random comments that you can consider addressing:

  1. https://biocypher.org/tutorial.html suggests to install biocypher from source with poetry. I think that this is "noise" here and that all tutorials should always work with the latest version on pypi.
  2. Nit: On the tutorial site terms like "DBMS instance" are not introduced.
  3. Am I wrong or are the Pandas versions of the tutorials not rendered?
  4. API documentation is somewhat "hidden" under "submodule documentation". "API documentation" or something of that sort is probably more common. I also feel like it mostly shows private functions (based on underscores) but not really public ones?
  5. I'm having trouble finding a CI that runs all tests. Am I looking at the wrong places?

Thank you very much!

@slobentanzer
Copy link
Author

Hi @Zethson, huge thanks for taking the time to review the package! It is accurate. I will try to address the missing points and comments as time allows. Here are some explanations, feel free to suggest ways to improve the current state:

  1. The reason we recommend to clone and install the package locally is not because of code dependence, but rather because the tutorial files are included in the repo, but not the package. Do you (or another person reading this thread) have another solution for making the tutorial code available to the user in the simplest way?
  2. Will go over the docs and explain all abbreviations soon.
  3. For me it is. (The first couple functions for setup do not have much of an output.) Could be something to do with how the colab works (each user has their own version, maybe)? I don't usually work with colab, so feel free to suggest improvements.
  4. I am using the Sphinx autodoc module for rendering the function documentation. There is already an issue for making it work for all modules, but I have not had time to delve deep into Sphinx functionality to figure this out. We do only have a fairly small amount of user-facing functions in the first place, which are explained in the tutorials. I could maybe add a manual explanation of the most important functions in quickstart, but that would not be automatic any more.
  5. You are right, there is no action for CI. I have been doing it manually so far due to lack of time to establish a workflow, but I agree that this takes high priority. One issue is that some tests depend on DBMSs running on the test machine, so this is not a straightforward setup. It is on the list.

@Zethson
Copy link

Zethson commented Jul 21, 2023

Hi @Zethson, huge thanks for taking the time to review the package! It is accurate. I will try to address the missing points and comments as time allows. Here are some explanations, feel free to suggest ways to improve the current state:

1. The reason we recommend to clone and install the package locally is not because of code dependence, but rather because the tutorial files are included in the repo, but not the package. Do you (or another person reading this thread) have another solution for making the tutorial code available to the user in the simplest way?

I'd move all tutorials into notebooks and add them to a biocypher-tutorials repository. The tutorials themselves can then download data as needed. Ideally, you could also offer colab/binder instances to run these notebooks then. This may require custom containers and will generally require substantial effort. You could also start by adding "download notebook" buttons to the tutorials.

3. For me it is. (The first couple functions for setup do not have much of an output.) Could be something to do with how the colab works (each user has their own version, maybe)? I don't usually work with colab, so feel free to suggest improvements.

I'll look again.

4. I am using the Sphinx autodoc module for rendering the function documentation. There is already an [issue](https://github.com/biocypher/biocypher/issues/183) for making it work for all modules, but I have not had time to delve deep into Sphinx functionality to figure this out. We do only have a fairly small amount of user-facing functions in the first place, which are explained in the tutorials. I could maybe add a manual explanation of the most important functions in [quickstart](https://github.com/biocypher/biocypher/issues/219), but that would not be automatic any more.

Yeah, you're explaining them in the tutorials, but API documentation for all public functions is imperative IMO.

5. You are right, there is no action for CI. I have been doing it manually so far due to lack of time to establish a workflow, but I agree that this takes high priority. One issue is that some tests depend on DBMSs running on the test machine, so this is not a straightforward setup. It is on [the list](https://github.com/biocypher/biocypher/issues/218).

I can totally see how this is a lot of work, but I really feel like it will be time well spent. This will heavily improve maintainability and the ability for other people to contribute.

Thanks!

@slobentanzer
Copy link
Author

Thanks for the feedback, will address and update here. :)

@Midnighter
Copy link

Hi everyone,

Just to update you, that we started going through the tutorials and looking at the codebase, but won't be able to complete the review today. Our plan is to do so on Friday.

@arianesasso
Copy link

arianesasso commented Aug 16, 2023

Thank you for the update @Midnighter ! I will change the deadline then :). Please let me know in case you have any questions. Also, please answer our survey in case you haven’t.

@Midnighter
Copy link

Moritz & Sinna's Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README.

  • Installation instructions: for the development version of the package and any non-standard dependencies in README.

  • Vignette(s) demonstrating major functionality that runs successfully locally.

  • Function Documentation: for all user-facing functions.

    We don't know if all functions are covered, but there is an API reference.

  • Examples for all user-facing functions.

    In terms of docstrings, they do not contain examples. However, there are extensive tutorials that seem sufficient.

  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.

  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

    The metadata mention the two major contributors.

Readme file requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for:
    • Continuous integration and test coverage,
    • Docs building (if you have a documentation website),
    • A repostatus.org badge,
    • Python versions supported,
    • Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

  • Short description of package goals.

  • Package installation instructions

  • Any additional setup required to use the package (authentication tokens, etc.)

  • Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.

    • Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  • Link to your documentation website.

  • If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.

  • Citation information

    There are links to the pre-print and the published paper. We think, this could be improved by adding a CITATION.cff to the repository.

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

  • Package documentation is clear and easy to find and use.
  • The need for the package is clear
  • All functions have documentation and associated examples for use
  • The package is easy to install

Functionality

  • Installation: Installation succeeds as documented.

    Succeeded on Linux with just pip install biocypher. Failed when installing with poetry as per the developer guide. Installing neo4j causes an error:

    ImportError
    
    cannot import name 'Python2Supports' from 'virtualenv.create.describe' (/home/ubuntu/.pyenv/versions/3.10.12/envs/biocypher/lib/python3.10/site-packages/virtualenv/create/describe.py)
    
    at ~/.pyenv/versions/3.10.12/envs/biocypher/lib/python3.10/site-packages/virtualenv/create/via_global_ref/builtin/python2/python2.py:6 in <module>
          2│ import json
          3│ import os
          4│ from pathlib import Path
          5│
      →   6│ from virtualenv.create.describe import Python2Supports
          7│ from virtualenv.create.via_global_ref.builtin.ref import PathRefToDest
          8│ from virtualenv.info import IS_ZIPAPP
          9│ from virtualenv.util.zipapp import read as read_from_zipapp
    
  • Functionality: Any functional claims of the software been confirmed.

  • Performance: Any performance claims of the software been confirmed.

  • Automated tests:

    • All tests pass on the reviewer's local machine for the package version submitted by the author. Ideally this should be a tagged version making it easy for reviewers to install.

    • Tests cover essential functions of the package and a reasonable range of inputs and conditions.

      The coverage is good. We did not go into detail of checking the reasonableness of each individual unit test.

  • Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)

    There is a GH workflow that runs the tests. However, only on Linux. We strongly recommend to also test on Windows and MacOS.

  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.
    A few notable highlights to look at:

    • Package supports modern versions of Python and not End of life versions.
    • Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:


Review Comments

Major

We see the major value added by biocypher in providing a tool to manage knowledge graphs and hosting a repository of adapters to quickly import data from a range of well-known sources. In general, the documentation is well written and makes the topic of knowledge graphs and ontologies accessible. Even though it is beyond the scope of the biocypher tool itself, we feel that it would be beneficial if the documentation also provided simple examples of how a knowledge graph can then be exploited in different applications, and not only examples of its main purpose to ingest data sources.

Minor

We have inserted a few minor comments in italic inbetween the checklists above where they fit the context tightly. Below, we list some more that are not directly related to the checklists.

  1. Installation instructions in the README are short, but longer instructions are available in the full documentation, which is the way we personally prefer it, but it clashes with the pyOpenSci guidelines, which prefer a more extensive README.

  2. Neither the online documentation, the CONTRIBUTING.md, nor the DEVELOPER.md guide mention that poetry installs optional development dependencies. This will be important to know if contributors want to add further dependencies (they should add them to the correct group).

  3. The DEVELOPER.md mentions that the project uses napoleon as the documentation format. That doesn't make sense. Napoleon is a tool to parse docstrings written in the Google or numpy style, so that should be changed.

  4. We recommend formatting the badges as a table. There are already quite many.

  5. The basic tutorial suggests to run poetry run python tutorial/01__basic_import.py. This cannot currently work as the line

    from tutorial.data_generator import Protein

    will fail. The tutorial directory will require an __init__.py module and it would need to be on the PYTHONPATH. Alternatively, users could be required to change the working directory into tutorial and the scripts are adjusted to import the data_generator directly.

  6. The basic tutorial mentions a method bc.log_missing_bl_types() # show input unaccounted for in the schema. This method no longer exists on the BioCypher class. We recommend to execute code in documentation in order to discover such problems quickly.

  7. We feel that it would improve the examples, if the data generator created
    description fields with actual text (like lorem ipsum), rather than just
    random letters.

  8. While the code formatting is fine style-wise, in particular the tests make ample use of single letter variables. This is against the pep8 guidelines and current best practices in general. We recommend using descriptive variable names everywhere.

  9. The tests directory contains an __init__.py module. Pytest doesn't need those and it should be removed.

  10. Some of the tests declare module level globals. It would be better to use fixtures instead and set the appropriate scope for them.

  11. The CI workflow definition creates a postgres container with the latest tag. Are you sure that you don't want to depend on a specific version as you do for neo4j?

  12. The CI workflow checkout and Python setup actions are outdated.

  13. You could publish the package to PyPI via a workflow as well. See our taxpasta workflow for an example of how to do so.

  14. Running the tests on postgres requires not only a running database, but also the psql command line tool. I did not see this in the documentation. In general, the section https://biocypher.org/installation.html#for-developers has detailed instructions for neo4j, but not for postgres.

  15. Having the contribution guidelines as part of the main docs in addition to the DEVELOPER.md file would be very helpful, too!

@arianesasso
Copy link

Thank you @Midnighter and Sinna for the great review! 😄

@slobentanzer would two weeks be a good timeline for you to address the reviews, or is it too tight? Also, let us know in case you have any general or specific questions about them :).

@slobentanzer
Copy link
Author

@Midnighter and Sinna, thanks a lot for your time and thoughtful comments. I agree with your review, and have added corresponding issues to the review milestone.

@arianesasso the coming weeks are pretty full of deadlines, I am not sure I will be able to address all points in 2 weeks. End of September seems more realistic, but I will update here in any case (plus any questions that may arise).

@arianesasso
Copy link

@slobentanzer thank you for the update and no worries at all :). If you need anything in the meantime, just let me know.

@arianesasso
Copy link

arianesasso commented Sep 12, 2023

@slobentanzer Just wanted to check-in to see how things are going and if you need any help :).

@arianesasso
Copy link

Heeey @slobentanzer, just checking in to see how things are going. Is there any way we can support you moving forward?

@slobentanzer
Copy link
Author

Hi @arianesasso,
thanks for checking in. Not sure what you mean by support, can you elaborate?

Other than that, you can still track the progress of the review implementations in the milestone I mentioned above. I have onboarded @nilskre, who is helping with the proceedings while getting familiar with the framework.

@arianesasso
Copy link

Thanks for the updates!

I was just wondering if you had any open questions you would like to ask me and/or the reviewers :). Also, about the deadline, does the end of September still work for you?

@slobentanzer
Copy link
Author

It depends a bit on our speed, could go into October. We are dedicating most of our available development time in BioCypher to this review.

@slobentanzer
Copy link
Author

Hi @arianesasso,
we have now completed all of the issues I created in response to the reviewer comments. For an overview, you can refer to this milestone, which also functions as a response of sorts.

How do we proceed from here?

@arianesasso
Copy link

Hey @slobentanzer, I just checked the link and it sounds like you addressed all the reviewers' comments! I changed the status of your submission and now we need @Midnighter and @Zethson to go over their checklists again and confirm that all the points were addressed. You are almost there 🥳 . Thank you for all the hard work!

@Zethson
Copy link

Zethson commented Nov 7, 2023

Great! I confirm that all points were addressed and recommend approving this package

@slobentanzer
Copy link
Author

Thanks for the great input, @Zethson!

@Midnighter
Copy link

Midnighter commented Nov 8, 2023

Can confirm that all our comments regarding the code project have been addressed. Still think it'd be motivating for potential users unfamiliar with or new to knowledge graphs to see some usage scenarios, but we can understand that you consider it too much right now.

We recommend approval.

@slobentanzer
Copy link
Author

@Midnighter and Sinna, thanks a lot for your feedback! We agree that use cases are very important, and we will definitely be updating the docs regularly with more applied scenarios, as time permits. :)

@NickleDave
Copy link
Contributor

NickleDave commented Nov 15, 2023

Hi all 👋 I'm here to help our excellent editor @arianesasso who had a family emergency.

Thank you @Midnighter and @Zethson for confirming your comments were addressed, and letting us know you recommend approval.

@slobentanzer I am happy to report that 🎉 BioCypher has been approved by pyOpenSci! Thank you for submitting! 🎉

Author Wrap Up Tasks

There are a few things left to do to wrap up this submission:

  • Activate Zenodo watching the repo if you haven't already done so.
  • Tag and create a release to create a Zenodo version and DOI.
  • Add the badge for pyOpenSci peer-review to the README.md of . The badge should be [![pyOpenSci](https://tinyurl.com/y22nb8up)](https://github.com/pyOpenSci/software-review/issues/issue-number).
  • Please fill out the post-review survey. All maintainers and reviewers should fill this out.

Editor Final Checks

Please complete the final steps to wrap up this review. Editor, please do the following:

  • Make sure that the maintainers filled out the post-review survey
  • Invite the maintainers to submit a blog post highlighting their package. Feel free to use / adapt language found in this comment to help guide the author.
  • Change the status tag of the issue to 6/pyOS-approved6 🚀🚀🚀.

If you have any feedback for us about the review process please feel free to share it here. We are always looking to improve our process and documentation in the

@slobentanzer
Copy link
Author

Thanks a lot, @NickleDave! I have activated Zenodo, added the bade, and filled out the form. Thanks for everything to everybody who helped in this review! :)

@NickleDave
Copy link
Contributor

Thank you @slobentanzer and congratulations again! 🎉

I have announced it on our Discourse too.

If you are interested, we'd love to help you spread the word about BioCypher with a blog post on pyOpenSci's site.

@Midnighter and @Zethson huge thank you again for your work as reviewers. Please if you have a few minutes it would also help us if you could fill out the post-review survey.

And thanks of course to @arianesasso for being the editor for this one. I'll go ahead and close this issue.

@slobentanzer
Copy link
Author

Hi @NickleDave, thanks again (also to all), and happy to help with a blog post if it should be required.

@NickleDave
Copy link
Contributor

Definitely not required @slobentanzer, we just want to give you a venue to share BioCypher. Sometimes a blog post can be a way to talk about the package that's not as clear from docs and papers.
I'll invite you to our Slack team too if you'd like to join there and ask questions.

@slobentanzer
Copy link
Author

Sure, sounds great. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: pyos-accepted
Development

No branches or pull requests

7 participants