
crowsetta: A Python tool to work with any format for annotating animal vocalizations and bioacoustics data. #68

Closed
NickleDave opened this issue Jan 3, 2023 · 58 comments

Comments

@NickleDave
Contributor

NickleDave commented Jan 3, 2023

Submitting Author: David Nicholson (@NickleDave )
All current maintainers: (@NickleDave )
Package Name: crowsetta
One-Line Description of Package: A Python tool to work with any format for annotating animal vocalizations and bioacoustics data.
Repository Link: https://github.com/vocalpy/crowsetta
Version submitted: 4.0.0.post2
Editor: @cmarmo
Reviewer 1: @rhine3
Reviewer 2: @shaupert
Archive: DOI
JOSS DOI: DOI
Version accepted: v 5.0
Date accepted (month/day/year): 03/28/2023


Description

  • Include a brief paragraph describing what your package does:
    crowsetta provides a Pythonic way to work with annotation formats for animal vocalizations and bioacoustics data. It has built-in support for many widely used formats, such as Audacity label tracks, Praat .TextGrid files, and Raven .txt files. The package focuses on interoperability, as well as on making it easier to share data in plaintext flat-file formats (csv) and common serialization formats (json). In addition, abstractions in the package are designed to make it easy to use these simplified formats for common downstream tasks. Examples of such tasks are fitting statistical models of vocal behavior, and building datasets to train machine learning models that predict new annotations.
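To give a flavor of the kind of plaintext annotation format the package parses, here is a minimal illustrative sketch (not crowsetta's actual implementation) of reading an Audacity label track, where each line holds a tab-separated onset, offset, and label:

```python
# Illustrative sketch only, not crowsetta code: parse Audacity label-track
# text, where each line is onset<TAB>offset<TAB>label, times in seconds.

def parse_audacity_labels(text):
    """Parse Audacity label-track text into (onset, offset, label) tuples."""
    annotations = []
    for line in text.strip().splitlines():
        onset, offset, label = line.split("\t")
        annotations.append((float(onset), float(offset), label))
    return annotations

labels = "0.50\t1.25\tsyllable-a\n1.30\t2.10\tsyllable-b\n"
print(parse_audacity_labels(labels))
# → [(0.5, 1.25, 'syllable-a'), (1.3, 2.1, 'syllable-b')]
```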

Scope

  • Please indicate which category or categories this package falls under:
    • Data retrieval
    • Data extraction
    • Data munging
    • Data deposition
    • Reproducibility
    • Geospatial
    • Education
    • Data visualization*

Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

n/a

  • For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):

    • Who is the target audience and what are scientific applications of this package?

Anyone who works with animal vocalizations or other bioacoustics data that is annotated in some way. Examples (from the landing page of the docs): neuroscientists studying how songbirds learn their song or why mice emit ultrasonic calls; ecologists studying dialects of finches distributed across Asia; linguists studying accents in the Caribbean; a speech pathologist looking for phonetic changes that indicate early-onset Alzheimer's disease.

  • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

Not to my knowledge.

There are many format-specific packages in various states of maintenance; e.g., a search on PyPI for the textgrid format used by the application Praat currently returns 33 packages (including crowsetta):
https://pypi.org/search/?q=textgrid&o=

There are also several larger packages whose functionality includes the ability to parse specific formats, e.g. Parselmouth wraps all of Praat and thus can load TextGrid files. But the goal of crowsetta is mainly to provide interoperability, and to do so for a wide array of formats, so that other higher-level libraries can leverage its functionality. This emphasis on data extraction and munging, such as potentially lossy transformations to other formats, makes it in scope for pyOpenSci.
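The interoperability goal described above, funneling format-specific annotations into a simple flat-file representation, can be sketched as follows; the column names here are illustrative, not crowsetta's actual generic schema:

```python
# Sketch of the interoperability idea: funnel format-specific annotations
# into one flat csv schema so downstream tools only need to read one format.
# Column names are illustrative, not crowsetta's actual generic format.
import csv
import io

def to_generic_csv(annotations):
    """Write (onset_s, offset_s, label) tuples as generic csv text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["onset_s", "offset_s", "label"])
    writer.writerows(annotations)
    return buf.getvalue()

rows = to_generic_csv([(0.5, 1.25, "a"), (1.3, 2.1, "b")])
print(rows.splitlines()[0])
# → onset_s,offset_s,label
```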

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has an OSI approved license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

Publication options

yes

  • If so:
JOSS Checks
  • The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
  • The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
  • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
  • The package is deposited in a long-term repository with the DOI:

Note: Do not submit your package separately to JOSS

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Code of conduct

Please fill out our survey

P.S. *Have feedback/comments about our review process? Leave a comment here

Potential reviewers

As discussed with @lwasser I am tagging some potential reviewers given that pyOpenSci does not currently have anyone that's familiar with this area (besides me)

@rhine3 @shyamblast @YannickJadoul @avakiai @danibene @nilor

edit: just updating here for clarity that I am only suggesting reviewers to help bootstrap the process since we're still a growing org. @lwasser will assign an editor that will then reach out directly to potential reviewers. Sorry for any confusion, and we appreciate interest of people that have replied so far.

Editor and Review Templates

Editor and review templates can be found here

@lwasser
Member

lwasser commented Jan 12, 2023

hi @NickleDave responding here so it's visible that this issue is being worked on! I am looking for an editor for this package now, given that I can't step in due to COI! :) but please know I've reached out to a few people and am in the process of finding an editor.

@NickleDave
Contributor Author

Thank you @lwasser, understood

@cmarmo
Member

cmarmo commented Jan 12, 2023

Hi @NickleDave , I'm Chiara and I'm going to be the editor for your submission.
I'm sure I'm going to learn a lot in the process. Nice to meet you!
I will perform the editor checks in the next few days (no more than a week) and start looking for reviewers (thanks for your suggestions! 🙏 ).

@NickleDave
Contributor Author

Hi @cmarmo! Nice to meet you as well.

Looking forward to working with you on the review, I am sure I will learn a lot.

A week is perfect, I should have survived a deadline at my day job by then 😆

@cmarmo
Member

cmarmo commented Jan 15, 2023

Editor in Chief checks

Hi @NickleDave, below are the basic checks that your package needs to pass
to begin our review.

  • Installation The package can be installed from a community repository such as PyPI (preferred), and/or a community channel on conda (e.g. conda-forge, bioconda).
    • The package imports properly into a standard Python environment import package-name.
  • Fit The package meets criteria for fit and overlap.
  • Documentation The package has sufficient online documentation to allow us to evaluate package function and scope without installing the package. This includes:
    • User-facing documentation that overviews how to install and start using the package.
    • Short tutorials that help a user understand how to use the package and what it can do for them.
    • API documentation (documentation for your code's functions, classes, methods and attributes): this includes clearly written docstrings with variables defined using a standard docstring format. We suggest using the Numpy docstring format.
  • Core GitHub repository Files
    • README The package has a README.md file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.
    • Contributing File The package has a CONTRIBUTING.md file that details how to install and contribute to the package.
      • The contributing.md file is located in the .github/ folder, while pyOpenSci recommends including it in the root repository.
    • Code of Conduct The package has a Code of Conduct file.
    • License The package has an OSI approved license.
      NOTE: We prefer that you have development instructions in your documentation too.
  • Issue Submission Documentation All of the information is filled out in the YAML header of the issue (located at the top of the issue template).
  • Automated tests Package has a testing suite and is tested via GitHub actions or another Continuous Integration service.
  • Repository The repository link resolves correctly.
  • Package overlap The package doesn't entirely overlap with the functionality of other packages that have already been submitted to pyOpenSci.
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly.
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

  • Initial onboarding survey was filled out
    We appreciate each maintainer of the package filling out this survey individually. 🙌
    Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. 🙌


Editor comments

The package is already in a very good shape, thank you for all your work!

I would like to mention that the contributing.md file is located in the .github/ directory and not in the root repository as recommended by the pyOpenSci documentation.
I don't have any strong opinion about that, but I must confess that I had to look for it for a moment before remembering that it could be placed in the .github directory. I guess having it in the root repo will make it more discoverable directly online (without downloading the sources).
Feel free to move it or not, depending on your point of view.

I'm going to look for reviewers for your package now.
I hope this will not take too much time... I will let you know anyway by the end of the next week.
Thanks for your patience!

@NickleDave
Contributor Author

NickleDave commented Jan 15, 2023

🙏 thank you @cmarmo

the contributing.md file is located in the .github/ directory and not in the root repository as recommended by the pyOpenSci documentation.
I guess having it in the root repo will make it more discoverable directly on-line

I can move it to the root, and I agree with you that it's easier to spot that way for someone looking at the repo e.g. on GitHub in the browser.

I think I was basically copying what I saw in numpy--I thought the file had to be there for the link to show up when someone opened an issue.

But now I see the GitHub docs say it can be there, in the root, or docs, as you said. Good to know.

So I'll just move it.

@NickleDave
Contributor Author

NickleDave commented Jan 15, 2023

Done in vocalpy/crowsetta@9d435f6

edit: also I have now filled out the onboarding survey (sorry, forgot!)

@cmarmo
Member

cmarmo commented Jan 15, 2023

So I'll just move it.

Thanks @NickleDave !

@cmarmo
Member

cmarmo commented Jan 20, 2023

Dear @NickleDave,
I have been lucky this week and already found the two reviewers for your submission.
I would like to thank @rhine3 and @shaupert for agreeing to serve as reviewers.
I'll let them introduce themselves here when ready (you know ... it's almost the weekend already.. 🙂)

@shaupert

shaupert commented Jan 23, 2023

Hi @cmarmo, @NickleDave, and @rhine3,
Thank you for inviting me to review the crowsetta package. It looks great.
I'm a novice in building Python packages but I'll try to do my best to review the package.
I'm very excited to test crowsetta and see how I can use it :)
I hope to send my review by mid-February.

@lwasser
Member

lwasser commented Jan 23, 2023

@cmarmo this is all awesome! @rhine3 and @shaupert welcome to pyopensci!!

@shaupert if you have any questions at all about the review process please get in touch. It is useful for us to have a review that focuses on usability, documentation, etc. You are welcome to reach out to us with questions at any time - so please feel free to do that if you wish / if it would be helpful.

@rhine3

rhine3 commented Jan 24, 2023

Hi @cmarmo, @NickleDave, et al.,

Thanks for the invitation to review and the warm welcome! As both a dabbler in Python software development and a potential user of this tool, I am looking forward to reviewing this package. I plan to finish my review by February 10th.

@NickleDave
Contributor Author

Thank you @shaupert and @rhine3. Just a heads up that I just merged a branch I had in progress before submission, which mostly cleaned up and added docs -- I promise not to do any further dev for now. Looking forward to your reviews.

@rhine3

rhine3 commented Feb 10, 2023

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README.
  • Installation instructions: for the development version of the package and any non-standard dependencies in README.
  • Vignette(s) demonstrating major functionality that runs successfully locally.
  • Function Documentation: for all user-facing functions.
  • Examples for all user-facing functions.
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for:
    • Continuous integration and test coverage,
    • Docs building (if you have a documentation website),
    • A repostatus.org badge,
    • Python versions supported,
    • Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than tall. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)

  • Short description of package goals.
  • Package installation instructions
  • Any additional setup required to use the package (authentication tokens, etc.)
  • Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
    • Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  • Link to your documentation website.
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
  • Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

  • Package documentation is clear and easy to find and use.
  • The need for the package is clear
  • All functions have documentation and associated examples for use
  • The package is easy to install

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.
    A few notable highlights to look at:
    • Package supports modern versions of Python and not End of life versions.
    • Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:


Review Comments

Function documentation:

A question about pyOpenSci in general: is an individual example really required for every function? The functions are well documented but some do not have examples, e.g. crowsetta.data.data.available_formats(). Some functions do have examples, e.g. some of the functions here. I'm not sure how to quickly and reliably identify which functions are user-facing and which are not. I do feel that the number of examples provided is sufficient.

Community guidelines:
  • Great guidelines! These should probably be linked to in the relevant section of the README as well, unless I missed it?
README contents:
  • The four vignettes are all linked to using a single link. I think that is okay. It could also be nice to add some additional details near that link, though! Otherwise, the package description is sufficiently clear and well-documented in the links such that it is not necessary to add any additional demonstration of package use in the README.
  • A couple of badges are missing (Repostatus and Python versions supported)
Usability:

Overall, the package was very easy to install on my computer. Documentation is linked to on the repository page and in the README. The entire API is fully documented and I felt that the number of examples provided was sufficient.

A minor comment: installing the developer version was slightly more difficult. Because the package documentation required me to use pipx/nox, I had to update Homebrew and chase down the source of some errors, which took me about an hour of detective work (and probably another hour of my computer downloading/upgrading packages in the background). For the average user this doesn't apply. For the developer, this is a minor and routine obstacle.

Functionality:
Tests:

A handful of tests failed at first when I attempted to use a conda-installed older version of crowsetta. All tests succeeded once I used a development environment (vocalpy/crowsetta#218). I wonder if pyOpenSci's review template could suggest that reviewers look in the package's documentation to figure out how the package maintainers request tests be run, e.g., using a development install of the package.

Code format:
  • I am unsure how to check/filter actions to see whether there is CI for linting; maybe @NickleDave could direct me to this?
JOSS paper
  • There is no paper.md in the repository.

@shaupert

shaupert commented Feb 16, 2023

Hi, please find my review below:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README.
  • Installation instructions: for the development version of the package and any non-standard dependencies in README.
  • Vignette(s) demonstrating major functionality that runs successfully locally.
  • Function Documentation: for all user-facing functions.
  • Examples for all user-facing functions.
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for:
    • Continuous integration and test coverage,
    • Docs building (if you have a documentation website),
    • A repostatus.org badge,
    • Python versions supported,
    • Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than tall. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)

  • Short description of package goals.
  • Package installation instructions
  • Any additional setup required to use the package (authentication tokens, etc.)
  • Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
    • Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  • Link to your documentation website.
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
  • Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

  • Package documentation is clear and easy to find and use.
  • The need for the package is clear
  • All functions have documentation and associated examples for use
  • The package is easy to install

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.
    A few notable highlights to look at:
    • Package supports modern versions of Python and not End of life versions.
    • Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 10h


Review Comments

This is my first review of a Python package, this is the reason why it took me about 10h.
I would like to say that I learnt a lot during this process, mostly because the crowsetta package is really well written (@NickleDave did a really good job), following good practices that, I must confess, I was not aware of. I will try to apply these good practices to my past and future packages, as I think they really improve the quality and robustness of a package.

The package does a simple task, which is to parse and convert annotations done on a spectrogram or a waveform into an object that is easy to manipulate. I’m thinking of using it in my own package. As a contributor to the packages scikit-maad and bambird, we had to face different formats of annotation, and crowsetta can do the job.

As a user of Audacity to annotate my spectrograms, I was a little bit surprised that crowsetta only supports the standard Audacity format along the time axis. As there is the possibility to register your own format, I tried to figure out how to do that by simply adapting the Raven class, and it seems to work. I will add the code of the function later.

Finally, I tried to follow the guidelines to create a class for a new format. I found the process a bit hard to follow, especially because it was written for seq and not for bbox, which I realized too late. Maybe it would be useful to have guidelines for both types of format.

Below are some minor points

Test

Only one test failed. I opened an issue

Bugs

crowsetta.data.data.available_formats() and crowsetta.formats.as_list() do not give the same list. The format "yarden" is not in crowsetta.data.data.available_formats()

Documentation

crowsetta.Segment: missing a description of the input arguments.

Improvements

Examples are not provided for each class. I recommend adding a simple example for each class and, if possible, for each method.
The scribe function cannot load an Audacity label .txt file that also contains frequency min and max. I recommend adding a new class to handle this non-standard format (see my issue).
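The point above concerns Audacity's extended label format, where a spectral selection adds a second line beginning with a backslash that carries the low and high frequencies. A hedged sketch of parsing it (my reading of Audacity's exported files, not crowsetta code):

```python
# Hedged sketch: parse Audacity's extended label format, where each label
# line (onset<TAB>offset<TAB>label) is followed by a line beginning with
# "\" holding the low and high frequencies of the spectral selection.
# Field layout is an assumption based on Audacity exports; verify locally.

def parse_audacity_extended(text):
    """Return (onset, offset, freq_low, freq_high, label) tuples."""
    boxes = []
    lines = iter(text.strip().splitlines())
    for line in lines:
        onset, offset, label = line.split("\t")
        freq_line = next(lines)            # "\<TAB>low<TAB>high"
        _, f_low, f_high = freq_line.split("\t")
        boxes.append((float(onset), float(offset),
                      float(f_low), float(f_high), label))
    return boxes

text = "2.15\t4.25\tcall\n\\\t500.0\t12000.0\n"
print(parse_audacity_extended(text))
# → [(2.15, 4.25, 500.0, 12000.0, 'call')]
```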

Summary

I agree with most of the comments of @rhine3. I don't think it's necessary to add them twice ;).
The crowsetta package deserves the pyOpenSci badge :)

@NickleDave
Contributor Author

Hi @rhine3 and @shaupert thank you both for your detailed reviews and the helpful issues you raised.

I plan to reply here and address those issues within two weeks as required by the guide.

Just wanted to let you and @cmarmo know in the meantime that I have seen this 🙏 and I'm working on it 😅

@cmarmo
Member

cmarmo commented Feb 16, 2023

Thanks @rhine3 and @shaupert for your timely and detailed review!

@NickleDave
Contributor Author

I wonder if PyOpenSci’s review template could suggest that reviewers make sure to look in the package's documentation to figure out how the package maintainers request tests be run, e.g., using a development install of the package.

@rhine3 I agree with you. Was just discussing this with @lwasser in the pyOS Slack (putting on my pyOS maintainer hat for a second).

Could you please raise an issue on the software-peer-review repo suggesting we add this? I am guessing we will do so, and want to make sure you get credit

@drammock

@NickleDave just a drive-by comment to make sure you're aware of the official TextGrid format spec: https://www.fon.hum.uva.nl/praat/manual/TextGrid_file_formats.html In particular, this quote:

...the union of the time domains of the set of intervals in a tier is a contiguous stretch of time, and no intervals overlap.

So you are free to error out on TextGrids that have overlapping intervals.
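A minimal sketch of the validation this permits, rejecting an interval tier whose intervals overlap (illustrative, not crowsetta's code):

```python
# Illustrative check per the TextGrid spec: intervals in a tier must not
# overlap, so reject a tier where any interval starts before the previous
# one ends. Intervals are (xmin, xmax) pairs in seconds.

def check_no_overlap(intervals):
    """Raise ValueError if any intervals in the tier overlap."""
    intervals = sorted(intervals)
    for (xmin1, xmax1), (xmin2, xmax2) in zip(intervals, intervals[1:]):
        if xmin2 < xmax1:
            raise ValueError(
                f"intervals overlap: ({xmin1}, {xmax1}) and ({xmin2}, {xmax2})"
            )

check_no_overlap([(0.0, 1.0), (1.0, 2.5)])   # contiguous intervals: fine
# check_no_overlap([(0.0, 1.5), (1.0, 2.5)]) would raise ValueError
```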

@NickleDave
Contributor Author

Thank you @drammock! Will do so then

@cmarmo
Member

cmarmo commented Mar 23, 2023

Hi @NickleDave , just checking... Do you think you will be able to address all the comments about TextGrid, or would it be worth submitting to JOSS now and leaving major changes for a new version of crowsetta?
I think it is important that all your work and the review effort come to a well-deserved conclusion in a reasonable time... :)
Let me know if I can proceed to the editor checks for submission to JOSS, thanks!

@NickleDave
Contributor Author

Hi @cmarmo I think I will be able to address them, but it's taking a little longer than expected.

Feedback from Yannick has made me realize crowsetta is not quite providing what it promises for TextGrids. I thought the fix would be trivial but it turns out the code I vendored is not so easy to change.

I have a plan to address it, just haven't had time yet.
I think I can finish by Monday if that's okay.

@NickleDave
Contributor Author

NickleDave commented Mar 28, 2023

Just merged a final PR that addresses feedback from @YannickJadoul along with some stray commits I made to main that I'll link here:
vocalpy/crowsetta#243

At a high level, I totally rewrote the TextGrid class so that it can:

  • parse normal and short format TextGrids saved as text files in either UTF-8 or UTF-16 encoding
  • skip intervals with empty labels by default (they can be loaded with an optional keep_empty flag)
  • convert either all interval tiers to sequences (what you get by default when you call to_seq or to_annot with no tier argument) or a single specific tier (by supplying a tier argument),
    as described here: ENH: Better handle Praat TextGrids vocalpy/crowsetta#241

To make it possible to convert all interval tiers, I also added a new feature to crowsetta.Annotation: the seq attribute can now be a list of crowsetta.Sequences. This resolved a longstanding issue (vocalpy/crowsetta#42).
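As an illustration of that data-model change (the names mirror crowsetta's, but this is a simplified sketch, not its actual classes), an Annotation whose seq attribute holds a list of Sequences, one per interval tier:

```python
# Simplified sketch of the data model, not crowsetta's actual code:
# an Annotation whose `seq` attribute may be a single Sequence or a
# list of Sequences (e.g., one per TextGrid interval tier).
from dataclasses import dataclass

@dataclass
class Sequence:
    onsets_s: list     # segment onsets, in seconds
    offsets_s: list    # segment offsets, in seconds
    labels: list       # one label per segment

@dataclass
class Annotation:
    annot_path: str    # path to the annotation file
    seq: object        # a Sequence, or a list of Sequences

words = Sequence([0.0], [1.5], ["hello"])
phones = Sequence([0.0, 0.8], [0.8, 1.5], ["HH", "OW"])
annot = Annotation("example.TextGrid", seq=[words, phones])
assert len(annot.seq) == 2   # one Sequence per tier
```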

Replying to note how I addressed comments:

Maybe the TextGrid format description could contain a short paragraph on how the format maps to crowsetta's data model?

I agree this needs to be clear, but would prefer to keep any description of how the format maps to crowsetta's data model in the docstrings with the code, in case it changes. I have added language saying basically "please read the docstring for further detail".
In vocalpy/crowsetta#245

I found that, just like the code I have vendored now, textgridtools also raises a ValueError when opening most of the files from this dataset because of overlapping tiers.
I think it would be fine to refuse these things in crowsetta

The new TextGrid class raises an error for overlapping intervals. And thanks @drammock for confirming! 🙂

Since you can just read one tier, is there an easy way of merging Annotations?

You can now read more than one tier. A single Annotation can have multiple Sequences.

In my experience, empty TextGrid intervals are almost always the equivalent of "no annotation" (given the way IntervalTiers work, as mentioned above). So I would suggest that maybe you could remove those by default, or at least provide an easy way for users to not do this manually.

The empty intervals are now removed by default.

even though it would seem to be supported by textgrid.py, it can't read the unicode-containing short-format TextGrids I fed it

This is one of the main reasons I rewrote from scratch.

But it might be good to note in the documentation that not all short TextGrids can be read?

Both short and full formats should now be parsed and the text encoding should be handled correctly.
I'll spare you the implementation details but the "Notes" section in the docstring points to other implementations I looked at.
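For what it's worth, a common way to handle the text encoding for Praat files is to sniff the byte-order mark, since Praat writes text TextGrids as UTF-8 or UTF-16 with a BOM. A sketch of that idea (not crowsetta's actual code):

```python
import codecs

def detect_textgrid_encoding(path):
    """Guess a text TextGrid file's encoding from its byte-order mark.

    Praat writes text TextGrids as UTF-8 or UTF-16 (with a BOM);
    fall back to UTF-8 when no BOM is present.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(codecs.BOM_UTF16_LE) or head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"      # the utf-16 codec consumes the BOM itself
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # strips the UTF-8 BOM on read
    return "utf-8"
```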

Binary-format TextGrid files cannot be read either, but I don't think this is very important.

I did find some implementations that handle the binary format, but I did not implement support at this time; I made an issue to add it later.

Thank you for providing the example TextGrid files.
I have included these in the test data for crowsetta and cited Parselmouth in a README there -- I note you use this for your own tests, I assume you exported each from Praat in the different formats?

But maybe just specify that only text-based formats are supported.

Now noted in the docstring, along with a link to the issue about adding ability to load binary format.

Mostly out of curiosity: my Segment.onset_sample and Segment.offset_sample are always `None`. Is there a way to specify an audio file to get these filled in when reading a TextGrid? Or is this just not applicable in this case?

There is not a way to do this -- I would like to know if people would use it. Some formats specify sample numbers, so I tried to include this in the "generic" format, although I think when segmenting most people end up working in units of seconds anyway.

Potentially, when converting a TextGrid to Annotation, it's nice if the user can specify a tier name instead of just the index. But then again, nothing urgent; more a thought of something useful.

You can now specify tiers by name or index.
E.g., either textgrid[2] or textgrid['Phones'] will work.
See examples here:
https://crowsetta.readthedocs.io/en/latest/api/generated/crowsetta.formats.seq.textgrid.textgrid.TextGrid.html#crowsetta.formats.seq.textgrid.textgrid.TextGrid
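The dual indexing shown above is a nice pattern to support. Here is a minimal illustration of how a container of named tiers could accept either an integer index or a tier name (this is a sketch of the pattern, not crowsetta's implementation):

```python
class TierList:
    """Container allowing tier lookup by integer index or by tier name."""

    def __init__(self, tiers):
        # tiers: list of (name, data) pairs, preserving file order
        self._tiers = list(tiers)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._tiers[key][1]
        if isinstance(key, str):
            for name, data in self._tiers:
                if name == key:
                    return data
            raise KeyError(key)
        raise TypeError(f"tier key must be int or str, not {type(key).__name__}")

tiers = TierList([("Words", ["hello", "world"]), ("Phones", ["HH", "AH"])])
assert tiers[1] == tiers["Phones"]
```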

I don't think crowsetta means to provide this, but I couldn't find a way to write TextGrids, right? So I don't need to have a look at that?

Crowsetta does mean to provide this but it doesn't right now.
vocalpy/crowsetta#246

In general, throughout the docs and especially the API reference, things like crowsetta.Annotation are not clickable.

I totally did not know you could do that in docs. Thanks so much for pointing this out. I have fixed it as much as possible.

Surprisingly it doesn't work for pandas. Seems to be an open issue related to it: pandas-dev/pandas#38397

The Tutorial page of the docs has an execution error somewhere

Fixed, thank you!
vocalpy/crowsetta@413ea13

On the "How to use crowsetta to remove unlabeled intervals from TextGrid annotations" page, there's also something weird happening to the output during the download. Maybe there's a way of suppressing the output of that cell?

Suppressed
vocalpy/crowsetta@0fdea17
but now there are other issues with this tutorial.
I plan to rewrite it totally now that empty intervals are removed by default -- I think it's still a good example of a full analysis using crowsetta.
vocalpy/crowsetta#244

But this will have to wait until after today!

@NickleDave
Contributor Author

@cmarmo please proceed to checks for JOSS, thank you for your patience

@cmarmo
Member

cmarmo commented Mar 28, 2023


🎉 crowsetta has been approved by pyOpenSci! Thank you @NickleDave for submitting crowsetta and many thanks to @rhine3 and @shaupert for reviewing this package, and @YannickJadoul for providing expertise on TextGrid format. 😸

There are a few things left to do to wrap up this submission:

  • Activate Zenodo watching the repo if you haven't already done so.
  • Tag and create a release to create a Zenodo version and DOI.
  • Add the badge for pyOpenSci peer-review to the README.md of crowsetta. The badge should be [![pyOpenSci](https://tinyurl.com/y22nb8up)](https://github.com/pyOpenSci/software-review/issues/issue-number).
  • Add crowsetta to the pyOpenSci website. @NickleDave, please open a PR to update this file to add your package and name to the list of contributors.
  • @rhine3 @shaupert @YannickJadoul if you have time and are open to being listed on our website, please add yourselves to this file via a PR so we can list you on our website as contributors!

It looks like you would like to submit this package to JOSS. Here are the next steps:

  • Login to the JOSS website and fill out the JOSS submission form using your Zenodo DOI. When you fill out the form, be sure to mention and link to the approved pyOpenSci review. JOSS will tag your package for expedited review if it is already pyOpenSci approved.
  • Wait for a JOSS editor to approve the presubmission (which includes a scope check).
  • Once the package is approved by JOSS, you will be given instructions by JOSS about updating the citation information in your README file.
  • When the JOSS review is complete, add a comment to your review in the pyOpenSci software-review repo that it has been approved by JOSS.

@cmarmo
Member

cmarmo commented Mar 28, 2023

@NickleDave I have added the 'accepted date' in the issue description. Do you mind releasing the current version of crowsetta as I can refer to it in the issue description? Thanks!

@YannickJadoul

Just merged a final PR that addresses feedback from @YannickJadoul along with some stray commits I made to main that I'll link here:

Wow, I'm impressed how you went above and beyond to address all those picked nits, @NickleDave!
(I do hope/think this will save you a bunch of time debugging Praat/TextGrid-related future issues from all those users :-) )

I have included these in the test data for crowsetta and cited Parselmouth in a README there -- I note you use this for your own tests, I assume you exported each from Praat in the different formats?

To answer your final question: yes, I created the TextGrid myself to go along with the fragment of the audio file I've been using during developments and tests (and checked with people more knowledgeable about linguistics if it's +- correct ;-) ). Feel free to just use it without any need to reference Parselmouth!
And for the full record: yes, I converted my original file to all other formats with Praat 6.2.23.

@NickleDave
Contributor Author

Hi @cmarmo just updating here that I made a release, here's the DOI: https://zenodo.org/record/7781587.
Also made the PR adding crowsetta to packages.yml. Will proceed with JOSS submission now.

@cmarmo
Member

cmarmo commented Mar 29, 2023

@rhine3 @shaupert just to let you know that you should have both received an invitation to join the pyOpenSci Slack, if you are interested to join. Thanks!

@NickleDave
Contributor Author

@YannickJadoul I asked @lwasser to send you an invite to the pyOpenSci slack too!
(I guess I can break out of character as "reviewee" now that the review is finished ... sorry @cmarmo maybe I am thinking too much).

(I do hope/think this will save you a bunch of time debugging Praat/TextGrid-related future issues from all those users :-) )

yes very much so, thank you

@lwasser
Member

lwasser commented Mar 29, 2023

wow - everyone!! this is awesome. Congratulations @NickleDave !! and @cmarmo for a successful review! And so many thanks @shaupert @YannickJadoul @rhine3 for your time in reviewing here! David, let us know when this goes to JOSS! in the meantime i'll post about crowsetta on social.

We'd also love a blog about the package if you have time. 😄 This blog allows us to better promote your package / show what your package can do!

@NickleDave
Contributor Author

NickleDave commented Apr 3, 2023

JOSS review in progress: openjournals/joss-reviews#5332

edit: that was the pre-review issue, this is the review:
openjournals/joss-reviews#5338

@NickleDave
Contributor Author

@cmarmo accepted into JOSS! 🎉
openjournals/joss-reviews#5338 (comment)
https://joss.theoj.org/papers/10.21105/joss.05338

@cmarmo
Member

cmarmo commented Apr 13, 2023

Congratulations @NickleDave !
Thanks for your work and thanks to all the people involved in this review ... they made my editor task quite easy 😊!

I'm going to perform the last editor checks and then close this issue. 🎉

@cmarmo
Member

cmarmo commented Apr 13, 2023

All done! Wonderful package and review experience! 🎉

@cmarmo cmarmo closed this as completed Apr 13, 2023
@lwasser
Member

lwasser commented Jun 7, 2023

hi colleagues!! @NickleDave if you have time, i'd greatly appreciate your taking the post-review survey. the main pieces of data that we'd love to collect are how the review improved / impacted your package. but some other questions are asked there as well. many thanks in advance for doing this!!
