Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cardsort: analyzing data from open card sorting tasks #102

Closed
16 of 30 tasks
katoss opened this issue Apr 20, 2023 · 59 comments
Closed
16 of 30 tasks

cardsort: analyzing data from open card sorting tasks #102

katoss opened this issue Apr 20, 2023 · 59 comments

Comments

@katoss
Copy link

katoss commented Apr 20, 2023

Submitting Author: (@katoss)
All current maintainers: (@katoss)
Package Name: cardsort
One-Line Description of Package: A python package to analyse data from open card sorting tasks
Repository Link: https://github.com/katoss/cardsort
Version submitted: 0.2.2
Editor: @Batalex
Reviewer 1: @Robaina
Reviewer 2: @khynder
Archive: DOI
JOSS DOI: N/A
Version accepted: v 0.2.36
Date accepted (month/day/year): 08/16/2023


Code of Conduct & Commitment to Maintain Package

Description

  • Include a brief paragraph describing what your package does:

Card sorting is a standard research method in the human computer interaction field to design information architectures. The cardsort package analyzes and visualizes data from open card sorting tasks. Using csv data in the format returned by the kardsort tool (or any other tool outputting the same columns), it outputs a dendrogram based on hierarchical cluster analysis and pairwise similarity scores. It can also return category labels to learn which labels study participants gave to combinations of cards from emerging clusters.

Scope

  • Please indicate which category or categories.
    Check out our package scope page to learn more about our
    scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):

    • Data retrieval
    • Data extraction
    • Data processing/munging
    • Data deposition
    • Data validation and testing
    • Data visualization1
    • Workflow automation
    • Citation management and bibliometrics
    • Scientific software wrappers
    • Database interoperability

Domain Specific & Community Partnerships

- [ ] Geospatial
- [ ] Education
- [ ] Pangeo

Community Partnerships

If your package is associated with an
existing community please check below:

  • For all submissions, explain how the and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):

    • Who is the target audience and what are scientific applications of this package?

The target audience are researchers or practitioners in human computer interaction and user experience. Open card sorting is a popular user research method used to understand how people order and conceptualize information, in order to design information architectures. In order to make sense of the data, clusters are often visualized in form of a dendrogram. To do so, pairwise similarity scores need to be calculated for all cards, followed by a hierarchical cluster analysis. This functionality is provided by the cardsort package. It also offers functionality to return category labels, in order to learn which labels study participants gave to combinations of cards in the emerging clusters.

  • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

I did not find any python packages, nor other open source tools, that accomplish this, which was my motivation to make this package, when I was running a card sorting study in the course of my PhD. While research articles describe the steps of this analysis (i.e. pairwise similarity scores --> hierarchical cluster analysis --> dendrogram, see articles linked in question above), I did not find any open source tools that help to do this analysis. I used the widely used kardsort.com tool to collect the data, which refers to rather dated, closed source Windows software for analysis, which underpins my assumption of a lack of open source tools. The approach is rather simple, making use of scipy for hierarchical cluster analysis and dendrogram, adding a custom function to create the similarity scores, and putting everything in an easy-to-use pipeline. Nevertheless, in doing so, I think the cardsort package can help remove barriers for the application of this user research method (easy to use even for python beginners, no need to use closed source software that only runs on windows or expensive subscription tools). In any case I would have liked to have this package when I started my study :)

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

#101

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • uses an OSI approved license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a tutorial with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration setup, such as GitHub Actions CircleCI, and/or others.

Publication Options

JOSS Checks
  • The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
  • The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
  • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
  • The package is deposited in a long-term repository with the DOI:

Note: Do not submit your package separately to JOSS

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Confirm each of the following by checking the box.

  • I have read the author guide.
  • I expect to maintain this package for at least 2 years and can help find a replacement for the maintainer (team) if needed.

Please fill out our survey

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

Footnotes

  1. Please fill out a pre-submission inquiry before submitting a data visualization package.

@Batalex
Copy link
Contributor

Batalex commented Apr 22, 2023

Hey @katoss, I am Alex, and I will be acting as the editor-in-chief for this submission. I am looking forward to working with you 👋
Over this weekend, I will review the checklist below to see if we need to address anything before undergoing the review process.

@Batalex
Copy link
Contributor

Batalex commented Apr 22, 2023

Editor in Chief checks

Hi there! Thank you for submitting your package for pyOpenSci
review. Below are the basic checks that your package needs to pass
to begin our review. If some of these are missing, we will ask you
to work on them before the review process begins.

Please check our Python packaging guide for more information on the elements
below.

  • Installation The package can be installed from a community repository such as PyPI (preferred), and/or a community channel on conda (e.g. conda-forge, bioconda).
    • The package imports properly into a standard Python environment import package-name.
  • Fit The package meets criteria for fit and overlap.
    Note 🐈‍⬛: addressed in presubmission inquiry
  • Documentation The package has sufficient online documentation to allow us to evaluate package function and scope without installing the package. This includes:
    • User-facing documentation that overviews how to install and start using the package.
    • Short tutorials that help a user understand how to use the package and what it can do for them.
    • API documentation (documentation for your code's functions, classes, methods and attributes): this includes clearly written docstrings with variables defined using a standard docstring format. We suggest using the Numpy docstring format.
  • Core GitHub repository Files
    • README The package has a README.md file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.
    • Contributing File The package has a CONTRIBUTING.md file that details how to install and contribute to the package.
    • Code of Conduct The package has a Code of Conduct file.
    • License The package has an OSI approved license.
      NOTE: We prefer that you have development instructions in your documentation too.
  • Issue Submission Documentation All of the information is filled out in the YAML header of the issue (located at the top of the issue template).
  • Automated tests Package has a testing suite and is tested via GitHub actions or another Continuous Integration service.
  • Repository The repository link resolves correctly.
  • Package overlap The package doesn't entirely overlap with the functionality of other packages that have already been submitted to pyOpenSci.
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly.
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

  • Initial onboarding survey was filled out
    We appreciate each maintainer of the package filling out this survey individually. 🙌
    Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. 🙌


Editor comments

  • The version in pyproject.toml does not match the latest release. Is there any way to use semantic release to update the pyproject.toml file based on the tag? If not, maybe we should set the version here to something like "0.0.0" to indicate that this is not the true version.
  • This is on the verge of reviewing the code, so I will no go any further at the moment, but the dependencies constraints are quite hard.
pandas = "^1.5.3"
numpy = "^1.24.2"
matplotlib = "^3.7.1"
scipy = "^1.10.1"

You could increase the pool of potential users by loosening the constraints. Based on what I saw in the sources, we do not need the latest versions for these four packages.
+ myst-nb should be a dev dependency

  • I believe the import would be clearer if we just used the functionalities from the top module namespace, e.g from cardsort import get_distance_matrix or import cardsort; cardsort.get_distance_matrix() instead of propagating analysis all the way up to the user code. This is quite easy, you just need to import the functions in __init__.py

@katoss
Copy link
Author

katoss commented Apr 24, 2023

Thank you @Batalex ! I will try to address your comments in the course of the week. :)

@katoss
Copy link
Author

katoss commented Apr 27, 2023

Hi @Batalex,
thank you very much for your feedback. I tried addressing everything, but have some open questions:

The version in pyproject.toml does not match the latest release. Is there any way to use semantic release to update the pyproject.toml file based on the tag? If not, maybe we should set the version here to something like "0.0.0" to indicate that this is not the true version.

Normally semantic release should bump the version automatically, but I keep running into issues. It only seems to work via getting the release version from the GitHub tag. I set the version to 0.0.0 for now, as you suggested. I will try to find a better solution, but for now at least the version is correctly updated on github and pypi.

This is on the verge of reviewing the code, so I will no go any further at the moment, but the dependencies constraints are quite hard.
pandas = "^1.5.3"
numpy = "^1.24.2"
matplotlib = "^3.7.1"
scipy = "^1.10.1"
You could increase the pool of potential users by loosening the constraints. Based on what I saw in the sources, we do not need the latest versions for these four packages.

I am sorry if this is a stupid question, but how do I know which are the minimum versions of packages I can allow without breaking my code? I tried using pipdeptree, but did not find that very helpful, because it only tells me which are the dependencies of the currently installed package versions, not if I could use something lower.

  • myst-nb should be a dev dependency

True, thank you. Fixed it.

I believe the import would be clearer if we just used the functionalities from the top module namespace, e.g from cardsort import get_distance_matrix or import cardsort; cardsort.get_distance_matrix() instead of propagating analysis all the way up to the user code. This is quite easy, you just need to import the functions in init.py

I wasn't aware that this was possible. Thank you, fixed it!

Best,

@katoss

@Batalex
Copy link
Contributor

Batalex commented Apr 27, 2023

I am sorry if this is a stupid question, but how do I know which are the minimum versions of packages I can allow without breaking my code?

This is an excellent question! And a truly difficult one to answer in Python, given the dynamic nature of the language. Maybe someday, a tool will be able to parse the source code and match it against the dependencies changelogs, but I am dreaming 🫠

If the dependencies were perfectly following the semver convention, you could use the latest major releases for each dependency. This is extreme in a practical setting, so I would not advise going so far.

You are using but a few basic numpy functions, so you should be could fine with the same constraint as pandas.
The 1.5.X series defines the following minimal dependencies: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.5.0.html#backwards-incompatible-api-changes.
You should be good using

pandas = "^1.5.0"
numpy = "^1.20.3"
matplotlib = "^3.3.2"
scipy = "^1.7.1"

Maybe we could go even lower with pandas, but this change is enough IMO.
With the recent release of pandas 2.0.0, it might be worth loosening the upper constraint users using the latest version. Based on what I see, this should not be an issue for cardsort.

@katoss
Copy link
Author

katoss commented Apr 28, 2023

Thank you for the explanations! So it is really not that easy. For now, I reduced the constraints to the versions you indicated.

@NickleDave
Copy link
Contributor

Hi @Batalex @katoss just adding one more possibly useful piece of info:
this Scientific Python Ecosystem Coordination (SPEC) document specifies when packages should drop support for versions of core scientific Python packages, which includes your dependencies @katoss:
https://scientific-python.org/specs/spec-0000/

Unfortunately it does not say what is the minimum that you should support 🤷 but I basically go with the minimum for whatever Python is still supported -- right now that's Python 3.9 -> numpy 1.21.0, etc. ...

It looks like you ended up with roughly those versions as your lower bounds anyways.

Not to overwhelm you with info but if you don't know about it already you might want to check out: https://iscinumpy.dev/post/bound-version-constraints/
For now this probably won't matter (we won't get scipy version 2 anytime soon) but in general you might want to use loose lower bounds instead of the poetry-specific caret operator

@Batalex
Copy link
Contributor

Batalex commented Apr 28, 2023

Hey @katoss!,
You are not done with me yet, as I will be the editor for this review! 🙌
I will be recruiting reviewers in the coming days/weeks.

@Batalex
Copy link
Contributor

Batalex commented May 10, 2023

👋 Hi @Robaina and @khynder! Thank you for volunteering to review for pyOpenSci! I am thrilled to have you both on board for this review. Feel free to introduce yourselves to our very own submitter @katoss 🤗

@katoss, meet @Robaina, whom I had the pleasure of working with on his submission to pyOpenSci.
As for @khynder, I have enjoyed working with her over the past five years and can vouch for her dedication and passion for science.

The following resources will help you complete your review:

  1. Here is the reviewers guide. This guide contains all of the steps and information needed to complete your review.
  2. Here is the review template that you will need to fill out and submit
    here as a comment, once your review is complete.

Please get in touch with any questions or concerns! Your review is due: on June 5th.

Reviewers: @Robaina, @khynder
Due date: June 5th, 2023

@Robaina
Copy link

Robaina commented May 10, 2023

Hi @katoss, nice to meet you! I'll be one of the reviewers of cardsort. I had such a nice experience when I submitted my package to pyOpenSci. Hope you do too! Looking forward to the review.

@katoss
Copy link
Author

katoss commented May 10, 2023

Nice to meet you @Robaina and @khynder! Thank you for taking the time to review my package :) Looking forward to working with you.

@Robaina
Copy link

Robaina commented May 14, 2023

Hi @Batalex, @katoss,

a quick question: according to pyOpenSci docs I understand that I should review cardsort v0.2.2 and not the latest release (v0.2.12) since this is the version that was submitted to pyOpenSci, right? Just to be sure.

@Batalex
Copy link
Contributor

Batalex commented May 15, 2023

Hey,
This is indeed a tricky question. On the one hand, it would be inconvenient to ask submitting authors to freeze the project development during submission. On the other hand, reviewing the complete code base is already very time-consuming for the reviewers; they do not have the time to review the whole code repeatedly.
We are going to proceed with the following approach:

  • @Robaina, @khynder the version listed (0.2.2) is the one you should review
  • Once you are done with your review, it is up to @katoss to point out the PRs relevant to your comments / requested changes.

The rationale behind this approach is that we want to ensure the review scope stays more or less the same. No one wants to commit to a review on a fixed scope only to discover that the goalposts moved a few weeks later.
Furthermore, some of the reviewers' comments might already be addressed in the development branch. The submitting author would then be the most likely to answer with something like "thanks for the comment. Are you satisfied with the solution we made a few days ago on ecb645 / PR # 32?"

Based on what I saw, the most recent releases deal with a few mishaps on the packaging side. Most of the code base remains untouched so this approach should be fine!

@Robaina
Copy link

Robaina commented May 15, 2023

Alright! Thanks for the explanation

@Robaina
Copy link

Robaina commented May 17, 2023

Hi @katoss, @Batalex,

⭐ I have completed the review ⭐

I find cardsorta nice contribution! You have put a lot of work into structuring and documenting the package, it has sufficient tests and a cool CI / CD pipeline 🚀🚀 . I have got some comments, mostly suggestions to make the code more efficient and readable. I hope these will be useful to you! Please, let me know if something isn't clear from my comments. Also, I'm willing to help you implement these suggestions with PRs if you want.

Ok, here it goes:

CHECKLIST:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README.

It is succinct but sufficient. However, I think the repo would benefit from a little more explanation for non-experts. This may be as easy as providing a link to an url explaining why card sorting is useful in UX design.

  • Installation instructions: for the development version of the package and any non-standard dependencies in README.

The instructions are located within the 'CONTRIBUTING.md' file. I would add the link to this file directly in the Contributing section of the README.md file.

  • Vignette(s) demonstrating major functionality that runs successfully locally.

  • Function Documentation: for all user-facing functions.

The documentation is extensive. Hosted in readthedocs.io. I suggest to provided the url to the docs in the About section of the repo to increase visibility.

  • Examples for all user-facing functions.

  • Community guidelines including contribution guidelines in the README or CONTRIBUTING.

I suggest adding the link to the contributing file in the contributing section of the README.md file.

  • Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name

The current name of the package, 'cardsort', is missing as such in the title of the README file. Also, if you like, you could create a simple logo for the package to attract more users (See this repo as an example).

  • Badges for:
    • Continuous integration and test coverage,
    • Docs building (if you have a documentation website),
    • A repostatus.org badge,
    • Python versions supported,
    • Current package version (on PyPI / Conda).

There are no badges in the README. You can read about badges here in case you are not familiar. Also, I can help with this if you need it.

  • Short description of package goals.
  • Package installation instructions
  • Any additional setup required to use the package (authentication tokens, etc.)
  • Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
    • Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  • Link to your documentation website.

As stated above, I suggest adding the URL to the About section of the repo to increase visibility.

  • If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.

  • Citation information

No citation info was provided. I suggest publishing the repo in Zenodo and adding the citation and the DOI to the README.md file. Once published, Zenodo provides a badge containing the DOI and which you can display in the badge section of the README file. You could also add a CITATON.cff file to the repo so users can easily get the citation string.

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.
Package structure should follow general community best-practices. In general please consider whether:

  • Package documentation is clear and easy to find and use.

  • The need for the package is clear

While the purpose of this package is indicated in the README, I think a little more explanation for non-experts would be useful. This may be as easy as providing a link to an url explaining why card sorting is useful in UX design.

  • All functions have documentation and associated examples for use

  • The package is easy to install

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.

As long as I can tell, the package works as expected.

  • Performance: Any performance claims of the software been confirmed.

  • Automated tests:

    • All tests pass on the reviewer's local machine for the package version submitted by the author. Ideally this should be a tagged version making it easy for reviewers to install.
    • Tests cover essential functions of the package and a reasonable range of inputs and conditions.
  • Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)

  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.
    A few notable highlights to look at:

    • Package supports modern versions of Python and not End of life versions.

    • Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

    There are no linting tests in the CI. Also, the original code does not follow PEP8 in some cases (see below). I suggest using a linter such as black or ruff (highly recommended) to ensure the code follows PEP8.

For packages also submitting to JOSS

Not applicable.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 3.5


Review Comments

General comments:

  1. While not necessary, I think using type hints would improve the readability of the functions.

  2. Inconsistent use of .query and .loc as selectors. I would recommend using .loc in all cases as the code is not using complex queries and .loc may be more readable for Python users.

  3. PEP-8 recommends using format strings instead of concatenation as it improves readability. Overall, I would recommend using a linter such as black or ruff (highly recommended) to ensure the code follows PEP8. The linter could also be added to a CI pipeline (let me know if you need help with this)

  4. There are several cases in which a while loop followed by a try-except block is used. This choice makes the code more complex and it is unnecessary. In these cases, it can be replaced by a simpler loop, removing the need for the try-except block (see below).

  5. 'example-data.csv`is located within /docs, you could indicate the relative path in the example to help users find the file within the repo.

Detailed comments for each function:

_get_distance_matrix_for_user
  1. There is no need to update the index variables, i, j, within the loop, their value is set by the for loop itself.

  2. Given that the distance matrix is symmetric, you could save computational time by only calculating the upper triangle of the matrix and then copying the values to the lower triangle. For example:

for i in range(n):
    for j in range(i, n):
        cat1 = ...
        cat2 = ...
        X[i, j] = X[j, i] = 0 if cat1 == cat2 else 1

Note that you could also write the if - else statement in a single line as shown above.

  1. Perhaps using loc instead of query would be more readable for Python users, we are not making a complex query here anyway.
get_distance_matrix
  1. You could replace the while loop with a for loop and avoid the need to update the index variable, id, within the loop. To this end, you could first find all unique user_ids in the dataframe: user_ids = df['user_id'].unique(), and then loop over them: for id in user_ids:.
create_dendrogram
  1. You could assign value to color_threshold in a single line as follows:
color_treshold = 0.75 if color_treshold is None else color_treshold

and similar to the second definition to simplify the nested if - else statements.

  1. You could also write the if-else statement in plt.xticks in a single line:
 plt.xticks(np.arange(0.0, 1.1, 0.1) if x_max <= 1 else np.arange(0, x_max + 1, 1))
_get_cluster_label_for_user
  1. It seems that index is not used within the function, hence enumerate and updating its value within the loop (index += 1) are not needed.

  2. The try-except block adds complexity and it isn't really needed. You could replace it with an if-else block:

if card in df_u["card_label"].values:
    ...
else:
    print(f'"{card}" is not a valid card label. Removed from list.')
    cluster_cards.remove(card)
    print("Continue with cards: %s" % cluster_cards)
_get_cards_for_label
  1. As mentioned above, I would recommend using loc.
get_cluster_labels
  1. The while loop and the try-except block add complexity and are not necessary here. You could replace the while loop with a for loop after listing all unique user_ids (similar to the suggestion above for get_distance_matrix). This way, the try-except block can be removed as we can guarantee that df_u is not empty (also getting rid of the UnboundLocalError which isn't specific and may be confusing)

  2. PEP-8 recommends using format strings instead of concatenation. For example, you could replace the following line:

print("User " + str(id) + " labeled card(s): " + cluster_label)

with:

print(f"User {id} labeled card(s): {cluster_label}")
get_cluster_labels_df
  1. Similarly to the previous function, I would recommend using a for loop instead of a while loop and removing the try-except block to simplify the code and improve readability.

@katoss
Copy link
Author

katoss commented May 17, 2023

Thanks a lot for your review @Robaina ! I had a quick look over your comments, and will try to address them over the weekend :)

@Batalex
Copy link
Contributor

Batalex commented May 17, 2023

That was fast 🚀
Thanks a bunch, @Robaina!

@katoss
Copy link
Author

katoss commented May 24, 2023

Hi @Robaina, thanks a lot again for the thorough review! 🌟

I have started to address your comments, mainly so far by adapting the functions:

  • _get_distance_matrix_for_user (commits: 0c3b4dd)
  • get_distance_matrix (commits: 269a52a, e6be892)
  • create_dendrogram (commits: 8fc8679)
  • _get_cluster_label_for_user (commits: 7bbdb84)
  • _get_cards_for_label (commits: b882486)
  • get_cluster_labels (commits: 031ae0d)
  • get_cluster_labels_df (commits: 2ed2dba)
  • I also added the link to the docs to the about section of the repository

Thank you for helping me improve the code quality! Please let me know if I managed to adapt the functions as you imagined.

Regarding the rest of the feedback, I will try to address it as soon as possible.

@katoss
Copy link
Author

katoss commented May 25, 2023

I started to wonder about a more general question regarding versioning. My ci/cd pipeline publishes a new version with each commit. Like yesterday I went from 0.2.12. to 0.2.20 by making a commit for each change (since I did not want to make the commits too big). Based on your experience @Robaina @Batalex, is this the usual way to go it or is it too much?

@Robaina
Copy link

Robaina commented May 25, 2023

Hi @Robaina, thanks a lot again for the thorough review! star2

I have started to address your comments, mainly so far by adapting the functions:

  • _get_distance_matrix_for_user (commits: 0c3b4dd)
  • get_distance_matrix (commits: 269a52a, e6be892)
  • create_dendrogram (commits: 8fc8679)
  • _get_cluster_label_for_user (commits: 7bbdb84)
  • _get_cards_for_label (commits: b882486)
  • get_cluster_labels (commits: 031ae0d)
  • get_cluster_labels_df (commits: 2ed2dba)
  • I also added the link to the docs to the about section of the repository

Thank you for helping me improve the code quality! Please let me know if I managed to adapt the functions as you imagined.

Regarding the rest of the feedback, I will try to address it as soon as possible.

Great! Glad my comments were useful! I'll take a deeper look at them later and come back to you.

@Robaina
Copy link

Robaina commented May 25, 2023

I started to wonder about a more general question regarding versioning. My ci/cd pipeline publishes a new version with each commit. Like yesterday I went from 0.2.12. to 0.2.20 by making a commit for each change (since I did not want to make the commits too big). Based on your experience @Robaina @Batalex, is this the usual way to go it or is it too much?

Based on my experience I would say this is too much, I would update the version only after a significant change to the codebase. Of course this "significant" is subjective. Here are some guidelines for semantic versioning.

On the other hand, I actually forgot to ask you about your CI / CD pipeline the other day. I observed that currently the package is built and uploaded to PIP on every push to the main branch. I am by no means an expert in DevOps, but I think that if you are directly pushing every commit to main, then building the package and uploading to PIP is perhaps too much. I would do this only after collecting some changes that address a specific issue (or issues). This workflow would be fine if there were another branch, say, dev (for development) meant to collect changes and then the CD workflow would get triggered on merging dev to main.

Definitely @Batalex will know more about this.

@katoss
Copy link
Author

katoss commented May 25, 2023

Ok, thank you! Then my intuition wasn't wrong. I followed this tutorial to create the package and CI/CD, and did not question the way they did it, since it was the first time for me setting all of this up.
Is it possible to trigger the whole CD manually so it only publishes a new version when I want it to? Or maybe the easiest thing would be that I create a development branch and collect the changes there, as you said.

@katoss
Copy link
Author

katoss commented Jul 5, 2023

Hi,
I just started a draft PR for changes in functions based on the reviews of @khynder and @Batalex :)

It is quite a lot, I listed everything, but have addressed only some issues so far, and have some questions for others. @khynder , could you accept the invitation to become a collaborator on cardsort, so I can assign you as a reviewer to the PR?

I also feel there might be still an issue with the trigger for CD, since CD started for all commits for patch changes in the new PR (see Actions).

@Batalex
Copy link
Contributor

Batalex commented Jul 6, 2023

Hi, I just started a draft PR for changes in functions based on the reviews of @khynder and @Batalex :)

It is quite a lot, I listed everything, but have addressed only some issues so far, and have some questions for others. @khynder , could you accept the invitation to become a collaborator on cardsort, so I can assign you as a reviewer to the PR?

I also feel there might be still an issue with the trigger for CD, since CD started for all commits for patch changes in the new PR (see Actions).

Can you try removing the pull_request trigger in CI.yml ? According to the documentation, there are many possible sub triggers for this one

@katoss
Copy link
Author

katoss commented Jul 6, 2023

Can you try removing the pull_request trigger in CI.yml ? According to the documentation, there are many possible sub triggers for this one

Thank you, just changed it in the fix-functions branch, let's see if it changes something.

Edit: I guess the change was not thought to impact the CD and was more of a global suggestion, but in any case the CD is still getting triggered. Probably the

release:
types: [published]

does not work. I will try to think about another solution

@lwasser lwasser added this to the package-in-review milestone Jul 11, 2023
@lwasser lwasser moved this to under-review in peer-review-status Jul 11, 2023
@katoss
Copy link
Author

katoss commented Jul 19, 2023

I worked on the fix-functions PR, I think it might be good to go @khynder @Batalex

@katoss
Copy link
Author

katoss commented Aug 1, 2023

I merged the fix-functions PR 🚀 , thank you for your help @khynder @Batalex .
I also added the repo to zenodo and added a badge, as you suggested @Robaina (on branch update-readme so far). I'm also about to add a list of contributors, will try to figure out how to use the all-contributors bot in the next days. I should probably also change the repo-status from WIP to active?

I think I worked through all the feedback now 🤔 What are the next steps? :)

@Batalex
Copy link
Contributor

Batalex commented Aug 2, 2023

Congrats @katoss, that was a big one!
Somehow I forgot to keep up, did you manage to fix the CD pipeline?

What are the next steps? :)

For now, I'll ask @Robaina and @khynder to check the final approval checkbox in their review message. Then, I will wrap up the review by checking some other stuff. It's great that you already took care of zenodo and the badge!

@katoss
Copy link
Author

katoss commented Aug 3, 2023

Somehow I forgot to keep up, did you manage to fix the CD pipeline?

Yes, I did, I think I forgot to write it. Now the CD only triggers when changes are pushed or merged with the main branch. So far it seems to work :)

For now, I'll ask @Robaina and @khynder to check the final approval checkbox in their review message. Then, I will wrap up the review by checking some other stuff. It's great that you already took care of zenodo and the badge!

Ok amazing!

@khynder
Copy link

khynder commented Aug 3, 2023

Checked!

@Robaina
Copy link

Robaina commented Aug 16, 2023

so sorry for the delay... checked!

@Batalex
Copy link
Contributor

Batalex commented Aug 16, 2023

🎉 cardsort has been approved by pyOpenSci! Thank you @katoss for submitting cardsort and many thanks to @Robaina and @khynder for reviewing this package! 🐈‍⬛

Author Wrap Up Tasks

There are a few things left to do to wrap up this submission:

  • Activate Zenodo watching the repo if you haven't already done so.
  • Tag and create a release to create a Zenodo version and DOI.
  • Add the badge for pyOpenSci peer-review to the README.md of cardsort. The badge should be [![pyOpenSci](https://tinyurl.com/y22nb8up)](https://github.com/pyOpenSci/software-review/issues/102).
  • Please fill out the post-review survey. All maintainers and reviewers should fill this out.

@Robaina, @khynder we would greatly appreciate if you could fill the post-review survey as well!
If any of you have any feedback for us about the review process please feel free to share it here. We are always looking to improve our process and documentation in the peer-review-guide.

@Batalex Batalex moved this from under-review to pyos-accepted in peer-review-status Aug 16, 2023
@Robaina
Copy link

Robaina commented Aug 16, 2023

Congrats @katoss!

@katoss
Copy link
Author

katoss commented Aug 22, 2023

Amazing! I just added the badge and did the survey. Thanks a lot for spending the time on reviewing everything so thoroughly @Batalex @Robaina and @khynder! The review was really helpful and instructive, and improved the package a lot :)

@Batalex
Copy link
Contributor

Batalex commented Aug 23, 2023

Congrats @katoss, I am glad this review went so well. I am truly pleased that we were able to add value to your work.
Best of luck with your thesis! 🤞
I will be closing this submission issue now.

We offer authors to join us on our slack / discourse. Let me know if you are interested in joining the slack.
We can also promote cardsort with a blog post, if you feel like you have not written enough with your thesis 😁 This is what it could look like.

@Batalex Batalex closed this as completed Aug 23, 2023
@github-project-automation github-project-automation bot moved this from pyos-accepted to Joss accepted in peer-review-status Aug 23, 2023
@Batalex Batalex moved this from Joss accepted to pyos-accepted in peer-review-status Aug 23, 2023
@katoss
Copy link
Author

katoss commented Sep 1, 2023

Amazing, thanks @Batalex :) And yes, I would like to join the slack. Could you send me an invite?
A blog article sounds like a good idea, I will try to write a draft some time this month.

I saw that the PR to update the pyOpenSci website with the cardsort package is still open and has no reviewer assigned. Is normal that it just takes a while, or do you think it would be better if I send a reminder?

@Batalex
Copy link
Contributor

Batalex commented Sep 3, 2023

I just invited you 🐈‍⬛

Is normal that it just takes a while, or do you think it would be better if I send a reminder?

It seems that we have an issue with pre-commit.ci, but it should be fixed soon

@lwasser
Copy link
Member

lwasser commented Sep 11, 2023

hey there @katoss 👋 thank you for mentioning this issue in our website update pr!! i saw it. it took me a while to merge as there were some bugs in the build that i wanted to fix before merging. it would have broken the website! but generally these will get merged more quickly - as that build becomes a bit more stable.

cardsort is now listed on our website!! 🚀 and we look forward to seeing you around slack!!
Screen Shot 2023-09-11 at 4 08 24 PM

thank you everyone for your work on this pr!! and @Batalex for leading the charge on the review ✨ !!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: pyos-accepted
Development

No branches or pull requests

7 participants