Developer tools #8

aazuspan · 2023-04-18T23:15:55Z

This PR would close #4 by setting up developer tooling and CI. Specifically, it:

Pins minimum Python to 3.8
Adds black and isort for formatting
Adds flake8 for linting
Adds mypy for type-checking (currently running but doing nothing)
Runs all the tools above on all existing code
Sets up pre-commit to run all formatting, linting, and type-checking
Adds pytest-cov for measuring test coverage
Adds bumpversion and twine for releasing in the future
Sets up a Github Action to run pytest on all supported Python versions for all pushes and PRs
Sets up a Github Action to run all pre-commits
Adds a bare-bones readme developer guide (we can move this elsewhere later)

Despite all the commit noise (took a few tries to get all the CI configuration right), this shouldn't have made any functional changes to source code, so hopefully it should be a little easier to dig through than the last PR.

One thing I'm still on the fence about is the best way to integrate this branch. For simplicity, I'm proposing we merge it into fb_add_estimators since that's the only branch that's in good enough shape to pass tests and where we're likely to be focusing development, but if you foresee that being an issue (i.e. not having access to the developer tooling if we make a new branch off of develop before merging fb_add_estimators) we could re-consider. I'm open to suggestions there!

- Minimum Python set to 3.8 for consistency with sklearn (see #4) - Move package version into __init__.py for bumpversion compat - Apply black and isort to setup.py

- Add all developer tool dependencies to setup.py - Add README with developer documentation - Add generated cache and coverage files to gitignore - Add pre-commit configuration

grovduck · 2023-04-19T01:29:19Z

@aazuspan, again you are way ahead of me, but this is great stuff for me to be exposed to. Pre-commits, Github actions, and bumpversion/twine are all very new to me, so I will do some background work to get up to speed on those. But I have a few comments/edits to share at this point:

I've been using black with line-length = 80. Let's stick with 88, but I need to make sure I don't shoot myself in the foot with my current VSCode settings which hard-codes --line-length=80. What I'm not quite yet understanding is that black is not reformatting lines longer than 80 on save, which means that a setting that you've put in is overriding my VSCode default. Any ideas?
Let's remove non-standard .gitignore patterns - I don't need *.R as a pattern
I was influenced at one point to keep code under a src directory. Based on the commits that I saw you have, perhaps this is more problematic than it's worth. I'm OK with sklearn_gnn coming up a directory if you're more used to that organization.
Clearly you are an author if not the primary author of this package so far. I'm not sure how best to reflect that in setup.py. Have you done this before where you have more than one primary author? But it definitely needs to be reflected if you're comfortable with that.
Thanks for the intro README.md. For this PR, this looks completely adequate and actually quite helpful for me already.
For Github actions, it looks like you can't have it be an editable install. But we're recommending that developers clone and install as an editable package. Do I have that right?
Can you explain why the imported version number isn't working (your last commit)?

aazuspan · 2023-04-19T17:39:43Z

What I'm not quite yet understanding is that black is not reformatting lines longer than 80 on save, which means that a setting that you've put in is overriding my VSCode default. Any ideas?

Interesting... I don't think I added anything that could override your global VSCode settings, so I'm at a loss. In the pre-commit config, I just use black with its default line length of 88, and then set flake8 to match that. Regarding 80 vs. 88, I only went with 88 because it's the black default and they argue that it's a good balance between readability and file length, but I'm not married to that standard.

Let's remove non-standard .gitignore patterns - I don't need *.R as a pattern

Sounds good! I'll clean that up a little.

I was influenced at one point to keep code under a src directory. Based on the commits that I saw you have, perhaps this is more problematic than it's worth. I'm OK with sklearn_gnn coming up a directory if you're more used to that organization.

I haven't used a src layout before, but I'm finding there are some nice advantages to it. I've definitely had bugs that resulted from my tests running on an old installed version of the package rather than the current source code, which the src layout prevents by forcing you to use an editable install. I'm happy sticking with it, unless we encounter problems down the line.

Have you done this before where you have more than one primary author?

I would certainly still consider you the primary author both in that we're building the code off of your past work and that you'll most likely be involved in maintaining it beyond my contributions, but I do appreciate the consideration. I don't believe there's any clean way to handle multiple authors in a setup.py. It is possible with a pyproject.toml, which is actually something I've been meaning to bring up as a possibility as it's considered the "modern" standard for Python packaging. But that's probably a bigger discussion for another issue. For now, I'm happy with the package being authored by LEMMA group, but once it's more polished we may want to add a funding section to the readme that mentions my ORISE affiliation and/or the project I'm funded through.

For Github actions, it looks like you can't have it be an editable install. But we're recommending that developers clone and install as an editable package. Do I have that right?

I made the Github action editable originally without thinking about it, then realized it would never need to be editable and removed it for clarity. I don't think it would have caused any problems though. I did mention I sometimes play fast and loose with commits...

Can you explain why the imported version number isn't working (your last commit)?

Good question. When setup.py attempts to import sklearn_knn.__version__, it needs to also import everything else in __init__.py. That leads to a ModuleNotFoundError when it attempts to import something from sklearn which hasn't been installed yet. In other projects, I've placed __version__ in the __init__.py to make it publicly accessible, but I forgot that it requires an ugly workaround where your setup.py has to load and parse the version as text (described in this thread) rather than importing it directly. There is another workaround where you just define it independently in both places and let bumpversion handle updating everything when you change versions, which is what I currently do in wxee, but that requires another config file telling bumpversion where to look. While I was sorting that out, I came across the rejection notice for PEP 396 that suggests a publicly available __version__ is no longer recommended, so I decided (for now at least) to just go back to a hardcoded version. I did leave bumpversion as a dependency, which in hindsight is probably not worth including anymore. Unless you have strong feelings on a publicly accessible __version__, I'll remove that dependency.

grovduck · 2023-04-19T22:59:47Z

Interesting... I don't think I added anything that could override your global VSCode settings, so I'm at a loss.

This was me. I had competing formatters with realizing it. I stripped my line-length=80 out of my black settings, so I think I'm good to go with that. If you're comfortable with 88, let's stick with it.

I haven't used a src layout before, but I'm finding there are some nice advantages to it. ... I'm happy sticking with it, unless we encounter problems down the line.

Sounds good, I agree

I would certainly still consider you the primary author both in that we're building the code off of your past work and that you'll most likely be involved in maintaining it beyond my contributions, but I do appreciate the consideration. I don't believe there's any clean way to handle multiple authors in a setup.py. It is possible with a pyproject.toml, which is actually something I've been meaning to bring up as a possibility as it's considered the "modern" standard for Python packaging. But that's probably a bigger discussion for another issue. For now, I'm happy with the package being authored by LEMMA group, but once it's more polished we may want to add a funding section to the readme that mentions my ORISE affiliation and/or the project I'm funded through.

I agree with all of this (other than your humility ;)). I've used pyproject.toml in other projects and liked it quite a bit, so I'm good with raising that as a new issue/PR. There seemed to be some residual disagreements on whether certain projects support pyproject.toml vs setup.cfg, but I'm not finding those references at present. And we can definitely have an AUTHORS.md file, right? Good idea on funding sources as well.

While I was sorting that out, I came across the rejection notice for PEP 396 that suggests a publicly available version is no longer recommended, so I decided (for now at least) to just go back to a hardcoded version. I did leave bumpversion as a dependency, which in hindsight is probably not worth including anymore. Unless you have strong feelings on a publicly accessible version, I'll remove that dependency.

I think I'm following all the logic here. Is the more accepted standard now?

from importlib.metadata import version
print(version("scikit-learn-knn"))

If you didn't have bumpversion in a dependency/tool, would you just manually change the version in setup.py or rely on some other tool? It looks like bumpversion is useful if you're tracking the version number over several files. I have no strong feelings on a publicly accessible __version__.

One thing I'm still on the fence about is the best way to integrate this branch. For simplicity, I'm proposing we merge it into fb_add_estimators since that's the only branch that's in good enough shape to pass tests and where we're likely to be focusing development, but if you foresee that being an issue (i.e. not having access to the developer tooling if we make a new branch off of develop before merging fb_add_estimators) we could re-consider. I'm open to suggestions there!

I don't foresee branching off develop until we're finished with the fb_add_estimators branch (or branches from it). So I think all this tooling would be available to us downstream of where we are now. This is good with me.

I've tested out pre-commit and got it to intentionally fail on a commit, so that's fun to see.

Do we need to worry that we have similar flake8 plugins installed? For example, with my currently configured plugins, I get this when running flake8 on _euclidean.py:

(default-310) PS D:\code\scikit-learn-knn> flake8 .\src\sklearn_knn\_euclidean.py
.\src\sklearn_knn\_euclidean.py:3:1: I001 isort found an import in the wrong position
.\src\sklearn_knn\_euclidean.py:3:80: E501 line too long (86 > 79 characters)
.\src\sklearn_knn\_euclidean.py:4:1: I005 isort found an unexpected missing import
.\src\sklearn_knn\_euclidean.py:4:1: I005 isort found an unexpected missing import
.\src\sklearn_knn\_euclidean.py:6:1: D101 Missing docstring in public class
.\src\sklearn_knn\_euclidean.py:7:1: D102 Missing docstring in public method
.\src\sklearn_knn\_euclidean.py:7:20: N803 argument name 'X' should be lowercase
.\src\sklearn_knn\_euclidean.py:9:10: N806 variable 'X' in function should be lowercase
.\src\sklearn_knn\_euclidean.py:12:1: D102 Missing docstring in public method
.\src\sklearn_knn\_euclidean.py:12:24: N803 argument name 'X' should be lowercase
.\src\sklearn_knn\_euclidean.py:14:10: N806 variable 'X' in function should be lowercase

But based on what I read, pre-commit isn't using all the plug-ins configured, so it's probably a non-issue. I guess the next question is whether there are any flake8 plugins that we do want to use in the pre-commit?

Last question: You mentioned setting up tox in #4, but I don't see where that is set up. Does Github Actions make that redundant because you're running tests and pre-commits on 3.8 - 3.11?

I think that's the extent of my usefulness for a review. I'm comfortable merging if you are (other than that small .gitignore cleanup). Thanks @aazuspan!

aazuspan · 2023-04-20T16:11:00Z

And we can definitely have an AUTHORS.md file, right?

Good idea!

Is the more accepted standard now?

Yes, that's my understanding. I don't fully grasp the benefit over a __version__, but that seems to be the widespread and official recommendation.

If you didn't have bumpversion in a dependency/tool, would you just manually change the version in setup.py or rely on some other tool? It looks like bumpversion is useful if you're tracking the version number over several files.

Manual, at least until we switched to a pyproject.toml at which point we could start using a build tool like hatch, which can do what bumpversion does, plus handle environment management, build and publish releases (so we wouldn't need twine anymore), and run custom scripts. For now, it doesn't hurt to leave bumpversion in place.

Do we need to worry that we have similar flake8 plugins installed? ... pre-commit isn't using all the plug-ins configured, so it's probably a non-issue.

Good point! My understanding is that pre-commit runs in an isolated environment with only the dependencies specified in .pre-commit-config.yaml, which is why your local plugins aren't affecting that. I think the solution to make sure we're both running the exact same linting is to use pre-commit run flake8 --files ./src/sklearn_knn/_euclidean.py instead of calling flake8 directly. A little more verbose, but it should ensure the local environment isn't affecting results.

I guess the next question is whether there are any flake8 plugins that we do want to use in the pre-commit?

Probably! I was originally going to add flake8-docstrings but realized it would be a big lift to get that passing, so I figured we could make a separate PR for adding docstrings and that rule. I'm totally open to other flake8 plugins though! If we want to add some, I think we just need to add an additional_dependencies section to the flake8 hook in the pre-commit config, e.g.

-   repo: https://github.com/pycqa/flake8
    rev: '6.0.0'
    hooks:
    -   id: flake8
        args: [--max-line-length, "88"]
        additional_dependencies:
        - flake8-docstrings

Got any favorites you think we should add?

Last question: You mentioned setting up tox in #4, but I don't see where that is set up. Does Github Actions make that redundant because you're running tests and pre-commits on 3.8 - 3.11?

Yes, I went back and forth on that, but that was my thinking. Of course tox allows you to run those tests locally, but a) I've had a lot of trouble in the past getting tox set up to run locally, especially on Windows, and b) issues between different Python versions are rare enough that if it does arise, it's usually possible to just debug remotely in Github actions. Does that seem reasonable?

grovduck · 2023-04-21T23:25:49Z

@aazuspan, finally responding to the last remaining bits on this.

I'm totally open to other flake8 plugins though! If we want to add some. Got any favorites you think we should add?

Other than flake8-docstrings which will happen in #10, I sometimes find flake-bugbear and pep8-naming useful, although pep8-naming can call out a bunch of exceptions that I typically want to keep (especially like X, y). Most of this is because I've not had anybody checking my code, so I get lazy. Not a flake8 plugin, but I've found sourcery to help me write more idiomatic code as well. But I can use those myself without those being required. I think only if you find them useful should we add to the pre-commit configuration.

Of course tox allows you to run those tests locally, but a) I've had a lot of trouble in the past getting tox set up to run locally, especially on Windows, and b) issues between different Python versions are rare enough that if it does arise, it's usually possible to just debug remotely in Github actions. Does that seem reasonable?

Absolutely. I've found tox to be problematic as well (probably because I am developing on Windows) and I'm intrigued on how Github Actions handles this, so I'm good with leaving tox out for now.

The .gitignore looks good to me. I'm happy with the PR as it is now.

aazuspan · 2023-04-24T17:10:27Z

I've found tox to be problematic as well

Glad to hear that's not just me!

flake8-bugbear sounds like a good idea. I see the issue with pep8-naming since it looks like it's incompatible with the sklearn naming standards. Maybe more trouble than it's worth for this project? I've never tried sourcery, but it looks interesting, and I'm always up for learning about new tools.

I'll add flake8-bugbear and sourcery to the pre-commit and CI. Be prepared for some failed action notification emails while I get that set up...

EDIT: That was easier than expected! sourcery now runs with the other pre-commits in the CI, so I'm going to go ahead and merge this. Going with a squash merge this time because there was a lot of noise in the commit history getting the actions set up, and it pretty much all adds up to one feature.

I'm currently using my sourcery token to run the CI. I don't think that's a time-sensitive problem since it doesn't seem like there are any quotas for free accounts, but I imagine we'll want to switch that over to a lemma-osu token. Not sure how familiar you are with Github secrets, but you can swap the token by going to the repository Settings > Secrets and variables > Actions > Repository secrets and updating SOURCERY_TOKEN with a new token you can find on the sourcery dashboard.

grovduck · 2023-04-27T22:27:12Z

This looks great, @aazuspan. Couple quick follow-ups:

I had no idea that sourcery could be used at a pre-commit, but given that I didn't know what a pre-commit was until last week, not a surprise!
Yes, definitely let's replace your token with a lemma-osu one. I have a personal token as well, but are you clear how best to get one associated with a team? Through here or just by using a different email account?

aazuspan · 2023-04-28T15:37:00Z

Good question... Looking at the plans, my impression was that we'd be fine with the free tier since we're working on a public repo, but they do make it sound like maybe you need a Team plan for using it in an organization repo...

Sourcery Team includes everything in Sourcery Pro plus allows you to use Sourcery in organization's GitHub repos

Obviously it's working now, so maybe there's a caveat there. It does mention...

You can try out Sourcery in an organization's GitHub repo for free for up to 3 team members.

So maybe as long as it's just the two of us contributing we're okay without a team plan?

Personally, I'd probably try just using it with a different email and if that won't work long-term, maybe we just pull it out of the CI and run it locally with personal accounts?

grovduck · 2023-04-28T21:38:47Z

Excellent, I've gone ahead and created a new Google account, signed up for a free account on Sourcery, and changed SOURCERY_TOKEN to the updated value.

aazuspan added 8 commits April 17, 2023 16:42

Apply black and isort and fix flake8 errors #4

420b37c

Merge branch 'fb_add_estimators' into dev_tools

7d27db0

Pin minimum Python and set __version__ #4

9d37559

- Minimum Python set to 3.8 for consistency with sklearn (see #4) - Move package version into __init__.py for bumpversion compat - Apply black and isort to setup.py

Set up developer tools (mypy, isort, flake8, black) #4

4716c85

- Add all developer tool dependencies to setup.py - Add README with developer documentation - Add generated cache and coverage files to gitignore - Add pre-commit configuration

Set up Github Actions for testing and linting #4

3a1fee7

Fix pre-commit command and clean up workflows

0371561

Fix Python versions, workflow name, and remove editable install

45eb39b

Fix version

d5d5632

aazuspan added the dx Developer experience, tools, efficiency, etc. label Apr 18, 2023

Clean up gitignore

cb2f817

This was referenced Apr 20, 2023

Add docstrings and lint them #10

Open

pyproject.toml and build system #9

Closed

Add flake8-bugbear and sourcery to pre-commit and CI #4

61822e1

aazuspan merged commit 950b0bb into fb_add_estimators Apr 24, 2023
10 checks passed

aazuspan deleted the dev_tools branch April 24, 2023 17:26

grovduck linked an issue Jun 7, 2023 that may be closed by this pull request

Set up dev tools #4

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Developer tools #8

Developer tools #8

aazuspan commented Apr 18, 2023

grovduck commented Apr 19, 2023

aazuspan commented Apr 19, 2023

grovduck commented Apr 19, 2023 •

edited

Loading

aazuspan commented Apr 20, 2023

grovduck commented Apr 21, 2023

aazuspan commented Apr 24, 2023 •

edited

Loading

grovduck commented Apr 27, 2023

aazuspan commented Apr 28, 2023

grovduck commented Apr 28, 2023

Developer tools #8

Developer tools #8

Conversation

aazuspan commented Apr 18, 2023

grovduck commented Apr 19, 2023

aazuspan commented Apr 19, 2023

grovduck commented Apr 19, 2023 • edited Loading

aazuspan commented Apr 20, 2023

grovduck commented Apr 21, 2023

aazuspan commented Apr 24, 2023 • edited Loading

grovduck commented Apr 27, 2023

aazuspan commented Apr 28, 2023

grovduck commented Apr 28, 2023

grovduck commented Apr 19, 2023 •

edited

Loading

aazuspan commented Apr 24, 2023 •

edited

Loading