-
-
Notifications
You must be signed in to change notification settings - Fork 25.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Add minimal support for spin as a developer tool #29012
Conversation
From the Perhaps the simplest way to think of it, in context of this project, is that it is a Python tool that replaces the Makefile. So, no more of a "layer cake of tools" than you already have 😅 Because it is a Python tool, however, we can improve the user interface: print a decent help, add flags, etc. Because it is not written in Make, you can put logic in your commands that other developers can understand ( As gladly as we'd support scikit-learn's use cases, the project should use what works best for its purposes; and if that's Make, that's perfectly fine too :) Let me know if you have any questions on using/customizing/extending the tool. |
Looking at the existing Makefile, I think you can aggregate targets into the following
|
One issue I have with |
This is actually fixed in spin 0.9, released 2 weeks ago, which prompted me to take another look at spin. This is mentioned in my too long PR description, I did not have/take the time to make it shorter 😅 See scientific-python/spin#155 if you are curious about the details. AFAICS this is mostly:
|
@lesteve I saw you mentioning editable install in the OP, but I figured this must be only for "easy" packages, otherwise numpy wouldn't say spin can't do editable installs. But if it works, then that seems nice. Note that latest spin on conda-forge seems to be 0.8 though. |
It works for scikit-learn in this PR (at least in my quick tests), I am pretty sure the spin PR adding support for editable install was tested in scikit-image. I don't see why it would not work for numpy. Having said that, there may be some limitations that will be discovered when more people try to use spin with editable install. It's hard to make predictions especially about the future as the saying goes 😉
Yeah I know mentioned in my too long PR description too 😉 |
pyproject.toml
Outdated
# "spin.cmds.meson.test" | ||
# ] | ||
# "Environments" = [ | ||
# "spin.cmds.meson.ipython", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on numpy, when working with spin, one cannot run python/ipython/pytest manually. So it's not that it's a shortcut, it's more like that this is the only entry point. I'm okay with that, but then we need these commands here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you build with meson into, say, build-install
, you cannot run python/ipython/pytest without setting some paths. Spin does that for you, unless you are using an editable install (which you can also get with meson, via pip install -e --no-build-isolation
).
I.e., it's not because of spin that you cannot access other entry-points, but because of the build/install mechanism used. Installing into ./build-install
is the default behavior for spin build
on NumPy (and most meson-based projects).
There are lots of moving pieces, so I'm sorry if this is coming across overly pedantic; I just want to clarify what's happening, and help figure out what best workflow defaults are for sklearn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying this. From the perspective of the users/contributors, it doesn't matter what the tool does. When users run a command which they expect to install the package, then it should behave as if it's installed.
So in this case, we might want to make sure spin build
actually installs the package in the environment, at least for scikit-learn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it's worth splitting the users and contributors. Users would install via pip or conda, typically, whereas developers want a working build that gets updated when they modify the code. The advantage of editable installs come especially when testing with other packages (i.e., you can run another project's test suite, and it will pick up your dev package). But if you're happy to have an editable install (which retriggers builds and differs in some other ways from a regular package install), then you can set build
to the spin.cmds.pip.install
built-in command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I had local modifications that I did not push which now folded this discussion oh well ...
I think the best way forward on this would be to have spin build-editable
exist and probably not spin build
to avoid that people mix both, i.e. with spin you can choose which commands you make available. We would also need the Meson verbose editable install so we kind of need a custom command I think.
Now I "just" need to find a good example of custom command to put some kind of reality behind the words 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be super useful thanks!
I guess the simplest thing I can think of is I would like spin build-editable
(build-editable
naming could be improved) to do
pip install --editable . \
--verbose --no-build-isolation \
--config-settings editable-verbose=true
Bonus points:
- reusing the
spin.cmds.pip.install
and appending the--config-settings
to it would be nice more generally it may be nice to haveI think install and build are quite different so let's forget about this.spin build --editable
to build in editable mode although not high priority
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. What does "build in editable mode" mean in this context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See scientific-python/spin#192 ; with that in place, we can use the command as-is, and just set --verbose
as the default option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Examples of how NumPy overrides spin
commands, as well as implement their own: https://github.com/numpy/numpy/blob/main/.spin/cmds.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. What does "build in editable mode" mean in this context?
Yeah not great wording, I think it is more "install in editable mode" indeed
I just checked, and |
While I was more on the -1 side a couple of months ago I since used stuff like A separate question unrelated to this PR since @stefanv is around :). I played with |
Like Guillaume I am one of the people not super in favour of having tools that are "too powerful" provide this abstraction. "too powerful" is of course ambiguous and undefinable - my main point of worry is that using a tool like a However, I am softening to the idea of a tool like My final point: I care about being able to mix and match a tool like |
pyproject.toml
Outdated
"Environments" = [ | ||
"spin.cmds.meson.shell", | ||
"spin.cmds.meson.ipython", | ||
"spin.cmds.meson.python", | ||
"spin.cmds.meson.run" | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need these? I'm asking because right now I can install in editable mode and then type ipython
or ipython -i some_script.py
or what ever weird flags I'd like to pass to ipython today and it just works. Asked differently: what problem is solved by providing these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed I copied and pasted this part from an example and I think can be removed. These commands mostly makes sense for non-editable workflow (with spin build
you have an out-of-tree Meson build and you set PYTHONPATH to be able to import sklearn
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't want to use the "out of tree" feature of meson then I agree we can remove it. Is there ever a use-case for doing a "not editable install" while doing development work?
Our aim so far has been to keep We could consider adding some lightweight env management, but why not use nox or one of the pixi tools you mention? |
This is my main worry too :) As such, spin (at least for the internal commands we provide) always prints the commands it executes to the terminal.
We don't always have control over the command lines of the tools we used, however, so it'd be hard to get all tool developers to provide each project with exactly what it needs so that arcane flags are no longer necessary. The proof of this lies in the existence of Makefiles in numerous projects. I think a cool example of this just came out of this conversation: scientific-python/spin#192 I wasn't aware of that meson feature!
I'll take it 😂 No, seriously, I discourage people from adopting the tool unless it makes obvious sense. Development should be kept simple; we don't need additional mental overhead. If simple means adopting
I would not keep both around; "there should be one obvious way" and all that.
Another concern we share. We had this problem with non-in-place meson builds (until recently, those didn't exist!) where we couldn't easily launch commands and have them pick up the package. So In scikit-learn's case, it seems like most developers are happy to use the editable install route exclusively, in which case many of the meson commands are no longer necessary. |
Basically, while Basically, I used I kind of already mentioned this point to @wolfv at the last PyConDE but my view might be not reasonable at all :) |
It may be worth having a chat. No need to build the same thing twice, but the pre-packaged commands (or the logic of them) may still be worth shipping, even if we use alternative scaffolding. |
Seems like |
Thanks for the ping :) @stefanv - would love to chat anytime. I am currently at PyCon US (you too?). Otherwise happy to jump on a call and discuss. Spin looks cool! |
Is it possible to make things also work for Windows? Currently |
Spin is tested on Windows. You can have spin invoke a platform platform-specific Makefile, although I already test docs building on Windows using git-bash. |
that would be one benefit of pixi - |
I don't think pixi can fix this problem: it's the underlying Sphinx Make file that's not working on Windows. |
Not sure it is worth it, but I guess it seems feasible that I quickly checked and at least numpy and matplotlib still have a Edit: opened scientific-python/spin#206 to do this which has been merged |
My sense is that that's the general trend, since you have trouble building the library on a "traditional" Windows shell. But spin is not opinionated about this sort of thing... |
@@ -68,7 +68,7 @@ feature, code or documentation improvement). | |||
|
|||
.. prompt:: bash $ | |||
|
|||
conda create -n sklearn-env -c conda-forge python numpy scipy cython meson-python ninja | |||
conda create -n sklearn-env -c conda-forge python numpy scipy cython meson-python ninja spin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm moving a lot more towards @Micky774 's suggestion of having a way to not copy paste these. Maybe we could at least include an environment.yml
file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed but not really related to adding spin, do you want to open an issue about this 😉?
Some possibilities off the top of my head:
- have an environment.yml but then what do you do about pip users? scikit-image seems to have instructions that works for both, off the top of my head use
requirements.txt
and for conda:conda create -n env python
+conda install --file
that can use requirements.txt. - use an environment from the lock-files (
conda env create -f env.yml
or something like this), at least it is tested in the CI on some OS. This may be useful if we ever need to pin some stuff for some time. does not work for pip users. - have a spin custom command that installs the build dependencies and can switch between conda and pip. Maybe overkill
Side-comment: personally the one I am always annoyed until I switched recently to creating environment from conda-lock lock-files directly is the doc dependencies so many different ones with a mix of conda and pip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you could benefit from pixi
(conda + pypi + lockfile) :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll see but for now one baby step at a time 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is making you move away from "copy and paste that command"?
I still like having it and using it. But I'd also like to have spin setup-me-up
as a shortcut. Mostly because the times when I can't find the above command in my shell history I need to spend 5min digging it out of our docs. So having a shortcut in spin would be useful. Though also a way of fixing a symptom instead of the root cause (why is the command in a place that is hard to find?)
No lockfiles for dev envs please. scikit-learn has to work with a large number of version combinations of the (few) dependencies it has. Having different developers on different combinations is IMHO a feature, as it increases the chances of someone noticing that we are making it harder to a combination of versions that isn't "main stream".
Case in point, we recently made it so that you need Numpy 2 to build wheels. Or atleast that is the default. Someone asked about "but do we need to have this?", which at the time I found tedious. But just yesterday I tried to use numpy<2
to reproduce something and basically failed. But I think it would have been even harder if we didn't have the conversation about "yeah but what about using older versions of Numpy, should that still be allowed/possible?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is making you move away from "copy and paste that command"?
I am not sure if this question is for me or for Wolf ... just to be clear you would like to have the full pip command somewhere still mentioned in the documentation? I can certainly do this.
The main argument for spin
in my mind is that you have discoverability and a simple point of entry for new contributors. So when you don't remember the magical command you only have to remember to type spin
and you'll get some nice info with a few things you can do. Right now this looks like this:
Also this was probably mentioned before but worth repeating the point, spin
is quite explicit about the commands it actually runs behind the scenes, this is what you get when you do spin install -v
so you can still find your command if you want to:
My hope right now is to try to keep this issue reasonably focused on adding spin
to be able to have interested people (like me) experiment with it in their day to day work and see how we like it and what we can make with it.
This is not that easy to keep this PR focused because there are plenty of reasonable approaches that are trying to answer "OK but what is the real problem here, can we not try to fix it instead rather than add yet a new tool".
About lock-files for dev environment I would agree with you. I have personally started using the CI conda-lock lock files more recently because I find them convenient to try to reproduce CI issues (and I am on a Linux machine which makes it easier). I would certainly not advocate for "conda-lock files for everyone's dev env" 😉.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was actually thinking of Adrin. Because he said he used to not like the idea but is now changing his mind. So I was interested in what is changing his mind.
I think we should add spin. Like you say it is convenient, for experienced contributors and newcomers.
The point that is important to me is that we treat it as a useful way to have a shortcut and discoverability for very simple steps. Not as a way to add a layer of abstraction that hides/deals with the fact that the actual commands are annoying/hard to understand/there are many of them/etc.
The fact that spin prints what it is doing is good for discoverability, but I think it doesn't provide the forcing function that is needed to keep the things it is doing simple. Building a Dockerfile
also prints the commands and output, but many of them might as well be written in Latin. I'd understand about the same amount about what they are actually doing.
A little late to the party, but I wanted to mention that as before, I'm a fan of adding a developer tool to simplify and streamline the scikit-learn development workflow, even just a little bit. There are definitely a lot of valid risks mentioned (e.g. nested abstractions, obfuscation, etc.) and several corresponding mitigating strategies (e.g. printing the underlying command, etc.). I don't really have much to add since those discussions have been well-constructed so far, so instead I want to offer a high-level opinion. While there are anti-patterns that such tools expose us to, I believe the accessibility that Instead, having a curated centralized dev tool that ensures that project build related tasks/commands are discoverable, interpretable, and accessible allows contributors to focus their time and energy on actual work. I believe that as long as the underlying commands that are run are clearly advertised, such a tool facilitates learning the underlying workflow more than just jumping back to online documentation despite it being in some sense "automatic". It redirects focus to where it ought to be. I really do think that every little bit of streamlining helps a ton when it comes to attracting new contributors, and encouraging existing contributors. Let's keep improving our workflow in other ways, but I'd love to see us add |
This reverts commit fbdcf68.
To thread the needle towards a fully opt-in approach, I reverted the doc change and only added the |
I agree with everything you said. The point that makes me hesitant/push back is that I bet when the documentation was first created a similar argument was/could have been made for creating those docs. Over time they became less ideal, which is expected. Coffee and milk mix "for free" when you put them in the same mug. Entropy is cruel that way :) For me, adding spin increases our amount of maintenance work - we need to work on undoing the effect of time on the docs and on our spin setup. I think it will be worth it. But it doesn't absolve us from having to try and make the docs nicer (again). |
Is it realistic to remove the |
I think it is definitely in the back of my mind to remove the This reminded me, I added a custom |
Would it make sense to add an optional dependency to EDIT: Nevermind, I keep forgetting that you can't tell |
Co-authored-by: Tim Head <betatim@gmail.com>
should default |
Great to see this merged 🎉!
It has a
|
And now I realised that |
Rereading this maybe you meant "should I would personally vote for |
Nice, missed that |
There you go, I opened a PR to do this 😉 scientific-python/spin#225 |
I mentioned this in the January developer meeting, see meeting notes for more details about motivations.
Right now this is WIP in particular:
Feed-back more than welcome!
Main advantages of switching to spin
You get some nice output when typing
spin
that makes things more discoverable (rather than having to remember a long-winded command and finding it in the shell history)A few nice things you can do easily
spin install -v
spin test
. Because we are using editable you can still usepytest
directly if you still want, but I guess having a uniform interface for newcomers may be an improvement. One caveat is that you need to add--
if you want to pass additional pytest arguments e.g.spin test -- sklearn/tests/test_isotonic.py
rather thanpytest sklearn/tests/test_isotonic.py
spin docs html
, build doc without running examplesspin docs html-noplot
. Currently broken on Windows but will be fixed by FIX make sphinx docs work on Windows scientific-python/spin#206.Try it out locally
You need to install spin >= 0.9 that has support for Meson editable install.
See https://github.com/scientific-python/spin for more details about spin
Possible improvements
using verbose in Meson editable install (to have some output when things are recompiling onHas been fixed in meson 0.10 Improve debug printing for Meson editable installs scientific-python/spin#192sklearn
import) does not seem straightforward but there is likely a way to do it with a spin custom commandAny feed-back more than welcome!
Concerns raised during the January developer meeting
I think this is certainly a price to pay but to me this is definitely worth since you don't have to remember (or be able to find in the shell history or in the online doc) arcane long-winded command. Experienced contributors are used to this but for newcomers this is definitely a stumbling block.
In my short experience, it indeed by default outputs information about the commands being run. If we use custom commands, adding this kind of helpful output will be our responsibility see scientific-python/spin#161 where I asked about this.
So no good answer on this one ... this kind of happens already since
spin docs html
goes through theMakefile
. Maybe there are some opportunities to improve the situation in spin itself. See scientific-python/spin#161 (already mentioned above) where I asked about this as well.