[MNT] Extra dependency specifications per component #5136
Conversation
I've started working on it, and I will request all developers to provide a very nitpicky review of these to avoid future test failures, conflicts, etc. I'll list some of the issues I noted below, in decreasing order of priority according to me; all of these are up for discussion.

missing optional dependencies: I've created the new extras based on presence of …

numba as soft dependency: As far as I can see, …

bounds of soft dependencies: If I am not wrong, lower bounds in … This will increase lower bounds of a lot of packages, but at least those will be tested in the CI directly. If we keep the existing lower bounds, those are not tested, and because of the possible number of interactions, it's next to impossible to test those with 100% confidence.

combination of extras: As pointed out in the original issue, it may be common that a few components (or parts of them) are usually meant to work together, e.g. …

separate extra for testing: As of now, testing dependencies are part of separate extras for …

too restrictive dependencies: There are a few dependencies which affect the rest by adding significant restrictions on sub-dependencies (e.g. …).
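For concreteness, a minimal sketch of what such per-component extras could look like in pyproject.toml; the set names and package memberships below are illustrative assumptions, not the final specification of this PR:

```toml
# Illustrative sketch of per-component extras; the set names and package
# memberships are assumptions for discussion, not the final specification.
[project.optional-dependencies]
forecasting = [
    "pmdarima",
    "tbats",
]
annotation = [
    "hmmlearn",
]
networks = [
    "tensorflow",
]
```

With such a layout, a user who only needs forecasting could install, e.g., `pip install sktime[forecasting]`, without pulling in the deep-learning backends.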
Since this is a draft PR, I think GitHub notifications are not sent automatically. Inviting all council members (@sktime/community-council), core developers (@sktime/core-developers) and current mentees (@BensHamza, @hazrulakmal, @luca-miniati) to share their opinions.
Re …: see further discussion here: #3567
There has been a discussion on how we manage soft dependency bounds systematically here: … but back then no one really engaged. The de facto strategy has been, for core dependencies, to adopt defensive bounds, because … Defensive bounds are justifiable only if releases happen on a regular basis, and swiftly after dependencies upgrade.
Makes sense. |
That's an interesting question, how to manage these. I suppose a separate dependency set is the obvious answer, although I wonder how it scales in terms of maintenance burden. Anything more automatic we can do here? I'm thinking along the lines of dynamic CI elements and environments based on the estimator tags - moving further in the direction of a full-featured mini-package manager.
This looks great, thanks for all the hard work!
My major change request is that you also outline the maintenance workflow you intend here, in a change to docs/source/developer_guide/dependencies.rst; that would make review easier. It's also important to think about what changes this makes to the contributing developer workflow and the release manager workflow.
pyproject.toml (outdated):

```toml
    "tbats",
]
networks = [
    "tensorflow",
```
Thinking ahead, do we want to cater for multiple backends? There's pytorch and tensorflow; a given user is unlikely to want to install them both.
Agreed, but will wait for comments from developers working on these. What about network-tensorflow and network-pytorch?
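One possible shape for such backend-split extras, using the names suggested above (a sketch, not the agreed naming):

```toml
# Sketch of backend-split network extras, using the names suggested above;
# "torch" as the pytorch distribution name is an assumption here.
[project.optional-dependencies]
network-tensorflow = [
    "tensorflow",
]
network-pytorch = [
    "torch",
]
```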
hm, makes sense! Although do we ever expect anything more than a single package in these?
@fkiraly, I took a look at #1480, and I agree in principle. But I would like to suggest slight modifications to consider as well; to be specific:
Let me know what you think. The … I don't think it's a good idea to have a mini package manager for … Also, any opinion on the first point, regarding packages being present in …?
I shall start documenting, but I was waiting for active core developers (@achieveordie, @benHeid, etc.), CC members (@marrov, @JonathanBechtel, etc.) and others to share their opinions (in favour or against) as well. Tagging them as a reminder, and will update the PR by this weekend if there are no objections.
@yarnabrina, I agree in principle with your suggestions; would you be so kind as to copy the "version bounds" part of your post to #1480, where it's topical, and we continue the discussion there? I will reply to the rest here.
I think numba is both specific and ubiquitous enough in classification, regression, clustering, distances/kernels, to have it as a default in those dependency sets. It makes sense to have it as an extra dependency set too, as the other estimator types may occasionally have it, but you wouldn't want …
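As an illustration of that placement (a sketch; the set names and the idea of a standalone numba extra are assumptions, not decided here):

```toml
# Illustrative sketch: numba as a default member of the learning-task sets
# where it is ubiquitous, plus a narrow opt-in extra for the other estimator
# types; the set names are assumptions.
[project.optional-dependencies]
classification = [
    "numba",
    # ... other classification soft dependencies
]
clustering = [
    "numba",
    # ... other clustering soft dependencies
]
numba = [
    "numba",
]
```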
Agreed.
Hm, I'm undecided here - I am also in strong favour of simplicity though; I really mean something light-touch.
Apologies, can you point me to a precise reference for "the first point"? What are you referring to?
Ok - I was not thinking about extensive documentation though, but I can see the point. In that case, can you explain (without going all the way to formal documentation) how the developer and the release workflows would look? Let's say we add one new package as a soft dependency. There are now ca. 10 dependency sets we could add it to. E.g., is there a difference if the package is really "single-use, sporadic"? Does the developer add it, and if yes, where?
I meant this part:

As far as I can see, these are the packages that are mentioned in …

And these are present as dependency in tags but not part of …
I'm not sure if I have a ready deterministic answer, but here's my take on the approach. If a developer adds a new estimator … In this way, users who just want all forecasting capabilities of … Now you raised a very valid point: what if a package is required only for a very small percentage of estimators? That, I think, is the issue with the current …
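To make the intended workflow concrete, a hedged sketch: assume a developer adds a forecaster that needs a new package (the package name "somepkg" and the exact set layout are invented for illustration); the package would be appended to the matching component extra in pyproject.toml, alongside declaring it in the estimator's python_dependencies tag:

```toml
# Hypothetical workflow sketch: a developer adds a forecaster depending on a
# new package "somepkg" (name invented for illustration), and appends it to
# the matching component extra.
[project.optional-dependencies]
forecasting = [
    "pmdarima",
    "tbats",
    "somepkg",  # new soft dependency, added alongside the new estimator
]
```

That way, `pip install sktime[forecasting]` keeps covering all forecasters, at the cost of the set growing with every new soft dependency.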
@fkiraly and other reviewers, as of now my plan is not to add/remove/modify any more of the packages from the new extras I added, and I plan to add lower/upper bounds per these criteria by tomorrow: …

Please let me know if this is not okay. I plan to make this a PR open for review (and merge) by tomorrow, even though these will not be tested until we split the CI per component.
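For illustration of the bound style only (the packages and numbers below are placeholders; the actual criteria are the ones referred to above):

```toml
# Placeholder packages and bounds, illustrating the bound style only; the
# actual criteria are the ones referred to in the comment above.
[project.optional-dependencies]
forecasting = [
    "pmdarima>=1.8.0,<2.1.0",       # lower bound: oldest version tested in CI
    "statsforecast>=1.0.0,<1.6.0",  # upper bound: defensive cap on a known-breaking release
]
```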
Not sure what you mean, but if you mean identical to current, that's fine with me. I wouldn't move them up, as it might break something for someone; we should ensure we have lower-range testing coverage first before we move that around.
Hm, could you at least update the docs with what we agreed upon, for the upper bounds? Re lower bounds, should we collate the options? We could try to talk to users and developers with that.
How about #1480, renaming it to something more general? Or a new issue?
Agree with the content in the dev notes; would only request to merge the sections to avoid duplication and have the dep set discussion in one single place.
@fkiraly Sorry for no updates on this seemingly small PR for more than a week, but I was quite busy (and probably will be for the next weeks as well). Addressed your comments by merging the sections and specifically noting the requirement of a CI pass. Please review.
In-principle agree, great improvement!
Though, why is the CI not running?
This is intentional, caused by the last pushed commit message containing "skip CI" (ref. https://docs.github.com/en/actions/managing-workflow-runs/skipping-workflow-runs). This PR only contains a documentation update (tested in the Read the Docs job, which is unaffected by the skip) and new extras (which can't be tested by the existing CI), so I thought we could skip CI.
Ah, thanks - I got confused since I was somehow (incorrectly) assuming that the change in dependency sets also affected the CI setup, and I couldn't fit the two together since none of the relevant code seemed to have obviously changed. That explains it!
Approved hence.
An out-of-scope problem arises from the combination of this and the requirement on how we update the upper bound ("all tests pass"): how do we ensure this, concretely, given that the current setup has incremental testing and not full testing?
Closes task 1 of #5101.