
[MNT] Extra dependency specifications per component #5136

Merged - 15 commits merged into sktime:main from split-optional-dependencies on Sep 9, 2023

Conversation

yarnabrina (Collaborator)

Closes task 1 of #5101.

@yarnabrina (Collaborator, Author) commented Aug 20, 2023

I've started working on it, and I request all developers to provide a very nitpicky review to avoid future test failures, conflicts, etc.

I'll list some of the issues I noted below, in decreasing order of priority according to me, and all of these are up for discussion.

missing optional dependencies

I've created the new extras based on the presence of python_dependencies, i.e. I created a set of all packages mentioned in all occurrences of this tag in a component and created the extra accordingly. However, I note that the union of the new extras is a proper subset of all_extras, which implies a lot of these are directly imported somewhere without being added as dependencies. In some cases this may be justifiable, as the dependency is not always necessary (e.g. matplotlib); in some cases it may not be (e.g. kotsu). So, how to handle these needs to be planned.

  1. add these as explicit dependencies in separate PRs (out of scope of this PR)
  2. create new extra dependencies to capture these
  3. etc.

numba as soft dependency

As far as I can see, numba is present in almost all components (but not in all estimators of those components - at least not directly). Does it make sense to make this a mandatory dependency? What are the challenges in doing so?

bounds of soft dependencies

If I am not wrong, lower bounds in all_extras are not set in a systematic way: they were probably added once, when the dependency was introduced, and then updated only if someone reported an issue or a CI test failed. Since this split will give us an opportunity to introduce new extras, and hence the possibility of changing versions, I propose that we formally come up with a strategy. The following is my suggestion:

  1. use the latest minor version as the lower bound (subject to no conflicts in CI)
  2. use the next upcoming minor version as the upper bound

This will increase the lower bounds of a lot of packages, but at least those will be tested in the CI directly. If we keep the existing lower bounds, those are not tested, and because of the possible number of interactions it's next to impossible to test them with 100% confidence.
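
As a rough illustration of this scheme in pyproject.toml (the package names are real sktime soft dependencies, but the versions here are placeholders, not a concrete proposal):

```toml
# Sketch only: lower bound = latest minor release tested in CI,
# upper bound = the next upcoming minor release; versions are placeholders.
[project.optional-dependencies]
forecasting = [
    "pmdarima>=2.0.3,<2.1.0",
    "statsforecast>=1.5.0,<1.6.0",
]
```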

combination of extras

As pointed out in the original issue, it may be common that a few components (or parts of them) are usually meant to work together, e.g. holidays-based transformers are probably suitable only along with forecasting estimators. I did note this fact and still chose to split, as nothing stops users from using pip install sktime[forecasting,transformations] or pip install holidays sktime[forecasting].

separate extra for testing

As of now, testing dependencies are part of the dev extra. This is fine when working locally, but I think it will be better to have a dedicated tests extra. Then, when I start on the next PR to split CI per component, each job can just use pip install .[<component_name>,tests] and dependencies unnecessary for testing will be skipped.
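
A minimal sketch of such a tests extra (the package list is an assumption for illustration, not the actual contents of the current dev extra):

```toml
# Illustrative only; actual test dependencies of sktime may differ.
[project.optional-dependencies]
tests = [
    "pytest",
    "pytest-cov",
    "pytest-xdist",
]
```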

separate extras for too restrictive dependencies

There are a few dependencies which affect the rest by adding significant restrictions on sub-dependencies (e.g. scikit-optimize on numpy directly and on almost everything else indirectly) or by requiring too many system dependencies (torch, once it gets added by deep learning work). Adding these in the same way as the others may not allow testing on latest versions, and in some cases users may be blocked from installing an extra just because one package does not install successfully. We can create specific extras for these situations (e.g. forecasting-skopt or classification-torch), but how these will be tested needs to be planned.
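
For illustration, such isolating extras might look roughly as follows (the names follow the examples above; this is a sketch, not part of this PR):

```toml
# Hypothetical extras isolating restrictive dependencies; not added in this PR.
[project.optional-dependencies]
forecasting-skopt = [
    "scikit-optimize",
]
classification-torch = [
    "torch",
]
```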

@yarnabrina (Collaborator, Author)

Since this is a draft PR, I think GitHub notifications are not sent automatically. Inviting all council members (@sktime/community-council), core developers (@sktime/core-developers) and current mentees (@BensHamza, @hazrulakmal, @luca-miniati) to share their opinions.

@yarnabrina added the maintenance (Continuous integration, unit testing & package distribution) label on Aug 20, 2023
@fkiraly (Collaborator) commented Aug 20, 2023

numba as soft dependency

Re numba being a core or soft dependency, there has been a long discussion on this; the "for" side (for keeping it a soft dependency) is:

  • it's not needed in the framework, so it should be managed with estimators from an architectural perspective
  • installation typically causes a lot of problems for the user base, due to the C dependency
  • speed losses in non-numba parts of code execution due to the numba compiler - non-dependent code is faster if numba is not installed!
  • numba typically lags 6-12 months behind with its dependencies, so it implies upper bounds on things like pandas, numpy

See further discussion here: #3567

@fkiraly (Collaborator) commented Aug 20, 2023

bounds of soft dependencies

There has been a discussion on how we manage soft dependency bounds systematically here:
#1480

but back then no one really engaged.

The de-facto strategy for core dependencies has been to adopt defensive bounds, because at pandas updates we were regularly getting failures and user complaints - it would also hit actual deployments. Similarly for popular soft dependencies like statsmodels, but we haven't been doing that systematically (more along the lines of "it hit us once, so let's now put a bound there").

Defensive bounds are justifiable only if the releases happen on a regular basis, and swiftly after dependencies upgrade.
Sometimes there is a longer delay to fix bugs arising from dependency updates, but that's justifiable as it would have rendered the packages incompatible in the alternative scenario.

@fkiraly (Collaborator) commented Aug 20, 2023

combination of extras

Makes sense.

@fkiraly (Collaborator) commented Aug 20, 2023

separate extras for too restrictive dependencies

That's an interesting question, how to manage these. I suppose a separate dependency set is the obvious answer, although I wonder how it scales in terms of maintenance burden. Is there anything more automatic we can do here?

I'm thinking along the lines of dynamic CI elements and environments based on the estimator tags - moving further in the direction of a full featured mini-package manager.

@fkiraly (Collaborator) left a review comment:

This looks great, thanks for all the hard work!

My major change request is that you also outline the maintenance workflow you intend here, in a change to docs/source/developer_guide/dependencies.rst; that would make review easier. It's also important to think about what changes this makes to the contributing developer workflow and the release manager workflow.

pyproject.toml review thread (on this hunk):
    "tbats",
]
networks = [
    "tensorflow",
Collaborator: thinking ahead, do we want to cater for multiple backends? There's pytorch and tensorflow, a given user is unlikely to want to install them both.

Collaborator (Author): Agreed, but will wait for comments from developers working on these.

What about network-tensorflow and network-pytorch?
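
For illustration, such backend-specific extras might look roughly like this (a sketch only, not something this PR adds):

```toml
# Sketch of backend-specific deep learning extras; illustrative only.
[project.optional-dependencies]
network-tensorflow = [
    "tensorflow",
]
network-pytorch = [
    "torch",
]
```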

Collaborator: hm, makes sense! Although do we ever expect anything more than a single package in these?

@yarnabrina (Collaborator, Author)

@fkiraly, I took a look at #1480, and I agree in principle. But I would like to suggest slight modifications to consider, to be specific.

The proposal there is, more precisely:

* upper version bounds for _all_ dependencies

* at each release, we bump the upper bound to the current versions, or the highest versions that don't cause breakage

* if at a release we cannot move the upper bound to current version, we open a maintenance issue to log this

My suggested modifications:

  • both upper and lower bounds for all dependencies
    • initiate lower bounds at the tested minor version (current, as of now, unless someone manually tests lower versions), assuming major.minor.patch format
    • initiate upper bounds at the next non-breaking version (next major if major is non-zero, next minor if major is zero - this is probably standard)
  • regularly update bounds
    • upper bounds are automatically updated by dependabot
    • lower bounds are updated once at least k (2 or 3 - needs to be decided) minor releases have happened since
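
As a sketch of the initial bounds this would produce (hypothetical package names and versions, for illustration only):

```toml
# Hypothetical illustration of the suggested initialisation rules.
[project.optional-dependencies]
example = [
    "maturepackage>=1.2.0,<2.0.0",  # major is non-zero: upper bound is the next major
    "youngpackage>=0.5.0,<0.6.0",   # major is zero: upper bound is the next minor
]
```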

Let me know what you think.


The numba question seems like a serious discussion, and personally I've never used this package, so I have no opinions. Will it make sense to not have numba by default in any of the extras (to avoid affecting estimators that don't need it), and then have a separate numba extra with just itself? I suppose the same may apply to pytorch or tensorflow as well (the too-restrictive package point); waiting for others' inputs.


I don't think it's a good idea to have a mini package manager for sktime. There are plenty already, and it'll probably make things more complicated. I'd prefer a simple and easy separate extra, as I can't think of any automated process.


Also, any opinion on the first point, regarding packages being present in all_extras and imported by a few methods, but not being specified as dependencies through tags? What should we do about those?


My major change request is that you also outline the maintenance workflow you intend here, in a change to docs/source/developer_guide/dependencies.rst, that would make review easier.

I shall start documenting, but I was waiting for active core developers (@achieveordie, @benHeid, etc.), CC members (@marrov, @JonathanBechtel, etc.) and others to share their opinions (in favour or against) as well. Tagging them as a reminder; I will update the PR by this weekend if there are no objections.

@fkiraly (Collaborator) commented Aug 22, 2023

@yarnabrina, I agree in principle with your suggestions - would you be so kind as to copy the "version bounds" part of your post to #1480, where it's topical, and we continue the discussion there? I will reply to the rest here.

@fkiraly (Collaborator) commented Aug 22, 2023

The numba seems like a serious discussion, and personally I've never used this package so no opinions. Will it make sense to not have numba by default in any of the extras (to avoid effect on estimators that don't need them), and then have a separate numba extra with just itself?

I think it is both specific and ubiquitous enough in classification, regression, clustering, distances/kernels, to have it as a default in those dependency sets. It makes sense to have it as an extra dependency set though, as the other estimator types may occasionally have it, but you wouldn't want numba if only 1 out of 25 estimators needs it.

I suppose the same may be applied to pytorch or tensorflow as well (too restrictive package point), waiting for others inputs.

Agreed.


I don't think it's a good idea to have a mini package manager for sktime. There's plenty already, and it'll probably make things more complicated. I'll prefer simple and easy separate extra, as I can't think of any automated process.

Hm, I'm undecided here - I am also strongly in favour of simplicity though, I really mean something light touch.


Also, any opinion on the first point? Regarding packages being present in all_extras and few methods having them as import, but not being specified as dependencies through tags? What should we do about those?

Apologies, can you point me to a precise reference for "the first point"? What are you referring to?


My major change request is that you also outline the maintenance workflow you intend here, in a change to docs/source/developer_guide/dependencies.rst, that would make review easier.

I shall start documenting, but I was waiting for active core developers ( @achieveordie , @benHeid , etc. ), CC members ( @marrov , @JonathanBechtel , etc. ) and others to share their opinions (in favour or against) as well. Tagging them as a reminder, and will update the PR by this weekend if there are no objections.

Ok - I was not thinking about extensive documentation though, but I can see the point.

In that case, can you explain (without going all the way to formal documentation) what the developer and the release workflows would look like? Let's say we add one new package as a soft dependency. There are now ca. 10 dependency sets we could add it to. E.g., is there a difference if the package is really "single-use, sporadic"? Does the developer add it, and if yes, where?

@yarnabrina (Collaborator, Author)

can you point me to a precise reference for "the first point"? What are you referring to?

I meant this part:

missing optional dependencies

I've created the new extras based on presence of python_dependencies, i.e. I created a set of all packages mentioned in all occurrences of this tag in a component and created the extra accordingly. However, I note that the union of the new extras is a proper subset of all_extras, which implies a lot of these are directly imported somewhere without adding as dependencies. In some cases it may be justifiable as it's not always necessary (e.g. matplotlib), in some cases it may not be (e.g. kotsu). So, how to handle these need to be planned.

1. add these as explicit dependencies in separate PR's (out of scope of this PR)

2. create new extra dependencies to capture these

3. etc.

As far as I can see, these are the packages that are mentioned in all_extras but never mentioned in python_dependencies:

  • cloudpickle (present in utils)
  • dash (no idea where it's used)
  • dask (present in datatypes)
  • gluonts (present in datatypes)
  • h5py (present in classification)
  • kotsu (present in benchmarking)
  • matplotlib (present in annotation, benchmarking, classification, clustering, forecasting, utils)
  • scikit_posthocs (present in benchmarking)
  • seaborn (present in annotation, benchmarking, utils)
  • xarray (present in datatypes)

And these are present as dependencies in tags but not part of all_extras as of main:

  • hcrystalball (present in forecasting - you already suggested to not add - removed in this PR)
  • mrsqm (present in classification - present in cython_extras - don't know what it's for)
  • tensorflow-probability (present in proba - you already suggested to not add - removed in this PR)

In that case, can you explain (without going all the way to formal documentation) how the developer and the release workflows would look like? Let's say we add one new package as a soft dependency. There are now ca 10 dependency sets we could add it to. E.g., is there a difference if the package is really "single-use, sporadic"? Does the developer add it, if yes, where?

I'm not sure if I have a ready deterministic answer, but here's my take on the approach. If a developer adds a new estimator E (any concrete estimator) in component C (one of the component sub-directories of sktime/sktime), which depends on package P, the developer should add P to the extra for C. For example, if I complete the darts generalised adapter, I won't add anything, but if I add XGBoost specifically, I'll add darts and xgboost under forecasting.

In this way, users who just want all forecasting capabilities of sktime can just do pip install sktime[forecasting] and not bother about anything else, and if they want to be specific, they'll do as they do currently: manual specification of optional dependencies. To test the simple user case, CI will also have a set of forecasting-dedicated jobs which will test whether all forecasting dependencies work well together.
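
In pyproject.toml terms, the workflow above might look roughly like this (entries are placeholders, not the actual contents of the forecasting extra in this PR):

```toml
# Placeholder sketch: a new darts-based XGBoost estimator in the forecasting
# component adds its packages to the forecasting extra.
[project.optional-dependencies]
forecasting = [
    "pmdarima",
    "statsforecast",
    "darts",    # added together with the new estimator
    "xgboost",  # added together with the new estimator
]
```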

Now, you raised a very valid point: what if a package is required only for a very small percentage of estimators? That, I think, is the issue with the current all_extras, and I hope that the percentage will improve with component-based extras. It still won't be 100% or "high", but improved for sure. To achieve that level of isolation, I think we would need a separate extra per estimator or per package, which doesn't seem scalable (in the sense of having too many extras to test independently on different operating systems).

…encies

* origin/main:
  [BUG] Fix tag to indicate support of exogenous features by `NaiveForecaster` (sktime#5162)
  [BUG] fix check causing exception in `ConformalIntervals` in `_predict` (sktime#5134)
  [ENH] empirical distribution (sktime#5094)
  [MNT] Update versions of pre commit hooks and fix `E721` issues pointed out by `flake8` (sktime#5163)
  [ENH] `tslearn` distances and kernels including adapter (sktime#5039)
  [ENH] Student's t-distribution (sktime#5050)
  [MNT] bound `statsforecast<1.6.0` due to recent failures (sktime#5149)
  [ENH] widen scope of conditional test execution (sktime#5135)
  [MNT] [Dependabot](deps-dev): Update sphinx-gallery requirement from <0.14.0 to <0.15.0 (sktime#5124)
  [MNT] upgrade CI runners to latest stable images (sktime#5031)
@yarnabrina (Collaborator, Author)

@fkiraly and other reviewers, as of now my plan is not to add/remove/modify any more of the packages from the new extras I added, and I plan to add lower/upper bounds per these criteria by tomorrow:

  1. upper bound as latest minor version + 1 (unless all_extras specifically has a lower limit set for some reason)
  2. lower bound identical to all_extras (changed from "current minor version" because of differing points of view: widest compatibility vs. most confident coverage)

Please let me know if this is not okay. I plan to open this PR for review (and merge) by tomorrow, even though these will not be tested until we split the CI per component.

@fkiraly (Collaborator) commented Aug 26, 2023

lower bound as identical to all_extras (changing from current minor version because of different point of views of most compatibility vs most confident coverage)

Not sure what you mean, but if you mean identical to current, that's fine with me.

I wouldn't move them up, as it might break something for someone; we should ensure we have lower-range testing coverage first before we move that around.

…encies

* origin/main:
  [DOC] minor docstring typo fixes in `_DelegatedForecaster` module (sktime#5168)
  [DOC] Fix make_pipeline, make_reduction,  window_summarizer & load_forecasting data docstrings  (sktime#5065)
@fkiraly (Collaborator) commented Aug 27, 2023

no documentation has been added for bound updates in future, as only @fkiraly and I participated and didn't quite agree on our point of views

Hm, could you at least update the docs with what we agreed upon, for the upper bounds?

Re lower bounds, should we collate the options? We could try to talk to users and developers with that.
My position comes from the high-level principle of no unexpected changes that could break downstream use for someone without an upper bound on sktime (I would consider this quite important).

@yarnabrina (Collaborator, Author)

@fkiraly, I have documented the bounds handling here, as well as I could think of. Please take a look.

Agreed on collating points of view and arguments/counter-arguments. Let me know where you want to do that (or have already done so), and I can share my points as well.

@fkiraly (Collaborator) commented Aug 28, 2023

Let me know where you want to do (or already done), and I can share my points as well.

How about #1480, renaming it to something more general? Or a new issue?

@fkiraly (Collaborator) left a review comment:

Agree with the content in the dev notes; I would only request to merge the sections to avoid duplication and have the dependency set discussion in one single place.

…encies

* origin/main:
  [DOC] speed-up tutorial notebooks - deep learning classifiers (sktime#5169)
  [ENH] fixture names in probability distribution tests (sktime#5159)
  [ENH] test for specification conformance of tag register (sktime#5170)
  [BUG] ensure forecasting tuners do not vectorize over columns (variables) (sktime#5145)
  [ENH] VMD (variational mode decomposition) transformer based on `vmdpy` (sktime#5129)
  [ENH] Interface statsmodels MSTL - transformer (sktime#5125)
  [ENH] add tag for inexact `inverse_transform`-s (sktime#5166)
  [ENH] refactor and add conditional execution to `numba` based distance tests (sktime#5141)
  [MNT] move fixtures in `test_dropna` to `pytest` fixtures (sktime#5153)
  [BUG] prevent exception in `PyODAnnotator.get_test_params` (sktime#5151)
  [MNT] move fixtures in `test_reduce_global` to `pytest` fixtures (sktime#5157)
  [MNT] fix dependency isolation of `DateTimeFeatures` tests (sktime#5154)
  [MNT] lower dep bound compatibility patch - `binom_test` (sktime#5152)
  [MNT] test forecastingdata downloads only on a small random subset (sktime#5146)
  [ENH] widen scope of change-conditional test execution (sktime#5147)
  [DOC] update forecasting extension template on `predict_proba` (sktime#5138)
@yarnabrina requested review from fkiraly, benHeid and achieveordie, and removed the request for benHeid and achieveordie, on September 3, 2023, 11:20
@yarnabrina (Collaborator, Author)

@fkiraly Sorry for no updates on this seemingly small PR for more than a week, but I was quite busy (and probably will be for the next weeks as well). I addressed your comments by merging the sections and specifically noting the requirement that CI passes. Please review.

@fkiraly (Collaborator) left a review comment:

In-principle agree, great improvement!

Though, why is the CI not running?

@yarnabrina (Collaborator, Author) commented Sep 6, 2023

Though, why is the CI not running?

This is intentional, caused by the last pushed commit message containing "skip CI".

Ref. https://docs.github.com/en/actions/managing-workflow-runs/skipping-workflow-runs

This PR only contains a documentation update (which is tested in the Read the Docs job, unaffected by the above skipping) and new extras (which can't be tested by the existing CI). So, I thought we could skip CI.

@fkiraly (Collaborator) commented Sep 6, 2023

This is intentional, caused by the last pushed commit message containing "skip CI".

Ah, thanks - I got confused since I was (incorrectly) assuming that the change in dependency sets also affected the CI setup somehow, and I couldn't fit the two together since none of the relevant code seemed to have obviously changed.

That explains it!

@fkiraly (Collaborator) left a review comment:

Approved hence.

An out-of-scope problem arises from the combination of this and the requirement on how we update the upper bound - "all tests pass". How do we ensure this concretely, given that the current setup has incremental testing, not full testing?

@fkiraly merged commit 13e0dc5 into sktime:main on Sep 9, 2023 (1 check passed)
fkiraly added a commit that referenced this pull request on Sep 12, 2023: …ency sets (#5204)

This PR adds documentation in the two main installation instructions of the granular soft dependency sets introduced with #5136
@yarnabrina deleted the split-optional-dependencies branch on September 16, 2023, 12:38