
[BUG] Fixing dockerized tests #5426

Merged · 5 commits into sktime:main · Jan 23, 2024

Conversation

kurayami07734 (Contributor)

Fixes #5352

Changes

  • added a full_test build target to the Makefile (a short invocation sketch follows)
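
For illustration, a minimal sketch of how these targets would be invoked (full_test is the target added here; dockertest is the pre-existing target visible in the Makefile errors quoted later in this thread):

  # run the full test suite directly in the current environment
  make full_test
  # run the dockerized tests, which chain into full_test inside the container
  make dockertest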

What should a reviewer concentrate their feedback on?

I was thinking that, to get functionality similar to only_changed_modules, we could run a diff against a copy of origin/main fetched with curl.
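
A rough sketch of the idea (hypothetical, not part of this PR; it uses git fetch rather than curl for simplicity, and the path filter is illustrative):

  # fetch upstream main and list the files that changed relative to it,
  # which could then be mapped to modules much like only_changed_modules does
  git fetch https://github.com/sktime/sktime.git main
  git diff --name-only FETCH_HEAD...HEAD -- 'sktime/'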

@yarnabrina (Collaborator)

It seems that you made exactly the change I suggested, but I currently don't have a way to test whether this will solve the Docker issues you had earlier (I don't have Docker set up on my personal laptop). The changes look okay and I'll approve, subject to someone testing them on the Docker images.

@kurayami07734 (Contributor, Author) commented Oct 14, 2023

I did run them, but it takes too long, and it got stuck around 17%.

Edit:

I ran the tests using all resources on my laptop; it ran fine until about 15%, then workers started crashing, after which I stopped it.

@yarnabrina (Collaborator)

I don't think I'm aware of how these images are expected to be used. The current CI takes well over 3 hours (e.g. https://github.com/sktime/sktime/actions/runs/6443020075). Depending on your laptop's specifications and how much of its resources you allocated to Docker, crashing is probably not very surprising.
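
(For reference, a hypothetical way to cap what the container may use; the image name and the limits below are purely illustrative:)

  # constrain CPU and memory for the test container so the host stays responsive
  docker run --rm --cpus=4 --memory=8g <your-test-image> make full_test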

(By the way, trying this in CI was a nice idea, but it would probably need extra rights that you won't have in the sktime GHA. You may try it in your fork, though it still needs some more tinkering; I never tried, so I don't know for sure. However, a small piece of feedback: building on every push to every branch and for every PR seems a bit too frequent.)


@fkiraly @achieveordie can you tell how these Docker images are intended to be used by users? Is it just to give new developers a working environment with everything pre-installed, or are they released and maintained in any way?

In the former case, if @kurayami07734 checks basic usage and one of us confirms, I think that's enough. In the latter case, if these get published on Docker Hub or similar, then we need to find a better approach for testing updates in Docker, unless we assume that CI testing on ubuntu:latest is enough.

@kurayami07734 (Contributor, Author)

@yarnabrina I used Google Cloud Shell to run the tests:

sktime/classification/tests/test_sklearn_compatability.py::test_sklearn_composite_classifiers[composite_classifier2-data_args1]
  /usr/local/lib/python3.8/site-packages/sklearn/calibration.py:300: FutureWarning: `base_estimator` was renamed to `estimator` in version 1.2 and will be removed in 1.4.
    warnings.warn(

sktime/benchmarking/tests/test_forecasting.py::test_forecastingbenchmark[expected_results_df1-scorers1]
sktime/benchmarking/tests/test_forecasting.py::test_forecastingbenchmark[expected_results_df0-scorers0]
  /usr/src/sktime/sktime/benchmarking/forecasting.py:38: DeprecationWarning: Starting v0.25.0 model_evaluation.evaluate module will rearrange all metric columns to the left of its output result DataFrame. Please use loc references when addressing the columns. You can safely ignore this warning if you don't use evaluate function directly.
    scores_df = evaluate(forecaster=estimator, y=y, cv=cv_splitter, scoring=scorers)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.16-final-0 -----------
Coverage HTML written to dir htmlcov
Coverage XML written to file coverage.xml

============================================================================================================ slowest 20 durations =============================================================================================================
128.77s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[BATS-1-y:2cols-update_params=True-1-fh=1]
92.56s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[BATS-1-y:2cols-update_params=True-1-fh=[2 5]]
88.56s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[BATS-1-y:2cols-update_params=True-5-fh=1]
83.41s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_hierarchical_with_exogeneous[BATS-1-y:2cols]
60.23s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[BATS-1-y:1cols-update_params=True-1-fh=1]
52.18s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[BATS-1-y:2cols-update_params=True-5-fh=[2 5]]
51.10s call     sktime/classification/early_classification/tests/test_teaser.py::test_teaser_full_length
50.75s call     sktime/datasets/tests/test_single_problem_loaders.py::test_load_UEA
45.11s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_hierarchical_with_exogeneous[BATS-1-y:1cols]
41.03s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[BATS-1-y:1cols-update_params=True-5-fh=1]
41.00s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[BATS-1-y:1cols-update_params=True-1-fh=[2 5]]
32.82s call     sktime/classification/hybrid/tests/test_hivecote_v2.py::test_contracted_hivecote_v2
31.96s call     sktime/clustering/tests/test_k_shapes.py::test_kshapes
30.92s call     sktime/transformations/series/tests/test_clear_sky.py::test_clearsky_trafo_vals
28.90s call     sktime/tests/test_all_estimators.py::TestAllEstimators::test_non_state_changing_method_contract[StatsForecastMSTL-2-ForecasterFitPredictUnivariateWithX-get_fitted_params]
25.82s call     sktime/utils/tests/test_check_estimator.py::test_check_estimator_does_not_raise[ForecastKnownValues]
25.79s call     sktime/utils/tests/test_check_estimator.py::test_check_estimator_passed[ForecastKnownValues]
25.62s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_predict_time_index[StatsForecastAutoCES-1-y:2cols-fh=[2 5]-datetime-datetime-False]
25.30s call     sktime/forecasting/model_evaluation/tests/test_evaluate.py::test_evaluate_bigger_X[AutoARIMA]
24.22s call     sktime/forecasting/tests/test_conformal.py::test_conformal_with_hierarchical
=========================================================================================================== short test summary info ===========================================================================================================
FAILED sktime/datasets/tests/test_single_problem_loaders.py::test_check_link_downloadable[m1_quarterly_dataset] - urllib.error.HTTPError: HTTP Error 500: INTERNAL SERVER ERROR
====================================================================== 1 failed, 38254 passed, 874 skipped, 13 xfailed, 10 xpassed, 2625 warnings in 10640.72s (2:57:20) ======================================================================
make: *** [Makefile:38: full_test] Error 1
make: *** [Makefile:104: dockertest] Error 2

@yarnabrina (Collaborator) commented Oct 24, 2023

I won't worry much about the HTTP error; it has been happening in the CI for a few days as well. But if you are running the full tests, I wonder why 874 tests are skipped. Isn't that counterintuitive?

Do you see any patterns in the skipped tests?
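
(One way to surface the reasons, a hypothetical suggestion using pytest's standard reporting flag:)

  # -rs adds a short summary entry with the reason for every skipped test
  pytest -rs sktime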

(Also, the number of tests appears to be far lower than I expected. I was expecting around 84463 tests, which is what I got by running pytest --collect-only --matrixdesign False --only_changed_modules False sktime locally against the main branch.)


@fkiraly @achieveordie can you tell how these Docker images are intended to be used by users? Is it just to give new developers a working environment with everything pre-installed, or are they released and maintained in any way?

Just in case you missed this, tagging once more: @fkiraly @achieveordie.

@kurayami07734 (Contributor, Author)

@yarnabrina

=========================== short test summary info ============================
SKIPPED [1] sktime/classification/early_classification/tests/test_all_early_classifiers.py: got empty parameter set ['estimator_instance'], function test_multivariate_input_exception at /usr/src/sktime/sktime/classification/early_classification/tests/test_all_early_classifiers.py:36
SKIPPED [1] sktime/classification/early_classification/tests/test_all_early_classifiers.py: got empty parameter set ['estimator_instance', 'scenario'], function test_classifier_output at /usr/src/sktime/sktime/classification/early_classification/tests/test_all_early_classifiers.py:41
SKIPPED [1] sktime/tests/test_softdeps.py:209: got empty parameter set ['estimator'], function test_python_error at /usr/src/sktime/sktime/tests/test_softdeps.py:208
SKIPPED [1] sktime/proba/tests/test_proba_basic.py:11: skip test if required soft dependency is not available
SKIPPED [2] sktime/utils/tests/test_mlflow_sktime_model_export.py:392: skip test if required soft dependency not available
SKIPPED [4] sktime/utils/tests/test_mlflow_sktime_model_export.py:357: skip test if required soft dependency not available
SKIPPED [4] sktime/utils/tests/test_mlflow_sktime_model_export.py:415: skip test if required soft dependency not available
SKIPPED [2] sktime/utils/tests/test_mlflow_sktime_model_export.py:119: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:302: skip test if required soft dependency not available
SKIPPED [4] sktime/utils/tests/test_mlflow_sktime_model_export.py:489: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:463: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:695: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:588: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:650: skip test if required soft dependency not available
SKIPPED [2] sktime/utils/tests/test_mlflow_sktime_model_export.py:96: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:532: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:671: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:225: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:630: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:564: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:171: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:609: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:279: skip test if required soft dependency not available
SKIPPED [696] sktime/tests/test_all_estimators.py:1432: hangs on mac and unix remote tests
SKIPPED [72] sktime/forecasting/model_selection/tests/test_tune.py:246: run test only if softdeps are present and incrementally (if requested)
SKIPPED [1] sktime/forecasting/model_selection/tests/test_tune.py:286: run test only if softdeps are present and incrementally (if requested)
SKIPPED [54] sktime/forecasting/base/tests/test_fh.py:71: steps and fh_type are incompatible
SKIPPED [15] ../../local/lib/python3.8/site-packages/_pytest/doctest.py:456: all tests skipped by +SKIP option
SKIPPED [1] sktime/dists_kernels/tests/test_all_dist_kernels.py: got empty parameter set ['estimator_instance', 'scenario'], function test_pairwise_transformers_tabular at /usr/src/sktime/sktime/dists_kernels/tests/test_all_dist_kernels.py:29
= 38255 passed, 874 skipped, 13 xfailed, 10 xpassed, 115 warnings in 12703.16s (3:31:43)

@achieveordie (Collaborator)

Apologies for having missed the tags. I was not involved during this development, but from my perspective it seems like this is meant to act as a developer environment rather than something an end user would use.

@yarnabrina (Collaborator)

@kurayami07734 It is very surprising to me to see this many skips caused by missing soft dependencies if you have installed all the soft dependencies as part of the Docker build. The significant mismatch in the number of tests is very odd as well.
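
(A hypothetical spot check, assuming the image is available locally; the image name is illustrative:)

  # check whether the soft dependencies behind most of the skips are present in the image
  docker run --rm <your-test-image> python -m pip show mlflow tensorflow-probability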

However, if it's just for development and not for end users directly, I'm satisfied with all the changes as of now, since a lot of the skips come from mlflow and tensorflow-probability or are intentional. Please mark the PR as ready for review and request feedback from @fkiraly and others.

yarnabrina previously approved these changes Oct 25, 2023
fkiraly marked this pull request as ready for review January 23, 2024 17:31
fkiraly added the maintenance (Continuous integration, unit testing & package distribution) label Jan 23, 2024
fkiraly changed the title from "[Bug] Fixing docker tests" to "[BUG] Fixing dockerized tests" Jan 23, 2024
fkiraly added the module:tests (test framework functionality - only framework, excl specific tests) label Jan 23, 2024
fkiraly merged commit 29c77dc into sktime:main Jan 23, 2024
1 check passed
yarnabrina mentioned this pull request Mar 8, 2024
Labels: maintenance (Continuous integration, unit testing & package distribution), module:tests (test framework functionality - only framework, excl specific tests)
Projects: None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Dockerized tests not running
4 participants