
[BUG] Fixing dockerized tests #5426

Merged · 5 commits into sktime:main · Jan 23, 2024

Conversation

kurayami07734 (Contributor)

Fixes #5352

Changes

  • added a full_test build target to the Makefile (a short invocation sketch follows)
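
For illustration, a minimal sketch of how these targets would be invoked (full_test is the target added here; dockertest is the pre-existing target visible in the Makefile errors quoted later in this thread):

  # run the full test suite directly in the current environment
  make full_test
  # run the dockerized tests, which chain into full_test inside the container
  make dockertest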

What should a reviewer concentrate their feedback on?

I was thinking that, to get functionality similar to only_changed_modules, we could run a diff against a copy of origin/main fetched with curl.
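
A rough sketch of the idea (hypothetical, not part of this PR; it uses git fetch rather than curl for simplicity, and the path filter is illustrative):

  # fetch upstream main and list the files that changed relative to it,
  # which could then be mapped to modules much like only_changed_modules does
  git fetch https://github.com/sktime/sktime.git main
  git diff --name-only FETCH_HEAD...HEAD -- 'sktime/'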

@yarnabrina (Collaborator)

It seems that you made exactly the change I suggested, but I currently don't have a way to test whether this will solve the Docker issues you had earlier (I don't have Docker set up on my personal laptop). The changes look okay and I'll approve, subject to someone testing them on the Docker images.

@kurayami07734 (Contributor, Author) commented Oct 14, 2023

I did run them, but it takes too long, and it got stuck around 17%.

Edit:

I ran the tests using all resources on my laptop; it ran fine until about 15%, then workers started crashing, after which I stopped it.

@yarnabrina (Collaborator)

I don't think I'm aware of how these images are expected to be used. The current CI takes well over 3 hours (e.g. https://github.com/sktime/sktime/actions/runs/6443020075). Depending on your laptop's specifications and how much of its resources you allocated to Docker, crashing is probably not very surprising.
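
(For reference, a hypothetical way to cap what the container may use; the image name and the limits below are purely illustrative:)

  # constrain CPU and memory for the test container so the host stays responsive
  docker run --rm --cpus=4 --memory=8g <your-test-image> make full_test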

(By the way, trying this in CI was a nice idea, but it would probably need extra rights that you won't have in the sktime GHA. You may try it in your fork, though it still needs some more tinkering; I never tried, so I don't know for sure. However, a small piece of feedback: building on every push to every branch and for every PR seems a bit too frequent.)


@fkiraly @achieveordie can you tell how these Docker images are intended to be used by users? Is it just to give new developers a working environment with everything pre-installed, or are they released and maintained in any way?

In the former case, if @kurayami07734 checks basic usage and one of us confirms, I think that's enough. In the latter case, if these get published on Docker Hub or similar, then we need to find a better approach for testing updates in Docker, unless we assume that CI testing on ubuntu:latest is enough.

@kurayami07734 (Contributor, Author)

@yarnabrina I used Google Cloud Shell to run the tests:

sktime/classification/tests/test_sklearn_compatability.py::test_sklearn_composite_classifiers[composite_classifier2-data_args1]
  /usr/local/lib/python3.8/site-packages/sklearn/calibration.py:300: FutureWarning: `base_estimator` was renamed to `estimator` in version 1.2 and will be removed in 1.4.
    warnings.warn(

sktime/benchmarking/tests/test_forecasting.py::test_forecastingbenchmark[expected_results_df1-scorers1]
sktime/benchmarking/tests/test_forecasting.py::test_forecastingbenchmark[expected_results_df0-scorers0]
  /usr/src/sktime/sktime/benchmarking/forecasting.py:38: DeprecationWarning: Starting v0.25.0 model_evaluation.evaluate module will rearrange all metric columns to the left of its output result DataFrame. Please use loc references when addressing the columns. You can safely ignore this warning if you don't use evaluate function directly.
    scores_df = evaluate(forecaster=estimator, y=y, cv=cv_splitter, scoring=scorers)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.16-final-0 -----------
Coverage HTML written to dir htmlcov
Coverage XML written to file coverage.xml

============================================================================================================ slowest 20 durations =============================================================================================================
128.77s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[BATS-1-y:2cols-update_params=True-1-fh=1]
92.56s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[BATS-1-y:2cols-update_params=True-1-fh=[2 5]]
88.56s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[BATS-1-y:2cols-update_params=True-5-fh=1]
83.41s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_hierarchical_with_exogeneous[BATS-1-y:2cols]
60.23s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[BATS-1-y:1cols-update_params=True-1-fh=1]
52.18s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[BATS-1-y:2cols-update_params=True-5-fh=[2 5]]
51.10s call     sktime/classification/early_classification/tests/test_teaser.py::test_teaser_full_length
50.75s call     sktime/datasets/tests/test_single_problem_loaders.py::test_load_UEA
45.11s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_hierarchical_with_exogeneous[BATS-1-y:1cols]
41.03s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[BATS-1-y:1cols-update_params=True-5-fh=1]
41.00s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_update_predict_predicted_index[BATS-1-y:1cols-update_params=True-1-fh=[2 5]]
32.82s call     sktime/classification/hybrid/tests/test_hivecote_v2.py::test_contracted_hivecote_v2
31.96s call     sktime/clustering/tests/test_k_shapes.py::test_kshapes
30.92s call     sktime/transformations/series/tests/test_clear_sky.py::test_clearsky_trafo_vals
28.90s call     sktime/tests/test_all_estimators.py::TestAllEstimators::test_non_state_changing_method_contract[StatsForecastMSTL-2-ForecasterFitPredictUnivariateWithX-get_fitted_params]
25.82s call     sktime/utils/tests/test_check_estimator.py::test_check_estimator_does_not_raise[ForecastKnownValues]
25.79s call     sktime/utils/tests/test_check_estimator.py::test_check_estimator_passed[ForecastKnownValues]
25.62s call     sktime/forecasting/tests/test_all_forecasters.py::TestAllForecasters::test_predict_time_index[StatsForecastAutoCES-1-y:2cols-fh=[2 5]-datetime-datetime-False]
25.30s call     sktime/forecasting/model_evaluation/tests/test_evaluate.py::test_evaluate_bigger_X[AutoARIMA]
24.22s call     sktime/forecasting/tests/test_conformal.py::test_conformal_with_hierarchical
=========================================================================================================== short test summary info ===========================================================================================================
FAILED sktime/datasets/tests/test_single_problem_loaders.py::test_check_link_downloadable[m1_quarterly_dataset] - urllib.error.HTTPError: HTTP Error 500: INTERNAL SERVER ERROR
====================================================================== 1 failed, 38254 passed, 874 skipped, 13 xfailed, 10 xpassed, 2625 warnings in 10640.72s (2:57:20) ======================================================================
make: *** [Makefile:38: full_test] Error 1
make: *** [Makefile:104: dockertest] Error 2

@yarnabrina (Collaborator) commented Oct 24, 2023

I won't worry much about the HTTP error; it has been happening in the CI for a few days as well. But if you are running the full tests, I wonder why 874 tests are skipped. Isn't that counterintuitive?

Do you see any patterns in the skipped tests?
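
(One way to surface the reasons, a hypothetical suggestion using pytest's standard reporting flag:)

  # -rs adds a short summary entry with the reason for every skipped test
  pytest -rs sktime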

(Also, the number of tests appears to be far lower than I expected. I was expecting around 84463 tests, which is what I got by running pytest --collect-only --matrixdesign False --only_changed_modules False sktime locally against the main branch.)


@fkiraly @achieveordie can you tell how these Docker images are intended to be used by users? Is it just to give new developers a working environment with everything pre-installed, or are they released and maintained in any way?

Just in case you missed this, tagging once more: @fkiraly @achieveordie.

@kurayami07734 (Contributor, Author)

@yarnabrina

=========================== short test summary info ============================
SKIPPED [1] sktime/classification/early_classification/tests/test_all_early_classifiers.py: got empty parameter set ['estimator_instance'], function test_multivariate_input_exception at /usr/src/sktime/sktime/classification/early_classification/tests/test_all_early_classifiers.py:36
SKIPPED [1] sktime/classification/early_classification/tests/test_all_early_classifiers.py: got empty parameter set ['estimator_instance', 'scenario'], function test_classifier_output at /usr/src/sktime/sktime/classification/early_classification/tests/test_all_early_classifiers.py:41
SKIPPED [1] sktime/tests/test_softdeps.py:209: got empty parameter set ['estimator'], function test_python_error at /usr/src/sktime/sktime/tests/test_softdeps.py:208
SKIPPED [1] sktime/proba/tests/test_proba_basic.py:11: skip test if required soft dependency is not available
SKIPPED [2] sktime/utils/tests/test_mlflow_sktime_model_export.py:392: skip test if required soft dependency not available
SKIPPED [4] sktime/utils/tests/test_mlflow_sktime_model_export.py:357: skip test if required soft dependency not available
SKIPPED [4] sktime/utils/tests/test_mlflow_sktime_model_export.py:415: skip test if required soft dependency not available
SKIPPED [2] sktime/utils/tests/test_mlflow_sktime_model_export.py:119: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:302: skip test if required soft dependency not available
SKIPPED [4] sktime/utils/tests/test_mlflow_sktime_model_export.py:489: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:463: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:695: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:588: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:650: skip test if required soft dependency not available
SKIPPED [2] sktime/utils/tests/test_mlflow_sktime_model_export.py:96: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:532: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:671: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:225: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:630: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:564: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:171: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:609: skip test if required soft dependency not available
SKIPPED [1] sktime/utils/tests/test_mlflow_sktime_model_export.py:279: skip test if required soft dependency not available
SKIPPED [696] sktime/tests/test_all_estimators.py:1432: hangs on mac and unix remote tests
SKIPPED [72] sktime/forecasting/model_selection/tests/test_tune.py:246: run test only if softdeps are present and incrementally (if requested)
SKIPPED [1] sktime/forecasting/model_selection/tests/test_tune.py:286: run test only if softdeps are present and incrementally (if requested)
SKIPPED [54] sktime/forecasting/base/tests/test_fh.py:71: steps and fh_type are incompatible
SKIPPED [15] ../../local/lib/python3.8/site-packages/_pytest/doctest.py:456: all tests skipped by +SKIP option
SKIPPED [1] sktime/dists_kernels/tests/test_all_dist_kernels.py: got empty parameter set ['estimator_instance', 'scenario'], function test_pairwise_transformers_tabular at /usr/src/sktime/sktime/dists_kernels/tests/test_all_dist_kernels.py:29
= 38255 passed, 874 skipped, 13 xfailed, 10 xpassed, 115 warnings in 12703.16s (3:31:43)

@achieveordie (Collaborator)

Apologies for having missed the tags. I was not involved during this development, but from my perspective it seems like this is meant to act as a developer environment rather than something an end user would use.

@yarnabrina (Collaborator)

@kurayami07734 It is very surprising to me to see this many skips caused by missing soft dependencies if you have installed all the soft dependencies as part of the Docker build. The significant mismatch in the number of tests is very odd as well.
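
(A hypothetical spot check, assuming the image is available locally; the image name is illustrative:)

  # check whether the soft dependencies behind most of the skips are present in the image
  docker run --rm <your-test-image> python -m pip show mlflow tensorflow-probability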

However, if it's just for development and not for end users directly, I'm satisfied with all the changes as of now, since a lot of the skips come from mlflow and tensorflow-probability or are intentional. Please mark the PR as ready for review and request feedback from @fkiraly and others.

yarnabrina previously approved these changes Oct 25, 2023
fkiraly marked this pull request as ready for review January 23, 2024 17:31
fkiraly added the maintenance (Continuous integration, unit testing & package distribution) label Jan 23, 2024
fkiraly changed the title from "[Bug] Fixing docker tests" to "[BUG] Fixing dockerized tests" Jan 23, 2024
fkiraly added the module:tests (test framework functionality - only framework, excl specific tests) label Jan 23, 2024
fkiraly merged commit 29c77dc into sktime:main Jan 23, 2024
1 check passed
yarnabrina mentioned this pull request Mar 8, 2024
Labels: maintenance (Continuous integration, unit testing & package distribution), module:tests (test framework functionality - only framework, excl specific tests)
Projects: None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Dockerized tests not running
4 participants