
Enable warnings on travis (and make sure they are reasonable) #10158

Open
3 tasks done
amueller opened this issue Nov 16, 2017 · 26 comments
Labels
Build / CI · Easy: Well-defined and straightforward way to resolve · help wanted

Comments

@amueller
Member

amueller commented Nov 16, 2017

We just disabled warnings on travis in #9840.
I don't think we should do that. I've been a bit absent lately, but I think the current state of the warnings is pretty bad. Many of these seem to come from recent changes that require cleanup.

  • catch and ignore all ConvergenceWarnings raised in sklearn.utils.estimator_checks (done in [MRG+2] catch more expected warnings in common tests #11151).
  • catch deprecations of min_impurity_split in tree and tree ensembles, make sure it's not used anywhere else.
  • catch deprecations of random_state in SVC tests, make sure it's not used anywhere else.
@lesteve
Member

lesteve commented Nov 17, 2017

I think you are right. I looked at the distribution of warnings after removing --disable-pytest-warnings from setup.cfg, and here is what I got:

pytest sklearn 2>&1 | tee pytest-with-warnings.log
grep -P ': \w+Warning' pytest-with-warnings.log | sort | uniq -c | sort -n -k1 -r

The top 10 offenders:

   3205   /home/lesteve/dev/scikit-learn/sklearn/linear_model/coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
    120   /home/lesteve/miniconda3/lib/python3.6/site-packages/scipy/special/_logsumexp.py:110: RuntimeWarning: underflow encountered in exp
    100   /home/lesteve/dev/scikit-learn/sklearn/preprocessing/tests/test_data.py:485: RuntimeWarning: underflow encountered in nextafter
     98   /home/lesteve/dev/scikit-learn/sklearn/gaussian_process/kernels.py:1206: RuntimeWarning: underflow encountered in exp
     66   /home/lesteve/dev/scikit-learn/sklearn/mixture/dpgmm.py:55: RuntimeWarning: underflow encountered in exp
     64   /home/lesteve/dev/scikit-learn/sklearn/utils/deprecation.py:77: DeprecationWarning: Function log_multivariate_normal_density is deprecated; The function log_multivariate_normal_density is deprecated in 0.18 and will be removed in 0.20.
     59   /home/lesteve/dev/scikit-learn/sklearn/linear_model/stochastic_gradient.py:130: FutureWarning: max_iter and tol parameters have been added in <class 'sklearn.linear_model.stochastic_gradient.SGDClassifier'> in 0.19. If both are left unset, they default to max_iter=5 and tol=None. If tol is not None, max_iter defaults to max_iter=1000. From 0.21, default max_iter will be 1000, and default tol will be 1e-3.
     55   /home/lesteve/dev/scikit-learn/sklearn/utils/deprecation.py:77: DeprecationWarning: Function constant is deprecated; The function constant of regression_models is deprecated in version 0.19.1 and will be removed in 0.22.
     48   /home/lesteve/dev/scikit-learn/sklearn/utils/deprecation.py:77: DeprecationWarning: Function squared_exponential is deprecated; The function squared_exponential of correlation_models is deprecated in version 0.19.1 and will be removed in 0.22.
     39   /home/lesteve/dev/scikit-learn/sklearn/neural_network/multilayer_perceptron.py:571: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (100) reached and the optimization hasn't converged yet.

The full thing: warnings.txt

@jnothman
Member

jnothman commented Nov 18, 2017 via email

@lesteve
Member

lesteve commented Nov 18, 2017

> I wonder what proportion come from common tests...

A lot of warnings are indeed coming from common tests:

pytest sklearn/tests/test_common.py 2>&1 | tee pytest-test-common-with-warnings.log
grep -P ': \w+Warning' pytest-test-common-with-warnings.log | sort | uniq -c | sort -n -k1 -r | head -n 10
   3204   /home/lesteve/dev/alt-scikit-learn/sklearn/linear_model/coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
     39   /home/lesteve/dev/alt-scikit-learn/sklearn/neural_network/multilayer_perceptron.py:571: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (100) reached and the optimization hasn't converged yet.
     26   /home/lesteve/dev/alt-scikit-learn/sklearn/linear_model/coordinate_descent.py:1783: ConvergenceWarning: Objective did not converge, you might want to increase the number of iterations
     20   /home/lesteve/dev/alt-scikit-learn/sklearn/random_projection.py:378: DataDimensionalityWarning: The number of components is higher than the number of features: n_features < n_components (3 < 32).The dimensionality of the problem will not be reduced.
     20   /home/lesteve/dev/alt-scikit-learn/sklearn/kernel_approximation.py:470: UserWarning: n_components > n_samples. This is not possible.
     14   /home/lesteve/dev/alt-scikit-learn/sklearn/linear_model/theil_sen.py:128: ConvergenceWarning: Maximum number of iterations 5 reached in spatial median for TheilSen regressor.
     14   /home/lesteve/dev/alt-scikit-learn/sklearn/decomposition/fastica_.py:118: UserWarning: FastICA did not converge. Consider increasing tolerance or the maximum number of iterations.
     10   /home/lesteve/dev/alt-scikit-learn/sklearn/cross_decomposition/pls_.py:288: UserWarning: Y residual constant at iteration 1
      9   /home/lesteve/dev/alt-scikit-learn/sklearn/cluster/birch.py:629: UserWarning: Number of subclusters found (2) by Birch is less than (3). Decrease the threshold.
      7   /home/lesteve/dev/alt-scikit-learn/sklearn/semi_supervised/label_propagation.py:289: ConvergenceWarning: max_iter=5 was reached without convergence.

@jnothman added the "Easy: Well-defined and straightforward way to resolve" label on Nov 20, 2017
@jnothman
Member

Okay. I've added hiding the ConvergenceWarnings there to the issue description.

@gxyd
Contributor

gxyd commented Nov 24, 2017

@lesteve I can see that the file you mention here contains warnings of three types: UserWarning, RuntimeWarning, and DeprecationWarning, though it also contains some ConvergenceWarnings.

Do we need to convert all three of these warning types to ConvergenceWarning?

[Edit]: It may also contain FutureWarnings.

@lesteve
Member

lesteve commented Nov 24, 2017

IMO the main priority is to figure out where the 3000+ warnings from test_common.py are coming from, and to silence them if we think that is fine. One example that jumps to mind: we may fit on some very small data in test_common.py, in which case ignoring ConvergenceWarning may be fine.
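
For illustration, here is a minimal sketch (using a hypothetical tiny Lasso fit, not the actual common-test code) of silencing only the expected ConvergenceWarning around a fit call:

    import warnings

    from sklearn.datasets import make_regression
    from sklearn.exceptions import ConvergenceWarning
    from sklearn.linear_model import Lasso

    # Tiny, made-up data: convergence warnings here are expected and harmless.
    X, y = make_regression(n_samples=10, n_features=5, random_state=0)

    with warnings.catch_warnings():
        # Ignore only the expected ConvergenceWarning; anything else still surfaces.
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        Lasso(alpha=1e-10, max_iter=2).fit(X, y)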

@jnothman
Member

jnothman commented Nov 26, 2017 via email

@amueller
Member Author

I think @jnothman is right: we get convergence warnings because we change n_iter. We should probably just catch those.

@amueller
Member Author

So we got

  /home/andy/checkout/scikit-learn/sklearn/utils/deprecation.py:122: FutureWarning: You are accessing a training score ('split0_train_score'), which will not be available by default any more in 0.21. If you need training scores, please set return_train_score=True

a bunch of times, which is bad, since it means the code will actually break if we don't change it (but we still have 2 versions to fix it ;)
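
For illustration, a minimal sketch (with a hypothetical SVC grid, not the actual test code) of the fix this warning asks for, i.e. requesting training scores explicitly:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Asking for train scores explicitly avoids the deprecated default that
    # warns when 'split0_train_score' is later accessed in cv_results_.
    grid = GridSearchCV(SVC(), {"C": [0.1, 1.0]}, cv=3, return_train_score=True)
    grid.fit(X, y)
    print(grid.cv_results_["split0_train_score"])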

#9379 added a whole bunch of new warnings...

@amueller
Member Author

Also pretty bad:

sklearn/model_selection/tests/test_split.py::test_shuffle_kfold_stratifiedkfold_reproducibility
  /home/andy/anaconda3/lib/python3.6/site-packages/numpy/testing/utils.py:761: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
    val = comparison(x, y)
  /home/andy/anaconda3/lib/python3.6/site-packages/numpy/testing/utils.py:761: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
    val = comparison(x, y)
  /home/andy/anaconda3/lib/python3.6/site-packages/numpy/testing/utils.py:761: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
    val = comparison(x, y)

sklearn/model_selection/tests/test_split.py::test_train_test_split_errors

@amueller
Member Author

Looks like the common tests are clean now. Looking for the worst offenders in the other tests now.

@amueller
Member Author

amueller commented Jul 17, 2018

=========== 10049 passed, 33 skipped, 681 warnings in 343.36 seconds ===========

    133   /home/andy/checkout/scikit-learn/sklearn/svm/base.py:922: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
    100   /home/andy/checkout/scikit-learn/sklearn/preprocessing/tests/test_data.py:513: RuntimeWarning: underflow encountered in nextafter
     40   /home/andy/checkout/scikit-learn/sklearn/tree/tree.py:869: RuntimeWarning: divide by zero encountered in log
     26   /home/andy/checkout/scikit-learn/sklearn/gaussian_process/kernels.py:767: RuntimeWarning: underflow encountered in multiply
     25   /home/andy/checkout/scikit-learn/sklearn/gaussian_process/kernels.py:1235: RuntimeWarning: underflow encountered in multiply
     23   /home/andy/checkout/scikit-learn/sklearn/gaussian_process/kernels.py:766: RuntimeWarning: underflow encountered in multiply
     22   /home/andy/checkout/scikit-learn/sklearn/naive_bayes.py:465: RuntimeWarning: divide by zero encountered in log
     22   /home/andy/checkout/scikit-learn/sklearn/linear_model/stochastic_gradient.py:1073: RuntimeWarning: divide by zero encountered in log
     16   /home/andy/checkout/scikit-learn/sklearn/ensemble/bagging.py:758: RuntimeWarning: divide by zero encountered in log
     16   /home/andy/checkout/scikit-learn/sklearn/cluster/hierarchical.py:96: RuntimeWarning: underflow encountered in nextafter
     16   /home/andy/anaconda3/envs/py37/lib/python3.7/site-packages/numpy/core/_methods.py:35: RuntimeWarning: underflow encountered in reduce
     14   /home/andy/checkout/scikit-learn/sklearn/gaussian_process/gpc.py:400: RuntimeWarning: underflow encountered in multiply
     11   /home/andy/checkout/scikit-learn/sklearn/metrics/cluster/unsupervised.py:342: RuntimeWarning: divide by zero encountered in true_divide
     11   /home/andy/checkout/scikit-learn/sklearn/gaussian_process/gpc.py:399: RuntimeWarning: underflow encountered in multiply
     10   /home/andy/checkout/scikit-learn/sklearn/mixture/base.py:268: ConvergenceWarning: Initialization 1 did not converge. Try different init parameters, or increase max_iter, tol or check for degenerate data.
     10   /home/andy/checkout/scikit-learn/sklearn/linear_model/sag.py:332: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge

See #11608 and #11536 for the worst offenders.

@amueller
Member Author

Updated:

    134   /home/andy/checkout/scikit-learn/sklearn/svm/base.py:922: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
     44   /home/andy/checkout/scikit-learn/sklearn/tree/tree.py:869: RuntimeWarning: divide by zero encountered in log
     26   /home/andy/checkout/scikit-learn/sklearn/gaussian_process/kernels.py:767: RuntimeWarning: underflow encountered in multiply
     25   /home/andy/checkout/scikit-learn/sklearn/gaussian_process/kernels.py:1239: RuntimeWarning: underflow encountered in multiply
     23   /home/andy/checkout/scikit-learn/sklearn/gaussian_process/kernels.py:766: RuntimeWarning: underflow encountered in multiply
     22   /home/andy/checkout/scikit-learn/sklearn/naive_bayes.py:465: RuntimeWarning: divide by zero encountered in log
     22   /home/andy/checkout/scikit-learn/sklearn/linear_model/stochastic_gradient.py:1073: RuntimeWarning: divide by zero encountered in log
     16   /home/andy/checkout/scikit-learn/sklearn/ensemble/bagging.py:753: RuntimeWarning: divide by zero encountered in log
     16   /home/andy/anaconda3/envs/py37/lib/python3.7/site-packages/numpy/core/_methods.py:35: RuntimeWarning: underflow encountered in reduce
     14   /home/andy/checkout/scikit-learn/sklearn/gaussian_process/gpc.py:400: RuntimeWarning: underflow encountered in multiply
     11   /home/andy/checkout/scikit-learn/sklearn/metrics/cluster/unsupervised.py:342: RuntimeWarning: divide by zero encountered in true_divide
     11   /home/andy/checkout/scikit-learn/sklearn/gaussian_process/gpc.py:399: RuntimeWarning: underflow encountered in multiply
     10   /home/andy/checkout/scikit-learn/sklearn/mixture/base.py:268: ConvergenceWarning: Initialization 1 did not converge. Try different init parameters, or increase max_iter, tol or check for degenerate data.
     10   /home/andy/checkout/scikit-learn/sklearn/linear_model/sag.py:332: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
      9   /home/andy/checkout/scikit-learn/sklearn/tree/tree.py:873: RuntimeWarning: divide by zero encountered in log
      9   /home/andy/anaconda3/envs/py37/lib/python3.7/site-packages/scipy/sparse/compressed.py:746: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
      9   /home/andy/anaconda3/envs/py37/lib/python3.7/site-packages/numpy/core/numeric.py:2321: RuntimeWarning: underflow encountered in multiply
      8   /home/andy/checkout/scikit-learn/sklearn/gaussian_process/kernels.py:1245: RuntimeWarning: underflow encountered in multiply
      8   /home/andy/checkout/scikit-learn/sklearn/gaussian_process/gpc.py:357: RuntimeWarning: underflow encountered in multiply

Some other ones feel a bit more concerning, though:

      8   /home/andy/checkout/scikit-learn/sklearn/cluster/k_means_.py:968: ConvergenceWarning: Number of distinct clusters (2) found smaller than n_clusters (10). Possibly due to duplicate points in X.
      2   /home/andy/checkout/scikit-learn/sklearn/neighbors/lof.py:246: UserWarning: n_neighbors (20) is greater than the total number of samples (6). n_neighbors will be set to (n_samples - 1) for estimation.
      2   /home/andy/checkout/scikit-learn/sklearn/neighbors/lof.py:246: UserWarning: n_neighbors (20) is greater than the total number of samples (3). n_neighbors will be set to (n_samples - 1) for estimation.
      2   /home/andy/checkout/scikit-learn/sklearn/neighbors/lof.py:246: UserWarning: n_neighbors (20) is greater than the total number of samples (10). n_neighbors will be set to (n_samples - 1) for estimation.
      3   /home/andy/checkout/scikit-learn/sklearn/neighbors/lof.py:246: UserWarning: n_neighbors (20) is greater than the total number of samples (9). n_neighbors will be set to (n_samples - 1) for estimation.
      3   /home/andy/checkout/scikit-learn/sklearn/model_selection/_validation.py:834: RuntimeWarning: Number of classes in training fold (7) does not match total number of classes (10). Results may not be appropriate for your use case. To fix this, use a cross-validation technique resulting in properly stratified folds
      4   /home/andy/checkout/scikit-learn/sklearn/neighbors/nearest_centroid.py:141: UserWarning: Averaging for metrics other than euclidean and manhattan not supported. The average is set to be the mean.
      5   /home/andy/checkout/scikit-learn/sklearn/model_selection/_validation.py:834: RuntimeWarning: Number of classes in training fold (8) does not match total number of classes (10). Results may not be appropriate for your use case. To fix this, use a cross-validation technique resulting in properly stratified folds

I'm also confused by these:

      2   /home/andy/checkout/scikit-learn/sklearn/utils/estimator_checks.py:2278: UserWarning: Estimator's parameters changed after set_params raised ValueError
      2   /home/andy/checkout/scikit-learn/sklearn/utils/estimator_checks.py:2278: UserWarning: Estimator's parameters changed after set_params raised TypeError
      2   /home/andy/checkout/scikit-learn/sklearn/utils/estimator_checks.py:2266: UserWarning: ValueError occurred during set_params. It is recommended to delay parameter validation until fit.
      2   /home/andy/checkout/scikit-learn/sklearn/utils/estimator_checks.py:2266: UserWarning: TypeError occurred during set_params. It is recommended to delay parameter validation until fit.

Shouldn't these be errors?

@amueller
Member Author

I think we should try to avoid all warnings that are not RuntimeWarning or ConvergenceWarning, which are both about numerical issues. The others are mostly UserWarnings, which means we made weird mistakes in setting up the tests.
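
A minimal sketch, assuming we wanted to enforce that policy at the start of a test session with plain warnings filters (pytest's filterwarnings option could express the same idea in setup.cfg):

    import warnings

    from sklearn.exceptions import ConvergenceWarning

    # Hypothetical session-level policy: every warning becomes an error,
    # except the two categories that mostly reflect numerical issues.
    warnings.simplefilter("error")
    warnings.simplefilter("ignore", category=RuntimeWarning)
    warnings.simplefilter("ignore", category=ConvergenceWarning)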

@jnothman
Member

jnothman commented Jul 25, 2018 via email

@jnothman
Member

What needs to be done here at this point?

@lesteve
Member

lesteve commented Jan 17, 2019

I think someone needs to do what I did in #10158 (comment) and reevaluate how many warnings are still there.

From Andy's comments about the worst offenders, #11536 is still open (but you saw that, since you commented on that issue).

@amueller
Member Author

730   /home/andy/checkout/scikit-learn/sklearn/metrics/regression.py:545: UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
414   /home/andy/checkout/scikit-learn/sklearn/linear_model/sag.py:334: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
134   /home/andy/checkout/scikit-learn/sklearn/svm/base.py:929: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
119   /home/andy/checkout/scikit-learn/sklearn/mixture/base.py:272: ConvergenceWarning: Initialization 1 did not converge. Try different init parameters, or increase max_iter, tol or check for degenerate data.
 43   /home/andy/checkout/scikit-learn/sklearn/tree/tree.py:890: RuntimeWarning: divide by zero encountered in log

Some new ones, yay! (This is from #8022, but master is similar.)

@adrinjalali
Member

@thomasjpfan is this still relevant?

@thomasjpfan
Member

@adrinjalali Yes, this feels like a never-ending issue. Since we do not actively check for warnings, new warnings will always appear.

We need a good way to see whether a PR introduces new warnings, or a way to display the warnings a PR creates. Furthermore, on the master branch, we should have a way to see the current warnings.

In Azure, after running pytest, we can create a separate step to just show the warnings. (It would do a little string processing of the pytest output and display only the warnings.)
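
For illustration, a rough sketch (the script name, log file name, and output format are made up) of what that string-processing step could look like:

    import re
    import sys
    from collections import Counter

    # Hypothetical post-processing: read a saved pytest log and print the
    # warning lines, most frequent first (roughly what the grep pipeline
    # earlier in this thread does).
    pattern = re.compile(r": \w+Warning")
    with open(sys.argv[1]) as f:
        counts = Counter(line.strip() for line in f if pattern.search(line))
    for line, count in counts.most_common():
        print(f"{count:7d}   {line}")

Usage would be something like: python show_warnings.py pytest-with-warnings.log (where show_warnings.py is the hypothetical script above).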

@adrinjalali
Member

If we could somehow save those warnings as artifacts, then maybe we could fail on the diff between master's log and the PR's?
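
A minimal sketch, assuming both builds have saved a plain-text warning log as an artifact (the paths and exit-code convention are hypothetical):

    import sys

    # Hypothetical check: fail if the PR's log contains warning lines that do
    # not appear in the log saved from the master build.
    def load_warnings(path):
        with open(path) as f:
            return {line.strip() for line in f if "Warning" in line}

    master = load_warnings(sys.argv[1])  # e.g. artifact from the master build
    pr = load_warnings(sys.argv[2])      # e.g. artifact from the PR build

    # In practice the absolute paths inside the messages would need
    # normalizing before comparison, since they differ between machines.
    new_warnings = pr - master
    for line in sorted(new_warnings):
        print("NEW:", line)
    sys.exit(1 if new_warnings else 0)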

@amueller
Member Author

Why do we need to do string processing? Can't we just show the pytest output?

@adrinjalali
Member

We can show the pytest output, but it's the same as checking for pep8 issues on the whole codebase: we don't do that; we only fail if a PR introduces new issues.

The issue is that we already have tons of warnings, and it would be too hard to try to remove them all, so the idea is to prevent PRs from introducing new warnings.

@amueller
Member Author

> The issue is that we already have tons of warnings, and it would be too hard to try to remove them all, so the idea is to prevent PRs from introducing new warnings.

Why is it too hard? I want to remove them all.

@amueller
Member Author

Also, I would argue we should maybe do the same for flake8, because we keep introducing unused imports. Though LGTM might help us with that.

@adrinjalali
Member

I'm totally +1 on both warnings and flake8!
