RFC On the relative harm of cosmetic changes #11336

Closed
jnothman opened this issue Jun 21, 2018 · 26 comments
Labels
Needs Decision - Close Requires decision for closing

Comments

@jnothman
Member

I can't remember where, but there was a recent PR offering cosmetic (e.g. PEP8) fixes. We usually reject such PRs because they cause merge conflicts with existing open PRs. @amueller wanted to know exactly how many PRs would be affected by each change.

If we knew it would affect no or few open/active PRs, we might be keen to fix flake8 issues, or to modernise tests (pytest style).

Thus, using https://gist.github.com/jnothman/41a5e05c82c4508afa7bee3b493752dd, I have determined which lines are modified (or adjacent to modified lines) by which open PRs.
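For readers who want a feel for what "modified or adjacent to modified" can mean in practice, here is a minimal sketch. It is not the code in the gist: it assumes each PR is available as a unified diff against master, reads only the hunk headers, and pads each hunk by a hypothetical `margin` parameter. The names `HUNK_HEADER` and `touched_base_lines` are made up for illustration.

```python
import re

# Unified-diff hunk header, e.g. "@@ -10,2 +10,3 @@"; a missing count means 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def touched_base_lines(diff_text, margin=1):
    """Return line numbers in the base file covered by hunks, padded by margin."""
    touched = set()
    for raw in diff_text.splitlines():
        match = HUNK_HEADER.match(raw)
        if match is None:
            continue  # not a hunk header
        start = int(match.group(1))
        count = 1 if match.group(2) is None else int(match.group(2))
        if count == 0:
            # Pure insertion after line `start`: no base line is replaced,
            # so only the neighbouring lines count as touched.
            first, last = start + 1 - margin, start + margin
        else:
            first, last = start - margin, start + count - 1 + margin
        touched.update(range(max(first, 1), last + 1))
    return touched


# Made-up hunk headers, as a zero-context diff might produce them.
sample_diff = "\n".join([
    "@@ -10,2 +10,3 @@ def f():",
    " some context line",
    "@@ -40,0 +42 @@",
])
print(sorted(touched_base_lines(sample_diff)))
```

Reading only the old-file side of each header keeps the line numbers comparable with master, which is what matters when asking whether a cosmetic change to master would collide with a PR.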

https://www.dropbox.com/s/jsf07i40npq5o5g/lines-modified-by-open-prs.zip?dl=0 contains the results: one file for PRs that can currently be merged into master, and one for PRs that cannot. Each file has three columns: PR#, file name, line#.
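As a rough illustration of working with that three-column layout, the sketch below groups PR numbers by file. It assumes the columns are tab-separated (which the `cut -f2` calls in this comment suggest); the function names and the sample rows are invented for the example and are not taken from the real results.

```python
import csv
from collections import defaultdict
from io import StringIO


def read_rows(handle):
    """Yield (pr, filename, line) triples from a tab-separated file object."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        pr, filename, line = row
        yield pr, filename, int(line)


def prs_per_file(rows):
    """Map each file name to the set of PR numbers that touch it."""
    touched = defaultdict(set)
    for pr, filename, _line in rows:
        touched[filename].add(pr)
    return touched


# Made-up sample rows, only to show the shape of the data.
sample = StringIO(
    "101\tsklearn/foo.py\t10\n"
    "101\tsklearn/foo.py\t11\n"
    "202\tsklearn/foo.py\t40\n"
    "202\tdoc/bar.rst\t3\n"
)
touched = prs_per_file(read_rows(sample))
single_pr_files = sorted(f for f, prs in touched.items() if len(prs) == 1)
print(single_pr_files)
```

Keeping a set of PR numbers per file means repeated lines from the same PR are counted once, which is the same effect as the `cut -f1-2 | sort -u` step in the shell pipelines.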

Looking at just the file names, the following files in master are never modified by open PRs:

diff <(git ls-tree -r master --name-only | sort) <(cat lines-modified-merge* | cut -f2 | sort -u) | grep '<'
.coveragerc
.landscape.yml
AUTHORS.rst
MANIFEST.in
benchmarks/bench_20newsgroups.py
benchmarks/bench_glm.py
benchmarks/bench_glmnet.py
benchmarks/bench_isotonic.py
benchmarks/bench_multilabel_metrics.py
benchmarks/bench_plot_neighbors.py
benchmarks/bench_plot_parallel_pairwise.py
benchmarks/bench_random_projections.py
benchmarks/bench_tree.py
build_tools/appveyor/install.ps1
build_tools/appveyor/run_with_env.cmd
doc/images/cds-logo.png
doc/images/dysco.png
doc/images/inria-logo.jpg
doc/images/iris.pdf
doc/images/iris.svg
doc/images/last_digit.png
doc/images/lda_model_graph.png
doc/images/ml_map.png
doc/images/multilayerperceptron_network.png
doc/images/no_image.png
doc/images/nyu_short_color.png
doc/images/plot_digits_classification.png
doc/images/plot_face_recognition_1.png
doc/images/plot_face_recognition_2.png
doc/images/rbm_graph.png
doc/images/scikit-learn-logo-notext.png
doc/images/sloan_banner.png
doc/includes/big_toc_css.rst
doc/includes/bigger_toc_css.rst
doc/logos/favicon.ico
doc/logos/identity.pdf
doc/logos/scikit-learn-logo-notext.png
doc/logos/scikit-learn-logo-small.png
doc/logos/scikit-learn-logo-thumb.png
doc/logos/scikit-learn-logo.bmp
doc/logos/scikit-learn-logo.png
doc/logos/scikit-learn-logo.svg
doc/model_selection.rst
doc/modules/glm_data/lasso_enet_coordinate_descent.png
doc/modules/isotonic.rst
doc/preface.rst
doc/sphinxext/MANIFEST.in
doc/sphinxext/github_link.py
doc/templates/class.rst
doc/templates/class_with_call.rst
doc/templates/class_without_init.rst
doc/templates/deprecated_class.rst
doc/templates/deprecated_class_with_call.rst
doc/templates/deprecated_class_without_init.rst
doc/templates/deprecated_function.rst
doc/templates/function.rst
doc/templates/generate_deprecated.sh
doc/testimonials/README.txt
doc/testimonials/images/Makefile
doc/testimonials/images/aweber.png
doc/testimonials/images/bestofmedia-logo.png
doc/testimonials/images/betaworks.png
doc/testimonials/images/birchbox.jpg
doc/testimonials/images/booking.png
doc/testimonials/images/change-logo.png
doc/testimonials/images/dataiku_logo.png
doc/testimonials/images/datapublica.png
doc/testimonials/images/datarobot.png
doc/testimonials/images/evernote.png
doc/testimonials/images/howaboutwe.png
doc/testimonials/images/huggingface.png
doc/testimonials/images/infonea.jpg
doc/testimonials/images/inria.png
doc/testimonials/images/lovely.png
doc/testimonials/images/machinalis.png
doc/testimonials/images/okcupid.png
doc/testimonials/images/ottogroup_logo.png
doc/testimonials/images/peerindex.png
doc/testimonials/images/phimeca.png
doc/testimonials/images/rangespan.png
doc/testimonials/images/solido_logo.png
doc/testimonials/images/spotify.png
doc/testimonials/images/telecomparistech.jpg
doc/testimonials/images/yhat.png
doc/testimonials/images/zopa.png
doc/themes/scikit-learn/static/css/bootstrap-responsive.css
doc/themes/scikit-learn/static/css/bootstrap-responsive.min.css
doc/themes/scikit-learn/static/css/bootstrap.css
doc/themes/scikit-learn/static/css/bootstrap.min.css
doc/themes/scikit-learn/static/css/examples.css
doc/themes/scikit-learn/static/img/FNRS-logo.png
doc/themes/scikit-learn/static/img/columbia.png
doc/themes/scikit-learn/static/img/forkme.png
doc/themes/scikit-learn/static/img/glyphicons-halflings-white.png
doc/themes/scikit-learn/static/img/glyphicons-halflings.png
doc/themes/scikit-learn/static/img/google.png
doc/themes/scikit-learn/static/img/inria-small.jpg
doc/themes/scikit-learn/static/img/inria-small.png
doc/themes/scikit-learn/static/img/nyu_short_color.png
doc/themes/scikit-learn/static/img/plot_classifier_comparison_1.png
doc/themes/scikit-learn/static/img/plot_manifold_sphere_1.png
doc/themes/scikit-learn/static/img/scikit-learn-logo-notext.png
doc/themes/scikit-learn/static/img/scikit-learn-logo-small.png
doc/themes/scikit-learn/static/img/scikit-learn-logo.png
doc/themes/scikit-learn/static/img/scikit-learn-logo.svg
doc/themes/scikit-learn/static/img/sloan_logo.jpg
doc/themes/scikit-learn/static/img/sydney-primary.jpeg
doc/themes/scikit-learn/static/img/sydney-stacked.jpeg
doc/themes/scikit-learn/static/img/telecom.png
doc/themes/scikit-learn/static/jquery.maphilight.js
doc/themes/scikit-learn/static/jquery.maphilight.min.js
doc/themes/scikit-learn/static/js/bootstrap.js
doc/themes/scikit-learn/static/js/bootstrap.min.js
doc/themes/scikit-learn/static/js/copybutton.js
doc/themes/scikit-learn/theme.conf
doc/tune_toc.rst
doc/tutorial/common_includes/info.txt
doc/tutorial/statistical_inference/finding_help.rst
doc/tutorial/statistical_inference/index.rst
doc/tutorial/text_analytics/.gitignore
doc/tutorial/text_analytics/data/movie_reviews/fetch_data.py
doc/tutorial/text_analytics/data/twenty_newsgroups/fetch_data.py
doc/tutorial/text_analytics/solutions/exercise_02_sentiment.py
doc/tutorial/text_analytics/solutions/generate_skeletons.py
doc/unsupervised_learning.rst
examples/applications/README.txt
examples/applications/svm_gui.py
examples/bicluster/README.txt
examples/bicluster/plot_spectral_biclustering.py
examples/bicluster/plot_spectral_coclustering.py
examples/classification/README.txt
examples/classification/plot_classification_probability.py
examples/classification/plot_lda.py
examples/cluster/README.txt
examples/covariance/README.txt
examples/covariance/plot_lw_vs_oas.py
examples/cross_decomposition/README.txt
examples/datasets/README.txt
examples/datasets/plot_random_multilabel_dataset.py
examples/decomposition/README.txt
examples/decomposition/plot_beta_divergence.py
examples/decomposition/plot_incremental_pca.py
examples/decomposition/plot_pca_vs_lda.py
examples/ensemble/README.txt
examples/ensemble/plot_forest_importances.py
examples/ensemble/plot_forest_importances_faces.py
examples/ensemble/plot_gradient_boosting_oob.py
examples/ensemble/plot_gradient_boosting_quantile.py
examples/exercises/README.txt
examples/exercises/plot_cv_digits.py
examples/feature_selection/README.txt
examples/feature_selection/plot_rfe_digits.py
examples/feature_selection/plot_rfe_with_cross_validation.py
examples/feature_selection/plot_select_from_model_boston.py
examples/gaussian_process/README.txt
examples/gaussian_process/plot_compare_gpr_krr.py
examples/gaussian_process/plot_gpr_co2.py
examples/linear_model/README.txt
examples/linear_model/plot_huber_vs_ridge.py
examples/linear_model/plot_iris_logistic.py
examples/linear_model/plot_logistic.py
examples/linear_model/plot_ols_ridge_variance.py
examples/linear_model/plot_omp.py
examples/linear_model/plot_polynomial_interpolation.py
examples/linear_model/plot_ridge_coeffs.py
examples/linear_model/plot_robust_fit.py
examples/manifold/README.txt
examples/manifold/plot_mds.py
examples/mixture/README.txt
examples/mixture/plot_gmm.py
examples/mixture/plot_gmm_covariances.py
examples/mixture/plot_gmm_pdf.py
examples/mixture/plot_gmm_selection.py
examples/mixture/plot_gmm_sin.py
examples/model_selection/README.txt
examples/neighbors/README.txt
examples/neural_networks/README.txt
examples/neural_networks/plot_mnist_filters.py
examples/preprocessing/README.txt
examples/preprocessing/plot_function_transformer.py
examples/semi_supervised/README.txt
examples/svm/README.txt
examples/text/README.txt
examples/tree/README.txt
site.cfg
sklearn/__check_build/_check_build.pyx
sklearn/__check_build/setup.py
sklearn/cluster/tests/__init__.py
sklearn/cluster/tests/common.py
sklearn/compose/tests/__init__.py
sklearn/covariance/tests/__init__.py
sklearn/cross_decomposition/tests/__init__.py
sklearn/datasets/data/diabetes_data.csv.gz
sklearn/datasets/data/diabetes_target.csv.gz
sklearn/datasets/data/digits.csv.gz
sklearn/datasets/data/linnerud_exercise.csv
sklearn/datasets/data/linnerud_physiological.csv
sklearn/datasets/images/README.txt
sklearn/datasets/images/china.jpg
sklearn/datasets/images/flower.jpg
sklearn/datasets/tests/__init__.py
sklearn/datasets/tests/data/svmlight_classification.txt
sklearn/datasets/tests/data/svmlight_invalid.txt
sklearn/datasets/tests/data/svmlight_invalid_order.txt
sklearn/datasets/tests/data/svmlight_multilabel.txt
sklearn/decomposition/_online_lda.pyx
sklearn/decomposition/tests/__init__.py
sklearn/ensemble/setup.py
sklearn/ensemble/tests/__init__.py
sklearn/externals/__init__.py
sklearn/externals/joblib/_multiprocessing_helpers.py
sklearn/externals/six.py
sklearn/feature_extraction/__init__.py
sklearn/feature_extraction/tests/__init__.py
sklearn/feature_selection/tests/__init__.py
sklearn/feature_selection/tests/test_variance_threshold.py
sklearn/feature_selection/variance_threshold.py
sklearn/gaussian_process/tests/__init__.py
sklearn/linear_model/tests/__init__.py
sklearn/manifold/tests/__init__.py
sklearn/metrics/cluster/tests/__init__.py
sklearn/metrics/pairwise_fast.pyx
sklearn/metrics/tests/__init__.py
sklearn/mixture/tests/__init__.py
sklearn/model_selection/tests/__init__.py
sklearn/model_selection/tests/common.py
sklearn/neighbors/tests/__init__.py
sklearn/neural_network/_stochastic_optimizers.py
sklearn/neural_network/tests/__init__.py
sklearn/neural_network/tests/test_stochastic_optimizers.py
sklearn/preprocessing/tests/__init__.py
sklearn/semi_supervised/tests/__init__.py
sklearn/src/cblas/ATL_drefasum.c
sklearn/src/cblas/ATL_drefcopy.c
sklearn/src/cblas/ATL_drefgemv.c
sklearn/src/cblas/ATL_drefgemvN.c
sklearn/src/cblas/ATL_drefgemvT.c
sklearn/src/cblas/ATL_drefger.c
sklearn/src/cblas/ATL_drefrot.c
sklearn/src/cblas/ATL_drefrotg.c
sklearn/src/cblas/ATL_dsrefdot.c
sklearn/src/cblas/ATL_srefasum.c
sklearn/src/cblas/ATL_srefcopy.c
sklearn/src/cblas/ATL_srefgemv.c
sklearn/src/cblas/ATL_srefgemvN.c
sklearn/src/cblas/ATL_srefgemvT.c
sklearn/src/cblas/ATL_srefger.c
sklearn/src/cblas/ATL_srefnrm2.c
sklearn/src/cblas/ATL_srefrot.c
sklearn/src/cblas/ATL_srefrotg.c
sklearn/src/cblas/README.txt
sklearn/src/cblas/atlas_aux.h
sklearn/src/cblas/atlas_dsysinfo.h
sklearn/src/cblas/atlas_enum.h
sklearn/src/cblas/atlas_level1.h
sklearn/src/cblas/atlas_level2.h
sklearn/src/cblas/atlas_ptalias1.h
sklearn/src/cblas/atlas_ptalias2.h
sklearn/src/cblas/atlas_refalias1.h
sklearn/src/cblas/atlas_refalias2.h
sklearn/src/cblas/atlas_reflevel2.h
sklearn/src/cblas/atlas_reflvl2.h
sklearn/src/cblas/atlas_refmisc.h
sklearn/src/cblas/atlas_ssysinfo.h
sklearn/src/cblas/atlas_type.h
sklearn/src/cblas/cblas_dasum.c
sklearn/src/cblas/cblas_daxpy.c
sklearn/src/cblas/cblas_ddot.c
sklearn/src/cblas/cblas_dgemv.c
sklearn/src/cblas/cblas_dger.c
sklearn/src/cblas/cblas_dnrm2.c
sklearn/src/cblas/cblas_drot.c
sklearn/src/cblas/cblas_drotg.c
sklearn/src/cblas/cblas_dscal.c
sklearn/src/cblas/cblas_errprn.c
sklearn/src/cblas/cblas_sasum.c
sklearn/src/cblas/cblas_saxpy.c
sklearn/src/cblas/cblas_sdot.c
sklearn/src/cblas/cblas_sgemv.c
sklearn/src/cblas/cblas_sger.c
sklearn/src/cblas/cblas_snrm2.c
sklearn/src/cblas/cblas_srot.c
sklearn/src/cblas/cblas_srotg.c
sklearn/src/cblas/cblas_sscal.c
sklearn/svm/src/liblinear/COPYRIGHT
sklearn/svm/src/liblinear/tron.cpp
sklearn/svm/src/liblinear/tron.h
sklearn/svm/src/libsvm/LIBSVM_CHANGES
sklearn/svm/src/libsvm/libsvm_template.cpp
sklearn/svm/tests/__init__.py
sklearn/tests/__init__.py
sklearn/tests/test_check_build.py
sklearn/tree/tests/__init__.py
sklearn/utils/_logistic_sigmoid.pyx
sklearn/utils/_scipy_sparse_lsqr_backport.py
sklearn/utils/arrayfuncs.pyx
sklearn/utils/fast_dict.pyx
sklearn/utils/lgamma.pxd
sklearn/utils/lgamma.pyx
sklearn/utils/murmurhash.pxd
sklearn/utils/murmurhash.pyx
sklearn/utils/optimize.py
sklearn/utils/sparsetools/tests/__init__.py
sklearn/utils/src/MurmurHash3.cpp
sklearn/utils/src/MurmurHash3.h
sklearn/utils/src/cholesky_delete.h
sklearn/utils/src/gamma.c
sklearn/utils/src/gamma.h
sklearn/utils/tests/__init__.py
sklearn/utils/tests/test_bench.py
sklearn/utils/tests/test_fast_dict.py
sklearn/utils/tests/test_linear_assignment.py
sklearn/utils/tests/test_murmurhash.py
sklearn/utils/tests/test_optimize.py
sklearn/utils/tests/test_shortest_path.py

with the following flake8 errors:

benchmarks/bench_plot_neighbors.py:39:5: E265 block comment should start with '# '
benchmarks/bench_plot_neighbors.py:62:5: E265 block comment should start with '# '
benchmarks/bench_plot_neighbors.py:85:5: E265 block comment should start with '# '
benchmarks/bench_plot_parallel_pairwise.py:11:1: E302 expected 2 blank lines, found 1
benchmarks/bench_random_projections.py:77:9: E128 continuation line under-indented for visual indent
benchmarks/bench_random_projections.py:85:28: E128 continuation line under-indented for visual indent
benchmarks/bench_random_projections.py:86:28: E128 continuation line under-indented for visual indent
benchmarks/bench_random_projections.py:96:80: E501 line too long (80 > 79 characters)
benchmarks/bench_random_projections.py:218:15: E128 continuation line under-indented for visual indent
examples/applications/svm_gui.py:24:1: E402 module level import not at top of file
examples/applications/svm_gui.py:27:1: E402 module level import not at top of file
examples/applications/svm_gui.py:28:1: E402 module level import not at top of file
examples/applications/svm_gui.py:29:1: E402 module level import not at top of file
examples/applications/svm_gui.py:30:1: E402 module level import not at top of file
examples/applications/svm_gui.py:38:1: E402 module level import not at top of file
examples/applications/svm_gui.py:39:1: E402 module level import not at top of file
examples/applications/svm_gui.py:41:1: E402 module level import not at top of file
examples/applications/svm_gui.py:42:1: E402 module level import not at top of file
examples/applications/svm_gui.py:43:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_biclustering.py:23:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_biclustering.py:24:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_biclustering.py:26:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_biclustering.py:27:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_biclustering.py:28:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_biclustering.py:29:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_coclustering.py:22:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_coclustering.py:23:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_coclustering.py:25:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_coclustering.py:26:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_coclustering.py:27:1: E402 module level import not at top of file
examples/bicluster/plot_spectral_coclustering.py:28:1: E402 module level import not at top of file
examples/classification/plot_classification_probability.py:19:1: E402 module level import not at top of file
examples/classification/plot_classification_probability.py:20:1: E402 module level import not at top of file
examples/classification/plot_classification_probability.py:22:1: E402 module level import not at top of file
examples/classification/plot_classification_probability.py:23:1: E402 module level import not at top of file
examples/classification/plot_classification_probability.py:24:1: E402 module level import not at top of file
examples/classification/plot_classification_probability.py:25:1: E402 module level import not at top of file
examples/classification/plot_classification_probability.py:26:1: E402 module level import not at top of file
examples/classification/plot_lda.py:48:80: E501 line too long (84 > 79 characters)
examples/classification/plot_lda.py:49:80: E501 line too long (82 > 79 characters)
examples/covariance/plot_lw_vs_oas.py:25:1: E402 module level import not at top of file
examples/covariance/plot_lw_vs_oas.py:26:1: E402 module level import not at top of file
examples/covariance/plot_lw_vs_oas.py:27:1: E402 module level import not at top of file
examples/covariance/plot_lw_vs_oas.py:29:1: E402 module level import not at top of file
examples/decomposition/plot_incremental_pca.py:26:1: E402 module level import not at top of file
examples/decomposition/plot_incremental_pca.py:27:1: E402 module level import not at top of file
examples/decomposition/plot_incremental_pca.py:29:1: E402 module level import not at top of file
examples/decomposition/plot_incremental_pca.py:30:1: E402 module level import not at top of file
examples/decomposition/plot_pca_vs_lda.py:21:1: E402 module level import not at top of file
examples/decomposition/plot_pca_vs_lda.py:23:1: E402 module level import not at top of file
examples/decomposition/plot_pca_vs_lda.py:24:1: E402 module level import not at top of file
examples/decomposition/plot_pca_vs_lda.py:25:1: E402 module level import not at top of file
examples/ensemble/plot_forest_importances.py:15:1: E402 module level import not at top of file
examples/ensemble/plot_forest_importances.py:16:1: E402 module level import not at top of file
examples/ensemble/plot_forest_importances.py:18:1: E402 module level import not at top of file
examples/ensemble/plot_forest_importances.py:19:1: E402 module level import not at top of file
examples/ensemble/plot_forest_importances.py:51:8: E128 continuation line under-indented for visual indent
examples/ensemble/plot_forest_importances_faces.py:15:1: E402 module level import not at top of file
examples/ensemble/plot_forest_importances_faces.py:16:1: E402 module level import not at top of file
examples/ensemble/plot_forest_importances_faces.py:18:1: E402 module level import not at top of file
examples/ensemble/plot_forest_importances_faces.py:19:1: E402 module level import not at top of file
examples/ensemble/plot_gradient_boosting_oob.py:32:1: E402 module level import not at top of file
examples/ensemble/plot_gradient_boosting_oob.py:33:1: E402 module level import not at top of file
examples/ensemble/plot_gradient_boosting_oob.py:35:1: E402 module level import not at top of file
examples/ensemble/plot_gradient_boosting_oob.py:36:1: E402 module level import not at top of file
examples/ensemble/plot_gradient_boosting_oob.py:37:1: E402 module level import not at top of file
examples/ensemble/plot_gradient_boosting_quantile.py:22:1: E265 block comment should start with '# '
examples/exercises/plot_cv_digits.py:14:1: E402 module level import not at top of file
examples/exercises/plot_cv_digits.py:15:1: E402 module level import not at top of file
examples/exercises/plot_cv_digits.py:16:1: E402 module level import not at top of file
examples/exercises/plot_cv_digits.py:34:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_digits.py:11:80: E501 line too long (94 > 79 characters)
examples/feature_selection/plot_rfe_digits.py:16:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_digits.py:17:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_digits.py:18:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_digits.py:19:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_with_cross_validation.py:11:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_with_cross_validation.py:12:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_with_cross_validation.py:13:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_with_cross_validation.py:14:1: E402 module level import not at top of file
examples/feature_selection/plot_rfe_with_cross_validation.py:15:1: E402 module level import not at top of file
examples/feature_selection/plot_select_from_model_boston.py:14:1: E402 module level import not at top of file
examples/feature_selection/plot_select_from_model_boston.py:15:1: E402 module level import not at top of file
examples/feature_selection/plot_select_from_model_boston.py:17:1: E402 module level import not at top of file
examples/feature_selection/plot_select_from_model_boston.py:18:1: E402 module level import not at top of file
examples/feature_selection/plot_select_from_model_boston.py:19:1: E402 module level import not at top of file
examples/feature_selection/plot_select_from_model_boston.py:25:80: E501 line too long (84 > 79 characters)
examples/feature_selection/plot_select_from_model_boston.py:46:29: W291 trailing whitespace
examples/gaussian_process/plot_compare_gpr_krr.py:53:1: E402 module level import not at top of file
examples/gaussian_process/plot_compare_gpr_krr.py:55:1: E402 module level import not at top of file
examples/gaussian_process/plot_compare_gpr_krr.py:57:1: E402 module level import not at top of file
examples/gaussian_process/plot_compare_gpr_krr.py:59:1: E402 module level import not at top of file
examples/gaussian_process/plot_compare_gpr_krr.py:60:1: E402 module level import not at top of file
examples/gaussian_process/plot_compare_gpr_krr.py:61:1: E402 module level import not at top of file
examples/gaussian_process/plot_compare_gpr_krr.py:62:1: E402 module level import not at top of file
examples/gaussian_process/plot_compare_gpr_krr.py:113:80: E501 line too long (80 > 79 characters)
examples/gaussian_process/plot_gpr_co2.py:66:1: E402 module level import not at top of file
examples/gaussian_process/plot_gpr_co2.py:68:1: E402 module level import not at top of file
examples/gaussian_process/plot_gpr_co2.py:70:1: E402 module level import not at top of file
examples/gaussian_process/plot_gpr_co2.py:71:1: E402 module level import not at top of file
examples/gaussian_process/plot_gpr_co2.py:73:1: E402 module level import not at top of file
examples/linear_model/plot_huber_vs_ridge.py:20:1: E402 module level import not at top of file
examples/linear_model/plot_huber_vs_ridge.py:21:1: E402 module level import not at top of file
examples/linear_model/plot_huber_vs_ridge.py:23:1: E402 module level import not at top of file
examples/linear_model/plot_huber_vs_ridge.py:24:1: E402 module level import not at top of file
examples/linear_model/plot_iris_logistic.py:21:1: E402 module level import not at top of file
examples/linear_model/plot_iris_logistic.py:22:1: E402 module level import not at top of file
examples/linear_model/plot_iris_logistic.py:23:1: E402 module level import not at top of file
examples/linear_model/plot_logistic.py:21:1: E402 module level import not at top of file
examples/linear_model/plot_logistic.py:22:1: E402 module level import not at top of file
examples/linear_model/plot_logistic.py:24:1: E402 module level import not at top of file
examples/linear_model/plot_ols_ridge_variance.py:31:1: E402 module level import not at top of file
examples/linear_model/plot_ols_ridge_variance.py:32:1: E402 module level import not at top of file
examples/linear_model/plot_ols_ridge_variance.py:34:1: E402 module level import not at top of file
examples/linear_model/plot_omp.py:11:1: E402 module level import not at top of file
examples/linear_model/plot_omp.py:12:1: E402 module level import not at top of file
examples/linear_model/plot_omp.py:13:1: E402 module level import not at top of file
examples/linear_model/plot_omp.py:14:1: E402 module level import not at top of file
examples/linear_model/plot_omp.py:15:1: E402 module level import not at top of file
examples/linear_model/plot_polynomial_interpolation.py:30:1: E402 module level import not at top of file
examples/linear_model/plot_polynomial_interpolation.py:31:1: E402 module level import not at top of file
examples/linear_model/plot_polynomial_interpolation.py:33:1: E402 module level import not at top of file
examples/linear_model/plot_polynomial_interpolation.py:34:1: E402 module level import not at top of file
examples/linear_model/plot_polynomial_interpolation.py:35:1: E402 module level import not at top of file
examples/linear_model/plot_ridge_coeffs.py:44:1: E402 module level import not at top of file
examples/linear_model/plot_ridge_coeffs.py:45:1: E402 module level import not at top of file
examples/linear_model/plot_ridge_coeffs.py:47:1: E402 module level import not at top of file
examples/linear_model/plot_ridge_coeffs.py:48:1: E402 module level import not at top of file
examples/linear_model/plot_ridge_coeffs.py:49:1: E402 module level import not at top of file
examples/linear_model/plot_robust_fit.py:69:80: E501 line too long (101 > 79 characters)
examples/linear_model/plot_robust_fit.py:70:80: E501 line too long (83 > 79 characters)
examples/manifold/plot_mds.py:16:1: E402 module level import not at top of file
examples/manifold/plot_mds.py:18:1: E402 module level import not at top of file
examples/manifold/plot_mds.py:19:1: E402 module level import not at top of file
examples/manifold/plot_mds.py:21:1: E402 module level import not at top of file
examples/manifold/plot_mds.py:22:1: E402 module level import not at top of file
examples/manifold/plot_mds.py:23:1: E402 module level import not at top of file
examples/manifold/plot_mds.py:68:80: E501 line too long (80 > 79 characters)
examples/neural_networks/plot_mnist_filters.py:25:1: E402 module level import not at top of file
examples/neural_networks/plot_mnist_filters.py:26:1: E402 module level import not at top of file
examples/neural_networks/plot_mnist_filters.py:27:1: E402 module level import not at top of file
sklearn/cluster/tests/common.py:22:22: E124 closing bracket does not match visual indentation
sklearn/externals/six.py:12:80: E501 line too long (80 > 79 characters)
sklearn/externals/six.py:44:20: F821 undefined name 'basestring'
sklearn/externals/six.py:45:27: F821 undefined name 'long'
sklearn/externals/six.py:47:17: F821 undefined name 'unicode'
sklearn/externals/six.py:134:1: E303 too many blank lines (3)
sklearn/externals/six.py:141:80: E501 line too long (91 > 79 characters)
sklearn/externals/six.py:151:80: E501 line too long (91 > 79 characters)
sklearn/externals/six.py:161:80: E501 line too long (87 > 79 characters)
sklearn/externals/six.py:174:80: E501 line too long (80 > 79 characters)
sklearn/externals/six.py:175:80: E501 line too long (80 > 79 characters)
sklearn/externals/six.py:188:80: E501 line too long (82 > 79 characters)
sklearn/externals/six.py:189:80: E501 line too long (82 > 79 characters)
sklearn/externals/six.py:190:80: E501 line too long (82 > 79 characters)
sklearn/externals/six.py:202:1: E303 too many blank lines (3)
sklearn/externals/six.py:226:80: E501 line too long (111 > 79 characters)
sklearn/externals/six.py:227:80: E501 line too long (111 > 79 characters)
sklearn/externals/six.py:243:80: E501 line too long (111 > 79 characters)
sklearn/externals/six.py:244:80: E501 line too long (111 > 79 characters)
sklearn/externals/six.py:266:80: E501 line too long (83 > 79 characters)
sklearn/externals/six.py:289:80: E501 line too long (117 > 79 characters)
sklearn/externals/six.py:290:80: E501 line too long (117 > 79 characters)
sklearn/externals/six.py:307:80: E501 line too long (120 > 79 characters)
sklearn/externals/six.py:308:80: E501 line too long (120 > 79 characters)
sklearn/externals/six.py:322:80: E501 line too long (129 > 79 characters)
sklearn/externals/six.py:323:80: E501 line too long (129 > 79 characters)
sklearn/externals/six.py:327:80: E501 line too long (83 > 79 characters)
sklearn/externals/six.py:335:80: E501 line too long (93 > 79 characters)
sklearn/externals/six.py:433:1: E302 expected 2 blank lines, found 1
sklearn/externals/six.py:437:1: E302 expected 2 blank lines, found 1
sklearn/externals/six.py:441:1: E302 expected 2 blank lines, found 1
sklearn/externals/six.py:449:5: E301 expected 1 blank line, found 0
sklearn/externals/six.py:467:5: E301 expected 1 blank line, found 0
sklearn/externals/six.py:468:16: F821 undefined name 'unicode'
sklearn/externals/six.py:471:5: E301 expected 1 blank line, found 0
sklearn/externals/six.py:473:5: E301 expected 1 blank line, found 0
sklearn/externals/six.py:475:5: E301 expected 1 blank line, found 0
sklearn/externals/six.py:488:5: E303 too many blank lines (2)
sklearn/externals/six.py:494:5: E303 too many blank lines (2)
sklearn/externals/six.py:511:5: E303 too many blank lines (2)
sklearn/externals/six.py:516:5: E303 too many blank lines (2)
sklearn/externals/six.py:521:9: E301 expected 1 blank line, found 0
sklearn/externals/six.py:522:37: F821 undefined name 'basestring'
sklearn/externals/six.py:528:32: F821 undefined name 'unicode'
sklearn/externals/six.py:534:32: F821 undefined name 'unicode'
sklearn/externals/six.py:542:36: F821 undefined name 'unicode'
sklearn/externals/six.py:546:23: F821 undefined name 'unicode'
sklearn/externals/six.py:547:21: F821 undefined name 'unicode'
sklearn/externals/six.py:568:1: E302 expected 2 blank lines, found 1
sklearn/utils/_scipy_sparse_lsqr_backport.py:56:1: E402 module level import not at top of file
sklearn/utils/_scipy_sparse_lsqr_backport.py:57:1: E402 module level import not at top of file
sklearn/utils/_scipy_sparse_lsqr_backport.py:58:1: E402 module level import not at top of file
sklearn/utils/_scipy_sparse_lsqr_backport.py:262:10: E128 continuation line under-indented for visual indent
sklearn/utils/_scipy_sparse_lsqr_backport.py:263:10: E128 continuation line under-indented for visual indent
sklearn/utils/_scipy_sparse_lsqr_backport.py:264:10: E128 continuation line under-indented for visual indent
sklearn/utils/_scipy_sparse_lsqr_backport.py:265:10: E128 continuation line under-indented for visual indent
sklearn/utils/_scipy_sparse_lsqr_backport.py:266:10: E128 continuation line under-indented for visual indent
sklearn/utils/_scipy_sparse_lsqr_backport.py:267:10: E128 continuation line under-indented for visual indent
sklearn/utils/_scipy_sparse_lsqr_backport.py:268:10: E128 continuation line under-indented for visual indent
sklearn/utils/_scipy_sparse_lsqr_backport.py:284:5: F841 local variable 'nstop' is assigned to but never used
sklearn/utils/_scipy_sparse_lsqr_backport.py:303:5: F841 local variable '__xm' is assigned to but never used
sklearn/utils/_scipy_sparse_lsqr_backport.py:304:5: F841 local variable '__xn' is assigned to but never used
sklearn/utils/tests/test_shortest_path.py:12:5: E265 block comment should start with '# '
sklearn/utils/tests/test_shortest_path.py:15:5: E265 block comment should start with '# '
sklearn/utils/tests/test_shortest_path.py:32:5: E265 block comment should start with '# '
sklearn/utils/tests/test_shortest_path.py:36:5: E265 block comment should start with '# '
sklearn/utils/tests/test_shortest_path.py:39:5: E265 block comment should start with '# '
sklearn/utils/tests/test_shortest_path.py:43:5: E265 block comment should start with '# '
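
To see which kinds of warning dominate a listing like the one above, the lines can be tallied per error code and per file. This is a minimal sketch, assuming flake8's default `path:line:col: CODE message` output format; the function name `tally_flake8` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "pkg/module.py:12:5: E265 block comment should ..."
FLAKE8_LINE = re.compile(
    r'^(?P<path>[^:]+):(?P<line>\d+):(?P<col>\d+): (?P<code>[A-Z]\d+) '
)


def tally_flake8(lines):
    """Return (count per error code, count per file) for flake8 output lines."""
    by_code, by_file = Counter(), Counter()
    for line in lines:
        match = FLAKE8_LINE.match(line)
        if match is None:
            continue  # skip blank or unrecognised lines
        by_code[match['code']] += 1
        by_file[match['path']] += 1
    return by_code, by_file


# A few lines copied from the listing above.
sample = [
    "sklearn/externals/six.py:547:21: F821 undefined name 'unicode'",
    "sklearn/externals/six.py:568:1: E302 expected 2 blank lines, found 1",
    "sklearn/utils/tests/test_shortest_path.py:12:5: E265 block comment should start with '# '",
    "sklearn/utils/tests/test_shortest_path.py:15:5: E265 block comment should start with '# '",
]
by_code, by_file = tally_flake8(sample)
print(by_code.most_common())
print(by_file.most_common())
```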

The following files are modified by a single open PR:

cat lines-modified-merge* | cut -f1-2 | sort -u | cut -f2 | sort | uniq -c | sort -n | awk '$1 == 1 {print $2}'
.codecov.yml
COPYING
ISSUE_TEMPLATE.md
PULL_REQUEST_TEMPLATE.md
benchmarks/.gitignore
benchmarks/bench_isolation_forest.py
benchmarks/bench_lof.py
benchmarks/bench_plot_fastkmeans.py
benchmarks/bench_plot_incremental_pca.py
benchmarks/bench_plot_omp_lars.py
benchmarks/bench_plot_svd.py
benchmarks/bench_plot_ward.py
benchmarks/bench_rcv1_logreg_convergence.py
benchmarks/bench_sample_without_replacement.py
benchmarks/bench_sparsify.py
benchmarks/bench_text_vectorizers.py
benchmarks/bench_tsne_mnist.py
benchmarks/plot_tsne_mnist.py
build_tools/appveyor/requirements.txt
build_tools/circle/checkout_merge_commit.sh
build_tools/travis/after_success.sh
build_tools/windows/windows_testing_downloader.ps1
doc/datasets/labeled_faces.rst
doc/datasets/olivetti_faces.rst
doc/datasets/rcv1.rst
doc/datasets/twenty_newsgroups.rst
doc/developers/index.rst
doc/make.bat
doc/modules/cross_decomposition.rst
doc/modules/density.rst
doc/modules/kernel_ridge.rst
doc/modules/label_propagation.rst
doc/modules/lda_qda.rst
doc/modules/random_projection.rst
doc/presentations.rst
doc/templates/numpydoc_docstring.rst
doc/themes/scikit-learn/static/ML_MAPS_README.rst
doc/themes/scikit-learn/static/jquery.js
doc/themes/scikit-learn/static/js/extra.js
doc/tutorial/index.rst
doc/tutorial/machine_learning_map/ML_MAPS_README.txt
doc/tutorial/machine_learning_map/index.rst
doc/tutorial/machine_learning_map/pyparsing.py
doc/tutorial/machine_learning_map/svg2imagemap.py
doc/tutorial/statistical_inference/putting_together.rst
doc/tutorial/statistical_inference/settings.rst
doc/tutorial/text_analytics/data/languages/fetch_data.py
doc/tutorial/text_analytics/skeletons/exercise_01_language_train_model.py
doc/tutorial/text_analytics/skeletons/exercise_02_sentiment.py
doc/tutorial/text_analytics/solutions/exercise_01_language_train_model.py
doc/whats_new/v0.13.rst
doc/whats_new/v0.14.rst
doc/whats_new/v0.15.rst
doc/whats_new/v0.16.rst
doc/whats_new/v0.17.rst
doc/whats_new/v0.18.rst
doc/whats_new/v0.19.rst
examples/.flake8
examples/applications/plot_face_recognition.py
examples/applications/plot_model_complexity_influence.py
examples/applications/plot_out_of_core_classification.py
examples/applications/plot_outlier_detection_housing.py
examples/applications/plot_prediction_latency.py
examples/applications/plot_species_distribution_modeling.py
examples/calibration/README.txt
examples/calibration/plot_calibration_multiclass.py
examples/classification/plot_digits_classification.py
examples/cluster/plot_agglomerative_clustering.py
examples/cluster/plot_agglomerative_clustering_metrics.py
examples/cluster/plot_birch_vs_minibatchkmeans.py
examples/cluster/plot_cluster_iris.py
examples/cluster/plot_coin_segmentation.py
examples/cluster/plot_coin_ward_segmentation.py
examples/cluster/plot_color_quantization.py
examples/cluster/plot_dict_face_patches.py
examples/cluster/plot_digits_agglomeration.py
examples/cluster/plot_digits_linkage.py
examples/cluster/plot_feature_agglomeration_vs_univariate_selection.py
examples/cluster/plot_kmeans_assumptions.py
examples/cluster/plot_kmeans_digits.py
examples/cluster/plot_kmeans_silhouette_analysis.py
examples/cluster/plot_linkage_comparison.py
examples/cluster/plot_mini_batch_kmeans.py
examples/cluster/plot_segmentation_toy.py
examples/cluster/plot_ward_structured_vs_unstructured.py
examples/compose/README.txt
examples/compose/plot_column_transformer.py
examples/compose/plot_column_transformer_mixed_types.py
examples/compose/plot_compare_reduction.py
examples/compose/plot_digits_pipe.py
examples/compose/plot_feature_union.py
examples/compose/plot_transformed_target.py
examples/covariance/plot_covariance_estimation.py
examples/covariance/plot_mahalanobis_distances.py
examples/covariance/plot_robust_vs_empirical_covariance.py
examples/covariance/plot_sparse_cov.py
examples/cross_decomposition/plot_compare_cross_decomposition.py
examples/datasets/plot_digits_last_image.py
examples/datasets/plot_iris_dataset.py
examples/datasets/plot_random_dataset.py
examples/decomposition/plot_ica_blind_source_separation.py
examples/decomposition/plot_ica_vs_pca.py
examples/decomposition/plot_kernel_pca.py
examples/decomposition/plot_pca_3d.py
examples/decomposition/plot_pca_vs_fa_model_selection.py
examples/decomposition/plot_sparse_coding.py
examples/ensemble/plot_adaboost_hastie_10_2.py
examples/ensemble/plot_adaboost_multiclass.py
examples/ensemble/plot_adaboost_regression.py
examples/ensemble/plot_adaboost_twoclass.py
examples/ensemble/plot_bias_variance.py
examples/ensemble/plot_forest_iris.py
examples/ensemble/plot_gradient_boosting_regression.py
examples/ensemble/plot_gradient_boosting_regularization.py
examples/ensemble/plot_partial_dependence.py
examples/ensemble/plot_random_forest_regression_multioutput.py
examples/ensemble/plot_voting_decision_regions.py
examples/ensemble/plot_voting_probas.py
examples/exercises/plot_cv_diabetes.py
examples/exercises/plot_digits_classification_exercise.py
examples/exercises/plot_iris_exercise.py
examples/feature_selection/plot_f_test_vs_mi.py
examples/feature_selection/plot_feature_selection.py
examples/feature_selection/plot_permutation_test_for_classification.py
examples/gaussian_process/plot_gpc.py
examples/gaussian_process/plot_gpc_iris.py
examples/gaussian_process/plot_gpc_isoprobability.py
examples/gaussian_process/plot_gpc_xor.py
examples/gaussian_process/plot_gpr_noisy.py
examples/gaussian_process/plot_gpr_noisy_targets.py
examples/gaussian_process/plot_gpr_prior_posterior.py
examples/linear_model/plot_ard.py
examples/linear_model/plot_bayesian_ridge.py
examples/linear_model/plot_lasso_dense_vs_sparse_data.py
examples/linear_model/plot_lasso_lars.py
examples/linear_model/plot_lasso_model_selection.py
examples/linear_model/plot_logistic_l1_l2_sparsity.py
examples/linear_model/plot_logistic_multinomial.py
examples/linear_model/plot_logistic_path.py
examples/linear_model/plot_multi_task_lasso_support.py
examples/linear_model/plot_ols.py
examples/linear_model/plot_ols_3d.py
examples/linear_model/plot_ransac.py
examples/linear_model/plot_ridge_path.py
examples/linear_model/plot_sgd_comparison.py
examples/linear_model/plot_sgd_iris.py
examples/linear_model/plot_sgd_loss_functions.py
examples/linear_model/plot_sgd_penalties.py
examples/linear_model/plot_sgd_separating_hyperplane.py
examples/linear_model/plot_sgd_weighted_samples.py
examples/linear_model/plot_sparse_logistic_regression_20newsgroups.py
examples/linear_model/plot_sparse_logistic_regression_mnist.py
examples/linear_model/plot_theilsen.py
examples/manifold/plot_swissroll.py
examples/manifold/plot_t_sne_perplexity.py
examples/mixture/plot_concentration_prior.py
examples/model_selection/grid_search_text_feature_extraction.py
examples/model_selection/plot_cv_predict.py
examples/model_selection/plot_grid_search_digits.py
examples/model_selection/plot_learning_curve.py
examples/model_selection/plot_multi_metric_evaluation.py
examples/model_selection/plot_roc.py
examples/model_selection/plot_roc_crossval.py
examples/model_selection/plot_train_error_vs_test_error.py
examples/model_selection/plot_underfitting_overfitting.py
examples/model_selection/plot_validation_curve.py
examples/multioutput/README.txt
examples/multioutput/plot_classifier_chain_yeast.py
examples/neighbors/plot_digits_kde_sampling.py
examples/neighbors/plot_species_kde.py
examples/neural_networks/plot_mlp_alpha.py
examples/neural_networks/plot_mlp_training_curves.py
examples/plot_isotonic_regression.py
examples/plot_johnson_lindenstrauss_bound.py
examples/plot_kernel_ridge_regression.py
examples/plot_multilabel.py
examples/plot_multioutput_face_completion.py
examples/preprocessing/plot_all_scaling.py
examples/semi_supervised/plot_label_propagation_digits.py
examples/semi_supervised/plot_label_propagation_digits_active_learning.py
examples/semi_supervised/plot_label_propagation_versus_svm_iris.py
examples/svm/plot_custom_kernel.py
examples/svm/plot_separating_hyperplane.py
examples/svm/plot_svm_anova.py
examples/svm/plot_svm_kernels.py
examples/svm/plot_svm_margin.py
examples/svm/plot_svm_nonlinear.py
examples/svm/plot_svm_scale_c.py
examples/svm/plot_weighted_samples.py
examples/text/plot_hashing_vs_dict_vectorizer.py
examples/tree/plot_tree_regression.py
examples/tree/plot_tree_regression_multioutput.py
examples/tree/plot_unveil_tree_structure.py
sklearn/__check_build/__init__.py
sklearn/_config.py
sklearn/cluster/_dbscan_inner.pyx
sklearn/cluster/_hierarchical.pyx
sklearn/cluster/_k_means_elkan.pyx
sklearn/cluster/setup.py
sklearn/compose/tests/test_target.py
sklearn/covariance/__init__.py
sklearn/covariance/robust_covariance.py
sklearn/covariance/shrunk_covariance_.py
sklearn/covariance/tests/test_covariance.py
sklearn/covariance/tests/test_elliptic_envelope.py
sklearn/datasets/_svmlight_format.pyx
sklearn/datasets/data/boston_house_prices.csv
sklearn/datasets/data/breast_cancer.csv
sklearn/datasets/data/iris.csv
sklearn/datasets/data/wine_data.csv
sklearn/datasets/descr/boston_house_prices.rst
sklearn/datasets/descr/diabetes.rst
sklearn/datasets/descr/iris.rst
sklearn/datasets/descr/linnerud.rst
sklearn/datasets/descr/wine_data.rst
sklearn/datasets/setup.py
sklearn/datasets/tests/test_california_housing.py
sklearn/datasets/tests/test_common.py
sklearn/datasets/tests/test_covtype.py
sklearn/datasets/tests/test_kddcup99.py
sklearn/datasets/tests/test_lfw.py
sklearn/datasets/tests/test_mldata.py
sklearn/datasets/tests/test_rcv1.py
sklearn/decomposition/cdnmf_fast.pyx
sklearn/decomposition/tests/test_factor_analysis.py
sklearn/externals/README
sklearn/externals/conftest.py
sklearn/externals/joblib/disk.py
sklearn/feature_extraction/setup.py
sklearn/feature_selection/mutual_info_.py
sklearn/feature_selection/tests/test_base.py
sklearn/feature_selection/tests/test_chi2.py
sklearn/gaussian_process/__init__.py
sklearn/gaussian_process/correlation_models.py
sklearn/gaussian_process/regression_models.py
sklearn/linear_model/sgd_fast.pxd
sklearn/linear_model/sgd_fast_helpers.h
sklearn/linear_model/tests/test_omp.py
sklearn/manifold/setup.py
sklearn/manifold/tests/test_locally_linear.py
sklearn/metrics/cluster/expected_mutual_info_fast.pyx
sklearn/metrics/cluster/setup.py
sklearn/mixture/__init__.py
sklearn/neighbors/nearest_centroid.py
sklearn/neighbors/quad_tree.pxd
sklearn/neighbors/quad_tree.pyx
sklearn/neighbors/tests/test_quad_tree.py
sklearn/neighbors/typedefs.pyx
sklearn/preprocessing/_encoders.py
sklearn/preprocessing/tests/test_encoders.py
sklearn/semi_supervised/__init__.py
sklearn/src/cblas/atlas_misc.h
sklearn/src/cblas/atlas_reflevel1.h
sklearn/src/cblas/cblas.h
sklearn/src/cblas/cblas_dcopy.c
sklearn/src/cblas/cblas_scopy.c
sklearn/src/cblas/cblas_xerbla.c
sklearn/svm/bounds.py
sklearn/svm/liblinear.pxd
sklearn/svm/liblinear.pyx
sklearn/svm/libsvm.pxd
sklearn/svm/setup.py
sklearn/svm/src/liblinear/linear.h
sklearn/svm/src/libsvm/libsvm_sparse_helper.c
sklearn/tests/test_init.py
sklearn/tests/test_random_projection.py
sklearn/tree/setup.py
sklearn/utils/bench.py
sklearn/utils/fast_dict.pxd
sklearn/utils/graph_shortest_path.pyx
sklearn/utils/sparsetools/__init__.py
sklearn/utils/tests/test_deprecation.py
sklearn/utils/tests/test_metaestimators.py
sklearn/utils/tests/test_stats.py
sklearn/utils/weight_vector.pxd

The following are modified by 2 open PRs:

.circleci/config.yml
.mailmap
CONTRIBUTING.md
benchmarks/bench_covertype.py
benchmarks/bench_mnist.py
benchmarks/bench_plot_lasso_path.py
benchmarks/bench_plot_randomized_svd.py
benchmarks/bench_saga.py
benchmarks/bench_sgd_regression.py
build_tools/circle/list_versions.py
build_tools/travis/flake8_diff.sh
conftest.py
doc/README.md
doc/conftest.py
doc/data_transforms.rst
doc/datasets/covtype.rst
doc/datasets/kddcup99.rst
doc/developers/maintainer.rst
doc/modules/computational_performance.rst
doc/modules/covariance.rst
doc/modules/learning_curve.rst
doc/modules/unsupervised_reduction.rst
doc/sphinxext/sphinx_issues.py
doc/supervised_learning.rst
doc/tutorial/machine_learning_map/parse_path.py
doc/user_guide.rst
doc/whats_new/older_versions.rst
examples/README.txt
examples/applications/plot_tomography_l1_reconstruction.py
examples/applications/wikipedia_principal_eigenvector.py
examples/classification/plot_classifier_comparison.py
examples/classification/plot_lda_qda.py
examples/cluster/plot_adjusted_for_chance_measures.py
examples/cluster/plot_dbscan.py
examples/cluster/plot_face_compress.py
examples/cluster/plot_kmeans_stability_low_dim_dense.py
examples/decomposition/plot_pca_iris.py
examples/ensemble/plot_ensemble_oob.py
examples/ensemble/plot_feature_transformation.py
examples/ensemble/plot_gradient_boosting_early_stopping.py
examples/ensemble/plot_isolation_forest.py
examples/ensemble/plot_random_forest_embedding.py
examples/feature_selection/plot_feature_selection_pipeline.py
examples/linear_model/plot_lasso_and_elasticnet.py
examples/manifold/plot_compare_methods.py
examples/manifold/plot_lle_digits.py
examples/manifold/plot_manifold_sphere.py
examples/model_selection/plot_confusion_matrix.py
examples/model_selection/plot_randomized_search.py
examples/neighbors/plot_kde_1d.py
examples/neighbors/plot_lof.py
examples/neighbors/plot_nearest_centroid.py
examples/neural_networks/plot_rbm_logistic_classification.py
examples/preprocessing/plot_power_transformer.py
examples/preprocessing/plot_scaling_importance.py
examples/semi_supervised/plot_label_propagation_structure.py
examples/svm/plot_iris.py
examples/svm/plot_oneclass.py
examples/svm/plot_rbf_parameters.py
examples/svm/plot_separating_hyperplane_unbalanced.py
examples/svm/plot_svm_regression.py
examples/tree/plot_iris.py
sklearn/_build_utils/__init__.py
sklearn/_isotonic.pyx
sklearn/cluster/_feature_agglomeration.py
sklearn/cluster/bicluster.py
sklearn/cluster/tests/test_affinity_propagation.py
sklearn/cluster/tests/test_feature_agglomeration.py
sklearn/compose/__init__.py
sklearn/covariance/elliptic_envelope.py
sklearn/covariance/empirical_covariance_.py
sklearn/covariance/tests/test_graphical_lasso.py
sklearn/covariance/tests/test_robust_covariance.py
sklearn/cross_decomposition/__init__.py
sklearn/datasets/descr/breast_cancer.rst
sklearn/datasets/descr/digits.rst
sklearn/datasets/mlcomp.py
sklearn/datasets/mldata.py
sklearn/datasets/tests/test_20news.py
sklearn/decomposition/base.py
sklearn/decomposition/tests/test_incremental_pca.py
sklearn/externals/_pilutil.py
sklearn/externals/copy_joblib.sh
sklearn/externals/funcsigs.py
sklearn/externals/joblib/_compat.py
sklearn/externals/joblib/_memory_helpers.py
sklearn/externals/joblib/_parallel_backends.py
sklearn/externals/joblib/backports.py
sklearn/externals/joblib/format_stack.py
sklearn/externals/joblib/logger.py
sklearn/externals/joblib/memory.py
sklearn/externals/joblib/numpy_pickle_compat.py
sklearn/externals/joblib/numpy_pickle_utils.py
sklearn/externals/setup.py
sklearn/feature_extraction/stop_words.py
sklearn/feature_extraction/tests/test_dict_vectorizer.py
sklearn/feature_selection/tests/test_mutual_info.py
sklearn/gaussian_process/tests/test_gaussian_process.py
sklearn/gaussian_process/tests/test_gpc.py
sklearn/linear_model/randomized_l1.py
sklearn/linear_model/sag_fast.pyx
sklearn/linear_model/tests/test_passive_aggressive.py
sklearn/linear_model/tests/test_perceptron.py
sklearn/linear_model/tests/test_theil_sen.py
sklearn/linear_model/theil_sen.py
sklearn/manifold/_barnes_hut_tsne.pyx
sklearn/manifold/_utils.pyx
sklearn/metrics/cluster/bicluster.py
sklearn/metrics/cluster/tests/test_bicluster.py
sklearn/metrics/cluster/tests/test_common.py
sklearn/metrics/setup.py
sklearn/mixture/dpgmm.py
sklearn/mixture/tests/test_bayesian_mixture.py
sklearn/mixture/tests/test_dpgmm.py
sklearn/mixture/tests/test_mixture.py
sklearn/neighbors/graph.py
sklearn/neighbors/tests/test_nearest_centroid.py
sklearn/neighbors/typedefs.pxd
sklearn/neural_network/_base.py
sklearn/svm/__init__.py
sklearn/svm/libsvm_sparse.pyx
sklearn/svm/src/liblinear/liblinear_helper.c
sklearn/svm/src/liblinear/linear.cpp
sklearn/svm/src/libsvm/libsvm_helper.c
sklearn/tests/test_config.py
sklearn/tests/test_docstring_parameters.py
sklearn/tests/test_isotonic.py
sklearn/tree/_criterion.pxd
sklearn/utils/_random.pxd
sklearn/utils/_unittest_backport.py
sklearn/utils/arpack.py
sklearn/utils/deprecation.py
sklearn/utils/linear_assignment_.py
sklearn/utils/sparsetools/setup.py
sklearn/utils/tests/test_graph.py
sklearn/utils/tests/test_random.py
sklearn/utils/tests/test_seq_dataset.py
sklearn/utils/tests/test_utils.py
sklearn/utils/weight_vector.pyx
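
For readers less comfortable with shell pipelines, the same grouping can be sketched in Python. This assumes the three tab-separated columns described earlier (PR#, file name, line#); the function names and the sample rows are invented for illustration and are not taken from the real results files.

```python
from collections import defaultdict


def files_by_pr_count(rows):
    """Map each file name to the set of PR numbers that touch it.

    `rows` are tab-separated strings with three columns: PR#, file name, line#.
    """
    prs_per_file = defaultdict(set)
    for row in rows:
        pr, path, _line = row.rstrip('\n').split('\t')
        prs_per_file[path].add(pr)
    return prs_per_file


def files_modified_by(rows, n_prs):
    """Sorted file names modified by exactly `n_prs` distinct PRs."""
    return sorted(
        path for path, prs in files_by_pr_count(rows).items()
        if len(prs) == n_prs
    )


# Made-up rows, for illustration only.
rows = [
    "101\tsklearn/svm/bounds.py\t10",
    "101\tsklearn/svm/bounds.py\t11",
    "102\tconftest.py\t3",
    "103\tconftest.py\t4",
]
print(files_modified_by(rows, 1))
print(files_modified_by(rows, 2))
```

Collecting PR numbers in a set plays the role of `sort -u` on the first two columns: a PR that touches several lines of one file is only counted once for that file.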

What do we think of inviting contributors to clean up flake8 issues, or to remove assert_{true,false,equal,dict_equal}, in these files?

@jnothman
Member Author

I should summarise: of 1122 files in the repo at 73b7d07:

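type        n PRs    n files
c           0        59
c           1        9
c           2        3
c           3        1
c           4-10     1
data        0        7
data        1        4
examples    0        59
examples    1        136
examples    2        35
examples    3        12
examples    4-10     10
externals   0        3
externals   1        3
externals   2        13
externals   3        3
externals   4-10     4
other       0        112
other       1        32
other       2        16
other       3        7
other       4-10     10
py          0        17
py          1        39
py          2        33
py          3        24
py          4-10     97
py          11-20    46
py          20+      10
rst         0        18
rst         1        30
rst         2        13
rst         3        13
rst         4-10     32
rst         11-20    8
rst         20+      4
tests       0        40
tests       1        22
tests       2        26
tests       3        25
tests       4-10     63
tests       11-20    22
tests       20+      1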

(Sorry: being lazy about visualisation)

Definition of type:

def path_type(path):
    if path.startswith('examples'):
        return 'examples'
    if '/externals/' in path:
        return 'externals'
    if '/tests/' in path:
        return 'tests'
    if path.endswith('.rst'):
        return 'rst'
    if not path.startswith('sklearn/'):
        return 'other'
    if any(path.endswith(ext) for ext in ['.c', '.h', '.cpp']):
        return 'c'
    if any(path.endswith(ext) for ext in ['.py', '.pxd', '.pyx', '.pxi']):
        return 'py'
    if any(path.endswith(ext) for ext in ['.jpg', '.csv', '.gz']):
        return 'data'
    return 'other'
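
A summary like the table above could be produced roughly as follows. This is only a sketch: `path_type` is copied from the definition above, while `pr_bucket`, `summarise` and the example counts are hypothetical. It also assumes that "20+" means strictly more than 20, since 20 already falls inside "11-20".

```python
from collections import Counter


def path_type(path):
    # Copied from the definition above.
    if path.startswith('examples'):
        return 'examples'
    if '/externals/' in path:
        return 'externals'
    if '/tests/' in path:
        return 'tests'
    if path.endswith('.rst'):
        return 'rst'
    if not path.startswith('sklearn/'):
        return 'other'
    if any(path.endswith(ext) for ext in ['.c', '.h', '.cpp']):
        return 'c'
    if any(path.endswith(ext) for ext in ['.py', '.pxd', '.pyx', '.pxi']):
        return 'py'
    if any(path.endswith(ext) for ext in ['.jpg', '.csv', '.gz']):
        return 'data'
    return 'other'


def pr_bucket(n_prs):
    """Bucket a PR count using the ranges from the table (assumed boundaries)."""
    if n_prs <= 3:
        return str(n_prs)
    if n_prs <= 10:
        return '4-10'
    if n_prs <= 20:
        return '11-20'
    return '20+'


def summarise(pr_counts):
    """Count files per (type, bucket); `pr_counts` maps path -> number of open PRs."""
    return Counter(
        (path_type(path), pr_bucket(n)) for path, n in pr_counts.items()
    )


# Hypothetical counts, for illustration only.
example = {
    'examples/svm/plot_iris.py': 2,
    'sklearn/svm/bounds.py': 1,
    'sklearn/svm/tests/test_svm.py': 12,
    'doc/user_guide.rst': 2,
    'sklearn/svm/src/liblinear/linear.h': 1,
    'conftest.py': 2,
}
for (kind, bucket), n_files in sorted(summarise(example).items()):
    print(kind, bucket, n_files)
```

Note that the order of the checks in `path_type` matters: a test file under `sklearn/` is classified as `tests` before its `.py` extension is ever looked at.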

@lesteve
Member

lesteve commented Jun 29, 2018

Wow this is quite an impressive endeavour! I am guessing that there is a bit of inaccuracy because the line numbers of the flake8 warnings change with history (or maybe this is just on a per-file level). I would still take that as a useful first-order estimate.

Personally, I have to admit I have a bit of a bias towards flake8ing all of the code (or at least as much as the consensus deems acceptable), sacrificing some old outstanding PRs and helping the people who need it on the ongoing, still-alive PRs.

Ideally it would be very nice to have some anecdotal evidence that conflicts (caused by flake8 or by some similar automated change) are sometimes the thing that prevents resuscitating a PR (or makes a PR die). Speaking for myself, I have had some cases where resolving the conflicts was too painful (#10663 reviving #4807 comes to mind). I ended up rewriting the code, with a lot of copying and pasting from the PR.

One of the arguments I have heard against flake8ing the code is that flake8 changes with time (PEP8 changes with time and is open to interpretation too) and that it's not something you do once and forget about it for the rest of your life. I definitely have seen cases where different versions of flake8 were giving different warnings but I would be (maybe too naively) optimistic that we can control this effect through ignoring flake8 warnings in setup.cfg or on a line-level through comments.
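
As a small illustration of the line-level option, flake8 honours a trailing `# noqa: <code>` comment, which silences the named warning on that line only. The snippet below is a contrived sketch: the `sys.path` tweak is a hypothetical reason for a late import, not something taken from the scikit-learn code base.

```python
import sys

# Hypothetical path tweak that has to run before the import below.
sys.path.insert(0, ".")

# E402 ("module level import not at top of file") is silenced for this line only.
import json  # noqa: E402

print(json.dumps({"ok": True}))
```

A project-wide ignore list would instead live in the `[flake8]` section of setup.cfg, which is a coarser tool since it hides the warning everywhere.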

Side-comment: if we decide to flake8 some parts of the code (e.g. tests may be a good candidate), there is no guarantee that flake8_diff.sh prevents flake8 errors from being reintroduced (mainly because the diff does not have enough context), so we will need to adapt our flake8 testing script.

@jnothman
Member Author

jnothman commented Jun 30, 2018 via email

@rth
Member

rth commented Jul 3, 2018

Very impressive analysis! This will be very useful for PRs introducing cosmetic changes.

What do we think of inviting contributors to clean up flake8 issues, or to remove assert_{true,false,equal,dict_equal}, in these files?

I'm still biased toward "yes", particularly for running nose2pytest on the test files; I don't really have a strong opinion about flake8.

Also, I was following with interest projects that started to use black (e.g. dask/dask-ml#237), but that's a whole other level of potential merge conflicts, so probably not even worth considering here (even assuming the advantages were unanimous, which is not the case I think).

@rth rth mentioned this issue Jul 6, 2018
@rth
Member

rth commented May 14, 2019

On a related topic Django recently accepted an Enhancement Proposal to re-format the whole code base with black (https://github.com/django/deps/blob/master/accepted/0008-black.rst) if I understood that right. They "only" have 230 open PRs but still.

@thomasjpfan
Member

The way dask-ml tries to enforce style is through its documentation and using pre-commit to manage pre-commit hooks.

Even without pre-commit hooks, if we can get contributors to run black on their PRs, all the merge conflicts should be resolved by themselves. Ideally, there would be no need to manually resolve merge conflicts.

@jnothman
Member Author

jnothman commented May 14, 2019 via email

@jnothman
Member Author

jnothman commented May 14, 2019 via email

@thomasjpfan
Member

Although black's approach to indentation is very different to ours.

That is the blocker for using black. scikit-learn prefers:

def hello_world(var_1, var_2, var_3,
                var_4, ...):
    pass

while black prefers:

def hello_world(
    var_1,
    var_2,
    var_3,
    ...
):
    pass

When not working in scikit-learn, I tend to prefer the black indentation, because it allows space for type annotations.
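
To make the point about type annotations concrete, here is a toy signature laid out one parameter per line, the way black formats a signature that is too long for a single line. The parameter types and defaults are hypothetical and only serve to show how much horizontal room each annotated parameter needs.

```python
from typing import Optional, Sequence


def hello_world(
    var_1: Sequence[float],
    var_2: Optional[int] = None,
    var_3: str = "auto",
    var_4: bool = False,
) -> str:
    """Toy function; the parameter types are hypothetical."""
    return f"{len(var_1)} values, var_2={var_2}, var_3={var_3}, var_4={var_4}"


print(hello_world([1.0, 2.0]))
```

With the aligned continuation style, each annotated parameter would have to fit in the space to the right of the opening parenthesis, which gets tight quickly for long function names.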

@NicolasHug
Member

I'd be OK with using black; the benefits seem to outweigh the downsides.

Merge conflicts with existing PRs would be trivial to solve, contributors would just have to run black.

@rth
Member

rth commented May 15, 2019

Merge conflicts with existing PRs would be trivial to solve, contributors would just have to run black.

black would help some, but there are still cases where conflicts would need to be resolved (e.g. if the same line was changed in 2 different ways).

@rth
Member

rth commented Apr 14, 2020

The issue is that, for instance, applying flake8 on the diff does not, after a certain point, make the code more PEP8 compatible: it does not converge.

flake8 --exclude=sklearn/externals sklearn/ | wc -l

On,

  • master - 227 errors
  • v0.22.1 - 282 errors
  • v0.21.0 - 209 errors
  • v0.20.0 - 268 errors
  • v0.19.0 - 482 errors

So honestly I think we should just fix these 200 LoC (of which many are removed/added newlines), be done with it, and remove ~160 lines of bash hacks in build_tools/circle/linting.sh in favor of flake8 sklearn/

@rth
Member

rth commented Apr 14, 2020

Or maybe we should just discuss using black in the next dev meeting..

@jnothman
Member Author

I can't say I like black's style all the time... but I agree that it's an easy way out. Happy to discuss.

@rth
Member

rth commented Apr 19, 2020

For code style, as far as I can tell, we have 3 possibilities:
  1. Keep the current situation, i.e. apply flake8 on the diff, and a few selected flake8 rules on the full code base (e.g. avoid unused imports). Its limitation is that we have to maintain a significant amount of code to check this, and new PRs may introduce PEP8-incompatible changes, particularly at the edges of the diff.
  2. Fix flake8 issues to address the limitation from 1, but otherwise keep the current situation.
  3. Use automatic code formatting such as black. I agree that using 1 parameter per line for longer functions (#11336 (comment)) is not necessarily ideal, though it could make more sense if we start to add some code annotations. Also, for functions that have 20+ input parameters (e.g. TfidfVectorizer) we have to consider whether there is an API issue to start with.

@jorisvandenbossche I see pandas applied black last year (pandas-dev/pandas#27076). Could you share your experience of it? Particularly with respect to managing the transition, whether it had an impact on merge conflicts for existing PRs, as well as what impact it had on PR review and interaction with contributors.

@chkoar
Contributor

chkoar commented May 22, 2020

I am +1 for the black adoption.

@amueller
Member

I think I'd vote 2, though I'd also be fine with 3. Completely agree with @jnothman :)

@NicolasHug
Member

Just an anecdotal remark in favor of black: I've written a bunch of PRs recently where I needed to make some global changes to the codebase (e.g. adding a strict_mode argument to all the checks, or replacing check_array by _validate_data).

Having black would have made my life much easier. Right now, I have to fix dozens (sometimes hundreds) of linting issues before I can push, otherwise I know the CI won't run most tests. It's a bit of a pain especially when you're just trying potential ideas.

@jnothman
Member Author

I think transitioning could look like:

  1. Make a PR to blacken master.
  2. Update documentation and potentially adopt pre-commit.com's black runner.
  3. Add a commit to all open PRs, automatically applying black.

I'm not yet sure about applying black to examples, where we'll often want to make the example visually clear. I'm thinking about the layout of a 2d matrix input specifically.

@chkoar
Contributor

chkoar commented Jun 26, 2020

I'm not yet sure about applying black to examples, where we'll often want to make the example visually clear.

That's an understandable point, but it is probably just something that we are used to.
IMHO having just black . removes the mental complexity on how to format the code.

@thomasjpfan
Member

If we are doing this, I would try to push for a slightly higher line-length (maybe 100). I recall that @adrinjalali was not happy with changing the line length.

@rth
Member

rth commented Jun 26, 2020

If we are doing this, I would try to push for a slightly higher line-length (maybe 100). I recall that @adrinjalali was not happy with changing the line length.

Also Gaël I think. 100 would be a bit too large even on my laptop I think. I would prefer either the default (88) or 79.

IMHO having just black . removes the mental complexity on how to format the code.

Yeah, asking contributors to rely on pre-commit for code changes except for examples where they would need to manually do it can be confusing.

I think transitioning could look like:

Sounds like a plan.

Add a commit to all open PRs, automatically applying black.

Yes, that would indeed be ideal. Something like,

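import subprocess

# get_open_prs() is a placeholder for whatever returns the open PR numbers
for pr_id in get_open_prs():
    subprocess.call(['hub', 'pr', 'checkout', str(pr_id)])
    subprocess.call(['black', '.'])
    # pass a message so that git does not open an editor
    subprocess.call(['git', 'commit', '-a', '-m', 'Apply black'])
    subprocess.call(['git', 'push'])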

We would need to think about how to a) maybe include manual validation in the beginning, b) store the state of what was migrated and what wasn't, c) make it more fault tolerant, and d) be aware that we probably won't be able to do it all in one run due to GitHub API limitations. The list of open PRs can be obtained with,

hub pr list -s open
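
Points b) and c) could be handled with a small state file, sketched below under stated assumptions. `migrate_prs`, the JSON layout and `fake_migrate` are all invented for illustration; in practice `migrate_one` would wrap the checkout, black, commit and push steps and raise on failure.

```python
import json
import tempfile
from pathlib import Path


def migrate_prs(pr_ids, migrate_one, state_path):
    """Apply `migrate_one` to each PR once, recording outcomes in a JSON file.

    `migrate_one` is a hypothetical callable that raises on failure.
    """
    state_file = Path(state_path)
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    for pr_id in pr_ids:
        key = str(pr_id)
        if state.get(key) == 'done':
            continue  # already migrated in an earlier run
        try:
            migrate_one(pr_id)
            state[key] = 'done'
        except Exception as exc:  # keep going; record the failure for a retry
            state[key] = f'failed: {exc}'
        # Persist after every PR so an interrupted run can be resumed.
        state_file.write_text(json.dumps(state, indent=2))
    return state


def fake_migrate(pr_id):
    # Stand-in for the real checkout/black/commit/push step.
    if pr_id == 2:
        raise RuntimeError('push rejected')


with tempfile.TemporaryDirectory() as tmp:
    print(migrate_prs([1, 2, 3], fake_migrate, Path(tmp) / 'state.json'))
```

Re-running with the same state file skips PRs marked as done and retries the failed ones, which also helps when the work has to be spread over several runs because of API limits.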

@mitar
Contributor

mitar commented Jul 3, 2020

Just 2 cents here: I was a heavy user of flake, pycodestyle, and other tools on my Python projects with many students working on them. I wanted students to learn good code style and the code to be consistent, as everyone was contributing to just parts of it. The downside was that students spent so much time trying to fix those style issues, because even if those tools detect issues, authors do not necessarily know how to fix them and resort to trial and error. Even more, some students never installed those tools locally and just used CI to check, so there was a crazy amount of commits trying to fix those and waiting for CI.

With black, we can just focus on code and ignore the style. Students are happier and they also learn good code style by seeing what black fixes, which eventually takes hold.

Since then I have also used Go, which also has a standard formatter, and it really makes life much easier. Even if you disagree with some style choices, in the end consistency is the most important thing, and there is really no reason to spend time manually fixing those consistency issues.

I also believe sklearn should get typing information, so black formatting of attributes makes sense in that context.

@jnothman
Member Author

jnothman commented Jul 4, 2020 via email

@cmarmo
Member

cmarmo commented Jan 15, 2022

Now that black has been adopted, am I wrong, or can this issue be closed?

@cmarmo cmarmo added the Needs Decision - Close Requires decision for closing label Jan 15, 2022
@adrinjalali
Member

Joel's script is a really nice way to measure impact for future potential changes. But I think the main issue is resolved.
