DOC Add links to KMeans examples in docstrings and the user guide #27799

marenwestermann · 2023-11-17T13:45:52Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Adds links to examples in the docstrings and the user guide which demonstrate how to use K-Means.

Any other comments?

I started with the example plot_cluster_iris.py and then realised that it probably makes sense to group all the links related to K-Means examples in one PR. So I will keep working on adding links to examples which show how to use K-Means.

Edit: the examples are

plot_cluster_iris.py
plot_color_quantization.py
plot_kmeans_assumptions.py
plot_kmeans_digits.py
plot_kmeans_silhouette_analysis.py
plot_mini_batch_kmeans.py
plot_document_clustering.py

Note: there can be more than one PR per example script because they might be referenced in different locations. For example, there is an existing open PR for plot_document_clustering.py which links this example in the docs of another estimator.

github-actions · 2023-11-17T13:47:17Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: ca33b93. Link to the linter CI: here

ArturoAmorQ

Thanks for the PR @marenwestermann! Here is a batch of comments :)

ArturoAmorQ · 2023-11-24T15:35:49Z

doc/modules/clustering.rst

@@ -218,7 +222,9 @@ initializations of the centroids. One method to help address this issue is the
 k-means++ initialization scheme, which has been implemented in scikit-learn
 (use the ``init='k-means++'`` parameter). This initializes the centroids to be
 (generally) distant from each other, leading to probably better results than
-random initialization, as shown in the reference.
+random initialization, as shown in the reference. For a detailed example of
+comaparing different initialization schemes refer to


Suggested change

-comaparing different initialization schemes refer to
+comparing different initialization schemes, refer to
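
To make the comparison mentioned in this hunk concrete, here is a minimal sketch that fits `KMeans` once with each initialization scheme and records the final inertia. The synthetic dataset, its size, and the seeds are illustrative assumptions and are not taken from the linked example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data; sizes and seeds are illustrative assumptions.
X, _ = make_blobs(n_samples=500, centers=5, cluster_std=1.0, random_state=0)

inertias = {}
for init in ("k-means++", "random"):
    # n_init=1 so that each fit reflects a single initialization.
    model = KMeans(n_clusters=5, init=init, n_init=1, random_state=0).fit(X)
    inertias[init] = model.inertia_

for init, inertia in inertias.items():
    print(f"{init:>10}: inertia = {inertia:.1f}")
```

A single run per scheme is only indicative. Repeating the loop over several seeds gives a fairer picture of how the two schemes differ.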

ArturoAmorQ · 2023-11-24T15:40:36Z

doc/modules/clustering.rst

@@ -231,7 +237,17 @@ weight of 2 to a sample is equivalent to adding a duplicate of that sample
 to the dataset :math:`X`.

 K-means can be used for vector quantization. This is achieved using the
-transform method of a trained model of :class:`KMeans`.
+transform method of a trained model of :class:`KMeans`. For an example of


Suggested change

-transform method of a trained model of :class:`KMeans`. For an example of
+`transform` method of a trained model of :class:`KMeans`. For an example of
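
As a rough illustration of the vector quantization described in this hunk, the sketch below uses `transform` to get the distance from each sample to every cluster center and then replaces each sample with its closest center. The data and parameter values are assumptions chosen for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data; sizes and seeds are illustrative assumptions.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# transform returns the distance from each sample to every cluster center.
distances = kmeans.transform(X)  # shape (n_samples, n_clusters)

# The index of the closest center acts as the code for each sample.
codes = np.argmin(distances, axis=1)

# Replacing each sample by its closest center gives the quantized data.
X_quantized = kmeans.cluster_centers_[codes]
```

In practice `kmeans.predict(X)` returns the codes directly. The explicit `argmin` is spelled out here only to show what `transform` provides.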

ArturoAmorQ · 2023-11-24T15:43:49Z

doc/modules/clustering.rst

+   using the iris dataset
+
+ * :ref:`sphx_glr_auto_examples_text_plot_document_clustering.py`: Document clustering
+   using KMeans and MiniBatchKMeans based on sparse data


Suggested change

-   using KMeans and MiniBatchKMeans based on sparse data
+   using :class:`KMeans` and :class:`MiniBatchKMeans` based on sparse data

ArturoAmorQ · 2023-11-24T15:44:11Z

doc/modules/clustering.rst

+
+.. topic:: Examples:
+
+ * :ref:`sphx_glr_auto_examples_cluster_plot_cluster_iris.py`: Example usage of K-Means


Suggested change

- * :ref:`sphx_glr_auto_examples_cluster_plot_cluster_iris.py`: Example usage of K-Means
+ * :ref:`sphx_glr_auto_examples_cluster_plot_cluster_iris.py`: Example usage of :class:`KMeans`

ArturoAmorQ · 2023-11-24T15:46:03Z

doc/modules/clustering.rst

 * :ref:`sphx_glr_auto_examples_cluster_plot_mini_batch_kmeans.py`: Comparison of KMeans and
   MiniBatchKMeans


Suggested change

- * :ref:`sphx_glr_auto_examples_cluster_plot_mini_batch_kmeans.py`: Comparison of KMeans and
-   MiniBatchKMeans
+ * :ref:`sphx_glr_auto_examples_cluster_plot_mini_batch_kmeans.py`: Comparison of
+   :class:`KMeans` and :class:`MiniBatchKMeans`

ArturoAmorQ · 2023-11-24T15:46:26Z

doc/modules/clustering.rst

- * :ref:`sphx_glr_auto_examples_text_plot_document_clustering.py`: Document clustering using sparse
-   MiniBatchKMeans
+ * :ref:`sphx_glr_auto_examples_text_plot_document_clustering.py`: Document clustering
+   using KMeans and MiniBatchKMeans based on sparse data


Suggested change

-   using KMeans and MiniBatchKMeans based on sparse data
+   using :class:`KMeans` and :class:`MiniBatchKMeans` based on sparse data
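
For readers unfamiliar with what "based on sparse data" means here, the following sketch fits both estimators on a SciPy sparse matrix. The random matrix only stands in for TF-IDF features, which is an assumption; the linked example works on real text data.

```python
from scipy import sparse
from sklearn.cluster import KMeans, MiniBatchKMeans

# A random sparse matrix stands in for TF-IDF features (an assumption).
X = sparse.random(200, 50, density=0.1, format="csr", random_state=0)

# Both estimators accept sparse input directly, without densifying it first.
kmeans = KMeans(n_clusters=3, n_init=5, random_state=0).fit(X)
mbk = MiniBatchKMeans(
    n_clusters=3, n_init=3, batch_size=64, random_state=0
).fit(X)

print(kmeans.labels_[:10])
print(mbk.labels_[:10])
```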

ArturoAmorQ · 2023-11-24T15:52:12Z

examples/cluster/plot_cluster_iris.py

- top right: What the effect of a bad initialization is
+- top right: What using three clusters would deliver.
+
+- bottom left: What the effect of a bad initialization is


Maybe this can be done in another PR, but currently it seems that the initialization is good. I would rather pass a fixed random_state to KMeans instead of setting a global np.random.seed.
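
A minimal sketch of the suggestion above: the seed is passed to the estimator through `random_state` and NumPy's global random state is left untouched. The seed value `0` is an arbitrary choice for illustration. Whether a given seed actually produces a poor initialization has to be checked by inspecting the result.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

# A single random initialization is one way to provoke a poor solution.
# The seed value 0 is an arbitrary, illustrative choice.
bad_init = KMeans(n_clusters=3, init="random", n_init=1, random_state=0)
labels_a = bad_init.fit_predict(X)

# Refitting with the same random_state reproduces the result without
# calling np.random.seed anywhere.
same_seed = KMeans(n_clusters=3, init="random", n_init=1, random_state=0)
labels_b = same_seed.fit_predict(X)

print((labels_a == labels_b).all())
```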

ArturoAmorQ · 2023-11-24T15:59:46Z

examples/text/plot_document_clustering.py

+# using the model results itself. In that case, the :ref:`Silhouette Coefficient
+# <sphx_glr_auto_examples_cluster_plot_kmeans_silhouette_analysis.py>` comes in handy.


I would rather say something similar to
"In that case the Silhouette analysis comes in handy. See sphx_glr_auto_examples_cluster_plot_kmeans_silhouette_analysis.py for an example on how to do it."
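
As a rough sketch of the silhouette analysis referred to above, the snippet below computes the mean silhouette coefficient for several values of `n_clusters`. The synthetic data and the range of cluster counts are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data; sizes and seeds are illustrative assumptions.
X, _ = make_blobs(n_samples=400, centers=4, random_state=0)

scores = {}
for n_clusters in range(2, 7):
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = model.fit_predict(X)
    # Mean silhouette coefficient over all samples, in the range [-1, 1].
    scores[n_clusters] = silhouette_score(X, labels)

for n_clusters, score in scores.items():
    print(f"n_clusters={n_clusters}: silhouette={score:.3f}")
```

Higher scores indicate better separated clusters, so the values can be compared across candidate cluster counts when no ground truth labels are available.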

ArturoAmorQ · 2023-11-24T16:03:12Z

examples/cluster/plot_color_quantization.py

@@ -41,7 +41,7 @@
 china = load_sample_image("china.jpg")

 # Convert to floats instead of the default 8 bits integer coding. Dividing by
-# 255 is important so that plt.imshow behaves works well on float data (need to
+# 255 is important so that plt.imshow works well on float data (need to


Nice catch!
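
For context on the comment fixed in this hunk: `plt.imshow` expects floating point RGB data in the range [0, 1], which is why the 8-bit values are divided by 255. A small sketch, using a random array as a stand-in for the sample image (an assumption):

```python
import numpy as np

# A random 8-bit RGB array stands in for the sample image (an assumption).
rng = np.random.default_rng(0)
image_uint8 = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

# Convert to floats and rescale from [0, 255] to [0, 1].
image_float = np.asarray(image_uint8, dtype=np.float64) / 255

print(image_float.dtype, image_float.min(), image_float.max())
```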

ArturoAmorQ

Now it does LGTM, thanks @marenwestermann and sorry for taking so long to answer! (I was/still am off on holidays)

plot_cluster_iris example

851e7b4

github-actions bot added module:cluster Documentation labels Nov 17, 2023

marenwestermann added 11 commits November 17, 2023 14:56

add plot_cluster_iris example to user guide

632a543

add links to plot_color_quantization

858571f

fix rendering issue

983865c

Merge remote-tracking branch 'upstream/main' into kmeans-examples

6edf701

add links to plot_kmeans_assumptions

54846ec

embed links in text

a4f05c8

add link to plot_kmeans_digits in user guide

48a0b30

add links to plot_kmeans_silhouette_analysis

5f85468

add link to plot_mini_batch_kmeans

57b1cf1

add link in plot_document_clustering.py

147c473

add links to plot_document_clustering

c921706

marenwestermann changed the title ~~DOC [WIP] Add links to KMeans examples in docstrings and the user guide~~ DOC Add links to KMeans examples in docstrings and the user guide Nov 24, 2023

ArturoAmorQ reviewed Nov 24, 2023

ArturoAmorQ mentioned this pull request Nov 24, 2023

Add links to examples from the docstrings and user guides #26927

Open

marenwestermann added 2 commits December 10, 2023 11:29

address comments

759da86

Merge remote-tracking branch 'upstream/main' into kmeans-examples

ca33b93

ArturoAmorQ approved these changes Jan 6, 2024

ArturoAmorQ merged commit 056864d into scikit-learn:main Jan 6, 2024
27 checks passed

marenwestermann deleted the kmeans-examples branch January 6, 2024 12:34

jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jan 17, 2024

DOC Add links to KMeans examples in docstrings and the user guide (sc…

20bdc90

…ikit-learn#27799)

jeremiedbb pushed a commit that referenced this pull request Jan 17, 2024

DOC Add links to KMeans examples in docstrings and the user guide (#2…

638244f

…7799)

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Feb 10, 2024

DOC Add links to KMeans examples in docstrings and the user guide (sc…

791c6c7

…ikit-learn#27799)
