### sklearn.cluster.MeanShift

_class_ sklearn.cluster.MeanShift(_*_, _bandwidth=None_, _seeds=None_, _bin_seeding=False_, _min_bin_freq=1_, _cluster_all=True_, _n_jobs=None_, _max_iter=300_)[[source]](https://github.com/scikit-learn/scikit-learn/blob/8c9c1f27b/sklearn/cluster/_mean_shift.py#L263)


Parameters:

**bandwidth**float, default=None

Bandwidth used in the flat kernel.

If not given, the bandwidth is estimated using sklearn.cluster.estimate_bandwidth; see the documentation for that function for hints on scalability (see also the Notes, below).
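As a rough illustration of how the estimate behaves, the sketch below calls `estimate_bandwidth` with a few values of its `quantile` argument (default 0.3). The synthetic data set and the chosen quantiles are assumptions made purely for illustration.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Synthetic data, chosen only for illustration.
X, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.7, random_state=0)

# A larger quantile considers more neighbors per point, so the estimate grows.
bw_small = estimate_bandwidth(X, quantile=0.1)
bw_default = estimate_bandwidth(X)  # quantile=0.3
bw_large = estimate_bandwidth(X, quantile=0.5)
print(bw_small, bw_default, bw_large)

# Leaving bandwidth=None lets MeanShift call estimate_bandwidth internally.
ms = MeanShift().fit(X)
print(ms.cluster_centers_.shape)
```

Because the estimate relies on a nearest-neighbor search over the data, it can become the dominant cost on large inputs; the `n_samples` argument of `estimate_bandwidth` limits how many points are used.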

**seeds**array-like of shape (n_samples, n_features), default=None

Seeds used to initialize kernels. If not set and bin_seeding is True, the seeds are calculated by sklearn.cluster.get_bin_seeds with bandwidth as the grid size; otherwise every point is used as a seed.

**bin_seeding**bool, default=False

If true, initial kernel locations are not locations of all points, but rather the location of the discretized version of points, where points are binned onto a grid whose coarseness corresponds to the bandwidth. Setting this option to True will speed up the algorithm because fewer seeds will be initialized. The default value is False. Ignored if seeds argument is not None.

**min_bin_freq**int, default=1

To speed up the algorithm, accept only those bins with at least min_bin_freq points as seeds.
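To make the effect of binning concrete, the following sketch calls `sklearn.cluster.get_bin_seeds` directly and compares the number of seeds with the number of points. The data set, the bandwidth of 1.5, and the `min_bin_freq` value of 5 are illustrative assumptions, not recommendations.

```python
from sklearn.cluster import MeanShift, get_bin_seeds
from sklearn.datasets import make_blobs

# Synthetic data, chosen only for illustration.
X, _ = make_blobs(n_samples=500, centers=3, cluster_std=0.7, random_state=0)
bandwidth = 1.5  # illustrative value

# Grid cells of size `bandwidth` holding at least min_bin_freq points become seeds.
seeds_all = get_bin_seeds(X, bandwidth, min_bin_freq=1)
seeds_dense = get_bin_seeds(X, bandwidth, min_bin_freq=5)
print(len(X), len(seeds_all), len(seeds_dense))

# The same seeding strategy, selected through the estimator's parameters.
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True, min_bin_freq=5).fit(X)
print(ms.cluster_centers_)
```

Raising `min_bin_freq` discards sparsely populated bins, so fewer hill-climbing runs are started.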

**cluster_all**bool, default=True

If true, then all points are clustered, even those orphans that are not within any kernel. Orphans are assigned to the nearest kernel. If false, then orphans are given cluster label -1.
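A small sketch of the difference, under contrived assumptions: the data consist of one tight group plus a single distant point, and one hand-picked seed is passed so that only one kernel exists. Neither choice is typical usage; they only make the orphan easy to see.

```python
import numpy as np
from sklearn.cluster import MeanShift

rng = np.random.default_rng(0)
# 50 points tightly packed around the origin plus one distant point.
X = np.vstack([rng.normal(scale=0.1, size=(50, 2)), [[10.0, 10.0]]])

# A single hand-picked seed, so that only one kernel exists (illustrative).
seeds = np.array([[0.0, 0.0]])

labels_all = MeanShift(bandwidth=1.0, seeds=seeds, cluster_all=True).fit_predict(X)
labels_orphans = MeanShift(bandwidth=1.0, seeds=seeds, cluster_all=False).fit_predict(X)

print(labels_all[-1])      # the distant point joins the nearest cluster
print(labels_orphans[-1])  # the distant point is labelled -1
```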

**n_jobs**int, default=None

The number of jobs to use for the computation. The following tasks benefit from the parallelization:

-   The search of nearest neighbors for bandwidth estimation and label assignments. See the details in the docstring of the  `NearestNeighbors`  class.
    
-   Hill-climbing optimization for all seeds.
    

`None` means 1 unless in a [`joblib.parallel_backend`](https://joblib.readthedocs.io/en/latest/parallel.html#joblib.parallel_backend) context. `-1` means using all processors. See [Glossary](https://scikit-learn.org/stable/glossary.html#term-n_jobs) for more details.

**max_iter**int, default=300

Maximum number of iterations per seed point before the clustering operation terminates (for that seed point) if it has not converged yet.

New in version 0.22.

Attributes:

**cluster_centers_**ndarray of shape (n_clusters, n_features)

Coordinates of cluster centers.

**labels_**ndarray of shape (n_samples,)

Labels of each point.

**n_iter_**int

Maximum number of iterations performed on each seed.

New in version 0.22.

**n_features_in_**int

Number of features seen during  [fit](https://scikit-learn.org/stable/glossary.html#term-fit).

New in version 0.24.

**feature_names_in_**ndarray of shape (`n_features_in_`,)

Names of features seen during  [fit](https://scikit-learn.org/stable/glossary.html#term-fit). Defined only when  `X`  has feature names that are all strings.
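The fitted attributes can be inspected as in the sketch below. The data set and the bandwidth of 1.5 are assumptions for illustration; the number of clusters found depends on both.

```python
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs

# Synthetic data, chosen only for illustration.
X, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.7, random_state=0)
ms = MeanShift(bandwidth=1.5, max_iter=300).fit(X)  # bandwidth is illustrative

n_clusters = ms.cluster_centers_.shape[0]
print(ms.cluster_centers_.shape)  # (n_clusters, n_features)
print(ms.labels_.shape)           # (n_samples,)
print(ms.n_iter_)                 # never exceeds max_iter
print(ms.n_features_in_)          # 2 for this data set

# predict assigns new points to the closest fitted center.
new_labels = ms.predict(X[:5])
print(new_labels)
```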

In [2]:
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import MeanShift

# Three Gaussian blobs in 2D; y holds the index of the generating blob
x, y = make_blobs(n_samples=200, n_features=2, centers=3, cluster_std=0.7, random_state=0)

# A narrow bandwidth over-segments the data (six clusters below)
mean_sh = MeanShift(bandwidth=0.8)
cluster_label = mean_sh.fit_predict(x)
np.unique(cluster_label)

array([0, 1, 2, 3, 4, 5], dtype=int64)

In [3]:
# A slightly wider bandwidth yields three clusters
mean_sh = MeanShift(bandwidth=1)
cluster_label = mean_sh.fit_predict(x)
np.unique(cluster_label)

array([0, 1, 2], dtype=int64)

In [4]:
from sklearn.cluster import estimate_bandwidth
est_bandwi = estimate_bandwidth(x)
print(est_bandwi)

1.8158484154517098


In [5]:
import pandas as pd

# Keep the features and the generating blob index together
cluster_df = pd.DataFrame(data=x, columns=["ftr1", "ftr2"])
cluster_df["target"] = y

# Use the estimated bandwidth instead of a hand-picked one
best_bandwi = estimate_bandwidth(x)

mean_sh_best = MeanShift(bandwidth=best_bandwi)
cluster_label_best = mean_sh_best.fit_predict(x)
print(np.unique(cluster_label_best))

[0 1 2]
