------------------------
#### sklearn.datasets.make_blobs
------------------------
- `make_blobs` is a function provided by the scikit-learn library in Python, specifically in the sklearn.datasets module. 
- It is used to generate synthetic datasets for clustering and classification tasks. 
- This function is particularly helpful for testing and prototyping machine learning algorithms.

#### Purpose:
- `make_blobs` generates isotropic Gaussian blobs and returns each point together with the label of the cluster it was drawn from. Because the true labels are known, the result is a simple dataset for practicing with and testing machine learning algorithms, particularly clustering algorithms.
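
As a hedged illustration of this testing use case, the sketch below generates well-separated blobs and measures how closely a clustering algorithm recovers the known labels. The center coordinates, `cluster_std=0.5`, and the choice of `KMeans` with `adjusted_rand_score` are assumptions made for this example and do not come from the notebook cells that follow.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Hypothetical, well-separated centers chosen only for this example
centers = np.array([[-5.0, 0.0], [0.0, 5.0], [5.0, 0.0]])

# y_true holds the index of the blob each point was drawn from
X, y_true = make_blobs(n_samples=300, centers=centers,
                       cluster_std=0.5, random_state=42)

# Fit KMeans without showing it the labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
y_pred = kmeans.fit_predict(X)

# The adjusted Rand index compares two labelings regardless of label order;
# values near 1.0 mean the clusters match the generated blobs
ari = adjusted_rand_score(y_true, y_pred)
print(f"Adjusted Rand index: {ari:.3f}")
```

Raising `cluster_std` or moving the centers closer together makes the blobs overlap, which is a quick way to see how an algorithm degrades on harder data.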

#### Parameters:
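- `n_samples` (int or array-like, optional, default=100): If an int, the total number of data points, divided equally among the clusters. If array-like, each element gives the number of samples for one cluster.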
- `n_features` (int, optional, default=2): The number of features for each data point.
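- `centers` (int or array of shape [n_centers, n_features], optional, default=None): The number of clusters to generate or the fixed center locations. In recent scikit-learn versions, leaving it as None with an integer `n_samples` generates 3 centers.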
- `cluster_std` (float or sequence of floats, optional, default=1.0): The standard deviation of each cluster. If it's a float, the same value is applied to all clusters. If it's an array, it specifies the standard deviation for each cluster.
- `center_box` (tuple, optional, default=(-10.0, 10.0)): The bounding box for the randomly generated cluster centers. By default, it is set between -10.0 and 10.0 for each feature.
- `shuffle` (boolean, optional, default=True): Whether to shuffle the samples. If set to False, the samples are arranged in clusters.
- `random_state` (int or RandomState, optional, default=None): Controls the random seed for reproducibility.
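
The parameters above can be combined as in the following minimal sketch. The center coordinates and the per-cluster standard deviations are arbitrary values picked for illustration. Passing an array for `centers` fixes the cluster locations, so `n_features` is taken from the array's second dimension, and `shuffle=False` keeps the samples grouped by cluster.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Hypothetical fixed centers: 3 clusters in 2 dimensions
fixed_centers = np.array([[-5.0, 0.0], [0.0, 5.0], [5.0, 0.0]])

X, y = make_blobs(
    n_samples=300,                # split evenly: 100 points per cluster
    centers=fixed_centers,        # fixed locations instead of random ones
    cluster_std=[0.5, 1.0, 2.0],  # one standard deviation per cluster
    shuffle=False,                # keep samples ordered by cluster
    random_state=0,               # same dataset on every run
)

print(X.shape)         # (n_samples, n_features)
print(np.bincount(y))  # number of points per cluster

# Empirical spread of each cluster, for comparison with cluster_std
for label in range(len(fixed_centers)):
    print(label, X[y == label].std(axis=0).round(2))
```

With `shuffle=True` (the default) the same points are returned in random order, which is usually what a train/test split needs.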

In [51]:
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

In [52]:
import numpy as np

import ipywidgets as widgets
from ipywidgets import interactive

In [53]:
# Function to create blobs and plot
def create_and_plot(n_samples, n_features, centers, cluster_std):
    X, y = make_blobs(n_samples   = n_samples, 
                      n_features  = n_features, 
                      centers     = centers, 
                      cluster_std = cluster_std, 
                      random_state= 42)
    
    # Generate a unique color for each cluster
    colors = [plt.cm.nipy_spectral(i / centers) for i in range(centers)]
    
    plt.figure(figsize=(12, 5))
    
    # Plot each cluster with its respective color (first two features only)
    for i in range(centers):
        plt.scatter(X[y == i, 0], X[y == i, 1], c=[colors[i]], edgecolors='k', label=f'Cluster {i + 1}')

    plt.legend()  # display the per-cluster labels assigned above
    plt.title("Generated Blobs")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.show()

In [54]:
# Create sliders for each parameter
n_samples_slider   = widgets.IntSlider(value=100,   min=100, max=10000, step=25, description='n_samples')
n_features_slider  = widgets.IntSlider(value=2,     min=2, max=2, description='n_features')
centers_slider     = widgets.IntSlider(value=2,     min=2, max=10, description='centers')
cluster_std_slider = widgets.FloatSlider(value=1.0, min=0, max=10, step=0.25, description='cluster_std')

In [55]:
# Create interactive widget
interactive_plot = interactive(create_and_plot, 
                               n_samples  = n_samples_slider, 
                               n_features = n_features_slider,
                               centers    = centers_slider, 
                               cluster_std= cluster_std_slider)

In [56]:
# Display the interactive widget
output = interactive_plot.children[-1]
output.layout.height = '500px'
interactive_plot


In [37]:
# Function to create blobs and return the dataset
def generate_dataset(n_samples, n_features, centers, cluster_std):
    X, y = make_blobs(n_samples=n_samples, n_features=n_features, centers=centers, cluster_std=cluster_std, random_state=42)
    return X, y

In [38]:
# Create sliders for each parameter
n_samples_slider   = widgets.IntSlider(value=100, min=100, max=10000, step=25, description='n_samples')
n_features_slider  = widgets.IntSlider(value=2, min=1, max=10, description='n_features')
centers_slider     = widgets.IntSlider(value=2, min=2, max=10, description='centers')
cluster_std_slider = widgets.FloatSlider(value=1.0, min=0, max=10, step=0.25, description='cluster_std')

In [39]:
# Create interactive widget
interactive_dataset = interactive(generate_dataset, n_samples=n_samples_slider, n_features=n_features_slider,
                                   centers=centers_slider, cluster_std=cluster_std_slider)

In [40]:
# Size the widget's output area and create a separate Output widget for the printed dataset
output_dataset = interactive_dataset.children[-1]
output_dataset.layout.height = '500px'
data_widget    = widgets.Output()

In [46]:

# NumPy arrays that hold the most recently generated X and y
X_array = np.array([])
y_array = np.array([])

In [47]:
# Function to update the data widget and store in numpy arrays
def update_data_widget(change):
    global X_array, y_array
    with data_widget:
        data_widget.clear_output()
        X, y = generate_dataset(n_samples_slider.value, n_features_slider.value,
                                centers_slider.value, cluster_std_slider.value)
        print("Generated Dataset:")
        print("X (Features):")
        print(X)
        print("\ny (Labels):")
        print(y)
        X_array, y_array = X, y

In [48]:
# Attach the update function to slider changes
n_samples_slider.observe(update_data_widget, 'value')
n_features_slider.observe(update_data_widget, 'value')
centers_slider.observe(update_data_widget, 'value')
cluster_std_slider.observe(update_data_widget, 'value')

In [49]:
# Display the widgets
widgets.VBox([data_widget, interactive_dataset])


In [50]:
X_array.shape

(2575, 3)