
# Introduction

The objective of this colab is to demonstrate the `sklearn` dataset API.

Recall that it has three kinds of functions:
1. Loaders (`load_*`) load small standard datasets bundled with `sklearn`.
2. Fetchers (`fetch_*`) fetch large datasets from the internet and load them into memory.
3. Generators (`make_*`) generate controlled synthetic datasets.

Loaders and fetchers return a `Bunch` object, while generators return a tuple of a feature matrix and a label vector (or matrix).
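As a quick illustration of the two return types, the following sketch calls one loader and one generator. The variable names `bunch`, `result`, `X_blobs` and `y_blobs` are chosen here only for illustration and are not used elsewhere in this colab.

```python
from sklearn.datasets import load_iris, make_blobs

# A loader returns a single Bunch object.
bunch = load_iris()
print(type(bunch).__name__)

# A generator returns a tuple: (feature matrix, label vector).
result = make_blobs(n_samples=10, random_state=0)
X_blobs, y_blobs = result
print(type(result).__name__, X_blobs.shape, y_blobs.shape)
```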

# Loaders

## Loading iris dataset

In [1]:
from sklearn.datasets import load_iris
data = load_iris()

This returns a `Bunch` object `data`, which is a dictionary-like object with the following attributes:
* `data`, which has the feature matrix.
* `target`, which is the label vector.
* `feature_names`, which contains the names of the features.
* `target_names`, which contains the names of the classes.
* `DESCR`, which has the full description of the dataset.
* `filename`, which has the path to the location of the data.
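Because `Bunch` is dictionary-like, each attribute above can also be reached by key. A minimal sketch (the name `iris` is used here only to avoid overwriting `data`):

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Bunch behaves like a dictionary, so its keys can be listed.
print(sorted(iris.keys()))

# Attribute access and key access return the same underlying object.
same_object = iris["data"] is iris.data
print(same_object)
```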

In [2]:
type(data)

sklearn.utils._bunch.Bunch

We can access them one by one and examine their contents.  For example, we can access `feature_names` as follows:

In [3]:
data.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

We can see the names of the features in this dataset.

The full description of the dataset is stored in `DESCR`.

In [6]:
data.DESCR

Let's examine the names of the labels.

In [4]:
data.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

There are three classes: `setosa`, `versicolor`, `virginica`.

The feature matrix can be accessed as `data.data`. Let's look at the first five examples in the feature matrix.

In [7]:
data.data[:5]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2]])

We can observe 4 features per example.

Let's examine the shape of the feature matrix.

In [8]:
data.data.shape

(150, 4)

There are 150 examples and each example has 4 features.

Finally, we will examine the label vector and its shape.

In [9]:
data.target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

There are 50 examples from each of the three classes: 0, 1 and 2.
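Rather than counting by eye, the class balance can be checked programmatically. The sketch below uses `np.bincount`, which counts occurrences of each non-negative integer label; the names `iris` and `class_counts` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Count how many examples carry each integer label (0, 1, 2).
class_counts = np.bincount(iris.target)

# Pair every class name with its count.
for name, count in zip(iris.target_names, class_counts):
    print(name, count)
```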

We can read additional documentation about `load_iris` in the following manner:

In [11]:
?load_iris

In this way, we can load and examine different datasets.

We can obtain the feature matrix and the label (or target) vector directly from `load_iris`, and from loaders in general, by setting the `return_X_y` argument to `True`.

In [12]:
feature_matrix, label_vector = load_iris(return_X_y=True)
print('Shape of feature matrix:', feature_matrix.shape)
print('Shape of label vector:', label_vector.shape)

Shape of feature matrix: (150, 4)
Shape of label vector: (150,)


## Loading diabetes dataset

In [13]:
from sklearn.datasets import load_diabetes
data = load_diabetes()

Additional details about this loader can be accessed from the documentation.

In [28]:
?load_diabetes

### `load_diabetes`

**Step 2.** Load the dataset and obtain a `Bunch` object.

In [18]:
# Call the loader and obtain the `Bunch` object.
type(data)

sklearn.utils._bunch.Bunch

**Step 3.** Examine the bunch object.

Look at the names of the features. The full description of the dataset is available in `data.DESCR`.

In [19]:
data.feature_names

['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

Find out the shape of the feature matrix.

In [20]:
# Write code for finding the shape of the feature matrix.
data.data.shape

(442, 10)

Look at the first five examples from the feature matrix.

In [21]:
# Look at the first five examples from the feature matrix.
data.data[:5]

array([[ 0.03807591,  0.05068012,  0.06169621,  0.02187239, -0.0442235 ,
        -0.03482076, -0.04340085, -0.00259226,  0.01990749, -0.01764613],
       [-0.00188202, -0.04464164, -0.05147406, -0.02632753, -0.00844872,
        -0.01916334,  0.07441156, -0.03949338, -0.06833155, -0.09220405],
       [ 0.08529891,  0.05068012,  0.04445121, -0.00567042, -0.04559945,
        -0.03419447, -0.03235593, -0.00259226,  0.00286131, -0.02593034],
       [-0.08906294, -0.04464164, -0.01159501, -0.03665608,  0.01219057,
         0.02499059, -0.03603757,  0.03430886,  0.02268774, -0.00936191],
       [ 0.00538306, -0.04464164, -0.03638469,  0.02187239,  0.00393485,
         0.01559614,  0.00814208, -0.00259226, -0.03198764, -0.04664087]])

Find out the shape of the label matrix.

In [22]:
# Write code to find shape of label matrix.
data.target.shape

(442,)

Look at the labels of the first five examples.

In [23]:
# Look at the labels of the first five examples.
data.target[:5]

array([151.,  75., 141., 206., 135.])

Find out the names of the features.

In [25]:
# Get the names of the features.
data.feature_names

['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

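Find names of class labels. Since diabetes is a regression dataset, its `Bunch` object has no `target_names` attribute, so the following cell raises an `AttributeError`.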

In [31]:
# Find names of class labels.
data.target_names

AttributeError: ignored
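Not every `Bunch` carries every attribute, so it can help to check before accessing one. A minimal sketch, relying on the dictionary-like behaviour of `Bunch` (the names `diabetes`, `has_target_names` and `target_names` are illustrative):

```python
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()

# Bunch supports the `in` operator like a dictionary, so we can test for a key first.
has_target_names = "target_names" in diabetes
print(has_target_names)

# getattr with a default avoids the AttributeError altogether.
target_names = getattr(diabetes, "target_names", None)
print(target_names)
```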

## Loading digits dataset

In [32]:
from sklearn.datasets import load_digits
?load_digits

In [33]:
data = load_digits()

### `load_digits`

**Step 2.** Load the dataset and obtain a `Bunch` object.

In [34]:
# Call the loader and obtain the `Bunch` object.
type(data)

sklearn.utils._bunch.Bunch

**Step 3.** Examine the bunch object.

Look at the description of the dataset.

In [35]:
data.DESCR

".. _digits_dataset:\n\nOptical recognition of handwritten digits dataset\n--------------------------------------------------\n\n**Data Set Characteristics:**\n\n    :Number of Instances: 1797\n    :Number of Attributes: 64\n    :Attribute Information: 8x8 image of integer pixels in the range 0..16.\n    :Missing Attribute Values: None\n    :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)\n    :Date: July; 1998\n\nThis is a copy of the test set of the UCI ML hand-written digits datasets\nhttps://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits\n\nThe data set contains images of hand-written digits: 10 classes where\neach class refers to a digit.\n\nPreprocessing programs made available by NIST were used to extract\nnormalized bitmaps of handwritten digits from a preprinted form. From a\ntotal of 43 people, 30 contributed to the training set and different 13\nto the test set. 32x32 bitmaps are divided into nonoverlapping blocks of\n4x4 and the number of on pixel

Find out the shape of the feature matrix.

In [None]:
# Write code for finding the shape of the feature matrix.

Look at the first five examples from the feature matrix.

In [None]:
# Look at the first five examples from the feature matrix.

Find out the shape of the label matrix.

In [None]:
# Write code to find shape of label matrix.

Look at the labels of the first five examples.

In [None]:
# Look at the labels of the first five examples.

Find out the names of the features.

In [None]:
# Get the names of the features.

Find names of class labels.

In [None]:
# Find names of class labels.
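One possible way to fill in the cells above is sketched below. It uses the name `digits` instead of `data` for illustration, and it reads `feature_names` through `getattr` on the assumption that older `sklearn` versions may not provide that attribute for this dataset.

```python
from sklearn.datasets import load_digits

digits = load_digits()

print(digits.data.shape)      # shape of the feature matrix
print(digits.data[:5])        # first five examples
print(digits.target.shape)    # shape of the label vector
print(digits.target[:5])      # labels of the first five examples
print(digits.target_names)    # names of the class labels

# Feature names, if this sklearn version provides them (assumption: may be absent).
feature_names = getattr(digits, "feature_names", None)
print(feature_names)
```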

## Exercise

Experiment with other dataset loaders e.g. `load_wine`, `load_breast_cancer` and `load_linnerud`.

### `load_wine`

**Step 1.** Import the loader.

In [None]:
# Write your code here.
from sklearn.datasets import load_wine

**Step 1a.** In case you want to know more about the loader, access its documentation by using the `?<loader_name>` command.

In [None]:
# Access the documentation.
?load_wine

**Step 2.** Load the dataset and obtain a `Bunch` object.

In [17]:
# Call the loader and obtain the `Bunch` object.
data = load_wine()

**Step 3.** Examine the bunch object.

Look at the description of the dataset.

In [None]:
data.DESCR



Find out the shape of the feature matrix.

In [None]:
# Write code for finding the shape of the feature matrix.
data.data.shape

(178, 13)

Look at the first five examples from the feature matrix.

In [None]:
# Look at the first five examples from the feature matrix.
data.data[:5]

array([[1.423e+01, 1.710e+00, 2.430e+00, 1.560e+01, 1.270e+02, 2.800e+00,
        3.060e+00, 2.800e-01, 2.290e+00, 5.640e+00, 1.040e+00, 3.920e+00,
        1.065e+03],
       [1.320e+01, 1.780e+00, 2.140e+00, 1.120e+01, 1.000e+02, 2.650e+00,
        2.760e+00, 2.600e-01, 1.280e+00, 4.380e+00, 1.050e+00, 3.400e+00,
        1.050e+03],
       [1.316e+01, 2.360e+00, 2.670e+00, 1.860e+01, 1.010e+02, 2.800e+00,
        3.240e+00, 3.000e-01, 2.810e+00, 5.680e+00, 1.030e+00, 3.170e+00,
        1.185e+03],
       [1.437e+01, 1.950e+00, 2.500e+00, 1.680e+01, 1.130e+02, 3.850e+00,
        3.490e+00, 2.400e-01, 2.180e+00, 7.800e+00, 8.600e-01, 3.450e+00,
        1.480e+03],
       [1.324e+01, 2.590e+00, 2.870e+00, 2.100e+01, 1.180e+02, 2.800e+00,
        2.690e+00, 3.900e-01, 1.820e+00, 4.320e+00, 1.040e+00, 2.930e+00,
        7.350e+02]])

Find out the shape of the label matrix.

In [None]:
# Write code to find shape of label matrix.
data.target.shape

(178,)

Look at the labels of the first five examples.

In [None]:
# Look at the labels of the first five examples.
data.target[:5]

array([0, 0, 0, 0, 0])

Find out the names of the features.

In [None]:
# Get the names of the features.
data.feature_names

['alcohol',
 'malic_acid',
 'ash',
 'alcalinity_of_ash',
 'magnesium',
 'total_phenols',
 'flavanoids',
 'nonflavanoid_phenols',
 'proanthocyanins',
 'color_intensity',
 'hue',
 'od280/od315_of_diluted_wines',
 'proline']

Find names of class labels.

In [None]:
# Find names of class labels.
data.target_names

array(['class_0', 'class_1', 'class_2'], dtype='<U7')

### `load_breast_cancer`

**Step 1.** Import the loader.

In [None]:
# Write your code here.

**Step 1a.** In case you want to know more about the loader, access its documentation by using the `?<loader_name>` command.

In [None]:
# Access the documentation.

**Step 2.** Load the dataset and obtain a `Bunch` object.

In [None]:
# Call the loader and obtain the `Bunch` object.

**Step 3.** Examine the bunch object.

Look at the description of the dataset.

Find out the shape of the feature matrix.

In [None]:
# Write code for finding the shape of the feature matrix.

Look at the first five examples from the feature matrix.

In [None]:
# Look at the first five examples from the feature matrix.

Find out the shape of the label matrix.

In [None]:
# Write code to find shape of label matrix.

Look at the labels of the first five examples.

In [None]:
# Look at the labels of the first five examples.

Find out the names of the features.

In [None]:
# Get the names of the features.

Find names of class labels.

In [None]:
# Find names of class labels.

### `load_linnerud`

**Step 1.** Import the loader.

In [None]:
# Write your code here.

**Step 1a.** In case you want to know more about the loader, access its documentation by using the `?<loader_name>` command.

In [None]:
# Access the documentation.

**Step 2.** Load the dataset and obtain a `Bunch` object.

In [None]:
# Call the loader and obtain the `Bunch` object.

**Step 3.** Examine the bunch object.

Look at the description of the dataset.

Find out the shape of the feature matrix.

In [None]:
# Write code for finding the shape of the feature matrix.

Look at the first five examples from the feature matrix.

In [None]:
# Look at the first five examples from the feature matrix.

Find out the shape of the label matrix.

In [None]:
# Write code to find shape of label matrix.

Look at the labels of the first five examples.

In [None]:
# Look at the labels of the first five examples.

Find out the names of the features.

In [None]:
# Get the names of the features.

Find names of class labels.

In [None]:
# Find names of class labels.

# Fetchers

## `fetch_california_housing`

**Step 1**: Import the library and access the documentation.

In [None]:
from sklearn.datasets import fetch_california_housing
?fetch_california_housing

Note that the `fetch_*` functions also return a `Bunch` object, just like loaders.

We can examine the attributes of this dataset in the same way as we did for the datasets obtained from loaders.

**Step 2.** Load the dataset and obtain a `Bunch` object.

In [None]:
# Call the fetcher and obtain the `Bunch` object.
housing_data = fetch_california_housing()

**Step 3.** Examine the bunch object.

Look at the description of the dataset.

In [None]:
housing_data.DESCR

'.. _california_housing_dataset:\n\nCalifornia Housing dataset\n--------------------------\n\n**Data Set Characteristics:**\n\n    :Number of Instances: 20640\n\n    :Number of Attributes: 8 numeric, predictive attributes and the target\n\n    :Attribute Information:\n        - MedInc        median income in block group\n        - HouseAge      median house age in block group\n        - AveRooms      average number of rooms per household\n        - AveBedrms     average number of bedrooms per household\n        - Population    block group population\n        - AveOccup      average number of household members\n        - Latitude      block group latitude\n        - Longitude     block group longitude\n\n    :Missing Attribute Values: None\n\nThis dataset was obtained from the StatLib repository.\nhttps://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html\n\nThe target variable is the median house value for California districts,\nexpressed in hundreds of thousands of dollars ($100,000

Find out the shape of the feature matrix.

In [None]:
# Write code for finding the shape of the feature matrix.
housing_data.data.shape

(20640, 8)

Look at the first five examples from the feature matrix.

In [None]:
# Look at the first five examples from the feature matrix.
housing_data.data[:5]

array([[ 8.32520000e+00,  4.10000000e+01,  6.98412698e+00,
         1.02380952e+00,  3.22000000e+02,  2.55555556e+00,
         3.78800000e+01, -1.22230000e+02],
       [ 8.30140000e+00,  2.10000000e+01,  6.23813708e+00,
         9.71880492e-01,  2.40100000e+03,  2.10984183e+00,
         3.78600000e+01, -1.22220000e+02],
       [ 7.25740000e+00,  5.20000000e+01,  8.28813559e+00,
         1.07344633e+00,  4.96000000e+02,  2.80225989e+00,
         3.78500000e+01, -1.22240000e+02],
       [ 5.64310000e+00,  5.20000000e+01,  5.81735160e+00,
         1.07305936e+00,  5.58000000e+02,  2.54794521e+00,
         3.78500000e+01, -1.22250000e+02],
       [ 3.84620000e+00,  5.20000000e+01,  6.28185328e+00,
         1.08108108e+00,  5.65000000e+02,  2.18146718e+00,
         3.78500000e+01, -1.22250000e+02]])

Find out the shape of the label matrix.

In [None]:
# Write code to find shape of label matrix.
housing_data.target.shape

(20640,)

Look at the labels of the first five examples.

In [None]:
# Look at the labels of the first five examples.
housing_data.target[:5]

array([4.526, 3.585, 3.521, 3.413, 3.422])

Note that the labels are real numbers, so this is a regression dataset.

Find out the names of the features.

In [None]:
# Get the names of the features.
housing_data.feature_names

['MedInc',
 'HouseAge',
 'AveRooms',
 'AveBedrms',
 'Population',
 'AveOccup',
 'Latitude',
 'Longitude']

Find the name of the target. This is a regression dataset, so there are no class labels.

In [None]:
# Find the name of the target.
housing_data.target_names

['MedHouseVal']

## `fetch_openml`

[openml.org](https://www.openml.org) is a public repository for machine learning data and experiments that allows everybody to upload open datasets.

Import the library and access the documentation.

In [None]:
from sklearn.datasets import fetch_openml
?fetch_openml

Note that this is an experimental API and is likely to change in future releases.

> We use this API for loading the MNIST dataset.

In [None]:
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
print("Feature matrix shape:", X.shape)
print("Label shape:", y.shape)

Feature matrix shape: (70000, 784)
Label shape: (70000,)


## Exercise

### `fetch_20newsgroups`

**Step 1.** Import the loader.

In [None]:
# Write your code here.

**Step 1a.** In case you want to know more about the loader, access its documentation by using the `?<loader_name>` command.

In [None]:
# Access the documentation.

**Step 2.** Load the dataset and obtain a `Bunch` object.

In [None]:
# Call the loader and obtain the `Bunch` object.

**Step 3.** Examine the bunch object.

Look at the description of the dataset.

Find out the shape of the feature matrix.

In [None]:
# Write code for finding the shape of the feature matrix.

Look at the first five examples from the feature matrix.

In [None]:
# Look at the first five examples from the feature matrix.

Find out the shape of the label matrix.

In [None]:
# Write code to find shape of label matrix.

Look at the labels of the first five examples.

In [None]:
# Look at the labels of the first five examples.

Find out the names of the features.

In [None]:
# Get the names of the features.

Find names of class labels.

In [None]:
# Find names of class labels.

### `fetch_kddcup99`

**Step 1.** Import the loader.

In [None]:
# Write your code here.

**Step 1a.** In case you want to know more about the loader, access its documentation by using the `?<loader_name>` command.

In [None]:
# Access the documentation.

**Step 2.** Load the dataset and obtain a `Bunch` object.

In [None]:
# Call the loader and obtain the `Bunch` object.

**Step 3.** Examine the bunch object.

Look at the description of the dataset.

Find out the shape of the feature matrix.

In [None]:
# Write code for finding the shape of the feature matrix.

Look at the first five examples from the feature matrix.

In [None]:
# Look at the first five examples from the feature matrix.

Find out the shape of the label matrix.

In [None]:
# Write code to find shape of label matrix.

Look at the labels of the first five examples.

In [None]:
# Look at the labels of the first five examples.

Find out the names of the features.

In [None]:
# Get the names of the features.

Find names of class labels.

In [None]:
# Find names of class labels.

# Generators

## `make_regression`

In [None]:
from sklearn.datasets import make_regression
?make_regression

#### Example 1

Let's generate 100 samples with 5 features for a single-target regression problem.

In [None]:
X, y = make_regression(n_samples=100, n_features=5, n_targets=1, shuffle=True, random_state=42)

It's good practice to set the seed (`random_state`) so that the experiment is repeatable: the same seed always produces the same dataset.
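The effect of the seed can be checked directly. In this sketch, two calls with the same `random_state` are compared with `np.array_equal`; the names `X1`, `y1`, `X2`, `y2` and `same_data` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression

# Two independent calls that share the same seed.
X1, y1 = make_regression(n_samples=100, n_features=5, random_state=42)
X2, y2 = make_regression(n_samples=100, n_features=5, random_state=42)

# Identical seeds give identical features and targets.
same_data = np.array_equal(X1, X2) and np.array_equal(y1, y2)
print(same_data)
```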

Let's look at the shapes of feature matrix and label vector.

In [None]:
X.shape

(100, 5)

In [None]:
y.shape

(100,)

#### Example 2

Let's generate 100 samples with 5 features for a multi-output regression problem with 5 outputs.

In [None]:
X, y = make_regression(n_samples=100, n_features=5, n_targets=5, shuffle=True, random_state=42)

Let's look at the shapes of the feature matrix and the label matrix.

In [None]:
X.shape

(100, 5)

In [None]:
y.shape

(100, 5)

Since we generated a multi-output target with 5 outputs, the label matrix has shape `(100, 5)`.
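To see where the targets come from, `make_regression` can also return the coefficients of the underlying linear model via `coef=True`. The sketch below assumes the default settings `noise=0.0` and `bias=0.0`, under which the targets should be an exact linear function of the features; the names `X_mo`, `y_mo`, `coef` and `is_linear` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression

# coef=True additionally returns the coefficient matrix of the linear model.
X_mo, y_mo, coef = make_regression(
    n_samples=100, n_features=5, n_targets=5, coef=True, random_state=42
)

# One column of coefficients per output.
print(coef.shape)

# With no noise and no bias, the targets equal X times the coefficients.
is_linear = np.allclose(X_mo @ coef, y_mo)
print(is_linear)
```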

## `make_classification`

`make_classification` generates a random $n$-class classification problem.

In [None]:
from sklearn.datasets import make_classification
?make_classification

Let's generate a binary classification problem with 10 features and 100 samples.

In [None]:
X, y = make_classification(n_samples=100, n_features=10, n_classes=2, n_clusters_per_class=1, random_state=42)

Let's examine the shapes of the feature matrix and the label vector.

In [None]:
X.shape

(100, 10)

In [None]:
y.shape

(100,)

Look at a few examples and their labels.

In [None]:
X[:5]

array([[ 0.11422765, -1.71016839, -0.06822216, -0.14928517,  0.30780177,
         0.15030176, -0.05694562, -0.22595246, -0.36361221, -0.13818757],
       [ 0.70775194, -1.57022472, -0.23503183, -0.63604713,  0.62180996,
        -0.56246678,  0.97255445, -0.77719676,  0.63240774, -0.47809669],
       [ 0.63859246,  0.04739867,  0.33273433,  1.1046981 , -0.65183611,
        -1.66152006, -1.2110162 ,  1.09821151, -0.0660798 ,  0.68024225],
       [-0.23894805, -0.97755524,  0.0379061 ,  0.19896733,  0.50091719,
        -0.90756366,  0.75539123,  0.12437227, -0.57677133,  0.07871283],
       [-0.59239392, -0.05023811,  0.17573204, -1.43949185,  0.27045683,
        -0.86399077, -0.83095012,  0.60046915,  0.04852163,  0.32557953]])

In [None]:
y[:5]

array([1, 1, 1, 1, 0])

Let's generate a three-class classification problem with 100 samples and 10 features.

In [None]:
X, y = make_classification(n_samples=100, n_features=10, n_classes=3, n_clusters_per_class=1, random_state=42)

Let's examine the shapes of the feature matrix and the label vector.

In [None]:
X.shape

(100, 10)

In [None]:
y.shape

(100,)

Let's look at a few examples, both features and labels.

In [None]:
X[:5]

array([[-0.58351628, -1.73833907, -1.37298251, -1.77311485,  0.45918008,
         0.83392215, -1.66096093,  0.20768769, -0.07016571,  0.42961822],
       [-1.0044394 , -1.43862044,  0.47335819, -0.21188291,  0.0125924 ,
         0.22409248, -0.77300978,  0.49799829,  0.0976761 ,  0.02451017],
       [ 0.07740833,  0.19896733,  0.12437227,  0.17738132, -0.97755524,
         0.50091719,  0.75138712,  0.54336019,  0.09933231, -1.66940528],
       [-0.91759569, -0.9609536 ,  1.07746664,  0.4522739 , -0.32138584,
        -0.8254972 , -0.56372455,  0.24368721,  0.41293145, -0.8222204 ],
       [-0.96222828, -0.96090774,  1.21530116,  0.55980482, -1.24778318,
        -0.25256815, -1.43014138,  0.13074058,  1.6324113 , -0.44004449]])

In [None]:
y[:5]

array([2, 0, 1, 0, 0])
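The calls above set `n_clusters_per_class=1`. One reason this matters for three classes: `make_classification` requires `n_classes * n_clusters_per_class` to be at most `2**n_informative`, and with the defaults (`n_informative=2`, `n_clusters_per_class=2`) three classes would need 6 clusters, which exceeds 4. The sketch below illustrates this; the names `raised`, `X_ok` and `y_ok` are illustrative.

```python
from sklearn.datasets import make_classification

try:
    # Defaults: n_informative=2, n_clusters_per_class=2, so 3 * 2 = 6 > 2**2.
    make_classification(n_samples=100, n_features=10, n_classes=3, random_state=42)
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Raising n_informative to 3 satisfies the constraint (6 <= 2**3).
X_ok, y_ok = make_classification(
    n_samples=100, n_features=10, n_classes=3, n_informative=3, random_state=42
)
print(X_ok.shape, y_ok.shape)
```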

## `make_multilabel_classification`

This function generates a random multi-label classification problem.

In [None]:
from sklearn.datasets import make_multilabel_classification
?make_multilabel_classification

Let's generate a multilabel classification problem with 100 samples, 20 features, 5 labels and, on average, 2 labels per example.

In [None]:
X, y = make_multilabel_classification(n_samples=100, n_features=20, n_classes=5, n_labels=2)

First of all, let's examine the shapes of the feature matrix and the label matrix.

In [None]:
X.shape

(100, 20)

In [None]:
y.shape

(100, 5)

Let's examine a few rows of the feature matrix and the label matrix.

In [None]:
X[:5]

array([[ 1.,  4.,  2.,  0.,  0.,  2.,  2.,  3.,  4.,  3.,  5.,  0.,  2.,
         5.,  3.,  1.,  1.,  0.,  2.,  7.],
       [ 4.,  1.,  2.,  0.,  3.,  1.,  2.,  2.,  2.,  2.,  1.,  1.,  1.,
         3.,  1.,  2.,  4.,  2.,  2.,  2.],
       [ 0.,  1.,  4.,  0.,  2.,  1.,  4.,  0.,  6.,  2.,  4.,  2.,  1.,
         0.,  5.,  0.,  5.,  5.,  1.,  7.],
       [ 5.,  3.,  3.,  0.,  0.,  2.,  6.,  2., 10.,  0.,  2.,  2.,  2.,
         0.,  4.,  0.,  5.,  5.,  5.,  3.],
       [ 4.,  3.,  5.,  0.,  4.,  2.,  6.,  1.,  2.,  2.,  3.,  1.,  4.,
         1.,  5.,  4.,  3.,  4.,  2.,  1.]])

In [None]:
y[:5]

array([[1, 0, 1, 1, 0],
       [1, 0, 1, 1, 0],
       [1, 0, 0, 1, 0],
       [0, 0, 0, 1, 0],
       [0, 1, 0, 1, 1]])
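Each row of the label matrix is a binary indicator vector, so summing along a row gives the number of labels assigned to that example. The sketch below adds a `random_state` (an assumption, not part of the call above) so that it is repeatable; note that `n_labels` is only the expected number of labels, so the observed average will generally differ somewhat from 2.

```python
from sklearn.datasets import make_multilabel_classification

X_ml, y_ml = make_multilabel_classification(
    n_samples=100, n_features=20, n_classes=5, n_labels=2, random_state=42
)

# Number of labels switched on for each example.
labels_per_example = y_ml.sum(axis=1)

# Observed average number of labels per example.
print(labels_per_example.mean())
```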

## `make_blobs`

`make_blobs` enables us to generate random data for clustering.

In [None]:
from sklearn.datasets import make_blobs
?make_blobs

Let's generate a random dataset of 10 samples with 2 features each, drawn around 3 cluster centers.

In [None]:
X, y = make_blobs(n_samples=10, centers=3, n_features=2, random_state=42)
print("Feature matrix shape:", X.shape)
print("Label shape:", y.shape)

Feature matrix shape: (10, 2)
Label shape: (10,)


We can find the cluster membership of each point in `y`.

In [None]:
y

array([2, 2, 1, 2, 0, 0, 0, 1, 1, 0])
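The cluster sizes can be read off the membership vector with `np.bincount`. This sketch repeats the call above under illustrative names (`X_b`, `y_b`, `cluster_sizes`):

```python
import numpy as np
from sklearn.datasets import make_blobs

X_b, y_b = make_blobs(n_samples=10, centers=3, n_features=2, random_state=42)

# Number of points assigned to each of the 3 clusters.
cluster_sizes = np.bincount(y_b)
print(cluster_sizes)
```

The 10 samples are split as evenly as possible across the 3 centers, which matches the membership vector shown above (four points in cluster 0 and three each in clusters 1 and 2).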