# Machine Learning - Multi-Dimensional Scaling

## Contents
Multi Dimensional Scaling


MDS is used for dimensionality reduction when the input data is not
linearly arranged, or when it is not known whether a linear
relationship exists.

MDS is a non-linear technique for embedding data in a
lower-dimensional space.

MDS (multidimensional scaling) is an algorithm that transforms
a dataset into another dataset, usually with fewer dimensions,
while keeping the Euclidean distances between the points as close
to the original distances as possible.
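
To make the idea of "keeping the distances" concrete, here is a minimal sketch of classical (Torgerson) MDS in plain NumPy. It is an illustration only, not what `sklearn.manifold.MDS` runs internally (scikit-learn uses the iterative SMACOF algorithm). The helper names `classical_mds` and `pairwise_distances`, and the synthetic data, are assumptions made for this example.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, n_components=2):
    """Classical MDS from a matrix D of pairwise Euclidean distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[idx], 0, None)       # guard against tiny negative values
    return eigvecs[:, idx] * np.sqrt(lam)

# Synthetic data: 20 points that truly lie on a 2-D plane inside a 5-D space.
rng = np.random.default_rng(0)
plane = rng.normal(size=(20, 2))
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # random orthogonal matrix
X = plane @ Q[:2, :]                           # distance-preserving lift to 5-D

Y = classical_mds(pairwise_distances(X), n_components=2)
max_error = np.abs(pairwise_distances(Y) - pairwise_distances(X)).max()
print(Y.shape, max_error)
```

Because the synthetic points really are two-dimensional, the 2-D embedding reproduces the distances up to floating-point error. For data such as Iris, which does not lie exactly on a plane, the distances can only be approximated.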

It can also be used to detect outliers in a multivariate distribution.

The main objective of MDS is to represent dissimilarities as
distances between points in a low dimensional space such that
the distances correspond as closely as possible to the
dissimilarities.
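
One common way to write this objective is the raw stress function below. The symbols are assumptions introduced here: $\delta_{ij}$ is the given dissimilarity between items $i$ and $j$, and $x_i$ is the low-dimensional point assigned to item $i$.

```latex
\mathrm{Stress}(x_1, \dots, x_n) = \sum_{i < j} \bigl( \delta_{ij} - \lVert x_i - x_j \rVert \bigr)^2
```

A stress of zero means every embedded distance matches its dissimilarity exactly; larger values mean a poorer fit.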

It is a nonlinear method that projects data into fewer dimensions
while preserving pairwise distances.

Metric MDS calculates the distance between each pair of
points in the original high-dimensional space and then maps the
points to a lower-dimensional space while preserving those
distances as well as possible.

In [1]:


import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import manifold  # needed for multidimensional scaling (MDS) and t-SNE
from sklearn import decomposition  # needed for PCA
from sklearn import cluster
from sklearn import preprocessing

In [3]:
df = pd.read_csv('../input/iris-flower-dataset/IRIS.csv') 
df.head()

In [4]:
df.columns

In [5]:
df = df.drop(columns=['species'])

In [6]:
df.shape

In [7]:
df.info()

In [8]:
df.describe()

In [9]:
sns.displot(df, x="sepal_length", kind="kde", bw_adjust=2)

In [10]:
sns.displot(df, x="sepal_width", kind="kde", bw_adjust=2)

In [11]:
sns.displot(df, x="petal_length", kind="kde", bw_adjust=2)

In [12]:
sns.displot(df, x="petal_width", kind="kde", bw_adjust=2)

## Scaling


In [14]:
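min_max_scaler = preprocessing.MinMaxScaler()
data = df  # keep a reference to the unscaled DataFrame
df = min_max_scaler.fit_transform(df)  # rescale each feature to [0, 1]; df is now a NumPy array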
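
As a sketch of what this scaling step does, the snippet below reproduces min-max scaling by hand and compares it with `MinMaxScaler`. The small array `X` holds made-up illustrative values, not rows read from the dataset. Scaling matters for distance-based methods such as MDS because a feature with a larger numeric range would otherwise dominate the pairwise distances.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative values only (two features, four samples).
X = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [6.3, 3.3],
              [5.8, 2.7]])

# Min-max scaling by hand: (x - min) / (max - min), computed per column.
col_min = X.min(axis=0)
col_max = X.max(axis=0)
manual = (X - col_min) / (col_max - col_min)

# The same transformation using scikit-learn.
scaled = MinMaxScaler().fit_transform(X)

print(np.allclose(manual, scaled))
print(manual.min(axis=0), manual.max(axis=0))
```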

In [15]:
colors = np.array(['orange', 'blue', 'lime', 'cyan', 'khaki', 'pink', 'green', 'purple'])

# points  - a 2D array of (x, y) coordinates of data points
# labels  - an array of numeric labels in the interval [0..k-1], one for each point
# centers - a 2D array of (x, y) coordinates of cluster centers, or None
# title   - title of the plot
def clustering_scatterplot(points, labels, centers, title):
    # plot the examples, i.e. the data points, one color per cluster
    n_clusters = np.unique(labels).size
    for i in range(n_clusters):
        plt.scatter(points[labels == i, 0], points[labels == i, 1],
                    c=colors[i % colors.size], label='cluster ' + str(i))
    # plot the centers of the clusters
    if centers is not None:
        plt.scatter(centers[:, 0], centers[:, 1], c='r', marker='*', s=500)
    plt.title(title)
    plt.legend()
    plt.xlabel('x')
    plt.ylabel('y')

## K-Means Clustering

In [16]:
clustered_df = cluster.KMeans(n_clusters=3, n_init=10, max_iter=300).fit(df)

In [17]:
# append the cluster centers to the dataset
data_and_centers = np.r_[df,clustered_df.cluster_centers_]
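
The reason for appending the centers is that they get projected together with the data, so the points and the centers end up in the same 2D coordinate system. The small sketch below uses made-up arrays (`points` and `centers` are hypothetical stand-ins) to show how `np.r_` stacks rows and how negative slicing splits them apart again, which is the pattern the plotting cells rely on with `[:-3]` and `[-3:]`.

```python
import numpy as np

# Hypothetical stand-ins: 5 data points and 2 cluster centers, 2 features each.
points = np.arange(10.0).reshape(5, 2)
centers = np.array([[100.0, 100.0],
                    [200.0, 200.0]])

# np.r_ concatenates along the first axis, so the centers become the last rows.
stacked = np.r_[points, centers]

k = centers.shape[0]
recovered_points = stacked[:-k]    # everything except the last k rows
recovered_centers = stacked[-k:]   # the last k rows

print(stacked.shape)
```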

In [18]:
data_and_centers

## PCA

In [19]:
from sklearn import decomposition

In [20]:
XYcoordinates_pca = decomposition.PCA(n_components=2).fit_transform(data_and_centers)

In [21]:
XYcoordinates_pca

In [22]:
clustering_scatterplot(points=XYcoordinates_pca[:-3, :], labels=clustered_df.labels_,
                       centers=XYcoordinates_pca[-3:, :], title='PCA')


## Multi-Dimensional Scaling

In [24]:
XYcoordinates = manifold.MDS(n_components=2).fit_transform(data_and_centers)

In [25]:
XYcoordinates

In [26]:
clustering_scatterplot(points=XYcoordinates[:-3, :], labels=clustered_df.labels_,
                       centers=XYcoordinates[-3:, :], title='MDS')
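
One way to judge how faithfully a 2D projection keeps the original pairwise distances is to compute a stress value for it. The sketch below is an assumption-laden illustration: it loads Iris from `sklearn.datasets.load_iris` rather than the Kaggle CSV used above, and `raw_stress` is a hypothetical helper written for this example. It fixes `random_state` because MDS starts from a random configuration and otherwise gives a different layout on each run.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances
from sklearn.preprocessing import MinMaxScaler

def raw_stress(X_high, X_low):
    """Sum of squared differences between original and embedded pairwise distances."""
    d_high = pairwise_distances(X_high)
    d_low = pairwise_distances(X_low)
    iu = np.triu_indices_from(d_high, k=1)   # count each pair of points once
    return float(((d_high[iu] - d_low[iu]) ** 2).sum())

# Assumption: the bundled Iris data stands in for the CSV file.
X = MinMaxScaler().fit_transform(load_iris().data)

emb_pca = PCA(n_components=2).fit_transform(X)
emb_mds = MDS(n_components=2, n_init=4, random_state=0).fit_transform(X)

print("PCA stress:", raw_stress(X, emb_pca))
print("MDS stress:", raw_stress(X, emb_mds))
```

Lower values mean the embedded distances are closer to the original ones. The numbers depend on the data, the scaling, and the MDS initialization, so they are best read as a relative comparison.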

## t-SNE

In [27]:
# project both the data and the k-Means cluster centers to a 2D space
XYcoordinates_tsne = manifold.TSNE(n_components=2).fit_transform(data_and_centers)


In [28]:
clustering_scatterplot(points=XYcoordinates_tsne[:-3, :], labels=clustered_df.labels_,
                       centers=XYcoordinates_tsne[-3:, :], title='TSNE')