# Visualization for Unsupervised Algorithms

* Hierarchical clustering
* t-SNE: creates a 2D map of a dataset

The linkage() function performs hierarchical clustering on an array of samples.
Use linkage() to obtain a hierarchical clustering of the grain samples,
and use dendrogram() to visualize the result.


In [None]:
# Perform the necessary imports
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Calculate the linkage: mergings
mergings = linkage(samples, method="complete")

# Plot the dendrogram, using varieties as labels
dendrogram(mergings,
           labels=varieties,
           leaf_rotation=90,
           leaf_font_size=6,
)
plt.show()

![Hierarchical Clustering](../images/hierarchical_clustering.svg)
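
The grain data (`samples`, `varieties`) is not defined in this section, so here is a minimal self-contained sketch on synthetic data. The names `demo_samples` and `demo_labels` are made up for illustration. It shows the shape of the matrix that `linkage()` returns and uses `dendrogram(..., no_plot=True)` to read the leaf order without drawing anything.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical stand-in for the grain samples: 3 blobs of 5 points in 2D
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
demo_samples = np.vstack(
    [c + rng.normal(scale=0.5, size=(5, 2)) for c in centers]
)
demo_labels = [f"s{i}" for i in range(len(demo_samples))]

# Each row of the linkage matrix records one merge:
# [cluster index a, cluster index b, merge distance, size of the new cluster]
demo_mergings = linkage(demo_samples, method="complete")
print(demo_mergings.shape)  # (n_samples - 1, 4)

# no_plot=True returns the dendrogram layout as a dict instead of plotting
layout = dendrogram(demo_mergings, labels=demo_labels, no_plot=True)
print(layout["ivl"])  # leaf labels in left-to-right order
```

The last row of `demo_mergings` describes the final merge, so its fourth entry equals the total number of samples.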

## Another example using companies and movements

In [None]:
# Import normalize
from sklearn.preprocessing import normalize

# Normalize the movements: normalized_movements
normalized_movements = normalize(movements)

# Calculate the linkage: mergings
mergings = linkage(normalized_movements, method="complete")

# Plot the dendrogram
dendrogram(mergings, labels=companies, leaf_rotation=90, leaf_font_size=6)
plt.show()

![Companies vs Movements](../images/stocks.svg)

## Example using the single method

In [None]:
# Perform the necessary imports
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Calculate the linkage: mergings
mergings = linkage(samples, method='single')

# Plot the dendrogram
dendrogram(mergings, labels=country_names, leaf_rotation=90, leaf_font_size=6)
plt.show()

![Single Method](../images/simple_method.svg)

## Extracting clustering labels

In the previous exercise, you saw that the intermediate clustering of the grain samples at height 6 has 3 clusters.
Now, use the fcluster() function to extract the cluster labels for this intermediate clustering, and compare the
labels with the grain varieties using a cross-tabulation.

The hierarchical clustering has already been performed and mergings is the result of the linkage() function.
The list varieties gives the variety of each grain sample.

In [None]:
# Perform the necessary imports
import pandas as pd
from scipy.cluster.hierarchy import fcluster

# Use fcluster to extract labels: labels
labels = fcluster(mergings, 6, criterion='distance')

# Create a DataFrame with labels and varieties as columns: df
df = pd.DataFrame({'labels': labels, 'varieties': varieties})

# Create crosstab: ct
ct = pd.crosstab(df["labels"], df["varieties"])

# Display ct
print(ct)
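
Because `mergings` and `varieties` come from an earlier exercise, the cell above cannot run on its own. The following is a minimal sketch of the same workflow on synthetic data. The names `toy_samples` and `toy_varieties` and the cut height of 12 are assumptions chosen for this toy data, not values from the grain dataset.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: 3 well-separated blobs of 10 points each
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
toy_samples = np.vstack(
    [c + rng.normal(scale=0.5, size=(10, 2)) for c in centers]
)
toy_varieties = ["A"] * 10 + ["B"] * 10 + ["C"] * 10

toy_mergings = linkage(toy_samples, method="complete")

# Cut the tree at height 12: merges above this distance are undone,
# and every remaining subtree becomes one flat cluster
toy_labels = fcluster(toy_mergings, 12, criterion="distance")

# Cross-tabulate cluster labels against the known varieties
toy_ct = pd.crosstab(
    pd.Series(toy_labels, name="labels"),
    pd.Series(toy_varieties, name="varieties"),
)
print(toy_ct)
```

With blobs this far apart, each cluster label should line up with exactly one variety. Real data such as the grain samples will usually show some mixing between rows.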

# t-SNE

> t-Distributed Stochastic Neighbor Embedding

* Maps samples to 2D or 3D
* Map approximately preserves nearness of samples
* Great for inspecting datasets

Also note:

* Iris dataset has 4 measurements, so samples are 4-dimensional
* t-SNE maps samples to 2D space
* t-SNE didn't know that there were different species
* ... yet kept the species mostly separate

![](../images/iris_tsne1.png)

We can observe that:

* "versicolor" and "virginica" are harder to distinguish from one another.
* Consistent with k-means inertia plot: could argue for 2 clusters or for 3

In [None]:
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

model = TSNE(learning_rate=100)
transformed = model.fit_transform(samples)
xs = transformed[:, 0]
ys = transformed[:, 1]

# species has the iris species index
plt.scatter(xs, ys, c=species)
plt.show()

## Other details

t-SNE

* Has a fit_transform() method
* Simultaneously fits the model and transforms the data
* Has no separate transform() method (scikit-learn's TSNE does provide fit(), but it only embeds the data it is given)
* Can't extend the map to include new data samples
* Must start over each time
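
The missing transform step can be checked directly on the estimator. This is a small sketch assuming scikit-learn's `TSNE` class, as used elsewhere in these notes.

```python
from sklearn.manifold import TSNE

model = TSNE()

# fit_transform() is available...
print(hasattr(model, "fit_transform"))  # True

# ...but there is no transform() to apply a fitted map to new samples
print(hasattr(model, "transform"))  # False
```

In practice this means new samples cannot be added to an existing map: the embedding has to be recomputed on the combined data.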

## Learning rate

* Choose learning rate for the dataset
* Wrong choice: points bunch together
* Try values between 50 and 200
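
Trying several learning rates amounts to a small parameter sweep. The sketch below assumes synthetic data (`lr_samples` is made up) and a perplexity of 10, picked only because the sample count is small. It records `kl_divergence_`, the final value of the objective that t-SNE minimizes, as one rough signal. Looking at the scatter plot for each setting remains the main check.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical data: 3 Gaussian blobs in 4 dimensions, 20 points each
rng = np.random.default_rng(0)
lr_samples = np.vstack(
    [rng.normal(loc=m, scale=1.0, size=(20, 4)) for m in (0.0, 6.0, 12.0)]
)

results = {}
for lr in (50, 100, 200):
    model = TSNE(learning_rate=lr, perplexity=10, random_state=0)
    embedding = model.fit_transform(lr_samples)
    # Store the embedding shape and the final KL divergence for this run
    results[lr] = (embedding.shape, float(model.kl_divergence_))

for lr, (shape, kl) in results.items():
    print(lr, shape, round(kl, 3))
```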

## Different every time

* t-SNE features are different every time
* Piedmont wines, 3 runs, 3 different scatter plots!
    * However: the wine varieties (colors) have the same position relative to one another
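
The run-to-run variation can be reproduced by changing only the random seed. This is a minimal sketch on synthetic data (`run_samples` is made up). It sets `init="random"` so that the dependence on the seed is explicit, and uses a perplexity of 10 only because the dataset is small.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical data: 3 Gaussian blobs in 4 dimensions, 20 points each
rng = np.random.default_rng(0)
run_samples = np.vstack(
    [rng.normal(loc=m, scale=1.0, size=(20, 4)) for m in (0.0, 6.0, 12.0)]
)

# Same data, same settings, different seeds
emb_a = TSNE(init="random", perplexity=10, random_state=1).fit_transform(run_samples)
emb_b = TSNE(init="random", perplexity=10, random_state=2).fit_transform(run_samples)

# The coordinates are not expected to match between the two runs
print(np.allclose(emb_a, emb_b))
```

Passing a fixed `random_state` is the usual way to make a single run repeatable when a figure needs to be regenerated.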

## Example

In [None]:
# Perform the necessary imports
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Create a TSNE instance: model
model = TSNE(learning_rate=200)

# Apply fit_transform to samples: tsne_features
tsne_features = model.fit_transform(samples)

# Select the 0th feature: xs
xs = tsne_features[:, 0]

# Select the 1st feature: ys
ys = tsne_features[:, 1]

# Scatter plot, coloring by variety_numbers
plt.scatter(xs, ys, c=variety_numbers)
plt.show()

![](../images/example_tsne1.png)

## Example with stocks

In [None]:
# Import TSNE
from sklearn.manifold import TSNE

# Create a TSNE instance: model
model = TSNE(learning_rate=50)

# Apply fit_transform to normalized_movements: tsne_features
tsne_features = model.fit_transform(normalized_movements)

# Select the 0th feature: xs
xs = tsne_features[:, 0]

# Select the 1st feature: ys
ys = tsne_features[:, 1]

# Scatter plot
plt.scatter(xs, ys, alpha=0.5)

# Annotate the points
for x, y, company in zip(xs, ys, companies):
    plt.annotate(company, (x, y), fontsize=5, alpha=0.75)
plt.show()

![](../images/cluster_stocks.png)