The textbook's instructions are written for R. I am going to reproduce them in Python as best I can.

# Preface

From the textbook, p. 416:
> Consider the `USArrests` data. We will now perform hierarchical clustering on the states.

In [23]:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering


sns.set()
%matplotlib inline

In [24]:
usarrests = pd.read_csv(
    'https://raw.githubusercontent.com'
    '/dsnair/ISLR/master/data/csv/USArrests.csv'
).set_index('State')
usarrests.head(3)

| State | Murder | Assault | UrbanPop | Rape |
|---|---|---|---|---|
| Alabama | 13.2 | 236 | 58 | 21.2 |
| Alaska | 10.0 | 263 | 48 | 44.5 |
| Arizona | 8.1 | 294 | 80 | 31.0 |


# (a)

From the textbook, p. 416:
> Using hierarchical clustering with complete linkage and Euclidean distance, cluster the states.

In [25]:
# Complete linkage; the distance defaults to Euclidean (see the repr below)
hclust = AgglomerativeClustering(n_clusters=3, linkage='complete')
hclust.fit(usarrests)

AgglomerativeClustering(affinity='euclidean', compute_full_tree='auto',
                        connectivity=None, distance_threshold=None,
                        linkage='complete', memory=None, n_clusters=3)
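
The repr above lists `distance_threshold=None`. As a minimal sketch of what that parameter does, the example below cuts the tree at a height instead of fixing the cluster count. The synthetic `points` array and the threshold of `5.0` are assumptions chosen for illustration; they are not values from the exercise.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Synthetic stand-in data (an assumption): three tight groups of five 2-D points
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(5, 2)) for c in centers])

# n_clusters must be None when distance_threshold is given; merging stops
# once the complete-linkage distance between clusters reaches the threshold
by_height = AgglomerativeClustering(
    n_clusters=None, distance_threshold=5.0, linkage='complete'
).fit(points)

print(by_height.n_clusters_)  # number of clusters found at this cut height
```

A useful threshold depends entirely on the scale of the data, so on `usarrests` it would have to be chosen by inspecting the merge heights first.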

# (b)

From the textbook, p. 416:
> Cut the dendrogram at a height that results in three distinct clusters. Which states belong to which clusters?

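With `sklearn`'s `AgglomerativeClustering`, the simplest route is to fix the number of clusters up front (`n_clusters=3` above) rather than to cut a dendrogram at a chosen height afterwards. The alternative is the `distance_threshold` parameter listed in the repr above, which requires `n_clusters=None`.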
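
For a workflow closer to the R one (build the full tree, then cut it), `scipy.cluster.hierarchy` is one option. This is only a sketch on synthetic data, not the method used in this notebook: the `points` array and all variable names are assumptions. It reads the merge heights from the linkage matrix and picks a cut height that leaves three clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic stand-in data (an assumption): three tight groups of five 2-D points
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(5, 2)) for c in centers])

# Complete linkage with Euclidean distance builds the full merge tree
merges = linkage(points, method='complete', metric='euclidean')

# Column 2 holds the merge heights in non-decreasing order. Cutting between
# the third-to-last and second-to-last merges leaves exactly three clusters.
cut_height = (merges[-3, 2] + merges[-2, 2]) / 2
labels_by_height = fcluster(merges, t=cut_height, criterion='distance')

# Shortcut: ask for at most three clusters without choosing a height
labels_by_count = fcluster(merges, t=3, criterion='maxclust')
```

Passing `merges` to `scipy.cluster.hierarchy.dendrogram` would draw the tree itself, which is the part `AgglomerativeClustering` does not provide directly.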

In [30]:
clustering_not_scaled = pd.Series(hclust.labels_, index=usarrests.index)
clustering_not_scaled.sort_values()

State
Missouri          0
Wisconsin         0
Montana           0
Nebraska          0
New Hampshire     0
New Jersey        0
North Dakota      0
Ohio              0
Oklahoma          0
Oregon            0
Pennsylvania      0
Rhode Island      0
South Dakota      0
Utah              0
Vermont           0
Virginia          0
Washington        0
West Virginia     0
Minnesota         0
Massachusetts     0
Wyoming           0
Hawaii            0
Maine             0
Arkansas          0
Kentucky          0
Kansas            0
Iowa              0
Indiana           0
Connecticut       0
Delaware          0
Idaho             0
Alaska            1
Tennessee         1
South Carolina    1
Georgia           1
Alabama           1
North Carolina    1
Louisiana         1
Mississippi       1
Illinois          2
Florida           2
New York          2
New Mexico        2
Nevada            2
Texas             2
Colorado          2
California        2
Arizona           2
Maryland          2
Michigan      

# (c)

From the textbook, p. 416:
> Hierarchically cluster the states using complete linkage and Euclidean distance, after scaling the variables to have standard deviation one.

In [31]:
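# Standardize every variable to zero mean and unit standard deviation
x = StandardScaler().fit_transform(usarrests)
# Fit a separate estimator so that `hclust` keeps the unscaled fit from part (a)
hclust_scaled = AgglomerativeClustering(n_clusters=3, linkage='complete')
hclust_scaled.fit(x)
clustering_scaled = pd.Series(hclust_scaled.labels_, index=usarrests.index)
clustering_scaled.sort_values()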

State
Missouri          0
Wisconsin         0
Montana           0
Nebraska          0
New Hampshire     0
New Jersey        0
North Dakota      0
Ohio              0
Oklahoma          0
Oregon            0
Pennsylvania      0
Rhode Island      0
South Dakota      0
Utah              0
Vermont           0
Virginia          0
Washington        0
West Virginia     0
Minnesota         0
Massachusetts     0
Wyoming           0
Hawaii            0
Maine             0
Arkansas          0
Kentucky          0
Kansas            0
Iowa              0
Indiana           0
Connecticut       0
Delaware          0
Idaho             0
Alaska            1
Tennessee         1
South Carolina    1
Georgia           1
Alabama           1
North Carolina    1
Louisiana         1
Mississippi       1
Illinois          2
Florida           2
New York          2
New Mexico        2
Nevada            2
Texas             2
Colorado          2
California        2
Arizona           2
Maryland          2
Michigan      

# (d)

From the textbook, p. 416:
> What effect does scaling the variables have on the hierarchical clustering obtained? In your opinion, should the variables be scaled before the inter-observation dissimilarities are computed? Provide a justification for your answer.

In [32]:
(clustering_not_scaled == clustering_scaled).all()

True
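
An elementwise `==` comparison only matches when both fits assign the same integer IDs to the same clusters, and those IDs are arbitrary. A permutation-invariant check is safer. The sketch below uses made-up label vectors (`labels_a` and `labels_b` are hypothetical, not the notebook's results) to show the difference, using `adjusted_rand_score` and a cross-tabulation.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Hypothetical label vectors: the same partition with the cluster IDs permuted
labels_a = pd.Series([0, 0, 1, 1, 2, 2])
labels_b = pd.Series([2, 2, 0, 0, 1, 1])

# Elementwise equality is fooled by the renumbering
elementwise_equal = bool((labels_a == labels_b).all())

# The adjusted Rand index compares the groupings, not the ID values
ari = adjusted_rand_score(labels_a, labels_b)

# A contingency table shows how the clusters of one fit map onto the other
table = pd.crosstab(labels_a, labels_b, rownames=['a'], colnames=['b'])

print(elementwise_equal, ari)
print(table)
```

Applied to this notebook, the same calls would take `clustering_not_scaled` and `clustering_scaled` as arguments.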

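Here the comparison returns `True`, and the two listings above are identical, but I would treat that with caution rather than conclude that scaling does not matter for this dataset. `clustering_not_scaled` reads `hclust.labels_`, so its contents depend on which data `hclust` was last fit on when that cell ran, and the cell counters jump from 25 to 30, so other cells were executed in between. The variables are also on very different scales: in the rows shown in the preface, `Assault` is in the hundreds while `Murder` is around ten.

I think that, in general, a feature matrix should be scaled before the inter-observation dissimilarities are computed. If the variables have different scales, those with large absolute differences will dominate the Euclidean distance over those with small ones, regardless of their relative differences. Relative differences are what should determine how close or far apart any two points in the dataset are.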
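
To make the point about dominance concrete, the sketch below uses only the three rows printed by `usarrests.head(3)` in the preface and measures how much of the squared Euclidean distance between Alabama and Arizona comes from each variable, before and after standardization. The helper `squared_diff_shares` is a hypothetical name introduced here, and standardizing over three rows is purely illustrative.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# The three rows shown by `usarrests.head(3)` earlier in this notebook
sample = pd.DataFrame(
    {'Murder': [13.2, 10.0, 8.1],
     'Assault': [236, 263, 294],
     'UrbanPop': [58, 48, 80],
     'Rape': [21.2, 44.5, 31.0]},
    index=['Alabama', 'Alaska', 'Arizona'],
)


def squared_diff_shares(frame, a, b):
    """Fraction of the squared Euclidean distance between rows a and b per column."""
    squared = (frame.loc[a] - frame.loc[b]) ** 2
    return squared / squared.sum()


# Standardize using these three rows only (illustrative, not the full dataset)
scaled = pd.DataFrame(
    StandardScaler().fit_transform(sample),
    index=sample.index, columns=sample.columns,
)

raw_shares = squared_diff_shares(sample, 'Alabama', 'Arizona')
scaled_shares = squared_diff_shares(scaled, 'Alabama', 'Arizona')
print(pd.DataFrame({'raw': raw_shares, 'scaled': scaled_shares}).round(2))
```

On the raw values `Assault` accounts for well over half of the squared distance; after standardization its share falls below half and the other variables contribute visibly.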