Merge pull request #52 from pykale/refine
Refine exercises
Mdnaimulislam committed Mar 18, 2023
2 parents 481e184 + bba351a commit 9fd3120
Showing 2 changed files with 17 additions and 13 deletions.
15 changes: 9 additions & 6 deletions content/09-pca-clustering/clustering.ipynb
@@ -894,7 +894,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"**(b)** Using **hierarchical clustering** with **complete linkage** and **Euclidean distance**, cluster the **states**."
+"**(b)** Using **agglomerative hierarchical clustering** with **complete linkage** and **Euclidean distance**, cluster the **states**."
 ]
 },
 {
@@ -977,14 +977,16 @@
 "plt.axhline(y=150, c=\"k\", ls=\"dashed\")\n",
 "plt.show()\n",
 "\n",
-"print(res1)"
+"print(res1)\n",
+"\n",
+"# Cutting the dendrogram at height 150 results in three distinct clusters."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"**(d)** **Hierarchically cluster** the states using **complete linkage** and **Euclidean distance**, after **scaling** the variables to have a **standard deviation** of one, and then **cut the dendrogram** at a height that results in three distinct clusters. Now, which states belong to which clusters?"
+"**(d)** Perform **agglomerative hierarchical clustering** of the states using **complete linkage** and **Euclidean distance** after scaling the variables to have a standard deviation of one. Then, **cut the dendrogram** at a height that results in **six distinct clusters**. Which states belong to which clusters?"
 ]
 },
 {
@@ -1014,7 +1016,7 @@
 "outputs": [],
 "source": [
 "hc_complete2 = linkage(scaled_x, \"complete\")\n",
-"res2 = pd.DataFrame(cut_tree(hc_complete2, n_clusters=3), index=state_name)\n",
+"res2 = pd.DataFrame(cut_tree(hc_complete2, n_clusters=6), index=state_name)\n",
 "res2.index.name = \"states\"\n",
 "res2.rename(columns={0: \"ID\"}, inplace=True)\n",
 "\n",
@@ -1023,10 +1025,11 @@
 "plt.xlabel(\"states\")\n",
 "plt.ylabel(\"Euclidean distance\")\n",
 "dendrogram(hc_complete2, labels=res2.index, leaf_rotation=90, leaf_font_size=8)\n",
-"plt.axhline(y=4.45, c=\"k\", ls=\"dashed\")\n",
+"plt.axhline(y=2.8, c=\"k\", ls=\"dashed\")\n",
 "plt.show()\n",
 "\n",
-"print(res2)"
+"print(res2)\n",
+"# Cutting the dendrogram at height 2.8 results in six distinct clusters."
 ]
 }
 ],
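The clustering change above asks for a cluster count (`n_clusters=6`) and separately draws a dashed line at a cut height. A minimal sketch of how the two relate, assuming SciPy's `linkage` and `cut_tree` as used in the notebook. The data is a synthetic stand-in (`scaled_x` here is random, not the notebook's states data), and taking the midpoint between two merge heights is only one illustrative way to pick a cut line.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree

# Synthetic stand-in for the scaled feature matrix (assumption: not the notebook's data).
rng = np.random.default_rng(0)
scaled_x = rng.normal(size=(50, 4))
n = scaled_x.shape[0]

# Complete linkage; Euclidean distance is the default metric for linkage().
hc = linkage(scaled_x, "complete")

# Option 1: ask for six clusters directly.
labels_by_count = cut_tree(hc, n_clusters=6).ravel()

# Option 2: cut at a height. Column 2 of the linkage matrix holds merge heights.
# Row n-7 is the merge that leaves six clusters, row n-6 the one that leaves five,
# so any height strictly between them yields six clusters.
lower, upper = hc[n - 7, 2], hc[n - 6, 2]
height = (lower + upper) / 2
labels_by_height = cut_tree(hc, height=height).ravel()

print("cut height:", height)
print("clusters found:", len(np.unique(labels_by_height)))
print("same partition:", np.array_equal(labels_by_count, labels_by_height))
```

A height chosen this way can be passed to `plt.axhline` so that the dashed line and the printed cluster assignments stay consistent.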
15 changes: 8 additions & 7 deletions content/09-pca-clustering/pca.ipynb
@@ -608,7 +608,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"**3.** Now, run **PCA** from **sklearn** explaining at least $95\\%$ variance on the **standardised mouse protein dataset** and display the **top ten eigenvalues**. "
+"**3.** Now, run **PCA** from **sklearn** explaining at least $95\\%$ variance on the **standardised mouse protein dataset** and display the **top ten eigenvalues**. **Hint:** In PCA, each eigenvalue of the covariance matrix equals the variance explained by the corresponding principal component."
 ]
 },
 {
@@ -637,19 +637,21 @@
 },
 "outputs": [],
 "source": [
+"from sklearn.decomposition import PCA\n",
+"\n",
 "pca = PCA(n_components=0.95)\n",
 "pca.fit(X)\n",
-"cov_matrix = np.dot(X.T, X) / len(X)\n",
 "\n",
-"for eigenvector in pca.components_[:10]:\n",
-"    print(np.dot(eigenvector.T, np.dot(cov_matrix, eigenvector)))"
+"print(\n",
+"    \"The first 10 explained variance/eigenvalues are: \\n\", pca.explained_variance_[:10]\n",
+")"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"**4.** Using the fitted PCA model in **Exercise 3**, find out the **proportion of explained variance (PVE)** and its **ratio** for the top 10 principal components. "
+"**4.** Using the fitted PCA model in **Exercise 3**, find out the **explained variance ratio** for the top 10 principal components. "
 ]
 },
 {
@@ -678,7 +680,6 @@
 },
 "outputs": [],
 "source": [
-"print(\"The first 10 explained variance are: \\n\", pca.explained_variance_[:10])\n",
 "print(\n",
 "    \"The first 10 explained variance ratio are: \\n\", pca.explained_variance_ratio_[:10]\n",
 ")"
@@ -688,7 +689,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"**5.** After observing the **PVE** from **Exercise 4** carefuly, find out how many **principal components** should we take to preserve $70\\%$ of variance of the data?"
+"**5.** After carefully examining the **explained variance ratio** from **Exercise 4**, how many **principal components** should we keep to preserve $70\\%$ of the variance of the data?"
 ]
 },
 {
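The PCA change replaces a manual eigenvalue computation with `pca.explained_variance_`, and Exercise 5 then asks how many components preserve 70% of the variance. A minimal sketch of both ideas, assuming scikit-learn's `PCA` as in the notebook. The matrix `X` below is synthetic (not the mouse protein dataset), and using `np.searchsorted` on the cumulative ratio is one possible way to answer Exercise 5, not necessarily the notebook's solution. Note that scikit-learn computes `explained_variance_` with an `n - 1` denominator, so it matches the eigenvalues of `np.cov` rather than those of `X.T @ X / len(X)`.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic correlated features, standardised (assumption: stand-in for the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Keep as many components as needed to explain at least 95% of the variance.
pca = PCA(n_components=0.95).fit(X)
k = pca.n_components_

# Eigenvalues of the sample covariance matrix (n - 1 denominator), largest first.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
matches = np.allclose(pca.explained_variance_, eigvals[:k])
print("explained_variance_ equals top eigenvalues:", matches)

# Cumulative explained variance ratio over all components.
full = PCA().fit(X)
cum = np.cumsum(full.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches 70%.
n_for_70 = int(np.searchsorted(cum, 0.70) + 1)
print("components needed for 70% variance:", n_for_70)
```

Passing a float such as `PCA(n_components=0.70)` would let scikit-learn pick that count itself; the cumulative sum just makes the choice visible.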
