minor changes in PCA in KMeans notebook
mazumdarparijat committed Feb 18, 2014
1 parent 9653908 commit d2b1a47
Showing 1 changed file with 7 additions and 4 deletions.
11 changes: 7 additions & 4 deletions doc/ipython-notebooks/clustering/KMeans.ipynb
@@ -774,7 +774,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"KMeans is highly affected by the <i>curse of dimensionality</i>. So, dimension reduction becomes an important preprocessing step. Shogun offers a variety of dimension reduction techniques to choose from. Since our data is not very high dimensional, PCA is a good choice for dimension reduction. We have already seen the accuracy of KMeans when all four dimensions are used. In the following exercise we shall see how the accuracy varies as one chooses lower dimensions to represent data. "
"KMeans is highly affected by the <i>curse of dimensionality</i>. So, dimension reduction becomes an important preprocessing step. Shogun offers a variety of [dimension reduction techniques](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CDimensionReductionPreprocessor.html) to choose from. Since our data is not very high dimensional, PCA is a good choice for dimension reduction. We have already seen the accuracy of KMeans when all four dimensions are used. In the following exercise we shall see how the accuracy varies as one chooses lower dimensions to represent data. "
]
},
{
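For readers skimming this diff without the full notebook, here is a minimal NumPy-only sketch of the projection step that the cell below performs with Shogun's PCA preprocessor. It assumes `obsmatrix` is the 4 x 150 Iris feature matrix used throughout the notebook (features along rows, samples along columns); it is an illustration, not the notebook's own code.

    import numpy as np

    def pca_project(data, target_dims):
        # Center each feature (row) to zero mean.
        centered = data - data.mean(axis=1, keepdims=True)
        # Eigen-decompose the feature covariance matrix.
        eigvals, eigvecs = np.linalg.eigh(np.cov(centered))
        # eigh returns eigenvalues in ascending order, so take the last columns first.
        components = eigvecs[:, ::-1][:, :target_dims]
        # Project the samples onto the leading principal directions.
        return components.T.dot(centered)

    # e.g. reduce the 4-dimensional Iris data to a single latent dimension:
    # reduced = pca_project(obsmatrix, 1)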
@@ -797,6 +797,7 @@
"collapsed": false,
"input": [
"from numpy import dot\n",
"\n",
"def apply_pca_to_data(target_dims):\n",
" train_features = RealFeatures(obsmatrix)\n",
" submean = PruneVarSubMean(False)\n",
@@ -808,6 +809,7 @@
" pca_transform = preprocessor.get_transformation_matrix()\n",
" new_features = dot(pca_transform.T, train_features)\n",
" return new_features\n",
"\n",
"oneD_matrix = apply_pca_to_data(1)"
],
"language": "python",
@@ -981,7 +983,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, we follow the same steps, but skip plotting data (because plotting 3-D is not possible)."
"Again, we follow the same steps, but skip plotting data."
]
},
{
@@ -1023,7 +1025,7 @@
"metadata": {},
"source": [
"STEP 3: Get accuracy of results. In this step, the 'difference' plot positions data points based petal length \n",
" and petal width in the original data. This will enable us to visually campare these results with that of KMeans applied\n",
" and petal width in the original data. This will enable us to visually compare these results with that of KMeans applied\n",
" to 4-Dimensional data (ie. our first result on Iris dataset)"
]
},
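A hypothetical version of the 'difference' plot described above, assuming `obsmatrix` stores the features in the usual Iris order (sepal length, sepal width, petal length, petal width) and that `true_labels` and `labels_1d` hold the ground-truth species and the KMeans assignments (these names are illustrative, not taken from the diff):

    import matplotlib.pyplot as plt

    # Petal length and width are assumed to sit in rows 2 and 3 of obsmatrix.
    petal_length, petal_width = obsmatrix[2], obsmatrix[3]

    fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
    axes[0].scatter(petal_length, petal_width, c=true_labels)
    axes[0].set_title('ground truth')
    axes[1].scatter(petal_length, petal_width, c=labels_1d)
    axes[1].set_title('KMeans on PCA-reduced data')
    for ax in axes:
        ax.set_xlabel('petal length')
        ax.set_ylabel('petal width')
    plt.show()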
@@ -1060,6 +1062,7 @@
"input": [
"from scipy.interpolate import interp1d\n",
"from numpy import linspace\n",
"\n",
"x = array([1, 2, 3, 4])\n",
"y = array([accuracy_1d, accuracy_2d, accuracy_3d, accuracy_4d])\n",
"f = interp1d(x, y)\n",
@@ -1079,7 +1082,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The above plot is not very intuitive. The accuracy obtained by using just one latent dimension is much more than that obtained by taking all four features features. This shows the importance of PCA. Not only does it reduce the complexity of running KMeans, it also enhances results."
"The above plot is not very intuitive theoretically. The accuracy obtained by using just one latent dimension is much more than that obtained by taking all four features features. A plausible explanation could be that the mixing of data points from Iris Versicolour and Iris Virginica is least along the single principal dimension chosen by PCA. Additional dimensions only aggrevate this inter-mixing, thus resulting in poorer clustering accuracy. While there could be other explanations to the observed results, our small experiment has successfully highlighted the importance of PCA. Not only does it reduce the complexity of running KMeans, it also enhances results at times."
]
},
{
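One way to sanity-check the explanation in the last cell is to look at how the two hard-to-separate species spread along the single PCA dimension. A hypothetical check, assuming `oneD_matrix` from the PCA cell and the standard species-wise ordering of the 150 Iris samples (50 per class):

    # Columns 0-49: Setosa, 50-99: Versicolour, 100-149: Virginica (assumed ordering).
    versicolour_1d = oneD_matrix[0, 50:100]
    virginica_1d = oneD_matrix[0, 100:150]

    # The smaller the overlap of these ranges, the cleaner the clustering along PC1.
    print("Versicolour along PC1: [%.2f, %.2f]" % (versicolour_1d.min(), versicolour_1d.max()))
    print("Virginica along PC1:   [%.2f, %.2f]" % (virginica_1d.min(), virginica_1d.max()))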
