Commit

Typo fixes and other minor changes in multiclass KNN notebook.
iglesias committed Feb 12, 2014
1 parent 34b087f commit c07efa5
Showing 1 changed file with 10 additions and 10 deletions.
doc/ipython-notebooks/multiclass/KNN.ipynb (20 changes: 10 additions & 10 deletions)
@@ -27,7 +27,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook details the *K-Nearest Neighbors* (KNN) algorithm. It is a very simple but effectiv algorithm for soling multi-class classification problems."
"This notebook details the *K-Nearest Neighbors* (KNN) algorithm. It is a very simple but effective algorithm for solving multi-class classification problems."
]
},
{
@@ -42,7 +42,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The training of a KNN model basically does nothing but memorizing all the training points and the associated labels, which is very cheap in computation but costly in storage. The prediction is implemented by finding the K nearest neighbors of the query point, and voting. Here K is a hyper-parameter for the algorithm. Smaller K gives the model low bias but high variance; while larger K gives low variance but high bias.\n",
"The training of a KNN model basically does nothing but memorizing all the training points and the associated labels, which is very cheap in computation but costly in storage. The prediction is implemented by finding the K nearest neighbors of the query point, and voting. Here K is a hyper-parameter for the algorithm. Smaller values for K give the model low bias but high variance; while larger values for K give low variance but high bias.\n",
"\n",
"In `SHOGUN`, you can use [CKNN](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CKNN.html) to perform KNN learning. To construct a KNN machine, you must choose the hyper-parameter K and a distance function. Usually, we simply use the standard [CEuclideanDistance](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CEuclideanDistance.html), but in general, any subclass of [CDistance](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CDistance.html) could be used. For demonstration, in this tutorial we select a random subset of 1000 samples from the USPS digit recognition dataset, and run 2-fold cross validation of KNN with varying K."
]
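
As a rough illustration of the API this cell describes, here is a minimal sketch assuming the `modshogun` Python bindings of this era; the toy data and variable names are placeholders, not taken from the notebook:

```python
import numpy as np
from modshogun import RealFeatures, MulticlassLabels, EuclideanDistance, KNN

# Toy stand-in for the USPS subset: Shogun expects one sample per *column*.
X_train = np.random.randn(256, 1000)
y_train = np.random.randint(0, 10, 1000).astype(np.float64)

feats_train = RealFeatures(X_train)
labels_train = MulticlassLabels(y_train)

# Choose the hyper-parameter K and a distance function, as described above.
k = 3
distance = EuclideanDistance(feats_train, feats_train)

knn = KNN(k, distance, labels_train)
knn.train()
```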
@@ -166,7 +166,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Lets plot a few missclassified examples - I guess we all agree that these are notably harder to detect."
"Let's plot a few missclassified examples - I guess we all agree that these are notably harder to detect."
]
},
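
A sketch of one way such a plot could be produced (assumptions: `feats_test`, `X_test` and `y_test` built like the training data above, USPS images stored as 16x16 = 256-dimensional columns, and `pylab` for plotting, as was common in notebooks of this era):

```python
import numpy as np
import pylab as P

pred = knn.apply_multiclass(feats_test).get_labels()
wrong = np.where(pred != y_test)[0]

P.figure(figsize=(8, 2))
for i, idx in enumerate(wrong[:8]):  # show the first 8 misclassified digits
    P.subplot(1, 8, i + 1)
    P.imshow(X_test[:, idx].reshape(16, 16), cmap='Greys')
    P.title(str(int(pred[idx])))     # the (wrong) predicted label
    P.axis('off')
P.show()
```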
{
@@ -225,7 +225,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"So k=3 seems to have been an optimal choice."
"So k=3 seems to have been the optimal choice."
]
},
{
@@ -240,7 +240,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Obviously applying KNN is very costly: For each prediction you have to compare the object against all training objects. While the implementation in `SHOGUN` will use all available CPU cores to parallelize this computation it might still be slow when you have big data sets. In `SHOGUN`, you can use *Cover Trees* to speed up the nearest neighbor searching process in KNN. Just call `set_use_covertree` on the KNN machine to enable or disable this feature. We also show the prediction time comparison with and without Cover Tree in this tutorial. So lets just have a comparison utilizing the data above:"
"Obviously applying KNN is very costly: for each prediction you have to compare the object against all training objects. While the implementation in `SHOGUN` will use all available CPU cores to parallelize this computation it might still be slow when you have big data sets. In `SHOGUN`, you can use *Cover Trees* to speed up the nearest neighbor searching process in KNN. Just call `set_use_covertree` on the KNN machine to enable or disable this feature. We also show the prediction time comparison with and without Cover Tree in this tutorial. So let's just have a comparison utilizing the data above:"
]
},
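
A hedged sketch of such a comparison, assuming (as the text suggests) that toggling `set_use_covertree` and retraining is enough to switch search strategies:

```python
import time

def timed_predict(knn, feats_test):
    # Time only the prediction step, as in the tutorial.
    t0 = time.time()
    pred = knn.apply_multiclass(feats_test)
    return pred, time.time() - t0

knn.set_use_covertree(False)
knn.train()
pred_brute, t_brute = timed_predict(knn, feats_test)

knn.set_use_covertree(True)
knn.train()  # the cover tree is assumed to be built during training
pred_tree, t_tree = timed_predict(knn, feats_test)

print('brute force: %.2fs, cover tree: %.2fs' % (t_brute, t_tree))
```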
{
@@ -268,7 +268,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"So we can significantly speed it up. Lets do a more systematic comparison. For that a helper function is defined to run the evaluation for KNN:"
"So we can significantly speed it up. Let's do a more systematic comparison. For that a helper function is defined to run the evaluation for KNN:"
]
},
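
The helper itself falls outside this hunk; a plausible shape for it, with hypothetical names and built from the pieces introduced above, could be:

```python
import time
from modshogun import KNN, EuclideanDistance, MulticlassAccuracy

def evaluate_knn(feats_train, labels_train, feats_test, labels_test,
                 k=3, use_covertree=False):
    """Train KNN, return (accuracy, prediction time). Hypothetical helper."""
    distance = EuclideanDistance(feats_train, feats_train)
    knn = KNN(k, distance, labels_train)
    knn.set_use_covertree(use_covertree)
    knn.train()
    t0 = time.time()
    pred = knn.apply_multiclass(feats_test)
    elapsed = time.time() - t0
    accuracy = MulticlassAccuracy().evaluate(pred, labels_test)
    return accuracy, elapsed
```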
{
@@ -401,7 +401,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In contrast to KNN - multiclass support vector machines attempt to model the decision function separating each class from one another. They compare examples utilizing similarity measures (so called Kernels) instead of distances like KNN does. When applied, they are in Big-O notation computationally as expensive as KNN but involve another (costly) training step. They do not scale very well to cases with a huge number of classes but usually lead to favorable results when applied to small number of classes cases. So for reference let us compare how a standard multiclass SVM performs wrt. KNN on the mnist data set from above."
"In contrast to KNN - multiclass Support Vector Machines (SVMs) attempt to model the decision function separating each class from one another. They compare examples utilizing similarity measures (so called Kernels) instead of distances like KNN does. When applied, they are in Big-O notation computationally as expensive as KNN but involve another (costly) training step. They do not scale very well to cases with a huge number of classes but usually lead to favorable results when applied to small number of classes cases. So for reference let us compare how a standard multiclass SVM performs wrt. KNN on the mnist data set from above."
]
},
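
For concreteness, one way such a setup could look (GMNPSVM with a Gaussian kernel is an assumption here, not read off this hunk; the parameter values are illustrative):

```python
from modshogun import GaussianKernel, GMNPSVM

width = 80    # Gaussian kernel width, illustrative value
C = 1.0       # regularization constant, illustrative value

kernel = GaussianKernel(feats_train, feats_train, width)
svm = GMNPSVM(C, kernel, labels_train)
svm.train(feats_train)  # the extra (costly) training step mentioned above
```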
{
@@ -434,7 +434,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Lets apply the SVM to the same test data set to compare results:"
"Let's apply the SVM to the same test data set to compare results:"
]
},
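
A minimal sketch of that step, reusing the hypothetical objects from the sketches above:

```python
from modshogun import MulticlassAccuracy

pred_svm = svm.apply_multiclass(feats_test)
accuracy = MulticlassAccuracy().evaluate(pred_svm, labels_test)
print('SVM accuracy on the test set: %.4f' % accuracy)
```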
{
@@ -455,7 +455,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the SVM performs way better on this task - lets apply it to all data we did not use in training."
"Since the SVM performs way better on this task - let's apply it to all data we did not use in training."
]
},
{
@@ -489,7 +489,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The misclassified examples are indeed much harder to to label even for human beings."
"The misclassified examples are indeed much harder to label even for human beings."
]
}
],
