KNN

Alphabet classification on the Letter Recognition Data Set from the UCI Machine Learning Repository.

This dataset contains 20,000 examples. Divide the set so that the first 15,000 examples are for training and the remaining 5,000 for testing.
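
The split described above is positional, not random. A minimal sketch in Python (used here for illustration only; the repository's own code is MATLAB), where `split_train_test` is a hypothetical helper name and the integer list stands in for the imported rows:

```python
def split_train_test(rows, n_train=15000):
    """Return (train, test): the first n_train rows and everything after them."""
    if n_train > len(rows):
        raise ValueError("n_train exceeds the number of rows")
    return rows[:n_train], rows[n_train:]


# Stand-in for the 20,000 imported rows; real rows would hold a label and features.
rows = list(range(20000))
train, test = split_train_test(rows)
print(len(train), len(test))  # 15000 5000
```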

This project contains k-NN classification and the condensed 1-NN algorithm.

The k-NN classifier is called as [testY] = testknn(trainX, trainY, testX, k)

where trainX is an (nTrain * D) data matrix, testX is an (nTest * D) data matrix, trainY is an (nTrain * 1) label vector, testY is an (nTest * 1) label vector, and k is the number of nearest neighbors used for classification.

The condensing step is called as [condensedIdx] = condensedata(trainX, trainY)

where condensedIdx is a vector of indices of the condensed training set.

Algorithm procedure followed

The data is imported (20,000 rows in total).
The data is divided into training and test sets (15,000 : 5,000).
For KNN: sample the training data, call the KNN function, then calculate accuracy and running time.
For CNN: sample the training data, condense it with the CNN function, replace the training set with the condensed result, call KNN, then calculate accuracy and running time.
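
Both branches of the procedure end by measuring accuracy and running time. A minimal Python sketch of those two measurements follows; `accuracy` and `timed` are hypothetical helper names, not functions from the repository's MATLAB code:

```python
import time


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Three of the four predicted letters match the true letters.
print(accuracy(["A", "B", "C", "D"], ["A", "B", "C", "A"]))  # 0.75
```

In a full run, `timed` would wrap the classifier call so that each experiment yields both its predictions and its running time.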

K-NN

K-NN is the k-nearest-neighbours algorithm, run here with several values of K.

Algorithm K-NN

Take the training dataset.
For each row in the test dataset, compute its distance to every row in the training dataset.
Euclidean distance is used here.
Once all distances are calculated, sort them and take the K smallest. These are the K nearest neighbours.
Find the label that occurs most often among those K neighbours.
That label is the classification for the current test row.
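
The steps above can be sketched as follows. This is an illustrative Python version, not the repository's `testknn.m`; `knn_predict` and `euclidean` are hypothetical names, and the toy data is made up for the example. Breaking vote ties in favour of the closer neighbour is an assumption of this sketch.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, test_x, k):
    """Predict a label for every row of test_x by majority vote of k neighbours."""
    test_y = []
    for query in test_x:
        # Distance from the query to every training row, paired with its label.
        distances = [(euclidean(query, row), label)
                     for row, label in zip(train_x, train_y)]
        # Sort by distance only and keep the k closest rows.
        distances.sort(key=lambda pair: pair[0])
        nearest_labels = [label for _, label in distances[:k]]
        # Most frequent label wins; on a tie the label seen first (closest) wins.
        test_y.append(Counter(nearest_labels).most_common(1)[0][0])
    return test_y


train_x = [[0, 0], [0, 1], [5, 5], [6, 5]]
train_y = ["A", "A", "B", "B"]
test_x = [[0.2, 0.1], [5.5, 5.0]]
print(knn_predict(train_x, train_y, test_x, 3))  # ['A', 'B']
```

Sorting every distance list costs O(nTrain log nTrain) per test row, which matches the sorting step described above; a partial selection of the k smallest would also work.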

Condensed KNN (CNN)

This algorithm has 3 parts:

Outlier removal: Removes the outlier rows from the training data.
Creating the prototype: Works on the data left after outlier removal. Elements are taken from that dataset one at a time and added to the prototype dataset when the algorithm requires it.
Updating the training set according to the prototype: The prototype dataset is now used as the training set.

Algorithm CNN:

Take trainX, trainY.
Remove outliers from the data. For each row in the training set:
- Classify it against all the other rows in the training set.
- If it is correctly classified, retain the row.
- Otherwise discard it as an outlier.
Continue with the data left after outlier removal. This is the training data for now.
Randomly pick one row from this training data.
Put it in the prototype dataset, a new dataset that will become the condensed dataset.
For each remaining row, selected in random order:
- Classify it against all the elements in the prototype dataset.
- If the prototype elements classify the row correctly, move on.
- Otherwise add the row to the prototype, on the assumption that the existing prototype data was not enough.
Once all the rows in the training set have been processed, the prototype dataset holds the condensed data.
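
A minimal Python sketch of the procedure above, under stated assumptions: it is not the repository's `condensedata.m`, the names `condense` and `nearest_label` are hypothetical, outlier removal is done as a leave-one-out 1-NN check, and the prototype is built in a single random pass as the steps describe. The toy data is made up for the example.

```python
import math
import random


def nearest_label(query, xs, ys, candidates):
    """Label of the candidate row closest to query (1-NN, Euclidean distance)."""
    best = min(candidates, key=lambda i: math.dist(query, xs[i]))
    return ys[best]


def condense(train_x, train_y, seed=0):
    """Return sorted indices of a condensed training set."""
    n = len(train_x)
    # Part 1: outlier removal. Keep a row only if its nearest other row agrees.
    kept = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        if not others or nearest_label(train_x[i], train_x, train_y, others) == train_y[i]:
            kept.append(i)
    # Part 2: visit the kept rows in random order and grow the prototype set.
    order = kept[:]
    random.Random(seed).shuffle(order)
    prototype = []
    for i in order:
        # The first row always enters; later rows enter only if misclassified.
        if not prototype or nearest_label(train_x[i], train_x, train_y, prototype) != train_y[i]:
            prototype.append(i)
    return sorted(prototype)


# Two well-separated clusters plus one mislabelled row (index 6).
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [0.1, 2.5]]
train_y = ["A", "A", "A", "B", "B", "B", "B"]
condensed_idx = condense(train_x, train_y)
print(condensed_idx)  # two indices, one per cluster; which ones depends on the shuffle
```

Part 3 then amounts to indexing the training data with the returned indices before calling the classifier. Hart's original condensed nearest neighbour rule repeats the pass until no further rows are added; the single pass here follows the description above.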

%%%%%%%%%%% Confusion matrix

%%%%%%%%%%% Experiments

Number of experiments: 2 (algorithms) * 5 (values of K) * 6 (values of N) = 60 total experiments.

Accuracy:
- For KNN:
-- Accuracy was observed to decrease with every increase in K.
-- Accuracy increases with every increase in subsample size.
- For CNN:
-- Accuracy was observed to decrease with every increase in K.
-- Accuracy increases with every increase in subsample size.

Running time:
- For KNN:
-- Time was observed to increase with every increase in K.
-- Time increases with every increase in subsample size.
- For CNN:
-- Overall KNN time decreased once the dataset had been condensed.
-- Time increases with every increase in subsample size.
-- For different K at the same subsample size, the time varies because condensing is a random process, but it does not differ by much.

%%%%%%%%%%% Problems Faced

Since this is a high-dimensional space, the running time was observed to be high, and it grows as the dataset grows.
Another problem was keeping track of the training data, which is reused many times to compute distances.
