Author: João Nuno Carvalho
Date: 2017
License: MIT Open Source License
Pre-requisites:
Install the free Anaconda for Python 3.6.
Procedure:
1º Start with a table of data in an Excel worksheet.
Each row is one item that you want to cluster. At the end, this program adds a new column with the cluster ID of each row, next to the name (second column).
2º Save the Excel file as a *.csv file (Comma Separated Values).
3º From the Start menu, open the Anaconda Prompt, go to the directory where you have your CSV file and the code file, and start Jupyter Notebook by running the command “jupyter notebook”. In the file list, double click on the code file to open it.
4º In the program, change the name of the input file to your CSV file, and set the number of clusters that you want to generate.
5º Execute all the cells. This generates a new CSV file whose name ends in “K_means”.
6º Open the file in Excel and apply a filter on the new column data to see the elements of the separate clusters.
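The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code in the notebook: it uses the standard library only, it assumes the first column holds the row name and every other column is numeric, and the names kmeans, cluster_csv and cluster_id are hypothetical. A library implementation such as scikit-learn's KMeans could replace the kmeans helper.

```python
import csv
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's K-means; returns one cluster ID (0..k-1) per point."""
    rng = random.Random(seed)
    # Pick k rows at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_csv(input_path, output_path, k):
    """Read a CSV table, cluster its rows, and write a copy with cluster IDs."""
    with open(input_path, newline="") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]
    # Assumption: column 0 is the row name, all remaining columns are numeric.
    points = [[float(v) for v in row[1:]] for row in data]
    labels = kmeans(points, k)
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        # The cluster ID goes right after the name, as the second column.
        writer.writerow([header[0], "cluster_id"] + header[1:])
        for row, label in zip(data, labels):
            writer.writerow([row[0], label] + row[1:])
    return labels
```

For example, cluster_csv("my_table.csv", "my_table_K_means.csv", 7) would write the clustered copy of a file called my_table.csv (a placeholder name) using 7 clusters. The result depends on the random starting centroids, so a different seed can give a different grouping.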
Zoo Data Set (Artificial, 7 classes of animals)
UCI - Machine Learning Repository
See file Excel_table_clustering_code_using_K-Means_in_Python.ipynb