Data Mining - A KMeans implementation from Scratch (no Sci-kit)

Getting Started

iris.arff: This is the test data file that the kmeans clustering runs on.
kmeans.py: This is the file that contains the main script to run the clustering on the iris dataset.
parsing.py: This is the python file used to read in the data from the arff file.
Assignment-3-Report.pdf: This is the file that contains our overall report and contribution sheet.

To run the clustering, execute the following command: python3 kmeans.py <k_value> <epsilon_value> <number_of_iterations>
- k_value is the number of clusters the program will produce.
- epsilon_value is the change in the sum of the distances from the cluster centers.
- number_of_iterations is the number of iterations the program will go through to find the best clustering.
To run finding and plotting the optimal k values of the dataset, uncomment line 184 in kmeans.py

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
__pycache__		__pycache__
README.md		README.md
iris.arff		iris.arff
kmeans.py		kmeans.py
parsing.py		parsing.py