sehl/Digit_image_clusters

Class Cluster Analysis

Python code to extract digit 'types' from the Kaggle/MNIST Digit Recognizer dataset. The training data is first sorted by label, then the images for each digit are run through a k-means clustering algorithm to extract different styles of handwriting.
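Sorting by label amounts to collecting each digit's pixel rows together before clustering. Here is a minimal sketch of that step using only the standard library. It assumes the Kaggle CSV layout (a `label` column followed by pixel columns); the function name `group_pixels_by_label` and the three-pixel sample rows are made up for illustration and are not taken from this repository's code.

```python
import csv
import io
from collections import defaultdict


def group_pixels_by_label(csv_file):
    """Return {digit: [pixel_row, ...]} from a Kaggle-style CSV file object.

    Hypothetical helper: assumes a header row containing a 'label' column,
    with every other column holding an integer pixel value.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    label_col = header.index("label")
    groups = defaultdict(list)
    for row in reader:
        if not row:  # skip blank lines
            continue
        label = int(row[label_col])
        pixels = [int(v) for i, v in enumerate(row) if i != label_col]
        groups[label].append(pixels)
    return dict(groups)


# Tiny stand-in for the training file (real rows have 784 pixel columns).
sample = io.StringIO(
    "label,pixel0,pixel1,pixel2\n"
    "2,0,255,0\n"
    "4,10,20,30\n"
    "2,5,5,5\n"
)
groups = group_pixels_by_label(sample)
for digit in sorted(groups):
    print(digit, len(groups[digit]))
```

Each list in `groups` can then be clustered on its own, so the cluster means describe variations within one digit rather than differences between digits.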

Visualization Description

It occurred to me that people write certain digits in different ways, for instance what I call the 'z' two versus the 'loopy' two:

[Images: 'z' two and 'loopy' two]

Or the 'closed' four versus the 'open' four:

[Images: 'closed' four and 'open' four]

So, using the sklearn KMeans module, I extracted four, ten, and twenty clusters for each digit in the labeled training set. Since the KMeans algorithm initializes randomly, running it with different seeds will slightly alter which digit attributes are extracted in each cluster set (I did not try other seeds, for continuity). Even so, I found it interesting that the 'one with a hat' didn't come out clearly until the 20-cluster set, and that the 'seven with a mustache' shows up vaguely in two of the 10-cluster means but is very clear in only one of the 20-cluster means. Also, some people write really slanted ones!
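The clustering step for one digit could look roughly like the sketch below. It uses `sklearn.cluster.KMeans` with a fixed `random_state` so repeated runs give the same cluster means. The seed value, `n_init=10`, the helper name `cluster_means`, and the random stand-in data are all assumptions for illustration; the settings actually used in this repository are not stated above.

```python
import numpy as np
from sklearn.cluster import KMeans

SEED = 0  # assumed value; any fixed integer makes runs repeatable


def cluster_means(images, n_clusters, seed=SEED):
    """Fit k-means on flattened 28x28 images and return the means as images.

    Hypothetical helper: `images` is an (n_samples, 784) array holding the
    training rows for a single digit.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    km.fit(images)
    # Each cluster center is a 784-vector; reshape it back to a 28x28 image.
    return km.cluster_centers_.reshape(n_clusters, 28, 28)


# Random stand-in for one digit's rows (real data would come from train.csv).
rng = np.random.default_rng(0)
fake_digit_images = rng.random((200, 784))

# Four, ten, and twenty clusters, as in the text above.
centers_by_k = {k: cluster_means(fake_digit_images, k) for k in (4, 10, 20)}
for k, centers in centers_by_k.items():
    print(k, centers.shape)
```

Passing a different `seed` changes the initial centers, which is why the styles that surface in each cluster set can shift slightly between runs.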

[Images: 4 clusters per digit, 10 clusters per digit, and 20 clusters per digit]
