# Description
In this programming assignment, you are required to implement the k-means algorithm and apply it to a real-life data set.

## Input
The provided input file (`places.txt`) consists of the locations of 300 places in the US. Each location is a two-dimensional point that represents the longitude and latitude of the place. For example, "-112.1,33.5" means the longitude of the place is -112.1, and the latitude is 33.5.
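
As an illustration of this input format, the sketch below parses one line into a pair of floats. The helper names `parse_location` and `read_locations` are hypothetical and are not part of the assignment.

```python
def parse_location(line):
    """Parse one 'longitude,latitude' line into a (lon, lat) tuple of floats."""
    lon_text, lat_text = line.strip().split(",")
    return float(lon_text), float(lat_text)


def read_locations(path):
    """Read every non-empty line of the file at `path` as a (lon, lat) tuple."""
    with open(path) as handle:
        return [parse_location(line) for line in handle if line.strip()]


print(parse_location("-112.1,33.5"))  # prints (-112.1, 33.5)
```

Calling `read_locations("places.txt")` on the provided file should then yield a list of 300 tuples, assuming the file contains only lines in this format.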

## Output
You are required to implement the k-means algorithm and use it to cluster the 300 locations into three clusters, such that the locations in the same cluster are geographically close to each other.
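
For reference, k-means (Lloyd's algorithm) alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. A minimal NumPy sketch follows. The function name `kmeans`, its parameters, the random initialisation from input points, and the use of plain Euclidean distance on raw longitude/latitude values are all assumptions made for illustration, not requirements stated in the assignment.

```python
import numpy as np


def kmeans(points, k=3, max_iter=100, seed=0):
    """Cluster `points` (an n x 2 array) into k groups with Lloyd's algorithm."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct input points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(max_iter):
        # Assignment step: squared Euclidean distance from each point to each centroid.
        distances = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        # A centroid with no assigned points keeps its previous position.
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Tiny demo: two points near the origin and two points far away, with k = 2.
demo_labels, demo_centroids = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
print(demo_labels)
```

The numbering of the clusters depends on the random initialisation, so two runs with different seeds can produce the same grouping under different labels.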

After reading the 300 locations in `places.txt` and applying the k-means algorithm (with k = 3), you are required to generate an output file named `clusters.txt`. The output file should contain exactly 300 lines, where each line gives the cluster label of one location. Every line should be in the format `location_id cluster_label`.

An example snippet of the output `clusters.txt` file is provided below:

    0 1
    1 0
    2 1
    3 2
    4 0

In the above, the five lines give the cluster labels of the first five locations in the input file:

* The first location belongs to cluster 1.
* The second location belongs to cluster 0.
* The third location belongs to cluster 1.
* The fourth location belongs to cluster 2.
* The fifth location belongs to cluster 0.
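
The output format described above can be produced with a few lines of plain Python. This is a sketch only: the helper names `cluster_lines` and `write_clusters` are hypothetical, and location ids are assumed to start at 0 and follow the order of the input file, as in the example snippet.

```python
def cluster_lines(labels):
    """Return one 'location_id cluster_label' string per location, ids from 0."""
    return [f"{location_id} {int(label)}" for location_id, label in enumerate(labels)]


def write_clusters(labels, path):
    """Write the formatted lines to `path`, one location per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(cluster_lines(labels)) + "\n")


# Labels of the first five locations from the example snippet.
print("\n".join(cluster_lines([1, 0, 1, 2, 0])))
```

With the full list of 300 labels, `write_clusters(labels, "clusters.txt")` would write the 300-line file the assignment asks for.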


In [2]:
import pandas as pd
from sklearn.cluster import KMeans

In [3]:
# places.txt has no header row; each line is "longitude,latitude".
data = pd.read_csv('./places.txt', names=['lon', 'lat'])

In [4]:
data.head()

|   | lon | lat |
|---|---|---|
| 0 | -112.070792 | 33.451625 |
| 1 | -112.065542 | 33.449298 |
| 2 | -112.073931 | 33.456491 |
| 3 | -112.074866 | 33.470116 |
| 4 | -80.52569 | 43.477099 |


In [7]:
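# http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
# Recent scikit-learn releases no longer accept n_jobs in KMeans, so it is omitted.
clf = KMeans(n_clusters=3)
# Fit on the two coordinate columns only and store each row's cluster label.
data['place'] = clf.fit_predict(data[['lon', 'lat']])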

In [8]:
# The DataFrame index is written as location_id, followed by the cluster label.
data.to_csv('clusters.txt', sep=' ', columns=['place'], header=False)

In [9]:
data.head()

|   | lon | lat | place |
|---|---|---|---|
| 0 | -112.070792 | 33.451625 | 1 |
| 1 | -112.065542 | 33.449298 | 1 |
| 2 | -112.073931 | 33.456491 | 1 |
| 3 | -112.074866 | 33.470116 | 1 |
| 4 | -80.52569 | 43.477099 | 0 |


## Other Implementations
* http://stanford.edu/~cpiech/cs221/handouts/kmeans.html
* http://benalexkeen.com/k-means-clustering-in-python/
* https://gist.github.com/iandanforth/5862470
* https://github.com/stuntgoat/kmeans/blob/master/kmeans.py