## Minimizing Total Distance Traveled by All Guests to the Gambill/Wykosky Wedding

Sherri and I decided to hold our wedding ceremony and reception in Phoenixville, PA. I wondered where we should have held them to minimize the total distance traveled by all of our guests. It turns out we should have held them farther west and farther south. Here is how I came to that conclusion.

### The first step was to read in the zip code of every guest who will attend. If I didn't know the address of a guest's 'plus one', I assigned that guest's address to the plus one.

In [106]:
import requests
import pandas as pd
import numpy as np
import plotly.plotly as py
from plotly.graph_objs import *
from sklearn.cluster import KMeans

# read in data
zip_codes = pd.read_csv('zip-codes.csv',header=None,dtype={0:str})

# Display first 5 zip codes
zip_codes.head()

| | 0 |
|---|---|
| 0 | 19428 |
| 1 | 19056 |
| 2 | 18017 |
| 3 | 18017 |
| 4 | 19087 |
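The `dtype={0: str}` argument in the read step matters because zip codes are labels rather than numbers. The sketch below uses a made-up two-row file (not the real guest list) to show what happens to a zip code with a leading zero when pandas is left to infer the type.

```python
import io
import pandas as pd

# Hypothetical two-row file; the second zip code starts with a zero.
sample_csv = io.StringIO("19428\n08540\n")

# Without a dtype, pandas infers integers and the leading zero is lost.
as_numbers = pd.read_csv(sample_csv, header=None)

# Rewind the in-memory file and read it again as strings.
sample_csv.seek(0)
as_strings = pd.read_csv(sample_csv, header=None, dtype={0: str})

print(as_numbers[0].tolist())  # [19428, 8540]
print(as_strings[0].tolist())  # ['19428', '08540']
```

Keeping the values as strings also means they can be concatenated directly into the request URL later on.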


### Next, I queried the Google Maps Geocoding API for the latitude, longitude and name of each zip code and stored each value in its own list.

In [107]:
url = 'https://maps.googleapis.com/maps/api/geocode/json?components=postal_code:'
api_key = 'AIzaSyABbZY9WgmXVVAwVLuMPvR97JY8V5eDOcI'
mapbox_access_token = 'pk.eyJ1IjoiY2hlbHNlYXBsb3RseSIsImEiOiJjaXFqeXVzdDkwMHFrZnRtOGtlMGtwcGs4In0.SLidkdBMEap9POJGIe1eGw'

lat_list = []
lon_list = []
name_list = []
for zip_code in zip_codes[0]:
    # Geocode the zip code and keep the first match returned
    r = requests.get(url + zip_code + '&key=' + api_key).json()
    result = r['results'][0]
    coords = result['geometry']['location']
    lat_list.append(coords['lat'])
    lon_list.append(coords['lng'])
    name_list.append(result['formatted_address'])
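The loop above assumes every request returns at least one result. A minimal sketch of a more defensive parser follows. The helper name `extract_location` is hypothetical, and the sample payloads are hand-written with made-up values. They contain only the fields the loop reads, plus the top-level `status` field the Geocoding API includes in its responses.

```python
def extract_location(payload):
    """Return (lat, lng, formatted_address) from a geocoding payload, or None.

    Hypothetical helper: it treats any status other than 'OK', or an empty
    result list, as "no location found" instead of raising an IndexError.
    """
    if payload.get('status') != 'OK' or not payload.get('results'):
        return None
    top = payload['results'][0]
    location = top['geometry']['location']
    return location['lat'], location['lng'], top['formatted_address']


# Hand-written payloads for illustration only (values are made up).
ok_payload = {
    'status': 'OK',
    'results': [{
        'formatted_address': 'Example Town, PA 00000, USA',
        'geometry': {'location': {'lat': 40.0, 'lng': -75.0}},
    }],
}
empty_payload = {'status': 'ZERO_RESULTS', 'results': []}

print(extract_location(ok_payload))     # (40.0, -75.0, 'Example Town, PA 00000, USA')
print(extract_location(empty_payload))  # None
```

With a helper like this, the loop could skip or log zip codes that fail to geocode rather than stopping partway through the guest list.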


### Next, I used a well-known machine learning algorithm called K-Means. With a single cluster, K-Means finds the point at the center of all the coordinates, called the centroid. The centroid is the average of the guests' zip code coordinates, so it minimizes the total squared distance to them, which I use here as an approximation of the point that minimizes total travel distance.

In [108]:
# A single cluster makes K-Means return the mean of all guest coordinates
k_means = KMeans(n_clusters=1)

# One row per guest: [latitude, longitude]
coord_array = np.column_stack((lat_list, lon_list))
k_means.fit(coord_array)
centroid = k_means.cluster_centers_

cent_url = 'https://maps.googleapis.com/maps/api/geocode/json?latlng={},{}'.format(centroid[0][0],centroid[0][1])
r = requests.get(cent_url + '&key=' + api_key).json()
centroid_address = r['results'][0]['formatted_address']

print("The centroid is located at {} degrees latitude, {} degrees longitude".format(centroid[0][0],centroid[0][1]))
print("The street address of the centroid is {}".format(centroid_address))

The centroid is located at 40.0445381007 degrees latitude, -76.8842732224 degrees longitude
The street address of the centroid is 2020 Conewago Rd, Dover, PA 17315, USA
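A one-cluster K-Means center is the arithmetic mean of the points, which minimizes the sum of squared distances. The point that minimizes the plain sum of distances is the geometric median instead. This is a different technique from the one used in this notebook, shown only for comparison. The sketch below uses Weiszfeld's iteration on made-up coordinates; the function names and sample points are assumptions, and it still treats latitude and longitude as flat x/y values.

```python
import numpy as np


def geometric_median(points, n_iter=200, tol=1e-9):
    """Approximate the geometric median with Weiszfeld's iteration."""
    points = np.asarray(points, dtype=float)
    estimate = points.mean(axis=0)  # start from the centroid
    for _ in range(n_iter):
        dists = np.linalg.norm(points - estimate, axis=1)
        dists = np.maximum(dists, 1e-12)  # avoid dividing by zero on a data point
        weights = 1.0 / dists
        new_estimate = (points * weights[:, None]).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate


def total_distance(points, center):
    """Sum of straight-line distances (in degrees) from each point to center."""
    return np.linalg.norm(np.asarray(points, dtype=float) - center, axis=1).sum()


# Made-up guest coordinates: three near Philadelphia, one far to the west.
sample = np.array([[40.0, -75.0], [40.1, -75.2], [40.2, -75.1], [34.0, -118.0]])

mean_point = sample.mean(axis=0)
median_point = geometric_median(sample)

print(total_distance(sample, mean_point))
print(total_distance(sample, median_point))
```

In this sample the single distant guest pulls the mean far to the west, while the geometric median stays close to the three nearby guests and gives a smaller total distance.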


### We should have held the wedding near Dover, PA. By this measure, that location would have come closest to minimizing the total distance traveled by all guests.

In [123]:
data = Data([
    Scattermapbox(
        lat=lat_list,
        lon=lon_list,
        mode='markers',
        marker=Marker(
            size=8,
            color='blue',
            opacity=0.6
        ),
        text=name_list,
        hoverinfo='text',
        showlegend=False
    ),
    Scattermapbox(
        lat=[centroid[0][0]],
        lon=[centroid[0][1]],
        mode='markers',
        marker=Marker(
            size=12,
            color='red',
            opacity=0.6
        ),
        text='Centroid: {}'.format(centroid_address),
        hoverinfo='text',
        showlegend=False
    ),
    Scattermapbox(
        lat=[40.1531839],
        lon=[-75.4946136],
        mode='markers',
        marker=Marker(
            size=12,
            color='cyan',
            opacity=0.6
        ),
        text='Wedding Location',
        hoverinfo='text',
        showlegend=False
    )
])
        
layout = Layout(
    title='Guest Locations in Blue, Centroid in Red, Wedding Location in Cyan',
    autosize=True,
    hovermode='closest',
    mapbox=dict(
        accesstoken=mapbox_access_token,
        bearing=0,
        center=dict(
            lat=38,
            lon=-94
        ),
        pitch=0,
        zoom=3,
        style='light'
    )
)

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='Guest Locations in Blue, Centroid in Red, Wedding Location in Cyan')
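To put a number on how far apart the two candidate venues are, one option is the haversine great-circle distance. This is a minimal sketch: the helper names `haversine_km` and `total_km` are hypothetical, the Earth is assumed to be a sphere with a mean radius of 6371 km, and the two coordinate pairs are the wedding location and centroid shown earlier.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0  # mean Earth radius (spherical approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def total_km(guest_coords, venue):
    """Sum of great-circle distances from every guest to one venue."""
    return sum(haversine_km(lat, lon, venue[0], venue[1])
               for lat, lon in guest_coords)


wedding = (40.1531839, -75.4946136)             # wedding location from the plot
centroid_point = (40.0445381007, -76.8842732224)  # centroid printed earlier

gap_km = haversine_km(wedding[0], wedding[1],
                      centroid_point[0], centroid_point[1])
print("Wedding location to centroid: {:.1f} km".format(gap_km))
```

In the notebook, `total_km(list(zip(lat_list, lon_list)), wedding)` and `total_km(list(zip(lat_list, lon_list)), centroid_point)` would give the total straight-line travel to each venue. These are as-the-crow-flies distances, not driving distances.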