# Subway Station Cleaning & Adding Starbucks

The purpose of this notebook is to build a dataframe that contains each subway station, the station's coordinates, and the three Starbucks locations closest to that station.

In [51]:
import numpy as np
import pandas as pd 
import requests
import time
import json

In [54]:
def read_json(json_file):
    with open(json_file) as f:
        return json.load(f)
config = read_json('key.json')

We created a function here that reads our API key from a local JSON file, so the key never appears in the notebook where someone else could copy and use it.
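A minimal sketch of the same idea, assuming `key.json` has the shape `{"key": "..."}` (inferred from the later `config['key']` lookup). The helper name `load_api_key` and the environment variable `GOOGLE_API_KEY` are hypothetical additions for illustration, not part of the original notebook. The file itself still has to be kept out of version control for the key to stay private.

```python
import json
import os
import tempfile


def load_api_key(json_file, field="key", env_var="GOOGLE_API_KEY"):
    """Return the key from the environment if set, otherwise from a local JSON file."""
    env_value = os.environ.get(env_var)
    if env_value:
        return env_value
    with open(json_file) as f:
        return json.load(f)[field]


# Demonstration with a throwaway file standing in for key.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "key.json")
    with open(demo_path, "w") as f:
        json.dump({"key": "not-a-real-key"}, f)
    # An env var name that is not expected to exist, so the file is used.
    demo_key = load_api_key(demo_path, env_var="DEMO_UNSET_VARIABLE_FOR_SKETCH")
```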

In [3]:
starbucks = pd.read_csv('Data/Starbucks.csv')


In [4]:
station = pd.read_csv('Data/StationEntrances.csv')

In [5]:
station.head()

Unnamed: 0,Division,Line,Station_Name,Station_Latitude,Station_Longitude,Route_1,Route_2,Route_3,Route_4,Route_5,...,Staffing,Staff_Hours,ADA,ADA_Notes,Free_Crossover,North_South_Street,East_West_Street,Corner,Latitude,Longitude
0,BMT,Astoria,Ditmars Blvd,40.775036,-73.912034,N,Q,,,,...,FULL,,False,,True,31st St,23rd Ave,NW,40.775149,-73.912074
1,BMT,Astoria,Ditmars Blvd,40.775036,-73.912034,N,Q,,,,...,FULL,,False,,True,31st St,23rd Ave,NE,40.77481,-73.912151
2,BMT,Astoria,Ditmars Blvd,40.775036,-73.912034,N,Q,,,,...,FULL,,False,,True,31st St,23rd Ave,NE,40.775025,-73.911891
3,BMT,Astoria,Ditmars Blvd,40.775036,-73.912034,N,Q,,,,...,FULL,,False,,True,31st St,23rd Ave,NW,40.774938,-73.912337
4,BMT,Astoria,Astoria Blvd-Hoyt Av,40.770258,-73.917843,N,Q,,,,...,FULL,,False,,True,31st St,Hoyt Ave South,SW,40.770313,-73.917978


We read in the Starbucks and station data sets, both of which include location coordinates.

In [6]:
starbucks = starbucks.loc[(starbucks['State/Province'] == 'NY') & (starbucks['City'].isin(['Flushing','New York','Astoria','Manhattan', 'Kew Gardens','Forest Hills','Jamaica']))]

We filter the Starbucks data down to the locations around the MTA 'E' subway line. We had to guess and check the towns surrounding the various subway stops; the cities listed in the filter are the ones that came back with Starbucks locations in them.

In [7]:
starbucks.head()

Unnamed: 0,Brand,Store Number,Store Name,Ownership Type,Street Address,City,State/Province,Country,Postcode,Phone Number,Timezone,Longitude,Latitude
20878,Starbucks,14840-129334,31-01 Broadway,Company Owned,31-01 Broadway,Astoria,NY,US,111062648,718-777-1305,GMT-05:00 America/New_York,-73.93,40.76
20879,Starbucks,22638-223164,Astoria Blvd & 31st St,Company Owned,30-18 Astoria Blvd,Astoria,NY,US,11102,7182781518,GMT-05:00 America/New_York,-73.92,40.77
20880,Starbucks,9663-97893,35th Ave and 37th St,Company Owned,3711 35th Avenue,Astoria,NY,US,11106,718-706-0464,GMT-05:00 America/New_York,-73.92,40.76
20881,Starbucks,7567-18640,31-44 Steinway Street,Company Owned,31-44 Steinway Street,Astoria,NY,US,111033911,718-274-5700,GMT-05:00 America/New_York,-73.92,40.76
20882,Starbucks,7555-14690,31St St & Ditmars Blvd,Company Owned,22-04 31st Street,Astoria,NY,US,111052714,718-626-6004,GMT-05:00 America/New_York,-73.91,40.78


In [8]:
starbucks.shape

(255, 13)

In [55]:
starbucks.isnull().sum()

Brand             0
Store Number      0
Store Name        0
Ownership Type    0
Street Address    0
City              0
State/Province    0
Country           0
Postcode          0
Phone Number      2
Timezone          0
new_street        0
combined          0
latitude          0
longitude         0
dtype: int64

Although there are two null values in the Phone Number column, that column is not needed for our analysis, so it was not worth dropping those rows.

In [9]:
len(starbucks['Store Number'].unique())

255

We checked the number of unique store numbers to confirm that there are no duplicates, since Store Number is the unique identifier column. The count (255) matches the number of rows.
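The same check can be written more directly. This is a small sketch on a toy frame (three store numbers copied from the output above), not the full data set; `is_unique` answers the yes/no question and `duplicated(keep=False)` lists any offending rows.

```python
import pandas as pd

# Toy frame with three store numbers taken from the rows shown earlier.
demo = pd.DataFrame(
    {"Store Number": ["14840-129334", "22638-223164", "9663-97893"]}
)

# True when every store number appears exactly once.
no_duplicates = demo["Store Number"].is_unique

# Rows that share a store number with another row (empty here).
repeated = demo[demo["Store Number"].duplicated(keep=False)]
```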

The coordinates provided in the data set are rounded to two decimal places, which is not exact enough for our purposes, so we looked up each store in the Google API instead. To do that we needed the address as a single string, so we removed all spaces from the street address and created a new column to search on.

In [10]:
starbucks['new_street'] = [i.replace(" ","") for i in starbucks['Street Address']] 

In [11]:
starbucks['combined'] = starbucks['new_street']+starbucks['City']

In [12]:
starbucks['combined']

20878              31-01BroadwayAstoria
20879           30-18AstoriaBlvdAstoria
20880             371135thAvenueAstoria
20881        31-44SteinwayStreetAstoria
20882            22-0431stStreetAstoria
                      ...              
21356      77-83West125thStreetNew York
21357              1740BroadwayNew York
21358                325W49thStNew York
21359    684AvenueoftheAmericasNew York
21360        30RockefellerPlazaNew York
Name: combined, Length: 255, dtype: object

We engineered a `combined` column that joins the space-free street address and the city into a single string.
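For reference, the same column can be built with pandas string methods instead of a list comprehension. This is a sketch on two addresses taken from the output above; the result should match the notebook's `combined` values for those rows.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "Street Address": ["31-01 Broadway", "325 W 49th St"],
        "City": ["Astoria", "New York"],
    }
)

# Remove spaces from the street address only; the city keeps its spaces.
demo["new_street"] = demo["Street Address"].str.replace(" ", "", regex=False)
demo["combined"] = demo["new_street"] + demo["City"]
```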

In [13]:
starbucks.head()

Unnamed: 0,Brand,Store Number,Store Name,Ownership Type,Street Address,City,State/Province,Country,Postcode,Phone Number,Timezone,Longitude,Latitude,new_street,combined
20878,Starbucks,14840-129334,31-01 Broadway,Company Owned,31-01 Broadway,Astoria,NY,US,111062648,718-777-1305,GMT-05:00 America/New_York,-73.93,40.76,31-01Broadway,31-01BroadwayAstoria
20879,Starbucks,22638-223164,Astoria Blvd & 31st St,Company Owned,30-18 Astoria Blvd,Astoria,NY,US,11102,7182781518,GMT-05:00 America/New_York,-73.92,40.77,30-18AstoriaBlvd,30-18AstoriaBlvdAstoria
20880,Starbucks,9663-97893,35th Ave and 37th St,Company Owned,3711 35th Avenue,Astoria,NY,US,11106,718-706-0464,GMT-05:00 America/New_York,-73.92,40.76,371135thAvenue,371135thAvenueAstoria
20881,Starbucks,7567-18640,31-44 Steinway Street,Company Owned,31-44 Steinway Street,Astoria,NY,US,111033911,718-274-5700,GMT-05:00 America/New_York,-73.92,40.76,31-44SteinwayStreet,31-44SteinwayStreetAstoria
20882,Starbucks,7555-14690,31St St & Ditmars Blvd,Company Owned,22-04 31st Street,Astoria,NY,US,111052714,718-626-6004,GMT-05:00 America/New_York,-73.91,40.78,22-0431stStreet,22-0431stStreetAstoria


In [14]:
starbucks_coords = []
starbucks_store = []
missing_cords = []
for i in starbucks['combined']:
    try:
        # Double quotes outside so that config['key'] can use single quotes inside the f-string
        response = requests.get(f"https://maps.googleapis.com/maps/api/geocode/json?address={i}ny&key={config['key']}")

        results = response.json()

        starbucks_coords.append(results['results'][0]['geometry']['location'])
        starbucks_store.append(i)
    except (KeyError, IndexError, ValueError, requests.RequestException):  # no usable result for this address
        missing_cords.append(i)
    time.sleep(2)

We were able to pull coordinates for 247 of the 255 stores that we sent to the Google geocoding API.
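Several of the addresses that failed contain a `#` character. In a URL, `#` starts the fragment, which is not sent to the server, so anything after it (including the `key` parameter) may have been cut off. This is one plausible explanation and was not verified here. The sketch below shows how percent-encoding the query avoids the problem; `build_geocode_url` is a hypothetical helper and no request is actually sent. Passing the same dictionary to `requests.get(url, params=...)` applies equivalent encoding.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Build a geocoding URL with the query string percent-encoded."""
    query = urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_URL}?{query}"


# One of the addresses that came back without coordinates, with the
# notebook's 'ny' suffix appended and a placeholder key.
demo_url = build_geocode_url("38ParkRow,#4New Yorkny", "not-a-real-key")

# Decoding the query shows that both parameters survive intact.
decoded = parse_qs(urlsplit(demo_url).query)
```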

In [15]:
print(len(starbucks_coords))
print(len(starbucks_store)) 

247
247


In [16]:
missing_cords

['22-0431stStreetAstoria',
 '1350520thAveFlushing',
 '38ParkRow,#4New York',
 '1000SEighthAvenue,#34New York',
 '1117LexingtonAve.,#4New York',
 '1291LexingtonAve.,#1291New York',
 '1488ThirdAvenue#ANew York',
 '115BroadwayNew York']

These are the 8 addresses with missing coordinates. When we inner join the coordinates back onto our dataframe, these stores will be dropped from the data set.
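To see exactly which rows an inner join would drop, one option is a left join with `indicator=True`, which labels each row by where it was found. The sketch below uses a two-row toy frame (one geocoded address and one missing address from the lists above) rather than the real data.

```python
import pandas as pd

stores = pd.DataFrame(
    {"combined": ["31-01BroadwayAstoria", "115BroadwayNew York"]}
)
coords = pd.DataFrame(
    {
        "combined": ["31-01BroadwayAstoria"],
        "latitude": [40.762074],
        "longitude": [-73.925029],
    }
)

# The _merge column is 'both' for matched rows and 'left_only' otherwise.
checked = stores.merge(coords, how="left", on="combined", indicator=True)

# Addresses that an inner join would silently remove.
dropped = checked.loc[checked["_merge"] == "left_only", "combined"].tolist()
```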

In [17]:
lat = []
long = []
for starbucks_coord in starbucks_coords:
    lat.append(starbucks_coord.get('lat'))
    long.append(starbucks_coord.get('lng'))

We use a loop to extract the latitude and longitude values from each set of coordinates we pulled.
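Because each geocoding result is a dictionary with `lat` and `lng` keys, pandas can also build the two columns directly from the list. A sketch with two coordinate pairs copied from the output in this notebook; `starbucks_coords_demo` is a stand-in name for the real list.

```python
import pandas as pd

# Stand-in for the list of location dictionaries returned by the API.
starbucks_coords_demo = [
    {"lat": 40.762074, "lng": -73.925029},
    {"lat": 40.769995, "lng": -73.918523},
]

# One row per dictionary, then rename to the column names used later on.
coords_df = pd.DataFrame(starbucks_coords_demo).rename(
    columns={"lat": "latitude", "lng": "longitude"}
)
```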

In [18]:
lat_long = pd.DataFrame(data = {'combined': starbucks_store,
                    'latitude': lat,
                    'longitude': long})

In [19]:
lat_long.head()

Unnamed: 0,combined,latitude,longitude
0,31-01BroadwayAstoria,40.762074,-73.925029
1,30-18AstoriaBlvdAstoria,40.769995,-73.918523
2,371135thAvenueAstoria,40.755939,-73.923485
3,31-44SteinwayStreetAstoria,40.760282,-73.91814
4,41-02MainStreetFlushing,40.757969,-73.829743


We create a dataframe holding the latitudes and longitudes of all the NY-area stores, which we can merge back onto the Starbucks dataframe.

In [20]:
starbucks = pd.merge(left = starbucks,
        right = lat_long,
        how ='inner',
        on = 'combined')

We merged the new Starbucks coordinates into the Starbucks dataframe to get a more precise location for each store.

In [21]:
starbucks.drop(columns= ['Longitude', 'Latitude'], inplace = True)
starbucks.head()

Unnamed: 0,Brand,Store Number,Store Name,Ownership Type,Street Address,City,State/Province,Country,Postcode,Phone Number,Timezone,new_street,combined,latitude,longitude
0,Starbucks,14840-129334,31-01 Broadway,Company Owned,31-01 Broadway,Astoria,NY,US,111062648,718-777-1305,GMT-05:00 America/New_York,31-01Broadway,31-01BroadwayAstoria,40.762074,-73.925029
1,Starbucks,22638-223164,Astoria Blvd & 31st St,Company Owned,30-18 Astoria Blvd,Astoria,NY,US,11102,7182781518,GMT-05:00 America/New_York,30-18AstoriaBlvd,30-18AstoriaBlvdAstoria,40.769995,-73.918523
2,Starbucks,9663-97893,35th Ave and 37th St,Company Owned,3711 35th Avenue,Astoria,NY,US,11106,718-706-0464,GMT-05:00 America/New_York,371135thAvenue,371135thAvenueAstoria,40.755939,-73.923485
3,Starbucks,7567-18640,31-44 Steinway Street,Company Owned,31-44 Steinway Street,Astoria,NY,US,111033911,718-274-5700,GMT-05:00 America/New_York,31-44SteinwayStreet,31-44SteinwayStreetAstoria,40.760282,-73.91814
4,Starbucks,7539-12698,Main St & 41St Ave,Company Owned,41-02 Main Street,Flushing,NY,US,113553133,718-358-9355,GMT-05:00 America/New_York,41-02MainStreet,41-02MainStreetFlushing,40.757969,-73.829743


We dropped the old latitude and longitudes from our starbucks dataframe

In [22]:
starbucks[starbucks['Store Name'] == 'Penn Station by A,C,E']

Unnamed: 0,Brand,Store Number,Store Name,Ownership Type,Street Address,City,State/Province,Country,Postcode,Phone Number,Timezone,new_street,combined,latitude,longitude
91,Starbucks,7851-80623,"Penn Station by A,C,E",Company Owned,1 Penn Plaza Concourse Level,New York,NY,US,101190002,212-736-3206,GMT-05:00 America/New_York,1PennPlazaConcourseLevel,1PennPlazaConcourseLevelNew York,40.751377,-73.992488


We wanted to check to ensure that the coordinate pulling worked and that we got the correct starbucks coordinates

In [23]:
station.head()

Unnamed: 0,Division,Line,Station_Name,Station_Latitude,Station_Longitude,Route_1,Route_2,Route_3,Route_4,Route_5,...,Staffing,Staff_Hours,ADA,ADA_Notes,Free_Crossover,North_South_Street,East_West_Street,Corner,Latitude,Longitude
0,BMT,Astoria,Ditmars Blvd,40.775036,-73.912034,N,Q,,,,...,FULL,,False,,True,31st St,23rd Ave,NW,40.775149,-73.912074
1,BMT,Astoria,Ditmars Blvd,40.775036,-73.912034,N,Q,,,,...,FULL,,False,,True,31st St,23rd Ave,NE,40.77481,-73.912151
2,BMT,Astoria,Ditmars Blvd,40.775036,-73.912034,N,Q,,,,...,FULL,,False,,True,31st St,23rd Ave,NE,40.775025,-73.911891
3,BMT,Astoria,Ditmars Blvd,40.775036,-73.912034,N,Q,,,,...,FULL,,False,,True,31st St,23rd Ave,NW,40.774938,-73.912337
4,BMT,Astoria,Astoria Blvd-Hoyt Av,40.770258,-73.917843,N,Q,,,,...,FULL,,False,,True,31st St,Hoyt Ave South,SW,40.770313,-73.917978


In [24]:
starbucks.to_csv('Data/starbucks_updated.csv', index = False)

We saved our new Starbucks dataframe as 'starbucks_updated' to reflect the more exact coordinates.

Our next steps were to find the 3 closest stores to our subway stations.

In [25]:
station[station['Station_Name'] == 'West 4th St']

Unnamed: 0,Division,Line,Station_Name,Station_Latitude,Station_Longitude,Route_1,Route_2,Route_3,Route_4,Route_5,...,Staffing,Staff_Hours,ADA,ADA_Notes,Free_Crossover,North_South_Street,East_West_Street,Corner,Latitude,Longitude
349,IND,8 Avenue,West 4th St,40.732338,-74.000495,A,B,C,D,E,...,FULL,,True,,True,,,,40.731086,-74.001209
350,IND,8 Avenue,West 4th St,40.732338,-74.000495,A,B,C,D,E,...,FULL,,True,,True,6th Ave,West 3rd St,NW,40.731133,-74.001534
351,IND,8 Avenue,West 4th St,40.732338,-74.000495,A,B,C,D,E,...,FULL,,True,,True,6th Ave,West 3rd St,NE,40.731091,-74.001127
1327,IND,8 Avenue,West 4th St,40.732338,-74.000495,A,B,C,D,E,...,NONE,,True,,True,6th Ave,Waverly Pl,NE,40.733067,-73.999728
1328,IND,8 Avenue,West 4th St,40.732338,-74.000495,A,B,C,D,E,...,NONE,,True,,True,6th Ave,Waverly Pl,NE,40.733259,-73.999615
1329,IND,8 Avenue,West 4th St,40.732338,-74.000495,A,B,C,D,E,...,NONE,,True,,True,6th Ave,Waverly Pl,NW,40.733072,-74.000192


In [26]:
station.loc[station['Station_Name'] == 'West 4th St','Route_1'] = station.loc[station['Station_Name'] == 'West 4th St','Route_5']

While performing our EDA, we noticed that the West 4th St (Washington Sq) subway stop was missing. Its 'E' value sits in the Route_5 column, which our filter does not check, so the station was not picked up. This is one of the bigger stops on the E line, so we replaced its Route_1 value with the Route_5 value ('E'). That way, when the data set is filtered for E line stations, West 4th St is picked up.

In [27]:
station[station['Station_Name'] == 'West 4th St']

Unnamed: 0,Division,Line,Station_Name,Station_Latitude,Station_Longitude,Route_1,Route_2,Route_3,Route_4,Route_5,...,Staffing,Staff_Hours,ADA,ADA_Notes,Free_Crossover,North_South_Street,East_West_Street,Corner,Latitude,Longitude
349,IND,8 Avenue,West 4th St,40.732338,-74.000495,E,B,C,D,E,...,FULL,,True,,True,,,,40.731086,-74.001209
350,IND,8 Avenue,West 4th St,40.732338,-74.000495,E,B,C,D,E,...,FULL,,True,,True,6th Ave,West 3rd St,NW,40.731133,-74.001534
351,IND,8 Avenue,West 4th St,40.732338,-74.000495,E,B,C,D,E,...,FULL,,True,,True,6th Ave,West 3rd St,NE,40.731091,-74.001127
1327,IND,8 Avenue,West 4th St,40.732338,-74.000495,E,B,C,D,E,...,NONE,,True,,True,6th Ave,Waverly Pl,NE,40.733067,-73.999728
1328,IND,8 Avenue,West 4th St,40.732338,-74.000495,E,B,C,D,E,...,NONE,,True,,True,6th Ave,Waverly Pl,NE,40.733259,-73.999615
1329,IND,8 Avenue,West 4th St,40.732338,-74.000495,E,B,C,D,E,...,NONE,,True,,True,6th Ave,Waverly Pl,NW,40.733072,-74.000192


In [28]:
station = station[(station['Route_1'] == 'E') | (station['Route_2'] == 'E') | (station['Route_3'] == 'E')]

Here is where we filter the first 3 routes to pick up all the E line stations
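An alternative that would avoid overwriting any route values is to test every route column at once. This is a sketch on toy rows copied from the station output above, assuming the route columns all start with `Route_`; the original notebook filters only the first three columns.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "Station_Name": ["West 4th St", "Ditmars Blvd", "Astoria Blvd-Hoyt Av"],
        "Route_1": ["A", "N", "N"],
        "Route_2": ["B", "Q", "Q"],
        "Route_3": ["C", None, None],
        "Route_4": ["D", None, None],
        "Route_5": ["E", None, None],
    }
)

# Collect every route column, however many there are.
route_cols = [c for c in demo.columns if c.startswith("Route_")]

# Keep a row if any of its route columns equals 'E'.
on_e_line = demo[demo[route_cols].eq("E").any(axis=1)]
```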

In [29]:
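# numeric_only=True averages only the numeric columns; recent pandas versions raise an error on text columns otherwise
station = station.groupby('Station_Name').mean(numeric_only=True)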

We then grouped the rows by station. The only information we need from this data set is the station coordinates, and these are identical on every entrance row for a given stop, so averaging does not change them. Grouping gives us one row per individual subway station.
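Since only the station coordinates are needed, a narrower version of this step is to select those two columns and take the first value per station, after confirming that they really are constant within each station. The sketch below uses three entrance rows from the output above; it illustrates the idea and is not the notebook's own method.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "Station_Name": ["Ditmars Blvd", "Ditmars Blvd", "Astoria Blvd-Hoyt Av"],
        "Station_Latitude": [40.775036, 40.775036, 40.770258],
        "Station_Longitude": [-73.912034, -73.912034, -73.917843],
        "Corner": ["NW", "NE", "SW"],
    }
)
coord_cols = ["Station_Latitude", "Station_Longitude"]

# Check that each station has exactly one distinct value per coordinate column.
coords_are_constant = bool(
    demo.groupby("Station_Name")[coord_cols].nunique().eq(1).all().all()
)

# One row per station, keeping only the coordinate columns.
per_station = demo.groupby("Station_Name", as_index=False)[coord_cols].first()
```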

In [30]:
station['station'] = station.index
station.reset_index(drop=True, inplace=True)
station.head()

Unnamed: 0,Station_Latitude,Station_Longitude,Route_8,Route_9,Route_10,Route_11,ADA,Free_Crossover,Latitude,Longitude,station
0,40.740893,-74.00169,,,,,1.0,1.0,40.740612,-74.001896,14th St
1,40.745906,-73.998041,,,,,0.0,1.0,40.74599,-73.997927,23rd St
2,40.747846,-73.946,,,,,0.0,1.0,40.747807,-73.945829,23rd St-Ely Av
3,40.752287,-73.993391,,,,,1.0,1.0,40.752218,-64.74426,34th St
4,40.757308,-73.989735,1.0,2.0,3.0,7.0,1.0,1.0,40.757358,-73.989816,42nd St


In [31]:
station['station'].sort_values().unique()

array(['14th St', '23rd St', '23rd St-Ely Av', '34th St', '42nd St',
       '45 Rd-Court House Sq', '50th St', '51st St', '5th Av-53rd St',
       '75th Av', '7th Av', '8th Av', 'Broadway-74th St', 'Canal St',
       'Chambers St', 'Forest Hills-71st Av',
       'Jackson Heights-Roosevelt Ave', 'Jamaica-Van Wyck',
       'Kew Gardens-Union Turnpike', 'Lexington Av-53rd St', 'Park Place',
       'Parsons Blvd-Archer Av - Jamaica Center', 'Queens Plaza',
       'Spring St', 'Sutphin Blvd-Archer Av - JFK', 'Times Square',
       'Times Square-42nd St', 'West 4th St', 'World Trade Center'],
      dtype=object)

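We created a set of helper functions to find the three closest Starbucks stores to a station and their distances. Each function takes a station's latitude and longitude together with the Starbucks stores' latitudes and longitudes, and uses the Pythagorean theorem (straight-line distance, measured in degrees) to rank the stores from nearest to farthest. `closest_station`, `closest_station2`, and `closest_station3` return the smallest, second smallest, and third smallest distance; `closest_store`, `closest_store2`, and `closest_store3` return the matching store names. We then collect these values in lists and attach them to the dataframe afterwards.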

In [32]:
def test(x1, y1, x2, y2): # x1, y1: station lat and long; x2, y2: series of lat and long for the Starbucks stores
    temp =  pd.DataFrame(data = [starbucks['Store Name'],(np.sqrt((x2 - x1)**2 + (y2-y1)**2))],index=[0,1])
    return temp.T.sort_values(by = 1)

In [33]:
def closest_station(x1, y1, x2, y2): # x1, y1: station lat and long; x2, y2: series of lat and long for the Starbucks stores
    temp =  pd.DataFrame((np.sqrt((x2 - x1)**2 + (y2-y1)**2)).sort_values()).iloc[0][0]
    return temp

In [34]:
def closest_station2(x1, y1, x2, y2): # x1, y1: station lat and long; x2, y2: series of lat and long for the Starbucks stores
    return pd.DataFrame((np.sqrt((x2 - x1)**2 + (y2-y1)**2)).sort_values()).iloc[1][0]

In [35]:
def closest_station3(x1, y1, x2, y2): # x1, y1: station lat and long; x2, y2: series of lat and long for the Starbucks stores
    return pd.DataFrame((np.sqrt((x2 - x1)**2 + (y2-y1)**2)).sort_values()).iloc[2][0]


In [36]:
def closest_store(x1, y1, x2, y2): # x1, y1: station lat and long; x2, y2: series of lat and long for the Starbucks stores
    temp =  pd.DataFrame(data = [starbucks['Store Name'],(np.sqrt((x2 - x1)**2 + (y2-y1)**2))],index=[0,1])
    return temp.T.sort_values(by = 1).iloc[0][0]

In [37]:
def closest_store2(x1, y1, x2, y2): # x1, y1: station lat and long; x2, y2: series of lat and long for the Starbucks stores
    temp =  pd.DataFrame(data = [starbucks['Store Name'],(np.sqrt((x2 - x1)**2 + (y2-y1)**2))],index=[0,1])
    return temp.T.sort_values(by = 1).iloc[1][0]

In [38]:
def closest_store3(x1, y1, x2, y2): # x1, y1: station lat and long; x2, y2: series of lat and long for the Starbucks stores
    temp =  pd.DataFrame(data = [starbucks['Store Name'],(np.sqrt((x2 - x1)**2 + (y2-y1)**2))],index=[0,1])
    return temp.T.sort_values(by = 1).iloc[2][0]

In [39]:
closest_station(40.757308, -73.989735, starbucks['latitude'], starbucks['longitude'])

0.000611798864002006

In [40]:
closest_store(40.757308, -73.989735, starbucks['latitude'], starbucks['longitude'])

'42nd & 8th'

We ran a quick check on the 42nd St station coordinates to make sure that the functions work as intended.
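The six lookup functions share the same distance formula, so they could be folded into one. The sketch below is a hypothetical consolidation (`k_closest_stores` is not in the original notebook) that returns the `k` nearest store names and distances in one call, using the same straight-line distance in degrees. The demo rows reuse geocoded store coordinates and the Astoria Blvd-Hoyt Av station coordinates shown earlier.

```python
import numpy as np
import pandas as pd


def k_closest_stores(station_lat, station_lon, stores, k=3):
    """Return the k nearest stores and their distances (in degrees), nearest first."""
    distance = np.sqrt(
        (stores["latitude"] - station_lat) ** 2
        + (stores["longitude"] - station_lon) ** 2
    )
    nearest = distance.nsmallest(k)  # sorted ascending, keeps the store index
    return pd.DataFrame(
        {
            "store": stores.loc[nearest.index, "Store Name"].to_numpy(),
            "distance": nearest.to_numpy(),
        }
    )


stores_demo = pd.DataFrame(
    {
        "Store Name": [
            "31-01 Broadway",
            "Astoria Blvd & 31st St",
            "35th Ave and 37th St",
            "31-44 Steinway Street",
        ],
        "latitude": [40.762074, 40.769995, 40.755939, 40.760282],
        "longitude": [-73.925029, -73.918523, -73.923485, -73.91814],
    }
)

# Station coordinates for Astoria Blvd-Hoyt Av.
top3 = k_closest_stores(40.770258, -73.917843, stores_demo, k=3)
```

A single function like this also keeps each store name paired with its distance, so the two cannot drift apart.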

In [41]:
first_closest = []
second_closest = []
third_closest = []
first_closest_store = []
second_closest_store = []
third_closest_store = []
for index in range(len(station)):
    row = station.iloc[index,:]
    lat = row['Station_Latitude']
    long = row['Station_Longitude']
    first_closest.append(closest_station(lat,long, starbucks['latitude'], starbucks['longitude']))
    second_closest.append(closest_station2(lat,long, starbucks['latitude'], starbucks['longitude']))
    third_closest.append(closest_station3(lat,long, starbucks['latitude'], starbucks['longitude']))
    first_closest_store.append(closest_store(lat,long, starbucks['latitude'], starbucks['longitude']))
    second_closest_store.append(closest_store2(lat,long, starbucks['latitude'], starbucks['longitude']))
    third_closest_store.append(closest_store3(lat,long, starbucks['latitude'], starbucks['longitude']))

With the cell above, we loop through each subway station and its coordinates to find the 3 closest stores using the functions defined earlier. The calculated distances and the store names are appended to lists, which will then be added to the station dataframe.
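The distances collected here are in degrees, not in ground distance. At New York's latitude a degree of longitude covers less ground than a degree of latitude, so the straight-line formula treats east-west gaps as larger than they are. If distances in kilometres are wanted, the haversine formula is a common choice. The sketch below is an optional alternative, not what this notebook computes; the function name and the mean Earth radius of 6371 km are assumptions of the sketch.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or pandas Series in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# One degree north-south versus one degree east-west near Manhattan.
one_degree_lat_km = haversine_km(40.0, -74.0, 41.0, -74.0)
one_degree_lon_km = haversine_km(40.75, -74.0, 40.75, -73.0)
```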

In [42]:
print(len(first_closest))
print(len(second_closest))
print(len(third_closest))
print(len(first_closest_store))
print(len(second_closest_store))
print(len(third_closest_store))

29
29
29
29
29
29


We created a function that adds a list of closest distances and the matching list of store names to the dataframe. We run it three times, once for each of the 3 closest stores to each station.

In [43]:

def add_to_df(df, distance, store_name, col1_name, col2_name):
    frame = pd.DataFrame([distance,store_name],index=[0,1])
    close = frame.T
    close.reset_index(drop=True, inplace=True)
    df[col1_name] = close[0]
    df[col2_name] = close[1]
    
    return df.head()

In [44]:
add_to_df(station,first_closest,first_closest_store,'first_closest','first_closest_store')

Unnamed: 0,Station_Latitude,Station_Longitude,Route_8,Route_9,Route_10,Route_11,ADA,Free_Crossover,Latitude,Longitude,station,first_closest,first_closest_store
0,40.740893,-74.00169,,,,,1.0,1.0,40.740612,-74.001896,14th St,0.00186078,8th Ave just south of 14th St
1,40.745906,-73.998041,,,,,0.0,1.0,40.74599,-73.997927,23rd St,0.00318054,24th & 7th
2,40.747846,-73.946,,,,,0.0,1.0,40.747807,-73.945829,23rd St-Ely Av,0.0131558,Roosevelt Island
3,40.752287,-73.993391,,,,,1.0,1.0,40.752218,-64.74426,34th St,0.0012815,One Penn Plaza
4,40.757308,-73.989735,1.0,2.0,3.0,7.0,1.0,1.0,40.757358,-73.989816,42nd St,0.000611799,42nd & 8th


In [45]:
add_to_df(station,second_closest,second_closest_store,'second_closest','second_closest_store')

Unnamed: 0,Station_Latitude,Station_Longitude,Route_8,Route_9,Route_10,Route_11,ADA,Free_Crossover,Latitude,Longitude,station,first_closest,first_closest_store,second_closest,second_closest_store
0,40.740893,-74.00169,,,,,1.0,1.0,40.740612,-74.001896,14th St,0.00186078,8th Ave just south of 14th St,0.00249949,19th & 8th
1,40.745906,-73.998041,,,,,0.0,1.0,40.74599,-73.997927,23rd St,0.00318054,24th & 7th,0.00371675,19th & 8th
2,40.747846,-73.946,,,,,0.0,1.0,40.747807,-73.945829,23rd St-Ely Av,0.0131558,Roosevelt Island,0.0216008,69th & First
3,40.752287,-73.993391,,,,,1.0,1.0,40.752218,-64.74426,34th St,0.0012815,One Penn Plaza,0.0012815,"Penn Station by A,C,E"
4,40.757308,-73.989735,1.0,2.0,3.0,7.0,1.0,1.0,40.757358,-73.989816,42nd St,0.000611799,42nd & 8th,0.000680281,Union Square East


In [46]:
add_to_df(station,third_closest,third_closest_store,'third_closest','third_closest_store')

Unnamed: 0,Station_Latitude,Station_Longitude,Route_8,Route_9,Route_10,Route_11,ADA,Free_Crossover,Latitude,Longitude,station,first_closest,first_closest_store,second_closest,second_closest_store,third_closest,third_closest_store
0,40.740893,-74.00169,,,,,1.0,1.0,40.740612,-74.001896,14th St,0.00186078,8th Ave just south of 14th St,0.00249949,19th & 8th,0.00352262,Greenwich Ave at Bank St
1,40.745906,-73.998041,,,,,0.0,1.0,40.74599,-73.997927,23rd St,0.00318054,24th & 7th,0.00371675,19th & 8th,0.00372927,23rd & 1st
2,40.747846,-73.946,,,,,0.0,1.0,40.747807,-73.945829,23rd St-Ely Av,0.0131558,Roosevelt Island,0.0216008,69th & First,0.0237769,75th & First
3,40.752287,-73.993391,,,,,1.0,1.0,40.752218,-64.74426,34th St,0.0012815,One Penn Plaza,0.0012815,"Penn Station by A,C,E",0.00377857,32nd btwn 6th & 7th
4,40.757308,-73.989735,1.0,2.0,3.0,7.0,1.0,1.0,40.757358,-73.989816,42nd St,0.000611799,42nd & 8th,0.000680281,Union Square East,0.00104259,43rd & 8th


In [47]:
station['station'].unique()

array(['14th St', '23rd St', '23rd St-Ely Av', '34th St', '42nd St',
       '45 Rd-Court House Sq', '50th St', '51st St', '5th Av-53rd St',
       '75th Av', '7th Av', '8th Av', 'Broadway-74th St', 'Canal St',
       'Chambers St', 'Forest Hills-71st Av',
       'Jackson Heights-Roosevelt Ave', 'Jamaica-Van Wyck',
       'Kew Gardens-Union Turnpike', 'Lexington Av-53rd St', 'Park Place',
       'Parsons Blvd-Archer Av - Jamaica Center', 'Queens Plaza',
       'Spring St', 'Sutphin Blvd-Archer Av - JFK', 'Times Square',
       'Times Square-42nd St', 'West 4th St', 'World Trade Center'],
      dtype=object)

In [308]:
station.to_csv('Data/stations.csv', index = False)

We save our stations dataframe, which we will then use in our modeling notebook.