### Read in file containing all subway entrances/exits from https://data.ny.gov/Transportation/NYC-Transit-Subway-Station-Map/6xm2-7ffy

In [None]:
import pandas as pd
stopsNYC = pd.read_csv('NYC_Transit_Subway_Entrance_And_Exit_Data.csv')


Fill nulls and change data types

In [None]:
stopsNYC = stopsNYC.fillna(0)

#change to int to get rid of decimal
stopsNYC = stopsNYC.astype({'Route8': int, 'Route9': int, 'Route10': int, 'Route11': int})

#change to object to match other route columns
stopsNYC = stopsNYC.astype({'Route8': object, 'Route9': object, 'Route10': object, 'Route11': object})

#change all route columns to string to concatenate
stopsNYC = stopsNYC.astype({'Route1': str, 'Route2': str, 'Route3': str, 'Route4': str, 'Route5': str, 'Route6': str, 'Route7': str, 'Route8': str, 'Route9': str, 'Route10': str, 'Route11': str})
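
The order of the casts above matters. The following is a minimal sketch on a hypothetical two-row sample (not the real file) showing why the numeric route columns go through `int` before `str`. The names `sample`, `as_float_str` and `as_int_str` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real file has Route1 through Route11.
sample = pd.DataFrame({
    'Route1': ['A', 'R'],
    'Route8': [np.nan, 7.0],  # read as float because of the missing value
})

sample = sample.fillna(0)

# Casting the float column straight to str keeps the decimal.
as_float_str = sample['Route8'].astype(str).tolist()

# Casting to int first drops it, which is what the cell above relies on.
as_int_str = sample['Route8'].astype(int).astype(str).tolist()

print(as_float_str)
print(as_int_str)
```
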

Combine all route columns into a single 'All Lines' column, separated by commas. Route slots that were empty for a station appear as 0, the fill value from the previous step.

In [None]:
cols = ['Route1','Route2','Route3','Route4','Route5','Route6','Route7','Route8','Route9','Route10','Route11']
stopsNYC['All Lines'] = stopsNYC[cols].apply(lambda row: ', '.join(row.values.astype(str)), axis=1)
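
To make the shape of the combined column concrete, here is a small sketch that applies the same row-wise join to a hypothetical three-column sample. The frame `sample` and the list `sample_cols` are stand-ins for the real data and for `cols`.

```python
import pandas as pd

# Hypothetical sample standing in for Route1..Route11 after the string cast.
sample = pd.DataFrame({
    'Route1': ['A', 'N'],
    'Route2': ['C', '0'],
    'Route3': ['0', '0'],
})
sample_cols = ['Route1', 'Route2', 'Route3']

# Same join as in the cell above: one comma-separated string per row.
sample['All Lines'] = sample[sample_cols].apply(
    lambda row: ', '.join(row.values.astype(str)), axis=1
)

all_lines = sample['All Lines'].tolist()
print(all_lines)
```

Empty route slots stay in the string as `0`, so two entrances of the same station produce identical 'All Lines' values as long as their route columns match.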

Remove original line columns

In [None]:
#cols is the list of route columns defined in the previous cell
stopsNYC = stopsNYC.drop(cols, axis=1)

Drop unneeded columns

In [None]:
stopsNYC = stopsNYC.drop(['Entrance Type','Entry','Exit Only','Vending','Staff Hours','ADA','ADA Notes','Free Crossover','North South Street','East West Street','Corner','Entrance Latitude','Entrance Longitude','Station Location','Entrance Location','entrance_georeference','station_georeference'], axis = 1)

Remove errors by index. These are errors in the original data that created extra rows when aggregated.

In [None]:
stopsNYC = stopsNYC.drop([stopsNYC.index[153],stopsNYC.index[1231]])
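
The expression `stopsNYC.index[153]` looks up the index label at position 153, and `drop()` then removes the row with that label. A minimal sketch of the same pattern on a hypothetical four-row frame whose labels deliberately differ from the positions:

```python
import pandas as pd

# Hypothetical frame; labels 10..13 sit at positions 0..3.
sample = pd.DataFrame(
    {'Station Name': ['A', 'B', 'C', 'D']},
    index=[10, 11, 12, 13],
)

# index[1] is label 11 and index[3] is label 13; drop() removes rows by label.
trimmed = sample.drop([sample.index[1], sample.index[3]])

remaining = trimmed['Station Name'].tolist()
print(remaining)
```
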

Group by line, station name and all lines to get one row per station rather than one row per entrance. Aggregate the station latitude and longitude with median(): every entrance of a station carries the same station coordinates, so the median returns that value unchanged.

In [None]:
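#select the two columns with a list (double brackets); recent pandas versions reject a bare tuple of column names here
stopsNYCexport = stopsNYC.groupby(['Line','Station Name','All Lines'])[['Station Latitude','Station Longitude']].median()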
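
The claim that all entrances in a group share the same coordinates can be checked before aggregating. The sketch below uses a hypothetical sample with made-up station names and coordinates. It counts distinct values per group with `nunique()`, and any group with more than one distinct latitude or longitude would show up in `inconsistent`.

```python
import pandas as pd

# Hypothetical sample: two entrances for one station, one for another.
sample = pd.DataFrame({
    'Line': ['Line X', 'Line X', 'Line Y'],
    'Station Name': ['Station A', 'Station A', 'Station B'],
    'All Lines': ['R, 0', 'R, 0', 'N, R'],
    'Station Latitude': [40.0, 40.0, 41.0],
    'Station Longitude': [-74.0, -74.0, -73.0],
})

keys = ['Line', 'Station Name', 'All Lines']
coords = ['Station Latitude', 'Station Longitude']

# Number of distinct coordinate values in each group.
distinct = sample.groupby(keys)[coords].nunique()

# Groups where the entrances disagree on the station coordinates.
inconsistent = distinct[(distinct > 1).any(axis=1)]

# Same aggregation as in the cell above.
grouped = sample.groupby(keys)[coords].median()
print(grouped)
```

If `inconsistent` is empty, median() is only collapsing duplicates. Rows that do appear there are candidates for the kind of manual cleanup done in the previous step.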

Export to CSV in a folder for exported data

In [None]:
stopsNYCexport.to_csv(r'C:\{file_path}\Exports\stopsNYCgrouped.csv', header=True)
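
Because the grouped frame has a three-level index, `to_csv` writes the group keys as ordinary leading columns. This can be sketched by writing a hypothetical one-row frame to a temporary directory and reading it back. The temporary path and sample values are assumptions for illustration, not the export location used above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical grouped result with the same index levels as stopsNYCexport.
grouped = pd.DataFrame(
    {'Station Latitude': [40.0], 'Station Longitude': [-74.0]},
    index=pd.MultiIndex.from_tuples(
        [('Line X', 'Station A', 'R, 0')],
        names=['Line', 'Station Name', 'All Lines'],
    ),
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / 'stopsNYCgrouped.csv'
    grouped.to_csv(out_path)           # index levels are written as columns
    round_trip = pd.read_csv(out_path)

columns = round_trip.columns.tolist()
print(columns)
```
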