## Capstone Project: The Battle of Neighbourhoods


### File Name: subway_stations.ipynb

### This notebook scrapes the names of New York City Subway stations in Manhattan, NY from Wikipedia, looks up the latitude and longitude of each station with geopy's Nominatim geocoder, and stores the results in the "subway_stations.csv" file.

In [1]:
#import libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values


In [2]:
# Fetch the Wikipedia category page and parse it with BeautifulSoup

url = "https://en.wikipedia.org/wiki/Category:New_York_City_Subway_stations_in_Manhattan"

source = requests.get(url).text

soup = BeautifulSoup(source, 'lxml')

# print(soup.prettify())  # uncomment to inspect the parsed HTML

In [3]:
# instantiate the dataframe to store stations details

column_names = ['Stations','Lat', 'Long'] 
df_stations = pd.DataFrame(columns=column_names)



In [4]:
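# Retrieve the station names and store them in the dataframe

stations = soup.find_all('div', class_='mw-category-group')
# Each link in a category group is one page title (the first is the list article)
names = [a.get_text() for st in stations for a in st.find_all('a')]
for name in names:
    print(name)
# DataFrame.append was removed in pandas 2.0, so build the frame in one step
df_stations = pd.DataFrame({'Stations': names}, columns=column_names)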


List of New York City Subway stations in Manhattan
First Avenue (BMT Canarsie Line)
Second Avenue (IND Sixth Avenue Line)
Third Avenue (BMT Canarsie Line)
Fifth Avenue–59th Street (BMT Broadway Line)
Fifth Avenue/53rd Street (IND Queens Boulevard Line)
Seventh Avenue (IND lines)
Eighth Street–New York University (BMT Broadway Line)
10th Avenue (IRT Flushing Line)
14th Street/Eighth Avenue (New York City Subway)
14th Street/Sixth Avenue (New York City Subway)
14th Street–Union Square (New York City Subway)
18th Street (IRT Broadway–Seventh Avenue Line)
23rd Street (BMT Broadway Line)
23rd Street (IND Eighth Avenue Line)
23rd Street (IND Sixth Avenue Line)
23rd Street (IRT Broadway–Seventh Avenue Line)
23rd Street (IRT Lexington Avenue Line)
28th Street (BMT Broadway Line)
28th Street (IRT Broadway–Seventh Avenue Line)
28th Street (IRT Lexington Avenue Line)
33rd Street (IRT Lexington Avenue Line)
34th Street–Penn Station (IND Eighth Avenue Line)
34th Street–Penn Station (IRT Broadway–Se
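
The scraped titles carry a line qualifier in parentheses, such as "(BMT Canarsie Line)", which is a Wikipedia disambiguation rather than part of a street address. The sketch below shows one way to strip that qualifier before building a geocoding query. The helper name `build_query` and the `suffix` default are hypothetical, and whether the shorter query actually geocodes more reliably has not been checked here.

```python
import re


def build_query(station_name, suffix="Manhattan, NY"):
    """Return a geocoding query with any trailing "(...)" qualifier removed."""
    # Remove one parenthetical at the end of the title, e.g. "(BMT Canarsie Line)"
    base = re.sub(r"\s*\([^)]*\)\s*$", "", station_name).strip()
    return f"{base}, {suffix}"


query = build_query("First Avenue (BMT Canarsie Line)")
print(query)
```

A title without a parenthetical passes through unchanged apart from the appended suffix. Note that several stations share a street name once the qualifier is gone (for example the various "23rd Street" entries), so the shortened queries are no longer unique.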

In [5]:
# Drop the first row: it holds the list article's title, not a station
df_stations = df_stations.drop(df_stations.head(1).index).reset_index(drop=True)
df_stations


| | Stations | Lat | Long |
|---|---|---|---|
| 0 | First Avenue (BMT Canarsie Line) | | |
| 1 | Second Avenue (IND Sixth Avenue Line) | | |
| 2 | Third Avenue (BMT Canarsie Line) | | |
| 3 | Fifth Avenue–59th Street (BMT Broadway Line) | | |
| 4 | Fifth Avenue/53rd Street (IND Queens Boulevard... | | |
| 5 | Seventh Avenue (IND lines) | | |
| 6 | Eighth Street–New York University (BMT Broadwa... | | |
| 7 | 10th Avenue (IRT Flushing Line) | | |
| 8 | 14th Street/Eighth Avenue (New York City Subway) | | |
| 9 | 14th Street/Sixth Avenue (New York City Subway) | | |


In [None]:
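# Retrieve the latitude and longitude of each station
import time

# Nominatim requires an identifying user_agent; the value here is a placeholder
geolocator = Nominatim(user_agent="subway_stations_notebook")

for n in df_stations.index:  # iterate over the actual index labels so no row is skipped
    address = df_stations.loc[n, 'Stations'].strip() + ', Manhattan, NY'
    location = geolocator.geocode(address)
    if location is not None:  # geocode() returns None when nothing matches
        df_stations.loc[n, 'Lat'] = location.latitude
        df_stations.loc[n, 'Long'] = location.longitude
    time.sleep(1)  # keep to the public Nominatim limit of one request per second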
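
The geocoding loop depends on a live network service, so it is hard to test and any station the service cannot resolve is easy to overlook. As a minimal sketch, the lookup can be separated from the service by passing the geocoding function in as an argument and returning the unresolved names explicitly. The names `geocode_stations`, `FakeLocation` and `fake_geocode` are hypothetical, and the stand-in geocoder returns made-up coordinates purely for the offline demonstration.

```python
import time
from collections import namedtuple


def geocode_stations(names, geocode, suffix=", Manhattan, NY", delay_seconds=1.0):
    """Geocode each name; return (found, missing).

    found maps a name to (latitude, longitude); missing lists the names for
    which geocode() returned None.
    """
    found, missing = {}, []
    for name in names:
        location = geocode(name.strip() + suffix)
        if location is None:
            missing.append(name)
        else:
            found[name] = (location.latitude, location.longitude)
        time.sleep(delay_seconds)  # pause between requests
    return found, missing


# Offline demonstration with a stand-in geocoder (coordinates are made up)
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])


def fake_geocode(query):
    return FakeLocation(1.0, 2.0) if query.startswith("Known") else None


found, missing = geocode_stations(["Known Stop", "Nowhere"], fake_geocode, delay_seconds=0)
print(found, missing)
```

In the notebook, the real geocoder could be passed in the same way, for example `geocode_stations(df_stations['Stations'], Nominatim(user_agent="...").geocode)`, with the `user_agent` string chosen by the author.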
   

In [9]:
#Store details in csv file

df_stations.to_csv('subway_stations.csv',index=False)
print("Subway stations details are stored in csv file")


Subway stations details are stored in csv file
