### Obtaining primary_school_coordinates Dataset (CSV)

In this notebook, we use the `requests` and `BeautifulSoup` Python libraries to scrape a list of primary schools in Singapore from Wikipedia, and then query the OneMap API to obtain the latitude and longitude of each school.

In [2]:
import requests
from bs4 import BeautifulSoup
import csv


Make a request to `https://en.wikipedia.org/wiki/List_of_primary_schools_in_Singapore` to find all primary schools; we store the school names as the keys of a dictionary.

In [68]:
# make a request to the Wikipedia page and stop early on an HTTP error
res = requests.get('https://en.wikipedia.org/wiki/List_of_primary_schools_in_Singapore')
res.raise_for_status()
# parse the page
soup = BeautifulSoup(res.text, 'html.parser')

# the school list is in the second <tbody> on the page
table_body = soup.find_all('tbody')[1]
school_dict = {}

for tr_element in table_body.find_all('tr'):  # i.e. each table row
    first_td = tr_element.find('td')  # i.e. the first <td> cell, which holds the school name
    if first_td:  # rows without a <td> (e.g. the header row) are skipped
        school_name = first_td.text.strip()
        school_dict[school_name] = {}


print(list(school_dict.keys()))

['Admiralty Primary School', 'Ahmad Ibrahim Primary School', 'Ai Tong School', 'Alexandra Primary School', 'Anchor Green Primary School', 'Anderson Primary School', 'Anglo-Chinese School (Junior)', 'Anglo-Chinese School (Primary)', 'Angsana Primary School', 'Ang Mo Kio Primary School', 'Beacon Primary School', 'Bedok Green Primary School', 'Bendemeer Primary School', 'Blangah Rise Primary School', 'Boon Lay Garden Primary School', 'Bukit Panjang Primary School', 'Bukit Timah Primary School', 'Bukit View Primary School', 'Canberra Primary School', 'Canossa Catholic Primary School', 'Cantonment Primary School', 'Casuarina Primary School', 'Catholic High School (Primary)', 'Cedar Primary School', 'Changkat Primary School', 'CHIJ (Katong) Primary', 'CHIJ (Kellock)', 'CHIJ Our Lady of Good Counsel', 'CHIJ Our Lady of the Nativity', 'CHIJ Our Lady Queen of Peace', 'CHIJ Primary (Toa Payoh)', "CHIJ St. Nicholas Girls' School (Primary Section)", 'Chongfu School', 'Chongzheng Primary School', '
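Some scraped names carry Wikipedia residue: the Jing Shan entry, for instance, comes through as `'Jing Shan Primary School\xa0[zh]'` (a non-breaking space plus an interlanguage-link marker), and that exact string later shows up among the names OneMap could not find. The helper below is a minimal sketch, not part of the original notebook; `clean_school_name` is a hypothetical name, and it assumes that any trailing `[...]` marker is residue rather than part of the school's name. Whether OneMap then matches the cleaned name has not been verified here.

```python
import re

# hypothetical pattern: a trailing bracketed marker such as "[zh]" or a footnote like "[1]"
BRACKET_SUFFIX = re.compile(r"\s*\[[^\]]*\]\s*$")


def clean_school_name(raw):
    """Normalise a scraped table cell: replace non-breaking spaces,
    drop a trailing [..] marker, and collapse repeated whitespace."""
    text = raw.replace("\xa0", " ")
    text = BRACKET_SUFFIX.sub("", text)
    return " ".join(text.split())


# names with round brackets, such as "(Junior)", are left untouched
for raw in ["Jing Shan Primary School\xa0[zh]", "Anglo-Chinese School (Junior)"]:
    print(repr(clean_school_name(raw)))
```

It could be applied when filling `school_dict`, e.g. `school_name = clean_school_name(first_td.text)`.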

OneMap API
- Create a OneMap API account to get access to the developer documentation and the query URL format for the search endpoint

get_lat_long(name_list)
- accepts a list of location names
- inserts each name into a OneMap search query and sends the request
- if OneMap returns no results, appends the name to a not_found list
- returns a DataFrame with columns name, latitude and longitude for the names that were found
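The two halves of `get_lat_long` (building the query URL and reading the first result) can be tried offline. The sketch below is illustrative only: `build_query_url` and `first_match_coords` are hypothetical helpers, and `sample_payload` is a hand-made response trimmed to the keys the notebook reads (`found`, `results`, `LATITUDE`, `LONGITUDE`), with values copied from the notebook's printed output. `urlencode` percent-encodes brackets and apostrophes and writes spaces as `+`, which is assumed (not verified here) to be accepted by OneMap.

```python
from urllib.parse import urlencode

# endpoint taken from the notebook's query string
SEARCH_URL = "https://www.onemap.gov.sg/api/common/elastic/search"


def build_query_url(name, page_num=1):
    """Return the OneMap search URL for one location name, with the name URL-encoded."""
    params = {
        "searchVal": name,
        "returnGeom": "Y",
        "getAddrDetails": "Y",
        "pageNum": page_num,
    }
    return SEARCH_URL + "?" + urlencode(params)


def first_match_coords(payload):
    """Return (latitude, longitude) as floats for the first result, or None if nothing was found."""
    if payload.get("found", 0) == 0:
        return None
    top = payload["results"][0]
    # the notebook concatenates these values into strings, so they are parsed here
    return float(top["LATITUDE"]), float(top["LONGITUDE"])


# hypothetical, trimmed responses (not real API output)
sample_payload = {
    "found": 1,
    "results": [{"LATITUDE": "1.4426347903311", "LONGITUDE": "103.800040119743"}],
}
empty_payload = {"found": 0, "results": []}

print(build_query_url("Anglo-Chinese School (Junior)"))
print(first_match_coords(sample_payload))
print(first_match_coords(empty_payload))
```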

In [76]:
import requests
import pandas as pd

# OneMap search endpoint (the base of the original query string)
ONEMAP_SEARCH_URL = "https://www.onemap.gov.sg/api/common/elastic/search"

def get_lat_long(name_list):
    found_names = []
    latitude = []
    longitude = []
    not_found = []

    for name in name_list:
        # pass the query as params so requests URL-encodes names containing
        # spaces, apostrophes, brackets or '&'
        params = {"searchVal": name, "returnGeom": "Y", "getAddrDetails": "Y", "pageNum": 1}
        resp = requests.get(ONEMAP_SEARCH_URL, params=params)
        resp.raise_for_status()

        # convert the JSON response into a Python dict
        data_geo_location = resp.json()
        if data_geo_location['found'] != 0:
            # keep the first match in the results
            top = data_geo_location['results'][0]
            found_names.append(name)
            latitude.append(top['LATITUDE'])
            longitude.append(top['LONGITUDE'])
            print(name + " ,Lat: " + top['LATITUDE'] + " Long: " + top['LONGITUDE'])
        else:
            print(name + " No Results")
            not_found.append(name)

    print(not_found)

    # only names with a match have coordinates, so the frame is built from found_names
    return pd.DataFrame({'name': found_names, 'latitude': latitude, 'longitude': longitude})

In [77]:
df = get_lat_long(list(school_dict.keys()))
df.head()


Admiralty Primary School ,Lat: 1.4426347903311 Long: 103.800040119743
Ahmad Ibrahim Primary School ,Lat: 1.43315271543517 Long: 103.832942401086
Ai Tong School ,Lat: 1.3605834338904 Long: 103.833020333986
Alexandra Primary School ,Lat: 1.29133439161334 Long: 103.824424680531
Anchor Green Primary School ,Lat: 1.39036998654612 Long: 103.887165375933
Anderson Primary School ,Lat: 1.38426429436736 Long: 103.841392081119
Anglo-Chinese School (Junior) ,Lat: 1.30935041274966 Long: 103.840950265464
Anglo-Chinese School (Primary) ,Lat: 1.31837054523521 Long: 103.835609732354
Angsana Primary School ,Lat: 1.34828400809545 Long: 103.951482746538
Ang Mo Kio Primary School ,Lat: 1.36932176584608 Long: 103.839630858752
Beacon Primary School ,Lat: 1.38394936211823 Long: 103.773632022975
Bedok Green Primary School ,Lat: 1.32344593287992 Long: 103.937878976352
Bendemeer Primary School ,Lat: 1.32181250780475 Long: 103.865404167629
Blangah Rise Primary School ,Lat: 1.27612047924037 Long: 103.808628535239


                           name          latitude         longitude
0      Admiralty Primary School   1.4426347903311  103.800040119743
1  Ahmad Ibrahim Primary School  1.43315271543517  103.832942401086
2                Ai Tong School   1.3605834338904  103.833020333986
3      Alexandra Primary School  1.29133439161334  103.824424680531
4   Anchor Green Primary School  1.39036998654612  103.887165375933


Running get_lat_long on the list of primary schools leaves 8 schools for which OneMap returned no results. We will look up their coordinates manually and add them directly to the CSV.

In [79]:
# schools that OneMap returned no results for, copied from the printed not_found list
not_found = ["CHIJ St. Nicholas Girls' School (Primary Section)", 'Coral Primary School', 'Da Qiao Primary School', 'East Coast Primary School', 'East View Primary School', 'Jing Shan Primary School\xa0[zh]', 'Juying Primary School', 'Maris Stella High School (Primary Section)']
print(len(not_found))
df.shape


8
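Instead of copying the unmatched names by hand, they can be derived by comparing the scraped names with the `name` column, and the manually looked-up coordinates can be appended in pandas before saving. This is a sketch under assumptions: `find_missing` and `add_manual_coordinates` are hypothetical helpers, and `manual_coords` is a dict you would fill in yourself. The demo uses `(None, None)` placeholders, which become `NaN`, so no coordinates are invented.

```python
import pandas as pd


def find_missing(all_names, found_df):
    """Names in all_names that have no row in found_df, in their original order."""
    found = set(found_df["name"])
    return [name for name in all_names if name not in found]


def add_manual_coordinates(found_df, manual_coords):
    """Append rows for schools whose coordinates were looked up by hand.

    manual_coords maps school name -> (latitude, longitude); None means
    "not filled in yet" and is stored as NaN.
    """
    names = list(manual_coords)
    extra = pd.DataFrame({
        "name": names,
        # float64 so that None placeholders become NaN
        "latitude": pd.Series([manual_coords[n][0] for n in names], dtype="float64"),
        "longitude": pd.Series([manual_coords[n][1] for n in names], dtype="float64"),
    })
    return pd.concat([found_df, extra], ignore_index=True)


# demo with a stand-in for the notebook's df (values from its printed output)
found = pd.DataFrame({
    "name": ["Admiralty Primary School", "Ai Tong School"],
    "latitude": [1.4426347903311, 1.3605834338904],
    "longitude": [103.800040119743, 103.833020333986],
})
all_names = ["Admiralty Primary School", "Ai Tong School", "Coral Primary School"]

missing = find_missing(all_names, found)
combined = add_manual_coordinates(found, {name: (None, None) for name in missing})
print(missing)
print(combined)
```

In the notebook this would be `find_missing(list(school_dict.keys()), df)`; rows still showing `NaN` are the ones left to fill in.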

Save the DataFrame to the CSV file "primary_school_coordinates.csv"

In [75]:
# index=False keeps the row index out of the file
df.to_csv('../data/modified/primary_school_coordinates.csv', index=False)
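With pandas' default `index=True`, `to_csv` writes the row index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read with `read_csv`. The round trip below uses an in-memory buffer (`io.StringIO`) instead of the real path and a two-row stand-in DataFrame; `round_trip_columns` is a hypothetical helper. It shows both remedies: `index=False` when writing, or `index_col=0` when reading.

```python
import io

import pandas as pd

# two-row stand-in for the notebook's df (values from its printed output)
sample = pd.DataFrame({
    "name": ["Admiralty Primary School", "Ai Tong School"],
    "latitude": [1.4426347903311, 1.3605834338904],
    "longitude": [103.800040119743, 103.833020333986],
})


def round_trip_columns(df, write_index, read_index_col=None):
    """Write df to an in-memory CSV, read it back, and return the column names."""
    buf = io.StringIO()
    df.to_csv(buf, index=write_index)
    buf.seek(0)
    return pd.read_csv(buf, index_col=read_index_col).columns.tolist()


cols_with_index = round_trip_columns(sample, write_index=True)
cols_without_index = round_trip_columns(sample, write_index=False)
cols_index_col = round_trip_columns(sample, write_index=True, read_index_col=0)

print(cols_with_index)     # includes the extra 'Unnamed: 0' column
print(cols_without_index)
print(cols_index_col)
```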