A project to get a list of bus stops, services, and related data for the app NextBus SG, including:
- List of all bus services and their respective routes and stops
- Bus service type (2-way, 1-way, or loop)
- List of all bus stops and metadata including name, road, coordinates, and respective buses
- Extra metadata such as route length and route time/duration
- MRT stations for bus stops
- College bus stops
  - NUS
  - NTU
To add college bus stops, we will have to add all shuttles. However, some bus stops don't have codes, so we'll need a way to create IDs for them. One idea: stops served by internal buses can use "N" followed by 4 digits (e.g. the first stop is N0001). Similarly, premium stops with no IDs can use the format PXXXX.
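The ID idea above could look like the sketch below. The helper names make_stop_id and assign_ids are hypothetical, and numbering stops sequentially from 1 per prefix is an assumption; nothing here exists in bus.py.

```python
def make_stop_id(prefix: str, index: int) -> str:
    """Build a synthetic stop ID such as 'N0001' (hypothetical helper).

    prefix: 'N' for internal (college) stops, 'P' for premium stops.
    index: 1-based running number for stops sharing that prefix.
    """
    if prefix not in ("N", "P"):
        raise ValueError(f"unknown prefix: {prefix!r}")
    if not 1 <= index <= 9999:
        raise ValueError("index must fit in 4 digits")
    return f"{prefix}{index:04d}"


def assign_ids(stop_names: list[str], prefix: str) -> dict[str, str]:
    """Give each uncoded stop a synthetic ID, in list order."""
    return {
        make_stop_id(prefix, i): name
        for i, name in enumerate(stop_names, start=1)
    }


# Usage with placeholder stop names:
print(assign_ids(["Stop A", "Stop B"], "N"))
# {'N0001': 'Stop A', 'N0002': 'Stop B'}
```

Zero-padding to 4 digits keeps synthetic IDs 5 characters long, like LTA's codes (e.g. 84009), while the letter prefix means they can never collide with a numeric code.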
Most of the magic happens in bus.py. This file contains the following important functions:
1. get_services
2. get_stops_for_each_service
3. get_all_stops
4. combine_stops_and_services
get_services (function 1) scrapes Land Transport Guru (LTG) and returns a list of all bus services. Nothing complex.
get_stops_for_each_service (function 2) takes a list of services and returns all of their routes as a dictionary. All the bus stops and route information come from LTG.
The keys of the returned dictionary are bus service numbers (as strings), and each value is a dictionary with the following keys:
- loop: True or False depending on whether the bus route is a loop. For example, buses 222 and 228 have loop routes. (When loop is True, type is '1'.)
- routes: A list of routes. Each route has:
  - name: The route's name.
  - stops: A list of stops in the route.
- type: The route type: '0' if it's a loop, '1' if it's a 1-way route, and '2' if it's a 2-way route. (Most services have 2-way routes.)
More data will be added soon:
- MRT station connections (see function 5)
- Route length
- ...
Example:
get_stops_for_each_service(['14', '15']) => {
    '14': {
        'loop': False,
        'routes': [
            {
                'name': 'Bedok Int → Clementi Int',
                'stops': [
                    '84009',
                    ...
                ],
            },
            {
                'name': 'Clementi Int → Bedok Int',
                'stops': [
                    '17009',
                    ...
                ],
            },
        ],
        'type': '2',
    },
    '15': { ... },
}
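To make the shape of this output concrete, here is a sketch that types it with TypedDict and walks it. The class names Route and Service and the summarize helper are hypothetical, and the stop lists in the sample are deliberately truncated to one code each (taken from the example above), so they are not real route data.

```python
from typing import TypedDict


class Route(TypedDict):
    name: str          # e.g. 'Bedok Int → Clementi Int'
    stops: list[str]   # bus stop codes, in order of travel


class Service(TypedDict):
    loop: bool
    routes: list[Route]
    type: str          # '0', '1' or '2', as described above


def summarize(services: dict[str, Service]) -> dict[str, list[tuple[str, int]]]:
    """Map each service number to (route name, number of stops) pairs."""
    return {
        number: [(route["name"], len(route["stops"])) for route in service["routes"]]
        for number, service in services.items()
    }


# Truncated sample in the documented shape; the stop lists are NOT complete.
sample: dict[str, Service] = {
    "14": {
        "loop": False,
        "routes": [
            {"name": "Bedok Int → Clementi Int", "stops": ["84009"]},
            {"name": "Clementi Int → Bedok Int", "stops": ["17009"]},
        ],
        "type": "2",
    }
}
print(summarize(sample))
```

TypedDict adds no runtime checks; it only documents the keys for readers and type checkers such as mypy.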
get_all_stops (function 3) uses LTA's DataMall API for bus stops. I thought I could do without LTA's API, but it's the only source for the coordinates (latitude and longitude) of the bus stops.
It returns a dictionary with the keys being bus stop codes:
{
    code: {
        code,
        name,
        road,
        coords
    },
    ...
}
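A rough sketch of how the DataMall data could be fetched and reshaped into the structure above, using only the standard library. The endpoint URL, the AccountKey header, and paging 500 records at a time with $skip follow LTA's DataMall user guide as I understand it, but treat them as assumptions and check the current guide. The helper names and the {lat, lon} shape of coords are also assumptions.

```python
import json
import urllib.request

# Assumed endpoint and paging behaviour (see LTA DataMall's API user guide):
# results come back 500 at a time and are paged with the $skip parameter.
BUS_STOPS_URL = "https://datamall2.mytransport.sg/ltaodataservice/BusStops"
PAGE_SIZE = 500


def fetch_bus_stop_records(account_key: str) -> list[dict]:
    """Download every bus stop record, one page at a time."""
    records: list[dict] = []
    skip = 0
    while True:
        request = urllib.request.Request(
            f"{BUS_STOPS_URL}?$skip={skip}",
            headers={"AccountKey": account_key, "accept": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            page = json.load(response)["value"]
        records.extend(page)
        if len(page) < PAGE_SIZE:  # a short page means this was the last one
            return records
        skip += PAGE_SIZE


def to_stops_dict(records: list[dict]) -> dict[str, dict]:
    """Reshape DataMall records into {code: {code, name, road, coords}}."""
    return {
        r["BusStopCode"]: {
            "code": r["BusStopCode"],
            "name": r["Description"],
            "road": r["RoadName"],
            "coords": {"lat": r["Latitude"], "lon": r["Longitude"]},
        }
        for r in records
    }
```

Usage would be roughly to_stops_dict(fetch_bus_stop_records("YOUR_ACCOUNT_KEY")). Keeping the network call separate from the reshaping makes the reshaping easy to test offline.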
TODO: Find MRT stations (if any). This may require a separate function after function 4.
combine_stops_and_services (function 4) takes in 2 parameters:
- services_stops_dict: The output of get_stops_for_each_service (function 2).
- all_stops_dict: The output of get_all_stops (function 3).
services_stops_dict maps each bus service to its routes (see the example above). all_stops_dict is a dictionary of all bus stops, keyed by bus stop code.
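One way the combining step could work is to invert the service → routes → stops mapping into stop → services, then attach that list to each stop. This is a sketch, not the code in bus.py; the name attach_services is hypothetical, and dropping route stops that are missing from all_stops_dict is a design choice of the sketch.

```python
def attach_services(
    services_stops_dict: dict[str, dict],
    all_stops_dict: dict[str, dict],
) -> dict[str, dict]:
    """Add to each bus stop the sorted list of services that call there."""
    # Invert service -> routes -> stops into stop code -> set of services.
    services_at_stop: dict[str, set[str]] = {}
    for service, info in services_stops_dict.items():
        for route in info["routes"]:
            for code in route["stops"]:
                services_at_stop.setdefault(code, set()).add(service)

    combined = {}
    for code, stop in all_stops_dict.items():
        # Build a new dict so the input dictionary is left unchanged.
        combined[code] = {**stop, "services": sorted(services_at_stop.get(code, set()))}
    return combined
```

Note that sorted() orders service numbers as strings, so '2' sorts after '14'; a natural-sort key would be needed for numeric order.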
Collecting MRT stations and a list of bus stops near them happens in mrt.py. get_stations works similarly to get_services (function 1): it scrapes LTG for MRT stations.
There's one more function, MRT.get_station_bus_stops, which gets the bus stops for a station. It returns a list of bus stop codes.
MRT.get_stations returns a dictionary with MRT station reference IDs as the keys. Note that each reference ID has its own dictionary, so interchange stations appear in the dictionary twice, and Dhoby Ghaut appears three times.
{
    ref: {
        name,
        refs,
        bus_stops
    }
}
refs and bus_stops are lists. The list refs contains ref (the key) and the reference IDs of the station on other lines.
This dictionary does store repeats.
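Because the dictionary is keyed by reference ID, a consumer that wants each physical station once has to collapse the repeats. A minimal sketch, assuming every entry for the same station lists the same refs (entries with incomplete refs, a known LTG issue, would not merge). The helper unique_stations is hypothetical, and X1/X2 are placeholder bus stop codes.

```python
def unique_stations(stations: dict[str, dict]) -> list[dict]:
    """Collapse ref-keyed entries so each physical station appears once.

    Entries count as the same station when their refs lists contain the
    same reference IDs (order ignored).
    """
    merged: dict[frozenset[str], dict] = {}
    for station in stations.values():
        key = frozenset(station["refs"])
        if key not in merged:
            merged[key] = {"name": station["name"], "refs": sorted(key), "bus_stops": []}
        entry = merged[key]
        for code in station["bus_stops"]:
            if code not in entry["bus_stops"]:  # keep first-seen order, no repeats
                entry["bus_stops"].append(code)
    return list(merged.values())


# Dhoby Ghaut is on three lines, so it has three keys pointing at one station.
dhoby_ghaut = {"name": "Dhoby Ghaut", "refs": ["NS24", "NE6", "CC1"], "bus_stops": ["X1", "X2"]}
stations = {ref: dhoby_ghaut for ref in dhoby_ghaut["refs"]}
print(unique_stations(stations))
# [{'name': 'Dhoby Ghaut', 'refs': ['CC1', 'NE6', 'NS24'], 'bus_stops': ['X1', 'X2']}]
```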
NOTE: As of now, future MRT stations and lines have been removed (te, je, ...). This is because the LTG data is incomplete, which causes repeats and dirty data.
Function 5: add MRT data to combined_stops_and_services_dict (the dictionary returned from combine_stops_and_services, function 4).
It all comes together in this structure:
{
    code: {
        code,
        name,
        road,
        coords: {
            lat,
            lon
        },
        services: [ ... ],
        mrt_stations: [ ... ]
    }
}
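Filling mrt_stations can reuse the bus_stops lists from MRT.get_stations by inverting them, station → stops into stop → stations. A sketch, assuming mrt_stations holds station names (the README does not say whether it holds names or refs); the helper name attach_mrt_stations is hypothetical and X1/X3 are placeholder stop codes.

```python
def attach_mrt_stations(
    combined: dict[str, dict],
    stations: dict[str, dict],
) -> dict[str, dict]:
    """Fill each stop's mrt_stations with the names of stations listing it."""
    # Using names (not refs) means an interchange such as Dhoby Ghaut,
    # which appears under several refs, is only listed once per stop.
    names_at_stop: dict[str, set[str]] = {}
    for station in stations.values():
        for code in station["bus_stops"]:
            names_at_stop.setdefault(code, set()).add(station["name"])
    return {
        code: {**stop, "mrt_stations": sorted(names_at_stop.get(code, set()))}
        for code, stop in combined.items()
    }
```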
- Land Transport Guru (LTG): a site that's beautiful and easy to scrape.
- Land Transport Authority (LTA) DataMall: LTA provides free APIs such as bus arrival timings and taxi locations.
I'd also like to mention cheeaun/busrouter-sg here. While it wasn't a data source, it was really helpful.
See the files in the "output/data" directory. More details coming soon.
- Some stations in the station list on LTG have incomplete station refs. For example, see Marina Bay Financial (a bus stop): the TE line is mentioned even though it is not on the main list, causing duplicates.