## Problem

You are provided data on the stations and lines of Singapore's urban rail system, including planned additions over the next decade. Your task is to use this data to build a routing service, to help users find routes from any station to any other station on this future network.

The app should meet the following requirements:
- Allow the user to specify origin and destination stations.
- Find and display one or more routes from the origin to the destination, ordered by some efficiency heuristic. Routes should have one or more steps, like "Take [line] from [station] to [station]" or "Change to [line]". You may add other relevant information to the results.

You may use any language/framework. You may also convert the data into another format if needed.

## Data Description

The included file, stations.json, describes Singapore's future rail network. Here is an extract:

```
{
  ...
  "Bukit Gombak": {"NS": 3},
  "Bukit Panjang": {"BP": [6, 14], "DT": 1},
  "Buona Vista": {"EW": 21, "CC": 22, "CE": 22},
  ...
}
```

The keys of the root JSON object are station names (e.g. Bukit Gombak) and the values specify the position of each station on one or more train lines. For example, Bukit Gombak has position 3 on the "NS" (North-South) line.

Interchange stations (where train lines cross) like Buona Vista have positions on multiple lines: here it is at position 21 on the EW line and 22 on the CC and CE lines.

A few lines form loops: For instance, the Bukit Panjang station has positions 6 and 14 on the BP line because it closes the loop on that line.

Note that position numbers are not always sequential; the gaps represent spaces left for future stations, and may be ignored for this exercise.

Trains run in both directions on every line.
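
The description above can be turned into an adjacency map fairly directly. The following is a minimal sketch, not the approach taken in the cells below (which work from a CSV file instead): `build_line_graph` and `sample` are hypothetical names, and the sample only contains a handful of stations, so some neighbours in it are adjacent only because the stations between them were left out.

```python
from collections import defaultdict


def build_line_graph(stations):
    """Build an undirected adjacency map keyed by station name."""
    # Collect every (position, station name) pair per line.
    lines = defaultdict(list)
    for name, memberships in stations.items():
        for line, positions in memberships.items():
            # A station that closes a loop lists several positions.
            if isinstance(positions, int):
                positions = [positions]
            for pos in positions:
                lines[line].append((pos, name))

    graph = defaultdict(set)
    for stops in lines.values():
        stops.sort()  # gaps in the numbering are ignored; only order matters
        for (_, a), (_, b) in zip(stops, stops[1:]):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Tiny sample in the stations.json shape (positions taken from this document).
sample = {
    "Bukit Gombak": {"NS": 3},
    "Bukit Panjang": {"BP": [6, 14], "DT": 1},
    "Phoenix": {"BP": 5},
    "Petir": {"BP": 7},
    "Senja": {"BP": 13},
}
graph_by_name = build_line_graph(sample)
print(graph_by_name["Bukit Panjang"])
```

Because nodes are station names, an interchange is a single node and needs no extra linking step; the trade-off is that line changes have to be detected separately when describing the route.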

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("./mrtsg.csv")


In [4]:
df = df.drop_duplicates(['STN_NAME', 'STN_NO', 'COLOR'])

In [5]:
df.drop('OBJECTID', axis=1, inplace=True)

In [6]:
df['STN_SIGN'] = df['STN_NO'].apply(lambda x: x[:2])

In [7]:
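def station_number(stn_no):
    """Return the numeric position encoded after the two-letter line prefix."""
    suffix = stn_no[2:]

    if suffix.isdigit():
        return int(suffix)

    # a single-letter suffix maps to a position: 'A' -> 1, 'B' -> 2, ...
    return ord(suffix) - ord('A') + 1

df['STN_NUMBER'] = df['STN_NO'].apply(station_number)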

In [8]:
df.sort_values(['STN_SIGN', 'STN_NUMBER'], inplace=True)

In [9]:
df.head(20)

Unnamed: 0,STN_NAME,STN_NO,X,Y,Latitude,Longitude,COLOR,STN_SIGN,STN_NUMBER
40,CHOA CHU KANG LRT STATION,BP1,18121.6052,40753.8693,1.384836,103.74458,OTHERS,BP,1
155,SOUTH VIEW LRT STATION,BP2,18203.7243,40252.0686,1.380298,103.745317,OTHERS,BP,2
84,KEAT HONG LRT STATION,BP3,18622.5727,40064.5711,1.378603,103.74908,OTHERS,BP,3
168,TECK WHYE LRT STATION,BP4,19140.8,39852.4608,1.376685,103.753735,OTHERS,BP,4
125,PHOENIX LRT STATION,BP5,19621.7772,40066.3155,1.378619,103.758056,OTHERS,BP,5
29,BUKIT PANJANG LRT STATION,BP6,20175.9353,39987.9068,1.37791,103.763034,OTHERS,BP,6
124,PETIR LRT STATION,BP7,20582.7273,39970.2513,1.37775,103.766688,OTHERS,BP,7
123,PENDING LRT STATION,BP8,21094.7812,39792.4936,1.376143,103.771288,OTHERS,BP,8
5,BANGKIT LRT STATION,BP9,21248.246,40220.9693,1.380018,103.772667,OTHERS,BP,9
61,FAJAR LRT STATION,BP10,21043.4356,40718.8826,1.384521,103.770827,OTHERS,BP,10


In [10]:
graph = {}

grouped = df.groupby('STN_SIGN')

for name, group in grouped:
    print(f'Processing {name} STN')

    last_stn = group['STN_NO'].iloc[0]

    for stn_no in group['STN_NO'][1:]:
        graph.setdefault(last_stn, set()).add(stn_no)
        graph.setdefault(stn_no, set()).add(last_stn)

        last_stn = stn_no

    if last_stn not in graph:
        graph[last_stn] = set()

Processing BP STN
Processing CC STN
Processing CE STN
Processing CG STN
Processing DT STN
Processing EW STN
Processing NE STN
Processing NS STN
Processing PE STN
Processing PT STN
Processing PW STN
Processing SE STN
Processing ST STN
Processing SW STN
Processing TE STN


In [11]:
intersect_station = {}

for index, row in df.iterrows():
    stn_name = row['STN_NAME']
    stn_no = row['STN_NO']

    intersect_station.setdefault(stn_name, []).append(stn_no)

In [12]:
intersect_station

{'CHOA CHU KANG LRT STATION': ['BP1'],
 'SOUTH VIEW LRT STATION': ['BP2'],
 'KEAT HONG LRT STATION': ['BP3'],
 'TECK WHYE LRT STATION': ['BP4'],
 'PHOENIX LRT STATION': ['BP5'],
 'BUKIT PANJANG LRT STATION': ['BP6'],
 'PETIR LRT STATION': ['BP7'],
 'PENDING LRT STATION': ['BP8'],
 'BANGKIT LRT STATION': ['BP9'],
 'FAJAR LRT STATION': ['BP10'],
 'SEGAR LRT STATION': ['BP11'],
 'JELAPANG LRT STATION': ['BP12'],
 'SENJA LRT STATION': ['BP13'],
 'TEN MILE JUNCTION LRT STATION': ['BP14'],
 'DHOBY GHAUT MRT STATION': ['CC1', 'NE6', 'NS24'],
 'BRAS BASAH MRT STATION': ['CC2'],
 'ESPLANADE MRT STATION': ['CC3'],
 'PROMENADE MRT STATION': ['CC4', 'DT15'],
 'NICOLL HIGHWAY MRT STATION': ['CC5'],
 'STADIUM MRT STATION': ['CC6'],
 'MOUNTBATTEN MRT STATION': ['CC7'],
 'DAKOTA MRT STATION': ['CC8'],
 'PAYA LEBAR MRT STATION': ['CC9', 'EW8'],
 'MACPHERSON MRT STATION': ['CC10', 'DT26'],
 'TAI SENG MRT STATION': ['CC11'],
 'BARTLEY MRT STATION': ['CC12'],
 'SERANGOON MRT STATION': ['CC13', 'NE12'],
 '

In [13]:
from itertools import combinations


for stn_name, connected_stns in intersect_station.items():
    for stn1, stn2 in combinations(connected_stns, 2):
        graph.setdefault(stn1, set()).add(stn2)
        graph.setdefault(stn2, set()).add(stn1)
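
For an interchange served by three lines, `combinations` yields every unordered pair of codes, so each platform ends up linked to each of the others. A small illustration, using the Dhoby Ghaut codes from the `intersect_station` output above:

```python
from itertools import combinations

codes = ['CC1', 'NE6', 'NS24']  # Dhoby Ghaut, as listed in intersect_station
pairs = list(combinations(codes, 2))
print(pairs)  # [('CC1', 'NE6'), ('CC1', 'NS24'), ('NE6', 'NS24')]
```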

In [14]:
graph

{'BP1': {'BP2'},
 'BP2': {'BP1', 'BP3'},
 'BP3': {'BP2', 'BP4'},
 'BP4': {'BP3', 'BP5'},
 'BP5': {'BP4', 'BP6'},
 'BP6': {'BP5', 'BP7'},
 'BP7': {'BP6', 'BP8'},
 'BP8': {'BP7', 'BP9'},
 'BP9': {'BP10', 'BP8'},
 'BP10': {'BP11', 'BP9'},
 'BP11': {'BP10', 'BP12'},
 'BP12': {'BP11', 'BP13'},
 'BP13': {'BP12', 'BP14'},
 'BP14': {'BP13'},
 'CC1': {'CC2', 'NE6', 'NS24'},
 'CC2': {'CC1', 'CC3'},
 'CC3': {'CC2', 'CC4'},
 'CC4': {'CC3', 'CC5', 'DT15'},
 'CC5': {'CC4', 'CC6'},
 'CC6': {'CC5', 'CC7'},
 'CC7': {'CC6', 'CC8'},
 'CC8': {'CC7', 'CC9'},
 'CC9': {'CC10', 'CC8', 'EW8'},
 'CC10': {'CC11', 'CC9', 'DT26'},
 'CC11': {'CC10', 'CC12'},
 'CC12': {'CC11', 'CC13'},
 'CC13': {'CC12', 'CC14', 'NE12'},
 'CC14': {'CC13', 'CC15'},
 'CC15': {'CC14', 'CC16', 'NS17'},
 'CC16': {'CC15', 'CC17'},
 'CC17': {'CC16', 'CC18'},
 'CC18': {'CC17', 'CC19'},
 'CC19': {'CC18', 'CC20', 'DT9'},
 'CC20': {'CC19', 'CC21'},
 'CC21': {'CC20', 'CC22'},
 'CC22': {'CC21', 'CC23', 'EW21'},
 'CC23': {'CC22', 'CC24'},
 'CC24':

In [15]:
from collections import deque


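def simple_bfs(start, end):
    """Return the fewest-hops route from start to end as a tuple of station codes."""
    if start == end:
        return (start,)

    queue = deque([start])
    marker = set([start])  # stations already queued
    tracer = {}  # station -> station it was reached from

    while queue:
        stn = queue.popleft()

        if stn == end:
            break

        for neighbor_stn in graph[stn]:
            if neighbor_stn not in marker:
                queue.append(neighbor_stn)
                marker.add(neighbor_stn)
                tracer[neighbor_stn] = stn

    if end not in tracer:
        return tuple()

    # walk the predecessor links back from end to start
    route = deque()

    while end != start:
        route.appendleft(end)
        end = tracer[end]

    route.appendleft(start)

    return tuple(route)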

In [16]:
route = simple_bfs('CC21', 'EW12')

In [17]:
route

('CC21', 'CC20', 'CC19', 'DT9', 'DT10', 'DT11', 'DT12', 'DT13', 'DT14', 'EW12')

In [18]:
def get(row, field):
    return row[field].iloc[0]

In [19]:
def print_route(route):
    if not route:
        return

    last_row = df[df['STN_NO'] == route[0]]

    for path in route[1:]:
        row = df[df['STN_NO'] == path]

        last_stn_sign, last_color, last_stn_name = get(last_row, 'STN_SIGN'), get(last_row, 'COLOR'), get(last_row, 'STN_NAME')
        stn_sign, color, stn_name = get(row, 'STN_SIGN'), get(row, 'COLOR'), get(row, 'STN_NAME')

        if last_stn_sign == stn_sign:
            print('Take {} line from {} to {}'.format(last_color, last_stn_name, stn_name))
        else:
            print('Change to {} line {}'.format(color, stn_name))

        last_row = row

In [20]:
print_route(route)

Take YELLOW line from HOLLAND VILLAGE MRT STATION to FARRER ROAD MRT STATION
Take YELLOW line from FARRER ROAD MRT STATION to BOTANIC GARDENS MRT STATION
Change to BLUE line BOTANIC GARDENS MRT STATION
Take BLUE line from BOTANIC GARDENS MRT STATION to STEVENS MRT STATION
Take BLUE line from STEVENS MRT STATION to NEWTON MRT STATION
Take BLUE line from NEWTON MRT STATION to LITTLE INDIA MRT STATION
Take BLUE line from LITTLE INDIA MRT STATION to ROCHOR MRT STATION
Take BLUE line from ROCHOR MRT STATION to BUGIS MRT STATION
Change to GREEN line BUGIS MRT STATION
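
The output above prints one line per hop. If a single "Take [line] from [station] to [station]" step per ride is preferred, consecutive hops on the same line can be merged. The sketch below is one way to do that; `describe_route` is a hypothetical helper, and it assumes the path is given as `(line, station)` pairs rather than the station codes used in this notebook.

```python
def describe_route(path):
    """Collapse a path of (line, station) pairs into readable steps."""
    steps = []
    i = 0
    while i < len(path) - 1:
        line, start = path[i]
        next_line = path[i + 1][0]
        if next_line != line:
            # Same station, different line: this hop is an interchange.
            steps.append(f"Change to {next_line}")
            i += 1
            continue
        # Extend the ride for as long as the line stays the same.
        j = i + 1
        while j + 1 < len(path) and path[j + 1][0] == line:
            j += 1
        steps.append(f"Take {line} from {start} to {path[j][1]}")
        i = j
    return steps


example_path = [
    ("CC", "Holland Village"),
    ("CC", "Farrer Road"),
    ("CC", "Botanic Gardens"),
    ("DT", "Botanic Gardens"),
    ("DT", "Stevens"),
    ("DT", "Newton"),
]
for step in describe_route(example_path):
    print(step)
```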


## Bonus

Singapore also has a very well-connected bus system. There are bus stops at every train station, but these do not have any line-based limitation. __Additionally, buses need to be changed every 6 bus stops travelled__. A new route finder needs to be created to account for this. The travel times for these modes of transport during different periods of the day are as follows.

Peak hours (6am-9am and 6pm-9pm on Mon-Fri)
- Buses: All buses take 12 minutes per stop due to increased traffic
- Trains:
    - BP, SE, NS and NE lines take 12 minutes per station
    - All train line changes take 15 minutes

Non-Peak hours (9am-6pm on Mon-Fri, 6am-10pm on Sat & Sun)
- Buses: All buses take 10 minutes per stop, each bus change takes 10 minutes
- Trains: All trains take 10 minutes per stop, changing lines takes 10 minutes

Night hours (10pm-6am on Mon-Sun)
- Buses: All buses take 8 minutes between stations, each bus change takes 20 minutes
- Trains: Same as non-peak hours

At all times, switching between trains and buses takes 10 minutes for each change.
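
The bus rule and the per-period figures combine into simple arithmetic for a single bus leg. The sketch below uses a hypothetical `bus_leg_minutes` helper and assumes that `stops` counts hops travelled and that a change is only needed when the journey continues past every sixth stop; this is one reading of the rule, and the `calculate_cost` function later in this notebook uses its own counting convention.

```python
def bus_leg_minutes(stops, per_stop, change):
    """Minutes for a bus leg of `stops` hops, with a forced change every 6 stops."""
    if stops <= 0:
        return 0
    # Assumption: no change is charged when the leg ends exactly on a 6th stop.
    forced_changes = (stops - 1) // 6
    return stops * per_stop + forced_changes * change


# Night hours from the table above: 8 minutes per stop, 20 minutes per change.
print(bus_leg_minutes(6, 8, 20))
print(bus_leg_minutes(8, 8, 20))
```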

To account for these, in addition to the original requirements, the following requirements should also now be met:
- Accept a date-time string in "YYYY-MM-DDThh:mm" format (e.g. '2019-01-31T16:00')
- Find and display one or more routes ordered by an efficiency heuristic, as before, and again, clearly output the steps involved and total travel time
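
Picking the right cost table starts with classifying the date-time string. The sketch below is a minimal example using the standard library; `period_for` and its return labels are hypothetical names. The periods listed above leave 9pm-10pm on weekdays unspecified, and this sketch assumes that hour counts as non-peak.

```python
from datetime import datetime


def period_for(timestamp):
    """Classify a 'YYYY-MM-DDThh:mm' string as 'peak', 'non_peak' or 'night'."""
    moment = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M")
    hour = moment.hour
    is_weekday = moment.weekday() < 5  # Monday is 0, Sunday is 6

    if hour >= 22 or hour < 6:
        return "night"
    if is_weekday and (6 <= hour < 9 or 18 <= hour < 21):
        return "peak"
    # Assumption: weekday 9pm-10pm falls through to non-peak.
    return "non_peak"


print(period_for("2019-01-31T16:00"))  # a Thursday afternoon
```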

In [21]:
BUS = 'BUS'
TRAIN = 'TRAIN'
STOP = 'STOP'
CHANGE = 'CHANGE'

In [22]:
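PEAK_HOURS = {
    BUS: {
        STOP: 12,
        # the spec gives no peak-hour bus change time; 10 minutes is assumed here
        CHANGE: 10
    },
    TRAIN: {
        # the spec lists 12 minutes only for the BP, SE, NS and NE lines;
        # this table applies it to every line
        STOP: 12,
        CHANGE: 15
    }
}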

NON_PEAK_HOURS = {
    BUS: {
        STOP: 10,
        CHANGE: 10
    },
    TRAIN: {
        STOP: 10,
        CHANGE: 10
    }
}

NIGHT_HOURS = {
    BUS: {
        STOP: 8,
        CHANGE: 20
    },
    TRAIN: {
        STOP: 10,
        CHANGE: 10
    }
}

In [23]:
TRANSPORTS = (BUS, TRAIN)

In [24]:
from itertools import groupby, product


def get_stn(stn_no):
    return df[df['STN_NO'] == stn_no]


def calculate_cost(rule, route):
    iterator = iter(route)
    start_position = next(iterator)

    last_row = get_stn(start_position[0])
    last_transport_type = start_position[1]

    cost = 0
    final_route = deque([start_position])
    bus_stop = 1 if last_transport_type == BUS else 0

    for stn_no, transport_type in iterator:
        row = get_stn(stn_no)
        is_same_station = last_transport_type == transport_type and get(last_row, 'STN_NAME') == get(row, 'STN_NAME')

        if transport_type != last_transport_type:
            # switching between trains and buses takes 10 minutes for each change
            cost += 10
            bus_stop = 0

        # switching line
        if transport_type == TRAIN and is_same_station:
            cost += (rule[TRAIN][CHANGE] - rule[transport_type][STOP])

        if transport_type == BUS:
            bus_stop += 1

        if last_row is not None:
            if transport_type == BUS and is_same_station:
                # buses don't need to switch lines
                bus_stop = max(0, bus_stop - 1)

                last_stn_index = -1

                while final_route[last_stn_index] == 'CHANGE BUS':
                    last_stn_index -= 1

                # update paths
                final_route[last_stn_index] = (f'{final_route[last_stn_index][0]}/{stn_no}', transport_type)

                last_row, last_transport_type = row, transport_type

                continue

        cost += rule[transport_type][STOP]

        final_route.append((stn_no, transport_type))

        # buses need to be changed every 6 bus stops travelled
        if bus_stop == 6:
            final_route.append('CHANGE BUS')
            bus_stop = 0
            cost += rule[BUS][CHANGE]

        last_row, last_transport_type = row, transport_type

    return cost, tuple(final_route)


def print_route(cost, route):
    if not route:
        return

    print(f'Cost: {cost}')
    print(route)


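def dfs(start, end, rule):
    # Despite the name, this is not a depth-first search: it takes the BFS route
    # and brute-forces every bus/train labelling of its hops (2 ** len(route) options).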
    route = simple_bfs(start, end)

    if not route:
        return

    cost = float('inf')
    best_route = None

    for transport_types in product(TRANSPORTS, repeat=len(route)):
        new_cost, final_route = calculate_cost(rule, zip(route, transport_types))

        if new_cost < cost:
            cost = new_cost
            best_route = final_route

    print(rule)
    print_route(cost, best_route)
    print('--------------------')

In [25]:
dfs('EW18', 'NS22', PEAK_HOURS)
# dfs('EW18', 'NS22', NON_PEAK_HOURS)
# dfs('EW18', 'NS22', NIGHT_HOURS)

{'BUS': {'STOP': 12, 'CHANGE': 10}, 'TRAIN': {'STOP': 12, 'CHANGE': 15}}
Cost: 94
(('EW18', 'BUS'), ('EW17', 'BUS'), ('EW16/NE3', 'BUS'), ('NE4', 'BUS'), ('NE5', 'BUS'), ('NE6/NS24', 'BUS'), 'CHANGE BUS', ('NS23', 'BUS'), ('NS22', 'BUS'))
--------------------


In [26]:
dfs('EW15', 'EW5', NON_PEAK_HOURS)

{'BUS': {'STOP': 10, 'CHANGE': 10}, 'TRAIN': {'STOP': 10, 'CHANGE': 10}}
Cost: 100
(('EW15', 'TRAIN'), ('EW14', 'TRAIN'), ('EW13', 'TRAIN'), ('EW12', 'TRAIN'), ('EW11', 'TRAIN'), ('EW10', 'TRAIN'), ('EW9', 'TRAIN'), ('EW8', 'TRAIN'), ('EW7', 'TRAIN'), ('EW6', 'TRAIN'), ('EW5', 'TRAIN'))
--------------------
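
The cells above fix the hop sequence with BFS before any costs are considered, so a path with more hops but a lower total time is never examined. One common alternative is Dijkstra's algorithm, which lets the costs drive the route choice. The sketch below is not part of the original solution: `cheapest_route` and `toy` are hypothetical names, it covers trains only, and it assumes a flat per-stop and per-change cost, with a line change detected by a differing two-letter prefix.

```python
import heapq


def cheapest_route(graph, start, end, stop_minutes, change_minutes):
    """Dijkstra over station codes; returns (total minutes, path)."""
    best = {start: 0}
    previous = {}
    heap = [(0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == end:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour in graph[node]:
            # Codes sharing a prefix are on the same line; otherwise it is a change.
            same_line = neighbour[:2] == node[:2]
            new_cost = cost + (stop_minutes if same_line else change_minutes)
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    if end not in best:
        return None, ()
    # Walk the predecessor links back to the start.
    path = [end]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return best[end], tuple(reversed(path))


# Small fragment of the graph printed earlier in this notebook.
toy = {
    'CC21': {'CC20', 'CC22'},
    'CC20': {'CC19', 'CC21'},
    'CC19': {'CC20', 'DT9'},
    'DT9': {'CC19'},
    'CC22': {'CC21', 'EW21'},
    'EW21': {'CC22'},
}
print(cheapest_route(toy, 'CC21', 'DT9', 10, 10))
```

Extending this to buses would mean adding the transport mode (and the running bus-stop count) to the search state, which is left out here to keep the sketch short.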


## Dissection

```
('EW18', 'TRAIN')
('EW17', 'TRAIN') 10
('EW16', 'TRAIN') 20
('EW15', 'TRAIN') 30
('EW14/NS26', 'BUS') 40 + 8
('NS25', 'BUS') 56
('NS24', 'BUS') 64
('NS23', 'BUS') 72
('NS22', 'BUS') 80
```