Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE".

---

# Read This First

* You will provide a single Jupyter notebook file.
* Each function should have appropriate documentation in the form of a docstring and in-line comments. 
* Each code explanation (i.e. Part 3 for each Section) should be written in markdown. 
* You can use any built-in Python functions (those that don't need libraries) and any libraries *used in the labs* (i.e., those following some `import` statement). No other libraries are allowed.

**You *will* be penalised if you don't follow these instructions.**

## Background information

*In which you will write a tool to advise people to avoid some unnecessary line changes in the London Underground.*

File `underground.zip` contains thirteen text files. File `station_names.csv` is a CSV file where each row is a pair containing an arbitrary *station id* and the *name* of a station in the London Underground. File `walking_times.csv` is a CSV file where each row contains a pair of station names and the time it takes to walk between the corresponding stations. Note that this does not contain all pairs of stations. CSV files such as `UG_bakerloo.csv` represent the connectivity of the different lines in the Underground (e.g., the Bakerloo line). Each row contains a pair of station ids, indicating that those stations are adjacent to each other on that particular line. For instance, the first line in `UG_bakerloo.csv` is `11,135`, indicating that station id 11 (Baker Street) and station id 135 (Marylebone) are next to each other on the Bakerloo line.

All sorts of network analyses can be done in a system like the London Underground. For instance, we may wish to know whether we can easily avoid changing lines for a particular journey: say I'm outside Marylebone and I want to go to Euston Square. I could enter the Underground at Marylebone, change at Baker Street, and then go to Euston Square. But Baker Street is just a short walk from Marylebone. If I happen to know that, I may wish to just walk there and start my Underground journey from Baker Street. This provides the motivation for Section B below.

You may wish to check out the official [London Underground map](https://tfl.gov.uk/maps/track/tube) to better understand the data, but please bear in mind that the files in `underground.zip` are not up to date, and minor differences are to be expected. 

## Section A: Data Wrangling

Your overall task here is to combine the information from a number of csv files into a single pandas DataFrame. 

### Part 1

Write a function to be called `load_underground_data`. It takes as input a string `path_name` and loads into memory the station names in `station_names.csv`, the walking times in `walking_times.csv`, and each of the lines `UG_*.csv`. You should combine the lines data into a single DataFrame, named `tube_lines`, containing three columns: the first column should be the name of the line, as a user-friendly string, while the last two columns should be the station ids of the adjacent stations. For example, for the first line in `UG_bakerloo.csv`, i.e. `11,135`, the corresponding row in `tube_lines` should be `'Bakerloo',11,135`.

The output of `load_underground_data` should be three DataFrames: `station_names`, `walking_times`, and `tube_lines`, in that order.

**Hints and reminders**

Note that the `walking_times.csv` file contains a header row, while the others do not. Be careful with how you set the column names!
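
As a small illustration of the header hint, the sketch below reads two in-memory stand-ins for the files. The file contents (including the walking time of 8) are made up for the example; only the column names `A`, `B` and `Walking Time` are taken from the real data.

```python
import io

import pandas as pd

# Hypothetical file contents, standing in for the real CSV files
names_csv = io.StringIO("11,Baker Street\n135,Marylebone\n")
walks_csv = io.StringIO("A,B,Walking Time\nBaker Street,Marylebone,8\n")

# No header row: supply the column names and tell pandas not to consume row 0
station_names = pd.read_csv(names_csv, names=["id", "name"], header=None)

# Header row present: the default behaviour picks the column names up from it
walking_times = pd.read_csv(walks_csv)

print(station_names.shape)           # both data rows are kept
print(list(walking_times.columns))
```

Reading a header-less file without `header=None` would silently turn the first station into column labels, which is the mistake the hint warns about.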

You can make a Python function return any number of outputs if you separate them by commas. For instance `return a, b` will return two outputs, `a` and `b`, simultaneously.
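
A minimal sketch of returning several outputs at once; the function `min_and_max` is a hypothetical helper used only for illustration.

```python
def min_and_max(values):
    """Return the smallest and largest element of a non-empty list."""
    return min(values), max(values)  # the comma packs both into one tuple


# The tuple can be unpacked into separate names on assignment
lowest, highest = min_and_max([18, 9, 6, 3, 11])
print(lowest, highest)
```

The same unpacking works for three DataFrames, e.g. `a, b, c = some_function()`.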

In [1]:
import math
import os
import sys

import numpy as np
import pandas as pd

# np.set_printoptions(threshold=sys.maxsize)


def load_underground_data(path_name):
    '''
    Loads the Underground csv files found in the folder path_name and returns three
    dataframes: station_names, walking_times, and tube_lines, in that order
    '''
    # creates dataframe for the station names (this file has no header row)
    station_names = pd.read_csv(os.path.join(path_name, "station_names.csv"),
                                names=["id", "name"], header=None)
    # creates dataframe for the walking times (this file has a header row)
    walking_times = pd.read_csv(os.path.join(path_name, "walking_times.csv"))

    UG_lines = ["bakerloo", "central", "circle", "district", "hammersmith_city", "jubilee",
                "metropolitan", "northern", "piccadilly", "victoria", "waterloo_city"]
    # creates list of underground lines to help create tube_lines dataframes

    tube_lines = []  # this will be the list of all tube lines and their available stations
    for line in UG_lines:
        # loops through each underground line, reads the corresponding csv file,
        # and collects the dataframes so they can be concatenated into one
        df = pd.read_csv(os.path.join(path_name, f"UG_{line}.csv"),
                         names=["previous_station_id", "next_station_id"], header=None)
        df['line'] = line.title()
        tube_lines.append(df)
    tube_lines = pd.concat(tube_lines, ignore_index=True)
    tube_lines = tube_lines[['line', 'previous_station_id', 'next_station_id']]

    return station_names, walking_times, tube_lines

In [2]:
# Test your code here


import os

MY_PATH = "/Users/sahmrahman/Desktop/STAT0040/underground"
os.chdir(MY_PATH)
load_underground_data(MY_PATH)

(      id                name
 0      1          Acton Town
 1      2            Barbican
 2      3             Aldgate
 3      4        Aldgate East
 4      5            Alperton
 ..   ...                 ...
 264  265      West Hampstead
 265  266  Willesden Junction
 266  267            Richmond
 267  268           Wimbledon
 268  269           Upminster
 
 [269 rows x 2 columns],
                      A                  B  Walking Time
 0    Elephant & Castle      Lambeth North            18
 1        Lambeth North           Waterloo             9
 2             Waterloo         Embankment             6
 3           Embankment      Charing Cross             3
 4        Charing Cross  Piccadilly Circus            11
 ..                 ...                ...           ...
 206      Finsbury Park      Seven Sisters            38
 207      Seven Sisters     Tottenham Hale            19
 208     Tottenham Hale    Blackhorse Road            18
 209    Blackhorse Road        Walthamstow 

In [3]:
# LEAVE THIS CELL BLANK


### Part 2

Write a function that combines each of the datasets into a single pandas DataFrame. Your output should have six columns, with each row corresponding to a single row in one of the `UG_*.csv` files. The columns should include the name of the line, the ID of the first station, the ID of the second station, the name of the first station, the name of the second station, and the walking time between the two stations (or NaN if this data is not available in `walking_times.csv`).

**Hints**

Note that there are slight discrepancies in how the stations are named between some of the files. In particular, the punctuation used in the names differs between `station_names.csv` and `walking_times.csv`. You may find the `pd.Series.str.replace` method helpful. 

Note that the order of pairs of stations may not necessarily be the same between `walking_times.csv` and the `UG_*.csv` files. 
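
The two hints can be sketched on toy data as follows. This is one possible approach, not necessarily the intended one: the walking time, the example name `St. John's Wood`, and the helper `pair_key` are assumptions made for illustration.

```python
import pandas as pd

# Toy rows; the walking time is made up for illustration
lines = pd.DataFrame({"line": ["Bakerloo"],
                      "first": ["Baker Street"], "second": ["Marylebone"]})
walks = pd.DataFrame({"A": ["Marylebone"], "B": ["Baker Street"],
                      "Walking Time": [8]})

# Literal (non-regex) replacement, so "." is not read as "any character"
names = pd.Series(["St. John's Wood"])
clean = names.str.replace(".", "", regex=False).str.replace("'", "", regex=False)


def pair_key(first, second):
    """Hypothetical helper: the same key whichever way round the pair is given."""
    return "|".join(sorted((first, second)))


lines["key"] = [pair_key(a, b) for a, b in zip(lines["first"], lines["second"])]
walks["key"] = [pair_key(a, b) for a, b in zip(walks["A"], walks["B"])]

# A left merge keeps every line row; unmatched rows would get NaN walking times
merged = lines.merge(walks[["key", "Walking Time"]], on="key", how="left")
print(merged[["line", "first", "second", "Walking Time"]])
```

Because the key is built from the sorted pair, the row `Marylebone, Baker Street` in the walking times still matches `Baker Street, Marylebone` in the line data.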



In [54]:
def combine_underground_data(station_names, walking_times, tube_lines):
    '''
    takes in the three DataFrame objects, combines into one large DataFrame, and another smaller DataFrame of 
    walking times that do not belong to any common lines
    these two DataFrames are returned together in a tuple
    '''

    # fixes punctuation discrepancies
    # regex=False makes "." a literal full stop rather than a regex wildcard
    for column in ['A', 'B']:
        walking_times[column] = walking_times[column].str.replace("'", '', regex=False)
        walking_times[column] = walking_times[column].str.replace(".", '', regex=False)

    # merge walking times and IDs, separately from station names and IDs
    # allows for us to keep walking times that don't belong to a line
    walking_and_first_ID = walking_times.merge(station_names,
                                               left_on='A',
                                               right_on='name',
                                               how='left')
    walking_and_first_ID['id'] = walking_and_first_ID['id'].astype(int)

    walking_and_IDs = walking_and_first_ID.merge(station_names,
                                                 left_on='B',
                                                 right_on='name',
                                                 how='left')

    names_and_first_ID = tube_lines.merge(station_names,
                                          left_on='previous_station_id',
                                          right_on='id',
                                          how='left')

    names_and_IDs = names_and_first_ID.merge(station_names,
                                             left_on='next_station_id',
                                             right_on = 'id',
                                             how='left')

    fully_combined = names_and_IDs.merge(walking_and_IDs,
                                         left_on=['name_x', 'name_y'],
                                         right_on=['A', 'B'],
                                         how='outer')


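    # separate rows with walking times that belong to a line and those without
    # (rows that come only from walking_times have no line name after the outer merge)

    has_line = fully_combined['line'].notna()
    walking_and_lines = fully_combined[has_line]
    walking_without_lines = fully_combined[~has_line]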
    
    # reformat the dataframes

    walking_and_lines = walking_and_lines.drop(['id_x_x', 'id_y_x', 'A', 'B', 'id_x_y', 'name_x_y', 'id_y_y', 'name_y_y'], axis=1)
    walking_and_lines.rename(columns={
        'line': 'Underground Line',
        'previous_station_id': 'First Station ID',
        'next_station_id': 'Second Station ID',
        'name_x_x': 'First Station Name',
        'name_y_x': 'Second Station Name',
        'Walking Time': 'Walking Time (mins)',

    }, inplace=True)
    walking_and_lines['First Station ID'] = walking_and_lines['First Station ID'].astype(int)
    walking_and_lines['Second Station ID'] = walking_and_lines['Second Station ID'].astype(int)

    walking_without_lines = walking_without_lines.drop([
        'line',
        'previous_station_id',
        'next_station_id',
        'id_x_x',
        'name_x_x',
        'id_y_x',
        'name_y_x',
        'name_x_y',
        'name_y_y'
    ], axis=1)

    walking_without_lines = walking_without_lines[walking_without_lines.columns[[3, 4, 0, 1, 2]]]
    walking_without_lines['id_x_y'] = walking_without_lines['id_x_y'].astype(int)
    walking_without_lines['id_y_y'] = walking_without_lines['id_y_y'].astype(int)
    walking_without_lines.rename(columns={
        'id_x_y': 'First Station ID',
        'id_y_y': 'Second Station ID',
        'Walking Time': 'Walking Time (mins)'
    }, inplace=True)
    
    return walking_and_lines, walking_without_lines

In [55]:
# Test your code here
data_frames = load_underground_data(MY_PATH)
print(combine_underground_data(data_frames[0], data_frames[1], data_frames[2])[0])
# this will only return the rows with walking times assigned to a line

    Underground Line  First Station ID  Second Station ID First Station Name  \
0           Bakerloo                11                135       Baker Street   
1           Bakerloo                11                175       Baker Street   
2           Bakerloo                39                216         Embankment   
3           Northern                39                216         Embankment   
4           Bakerloo                39                229         Embankment   
..               ...               ...                ...                ...   
359         Victoria               202                253          Stockwell   
360         Victoria               202                254          Stockwell   
361         Victoria               224                252           Victoria   
362         Victoria               252                253            Pimlico   
363    Waterloo_City                13                229               Bank   

    Second Station Name  Walking Time (

  walking_times['A'] = walking_times['A'].str.replace(".", '')
  walking_times['B'] = walking_times['B'].str.replace(".", '')


In [6]:
# LEAVE THIS CELL BLANK


### Part 3

Write a short Markdown paragraph explaining your code in Parts 1 and 2. Your answer should be 4-6 sentences long.

Part 1 creates three `DataFrame` objects for `station_names.csv`, `walking_times.csv`, and all `UG_*.csv` files given in the underground zip/folder using pandas' `read_csv()` function. By looping through each `UG_*.csv` file and appending its contents to one `DataFrame` object, `tube_lines` is created to hold all underground lines and their respective stations. 

Part 2 combines the three `DataFrame` objects (`station_names`, `walking_times`, and `tube_lines`) and returns the result as two `DataFrame` objects: one holding the station pairs that lie on an Underground line, with their walking times where available, and one holding the walking times between stations that are not matched to any line. This is achieved using the `merge()` function. We then rearrange and rename the columns of the two `DataFrame`s to make them easier to use.
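
Splitting merged rows into those that belong to a line and those that do not can also be sketched with the `indicator` option of `merge()`. The frames below are toy data (the pairing `Bank`/`Monument` and both walking times are made up), and this is an alternative illustration rather than the code used in Part 2.

```python
import pandas as pd

# Toy frames; pairings and times are illustrative only
on_lines = pd.DataFrame({"first": ["Baker Street"], "second": ["Marylebone"],
                         "line": ["Bakerloo"]})
walks = pd.DataFrame({"first": ["Baker Street", "Bank"],
                      "second": ["Marylebone", "Monument"],
                      "Walking Time": [8, 3]})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only"
combined = on_lines.merge(walks, on=["first", "second"], how="outer", indicator=True)

# Rows that exist only in the walking times have no line attached
is_stray = combined["_merge"] == "right_only"
with_lines = combined[~is_stray].drop(columns="_merge")
strays = combined[is_stray].drop(columns="_merge")
print(with_lines)
print(strays)
```

Selecting on `_merge` does not depend on the order in which the merged rows come out.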

---

## Section B

The aim of this section is to write a function to provide directions from one station to another, where it may be possible to begin the journey with a walk.

### Part 1a

Write a function called `get_connections`. This function should take as input the combined DataFrame that you output from `combine_underground_data`. It should return a dictionary of eleven `ndarray` objects. These should correspond to the eleven tube lines. Each `ndarray` must be a square matrix of integers (any type of integer is fine) with the number of rows and columns given by the total number of stations. Entries in any of the `ndarray` objects are equal to 0 if the corresponding stations are not adjacent in that line, and 1 if they are adjacent. Station positions must be coherent with `station_names`. For instance, entry `connections[0][134, 10]` (as well as `[10, 134]`) will be set to 1 assuming `134` corresponds to Marylebone, and `10` corresponds to Baker Street (remember that Python indexing starts from 0, so those station numbers should not be `135` and `11`).
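
The shift from 1-based station ids to 0-based array positions can be sketched on a toy line. The five-station network and its pairs are hypothetical; the real data has 269 stations.

```python
import numpy as np

n_stations = 5  # toy network; the real data has 269 stations
# 1-based station id pairs for one hypothetical line
pairs = np.array([[1, 2], [2, 4], [4, 5]])

adjacency = np.zeros((n_stations, n_stations), dtype=int)
rows = pairs[:, 0] - 1  # shift ids to 0-based positions
cols = pairs[:, 1] - 1
adjacency[rows, cols] = 1
adjacency[cols, rows] = 1  # adjacency is symmetric

print(adjacency)
```

Station id 3 is on no pair, so its row and column stay all zero.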

In [38]:
def get_connections(combined_data):
    '''
    Takes in combined dataframe to put stations into a dictionary 
    of underground lines mapped to a 269x269 array of their respective connections
    '''
    connections = {}  # dictionary mapping each line name to its connection grid
    station_names = pd.read_csv("station_names.csv", names=["id", "name"], header=None)

    # The 11 underground lines, named as in the combined DataFrame
    UG_lines = ["Bakerloo", "Central", "Circle", "District", "Hammersmith_City", "Jubilee",
                "Metropolitan", "Northern", "Piccadilly", "Victoria", "Waterloo_City"]

    # Iterates through the underground lines one at a time
    for line in UG_lines:

        # Creates an all-zero NumPy array for the connections in the line
        connections[line] = np.zeros((len(station_names), len(station_names)), dtype=int)

        # Iterates through the rows of this line; iterrows yields (index, row) pairs
        for _, row in combined_data[combined_data["Underground Line"] == line].iterrows():

            # Gets the ID of the 1st station and the 2nd station
            station_id_1 = row['First Station ID']
            station_id_2 = row['Second Station ID']

            # Sets the connection between the two stations to 1
            # (IDs start at 1, array positions start at 0)
            connections[line][station_id_1 - 1][station_id_2 - 1] = 1
            connections[line][station_id_2 - 1][station_id_1 - 1] = 1

    return connections

    

In [40]:
# Test your code here
data_frames = load_underground_data(MY_PATH)
combined_data=combine_underground_data(data_frames[0], data_frames[1], data_frames[2])[0]
# combine_underground_data returns a tuple of two dataframes; [0] keeps only the rows that belong to a line
get_connections(combined_data)



{'Bakerloo': array([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]]),
 'Central': array([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]]),
 'Circle': array([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]]),
 'District': array([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]]),
 'Hammersmith_City': array([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0,

### Part 1b

Write a function called `get_walking_net`.  This function should take as input the combined DataFrame that you output from `combine_underground_data`, and should return a single NumPy `ndarray` object containing the walking times between stations. If the walking time between two stations is not contained in the `walking_times.csv` file, then the corresponding entry in the `ndarray` should be NaN. As in Part 1a, the station positions must be coherent with `station_names`.
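
A minimal sketch of a NaN-filled walking-time matrix on a toy network. The station ids and times are made up for illustration.

```python
import numpy as np

n_stations = 4
# (station id, station id, minutes); values are made up for illustration
walks = [(1, 2, 9.0), (2, 3, 6.0)]

walking_net = np.full((n_stations, n_stations), np.nan)  # NaN means "unknown"
for first, second, minutes in walks:
    # walking times are symmetric, so both entries are filled
    walking_net[first - 1, second - 1] = minutes
    walking_net[second - 1, first - 1] = minutes

# NaN never compares equal to anything, so test for it with np.isnan
known = ~np.isnan(walking_net)
print(walking_net)
print(known.sum())
```

The array must have a float dtype, since NaN cannot be stored in an integer array.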

In [41]:
def get_walking_net(combined_data_and_strays):
    '''
    Takes in the tuple returned by combine_underground_data and returns a square
    array of walking times between stations (NaN where no walking time is known)
    '''
    station_names = pd.read_csv("station_names.csv", names=["id", "name"], header=None)

    # Creates an array filled with NaN for the walking times between the stations
    walking_net = np.full((len(station_names), len(station_names)), np.nan)

    # Iterates through both dataframes: walking times along lines, then the stray walking times
    for frame in combined_data_and_strays:
        for _, row in frame.iterrows():  # iterrows yields (index, row) pairs
            walking_time = row['Walking Time (mins)']  # Gets the walking time between the two stations
            if pd.isna(walking_time):
                continue  # never overwrite a known walking time with NaN

            station_id_1 = int(row['First Station ID'])
            station_id_2 = int(row['Second Station ID'])

            # Stores the walking time in both directions
            walking_net[station_id_1 - 1][station_id_2 - 1] = walking_time
            walking_net[station_id_2 - 1][station_id_1 - 1] = walking_time

    return walking_net

In [43]:
# Test your code here
data_frames = load_underground_data(MY_PATH)
combined_data=combine_underground_data(data_frames[0], data_frames[1], data_frames[2])
get_walking_net(combined_data)
# get_walking_net() takes in both elements of the combined_data tuple




array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])

In [44]:
# LEAVE THIS CELL BLANK


### Part 2

Write a function to be called `single_change_adviser`. It takes as input two integers, `station_in` and `station_out`. It also takes as input a dictionary of `ndarrays` to be called `connections`, assumed to come from a call to `get_connections`, and a single `ndarray` of walking times, assumed to come from a call to `get_walking_net`. Each of `station_in`, `station_out` is assumed to correspond to a station position as in `station_names`. This function must return one of these three outputs:

* if there is a common Underground line that contains both `station_in` and `station_out`, return the *tuple* `(station_in, line, 0)`, where `line` is the common Underground line (as named in `connections`);
* if `station_out` is not in any common line with `station_in`, but `station_in` is walkable to a connecting station (let's call its number `station_change`) that *is* in a same line as `station_out`, then return the tuple `(station_change, line, walking_time)`, where `line` is the common line of `station_change` and `station_out`, and `walking_time` is the time it takes to walk from `station_in` to `station_change`;
* if none of the above applies, return the tuple `(None, None, None)`;

If more than one solution applies (say, `station_in` and `station_out` share more than one common line), just return one possible solution using any criteria of your choice.

**Hints and Reminders**

You can check whether a station is in a particular Underground line by checking whether it has any non-zero entries in the respective row/column of `connections`. You can use whatever *NumPy* commands you wish to make your life here as easy as possible. Examples of useful *NumPy* functions are given in particular in Labs 5-7.

A tuple is just a type of list that we can't change once we create it, and the notation uses round brackets instead of square brackets (e.g. `["hello", "world!"]` is a list while `("hello", "world!")` is a tuple).
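
The membership hint can be sketched as below. The two-line toy network and the helper names `is_on_line` and `lines_through` are hypothetical, and stations are given as 0-based positions.

```python
import numpy as np

# Toy connections dictionary with two hypothetical lines over 4 station positions
red = np.zeros((4, 4), dtype=int)
red[0, 1] = red[1, 0] = 1
blue = np.zeros((4, 4), dtype=int)
blue[2, 3] = blue[3, 2] = 1
connections = {"Red": red, "Blue": blue}


def is_on_line(connections, line, station):
    """True if `station` (0-based position) has at least one neighbour on `line`."""
    return bool(connections[line][station].any())


def lines_through(connections, station):
    """Names of all lines that contain `station`."""
    return [line for line in connections if is_on_line(connections, line, station)]


print(is_on_line(connections, "Red", 0))
print(lines_through(connections, 3))
```

Checking one row with `.any()` avoids scanning the whole matrix for every query.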

---

In [45]:

def get_IDs(connection_grids, line):
    '''
    returns all station IDs on line as a list (an ID may appear more than once)
    '''
    stations = []
    for row in range(269):
        for col in range(269):
            if connection_grids[line][row][col] == 1:
                stations.append(row + 1)
                stations.append(col + 1)
    return stations


def single_change_adviser(station_in, station_out, connections, walking_net):
    '''
    returns first available route between station_in and station_out
    route is either direct, interchange, or walking change
    '''
    # checks for a common line for station_in and station_out
    for line in connections:
        direct_stations = get_IDs(connections, line)
        if station_in in direct_stations and station_out in direct_stations:
            # DIRECT ROUTE
            return (station_in, line, 0)

    # No common line: looks for a station on a line through station_out that is either
    # shared with, or within walking distance of, a station on a line through station_in.
    # This needs its own loop; otherwise a route with one change could be returned
    # before a direct route on a later line is found

    for line in connections:
        direct_stations = get_IDs(connections, line)
        if station_in not in direct_stations:
            continue
        for second_line in connections:
            if line == second_line:
                # comparing a line with itself would trivially match every station, so skip it
                continue
            change_stations = get_IDs(connections, second_line)
            if station_out not in change_stations:
                continue
            for potential_change in direct_stations:  # loop through all stations in line
                for potential_change_other_line in change_stations:  # loop through all stations in second_line
                    if potential_change == potential_change_other_line:
                        # the two lines share this station
                        # INTERCHANGE ROUTE
                        return (potential_change, second_line, 0)
                    walking_time = walking_net[potential_change - 1][potential_change_other_line - 1]
                    if not math.isnan(walking_time):
                        # the two lines have stations that share a walking time
                        # WALKING CHANGE ROUTE
                        return (potential_change_other_line, second_line, walking_time)

    return (None, None, None)

In [46]:
# Test your code here
station_in=134
station_out=106
connections=get_connections(combined_data[0])
walking_net=get_walking_net(combined_data)
single_change_adviser(station_in, station_out, connections, walking_net)

(85, 'Piccadilly', 14.0)

In [47]:
# LEAVE THIS CELL BLANK


### Part 3

Write an explanation of how your code for Section B works. Your answer should be 4-6 sentences long.

To create our connections map, we initialise an empty dictionary whose keys are the Underground line names and whose values are the corresponding 269x269 NumPy `ndarray`s. To create our walking net, we initialise a 269x269 NumPy `ndarray` of NaN values. Both are populated by extracting the appropriate values from our combined DataFrames. 

We created a helper function called `get_IDs` that takes in the connections map and a line name, and returns a list of all station IDs on that line by looping through that line's 269x269 `ndarray`. 

`single_change_adviser` takes in the stations in and out, the connections map, and the walking net, and returns the first available route it finds (either direct, interchange, or walking change). It does so by calling `get_IDs` on each line and checking whether the stations in and out lie on the same line, or on different lines that share a station or have stations within walking distance of each other.
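
For comparison, the three cases listed in the task statement can be sketched compactly with NumPy. This is not the implementation described above: it is a simplified illustration that uses 0-based station positions, a hypothetical function name `advise`, and a made-up toy network.

```python
import numpy as np


def advise(station_in, station_out, connections, walking_net):
    """Sketch of the three cases from the task, using 0-based station positions."""
    for line, grid in connections.items():
        on_line = grid.any(axis=1)  # True for every station on this line
        if on_line[station_in] and on_line[station_out]:
            return (station_in, line, 0)  # case 1: common line

    # stations with a known walking time from station_in
    walkable = np.flatnonzero(~np.isnan(walking_net[station_in]))
    for line, grid in connections.items():
        on_line = grid.any(axis=1)
        if not on_line[station_out]:
            continue
        for station_change in walkable:
            if on_line[station_change]:
                # case 2: walk to station_change, then ride to station_out
                minutes = float(walking_net[station_in, station_change])
                return (int(station_change), line, minutes)

    return (None, None, None)  # case 3: nothing applies


# Toy network: Red joins positions 0-1, Blue joins 2-3, and 0 is a 5-minute walk from 2
red = np.zeros((4, 4), dtype=int)
red[0, 1] = red[1, 0] = 1
blue = np.zeros((4, 4), dtype=int)
blue[2, 3] = blue[3, 2] = 1
toy_connections = {"Red": red, "Blue": blue}
toy_walks = np.full((4, 4), np.nan)
toy_walks[0, 2] = toy_walks[2, 0] = 5.0

print(advise(0, 1, toy_connections, toy_walks))
print(advise(0, 3, toy_connections, toy_walks))
print(advise(1, 3, toy_connections, toy_walks))
```

Unlike the notebook's function, this sketch only considers walks that start at `station_in`, as the second case in the task statement describes.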

---

## Section C: OOP

*In which you will discuss how OOP can be applied to your previous solution.*

We will now apply an object-oriented programming (OOP) perspective to the code written in Sections A and B. Please keep everything simple: much of the following consists of reusing code from the previous Sections. **The less you add to it, the better.**


### Part 1 
Create a class `Underground` that will encapsulate data loading and station change advice as originally implemented in Sections A and B. It should provide access to its internal data using *Python properties* only (as in Exercise 3 of Lab 8). Write the code necessary for that in any way you find sensible.

**Reminder**

Several other examples of Python properties are given in the solutions to the exercises of Lab 8; Exercise 3 is not the only one that uses them.
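
A minimal sketch of read-only access through Python properties. The class `UndergroundSketch` is a hypothetical stand-in that receives already-built data instead of loading the CSV files, and it may differ from the style used in Lab 8.

```python
class UndergroundSketch:
    """Toy stand-in: holds already-built data instead of loading CSV files."""

    def __init__(self, connections, walking_net):
        # the leading underscore marks these attributes as internal
        self._connections = connections
        self._walking_net = walking_net

    @property
    def connections(self):
        """Read-only access to the per-line connection grids."""
        return self._connections

    @property
    def walking_net(self):
        """Read-only access to the walking times."""
        return self._walking_net


ug_sketch = UndergroundSketch({"Red": [[0, 1], [1, 0]]}, [[None, 4], [4, None]])
print(ug_sketch.connections["Red"][0][1])
```

Because no setter is defined, an assignment such as `ug_sketch.connections = {}` raises an `AttributeError`.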


In [48]:
class Underground:
    def __init__(self, path_name):
        '''
        creates Underground object using path_name to Underground csv files
        '''

        # gather info
        data_frames_ = load_underground_data(path_name)
        combined_data_ = combine_underground_data(data_frames_[0], data_frames_[1], data_frames_[2])
        self.connection_grids = get_connections(combined_data_[0])
        self.walking_net = get_walking_net(combined_data_)

        # conversions
        for line in self.connection_grids.keys():
            self.connection_grids[line] = self.connection_grids[line].tolist()
        self.walking_net = self.walking_net.tolist()

    def _get_IDs(self, connection_grids, line):
        '''
        returns all station IDs on line as a 1-dimensional array
        '''
        stations = []
        for row in range(269):
            for col in range(269):
                if connection_grids[line][row][col] == 1:
                    stations.append(row + 1)
                    stations.append(col + 1)
        return stations

    def single_change_adviser(self, station_in_ID, station_out_ID):
        '''
        returns first available route between station_in_ID and station_out_ID
        route is either direct, interchange, or walking change
        '''
        # checks for common line for the station_in and station_out

        for line in self.connection_grids.keys():

            direct_stations = self._get_IDs(self.connection_grids, line)
            if direct_stations.__contains__(station_in_ID) and direct_stations.__contains__(station_out_ID):
                # DIRECT ROUTE
                return (station_in_ID, line, 0)

            # Checks if there are no common lines, but checks if there is a station that is the same line as
            # station_out and that is in a walkable distance to station_in

        for line in self.connection_grids.keys():

            # redo the direct_stations
            # this is necessary in practice because without separating the for loops, it is possible
            # a route with one change is given instead of a direct route

            direct_stations = self._get_IDs(self.connection_grids, line)
            if not direct_stations.__contains__(station_in_ID):
                continue

            for second_line in self.connection_grids.keys():
                if line == second_line:
                    # it is possible when loop through the lines twice we are checking the same line
                    # in that case, we will of course find similarities in the lines stations since they are all equal
                    # this case is useless, so we skip over it
                    continue

                change_stations = self._get_IDs(self.connection_grids, second_line)
                if not change_stations.__contains__(station_out_ID):
                    continue

                for potential_change in direct_stations:  # loop through all stations in line

                    for potential_change_other_line in change_stations:  # loop through all stations in line_

                        if potential_change == potential_change_other_line and potential_change_other_line != station_in_ID:
                            # we found a match in stations between different lines... not sure if i need this ^^ part
                            # INTERCHANGE ROUTE
                            return (potential_change, second_line, 0)

                        if not math.isnan(self.walking_net[potential_change - 1][potential_change_other_line - 1]) \
                                and direct_stations.__contains__(potential_change) \
                                and change_stations.__contains__(potential_change_other_line):
                            # we have two lines with two stations that share a walking time
                            # WALKING CHANGE ROUTE
                            return (potential_change_other_line, second_line,
                                    self.walking_net[potential_change - 1][potential_change_other_line - 1])

        return (None, None, None)


In [49]:
# Test your code here

ug = Underground(MY_PATH)
print(ug.single_change_adviser(134, 106))
# YOUR CODE HERE

(85, 'Piccadilly', 14.0)




In [50]:
# LEAVE THIS CELL BLANK



### Part 2

Write an alternative version of `Underground` where the  internal implementation of `connections` is done similarly to the implementation of `adjacencies` of the `Network` class of Lab 9 (the main point being implementing the adjacencies as dictionaries instead of `ndarrays`). You must keep the public method names and arguments as presented in the implementation of Part 1. As in the `Network` example of Lab 9, you must create a new class representing stations, and another class representing links between physically adjacent Underground stations. Keep those classes as simple as possible. There is no need to create new methods to add individual vertices and edges, all vertices and edges can be added when loading the data. In fact, there is no need for any new public methods in the modified `Underground` class at all.
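
One way to picture dictionary-based adjacencies is sketched below. The `Network` class of Lab 9 is not shown in this notebook, so the class layout here (including the name `Link` and its attributes) is an assumption; the two Bakerloo pairs are taken from the combined data shown in Section A.

```python
class Station:
    """A vertex: one Underground station, identified by its id."""

    def __init__(self, station_id):
        self._id = station_id

    @property
    def id(self):
        return self._id


class Link:
    """An edge: two physically adjacent stations on a named line."""

    def __init__(self, station_1, station_2, line):
        self._ends = (station_1, station_2)
        self._line = line

    @property
    def ends(self):
        return self._ends

    @property
    def line(self):
        return self._line


stations = {i: Station(i) for i in (11, 135, 175)}
# each station maps to the list of links that touch it
adjacencies = {station: [] for station in stations.values()}

for first, second, line in [(11, 135, "Bakerloo"), (11, 175, "Bakerloo")]:
    link = Link(stations[first], stations[second], line)
    adjacencies[stations[first]].append(link)   # store the link at both ends
    adjacencies[stations[second]].append(link)

lines_at_175 = {link.line for link in adjacencies[stations[175]]}
print(len(adjacencies[stations[11]]), lines_at_175)
```

With this layout, checking whether a station is on a line means scanning its own short list of links rather than a full row of a 269x269 matrix.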


In [51]:
class Station:  # a Station is a vertex of the network
    def __init__(self, station_id):
        '''
        Creates a station (vertex) identified by its station id
        '''
        self._id = station_id


class Line:  # a Line labels an edge of the network
    def __init__(self, name):
        '''
        Creates a line (edge label) identified by its name
        '''
        self.name = name

class Network:
    def __init__(self):
        '''
        Creates an empty set of stations and an empty dictionary that maps
        each station to the list of its adjacent stations
        '''
        self._stations = set()
        self._adjacencies = dict()

    def add_stations(self, station):
        '''
        Adds a station to the set of stations
        and gives it an empty adjacency list in the dictionary
        '''
        self._stations.add(station)
        self._adjacencies[station] = []

    def add_edge(self, station_1: Station, station_2: Station, edge: Line):
        '''
        Records station_2 as adjacent to station_1 on the line given by edge
        (one direction only: call again with the stations swapped for the reverse link)
        '''
        self._adjacencies[station_1].append((station_2, edge))

class Underground:  # load data

    def __init__(self, path_name):
        '''
        creates Underground object using path_name to Underground csv files
        '''

        # gather info
        data_frames_ = load_underground_data(path_name)
        combined_data_ = combine_underground_data(data_frames_[0], data_frames_[1], data_frames_[2])
        # keep a reference so that get_combined_data can return it later
        self.combined_data_ = combined_data_
        self.connection_grids = get_connections(combined_data_[0])
        self.walking_net = get_walking_net(combined_data_)

        net = Network()

        # one Station object per station id (ids 1 to 269)
        stations = [Station(i + 1) for i in range(269)]

        for station in stations:
            net.add_stations(station)

        self.tube_lines = ["Bakerloo", "Central", "Circle", "District", "Hammersmith_City", "Jubilee",
                           "Metropolitan", "Northern", "Piccadilly", "Victoria", "Waterloo_City"]

        # station ids served by each line
        self.UG_stations = {}
        for line in self.tube_lines:
            self.UG_stations[line] = []

        # turn every connection in the per-line grids into edges of the network
        for line in self.connection_grids.keys():
            for row in range(269):
                for col in range(269):
                    if self.connection_grids[line][row][col] == 1:
                        net.add_edge(stations[row], stations[col], Line(line))
                        net.add_edge(stations[col], stations[row], Line(line))

                        self.UG_stations[line].append(row+1)
                        self.UG_stations[line].append(col+1)


        # replace the per-line grids with the dictionary-based network
        self.connection_grids = net

    def get_combined_data(self):
        '''
        Gets the combined data stored when the object was created
        '''
        return self.combined_data_

    def single_change_adviser(self, station_in_ID, station_out_ID):
        '''
        returns the first available route between station_in and station_out
        the route is either direct, an interchange, or a walking change
        '''
        # first pass: look for a line that serves both station_in and station_out

        for line in self.tube_lines:

            direct_stations = self.UG_stations[line]
            if station_in_ID in direct_stations and station_out_ID in direct_stations:
                # DIRECT ROUTE
                return (station_in_ID, line, 0)

        # second pass: there is no common line, so look for a station on a line serving
        # station_out that is shared with, or within walking distance of a station on,
        # a line serving station_in

        for line in self.tube_lines:

            # redo the direct_stations
            # a separate loop is needed: if both checks shared one loop, a route with one
            # change could be returned even though a direct route exists

            direct_stations = self.UG_stations[line]
            if station_in_ID not in direct_stations:
                continue

            for second_line in self.tube_lines:
                if line == second_line:
                    # comparing a line with itself would trivially match every station,
                    # so this case is skipped
                    continue

                change_stations = self.UG_stations[second_line]
                if station_out_ID not in change_stations:
                    continue

                for potential_change in direct_stations:  # loop through all stations in line

                    for potential_change_other_line in change_stations:  # loop through all stations in line_

                        if potential_change == potential_change_other_line and potential_change_other_line != station_in_ID:
                            # the same station is on both lines and is not the starting station
                            # INTERCHANGE ROUTE
                            return (potential_change, second_line, 0)

                        walking_time = self.walking_net[potential_change - 1][potential_change_other_line - 1]
                        if not math.isnan(walking_time):
                            # the two lines have stations linked by a recorded walking time
                            # (both stations come from the loops above, so no membership check is needed)
                            # WALKING CHANGE ROUTE
                            return (potential_change_other_line, second_line, walking_time)

        return (None, None, None)


        

In [52]:
# Test your code here
ug_test = Underground(MY_PATH)
print(ug_test.single_change_adviser(134, 106))




(85, 'Piccadilly', 14.0)
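
The dictionary of adjacencies can also be queried directly, for example to find which lines serve a station. The following is only a minimal sketch of that idea and not the class used above: `build_adjacencies`, `lines_serving`, the `station_id` attribute, station id 999 and the line "Toy" are hypothetical names introduced for illustration. Only the Bakerloo pair 11, 135 comes from the background information.

```python
from collections import defaultdict


class Station:
    """Vertex: an Underground station identified by its id."""

    def __init__(self, station_id):
        self.station_id = station_id


class Line:
    """Edge label: the Underground line linking two adjacent stations."""

    def __init__(self, name):
        self.name = name


def build_adjacencies(pairs_by_line, stations):
    """Return {station: [(neighbour, line), ...]} from per-line pairs of ids."""
    adjacencies = defaultdict(list)
    for line_name, pairs in pairs_by_line.items():
        line = Line(line_name)
        for id_a, id_b in pairs:
            a, b = stations[id_a], stations[id_b]
            # store the link in both directions, since travel is two-way
            adjacencies[a].append((b, line))
            adjacencies[b].append((a, line))
    return adjacencies


def lines_serving(adjacencies, station):
    """Names of the lines that have at least one link at `station`."""
    return {line.name for _, line in adjacencies[station]}


# Hypothetical toy data: station 999 and the line "Toy" are made up.
stations = {i: Station(i) for i in (11, 135, 999)}
pairs_by_line = {"Bakerloo": [(11, 135)], "Toy": [(11, 999)]}
adjacencies = build_adjacencies(pairs_by_line, stations)
print(sorted(lines_serving(adjacencies, stations[11])))
```

With a lookup like this, the per-line station lists could be derived from the adjacency dictionary itself rather than stored separately, although that is a design choice and not something the assignment requires.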


In [53]:
# LEAVE THIS CELL BLANK


### Part 3

Write a short discussion (4-6 sentences) about how you think OOP is useful in the two cases above, explaining how you made your choices on how to organise your code. *Without writing any code*, explain a very simple example on how you could extend either the `Underground` or the corresponding vertex/edge classes to add functionality to your code. 

OOP is useful in the first case because it encapsulates the data (such as the walking times and connections) together with the methods that are specific to the `Underground` class. It is useful in the second case because it allowed us to add new classes and methods, such as station and line objects, to build an adjacency list of the connections between the stations. We could extend the `Underground` class by adding a method that implements Dijkstra's algorithm to calculate the shortest route between two stations.

### Part 4

Write a simple unit test function to test the method `single_change_adviser`. Your test function should implement at least three test cases. This function should work for both implementations of the `Underground` class in Part 1 and Part 2, behaving exactly the same regardless of which version is the one being used.

---

In [33]:
def test_single_change_adviser():
    '''
    runs pre-generated test cases of single_change_adviser
    three scenarios are run: one direct route and two routes with a walking change
    a failed case raises an AssertionError instead of passing silently
    '''
    ug_test = Underground(MY_PATH)
    # Test case 1 - Direct route
    # Liverpool Street-128 and Bond Street-23
    assert ug_test.single_change_adviser(128, 23) == (128, "Central", 0)
    print("Test 1 works - Take the Central line straight from Liverpool Street to Bond Street")

    # Test case 2 - Route with 1 change (walking)
    # Marble Arch-134, Green Park-85 and Hounslow Central-106
    assert ug_test.single_change_adviser(134, 106) == (85, "Piccadilly", 14.0)
    print("Test 2 works - Walk 14 minutes from Marble Arch to Green Park and take the Piccadilly line to Hounslow Central")

    # Test case 3 - Walking change route
    # Heathrow 123-256, Finsbury Park-75, Arsenal-10 and Seven Sisters-186
    assert ug_test.single_change_adviser(256, 186) == (75, "Victoria", 10.0)
    print("Test 3 works - Take the Piccadilly line from Heathrow 123 to Arsenal, then walk 10 minutes to Finsbury Park and then take the Victoria line to Seven Sisters")

In [34]:
# Test your code here
test_single_change_adviser()


Test 1 works - Take the Central line straight from Liverpool Street to Bond Street
Test 2 works - Walk 14 minutes from Marble Arch to Green Park and take the Piccadilly line to Hounslow Central
Test 3 works - Take the Piccadilly line from Heathrow 123 to Arsenal, then walk 10 minutes to Finsbury Park and then take the Victoria line to Seven Sisters
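
One way to make it explicit that the same cases apply to both implementations is to pass the class under test as an argument. The following is only a sketch under that assumption: `check_single_change_adviser` and `StubUnderground` are hypothetical names, and the stub merely returns the results shown in the outputs above so that the example runs without the data files.

```python
class StubUnderground:
    """Hypothetical stand-in that mimics the public interface only."""

    def __init__(self, path_name):
        self.path_name = path_name

    def single_change_adviser(self, station_in_ID, station_out_ID):
        # canned answers copied from the outputs shown above
        canned = {
            (128, 23): (128, "Central", 0),
            (134, 106): (85, "Piccadilly", 14.0),
            (256, 186): (75, "Victoria", 10.0),
        }
        return canned.get((station_in_ID, station_out_ID), (None, None, None))


def check_single_change_adviser(underground_cls, path_name):
    """Run the same cases against any class exposing single_change_adviser."""
    ug = underground_cls(path_name)
    cases = [
        ((128, 23), (128, "Central", 0)),
        ((134, 106), (85, "Piccadilly", 14.0)),
        ((256, 186), (75, "Victoria", 10.0)),
    ]
    for args, expected in cases:
        result = ug.single_change_adviser(*args)
        assert result == expected, f"{args}: expected {expected}, got {result}"
    # number of cases that passed
    return len(cases)


print(check_single_change_adviser(StubUnderground, "unused/path"))
```

In the notebook, the Part 1 or Part 2 `Underground` class would be passed in place of the stub, together with `MY_PATH`.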
