# Binary Search Tree - Assignment

In this assignment, we will work on HDB property information downloaded from [data.gov.sg](https://data.gov.sg/dataset/hdb-property-information).

**HDB Property Information**

HDB property information contains the location of existing HDB blocks, highest floor level, year of completion, type of building and number of HDB flats (breakdown by flat type) per block etc.

<img src="images/hdb_info.png" width=700 />

The CSV file `hdb-property-information.csv` in the `data` folder contains 12267 records.
* We are only interested in residential HDB blocks. 
* We will build a **Binary Search Tree** to store such data for quick lookup.

## 1. Load CSV Data

We will implement an `HdbInfo` class to represent a residential record, then load all residential records from the `hdb-property-information.csv` file in the `data` folder.

### Define Class `HdbInfo`

Define a class `HdbInfo` which has the following attributes:
* `blk_no`: data from `blk_no` column
* `street`: data from `street` column
* `year_completed`: data from `year_completed` column
* `max_floor`: data from `max_floor_lvl` column
* `total_units`: data from `total_dwelling_units` column

1) Implement its `__init__()` method, which takes the parameters `blk_no` and `street`.
* Initialize its other attributes to `0`.

2) Implement its `__str__()` and `__repr__()` methods, which return a string in the following format.

```
<blk_no> <street>: completed in <year_completed>, <max_floor> floors, <total_units> units
```
* For example, an instance with data `10A,BOON TIONG RD,40,2014,Y,N,N,Y,N,N,BM,228,0,0,76,152,0,0,0,0,0,0,0,0` gives the following printout:

```
10A BOON TIONG RD: completed in 2014, 40 floors, 228 units
```

3) Implement the method `_get_address()`, which returns the address of the HDB block in the format `"<blk_no> <street>"`.
* For example, return `"10A BOON TIONG RD"` for the record above.

4) Implement its `__lt__()` method such that `hdb1 < hdb2` returns `True` if `hdb1._get_address()` sorts alphabetically before `hdb2._get_address()`.

5) Implement its `__eq__()` method such that `hdb1 == hdb2` returns `True` if `_get_address()` returns the same string for both `hdb1` and `hdb2`.

In [1]:
class HdbInfo:
    
    def __init__(self, blk_no, street):
        self.blk_no = blk_no
        self.street = street
        self.year_completed = 0
        self.max_floor = 0
        self.total_units = 0
    
    def __str__(self):
        
        return '{} {}: completed in {}, {} floors, {} units'.format(
            self.blk_no, self.street, self.year_completed, self.max_floor, self.total_units)

    def __repr__(self):
        return self.__str__()
    
    def _get_address(self):
        return '{} {}'.format(self.blk_no, self.street)
    
    def __lt__(self, other):
        return self._get_address() < other._get_address()
    
    def __eq__(self, other):
        return self._get_address() == other._get_address()
    

<u>Test 1</u>

Sample output:
```
10A BOON TIONG RD: completed in 2014, 40 floors, 228 units
10A BOON TIONG RD: completed in 2014, 40 floors, 228 units
```

In [2]:
h = HdbInfo('10A', 'BOON TIONG RD')
h.year_completed = 2014
h.max_floor = 40
h.total_units = 228
print(str(h))
print(repr(h))

10A BOON TIONG RD: completed in 2014, 40 floors, 228 units
10A BOON TIONG RD: completed in 2014, 40 floors, 228 units


<u>Test 2</u>

Sample Output:
```
True
False
False
True
```

In [3]:
h1 = HdbInfo('10A', 'BOON TIONG RD')
h2 = HdbInfo('9A', 'BOON TIONG RD')
h3 = HdbInfo('9A', 'BOON TIONG RD')

print(h1 < h2)   # __lt__()
print(h1 > h3)   # no __gt__() defined, so Python falls back to h3.__lt__(h1)
print(h1 == h2)
print(h2 == h3)

True
False
False
True
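
The addresses are compared as strings, so the ordering is lexicographic, character by character. That is why `10A BOON TIONG RD` sorts before `9A BOON TIONG RD` in the test above. The following is a minimal sketch of that behaviour, assuming a stand-in class `Block` (a hypothetical name, not part of the assignment). It also shows that `functools.total_ordering` can derive `<=`, `>` and `>=` from `__lt__()` and `__eq__()`, which is optional here.

```python
from functools import total_ordering


@total_ordering
class Block:
    """Hypothetical stand-in for HdbInfo, compared by its address string."""

    def __init__(self, blk_no, street):
        self.blk_no = blk_no
        self.street = street

    def address(self):
        return '{} {}'.format(self.blk_no, self.street)

    def __lt__(self, other):
        return self.address() < other.address()

    def __eq__(self, other):
        return self.address() == other.address()


blocks = [
    Block('9A', 'BOON TIONG RD'),
    Block('10A', 'BOON TIONG RD'),
    Block('2', 'BEACH RD'),
]

# sorted() only needs __lt__(); the result is in string order, not numeric order.
ordered = [b.address() for b in sorted(blocks)]
print(ordered)
```

Because the comparison is on strings, block `10A` comes before block `2`, which in turn comes before block `9A`. This is fine for a lookup tree, since the only requirement is that the ordering is consistent.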


### Load CSV Data

Write a function `load_hdb_info()` with a parameter `csv_path` pointing to the CSV file.
* It skips all non-residential records.
* It converts each remaining row into an `HdbInfo` instance.
* It returns the list of `HdbInfo` instances created from the data in the CSV file.

You can use the following column index values.
```
BLK_NO = 0
STREET = 1
MAX_FLOOR = 2
YEAR_COMPLETED = 3
IS_RESIDENTIAL = 4
TOTAL_UNITS = 11
```

In [6]:
import csv

BLK_NO = 0
STREET = 1
MAX_FLOOR = 2
YEAR_COMPLETED = 3
IS_RESIDENTIAL = 4
TOTAL_UNITS = 11

def load_hdb_info(csv_path):
    result = []
    
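    # newline='' is what the csv module documentation recommends for file objects
    with open(csv_path, newline='') as f:
        reader = csv.reader(f)
        next(reader)    # skip the header row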
        
        for row in reader:
            if row[IS_RESIDENTIAL] != 'Y':
                continue
            h = HdbInfo(row[BLK_NO], row[STREET])
            h.year_completed = int(row[YEAR_COMPLETED])
            h.max_floor = int(row[MAX_FLOOR])
            h.total_units = int(row[TOTAL_UNITS])
            result.append(h)
    
    return result

<u>Test</u>

Sample output:
```
10047
1 BEACH RD: completed in 1970, 16 floors, 142 units
```

In [8]:
hdb_list = load_hdb_info('data/hdb-property-information.csv')
print(len(hdb_list))
print(hdb_list[0])

10047
1 BEACH RD: completed in 1970, 16 floors, 142 units
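
The loader above picks columns by position. As an alternative, the same filtering can be sketched with `csv.DictReader`, which picks columns by header name. This is only an illustration under assumptions: the column name `residential` is a guess (the other names come from the attribute list earlier), the sample is a reduced in-memory CSV rather than the real file, and its third row is invented to show the filter.

```python
import csv
import io

# Reduced sample, not the real file. The `residential` column name and the
# third (non-residential) row are assumptions made for this sketch.
SAMPLE = """blk_no,street,max_floor_lvl,year_completed,residential,total_dwelling_units
1,BEACH RD,16,1970,Y,142
10A,BOON TIONG RD,40,2014,Y,228
99,EXAMPLE ST,2,2000,N,0
"""


def load_rows(f):
    """Return (address, year, floors, units) tuples for residential rows."""
    rows = []
    for row in csv.DictReader(f):
        if row['residential'] != 'Y':
            continue    # skip non-residential blocks
        rows.append((
            '{} {}'.format(row['blk_no'], row['street']),
            int(row['year_completed']),
            int(row['max_floor_lvl']),
            int(row['total_dwelling_units']),
        ))
    return rows


records = load_rows(io.StringIO(SAMPLE))
print(len(records))
print(records[0])
```

Reading by name avoids maintaining the index constants, at the cost of depending on the exact header text in the file.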


## 2. Binary Search Tree

We will implement a binary search tree to store the list of `HdbInfo` instances. Using the tree, users can look up records quickly.

### Binary Search Tree

#### Class Node

Implement a `Node` class to represent a node in the tree.
* It has `left` and `right` attributes pointing to its left and right child respectively.
* Its `data` attribute stores the `HdbInfo` instance.
* Its `__str__()` method returns the string representation of the object in `data`.

In [9]:
class Node:
    
    def __init__(self, data=None, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right
    
    def __str__(self):
        return str(self.data)

    def __repr__(self):
        return self.__str__()

#### Class BinarySearchTree

Implement a `BinarySearchTree` class which has the following features:
* Its `__init__()` method initializes its `root` attribute with the input parameter `root`, which has a default value of `None`.
* Implement an `add()` method for adding an `HdbInfo` instance to the tree.
* Implement a `preorder()` method which traverses the tree in **preorder**. Instead of printing the visited nodes, it returns the total number of nodes in the tree.

In [10]:
class BinarySearchTree:
    
    def __init__(self, root=None):
        self.root = root
    
    def preorder(self):
        return self._preorder(self.root)

    def _preorder(self, node=None):
        if node is None:
            return 0
        else:
            count = 1    # visit the node itself first (preorder)
            count = count + self._preorder(node.left)
            count = count + self._preorder(node.right)
            return count
    
    # Alternative version of preorder which passes the running count down
    def preorder2(self):
        return self._preorder2(self.root)

    def _preorder2(self, node=None, count=0):
        if node is None:
            return count
        else:
            count = count + 1    # visit the node itself first (preorder)
            count = self._preorder2(node.left, count)
            count = self._preorder2(node.right, count)
            return count
    
    def add(self, val):
        if self.root is None:
            self.root = Node(val)
        else:
            self._add(self.root, val)
    
    def _add(self, node, val):
#         print('Visiting', node.data)
        if node is None:    # for precaution
            return
        if val < node.data:
            if node.left is None:
                node.left = Node(val)
            else:
                self._add(node.left, val)
        if val > node.data:
            if node.right is None:
                node.right = Node(val)
            else:
                self._add(node.right, val)
        if val == node.data:
            return
        

<u>Test</u>

Sample Output: `10047`

In [27]:
tree = BinarySearchTree()

for hdb in hdb_list:
    tree.add(hdb)

print(tree.preorder())
print(tree.preorder2())

10047
10047
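
Counting nodes gives the same total whichever order the nodes are visited in, so it does not show what makes a traversal preorder. The sketch below, which uses small integer keys and hypothetical helper names (`IntNode`, `insert`, `preorder_keys`, `inorder_keys`) rather than the assignment classes, collects the keys in visit order so that the difference between preorder and inorder is visible.

```python
class IntNode:
    """Hypothetical node holding an integer key."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(node, key):
    """Insert key into the subtree rooted at node and return the subtree root."""
    if node is None:
        return IntNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node    # duplicates are ignored


def preorder_keys(node, out):
    """Visit the node first, then its left subtree, then its right subtree."""
    if node is not None:
        out.append(node.key)
        preorder_keys(node.left, out)
        preorder_keys(node.right, out)
    return out


def inorder_keys(node, out):
    """Visit the left subtree first, then the node, then the right subtree."""
    if node is not None:
        inorder_keys(node.left, out)
        out.append(node.key)
        inorder_keys(node.right, out)
    return out


root = None
for key in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, key)

pre = preorder_keys(root, [])
ino = inorder_keys(root, [])
print(pre)    # [50, 30, 20, 40, 70, 60, 80]
print(ino)    # [20, 30, 40, 50, 60, 70, 80]
```

Preorder lists each parent before its children, while inorder on a binary search tree lists the keys in sorted order.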

#### Class BinarySearchTree2

Implement a class `BinarySearchTree2` which extends `BinarySearchTree`.
* Add a `find()` method which finds a record by `blk_no` and `street`. It returns the matching `HdbInfo` instance, or `None` if there is no match.

In [19]:
class BinarySearchTree2(BinarySearchTree):
    
    def find(self, blk_no, street):
        if self.root is None:
            return None
        else:
            return self._find(self.root, HdbInfo(blk_no, street))
        
    def _find(self, node, val):
#         print('Visiting', node.data)
        if node is None:
            return None
        if val == node.data:
            return node.data
        if val < node.data:
            return self._find(node.left, val)
        if val > node.data:
            return self._find(node.right, val)

<u>Test</u>: Create an instance of `BinarySearchTree2` and populate the tree with records.

Sample Output: `10047`

In [20]:
tree2 = BinarySearchTree2()

for hdb in hdb_list:
    tree2.add(hdb)

tree2.preorder()

10047

<u>Test</u>: Use the tree to find 2 records.

In [21]:
r1 = tree2.find('10A', 'BOON TIONG RD')
print(r1)

r2 = tree2.find('999B', 'BUANGKOK CRES')
print(r2)

10A BOON TIONG RD: completed in 2014, 40 floors, 228 units
999B BUANGKOK CRES: completed in 2018, 17 floors, 126 units
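
The recursive `_find()` follows a single path from the root downwards, so the same lookup can also be written as a loop. The following is a minimal sketch of that idea, assuming plain address strings as keys and hypothetical names (`AddrNode`, `bst_insert`, `bst_find`). It also shows the result for an address that is not in the tree.

```python
class AddrNode:
    """Hypothetical node keyed by an address string."""

    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def bst_insert(root, data):
    """Insert data iteratively and return the (possibly new) root."""
    if root is None:
        return AddrNode(data)
    node = root
    while True:
        if data == node.data:
            return root    # ignore duplicates
        if data < node.data:
            if node.left is None:
                node.left = AddrNode(data)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = AddrNode(data)
                return root
            node = node.right


def bst_find(root, target):
    """Walk down from the root; return the stored data or None if absent."""
    node = root
    while node is not None:
        if target == node.data:
            return node.data
        node = node.left if target < node.data else node.right
    return None


root = None
for address in ['10A BOON TIONG RD', '1 BEACH RD', '999B BUANGKOK CRES']:
    root = bst_insert(root, address)

found = bst_find(root, '999B BUANGKOK CRES')
missing = bst_find(root, '5 NO SUCH ST')
print(found)
print(missing)
```

A loop avoids Python's recursion limit, which can matter when a tree grows very deep.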


## 3. Comparing Performance

### Linear Search

Implement a function `linear_find_hdb()` which performs a linear search for an HDB block using `blk_no` and `street`.
* It takes 2 parameters, `blk_no` and `street`, and searches the `hdb_list` loaded earlier.

In [24]:
def linear_find_hdb(blk_no, street):
    
    target = HdbInfo(blk_no, street)
    
    for h in hdb_list:    # hdb_list is the module-level list loaded earlier
        if h == target:
            return h
    
    return None    # no matching record

<u>Test</u>: Performance of Linear Search

In [25]:
%%timeit
result = linear_find_hdb('999B', 'BUANGKOK CRES')

19.7 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


<u>Test</u>: Performance of Binary Search Tree

In [26]:
%%timeit
result = tree2.find('999B', 'BUANGKOK CRES')

3.07 ms ± 499 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
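
The cost of a lookup in a binary search tree is proportional to the height of the tree, and a plain (unbalanced) tree can become very tall if records are inserted in nearly sorted order. Whether that applies to this dataset is not established here. The sketch below only illustrates the effect with integer keys, using a hypothetical helper `build_height()` that inserts keys iteratively and reports the resulting height.

```python
import random


def build_height(keys):
    """Insert keys into an unbalanced BST and return its height in nodes."""
    root = None    # each node is a list: [key, left, right]
    height = 0
    for key in keys:
        if root is None:
            root = [key, None, None]
            height = 1
            continue
        node, depth = root, 1
        while True:
            if key == node[0]:
                break    # ignore duplicates
            idx = 1 if key < node[0] else 2
            if node[idx] is None:
                node[idx] = [key, None, None]
                height = max(height, depth + 1)
                break
            node = node[idx]
            depth += 1
    return height


n = 1000
sorted_height = build_height(range(n))

keys = list(range(n))
random.Random(0).shuffle(keys)    # fixed seed so the run is repeatable
shuffled_height = build_height(keys)

print(sorted_height, shuffled_height)
```

With sorted input every key goes to the right of the previous one, so the tree is effectively a linked list of height `n`. Inserting the same keys in shuffled order typically gives a much shallower tree, which is one way to check whether tree shape is affecting lookup time.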
