# Binary Search Tree - Assignment

In this assignment, we will work with HDB property information downloaded from [data.gov.sg](https://data.gov.sg/dataset/hdb-property-information).

**HDB Property Information**

HDB property information describes each existing HDB block: its location, highest floor level, year of completion, type of building, number of HDB flats (broken down by flat type), and other fields.

<img src="images/hdb_info.png" width=700 />

The CSV file `hdb-property-information.csv` in the `data` folder contains 12267 records.
* We are only interested in residential HDB blocks. 
* We will build a **Binary Search Tree** to store such data for quick lookup.

## 1. Load CSV Data

We will implement an `HdbInfo` class to represent a residential record, then load all residential records from the `hdb-property-information.csv` file in the `data` folder.

### Define Class `HdbInfo`

Define a class `HdbInfo` with the following attributes:
* `blk_no`: data from `blk_no` column
* `street`: data from `street` column
* `year_completed`: data from `year_completed` column
* `max_floor`: data from `max_floor_lvl` column
* `total_units`: data from `total_dwelling_units` column

Implement its `__init__()` method, which takes the parameters `blk_no` and `street`.
* Initialize its other attributes to `0`.

Implement its `__str__()` and `__repr__()` methods, which both return a string in the following format.

```
<blk_no> <street>: completed in <year_completed>, <max_floor> floors, <total_units> units
```
* For example, an instance built from the data `10A,BOON TIONG RD,40,2014,Y,N,N,Y,N,N,BM,228,0,0,76,152,0,0,0,0,0,0,0,0` gives the following printout:

```
10A BOON TIONG RD: completed in 2014, 40 floors, 228 units
```

Implement the method `_get_address()`, which returns the address of the HDB block in the format `"<blk_no> <street>"`.
* For example, return `"10A BOON TIONG RD"` from above record.

Implement its `__lt__()` method such that `hdb1 < hdb2` returns `True` if the string representation of `hdb1` sorts before that of `hdb2` in plain string comparison order (so `'10A ...'` sorts before `'9A ...'`).

Implement its `__eq__()` method such that `hdb1 == hdb2` returns `True` if `_get_address()` returns the same string for both `hdb1` and `hdb2`.
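The requirements above could be sketched as follows. This is one possible reading of the specification, not a reference solution; in particular, aliasing `__repr__` to `__str__` and returning `NotImplemented` for foreign types are choices made for this sketch.

```python
class HdbInfo:
    """One residential HDB block record."""

    def __init__(self, blk_no, street):
        self.blk_no = blk_no
        self.street = street
        # The remaining attributes start at 0, as the specification asks.
        self.year_completed = 0
        self.max_floor = 0
        self.total_units = 0

    def _get_address(self):
        return f"{self.blk_no} {self.street}"

    def __str__(self):
        return (f"{self._get_address()}: completed in {self.year_completed}, "
                f"{self.max_floor} floors, {self.total_units} units")

    # Reuse __str__ so that repr(h) and str(h) give the same text.
    __repr__ = __str__

    def __lt__(self, other):
        # Plain string comparison: '10A ...' sorts before '9A ...'.
        return str(self) < str(other)

    def __eq__(self, other):
        if not isinstance(other, HdbInfo):
            return NotImplemented
        return self._get_address() == other._get_address()
```

Only `__lt__()` is defined, yet `h1 > h3` still works: Python falls back to the reflected call `h3 < h1` when the left operand has no `__gt__()` of its own.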

<u>Test 1</u>

Sample output:
```
10A BOON TIONG RD: completed in 2014, 40 floors, 228 units
10A BOON TIONG RD: completed in 2014, 40 floors, 228 units
```

In [2]:
h = HdbInfo('10A', 'BOON TIONG RD')
h.year_completed = 2014
h.max_floor = 40
h.total_units = 228
print(str(h))
print(repr(h))

10A BOON TIONG RD: completed in 2014, 40 floors, 228 units
10A BOON TIONG RD: completed in 2014, 40 floors, 228 units


<u>Test 2</u>

Sample Output:
```
True
False
False
True
```

In [3]:
h1 = HdbInfo('10A', 'BOON TIONG RD')
h2 = HdbInfo('9A', 'BOON TIONG RD')
h3 = HdbInfo('9A', 'BOON TIONG RD')

print(h1 < h2)
print(h1 > h3)
print(h1 == h2)
print(h2 == h3)

True
False
False
True


### Load CSV Data

Write a function `load_hdb_info()`, which has a parameter `csv_path` pointing to the csv file.
* It skips all non-residential data.
* It converts each remaining row of data to an `HdbInfo` instance.
* It returns a list of the `HdbInfo` instances created from the CSV file.

You can use the following column index values.
```
BLK_NO = 0
STREET = 1
MAX_FLOOR = 2
YEAR_COMPLETED = 3
IS_RESIDENTIAL = 4
TOTAL_UNITS = 11
```
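A minimal sketch of the loader, using the standard `csv` module and the index values above. Two things are assumed here rather than stated in the text: the file starts with a header row, and residential rows are marked with `Y` in the `IS_RESIDENTIAL` column (as in the `10A BOON TIONG RD` sample row). The `HdbInfo` class below is a reduced stand-in so the block runs on its own.

```python
import csv

BLK_NO = 0
STREET = 1
MAX_FLOOR = 2
YEAR_COMPLETED = 3
IS_RESIDENTIAL = 4
TOTAL_UNITS = 11


class HdbInfo:
    """Reduced stand-in for the class defined earlier (string methods omitted)."""

    def __init__(self, blk_no, street):
        self.blk_no = blk_no
        self.street = street
        self.year_completed = 0
        self.max_floor = 0
        self.total_units = 0


def load_hdb_info(csv_path):
    """Return a list of HdbInfo objects, one per residential row."""
    records = []
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # assumption: the first row is a header
        for row in reader:
            # Skip blank lines and rows not flagged 'Y' (assumed marker).
            if not row or row[IS_RESIDENTIAL] != "Y":
                continue
            info = HdbInfo(row[BLK_NO], row[STREET])
            info.max_floor = int(row[MAX_FLOOR])
            info.year_completed = int(row[YEAR_COMPLETED])
            info.total_units = int(row[TOTAL_UNITS])
            records.append(info)
    return records
```

Converting the numeric columns with `int()` keeps the attributes consistent with their initial value of `0`; the printed format would look the same if they were left as strings.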

<u>Test</u>

Sample output:
```
10047
1 BEACH RD: completed in 1970, 16 floors, 142 units
```

In [5]:
hdb_list = load_hdb_info('data/hdb-property-information.csv')
print(len(hdb_list))
print(hdb_list[0])

10047
1 BEACH RD: completed in 1970, 16 floors, 142 units


## 2. Binary Search Tree

We will implement a binary search tree to store the list of `HdbInfo` instances. Using the tree, a user can look up records quickly.

### Binary Search Tree

#### Class Node

Implement a `Node` class to represent a node in the tree.
* It contains `left`, `right` attributes pointing to its left and right child respectively.
* Its `data` attribute will store the `HdbInfo` instance.
* Its `__str__()` method returns string representation of the object in `data`.

#### Class BinarySearchTree

Implement a `BinarySearchTree` class with the following features:
* Its `__init__()` method initializes its `root` attribute with the input parameter `root`, which has a default value of `None`.
* Implement an `add()` method for adding an `HdbInfo` instance to the tree.
* Implement a `preorder()` method which traverses the tree in **preorder**. Instead of printing the visited nodes, it returns the total number of nodes in the tree.
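The two classes might be sketched as below. The text does not say whether `add()` and `preorder()` should be recursive or iterative; this sketch assumes loops, and it sends values that are not smaller than the current node to the right, which is also an assumption. Any data supporting `<` can be stored, so the block does not depend on `HdbInfo`.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

    def __str__(self):
        return str(self.data)


class BinarySearchTree:
    def __init__(self, root=None):
        self.root = root

    def add(self, data):
        """Insert data: smaller values go left, everything else goes right."""
        new_node = Node(data)
        if self.root is None:
            self.root = new_node
            return
        current = self.root
        while True:
            if data < current.data:
                if current.left is None:
                    current.left = new_node
                    return
                current = current.left
            else:
                if current.right is None:
                    current.right = new_node
                    return
                current = current.right

    def preorder(self):
        """Visit nodes in preorder (node, left, right) and return the count."""
        count = 0
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            count += 1
            # Push right first so that the left subtree is popped first.
            if node.right is not None:
                stack.append(node.right)
            if node.left is not None:
                stack.append(node.left)
        return count
```

Loops are used instead of recursion because a tree built from partly ordered input can grow deep, and a recursive version could then exceed Python's default recursion limit.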

<u>Test</u>

Sample Output: `10047`

In [8]:
tree = BinarySearchTree()

for hdb in hdb_list:
    tree.add(hdb)

tree.preorder()

10047

#### Class BinarySearchTree2

Implement a class `BinarySearchTree2` that extends `BinarySearchTree`.
* Add a `find()` method which finds a node by `blk_no` and `street`.
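One way to sketch `find()` is to build a temporary `HdbInfo` from the two arguments and walk down the tree with the same `==` and `<` operators the records already define. The classes `HdbInfo`, `Node` and `BinarySearchTree` below are condensed stand-ins so the block runs on its own; returning the `Node` (rather than its `data`) and returning `None` for a missing address are assumptions of this sketch.

```python
class HdbInfo:
    """Condensed stand-in for the record class from section 1."""

    def __init__(self, blk_no, street):
        self.blk_no, self.street = blk_no, street
        self.year_completed = self.max_floor = self.total_units = 0

    def _get_address(self):
        return f"{self.blk_no} {self.street}"

    def __str__(self):
        return (f"{self._get_address()}: completed in {self.year_completed}, "
                f"{self.max_floor} floors, {self.total_units} units")

    def __lt__(self, other):
        return str(self) < str(other)

    def __eq__(self, other):
        return self._get_address() == other._get_address()


class Node:
    def __init__(self, data):
        self.data, self.left, self.right = data, None, None

    def __str__(self):
        return str(self.data)


class BinarySearchTree:
    """Condensed stand-in for the base class: only add() is reproduced."""

    def __init__(self, root=None):
        self.root = root

    def add(self, data):
        if self.root is None:
            self.root = Node(data)
            return
        current = self.root
        while True:
            side = "left" if data < current.data else "right"
            child = getattr(current, side)
            if child is None:
                setattr(current, side, Node(data))
                return
            current = child


class BinarySearchTree2(BinarySearchTree):
    def find(self, blk_no, street):
        """Return the node whose record has this address, or None."""
        probe = HdbInfo(blk_no, street)  # carries only the address
        current = self.root
        while current is not None:
            if probe == current.data:
                return current
            # Descend with the same '<' that add() used to place the nodes.
            current = current.left if probe < current.data else current.right
        return None
```

The probe's numeric fields are all `0`, but that does not disturb the descent: the address comes first in the string representation, so two records with different addresses are ordered before the numeric fields are ever compared.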

<u>Test</u>: Create an instance of `BinarySearchTree2` and populate the tree with records.

Sample Output: `10047`

In [10]:
tree2 = BinarySearchTree2()

for hdb in hdb_list:
    tree2.add(hdb)

tree2.preorder()

10047

<u>Test</u>: Use the tree to find 2 records.

In [11]:
r1 = tree2.find('10A', 'BOON TIONG RD')
print(r1)

r2 = tree2.find('999B', 'BUANGKOK CRES')
print(r2)

10A BOON TIONG RD: completed in 2014, 40 floors, 228 units
999B BUANGKOK CRES: completed in 2018, 17 floors, 126 units


## 3. Comparing Performance

### Linear Search

Implement a function `linear_find_hdb()` which performs a linear search for an HDB block using `blk_no` and `street`.
* It takes 2 parameters, `blk_no` and `street`.
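A minimal sketch of the linear search. Because the function takes only `blk_no` and `street`, this sketch assumes it reads the module-level `hdb_list` built earlier; the empty list and the reduced `HdbInfo` class below are placeholders so the block runs on its own. Returning `None` when nothing matches is also an assumption.

```python
class HdbInfo:
    """Reduced stand-in: only the two fields the search needs."""

    def __init__(self, blk_no, street):
        self.blk_no = blk_no
        self.street = street


# Placeholder; in the notebook this is the list returned by load_hdb_info().
hdb_list = []


def linear_find_hdb(blk_no, street):
    """Scan hdb_list from the start and return the first matching record."""
    for hdb in hdb_list:
        if hdb.blk_no == blk_no and hdb.street == street:
            return hdb
    return None  # no record with this address
```

In the worst case every record is inspected, so the cost grows with the length of the list, whereas a tree lookup follows a single path from the root.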

<u>Test</u>: Performance of Linear Search

In [13]:
%%timeit
result = linear_find_hdb('999B', 'BUANGKOK CRES')

25.9 ms ± 4.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


<u>Test</u>: Performance of Binary Search Tree

In [14]:
%%timeit
result = tree2.find('999B', 'BUANGKOK CRES')

2.19 ms ± 215 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
