# Binary Tree Problem Set
## Problem #1

>A binary tree is a data structure in which each node has two, one, or no child nodes.
>Implement a binary tree data structure with the following functions:
>
>a)	Insert a node into the binary tree
>
>b)	Swap two nodes in the binary tree
>
>c)	An algorithm to sort the binary tree (https://en.wikipedia.org/wiki/Tree_sort)
>
>d)	Remove a node from the binary tree without breaking the remaining tree structure


Run the following in the terminal for a demo of all the tasks in Problem #1.

```
python Tree.py
```
Note: You can update the parameters in the script to reduce script execution time.

The code is found in `Tree.py`.
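Tasks (a), (c), and (d) correspond to standard binary search tree (BST) operations. The following is a minimal sketch under stated assumptions, not the code in `Tree.py`. It uses a hypothetical `Node` dataclass holding integers, places smaller values in the left subtree and equal or larger values in the right, and implements tree sort as insertion followed by an in-order traversal. Task (b) is left out because the problem statement does not say whether swapping means exchanging values or whole subtrees.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


# Hypothetical minimal BST node; not the implementation in Tree.py
@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], value: int) -> Node:
    """Task (a): insert value iteratively and return the root (smaller left, equal/larger right)."""
    new_node = Node(value)
    if root is None:
        return new_node
    node = root
    while True:
        if value < node.value:
            if node.left is None:
                node.left = new_node
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new_node
                return root
            node = node.right


def in_order(root: Optional[Node]) -> Iterator[int]:
    """Yield values in ascending order (left subtree, node, right subtree) without recursion."""
    stack: List[Node] = []
    node = root
    while stack or node is not None:
        while node is not None:  # walk as far left as possible
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.value
        node = node.right


def tree_sort(values: Iterable[int]) -> List[int]:
    """Task (c): insert every value into a BST, then read the values back in order."""
    root = None
    for value in values:
        root = insert(root, value)
    return list(in_order(root))


def remove(root: Optional[Node], value: int) -> Optional[Node]:
    """Task (d): delete one node holding value and return the new root, keeping BST order."""
    if root is None:
        return None                      # value not present
    if value < root.value:
        root.left = remove(root.left, value)
    elif value > root.value:
        root.right = remove(root.right, value)
    elif root.left is None:              # zero or one child: splice the child into place
        return root.right
    elif root.right is None:
        return root.left
    else:                                # two children: borrow the in-order successor's value
        successor = root.right
        while successor.left is not None:
            successor = successor.left
        root.value = successor.value
        root.right = remove(root.right, successor.value)
    return root


root = None
for v in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, v)
root = remove(root, 50)                  # remove a node with two children
print(list(in_order(root)))              # [20, 30, 40, 60, 70, 80]
print(tree_sort([5, 3, 9, 1, 3]))        # [1, 3, 3, 5, 9]
```

`remove` is recursive for readability; on very deep (degenerate) trees an iterative version would avoid Python's recursion limit.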




## Problem #2
>Using your tree from Problem #1, 
>
>a. Implement a brute force search algorithm
>
>b. Implement a depth-first search algorithm
>
>c. Implement a breadth-first search algorithm
>
>d. Insert a large number of nodes into your tree (10,000; 100,000; 1,000,000), measure the performance (time to complete) of each of your search algorithms, and comment on the Big O complexity of each (best case, average case, worst case)

Run the following in the terminal for a demo of all the tasks in Problem #2.

```
python CompareAlgorithms.py
```

Note that you may need to install the `pandas` package to run the script. This can be done with one of the following:
* `pip install pandas` if `pip` is installed
* `conda install pandas` if Anaconda is installed

The code is found in `CompareAlgorithms.py` as well as in `Tree.py`.
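A minimal sketch of the three search strategies from Problem #2 on a plain binary tree, written independently of `Tree.py` (whose API is not shown here). It assumes a hypothetical `Node` dataclass, and it models the brute force search as uniform random sampling of nodes with replacement, capped at `max_nodes` draws; that is an assumption about what the script's "random" strategy does, not a description of it.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical node type; not the class used in Tree.py
@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def children(node: Node) -> List[Node]:
    """Return the node's existing children, left before right."""
    return [child for child in (node.left, node.right) if child is not None]


def breadth_first_search(root: Optional[Node], target: int) -> Optional[Node]:
    """Check nodes level by level using a FIFO queue; return the first match or None."""
    queue = deque([root] if root is not None else [])
    while queue:
        node = queue.popleft()
        if node.value == target:
            return node
        queue.extend(children(node))
    return None


def depth_first_search(root: Optional[Node], target: int) -> Optional[Node]:
    """Check nodes in pre-order using an explicit LIFO stack; return the first match or None."""
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        if node.value == target:
            return node
        # Push the right child first so the left subtree is explored first
        stack.extend(reversed(children(node)))
    return None


def random_search(nodes: List[Node], target: int, max_nodes: int,
                  rng: Optional[random.Random] = None) -> Optional[Node]:
    """Assumed brute force: draw nodes uniformly with replacement, giving up after max_nodes draws."""
    if not nodes:
        return None
    rng = rng if rng is not None else random.Random()
    for _ in range(max_nodes):
        node = rng.choice(nodes)
        if node.value == target:
            return node
    return None


# Example tree:      4
#                  2   6
#                 1 3 5 7
root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(breadth_first_search(root, 5).value)   # 5
print(depth_first_search(root, 42))          # None
```

An explicit stack keeps `depth_first_search` clear of Python's default recursion limit (1000 frames), which a recursive version could exceed on a deep, unbalanced tree.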

To determine with greater confidence which algorithms differ significantly from one another,
you would also perform statistical tests on the resulting data
(e.g., a one-way ANOVA followed by post hoc testing).
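A one-way ANOVA compares the spread between the algorithms' mean times with the spread within each algorithm's trials. The sketch below computes only the F statistic in plain Python, assuming each algorithm's trial times are passed as a separate list; the function name `one_way_anova_f` is hypothetical.

```python
from statistics import fmean
from typing import Sequence


def one_way_anova_f(*groups: Sequence[float]) -> float:
    """One-way ANOVA F statistic: mean square between groups / mean square within groups.

    Hypothetical helper; each group is one algorithm's list of trial times.
    """
    k = len(groups)                                   # number of algorithms
    n_total = sum(len(g) for g in groups)             # total number of trials
    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]
    # Between-group sum of squares (k - 1 degrees of freedom)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares (n_total - k degrees of freedom)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


print(one_way_anova_f([1, 2, 3], [4, 5, 6]))  # 13.5
```

To turn F into a p-value, compare it with the F(k - 1, N - k) distribution; `scipy.stats.f_oneway(*groups)` returns both the statistic and the p-value, and `scipy.stats.tukey_hsd` performs Tukey's post hoc pairwise comparisons. Because the random-search times in the raw data are strongly skewed (several trials near 1000), a rank-based alternative such as the Kruskal-Wallis test (`scipy.stats.kruskal`) may suit this data better.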

## Big O complexity

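Algorithm | Worst Case | Best Case | Average Case
--- | --- | --- | ---
random search | Unbounded if nodes are sampled with replacement (capped by `max_nodes`); O(n) without replacement | O(1) | O(n)
binary tree search | O(n) (degenerate tree) | O(1) | O(log n)
depth-first search | O(n) | O(1) | O(n)
breadth-first search | O(n) | O(1) | O(n)

Here n is the number of nodes in the tree; the binary tree search figures assume the tree is ordered as a binary search tree, and its average case assumes a random insertion order.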
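For binary tree search, the best, average, and worst cases depend on the tree's shape, not just on n. As a hedged sketch, assuming the search runs on a binary search tree built by plain unbalanced insertion (a hypothetical `Node` type, not necessarily how `Tree.py` builds its tree), the code below counts the nodes examined instead of measuring time: sorted insertion produces a degenerate, list-like tree, while median-first insertion produces a balanced one.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


# Hypothetical node type for illustration only
@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(values: Iterable[int]) -> Optional[Node]:
    """Insert values one by one into an unbalanced BST (smaller left, larger right)."""
    root = None
    for value in values:
        new_node = Node(value)
        if root is None:
            root = new_node
            continue
        node = root
        while True:
            if value < node.value:
                if node.left is None:
                    node.left = new_node
                    break
                node = node.left
            else:
                if node.right is None:
                    node.right = new_node
                    break
                node = node.right
    return root


def bst_search_visits(root: Optional[Node], target: int) -> int:
    """Return how many nodes a BST search examines before it finds target or falls off the tree."""
    visits = 0
    node = root
    while node is not None:
        visits += 1
        if target == node.value:
            break
        node = node.left if target < node.value else node.right
    return visits


def balanced_order(lo: int, hi: int) -> Iterator[int]:
    """Yield lo..hi-1 median-first, so plain insertion produces a balanced tree."""
    if lo >= hi:
        return
    mid = (lo + hi) // 2
    yield mid
    yield from balanced_order(lo, mid)
    yield from balanced_order(mid + 1, hi)


n = 1023                                    # 2**10 - 1 nodes
degenerate = build(range(n))                # sorted input: every node has only a right child
balanced = build(balanced_order(0, n))      # median-first input: a perfectly balanced tree

print(bst_search_visits(balanced, 511))     # 1     best case: the target is the root
print(bst_search_visits(balanced, n - 1))   # 10    balanced tree: about log2(n) levels
print(bst_search_visits(degenerate, n - 1)) # 1023  degenerate tree: every node is visited
```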

## Experiment results

Experiment parameters:
* Tree size: 10,000
* 30 trials
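Each trial records one time per search strategy plus the name of the fastest. A minimal harness in that spirit, using a hypothetical `time_strategies` function that is not taken from `CompareAlgorithms.py`, could rely on `time.perf_counter`, Python's high-resolution clock intended for measuring short durations:

```python
import time
from typing import Callable, Dict, Tuple


def time_strategies(strategies: Dict[str, Callable[[], object]]) -> Tuple[Dict[str, float], str]:
    """Run each zero-argument search once; return seconds per strategy and the fastest name."""
    timings: Dict[str, float] = {}
    for name, search in strategies.items():
        start = time.perf_counter()
        search()
        timings[name] = time.perf_counter() - start
    fastest = min(timings, key=timings.get)   # name with the smallest elapsed time
    return timings, fastest


# Toy usage: a linear scan of a list versus a hash-set lookup for the same target
data = list(range(100_000))
lookup = set(data)
timings, fastest = time_strategies({
    "list_scan": lambda: 99_999 in data,
    "set_lookup": lambda: 99_999 in lookup,
})
print(timings, fastest)
```

A single measurement of a sub-millisecond search is dominated by clock resolution and noise (most binary search times in the raw data are recorded as exactly 0.0); the standard-library `timeit` module repeats a call many times to reduce that effect.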

### All algorithms

```python
import sys
sys.path.append(r"../")  # parent directory, where CompareAlgorithms.py is expected to live

import pandas as pd
from CompareAlgorithms import *

if __name__ == '__main__':

    ##################################################
    # UPDATE THIS IF NEEDED TO REDUCE RUN TIME:
    # Number of trials to perform
    n_trials = 30

    # Maximum number of nodes to traverse for the brute force (random) search algorithm
    max_nodes = 1000000

    ##################################################

    # Number of tree nodes
    tree_size = 10000

    logging_level = 'INFO'

    comparator = CompareAlgorithms(
        tree_size=tree_size,
        logging_level=logging_level
    )
    results = comparator.run_experiment(n_trials, max_nodes=max_nodes)
    print('\nAll data:\n', results)
```

```
Running 30 trials using trees with 10000 elements each.
Starting search for random strategy (timestamp: 1724381273.0011146)
Starting search for binary strategy (timestamp: 1724381273.4756613)
Starting search for breadth_first strategy (timestamp: 1724381273.4766605)
Starting search for depth_first strategy (timestamp: 1724381273.4866548)
Starting search for random strategy (timestamp: 1724381279.7702222)
Starting search for binary strategy (timestamp: 1724381279.860351)
Starting search for breadth_first strategy (timestamp: 1724381279.860351)
Starting search for depth_first strategy (timestamp: 1724381279.863881)
Starting search for random strategy (timestamp: 1724381290.1409783)
Starting search for binary strategy (timestamp: 1724381293.0259159)
Starting search for breadth_first strategy (timestamp: 1724381293.0299149)
Starting search for depth_first strategy (timestamp: 1724381293.0699005)
Starting search for random strategy (timestamp: 1724381300.2870944)
Starting search for binary
```

```python
# Count the trials in which each algorithm was fastest
summary = pd.DataFrame(results['fastest'].value_counts())
summary.index.name = None
summary
```

Fastest algorithm | Trials
--- | ---
binary | 29
breadth_first | 1


### Algorithms other than binary tree search

```python
def find_fastest(row):
    """Return the column label (algorithm name) with the smallest time in this row."""
    return row.idxmin()


# Keep only the non-binary algorithms and recompute the fastest one per trial
results_without_binary = results.loc[:, ['random', 'breadth_first', 'depth_first']].copy()
results_without_binary['fastest'] = results_without_binary.apply(find_fastest, axis=1)
results_without_binary
```

trial | random | breadth_first | depth_first | fastest
--- | --- | --- | --- | ---
0 | 0.474547 | 0.009994 | 0.001002 | depth_first
1 | 0.090129 | 0.002998 | 0.007997 | breadth_first
2 | 1001.882231 | 0.039986 | 0.037999 | depth_first
3 | 0.150301 | 0.012036 | 0.015015 | breadth_first
4 | 0.126031 | 0.007409 | 0.013006 | breadth_first
5 | 1000.353088 | 0.021374 | 0.006551 | depth_first
6 | 1000.578554 | 0.018002 | 0.003999 | depth_first
7 | 999.165356 | 0.014672 | 0.009002 | depth_first
8 | 999.153041 | 0.021225 | 0.008 | depth_first
9 | 1000.692481 | 0.026833 | 0.011997 | depth_first


```python
# Count the trials in which each non-binary algorithm was fastest
summary_without_binary = pd.DataFrame(results_without_binary['fastest'].value_counts())
summary_without_binary.index.name = None
summary_without_binary
```

Fastest algorithm | Trials
--- | ---
depth_first | 18
breadth_first | 12


### Raw data

Times are shown in seconds.

```python
results
```

trial | random | binary | breadth_first | depth_first | fastest
--- | --- | --- | --- | --- | ---
0 | 0.474547 | 0.0 | 0.009994 | 0.001002 | binary
1 | 0.090129 | 0.0 | 0.002998 | 0.007997 | binary
2 | 1001.882231 | 0.0 | 0.039986 | 0.037999 | binary
3 | 0.150301 | 0.0 | 0.012036 | 0.015015 | binary
4 | 0.126031 | 0.0 | 0.007409 | 0.013006 | binary
5 | 1000.353088 | 0.0 | 0.021374 | 0.006551 | binary
6 | 1000.578554 | 0.0 | 0.018002 | 0.003999 | binary
7 | 999.165356 | 0.0 | 0.014672 | 0.009002 | binary
8 | 999.153041 | 0.0 | 0.021225 | 0.008 | binary
9 | 1000.692481 | 0.000999 | 0.026833 | 0.011997 | binary
