# Data Structure Experiment

The purpose of this notebook is to experiment with data structures in order to argue choices in algorithms.






In [1]:
# correct working directory.
# This is necessary for imports because the notebook is not in the main folder of the project. 
if not "working_directory_corrected" in vars():
    %cd ..
    working_directory_corrected = True

from collections import deque
from queue import PriorityQueue
from queue import SimpleQueue
from queue import Queue
from queue import LifoQueue
import heapq

import time
import copy
import random


c:\Users\frank\Documents\Teaching\LU\Planning and Optimization LU - Material\Planning Example Project\planning_example_project


  self.shell.db['dhist'] = compress_dhist(dhist)[-100:]


# Experiment 1 - List Data Types for Breadth First Search and Depth First Search

The goal of this epxeriment is to determine which list data type is better suited for representing the open list in BFS and DFS.

## Procedure

For this experiment we will compare list algorithms for their use within Breadth First Search and Depth First Search.

The following data structures will be compared:
- Python Lists
- Double Ended Queue (collections.deque)
- queue.SimpleQueue
- queue.Queue
- queue.LifoQueue

For each data structure, we will test the following cases, as far as they are applicable:
- Put element
- Pop oldest element.
- Pop newest element.
These cases are tested because they are the operations requiered for Depth First Search and Breadth First Search. 
It should be noted that pure queue and stack implementations (queue.Queue, queue.SimpleQueue, queue.LifoQueue) only allow popping from one end of the list. They are still included in this experiment because implementing either BFS or DFS only requires one of the two deletion operations.


Each use case will be tested by inserting / deleting 100.000 elements. We will then compare the overall time for each case.


## Results

The below cell runs the experiment and prints the results.


In [6]:
# list insert
list = []
start = time.time_ns()
for i in range(100000):
    list.append(i)
end = time.time_ns()
print(f"List - Put took {end - start} ns")

# list remove at end
list2 = copy.copy(list)
start = time.time_ns()
for i in range(len(list)):
    list2.pop()
end = time.time_ns()
print(f"List - Pop newest took {end - start} ns")

# list remove at front
list2 = copy.copy(list)
start = time.time_ns()
for i in range(len(list)):
    list2.pop(0)
end = time.time_ns()
print(f"List - Pop oldest took {end - start} ns")

# dequeue insert
q = deque()	
start = time.time_ns()	
for i in range(100000):
    q.append(i)
end = time.time_ns()	
print(f"Deque - Put took {end - start} ns")	

# dequeue remove at end	
q2 = copy.copy(q)	
start = time.time_ns()	
for i in range(len(q)):
    q2.pop()	
end = time.time_ns()	
print(f"Deque - Pop newest took {end - start} ns")	

# dequeue remove at front	
q2 = copy.copy(q)	
start = time.time_ns()	
for i in range(len(q)):
    q2.popleft()	
end = time.time_ns()	
print(f"Deque - Pop oldest took {end - start} ns")	

# Queue insert	
q = Queue()	
start = time.time_ns()	
for i in range(100000):
    q.put(i)	
end = time.time_ns()	
print(f"Queue - Put took {end - start} ns")	

# Queue remove front	
start = time.time_ns()	
for i in range(100000):
    q.get()	
end = time.time_ns()	
print(f"Queue - Pop oldest took {end - start} ns")	

# SimpleQueue insert
q = SimpleQueue()	
start = time.time_ns()	
for i in range(100000):
    q.put(i)	
end = time.time_ns()	
print(f"SimpleQueue - Put took {end - start} ns")	

# SimpleQueue remove front
start = time.time_ns()	
for i in range(100000):
    q.get()	
end = time.time_ns()	
print(f"SimpleQueue - Pop oldest took {end - start} ns")	

# LifoQueue insert
q = LifoQueue()	
start = time.time_ns()	
for i in range(100000):
    q.put(i)	
end = time.time_ns()	
print(f"LifoQueue - Put took {end - start} ns")	

# LifoQueue remove back
start = time.time_ns()	
for i in range(100000):
    q.get()	
end = time.time_ns()	
print(f"LifoQueue - Pop newest took {end - start} ns")		




List - Put took 9106000 ns
List - Pop newest took 9095800 ns
List - Pop oldest took 854796300 ns
Deque - Put took 6999700 ns
Deque - Pop newest took 6003700 ns
Deque - Pop oldest took 6094800 ns
Queue - Put took 108912700 ns
Queue - Pop oldest took 103044000 ns
SimpleQueue - Put took 6006300 ns
SimpleQueue - Pop oldest took 7111100 ns
LifoQueue - Put took 100922200 ns
LifoQueue - Pop newest took 98514500 ns


The below table is the result of running the experiment on a Lenovo ThinkPad E14 Gen 6, standard configuration.
Results are rounded to the nearest millisecond.

|          | List | Deque | queue.SimpleQueue | queue.Queue | queue.LifoQueue |
| -------- | ------- | ------- | ------- | ------- | ------- | 
| Put|   7 ms | 6 ms  | 6 ms| 111 ms | 103 ms |
| Pop Newest |   5  ms | 7 ms   | - | - | 102 ms | 
| Pop Oldest |  793 ms | 6 ms  | 8 ms | 108 ms | - |


# Discussion

In the above results we can see two clear classes of time. Some operations take between 5 and 10 ms. Other take 100 ms and more. When rerunning the above cell the times change slightly. From this observation we assume that differences of one or two milliseconds are not significant in the above table. 

Based on this observation we choose Dequeue as it has a time of 5 to 10 ms for all operations. For BFS, we could also have chosen queue. SimpleQueue for the same reason (BFS does not require the Pop Newest Operation).

queue.Queue and queue.LifoQueue are clearly outperformed by the normal Dequeue. This is likely due to those data structures including support for multi-threading, which adds additional computational complexity.

Python lists take significantly longer for the pop oldest operation. This aligns with the complexity classes given here: https://www.geeksforgeeks.org/deque-vs-list-in-python/. They could have been chosen for implementing DFS as the pop oldes operation is not used in this algorithm.

# Experiment 2 - Priority Queues 

The goal of this experiment is to determine which priority queue implementation is more performant for use in A*, Heuristic Search and Dijkstra's Algorithm.

## Procedure

For this experiment we will compare the following Priority Queue implementations:
- queue.PriorityQueue
- heapq

For each data structure, we will test the following cases:
- Adding elements in random order.
- Removing from front.
These cases are tested because they are the operations requiered for the abovementioned heuristic search methods. 
Each use case will be tested by inserting / deleting 100.000 elements. Randomness will be controlled by using the same random seed for both experiments. We will then compare the overall time for each case.


## Results

The below cell runs the experiment and prints the results.


In [15]:

# PriorityQueue insert
pq = PriorityQueue()
random.seed(42)
start = time.time_ns()
for i in range(100000):
    pq.put(random.randint(0,10000000))
end = time.time_ns()
print(f"PriorityQueue - Insert took {end - start} ns")

# PriorityQueue remove
for i in range(len(list)):
    pq.get()
end = time.time_ns()
print(f"PriorityQueue - Remove took {end - start} ns")

# heapq insert	
h = []	
start = time.time_ns()	
for i in range(100000):
    heapq.heappush(h, random.randint(0,10000000))	
end = time.time_ns()	
print(f"Heapq - Insert took {end - start} ns")	

# heapq remove	
for i in range(len(list)):
    heapq.heappop(h)	
end = time.time_ns()	
print(f"Heapq - Remove took {end - start} ns")	

PriorityQueue - Insert took 151701100 ns
PriorityQueue - Remove took 267593600 ns
Heapq - Insert took 56516700 ns
Heapq - Remove took 92520700 ns


The below table is the result of running the experiment on a Lenovo ThinkPad E14 Gen 6, standard configuration.
Results are rounded to the nearest millisecond.

|          | PriorityQueue | heapq |
| -------- | ------- | ------- | 
| Insert|   152   ms | 57  ms  | 
| Remove|   268    ms | 93  ms   | 


# Discussion

According to our results, the PriorityQueue implementation was slower than the heapq implementation by a factor of three. 
This may be due to the fact that the PriorityQueue class has added complexity to make it useable in context of multi-threading use cases.

According to these results we will use a heapq for implementing our algorithms.
