# A Star Test Experiment 

In this experiment we evaluate whether the coordinates we have annotated via graph layout algorithms can usefully inform an A* search, and which coordinates are most appropriate.

For this purpose, we will run A* and Dijkstra on each created coordinate annotation to find out which of them is most suitable.


In [1]:
# Correct the working directory.
# This is necessary for imports because the notebook is not in the main folder of the project.
if "working_directory_corrected" not in vars():
    %cd ..
    working_directory_corrected = True


import pandas as pd

from evaluation.timed_experiment import Timed_Experiment
from algorithms.dijkstra import Dijkstra
from algorithms.a_star import A_Star


# load dataset
from data.dataset import Dataset



c:\Users\frank\Documents\Teaching\LU\Planning and Optimization LU - Material\Planning Example Project\planning_example_project


## Dataset
This experiment is based on data generated with the script generate_coordinates. This script applies all algorithms identified in coordinate_annotation.ipynb to the graph and generates coordinates. It does so 5 times per algorithm, because previous experiments showed that the coordinates differ significantly from run to run.


While executing this script, we had to exclude some of the algorithms for the following reasons:
- kamada_kawai and mds ran into memory issues and could not be executed.
- davidson_harel and graphopt did not deliver a result within one hour. To keep the experiment manageable time-wise, they were excluded.


After these exclusions, the following algorithms remained: auto, drl, fruchterman_reingold



## Experiment 1 - Scale factor

As discussed in coordinate_annotation.ipynb, some coordinates were extremely close together. To give A* a fair chance, we rectify this by scaling the coordinates of each layout so that they span similar maximal distances.

**Procedure**

The cell below loads one coordinate set of each algorithm (variation 1) and prints the minimum and maximum x and y coordinates. From these values we determine scale factors as powers of 10 to ensure that the coordinates span at least 10000 in both x and y.

In [2]:
# Print the coordinate range of one variation per layout algorithm.
for algorithm_name in ["auto", "drl", "fruchterman_reingold"]:
    file_name = f"if_{algorithm_name}_1"

    # load and convert graph
    dataset = Dataset()
    graph = dataset.load_graph()
    dataset.convert_to_spatial(graph,file_name)

    # find min and max of coordinates
    min_x = min(graph.node_positions, key= lambda coord: coord[0])[0]
    max_x = max(graph.node_positions, key= lambda coord: coord[0])[0]
    min_y = min(graph.node_positions, key= lambda coord: coord[1])[1]
    max_y = max(graph.node_positions, key= lambda coord: coord[1])[1]

    print("\n" + algorithm_name)
    print(f"min_x: {min_x}, max_x: {max_x}, Distance: {max_x - min_x}")		
    print(f"min_y: {min_y}, max_y: {max_y}, Distance: {max_y - min_y}")	

    


auto
min_x: -979.9690551757812, max_x: 974.4048461914062, Distance: 1954.3739013671875
min_y: -955.8806762695312, max_y: 950.1741333007812, Distance: 1906.0548095703125

drl
min_x: -962.9459228515625, max_x: 959.2142944335938, Distance: 1922.1602172851562
min_y: -979.453125, max_y: 961.8577270507812, Distance: 1941.3108520507812

fruchterman_reingold
min_x: -258.30335937319484, max_x: 290.7462575501768, Distance: 549.0496169233716
min_y: -276.3667291666116, max_y: 282.34564149108724, Distance: 558.7123706576988


**Conclusions:**

From this experiment it seems that the coordinates are already quite far apart and do not need to be changed much.

We will apply the following scale factors:
- auto: 10
- drl: 10
- fruchterman_reingold: 100
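
The choice of these factors can be written down as a small helper. This is only a minimal sketch: the function name `power_of_ten_scale` and its `target` parameter are our own illustration and not part of the project code. It picks the smallest power of 10 that stretches the smaller of the two extents to at least the target of 10000.

```python
def power_of_ten_scale(extent_x, extent_y, target=10000):
    """Smallest power of 10 that scales both extents to at least `target`.

    Hypothetical helper for illustration; not part of the project code.
    """
    smallest_extent = min(extent_x, extent_y)
    if smallest_extent <= 0:
        raise ValueError("extents must be positive")
    factor = 1
    while smallest_extent * factor < target:
        factor *= 10
    return factor


# Extents (max - min) taken from the output above, rounded to two decimals.
extents = {
    "auto": (1954.37, 1906.05),
    "drl": (1922.16, 1941.31),
    "fruchterman_reingold": (549.05, 558.71),
}
scale_factors_by_layout = {
    name: power_of_ten_scale(dx, dy) for name, (dx, dy) in extents.items()
}
print(scale_factors_by_layout)
```

For the extents measured above, this yields 10 for auto and drl and 100 for fruchterman_reingold, matching the list.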


## Experiment 2 - Which coordinates are best?

The goal of this experiment is to determine which coordinates perform best with A* and Dijkstra.

**Procedure**

In the cell below we run Dijkstra and A* with all 5 variants of the three layout algorithms. As a comparison benchmark, we also run them without coordinates.

Each run comprises 100 planning problems, all with the same seed. From each experiment we will collect the following data:
- *average_time*: The average time taken to solve one problem.
- *nr_extended*: The total number of nodes that were extended.
- *time per extension*: The average time required for one extension.

We will also record these three values separately for failed and successful runs.

After running the experiments we will use these values to discuss the results. 
Specifically, we will use the average time to decide which dataset in which variation to select and compare against other algorithms later.
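
The relation between the three values can be checked against the numbers printed for the first Dijkstra run on the graph without coordinates. The sketch below assumes that the time per extension is the total solving time divided by the total number of extended nodes. The implementation of Timed_Experiment is not shown in this notebook, so `time_per_extension` is a hypothetical helper and not the project's method.

```python
def time_per_extension(average_time_ns, nr_problems, nr_extended):
    """Total solving time divided by the total number of extended nodes.

    Assumption for illustration: this mirrors how the printed
    "Time per extension" values relate to the other two metrics.
    """
    total_time_ns = average_time_ns * nr_problems
    return total_time_ns / nr_extended


# Values printed for Dijkstra on the graph without coordinates, variation 0.
overall = time_per_extension(416509816, 100, 3388818)
successful = time_per_extension(467874136, 19, 575155)
failed = time_per_extension(404461395, 81, 2813663)
print(int(overall), int(successful), int(failed))
```

Under this assumption the helper reproduces the printed values of 12290 ns, 15456 ns and 11643 ns.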

In [3]:

algorithm_names = [None, "auto", "drl", "fruchterman_reingold"]
# The scale factor for None is never used: the benchmark has no coordinates.
scale_factors = [0, 10, 10, 100]
planners = [Dijkstra, A_Star]

for algorithm_name, scale_factor in zip(algorithm_names, scale_factors):
    
    for variation in range(5):
        file_name = f"if_{algorithm_name}_{variation}"
        dataset = Dataset()
        graph = dataset.load_graph()
        if algorithm_name is not None:
            dataset.convert_to_spatial(graph,file_name, scale_factor=scale_factor)

        for planner in planners:

            print(f"\nrunning {planner} on algorithm {algorithm_name} with variation {variation}")
            experiment = Timed_Experiment(graph, planner, 100, random_seed=42, verbose = False)
            experiment.run()
            print("Average Time: ", int(experiment.get_average_time()), "ns")
            print("Nr Extended: ", experiment.get_nr_extensions())
            print("Time per extension: ", int(experiment.get_average_extension_time()), "ns")
            
            print("Nr successful runs:", experiment.get_nr_successful_runs())	
            print("Average Successful Time: ", int(experiment.get_average_successful_time()), "ns")	 
            print("Nr Successful Extensions: ", experiment.get_nr_successful_extensions())	
            print("Time per successful extension: ", int(experiment.get_average_successful_extension_time()), "ns")	
            
            print("Nr unsuccessful runs:", experiment.get_nr_unsuccessful_runs())	
            print("Average Unsuccessful Time:", int(experiment.get_average_unsuccessful_time()), "ns")	 
            print("Nr Failed Extensions: ", experiment.get_nr_unsuccessful_extensions())	
            print("Time per failed extension: ", int(experiment.get_average_unsuccessful_extension_time()), "ns")	



running <class 'algorithms.dijkstra.Dijkstra'> on algorithm None with variation 0
Average Time:  416509816 ns
Nr Extended:  3388818
Time per extension:  12290 ns
Nr successful runs: 19
Average Successful Time:  467874136 ns
Nr Successful Extensions:  575155
Time per successful extension:  15456 ns
Nr unsuccessful runs: 81
Average Unsuccessful Time: 404461395 ns
Nr Failed Extensions:  2813663
Time per failed extension:  11643 ns

running <class 'algorithms.a_star.A_Star'> on algorithm None with variation 0
Average Time:  456377606 ns
Nr Extended:  3388818
Time per extension:  13467 ns
Nr successful runs: 19
Average Successful Time:  491203321 ns
Nr Successful Extensions:  575155
Time per successful extension:  16226 ns
Nr unsuccessful runs: 81
Average Unsuccessful Time: 448208611 ns
Nr Failed Extensions:  2813663
Time per failed extension:  12903 ns

running <class 'algorithms.dijkstra.Dijkstra'> on algorithm None with variation 1
Average Time:  399308717 ns
Nr Extended:  3388818
Time 

**Results**

The following table collects the average time over all experiments. Since the runs without layout algorithms are identical, we pick only one representative of them (Benchmark 4). All times are in ms.

| Experiment | Dijkstra | Dijkstra (successful) | Dijkstra (failed) | A* | A* (successful) | A* (failed) |
| --- | --- | --- | --- | --- | --- | --- |
| Benchmark 4 | 392 ms | 409 ms | 388 ms | 461 ms | 513 ms | 449 ms |
| auto 0 | **349 ms**  | 234 ms  | 376 ms | 380 ms | 51 ms | 458 ms |
| auto 1 | 360 ms  | 271 ms  | 381 ms | **364 ms** | 42 ms | 439 ms |
| auto 2 | 363 ms  | 289 ms  | 380 ms | 387 ms | 55 ms | 465 ms |
| auto 3 | 368 ms  | 279 ms  | 388 ms | 382 ms | 65 ms | 457 ms |
| auto 4 | 370 ms  | 310 ms  | 384 ms | 382 ms | 74 ms | 454 ms |
| drl 0 | 365 ms  | 282 ms  | 384 ms | 380 ms | 56 ms | 456 ms |
| drl 1 | 362 ms  | 298 ms  | 377 ms | 374 ms | 47 ms | 450 ms |
| drl 2 | 356 ms  | 282 ms  | 373 ms | 367 ms | 41 ms | 444 ms |
| drl 3 | 355 ms  | 265 ms  | 376 ms | 379 ms | 54 ms | 455 ms |
| drl 4 | 375 ms  | 343 ms  | 383 ms | 385 ms | 57 ms | 462 ms |
| fruchterman_reingold 0 | 411 ms  | 360 ms | 423 ms | 405 ms | 70 ms | 484 ms |
| fruchterman_reingold 1 | 418 ms  | 387 ms  | 426 ms | 428 ms | 104 ms | 504 ms |
| fruchterman_reingold 2 | 426 ms  | 431 ms  | 425 ms | 425 ms | 102 ms | 500 ms |
| fruchterman_reingold 3 | 431 ms  | 424 ms | 432 ms | 409 ms | 110 ms| 479 ms |
| fruchterman_reingold 4 | 427 ms  | 411 ms  | 431 ms | 427 ms | 97 ms | 504 ms |

The following table collects the number of extended nodes for each experiment, summed over the 100 planning problems of the run. Since the runs without layout algorithms are identical, we pick only one representative of them (Benchmark 4). Dots are used as thousands separators.


| Experiment | Dijkstra | Dijkstra (successful) | Dijkstra (failed) | A* | A* (successful) | A* (failed) |
| --- | --- | --- | --- | --- | --- | --- |
| Benchmark 4 | 3.388.818 | 575.155 | 2.813.663 | 3.388.818 | 575.155 | 2.813.663 |
| auto 0 | 3.218.843  | 405.180  | 2.813.663 | 2.842.061 | 28.398 | 2.813.663 |
| auto 1 | 3.274.321  | 460.658  | 2.813.663 | 2.852.468 | 38.805 | 2.813.663 |
| auto 2 | 3.293.419  | 479.756  | 2.813.663 | 2.858.840 | 45.177 | 2.813.663 |
| auto 3 | 3.274.679  | 461.016  | 2.813.663 | 2.875.837 | 62.174 | 2.813.663 |
| auto 4 | 3.324.951  | 511.288  | 2.813.663 | 2.897.464 | 83.801 | 2.813.663 |
| drl 0 | 3.288.192 | 474.529 | 2.813.663 | 2.872.806 | 59.143 | 2.813.663 |
| drl 1 | 3.301.889 | 488.226 | 2.813.663 | 2.853.738 | 40.075 | 2.813.663 |
| drl 2 | 3.257.531 | 443.868 | 2.813.663 | 2.845.389 | 31.726 | 2.813.663 |
| drl 3 | 3.259.777 | 446.114 | 2.813.663 | 2.870.461 | 56.798 | 2.813.663 |
| drl 4 | 3.367.594 | 553.931 | 2.813.663 | 2.869.428 | 55.765 | 2.813.663 |
| fruchterman_reingold 0 | 3.243.333 | 429.670 | 2.813.663 | 2.850.068 | 36.405 | 2.813.663 |
| fruchterman_reingold 1 | 3.293.176 | 479.513 | 2.813.663 | 2.859.680 | 46.017 | 2.813.663 |
| fruchterman_reingold 2 | 3.349.097 | 535.434 | 2.813.663 | 2.873.033 | 59.370 | 2.813.663 |
| fruchterman_reingold 3 | 3.353.094 | 539.431 | 2.813.663 | 2.877.601 | 63.938 | 2.813.663 |
| fruchterman_reingold 4 | 3.310.525 | 496.862 | 2.813.663 | 2.876.480 | 62.817 | 2.813.663 |

**Discussion**

In the tables above, we can see that "auto" and "drl" outperform fruchterman_reingold and the benchmark on this task. Indeed, with fruchterman_reingold, Dijkstra's execution time is worse than on the original graph, while A* is slightly better.
For our continued experiments we will use the run with the lowest average time. For Dijkstra this is auto_0 (349 ms) and for A* this is auto_1 (364 ms). We will use auto_0, as it is the run with the overall lowest average time.
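
This selection can be reproduced from the first results table. The snippet below is an illustrative sketch: the dictionary `average_time_ms` is typed in by hand from the table, and `fastest_run` is a hypothetical helper, not part of the project.

```python
# Average times in ms, copied from the first results table.
average_time_ms = {
    "auto 0": {"Dijkstra": 349, "A*": 380},
    "auto 1": {"Dijkstra": 360, "A*": 364},
    "auto 2": {"Dijkstra": 363, "A*": 387},
    "auto 3": {"Dijkstra": 368, "A*": 382},
    "auto 4": {"Dijkstra": 370, "A*": 382},
    "drl 0": {"Dijkstra": 365, "A*": 380},
    "drl 1": {"Dijkstra": 362, "A*": 374},
    "drl 2": {"Dijkstra": 356, "A*": 367},
    "drl 3": {"Dijkstra": 355, "A*": 379},
    "drl 4": {"Dijkstra": 375, "A*": 385},
    "fruchterman_reingold 0": {"Dijkstra": 411, "A*": 405},
    "fruchterman_reingold 1": {"Dijkstra": 418, "A*": 428},
    "fruchterman_reingold 2": {"Dijkstra": 426, "A*": 425},
    "fruchterman_reingold 3": {"Dijkstra": 431, "A*": 409},
    "fruchterman_reingold 4": {"Dijkstra": 427, "A*": 427},
}


def fastest_run(planner):
    """Name of the run with the lowest average time for the given planner."""
    return min(average_time_ms, key=lambda run: average_time_ms[run][planner])


best_dijkstra = fastest_run("Dijkstra")
best_a_star = fastest_run("A*")

# Lowest single entry over both planners, as a (run, planner) pair.
overall_best = min(
    ((run, planner) for run in average_time_ms for planner in ("Dijkstra", "A*")),
    key=lambda pair: average_time_ms[pair[0]][pair[1]],
)
print(best_dijkstra, best_a_star, overall_best)
```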

While this was not our main concern, the results also show some differences in how the coordinates affect Dijkstra's algorithm and A*.

A* significantly outperforms Dijkstra in cases where a path exists. For "auto" and "drl", A* takes around 40 to 75 ms, while Dijkstra takes around 230 to 340 ms. This can be attributed to A* being able to make efficient use of the annotated coordinates, which is clearly visible in the number of extended nodes: on successful runs with these layouts, A* extends roughly 28 to 84 thousand nodes, whereas Dijkstra extends roughly 405 to 554 thousand.

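In cases where no path exists, Dijkstra's algorithm is usually faster. Both algorithms extend exactly the same 2.813.663 nodes in these cases, so the difference likely stems from the lower cost per extension of Dijkstra's algorithm, which does not have to evaluate a heuristic.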

Over all cases, the two algorithms perform similarly. This is likely because most problems have no connecting path (81 of the 100 problems in the benchmark run), so the advantage of A* does not come into play in enough cases to matter.
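
Both effects can be reproduced on a toy graph. The following is a minimal, self-contained sketch and not the project's Dijkstra or A_Star classes: `search`, `grid_graph` and the Euclidean heuristic are assumptions made for illustration, since the notebook does not show which heuristic A_Star uses. With the heuristic switched off, the search orders nodes by path cost alone, which corresponds to Dijkstra's algorithm.

```python
import heapq
import math


def search(neighbors, positions, start, goal, use_heuristic):
    """Best-first search; returns (path cost or None, number of extended nodes).

    use_heuristic=False: priority is the path cost alone (Dijkstra).
    use_heuristic=True: the Euclidean distance to the goal is added (A*).
    """
    def h(node):
        if not use_heuristic:
            return 0.0
        return math.dist(positions[node], positions[goal])

    best_cost = {start: 0.0}
    queue = [(h(start), start)]
    extended = set()
    while queue:
        _, node = heapq.heappop(queue)
        if node in extended:
            continue  # stale queue entry, node was already extended
        extended.add(node)
        if node == goal:
            return best_cost[node], len(extended)
        for neighbor, edge_cost in neighbors[node]:
            new_cost = best_cost[node] + edge_cost
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                heapq.heappush(queue, (new_cost + h(neighbor), neighbor))
    return None, len(extended)  # queue exhausted: no path exists


def grid_graph(size):
    """Square grid with unit edge costs; node (x, y) sits at position (x, y)."""
    positions = {(x, y): (float(x), float(y))
                 for x in range(size) for y in range(size)}
    neighbors = {node: [] for node in positions}
    for (x, y) in positions:
        for dx, dy in ((1, 0), (0, 1)):
            other = (x + dx, y + dy)
            if other in positions:
                neighbors[(x, y)].append((other, 1.0))
                neighbors[other].append(((x, y), 1.0))
    return neighbors, positions


neighbors, positions = grid_graph(20)
island = (-5, -5)  # a node without edges, so no path leads to it
positions[island] = (-5.0, -5.0)
neighbors[island] = []

reachable = {
    "dijkstra": search(neighbors, positions, (0, 0), (19, 0), use_heuristic=False),
    "a_star": search(neighbors, positions, (0, 0), (19, 0), use_heuristic=True),
}
unreachable = {
    "dijkstra": search(neighbors, positions, (0, 0), island, use_heuristic=False),
    "a_star": search(neighbors, positions, (0, 0), island, use_heuristic=True),
}
print("path exists:", reachable)
print("no path:", unreachable)
```

When a path exists, both variants find the same cost but the heuristic variant extends fewer nodes. When no path exists, both have to exhaust every node reachable from the start, so the heuristic only adds work per extension.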
