# Empirical analysis

### This notebook contains the run-time analysis for the Dijkstra and A* algorithms

In [7]:
import numpy as np
import pandas as pd
from scipy.stats import normaltest, shapiro, ttest_ind

import plotly.express as px

In [2]:
with open("run_time_data/dijkstra_time.txt", 'r') as f:
    dijkstra_runtime_string = f.readlines()
    
with open("run_time_data/astar_time.txt", 'r') as f:
    astar_runtime_string = f.readlines()

In [3]:
# Parse one float per line, skipping blank lines (e.g. a trailing newline at end of file)
dijkstra_runtime = [float(line) for line in dijkstra_runtime_string if line.strip()]
astar_runtime = [float(line) for line in astar_runtime_string if line.strip()]

### Normality testing of Dijkstra algorithm's sample runtimes

In [4]:
stat, p = normaltest(dijkstra_runtime)
print('Statistics=%.3f, p=%.3f' % (stat, p))

alpha = 0.05

if p > alpha:
    print('Sample looks Gaussian (fail to reject H0)')
else:
    print('Sample does not look Gaussian (reject H0)')

Statistics=5.486, p=0.064
Sample looks Gaussian (fail to reject H0)


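Since the p-value (0.064) is greater than alpha = 0.05, we fail to reject the null hypothesis of normality: the test finds no significant evidence that the sample runtimes of Dijkstra's algorithm deviate from a Gaussian distribution. The margin is narrow, so this is an absence of evidence against normality rather than proof of it.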
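Because the p-value above is close to the threshold, a second normality test can serve as a cross-check. The following is a minimal sketch using the Shapiro-Wilk test (`scipy.stats.shapiro`). The two samples are synthetic stand-ins generated here for illustration, not the measured runtimes, and the names `gaussian_like` and `skewed` are hypothetical.

```python
import numpy as np
from scipy.stats import shapiro

# Synthetic stand-ins (assumption): NOT the measured runtimes from this notebook.
rng = np.random.default_rng(42)
gaussian_like = rng.normal(loc=0.38, scale=0.01, size=200)
skewed = rng.exponential(scale=0.38, size=200)

alpha = 0.05

for name, sample in [("gaussian_like", gaussian_like), ("skewed", skewed)]:
    # shapiro returns the W statistic and the p-value
    w, p = shapiro(sample)
    verdict = "fail to reject H0" if p > alpha else "reject H0"
    print(f"{name}: W={w:.3f}, p={p:.3f} ({verdict})")
```

To apply the same check to the real data, `dijkstra_runtime` could be passed to `shapiro` in place of the synthetic samples.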

### Normality testing of A* algorithm's sample runtimes

#### We use SciPy's `normaltest`, which is based on D’Agostino and Pearson’s test and combines skew and kurtosis into an omnibus test of normality.

In [11]:
stat, p = normaltest(astar_runtime)
print('Statistics=%.3f, p-value=%.3f' % (stat, p))

alpha = 0.05

if p > alpha:
    print('Sample looks Gaussian (fail to reject H0)')
else:
    print('Sample does not look Gaussian (reject H0)')

Statistics=3.642, p-value=0.162
Sample looks Gaussian (fail to reject H0)


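Since the p-value (0.162) is greater than alpha = 0.05, we fail to reject the null hypothesis of normality: the test finds no significant evidence that the sample runtimes of the A* algorithm deviate from a Gaussian distribution.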
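To make the omnibus test less of a black box, the sketch below rebuilds its statistic from its two components: the z-scores of `skewtest` and `kurtosistest` are squared and summed, and the p-value comes from a chi-squared distribution with 2 degrees of freedom. The sample is synthetic and generated only for illustration; it is not the measured runtime data.

```python
import numpy as np
from scipy.stats import chi2, kurtosistest, normaltest, skewtest

# Synthetic sample (assumption): stands in for a list of measured runtimes.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.4, scale=0.01, size=200)

# z-scores of the two component tests
z_skew, _ = skewtest(sample)
z_kurt, _ = kurtosistest(sample)

# Omnibus statistic: sum of squared z-scores, chi-squared with 2 degrees of freedom
k2_manual = z_skew**2 + z_kurt**2
p_manual = chi2.sf(k2_manual, df=2)

# Library result for comparison
stat, p = normaltest(sample)

print(f"manual:     K2={k2_manual:.3f}, p={p_manual:.3f}")
print(f"normaltest: K2={stat:.3f}, p={p:.3f}")
```

The two printed lines should agree, which shows that a large statistic can come from excess skew, excess kurtosis, or both.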

#### Since neither sample shows a significant departure from normality, we treat both as approximately Gaussian and use a parametric statistical test to compare them

### Comparison of the two distributions

#### We use Student's t-test to see whether the difference between the means of the two distributions is statistically significant

In [12]:
print("Mean of Dijkstra's sampled runtime values: ", np.mean(dijkstra_runtime))
print("Mean of A*'s sampled runtime values: ", np.mean(astar_runtime))

stat, p = ttest_ind(dijkstra_runtime, astar_runtime)
print('\nstat=%.3f, p-value=%.3f' % (stat, p))
# The t-test compares means, so the verdict is phrased in terms of means
if p > 0.05:
    print('No significant difference between the means (fail to reject H0)')
else:
    print('The means differ significantly (reject H0)')

Mean of Dijkstra's sampled runtime values:  0.3833887504577637
Mean of A*'s sampled runtime values:  0.6835552072525024


stat=-217.575, p-value=0.000
The means differ significantly (reject H0)
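Student's t-test assumes that both samples have equal variance, which this notebook does not check. As a hedged robustness check, the sketch below runs Welch's t-test (`ttest_ind` with `equal_var=False`) alongside the standard test and reports Cohen's d as an effect size. The data are synthetic: the means are chosen to be near the values printed above, while the spreads and sample sizes are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic stand-ins (assumption): spreads and sample sizes are invented.
rng = np.random.default_rng(1)
dijkstra_like = rng.normal(loc=0.38, scale=0.010, size=100)
astar_like = rng.normal(loc=0.68, scale=0.015, size=100)

# Student's t-test (equal variances assumed) and Welch's t-test (not assumed)
t_student, p_student = ttest_ind(dijkstra_like, astar_like)
t_welch, p_welch = ttest_ind(dijkstra_like, astar_like, equal_var=False)

# Cohen's d; this pooled-SD formula is valid because the sample sizes are equal
pooled_sd = np.sqrt((dijkstra_like.var(ddof=1) + astar_like.var(ddof=1)) / 2)
cohens_d = (dijkstra_like.mean() - astar_like.mean()) / pooled_sd

print(f"Student: t={t_student:.3f}, p={p_student:.3g}")
print(f"Welch:   t={t_welch:.3f}, p={p_welch:.3g}")
print(f"Cohen's d = {cohens_d:.2f}")
```

With equal sample sizes the two t statistics coincide and only the degrees of freedom differ. With unequal sizes and unequal variances, Welch's version is generally the safer choice.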
