3. Finding Best Routes (Q3)
Whenever you plan to fly to a specific city, your goal is to find the most efficient and fastest flight to reach your destination. In the system you are designing, the best route is defined as the one that minimizes the total distance flown to the greatest extent possible.

In this task, you need to implement a function that, given an origin and destination city, determines the best possible route between them. To simplify, the focus will be limited to flights operating on a specific day.
Note: Each city may have multiple airports; in such cases, the function should calculate the best route for every possible airport pair between the two cities. For example, if city $A$ has airports $a_1, a_2$ and city $B$ has $b_1, b_2$, the function should compute the best routes for $a_1 \to b_1$, $a_1 \to b_2$, $a_2 \to b_1$ and $a_2 \to b_2$. If it’s not possible to travel from one airport in the origin city to another airport in the destination city on that date, you must report it as well.

The function takes the following inputs:

- Flights network
- Origin city name
- Destination city name
- Considered Date (in yyyy-mm-dd format)

The function output:

- A table with three columns: 'Origin_city_airport', 'Destination_city_airport', and the 'Best_route'.

Note: In the "Best_route" column, we expect a list of airport names connected by →, showing the order in which they are to be visited during the optimal route. If no such route exists, the entry should display "No route found."
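To make the expected output concrete, here is a minimal sketch of how the airport pairs could be enumerated and turned into table rows. The names `build_route_table` and `find_path` are hypothetical (they are not part of `part5_func`), and the toy lookup dictionary only stands in for a real shortest-path search.

```python
from itertools import product

def build_route_table(origin_airports, destination_airports, find_path):
    """Return one row per (origin airport, destination airport) pair.

    find_path is a hypothetical callable that returns a list of airport
    codes, or None when the destination cannot be reached.
    """
    rows = []
    pairs = product(sorted(origin_airports), sorted(destination_airports))
    for origin, destination in pairs:
        path = find_path(origin, destination)
        rows.append({
            "Origin_city_airport": origin,
            "Destination_city_airport": destination,
            "Best_route": " → ".join(path) if path else "No route found.",
        })
    return rows

# Toy stand-in for a real shortest-path search (illustrative data only)
known_paths = {("a1", "b1"): ["a1", "x", "b1"], ("a2", "b2"): ["a2", "b2"]}
table = build_route_table(
    {"a1", "a2"}, {"b1", "b2"}, lambda o, d: known_paths.get((o, d))
)
for row in table:
    print(row)
```

Every pair gets a row, including the unreachable ones, so a missing connection is reported instead of silently dropped.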



In [1]:
import pandas as pd
df = pd.read_csv('FinalDatasetForReal.csv')

In [2]:
df = df.drop(['Unnamed: 0'], axis=1)

In [3]:
df.head()

Unnamed: 0,Origin_airport,Destination_airport,Origin_city,Destination_city,Passengers,Seats,Flights,Distance,Fly_date,Origin_population,Destination_population,Org_airport_lat,Org_airport_long,Dest_airport_lat,Dest_airport_long
0,MHK,AMW,"Manhattan, KS","Ames, IA",21,30,1,254,2008-10-01,122049,86219,39.140999,-96.670799,41.990311,-93.622154
1,EUG,RDM,"Eugene, OR","Bend, OR",41,396,22,103,1990-11-01,284093,76034,44.124599,-123.211998,44.254101,-121.150002
2,EUG,RDM,"Eugene, OR","Bend, OR",88,342,19,103,1990-12-01,284093,76034,44.124599,-123.211998,44.254101,-121.150002
3,EUG,RDM,"Eugene, OR","Bend, OR",11,72,4,103,1990-10-01,284093,76034,44.124599,-123.211998,44.254101,-121.150002
4,MFR,RDM,"Medford, OR","Bend, OR",0,18,1,156,1990-02-01,147300,76034,42.374199,-122.873001,44.254101,-121.150002


In [6]:
from part5_func import Graph_weights # same function we used for number 5 but with Weighted Edges and Directed Support


# Extract edges (origin, destination, weight) from the dataset
edges = [
    (row['Origin_airport'], row['Destination_airport'], row['Distance'])
    for _, row in df.iterrows()
    if not pd.isnull(row['Distance'])  # Ensure the distance is not null
]

airport_graph = Graph_weights(edges=edges, directed=True) # Initialization

# Print graph properties for verification
print(f"Number of nodes (airports): {len(airport_graph.nodes)}")
print(f"Number of edges (flights): {len(airport_graph.weights)}")

# Display a sample of edges with weights
print("Sample Edges with Weights:")
for edge, weight in list(airport_graph.weights.items())[:5]:
    print(f"{edge}: Distance = {weight}")


Number of nodes (airports): 727
Number of edges (flights): 36719
Sample Edges with Weights:
('MHK', 'AMW'): Distance = 254
('EUG', 'RDM'): Distance = 103
('MFR', 'RDM'): Distance = 156
('SEA', 'RDM'): Distance = 228
('PDX', 'RDM'): Distance = 116


While checking the first functions of the algorithm, I noticed a problem: some nodes appear as neighbors in the graph's adjacency list but are missing from the distances dictionary used while calculating betweenness centrality. This raises the following error:

```
Cell In[21], line 55
     52 weight = weights.get((current_node, neighbor), float('inf'))  # Use weight from dictionary
     53 distance = current_distance + weight
---> 55 if distance < distances[neighbor]:
     56     distances[neighbor] = distance
     57     previous_nodes[neighbor] = current_node

KeyError: 'ULS'
```
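A small reproduction of how this can happen, assuming (this is my guess, the traceback alone does not show it) that the distances dictionary is initialised from the adjacency list keys only. The toy graph below is illustrative and is not taken from the dataset.

```python
# Toy directed graph: 'ULS' only ever appears as a destination,
# so it is a neighbor but never a key of the adjacency list.
graph = {"SLC": {"DEN", "ULS"}, "DEN": {"SLC"}}

# Initialising from the keys alone leaves 'ULS' without an entry,
# so distances['ULS'] would raise KeyError.
distances = {node: float("inf") for node in graph}
all_neighbors = {n for neighbors in graph.values() for n in neighbors}
missing = all_neighbors - set(distances)
print(missing)  # {'ULS'}

# Initialising from keys plus neighbors covers every airport.
all_nodes = set(graph) | all_neighbors
distances = {node: float("inf") for node in all_nodes}
```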

In [26]:
# Validate graph consistency
dangling_nodes = set()

for node, neighbors in airport_graph.graph.items():
    for neighbor in neighbors:
        if neighbor not in airport_graph.graph:
            dangling_nodes.add(neighbor)

if dangling_nodes:
    print(f"[ERROR] Dangling nodes detected in the graph: {dangling_nodes}")
else:
    print("[DEBUG] Graph integrity verified. No dangling nodes found.")


[ERROR] Dangling nodes detected in the graph: {'MUT', 'RAC', 'CFV', 'MWC', 'NGP', 'UXJ', 'WVL', 'LJY', 'MEJ', 'DWH', 'FCM', 'TVI', 'IDI', 'AYS', 'SME', 'DTN', 'OGB', 'ESN', 'TDW', 'HUA', 'STK', 'NZJ', 'OLU', 'O85', 'JWN', 'FAM', 'MPS', 'SNL', 'RSN', 'RBL', 'PRZ', 'SNS', 'RVS', 'AWX', 'JCC', 'STE', 'XWL', 'ZXX', 'ULS', 'PHD', 'ARB', 'AL3', 'BYI'}


This confirms the hypothesis above. Next steps:
- Remove neighbors that don't exist in the graph
- Remove orphan nodes
- Remove edge weights associated with invalid nodes

In [27]:
def remove_dangling_nodes(graph, weights):
    """
    Remove dangling nodes from the graph and ensure all edges have valid references.

    Args:
        graph (dict): Adjacency list representation of the graph.
        weights (dict): Dictionary with edge weights.

    Returns:
        tuple: (cleaned_graph, cleaned_weights)
    """
    valid_nodes = set(graph.keys())  # Nodes present in the graph

    # Validate neighbors in the adjacency list
    for node in list(graph.keys()):
        graph[node] = {neighbor for neighbor in graph[node] if neighbor in valid_nodes}
        
        # If the node has no valid neighbors, remove it
        if not graph[node]:
            del graph[node]

    # Validate weights
    cleaned_weights = {
        (u, v): w for (u, v), w in weights.items()
        if u in graph and v in graph[u]
    }

    return graph, cleaned_weights


# Clean the graph and weights
airport_graph.graph, airport_graph.weights = remove_dangling_nodes(
    airport_graph.graph,
    airport_graph.weights
)

# Validate again
dangling_nodes = set()
for node, neighbors in airport_graph.graph.items():
    for neighbor in neighbors:
        if neighbor not in airport_graph.graph:
            dangling_nodes.add(neighbor)

if dangling_nodes:
    print(f"[ERROR] Still dangling nodes: {dangling_nodes}")
else:
    print("[DEBUG] Dangling nodes successfully removed. Graph is clean!")


[ERROR] Still dangling nodes: {'FVS', 'MIW'}


In [39]:
# Investigate dangling nodes
print("[DEBUG] Details about dangling nodes:")
for node in ['FVS', 'MIW']:
    if node in airport_graph.graph:
        print(f"Node: {node}")
        print(f"  Neighbors: {airport_graph.graph[node]}")
    else:
        print(f"Node: {node} is not in the graph but detected as dangling.")

# Check if these nodes exist in weights
for node in ['FVS', 'MIW']:
    for edge, weight in airport_graph.weights.items():
        if node in edge:
            print(f"Edge with dangling node found in weights: {edge} -> {weight}")

[DEBUG] Details about dangling nodes:
Node: FVS is not in the graph but detected as dangling.
Node: MIW is not in the graph but detected as dangling.


Sorry, I cleared part of the output here and did not keep a copy of the airport graph (I don't want to re-run everything), so I will just explain it.

The cell above confirmed that these nodes are not keys of the graph's adjacency list but still exist in the weights dictionary; the second part of the script returned the edges that contain our dangling nodes.
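A likely reason why a single cleaning pass leaves nodes such as 'FVS' and 'MIW' behind (an assumption, I did not verify it on the real graph): a node that loses all of its neighbors is deleted during the pass, but nodes cleaned earlier may still point to it. A minimal sketch of a variant that repeats the pruning until nothing changes; `prune_until_stable` is a hypothetical helper and the toy data is illustrative.

```python
def prune_until_stable(graph, weights):
    """Repeat the cleaning pass until no dangling reference is left."""
    graph = {node: set(neighbors) for node, neighbors in graph.items()}
    while True:
        # Keep only neighbors that are still keys, then drop empty nodes
        cleaned = {
            node: {n for n in neighbors if n in graph}
            for node, neighbors in graph.items()
        }
        cleaned = {node: nbrs for node, nbrs in cleaned.items() if nbrs}
        if cleaned == graph:
            break
        graph = cleaned
    weights = {
        (u, v): w for (u, v), w in weights.items()
        if u in graph and v in graph[u]
    }
    return graph, weights

# 'FVS' only points to an unknown node, so it is removed in the first
# pass; a second pass is needed to remove it from the neighbors of 'SLC'.
toy_graph = {"SLC": {"FVS", "DEN"}, "FVS": {"XXX"}, "DEN": {"SLC"}}
toy_weights = {("SLC", "FVS"): 1, ("SLC", "DEN"): 2,
               ("DEN", "SLC"): 2, ("FVS", "XXX"): 3}
clean_graph, clean_weights = prune_until_stable(toy_graph, toy_weights)
print(clean_graph)    # {'SLC': {'DEN'}, 'DEN': {'SLC'}}
print(clean_weights)  # {('SLC', 'DEN'): 2, ('DEN', 'SLC'): 2}
```

Note that this kind of pruning also discards airports that only receive flights. Keeping them as keys with an empty neighbor set would be an alternative that preserves them as destinations.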


In [40]:
# We can now remove the edges that reference the dangling nodes
invalid_edges = [('SLC', 'FVS'), ('SLN', 'MIW')]

for edge in invalid_edges:
    if edge in airport_graph.weights:
        del airport_graph.weights[edge]
    if (edge[1], edge[0]) in airport_graph.weights:
        del airport_graph.weights[(edge[1], edge[0])]

# Correctly remove

Double Check

In [41]:
# Final Validation of Graph and Weights
dangling_nodes = set()
for node, neighbors in airport_graph.graph.items():
    for neighbor in neighbors:
        if neighbor not in airport_graph.graph:
            dangling_nodes.add(neighbor)

# Verify weights consistency
invalid_weights = [
    edge for edge in airport_graph.weights.keys()
    if edge[0] not in airport_graph.graph or edge[1] not in airport_graph.graph
]

if dangling_nodes:
    print(f"[ERROR] Dangling nodes still exist: {dangling_nodes}")
elif invalid_weights:
    print(f"[ERROR] Invalid edges still exist in weights: {invalid_weights}")
else:
    print("[DEBUG] Graph and weights validation successful. No dangling nodes or invalid edges detected.")


[DEBUG] Graph and weights validation successful. No dangling nodes or invalid edges detected.


Initially I was working on the full dataset just to understand everything. Since from now on we only keep the data we need, doing all of this is not worthwhile; we handle the problem in a simpler but partial way (the second validation is not performed, and such nodes are simply skipped), which you can find in `dijkstra_adj_list_weighted`.
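For reference, a minimal sketch of what "skipping" can look like inside Dijkstra's algorithm. This is not the code of `dijkstra_adj_list_weighted` in `part5_func`; the function name `dijkstra_shortest_path` and the toy graph are assumptions made for illustration.

```python
import heapq

def dijkstra_shortest_path(graph, weights, source, target):
    """Return (distance, path), or (inf, None) when target is unreachable."""
    inf = float("inf")
    distances = {node: inf for node in graph}
    if source not in distances:
        return inf, None
    distances[source] = 0
    previous = {}
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            break
        if dist > distances[node]:
            continue  # stale queue entry
        for neighbor in graph[node]:
            if neighbor not in distances:
                continue  # dangling neighbor: skip instead of KeyError
            candidate = dist + weights.get((node, neighbor), inf)
            if candidate < distances[neighbor]:
                distances[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if distances.get(target, inf) == inf:
        return inf, None
    # Walk back from the target to rebuild the route
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return distances[target], path[::-1]

# Toy graph with made-up distances; 'X' is a dangling neighbor of 'C'
toy_graph = {"A": {"B", "C"}, "B": {"D"}, "C": {"D", "X"}, "D": set()}
toy_weights = {("A", "B"): 100, ("A", "C"): 30, ("B", "D"): 50,
               ("C", "D"): 60, ("C", "X"): 5}
print(dijkstra_shortest_path(toy_graph, toy_weights, "A", "D"))  # (90, ['A', 'C', 'D'])
print(dijkstra_shortest_path(toy_graph, toy_weights, "D", "A"))  # (inf, None)
```

The approach is partial because an airport that never appears as an origin can never be reached: in the toy graph, a route from 'A' to 'X' is reported as not found even though the edge 'C' → 'X' exists.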

### Back to the main part

We chose these two cities because the trip covers a large part of the country: Manhattan, KS is in the central United States, while Bend, OR is in the Pacific Northwest.

In [None]:
from part5_func import compute_best_routes_between_cities
best_routes = compute_best_routes_between_cities(
    df=df,
    origin_city='Manhattan, KS',
    destination_city='Bend, OR',
    flight_date='2008-10-01'
)

print("\n📝 [RESULT] Best Routes Between Cities:")
best_routes

[DEBUG] Filtered dataset contains 17272 flights on 2008-10-01.
[DEBUG] Generated 1 airport pairs to evaluate.

🔄 [DEBUG] Processing pair: MHK → RDM

🔄 [DEBUG] Reconstructing path from 'MHK' to 'RDM'
✅ [DEBUG] Path Found: MHK → MCI → SLC → RDM

📝 [RESULT] Best Routes Between Cities:


Unnamed: 0,Origin_city_airport,Destination_city_airport,Best_route,Total_distance
0,MHK,RDM,MHK → MCI → SLC → RDM,1550


In [104]:
from part5_func import plot_best_route_on_map

best_route_map = plot_best_route_on_map(df, best_routes)

map_file_path = 'best_route_map_only.html'
best_route_map.save(map_file_path)
map_file_path


'best_route_map_only.html'