# Introduction to Data Science – Networks (Path Search)
*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/* 

This is a continuation of how to work with graphs in Python using the [NetworkX](https://networkx.github.io) library. Here we focus on understanding path search algorithms.

In [None]:
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10, 6)
plt.style.use('ggplot')

We'll also import the Les Miserables network again.

In [None]:
# Read the graph file
lesmis = nx.read_gml('lesmis.gml')
# List the nodes
lesmis.nodes()

## Path Search

Path search, and in particular shortest path search, is an important problem. It answers questions such as:
 * how do I get from A to B as quickly as possible in a road network
 * how to best route a data packet that delivers the next second of your Netflix movie
 * who can I talk to in order to get an introduction to Person B
 * etc.
 
There are two major types of path search algorithms: 

1. Algorithms that operate only on the topology, i.e., only the "distance" is relevant
2. Algorithms that also consider edge weights, i.e., they minimize a "cost"

For the above scenarios, edge weights make a lot of sense: I might give a different weight to an edge that is an Interstate, for example, as I will be able to travel faster. 
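To make the difference between the two types concrete, here is a minimal sketch on a made-up road network. The node names, the travel times, and the helper functions `hops` and `cost` are all hypothetical and only serve as illustration.

```python
# Hypothetical toy road network: travel time in minutes per road segment.
travel_time = {
    ("A", "B"): 20,  # city street
    ("B", "D"): 20,  # city street
    ("A", "C"): 5,   # interstate
    ("C", "E"): 5,   # interstate
    ("E", "D"): 5,   # interstate
}

def hops(path):
    """Topological length: the number of edges on the path."""
    return len(path) - 1

def cost(path):
    """Weighted length: the sum of the edge weights along the path."""
    return sum(travel_time[(u, v)] for u, v in zip(path, path[1:]))

direct = ["A", "B", "D"]
detour = ["A", "C", "E", "D"]

print(hops(direct), cost(direct))  # 2 hops, 40 minutes
print(hops(detour), cost(detour))  # 3 hops, 15 minutes
```

An algorithm of the first type would prefer the route via B (fewer edges), while an algorithm of the second type would prefer the route via C and E (lower total travel time).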

![](bread.png)

### Breadth First Search

Breadth first search is a simple algorithm that solves the single-source shortest path problem on unweighted graphs, i.e., it calculates the shortest path (in number of edges) from one source to all other nodes in the network. 

The algorithm works as follows:

1. Label the source node 0.
2. Find its neighbors, label them 1, and put them in a queue.
3. Take the next node out of the queue and let n be its label (1 on the first pass). Find its unlabeled neighbors, label them n+1, and put them in the queue.
4. Repeat step 3 until the target node is found (if only one specific path is needed) or the queue is empty (when looking for the shortest paths to all nodes).
5. The distance between the start and end node is the label of the end node.
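The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the NetworkX implementation. The function name `bfs_distances` and the toy graph are made up for this example. Steps 1 and 2 are folded together by putting the source itself in the queue first.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Return {node: number of edges from source} for every reachable node."""
    distance = {source: 0}            # label the source node 0
    queue = deque([source])
    while queue:
        node = queue.popleft()        # take the oldest node out of the queue
        for neighbor in adjacency[node]:
            if neighbor not in distance:                 # still unlabeled
                distance[neighbor] = distance[node] + 1  # label it n + 1
                queue.append(neighbor)
    return distance

# Hypothetical toy graph as adjacency lists (undirected: each edge appears twice).
toy = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

print(bfs_distances(toy, "A"))  # {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 3}
```

Because the queue is first-in, first-out, all nodes with label n are processed before any node with label n+1, which is why the first label a node receives is already its shortest distance.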

Let's look at the path from Boulatruelle to Marius:

In [None]:
path = nx.shortest_path(lesmis,source="Boulatruelle",target="Marius")
path

And the path from Perpetue to Napoleon:

In [None]:
path = nx.shortest_path(lesmis,source="Perpetue",target="Napoleon")
path

### Dijkstra's Algorithm

[Dijkstra's algorithm](https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm) is the go-to algorithm for finding paths in a weighted graph.

Let the node at which we are starting be called the initial node. Let the distance of node Y be the distance from the initial node to Y. Dijkstra's algorithm will assign some initial distance values and will try to improve them step by step.
1. Assign to every node a tentative distance value: set it to zero for our initial node and to infinity for all other nodes.
2. Set the initial node as current. Mark all other nodes unvisited. Create a set of all the unvisited nodes called the unvisited set.
3. For the current node, consider all of its unvisited neighbors and calculate their tentative distances. Compare the newly calculated tentative distance to the current assigned value and assign the smaller one. For example, if the current node A is marked with a distance of 6, and the edge connecting it with a neighbor B has length 2, then the distance to B (through A) will be 6 + 2 = 8. If B was previously marked with a distance greater than 8 then change it to 8. Otherwise, keep the current value.
4. When we are done considering all of the neighbors of the current node, mark the current node as visited and remove it from the unvisited set. A visited node will never be checked again.
5. If the destination node has been marked visited (when planning a route between two specific nodes) or if the smallest tentative distance among the nodes in the unvisited set is infinity (when planning a complete traversal; occurs when there is no connection between the initial node and remaining unvisited nodes), then stop. The algorithm has finished.
6. Otherwise, select the unvisited node that is marked with the smallest tentative distance, set it as the new "current node", and go back to step 3.
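As a rough sketch, the procedure above could be written like this in plain Python. Note one deliberate substitution: instead of scanning an explicit unvisited set for the smallest tentative distance, this version uses a priority queue (`heapq`), which is a common way to implement the same selection step. The function name `dijkstra_distances` and the toy graph are hypothetical.

```python
import heapq

def dijkstra_distances(graph, source):
    """Return {node: smallest total edge weight from source} for reachable nodes."""
    distance = {source: 0}       # nodes missing from this dict count as infinity
    visited = set()
    heap = [(0, source)]         # entries are (tentative distance, node)
    while heap:
        # Select the unvisited node with the smallest tentative distance.
        dist, node = heapq.heappop(heap)
        if node in visited:
            continue             # outdated entry for an already visited node
        visited.add(node)
        for neighbor, weight in graph[node].items():
            candidate = dist + weight
            # Keep the smaller of the old and the newly calculated distance.
            if candidate < distance.get(neighbor, float("inf")):
                distance[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return distance

# Hypothetical weighted toy graph: graph[u][v] is the length of edge u-v.
toy = {
    "A": {"B": 6, "C": 1},
    "B": {"A": 6, "C": 3, "D": 2},
    "C": {"A": 1, "B": 3, "D": 8},
    "D": {"B": 2, "C": 8},
}

print(dijkstra_distances(toy, "A"))  # {'A': 0, 'B': 4, 'C': 1, 'D': 6}
```

In this toy graph the direct edge A-B has length 6, but going through C costs only 1 + 3 = 4, so the tentative distance of B is improved from 6 to 4 before B is visited.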

Here is an animation of Dijkstra's algorithm from Wikipedia (we'll go through this in class):

![](Dijkstra_Animation.gif)

Here is an illustration of Dijkstra's Algorithm for a motion planning task:

![](Dijkstras_progress_animation.gif)

Our Les Miserables dataset actually comes with edge weights. The weight describes the number of co-occurrences of the characters. Now, let's look at the values:

In [None]:
lesmis.edges(data=True)

We can draw the graph with these weights.

In [None]:
plt.rcParams['figure.figsize'] = (10, 15)

pos = nx.spring_layout(lesmis)

# Use edge weights as line widths
edge_widths = [1.0 * x[2]['value'] for x in lesmis.edges(data=True)]

nx.draw(lesmis, pos=pos)
nx.draw_networkx(lesmis, pos=pos, width=edge_widths)
plt.show()

That was nasty, let's try color.

In [None]:
plt.rcParams['figure.figsize'] = (10, 15)

pos = nx.spring_layout(lesmis)

# Map edge weights to edge colors (scaled by 31.0)
edge_colors = [ x[2]['value'] / 31.0 for x in lesmis.edges(data=True)]

nx.draw(lesmis, pos=pos)
nx.draw_networkx(lesmis, pos=pos, edge_color=edge_colors, width=2.0, edge_cmap=plt.cm.YlOrRd)
plt.show()

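First we run the algorithm without weights. By default, NetworkX looks for an edge attribute named `weight`; our edges store their weights under `value` instead, so every edge counts as 1: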

In [None]:
path = nx.dijkstra_path(lesmis, source="Perpetue", target="Napoleon")
path

And then we run it with the weights, to have a comparison:

In [None]:
weighted_path = nx.dijkstra_path(lesmis, source="Perpetue", target="Napoleon", weight="value")
weighted_path

We can calculate the total weight of each of these paths:

In [None]:
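def getPathCost(path):
    """Print each edge on the path with its attributes, then the summed weight."""
    length = len(path)
    weight = 0
    # Walk over consecutive node pairs, i.e., the edges of the path
    for i in range(length-1):
        attributes = lesmis[path[i]][path[i+1]]
        weight += attributes["value"]
        print(path[i], path[i+1], attributes)
    print("Weight:", weight)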
    
print("Shortest Path")
getPathCost(path)

print("\n ==== \n")

print("Weighted Path")    
getPathCost(weighted_path)


### The A* Algorithm - Path Finding using Heuristics

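Dijkstra's algorithm is a great general algorithm, but it can be slow because it explores outward from the source in all directions, regardless of where the target lies. 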

If we know more about the network we're working with, we can use a more efficient algorithm that takes this information into account. For example, in motion planning and in route planning on a map, we know where the target point is located spatially, relative to the source point. We can take this information into account by using a heuristic function to refine the search. 

The [A* algorithm](https://en.wikipedia.org/wiki/A*_search_algorithm) is such an algorithm. It's based on Dijkstra's algorithm, but uses a heuristic function to guide its search in the right direction. A* is an informed search algorithm, or a best-first search, meaning that it solves problems by searching among all possible paths to the solution (goal) for the one that incurs the smallest cost (least distance traveled, shortest time, etc.), and among these paths it first considers the ones that appear to lead most quickly to the solution. 

At each step of the algorithm, A* evaluates which path is the best one to extend next, based on the cost of the path so far plus the heuristic's estimate of the remaining cost to the goal.

See the following example:

![](Astar_progress_animation.gif)

While [NetworkX](https://networkx.readthedocs.io/en/stable/reference/algorithms.shortest_paths.html#module-networkx.algorithms.shortest_paths.astar) provides an implementation of the A* algorithm, we are not able to define a meaningful heuristic function for the Les Miserables graph, so we can't use it on this graph.
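To still give a feeling for how the heuristic enters the search, here is a minimal sketch of A* on a small grid, where a spatial heuristic is available. This is a plain Python illustration, not the NetworkX implementation. The grid, the walls, and the names `manhattan` and `astar_grid` are made up for this example. The heuristic is the Manhattan distance, which never overestimates the remaining number of steps on a grid with four-way movement.

```python
import heapq

def manhattan(a, b):
    """Heuristic: grid distance to the goal, ignoring walls."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def astar_grid(walls, size, start, goal):
    """Return a shortest path of (row, col) cells on a size x size grid, or None."""
    # Heap entries are (cost so far + heuristic estimate, cost so far, cell).
    heap = [(manhattan(start, goal), 0, start)]
    came_from = {start: None}
    best_cost = {start: 0}
    while heap:
        _, cost, cell = heapq.heappop(heap)
        if cell == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # outdated entry, a cheaper route to this cell was found
        row, col = cell
        for nxt in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if not inside or nxt in walls:
                continue
            new_cost = cost + 1  # every step on the grid costs 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = cell
                heapq.heappush(heap, (new_cost + manhattan(nxt, goal), new_cost, nxt))
    return None

# Hypothetical 4 x 4 grid with a wall in row 1 that leaves a gap at column 3.
walls = {(1, 0), (1, 1), (1, 2)}
path = astar_grid(walls, 4, (0, 0), (3, 0))
print(len(path) - 1)  # 9 steps
```

Setting the heuristic to zero everywhere would turn this sketch back into Dijkstra's algorithm with unit edge weights; the heuristic only changes the order in which cells are explored.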