# Intro to Data Structures and Algorithms 

[course link](https://learn.udacity.com/courses/ud513)

## Lesson 7. Case Studies in Algorithms

### Shortest Path Problem

The shortest path between two nodes is the path whose edge weights add up to the smallest total.

In an unweighted graph, the shortest path is simply the one with the fewest edges.

For an unweighted graph, this problem is solved by a breadth-first search: you start at one node and visit the closest nodes first, moving outward to more distant nodes until you find the one you are looking for.
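The breadth-first idea can be sketched as follows. This is a minimal illustration, not code from the course: the function name `bfs_shortest_path` and the example graph `unweighted_graph` are assumptions, and the graph is assumed to be a dictionary that maps each node to a list of its neighbors.

```python
from collections import deque


def bfs_shortest_path(graph, start_node, end_node):
    # Map each discovered node to the node it was reached from
    previous_nodes = {start_node: None}
    queue = deque([start_node])

    while queue:
        current_node = queue.popleft()
        if current_node == end_node:
            break
        for neighbor in graph[current_node]:
            # The first time a node is discovered is via a fewest-edges route
            if neighbor not in previous_nodes:
                previous_nodes[neighbor] = current_node
                queue.append(neighbor)

    # The end node was never discovered, so no path exists
    if end_node not in previous_nodes:
        return None

    # Walk back from the end node to the start node
    path = []
    node = end_node
    while node is not None:
        path.append(node)
        node = previous_nodes[node]
    path.reverse()
    return path


# Hypothetical example graph (not from the course material)
unweighted_graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'D'],
    'D': ['B', 'C', 'E'],
    'E': ['D'],
}

print(bfs_shortest_path(unweighted_graph, 'A', 'E'))  # ['A', 'B', 'D', 'E']
```

Because the queue processes nodes in order of discovery, every node is reached along a route with the fewest possible edges.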

To solve the shortest path problem for a graph with weighted edges, we can use Dijkstra's algorithm.

### Dijkstra's Algorithm 

Dijkstra's algorithm is a greedy algorithm that finds the shortest distance from a starting node to every other node in a graph with non-negative edge weights.

Greedy means that it always picks the best option available at the moment: the next node to visit is the unvisited node with the smallest known distance from the start.

A node's distance is the sum of the edge weights on a path between the starting node and that node.  
When Dijkstra's algorithm finishes, each node's distance equals the length of the shortest path to it.

In [1]:
import heapq


def dijkstra(graph, start_node, end_node):
    # Create a dictionary to store the distance to each node
    distances = {node: float('inf') for node in graph}
    distances[start_node] = 0

    # Create a dictionary to store the previous node in the shortest path
    previous_nodes = {node: None for node in graph}

    # Create a priority queue to store nodes that we haven't visited yet
    priority_queue = [(0, start_node)]

    while priority_queue:
        # Get the node with the smallest distance
        current_distance, current_node = heapq.heappop(priority_queue)

        # Skip stale queue entries: a shorter distance to this node
        # was recorded after this entry was pushed
        if current_distance > distances[current_node]:
            continue

        # Once the end node is popped its distance is final,
        # so we can stop searching
        if current_node == end_node:
            break

        # Check all of the neighbors of this node
        for neighbor, weight in graph[current_node].items():
            distance = current_distance + weight

            # If we've found a new shortest path to this neighbor,
            # update our records and add it to the queue
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                previous_nodes[neighbor] = current_node
                heapq.heappush(priority_queue, (distance, neighbor))

    # Build the path from the previous nodes dictionary
    path = []
    node = end_node
    while node is not None:
        path.append(node)
        node = previous_nodes[node]
    path.reverse()

    return distances[end_node], path


# Example usage
graph = {
    'A': {'B': 5, 'C': 1},
    'B': {'A': 5, 'C': 2, 'D': 1},
    'C': {'A': 1, 'B': 2, 'D': 4, 'E': 8},
    'D': {'B': 1, 'C': 4, 'E': 3, 'F': 6},
    'E': {'C': 8, 'D': 3},
    'F': {'D': 6}
}

start_node = 'A'
end_node = 'D'

dijkstra(graph, start_node, end_node)

(4, ['A', 'C', 'B', 'D'])

The algorithm works by maintaining a priority queue of nodes to visit, with the node with the smallest distance from the start node being visited first. The distances to each node are updated as the algorithm progresses, and the previous node in the shortest path is also recorded for each node.

Once the end node is reached, the algorithm stops and the shortest path is constructed by following the previous node pointers from the end node to the start node.

The time complexity of Dijkstra's algorithm depends on how the priority queue is implemented. With a plain array or list it is O(V^2), where V is the number of vertices in the graph.  
This is because every vertex is visited once, and each visit requires a linear scan over the vertices to find the one with the minimum distance.  
With a binary heap, as in the heapq code above, it is O((V + E) log V), where E is the number of edges, and with a Fibonacci heap it improves to O(E + V log V).
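To make the O(V^2) case concrete, here is a minimal sketch of Dijkstra's algorithm that scans for the closest node instead of using a heap. The name `dijkstra_array` is an assumption for illustration, and `weighted_graph` repeats the example graph from the cell above so that the block runs on its own. It returns the distances to all nodes rather than a single path.

```python
def dijkstra_array(graph, start_node):
    distances = {node: float('inf') for node in graph}
    distances[start_node] = 0
    unvisited = set(graph)

    while unvisited:
        # Linear scan for the closest unvisited node: O(V) per iteration
        current_node = min(unvisited, key=lambda node: distances[node])
        if distances[current_node] == float('inf'):
            break  # the remaining nodes are unreachable
        unvisited.remove(current_node)

        # Relax every edge leaving the current node
        for neighbor, weight in graph[current_node].items():
            distance = distances[current_node] + weight
            if distance < distances[neighbor]:
                distances[neighbor] = distance

    return distances


# Same weighted graph as in the example above
weighted_graph = {
    'A': {'B': 5, 'C': 1},
    'B': {'A': 5, 'C': 2, 'D': 1},
    'C': {'A': 1, 'B': 2, 'D': 4, 'E': 8},
    'D': {'B': 1, 'C': 4, 'E': 3, 'F': 6},
    'E': {'C': 8, 'D': 3},
    'F': {'D': 6}
}

print(dijkstra_array(weighted_graph, 'A'))
# {'A': 0, 'B': 3, 'C': 1, 'D': 4, 'E': 7, 'F': 10}
```

The outer loop runs at most V times and each `min` scan costs O(V), which is where the V^2 term comes from.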



### Knapsack Problem 

The Knapsack Problem is a classic optimization problem in computer science. The problem involves a knapsack (or backpack) with a limited weight capacity, and a set of items with their own weights and values.  

The goal is to determine the combination of items that can be placed in the knapsack such that the total value of the items is maximized, while also ensuring that the total weight of the items does not exceed the capacity of the knapsack.

A brute force solution tries every subset of the items, which takes O(2^n) time for n items. That is exponential time, which quickly becomes impractical as n grows.

In [2]:
# O(2^n) solution
def knapSack(W, wt, val, n):
    # Base case
    if n == 0 or W == 0:
        return 0
    
    # If weight of the nth item is more than Knapsack capacity W,
    # then this item cannot be included in the optimal solution
    if (wt[n-1] > W):
        return knapSack(W, wt, val, n-1)
    
    # Return the maximum of two cases:
    # (1) nth item included
    # (2) not included
    else:
        return max(val[n-1] + knapSack(W-wt[n-1], wt, val, n-1),
                   knapSack(W, wt, val, n-1))

# Example usage
val = [60, 100, 120]
wt = [10, 20, 30]
W = 50
n = len(val)

knapSack(W, wt, val, n)

220

### Dynamic Programming

With **dynamic programming** you can make a really complicated problem run much faster by breaking it into subproblems. 

A dynamic programming solution starts by solving the problem for a trivial case and storing the result in a lookup table, then uses the stored results to build up solutions to increasingly complex cases.

Another feature of a dynamic programming solution is an equation used at each step as you add complexity.   
The equation often combines values previously computed in the lookup table, sometimes with each other and sometimes with a new value you introduce (like the value of whatever object you are looking at).

As we add each new object, we reuse values already stored in the table, a technique called **memoization**. You compute and save the solutions to smaller problems once, so there is no need to recalculate them for more complex problems.
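One way to see memoization in action is to cache the results of the recursive knapsack function from the brute force section. The sketch below is an illustration under assumptions: the names `knapsack_memo` and `best` are hypothetical, and `functools.lru_cache` from the standard library plays the role of the lookup table.

```python
from functools import lru_cache


def knapsack_memo(capacity, weights, values):
    @lru_cache(maxsize=None)
    def best(n, remaining):
        # Best value using the first n items with `remaining` capacity
        if n == 0 or remaining == 0:
            return 0
        skip = best(n - 1, remaining)
        if weights[n - 1] > remaining:
            # The nth item does not fit, so it must be skipped
            return skip
        take = values[n - 1] + best(n - 1, remaining - weights[n - 1])
        return max(skip, take)

    return best(len(values), capacity)


print(knapsack_memo(50, [10, 20, 30], [60, 100, 120]))  # 220
```

Each distinct pair `(n, remaining)` is computed only once, so the number of computed subproblems is bounded by n times W instead of growing exponentially.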

If you want to use a dynamic programming approach, first ask yourself: can I break this problem up into subproblems whose solutions can be reused?  
If the answer is yes, the problem is a good candidate for a dynamic programming solution.


For the Knapsack problem:
- problem: find the max value for a weight limit
- subproblem: find the max value for some smaller weight limit
- base case (a subproblem so small that the answer is trivial to compute): the smallest computation (compute the values for one object)

In [3]:
# O(nW) solution using dynamic programming
def knapSack(W, wt, val, n):
    # Initialize a 2D table K with (n + 1) rows and (W + 1) columns of zeros
    K = [[0] * (W + 1) for _ in range(n + 1)]
    
    # Build table K[][] in bottom up manner
    for i in range(n + 1):
        for w in range(W + 1):
            # Base case
            if i == 0 or w == 0:
                K[i][w] = 0
            # If weight of the ith item is less than or equal to w
            elif wt[i-1] <= w:
                # Take the maximum of two cases:
                # (1) ith item included
                # (2) not included
                K[i][w] = max(val[i-1] + K[i-1][w-wt[i-1]], K[i-1][w])
            else:
                # If weight of the ith item is more than w,
                # then this item cannot be included in the optimal solution
                K[i][w] = K[i-1][w]
    
    # Return the maximum value that can be put in a knapsack of capacity W
    return K[n][W]


# Example usage
val = [60, 100, 120]
wt = [10, 20, 30]
W = 50
n = len(val)

knapSack(W, wt, val, n)

220
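The table-based solution returns only the maximum value. If the chosen items are needed as well, they can be recovered by walking back through the table. The following is a sketch under assumptions rather than part of the course code: `knapsack_items` is a hypothetical name, and it returns the item indices (0-based) alongside the value.

```python
def knapsack_items(capacity, weights, values):
    n = len(values)
    table = [[0] * (capacity + 1) for _ in range(n + 1)]

    # Fill the table bottom-up, same recurrence as above
    for i in range(1, n + 1):
        for w in range(1, capacity + 1):
            table[i][w] = table[i - 1][w]
            if weights[i - 1] <= w:
                candidate = values[i - 1] + table[i - 1][w - weights[i - 1]]
                table[i][w] = max(table[i][w], candidate)

    # Walk back from the last cell: if the value changed between
    # row i - 1 and row i, item i - 1 must have been taken
    chosen = []
    w = capacity
    for i in range(n, 0, -1):
        if table[i][w] != table[i - 1][w]:
            chosen.append(i - 1)
            w -= weights[i - 1]
    chosen.reverse()

    return table[n][capacity], chosen


print(knapsack_items(50, [10, 20, 30], [60, 100, 120]))  # (220, [1, 2])
```

For the example data, the result says that the items with weights 20 and 30 together reach the maximum value of 220.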

### Traveling Salesman Problem

The Traveling Salesman Problem (TSP) is a classic problem in computer science and mathematics. In this problem, we have a graph where all of the nodes represent cities and all the edges represent roads between them. The goal of the TSP is to find the shortest possible route that visits every city exactly once and returns to the starting city.

The TSP is an optimization problem, which means that we are looking for the best possible solution among all possible solutions.  
The TSP is a well-known example of an NP-hard problem: no known algorithm finds the optimal solution in polynomial time, so large instances are computationally difficult to solve exactly (NP stands for nondeterministic polynomial time).

The TSP has many practical applications, such as in logistics and transportation, where it is used to optimize delivery routes. It is also used in microchip design, DNA sequencing, and many other fields.

In [4]:
# O(n!) solution
import itertools

def tsp_brute_force(graph):
    # Generate all possible paths
    nodes = list(graph.keys())
    paths = itertools.permutations(nodes)
    
    # Find the shortest path
    shortest_path = None
    shortest_distance = float('inf')
    for path in paths:
        distance = 0
        for i in range(len(path)-1):
            for neighbor, d in graph[path[i]]:
                if neighbor == path[i+1]:
                    distance += d
        if distance < shortest_distance:
            shortest_path = path
            shortest_distance = distance
    
    return shortest_path, shortest_distance

# Example usage
graph = {
    'A': [('B', 10), ('C', 15), ('D', 20)],
    'B': [('A', 10), ('C', 35), ('D', 25)],
    'C': [('A', 15), ('B', 35), ('D', 30)],
    'D': [('A', 20), ('B', 25), ('C', 30)]
}
path, distance = tsp_brute_force(graph)
print(path, distance)

('C', 'A', 'B', 'D') 50


Using permutations can be very slow, especially for larger graphs. In fact, the time complexity of the brute force approach using permutations is O(n!), which quickly becomes infeasible for even moderately sized graphs.
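To get a feel for how quickly n! grows, the short sketch below counts the orderings that the brute force loop has to examine for a few graph sizes. The helper name `count_orderings` is an assumption for illustration.

```python
import math


def count_orderings(num_cities):
    # Number of permutations the brute force loop iterates over
    return math.factorial(num_cities)


for num_cities in (4, 10, 15, 20):
    print(num_cities, count_orderings(num_cities))
```

The four-city example above has only 24 orderings, while 10 cities already give 3,628,800.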

In [5]:
def tsp_nn(graph, start):
    # Find the shortest path using the nearest neighbor algorithm
    path = [start]
    visited = {start}
    while len(path) < len(graph):
        current = path[-1]
        next_node = None
        min_distance = float('inf')
        for neighbor, distance in graph[current]:
            if neighbor not in visited and distance < min_distance:
                next_node = neighbor
                min_distance = distance
        if next_node is None:
            break
        path.append(next_node)
        visited.add(next_node)
    
    # Calculate the distance of the path
    distance = 0
    for i in range(len(path)-1):
        for neighbor, d in graph[path[i]]:
            if neighbor == path[i+1]:
                distance += d
    
    return path, distance

# Example usage
graph = {
    'A': [('B', 10), ('C', 15), ('D', 20)],
    'B': [('A', 10), ('C', 35), ('D', 25)],
    'C': [('A', 15), ('B', 35), ('D', 30)],
    'D': [('A', 20), ('B', 25), ('C', 30)]
}
start = 'A'
path, distance = tsp_nn(graph, start)
print(path, distance)

['A', 'B', 'D', 'C'] 65


The code snippet above is an implementation of the nearest neighbor algorithm, which is a heuristic approach to solving the Traveling Salesman Problem. While it may not always give the optimal solution, it is much faster than the brute force approach and can often give a good approximation of the optimal solution.

The time complexity of the nearest neighbor algorithm is O(n^2), which is much faster than the brute force approach. 

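In this example, the brute force algorithm found the route ('C', 'A', 'B', 'D') with a distance of 50, while the nearest neighbor algorithm found ['A', 'B', 'D', 'C'] with a distance of 65. The comparison is not quite like for like: both snippets sum only the legs between consecutive cities and leave out the return to the starting city, and the brute force search is free to start from any city while the nearest neighbor run is fixed to start at 'A'. In general, the nearest neighbor algorithm is not guaranteed to find the optimal solution, but it is much faster than the brute force algorithm and can serve as an approximation for large values of n.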
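To score routes as closed tours, as in the problem statement, the return leg to the starting city can be added to the total. The sketch below is one possible way to do that, not the course's code: `tour_length` and `tsp_closed_brute_force` are hypothetical names, `cities` repeats the example graph, and the graph is assumed to be complete (every pair of cities is connected).

```python
import itertools


def tour_length(graph, path):
    # Sum every leg of the route, including the return to the first city.
    # Assumes a complete graph; a missing edge raises KeyError.
    total = 0
    for current, following in zip(path, path[1:] + path[:1]):
        total += dict(graph[current])[following]
    return total


def tsp_closed_brute_force(graph, start):
    # Fix the start city: a closed tour has the same length
    # no matter which of its cities is listed first
    others = [node for node in graph if node != start]
    best_path, best_distance = None, float('inf')
    for order in itertools.permutations(others):
        path = (start,) + order
        distance = tour_length(graph, path)
        if distance < best_distance:
            best_path, best_distance = path, distance
    return best_path, best_distance


# Same graph as in the examples above
cities = {
    'A': [('B', 10), ('C', 15), ('D', 20)],
    'B': [('A', 10), ('C', 35), ('D', 25)],
    'C': [('A', 15), ('B', 35), ('D', 30)],
    'D': [('A', 20), ('B', 25), ('C', 30)]
}

print(tsp_closed_brute_force(cities, 'A'))        # (('A', 'B', 'D', 'C'), 80)
print(tour_length(cities, ['A', 'B', 'D', 'C']))  # 80
```

Measured as closed tours, the two routes found earlier turn out to be the same cycle with a length of 80, so on this small graph the nearest neighbor route from 'A' happens to be optimal. That is a property of this example, not a guarantee.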