# Intro to Data Structures and Algorithms 

[course link](https://learn.udacity.com/courses/ud513)

## Lesson 6. Graphs

### Graphs Introduction

Graph - a data structure designed to show relationships between objects. 

A graph is sometimes called a network. 

A node of a graph is called a **vertex**.

Connections between nodes are called **edges**. Edges can store data too, usually data about the strength of a connection. 

A tree is a specific type of graph. Unlike a tree, a general graph has no root node.  

### Directions and Cycles

Edges of a graph can have a **direction**, meaning the relationship between two nodes applies only one way and not the other. 

A **directed graph** is a graph whose edges have a sense of direction. 

An **undirected graph** has edges with no sense of direction. 

A graph can have cycles, but trees can't.  

A **cycle** happens in a graph when you start at one node and follow edges all the way back to that node. 

An **acyclic graph** is a graph that has no cycles. 

DAG - directed acyclic graph, that is, a directed graph with no cycles. 
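
To make the idea of a cycle concrete, here is a minimal sketch that checks whether a directed graph is a DAG. The function name `has_cycle` and the dictionary-based input format are assumptions made for this example, not something taken from the course. A node is "in progress" while its outgoing edges are still being followed; reaching an in-progress node again means we followed edges all the way back to it.

```python
def has_cycle(adjacency):
    """Return True if a directed graph contains a cycle.

    `adjacency` is assumed to be a dict mapping each node
    to a list of the nodes its outgoing edges point to.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNSEEN for node in adjacency}

    def visit(node):
        state[node] = IN_PROGRESS
        for neighbor in adjacency.get(node, []):
            neighbor_state = state.get(neighbor, UNSEEN)
            # Reaching a node that is still being explored closes a cycle.
            if neighbor_state == IN_PROGRESS:
                return True
            if neighbor_state == UNSEEN and visit(neighbor):
                return True
        state[node] = DONE
        return False

    return any(state[node] == UNSEEN and visit(node) for node in adjacency)


dag = {0: [1, 2], 1: [2], 2: []}      # no way back to a node: acyclic
ring = {0: [1], 1: [2], 2: [0]}       # 0 -> 1 -> 2 -> 0 is a cycle
print(has_cycle(dag), has_cycle(ring))
```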

### Connectivity 

A **disconnected graph** has some vertex that can't be reached from the other vertices. It might have one vertex off to the side with no edges. It could also have two so-called connected components, which are connected graphs on their own but have no connection between them. 

A **connected graph** has no disconnected vertices.  
In a connected graph, there is some path between every pair of vertices.
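
As an illustration, the sketch below groups the vertices of an undirected graph into connected components; a graph is connected exactly when there is a single component. The function name `connected_components` and the input format (a list where position `i` holds the neighbors of vertex `i`) are assumptions chosen for this example.

```python
def connected_components(adjacency_list):
    """Return the connected components of an undirected graph.

    `adjacency_list[i]` is assumed to list the neighbors of vertex i.
    """
    seen = set()
    components = []
    for start in range(len(adjacency_list)):
        if start in seen:
            continue
        # Collect everything reachable from `start`.
        component = []
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency_list[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


# Vertices 0-1-2 form one component, 3-4 another, and 5 has no edges.
graph = [[1], [0, 2], [1], [4], [3], []]
print(connected_components(graph))  # [[0, 1, 2], [3, 4], [5]]
```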

**Graph connectivity** - the minimum number of elements (vertices or edges) that need to be removed for a graph to become disconnected. You can use connectivity to compare how robust two graphs are: the one that needs more removals is the stronger one. 

A directed graph is said to be **strongly connected** if there is a directed path from every vertex to every other vertex in the graph. In other words, we can reach any vertex in the graph from any other vertex by following the directed edges.

A **weakly connected** graph is a graph where there is at least one path between any two vertices, but those paths may not necessarily be directed. In other words, if we ignore the direction of the edges in a directed graph, we can obtain an undirected graph. If that undirected graph is connected, then the original directed graph is weakly connected.
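
The two definitions can be checked directly, as in the following sketch. The helper names (`reachable`, `is_strongly_connected`, `is_weakly_connected`) are made up for this example, and the input is assumed to be a dict in which every vertex appears as a key mapped to the vertices its outgoing edges point to. The strong check is a deliberately simple brute-force version that repeats a search from every vertex.

```python
def reachable(adjacency, start):
    """Return the set of vertices reachable from `start`."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def is_strongly_connected(adjacency):
    """Every vertex must reach every other vertex along directed edges."""
    everything = set(adjacency)
    return all(reachable(adjacency, node) == everything for node in adjacency)


def is_weakly_connected(adjacency):
    """Ignore edge directions, then test ordinary connectivity."""
    undirected = {node: set(neighbors) for node, neighbors in adjacency.items()}
    for node, neighbors in adjacency.items():
        for neighbor in neighbors:
            undirected[neighbor].add(node)  # add the reverse direction
    if not undirected:
        return True
    start = next(iter(undirected))
    return reachable(undirected, start) == set(undirected)


chain = {0: [1], 1: [2], 2: []}   # 0 -> 1 -> 2, no way back
ring = {0: [1], 1: [2], 2: [0]}   # 0 -> 1 -> 2 -> 0
print(is_strongly_connected(chain), is_weakly_connected(chain))  # False True
print(is_strongly_connected(ring), is_weakly_connected(ring))    # True True
```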



### Graph Representations

If you are using OOP, you can build `Vertex` and `Edge` objects to represent a graph. 

You can also represent a graph with plain lists. 

In [2]:
# edge list graph representation example
edge_list_graph = [[0, 1], [1, 2],
                   [1, 3], [2, 3]]

In [5]:
# adjacency list graph representation example
adjacency_list_graph = [[1], [0, 2, 3],
                        [1, 3], [1, 2]]
# here the index in the outer list is the id number of a vertex in the graph

In [6]:
# another way to represent a graph is an adjacency matrix
adjacency_matrix_graph = [[0, 1, 0, 0],
                          [1, 0, 1, 1],
                          [0, 1, 0, 1],
                          [0, 1, 1, 0]]
# each node id has its own slot in each subarray,
# where the value acts like a flag
# to show whether this node is connected to that other node or not

Which representation you use depends on what makes the most sense for you and which operations you'll be performing most often.  
If you are looking for the number of edges connected to a particular node, the adjacency list will probably be the fastest, because the answer is simply the length of that node's list. 
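
To compare the three representations side by side, the sketch below rebuilds the adjacency list and the adjacency matrix from the same edge list used above, then counts the edges touching vertex 1 in each form. The helper names are assumptions made for this example, and the graph is treated as undirected.

```python
edge_list = [[0, 1], [1, 2], [1, 3], [2, 3]]


def to_adjacency_list(edges, vertex_count):
    """Build an undirected adjacency list from an edge list."""
    adjacency = [[] for _ in range(vertex_count)]
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency


def to_adjacency_matrix(edges, vertex_count):
    """Build an undirected 0/1 adjacency matrix from an edge list."""
    matrix = [[0] * vertex_count for _ in range(vertex_count)]
    for a, b in edges:
        matrix[a][b] = 1
        matrix[b][a] = 1
    return matrix


adjacency_list = to_adjacency_list(edge_list, 4)
adjacency_matrix = to_adjacency_matrix(edge_list, 4)

# Degree of vertex 1 in each representation.
degree_from_list = len(adjacency_list[1])                    # one length lookup
degree_from_matrix = sum(adjacency_matrix[1])                # scans one row
degree_from_edges = sum(1 for a, b in edge_list if 1 in (a, b))  # scans every edge

print(adjacency_list)    # [[1], [0, 2, 3], [1, 3], [1, 2]]
print(degree_from_list, degree_from_matrix, degree_from_edges)  # 3 3 3
```

The adjacency list answers the degree question without scanning anything, while the matrix has to read a whole row and the edge list has to read every edge.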

#### Task 1.



In [7]:
class Node:
    def __init__(self, value):
        self.value = value
        self.edges = []

class Edge:
    def __init__(self, value, node_from, node_to):
        self.value = value
        self.node_from = node_from
        self.node_to = node_to

class Graph:
    def __init__(self, nodes=None, edges=None):
        self.nodes = nodes or []
        self.edges = edges or []

    def insert_node(self, new_node_val):
        new_node = Node(new_node_val)
        self.nodes.append(new_node)
        
    def insert_edge(self, new_edge_val, node_from_val, node_to_val):
        # Find nodes connected to the edge
        from_found = None
        to_found = None
        for node in self.nodes:
            if node_from_val == node.value:
                from_found = node
            if node_to_val == node.value:
                to_found = node
        # If one or both nodes were not found, create new nodes
        if from_found is None:
            from_found = Node(node_from_val)
            self.nodes.append(from_found)
        if to_found is None:
            to_found = Node(node_to_val)
            self.nodes.append(to_found)
        # Create new edge
        new_edge = Edge(new_edge_val, from_found, to_found)
        from_found.edges.append(new_edge)
        to_found.edges.append(new_edge)
        self.edges.append(new_edge)

    def get_edge_list(self):
        """
        Returns a list of tuples representing edges as
        (Edge Value, From Node Value, To Node Value)
        """
        return [(edge.value, edge.node_from.value, edge.node_to.value)
                for edge in self.edges]

    def get_adjacency_list(self):
        """
        Returns a list of lists representing adjacency list.
        The index of the outer list represents "from" nodes,
        each element in the outer list is a list of tuples.
        Each tuple represents (To Node, Edge Value).
        """
        max_index = self.find_max_index()
        adjacency_list = [None] * (max_index + 1)
        for edge in self.edges:
            if adjacency_list[edge.node_from.value]:
                adjacency_list[edge.node_from.value].append((edge.node_to.value, edge.value))
            else:
                adjacency_list[edge.node_from.value] = [(edge.node_to.value, edge.value)]
        return adjacency_list
    
    def find_max_index(self):
        max_index = -1
        if len(self.nodes):
            for node in self.nodes:
                if node.value > max_index:
                    max_index = node.value
        return max_index
    
    def get_adjacency_matrix(self):
        """
        Returns a matrix, or 2D list representing adjacency matrix.
        Row numbers represent from nodes, column numbers represent to nodes.
        Store the edge values in each spot, and a 0 if no edge exists.
        """
        max_index = self.find_max_index()
        adjacency_matrix = [[0] * (max_index + 1) for _ in range(max_index + 1)]
        for edge in self.edges:
            adjacency_matrix[edge.node_from.value][edge.node_to.value] = edge.value
        return adjacency_matrix

    
graph = Graph()
graph.insert_edge(100, 1, 2)
graph.insert_edge(101, 1, 3)
graph.insert_edge(102, 1, 4)
graph.insert_edge(103, 3, 4)
# Should be [(100, 1, 2), (101, 1, 3), (102, 1, 4), (103, 3, 4)]
print(graph.get_edge_list())
# Should be [None, [(2, 100), (3, 101), (4, 102)], None, [(4, 103)], None]
print(graph.get_adjacency_list())
# Should be [[0, 0, 0, 0, 0], [0, 0, 100, 101, 102], [0, 0, 0, 0, 0], [0, 0, 0, 0, 103], [0, 0, 0, 0, 0]]
print(graph.get_adjacency_matrix())

[(100, 1, 2), (101, 1, 3), (102, 1, 4), (103, 3, 4)]
[None, [(2, 100), (3, 101), (4, 102)], None, [(4, 103)], None]
[[0, 0, 0, 0, 0], [0, 0, 100, 101, 102], [0, 0, 0, 0, 0], [0, 0, 0, 0, 103], [0, 0, 0, 0, 0]]


### Graph Traversal

There are two basic methods of graph traversal:
- a depth first search (DFS) where we follow one path as far as it'll go
- a breadth first search (BFS) where we look at all the nodes adjacent to one before moving on to the next level

Graph traversal and graph search are basically the same, except that a search exits as soon as it finds the element it was looking for, while a traversal explores the whole graph. 

### DFS

DFS implementation:
- there is no obvious place to start
- you can begin with any node
- mark the first node you selected as seen
- a common implementation of DFS is using a stack, so we can store the node we just saw on the stack
- next you pick an edge, follow it and mark that node as seen and add it to the stack
- as long as there are more edges and more unseen nodes repeat the process
- when you do hit a node that you've seen before, just go back to the previous node and try another edge
- if you run out of edges with new nodes, you pop the current node from the stack and go back to the one before it, which is just the next one on the stack 
- you continue this approach until you've popped everything off the stack or you find the node you were originally looking for
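
The steps above can be sketched as follows. This is one possible reading of them, not the course's reference solution: the name `dfs_iterative`, the optional `target` parameter, and the adjacency-list input (position `i` holds the neighbors of vertex `i`) are assumptions for this example. Each stack entry keeps an iterator over the node's neighbors, so that going back to a node resumes with its next untried edge.

```python
def dfs_iterative(adjacency_list, start, target=None):
    """Return nodes in the order DFS first sees them.

    Stops early if `target` is given and found.
    """
    if start == target:
        return [start]
    seen = {start}
    order = [start]
    # The stack holds the current path: (node, iterator over its neighbors).
    stack = [(start, iter(adjacency_list[start]))]
    while stack:
        node, neighbors = stack[-1]
        for neighbor in neighbors:
            if neighbor not in seen:
                # Follow the edge, mark the new node as seen, push it.
                seen.add(neighbor)
                order.append(neighbor)
                if neighbor == target:
                    return order
                stack.append((neighbor, iter(adjacency_list[neighbor])))
                break
        else:
            # No edges to unseen nodes are left: go back to the previous node.
            stack.pop()
    return order


graph = [[1], [0, 2, 3], [1, 3], [1, 2]]
print(dfs_iterative(graph, 0))            # [0, 1, 2, 3]
print(dfs_iterative(graph, 0, target=2))  # [0, 1, 2]
```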

There is another common implementation of DFS that uses recursion and no stack:
- you just repeat the same process of picking an edge and marking a node as seen until you run out of new nodes to explore 
- that becomes the base case, and you move back to the last level of recursion, which happens to be the previous node in the search
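
A minimal recursive sketch of the same idea is shown below. The name `dfs_recursive` and the shared `seen` set are assumptions for this example; the call stack plays the role of the explicit stack. Note that Python limits recursion depth, so this form is best suited to small graphs.

```python
def dfs_recursive(adjacency_list, node, seen=None):
    """Return nodes in DFS order, starting from `node`."""
    if seen is None:
        seen = set()
    seen.add(node)
    order = [node]
    for neighbor in adjacency_list[node]:
        if neighbor not in seen:
            # Recurse into the unseen neighbor before trying the next edge.
            order.extend(dfs_recursive(adjacency_list, neighbor, seen))
    # Base case: no unseen neighbors remain, so control returns to the
    # previous level of recursion, which is the previous node in the search.
    return order


graph = [[1, 2], [0, 3], [0], [1]]
print(dfs_recursive(graph, 0))  # [0, 1, 3, 2]
```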

The time complexity of DFS, or Depth First Search, is often written as O(V + E), which reads as "the number of vertices plus the number of edges". Every vertex is marked as seen once and every edge is followed a constant number of times, so the time it takes to complete DFS is proportional to the number of vertices and edges in the graph.

### BFS

BFS implementation:
- you visit every edge and mark every node as seen
- you search every edge of one node before continuing on through the graph
- here we use a queue data structure instead of a stack
- you add seen nodes to the queue
- when we run out of edges, we dequeue a node from the queue and use that as our next starting place
- you continue the approach until you find the element you were looking for or you visited (marked as seen) all the nodes in the graph

The time complexity of BFS, or Breadth First Search, is O(V + E).  
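
Here is a short sketch of BFS on a plain adjacency list, assuming position `i` holds the neighbors of vertex `i`. The function name `bfs_order` and the optional `target` parameter are made up for this example. It uses `collections.deque` from the standard library so that removing the oldest node from the queue takes constant time.

```python
from collections import deque


def bfs_order(adjacency_list, start, target=None):
    """Return nodes in the order BFS visits them.

    Stops early if `target` is given and found.
    """
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()  # dequeue the oldest seen node
        order.append(node)
        if node == target:
            return order
        # Look at every edge of this node before moving on.
        for neighbor in adjacency_list[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order


graph = [[1, 2], [0, 3], [0], [1]]
print(bfs_order(graph, 0))            # [0, 1, 2, 3]
print(bfs_order(graph, 0, target=2))  # [0, 1, 2]
```

On this graph BFS reaches vertex 2 before vertex 3, because both neighbors of vertex 0 are handled before anything one level further away.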

#### Task 2. 

In [14]:
class Node(object):
    def __init__(self, value):
        self.value = value
        self.edges = []
        self.visited = False

class Edge(object):
    def __init__(self, value, node_from, node_to):
        self.value = value
        self.node_from = node_from
        self.node_to = node_to

# You only need to change code with docstrings that have TODO.
# Specifically: Graph.dfs_helper and Graph.bfs
# New methods have been added to associate node numbers with names
# Specifically: Graph.set_node_names
# and the methods ending in "_names" which will print names instead
# of node numbers

class Graph(object):
    def __init__(self, nodes=None, edges=None):
        self.nodes = nodes or []
        self.edges = edges or []
        self.node_names = []
        self._node_map = {}

    def set_node_names(self, names):
        """The Nth name in names should correspond to node number N.
        Node numbers are 0 based (starting at 0).
        """
        self.node_names = list(names)

    def insert_node(self, new_node_val):
        "Insert a new node with value new_node_val"
        new_node = Node(new_node_val)
        self.nodes.append(new_node)
        self._node_map[new_node_val] = new_node
        return new_node

    def insert_edge(self, new_edge_val, node_from_val, node_to_val):
        "Insert a new edge, creating new nodes if necessary"
        nodes = {node_from_val: None, node_to_val: None}
        for node in self.nodes:
            if node.value in nodes:
                nodes[node.value] = node
                if all(nodes.values()):
                    break
        for node_val in nodes:
            nodes[node_val] = nodes[node_val] or self.insert_node(node_val)
        node_from = nodes[node_from_val]
        node_to = nodes[node_to_val]
        new_edge = Edge(new_edge_val, node_from, node_to)
        node_from.edges.append(new_edge)
        node_to.edges.append(new_edge)
        self.edges.append(new_edge)

    def get_edge_list(self):
        """Return a list of triples that looks like this:
        (Edge Value, From Node, To Node)"""
        return [(e.value, e.node_from.value, e.node_to.value)
                for e in self.edges]

    def get_edge_list_names(self):
        """Return a list of triples that looks like this:
        (Edge Value, From Node Name, To Node Name)"""
        return [(edge.value,
                 self.node_names[edge.node_from.value],
                 self.node_names[edge.node_to.value])
                for edge in self.edges]

    def get_adjacency_list(self):
        """Return a list of lists.
        The indices of the outer list represent "from" nodes.
        Each section in the list will store a list
        of tuples that looks like this:
        (To Node, Edge Value)"""
        max_index = self.find_max_index()
        adjacency_list = [[] for _ in range(max_index)]
        for edg in self.edges:
            from_value, to_value = edg.node_from.value, edg.node_to.value
            adjacency_list[from_value].append((to_value, edg.value))
        return [a or None for a in adjacency_list] # replace []'s with None

    def get_adjacency_list_names(self):
        """Each section in the list will store a list
        of tuples that looks like this:
        (To Node Name, Edge Value).
        Node names should come from the names set
        with set_node_names."""
        adjacency_list = self.get_adjacency_list()
        def convert_to_names(pair, graph=self):
            node_number, value = pair
            return (graph.node_names[node_number], value)
        def map_conversion(adjacency_list_for_node):
            if adjacency_list_for_node is None:
                return None
            return map(convert_to_names, adjacency_list_for_node)
        return [map_conversion(adjacency_list_for_node)
                for adjacency_list_for_node in adjacency_list]

    def get_adjacency_matrix(self):
        """Return a matrix, or 2D list.
        Row numbers represent from nodes,
        column numbers represent to nodes.
        Store the edge values in each spot,
        and a 0 if no edge exists."""
        max_index = self.find_max_index()
        adjacency_matrix = [[0] * (max_index) for _ in range(max_index)]
        for edg in self.edges:
            from_index, to_index = edg.node_from.value, edg.node_to.value
            adjacency_matrix[from_index][to_index] = edg.value
        return adjacency_matrix

    def find_max_index(self):
        """Return the number of slots needed to index every node:
        the length of the node names if set with set_node_names(),
        otherwise the highest found node number plus one
        (node numbers are 0 based)."""
        if len(self.node_names) > 0:
            return len(self.node_names)
        max_index = -1
        for node in self.nodes:
            if node.value > max_index:
                max_index = node.value
        # +1 so that range(max_index) also covers the highest node number
        return max_index + 1

    def find_node(self, node_number):
        "Return the node with value node_number or None"
        return self._node_map.get(node_number)
    
    def _clear_visited(self):
        for node in self.nodes:
            node.visited = False

    def dfs_helper(self, start_node):
        """TODO: Write the helper function for a recursive implementation
        of Depth First Search iterating through a node's edges. The
        output should be a list of numbers corresponding to the
        values of the traversed nodes.
        ARGUMENTS: start_node is the starting Node
        MODIFIES: the value of the visited property of nodes in self.nodes 
        RETURN: a list of the traversed node values (integers).
        """
        ret_list = [start_node.value]
        # Your code here
        start_node.visited = True
        adjacent_edges = [e for e in start_node.edges \
                            if e.node_to.value != start_node.value]
        for edge in adjacent_edges:
            if not edge.node_to.visited:
                ret_list.extend(self.dfs_helper(edge.node_to))
        return ret_list

    def dfs(self, start_node_num):
        """Outputs a list of numbers corresponding to the traversed nodes
        in a Depth First Search.
        ARGUMENTS: start_node_num is the starting node number (integer)
        MODIFIES: the value of the visited property of nodes in self.nodes
        RETURN: a list of the node values (integers)."""
        self._clear_visited()
        start_node = self.find_node(start_node_num)
        return self.dfs_helper(start_node)

    def dfs_names(self, start_node_num):
        """Return the results of dfs with numbers converted to names."""
        return [self.node_names[num] for num in self.dfs(start_node_num)]

    def bfs(self, start_node_num):
        """TODO: Create an iterative implementation of Breadth First Search
        iterating through a node's edges. The output should be a list of
        numbers corresponding to the traversed nodes.
        ARGUMENTS: start_node_num is the node number (integer)
        MODIFIES: the value of the visited property of nodes in self.nodes
        RETURN: a list of the node values (integers)."""
        node = self.find_node(start_node_num)
        self._clear_visited()
        ret_list = []
        # Your code here
        node.visited = True
        q = [node]  # a plain list used as a FIFO queue: pop(0) removes the oldest node (O(n) per call)
        while q: # not empty
            node = q.pop(0)
            ret_list.append(node.value)
            for edge in node.edges:
                if not edge.node_to.visited:
                    edge.node_to.visited = True
                    q.append(edge.node_to)
        return ret_list

    def bfs_names(self, start_node_num):
        """Return the results of bfs with numbers converted to names."""
        return [self.node_names[num] for num in self.bfs(start_node_num)]

graph = Graph()

# You do not need to change anything below this line.
# You only need to implement Graph.dfs_helper and Graph.bfs

graph.set_node_names(('Mountain View',   # 0
                      'San Francisco',   # 1
                      'London',          # 2
                      'Shanghai',        # 3
                      'Berlin',          # 4
                      'Sao Paolo',       # 5
                      'Bangalore'))      # 6 

graph.insert_edge(51, 0, 1)     # MV <-> SF
graph.insert_edge(51, 1, 0)     # SF <-> MV
graph.insert_edge(9950, 0, 3)   # MV <-> Shanghai
graph.insert_edge(9950, 3, 0)   # Shanghai <-> MV
graph.insert_edge(10375, 0, 5)  # MV <-> Sao Paolo
graph.insert_edge(10375, 5, 0)  # Sao Paolo <-> MV
graph.insert_edge(9900, 1, 3)   # SF <-> Shanghai
graph.insert_edge(9900, 3, 1)   # Shanghai <-> SF
graph.insert_edge(9130, 1, 4)   # SF <-> Berlin
graph.insert_edge(9130, 4, 1)   # Berlin <-> SF
graph.insert_edge(9217, 2, 3)   # London <-> Shanghai
graph.insert_edge(9217, 3, 2)   # Shanghai <-> London
graph.insert_edge(932, 2, 4)    # London <-> Berlin
graph.insert_edge(932, 4, 2)    # Berlin <-> London
graph.insert_edge(9471, 2, 5)   # London <-> Sao Paolo
graph.insert_edge(9471, 5, 2)   # Sao Paolo <-> London
# (6) 'Bangalore' is intentionally disconnected (no edges)
# for this problem and should produce None in the
# Adjacency List, etc.

import pprint
pp = pprint.PrettyPrinter(indent=2)

print("Edge List")
pp.pprint(graph.get_edge_list_names())

print("\nAdjacency List")
pp.pprint(graph.get_adjacency_list_names())

print("\nAdjacency Matrix")
pp.pprint(graph.get_adjacency_matrix())

print("\nDepth First Search")
pp.pprint(graph.dfs_names(2))

# Should print:
# Depth First Search
# ['London', 'Shanghai', 'Mountain View', 'San Francisco', 'Berlin', 'Sao Paolo']

print("\nBreadth First Search")
pp.pprint(graph.bfs_names(2))
# test error reporting
# pp.pprint(['Sao Paolo', 'Mountain View', 'San Francisco', 'London', 'Shanghai', 'Berlin'])

# Should print:
# Breadth First Search
# ['London', 'Shanghai', 'Berlin', 'Sao Paolo', 'Mountain View', 'San Francisco']

Edge List
[ (51, 'Mountain View', 'San Francisco'),
  (51, 'San Francisco', 'Mountain View'),
  (9950, 'Mountain View', 'Shanghai'),
  (9950, 'Shanghai', 'Mountain View'),
  (10375, 'Mountain View', 'Sao Paolo'),
  (10375, 'Sao Paolo', 'Mountain View'),
  (9900, 'San Francisco', 'Shanghai'),
  (9900, 'Shanghai', 'San Francisco'),
  (9130, 'San Francisco', 'Berlin'),
  (9130, 'Berlin', 'San Francisco'),
  (9217, 'London', 'Shanghai'),
  (9217, 'Shanghai', 'London'),
  (932, 'London', 'Berlin'),
  (932, 'Berlin', 'London'),
  (9471, 'London', 'Sao Paolo'),
  (9471, 'Sao Paolo', 'London')]

Adjacency List
[ <map object at 0x7f2ecc2abd10>,
  <map object at 0x7f2ecc2abdd0>,
  <map object at 0x7f2ecc2abe90>,
  <map object at 0x7f2ecc2abf50>,
  <map object at 0x7f2ecc295050>,
  <map object at 0x7f2ecc295150>,
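  None]

Each `<map object ...>` entry appears because `map()` returns a lazy iterator in Python 3. Wrapping the call in `map_conversion` as `list(map(convert_to_names, adjacency_list_for_node))` would print the `(To Node Name, Edge Value)` tuples instead.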

Adjacency Matrix
[ [0, 51, 0, 9950, 0, 10375, 0],
  [51, 0, 0, 9900, 9130, 0, 0],
  [0, 0, 0, 9217, 932, 9471, 0],
  [9950, 9900, 9217, 0, 0, 0, 0],
  [0, 9130, 932, 0, 0, 0, 0]

### Eulerian Path

**Eulerian path** - a path that travels through every edge in a graph exactly once.  
You start with one node, traverse through all edges and might end up at a different node.

It turns out that not every graph is capable of having an Eulerian path. You could get stuck at some outer node and be unable to reach the other nodes without traveling over an edge you've already seen. 

For an Eulerian path, it's okay for your graph to have two nodes with an odd degree, as long as they are the start and end of the path. 

In an **Eulerian cycle** you must traverse every edge only once and end up at the same node that you started with. 

A connected graph has an Eulerian cycle if every vertex has an even degree, that is, an even number of edges connected to it.
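
The degree rules can be turned into a quick check, sketched below. The function name `euler_classification` and its return strings are invented for this example. It assumes an undirected graph given as a list of edges and assumes the graph is connected; it does not verify connectivity.

```python
from collections import Counter


def euler_classification(edges):
    """Classify a connected undirected graph by counting odd-degree vertices.

    Returns "cycle" when every vertex has an even degree, "path" when
    exactly two vertices have an odd degree, and "none" otherwise.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    odd_vertices = [vertex for vertex, d in degree.items() if d % 2 == 1]
    if len(odd_vertices) == 0:
        return "cycle"
    if len(odd_vertices) == 2:
        return "path"  # the two odd vertices are the start and the end
    return "none"


print(euler_classification([(0, 1), (1, 2), (2, 0)]))  # cycle (triangle)
print(euler_classification([(0, 1), (1, 2)]))          # path  (0 and 2 are odd)
print(euler_classification([(0, 1), (0, 2), (0, 3)]))  # none  (four odd vertices)
```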

The **Eulerian cycle algorithm** is used to find a cycle in a graph that visits every edge exactly once.  
The algorithm works by starting at any vertex in the graph and following edges until you return to that vertex. If you didn't encounter every edge, you can start again from an unseen edge connected to a node you've already visited. You create a path through those unseen edges, and continue this process until you've seen every edge in the graph once.  
Then you can simply add the paths together, combining them at the nodes they have in common. This will give you an Eulerian cycle.

This algorithm is highly efficient: it visits every edge once, so its running time grows linearly with the number of edges.  
Time complexity O(E).
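
The procedure described above is commonly known as Hierholzer's algorithm. Below is a minimal stack-based sketch of it, not the course's own implementation. The function name `eulerian_cycle` is an assumption for this example, and the code assumes a non-empty, connected, undirected graph in which every vertex has an even degree. Instead of building separate loops and joining them afterwards, the stack version splices the sub-cycles together as it backtracks.

```python
def eulerian_cycle(edges):
    """Return a list of vertices forming an Eulerian cycle.

    `edges` is a non-empty list of (a, b) pairs of an undirected graph
    that is assumed to be connected with all degrees even.
    """
    # Store each edge under both endpoints, tagged with its index,
    # so it can be marked as used no matter which side we come from.
    adjacency = {}
    for edge_id, (a, b) in enumerate(edges):
        adjacency.setdefault(a, []).append((b, edge_id))
        adjacency.setdefault(b, []).append((a, edge_id))

    used = [False] * len(edges)
    stack = [edges[0][0]]  # start at any vertex that has an edge
    cycle = []
    while stack:
        node = stack[-1]
        # Drop edges that were already walked from the other endpoint.
        while adjacency[node] and used[adjacency[node][-1][1]]:
            adjacency[node].pop()
        if adjacency[node]:
            neighbor, edge_id = adjacency[node].pop()
            used[edge_id] = True
            stack.append(neighbor)
        else:
            # No unseen edges left here: this vertex is final in the cycle.
            cycle.append(stack.pop())
    return cycle


# Two triangles sharing vertex 2; every vertex has an even degree.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]
cycle = eulerian_cycle(edges)
print(cycle)
```

Every edge is pushed and popped a constant number of times, which matches the O(E) bound stated above.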