# Graphs

Unlike arrays, linked lists, stacks or queues, a graph is a non-linear data structure (trees are also non-linear data structures)

Graphs are denoted by an ordered pair of a set of vertices and a set of edges (nodes and connections)
Trees are a subset of graphs with certain special properties; graphs are a more general data structure
1) For example, in a tree there exists a path from the root to every node; this is not necessarily true in a graph
2) Trees have n nodes and exactly n-1 edges; graphs with n nodes can have anywhere between 0 and a large number of edges. For a simple directed graph, the maximum number of edges is n*(n-1)

![image.png](attachment:45ed2e2f-86cb-47a2-83f5-8f95ee1ade58.png)

## What is a graph ?

1) An ordered pair of a set of vertices (V) and a set of edges (E), represented as (V,E)
2) Graphs can be undirected (all edges are bidirectional), directed (all edges have a direction), or a mix of both, where some edges are bidirectional and some are unidirectional. Directed graphs are also called digraphs

![image.png](attachment:379ca376-c6d5-4c75-ab11-e46d3f593c40.png)

First edge is directed from u->v, second edge is undirected


3) How is a graph represented ?

 ![image.png](attachment:de166c30-c8f1-454f-b34b-1183253504d7.png)
 
 
 Given an undirected graph like this with 8 vertices and 10 edges
 
 Vertices can be represented by V = {V1, V2, V3, V4, V5, V6, V7, V8}
 
 Edges can be represented as E = {  {V1, V2}, {V1, V4}, {V1, V3}, {V2, V5}, {V2, V6}, {V3, V7}, {V4, V8}, {V7, V8}, {V5, V8}, {V6, V8}}
 
 Note : Convention : {} represents an unordered pair, so undirected -> example : {V1, V2} represents an undirected edge between V1 and V2. () represents an ordered pair, so directed -> example : (V1, V2) represents a directed edge from V1 to V2
 
 Example of undirected graph : A social network, since if A is a friend of B, B is also a friend of A
 Example of directed graph : WWW. Pages are vertices, and hyperlinks are edges. If webpage A has a hyperlink to webpage B, a link back from B to A need not exist
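
 A rough sketch of the notation above in Python. The variable names are only illustrative, and using `frozenset` for unordered pairs and `tuple` for ordered pairs is one possible choice, not the only one:

```python
# Vertices of the 8-vertex example graph
V = {"V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8"}

# Undirected edges: a frozenset ignores order, so {V1, V2} == {V2, V1}
E = {
    frozenset({"V1", "V2"}), frozenset({"V1", "V4"}), frozenset({"V1", "V3"}),
    frozenset({"V2", "V5"}), frozenset({"V2", "V6"}), frozenset({"V3", "V7"}),
    frozenset({"V4", "V8"}), frozenset({"V7", "V8"}), frozenset({"V5", "V8"}),
    frozenset({"V6", "V8"}),
}

print(len(V), len(E))                # 8 10
print(frozenset({"V2", "V1"}) in E)  # True, order does not matter

# Directed edge: a tuple keeps order, so (V1, V2) != (V2, V1)
directed_edge = ("V1", "V2")
print(directed_edge == ("V2", "V1"))  # False
```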
 
 
 4) Weighted graphs - A special kind of graph where edges have weights. Example : to represent distances between cities in an undirected graph, cities can be vertices, roads can be edges, and the weight of each edge can be the distance between the two cities it joins
 
 5) |V| is the number of vertices, |E| is the number of edges in a graph
 
 6) Special type of edges : 
     1) Self-edge or self-loop : Source and destination of edge are same - self loops or self edge can exist in both directed and undirected graphs  
     
     ![image.png](attachment:feaca9bb-3507-4612-b69c-6dd54ac826b3.png)
     
     Example of self-loop or self-edge : a webpage which hyperlinks to itself
     
     2) Multi-edge : more than one connection between two vertices. Can exist both in undirected or directed
     
     ![image.png](attachment:a9d3af2c-af91-489a-9bea-9d2c8e4fe44f.png)
     
Example of multi-edge : Where cities are vertices, edges are flights between cities. There can be more than 1 flight between two cities
     
     
     
     
     
     ![image.png](attachment:d2b7d64b-1eda-4231-ab84-75ef856f5aa1.png)
     
     
     
 7) Graphs with no self-edges or multi-edges are called simple graphs.
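
 The definition can be checked mechanically. A minimal sketch, assuming the graph is given as a list of (u, v) pairs; the function name `is_simple` is hypothetical:

```python
def is_simple(edges, directed=False):
    """Return True if the edge list has no self-loops and no multi-edges."""
    seen = set()
    for u, v in edges:
        if u == v:  # self-loop
            return False
        # In an undirected graph (u, v) and (v, u) are the same edge
        key = (u, v) if directed else frozenset((u, v))
        if key in seen:  # multi-edge
            return False
        seen.add(key)
    return True

print(is_simple([(1, 2), (2, 3)]))                 # True
print(is_simple([(1, 2), (2, 2)]))                 # False, self-loop
print(is_simple([(1, 2), (2, 1)]))                 # False, multi-edge when undirected
print(is_simple([(1, 2), (2, 1)], directed=True))  # True, two distinct directed edges
```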
 
 
 8) For a directed simple graph, the maximum number of edges is n*(n-1) (nP2). For an undirected simple graph, the maximum number of edges is n*(n-1)/2 (nC2)
 
 Therefore, O(maximum number of edges) ~ O(n^2) {Note for a binary tree, number of edges is O(n)}
 
 
 A dense graph has number of edges ~O(n^2), a sparse graph has number of edges ~ O(n)
 
 Dense graphs are usually stored as adjacency matrices; for sparse graphs, adjacency lists are usually used
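
 The edge-count formulas are easy to compute directly. A small sketch; the helper names `max_edges` and `density` are made up for illustration:

```python
def max_edges(n, directed=False):
    """Maximum number of edges in a simple graph with n vertices."""
    return n * (n - 1) if directed else n * (n - 1) // 2

def density(n, num_edges, directed=False):
    """Fraction of the possible edges that are actually present."""
    possible = max_edges(n, directed)
    return num_edges / possible if possible else 0.0

print(max_edges(8))                 # 28
print(max_edges(8, directed=True))  # 56
print(density(8, 10))               # about 0.357 for the 8-vertex, 10-edge example
```

 A density close to 1 suggests a dense graph, and a density close to 0 suggests a sparse one.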
 
 
 9) Path in a graph 
 
 A sequence of vertices which are connected by edges
 
 Simple path -> no vertices are repeated
 
 A walk -> where vertices and edges can be repeated
 
 A trail -> a kind of walk where vertices can be repeated but edges cannot be
 
 Walk is most general
 
 
 
 Example :
 
 
 ![image.png](attachment:28bac5a0-2d2f-4020-86f9-4dbf8088699f.png)
 
<A,B,F,H> is a simple path
 
 


![image.png](attachment:5c44a3a5-c97b-4a9d-8e1a-b0736a12544d.png)
 
 
 
 <A,B,F,H,E,B,A,D> is a walk, as vertices A and B are repeated, and edge {A,B} is also repeated
 
 
 ![image.png](attachment:f7d52dd6-86b5-4cb4-a069-15ec7e54f450.png)
 
 
 <A,B,E,H,D,A,C> is a trail as vertex A is repeated but no edges are repeated
 
 
 Note : Between 2 vertices, if there is a walk where vertices are repeated, there is also a path where vertices are not repeated
 
 
 In general, when we say path, we mean simple path
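
 The three definitions can be told apart in code. A sketch for undirected graphs; the function `classify` is hypothetical, and the edge set below is only inferred from the example sequences above, so it may not match the figures exactly:

```python
def classify(sequence, edges):
    """Classify a vertex sequence in an undirected graph.

    Returns "simple path", "trail", "walk", or "not a walk".
    """
    edge_set = {frozenset(e) for e in edges}
    steps = [frozenset((u, v)) for u, v in zip(sequence, sequence[1:])]

    if any(step not in edge_set for step in steps):
        return "not a walk"   # some consecutive pair is not joined by an edge
    if len(set(sequence)) == len(sequence):
        return "simple path"  # no vertex is repeated
    if len(set(steps)) == len(steps):
        return "trail"        # vertices repeat, edges do not
    return "walk"             # an edge is repeated as well

# Assumed edge set, reconstructed from the example sequences in the text
edges = [("A", "B"), ("B", "F"), ("F", "H"), ("H", "E"),
         ("E", "B"), ("A", "D"), ("H", "D"), ("A", "C")]

print(classify(list("ABFH"), edges))      # simple path
print(classify(list("ABFHEBAD"), edges))  # walk
print(classify(list("ABEHDAC"), edges))   # trail
```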
 
 
 A walk is called a closed walk if it starts and ends at same vertex   and length > 0
 
 ![image.png](attachment:227f376b-b72a-40a5-9ba1-9772d9cb99ad.png)
 
 
 
 A closed walk with the additional condition that, other than the start and end vertex, no other vertex is repeated is called a simple cycle (often just called a cycle) - the above graph is a cycle
 
 A graph with no cycle is called an acyclic graph
 
 For example, a simple binary tree is an acyclic graph, in which closed walk is possible but not a simple cycle
 
 ![image.png](attachment:f00e9081-8851-41a8-9808-c1df7807b760.png)
 
 The tree above is an undirected acyclic graph
 
 ![image.png](attachment:856521ed-0979-40d6-ad0c-8544495b3009.png)
 
 A directed acyclic graph is above; it's called a DAG
 
 
 10) Strongly connected graph : A directed graph in which there exists a directed path from every vertex to every other vertex
 
 ![image.png](attachment:4d0dd3c1-0e2c-4b3a-b793-a5d7f732572b.png)
 
 
 is a strongly connected graph, as in a directed sense, we can go from any vertex to any other vertex
 
 
 
 
 ![image.png](attachment:dad2cd46-2c6d-40b3-8c79-afcc1222b875.png) 
 
 
 is not a strongly connected graph, as we can never go from C->A in a directed sense
 
 A directed graph which is not strongly connected, but which becomes connected if all its edges are changed to undirected edges, is called weakly connected
 
 Example : above
 
 
 
 In an undirected graph, we say connected instead of strongly connected
 
 
 ![image.png](attachment:b487e112-a2c7-4ae2-b59a-b9b653f264d5.png)
 
 
 is a connected graph
 
 
 
 ![image.png](attachment:9a14b291-caa2-4fc9-bc36-149c97b60a29.png) is not connected
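
These connectivity definitions can be tested by checking which vertices are reachable from each vertex. A minimal sketch using breadth-first search, which these notes do not cover and is used here only as one convenient way to compute reachability. The function names and the two small graphs are hypothetical (they are not the graphs in the figures), and the code assumes a non-empty graph in which every vertex appears as a key:

```python
from collections import deque

def reachable(adj, start):
    """Return the set of vertices reachable from start by following edges in adj."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_strongly_connected(adj):
    """True if every vertex can reach every other vertex along directed edges."""
    return all(reachable(adj, u) == set(adj) for u in adj)

def is_connected_ignoring_direction(adj):
    """True if the graph is connected once every edge is treated as undirected."""
    undirected = {u: set() for u in adj}
    for u, neighbours in adj.items():
        for v in neighbours:
            undirected[u].add(v)
            undirected[v].add(u)
    start = next(iter(undirected))
    return reachable(undirected, start) == set(undirected)

cycle = {"A": ["B"], "B": ["C"], "C": ["A"]}  # A -> B -> C -> A
chain = {"A": ["B"], "B": ["C"], "C": []}     # A -> B -> C, no way back from C

print(is_strongly_connected(cycle))            # True
print(is_strongly_connected(chain))            # False
print(is_connected_ignoring_direction(chain))  # True, so chain is weakly connected
```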

## Graph representation

### 1) Technique 1 - Edge list

Have two lists: one which lists all vertices, and
one for edges, which is a list of pairs of vertices, each pair representing an edge

![image.png](attachment:a9d033c2-cbe8-4de1-ac5b-4f7fc26ac5e7.png)
 
 
Example :  Given an undirected graph like this with 8 vertices and 10 edges
 
 Vertices can be represented by V = {V1, V2, V3, V4, V5, V6, V7, V8}
 
 Edges can be represented as E = {  {V1, V2}, {V1, V4}, {V1, V3}, {V2, V5}, {V2, V6}, {V3, V7}, {V4, V8}, {V7, V8}, {V5, V8}, {V6, V8}}
 
 If there is a weight, we can store an Edge as {  {V1, V2, W1}, {V1, V4, W2}...}
 
 For undirected graphs, each edge is represented as a set as order does not matter {V1,V2}
 
 For directed graph, each edge can be represented as a class with start vertex and end vertex
 
![image.png](attachment:ebd44080-83e1-4ace-acee-c3389b572d2a.png)
 
 Memory / space complexity : O(|V|) for vertex list, O(|E|) for edge list -> totally O(|V| + |E|)
 
 Time complexity - 
 1) For a given vertex, find all vertices joined to it. This is O(|E|), as we have to go through every edge and check whether the given vertex is one of its endpoints
 2) Find if a given pair of vertices is connected or not. This is again O(|E|)
 
 For a dense graph, |E| ~ O(n^2) where n is |V|, so both operations become O(n^2)
 
 Therefore this is not great in terms of time complexity
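
Both queries can be sketched on the 8-vertex example, writing vertex Vi as the integer i. The function names are illustrative; each one has to scan the edge list, which is where the O(|E|) cost comes from:

```python
# Undirected edge list of the 8-vertex example graph
edges = [(1, 2), (1, 4), (1, 3), (2, 5), (2, 6),
         (3, 7), (4, 8), (7, 8), (5, 8), (6, 8)]

def neighbours(edges, v):
    """All vertices joined to v: looks at every edge, so O(|E|)."""
    result = []
    for a, b in edges:
        if a == v:
            result.append(b)
        elif b == v:
            result.append(a)
    return result

def are_connected(edges, u, v):
    """True if an edge joins u and v: in the worst case looks at every edge, O(|E|)."""
    return any({a, b} == {u, v} for a, b in edges)

print(neighbours(edges, 8))        # [4, 7, 5, 6]
print(are_connected(edges, 1, 3))  # True
print(are_connected(edges, 1, 8))  # False
```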

### 2) Technique 2 - Adjacency matrix

Have a matrix of size |V|*|V|. At every position, for an unweighted graph, mark a 1 if the two vertices are connected, 0 if not. This yields a symmetric matrix for an undirected graph, and a generally asymmetric matrix for a directed graph

![image.png](attachment:b99a5632-b4c1-4eca-9126-d085e2ff1b62.png)

As you can see, finding the nodes connected to a given node means scanning one row, which is O(|V|) = O(n), instead of the O(|E|) scan of the edge list approach, which is O(n^2) for a dense graph


To see if two nodes are connected, if the nodes are passed as indices, we need to look at A[i][j] to see if i and j are connected. This is O(1) !

If names of vertices are given instead of indices, finding index from name is O(|V|) if we have list of names. Instead , if we have a hash map of names vs indices, we get O(1)
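
A small sketch of an adjacency matrix combined with a hash map from names to indices. The vertex names, edges and helper functions are all made up for illustration:

```python
names = ["A", "B", "C", "D"]                          # hypothetical vertex names
index_of = {name: i for i, name in enumerate(names)}  # hash map: name -> index

n = len(names)
A = [[0] * n for _ in range(n)]

def add_edge(u, v):
    """Add an undirected edge, keeping the matrix symmetric."""
    i, j = index_of[u], index_of[v]
    A[i][j] = 1
    A[j][i] = 1

add_edge("A", "B")
add_edge("A", "C")
add_edge("C", "D")

def are_connected(u, v):
    """O(1) on average: one dictionary lookup per name, then one matrix lookup."""
    return A[index_of[u]][index_of[v]] == 1

def neighbours(u):
    """O(|V|): scan the whole row of u."""
    row = A[index_of[u]]
    return [names[j] for j, cell in enumerate(row) if cell == 1]

print(are_connected("A", "C"))  # True
print(are_connected("B", "D"))  # False
print(neighbours("A"))          # ['B', 'C']
```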


If we want to store a weighted graph in an adjacency matrix, instead of Aij = 1 or 0 as in an unweighted graph, we can store Aij = wij, the weight of the edge from i to j. For unconnected vertices, we use a default value that cannot be a valid weight, such as a very large number like 10^6 (or infinity)
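
A sketch of the weighted variant. It assumes `math.inf` as the "no edge" value instead of a large finite number, which is a substitution made here so that the marker can never collide with a real finite weight; the names `NO_EDGE`, `add_edge` and `has_edge` are hypothetical:

```python
import math

NO_EDGE = math.inf  # assumed sentinel: no finite weight can equal it

n = 3
W = [[NO_EDGE] * n for _ in range(n)]

def add_edge(i, j, weight):
    """Store a directed weighted edge i -> j."""
    W[i][j] = weight

add_edge(0, 1, 5)
add_edge(1, 2, 0)  # a weight of 0 is still distinguishable from "no edge"

def has_edge(i, j):
    return W[i][j] != NO_EDGE

print(has_edge(1, 2))  # True
print(has_edge(2, 0))  # False
print(W[0][1])         # 5
```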

Memory is high here though. We are using O(|V|^2)

If a graph is dense, this is great. If a graph is sparse, we are wasting a lot of memory storing 0

Most graphs with a large number of vertices will not be dense. For example, take a social network with a billion users. Not everyone in the network is a friend of everyone else. A user on average will not have more than 1000 friends

Therefore, you will have at most 10^9 users * 10^3 friends per user / 2 = 10^12 / 2 edges in the graph (halved because each undirected edge is shared by two users) << (10^9)^2

Therefore using an adjacency matrix here will cause a lot of wastage
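
The arithmetic above can be checked quickly (the variable names are illustrative):

```python
users = 10**9
avg_friends = 10**3

matrix_cells = users ** 2                    # one cell per ordered pair of users
undirected_edges = users * avg_friends // 2  # each friendship counted once

print(matrix_cells == 10**18)            # True
print(undirected_edges == 5 * 10**11)    # True
print(matrix_cells // undirected_edges)  # 2000000 matrix cells per actual edge
```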

### 3) Technique 3 - Adjacency List

As we saw, an adjacency matrix is very efficient in time complexity, but not in memory complexity, especially if the graph is sparse

Let's take the first row (called 0 since we use 0 indexing) in the adjacency matrix above

![image.png](attachment:fee82947-6557-4c9f-85e3-40133d985382.png)

We have a vector of 1's and 0's . 1's where indices are connected to the 0 index, and 0 where indices are not. Since most graphs are sparse, storing 0 for connections which don't exist is unnecessary, we can live without it


If you think of a social network where there are a billion users, for any user, we will have a list with only ~10^3 1's (an average user has 1K friends) and 10^9 - 10^3 0's

So, instead of having indices of graph as array index, can we , for every node, just store the other nodes it connects to as a list ?

![image.png](attachment:68942d96-b8d0-4f90-8f21-0e3b77d0a340.png)

Instead of the first representation, we use the second

In the first, index of list represents node of graph, value represents if nodes are connected or not

In second, index of list is arbitrary , values represent the nodes connected to current element

Instead of list in second way, we can also store connections of node as a linked list, or a BST

![image.png](attachment:03b8fd67-76c9-4c8c-8b47-6245f48c801d.png)

Therefore, instead of the more memory intensive representation on the left, we get a more compact representation on the right

We can represent A[0] as an array of length 3, A[1] an array of length 2 and so on

Each row is an array of a different size, depending on the number of neighbours of that node

In this representation, memory ~ O(|V| + |E|) << O(|V|^2) for sparse graphs


To find if two nodes are connected or not, time complexity is O(|V|) in the worst case, unlike the adjacency matrix which is O(1). (If we keep each list as a sorted array and use binary search, it will be O(log |V|), but for this we have to keep the array always sorted)

However, this is the worst case; for a sparse graph, a given list will have length << |V|

Finding all neighbours of a node is O(number of neighbours), which is O(|V|) in the worst case; in an adjacency matrix it is always O(|V|)
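
A sketch of both lookup strategies on a small made-up adjacency list, using the standard library `bisect` module for the binary search. The function names are hypothetical, and each neighbour list is assumed to be kept sorted:

```python
from bisect import bisect_left

# Hypothetical adjacency list; every neighbour list is sorted
adj = {
    0: [1, 2, 3],
    1: [0, 4],
    2: [0],
    3: [0, 4],
    4: [1, 3],
}

def has_edge_linear(adj, u, v):
    """Scan the neighbour list of u: O(degree of u), at most O(|V|)."""
    return v in adj[u]

def has_edge_binary(adj, u, v):
    """Binary search in the sorted neighbour list of u: O(log(degree of u))."""
    neighbours = adj[u]
    pos = bisect_left(neighbours, v)
    return pos < len(neighbours) and neighbours[pos] == v

print(has_edge_linear(adj, 0, 3))  # True
print(has_edge_binary(adj, 0, 3))  # True
print(has_edge_binary(adj, 2, 4))  # False
print(adj[0])                      # all neighbours of node 0: [1, 2, 3]
```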

An example : 

![image.png](attachment:b6147183-0959-4e0a-8d03-160480906751.png)

For most sparse real world graphs, adjacency list is better

What about insertion of new edge ?

In an adjacency matrix, it means going to the right Aij position and changing 0 to 1

For an adjacency list, inserting a new edge means going to the position of the start vertex and adding an element to its array. Since fixed-size arrays can't be easily resized (as that's expensive),

we can use a linked list instead. Use an array of pointers of size |V|, each pointing to the head of the linked list of neighbours of that vertex

![image.png](attachment:89bdb36f-ecc6-498e-b155-f0ea0f04431b.png)

To store the weight, we just add one more value to each linked list (LL) node

![image.png](attachment:0cafdc79-ec03-487b-8431-94bfd5b58ca1.png)

Space complexity is O(|E| + |V|) ~ O (|E|) since usually number of edges >> number of vertices
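
A minimal sketch of the linked list variant, with the weight stored in each node. The class names `EdgeNode` and `LinkedAdjacencyList` are made up for illustration:

```python
class EdgeNode:
    """One linked list node: a neighbour, the edge weight, and the next node."""
    def __init__(self, target, weight, next_node=None):
        self.target = target
        self.weight = weight
        self.next = next_node

class LinkedAdjacencyList:
    def __init__(self, number_of_nodes):
        # Array of |V| head pointers, all initially empty
        self.heads = [None] * number_of_nodes

    def add_edge(self, source, target, weight=1):
        """Insert at the head of the list of source: O(1), no resizing needed."""
        self.heads[source] = EdgeNode(target, weight, self.heads[source])

    def neighbours(self, source):
        """Walk the list of source and collect (target, weight) pairs."""
        result = []
        node = self.heads[source]
        while node is not None:
            result.append((node.target, node.weight))
            node = node.next
        return result

g = LinkedAdjacencyList(3)
g.add_edge(0, 1, 5)
g.add_edge(0, 2, 3)
print(g.neighbours(0))  # [(2, 3), (1, 5)], most recently added first
print(g.neighbours(1))  # []
```

In Python specifically, the built-in `list` is a dynamic array with amortized O(1) `append`, so a plain list or set per vertex is a common practical alternative to a hand-written linked list.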

## Implementing graph in python

Implement the graph below using all the 3 techniques listed above - list of edges, adjacency matrix, adjacency list
Note that this graph has a self-edge. In addition, it is a weighted graph

![image.png](attachment:f59f57f3-4d8b-40e7-8524-568964c8aa5a.png)

## Using list of edges

Note that this implementation just stores list of edges. It does not store list of vertices

In [2]:
class Graph(object):
    
    def __init__(self, number_of_nodes = None, is_directed=True):
        self.m_number_of_nodes = number_of_nodes
        self.is_directed = is_directed
        
        self.m_list_of_edges = []
        
    def add_edge(self, node1, node2, weight=1):
        self.m_list_of_edges.append([node1, node2, weight])
        
        if not self.is_directed : ## if graph is not directed, add reverse too
            self.m_list_of_edges.append([node2, node1, weight])
            
    def print_edge_list(self):
        num_of_edges = len(self.m_list_of_edges)
        for i in range(num_of_edges):
            print("edge ", i+1, ": ", self.m_list_of_edges[i])
            
        

In [4]:
graph = Graph(5)

In [5]:
graph.add_edge(0, 0, 25)
graph.add_edge(0, 1, 5)
graph.add_edge(0, 2, 3)
graph.add_edge(1, 3, 1)
graph.add_edge(1, 4, 15)
graph.add_edge(4, 2, 7)
graph.add_edge(4, 3, 11)

In [8]:
graph.print_edge_list()

edge  1 :  [0, 0, 25]
edge  2 :  [0, 1, 5]
edge  3 :  [0, 2, 3]
edge  4 :  [1, 3, 1]
edge  5 :  [1, 4, 15]
edge  6 :  [4, 2, 7]
edge  7 :  [4, 3, 11]


## Using adjacency matrices

Note that the interface is the same for the constructor and add_edge; instead of print_edge_list, we have print_adj_matrix

In [9]:
class Graph(object):
    
    def __init__(self, number_of_nodes = None, is_directed=True):
        self.m_number_of_nodes = number_of_nodes
        self.is_directed = is_directed
        
        # Initialize the adjacency matrix with `number_of_nodes` rows and
        # columns, all set to 0 (0 means "no edge")
        self.m_adj_matrix = [[0 for column in range(number_of_nodes)] 
                            for row in range(number_of_nodes)]
        
    def add_edge(self, node1, node2, weight=1):
        self.m_adj_matrix[node1][node2] = weight
        
        if not self.is_directed : ## if graph is not directed, add reverse too
            self.m_adj_matrix[node2][node1] = weight
            
    def print_adj_matrix(self):
        print(self.m_adj_matrix)
            

In [10]:
graph = Graph(5)

In [11]:
graph.add_edge(0, 0, 25)
graph.add_edge(0, 1, 5)
graph.add_edge(0, 2, 3)
graph.add_edge(1, 3, 1)
graph.add_edge(1, 4, 15)
graph.add_edge(4, 2, 7)
graph.add_edge(4, 3, 11)

In [12]:
graph.print_adj_matrix()

[[25, 5, 3, 0, 0], [0, 0, 0, 1, 15], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 7, 11, 0]]


## Using adjacency lists

Note that the interface is the same for the constructor and add_edge; instead of print_edge_list, we have print_adj_list

We use a dictionary of sets of tuples
The keys of the dictionary are the vertices. Each key maps to a set of tuples of target node and weight

For example : For node 0 above, we will have dict[0] = {(0,25), (1,5), (2,3)}

In [13]:
class Graph(object):
    
    def __init__(self, number_of_nodes = None, is_directed=True):
        self.m_number_of_nodes = number_of_nodes
        self.is_directed = is_directed
        
        # Initialize the adjacency list: one empty set of
        # (target node, weight) tuples per node
        self.m_adj_list = {}
        for i in range(number_of_nodes):
            self.m_adj_list[i] = set()
        
    def add_edge(self, node1, node2, weight=1):
        self.m_adj_list[node1].add((node2, weight))
        
        if not self.is_directed : ## if graph is not directed, add reverse too
            self.m_adj_list[node2].add((node1, weight))
            
    def print_adj_list(self):
        print(self.m_adj_list)

In [14]:
graph = Graph(5)

In [15]:
graph.add_edge(0, 0, 25)
graph.add_edge(0, 1, 5)
graph.add_edge(0, 2, 3)
graph.add_edge(1, 3, 1)
graph.add_edge(1, 4, 15)
graph.add_edge(4, 2, 7)
graph.add_edge(4, 3, 11)

In [16]:
graph.print_adj_list()

{0: {(2, 3), (0, 25), (1, 5)}, 1: {(3, 1), (4, 15)}, 2: set(), 3: set(), 4: {(2, 7), (3, 11)}}
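
The three representations carry the same information, so one can be converted into another. A short sketch using the same 7 directed edges as above; the variable names are illustrative, and 0 is treated as "no edge" as in the adjacency matrix class:

```python
edges = [(0, 0, 25), (0, 1, 5), (0, 2, 3), (1, 3, 1),
         (1, 4, 15), (4, 2, 7), (4, 3, 11)]  # (source, target, weight), directed
n = 5

# Edge list -> adjacency matrix
matrix = [[0] * n for _ in range(n)]
for u, v, w in edges:
    matrix[u][v] = w

# Edge list -> adjacency list (dict of sets of (target, weight) tuples)
adj_list = {i: set() for i in range(n)}
for u, v, w in edges:
    adj_list[u].add((v, w))

# Adjacency matrix -> edge list
recovered = [(u, v, matrix[u][v])
             for u in range(n) for v in range(n) if matrix[u][v] != 0]

print(matrix)
print(adj_list[0] == {(0, 25), (1, 5), (2, 3)})  # True
print(sorted(recovered) == sorted(edges))        # True
```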

