# Some Data Structures

## Big O Notation

In programming, an algorithm is a process or set of rules to be followed in order to achieve a particular goal. An algorithm is characterized by the resources it consumes, whether time (its run-time) or space (its memory use). As data scientists, we are interested in the most efficient algorithm so that we can optimize our workflow.

In computer science, Big O notation is used to describe how ‘fast’ an algorithm’s run-time grows as its input grows, by counting the number of operations the algorithm performs. This will be explained in further detail later on but for now, let’s understand all of the formal notation.

**Formal Notation**

- Big Ω: a lower bound on the run-time, used here for the best-case scenario. The Big Ω of an algorithm describes how quickly an algorithm can run under the best of circumstances.
- Big O: an upper bound on the run-time, used here for the worst-case scenario. Typically, we are most concerned with the Big O time because we are interested in how slowly a given algorithm will run, at worst. The practical question is how to keep that worst case as mild as possible.
- Big θ: this can only be used to describe the run-time of an algorithm if the Big Ω and the Big O are the same. That is, the algorithm’s run-time grows at the same rate in both the best and worst cases.

Because we are most concerned with the Big O of an algorithm, the rest of this post will only focus on Big O.

**How do we use Big O to describe an algorithm?**

Suppose you wish to search for someone’s name in a phone book.

What’s the most straightforward way of finding this person? Well, you could go through every single name in the phone book until you find your target. This is known as a simple search.

If the phone book is very small, with only 10 names, this is a fairly fast process. But what if there are 1,000 names in the phone book?

At best, your target’s name is at the front of the list and you only need to check the first item. At worst, your target’s name is at the very end of the phone book and you will need to have searched all 1,000 names. As the ‘dataset’ (or the phone book) increases in size, the maximum time it takes to run a simple search increases linearly.

In this case, our algorithm is a simple search. Big O notation allows us to describe what our worst case is. Our worst case is that we will have to search through all elements (n) in the phone book. We can describe the run-time as:

    O(n) where n: number of elements in the phone book
    
Because the maximum number of operations is equal to the maximum number of elements in our phone book (you might need to search through them all to find your target’s name), we say the Big O of a simple search is O(n). A simple search will never be slower than O(n) time.
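
The phone book search can be sketched in a few lines. This is a minimal illustration, assuming the phone book is a plain Python list of names; the function name `simple_search` and the sample names are made up for the example. It returns both the position of the target and the number of comparisons made, so the best and worst cases are visible.

```python
def simple_search(names, target):
    """Check each name in order; return (index, comparisons).

    The index is None when the target is not in the list.
    """
    comparisons = 0
    for index, name in enumerate(names):
        comparisons += 1
        if name == target:
            return index, comparisons
    return None, comparisons


# Hypothetical phone book, used only for illustration
phone_book = ["Ada", "Bao", "Carla", "Deepak", "Emeka"]

best_case = simple_search(phone_book, "Ada")     # target is the first entry
worst_case = simple_search(phone_book, "Emeka")  # target is the last entry
print(best_case)   # (0, 1)
print(worst_case)  # (4, 5)
```

With 5 names the worst case costs 5 comparisons; with 1,000 names it would cost 1,000, which is the linear growth that O(n) describes.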

### Different Big O Run Times

Different algorithms have different run-times. That is, algorithms grow at different rates. The most common Big O run-times, from fastest to slowest, are:

- O(log n): aka log time
- O(n): aka linear time
- O(n log n)
- O(n²)
- O(n!)
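
To get a feel for how differently these run-times grow, the sketch below evaluates each expression for a couple of small input sizes. The function name `operation_counts` is an assumption made for this example, and the numbers are idealized operation counts rather than measured timings.

```python
import math


def operation_counts(n):
    """Return the idealized operation count of each run-time for input size n."""
    return {
        "O(log n)": math.log2(n),
        "O(n)": n,
        "O(n log n)": n * math.log2(n),
        "O(n^2)": n ** 2,
        "O(n!)": math.factorial(n),
    }


for n in (8, 16):
    print(n, operation_counts(n))
```

Doubling n from 8 to 16 adds only one operation for O(log n), doubles the count for O(n), and quadruples it for O(n²), while O(n!) becomes astronomically large.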

The Big O cheatsheet is also very useful for a quick graphical representation of the different run times and how they compare to each other.

<table style="width:100%">
  <tr>
    <th><img src="photos/oNot.png" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

## Linked List

A linked list is a data structure that represents a sequence of nodes. In a singly linked list, each node
points to the next node in the linked list. A doubly linked list gives each node pointers to both the next
node and the previous node.

The following diagram depicts a singly linked list:

<table style="width:100%">
  <tr>
    <th><img src="photos/Linkedlist.png" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

In [1]:
# Node class 
class Node:   
    # Function to initialize the node object 
    def __init__(self, data):
        self.data = data
        self.next = None

In [2]:
# Linked List class 
class LinkedList:
    # Function to initialize the Linked List object
    def __init__(self):
        self.head = None

    def print_list(self):
        cur_node = self.head
        # Print until the last node
        while cur_node:
            print(cur_node.data)
            cur_node = cur_node.next

    # Function to insert a new node at the end
    def append(self, data):
        # Allocate the Node & 
        # Put in the data 
        new_node = Node(data)

        if self.head is None:
            self.head = new_node
            return

        # Start at the head and walk to the last node
        last_node = self.head
        while last_node.next:
            last_node = last_node.next
        # Point the last node at the new Node
        last_node.next = new_node
        
    # Function to insert a new node at the beginning
    def prepend(self, data):
        new_node = Node(data)

        # Make next of new Node as head
        new_node.next = self.head
        
        # Move the head to point to new Node
        self.head = new_node

    def insert_after_node(self, prev_node, data):

        # check if the given prev_node exists
        if not prev_node:
            print("Previous node is not in the list")
            return 

        #  Create new node & 
        #  Put in the data 
        new_node = Node(data)

        # Make next of new Node as next of prev_node
        new_node.next = prev_node.next
        
        # make next of prev_node as new_node
        prev_node.next = new_node

In [3]:
llist = LinkedList()
llist.append("A")
llist.append("B")
llist.append("C")
llist.append("D")

#llist.prepend("E")
llist.insert_after_node(llist.head.next, "E")

llist.print_list()

A
B
E
C
D


In [4]:
# Python program to delete a node from linked list 
  
class LinkedList: 
  
    # Function to initialize head 
    def __init__(self): 
        self.head = None
  
    # Function to insert a new node at the beginning 
    def push(self, new_data): 
        new_node = Node(new_data) 
        new_node.next = self.head 
        self.head = new_node 
  
    # Given a key, delete the first occurrence
    # of that key in the linked list 
    def deleteNode(self, key): 

        # Store head node 
        temp = self.head 

        # If head node itself holds the key to be deleted 
        if temp is not None and temp.data == key: 
            self.head = temp.next
            return

        # Search for the key to be deleted, keep track of the 
        # previous node as we need to change 'prev.next' 
        prev = None
        while temp is not None: 
            if temp.data == key: 
                break 
            prev = temp 
            temp = temp.next 

        # If key was not present in linked list 
        if temp is None: 
            return 

        # Unlink the node from linked list 
        prev.next = temp.next 
    
    # Utility function to print the linked list 
    def print_list(self):
        cur_node = self.head
        # Print until the last node
        while cur_node:
            print(cur_node.data)
            cur_node = cur_node.next

In [5]:
# Driver program 
llist = LinkedList() 
llist.push(7) 
llist.push(1) 
llist.push(3) 
llist.push(2) 
  
print ("Created Linked List: ")
llist.print_list() 
llist.deleteNode(1)  
print ("\nLinked List after Deletion of 1:")
llist.print_list() 

Created Linked List: 
2
3
1
7

Linked List after Deletion of 1:
2
3
7
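
The doubly linked list mentioned at the start of this section is not implemented above. A minimal sketch follows, assuming a separate tail pointer is kept; the class names `DoublyNode` and `DoublyLinkedList` and the helper methods are illustrative choices, not part of the code above.

```python
class DoublyNode:
    def __init__(self, data):
        self.data = data
        self.prev = None  # pointer to the previous node
        self.next = None  # pointer to the next node


class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, data):
        """Insert at the end in O(1) by using the tail pointer."""
        new_node = DoublyNode(data)
        if self.head is None:
            self.head = self.tail = new_node
            return
        new_node.prev = self.tail
        self.tail.next = new_node
        self.tail = new_node

    def to_list(self):
        """Walk forward from the head."""
        items, cur = [], self.head
        while cur:
            items.append(cur.data)
            cur = cur.next
        return items

    def to_list_reversed(self):
        """Walk backward from the tail using the prev pointers."""
        items, cur = [], self.tail
        while cur:
            items.append(cur.data)
            cur = cur.prev
        return items


dll = DoublyLinkedList()
for letter in "ABC":
    dll.append(letter)
print(dll.to_list())           # ['A', 'B', 'C']
print(dll.to_list_reversed())  # ['C', 'B', 'A']
```

The extra `prev` pointer costs memory per node, but it allows backward traversal, which the singly linked list cannot do without starting again from the head.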


## Stack 

A stack is a linear data structure in which operations are performed in a particular order: LIFO (Last In First Out), which is equivalently described as FILO (First In Last Out).

There are many real-life examples of a stack. Consider plates stacked on top of one another in a canteen. The plate at the top is the first one to be removed, and the plate placed at the bottommost position remains in the stack for the longest period of time. So, a stack follows LIFO (Last In First Out) / FILO (First In Last Out) order.



<table style="width:100%">
  <tr>
    <th><img src="photos/stack.png" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

In [6]:
# Python program for implementation of stack 
  
# import maxsize from sys module  
# Used to return -infinite when stack is empty 
from sys import maxsize 
  
# Function to create a stack. It initializes size of stack as 0 
def createStack(): 
    stack = [] 
    return stack 
  
# Stack is empty when stack size is 0 
def isEmpty(stack): 
    return len(stack) == 0
  
# Function to add an item to stack. It increases size by 1 
def push(stack, item): 
    stack.append(item) 
    print(item + " pushed to stack ") 
      
# Function to remove an item from stack. It decreases size by 1 
def pop(stack): 
    if (isEmpty(stack)): 
        return str(-maxsize -1) #return minus infinite 
      
    return stack.pop() 

In [7]:
# Driver program to test above functions     
stack = createStack() 
push(stack, str(10)) 
push(stack, str(20)) 
push(stack, str(30)) 
print(pop(stack) + " popped from stack") 

10 pushed to stack 
20 pushed to stack 
30 pushed to stack 
30 popped from stack


### Python program to reverse a string using stack 

In [1]:
# Function to create an empty stack.  
# It initializes size of stack as 0 
def createStack(): 
    stack=[] 
    return stack 
  
# Function to determine the size of the stack 
def size(stack): 
    return len(stack) 
  
# Stack is empty if the size is 0 
def isEmpty(stack): 
    return size(stack) == 0

# Function to add an item to stack .  
# It increases size by 1  
def push(stack,item): 
    stack.append(item) 

#Function to remove an item from stack.  
# It decreases size by 1 
def pop(stack): 
    if isEmpty(stack): return
    return stack.pop() 
  
# A stack based function to reverse a string 
def reverse(string): 
    n = len(string) 
      
    # Create a empty stack 
    stack = createStack() 
  
    # Push all characters of string to stack 
    for i in range(0,n,1): 
        push(stack,string[i]) 
  
    # Making the string empty since all  
    #characters are saved in stack  
    string="" 
  
    # Pop all characters of string and  
    # put them back to string 
    for i in range(0,n,1): 
        string+=pop(stack) 
          
    return string 

In [2]:
      
# Driver program to test above functions 
string="ArtificialIntelligence"
string = reverse(string) 
print("Reversed string is " + string)

Reversed string is ecnegilletnIlaicifitrA


## Queue 

A queue is a linear structure in which operations are performed in a particular order: First In First Out (FIFO). A good example of a queue is any queue of consumers for a resource where the consumer that came first is served first. The difference between stacks and queues lies in removal. In a stack, we remove the most recently added item; in a queue, we remove the least recently added item.


<table style="width:100%">
  <tr>
    <th><img src="photos/Queue.png" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

In [9]:
# Class Queue to represent a queue 
class Queue: 
  
    # __init__ function 
    def __init__(self, capacity): 
        self.front = self.size = 0
        self.rear = capacity - 1
        self.Q = [None]*capacity 
        self.capacity = capacity 
      
    # Queue is full when size becomes 
    # equal to the capacity  
    def isFull(self): 
        return self.size == self.capacity 
      
    # Queue is empty when size is 0 
    def isEmpty(self): 
        return self.size == 0
  
    # Function to add an item to the queue.  
    # It changes rear and size 
    def EnQueue(self, item): 
        if self.isFull(): 
            print("Full") 
            return
        self.rear = (self.rear + 1) % (self.capacity) 
        self.Q[self.rear] = item 
        self.size = self.size + 1
        print("%s enqueued to queue"  % str(item)) 
  
    # Function to remove an item from queue.  
    # It changes front and size 
    def DeQueue(self): 
        if self.isEmpty(): 
            print("Empty") 
            return
          
        print("%s dequeued from queue" % str(self.Q[self.front])) 
        self.front = (self.front + 1) % (self.capacity) 
        self.size = self.size - 1
          
    # Function to get front of queue 
    def que_front(self): 
        if self.isEmpty(): 
            print("Queue is empty") 
            return

        print("Front item is", self.Q[self.front]) 

    # Function to get rear of queue 
    def que_rear(self): 
        if self.isEmpty(): 
            print("Queue is empty") 
            return
        print("Rear item is",  self.Q[self.rear]) 


In [10]:
# Driver Code 
queue = Queue(30) 
queue.EnQueue(10) 
queue.EnQueue(20) 
queue.EnQueue(30) 
queue.EnQueue(40) 
queue.DeQueue() 
queue.que_front() 
queue.que_rear() 

10 enqueued to queue
20 enqueued to queue
30 enqueued to queue
40 enqueued to queue
10 dequeued from queue
Front item is 20
Rear item is 40


## Concrete Data Structures

How do we reconcile these abstract structures, the stack and queue, with data structures we encounter regularly?

### Array
An array is a linear collection of items (known as elements), stored contiguously in memory. Any given element can be accessed using a numerical index which points to the element’s location in the array. In native Python, the closest built-in equivalent is the **list** type.

### Hash Tables
In a hash table, unique keys are mapped to values. In native Python, a **dictionary** is the implementation of a hash table.
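
As a rough illustration of how the abstract structures map onto built-in types, the sketch below uses a list as a stack, a `collections.deque` as a queue, and a dictionary as a hash table. The variable names and sample values are hypothetical.

```python
from collections import deque

# A list used as a stack: append and pop both work on the end (LIFO)
plates = []
plates.append("a")
plates.append("b")
last_in = plates.pop()

# A deque used as a queue: append at the rear, popleft from the front (FIFO)
line = deque()
line.append("a")
line.append("b")
first_in = line.popleft()

# A dictionary used as a hash table: unique keys mapped to values
ages = {"Ada": 36, "Bao": 41}  # hypothetical data
ages["Carla"] = 29             # insert a new key

print(last_in, first_in, ages["Carla"])  # b a 29
```

A deque is used for the queue because removing from the front of a list shifts every remaining element, whereas `popleft` on a deque does not.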

# Graph

A graph is a non-linear data structure consisting of nodes and edges. The nodes are sometimes also referred to as vertices, and the edges are lines or arcs that connect any two nodes in the graph. More formally, a graph can be defined as follows:

A graph consists of a finite set of vertices (or nodes) and a set of edges, each of which connects a pair of nodes.

<table style="width:100%">
  <tr>
    <th><img src="photos/undirectedgraph.png" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

In the above Graph, the set of vertices V = {0,1,2,3,4} and the set of edges E = {01, 12, 23, 34, 04, 14, 13}.

Graphs are used to solve many real-life problems, most commonly to represent networks, such as the paths in a city, a telephone network, or a circuit network. Graphs are also used in social networks like LinkedIn and Facebook. For example, in Facebook, each person is represented by a vertex (or node). Each node is a structure that contains information like person id, name, gender, locale, etc.

The following two are the most commonly used representations of a graph:
1. Adjacency Matrix
2. Adjacency List

There are other representations as well, such as the Incidence Matrix and the Incidence List. The choice of graph representation is situation-specific: it depends on the type of operations to be performed and on ease of use.

### Adjacency Matrix:
An adjacency matrix is a 2D array of size $V \times V$, where $V$ is the number of vertices in the graph. Let the 2D array be $adj$; a slot $adj[i][j] = 1$ indicates that there is an edge from vertex $i$ to vertex $j$. The adjacency matrix of an undirected graph is always symmetric. An adjacency matrix can also represent weighted graphs: if $adj[i][j] = w$, then there is an edge from vertex $i$ to vertex $j$ with weight $w$.

The adjacency matrix for the above example graph is:

<table style="width:100%">
  <tr>
    <th><img src="photos/adjacencymatrix.png" alt="Drawing" style="width:300px;"/></th>
  </tr>
</table>
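
The same matrix can be built in code from the edge set E listed earlier. This is a minimal sketch; the names `num_vertices`, `edges`, and `adj_matrix` are chosen for the example.

```python
num_vertices = 5
# Edge set E from the example graph, written as (i, j) pairs
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (1, 4), (1, 3)]

# Start with a V x V matrix of zeros (no edges)
adj_matrix = [[0] * num_vertices for _ in range(num_vertices)]
for i, j in edges:
    adj_matrix[i][j] = 1
    adj_matrix[j][i] = 1  # undirected graph: mirror the entry

for row in adj_matrix:
    print(row)
# [0, 1, 0, 0, 1]
# [1, 0, 1, 1, 1]
# [0, 1, 0, 1, 0]
# [0, 1, 1, 0, 1]
# [1, 1, 0, 1, 0]
```

Because every edge is written in both directions, the matrix is symmetric, as expected for an undirected graph.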

### Adjacency List:
An array of lists is used. The size of the array is equal to the number of vertices. Let the array be $array$. An entry $array[i]$ represents the list of vertices adjacent to the $i$-th vertex. This representation can also be used for a weighted graph, with each list entry stored as a (vertex, weight) pair. The following is the adjacency list representation of the above graph.

<table style="width:100%">
  <tr>
    <th><img src="photos/listadjacency.png" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

### A Python program to demonstrate the adjacency list representation of the graph

In [11]:
""" 
A Python program to demonstrate the adjacency 
list representation of the graph 
"""
  
# A class to represent the adjacency list of the node 
class AdjNode: 
    def __init__(self, data): 
        self.vertex = data 
        self.next = None
  
  
# A class to represent a graph. A graph 
# is the list of the adjacency lists. 
# Size of the array will be the no. of the 
# vertices "V" 

In [12]:
class Graph: 
    def __init__(self, vertices): 
        self.V = vertices 
        self.graph = [None] * self.V 
  
    # Function to add an edge in an undirected graph 
    def add_edge(self, src, dest): 
        # Adding the node to the source node 
        node = AdjNode(dest) 
        node.next = self.graph[src] 
        self.graph[src] = node 
  
        # Adding the source node to the destination as 
        # it is the undirected graph 
        node = AdjNode(src) 
        node.next = self.graph[dest] 
        self.graph[dest] = node 
  
    # Function to print the graph 
    def print_graph(self): 
        for i in range(self.V): 
            print("Adjacency list of vertex {}\n head".format(i), end="") 
            temp = self.graph[i] 
            while temp: 
                print(" -> {}".format(temp.vertex), end="") 
                temp = temp.next
            print(" \n") 

In [13]:
V = 5
graph = Graph(V) 
graph.add_edge(0, 1) 
graph.add_edge(0, 4) 
graph.add_edge(1, 2) 
graph.add_edge(1, 3) 
graph.add_edge(1, 4) 
graph.add_edge(2, 3) 
graph.add_edge(3, 4) 
  
graph.print_graph() 

Adjacency list of vertex 0
 head -> 4 -> 1 

Adjacency list of vertex 1
 head -> 4 -> 3 -> 2 -> 0 

Adjacency list of vertex 2
 head -> 3 -> 1 

Adjacency list of vertex 3
 head -> 4 -> 2 -> 1 

Adjacency list of vertex 4
 head -> 3 -> 1 -> 0 
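
The weighted variant mentioned in the adjacency list description, where each entry is a pair, can be sketched with built-in types instead of hand-written nodes. The edge weights below are hypothetical and do not come from the example graph.

```python
from collections import defaultdict

# Hypothetical weighted, undirected edges: (u, v, weight)
weighted_edges = [(0, 1, 4), (0, 4, 8), (1, 2, 3)]

# Each vertex maps to a list of (neighbour, weight) pairs
weighted_adj = defaultdict(list)
for u, v, w in weighted_edges:
    weighted_adj[u].append((v, w))
    weighted_adj[v].append((u, w))  # undirected: store both directions

for vertex in sorted(weighted_adj):
    print(vertex, weighted_adj[vertex])
# 0 [(1, 4), (4, 8)]
# 1 [(0, 4), (2, 3)]
# 2 [(1, 3)]
# 4 [(0, 8)]
```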



<table style="width:100%">
  <tr>
    <th><img src="photos/mcs1.png" alt="Drawing" style="width:1000px;"/></th>
  </tr>
</table>

<table style="width:100%">
  <tr>
    <th><img src="photos/mcs2.png" alt="Drawing" style="width:1000px;"/></th>
  </tr>
</table>

## Big O Times for Data Structures

For any given data structure, there are different Big O run-times associated with access, search, insertion, and deletion.

For example, accessing an element of an array is very quick (O(1)) because the elements are arranged contiguously in memory and can be reached directly via their index. But the same contiguous arrangement makes deletion much slower: if any given element is deleted, all subsequent elements have to be shifted accordingly, which takes O(n) time in the worst case.
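
The shifting cost can be made visible with a small sketch that deletes an element by hand and counts how many elements move. The helper `delete_at` is a hypothetical function written for this illustration; Python's own `del` and `list.pop(i)` do the equivalent shifting internally.

```python
def delete_at(items, index):
    """Delete items[index] by shifting later elements left; return the shift count."""
    shifts = 0
    for i in range(index, len(items) - 1):
        items[i] = items[i + 1]
        shifts += 1
    items.pop()  # drop the now-duplicated last slot
    return shifts


data = [10, 20, 30, 40, 50]
print(data[2])               # 30 (access by index, no scanning needed)
shifts = delete_at(data, 0)  # deleting the first element is the worst case
print(data, shifts)          # [20, 30, 40, 50] 4
```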

<table style="width:100%">
  <tr>
    <th><img src="photos/dataStO.png" alt="Drawing" style="width:1000px;"/></th>
  </tr>
</table>

# Sort

## Selection Sort
Much like simple search for search algorithms, selection sort is perhaps the most straightforward, ‘brute force’ way to sort your data. Essentially, you go through every element in your list and append each element to a new list in your desired order. For example, if you are interested in sorting a list of numbers from greatest to smallest, you would:

1. Search through the list to find the largest number
2. Add that number to a new list (or swap it into place at the front of the original list)
3. Go to the original list, search through it again to find the next largest number
4. Add that number to the new list and so on…

For selection sort, each pass has to scan the remaining items in the list (up to n steps, just as for a simple search), and there are n passes, because you keep going back to the original list to find the next item to add. Thus, this takes **$O(n^2)$** time. The implementation below sorts in ascending order and in place, so it repeatedly finds the smallest remaining element and swaps it to the front of the unsorted part.

In [5]:
# Python program for implementation of Selection 
# Sort 
A = [64, 25, 12, 22, 11] 
  
# Traverse through all array elements 
for i in range(len(A)): 
      
    # Find the minimum element in remaining  
    # unsorted array 
    min_idx = i 
    for j in range(i+1, len(A)): 
        if A[min_idx] > A[j]: 
            min_idx = j 
              
    # Swap the found minimum element with  
    # the first element of the unsorted part
    A[i], A[min_idx] = A[min_idx], A[i] 

# Driver code to test above 
print("Sorted array") 
for i in range(len(A)): 
    print("%d" % A[i])

Sorted array
11
12
22
25
64
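
The numbered steps at the start of this section describe a slightly different variant: building a new list from greatest to smallest. A minimal sketch of that variant follows; the function name `selection_sort_descending` is an assumption made for this example.

```python
def selection_sort_descending(numbers):
    """Repeatedly move the largest remaining number into a new list."""
    remaining = list(numbers)  # copy so the caller's list is left untouched
    result = []
    while remaining:
        # Step 1: search through the list to find the largest number
        largest_idx = 0
        for j in range(1, len(remaining)):
            if remaining[j] > remaining[largest_idx]:
                largest_idx = j
        # Step 2: add that number to the new list
        result.append(remaining.pop(largest_idx))
    return result


print(selection_sort_descending([64, 25, 12, 22, 11]))  # [64, 25, 22, 12, 11]
```

The outer loop runs n times and the inner scan is at most n steps, which matches the $O(n^2)$ estimate.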


## Quicksort
How would quicksort differ from selection sort? If we work with a list of numbers, just as before:

1. Pick an element from your list, known as the pivot. The selection of a pivot is important in determining how quickly a quicksort algorithm will run. For now, we can select the last element each time as the pivot. (For additional information on pivot selection, I recommend the Stanford Coursera algorithms course.)
2. Partition the list so that all numbers smaller than the pivot are to its left and all numbers greater than the pivot are to its right.
3. For each ‘half’ of the list, you can treat it as a new list with a new pivot and rearrange each half until it is sorted.

<table style="width:100%">
  <tr>
    <th><img src="photos/quickS.png" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

Quicksort is an example of a Divide and Conquer algorithm because it divides the original list into smaller and smaller lists, which are then ordered. These smaller, ordered lists are then combined to result in a larger, ordered list.

Quicksort is unique because its speed is dependent on the pivot selection. At worst, it can take **$O(n^2)$** time, which is as slow as selection sort. However, if the pivot is always some random element in the list, quicksort runs in **O(n log n)** time on average.

In [6]:
# Python program for implementation of Quicksort 
  
# This function takes last element as pivot, places 
# the pivot element at its correct position in sorted 
# array, and places all smaller (smaller than pivot) 
# to left of pivot and all greater elements to right 
# of pivot 
def partition(arr,low,high): 
    i = ( low-1 )         # index of smaller element 
    pivot = arr[high]     # pivot 
  
    for j in range(low , high): 
  
        # If current element is smaller than or 
        # equal to pivot 
        if   arr[j] <= pivot: 
          
            # increment index of smaller element 
            i = i+1 
            arr[i],arr[j] = arr[j],arr[i] 
  
    arr[i+1],arr[high] = arr[high],arr[i+1] 
    return ( i+1 ) 
  
# The main function that implements QuickSort 
# arr[] --> Array to be sorted, 
# low  --> Starting index, 
# high  --> Ending index 
  
# Function to do Quick sort 
def quickSort(arr,low,high): 
    if low < high: 
  
        # pi is partitioning index, arr[pi] is now 
        # at right place 
        pi = partition(arr,low,high) 
  
        # Separately sort elements before 
        # partition and after partition 
        quickSort(arr, low, pi-1) 
        quickSort(arr, pi+1, high) 

In [7]:
  
# Driver code to test above 
arr = [10, 7, 8, 9, 1, 5] 
n = len(arr) 
quickSort(arr,0,n-1) 
print("Sorted array is:") 
for i in range(n): 
    print("%d" % arr[i])

Sorted array is:
1
5
7
8
9
10
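
The implementation above always takes the last element as the pivot. The random pivot mentioned earlier can be sketched by swapping a randomly chosen element into the last position before partitioning. The function names `randomized_partition` and `randomized_quick_sort` are assumptions for this example; the partition logic otherwise mirrors the code above.

```python
import random


def randomized_partition(arr, low, high):
    # Move a randomly chosen element to the end, then partition as before
    pivot_idx = random.randint(low, high)  # both ends are inclusive
    arr[pivot_idx], arr[high] = arr[high], arr[pivot_idx]
    pivot = arr[high]

    i = low - 1  # index of the last element known to be <= pivot
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1


def randomized_quick_sort(arr, low, high):
    if low < high:
        pi = randomized_partition(arr, low, high)
        randomized_quick_sort(arr, low, pi - 1)
        randomized_quick_sort(arr, pi + 1, high)


values = [10, 7, 8, 9, 1, 5]
randomized_quick_sort(values, 0, len(values) - 1)
print(values)  # [1, 5, 7, 8, 9, 10]
```

With a fixed last-element pivot, an already sorted input triggers the $O(n^2)$ worst case; a random pivot makes that outcome unlikely for any particular input.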


## Merge Sort

Like QuickSort, Merge Sort is a Divide and Conquer algorithm. It divides the input array into two halves, calls itself on the two halves, and then merges the two sorted halves. The merge() function is used for merging the two halves. The call merge(arr, l, m, r) is the key process: it assumes that arr[l..m] and arr[m+1..r] are sorted and merges the two sorted sub-arrays into one.

MergeSort(arr[], l,  r)
If r > l
1. Find the middle point to divide the array into two halves:  
             middle m = (l+r)/2
2. Call mergeSort for first half:   
             Call mergeSort(arr, l, m)
3. Call mergeSort for second half:
             Call mergeSort(arr, m+1, r)
4. Merge the two halves sorted in step 2 and 3:
             Call merge(arr, l, m, r)
             
The following diagram from Wikipedia shows the complete merge sort process for an example array {38, 27, 43, 3, 9, 82, 10}. If we take a closer look at the diagram, we can see that the array is recursively divided into two halves until the size becomes 1. Once the size becomes 1, the merge process comes into action and starts merging arrays back until the complete array is merged.

Merge sort runs in **O(n log n)** time because the list is halved repeatedly, giving **O(log n)** levels of recursion, and the merging at each level touches all n items.

<table style="width:100%">
  <tr>
    <th><img src="photos/mergeS.png" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>

In [8]:
# Python program for implementation of MergeSort 
def mergeSort(arr): 
    if len(arr) >1: 
        mid = len(arr)//2 #Finding the mid of the array 
        L = arr[:mid] # Dividing the array elements  
        R = arr[mid:] # into 2 halves 
  
        mergeSort(L) # Sorting the first half 
        mergeSort(R) # Sorting the second half 
  
        i = j = k = 0
          
        # Merge the sorted halves L[] and R[] back into arr[] 
        while i < len(L) and j < len(R): 
            if L[i] < R[j]: 
                arr[k] = L[i] 
                i+=1
            else: 
                arr[k] = R[j] 
                j+=1
            k+=1
          
        # Checking if any element was left 
        while i < len(L): 
            arr[k] = L[i] 
            i+=1
            k+=1
          
        while j < len(R): 
            arr[k] = R[j] 
            j+=1
            k+=1

# Code to print the list 
def printList(arr): 
    for i in range(len(arr)):         
        print(arr[i],end=" ") 
    print() 
  

In [9]:
# driver code to test the above code 
if __name__ == '__main__': 
    arr = [12, 11, 13, 5, 6, 7]  
    print ("Given array is", end="\n")  
    printList(arr) 
    mergeSort(arr) 
    print("Sorted array is: ", end="\n") 
    printList(arr) 

Given array is
12 11 13 5 6 7 
Sorted array is: 
5 6 7 11 12 13 


<table style="width:100%">
  <tr>
    <th><img src="photos/sort.png" alt="Drawing" style="width:1000px;"/></th>
  </tr>
</table>

# Breadth First Search or BFS for a Graph

Breadth First Traversal (or Search) for a graph is similar to Breadth First Traversal of a tree. The only catch here is that, unlike trees, graphs may contain cycles, so we may come to the same node again. To avoid processing a node more than once, we use a boolean visited array. For simplicity, it is assumed that all vertices are reachable from the starting vertex.

For example, in the following graph, we start traversal from vertex 2. When we come to vertex 0, we look for all adjacent vertices of it. 2 is also an adjacent vertex of 0. If we don’t mark visited vertices, then 2 will be processed again and it will become a non-terminating process. A Breadth First Traversal of the following graph is 2, 0, 3, 1.

<table style="width:100%">
  <tr>
    <th><img src="photos/bfs-5.png" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

In [14]:
# Python3 Program to print BFS traversal 
# from a given source vertex. BFS(int s) 
# traverses vertices reachable from s. 
from collections import defaultdict 
  
# This class represents a directed graph 
# using adjacency list representation 
class Graph: 
  
    # Constructor 
    def __init__(self): 
  
        # default dictionary to store graph 
        self.graph = defaultdict(list) 
  
    # function to add an edge to graph 
    def addEdge(self,u,v): 
        self.graph[u].append(v) 
  
    # Function to print a BFS of graph 
    def BFS(self, s): 
  
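        # Mark all the vertices as not visited. The array is sized 
        # from the largest vertex label seen as a source or a 
        # destination, so vertices with no outgoing edges fit too 
        largest = s 
        for u, neighbours in self.graph.items(): 
            largest = max([largest, u] + neighbours) 
        visited = [False] * (largest + 1) 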
  
        # Create a queue for BFS 
        queue = [] 
  
        # Mark the source node as  
        # visited and enqueue it 
        queue.append(s) 
        visited[s] = True
  
        while queue: 
  
            # Dequeue a vertex from  
            # queue and print it 
            s = queue.pop(0) 
            print (s, end = " ") 
  
            # Get all adjacent vertices of the 
            # dequeued vertex s. If a adjacent 
            # has not been visited, then mark it 
            # visited and enqueue it 
            for i in self.graph[s]: 
                if visited[i] == False: 
                    queue.append(i) 
                    visited[i] = True

In [15]:
# Driver code 
  
# Create a graph given in 
# the above diagram 
g = Graph() 
g.addEdge(0, 1) 
g.addEdge(0, 2) 
g.addEdge(1, 2) 
g.addEdge(2, 0) 
g.addEdge(2, 3) 
g.addEdge(3, 3) 
  
print ("Following is Breadth First Traversal"
                  " (starting from vertex 2)") 
g.BFS(2)

Following is Breadth First Traversal (starting from vertex 2)
2 0 3 1 

# Depth First Search or DFS for a Graph
Depth First Traversal (or Search) for a graph is similar to Depth First Traversal of a tree. The only catch here is, unlike trees, graphs may contain cycles, so we may come to the same node again. To avoid processing a node more than once, we use a boolean visited array.

For example, in the following graph, we start traversal from vertex 2. When we come to vertex 0, we look for all adjacent vertices of it. 2 is also an adjacent vertex of 0. If we don’t mark visited vertices, then 2 will be processed again and it will become a non-terminating process. A Depth First Traversal of the following graph is 2, 0, 1, 3.

<table style="width:100%">
  <tr>
    <th><img src="photos/cycle.png" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

In [16]:
# Python program to print DFS traversal from a 
# given graph 
from collections import defaultdict 
  
# This class represents a directed graph using 
# adjacency list representation 
class Graph: 
  
    # Constructor 
    def __init__(self): 
  
        # default dictionary to store graph 
        self.graph = defaultdict(list) 
  
    # function to add an edge to graph 
    def addEdge(self,u,v): 
        self.graph[u].append(v) 
  
    # A function used by DFS 
    def DFSUtil(self,v,visited): 
  
        # Mark the current node as visited and print it 
        visited[v]= True
        print (v)
  
        # Recur for all the vertices adjacent to this vertex 
        for i in self.graph[v]: 
            if visited[i] == False: 
                self.DFSUtil(i, visited) 
  
  
    # The function to do DFS traversal. It uses 
    # recursive DFSUtil() 
    def DFS(self,v): 
  
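        # Mark all the vertices as not visited. The array is sized 
        # from the largest vertex label seen as a source or a 
        # destination, so vertices with no outgoing edges fit too 
        largest = v 
        for u, neighbours in self.graph.items(): 
            largest = max([largest, u] + neighbours) 
        visited = [False]*(largest + 1) 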
  
        # Call the recursive helper function to print 
        # DFS traversal 
        self.DFSUtil(v,visited) 
  

In [17]:
# Driver code 
# Create a graph given in the above diagram 
g = Graph() 
g.addEdge(0, 1) 
g.addEdge(0, 2) 
g.addEdge(1, 2) 
g.addEdge(2, 0) 
g.addEdge(2, 3) 
g.addEdge(3, 3) 
  
print ("Following is DFS from (starting from vertex 2)")
g.DFS(2) 

Following is DFS from (starting from vertex 2)
2
0
1
3


### BFS vs. DFS

The choice between BFS and DFS (and the associated run-times) depends on the data and on the graph/tree structure.

Time complexity is the same for both algorithms. In both BFS and DFS, every node is visited exactly once. The Big O time is **O(n)** for a tree with n nodes. For a general graph, where every edge is also examined, this becomes O(V + E) for V vertices and E edges.

However, the space complexity for these algorithms varies.

For BFS, which traverses all nodes at a given depth in the tree and uses a queue implementation, the width of the tree matters. The space complexity for **BFS is O(w)** where w is the maximum width of the tree.

For DFS, which goes along a single ‘branch’ all the way down and uses a stack implementation, the height of the tree matters. The space complexity for **DFS is O(h)** where h is the maximum height of the tree.
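
The difference can be observed on a small tree by recording how large the BFS queue gets and how deep the DFS recursion goes. This is a sketch under assumptions: the tree is a hypothetical one stored as a dictionary of child lists, and the helper names `bfs_max_queue` and `dfs_max_depth` are made up for the example. Depth is counted in nodes along the path, so it is one more than the height counted in edges.

```python
from collections import deque

# Hypothetical perfect binary tree: node -> list of children
tree = {1: [2, 3], 2: [4, 5], 3: [6, 7], 4: [], 5: [], 6: [], 7: []}


def bfs_max_queue(tree, root):
    """Return the BFS visiting order and the largest queue size seen."""
    queue = deque([root])
    max_size = 1
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
        max_size = max(max_size, len(queue))
    return order, max_size


def dfs_max_depth(tree, node, depth=1, order=None):
    """Return the DFS visiting order and the deepest recursion level reached."""
    if order is None:
        order = []
    order.append(node)
    deepest = depth
    for child in tree[node]:
        _, child_depth = dfs_max_depth(tree, child, depth + 1, order)
        deepest = max(deepest, child_depth)
    return order, deepest


print(bfs_max_queue(tree, 1))  # ([1, 2, 3, 4, 5, 6, 7], 4)
print(dfs_max_depth(tree, 1))  # ([1, 2, 4, 5, 3, 6, 7], 3)
```

Here the BFS queue peaks at 4 entries, the width of the bottom level, while the DFS recursion never holds more than 3 nodes, the length of one root-to-leaf path.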

<table style="width:100%">
  <tr>
    <th><img src="photos/tree.png" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

# Dijkstra’s shortest path algorithm

Given a graph and a source vertex in the graph, find the shortest paths from the source to all vertices in the given graph.

We generate an SPT (shortest path tree) with the given source as root. We maintain two sets: one contains the vertices included in the shortest path tree, and the other contains the vertices not yet included. At every step of the algorithm, we find a vertex which is in the second set (not yet included) and has the minimum distance from the source.

Below are the detailed steps used in Dijkstra’s algorithm to find the shortest path from a single source vertex to all other vertices in the given graph.

**Algorithm**

1) Create a set sptSet (shortest path tree set) that keeps track of the vertices included in the shortest path tree, i.e., those whose minimum distance from the source has been calculated and finalized. Initially, this set is empty.
2) Assign a distance value to every vertex in the input graph. Initialize all distance values as INFINITE. Assign a distance value of 0 to the source vertex so that it is picked first.
3) While sptSet doesn’t include all vertices:
    - a) Pick a vertex u that is not in sptSet and has the minimum distance value.
    - b) Add u to sptSet.
    - c) Update the distance values of all vertices adjacent to u. To do so, iterate through all adjacent vertices: for every adjacent vertex v, if the sum of the distance value of u (from the source) and the weight of edge u-v is less than the distance value of v, then update the distance value of v to that sum.
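
The steps above can be sketched almost line by line in Python. This is only an illustrative sketch: the four-vertex `graph` dictionary and the function name `shortest_distances` are invented for the example, and the graph is stored as a dictionary of neighbours rather than in any particular representation the algorithm requires.

```python
import math

# Hypothetical weighted undirected graph: vertex -> {neighbour: edge weight}.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 6},
    "C": {"A": 4, "B": 2, "D": 3},
    "D": {"B": 6, "C": 3},
}

def shortest_distances(graph, source):
    spt_set = set()                               # step 1: nothing finalized yet
    dist = {v: math.inf for v in graph}           # step 2: every distance is INFINITE...
    dist[source] = 0                              # ...except the source
    while len(spt_set) < len(graph):              # step 3
        # 3a: the closest vertex that is not yet in spt_set
        u = min((v for v in graph if v not in spt_set), key=dist.get)
        if dist[u] == math.inf:
            break                                 # the remaining vertices are unreachable
        spt_set.add(u)                            # 3b
        for v, weight in graph[u].items():        # 3c: update the neighbours of u
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight
    return dist

result = shortest_distances(graph, "A")
print(result)
```

For this graph the direct edge A-C (weight 4) loses to the route A-B-C (1 + 2 = 3), which is exactly the kind of improvement step 3c is meant to catch.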

Let us understand with the following example:


<table style="width:100%">
  <tr>
    <th><img src="photos/Fig-11.jpg" alt="Drawing" style="width:600px;"/></th>
  </tr>
</table>

- The set sptSet is initially empty and the distances assigned to the vertices are {0, INF, INF, INF, INF, INF, INF, INF}, where INF indicates infinite. Now pick the vertex with the minimum distance value. Vertex 0 is picked and included in sptSet, so sptSet becomes {0}. After including 0 in sptSet, update the distance values of its adjacent vertices. The adjacent vertices of 0 are 1 and 7, and their distance values are updated to 4 and 8. The following subgraph shows the vertices and their distance values; only the vertices with finite distance values are shown. The vertices included in the SPT are shown in green.
 
- Pick the vertex with the minimum distance value that is not already included in the SPT (not in sptSet). Vertex 1 is picked and added to sptSet, so sptSet now becomes {0, 1}. Update the distance values of the adjacent vertices of 1. The distance value of vertex 2 becomes 12.

- Pick the vertex with the minimum distance value that is not already included in the SPT (not in sptSet). Vertex 7 is picked, so sptSet now becomes {0, 1, 7}. Update the distance values of the adjacent vertices of 7. The distance values of vertices 6 and 8 become finite (9 and 15 respectively).

- Pick the vertex with the minimum distance value that is not already included in the SPT (not in sptSet). Vertex 6 is picked, so sptSet now becomes {0, 1, 7, 6}. Update the distance values of the adjacent vertices of 6. The distance value of vertex 5 becomes 11. Vertex 8 is also checked, but it stays at 15 because the path through 6 (9 + 6 = 15) is no shorter.

- We repeat the above steps until sptSet includes all vertices of the given graph. Finally, we get the following Shortest Path Tree (SPT).



<table style="width:100%">
  <tr>
    <th><img src="photos/MST1.jpg" alt="Drawing" style="width:200px;"/></th>
    <th><img src="photos/DIJ2.jpg" alt="Drawing" style="width:200px;"/></th>
  </tr>
  <tr>
    <th><img src="photos/DIJ3.jpg" alt="Drawing" style="width:200px;"/></th>
    <th><img src="photos/DIJ4.jpg" alt="Drawing" style="width:200px;"/></th>
  </tr>
  <tr>
    <th><img src="photos/DIJ5.jpg" alt="Drawing" style="width:800px;"/></th>
  </tr>
</table>
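
As a rough way to check the walkthrough by hand, the sketch below replays it on the adjacency matrix used by the driver program in this section. It also records, for each vertex, the vertex it was last updated from, which gives the edges of the shortest path tree. The function name `dijkstra_trace` and the `parent` list are additions for illustration only and are not part of the notebook's `Graph` class.

```python
INF = float("inf")

# Adjacency matrix copied from the driver program in this section
# (0 means "no edge").
graph = [[0, 4, 0, 0, 0, 0, 0, 8, 0],
         [4, 0, 8, 0, 0, 0, 0, 11, 0],
         [0, 8, 0, 7, 0, 4, 0, 0, 2],
         [0, 0, 7, 0, 9, 14, 0, 0, 0],
         [0, 0, 0, 9, 0, 10, 0, 0, 0],
         [0, 0, 4, 14, 10, 0, 2, 0, 0],
         [0, 0, 0, 0, 0, 2, 0, 1, 6],
         [8, 11, 0, 0, 0, 0, 1, 0, 7],
         [0, 0, 2, 0, 0, 0, 6, 7, 0]]

def dijkstra_trace(graph, src):
    n = len(graph)
    dist = [INF] * n
    parent = [None] * n      # parent[v] = vertex that last improved dist[v]
    in_spt = [False] * n
    dist[src] = 0
    order = []               # order in which vertices join the SPT
    for _ in range(n):
        candidates = [v for v in range(n) if not in_spt[v] and dist[v] < INF]
        if not candidates:
            break            # everything left is unreachable
        u = min(candidates, key=lambda v: dist[v])
        in_spt[u] = True
        order.append(u)
        for v in range(n):
            w = graph[u][v]
            if w > 0 and not in_spt[v] and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                parent[v] = u
        print("picked", u, "-> dist =", dist)
    return order, dist, parent

order, dist, parent = dijkstra_trace(graph, 0)
tree_edges = [(parent[v], v) for v in range(len(graph)) if parent[v] is not None]
print("pick order:", order)
print("tree edges:", tree_edges)
```

The first four picks match the walkthrough (0, 1, 7, 6), and the printed distances after each pick can be compared against the bullet points one step at a time.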

In [18]:
# Python program for Dijkstra's single  
# source shortest path algorithm. The program is  
# for adjacency matrix representation of the graph 
  
# sys.maxsize is used as a stand-in for infinity 
import sys 
  
class Graph(): 
  
    def __init__(self, vertices): 
        self.V = vertices 
        self.graph = [[0 for column in range(vertices)]  
                      for row in range(vertices)] 
  
    def printSolution(self, dist): 
        print ("Vertex Distance from Source")
        for node in range(self.V): 
            print (node,"\t \t",dist[node])
  
    # A utility function to find the vertex with
    # minimum distance value, from the set of vertices
    # not yet included in shortest path tree
    def minDistance(self, dist, sptSet):

        # Initialize minimum distance for next node (INF)
        min1 = sys.maxsize
        # Stays None if every remaining vertex is unreachable
        min_index = None

        # Search for the nearest vertex not yet in the
        # shortest path tree
        for v in range(self.V):
            if dist[v] < min1 and not sptSet[v]:
                min1 = dist[v]
                min_index = v

        return min_index

    # Function that implements Dijkstra's single source
    # shortest path algorithm for a graph represented
    # using adjacency matrix representation
    def dijkstra(self, src):

        dist = [sys.maxsize] * self.V
        dist[src] = 0
        sptSet = [False] * self.V

        for _ in range(self.V):

            # Pick the minimum distance vertex from
            # the set of vertices not yet processed.
            # u is always equal to src in first iteration
            u = self.minDistance(dist, sptSet)

            # No reachable vertex is left, so stop early
            if u is None:
                break

            # Put the minimum distance vertex in the
            # shortest path tree
            sptSet[u] = True

            # Update dist value of the adjacent vertices
            # of the picked vertex only if the current
            # distance is greater than new distance and
            # the vertex is not in the shortest path tree
            for v in range(self.V):
                if (self.graph[u][v] > 0 and not sptSet[v]
                        and dist[v] > dist[u] + self.graph[u][v]):
                    dist[v] = dist[u] + self.graph[u][v]

        self.printSolution(dist)

In [19]:
# Driver program 
g = Graph(9) 
g.graph = [[0, 4, 0, 0, 0, 0, 0, 8, 0], 
           [4, 0, 8, 0, 0, 0, 0, 11, 0], 
           [0, 8, 0, 7, 0, 4, 0, 0, 2], 
           [0, 0, 7, 0, 9, 14, 0, 0, 0], 
           [0, 0, 0, 9, 0, 10, 0, 0, 0], 
           [0, 0, 4, 14, 10, 0, 2, 0, 0], 
           [0, 0, 0, 0, 0, 2, 0, 1, 6], 
           [8, 11, 0, 0, 0, 0, 1, 0, 7], 
           [0, 0, 2, 0, 0, 0, 6, 7, 0] 
          ]

g.dijkstra(0)

Vertex Distance from Source
0 	 	 0
1 	 	 4
2 	 	 12
3 	 	 19
4 	 	 21
5 	 	 11
6 	 	 9
7 	 	 8
8 	 	 14
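
The adjacency-matrix program above scans all V vertices to find the minimum on each of its V iterations, so it takes O(V^2) time. A common alternative, sketched here under the assumption that the graph is stored as adjacency lists, keeps the candidate vertices in a binary heap using Python's standard `heapq` module, which brings the running time to roughly O((V + E) log V). The names `edges`, `adj` and `dijkstra_heap` are hypothetical and are not part of the `Graph` class; the edge list is the same graph as the matrix in the driver program.

```python
import heapq

# The same 9-vertex undirected graph as the matrix above,
# written as (u, v, weight) edges.
edges = [(0, 1, 4), (0, 7, 8), (1, 2, 8), (1, 7, 11), (2, 3, 7),
         (2, 8, 2), (2, 5, 4), (3, 4, 9), (3, 5, 14), (4, 5, 10),
         (5, 6, 2), (6, 7, 1), (6, 8, 6), (7, 8, 7)]

# Build adjacency lists: vertex -> list of (neighbour, weight).
adj = {v: [] for v in range(9)}
for u, v, w in edges:
    adj[u].append((v, w))
    adj[v].append((u, w))

def dijkstra_heap(adj, src):
    dist = {v: float("inf") for v in adj}
    dist[src] = 0
    heap = [(0, src)]        # (distance so far, vertex)
    done = set()             # plays the role of sptSet
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue         # stale entry: u was already finalized
        done.add(u)
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

heap_dist = dijkstra_heap(adj, 0)
print([heap_dist[v] for v in range(9)])
```

Instead of decreasing a key in place, this sketch pushes a fresh entry whenever a distance improves and skips outdated entries when they are popped, which keeps the code short at the cost of a slightly larger heap.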
