# Graphs

* For any problem:
    * figure out what kind of data structure you'll need 
        * think about the operations you'll need to perform 
    * then think about the algorithm 
        * can the problem be broken down into a simple repetitive form 
            * either recursively or iteratively 

* Linear data structures are: arrays, linked lists, stacks, queues 
* Non-linear data structures are: trees, graphs 
* Graphs:
    * Collection of nodes/vertices connected via links/edges 
    * There are no specific rules for connecting nodes to each other 
    * *A graph G is an ordered pair of a set V of vertices and a set E of edges:* **G = (V, E)**
    * In an ordered pair, (a,b) != (b,a) if a != b
    * A directed edge is a one-way connection, represented as an ordered pair (a,b)
    * An undirected edge is a two-way connection, represented as an unordered pair {a,b}
* A graph in which all edges are directed is called a directed graph 
* Similarly, a graph in which all edges are undirected is called an undirected graph 
* A graph can represent any information that has pairwise relationships 
* A social network like Facebook can be represented as an undirected graph where users are the nodes 
    * if two people are friends, there is an edge connecting them 
* Pages on the World Wide Web can be represented as a directed graph, since a link from one page to another is one-way 
    * web crawling is graph traversal 
* Weighted vs unweighted graphs
    * every graph can be treated as weighted
    * an unweighted graph is one whose edges all have the same weight 
* In trees:
    * for n nodes, we have n-1 links
    * There is exactly one path from the root to any node 
    * A tree is a special kind of graph 
    
-----------------------------------------------------------------------------------
* Properties of graph 
    * G = (V, E), |V| = # vertices, |E| = # edges
    * Self-edge: an edge whose two endpoints are the same vertex, also called a self-loop 
        * the same node is both the origin and the destination 
    * Multi-edge: an edge that occurs more than once in a graph, also called a parallel edge
    * Simple graph: a graph with no self-edges or multi-edges 
    * Number of edges in a simple graph: 
        * for N vertices, maximum # edges = **N x (N-1)** for a directed graph 
        * in an undirected graph, there is only one bidirectional edge between a pair of nodes
        * so in an undirected graph, maximum # edges = **(N x (N-1))/2**
        
    * The set of edges can be empty, so the minimum number of edges in a graph is zero.
    * The maximum number of edges is close to the square of # vertices
    * A dense graph has close to the maximum # edges
    * A sparse graph has close to the minimum # edges
    * Path: a sequence of vertices where each adjacent pair is connected by an edge
        * a path in this general sense, where vertices and edges may repeat, is also called a walk
    * Simple path: a path in which no vertices (and therefore no edges) are repeated 
    * Trail: a walk in which vertices may be repeated but no edge is repeated 
    * Strongly connected graph: there is a path from any vertex to any other vertex 
        * for an undirected graph, we simply say it is connected 
        * for a directed graph, we say it is strongly connected
        * weakly connected: a directed graph that becomes connected when all its edges are treated as undirected 
    * Closed walk: a walk that starts and ends at the same vertex 
    * **Length of a walk**: # edges in the walk 
    * Simple cycle: a closed walk of length greater than zero with no repetition other than the start and end vertex 
    * Acyclic graph: a graph with no cycles 
    * A tree is an example of a connected, undirected acyclic graph 
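
The edge-count limits above can be checked with a few lines of code. This is a minimal sketch that assumes a simple graph (no self-loops or multi-edges); the function names are illustrative and not taken from any library.

```cpp
#include <cstdint>

// Maximum number of edges in a simple directed graph with n vertices: n(n-1).
std::uint64_t max_edges_directed(std::uint64_t n) {
    return n == 0 ? 0 : n * (n - 1);
}

// Maximum number of edges in a simple undirected graph with n vertices: n(n-1)/2.
std::uint64_t max_edges_undirected(std::uint64_t n) {
    return max_edges_directed(n) / 2;
}

// Fraction of the possible undirected edges that are present
// (0 for an empty graph, 1 for a complete graph).
double density_undirected(std::uint64_t n, std::uint64_t edges) {
    std::uint64_t max_e = max_edges_undirected(n);
    return max_e == 0 ? 0.0 : static_cast<double>(edges) / static_cast<double>(max_e);
}
```

A density near 1 corresponds to what these notes call a dense graph, and a density near 0 to a sparse one.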
   
---------------------------------------------------------------------------
 
* Implementation of a graph (edge list representation) 
    * create two lists, one for vertices and one for edges 
    * an edge is identified by its two endpoints 
* In C, character pointers are used to store strings 
* for a directed graph, (h,f) and (f,h) are two different edges 
* for an undirected graph, they are the same edge
* for N nodes:
    * for an undirected graph, maximum # edges = N x (N-1)/2
    * for a directed graph, maximum # edges = N x (N-1) 
* since # edges can grow as the square of # vertices, a running time or space cost of O(#edges) is considered expensive for a graph, so we try to keep the cost to O(#vertices)
* with an edge list, finding the nodes adjacent to a given node and checking whether two nodes are connected both have complexity O(#edges), because every edge may have to be scanned
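
A minimal sketch of the edge list representation described above. The struct and member names (`EdgeListGraph`, `neighbors`, `connected`) are made up for illustration, and `std::string` stands in for the C character pointers mentioned in the notes. Both queries scan the whole edge list, which is where the O(#edges) cost comes from.

```cpp
#include <string>
#include <utility>
#include <vector>

// Edge-list graph: vertices are stored by name, edges as pairs of vertex indices.
struct EdgeListGraph {
    std::vector<std::string> vertices;
    std::vector<std::pair<int, int>> edges;  // (from, to) indices into `vertices`
    bool directed = false;

    // Returns the indices of all vertices adjacent to u by scanning every edge: O(|E|).
    std::vector<int> neighbors(int u) const {
        std::vector<int> result;
        for (const auto& e : edges) {
            if (e.first == u) result.push_back(e.second);
            else if (!directed && e.second == u) result.push_back(e.first);
        }
        return result;
    }

    // Checks whether there is an edge from u to v, also by scanning: O(|E|).
    bool connected(int u, int v) const {
        for (const auto& e : edges) {
            if (e.first == u && e.second == v) return true;
            if (!directed && e.first == v && e.second == u) return true;
        }
        return false;
    }
};
```

With `directed` set to `true`, the pair (h,f) no longer matches a query for (f,h), which mirrors the distinction between directed and undirected edges.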

* Graph representation
    * connections are stored in a 2D array, with 1 representing a connection and 0 representing no connection 
    * for an undirected graph, the matrix is symmetric about the main diagonal: A[i][j] = A[j][i]
    * in this case, we can store just half of the matrix to save space 
* Using a 2D array (a matrix) to represent connections or weights is called an adjacency matrix 
    * To find the nodes adjacent to a vertex, we first have to find its index in the vertex list, which is O(v)
    * then we go to that particular row in the 2D matrix and go over each column, O(v)
    * overall cost: O(v)
    * if we already know the index, the lookup step is constant time, O(1)
    * to see if a connection exists, we just have to check A[i][j], which is O(1) time 
    * we lose on space complexity because the 2D matrix occupies O(v^2) space 
    * An adjacency matrix is a good choice when the graph is dense 
--------------------------------------------------------------------------------
* Adjacency list 
    * for sparse graphs, a more space-efficient approach is an array of arrays of different sizes, where each inner array contains only the nodes connected to that vertex


#### Udemy -------------------------------------------------------------------------

* graph G = (V, E), V = vertices, E = edges
* indegree = # edges coming into a node
* outdegree = # edges going out of a node 
* two vertices connected by a single edge are called adjacent vertices 
* a graph with no parallel edges or self-loops is called a simple graph 
* "graph" by default refers to an undirected graph 
* a directed graph is called a digraph 
* # edges touching a node in an undirected graph is called its degree
* an articulation point is a node that, if removed, splits the graph into multiple disconnected components
* a strongly connected graph is a directed graph where there is a path from every node to every other node 
* an undirected graph with the same property is simply called connected.
* a path is a sequence of vertices, joined by edges, that leads from one vertex to another 
* a path that starts and ends at the same node is called a cycle 
* a directed acyclic graph (DAG) is a directed graph with no cycles
* if you can arrange the vertices of a directed graph in a line so that every edge goes in the forward direction, the arrangement is called a topological ordering of the vertices. This is only possible for a directed acyclic graph 
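
The topological ordering mentioned in the last point can be computed from indegrees. The sketch below uses Kahn's algorithm, which is one common technique and not necessarily the one the course uses. It assumes vertices are numbered from 0 and that `adj[u]` lists the vertices `u` points to; `topological_order` is an illustrative name.

```cpp
#include <queue>
#include <vector>

// Kahn's algorithm. adj[u] lists the vertices that u points to.
// Returns a topological order, or an empty vector if the graph has a cycle.
std::vector<int> topological_order(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> indegree(n, 0);
    for (int u = 0; u < n; ++u)
        for (int v : adj[u]) ++indegree[v];

    std::queue<int> ready;  // vertices with no remaining incoming edges
    for (int u = 0; u < n; ++u)
        if (indegree[u] == 0) ready.push(u);

    std::vector<int> order;
    while (!ready.empty()) {
        int u = ready.front();
        ready.pop();
        order.push_back(u);
        // Removing u removes its outgoing edges, which may free its successors.
        for (int v : adj[u])
            if (--indegree[v] == 0) ready.push(v);
    }
    if (static_cast<int>(order.size()) != n) order.clear();  // a cycle blocked some vertices
    return order;
}
```

If the graph contains a cycle, the vertices on it never reach indegree zero, so the function returns an empty vector. This matches the statement that a topological ordering exists only for a directed acyclic graph.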


#### Representing an undirected graph 
1. Adjacency matrix 
2. Adjacency list 
3. Compact list 


#### Adjacency matrix 
* we use a 2D array with the same number of rows and columns (#vertices)
* We mark each spot that represents a connection with 1, otherwise 0 
$$
\begin{vmatrix}
0&1&1&1\\
1&0&1&0\\
1&1&0&1\\
0&0&1&0\\
\end{vmatrix}
$$
* if the graph is weighted, the edge weight is stored in place of the 1, and the matrix is then called a cost adjacency matrix 
* Diagonal entries represent self-loops
* total entries to visit while scanning the whole matrix is $n \times n$
* time taken: O(n^2)
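
A minimal sketch of an adjacency matrix for an undirected graph, assuming vertices are numbered from 0. The class name `AdjacencyMatrix` and its methods are illustrative only.

```cpp
#include <vector>

// Undirected graph stored as an n x n adjacency matrix of 0/1 entries.
class AdjacencyMatrix {
public:
    explicit AdjacencyMatrix(int n) : a_(n, std::vector<int>(n, 0)) {}

    // Mark both (u,v) and (v,u), since the matrix of an undirected graph is symmetric.
    void add_edge(int u, int v) { a_[u][v] = 1; a_[v][u] = 1; }
    void remove_edge(int u, int v) { a_[u][v] = 0; a_[v][u] = 0; }

    // O(1) edge lookup.
    bool has_edge(int u, int v) const { return a_[u][v] == 1; }

    // O(n) scan of row u.
    std::vector<int> neighbors(int u) const {
        std::vector<int> result;
        for (int v = 0; v < static_cast<int>(a_.size()); ++v)
            if (a_[u][v] == 1) result.push_back(v);
        return result;
    }

private:
    std::vector<std::vector<int>> a_;
};
```

Checking a single edge is one array access, while listing neighbors always costs a full row scan regardless of how many neighbors exist.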


#### Adjacency list
* we use an array of linked lists 
* each list in the array holds the connections of one vertex 
* each row of the adjacency matrix becomes one slot in the array
* each linked list holds the column indices whose values are 1 in that row 
* time taken = size of array + each edge represented twice: O(n+2e)
* for a weighted graph, we include the weight in each linked list node 
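
As a rough sketch of the array of linked lists described above, assuming 0-based vertex numbers. `AdjacencyList` is an illustrative name; `std::list` is used here to mirror the notes, although `std::vector` is a common alternative for the inner lists.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Undirected graph as an array of linked lists: adj[u] holds the neighbors of u.
struct AdjacencyList {
    std::vector<std::list<int>> adj;

    explicit AdjacencyList(int n) : adj(n) {}

    // Each undirected edge is stored twice, once in each endpoint's list.
    void add_edge(int u, int v) {
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    // The degree of u is the length of its list.
    std::size_t degree(int u) const { return adj[u].size(); }

    // Total number of list entries; equals 2 * |E| for an undirected graph.
    std::size_t total_entries() const {
        std::size_t total = 0;
        for (const auto& neighbors : adj) total += neighbors.size();
        return total;
    }
};
```

Walking the whole structure touches the n array slots plus 2e list entries, which is the n + 2e figure given above.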

#### Compact list 
* we use a single one-dimensional array to represent the graph 
* size of the array is $|V| + 2|E| + 1$; an extra 1 is added if indices start from 0 (index 0 is then left unused)
* the first |V| elements represent the vertices 
* the adjacent vertices follow, separated from the first |V| elements by one slot that contains the end index of the array 
    * if the array runs from 0 to 20, then 21 is stored in that slot 
* the first |V| elements contain the indices where each vertex's adjacent vertices start in the array 
* space consumed $= |V| + 2|E|$, which is O(n + e): the space is linear rather than quadratic as with the adjacency matrix
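
A minimal sketch of building such a compact list from an adjacency list. It assumes a fully 0-based layout (vertices and array indices both start at 0), which may differ from the 1-based layout used in the lecture; the function names are illustrative.

```cpp
#include <vector>

// Packs an adjacency list into one array (0-based layout, an assumption of this sketch):
//   compact[0 .. n-1]  start index of each vertex's neighbor block
//   compact[n]         one past the last used index (the end marker)
//   compact[n+1 ..]    the neighbor blocks, back to back
std::vector<int> build_compact_list(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> compact(n + 1, 0);
    int next = n + 1;  // first free slot after the index section
    for (int u = 0; u < n; ++u) {
        compact[u] = next;
        next += static_cast<int>(adj[u].size());
    }
    compact[n] = next;
    for (int u = 0; u < n; ++u)
        compact.insert(compact.end(), adj[u].begin(), adj[u].end());
    return compact;
}

// The neighbors of u lie in compact[compact[u] .. compact[u+1] - 1].
std::vector<int> compact_neighbors(const std::vector<int>& compact, int u) {
    return std::vector<int>(compact.begin() + compact[u], compact.begin() + compact[u + 1]);
}
```

For an undirected graph with 4 vertices and 3 edges, the resulting array has 4 + 2*3 + 1 = 11 elements. The end marker lets the last vertex find the end of its block the same way every other vertex does.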


#### Representing directed graphs

#### adjacency matrix 
* each row represents the connections of one vertex: row 1 represents the connections of vertex 1
* Only one-way connections are recorded in the matrix, so each edge is counted only once
* The row of a vertex tells which connections are going out of it; its column tells which connections are coming in
* space required is O(n^2)
* worst-case time is O(n^2)
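
To illustrate how degrees are read off a directed adjacency matrix, here is a small sketch. It assumes the convention that `a[i][j] == 1` means an edge from `i` to `j`, with 0-based vertex numbers; the function names are illustrative.

```cpp
#include <vector>

// a[i][j] == 1 means there is an edge i -> j.
// Out-degree of v: count the 1s in row v.
int out_degree(const std::vector<std::vector<int>>& a, int v) {
    int count = 0;
    for (int x : a[v]) count += x;
    return count;
}

// In-degree of v: count the 1s in column v.
int in_degree(const std::vector<std::vector<int>>& a, int v) {
    int count = 0;
    for (const auto& row : a) count += row[v];
    return count;
}
```

Each query scans one row or one column, so it costs O(n) per vertex.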

#### adjacency list 
* for outgoing connections, just count the number of nodes in that vertex's linked list 
* only outgoing connections are appended to the lists 
* for incoming connections, count how many times the vertex index appears by traversing all the linked lists
* space complexity is O(n+e)
 
#### inverse adjacency list
* only incoming connections are appended to the list 
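
An inverse adjacency list can be derived from the ordinary one in a single pass. A minimal sketch, assuming 0-based vertex numbers and `std::vector` in place of linked lists; `inverse_adjacency_list` is an illustrative name.

```cpp
#include <vector>

// Given out[u] = vertices that u points to, build in[v] = vertices that point to v.
std::vector<std::vector<int>> inverse_adjacency_list(const std::vector<std::vector<int>>& out) {
    std::vector<std::vector<int>> in(out.size());
    for (int u = 0; u < static_cast<int>(out.size()); ++u)
        for (int v : out[u]) in[v].push_back(u);  // edge u -> v is incoming for v
    return in;
}
```

With the inverse list available, the indegree of a vertex is simply the length of its list, so there is no need to traverse every list to count incoming connections.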



#### Breadth first search (BFS)
* it is similar to level-order traversal of a binary tree 
* we convert the graph to a tree and visit it level by level 
* you can make any vertex the root 
* you can start the traversal from any vertex you want
* you must explore a vertex completely when you visit it
* when you explore a vertex, visit all its adjacent vertices 

* bfs:
    1. visiting a vertex 
    2. exploring (visiting all adjacent vertices of that vertex)
* method:
    * visit the starting vertex 
    * drop it in a queue 
    * pop a vertex from the queue 
    * visit all its unvisited adjacent vertices 
    * drop them in the queue individually 
    * repeat the last three steps until the queue is empty 
* the resulting tree is called the breadth first search spanning tree 
* the edges of the graph that are not part of the tree are called cross edges 
* cross edges connect vertices at most one level up or down, no more 
* breadth first search visits each vertex once; the time complexity is O(n + e) with an adjacency list and O(n^2) with an adjacency matrix

* Implementation 
    * using an adjacency matrix 
    * you must keep track of which vertices are visited and which are yet to be visited 
    * a queue to enqueue and dequeue 



#### Depth first search
1. visiting 
2. exploring 

* implementation:
    * use a stack
    * use any vertex as the starting vertex 
    * when you encounter a new node, push the current one onto the stack for future exploration and start exploring the new node 
    * the tree drawn is called the DFS spanning tree
    * the remaining edges, which connect a vertex back to one of its ancestors, are called back edges
    * the visiting order may differ from run to run, but visiting all nodes is what matters
* when recursion is used, the call stack plays the role of the stack. 

```cpp
#include <iostream>
#include <queue>
#include <vector>
using namespace std;

// print vector
void pv(const vector<int>& v){
    for(auto i : v){
        cout << i << " ";
    }
}

// bfs over an adjacency matrix; prints vertices in the order they are visited
void bfs(const vector<vector<int>>& g, int start){
    queue<int> q;
    vector<int> visited(g.size(), 0);   // sized from the graph instead of a fixed 7
    cout << start << " ";
    visited[start] = 1;
    q.push(start);
    while(!q.empty()){
        int i = q.front();
        q.pop();
        for(int j = 0; j < (int)g.size(); j++){
            if(g[i][j] == 1 && visited[j] == 0){
                cout << j << " ";
                visited[j] = 1;
                q.push(j);
            }
        }
    }
}

// recursive dfs; the caller owns `visited`, so dfs can be run more than once
void dfs(const vector<vector<int>>& g, int start, vector<int>& visited){
    if(visited[start] == 0){
        visited[start] = 1;
        pv(visited);
        cout << "start " << start << endl;
        for(int j = 0; j < (int)g.size(); j++){
            if(g[start][j] == 1 && visited[j] == 0){
                cout << "going forward " << "visiting g:" << start << "," << j << endl;
                dfs(g, j, visited);
                cout << "going backward " << "visiting g:" << start << "," << j << endl;
            }
        }
    }
}

int main(){
    // adjacency matrix; row and column 0 are unused so that vertices are numbered 1..6
    vector<vector<int>> graph ={{0,0,0,0,0,0,0},
                                {0,0,1,1,0,0,0},
                                {0,1,0,0,1,0,0},
                                {0,1,0,0,1,0,0},
                                {0,0,1,1,0,1,1},
                                {0,0,0,0,1,0,0},
                                {0,0,0,0,1,0,0}
                                };
    // bfs(graph, 3);
    vector<int> visited(graph.size(), 0);
    dfs(graph, 2, visited);
    return 0;
}
```
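
The depth first search notes describe using a stack, while the code above relies on recursion. As a complement, here is a minimal sketch of the same traversal with an explicit `std::stack`. The name `dfs_iterative` is illustrative, and the function returns the visiting order instead of printing it.

```cpp
#include <stack>
#include <vector>

// Iterative DFS over an adjacency matrix; returns vertices in the order they are first visited.
std::vector<int> dfs_iterative(const std::vector<std::vector<int>>& g, int start) {
    const int n = static_cast<int>(g.size());
    std::vector<int> visited(n, 0);
    std::vector<int> order;
    std::stack<int> pending;
    pending.push(start);
    while (!pending.empty()) {
        int u = pending.top();
        pending.pop();
        if (visited[u]) continue;  // a vertex may have been pushed more than once
        visited[u] = 1;
        order.push_back(u);
        // Push neighbors in reverse so the lowest-numbered neighbor is explored first,
        // as in a recursive version that scans columns from low to high.
        for (int v = n - 1; v >= 0; --v)
            if (g[u][v] == 1 && !visited[v]) pending.push(v);
    }
    return order;
}
```

On the 7 x 7 matrix used in `main` above, starting from vertex 2, this visits the vertices in the order 2, 1, 3, 4, 5, 6.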

#### spanning trees

* G = (V, E)
* a spanning tree is a subgraph of a graph 
    * where the vertices are the same as in the graph but the number of edges is (#vertices - 1) 
    * all vertices should be connected
    * there should not be any cycle 
* total number of spanning trees of a graph: combination(#edges, #vertices - 1) - (number of those edge selections that contain a cycle) 
    * e.g. for the complete graph with 4 vertices and 6 edges: C(6,3) - 4 = 20 - 4 = 16 
* When the order doesn't matter, it is a combination; when the order does matter, it is a permutation.
* $nCr = \frac{n!}{r!\,(n-r)!}$
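
The count for a small graph can be checked by brute force: try every selection of (#vertices - 1) edges and keep the selections that contain no cycle. The sketch below does this with a simple union-find. It is exponential in the number of edges and assumes fewer than 32 edges, so it is only meant for tiny graphs; the function names are illustrative.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Union-find "find" with path halving.
int find_root(std::vector<int>& parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

// Counts spanning trees by trying every subset of exactly n-1 edges
// and keeping those that contain no cycle. Assumes edges.size() < 32.
int count_spanning_trees(int n, const std::vector<std::pair<int, int>>& edges) {
    const int m = static_cast<int>(edges.size());
    int count = 0;
    for (unsigned mask = 0; mask < (1u << m); ++mask) {
        int chosen = 0;
        for (int i = 0; i < m; ++i)
            if (mask & (1u << i)) ++chosen;
        if (chosen != n - 1) continue;

        std::vector<int> parent(n);
        std::iota(parent.begin(), parent.end(), 0);  // every vertex starts in its own set
        bool acyclic = true;
        for (int i = 0; i < m && acyclic; ++i) {
            if (!(mask & (1u << i))) continue;
            int a = find_root(parent, edges[i].first);
            int b = find_root(parent, edges[i].second);
            if (a == b) acyclic = false;  // this edge would close a cycle
            else parent[a] = b;
        }
        if (acyclic) ++count;  // n-1 edges with no cycle on n vertices form a spanning tree
    }
    return count;
}
```

For the complete graph on 4 vertices (6 edges), 20 selections of 3 edges are tried and the 4 triangles are rejected, leaving 16 spanning trees.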


#### Minimum cost spanning tree
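* a spanning tree with minimum cost 
* the total weight of its edges must be the minimum over all spanning trees 

* Prim's minimum cost spanning tree 
    * select a minimum-weight edge from the graph 
    * repeat: select the next minimum-weight edge that connects an already selected vertex to a vertex not yet selected 
* time complexity: O((v-1) x e) if every edge is scanned at each of the v-1 steps; an implementation that keeps a nearest-vertex array over an adjacency matrix takes O(n^2)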


* implementation 
    * the simplest representation of a graph in a C++ program is a 2D array (cost adjacency matrix)
    * take a 2D array to store the spanning tree edges
    * take another array that stores, for each vertex, which endpoint of the first edge is nearer to it 
        * if a vertex is connected to neither endpoint, you can write either one in it 
        * afterwards, keep each index filled with the selected vertex that gives that vertex its minimum edge
        * fill an index with zero when its minimum edge is moved into the spanning tree array
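
The steps above can be sketched roughly as follows. This is a simplified variant under stated assumptions, not the course's exact code: vertices are numbered from 0, the tree grows from vertex 0 instead of from the globally cheapest edge (both choices yield a minimum spanning tree), a separate `in_tree` flag replaces the "fill with zero" marker, and `INF` marks a missing edge. The names `prim_mst`, `nearest`, and `INF` are illustrative.

```cpp
#include <limits>
#include <utility>
#include <vector>

const int INF = std::numeric_limits<int>::max();  // marks "no edge" in the cost matrix

// Prim's algorithm on a cost adjacency matrix (cost[u][v] == INF means no edge).
// Returns the chosen tree edges; assumes an undirected, connected graph.
std::vector<std::pair<int, int>> prim_mst(const std::vector<std::vector<int>>& cost) {
    const int n = static_cast<int>(cost.size());
    std::vector<std::pair<int, int>> tree;
    if (n == 0) return tree;

    std::vector<bool> in_tree(n, false);
    std::vector<int> nearest(n, 0);  // nearest[v]: tree vertex currently closest to v
    in_tree[0] = true;               // start from vertex 0 (a simplification)

    for (int step = 0; step < n - 1; ++step) {
        // Pick the cheapest edge between a tree vertex and a non-tree vertex.
        int best = -1;
        for (int v = 0; v < n; ++v) {
            if (in_tree[v] || cost[v][nearest[v]] == INF) continue;
            if (best == -1 || cost[v][nearest[v]] < cost[best][nearest[best]]) best = v;
        }
        if (best == -1) break;  // the remaining vertices are unreachable
        tree.emplace_back(nearest[best], best);
        in_tree[best] = true;
        // Update nearest[] for vertices that are closer to the newly added vertex.
        for (int v = 0; v < n; ++v)
            if (!in_tree[v] && cost[v][best] < cost[v][nearest[v]]) nearest[v] = best;
    }
    return tree;
}
```

Each of the n-1 steps scans the vertices twice, which gives the O(n^2) running time of the array-based approach.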
        

#### Adjacency matrix 
* An adjacency matrix is a 2D array of size V x V, where V is the number of vertices in the graph.
* The adjacency matrix of an undirected graph is always symmetric.
* The representation is easy to implement and follow.
* Removing an edge takes O(1) time. Queries such as whether there is an edge from vertex ‘u’ to vertex ‘v’ are efficient and can be done in O(1).
* Consumes more space, O(V^2). Even if the graph is sparse (contains few edges), it consumes the same space.

#### Adjacency list 
* An array of lists is used. 
* Size of the array is equal to the number of vertices. 
* An entry array[i] represents the list of vertices adjacent to the ith vertex
* For a weighted graph, each list can store (vertex, weight) pairs.
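
A minimal sketch of a weighted adjacency list that stores pairs, assuming an undirected graph with 0-based vertex numbers and non-negative weights (so that -1 can signal "no edge"). `WeightedGraph` and its members are illustrative names.

```cpp
#include <utility>
#include <vector>

// Weighted undirected graph: adj[u] holds (neighbor, weight) pairs.
struct WeightedGraph {
    std::vector<std::vector<std::pair<int, int>>> adj;

    explicit WeightedGraph(int n) : adj(n) {}

    void add_edge(int u, int v, int weight) {
        adj[u].emplace_back(v, weight);
        adj[v].emplace_back(u, weight);
    }

    // Returns the weight of edge (u, v), or -1 if there is no such edge.
    // Takes O(degree(u)) time, unlike the O(1) lookup of an adjacency matrix.
    int weight(int u, int v) const {
        for (const auto& entry : adj[u])
            if (entry.first == v) return entry.second;
        return -1;
    }
};
```

The array has one slot per vertex, and only existing edges take up space, which is the main advantage over the matrix for sparse graphs.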


#### bfs 
* traverse nodes in layers 
* similar to level order traversal in trees
* A graph may have cycles, so without bookkeeping a node could be visited again and again, indefinitely 
* use a boolean visited flag (array) for each node to prevent this 
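
The layer-by-layer idea can be sketched with an adjacency list as follows. In this sketch the `level` array doubles as the visited flag (-1 means not visited yet), which is a small variation on the boolean array mentioned above; `bfs_levels` is an illustrative name and vertices are assumed to be numbered from 0.

```cpp
#include <queue>
#include <vector>

// BFS over an adjacency list. Returns level[v] = number of edges on a shortest
// path from start to v, or -1 if v cannot be reached from start.
std::vector<int> bfs_levels(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<int> level(adj.size(), -1);  // -1 doubles as the "not visited" flag
    std::queue<int> q;
    level[start] = 0;
    q.push(start);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int v : adj[u]) {
            if (level[v] == -1) {  // skip vertices already visited, so cycles terminate
                level[v] = level[u] + 1;
                q.push(v);
            }
        }
    }
    return level;
}
```

Because every vertex is enqueued at most once and every adjacency entry is examined once, the traversal finishes even when the graph contains cycles.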