# Class 1

## Clustering Coefficients

### Triadic Closure

- **Triadic Closure**: the tendency for people who share many connections to become connected themselves

So, **people who share many friends have an increased likelihood of becoming connected themselves**. Many mechanisms give rise to this process, but here we focus on:
**Figuring out How to Measure it in a Network**

So, let's say you have a network and you're asked:
- Which edges are likely to arrive in the network next?

Well, **Triadic Closure would say that edges that close triangles are good candidates for edges** that may show up next.

However, we don't always have time stamps, or know the order in which the edges came into the network. Sometimes we're just given a static network with no time stamps, and we want to know whether Triadic Closure is present in it, that is, whether it has lots of triangles or not. So what we're going to talk about is how to measure the prevalence of Triadic Closure in a network.

**Triadic Closure** is also referred to as **Clustering**.


### Local Clustering Coefficient of a Node

We'll start with a **local version of measuring Clustering**. We're going to be measuring Clustering from the point of view of a single node. And, this is called a Local Clustering Coefficient.
- **Local Clustering Coefficient**: the fraction of pairs of the node's friends that are friends with each other.

The best way to show you how Local Clustering Coefficient works is by showing an example.

#### Example 1 - Local Clustering Coefficient for Node C (Intuition of how it works)

Suppose we use the following equation to compute the Local Clustering Coefficient:

$$\text{Local Clustering Coefficient Node C} = \frac{\text{Number of Pairs of C's Friends who are Friends}}{\text{Number of Pairs of C's Friends}}$$

So, let's say you want to compute the Local Clustering Coefficient of node C. You need the ratio between the number of pairs of C's friends who are friends with each other and the total number of pairs of C's friends. C has four friends in this network, so C has a degree of four: the degree of a node is the number of connections it has, and we write it as $d_{C}$. How many pairs of C's friends are there? With four friends, there are six possible pairs of people, so the total number of pairs of C's friends is six. That is easy to count by hand with only four friends, but when a node has many more friends it gets harder, so you can use the formula $d_{C}(d_{C} - 1)/2$ instead. In this case it gives $4 \cdot 3 / 2 = 12 / 2 = 6$.

$$\text{Number of C's Friends} = d_{C} = 4 \rightarrow \text{ The "degree" of C}$$

Number of Pairs of C's Friends - **Generalized Formula**:
$$\text{Number of Pairs of C's Friends} = \frac{d_{C}(d_{C} - 1)}{2} = \frac{12}{2} = 6$$

Okay. So that will be our denominator. What about the numerator, the number of pairs of C's friends who are friends with each other? Only two pairs of C's friends are friends with each other, AB and EF, so the numerator is two.

$$\text{Number of Pairs of C's Friends who are Friends} = 2$$

So the Local Clustering Coefficient of node C is two over six, or one-third. That means that one-third of all the possible pairs of C's friends are actually friends.

$$\text{Local Clustering Coefficient of C} = \frac{\text{Number of Pairs of C's Friends who are Friends}}{\text{Number of Pairs of C's Friends}} = \frac{2}{6} = \frac{1}{3}$$

$$\therefore \textbf{one-third of all the possible pairs of C's friends are actually friends}$$


#### Example 2 - Local Clustering Coefficient for Node F (Intuition of how it works)

Compute the Local Clustering Coefficient of node F. We need the ratio between the number of pairs of F's friends who are friends with each other and the total number of pairs of F's friends. We do the same thing here: F has a degree of three, so the number of pairs of F's friends is three times two over two, which is three. Only one pair of F's friends are actually friends with each other: C and E. So the Local Clustering Coefficient of F is also one-third.

$$\text{Local Clustering Coefficient Node F} = \frac{\text{Number of Pairs of F's Friends who are Friends}}{\text{Number of Pairs of F's Friends}}$$

$$\text{Number of F's Friends} = d_{F} = 3 \rightarrow \text{ The "degree" of F}$$

$$\text{Number of Pairs of F's Friends} = \frac{d_{F}(d_{F} - 1)}{2} = \frac{6}{2} = 3$$

$$\text{Number of Pairs of F's Friends who are Friends} = 1$$

$$\text{Local Clustering Coefficient of F} = \frac{\text{Number of Pairs of F's Friends who are Friends}}{\text{Number of Pairs of F's Friends}} = \frac{1}{3}$$

$$\therefore \textbf{one-third of all the possible pairs of F's friends are actually friends}$$


#### Example 3 - Local Clustering Coefficient for Node J (What To Do when Dividing by 0)

Let's compute the Local Clustering Coefficient of node J. Node J has only one friend, node I, which means that J has zero pairs of friends. Because that number goes in the denominator, we're in trouble: we cannot divide by zero. So, for cases like this, where the definition doesn't work for nodes that have fewer than two friends, **we assume that nodes with fewer than two friends have a Local Clustering Coefficient of zero**. This is consistent with what `NetworkX` does.

$$\text{Local Clustering Coefficient Node J} = \frac{\text{Number of Pairs of J's Friends who are Friends}}{\text{Number of Pairs of J's Friends}}$$

$$\text{Number of Pairs of J's Friends} = 0 \; \color{red}{\text{ (cannot divide by 0)}}$$

$$\Rightarrow \textbf{We assume that the Local Clustering Coefficient of a Node with a Degree less than 2 is 0}$$

$$\therefore \textbf{Local Clustering Coefficient of J = 0}$$
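The definition above, including the convention for nodes with fewer than two friends, can be sketched directly in Python. This is a minimal illustration rather than how `NetworkX` implements it: the helper `local_clustering` is hypothetical, and the edge list is the one used for this example network in the `NetworkX` code of this class.

```python
from itertools import combinations

import networkx as nx

def local_clustering(G, node):
    """Hypothetical helper: fraction of pairs of `node`'s friends who are friends."""
    friends = list(G.neighbors(node))
    d = len(friends)
    if d < 2:
        # Convention from the notes: fewer than two friends -> coefficient 0
        return 0.0
    total_pairs = d * (d - 1) // 2
    connected_pairs = sum(1 for u, v in combinations(friends, 2) if G.has_edge(u, v))
    return connected_pairs / total_pairs

G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'K'), ('C', 'E'),
                  ('C', 'F'), ('D', 'E'), ('E', 'F'), ('E', 'H'), ('F', 'G'), ('I', 'J')])

for n in ['C', 'F', 'J']:
    print(n, local_clustering(G, n))
# C 0.3333333333333333
# F 0.3333333333333333
# J 0.0
```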

### Apply NetworkX for Local Clustering Coefficient

How do you compute the Local Clustering Coefficient using `NetworkX`?

Load the graph and use the function `clustering()`, passing the graph and a node, to compute that node's Local Clustering Coefficient. For the network used in the examples above:

- Local Clustering Coefficient of node F = 0.33
- Local Clustering Coefficient of node A = 0.67
- Local Clustering Coefficient of node J = 0.0

#### Code - How to Estimate the Local Clustering Coefficient with NetworkX
```python
import networkx as nx

# Create an undirected graph object
G = nx.Graph()

# Add the edges among the nodes of the network
G.add_edges_from([('A', 'K'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'K'), ('C', 'E'),
                  ('C', 'F'), ('D', 'E'), ('E', 'F'), ('E', 'H'), ('F', 'G'), ('I', 'J')])

# Local clustering coefficient of node F
print(nx.clustering(G, 'F'))  # Out: 0.333333333

# Local clustering coefficient of node A
print(nx.clustering(G, 'A'))  # Out: 0.666666666

# Local clustering coefficient of node J
print(nx.clustering(G, 'J'))  # Out: 0.0
```

### Global Clustering Coefficient

As seen before, this allows you to compute the Local Clustering Coefficient of each node in the graph. But, what we **were interested in is trying to figure out whether Triadic Closure is prevalent in the whole network**. And so, how do we go from having a local measure of Local Clustering Coefficient for each node to a global measure of Clustering Coefficient for the whole network? To answer that, we're going to talk about two different approaches.


- **Approach Nº1**: Take the Average Local Clustering Coefficient of all the nodes in the graph
    - The first one, which is simple and straightforward, is to take the average Local Clustering Coefficient of all the nodes in the graph. You can do this in `NetworkX` with the function `average_clustering()` applied to the graph G. In this case, the result is 0.29.

```python
import networkx as nx

# Create an undirected graph object
G = nx.Graph()

# Add the edges among the nodes of the network
G.add_edges_from([('A', 'K'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'K'), ('C', 'E'),
                  ('C', 'F'), ('D', 'E'), ('E', 'F'), ('E', 'H'), ('F', 'G'), ('I', 'J')])

# Average local clustering coefficient over all the nodes in the network
print(nx.average_clustering(G))  # Out: 0.2878787878
```
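Approach Nº1 is a plain, unweighted mean, which a short check makes explicit: averaging the per-node values returned by `clustering()` gives the same number as `average_clustering()`. A minimal sketch reusing the same edge list; note that the nodes whose coefficient is 0 (D, G, H, I and J) are included in the average.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'K'), ('C', 'E'),
                  ('C', 'F'), ('D', 'E'), ('E', 'F'), ('E', 'H'), ('F', 'G'), ('I', 'J')])

# Local clustering coefficient of every node, as a dict {node: coefficient}
local = nx.clustering(G)

# Approach Nº1: unweighted mean over all nodes, including those with coefficient 0
manual_average = sum(local.values()) / G.number_of_nodes()

print(round(manual_average, 4))            # 0.2879
print(round(nx.average_clustering(G), 4))  # 0.2879
```

`average_clustering()` also accepts a `count_zeros` argument (default `True`); with `count_zeros=False`, nodes with a coefficient of zero are left out of the average, which gives a larger value for this network.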

- **Approach Nº2**: Try to Measure the Percentage of Open Triads in the Network that are Triangles
    - Definition of *Triangles* and *Triads* (what are open triads and triangles?):
        - **Triangles**: Triangles are simply three nodes that are connected by three edges
        - **Open Triads**: Open Triads are three nodes that are connected by only two edges
   
  
The thing to notice here is that **a triangle actually contains three different open triads**. A triangle has three nodes and three edges, and you get an open triad by keeping the three nodes and any two of the three edges, leaving the third one out. There are three ways to choose which edge to leave out, so inside each triangle there are three different open triads.

So if you go out into the network, count how many triangles it has, and then count how many open triads it has, **each triangle you see will also be counted as three different open triads**.

This leads to the second approach for measuring the global Clustering Coefficient, which is called **Transitivity**:

- **Transitivity**: Ratio of the number of triangles to the number of open triads in the network, where each triangle is counted three times (once for each open triad it contains)

$$\textbf{Transitivity} = \frac{3 \cdot \text{Number of Triangles}}{\text{Number of Open Triads}}$$

We can get the Transitivity of the network in `NetworkX` with the function `transitivity()`. In this case, the network has a Transitivity of 0.41.

```python
import networkx as nx

# Create an undirected graph object
G = nx.Graph()

# Add the edges among the nodes of the network
G.add_edges_from([('A', 'K'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'K'), ('C', 'E'),
                  ('C', 'F'), ('D', 'E'), ('E', 'F'), ('E', 'H'), ('F', 'G'), ('I', 'J')])

# Transitivity of the network
print(nx.transitivity(G))  # Out: 0.4090909091
```
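The same number can be obtained by counting, following the Transitivity formula. A minimal sketch, assuming the same edge list: `nx.triangles()` reports how many triangles each node belongs to (so every triangle is counted once per corner), and a node of degree `d` is the middle node of `d(d-1)/2` open triads, a count that already includes the three open triads inside each triangle. This network has three triangles, (A, B, K), (A, B, C) and (C, E, F), and 22 open triads.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'K'), ('C', 'E'),
                  ('C', 'F'), ('D', 'E'), ('E', 'F'), ('E', 'H'), ('F', 'G'), ('I', 'J')])

# nx.triangles(G) maps each node to the number of triangles it belongs to,
# so every triangle is counted three times (once per corner)
num_triangles = sum(nx.triangles(G).values()) // 3

# A node of degree d is the middle node of d * (d - 1) / 2 open triads;
# this count includes the three open triads inside each triangle
num_open_triads = sum(d * (d - 1) // 2 for _, d in G.degree())

transitivity = 3 * num_triangles / num_open_triads
print(num_triangles, num_open_triads)  # 3 22
print(round(transitivity, 4))          # 0.4091
print(round(nx.transitivity(G), 4))    # 0.4091
```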

### Transitivity vs Average Clustering Coefficient

Okay. So we have two different ways of measuring the global Clustering Coefficient.
- Are these two ways the same? Which one is better?
- Are there any differences between the two?

It turns out that there are. They **both try to measure the tendency for edges to form triangles**, but Transitivity gives more weight to nodes with a larger degree (more connections). The best way to see this is with an example.

So, imagine a graph that looks like a wheel: one node in the center is connected to every other node, and the outer nodes come in pairs placed at different positions around the clock, with the two nodes of each pair connected to each other (so each pair forms a triangle with the central node). If you look at this graph closely, you'll find that **most nodes actually have a pretty high Local Clustering Coefficient**. All the nodes on the outside of the wheel have a Local Clustering Coefficient of one: each of them has two connections, so it has exactly one pair of friends, and that pair is connected. The same is true for every node on the outside of the wheel.

So, most nodes have a high Local Clustering Coefficient. However, the node inside the wheel, the central node, has a high degree but a very low Local Clustering Coefficient. **That is because it has many connections, and therefore many pairs of friends, but only a few of those pairs are connected to each other; most of them are not**.

For example, two outer nodes that belong to different pairs are not connected, even though both are friends with the central node. So, **in this graph, the average Clustering Coefficient is pretty high, 0.93, because every node except one has a very high Local Clustering Coefficient. However, the Transitivity of this network is 0.23, because Transitivity gives more weight to nodes with a high degree**.

In other words, this network has **one node with a much higher degree than the others and a much smaller Local Clustering Coefficient than the others, and Transitivity penalizes that. So you get a much lower Transitivity**.
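The wheel example can be rebuilt explicitly to compare the two measures. The notes don't state how many pairs surround the central node; this sketch assumes six pairs (12 outer nodes plus the hub), which reproduces the quoted 0.93 and 0.23. The helper `paired_wheel` and the node names are hypothetical.

```python
import networkx as nx

def paired_wheel(num_pairs):
    """Hypothetical helper: a hub linked to 2 * num_pairs outer nodes,
    where the outer nodes come in connected pairs."""
    G = nx.Graph()
    for i in range(num_pairs):
        u, v = f'o{2 * i}', f'o{2 * i + 1}'
        # Each pair plus the hub forms one triangle
        G.add_edges_from([('hub', u), ('hub', v), (u, v)])
    return G

G_wheel = paired_wheel(6)  # assumption: 6 pairs (13 nodes in total)

avg = nx.average_clustering(G_wheel)
trans = nx.transitivity(G_wheel)

print(round(nx.clustering(G_wheel, 'hub'), 3))  # 0.091 (= 1/11)
print(round(avg, 2))    # 0.93
print(round(trans, 2))  # 0.23
```

With `k` pairs, the hub has `k` connected pairs out of `k(2k - 1)` pairs of friends, so its coefficient is `1/(2k - 1)`, and the transitivity works out to `3/(2k + 1)`: adding pairs drives the Transitivity toward zero while the average stays close to one.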


### Summary

In summary, we've learned that the **Clustering Coefficient measures the degree to which nodes in a network tend to cluster or form triangles**, and that **there are several ways to measure** it. The first is the **Local Clustering Coefficient**, which is **measured on a node-by-node basis**. The other two are ways of computing a **Global Clustering Coefficient**, which **measures clustering for the whole network**.

- Local Clustering Coefficient

    - Definition: the fraction of pairs of the node's friends who are friends with each other


- Global Clustering Coefficient

    - **1st Method: Average of Local Clustering Coefficient**: the first way to measure it is to take the average of the Local Clustering Coefficient over all the nodes; you can use the function `average_clustering()` in `NetworkX` to do it.
    - **2nd Method: Transitivity, the ratio of the number of triangles to the number of open triads in a network**: the second way is Transitivity, the ratio of the number of triangles (each counted three times) to the number of open triads in a network. It puts more weight on high-degree nodes than the average Local Clustering Coefficient does, and you can compute it in `NetworkX` with the function `transitivity()`.

---

# Class 2

## Distance Measures

### Distance

Today we're going to talk about the concept of distance in social networks. So the idea here is that sometimes we'd like to know how far nodes are away from each other. So for example in this network that you see:
- How far is node A from node H? 
- Are some nodes far away from each other and other nodes close to each other in general in this network?
- And if so, which nodes are closest and which nodes are the furthest away from each other in the network?


To answer all these questions, we need to develop a concept of distance between nodes, and that's what we're going to do in this lecture.

**Definition**: *[Path]*
- **Path**: a sequence of nodes connected by edges

*Example*: How far is node A from node H?

$$\text{Path I: } A - B - C - E - H \;\Rightarrow\; \text{4 hops} \;\leftrightarrow\; \text{length} = 4$$
$$\text{Path II: } A - B - C - F - E - H \;\Rightarrow\; \text{5 hops} \;\leftrightarrow\; \text{length} = 5$$


**Definition**: *[Path Length]*
- **Path Length**: is the number of steps in the sequence, from the beginning to the end


**Definition**: *[Distance between Two Nodes]*
- **Distance between Two Nodes**: is the length of the shortest path between both nodes


Going back to the question of the distance between node A and node H: the answer is 4, because the shortest path between A and H has 4 hops (length 4). In `NetworkX`, you can use the function `shortest_path()` to find a shortest path from any starting node to any ending node, and `shortest_path_length()` to get its length, which is the distance.


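```python
import networkx as nx

# Network used in this class (the edges match the neighbors listed in the
# breadth-first search walkthrough)
G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('B', 'C'), ('B', 'K'), ('C', 'E'), ('C', 'F'),
                  ('D', 'E'), ('E', 'F'), ('E', 'H'), ('E', 'I'), ('F', 'G'), ('I', 'J')])

# Shortest path from node A to node H
print(nx.shortest_path(G, 'A', 'H'))  # Out: ['A', 'B', 'C', 'E', 'H']

# Length of the shortest path (the distance) from node A to node H
print(nx.shortest_path_length(G, 'A', 'H'))  # Out: 4
```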


With real social networks, we sometimes want to **find the distance from a single node to all the other nodes in the network, to figure out how far the other nodes are from this specific root node**, in this case A. Let's say we want the distance from node A to all the other nodes in the network.

So we'll look at one of the **efficient ways to compute the distances from a given node to all the other nodes**, called **breadth-first search**. You start at a node and discover the other nodes in layers: at each layer, you discover the nodes that are one step away from the nodes in the previous layer and have not been discovered yet.

- **Breadth-First Search**: a systematic and efficient procedure for computing distances from a specific node `x` to all the other nodes in a large network, by 'discovering' nodes in layers

#### Example - Intuition behind the Breadth-First Search Algorithm

So, we're going to walk through an example of how breadth-first search works. 

So here we have the network, and we want the distance from node A to all the other nodes. We start at A and discover new nodes as we walk through the network, writing down every node we discover. We process node A by looking at who is connected to it. In this case, K and B are connected to A, so they are at distance one, because the shortest path from each of them to A is a single hop (a path of length one). Next, we process each of the newly discovered nodes and ask which of their neighbors we haven't discovered yet; those nodes are assigned to the next layer. Say we process node B first. Node B is connected to K, A and C, but we've already discovered A and K, so the only new node here is C. Now we process node K, which is connected to A and B, both already discovered. So the only node in this layer is C, at distance two from A. Now we process node C, which is connected to B, F and E. We've already discovered B, so the two new nodes are F and E, at distance three from A. Next we process node E. Node E has five connections, and out of those five we've already discovered C and F, so the new ones are the other three: D, I and H. We assign those to the next layer. Then we process node F, which is connected to G, C and E. The only one we haven't discovered yet is G, so G is also assigned to the next layer. All of these nodes (D, I, H and G) are at distance four from A.
Now we have to process each of those newly discovered nodes, and by now we're almost done. Node D is only connected to E, which we've already discovered, so D doesn't discover any new nodes. Node I is connected to E and J; we haven't discovered J yet, so J is assigned to the next layer. Node H is only connected to E, which is already discovered. Finally, node G is connected only to F, which we've also already discovered. So J is at distance five. We still have to process J, but J is only connected to I, which we've already discovered. At that point all the nodes have been processed and there are no new nodes to discover, so we're done.
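The layer-by-layer procedure above can be sketched with a queue: nodes are processed in the order they are discovered, and each newly discovered node gets the distance of the node that discovered it plus one. A minimal sketch; the helper `bfs_distances` is hypothetical, and the edge list is read off the neighbors listed in the walkthrough.

```python
from collections import deque

import networkx as nx

def bfs_distances(G, root):
    """Hypothetical helper: distance from `root` to every reachable node, found layer by layer."""
    distance = {root: 0}   # discovered nodes -> layer (distance from root)
    queue = deque([root])  # discovered nodes waiting to be processed, in discovery order
    while queue:
        node = queue.popleft()
        for neighbor in G.neighbors(node):
            if neighbor not in distance:                 # skip nodes already discovered
                distance[neighbor] = distance[node] + 1  # one layer further than `node`
                queue.append(neighbor)
    return distance

# Distance network from the walkthrough (edges read off the neighbors listed above)
G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('B', 'C'), ('B', 'K'), ('C', 'E'), ('C', 'F'),
                  ('D', 'E'), ('E', 'F'), ('E', 'H'), ('E', 'I'), ('F', 'G'), ('I', 'J')])

dist = bfs_distances(G, 'A')

# Group the nodes by layer to compare with the walkthrough
layers = {}
for node, d in dist.items():
    layers.setdefault(d, []).append(node)
for d in sorted(layers):
    print(d, sorted(layers[d]))
# 0 ['A']
# 1 ['B', 'K']
# 2 ['C']
# 3 ['E', 'F']
# 4 ['D', 'G', 'H', 'I']
# 5 ['J']
```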

You can run breadth-first search in `NetworkX` with the function `bfs_tree()`, which returns the tree built during the search.

```python
# Breadth-first search tree rooted at node A; its edges show how each node was discovered
# (the exact edge order depends on the order in which the edges were added to G)
T = nx.bfs_tree(G, 'A')
print(list(T.edges()))
# Out: [('A', 'K'), ('A', 'B'), ('B', 'C'), ('C', 'E'), ('C', 'F'),
#       ('E', 'D'), ('E', 'H'), ('E', 'I'), ('F', 'G'), ('I', 'J')]
```

Now if you're interested in not necessarily the tree in the order in which these nodes were discovered, but just simply **the actual distances between A and all the other nodes**, then you can use `shortest_path_length()` and you give it the graph, which in this case is G, and the root node, which in this case is A. And you get a dictionary of all the distances from the node to all the other nodes. So, A is a distance zero from itself, a distance one from B, a distance two from C and so on.

```python
# Distances from root node A to every other node, computed with breadth-first search
# (only the distances matter here, not the order of discovery)
print(nx.shortest_path_length(G, 'A'))
# Out (keys sorted for readability):
# {'A': 0, 'B': 1, 'C': 2, 'D': 4, 'E': 3, 'F': 3, 'G': 4, 'H': 4, 'I': 4, 'J': 5, 'K': 1}
```

### Distance Measure

We've defined the distance between any pair of nodes in a network. But going back to the original questions, we wanted to characterize the distances between all pairs of nodes in the graph: are nodes in general close to each other, or far away from each other? If some are close and some are far, how can we figure out which are which? Now we're going to try to answer these questions.

#### Average Distance between every pair of Nodes

The first approach is to simply **take the average distance between every pair of nodes in the network**, which tells you the distance on average. In `NetworkX` we can do that with the function `average_shortest_path_length()`.

```python
# Average distance between every pair of nodes in the network
print(nx.average_shortest_path_length(G))  # Out: 2.527272727
```
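The average can also be computed directly from its definition: sum the distances over all ordered pairs of distinct nodes and divide by `n(n - 1)`. A minimal sketch, assuming the same distance network as in the breadth-first search walkthrough; note that `average_shortest_path_length()` raises an error if an undirected graph is not connected, since some distances are then undefined.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('B', 'C'), ('B', 'K'), ('C', 'E'), ('C', 'F'),
                  ('D', 'E'), ('E', 'F'), ('E', 'H'), ('E', 'I'), ('F', 'G'), ('I', 'J')])

# Distances from every node to every other node: one BFS per node
all_distances = dict(nx.all_pairs_shortest_path_length(G))

n = G.number_of_nodes()
# Sum over ordered pairs (u, v); the distance from a node to itself is 0, so it adds nothing
total = sum(sum(dists.values()) for dists in all_distances.values())
average = total / (n * (n - 1))

print(total)                                         # 278
print(round(average, 4))                             # 2.5273
print(round(nx.average_shortest_path_length(G), 4))  # 2.5273
```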

#### Diameter of the Graph or the Maximum Distance between any Pair of Nodes

We may also want the **maximum distance between any two nodes**: which two nodes are furthest away from each other, and how far apart are they?

This is called the **diameter, which is simply the maximum distance between any pair of nodes**. In `NetworkX` you can use the function `diameter()` to get it.

```python
# Diameter: the maximum distance between any pair of nodes in the network
print(nx.diameter(G))  # Out: 5
```

#### Eccentricity of a Node

Another useful definition is the eccentricity of a node. The **eccentricity of a node is the largest distance between the node and all the other nodes in the network**. You take a node, measure the distance from it to all the other nodes, and take the largest of those distances. In `NetworkX` you can use the function `eccentricity()` to get the eccentricity of every node.

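```python
# Eccentricity of every node: its largest distance to any other node
print(nx.eccentricity(G))
# Out (keys sorted for readability):
# {'A': 5, 'B': 4, 'C': 3, 'D': 4, 'E': 3, 'F': 3, 'G': 4, 'H': 4, 'I': 4, 'J': 5, 'K': 5}
```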

#### Radius of the Graph or the Minimum Eccentricity in the Network

Now that we have the eccentricities (a dictionary), we can define the **radius of the graph, which is the minimum eccentricity in the network**. The eccentricity tells you the maximum distance from a node to all the other nodes; the radius takes the smallest of those values over all the nodes.

In `NetworkX` we can use the function `radius()` to get the radius of the network and as you could see from the dictionary of eccentricities, the radius of this network is 3.

```python
# Radius of the graph: the minimum eccentricity
print(nx.radius(G))  # Out: 3
```

#### Periphery and Center of a Network

Now that we know how to calculate distances, and have a sense of what a large distance in the network is (the diameter) and what a short one is (the radius), we can try to identify which nodes are far away from all the other nodes and which nodes are close to all the other nodes.

##### Periphery of Graph

The first concept is the **periphery, which is the set of nodes in a graph whose eccentricity is equal to the diameter of the network**. That is the largest eccentricity a node can have, since the diameter is the largest distance between any two nodes in the network. You can get this set of nodes with the `NetworkX` function `periphery()`.

```python
# Periphery of the graph: nodes whose eccentricity = diameter (max distance between 2 nodes)
print(nx.periphery(G))  # Out: ['A', 'K', 'J']
```

##### Center of the Graph

The opposite concept is that of nodes that are central. The **center of the graph is the set of nodes whose eccentricity is equal to the radius**. You can check which nodes are in the center of this graph with the `NetworkX` function `center()`.

```python
# Center of the graph: nodes whose eccentricity = radius (the minimum eccentricity)
print(nx.center(G))  # Out: ['C', 'E', 'F']
```
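All four measures follow from the eccentricities, so they can be derived from one pass of breadth-first searches. A minimal sketch, assuming the same distance network as above; it recomputes by hand what `diameter()`, `radius()`, `periphery()` and `center()` return (the lists are sorted here, so their order may differ from the `NetworkX` output).

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([('A', 'K'), ('A', 'B'), ('B', 'C'), ('B', 'K'), ('C', 'E'), ('C', 'F'),
                  ('D', 'E'), ('E', 'F'), ('E', 'H'), ('E', 'I'), ('F', 'G'), ('I', 'J')])

# Eccentricity: the largest distance from each node to any other node
ecc = {node: max(dists.values())
       for node, dists in nx.all_pairs_shortest_path_length(G)}

diameter = max(ecc.values())  # largest eccentricity = largest distance in the graph
radius = min(ecc.values())    # smallest eccentricity
periphery = sorted(node for node, e in ecc.items() if e == diameter)
center = sorted(node for node, e in ecc.items() if e == radius)

print(diameter, radius)  # 5 3
print(periphery)         # ['A', 'J', 'K']
print(center)            # ['C', 'E', 'F']
```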

With these new tools, now you're able to take a look at a real social network, and start to ask questions about:
- How far are these nodes from each other?
- Which nodes are central?
- Which nodes are peripheral / not central?


#### Code Example - Friendship Network in a 34 People Karate Club

```python
import networkx as nx

# Load the karate club network
G = nx.karate_club_graph()

# Relabel the nodes 0..33 as 1..34 so they match the numbering used in the discussion
G = nx.convert_node_labels_to_integers(G, first_label=1)

# Average shortest path length
print(nx.average_shortest_path_length(G))  # Out: ≈ 2.41

# Radius of the graph: the minimum eccentricity
print(nx.radius(G))  # Out: 3

# Diameter: the maximum distance between any pair of nodes in the network
print(nx.diameter(G))  # Out: 5

# Center of the graph: nodes whose eccentricity = radius
# Node 1 is the instructor; the others are well-connected students
print(nx.center(G))  # Out: [1, 2, 3, 4, 9, 14, 20, 32]

# Periphery of the graph: nodes whose eccentricity = diameter (max distance between 2 nodes)
# They sit on the outside of the group, and none of them is directly connected to node 1
print(nx.periphery(G))  # Out: [15, 16, 17, 19, 21, 23, 24, 27, 30]
```

Why is 34 not showing up in the center? 

It turns out that, if we look carefully, node 34 is at distance four from node 17. To get from 34 to 17 you need four hops, for example 34, 32, 1, 7, 17, so 34 can't be in the center: the radius of the graph is three, and 34 has a node at distance four from it. In fact, if node 17 were just a bit closer, for example at distance three from 34, then 34 would be in the center, because 34 is at distance at most three from every other node in the network.

So, **this shows that this definition of center is quite sensitive to a single node that happens to be far away**. A very small change to the graph that moves one particular node farther away from the others can make or break whether another node is in the center.
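This sensitivity can be checked directly by listing the nodes that are farther from node 34 than the radius. A minimal sketch, assuming the same relabelled karate club graph (nodes 1 to 34) as in the code above; if the discussion is right, node 17 is the only node keeping 34 out of the center.

```python
import networkx as nx

G = nx.convert_node_labels_to_integers(nx.karate_club_graph(), first_label=1)

radius = nx.radius(G)  # 3 for this network
dist_from_34 = nx.single_source_shortest_path_length(G, 34)

# Nodes farther from node 34 than the radius; any such node keeps 34 out of the center
too_far = [node for node, d in dist_from_34.items() if d > radius]

print(too_far)                      # [17]
print(nx.shortest_path(G, 34, 17))  # one shortest path, with 4 hops
print(nx.eccentricity(G, v=34))     # 4
```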

In the future, we'll look at **other measures of centrality for nodes that are less sensitive to small changes** like that.


### Summary

- **Distance between Two Nodes**: Length of the Shortest Path between both nodes.
- The **Eccentricity** of a Node `n` is the largest distance between the node `n` and the rest of the nodes.


- **Characterizing Distances in a Network**: 
    - **Average Distance**: The average distance between every pair of nodes in the network
    - **Diameter**: The maximum distance between any pair of nodes in the network
    - **Eccentricity of a Node**: Is the largest distance between the node `n` and all the other nodes in the network
    - **Radius**: From all the eccentricities in the graph, the radius is the minimum value among all
    
    
- **Identifying Central and Peripheral Nodes**:
    - **Periphery**: The periphery is the set of nodes where 
    $$\textbf{Periphery} \Rightarrow eccentricity = diameter$$
    - **Center**: The center is the set of nodes where 
    $$\textbf{Center} \Rightarrow eccentricity = radius$$
    
---

# Class 3

## Connected Components

### Connectivity in Undirected Graphs

First, we're going to talk about connectivity in undirected graphs. Those are the ones where the edges don't have a direction. An **undirected graph is said to be connected if for every pair of nodes, there is a path between the two nodes**. 

We can use the function `is_connected()` in `NetworkX` and give it the undirected graph as input, and it will tell us whether the graph is connected or not.

```python
# Check whether the undirected graph G is connected
print(nx.is_connected(G))  # Out: True
```

#### Graph Components

- **Connected Component**: A connected component is a subset of nodes that satisfies the following 2 conditions:
    - **Condition Nº1**: Every node in the subset has a path to every other node in the subset
    - **Condition Nº2**: No node outside the subset has a path to any node inside the subset
    
    
We can use `NetworkX` to find the connected components of an undirected graph. The function `number_connected_components()`, given the graph as input, tells you how many there are. You can also see the actual subsets of nodes that satisfy the component criteria with the function `connected_components()`, which tells you which nodes belong to each one of the components.

What you can also do is use the function `node_connected_component(G, 'A')` with the input G of the graph itself but also a particular node as input, and this would tell you which connected component this particular node belongs to.

```python
# Count the connected components in the network
In: nx.number_connected_components(G)
Out: 3

# List the node subsets that form the connected components
# (NetworkX yields each component as a set, so each one is sorted into a list here)
In: sorted(map(sorted, nx.connected_components(G)))
Out: [['A', 'B', 'C', 'D', 'E'],
      ['F', 'G', 'H', 'I', 'J'],
      ['K', 'L', 'M', 'N', 'O']]

# Return the connected component that contains node M
In: sorted(nx.node_connected_component(G, 'M'))
Out: ['K', 'L', 'M', 'N', 'O']
```
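To show how the two conditions translate into an algorithm, here is a minimal sketch (not NetworkX's implementation) that finds components by repeatedly running a breadth-first search from a node that has not been visited yet. The adjacency-dict representation and the helper names `add_edge` and `node_component` are hypothetical:

```python
from collections import deque


def connected_components(adj):
    """Yield each connected component of an undirected graph as a set of nodes.

    `adj` is a hypothetical adjacency dict: node -> set of neighbours.
    """
    seen = set()
    for start in adj:
        if start in seen:
            continue
        # Everything reachable from `start` forms exactly one component:
        # those nodes are mutually reachable (condition 1) and nothing else
        # can reach them (condition 2).
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        yield component


def node_component(adj, node):
    """Return the component that contains `node` (hypothetical helper)."""
    for component in connected_components(adj):
        if node in component:
            return component
    raise KeyError(node)


def add_edge(adj, u, v):
    """Store an undirected edge in both directions."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


# Hypothetical graph with two components: {A, B, C} and {D, E}.
graph = {}
for u, v in [("A", "B"), ("B", "C"), ("D", "E")]:
    add_edge(graph, u, v)

print(len(list(connected_components(graph))))            # 2
print(sorted(map(sorted, connected_components(graph))))  # [['A', 'B', 'C'], ['D', 'E']]
print(sorted(node_component(graph, "E")))                # ['D', 'E']
```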

### Connectivity in Directed Graphs

We would like definitions of **connected** and **connected components** that also apply to directed graphs. Because paths are defined differently in directed graphs (a path has to follow the direction of the edges), we need to adjust these definitions. In fact, we end up with two versions of each of the two concepts: a strong one and a weak one.

#### I. First Concept: Connected

There are two versions of the connected concept:
- Strongly Connected
- Weakly Connected

##### Strongly Connected

Let's start with the concept of **connected**. The first version is **strongly connected**: a directed graph is strongly connected if, for every pair of nodes U and V, there is a directed path from U to V and another directed path from V to U.

We can use the function `is_strongly_connected()` in `NetworkX` to ask whether a particular directed graph G is strongly connected.

```python
# Verify if the graph is strongly connected
In: nx.is_strongly_connected(G)
Out: False
```

In this example graph, several pairs of nodes lack a directed path from one to the other; for instance, there is no directed path from A to H. Therefore, this graph is not strongly connected.
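Checking every pair of nodes is not necessary. A common trick, shown here as a plain-Python sketch rather than NetworkX's code, is to pick one node, check that it reaches every node, and check that every node reaches it (by searching the graph with all edges reversed). The edge-list input is an assumption of this sketch, so isolated nodes are not represented:

```python
def reachable(succ, start):
    """Return the set of nodes reachable from `start` by following edge directions."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_strongly_connected(edges):
    """Check strong connectivity of a directed graph given as a list of (u, v) edges."""
    nodes = {n for edge in edges for n in edge}
    if not nodes:
        raise ValueError("strong connectivity is not defined here for an empty graph")
    forward, backward = {}, {}
    for u, v in edges:
        forward.setdefault(u, []).append(v)
        backward.setdefault(v, []).append(u)  # the same edge, reversed
    start = next(iter(nodes))
    # Forward search: paths start -> x exist for every x.
    # Backward search: paths x -> start exist for every x.
    # Chaining x -> start -> y then gives a directed path for any ordered pair.
    return reachable(forward, start) == nodes and reachable(backward, start) == nodes


print(is_strongly_connected([("A", "B"), ("B", "C"), ("C", "A")]))  # True: a directed cycle
print(is_strongly_connected([("A", "B"), ("B", "C")]))              # False: no path from C to A
```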

##### Weakly Connected

The second definition of connected is **weakly connected**. **To decide whether a directed graph is weakly connected, first replace every directed edge with an undirected one, i.e. ignore the direction of each edge, so that the graph becomes undirected. Then ask the question we already applied to undirected graphs: is this graph connected? If it is, we say that the original directed graph is weakly connected**. In this example, the function `is_weakly_connected()` from `NetworkX` returns `True`: the graph is weakly connected because, once it is turned into an undirected graph, that undirected graph is connected.

```python
# Verify if the graph is weakly connected
In: nx.is_weakly_connected(G)
Out: True
```
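The two steps described above (drop the directions, then test ordinary connectivity) can be sketched in plain Python as follows. This is an illustration under the assumption that the directed graph is given as a hypothetical list of `(u, v)` edges, not the NetworkX implementation:

```python
def is_weakly_connected(edges):
    """Check weak connectivity of a directed graph given as a list of (u, v) edges."""
    # Step 1: forget the directions by storing each edge both ways.
    undirected = {}
    for u, v in edges:
        undirected.setdefault(u, set()).add(v)
        undirected.setdefault(v, set()).add(u)
    if not undirected:
        raise ValueError("weak connectivity is not defined here for an empty graph")
    # Step 2: an ordinary undirected connectivity check (depth-first search).
    start = next(iter(undirected))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in undirected[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(undirected)


# No directed path links A and C in either direction, yet ignoring the
# directions gives the connected path A-B-C.
print(is_weakly_connected([("A", "B"), ("C", "B")]))  # True
print(is_weakly_connected([("A", "B"), ("C", "D")]))  # False
```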


#### II. Second Concept: Connected Components

There are two versions of the connected components concept:
- Strongly Connected Component
- Weakly Connected Component

##### Strongly Connected Component

- **Strongly Connected Component**: A strongly connected component is a subset of nodes that satisfies the following 2 conditions:
    - **Condition Nº1**: Every node in the subset has a directed path to every other node within the subset
    - **Condition Nº2**: No node outside the subset has a directed path both to and from any node inside the subset
    
We can use the function `strongly_connected_components()` from `NetworkX` and it will tell us what they are.

```python
# Find the strongly connected components of the network
# (each component is returned as a set, so each one is sorted into a list here)
In: sorted(map(sorted, nx.strongly_connected_components(G)))
Out: [['A', 'B', 'C', 'D', 'E', 'F', 'G', 'J', 'N', 'O'],
      ['H', 'I'],
      ['K'],
      ['L'],
      ['M']]
```
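One classic way to compute strongly connected components is Kosaraju's algorithm: a first depth-first search records the order in which nodes finish, and a second search over the reversed graph, taken in reverse finishing order, peels off one component at a time. The following is a plain-Python sketch of that textbook algorithm (NetworkX may use a different one), assuming a hypothetical edge-list input:

```python
def strongly_connected_components(edges):
    """Return the strongly connected components of a directed graph (Kosaraju's algorithm).

    `edges` is a list of (u, v) pairs; nodes without edges are not represented.
    """
    forward, backward, nodes = {}, {}, []
    for u, v in edges:
        for n in (u, v):
            if n not in forward:
                forward[n], backward[n] = [], []
                nodes.append(n)
        forward[u].append(v)
        backward[v].append(u)

    # Pass 1: iterative depth-first search on the original graph,
    # recording each node when all of its successors are finished.
    order, visited = [], set()
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(forward[root]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if child not in visited:
                    visited.add(child)
                    stack.append((child, iter(forward[child])))
                    break
            else:
                stack.pop()
                order.append(node)

    # Pass 2: search the reversed graph in decreasing finishing order;
    # each search collects exactly one strongly connected component.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = {root}, [root]
        while stack:
            node = stack.pop()
            for prev in backward[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


# Hypothetical graph: cycle A->B->C->A, cycle D<->E, and a dead end F.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E"), ("E", "D"), ("E", "F")]
print(sorted(map(sorted, strongly_connected_components(edges))))
# [['A', 'B', 'C'], ['D', 'E'], ['F']]
```

Note that D and E are reachable from A, but not the other way round, so they form their own component.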

##### Weakly Connected Component

The weakly connected component version works the same way as before: first we make all the directed edges undirected, and then we find the connected components of the resulting undirected graph. Because this example graph is weakly connected, making all its directed edges undirected yields a connected graph, so the graph has only one weakly connected component: the whole graph.

We can use the function `weakly_connected_components()` from `NetworkX` and it will tell us what they are.

```python
# Find the weakly connected components of the network
In: sorted(map(sorted, nx.weakly_connected_components(G)))
Out: [['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']]
```
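Since weak components ignore edge direction, they can also be found without building the undirected graph explicitly. The sketch below uses a union-find (disjoint-set) structure, a different technique from the search described above that gives the same grouping; the edge-list input is an assumption of the sketch:

```python
def weakly_connected_components(edges):
    """Group the nodes of a directed graph, given as (u, v) edges, into weak components.

    Each edge merges the groups of its two endpoints regardless of its direction,
    which is equivalent to making every edge undirected first.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two groups

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


edges = [("A", "B"), ("C", "B"), ("D", "E")]
print(sorted(map(sorted, weakly_connected_components(edges))))  # [['A', 'B', 'C'], ['D', 'E']]
```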

### Summary

To summarize, we discussed several definitions of connectivity for undirected and directed graphs. An undirected graph is connected if there is a path between every pair of nodes. We also discussed connected components, which can be found with the function `connected_components()`.

For directed graphs, there are two types of definitions: strong and weak. A directed graph is strongly connected if, for every pair of nodes, there is a directed path from each node to the other, and the function `strongly_connected_components()` finds its strongly connected components. Both concepts have corresponding weak versions, weakly connected and weakly connected components, which work by making the directed edges undirected and then applying the undirected definitions to the resulting graph.

---

# Class 4

## Network Robustness

### Connectivity and Robustness

We define **network robustness** as the **ability of a network to maintain its general structure**, or its general functions, when it faces failures or attacks. So what kind of **attacks** are we talking about here? We're going to talk about **attacks that take the form of removing nodes or edges**. These could be deliberate attempts to remove a node or an edge from the network, or simply random failures.

In this case we're going to be talking about **connectivity, so robustness is going to be the network's ability to maintain its connectivity when it loses some of its nodes or some of its edges**.

- **Network Robustness**: Is the ability of a network to maintain its general structure, or its general functions when it faces failures or attacks.
- **Type of Attacks**: The attacks we consider are removals of nodes or removals of edges from the network.
- **Structural Properties**: Connectivity.


- **Examples**: Closure of airports or of connections between airports, Internet router failures, problems in the distribution of the water supply.

### Disconnecting a Graph

#### Smallest Number of Nodes to Remove to Disconnect the Graph

What is the smallest number of nodes that can be removed from the network in order to disconnect it?

We can use the `NetworkX` function `node_connectivity()`: it takes the undirected graph as input and returns the number of nodes that must be removed to disconnect the graph. In this example the answer is one: removing a single node is enough to disconnect the graph.

Alternatively, the function `minimum_node_cut()` takes the undirected graph as input and tells you which nodes you can remove in order to disconnect the graph.

```python
import networkx as nx

G = nx.Graph()  # add the edges of the lecture's example graph here (they are not listed in these notes)

# Check how many nodes must be removed to make the graph disconnected
In: nx.node_connectivity(G)
Out: 1

# Check which node must be removed to make the graph disconnected
In: nx.minimum_node_cut(G)
Out: {'A'}
```
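The definition can be made concrete with a deliberately naive sketch: try every set of 1, 2, 3, ... nodes and stop at the first set whose removal disconnects the graph. This brute force is exponential and only meant for tiny graphs (libraries such as NetworkX use much faster flow-based methods); the adjacency-dict format and the "bow-tie" graph are hypothetical:

```python
from itertools import combinations


def add_edge(adj, u, v):
    """Add an undirected edge to a hypothetical adjacency dict (node -> set of neighbours)."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def is_connected(adj, removed=frozenset()):
    """Return True if the graph is connected once the nodes in `removed` are deleted."""
    remaining = [n for n in adj if n not in removed]
    if len(remaining) <= 1:
        return True
    seen, stack = {remaining[0]}, [remaining[0]]
    while stack:
        node = stack.pop()
        for neighbour in adj[node]:
            if neighbour not in removed and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(remaining)


def minimum_node_cut(adj):
    """Return a smallest set of nodes whose removal disconnects the graph (brute force).

    Returns None for a complete graph, where no node cut exists.
    """
    if not is_connected(adj):
        return set()  # already disconnected: nothing needs to be removed
    nodes = list(adj)
    for size in range(1, len(nodes) - 1):
        for cut in combinations(nodes, size):
            if not is_connected(adj, frozenset(cut)):
                return set(cut)
    return None


# Hypothetical "bow-tie" graph: two triangles that share only node A.
bowtie = {}
for u, v in [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("A", "E"), ("D", "E")]:
    add_edge(bowtie, u, v)

cut = minimum_node_cut(bowtie)
print(cut, len(cut))  # {'A'} 1, so the node connectivity is 1
```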


#### Smallest Number of Edges to Remove to Disconnect the Graph

What is the smallest number of edges, instead of nodes, that we would need to remove from this graph in order to completely disconnect it?

As before, we can use the function `edge_connectivity()`, giving it the undirected graph as input; it tells us how many edges must be removed to disconnect the graph. In this example, two edges need to be removed in order to disconnect the graph.

Which 2 edges are those? The function `minimum_edge_cut()` takes the undirected graph as input and returns the edges that you need to remove to disconnect the graph.

```python
import networkx as nx

G = nx.Graph()  # add the edges of the lecture's example graph here (they are not listed in these notes)

# Check how many edges must be removed to make the graph disconnected
In: nx.edge_connectivity(G)
Out: 2

# Check which edges must be removed to make the graph disconnected
In: nx.minimum_edge_cut(G)
Out: {('A', 'G'), ('O', 'J')}
```
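The same brute-force idea works for edges: try every set of 1, 2, ... edges until removing one of them disconnects the graph. Again, this is only an illustrative sketch for tiny graphs, with a hypothetical edge-list input, not how NetworkX computes it:

```python
from itertools import combinations


def stays_connected(nodes, edges, removed):
    """Return True if the undirected graph is still connected after deleting the `removed` edges."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if (u, v) not in removed:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in adj[node] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return len(seen) == len(adj)


def minimum_edge_cut(edges):
    """Return a smallest set of edges whose removal disconnects the graph (brute force)."""
    nodes = {n for edge in edges for n in edge}
    if len(nodes) < 2:
        return None  # a graph with fewer than two nodes cannot be disconnected
    for size in range(len(edges) + 1):
        for cut in combinations(edges, size):
            if not stays_connected(nodes, edges, set(cut)):
                return set(cut)
    return None


# A 4-cycle: every edge lies on the cycle, so removing one edge is never enough.
square = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(len(minimum_edge_cut(square)))  # 2

# A path A-B-C: a single edge already disconnects it.
path = [("A", "B"), ("B", "C")]
print(minimum_edge_cut(path))  # {('A', 'B')}
```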

#### Robust Networks and Disconnecting Graphs

Robust networks are those that have **large minimum node and edge cuts**, that is, those for which you **would have to remove a lot of nodes or edges in order to disconnect them**. That's a very desirable property, because you don't want your network to be easily disconnected by removing just a few nodes or a few edges.

### Simple Paths

Now let's look at a different scenario: a directed graph, in which every edge has a source and a destination. Think of the nodes as entities that want to communicate with each other by passing messages, such as routers, or people trying to pass along a rumor or an important message. Imagine that node G wants to send a message to node L by passing it along through other nodes in this network. In other words, G wants to find the paths that could take a message from G all the way to L.
- What options does G have in order to do this?
- What are those paths that G can use? 

We can use the function `all_simple_paths()`, giving it the graph G followed by the source and destination nodes, in this case 'G' and 'L'. It returns every simple path (a path that never repeats a node) from the source to the destination.

```python
import networkx as nx

# Set the directed network graph (add the edges of the lecture's example graph; they are not listed here)
G = nx.DiGraph()

# All possible simple paths along which node G can send a message to node L
In: sorted(nx.all_simple_paths(G, 'G', 'L'))
Out: [['G', 'A', 'N', 'L'],
      ['G', 'A', 'N', 'O', 'K', 'L'],
      ['G', 'A', 'N', 'O', 'L'],
      ['G', 'J', 'O', 'K', 'L'],
      ['G', 'J', 'O', 'L']]
```
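A simple-path enumeration can be sketched as a depth-first search that never revisits a node already on the current path. The sketch below is not NetworkX's code; it runs on a hypothetical reduced graph built only from the edges that appear in the five paths listed above (the lecture's full graph may contain more edges), stored as a dict of out-neighbours:

```python
def all_simple_paths(succ, source, target, path=None):
    """Yield every simple path (no repeated node) from `source` to `target`.

    `succ` is a hypothetical dict mapping each node to its out-neighbours.
    """
    path = (path or []) + [source]
    if source == target:
        yield path
        return
    for nxt in succ.get(source, ()):
        if nxt not in path:  # never revisit a node: this keeps the path simple
            yield from all_simple_paths(succ, nxt, target, path)


# Reduced graph: only the edges used by the five paths listed above.
succ = {"G": ["A", "J"], "A": ["N"], "N": ["L", "O"], "O": ["K", "L"], "K": ["L"], "J": ["O"]}

for p in sorted(all_simple_paths(succ, "G", "L")):
    print(p)
# ['G', 'A', 'N', 'L']
# ['G', 'A', 'N', 'O', 'K', 'L']
# ['G', 'A', 'N', 'O', 'L']
# ['G', 'J', 'O', 'K', 'L']
# ['G', 'J', 'O', 'L']
```

The number of simple paths can grow exponentially with the size of the graph, so this kind of enumeration is only practical for small graphs or short paths.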

#### Node Connectivity Attack: Attack Example when Nodes are Removed

Okay, now let's imagine that there's some attacker, some person that wants to actually block the message that's going from G to L. 

This attacker is no longer interested in disconnecting the network in any particular way; it specifically wants to disrupt the communication from G to L. If this attacker could only remove nodes, how many nodes would it need to remove in order to block G from L?

```python
import networkx as nx

# Set the directed network graph (add the edges of the lecture's example graph; they are not listed here)
G = nx.DiGraph()

# Check how many nodes must be removed to block every path from G to L
In: nx.node_connectivity(G, 'G', 'L')
Out: 2

# Which two nodes must be removed to block every path from G to L
In: nx.minimum_node_cut(G, 'G', 'L')
Out: {'N', 'O'}
```
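For a single source and destination, the question becomes: what is the smallest set of intermediate nodes that blocks every directed path from G to L? The brute-force sketch below (exponential, for tiny graphs only, not NetworkX's algorithm) answers it on the same hypothetical reduced graph built from the five listed paths. A graph can have several minimum cuts: this sketch happens to return {'A', 'J'}, which is also of size 2, and it confirms that NetworkX's answer {'N', 'O'} blocks G from L as well:

```python
from itertools import combinations


def has_path(succ, source, target, removed=frozenset()):
    """Directed reachability from `source` to `target`, skipping the nodes in `removed`."""
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in succ.get(node, ()):
            if nxt not in removed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def minimum_st_node_cut(succ, source, target):
    """Smallest set of nodes (other than source and target) blocking every source -> target path."""
    nodes = set(succ) | {m for ms in succ.values() for m in ms}
    candidates = sorted(nodes - {source, target})
    for size in range(len(candidates) + 1):
        for cut in combinations(candidates, size):
            if not has_path(succ, source, target, frozenset(cut)):
                return set(cut)
    return None  # source links directly to target, so no node cut exists


# Reduced graph: only the edges used by the five simple paths from G to L.
succ = {"G": ["A", "J"], "A": ["N"], "N": ["L", "O"], "O": ["K", "L"], "K": ["L"], "J": ["O"]}

cut = minimum_st_node_cut(succ, "G", "L")
print(sorted(cut))                                  # ['A', 'J']
print(has_path(succ, "G", "L", frozenset({"N", "O"})))  # False: {'N', 'O'} is a minimum cut too
```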


#### Edge Connectivity Attack: Attack Example when Edges are Removed

Now the attacker, let's say, is only able to remove edges, not completely remove nodes. How many edges would this attacker need to remove in order to block G from L? 

We can use the function `edge_connectivity()` with the graph, the source and the destination as input. It tells us that two edges must be removed to achieve this. To find which edges they are, we can again use the `minimum_edge_cut()` function with the graph, the source and the destination.

```python
import networkx as nx

# Set the directed network graph (add the edges of the lecture's example graph; they are not listed here)
G = nx.DiGraph()

# Check how many edges must be removed to block every path from G to L
In: nx.edge_connectivity(G, 'G', 'L')
Out: 2

# Which two edges must be removed to block every path from G to L
In: nx.minimum_edge_cut(G, 'G', 'L')
Out: {('A', 'N'), ('J', 'O')}
```
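The edge version of the same brute-force sketch tries sets of 1, 2, ... directed edges until the destination is no longer reachable from the source. It runs on the hypothetical reduced graph built from the five listed paths; it returns the minimum cut {('G', 'A'), ('G', 'J')}, while NetworkX's answer {('A', 'N'), ('J', 'O')} is another minimum cut of the same size:

```python
from itertools import combinations


def reaches(edges, source, target):
    """Return True if `target` is reachable from `source` along the directed `edges`."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def minimum_st_edge_cut(edges, source, target):
    """Smallest set of edges whose removal blocks every source -> target path (brute force)."""
    for size in range(len(edges) + 1):
        for cut in combinations(edges, size):
            remaining = [e for e in edges if e not in cut]
            if not reaches(remaining, source, target):
                return set(cut)
    return None  # only possible when source == target


# Reduced graph: only the edges used by the five simple paths from G to L.
edges = [("G", "A"), ("A", "N"), ("N", "L"), ("N", "O"), ("O", "K"),
         ("K", "L"), ("O", "L"), ("G", "J"), ("J", "O")]

print(sorted(minimum_st_edge_cut(edges, "G", "L")))  # [('G', 'A'), ('G', 'J')]
blocked = [e for e in edges if e not in {("A", "N"), ("J", "O")}]
print(reaches(blocked, "G", "L"))  # False: NetworkX's cut works as well
```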


### Summary

In summary, we've talked about node connectivity: the minimum number of nodes that must be removed to disconnect a graph or a particular pair of nodes. The associated functions are `node_connectivity()`, which returns that smallest number of nodes, and `minimum_node_cut()`, which returns the nodes themselves.

- **Node Connectivity**: Minimum number of nodes needed to disconnect a graph or a pair of nodes

```python
# Minimum number of nodes to remove to disconnect G from L (omit 'G', 'L' for the whole graph)
nx.node_connectivity(G, 'G', 'L')

# A smallest set of nodes whose removal achieves that disconnection
nx.minimum_node_cut(G, 'G', 'L')
```
Edge connectivity is defined similarly: it is the minimum number of edges that must be removed to disconnect the graph or a particular pair of nodes. The corresponding functions are `edge_connectivity()` and `minimum_edge_cut()`.

- **Edge Connectivity**: Minimum number of edges needed to disconnect a graph or a pair of nodes 

```python
# Minimum number of edges to remove to disconnect G from L (omit 'G', 'L' for the whole graph)
nx.edge_connectivity(G, 'G', 'L')

# A smallest set of edges whose removal achieves that disconnection
nx.minimum_edge_cut(G, 'G', 'L')
```
And the takeaway here is that **graphs with large node or large edge connectivity are more robust to the loss of nodes and edges. They're able to keep their connectivity even if some number of edges and nodes are removed from the network**. And for many applications, this is a very desirable property for networks to have.