###### Introduction to Network Analysis 2023/24 (iv)

## Small-world and scale-free models, graphs vs networks

You are given six networks in Pajek format.

+ Zachary karate club network ([karate_club.net](http://lovro.fri.uni-lj.si/ina/nets/karate_club.net))
+ Map of Darknet from Tor network ([darknet.net](http://lovro.fri.uni-lj.si/ina/nets/darknet.net))
+ IMDb actors collaboration network ([collaboration_imdb.net](http://lovro.fri.uni-lj.si/ina/nets/collaboration_imdb.net))
+ WikiLeaks cable reference network ([wikileaks.net](http://lovro.fri.uni-lj.si/ina/nets/wikileaks.net))
+ Enron e-mail communication network ([enron.net](http://lovro.fri.uni-lj.si/ina/nets/enron.net))
+ A small part of Google web graph ([www_google.net](http://lovro.fri.uni-lj.si/ina/nets/www_google.net))

In [1]:
import networkx as nx
import random
import utils

### III. Synthetic random graphs vs real networks

Consider the large-scale properties of real networks: low average node degree $\langle k\rangle\ll n$, one giant connected component $S\approx 1$, short distances between nodes $\langle d\rangle\approx\frac{\ln n}{\ln\langle k\rangle}$, high average node clustering coefficient $\langle C\rangle\gg 0$, power-law degree distribution $p_k\sim k^{-\gamma}$, pronounced community structure, etc.



1. **(discuss)** Design a synthetic graph model that generates undirected graphs that are _most different_ from real networks.



A graph that is most different from real networks should have:
- large distances between nodes,
- low clustering,
- high average node degree (with care, since a denser graph tends to have higher clustering and shorter distances),
- an uncharacteristic degree distribution, such as a binomial one,
- no large connected component.

One example is a chain (path graph). Its clustering coefficient is 0 because each node is linked only to its neighbors in the chain, which are never linked to each other; distances grow as $O(n)$; and all nodes except the two endpoints have the same degree. To avoid a large connected component, simply create multiple disconnected chains. This satisfies all the criteria except high node degree.


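Another example is a hypercube graph, which has zero clustering (it is bipartite) and the same degree for every node. Its distances, however, grow only logarithmically with the number of nodes, so it is less extreme than the chain in that respect. The same trick of using multiple disconnected copies is needed here to avoid a single LCC.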

In [2]:
# Build a chain (path graph): 0 - 1 - ... - (num_nodes - 1)
G = nx.Graph(name="Chain")

num_nodes = 8192
G.add_nodes_from(range(num_nodes))
G.add_edges_from((i, i + 1) for i in range(num_nodes - 1))

utils.info(G, distance_sample=300)

       Graph | 'Chain'
       Nodes | 8,192 (iso=0)
       Edges | 8,191 (loop=0)
      Degree | 2.00 (max=2)
         LCC | 100.0% (n=1)
    Distance | 2691.04 (max=8,184)
  Clustering | 0.0000
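
The single chain above still forms one connected component (LCC of 100%). The disconnected variant described in the text could be sketched as follows, assuming 64 chains of 128 nodes each. The helper name `disjoint_chains` and both of its parameters are illustrative choices, not part of the original notebook.

```python
import networkx as nx


def disjoint_chains(num_chains, chain_length):
    """Return a graph made of num_chains separate path graphs (hypothetical helper)."""
    G = nx.Graph(name="Disjoint chains")
    for c in range(num_chains):
        offset = c * chain_length
        # Nodes offset .. offset + chain_length - 1 form one chain
        nx.add_path(G, range(offset, offset + chain_length))
    return G


G = disjoint_chains(num_chains=64, chain_length=128)

# Fraction of nodes in the largest connected component
largest = max(nx.connected_components(G), key=len)
lcc_fraction = len(largest) / G.number_of_nodes()

print("nodes:", G.number_of_nodes(), "edges:", G.number_of_edges())
print("LCC fraction:", lcc_fraction)
print("clustering:", nx.average_clustering(G))
```

With these values the largest component holds 128 of the 8,192 nodes, so there is no giant component, while clustering stays at zero and the degree distribution remains almost uniform.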



In [5]:
import networkx as nx

n = 13  # Dimension of the hypercube
G = nx.hypercube_graph(n)
G.name = "Hypercube"

utils.info(G, distance_sample=300)

       Graph | 'Hypercube'
       Nodes | 8,192 (iso=0)
       Edges | 53,248 (loop=0)
      Degree | 13.00 (max=13)
         LCC | 100.0% (n=1)
    Distance | 6.50 (max=13)
  Clustering | 0.0000
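
The hypercube's distance of about $n/2$ can be checked exactly on a smaller instance. The sketch below assumes dimension 6 so that all-pairs shortest paths are cheap to compute. It relies on the fact that two random corners differ in each coordinate with probability 1/2.

```python
import networkx as nx

n = 6  # small dimension so exact all-pairs distances are cheap to compute
Q = nx.hypercube_graph(n)
num_nodes = Q.number_of_nodes()  # 2**n corners

# Exact mean distance over all pairs of distinct nodes
avg_distance = nx.average_shortest_path_length(Q)

# Mean Hamming distance between two random corners is n/2;
# rescale to exclude pairs where both corners are the same node
expected = (n / 2) * num_nodes / (num_nodes - 1)

print(f"measured {avg_distance:.4f}, expected {expected:.4f}")
print("bipartite:", nx.is_bipartite(Q))
print("clustering:", nx.average_clustering(Q))
```

Because the number of nodes is $2^n$, a mean distance near $n/2$ grows only with the logarithm of the graph size, which is why the hypercube departs from real networks mainly through its zero clustering and uniform degree rather than through long distances.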



2. **(code)** Implement a generative graph model that _well reproduces_ the structure of real undirected networks.



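To reproduce the structure of a real undirected network, we can use a **copying model**. Each added node creates two links: one to a node selected uniformly at random and one to a random neighbor of that node. Because a random neighbor is more likely to be a high-degree node, the probability that a node receives a link grows with its degree, which gives us a scale-free model. The two chosen nodes are already linked to each other, so every new node closes a triangle, which gives a high clustering coefficient. The model also produces short distances, one LCC and a low average degree.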

In [7]:
# Copying model: grows an existing graph G by adding new_nodes nodes
def copying_model(G, new_nodes):

    n = G.number_of_nodes()

    # add an initial node if G is empty
    if n == 0:
        G.add_node(1)
        new_nodes -= 1
        n += 1

    # each new node links to a random node and to one of that node's neighbors
    for i in range(new_nodes):

        # assumes existing node labels are integers no larger than n
        new_node = n + i + 1

        random_node = random.choice(list(G.nodes))
        # collect neighbors before linking, so new_node is never among them
        neighbors = list(G.neighbors(random_node))
        G.add_edge(random_node, new_node)

        if neighbors:
            random_neighbor = random.choice(neighbors)
            G.add_edge(random_neighbor, new_node)

    return G


graph = nx.Graph()
# Arbitrary initial graph
# graph.add_nodes_from([...])
# graph.add_edges_from([...])

new_nodes = 10000
G = copying_model(graph, new_nodes)

utils.info(G, distance_sample=300)

       Graph | ''
       Nodes | 10,000 (iso=0)
       Edges | 19,997 (loop=0)
      Degree | 4.00 (max=66)
         LCC | 100.0% (n=1)
    Distance | 8.76 (max=18)
  Clustering | 0.6925
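
To put the clustering value above in context, the model can be compared with a random graph that has the same number of nodes and edges. The sketch below is a compact re-implementation for illustration only: the name `copying_graph`, the seed, and the size of 2,000 nodes are assumptions, and it keeps a separate node list so that sampling does not rebuild `list(G.nodes)` at every step.

```python
import random

import networkx as nx


def copying_graph(n, seed=None):
    """Grow an n-node graph; each new node links to a random node and one of its neighbors."""
    rng = random.Random(seed)
    G = nx.Graph()
    G.add_node(0)
    nodes = [0]  # plain list so uniform sampling takes constant time
    for new_node in range(1, n):
        target = rng.choice(nodes)
        neighbors = list(G[target])  # read before linking new_node
        G.add_edge(new_node, target)
        if neighbors:
            G.add_edge(new_node, rng.choice(neighbors))
        nodes.append(new_node)
    return G


n = 2000
C = copying_graph(n, seed=1)
# Baseline with the same number of nodes and edges, wired uniformly at random
R = nx.gnm_random_graph(n, C.number_of_edges(), seed=1)

for name, H in [("copying", C), ("random", R)]:
    max_degree = max(d for _, d in H.degree())
    print(f"{name:8s} clustering={nx.average_clustering(H):.4f} max degree={max_degree}")
```

The random baseline has the same density but no mechanism for closing triangles, so the gap in clustering between the two graphs is attributable to the copying step rather than to the number of edges.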



3. **(discuss)** Does your model have a reasonable interpretation or explanation? Does it also reproduce the structure of real directed networks?

The model is both interpretable and applicable to real directed networks. It can be interpreted as "meeting a friend of a friend": a new node has a high chance of linking to a neighbor of a node it is already connected to. In a directed setting, it can be explained as a network of paper citations, where a new paper cites an existing paper and also copies one of its references.