### Introduction to Network Analysis 2024/25 (xi)

# Random-walk sampling, network and model comparison

### I. Estimation by random-walk sampling

You are given five networks of different types and origins.

+ Java class dependency network ([java.net](http://lovro.fri.uni-lj.si/ina/nets/java.net))
+ Sample of Facebook social network ([facebook.net](http://lovro.fri.uni-lj.si/ina/nets/facebook.net))
+ *nec* overlay map of the Internet ([nec.net](http://lovro.fri.uni-lj.si/ina/nets/nec.net))
+ Enron e-mail communication network ([enron.net](http://lovro.fri.uni-lj.si/ina/nets/enron.net))
+ A small part of Google web graph ([www_google.net](http://lovro.fri.uni-lj.si/ina/nets/www_google.net))

1. **(code)** Represent the networks with simple undirected graphs and reduce them to their largest connected component.

In [None]:
# your code here
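
As one possible starting point, the sketch below builds a simple undirected graph from a list of node pairs and keeps only its largest connected component. It is a minimal pure-Python illustration rather than a reference solution: the helper names `simple_undirected` and `largest_component` and the adjacency-dictionary representation are assumptions made here, and reading the Pajek files into node pairs is left to your network library.

```python
from collections import deque


def simple_undirected(edges):
    """Build an adjacency dict of a simple undirected graph from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:            # drop self-loops
            adj[u].add(v)     # sets merge parallel and reciprocal links
            adj[v].add(u)
    return adj


def largest_component(adj):
    """Return the adjacency dict restricted to the largest connected component."""
    seen, best = set(), set()
    for root in adj:
        if root in seen:
            continue
        # breadth-first search from an unvisited node collects one component
        component, queue = {root}, deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in component:
                    component.add(v)
                    queue.append(v)
        seen |= component
        if len(component) > len(best):
            best = component
    return {u: adj[u] for u in best}
```

With NetworkX, `nx.Graph(nx.read_pajek(path))` followed by `max(nx.connected_components(G), key=len)` would play a similar role, assuming its Pajek reader accepts these files.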

2. **(code)** Implement random-walk sampling and apply it to each network until you have sampled 15% of the nodes (counting repetitions). Let $s$ be the number of sampled nodes and $k_1,\dots,k_s$ their degree sequence. Estimate the average degree of the network $\langle k\rangle$ using the biased average

$$\frac{\sum_ik_i}{s}$$ 

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; and also the corrected estimate 

$$\frac{s}{\sum_ik_i^{-1}}.$$

In [None]:
# your code here
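
The two estimators above could be sketched as follows. This is an illustration under stated assumptions, not the expected solution: the graph is an adjacency dictionary of a connected undirected graph, the walk starts at a uniformly random node, every visit (including revisits) counts towards $s$, and $s$ is taken as 15% of the number of nodes. The function names are hypothetical.

```python
import random


def random_walk_degrees(adj, fraction=0.15, seed=None):
    """Degrees k_1..k_s of the nodes visited by a simple random walk."""
    rng = random.Random(seed)
    neighbors = {u: list(vs) for u, vs in adj.items()}
    s = max(1, round(fraction * len(neighbors)))   # sample size, with repetitions
    node = rng.choice(list(neighbors))             # assumed: uniform random start
    degrees = []
    for _ in range(s):
        degrees.append(len(neighbors[node]))
        node = rng.choice(neighbors[node])         # step to a uniformly random neighbor
    return degrees


def biased_average(degrees):
    """Plain mean of the sampled degrees: sum(k_i) / s."""
    return sum(degrees) / len(degrees)


def corrected_average(degrees):
    """Harmonic-mean style correction: s / sum(1 / k_i)."""
    return len(degrees) / sum(1 / k for k in degrees)
```

Both functions take the same degree sequence, so a single walk per network is enough to compute and compare the two estimates.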

3. **(discuss)** Compare both estimates to the true average degree $\langle k\rangle$.
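
One way to reason about the comparison, assuming the walk is long enough to be close to its stationary distribution $\pi$ on a connected undirected graph with $n$ nodes, and writing $\langle k^2\rangle$ for the mean squared degree:

$$\pi_i=\frac{k_i}{n\langle k\rangle},\qquad \mathbb{E}_\pi[k]=\sum_i\pi_ik_i=\frac{\langle k^2\rangle}{\langle k\rangle}\ge\langle k\rangle,\qquad \mathbb{E}_\pi\left[k^{-1}\right]=\sum_i\pi_ik_i^{-1}=\frac{1}{\langle k\rangle}.$$

Under that assumption the biased average targets $\langle k^2\rangle/\langle k\rangle$ rather than $\langle k\rangle$, whereas $s/\sum_ik_i^{-1}$ targets $\langle k\rangle$ itself. With only 15% of the nodes sampled, either estimate may still deviate noticeably on a given run.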

### II. Sampling the Facebook social network

You are given two large samples of the Facebook social network with around ten million nodes and links. Due to their size, the networks are available only in compressed edge list format.

+ 1st sample of Facebook network ([facebook_1.adj.zip](http://lovro.fri.uni-lj.si/ina/nets/facebook_1.adj.zip))
+ 2nd sample of Facebook network ([facebook_2.adj.zip](http://lovro.fri.uni-lj.si/ina/nets/facebook_2.adj.zip))



1. **(discuss)** One sample was generated by a uniform random node selection technique called _rejection sampling_ and the other by a breadth-first search approach called _snowball sampling_.

2. **(homework)** Try to figure out which network sample is which. Since these are still tiny samples of the Facebook social network, the answer might not be immediately obvious from their structure.

In [None]:
# your code here
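
Because the samples are large, it may help to stream them straight from the archive instead of loading a full graph object. The sketch below only counts node degrees and is based on assumptions about the file layout that you should check first: one file per archive and one whitespace-separated node pair per line. If the files instead list a node followed by all of its neighbors, the parsing step needs to change. The function name `degree_counts` is hypothetical.

```python
import io
import zipfile
from collections import Counter


def degree_counts(zip_path_or_file):
    """Stream a zipped edge list and count how many listed links touch each node."""
    degree = Counter()
    with zipfile.ZipFile(zip_path_or_file) as archive:
        name = archive.namelist()[0]              # assumed: a single file per archive
        with archive.open(name) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                parts = line.split()
                if len(parts) < 2 or parts[0].startswith("#"):
                    continue                      # skip blank lines and comments
                u, v = parts[0], parts[1]         # assumed: one "u v" pair per line
                degree[u] += 1                    # a link listed twice is counted twice
                degree[v] += 1
    return degree
```

Summary statistics of the two resulting degree sequences (number of nodes, mean, maximum) are one possible basis for comparing the samples.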

### III. Network and model comparison

You are given three social networks and three food web graphs in Pajek format.

+ Zachary karate club network ([karate_club.net](http://lovro.fri.uni-lj.si/ina/nets/karate_club.net))
+ Lusseau bottlenose dolphins network ([dolphins.net](http://lovro.fri.uni-lj.si/ina/nets/dolphins.net))
+ US college football network ([american_football.net](http://lovro.fri.uni-lj.si/ina/nets/american_football.net))

<span/>

+ Little Rock Lake food web ([foodweb_littlerock.net](http://lovro.fri.uni-lj.si/ina/nets/foodweb_littlerock.net))
+ Cypress Wetlands food web (dry) ([foodweb_baydry.net](http://lovro.fri.uni-lj.si/ina/nets/foodweb_baydry.net))
+ Cypress Wetlands food web (wet) ([foodweb_baywet.net](http://lovro.fri.uni-lj.si/ina/nets/foodweb_baywet.net))



1. **(discuss)** Consider different approaches for comparing networks. These include comparing networks by different metrics or statistics, graph edit distance, graphlet degree distribution agreement, portrait divergence, $D$-measure, etc. You can implement the approaches yourself, browse your network library for existing implementations, or use the code provided below.

	+ Simplified $D$-measure: [simplified_dmeasure.py](http://lovro.fri.uni-lj.si/ina/code/simplified_dmeasure.py)
	+ Network portrait divergence: [portrait_divergence.py](http://lovro.fri.uni-lj.si/ina/code/portrait_divergence.py)
	+ Graphlet distribution agreement: [graphlet_aggrement.py](http://lovro.fri.uni-lj.si/ina/code/graphlet_aggrement.py)

	Note that the last script requires a working installation of the [orca](https://github.com/thocevar/orca) algorithm for counting graphlet orbits.

2. **(code)** Compare the networks with each other and plot their pairwise dissimilarities or distances as a heat map. How similar are networks of different types? For instance, are social networks more similar to each other than to food webs? Does the answer depend on the selected measure of dissimilarity or distance?

In [None]:
# your code here
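
As a simple baseline for the "metrics or statistics" family of approaches, the sketch below builds a pairwise dissimilarity matrix from the Jensen-Shannon divergence between degree distributions. This measure is swapped in here purely for illustration; it is not the $D$-measure, portrait divergence or graphlet agreement from the provided scripts. Graphs are assumed to be adjacency dictionaries, and all function names are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(adj):
    """Fraction of nodes with each degree, as a dict {degree: probability}."""
    counts = Counter(len(neighbors) for neighbors in adj.values())
    n = len(adj)
    return {k: c / n for k, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, between 0 and 1) of two distributions."""
    total = 0.0
    for k in set(p) | set(q):
        pk, qk = p.get(k, 0.0), q.get(k, 0.0)
        mk = (pk + qk) / 2                 # mixture distribution at degree k
        if pk > 0:
            total += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            total += 0.5 * qk * math.log2(qk / mk)
    return total


def distance_matrix(graphs):
    """Pairwise dissimilarities for a dict {name: adjacency dict}."""
    names = list(graphs)
    dists = [degree_distribution(graphs[name]) for name in names]
    matrix = [[js_divergence(p, q) for q in dists] for p in dists]
    return names, matrix
```

The returned matrix can be passed to a heat-map routine such as matplotlib's `imshow`, with `names` used as tick labels on both axes.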

3. **(code)** Also compare the networks to small synthetic graphs such as Erdős-Rényi random graphs, Barabási-Albert scale-free graphs and Watts-Strogatz small-world graphs with $n=500$ nodes and $m=1500$ edges. How similar are real networks to synthetic graphs? How do the synthetic graphs compare with each other?

In [None]:
# your code here
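
The three synthetic models could be generated along the following lines. This is a simplified pure-Python sketch under assumptions, not a library-grade implementation: each function returns a set of links `(u, v)` with `u < v`, the parameter `c` (links added per new node) and the seed clique in the Barabási-Albert variant are choices made here, and that variant yields close to, but not exactly, $m=1500$ links.

```python
import random


def erdos_renyi(n, m, rng):
    """G(n, m): m distinct links chosen uniformly at random."""
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return edges


def barabasi_albert(n, c, rng):
    """Preferential attachment: each new node links to c distinct existing nodes."""
    edges = {(u, v) for u in range(c + 1) for v in range(u + 1, c + 1)}  # seed clique
    ends = [x for e in edges for x in e]   # every node appears once per incident link
    for u in range(c + 1, n):
        targets = set()
        while len(targets) < c:
            targets.add(rng.choice(ends))  # degree-proportional choice
        for v in targets:
            edges.add((v, u))
            ends += [u, v]
    return edges


def watts_strogatz(n, k, p, rng):
    """Ring lattice with k neighbors per node; each link is rewired with probability p."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for j in range(1, k // 2 + 1):
            adj[u].add((u + j) % n)
            adj[(u + j) % n].add(u)
    for j in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + j) % n
            if rng.random() < p and len(adj[u]) < n - 1:
                # move the far end of link (u, v) to a node not yet adjacent to u
                w = rng.choice([x for x in range(n) if x != u and x not in adj[u]])
                adj[u].remove(v)
                adj[v].remove(u)
                adj[u].add(w)
                adj[w].add(u)
    return {(u, v) for u in adj for v in adj[u] if u < v}
```

For $n=500$, calling `erdos_renyi(500, 1500, rng)` and `watts_strogatz(500, 6, 0.1, rng)` gives exactly 1500 links each, while `barabasi_albert(500, 3, rng)` gives 1494; the rewiring probability 0.1 is only an example value.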