# Competition — Network Generation

### Challenge Overview

Your goal is to generate a network that is as close as possible to an original real network. The original network itself is not provided, but some of its statistics are. All statistics are in the `stats.txt` file, which contains a dictionary of the form
* number_nodes (number of nodes): value
* number_cc (number of connected components): value, sigma
* radius (radius of giant component): value, sigma
* diameter (diameter of giant component): value, sigma
* average_clustering (average clustering coefficient): value, sigma
* average_path_length (average path length): value, sigma
* degree_cdf (empirical CDF of degree distribution): values, probabilities

The meaning of the sigmas is described in the Evaluation Criteria section.
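The exact on-disk format of `stats.txt` is not spelled out here. As a minimal sketch, assuming the file holds a single Python-style dictionary literal, it could be parsed as follows. The helper name `parse_stats` and the toy values in `sample` are hypothetical and only illustrate the shape described above.

```python
import ast

def parse_stats(text):
    """Parse a dictionary literal safely (no arbitrary code execution)."""
    stats = ast.literal_eval(text)
    if not isinstance(stats, dict):
        raise ValueError("expected a dictionary of statistics")
    return stats

# Toy content, NOT the real statistics (assumed layout: key -> value(s))
sample = "{'number_nodes': 100, 'radius': (4, 0.5), 'degree_cdf': ([1, 2, 3], [0.5, 0.8, 1.0])}"
stats = parse_stats(sample)
radius, sigma_r = stats['radius']
```

For the real file, the text would come from something like `open('stats.txt').read()`. If the file turns out to be JSON, `json.load` may be the better fit.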

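You can use this code to draw the CDF:
```python
import matplotlib.pyplot as plt
import numpy as np

# `stats` is the dictionary loaded from stats.txt
q_seq, p_seq = stats['degree_cdf']
# Repeat each point twice to draw the CDF as a step function
plt.plot(
    np.append(np.repeat(q_seq, 2)[1:], q_seq[-1]),
    np.repeat(p_seq, 2)
)
plt.show()
```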

### Evaluation Criteria

Your total score is a weighted sum of 6 scores, each measuring the similarity between a statistic of the original network and the same statistic of the generated network. Each score lies in the interval [0, 1], where 1 means the statistic matches the original exactly. The scores are
* "KS"
    * 1 - KS_dist
    * where KS_dist is the value of the Kolmogorov-Smirnov test statistic between the degree distributions of the original and generated networks
* "Radius"
    * $\text{GK}(r, r', \sigma_r) = \exp\left[-\frac{(r - r')^2}{2\sigma_r^2}\right]$
    * where GK is the Gaussian kernel, $r$ is the radius of the original network, $r'$ is the radius of the generated network, and $\sigma_r$ is the sigma of the radius from the `stats.txt` file
* "Diameter", "Av. clustering", "Av. path length", "Number of CC" are calculated with the Gaussian kernel in the same way, each using its own sigma from `stats.txt`
* "Total"
    * 1/6 KS + 1/6 Radius + 1/6 Diameter + 1/6 Av. clustering + 1/6 Av. path length + 1/6 Number of CC
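The KS score could be estimated locally along these lines. This is a sketch, not the organizers' scoring code: it assumes `degree_cdf` is a right-continuous step function (as the plotting snippet suggests) with sorted degree values, and that the statistic is the largest absolute gap between that CDF and the empirical degree CDF of the generated network. The names `step_cdf` and `ks_score` are hypothetical.

```python
from bisect import bisect_right

def step_cdf(x, q_seq, p_seq):
    """Reference CDF at x: p_i for the largest q_i <= x, else 0."""
    i = bisect_right(q_seq, x)
    return p_seq[i - 1] if i > 0 else 0.0

def ks_score(q_seq, p_seq, degrees):
    """Return 1 - KS_dist for a generated degree sequence."""
    degrees = sorted(degrees)
    n = len(degrees)
    # Both CDFs only jump at these points, so the maximum gap occurs on this grid
    grid = sorted(set(q_seq) | set(degrees))
    ks_dist = max(
        abs(step_cdf(x, q_seq, p_seq) - bisect_right(degrees, x) / n)
        for x in grid
    )
    return 1.0 - ks_dist

# Toy CDF and degree sequences, for illustration only
q_seq, p_seq = [1, 2, 3], [0.5, 0.75, 1.0]
perfect = ks_score(q_seq, p_seq, [1, 1, 2, 3])
poor = ks_score(q_seq, p_seq, [3, 3, 3, 3])
```

In the toy example, the first degree sequence reproduces the reference CDF exactly, while the second puts all mass on degree 3 and is penalized accordingly.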

All scores are immediately set to 0 if the generated network has an incorrect number of nodes. On the leaderboard, all scores are multiplied by 100.
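To make the scoring rules concrete, here is a small sketch of the Gaussian kernel and the total score, assuming equal weights of 1/6 and a positive sigma. The function names and the toy numbers are illustrative, not part of the official scoring code.

```python
import math

def gaussian_kernel(x, x_gen, sigma):
    """exp(-(x - x_gen)^2 / (2 sigma^2)); assumes sigma > 0."""
    return math.exp(-((x - x_gen) ** 2) / (2 * sigma ** 2))

def total_score(scores, n_nodes, expected_nodes):
    """Average of the six scores, or 0 if the node count is wrong."""
    if n_nodes != expected_nodes:
        return 0.0
    return sum(scores.values()) / 6

# Toy values: original radius 4, generated radius 5, sigma 1
radius_score = gaussian_kernel(4, 5, 1.0)

scores = {
    "KS": 0.9, "Radius": radius_score, "Diameter": 1.0,
    "Av. clustering": 0.5, "Av. path length": 0.8, "Number of CC": 1.0,
}
total = total_score(scores, n_nodes=100, expected_nodes=100)
wrong_size = total_score(scores, n_nodes=99, expected_nodes=100)
```

A generated statistic that is one sigma away from the original scores `exp(-0.5)`, roughly 0.61, so the sigma sets how quickly a score decays with the error.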

**Baselines**

Baselines are calculated by the following algorithm:
1. Generate a random degree sequence from the degree CDF using inverse transform sampling
2. Generate a valid graph from that sequence with the configuration model
3. Calculate the total score
4. Repeat steps 1-3 1000 times and collect the resulting total scores
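Steps 1 and 2 above can be sketched in plain Python as follows. This is one possible reading rather than the organizers' implementation: the odd-sum fix, the "erased" treatment of self-loops and parallel edges, the 0-based node labels, and the function names are all assumptions.

```python
import random
from bisect import bisect_left

def sample_degrees(q_seq, p_seq, n, rng):
    """Inverse transform sampling: map uniform draws through the inverse CDF."""
    degrees = []
    for _ in range(n):
        u = rng.random()
        # First index whose cumulative probability is >= u
        i = min(bisect_left(p_seq, u), len(q_seq) - 1)
        degrees.append(int(q_seq[i]))
    # Assumption: bump one node if the sum is odd, so that all stubs pair up
    if sum(degrees) % 2 == 1:
        degrees[rng.randrange(n)] += 1
    return degrees

def configuration_model_simple(degrees, rng):
    """Pair stubs at random, then drop self-loops and parallel edges."""
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    edges = set()
    for u, v in zip(stubs[::2], stubs[1::2]):
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return edges

rng = random.Random(0)
q_seq, p_seq = [1, 2, 3], [0.5, 0.8, 1.0]  # toy CDF, not the real one
degrees = sample_degrees(q_seq, p_seq, 20, rng)
edges = configuration_model_simple(degrees, rng)
```

Dropping self-loops and parallel edges means the realized degrees can be slightly lower than the sampled ones. NetworkX also provides `configuration_model`, which returns a multigraph that would need the same cleanup.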

* Baseline for grade 4: beat the mean total score
* Baseline for grade 6: beat the mean + 3*sigma of the total scores
* Baseline for grade 8: beat the maximum total score

The calculated baselines are shown on the leaderboard.

### Submission Guidelines

Submit a txt file with a list of edges, one edge per line, without self-loops or parallel edges. The correct form is
```
1 2
1 3
3 2
```
and so on.
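A submission in this form could be produced with a helper like the one below. It is a sketch: the name `format_edge_list` is hypothetical, and it assumes edges are undirected, so `2 1` counts as a duplicate of `1 2`. Node labels are written exactly as given.

```python
def format_edge_list(edges):
    """Render edges as 'u v' lines, skipping self-loops and parallel edges."""
    seen = set()
    lines = []
    for u, v in edges:
        key = (min(u, v), max(u, v))  # undirected: (u, v) equals (v, u)
        if u == v or key in seen:
            continue
        seen.add(key)
        lines.append(f"{u} {v}")
    return "\n".join(lines) + "\n"

# Toy input containing one duplicate (2, 1) and one self-loop (3, 3)
text = format_edge_list([(1, 2), (1, 3), (3, 2), (2, 1), (3, 3)])

# To save it, e.g.:
# with open("submission.txt", "w") as f:  # file name is an assumption
#     f.write(text)
```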