# Generating hypergraphs with some desired properties

Here we show how to use HyperGraphs.jl to generate hypergraphs that match certain desired properties.

Note that `-1` appears a number of times in the examples below. This is because `0` is a valid degree and cardinality value while Julia arrays are indexed from `1`: the count of vertices with degree `d` is stored at index `d + 1`, so index-based quantities have to be offset by one.

In [1]:
using HyperGraphs

## 1. Generating hypergraphs according to some degree distribution

The standard approach is to place stubs on each vertex corresponding to the desired number of hyperedges to be incident on that vertex; this effectively fixes the degree distribution, and hyperedges may then be generated by randomly connecting vertices with respect to the stubs.

We first show how to do this in the simple case where we have some target degree counts (i.e. how many vertices of each degree should appear in the hypergraph), and where each hyperedge is populated with a random number of randomly chosen vertices. We then show how to extend this approach in a number of ways, for instance by drawing degree counts or hyperedge cardinalities from a distribution.

### 1.1 Generating a hypergraph given some degree counts

#### 1.1.1. Setting the desired degree counts

We start by setting the desired degree counts. The parameters are
- `n` the number of vertices,
- `target_d_c` the target degree counts.

In [2]:
n, target_d_c = 6, [0, 3, 2, 1]

(6, [0, 3, 2, 1])

This means we are asking for `0` vertices with degree `0`, `3` vertices with degree `1`, `2` vertices with degree `2`, and `1` vertex with degree `3`.

Note that in this setting we cannot ask for a total degree count that is greater than `n`, i.e. the following must be true:

In [3]:
sum(target_d_c) <= n

true

We can then get the target degree sequence `target_d_s` from the target degree counts.

In [4]:
target_d_s = degree_sequence(target_d_c)

6-element Vector{Int64}:
 3
 2
 2
 1
 1
 1
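To make the relation between degree counts and degree sequence concrete, here is a minimal sketch of the conversion in plain Julia. The helper name `counts_to_sequence` is hypothetical and is not part of HyperGraphs.jl; it only assumes, as in the example above, that the count for degree `d` is stored at index `d + 1` and that the sequence is sorted in decreasing order.

```julia
# Hypothetical helper (not part of HyperGraphs.jl): expands degree counts
# into a degree sequence sorted in decreasing order.
function counts_to_sequence(counts::Vector{Int})
    seq = Int[]
    for (i, c) in enumerate(counts)
        append!(seq, fill(i - 1, c))  # index i holds the count for degree i - 1
    end
    return sort(seq, rev=true)
end

counts_to_sequence([0, 3, 2, 1])      # returns [3, 2, 2, 1, 1, 1]
```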

#### 1.1.2. Assigning stubs to vertices

We then assign a given number of stubs to each vertex, which effectively fixes the degree distribution. Here we do so by simply assigning degrees to vertices in a random way. In this case, stubs are just given by a map that sends elements of the vertex set `V` to the degree of each element.

First we get the vertex set `V`; assuming vertices are integers, we simply do

In [5]:
V = 1:n

1:6

We then create a temporary variable `ds` that holds the target degree sequence; this is needed because we must keep track of which stubs have been used when generating hyperedges, and we do not want to modify the original variable.

In [6]:
ds = deepcopy(target_d_s)

6-element Vector{Int64}:
 3
 2
 2
 1
 1
 1

We then assign stubs, i.e. make each vertex in `V` correspond to some degree in `ds`. This is done by shuffling `ds`, so that `ds[i]` is the number of stubs of vertex `vs[i]`.

In [7]:
using Random
vs, ds = collect(V), Random.shuffle(ds)

([1, 2, 3, 4, 5, 6], [1, 1, 2, 2, 3, 1])
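The map from vertices to stub counts can also be written out explicitly. The following is a minimal sketch using only Base Julia and Random; the name `stubs` is introduced here for illustration and is not used in the rest of this example.

```julia
using Random

vs = collect(1:6)
ds = shuffle([3, 2, 2, 1, 1, 1])  # random assignment of degrees to vertices

stubs = Dict(zip(vs, ds))         # vertex => number of stubs on that vertex

sum(values(stubs))                # total number of stubs, unchanged by the shuffle
```

Since vertices are the integers `1:n` here, indexing `ds` directly by vertex is equivalent to looking up `stubs`; a dictionary would be the more natural choice for non-integer vertices.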

#### 1.1.3. Generating hyperedges

Here hyperedges of random cardinality are randomly drawn given available stubs.

In [8]:
es = Vector{HyperEdge{Int64}}()    # this will hold hyperedges

idx = ds .!= 0                     # this keeps track of which vertices have free stubs

while any(idx)                     # hyperedges are generated until all stubs are used
    k = rand(1:sum(ds))            # hyperedge cardinality is drawn uniformly between 1 and the number of remaining stubs
    chosen_vs = Vector{Int64}()    # this will hold the vertices chosen to populate this hyperedge
    for _ in 1:k                   # we iterate over k to choose k vertices
        chosen_v = rand(vs[idx])   # a vertex is randomly chosen from vertices with free stubs
        push!(chosen_vs, chosen_v) # this vertex is added to the vertices that will populate this hyperedge
        ds[chosen_v] -= 1          # we decrement the number of stubs of the chosen vertex (vertices are 1:n, so they index ds directly)
        idx = ds .!= 0             # one stub has been used up and we need to update idx
    end
    push!(es, HyperEdge(chosen_vs))
end

X = HyperGraph(es)                 # we build a hypergraph from the generated hyperedges

HyperGraph{Int64}([2, 5, 1, 4, 6, 3], HyperEdge{Int64}[HyperEdge{Int64}([2, 5, 1, 4, 5, 6, 5, 3]), HyperEdge{Int64}([3, 4])])

We can now check that the degree sequence of the generated hypergraph matches the target degree sequence.

In [9]:
degree_sequence(X) == target_d_s

true
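For reuse, the generation loop can be wrapped in a function. The following is a minimal sketch in plain Julia that represents hyperedges as vectors of vertex indices rather than `HyperEdge` objects, so that it runs without HyperGraphs.jl. The names `stub_matching` and `multiset_degrees` are hypothetical, and vertices are assumed to be the integers `1:length(ds)`.

```julia
using Random

# Hypothetical helper: draws hyperedges (plain vectors of vertex indices)
# until every stub is used; ds[v] is the number of stubs of vertex v.
function stub_matching(ds::Vector{Int}; rng=Random.default_rng())
    ds = copy(ds)                            # leave the caller's vector untouched
    edges = Vector{Vector{Int}}()
    while any(ds .> 0)
        k = rand(rng, 1:sum(ds))             # cardinality, at most the remaining stubs
        edge = Int[]
        for _ in 1:k
            v = rand(rng, findall(ds .> 0))  # a vertex that still has a free stub
            push!(edge, v)
            ds[v] -= 1
        end
        push!(edges, edge)
    end
    return edges
end

# Degree counted with multiplicity: how many times each vertex appears overall.
function multiset_degrees(edges, n)
    deg = zeros(Int, n)
    for e in edges, v in e
        deg[v] += 1
    end
    return deg
end

ds = [1, 1, 2, 2, 3, 1]
edges = stub_matching(ds)
multiset_degrees(edges, length(ds)) == ds    # every stub is used exactly once
```

Passing an explicit `rng` (for example `MersenneTwister(1)`) makes the generated hyperedges reproducible.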

### 1.2 Extensions

#### 1.2.1. Drawing the degree sequence from a distribution

The desired degree counts in `target_d_c` may be drawn from any distribution. Here we demonstrate how to do this by building a distribution from the target degree counts themselves; this is mainly to show how to build a custom `Distribution` type using the Distributions.jl package, but any suitable built-in distribution will work.

We first build the target degree distribution from the target degree counts.

In [10]:
target_d_d = target_d_c ./ n

4-element Vector{Float64}:
 0.0
 0.5
 0.3333333333333333
 0.16666666666666666

We can check that the resulting vector is a probability vector.

In [11]:
using Distributions
isprobvec(target_d_d)

true

We can then use that vector to build a custom distribution.

In [12]:
_support = 0:length(target_d_c)-1
d = DiscreteNonParametric(_support, target_d_d)
d isa Distribution

true

The point of building a custom distribution here is being able to use functions defined on distributions. For example, we can now call `rand` on `d` to sample a new target degree sequence.

In [13]:
sampled_target_d_s = sort(rand(d, n), rev=true)

6-element Vector{Int64}:
 3
 2
 2
 1
 1
 1
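Because the sequence is sampled at random, its degree counts will generally differ from `target_d_c` from one run to the next, even though they happen to coincide in the output shown here. A minimal sketch of how one might recover the counts of a sampled sequence for comparison is given below; the helper `sequence_to_counts` is hypothetical and uses only Base Julia.

```julia
# Hypothetical helper: counts how many vertices have each degree,
# storing the count for degree d at index d + 1.
function sequence_to_counts(seq::Vector{Int})
    counts = zeros(Int, maximum(seq) + 1)
    for d in seq
        counts[d + 1] += 1
    end
    return counts
end

sequence_to_counts([3, 2, 2, 1, 1, 1])  # returns [0, 3, 2, 1]
```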

A hypergraph may then be built from this new degree sequence using the same code as above.

In [14]:
ds = deepcopy(sampled_target_d_s)
vs, ds = collect(V), Random.shuffle(ds)

es = Vector{HyperEdge{Int64}}()

idx = ds .!= 0
while any(idx)
    k = rand(1:sum(ds))
    chosen_vs = Vector{Int64}()
    for _ in 1:k
        chosen_v = rand(vs[idx])
        push!(chosen_vs, chosen_v)
        ds[chosen_v] -= 1
        idx = ds .!= 0
    end
    push!(es, HyperEdge(chosen_vs))
end

X = HyperGraph(es)

degree_sequence(X) == sampled_target_d_s

true

#### 1.2.2. Drawing hyperedge cardinalities from a distribution

The cardinality of each hyperedge may also be drawn from any distribution; here we illustrate this using uniform and binomial distributions.

In [15]:
k_range = 0:maximum(target_d_s) # these are the possible cardinality values
d1 = DiscreteUniform(first(k_range), last(k_range))
d2 = Binomial(last(k_range))
rand(d1), rand(d2)

(2, 1)

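The line `k = rand(1:sum(ds))` in the code above just needs to be replaced with `k = rand(d)`, where `d` is the distribution of choice (e.g. `d1` or `d2`). Drawn values should lie between `1` and the number of free stubs: a value of `0` yields an empty hyperedge, and a value larger than the number of free stubs leaves the inner loop with no vertex to choose from.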
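A value drawn from an arbitrary distribution may fall outside the range the generation loop can handle (below `1`, or above the number of free stubs). A minimal sketch of one way to handle this is to clamp each draw; note that clamping alters the distribution at its boundaries, and rejection sampling would be an alternative. The helper `bounded_k` is hypothetical, and a Base-only sampler stands in for `rand(d1)` so that the snippet runs without Distributions.jl.

```julia
# Hypothetical helper: `sampler` is any zero-argument function returning an integer;
# the result is forced into the range 1:free_stubs.
bounded_k(sampler, free_stubs) = clamp(sampler(), 1, free_stubs)

sampler = () -> rand(0:3)                  # stand-in for () -> rand(d1)
ks = [bounded_k(sampler, 2) for _ in 1:100]

all(k -> 1 <= k <= 2, ks)                  # every draw can be used by the loop
```

In the generation loop this would read `k = bounded_k(() -> rand(d1), sum(ds))`.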

#### 1.2.3. Restricting hyperedge vertex sets to be sets and not multisets

In the code above, there is no restriction on vertex multiplicity in the hyperedge vertex set. The resulting hypergraphs still satisfy the required degree sequence because the notion of degree we implement in HyperGraphs.jl generalises the one that is commonly used (see [this section](https://github.com/lpmdiaz/HyperGraphs.jl/blob/main/README.md#properties) of the HyperGraphs.jl README for a short discussion of degree). One may however not want the hyperedge vertex set to be a multiset, thus satisfying the definition of the degree of some vertex _v_ as _the cardinality of the set of hyperedges incident on that vertex_.

We show below how to achieve this. Our approach relies on the `sample` function from StatsBase.jl. This also shortens the code since `k` vertices may be drawn at once.

In [16]:
using StatsBase

target_d_s = degree_sequence(target_d_c)
ds = deepcopy(target_d_s)
vs, ds = (collect(V), Random.shuffle(ds))

es = Vector{HyperEdge{Int64}}()

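idx = ds .!= 0
while any(idx)
    k = rand(1:sum(idx))                          # at most one stub per vertex can be used in a given hyperedge
    chosen_vs = sample(vs[idx], k, replace=false) # k distinct vertices are drawn at once
    ds[chosen_vs] .-= 1                           # one stub is used on each chosen vertex
    idx = ds .!= 0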
    push!(es, HyperEdge(chosen_vs))
end

X = HyperGraph(es)

degree_sequence(X) == target_d_s

true

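Note that this approach sets a lower bound on the number of hyperedges: a vertex can appear at most once in each hyperedge, so at least as many hyperedges as the maximum degree are needed. Assuming the last entry of `target_d_c` is nonzero, this bound is given by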

In [17]:
min_n_es = length(target_d_c) - 1

3
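The lower bound can be checked empirically. The sketch below reimplements the set-based generation in plain Julia, using `shuffle(candidates)[1:k]` as a Base-only stand-in for `sample(..., replace=false)` from StatsBase.jl and plain vectors instead of `HyperEdge` objects. The function name `stub_matching_sets` is hypothetical.

```julia
using Random

# Hypothetical helper: like the loop above, but each hyperedge holds distinct vertices.
function stub_matching_sets(ds::Vector{Int})
    ds = copy(ds)
    edges = Vector{Vector{Int}}()
    while any(ds .> 0)
        candidates = findall(ds .> 0)      # vertices with at least one free stub
        k = rand(1:length(candidates))     # at most one stub per vertex per hyperedge
        edge = shuffle(candidates)[1:k]    # k distinct vertices
        ds[edge] .-= 1
        push!(edges, edge)
    end
    return edges
end

ds = [3, 2, 2, 1, 1, 1]
edges = stub_matching_sets(ds)

all(allunique, edges)                      # no vertex is repeated within a hyperedge
length(edges) >= maximum(ds)               # the lower bound on the number of hyperedges
```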

#### 1.2.4. Dealing with target degree counts not summing up to `n`

Whenever the target degree counts do not sum up to `n`, degree information is missing: some vertices will have no assigned target degree. We suggest dealing with that case by simply assuming the missing values should be degree `0`. We start in the same way as before, and check that we are missing some information:

In [18]:
target_d_c = [0, 3, 2]
sum(target_d_c) == n

false

We get the target degree sequence in the same way as before and note that it specifies degrees for only `5` vertices when we have `6`.

In [19]:
target_d_s = degree_sequence(target_d_c)
ds = deepcopy(target_d_s)

5-element Vector{Int64}:
 2
 2
 1
 1
 1

We simply use this line to append zeros to the target degree sequence.

In [20]:
length(ds) != n && append!(ds, zeros(Int, n - length(ds))) # zeros(Int, ...) keeps the element type consistent with ds

6-element Vector{Int64}:
 2
 2
 1
 1
 1
 0

The new target degree sequence now contains enough information.

In [21]:
length(ds) == n

true
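The padding step can also be written as a small non-mutating function. This is a minimal sketch in plain Julia; the name `pad_degree_sequence` is hypothetical, and the explicit error for sequences longer than `n` is an assumption about the desired behaviour rather than something HyperGraphs.jl prescribes.

```julia
# Hypothetical helper: returns a copy of ds padded with zeros up to length n.
function pad_degree_sequence(ds::Vector{Int}, n::Int)
    length(ds) <= n || throw(ArgumentError("degree sequence is longer than n"))
    return vcat(ds, zeros(Int, n - length(ds)))
end

pad_degree_sequence([2, 2, 1, 1, 1], 6)  # returns [2, 2, 1, 1, 1, 0]
```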

Note that combining this with example 1.2.1. requires running `target_d_d = target_d_c ./ sum(target_d_c)` to build the target degree distribution from which a new degree sequence is sampled, since `target_d_c ./ n` no longer sums to `1` in this case.
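The difference between the two normalisations can be seen with the values used in this section; the snippet below is a minimal check in plain Julia.

```julia
n, target_d_c = 6, [0, 3, 2]

sum(target_d_c ./ n) ≈ 1            # false: the entries only sum to 5/6

p = target_d_c ./ sum(target_d_c)   # [0.0, 0.6, 0.4]
sum(p) ≈ 1                          # true: p is a valid probability vector
```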