Binomial Random Uniform Hypergraph #21917

pelegm · 2016-11-21T19:22:07Z

I have implemented a binomial random uniform hypergraph generator. This is my first time writing code in Sage, and I would appreciate any feedback.

Component: graph theory

Keywords: hypergraph, random, days79, days94

Author: Peleg Michaeli

Branch/Commit: 5a7b76a

Reviewer: David Coudert

Issue created by migration from https://trac.sagemath.org/ticket/21917


sagetrac-git · 2016-11-21T19:22:55Z

Commit: 588b888

sagetrac-git · 2016-11-21T19:22:55Z

Branch pushed to git repo; I updated commit sha1. New commits:

`588b888`	`Random binomial uniform hypergraph`

pelegm · 2016-11-21T19:25:22Z

Author: Peleg Michaeli

pelegm · 2016-11-22T09:31:07Z

Changed keywords from hypergraph, random to hypergraph, random, days79

seblabbe · 2016-11-23T22:45:51Z

comment:6

Here are some comments:

INPUT: is missing
No empty line is needed between input items.
You should describe the type of the input:

- ``n`` -- integer, number of nodes of the graph

There must be an EXAMPLES:: block. There can also be a TESTS:: block. I suggest keeping the limit cases inside the TESTS block:

            sage: hypergraphs.RandomBinomialUniform(50, 3, 1).num_blocks()
            19600
            sage: hypergraphs.RandomBinomialUniform(50, 3, 0).num_blocks()
            0

and the one with p=0.2 inside the EXAMPLES block, without the call to num_blocks():

sage: hypergraphs.RandomBinomialUniform(50, 3, 0.2)
Incidence structure with 50 points and 3915 blocks

I think that the line sage: set_random_seed(0) is not necessary, because the doctest framework already sets the random seed so that random doctests always return the same thing.
Is the seed input only there for doctest reasons? Or do you expect a human user to need it? If not, I would suggest removing this input from the function. Also note that you can add # random at the end of a sage: line in doctests so that it only tests that no errors are produced. See http://doc.sagemath.org/html/en/developer/coding_basics.html#section-further-conventions
In fact, I would remove num_blocks() everywhere in the examples. At a quick glance, the reader may think that the method returns an integer.
I would suggest that you read the section http://doc.sagemath.org/html/en/developer/coding_basics.html#documentation-strings. Everything I said should be explained there.
(default=None) -> (default: `None`), in case you keep the seed input

videlec · 2016-11-24T09:54:02Z

comment:7

Hello,

I do not understand what you are doing with the seed variable in your code... it is set but never used! Manipulating the random state should not be part of your function. There are Sage functions for that purpose.

Vincent

videlec · 2016-11-24T10:02:23Z

comment:8

This is very inefficient if p is small

edges = [e for e in combinations(range(n), k) if random() < p]

You need to use another approach...

The cardinality of your "edge space" is binomial(n,k). What you want to do is basically pick a binomial random variable B(binomial(n,k), p) and then pick that number of sets. This can be done as follows

sage: S = Subsets(range(50), 5)
sage: sample(S, 10)
[{34, 38, 5, 30, 21},
 {16, 1, 35, 5, 45},
 {34, 3, 4, 5, 15},
 {9, 2, 5, 6, 47},
 {35, 20, 37, 22, 15},
 {0, 17, 42, 12, 45},
 {8, 24, 19, 13, 22},
 {36, 42, 28, 5, 29},
 {48, 17, 49, 33, 7},
 {1, 39, 29, 43, 9}]

Hence you just need to generate a random number distributed with binomial distribution.
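The two-step approach described above (draw the number of edges, then draw that many distinct k-subsets) can be sketched in plain Python as follows. This is only an illustration under assumptions, not the code on the branch: the names `draw_binomial` and `sample_k_subsets` are hypothetical, the binomial draw uses geometric waiting times from the standard library as a stand-in for a library routine, and the subsets are drawn by rejection so that the edge space is never enumerated or measured with `len`.

```python
import math
import random


def draw_binomial(trials, p, rng=random):
    """Draw from Binomial(trials, p) using geometric waiting times.

    The expected number of loop iterations is about trials * p, so this
    is cheap exactly when p is small. Hypothetical helper, not a Sage API.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if p == 0 or trials == 0:
        return 0
    if p == 1:
        return trials
    log_q = math.log1p(-p)        # log of the failure probability
    successes, position = 0, 0
    while True:
        u = 1.0 - rng.random()    # uniform on (0, 1], so log(u) is finite
        # Number of trials consumed up to and including the next success.
        position += int(math.log(u) / log_q) + 1
        if position > trials:
            return successes
        successes += 1


def sample_k_subsets(n, k, m, rng=random):
    """Return a set of m distinct k-subsets of range(n), uniformly at random."""
    if not 0 <= m <= math.comb(n, k):
        raise ValueError("number of edges m must be between 0 and binomial(n, k)")
    edges = set()
    while len(edges) < m:
        # Rejection sampling: a duplicate subset is simply drawn again.
        edges.add(frozenset(rng.sample(range(n), k)))
    return edges


# The edge space here has far more than 2**63 elements, yet nothing
# in this sketch needs to enumerate it or take its length.
n, k, p = 10000, 17, 1e-52
m = draw_binomial(math.comb(n, k), p)
edges = sample_k_subsets(n, k, m)
```

The rejection step is a design choice that suits sparse hypergraphs; when m is close to binomial(n, k) it degrades, and sampling the complement would be the better option.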

videlec · 2016-11-24T10:28:19Z

comment:9

And for generating binomial distribution there is at least

https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.binomial.html

pelegm · 2016-11-24T12:51:37Z

comment:10

Replying to @videlec:

I do not understand what you are doing with the seed variable in your code... it is set but never used! Manipulating the random state should not be part of your function. There are Sage functions for that purpose.

This is probably right. I do want to take seed as a parameter, though, to be consistent with graphs/generators/random.py. I'm just not sure what exactly to do with it.

pelegm · 2016-11-24T13:15:45Z

comment:11

sample yields an OverflowError:

sage: n = 10000
sage: p = 1/n
sage: k = 17
sage: sample(Subsets(range(n), k), binomial(n, p))
Traceback (most recent call last):
  File "<ipython-input-57-6d20b3bde428>", line 1, in <module>
    sample(Subsets(range(n), k), binomial(n, p))
  File "/usr/lib/sagemath/local/lib/python2.7/site-packages/sage/misc/prandom.py", line 178, in sample
    return _pyrand().sample(population, k)
  File "/usr/lib/sagemath/local/lib/python/random.py", line 321, in sample
    n = len(population)
OverflowError: Python int too large to convert to C long

I am also quite worried about the fact that numpy probably converts p into a float. Not sure it should matter much, but still...

(This was tested in Sage 7.3, until I build 7.5beta3 properly.)

pelegm · 2016-11-24T13:27:42Z

comment:12

I think the maximum number of edges in the complete uniform hypergraph that this can handle via the proposed method is roughly 1e18 (check that xrange, for example, fails for 1e19).
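The size limit mentioned here can be checked without Sage. The sketch below uses only the Python standard library and a hypothetical stand-in class `EdgeSpace`; it shows that the number of 17-subsets of a 10000-element set exceeds `sys.maxsize`, and that `len` refuses any container reporting such a size, which matches the `OverflowError` raised inside `sample`.

```python
import math
import sys

# Number of 17-subsets of a 10000-element set, as an exact Python int.
edge_space = math.comb(10000, 17)


class EdgeSpace:
    """Hypothetical stand-in for a container whose size exceeds a C long."""

    def __len__(self):
        return edge_space


try:
    len(EdgeSpace())
    len_overflows = False
except OverflowError:
    # len() must return a value that fits in a machine-sized index.
    len_overflows = True

print(edge_space > sys.maxsize)
print(len_overflows)
```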

videlec · 2016-11-25T12:24:15Z

comment:13

Replying to @pelegm:

sample yields an OverflowError:

sage: n = 10000
sage: p = 1/n
sage: k = 17
sage: sample(Subsets(range(n), k), binomial(n, p))
Traceback (most recent call last):
  File "<ipython-input-57-6d20b3bde428>", line 1, in <module>
    sample(Subsets(range(n), k), binomial(n, p))
  File "/usr/lib/sagemath/local/lib/python2.7/site-packages/sage/misc/prandom.py", line 178, in sample
    return _pyrand().sample(population, k)
  File "/usr/lib/sagemath/local/lib/python/random.py", line 321, in sample
    n = len(population)
OverflowError: Python int too large to convert to C long

I am also quite worried about the fact that numpy probably converts p into a float. Not sure it should matter much, but still...

I confirm the behavior on the latest beta (7.5.beta4). I don't think the conversion to float by numpy matters.

On the other hand, as we discussed in Jerusalem it would actually be simpler to have two methods:

one for GNM (not needing binomial)
one for GNP (that calls GNM with appropriate parameters)

dcoudert · 2016-11-26T09:58:43Z

comment:14

I'm also in favor of having GNM and GNP like methods.

In the above example, you use binomial(n, p) with p = 1/n. If p is a float, then binomial will raise TypeError: either m or x-m must be an integer.
You may try this instead sample(Subsets(range(n), k), randint(0,binomial(n, k))).

Obviously, the main limitation for the maximum size of a random uniform hypergraph is not what range can handle but the memory of your computer. 1e9 should already be beyond what you can do.

David.

sagetrac-git · 2016-11-26T12:55:03Z

Changed commit from 588b888 to 42fde5b

sagetrac-git · 2016-11-26T12:55:03Z

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`42fde5b`	`Random binomial uniform hypergraph`

pelegm · 2016-11-26T12:55:19Z

comment:16

Rebased to develop.

pelegm · 2016-11-26T13:16:35Z

comment:17

OK, so this is my plan for the uniform model:

        if m < 0:
            raise ValueError("Number of edges must be nonnegative.")

        from sage.combinat.subset import Subsets
        all_edges = Subsets(n, k)
        max_points = len(all_edges)
        if m > max_points:
            raise ValueError("Number of edges may not exceed {}".format(
                max_points))

        from sage.misc.prandom import sample

        try:
            edges = sample(all_edges, m)

        except OverflowError:
            from sys import maxint
            from sage.functions.other import binomial

            if binomial(n, k) > maxint:
                raise OverflowError("binomial(n, k) may not exceed {}".format(
                    maxint))

            raise

        from sage.combinat.designs.incidence_structures import IncidenceStructure
        return IncidenceStructure(edges)

Now, for the binomial model, which function do you think I should use?

Import numpy.random.binomial, or
Implement binomial in sage.misc.prandom (either as a call to numpy, or copying code from numpy? I think it is implemented in C in numpy...)

dcoudert · 2016-11-26T13:31:05Z

comment:18

You have only 1 call, so this should be sufficient

sage: from sage.misc.prandom import randint
sage: m = randint(0, binomial(1000,17))

pelegm · 2016-11-26T16:29:37Z

comment:19

Replying to @dcoudert:

You have only 1 call, so this should be sufficient
sage: from sage.misc.prandom import randint
sage: m = randint(0, binomial(1000,17))

This will not yield the correct distribution. randint is uniform, rather than binomial.
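A small exact computation makes the difference concrete. The sketch below is an illustration only, with hypothetical helper names; it compares the distribution of the number of edges under the binomial model with a uniform choice from 0 to the size of the edge space, for 2-subsets of a 5-element set and p = 1/5.

```python
import math
from fractions import Fraction


def binomial_pmf(trials, p):
    """Exact probabilities of 0..trials successes in independent trials."""
    return [math.comb(trials, j) * p**j * (1 - p)**(trials - j)
            for j in range(trials + 1)]


def uniform_pmf(trials):
    """Probabilities when the count is drawn uniformly from 0..trials."""
    return [Fraction(1, trials + 1)] * (trials + 1)


def mean(pmf):
    return sum(j * weight for j, weight in enumerate(pmf))


trials = math.comb(5, 2)      # 10 possible edges
p = Fraction(1, 5)

binomial_mean = mean(binomial_pmf(trials, p))   # equals trials * p
uniform_mean = mean(uniform_pmf(trials))        # equals trials / 2
```

With these parameters the binomial model expects 2 edges while the uniform draw expects 5, so the two are not interchangeable.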

videlec · 2016-11-26T16:52:02Z

comment:20

Replying to @pelegm:

OK, so this is my plan for the uniform model:
[SNIP CODE]

I would actually regroup and simplify the errors (and please start error messages with a lower-case letter and omit punctuation at the end)

from sage.combinat.subset import Subsets
from sage.misc.prandom import sample
all_edges = Subsets(n, k)
try:
    edges = sample(all_edges, m)
except OverflowError:
    raise OverflowError("binomial({},{}) too large to be treated".format(n,k))
except ValueError:
    raise ValueError("number of edges m must be between 0 and binomial({},{})".format(n,k))

from sage.combinat.designs.incidence_structures import IncidenceStructure
return IncidenceStructure(edges)

I don't think that the overflow in sample has to do with the second argument... It is raised (indirectly) in your example by

sage: len(Subsets(range(10000), 17))
Traceback (most recent call last):
...
OverflowError: Python int too large to convert to C long

pelegm · 2016-11-26T17:21:31Z

comment:21

Great, thanks. In the meantime I will implement the binomial random hypergraph using NumPy's binomial function. If there are any objections, I'll think of a new solution.

videlec · 2016-11-26T17:23:39Z

comment:22

Replying to @pelegm:

Great, thanks. In the meantime I will implement the binomial random hypergraph using NumPy's binomial function. If there are any objections, I'll think of a new solution.

I am good with it!

pelegm · 2018-06-28T15:37:01Z

comment:23

Replying to @videlec:

I would actually regroup and simplify the errors (and please start error messages with a lower-case letter and omit punctuation at the end)

from sage.combinat.subset import Subsets
from sage.misc.prandom import sample
all_edges = Subsets(n, k)
try:
    edges = sample(all_edges, m)
except OverflowError:
    raise OverflowError("binomial({},{}) too large to be treated".format(n,k))
except ValueError:
    raise ValueError("number of edges m must be between 0 and binomial({},{})".format(n,k))

from sage.combinat.designs.incidence_structures import IncidenceStructure
return IncidenceStructure(edges)

The problem with this implementation is that it creates a hypergraph without the isolated vertices. This is easy to fix, I'm on it.
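The isolated-vertex problem can be reproduced in plain Python. In the sketch below, `points_from_blocks` is a hypothetical helper that mimics building the ground set from the edges alone; any vertex that appears in no edge disappears. I assume the fix in Sage is to pass the point set explicitly alongside the blocks, but that is not shown in this thread.

```python
def points_from_blocks(blocks):
    """Ground set inferred from the blocks alone: isolated vertices vanish."""
    return sorted(set().union(*blocks))


n = 6
edges = [{0, 1, 2}, {1, 2, 4}]

inferred = points_from_blocks(edges)              # only vertices that occur in an edge
explicit = list(range(n))                         # the intended vertex set
isolated = sorted(set(explicit) - set(inferred))  # vertices lost by inference
```

Here vertices 3 and 5 belong to the hypergraph but occur in no edge, so only the explicit point set describes the intended structure.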

pelegm · 2018-06-28T15:37:23Z

Changed keywords from hypergraph, random, days79 to hypergraph, random, days79, days94

pelegm · 2018-06-28T15:54:06Z

Changed commit from 42fde5b to 2b6b048

pelegm · 2018-06-28T15:54:06Z

New commits:

`2b6b048`	`Random binomial uniform hypergraph`

pelegm · 2018-06-28T15:54:06Z

Changed branch from u/pelegm/randombinomialhypergraph to u/pelegm/21917

pelegm · 2018-06-28T15:55:49Z

comment:26

I avoided using seed, since the randomness currently comes from numpy and working with two random states seems wrong. I wasn't sure what other tests I should add.

dcoudert · 2018-06-28T16:21:04Z

comment:28

Retrun -> Return

In the method BinomialRandomUniform, shouldn't you ensure that m = nrn.binomial(binomial(n, k), p) can be computed, i.e. that it's not too large? Or is that handled in UniformRandomUniform?

The TESTS block is in fact an EXAMPLES block.

What if n or k are values <= 0? You may need specific tests for these cases.

pelegm · 2018-06-29T09:35:50Z

comment:30

Replying to @dcoudert:

Retrun -> Return

Thanks, will fix.

In the method BinomialRandomUniform, shouldn't you ensure that m = nrn.binomial(binomial(n, k), p) can be computed, i.e. that it's not too large? Or is that handled in UniformRandomUniform?

Indeed, for large inputs there's some weird issue in numpy I can't really understand at the moment:

sage: nrn.binomial(binomial(10000, 7), 0.1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-d91f2b509dee> in <module>()
----> 1 nrn.binomial(binomial(Integer(10000), Integer(7)), RealNumber('0.1'))

mtrand.pyx in mtrand.RandomState.binomial()

TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'

The TESTS block is in fact an EXAMPLES block.

Ok.

What if n or k are values <= 0? You may need specific tests for these cases.

Will do.
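One possible reading of the TypeError reported above is that binomial(10000, 7) does not fit in a signed 64-bit integer, so the value reaches numpy as a generic object rather than as an int64. That explanation is an assumption on my part, although the arithmetic is easy to check. The sketch below uses hypothetical helper names and reuses the error wording suggested earlier in the thread; it shows a guard that could run before the binomial draw.

```python
import math

INT64_MAX = 2**63 - 1


def fits_int64(value):
    """True if value is representable as a signed 64-bit integer."""
    return -INT64_MAX - 1 <= value <= INT64_MAX


def checked_edge_space(n, k):
    """Return binomial(n, k), refusing sizes a 64-bit draw cannot take."""
    size = math.comb(n, k)
    if not fits_int64(size):
        raise OverflowError(
            "binomial({},{}) too large to be treated".format(n, k))
    return size


small = checked_edge_space(50, 3)            # fits comfortably
too_large = not fits_int64(math.comb(10000, 7))
```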

sagetrac-git · 2018-06-29T11:11:09Z

Branch pushed to git repo; I updated commit sha1. New commits:

`de49d8c`	`Typo, parameter checks and tests`

sagetrac-git · 2018-06-29T11:11:09Z

Changed commit from 2b6b048 to de49d8c

dcoudert · 2018-06-29T21:54:54Z

comment:33

Shouldn't we put an empty line after TESTS:: ?

Once done (or not done if not necessary), you can set this ticket to positive review for me.

dcoudert · 2018-06-29T21:54:54Z

Reviewer: David Coudert

sagetrac-git · 2018-06-30T07:02:30Z

Branch pushed to git repo; I updated commit sha1. New commits:

`5a7b76a`	`Empty lines after TESTS::`

sagetrac-git · 2018-06-30T07:02:30Z

Changed commit from de49d8c to 5a7b76a

pelegm · 2018-06-30T07:02:51Z

comment:35

Thanks!

vbraun · 2018-07-03T23:40:23Z

Changed branch from u/pelegm/21917 to 5a7b76a

pelegm added this to the sage-7.5 milestone Nov 21, 2016

pelegm added c: graph theory labels Nov 21, 2016

pelegm changed the title ~~Binomial Random Hypergraph~~ Binomial Random Uniform Hypergraph Nov 21, 2016

pelegm added the s: needs review label Nov 21, 2016

seblabbe added s: needs work and removed s: needs review labels Nov 23, 2016

pelegm self-assigned this Jun 28, 2018

pelegm added s: needs review and removed s: needs work labels Jun 28, 2018

pelegm modified the milestones: sage-7.5, sage-8.3 Jun 28, 2018

pelegm added s: needs work and removed s: needs review labels Jun 29, 2018

pelegm added s: needs review and removed s: needs work labels Jun 29, 2018

pelegm added s: positive review and removed s: needs review labels Jun 30, 2018

vbraun removed the s: positive review label Jul 3, 2018

vbraun closed this as completed in 71448bd Jul 3, 2018
