Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binomial Random Uniform Hypergraph #21917

Closed
pelegm opened this issue Nov 21, 2016 · 38 comments
Closed

Binomial Random Uniform Hypergraph #21917

pelegm opened this issue Nov 21, 2016 · 38 comments

Comments

@pelegm
Copy link
Contributor

pelegm commented Nov 21, 2016

I have implemented binomial random hypergraph. This is my first time writing code in sage, and I would appreciate any feedback.

Component: graph theory

Keywords: hypergraph, random, days79, days94

Author: Peleg Michaeli

Branch/Commit: 5a7b76a

Reviewer: David Coudert

Issue created by migration from https://trac.sagemath.org/ticket/21917

@pelegm pelegm added this to the sage-7.5 milestone Nov 21, 2016
@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 21, 2016

Commit: 588b888

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 21, 2016

Branch pushed to git repo; I updated commit sha1. New commits:

588b888Random binomial uniform hypergraph

@pelegm pelegm changed the title Binomial Random Hypergraph Binomial Random Uniform Hypergraph Nov 21, 2016
@pelegm
Copy link
Contributor Author

pelegm commented Nov 21, 2016

Author: Peleg Michaeli

@pelegm
Copy link
Contributor Author

pelegm commented Nov 22, 2016

Changed keywords from hypergraph, random to hypergraph, random, days79

@seblabbe
Copy link
Contributor

comment:6

Here are some comments:

  1. INPUT: is missing

  2. No empty line is needed between input items.

  3. You should describe the type of the input:

- ``n`` -- integer, number of nodes of the graph
  1. There must be EXAMPLES::. There can be also some TESTS::. I suggest to keep the limit cases inside a TEST block:
            sage: hypergraphs.RandomBinomialUniform(50, 3, 1).num_blocks()
            19600
            sage: hypergraphs.RandomBinomialUniform(50, 3, 0).num_blocks()
            0

and the one with p=0.2 inside the EXAMPLE block without the call to that other numblock method.

sage: hypergraphs.RandomBinomialUniform(50, 3, 0.2)
Incidence structure with 50 points and 3915 blocks 
  1. I think that the line sage: set_random_seed(0) is not necessary because the doctest framework already set the random seed so that random doctest always return the same thing...

  2. Is the seed input only there for doctests reason? Or do you expect a human user to need this? If not, I would suggest to remove this input to the function. Also not that you can add # random at the end of a sage: line in doctests so that it only tests that no error are produced. See http://doc.sagemath.org/html/en/developer/coding_basics.html#section-further-conventions

  3. In fact, I would remove num_blocks() everywhere in the examples. When we quickly look, we think that this methods return a integer...

  4. I would suggest that you read the section http://doc.sagemath.org/html/en/developer/coding_basics.html#documentation-strings. Normally, everything that I said is explained there.

  5. (default=None) -> (default: `None`), in case you keep the seed input

@videlec
Copy link
Contributor

videlec commented Nov 24, 2016

comment:7

Hello,

I do not understand what you are doing with the seed variable in your code... it is set but never used! Manipulating the random state should not be part of your function. There are Sage functions for that purpose.

Vincent

@videlec
Copy link
Contributor

videlec commented Nov 24, 2016

comment:8

This is very inefficient if p is small

edges = [e for e in combinations(range(n), k) if random() < p]

You need to use another approach...

The cardinality of your "edge space" is binomial(n,k). What you want to do is basically pick a binomial random variable B(binomial(n,k), p) and then pick that number of sets. This can be done as follows

sage: S = Subsets(range(50), 5)
sage: sample(S, 10)
[{34, 38, 5, 30, 21},
 {16, 1, 35, 5, 45},
 {34, 3, 4, 5, 15},
 {9, 2, 5, 6, 47},
 {35, 20, 37, 22, 15},
 {0, 17, 42, 12, 45},
 {8, 24, 19, 13, 22},
 {36, 42, 28, 5, 29},
 {48, 17, 49, 33, 7},
 {1, 39, 29, 43, 9}]

Hence you just need to generate a random number distributed with binomial distribution.

@videlec
Copy link
Contributor

videlec commented Nov 24, 2016

comment:9

And for generating binomial distribution there is at least

https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.binomial.html

@pelegm
Copy link
Contributor Author

pelegm commented Nov 24, 2016

comment:10

Replying to @videlec:

I do not understand what you are doing with the seed variable in your code... it is set but never used! Manipulating the random state should not be part of your function. There are Sage functions for that purpose.

This is probably right. I do want to get seed as a parameter, though, to be consistent with graphs/generators/random.py, though. I'm just not sure what exactly to do with it.

@pelegm
Copy link
Contributor Author

pelegm commented Nov 24, 2016

comment:11

sample yields an OverflowError:

sage: n = 10000
sage: p = 1/n
sage: k = 17
sage: sample(Subsets(range(n), k), binomial(n, p))
Traceback (most recent call last):
  File "<ipython-input-57-6d20b3bde428>", line 1, in <module>
    sample(Subsets(range(n), k), binomial(n, p))
  File "/usr/lib/sagemath/local/lib/python2.7/site-packages/sage/misc/prandom.py", line 178, in sample
    return _pyrand().sample(population, k)
  File "/usr/lib/sagemath/local/lib/python/random.py", line 321, in sample
    n = len(population)
OverflowError: Python int too large to convert to C long

I am also quite worried about the fact that numpy probably converts p into a float. Not sure it should matter much, but still...

(this was tested in Sage 7.3... until I'll build 7.5beta3 properly)

@pelegm
Copy link
Contributor Author

pelegm commented Nov 24, 2016

comment:12

I think the maximum number of edges in the complete uniform hypergraph that this can handle via the proposed method is roughly 1e18 (check that xrange, for example, fails for 1e19).

@videlec
Copy link
Contributor

videlec commented Nov 25, 2016

comment:13

Replying to @pelegm:

sample yields an OverflowError:

sage: n = 10000
sage: p = 1/n
sage: k = 17
sage: sample(Subsets(range(n), k), binomial(n, p))
Traceback (most recent call last):
  File "<ipython-input-57-6d20b3bde428>", line 1, in <module>
    sample(Subsets(range(n), k), binomial(n, p))
  File "/usr/lib/sagemath/local/lib/python2.7/site-packages/sage/misc/prandom.py", line 178, in sample
    return _pyrand().sample(population, k)
  File "/usr/lib/sagemath/local/lib/python/random.py", line 321, in sample
    n = len(population)
OverflowError: Python int too large to convert to C long

I am also quite worried about the fact that numpy probably converts p into a float. Not sure it should matter much, but still...

I confirm the behavior on the latest beta (7.5.beta4). I don't think the conversion to float by numpy matters.

On the other hand, as we discussed in Jerusalem it would actually be simpler to have two methods:

  • one for GNM (not needing binomial)
  • one for GNP (that calls GNM with appropriate parameters)

@dcoudert
Copy link
Contributor

comment:14

I'm also in favor of having GNM and GNP like methods.

In above example, you use binomial(n, p) with p=1/k. If p is a float, then binomial will raise TypeError: either m or x-m must be an integer.
You may try this instead sample(Subsets(range(n), k), randint(0,binomial(n, k))).

Obviously, the main limitation for the maximum size of a random uniform hypergraph is not what range can handle but the memory of your computer. 1e9 should already be beyond what you can do.

David.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 26, 2016

Changed commit from 588b888 to 42fde5b

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 26, 2016

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

42fde5bRandom binomial uniform hypergraph

@pelegm
Copy link
Contributor Author

pelegm commented Nov 26, 2016

comment:16

Rebased to develop.

@pelegm
Copy link
Contributor Author

pelegm commented Nov 26, 2016

comment:17

OK, so this is my plan for the uniform model:

        if m < 0:
            raise ValueError("Number of edges must be nonnegative.")

        from sage.combinat.subset import Subsets
        all_edges = Subsets(n, k)
        max_points = len(all_edges)
        if m > max_points:
            raise ValueError("Number of edges may not exceed {}".format(
                max_points))

        from sage.misc.prandom import sample

        try:
            edges = sample(all_edges, m)

        except OverflowError:
            from sys import maxint
            from sage.functions.other import binomial

            if binomial(n, k) > maxint:
                raise OverflowError("binomial(n, k) may not exceed {}".format(
                    maxint))

            raise

        from sage.combinat.designs.incidence_structures import IncidenceStructure
        return IncidenceStructure(edges)

Now, for the binomial model, which function do you think I should use?

  1. Import numpy.random.binomial, or
  2. Implement binomial in sage.misc.prandom (either as a call to numpy, or copying code from numpy? I think it is implemented in C in numpy...)

@dcoudert
Copy link
Contributor

comment:18

You have only 1 call, so this should be sufficient

sage: from sage.misc.prandom import randint
sage: m = randint(0, binomial(1000,17))

@pelegm
Copy link
Contributor Author

pelegm commented Nov 26, 2016

comment:19

Replying to @dcoudert:

You have only 1 call, so this should be sufficient

sage: from sage.misc.prandom import randint
sage: m = randint(0, binomial(1000,17))

This will not yield the current distribution. randint is uniform, rather than binomial.

@videlec
Copy link
Contributor

videlec commented Nov 26, 2016

comment:20

Replying to @pelegm:

OK, so this is my plan for the uniform model:
[SNIP CODE]

I would actually regroup and simplify the errors (and please in the error message start with lower case and no ponctuation at the end)

from sage.combinat.subset import Subsets
from sage.misc.prandom import sample
all_edges = Subsets(n, k)
try:
    edges = sample(all_edges, m)
except OverflowError:
    raise OverflowError("binomial({},{}) too large to be treated".format(n,k))
except ValueError:
    raise ValueError("number of edges m must between 0 and binomial({},{})".format(n,k))

from sage.combinat.designs.incidence_structures import IncidenceStructure
return IncidenceStructure(edges)

I don't think that the overflow in sample has to do with the second argument... It is raised (indirectly) in your example by

sage: len(Subsets(range(10000), 17))
Traceback (most recent call last):
...
OverflowError: Python int too large to convert to C long

@pelegm
Copy link
Contributor Author

pelegm commented Nov 26, 2016

comment:21

Great, thanks. In the meanwhile I implement the binomial random hypergraph using NumPy's binomial function. If there will be any negative comments, I'll think of a new solution.

@videlec
Copy link
Contributor

videlec commented Nov 26, 2016

comment:22

Replying to @pelegm:

Great, thanks. In the meanwhile I implement the binomial random hypergraph using NumPy's binomial function. If there will be any negative comments, I'll think of a new solution.

I am good with it!

@pelegm
Copy link
Contributor Author

pelegm commented Jun 28, 2018

comment:23

Replying to @videlec:

I would actually regroup and simplify the errors (and please in the error message start with lower case and no ponctuation at the end)

from sage.combinat.subset import Subsets
from sage.misc.prandom import sample
all_edges = Subsets(n, k)
try:
    edges = sample(all_edges, m)
except OverflowError:
    raise OverflowError("binomial({},{}) too large to be treated".format(n,k))
except ValueError:
    raise ValueError("number of edges m must between 0 and binomial({},{})".format(n,k))

from sage.combinat.designs.incidence_structures import IncidenceStructure
return IncidenceStructure(edges)

The problem by this implementation is that it creates a hypergraph without the isolated vertices. This is easy to fix, I'm on it.

@pelegm pelegm self-assigned this Jun 28, 2018
@pelegm
Copy link
Contributor Author

pelegm commented Jun 28, 2018

Changed keywords from hypergraph, random, days79 to hypergraph, random, days79, days94

@pelegm
Copy link
Contributor Author

pelegm commented Jun 28, 2018

Changed commit from 42fde5b to 2b6b048

@pelegm
Copy link
Contributor Author

pelegm commented Jun 28, 2018

New commits:

2b6b048Random binomial uniform hypergraph

@pelegm
Copy link
Contributor Author

pelegm commented Jun 28, 2018

Changed branch from u/pelegm/randombinomialhypergraph to u/pelegm/21917

@pelegm
Copy link
Contributor Author

pelegm commented Jun 28, 2018

comment:26

I avoided using seed since randomness is currently using numpy's, and working with two random states seems wrong. Wasn't sure what other tests I should add.

@pelegm pelegm modified the milestones: sage-7.5, sage-8.3 Jun 28, 2018
@dcoudert
Copy link
Contributor

comment:28

Retrun -> Return

in method, BinomialRandomUniform, shouldn't you ensure that m = nrn.binomial(binomial(n, k), p) can be done / that it's not too large? or is it handled in UniformRandomUniform ?

The TESTS block is in fact a EXAMPLES block.

What if n or k or n are values <= 0? You may need specific tests for theses cases.

@pelegm
Copy link
Contributor Author

pelegm commented Jun 29, 2018

comment:30

Replying to @dcoudert:

Retrun -> Return

Thanks, will fix.

in method, BinomialRandomUniform, shouldn't you ensure that m = nrn.binomial(binomial(n, k), p) can be done / that it's not too large? or is it handled in UniformRandomUniform ?

Indeed, for large inputs there's some weird issue in numpy I can't really understand at the moment:

sage: nrn.binomial(binomial(10000, 7), 0.1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-d91f2b509dee> in <module>()
----> 1 nrn.binomial(binomial(Integer(10000), Integer(7)), RealNumber('0.1'))

mtrand.pyx in mtrand.RandomState.binomial()

TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'

The TESTS block is in fact a EXAMPLES block.

Ok.

What if n or k or n are values <= 0? You may need specific tests for theses cases.

Will do.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 29, 2018

Branch pushed to git repo; I updated commit sha1. New commits:

de49d8cTypo, parameter checks and tests

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 29, 2018

Changed commit from 2b6b048 to de49d8c

@dcoudert
Copy link
Contributor

comment:33

Shouldn't we put an empty line after TESTS:: ?

Once done (or not done if not necessary), you can set this ticket to positive review for me.

@dcoudert
Copy link
Contributor

Reviewer: David Coudert

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 30, 2018

Branch pushed to git repo; I updated commit sha1. New commits:

5a7b76aEmpty lines after TESTS::

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 30, 2018

Changed commit from de49d8c to 5a7b76a

@pelegm
Copy link
Contributor Author

pelegm commented Jun 30, 2018

comment:35

Thanks!

@vbraun
Copy link
Member

vbraun commented Jul 3, 2018

Changed branch from u/pelegm/21917 to 5a7b76a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants