Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improved integer vectors efficiency -Enhancement #36830

Merged

Conversation

amanmoon
Copy link
Contributor

@amanmoon amanmoon commented Dec 7, 2023

Fixes #36816

  1. Added the cardinality function using stars and bars to the IntegerVector_nk class
  2. Fixed the bug in IntegerVectors_n and IntegerVectors_k now IntegerVectors_n(0)cardinality() and IntegerVectors_k(0).cardinality() returns 1 and both of them are Finite EnumeratedSets
  3. improved efficiency of rank and unrank methods in IntegerVectors_n and IntegerVectors_k
  4. IntegerVectors_n(0) and IntegerVectors_k(0) should now show cardinality as 1

…d bug of InfiniteEnumeratedSets and +infinity of IntegerVectors_n(0) and IntegerVectors_k(0)
@@ -839,7 +838,10 @@ def __init__(self, n):
sage: TestSuite(IV).run()
"""
self.n = n
IntegerVectors.__init__(self, category=InfiniteEnumeratedSets())
if self.n==0:
Copy link
Collaborator

@mantepse mantepse Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There need to be spaces around ==. Alternatively (and likely more according to the unwritten sage conventions, but I don't care):

if self.n:
    IntegerVectors.__init__(self, category=InfiniteEnumeratedSets())
else:
    IntegerVectors.__init__(self, category=EnumeratedSets())

This comment also applies below.


TESTS::

sage: IntegerVectors(200,5).cardinality()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not really a test, but an example (demonstrating that cardinality works for large numbers). It would also be good if there is an example that can be checked by hand, or even better, in your head.

A test would be len(list(IntegerVectors(10, 5))) == IntegerVectors(10, 5).cardinality().

Warning: len automatically calls cardinality, if available.

Copy link
Contributor

@jukkakohonen jukkakohonen Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an example&test that is checkable by hand (for us mortals) or in your head (for Gauss), you could have

sage: IntegerVectors(99, 3).cardinality()
5050

The "check in your head" would be that if the sum of the last two elements is $x$, then there are $x+1$ ways of dividing $x$ between the two elements, namely, $(0, x), (1, x-1), (2, x-2), \ldots, (x,0)$. Since $x$ can range from $0$ to $n=99$, the total cardinality is $1+2+\ldots+100 = 50 \times (1+100) = 5050$. Readers familiar with the Gauss story will be amused.

You can then make a bigger demonstration with

sage: IntegerVectors(10^9 - 1, 3).cardinality()
500000000500000000

@@ -1315,7 +1361,7 @@ def __init__(self, n=None, k=None, **constraints):
del constraints['inner']
self.constraints = constraints

if n is not None:
if n is not None :
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No space before :.

@@ -1438,8 +1484,7 @@ def cardinality(self):
if self.k is None:
if self.n is None:
return PlusInfinity()
if ('max_length' not in self.constraints
and self.constraints.get('min_part', 0) <= 0):
elif 'max_length' not in self.constraints and self.constraints.get('min_part', 0) <= 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original version is much more readable, please revert.

…rVectors_n and IntegerVectors_k optimised rank and unrank function

n, k, r = self.n, len(x), 0
for i in range(k):
r += binomial(i+n-1, n)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r = sum(binomial(n+i-1, i-1) for i in range(1, k))

should be easier to understand.

Copy link
Contributor

@jukkakohonen jukkakohonen Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are on the topic of efficiency, it can be useful to notice that
sum(binomial(m, k) for m in range(n)) == binomial(n, k+1)
see Knuth The art of computer programming §1.2.6E or Wikipedia.

The right hand side can be slightly faster (try $n=10^7$ and $k=17$).

k -= 1
n -= x[i]
r += binomial(k + n - 1, k)

This comment was marked as resolved.

This comment was marked as resolved.

@amanmoon
Copy link
Contributor Author

amanmoon commented Dec 7, 2023

  1. implemented the unrank() method using an algorithm similar to binary search
  2. added a faster rank() and unrank() method for IntegerVector_n and IntegerVector_k classes
  3. resolved the infinite looping problem of IntegerVectors(k=0)[1]

if sum(x) != self.n:
raise ValueError("argument is not a member of IntegerVectors({},{})".format(self.n, None))

n, k, r = self.n, len(x), 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assignment to r is now unnecessary.


n, k, r = self.n, len(x), 0

r = sum(binomial(n + i - 1, i - 1) for i in range(1, k))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not realizing earlier, but this can be simplified further to r = binomial(k+n-1, n+1), if I'm not mistaken.

@@ -1438,7 +1643,7 @@ def cardinality(self):
if self.k is None:
if self.n is None:
return PlusInfinity()
if ('max_length' not in self.constraints
elif ('max_length' not in self.constraints
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no elif needed here, because there is a return just before.

@amanmoon
Copy link
Contributor Author

amanmoon commented Dec 8, 2023

Is there any way to check code style on my local machine?

@mantepse
Copy link
Collaborator

mantepse commented Dec 8, 2023

./sage -tox -e pycodestyle -- src/sage/combinat/integer_vector.py should do what you want.

@mantepse
Copy link
Collaborator

mantepse commented Dec 8, 2023

(see https://doc.sagemath.org/html/en/developer/tools.html)

@amanmoon
Copy link
Contributor Author

amanmoon commented Dec 8, 2023

Thanks, and sorry for not knowing that,
I have removed all the linting errors from my code, there were some errors when i ran ./sage -tox -e pycodestyle -- src/sage/combinat/integer_vector.py:
386:23: E741 ambiguous variable name 'l'
412:15: E741 ambiguous variable name 'l'
1813:1: E402 module level import not at top of file

I was not sure whether to change these, as they were present from the start, so I left them as they were.

rtn.append(0)
rtn.pop()

while True:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this while loop can be factored out into a helper function in IntegerVectors?

In any case, we should store current_rank = self.rank(rtn), so we don't compute it twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should i do the same for rank() method?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think no. If I see it correctly, it would only be 3 lines there.

@mantepse
Copy link
Collaborator

mantepse commented Dec 8, 2023

Excellent, thank you!

There are now only a few comments left, I'll do the review once they are done.

sage: IntegerVectors(2,3).unrank(5)
[0, 0, 2]
"""
if x >= len(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using len(self) seems problematic. For me

sage: V=IntegerVectors(100000,6); c=V.cardinality(); c; V[c-1]

produces

83345834041685416895001
...
OverflowError: cannot fit 'int' into an index-sized integer

How about using cardinality instead?

@jukkakohonen
Copy link
Contributor

jukkakohonen commented Dec 9, 2023

This is already a great improvement over the previous (default) cardinality and unrank methods, and worth including in the next version.

If the vector sum is large, I believe further improvements in unranking are possible. In the version currently pushed, unrank(self, x, rtn) is doing a linear search over the values for each position in the vector. The following takes half a minute to find the vector at position 3 million in the list of 10 million (plus one). I believe it is obvious (if you peek at how the ordered list of the vectors looks like) that there are faster methods.

sage: V = IntegerVectors(10^7, 2)
sage: V.cardinality()
10000001
sage: %time V.unrank(3*10^6)
CPU times: user 38 s, sys: 0 ns, total: 38 s
Wall time: 38 s
[7000000, 3000000]

However, I reiterate that the current version is already a big improvement.

@amanmoon
Copy link
Contributor Author

amanmoon commented Dec 9, 2023

@jukkakohonen Thanks a lot, I'll keep working on finding a new and improved algorithm.

@@ -780,6 +780,20 @@ def __contains__(self, x):
return False
return True

def unrank(self, x, rtn):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method needs a docstring and an example. Also, currently it is not an unrank method, because it takes a second argument, so it should be renamed. I didn't think about it, but could it be made into a proper unrank method by providing a default argument for rtn?

Copy link
Contributor Author

@amanmoon amanmoon Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value of rtn depends on n and k, and there are various ways the rtn list is created based on any one of the values of n and k being None. I believe providing a default argument is not possible.
What I can do is add an if-else block inside the unrank function that checks whether n or k is None and constructs the rtn based on that.
Alternatively, I can rename the function to something like construct_element_from_rtn and I can write a general doctest for the unrank method.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I misled you, I am sorry. Possibly this shouldn't be a method of IntegerVectors at all, but rather an auxiliary function of the module. It is not completely clear to me what it does - do I understand correctly that it first finds a vector rtn that has rank at least equal to x by moving the content of rtn[0] to larger indices, and then iterating one by one through the vectors having smaller rank, until we find the vector with the correct rank?

If so (or something similar), this won't work, for example, for IntegerVectorsConstraints, because rtn is not guaranteed to satisfy the given conditions, right?

Copy link
Contributor Author

@amanmoon amanmoon Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You, got the algorithm right and the IntegerVectorsConstraints won't work as intended.
changing the name of the function will work or should i revert the changes back when there was no helper function?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I d make it a normal Funktion and call it ·_unrank_helper·, take self as an extra argument.

"""
if self.k == 0 and x != 0:
raise IndexError(f"Index {x} is out of range for the IntegerVector.")
else:
Copy link
Collaborator

@mantepse mantepse Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else: is unnecessary here, because if can only raise an error. Similarly below.

@@ -855,8 +855,7 @@ def __init__(self, n):
self.n = n
if self.n == 0:
IntegerVectors.__init__(self, category=EnumeratedSets())
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This else: is necessary, there is no return or raise before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These things are bound to happen if one goes else-hunting indiscriminately. It would be good to remember the saying "don't fix it if it isn't broken".

I have heard that some misguided coding standards try to fix "no else after return", indiscriminately, but I have not heard SageMath would be among those.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't care much about any particular rule, however, I favour uniform coding style. I don't really see why always having the else: would be any better or any worse, both have their pro and cons. In any case, at least at some point, the linter checked for else after return, so a fair bit of the codebase adheres to this rule.

I remember that, when I was using Turbo Pascal in the 90s, I spent a night over a forgotten semicolon (or was it a comma?). Who cares?

Copy link
Contributor

@jukkakohonen jukkakohonen Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have given my advice, based on years of coding, and will leave it at that. Word to the wise.

@@ -998,8 +996,7 @@ def __init__(self, k):
self.k = k
if self.k == 0:
IntegerVectors.__init__(self, category=EnumeratedSets())
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This else: is necessary!

@mantepse
Copy link
Collaborator

Since the main point of this PR is efficiency (besides the corner case fixes of course), it would be worthwhile to perform some offline performance tests to ensure that at least the methods are not getting slower than they used to be before the PR.

Seeing that the old version of IntegerVectors(k=1)[5000] simply iterated through the vectors [0], [1], ..., [5000] in 6.2 milliseconds (just measured that in Sage 9.0), and the proposed new method took 17.3 seconds to find the same result (before the latest fix that eliminated a summation of binomial coefficients), I believe such offline tests would be in order.

One way this can be done (but I'm not sure that all sagemath developers agree with me) is to have a doctest which takes a tolerable amount of time (e.g., half a second on a standard computer) with the good version, but will take very long if the performance degrades.

@jukkakohonen
Copy link
Contributor

One way this can be done (but I'm not sure that all sagemath developers agree with me) is to have a doctest which takes a tolerable amount of time (e.g., half a second on a standard computer) with the good version, but will take very long if the performance degrades.

It would indeed perform the test, but since it has the possibility of making a "short" test take a long time, it might raise some eyebrows. I would ask for opinions on Zulip before incorporating such speed tests.

But I also believe in preliminary and offline tests, performed by the developer before pushing, that are more extensive than the "TESTS" cases.

@@ -780,6 +780,20 @@ def __contains__(self, x):
return False
return True

def _unrank_helper(self, x, rtn):

This comment was marked as resolved.

while self.rank(rtn) <= x:
n += 1
rtn[0] = n
rtn[0] -= 1

This comment was marked as resolved.

@@ -780,6 +780,37 @@ def __contains__(self, x):
return False
return True

def _unrank_helper(self, x, rtn):
"""
return an element at rank ``x``.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return the instead of return an.

I also forgot:

INPUT:

    - ``x`` - a nonnegative integer
    - ``rtn`` - a list of nonnegative integers

Return an element of rank ``x`` by iterating through all integer vectors beginning with ``rtn``.


EXAMPLES::

sage: i=IntegerVectors(k=5)
Copy link
Collaborator

@mantepse mantepse Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After :: you have to indent by 4 spaces. Also, we try to follow the same coding recommendations in the doctests, that is, sage: i = IntegerVectors(k=5) would be better and sage: IV = IntegerVectors(k=5) would be even better.

@mantepse
Copy link
Collaborator

@adrinospy, please excuse me being so fussy! I very much like your contribution!

There are only 3 comments left for positive review!

- ``x`` - a nonnegative integer
- ``rtn`` - a list of nonnegative integers

Return the element at rank ``x`` by iterating through all integer vectors beginning with ``rtn``.

This comment was marked as resolved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mantepse Thank you for your kind words! These are my mistakes, and I'll try to minimize them next time.

@mantepse
Copy link
Collaborator

So, last comment :-) I think I like this version

+        r = binomial(k + n - 1, n + 1)
+        s = 0
+        for i in range(k - 1):
+            s += x[k - 1 - i]
+            r += binomial(s + i, i + 1)

best, it is the easiest to translate into something you would do by hand. If you agree, please replace the snippet in the two rank methods. Otherwise, just let me know.

Copy link

Documentation preview for this PR (built with commit dedee8d; changes) is ready! 🎉

@mantepse
Copy link
Collaborator

Perfekt, thank you!

@vbraun vbraun merged commit 44a0d2d into sagemath:develop Dec 19, 2023
17 of 18 checks passed
@mkoeppe mkoeppe added this to the sage-10.3 milestone Dec 19, 2023
@amanmoon amanmoon deleted the enhancement/IntegerVectors_efficiency branch December 19, 2023 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IntegerVectors efficiency and corner cases
5 participants