Enhanced the efficiency of power calculations #26606

haru-44 · 2024-05-18T13:01:56Z

References to other Issues or PRs

Brief description of what is fixed or changed

Other comments

Release Notes

NO ENTRY

sympy-bot · 2024-05-18T13:02:03Z

✅

Hi, I am the SymPy bot. I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

No release notes entry will be added for this pull request.

Click here to see the pull request description that was parsed.

<!-- Your title above should be a short description of what
was changed. Do not include the issue number in the title. -->

#### References to other Issues or PRs
<!-- If this pull request fixes an issue, write "Fixes #NNNN" in that exact
format, e.g. "Fixes #1234" (see
https://tinyurl.com/auto-closing for more information). Also, please
write a comment on that issue linking back to this pull request once it is
open. -->


#### Brief description of what is fixed or changed


#### Other comments


#### Release Notes

<!-- Write the release notes for this release below between the BEGIN and END
statements. The basic format is a bulleted list with the name of the subpackage
and the release note for this PR. For example:

* solvers
  * Added a new solver for logarithmic equations.

* functions
  * Fixed a bug with log of integers. Formerly, `log(-x)` incorrectly gave `-log(x)`.

* physics.units
  * Corrected a semantical error in the conversion between volt and statvolt which
    reported the volt as being larger than the statvolt.

or if no release note(s) should be included use:

NO ENTRY

See https://github.com/sympy/sympy/wiki/Writing-Release-Notes for more
information on how to write release notes. The bot will check your release
notes automatically to see if they are formatted correctly. -->

<!-- BEGIN RELEASE NOTES -->
NO ENTRY
<!-- END RELEASE NOTES -->

github-actions · 2024-05-18T13:51:04Z

Benchmark results from GitHub Actions

Lower numbers are good, higher numbers are bad. A ratio less than 1
means a speed up and greater than 1 means a slowdown. Green lines
beginning with + are slowdowns (the PR is slower then master or
master is slower than the previous release). Red lines beginning
with - are speedups.

Significantly changed benchmark results (PR vs master)

Significantly changed benchmark results (master vs previous release)

| Change   | Before [2487dbb5]    | After [8519e0f7]    |   Ratio | Benchmark (Parameter)                                                |
|----------|----------------------|---------------------|---------|----------------------------------------------------------------------|
| -        | 70.8±0.6ms           | 44.2±0.2ms          |    0.62 | integrate.TimeIntegrationRisch02.time_doit(10)                       |
| -        | 68.1±0.7ms           | 43.4±0.4ms          |    0.64 | integrate.TimeIntegrationRisch02.time_doit_risch(10)                 |
| +        | 18.7±0.6μs           | 29.8±0.6μs          |    1.59 | integrate.TimeIntegrationRisch03.time_doit(1)                        |
| -        | 5.41±0.02ms          | 2.87±0.01ms         |    0.53 | logic.LogicSuite.time_load_file                                      |
| -        | 73.0±0.3ms           | 28.5±0.1ms          |    0.39 | polys.TimeGCD_GaussInt.time_op(1, 'dense')                           |
| -        | 73.0±0.3ms           | 28.9±0.2ms          |    0.4  | polys.TimeGCD_GaussInt.time_op(1, 'sparse')                          |
| -        | 252±1ms              | 124±0.5ms           |    0.49 | polys.TimeGCD_GaussInt.time_op(2, 'dense')                           |
| -        | 251±1ms              | 125±0.4ms           |    0.5  | polys.TimeGCD_GaussInt.time_op(2, 'sparse')                          |
| -        | 643±2ms              | 374±3ms             |    0.58 | polys.TimeGCD_GaussInt.time_op(3, 'dense')                           |
| -        | 653±2ms              | 372±1ms             |    0.57 | polys.TimeGCD_GaussInt.time_op(3, 'sparse')                          |
| -        | 496±4μs              | 283±3μs             |    0.57 | polys.TimeGCD_LinearDenseQuadraticGCD.time_op(1, 'dense')            |
| -        | 1.74±0.01ms          | 1.04±0.01ms         |    0.6  | polys.TimeGCD_LinearDenseQuadraticGCD.time_op(2, 'dense')            |
| -        | 5.77±0.07ms          | 3.07±0.03ms         |    0.53 | polys.TimeGCD_LinearDenseQuadraticGCD.time_op(3, 'dense')            |
| -        | 450±10μs             | 228±2μs             |    0.51 | polys.TimeGCD_QuadraticNonMonicGCD.time_op(1, 'dense')               |
| -        | 1.48±0.01ms          | 680±4μs             |    0.46 | polys.TimeGCD_QuadraticNonMonicGCD.time_op(2, 'dense')               |
| -        | 4.85±0.02ms          | 1.65±0.01ms         |    0.34 | polys.TimeGCD_QuadraticNonMonicGCD.time_op(3, 'dense')               |
| -        | 369±1μs              | 207±1μs             |    0.56 | polys.TimeGCD_SparseGCDHighDegree.time_op(1, 'dense')                |
| -        | 2.40±0.02ms          | 1.24±0ms            |    0.52 | polys.TimeGCD_SparseGCDHighDegree.time_op(3, 'dense')                |
| -        | 9.97±0.05ms          | 4.41±0.02ms         |    0.44 | polys.TimeGCD_SparseGCDHighDegree.time_op(5, 'dense')                |
| -        | 356±3μs              | 172±1μs             |    0.48 | polys.TimeGCD_SparseNonMonicQuadratic.time_op(1, 'dense')            |
| -        | 2.49±0.01ms          | 892±3μs             |    0.36 | polys.TimeGCD_SparseNonMonicQuadratic.time_op(3, 'dense')            |
| -        | 9.59±0.1ms           | 2.66±0.02ms         |    0.28 | polys.TimeGCD_SparseNonMonicQuadratic.time_op(5, 'dense')            |
| -        | 1.01±0.01ms          | 429±4μs             |    0.42 | polys.TimePREM_LinearDenseQuadraticGCD.time_op(3, 'dense')           |
| -        | 1.70±0.01ms          | 520±3μs             |    0.31 | polys.TimePREM_LinearDenseQuadraticGCD.time_op(3, 'sparse')          |
| -        | 5.92±0.04ms          | 1.78±0.01ms         |    0.3  | polys.TimePREM_LinearDenseQuadraticGCD.time_op(5, 'dense')           |
| -        | 8.36±0.02ms          | 1.52±0ms            |    0.18 | polys.TimePREM_LinearDenseQuadraticGCD.time_op(5, 'sparse')          |
| -        | 290±0.5μs            | 64.7±0.5μs          |    0.22 | polys.TimePREM_QuadraticNonMonicGCD.time_op(1, 'sparse')             |
| -        | 3.48±0.03ms          | 394±6μs             |    0.11 | polys.TimePREM_QuadraticNonMonicGCD.time_op(3, 'dense')              |
| -        | 3.95±0.01ms          | 283±1μs             |    0.07 | polys.TimePREM_QuadraticNonMonicGCD.time_op(3, 'sparse')             |
| -        | 7.06±0.07ms          | 1.25±0ms            |    0.18 | polys.TimePREM_QuadraticNonMonicGCD.time_op(5, 'dense')              |
| -        | 8.63±0.06ms          | 850±9μs             |    0.1  | polys.TimePREM_QuadraticNonMonicGCD.time_op(5, 'sparse')             |
| -        | 5.00±0.02ms          | 3.03±0.01ms         |    0.61 | polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(2, 'sparse') |
| -        | 12.0±0.09ms          | 6.63±0.02ms         |    0.55 | polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(3, 'dense')  |
| -        | 22.3±0.2ms           | 9.16±0.1ms          |    0.41 | polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(3, 'sparse') |
| -        | 5.22±0.05ms          | 885±9μs             |    0.17 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(1, 'sparse')    |
| -        | 12.5±0.04ms          | 7.17±0.06ms         |    0.57 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(2, 'sparse')    |
| -        | 100±0.8ms            | 26.2±0.06ms         |    0.26 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(3, 'dense')     |
| -        | 164±0.7ms            | 55.0±0.1ms          |    0.34 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(3, 'sparse')    |
| -        | 174±1μs              | 113±2μs             |    0.65 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(1, 'dense')      |
| -        | 360±1μs              | 214±0.7μs           |    0.6  | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(1, 'sparse')     |
| -        | 4.20±0.03ms          | 844±3μs             |    0.2  | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(3, 'dense')      |
| -        | 5.26±0.01ms          | 385±2μs             |    0.07 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(3, 'sparse')     |
| -        | 19.8±0.09ms          | 2.80±0.01ms         |    0.14 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(5, 'dense')      |
| -        | 22.7±0.2ms           | 633±2μs             |    0.03 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(5, 'sparse')     |
| -        | 483±4μs              | 137±0.8μs           |    0.28 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(1, 'sparse') |
| -        | 4.82±0.03ms          | 612±2μs             |    0.13 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(3, 'dense')  |
| -        | 5.24±0.03ms          | 141±0.7μs           |    0.03 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(3, 'sparse') |
| -        | 13.1±0.08ms          | 1.33±0ms            |    0.1  | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(5, 'dense')  |
| -        | 13.6±0.06ms          | 144±0.8μs           |    0.01 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(5, 'sparse') |
| -        | 132±0.7μs            | 75.3±1μs            |    0.57 | solve.TimeMatrixOperations.time_rref(3, 0)                           |
| -        | 251±1μs              | 89.0±0.4μs          |    0.36 | solve.TimeMatrixOperations.time_rref(4, 0)                           |
| -        | 24.3±0.2ms           | 10.3±0.08ms         |    0.42 | solve.TimeSolveLinSys189x49.time_solve_lin_sys                       |
| -        | 28.9±0.3ms           | 15.3±0.1ms          |    0.53 | solve.TimeSparseSystem.time_linsolve_Aaug(20)                        |
| -        | 56.0±0.2ms           | 24.9±0.1ms          |    0.44 | solve.TimeSparseSystem.time_linsolve_Aaug(30)                        |
| -        | 28.7±0.2ms           | 15.2±0.2ms          |    0.53 | solve.TimeSparseSystem.time_linsolve_Ab(20)                          |
| -        | 55.9±0.3ms           | 24.7±0.2ms          |    0.44 | solve.TimeSparseSystem.time_linsolve_Ab(30)                          |

Full benchmark results can be found as artifacts in GitHub Actions
(click on checks at the top of the PR).

smichr · 2024-05-19T02:36:37Z

sympy/combinatorics/free_groups.py

-            result = result*self
-        # this method can be improved instead of just returning the
-        # multiplication of elements
+            x = self.inverse()


Is it possible to just use pow(x, n)?

Will a RecursionError occur?

asmeurer · 2024-05-19T07:59:09Z

This looks reasonable to me, but we should check that all these methods are actually tested with nontrivial powers.

oscarbenjamin · 2024-05-19T13:18:19Z

sympy/polys/monomials.py

+        result = [0]*len(self)
+        if not n:
+            return self.rebuild(result)
+        x = self.exponents
+        while True:
+            if n % 2:
+                result = monomial_mul(result, x)


Shouldn't this just use monomial_pow?

oscarbenjamin · 2024-05-19T13:38:05Z

In some cases iterated multiplication is more efficient than exponentiation by squaring if multiplication is not constant time (e.g. for polynomials).

haru-44 · 2024-05-20T09:57:31Z

In some cases iterated multiplication is more efficient than exponentiation by squaring if multiplication is not constant time (e.g. for polynomials).

Is that the case for small n?
For sufficiently large n, multiplication can still be efficient even if it is not constant time. If the computational cost of multiplication is $m(n)$, the cost of iterated multiplication is $O(m(n)n)$, while the cost of exponentiation is $O(m(n) \log n)$.

oscarbenjamin · 2024-05-20T10:57:39Z

The computational cost of multiplication depends on the size of both operands like M(n1,n2) and for e.g. polynomials with n1 and n2 terms is quadratic like M(n1,n2) ~ n1*n2 at least if the basic polynomial multiplication algorithm is used. In iterated multiplication one operand stays small but in exponentiation by squaring both operands grow.

Consider something like (x + y + z + t)^8. The number of terms is:

In [37]: [len((Poly(x+y+z+t)**n).coeffs()) for n in range(1, 8+1)]
Out[37]: [4, 10, 20, 35, 56, 84, 120, 165]

In exponentiation by squaring the number of multiplies in the ground domain is:

In [38]: 4*4 + 10*10 + 35*35
Out[38]: 1341

For iterated multiplication it is:

In [45]: 4*4 + 4*10 + 4*20 + 4*35 + 4*56 + 4*84 + 4*120
Out[45]: 1316

These numbers are close but raising to the power 16 the difference is greater:

In [51]: 4*sum([len((Poly(x+y+z+t)**n).coeffs()) for n in range(1, 16)])
Out[51]: 15500

In [52]: 4*4 + 10*10 + 35*35 +165*165
Out[52]: 28566

The more symbols you have and the higher the exponent then the more iterated multiplication is better.

If you have a polynomial $p$ of degree $d$ in $r$ symbols and you want to compute $p^n$ then the iterative method is $O(d^{2r}n^{r+1})$ and exponentiation by squaring is $O(d^{2r}n^{2r})$ assuming constant time coefficient multiplication. The iterative method is asymptotically faster by a factor of $n^{r-1}$.

Counting only the number of coefficient multiplies assumes that multiplication in the coefficient domain is constant which is only true for RR, CC and GF(p). In other domains like ZZ the coefficients grow in size as well as the number of terms compounding the effect that multiplying two large operands is slower than multiplying a small operand by a larger one.

A more complete analysis is in e.g.

[HEI72] L. E. HEINDEL, "Computation of Powers of Multivariate Polynomials over the
Integers," J. Comput. System Sci. 1 (1972), pp. 1-8.

https://www.sciencedirect.com/science/article/pii/S0022000072800370

There are also faster algorithms that are more complicated but if we want to stick to simple algorithms then we should remember that iterated multiplication is not necessarily slower than exponentiation by squaring.

haru-44 · 2024-05-20T11:38:25Z

Thanks for the explanation. I lacked a thorough understanding of multivariate polynomials.

oscarbenjamin · 2024-05-20T11:47:44Z

Looks good. Thanks

Enhanced the efficiency of power calculations

8fff4c6

smichr reviewed May 19, 2024

View reviewed changes

added tests

4503a3a

oscarbenjamin reviewed May 19, 2024

View reviewed changes

Use monomial_pow

4d3b7f5

oscarbenjamin added combinatorics polys Performance holonomic labels May 20, 2024

oscarbenjamin merged commit 3cc42c4 into sympy:master May 20, 2024
48 checks passed

haru-44 deleted the __pow__ branch July 17, 2024 13:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enhanced the efficiency of power calculations #26606

Enhanced the efficiency of power calculations #26606

haru-44 commented May 18, 2024

sympy-bot commented May 18, 2024

github-actions bot commented May 18, 2024 •

edited

Loading

smichr May 19, 2024

haru-44 May 19, 2024

asmeurer commented May 19, 2024

oscarbenjamin May 19, 2024

oscarbenjamin commented May 19, 2024

haru-44 commented May 20, 2024

oscarbenjamin commented May 20, 2024

haru-44 commented May 20, 2024

oscarbenjamin commented May 20, 2024

Enhanced the efficiency of power calculations #26606

Enhanced the efficiency of power calculations #26606

Conversation

haru-44 commented May 18, 2024

References to other Issues or PRs

Brief description of what is fixed or changed

Other comments

Release Notes

sympy-bot commented May 18, 2024

github-actions bot commented May 18, 2024 • edited Loading

smichr May 19, 2024

Choose a reason for hiding this comment

haru-44 May 19, 2024

Choose a reason for hiding this comment

asmeurer commented May 19, 2024

oscarbenjamin May 19, 2024

Choose a reason for hiding this comment

oscarbenjamin commented May 19, 2024

haru-44 commented May 20, 2024

oscarbenjamin commented May 20, 2024

haru-44 commented May 20, 2024

oscarbenjamin commented May 20, 2024

github-actions bot commented May 18, 2024 •

edited

Loading