Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul factorial{,2,k}: API coherence, bug fixes & consistent array-support #15841

Merged
merged 19 commits into from
May 3, 2023

Conversation

h-vetinari
Copy link
Member

@h-vetinari h-vetinari commented Mar 22, 2022

This is some of the work towards #15600. It's still missing several things (mainly extend='complex'), but it's very big as-is, so I wanted to post this as just a something to further the discussion.

Currently it contains several small behaviour changes:

  • Make factorial2 / factorialk return 0 for small negative integers
    • See 016919a and this comment.
    • Currently no mitigation/deprecation for the behaviour change implemented, though this is mainly because of the WIP - we can adapt however we'd like.
  • Make both factorial / factorial2 consistently return the same type as the input. This changes factorial2 to not return 0-dim arrays for scalar inputs, and - inversely - changes factorial to not turn 0-dim arrays into scalars.
  • Switch the output type of factorialk to object for k >= 10 (I think this is a reasonable trade-off, for more details see 99d088a)
  • Switch the float approximation for factorial2 to the one from ENH: extend factorial2 to complex domain #15349 deferred due to ambiguity of the extension.

In addition, it adds array-support for factorialk, exact=True array-support for factorial2 (and many related fixes), and also fixes #13122; Finally, it cleans up and parametrizes a lot of tests, and adds a whole bunch of new ones.

Note: the individual commits should be much easier to review than the entire diff at once.

@h-vetinari h-vetinari force-pushed the factorial branch 6 times, most recently from dfcc136 to abfd462 Compare March 22, 2022 12:54
@tupui tupui added enhancement A new feature or improvement scipy.special labels Mar 22, 2022
@h-vetinari h-vetinari force-pushed the factorial branch 4 times, most recently from 4cea1e4 to 2c055e8 Compare March 27, 2022 01:46
@h-vetinari h-vetinari marked this pull request as ready for review March 27, 2022 01:47
@h-vetinari
Copy link
Member Author

h-vetinari commented Mar 27, 2022

@tupui @ilayn @tirthasheshpatel @j-bowhay

This should be ready for a first look. I realize it's pretty big, and I'll be happy to cut it up into more digestible pieces (though as the OP notes, I've tried to make the individual commits make sense on their own). I just wanted to ask for some confirmation about the direction (slight behaviour changes, array-extensions, testing approach) before doing that.

I think cleaning up the current behavioural mess should be the first goal (TBD if we want to explicitly deprecate certain things, or if they're inconsequential enough to just change like 0-dim arrays to scalars), and further unifications/extensions should come on top of that.

@h-vetinari
Copy link
Member Author

Regarding test coverage, this adds about 500 tests (though highly parametrized) - compared to the CI for 9ce6bb0:

=== 36914 passed, 2640 skipped, 140 xfailed, 9 xpassed in 428.93s (0:07:08) ====

we now have

=== 37430 passed, 2640 skipped, 140 xfailed, 9 xpassed in 362.98s (0:06:02) ====

They're all pretty-to-very fast though. 🙃

Copy link
Member

@ilayn ilayn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a brief look to the parts that I could follow.

scipy/special/_basic.py Show resolved Hide resolved
scipy/special/_basic.py Outdated Show resolved Hide resolved
scipy/special/_basic.py Outdated Show resolved Hide resolved
scipy/special/_basic.py Outdated Show resolved Hide resolved
scipy/special/_basic.py Outdated Show resolved Hide resolved
scipy/special/_basic.py Outdated Show resolved Hide resolved
@j-bowhay
Copy link
Member

A friendly ping @h-vetinari

@h-vetinari
Copy link
Member Author

A friendly ping @h-vetinari

Thanks. Just a rebase so far, still needs some work on some open review comments (I had still been hoping to get some feedback about the double factorial for even integers, but since extend='complex' is not part of this PR, I guess we don't have to solve that now).

@h-vetinari
Copy link
Member Author

h-vetinari commented Apr 13, 2023

@ilayn @j-bowhay
I've finally picked this up again. I've now disabled the float-case for factorial2 that needed the ambiguous extension into the complex plane. I'll leave that for a follow-up if/once we implement extend='complex' or something like that.

The commit history is a complete mess (sorry), but before I spend a bunch of time cleaning it up, I wanted to ask:

  • could you please have a look if you see anything egregious that needs fixing?

  • would you object to landing this not as a single squashed commit, but as something like ~5 well-delineated commits (which could be rebased & fast-forward merged by the github interface)?

    I think this PR does quite a number of things (though all closely related) and I wouldn't want that all to be bunched up into a giant commit that makes the git history almost useless. My goal would be to always pair code-changes with the respective test changes/additions per commit (in a way that's passing CI, obviously), but I could also imagine different ways (e.g. commit tests first, but with some skips, and remove those skips as the fixes get committed to the code). Please let me know what you think.

I've rebased the PR (no conflicts), new commits since the last discussions start from 45a3177 (but I was thrashing around a fair bit, so not sure if single-commit reviewing is worth your time on this).

@j-bowhay
Copy link
Member

@ilayn @j-bowhay I've finally picked this up again. I've now disabled the float-case for factorial2 that needed the ambiguous extension into the complex plane. I'll leave that for a follow-up if/once we implement extend='complex' or something like that.

I am fine with this, better to have a solid implementation first before worrying about extensions.

The commit history is a complete mess (sorry), but before I spend a bunch of time cleaning it up, I wanted to ask:

  • could you please have a look if you see anything egregious that needs fixing?

I'll try to have a look over this in the coming weeks

  • would you object to landing this not as a single squashed commit, but as something like ~5 well-delineated commits (which could be rebased & fast-forward merged by the github interface)?
    I think this PR does quite a number of things (though all closely related) and I wouldn't want that all to be bunched up into a giant commit that makes the git history almost useless. My goal would be to always pair code-changes with the respective test changes/additions per commit (in a way that's passing CI, obviously), but I could also imagine different ways (e.g. commit tests first, but with some skips, and remove those skips as the fixes get committed to the code). Please let me know what you think.

I'm happy with what you're proposing

@j-bowhay
Copy link
Member

Can you post the benchmark results vs main?

@h-vetinari
Copy link
Member Author

Can you post the benchmark results vs main?

Unfortunately no. python dev.py bench -t special --compare main has failed 3 times already in my WSL. I'm not worried about performance though; the hot path for factorial is unchanged, and accelerated for factorial{2,k}. But even if the few extra checks were to have a little overhead, they're worth it for correctness / consistency.

If you can check out this PR and run the benchmark, that would be nice though.

@j-bowhay
Copy link
Member

before           after         ratio
     [c62be9c6]       [c71cc6e2]
     <main>           <factorial>
!      17.6±0.7μs           failed      n/a  special.Factorial.time_factorial_exact_true_array_negative_float
+         374±6ns       11.9±0.7μs    31.79  special.Factorial.time_factorial_exact_false_array_negative_float
+         420±8ns       13.2±0.5μs    31.33  special.Factorial.time_factorial_exact_false_array_positive_float
+        93.3±1ns      2.16±0.06μs    23.13  special.FactorialK.time_factorialk_exact_false_scalar_negative_int
+        87.5±2ns      1.90±0.02μs    21.72  special.Factorial2.time_factorial2_exact_true_scalar_negative_int
+         679±8ns       11.4±0.6μs    16.72  special.Factorial.time_factorial_exact_false_array_negative_int
+        799±20ns       11.7±0.7μs    14.62  special.Factorial.time_factorial_exact_false_array_positive_int
+        614±30ns      4.13±0.05μs     6.73  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 9)
+         826±8ns       5.44±0.4μs     6.60  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 5)
+        618±10ns      4.07±0.07μs     6.58  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 8)
+         759±6ns       4.83±0.4μs     6.36  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 6)
+         974±3ns       6.13±0.4μs     6.29  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 4)
+         689±3ns       4.10±0.1μs     5.96  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 7)
+     1.76±0.04μs       9.25±0.2μs     5.25  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 2)
+     1.25±0.01μs      6.35±0.08μs     5.07  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 3)
+     1.85±0.03μs       8.95±0.1μs     4.83  special.Factorial2.time_factorial2_exact_true_scalar_positive_int(100)
+     2.21±0.01μs       10.3±0.1μs     4.67  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(500, 9)
+     3.33±0.05μs       15.4±0.2μs     4.63  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(500, 6)
+     4.04±0.04μs       18.2±0.4μs     4.51  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(500, 5)
+     2.86±0.02μs      12.7±0.08μs     4.43  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(500, 7)
+     3.84±0.02μs       16.2±0.4μs     4.22  special.Factorial2.time_factorial2_exact_true_scalar_positive_int(200)
+     2.52±0.03μs       10.5±0.1μs     4.15  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(500, 8)
+         551±8ns       2.17±0.1μs     3.93  special.Factorial.time_factorial_exact_false_scalar_positive_float(10000.5)
+      3.64±0.1μs       14.3±0.3μs     3.92  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 1)
+        555±20ns       2.16±0.1μs     3.89  special.Factorial.time_factorial_exact_false_scalar_positive_float(1000.3)
+     7.21±0.01μs       28.0±0.2μs     3.88  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(500, 3)
+         573±9ns       2.21±0.1μs     3.85  special.Factorial.time_factorial_exact_false_scalar_positive_float(100.8)
+     5.17±0.02μs       18.7±0.3μs     3.62  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(500, 4)
+      9.71±0.3μs         33.2±1μs     3.42  special.Factorial2.time_factorial2_exact_true_scalar_positive_int(400)
+         498±7ns      1.62±0.06μs     3.24  special.Factorial.time_factorial_exact_false_scalar_negative_float
+      12.1±0.1μs       35.0±0.4μs     2.90  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(500, 2)
+        841±20ns       2.37±0.2μs     2.82  special.Factorial.time_factorial_exact_false_scalar_positive_int(10000)
+        851±10ns       2.39±0.2μs     2.80  special.Factorial.time_factorial_exact_false_scalar_positive_int(1000)
+         870±4ns       2.37±0.1μs     2.73  special.Factorial.time_factorial_exact_false_scalar_positive_int(100)
+        774±30ns       1.55±0.1μs     2.01  special.Factorial.time_factorial_exact_false_scalar_negative_int

I am getting quite a significant slow down in certain cases

@h-vetinari
Copy link
Member Author

I am getting quite a significant slow down in certain cases

Thanks for testing. Even though we're testing very small arrays (where the checks' constant impact is mitigated the least by the size of the array), I'm surprised by the magnitude. Looking at the added checks, I don't see something that would explain this, but perhaps _ufuncs._factorial(n) is much faster than gamma.

It's was also an API design decision to return all-nan arrays of any dtype unchanged, but that does add the overhead of having to check whether a given array is actually all-nan. Given the slow-downs, I think I'll remove this bit again and just throw if the dtype is not what it should be.

I'll have a look at other places where we might be leaving some performance on the table.

@h-vetinari
Copy link
Member Author

@j-bowhay

Could you do another run perhaps?

For factorialk, I'm quite willing to eat the performance drop though - it currently has 0 checks and corner-case handling, and adding that is worth some overhead. Also, the _range_prod thing has a much better complexity than the existing implementation (O(log(n)) rather than O(n) multiplications); I guess with the recursion overhead it's possible that it's slower for small integers, but I don't think it's worth implementing a cut-off for this TBH.

@j-bowhay
Copy link
Member

       before           after         ratio
     [49d46084]       [ae4a7795]
     <main>           <factorial>
!      16.5±0.1μs           failed      n/a  special.Factorial.time_factorial_exact_true_array_negative_float
+      84.2±0.5ns      1.81±0.04μs    21.50  special.Factorial2.time_factorial2_exact_true_scalar_negative_int
+        131±40ns      2.14±0.01μs    16.38  special.FactorialK.time_factorialk_exact_false_scalar_negative_int
+         400±3ns       2.50±0.8μs     6.26  special.Factorial.time_factorial_exact_false_array_positive_float
+     1.77±0.01μs      8.73±0.04μs     4.94  special.Factorial2.time_factorial2_exact_true_scalar_positive_int(100)
+         602±8ns       2.96±0.9μs     4.91  special.Factorial.time_factorial_exact_false_scalar_positive_float(10000.5)
+         600±9ns       2.95±0.9μs     4.91  special.Factorial.time_factorial_exact_false_scalar_positive_float(1000.3)
+         622±7ns       3.01±0.9μs     4.84  special.Factorial.time_factorial_exact_false_scalar_positive_float(100.8)
+       851±300ns      4.01±0.09μs     4.71  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 9)
+        4.65±1μs         21.9±7μs     4.70  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(500, 6)
+       530±200ns       2.46±0.8μs     4.64  special.Factorial.time_factorial_exact_false_array_negative_float
+       889±300ns       4.12±0.2μs     4.63  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 8)
+        5.53±2μs         25.4±8μs     4.60  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(500, 5)
+        3.93±1μs         18.0±6μs     4.59  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(500, 7)
+        3.26±1μs         14.7±5μs     4.50  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(500, 9)
+     3.61±0.01μs       16.1±0.2μs     4.47  special.Factorial2.time_factorial2_exact_true_scalar_positive_int(200)
+      1.15±0.4μs       5.09±0.1μs     4.44  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 5)
+       956±300ns       4.15±0.2μs     4.34  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 7)
+        3.47±1μs         14.8±5μs     4.26  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(500, 8)
+      1.36±0.4μs       5.77±0.2μs     4.24  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 4)
+      1.06±0.3μs      4.47±0.03μs     4.21  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 6)
+        541±10ns       2.16±0.7μs     4.00  special.Factorial.time_factorial_exact_false_scalar_negative_float
+        835±30ns         3.25±1μs     3.89  special.Factorial.time_factorial_exact_false_scalar_positive_int(1000)
+        838±30ns         3.25±1μs     3.89  special.Factorial.time_factorial_exact_false_scalar_positive_int(10000)
+        861±30ns         3.26±1μs     3.79  special.Factorial.time_factorial_exact_false_scalar_positive_int(100)
+     8.66±0.01μs         32.7±1μs     3.77  special.Factorial2.time_factorial2_exact_true_scalar_positive_int(400)
+      2.49±0.8μs         9.16±4μs     3.67  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 2)
+         596±3ns       2.17±0.7μs     3.65  special.Factorial.time_factorial_exact_false_array_negative_int
+      1.76±0.6μs       6.33±0.3μs     3.60  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 3)
+         702±3ns       2.35±0.7μs     3.35  special.Factorial.time_factorial_exact_false_array_positive_int
+        755±20ns       2.11±0.7μs     2.80  special.Factorial.time_factorial_exact_false_scalar_negative_int
+        10.0±3μs       27.7±0.6μs     2.76  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(500, 3)
+        5.11±2μs      14.1±0.07μs     2.75  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(100, 1)
+        7.14±2μs       18.4±0.3μs     2.58  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(500, 4)
+        17.0±5μs       34.7±0.5μs     2.04  special.FactorialK.time_factorialk_exact_true_scalar_positive_int(500, 2)

That change looks like it improved performance quite a bit

Also avoids math.factorial for these; which errors in 3.10+.

Fixes scipyGH-13122
No functional changes, only added the docstring.
Also fix NaN-handling in factorial{2,k}.
For factorial2, the array-path for exact=False existed already.
Remove benchmarks for floats with exact=True (not supported
anymore) and extend the tested range with some larger inputs.
also improve errors for complex inputs in special.factorial
@h-vetinari
Copy link
Member Author

OK, it's time :)

After chasing green CI for ~3 weeks, the only blemish is now #18407, but that's manageable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.special
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The test suite fails on Python 3.10: issue with factorial() on NaN
5 participants