Avoid redundant computations in IRR calculation #60

jlopezpena · 2023-02-13T15:32:00Z

The IRR computation adds and then subtracts the previous iteration value before comparing to the tolerance. We can just compute the delta and compare that to the tolerance instead. This should also make the computation more robust if there is a large difference in magnitude between `x` and `delta`.

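
To make the proposed change concrete, here is a minimal sketch of a Newton-Raphson IRR loop in the transformed variable x = 1 / (1 + eirr), with both stopping rules side by side. The function name `newton_irr_x` and the `compare_delta` flag are hypothetical and exist only for illustration; the loop follows the description in this thread rather than quoting the library's source.

```python
import numpy as np


def newton_irr_x(values, guess=0.1, tol=1e-12, maxiter=100, compare_delta=True):
    """Hypothetical sketch: solve NPV(x) = sum_t V_t * x**t = 0 by Newton-Raphson.

    x = 1 / (1 + eirr). With compare_delta=False the loop uses the original
    rule (form x_new = x - delta, then test abs(x_new - x)); with
    compare_delta=True it tests abs(delta) directly, as this PR proposes.
    """
    npv = np.polynomial.Polynomial(values)  # coefficients V_0, V_1, ... (lowest degree first)
    d_npv = npv.deriv()
    x = 1 / (1 + guess)
    for _ in range(maxiter):
        delta = npv(x) / d_npv(x)  # Newton step
        x_new = x - delta
        step = abs(delta) if compare_delta else abs(x_new - x)
        if step < tol:
            return 1 / x_new - 1  # undo the substitution: eirr = 1/x - 1
        x = x_new
    return np.nan  # no convergence within maxiter
```

For well-behaved cashflows both rules return the same rate; they can only disagree when x is so large that `x - delta` rounds back to `x`.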

Kai-Striega

Good find! I'm going to dig into why it's failing the test; it should give an identical result. I'm not quite sure where the errors are being introduced.

jlopezpena · 2023-02-14T11:09:36Z

If `x` is very large and `delta` is very small, floating-point errors might affect the outcome, so `(x - delta) - x` might not actually equal `-delta`. Other than that numerical instability, yes, the result should be identical.
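
A quick illustration of the effect: in float64, once `delta` is smaller than half the spacing between representable numbers near `x`, `x - delta` rounds back to `x`, so the original comparison sees a zero step. The values below are illustrative and not taken from the test.

```python
import math

x = 1e12       # a large iterate
delta = 1e-6   # a Newton step that is still far above a 1e-12 tolerance
tol = 1e-12

# Spacing of float64 values around x (math.ulp needs Python 3.9+).
spacing = math.ulp(x)

# What the original loop effectively compares against the tolerance.
recovered = (x - delta) - x

print(spacing, recovered, abs(delta) < tol)
```

Here `recovered` is exactly `0.0`, so the old rule would stop, while the step itself is still a million times larger than the tolerance.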

jlopezpena · 2023-02-14T11:14:41Z

I haven't looked in detail at the tests, but this is all but guaranteed to give you numerical problems:

v = [
    -3000.0,
    2.3926932267015667e-07,
    4.1672087103345505e-16,
    5.3965110036378706e-25,
    5.1962551071806174e-34,
    3.7202955645436402e-43,
    1.98061711632469e-52,
    7.8393517651814181e-62,
    2.3072565113911438e-71,
    5.0491839233308912e-81,
    8.2159177668499263e-91,
    9.9403244366963527e-101,
    8.942410813633967e-111,
    5.9816122646481191e-121,
    2.97309031844241e-131,
    1.1002067043497954e-141,
    3.02528765638021e-152,
    6.1854121948207909e-163,
    9.40329800153301e-174,
    1.0629218520017728e-184,
    8.9337141847171845e-196,
    5.5830607698467935e-207,
    2.5943122036622652e-218,
    8.9635842466507006e-230,
    2.3027710094332358e-241,
    ...
]

Kai-Striega · 2023-02-14T11:24:05Z

> but this is all but guaranteed to give you numerical problems:

I agree; however, we went from that test passing to it failing. I'd like to investigate what those numerical problems are. Looking at the original issue #15, it seems to have been a problem with the old IRR solver, not with the more recent Newton's method solver.

jlopezpena · 2023-02-14T12:35:45Z

I think so. For practical purposes, the cashflow in the test is basically "all the initial payout is lost", which represents a loss of 100% (and that doesn't have a properly defined behaviour under the "reinvesting the profits" assumption that IRR is based on). So a solution is "something close to -1", which seems to be what the test intends. In the IRR formula, when eirr is close to -1, the transformed variable 1 / (1 + eirr) (referred to as x in the code) becomes very large. So what I think is happening is that in the current code x - delta rounds to just x because x is so large, so (x - delta) - x becomes 0 and the algorithm stops. With this PR, delta might never fall below the tolerance, and the iterations might be exhausted.
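
To put numbers on the "x becomes very large" argument, the sketch below evaluates x = 1 / (1 + eirr) for rates approaching -1 and compares the float64 spacing at each x with the 1e-12 tolerance. Once that spacing exceeds the tolerance, the old test `abs(x_new - x) < tol` can only succeed when the update rounds away entirely. The specific rates are illustrative assumptions.

```python
import math

tol = 1e-12
rates = [-0.9, -0.999, -0.999999, -1 + 1e-10]

rows = []
for eirr in rates:
    x = 1 / (1 + eirr)       # transformed variable used by the original solver
    spacing = math.ulp(x)    # gap to the next representable float64 (Python 3.9+)
    rows.append((eirr, x, spacing, spacing > tol))

for eirr, x, spacing, too_coarse in rows:
    print(f"eirr={eirr:.12f}  x={x:.6e}  ulp(x)={spacing:.3e}  ulp > tol: {too_coarse}")
```

From roughly eirr = -0.999999 onwards, no nonzero representable step in x is smaller than the tolerance.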

jlopezpena · 2023-02-14T12:37:42Z

A potential solution is to use a relative tolerance instead of an absolute one, but we are dealing with a real edge case here. Alternatively, we could add an early exit that returns -1 if x exceeds a certain threshold.
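
The relative-tolerance idea can be sketched as a convergence test that scales with the size of the iterate, in the style of `math.isclose`. The helper name `step_converged` and the `rtol`/`atol` defaults are hypothetical, chosen only for illustration.

```python
def step_converged(x, delta, rtol=1e-12, atol=1e-12):
    """Hypothetical mixed test: accept a Newton step once it is small
    relative to the iterate x, or small in absolute terms."""
    return abs(delta) <= atol + rtol * abs(x)
```

Accepting a relatively small step at a large x costs little in the final rate: since eirr = 1/x - 1, an error of about rtol * x in x changes eirr by only about rtol / x.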

jlopezpena · 2023-02-16T15:38:26Z

I've been doing some further digging on this, and found out that, as suspected, the IRR algorithm is NOT converging for the test case (neither for the original nor for my proposed change). This can be seen by inserting a print(delta) in the loop and then running test_gh_15:

-12538172285.538937
1053069465.3266779
989220413.1617088
928321332.590804
870269688.6044048
814967045.1286157
762319263.6737074
712236751.376529
664634679.6006484
619432894.2146667
576554675.9649655
535921974.7416782
497440630.96024746
460958230.0502432
426148959.4876158
392208772.1405912
357083216.30890393
315714469.10958934
257360879.981704
168611272.6382977
64440716.36582911
7770992.814447409
100879.09078333691
16.736887134227963
4.711100328574186e-07
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
...
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08

So the iterations are exhausted without reaching the set tolerance of 1e-12. With the current algo, the floating point error on x - delta triggers the early exit, but the exit condition was not actually reached.

I found that the easiest fix is a better initial guess. Simply doing something like this:

inflow = sum(x for x in values if x > 0)
outflow = -sum(x for x in values if x < 0)
guess = 0.1 if inflow > outflow else inflow / outflow - 1

fixes the convergence problem (by setting an initial guess that is negative if the outflows are larger than the inflows):

-0.9999999999202436
4.407969309258772e-12
4.208982322158506e-12
4.017399791083582e-12
3.832991104009405e-12
3.6555317783074664e-12
3.4848033205951103e-12
3.3205930897771546e-12
3.16269416317344e-12
3.0109052056699856e-12
2.8650303418619812e-12
2.7248790311801247e-12
2.5902659460126176e-12
2.461010852855957e-12
2.3369384965514225e-12
2.2178784876925083e-12
2.1036651933242854e-12
1.9941376311010488e-12
1.8891393671276444e-12
1.7885184177871862e-12
1.6921271559599094e-12
1.5998222221745013e-12
1.5114644414171902e-12
1.4269187465742507e-12
1.346054109828449e-12
1.2687434838096503e-12
1.1948637549757807e-12
1.1242957126649819e-12
1.0569240386559779e-12
9.92637324125317e-13

Would it be acceptable to change the default value of guess to None and use the heuristic above when no guess is passed, so that the solver gets a sensible starting point?

jlopezpena · 2023-02-16T16:01:58Z

Actually, setting the guess to inflow / outflow - 1 seems to provide similar if not faster convergence in the positive case as well. Is there a strong reason to keep the 0.1 default value?

Kai-Striega · 2023-02-18T02:08:47Z

Thanks for taking the time to investigate. I think we're getting somewhere.

> With the current algo, the floating point error on x - delta triggers the early exit, but the exit condition was not actually reached.

I see. Let's consider this a bug.

> Actually, setting the guess to inflow / outflow - 1 seems to provide similar if not faster convergence in the positive case as well.

Please give me some time to experiment with it myself, but I like the idea. The current version seems to give some funny results at times; a better guess could help with that.

> Is there a strong reason to keep the 0.1 default value?

When I rewrote IRR, I wanted to match what Google Sheets/Excel had. Is that a good reason? Probably not. We haven't made a release of this version of the IRR, so I'm happy to change it and get it right come release time.

Kai-Striega · 2023-02-18T11:45:17Z

> inflow = sum(x for x in values if x > 0)
> outflow = -sum(x for x in values if x < 0)
> guess = inflow / outflow - 1

I've played around with this a bit. It seems to work well for me too. Would you like to implement it and see if that helps as part of this PR?

I guess this could be tidied up/optimised a bit by using NumPy functions, but I don't see this as being the bottleneck anyway. E.g. something like this (untested code):

positive_cashflow = values > 0
inflow = values.sum(where=positive_cashflow)
outflow = -values.sum(where=~positive_cashflow)
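
A self-contained version of the heuristic above, using the `where=` argument of `ndarray.sum` (available since NumPy 1.17), plus a fallback for `guess=None` along the lines suggested earlier in the thread. The function names `default_guess` and `resolve_guess` are hypothetical.

```python
import numpy as np


def default_guess(values):
    """Hypothetical helper: initial IRR guess = inflow / outflow - 1."""
    values = np.asarray(values, dtype=float)
    positive_cashflow = values > 0
    inflow = values.sum(where=positive_cashflow)
    outflow = -values.sum(where=~positive_cashflow)  # zero cashflows contribute nothing
    return inflow / outflow - 1


def resolve_guess(values, guess=None):
    """Use the caller's guess if given, else fall back to the heuristic."""
    return default_guess(values) if guess is None else guess
```

For cashflows like those in test_gh_15, where tiny inflows are dwarfed by the 3000 outflow, this yields a guess just above -1, matching the "almost everything is lost" intuition.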

jlopezpena · 2023-02-20T09:24:15Z

> I've played around with this a bit. It seems to work well for me too. Would you like to implement it and see if that helps as part of this PR?

Sure, I will update the PR with the changes.

While working on this, I also thought of getting rid of the denominators altogether by using g = 1 + eirr instead of x and multiplying the whole NPV equation by g^N, which works because we are solving "net present value equals 0". In practice, this is nothing but reversing the order of the values array when creating the polynomial (and adjusting the return value, of course). I think this might help with convergence in some situations where the values of x get out of hand, but might hinder it in others. Perhaps it would be useful to have both implementations and let the user pass a computation method keyword to pick if the default doesn't seem to work?
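
Written out with cashflows V_0, ..., V_N and g = 1 + eirr, the substitution looks as follows; this is a restatement of the argument above, not a quote from the code.

```latex
\mathrm{NPV}(g) = \sum_{t=0}^{N} \frac{V_t}{g^{t}} = 0
\quad\Longrightarrow\quad
g^{N}\,\mathrm{NPV}(g) = \sum_{t=0}^{N} V_t\, g^{N-t}
  = V_N + V_{N-1}\, g + \cdots + V_0\, g^{N} = 0,
\qquad \text{eirr} = g - 1 .
```

Multiplying by g^N leaves the nonzero roots unchanged, and listing the coefficients from the constant term upwards gives V_N, ..., V_0, i.e. the cashflows reversed. The x-form is the same equation with x = 1/g, which is why it had to return 1/x - 1.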

jlopezpena · 2023-02-20T10:53:55Z

Committed the proposed changes. Even if performance is slightly hurt in some cases, I believe it makes sense to use the gain g instead of x for stability, as g can never get unreasonably large.
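
A minimal sketch of the denominator-free iteration: Newton-Raphson on the reversed-coefficient polynomial in g = 1 + eirr, started from the inflow/outflow guess and stopped by comparing the Newton step itself with the tolerance. It follows the thread's description and is not a copy of the merged `numpy_financial.irr`; the function name `irr_gain_sketch` is hypothetical.

```python
import numpy as np


def irr_gain_sketch(values, guess=None, tol=1e-12, maxiter=100):
    """Hypothetical sketch: find g = 1 + eirr with sum_t V_t * g**(N - t) = 0."""
    values = np.atleast_1d(np.asarray(values, dtype=float))
    if guess is None:
        # Heuristic starting point discussed in this thread.
        positive_cashflow = values > 0
        inflow = values.sum(where=positive_cashflow)
        outflow = -values.sum(where=~positive_cashflow)
        guess = inflow / outflow - 1
    # Polynomial() takes coefficients lowest degree first, so the constant term
    # must be V_N: reversing the cashflows gives V_N + V_{N-1} g + ... + V_0 g**N.
    npv_times_gn = np.polynomial.Polynomial(values[::-1])
    d_npv_times_gn = npv_times_gn.deriv()
    g = 1 + guess
    for _ in range(maxiter):
        delta = npv_times_gn(g) / d_npv_times_gn(g)
        g -= delta
        if abs(delta) < tol:  # compare the Newton step itself to the tolerance
            return g - 1
    return np.nan  # no convergence within maxiter
```

For ordinary rates g stays near 1, and as the rate approaches -1 it shrinks towards 0 instead of blowing up the way x = 1/g does.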

Kai-Striega · 2023-02-22T09:04:45Z

Sorry for taking so long to respond; I've been pretty busy lately. Thank you for being patient.

I really like what I'm seeing here. The PR looks good, works in theory, and works in practice. In addition, the CI is green.

> Perhaps it would be useful to have both implementations and let the user pass a computation method keyword to pick if the default doesn't seem to work?

I'm -0.5 on this. numpy-financial is on the cusp of not being maintained, and I think this would add to the maintenance burden for any future contributors. In addition, the library is meant to provide simple financial functions. So I think it will be fine to have only one method that works in the general case.

jlopezpena · 2023-02-22T09:36:03Z

> So I think it will be fine to have only one method that works in the general case.

Makes sense. Then my suggestion would be to use the denominator-free expression, the rationale being that if there is a single method to compute something, then numerical stability takes precedence over pure performance. I also believe the trick of getting rid of the denominators is standard enough that maintainers/contributors shouldn't be baffled by it.

Kai-Striega · 2023-02-22T10:59:12Z

> Then my suggestion would be to use the denominator-free expression

I agree. I think the numerical stability trumps a minor performance gain.

Is there anything else you would like to add to this PR? I'm happy to merge.

jlopezpena · 2023-02-22T11:11:52Z

Cool! I am happy with the current PR state, feel free to merge!

Kai-Striega · 2023-02-23T00:49:39Z

Merged. Thanks @jlopezpena for taking the time to contribute!

user799595 · 2023-03-06T14:38:36Z

I worry about the logic that this PR adds for the initial guess.

Consider a bond with coupon rate C. Its cashflows are:

values = [-1, C, C, C, C, ..., 1 + C]

The IRR of the above is C.

But the formula for the initial guess approximates it as NC where N is the number of coupon dates. For large N this isn't a good guess.
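
The concern can be checked numerically. For a par bond paying N coupons at rate C, inflow = N*C + 1 and outflow = 1, so the heuristic gives N*C while the true IRR is C. The coupon rate and maturity below are illustrative assumptions.

```python
import numpy as np

C = 0.05   # coupon rate (illustrative)
N = 30     # number of coupon dates (illustrative)

# Cashflows: pay 1 at t=0, receive C at t=1..N-1, receive 1 + C at t=N.
values = np.array([-1.0] + [C] * (N - 1) + [1.0 + C])

inflow = values[values > 0].sum()
outflow = -values[values < 0].sum()
heuristic_guess = inflow / outflow - 1   # equals N * C up to rounding

# NPV discounted at the coupon rate should be ~0, confirming IRR == C.
t = np.arange(N + 1)
npv_at_coupon = np.sum(values / (1 + C) ** t)

print(heuristic_guess, npv_at_coupon)
```

With these numbers the starting point is 150% against a true rate of 5%, so Newton's method starts far from the root.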

See numpy-financial/numpy_financial/_financial.py, line 763 in fb63b04:

guess = inflow / outflow - 1

Kai-Striega reviewed Feb 14, 2023

Use heuristic for guess and reverse polynomial for stability (e50c610)

Use numpy mask for inflows and outflows (accc214)

Kai-Striega added the bug (Something isn't working) and enhancement (New feature or request) labels Feb 20, 2023

Kai-Striega merged commit fb63b04 into numpy:main Feb 22, 2023

user799595 mentioned this pull request Mar 6, 2023

Added better guess for IRR #61 (Open)