Avoid redundant computations in IRR calculation #60

Merged: 3 commits into numpy:main on Feb 22, 2023

Conversation

jlopezpena
Contributor

The IRR computation adds and then subtracts the previous iteration value before comparing to the tolerance. We can just compute the delta and compare that to the tolerance instead. This should also make the computation more robust if there is a large difference in magnitude between `x` and `delta`.
Member

@Kai-Striega Kai-Striega left a comment

Good find! I'm going to do some digging on why it's failing the test - it should be an identical result. I'm not quite sure where the errors are being introduced from.

@jlopezpena
Contributor Author

If `x` is very large and `delta` is very small, floating-point rounding can affect the outcome, so `(x - delta) - x` might not actually have the same magnitude as `delta`. Apart from that numerical instability, yes, the result should be identical.
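
The rounding effect described here can be reproduced in isolation. The following is a minimal sketch with made-up magnitudes: the values of `x` and `delta` are assumptions chosen for illustration, not numbers taken from the solver.

```python
import math

x = 1e16       # assumed "very large" iterate
delta = 0.5    # assumed "very small" Newton step

# Spacing between adjacent doubles near x. A step smaller than half of
# this spacing is rounded away when it is subtracted from x.
spacing = math.ulp(x)  # requires Python 3.9+

x_new = x - delta
recovered_step = x - x_new  # what a test on (x_new - x) effectively sees

print(spacing)         # 2.0
print(recovered_step)  # 0.0, even though delta is 0.5
```

A stopping test based on `x_new - x` would therefore report convergence here, while a test on `delta` itself would not.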

@jlopezpena
Contributor Author

jlopezpena commented Feb 14, 2023

I haven't looked in detail at the tests, but this is all but guaranteed to give you numerical problems:

v = [
            -3000.0,
            2.3926932267015667e-07,
            4.1672087103345505e-16,
            5.3965110036378706e-25,
            5.1962551071806174e-34,
            3.7202955645436402e-43,
            1.98061711632469e-52,
            7.8393517651814181e-62,
            2.3072565113911438e-71,
            5.0491839233308912e-81,
            8.2159177668499263e-91,
            9.9403244366963527e-101,
            8.942410813633967e-111,
            5.9816122646481191e-121,
            2.97309031844241e-131,
            1.1002067043497954e-141,
            3.02528765638021e-152,
            6.1854121948207909e-163,
            9.40329800153301e-174,
            1.0629218520017728e-184,
            8.9337141847171845e-196,
            5.5830607698467935e-207,
            2.5943122036622652e-218,
            8.9635842466507006e-230,
            2.3027710094332358e-241,
            ...
        ]

@Kai-Striega
Member

> but this is all but guaranteed to give you numerical problems:

I agree; however, we went from that test passing to it failing. I'd like to investigate what those numerical problems are. Looking at the original issue #15, it looks like it was an issue with the old irr solver, not with the most recent Newton's method solver.

@jlopezpena
Contributor Author

I think so. For practical purposes the cashflow in the test is basically "all the initial payout is lost", so that would represent a loss of 100% (which doesn't have properly defined behaviour under the "reinvesting the profits" assumption that IRR is based on). So a solution to that is "something close to -1", which seems to be right by the intended test. In the IRR formula, when eirr is close to -1, the transformed variable 1 / (1 + eirr) (which is referred to as x in the code) is close to infinity. So what I think is happening is that in the current code the x - delta part becomes just x because of floating-point precision when x is very large, so (x - delta) - x becomes 0 and the algorithm stops. With this PR, delta might not be below the tolerance, and the number of iterations might be exhausted.
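
To make the transformation concrete, here is a small sketch of the change of variable and of the polynomial being solved. The helper names `discount_factor` and `npv_in_x` are hypothetical and exist only for this illustration; they are not functions from the library.

```python
def discount_factor(eirr):
    """The transformed variable x = 1 / (1 + eirr)."""
    return 1.0 / (1.0 + eirr)


def npv_in_x(values, x):
    """Evaluate sum(values[t] * x**t) with Horner's scheme."""
    total = 0.0
    for v in reversed(values):
        total = total * x + v
    return total


# x grows without bound as the rate approaches -1 (a total loss).
for eirr in (0.1, -0.9, -0.999999, -0.999999999999):
    print(eirr, discount_factor(eirr))
```

For a simple cashflow such as `[-100, 110]`, the rate 0.1 corresponds to `x = 1 / 1.1`, and `npv_in_x([-100, 110], discount_factor(0.1))` is zero up to rounding.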

@jlopezpena
Contributor Author

jlopezpena commented Feb 14, 2023

A potential solution is to use a relative tolerance instead of an absolute one, but we are dealing with an extreme edge case here. Alternatively, we can have a breakout that returns -1 if x is larger than a certain threshold.
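
Both options could look roughly like the sketch below. Everything here is hypothetical: the helper names, the `rtol` value, and the `x_max` threshold are assumptions for illustration, and neither option is what the PR ended up implementing.

```python
def step_is_small(x, delta, atol=1e-12, rtol=1e-12):
    """Mixed absolute/relative test on the Newton step (assumed tolerances)."""
    return abs(delta) <= atol + rtol * abs(x)


def total_loss_breakout(x, x_max=1e9):
    """Return -1.0 once x exceeds an assumed threshold, otherwise None."""
    return -1.0 if abs(x) > x_max else None


# With a huge iterate, a tiny step passes the relative test...
print(step_is_small(1e10, -4.28e-8))
# ...while the same step fails it when x is of order one.
print(step_is_small(1.0, -4.28e-8))
```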

@jlopezpena
Contributor Author

jlopezpena commented Feb 16, 2023

I've been doing some further digging on this, and found out that, as suspected, the IRR algorithm is NOT converging for the test case (neither for the original nor for my proposed change). This can be seen by inserting a print(delta) in the loop and then running test_gh_15:

-12538172285.538937
1053069465.3266779
989220413.1617088
928321332.590804
870269688.6044048
814967045.1286157
762319263.6737074
712236751.376529
664634679.6006484
619432894.2146667
576554675.9649655
535921974.7416782
497440630.96024746
460958230.0502432
426148959.4876158
392208772.1405912
357083216.30890393
315714469.10958934
257360879.981704
168611272.6382977
64440716.36582911
7770992.814447409
100879.09078333691
16.736887134227963
4.711100328574186e-07
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
...
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08
-4.2828184805219944e-08

so, the iterations get exhausted without reaching the set tolerance of 1e-12. With the current algo, the floating point error on x - delta triggers the early exit, but the exit condition was not actually reached.
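
To reproduce this kind of trace outside the library, a simplified stand-in for the solver loop can record every `delta`. This is only a sketch: `newton_irr_trace` is a hypothetical name, the `maxiter=100` default is an assumption, and the real implementation may differ in detail. It stops on `abs(delta) < tol`, the test this PR proposes.

```python
def newton_irr_trace(values, guess=0.1, tol=1e-12, maxiter=100):
    """Newton's method on p(x) = sum(values[t] * x**t), with x = 1 / (1 + rate).

    Returns (rate, deltas). rate is None if maxiter is exhausted.
    """
    n = len(values)

    def p(x):
        # Horner evaluation of the polynomial.
        total = 0.0
        for v in reversed(values):
            total = total * x + v
        return total

    def dp(x):
        # Horner evaluation of the derivative sum(k * values[k] * x**(k - 1)).
        total = 0.0
        for k in range(n - 1, 0, -1):
            total = total * x + k * values[k]
        return total

    x = 1.0 / (1.0 + guess)
    deltas = []
    for _ in range(maxiter):
        delta = p(x) / dp(x)
        deltas.append(delta)
        if abs(delta) < tol:
            return 1.0 / (x - delta) - 1.0, deltas
        x -= delta
    return None, deltas


rate, deltas = newton_irr_trace([-100, 39, 59, 55, 20])
print(rate, len(deltas))
```

On a well-behaved cashflow like the one in the usage line, the steps shrink quickly and the loop exits well before `maxiter`, in contrast to the stalled trace above.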

I found the easiest way to fix this is by setting a better guess. Simply doing something like this:

inflow = sum(x for x in values if x > 0)
outflow = -sum(x for x in values if x < 0)
guess = 0.1 if inflow > outflow else inflow / outflow - 1

fixes the convergence problem (by setting an initial guess that is negative if the outflows are larger than the inflows):

-0.9999999999202436
4.407969309258772e-12
4.208982322158506e-12
4.017399791083582e-12
3.832991104009405e-12
3.6555317783074664e-12
3.4848033205951103e-12
3.3205930897771546e-12
3.16269416317344e-12
3.0109052056699856e-12
2.8650303418619812e-12
2.7248790311801247e-12
2.5902659460126176e-12
2.461010852855957e-12
2.3369384965514225e-12
2.2178784876925083e-12
2.1036651933242854e-12
1.9941376311010488e-12
1.8891393671276444e-12
1.7885184177871862e-12
1.6921271559599094e-12
1.5998222221745013e-12
1.5114644414171902e-12
1.4269187465742507e-12
1.346054109828449e-12
1.2687434838096503e-12
1.1948637549757807e-12
1.1242957126649819e-12
1.0569240386559779e-12
9.92637324125317e-13

Would it be acceptable to change the default value of guess to None and wrap the above in that, so that if guess is not passed then it can get a nice starting place?
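
A sketch of what that wrapping might look like follows. The names `default_guess` and `resolve_guess` are hypothetical, and the guard for cashflows with no inflows and no outflows is an added assumption, since the ratio is undefined there.

```python
def default_guess(values, fallback=0.1):
    """Starting rate derived from total inflows and outflows."""
    inflow = sum(v for v in values if v > 0)
    outflow = -sum(v for v in values if v < 0)
    if inflow > outflow:
        return fallback
    if outflow == 0:
        # Assumption: with nothing to divide by, keep the fallback.
        return fallback
    return inflow / outflow - 1


def resolve_guess(values, guess=None):
    """Use the caller's guess if given, otherwise derive one."""
    return default_guess(values) if guess is None else guess


print(resolve_guess([-100, 20, 30]))             # -0.5
print(resolve_guess([-100, 20, 30], guess=0.2))  # 0.2
```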

@jlopezpena
Contributor Author

Actually, setting the guess to inflow / outflow - 1 seems to provide similar if not faster convergence in the positive case as well. Is there a strong reason to keep the 0.1 default value?

@Kai-Striega
Member

Thanks for taking the time to investigate. I think we're getting somewhere.

> With the current algo, the floating point error on x - delta triggers the early exit, but the exit condition was not actually reached.

I see. Let's consider this a bug.

> Actually, setting the guess to inflow / outflow - 1 seems to provide similar if not faster convergence in the positive case as well.

Please give me some time to experiment with it myself, but, I like the idea. The current version seems to have some funny results at times, having a better guess could help with that.

> Is there a strong reason to keep the 0.1 default value?

When I rewrote IRR, I wanted to match what Google Sheets/Excel had. Is that a good reason? Probably not. We haven't made a release of this version of the IRR, so I'm happy to change it and get it right come release time.

@Kai-Striega
Member

Kai-Striega commented Feb 18, 2023

inflow = sum(x for x in values if x > 0)
outflow = -sum(x for x in values if x < 0)
guess = inflow / outflow - 1

I've played around with this a bit. It seems to work well for me too. Would you like to implement it and see if that helps as part of this PR?

I guess this could be tidied up/optimised a bit by using NumPy functions, but I don't see this as being the bottleneck anyway. E.g. something like this (untested code):

positive_cashflow = values > 0
inflow = values.sum(where=positive_cashflow)
outflow = -values.sum(where=~positive_cashflow)
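
A self-contained variant of the snippet above can be checked against a small example. It assumes `values` is already a NumPy float array and relies on the `where` argument of `ndarray.sum`, which needs NumPy 1.17 or newer; the sample cashflows are made up for illustration.

```python
import numpy as np

values = np.array([-100.0, 39.0, 59.0, 55.0, 20.0])

positive_cashflow = values > 0
inflow = values.sum(where=positive_cashflow)
# Zeros fall into the "not positive" mask, which is harmless for a sum.
outflow = -values.sum(where=~positive_cashflow)
guess = inflow / outflow - 1

print(inflow, outflow, guess)
```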

@jlopezpena
Contributor Author

jlopezpena commented Feb 20, 2023

> I've played around with this a bit. It seems to work well for me too. Would you like to implement it and see if that helps as part of this PR?

Sure, I will update the PR with the changes.

While working on this I also thought of getting rid of the denominators altogether, by using g = 1 + eirr instead of x and multiplying the whole IRR formula by g^N, which works because we are solving "Net present value equal to 0". In practice, this is nothing but reversing the order of the values array when creating the polynomial (and adjusting the return value, of course). I think this might help with convergence in some situations where the values of x get out of hand, but might hinder it in others. Perhaps it would be useful to have both implementations and let the user pass a computation method keyword to pick if the default doesn't seem to work?
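
The equivalence can be checked numerically. In the sketch below, `horner` is a hypothetical helper and the cashflows and trial gain are arbitrary assumptions; it only verifies that the reversed-coefficient polynomial in g equals g^N times the original polynomial in x = 1 / g, so both have the same roots.

```python
def horner(coeffs, z):
    """Evaluate sum(coeffs[k] * z**k)."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * z + c
    return total


values = [-100.0, 39.0, 59.0, 55.0, 20.0]
N = len(values) - 1

g = 1.25       # arbitrary trial gain, g = 1 + eirr
x = 1.0 / g    # the corresponding discount factor

npv_x = horner(values, x)        # sum(values[t] * x**t)
npv_g = horner(values[::-1], g)  # sum(values[t] * g**(N - t))

print(npv_g, g**N * npv_x)
```

Once a root g is found, the rate is simply `g - 1`, with no division involved.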

@jlopezpena
Contributor Author

Committed the proposed changes. Even if performance is slightly hurt in some cases I believe it makes sense to use the gain g instead of x for stability, as g can never get unreasonably large

@Kai-Striega added the bug and enhancement labels Feb 20, 2023
@Kai-Striega
Member

Sorry for taking so long to respond, I've been pretty busy lately – thank you for being patient.

I really like what I'm seeing here. The PR looks good, works in theory, and works in practice. In addition, the CI is green.

> Perhaps it would be useful to have both implementations and let the user pass a computation method keyword to pick if the default doesn't seem to work?

I'm -0.5 on this. numpy-financial is on the cusp of not being maintained. I think this will add to the maintenance burden for any future contributors. In addition, it is supposed to support simple financial functions. So I think it will be fine to have only one method that works in the general case.

@jlopezpena
Contributor Author

jlopezpena commented Feb 22, 2023

> So I think it will be fine to have only one method that works in the general case.

Makes sense. Then my suggestion would be to use the denominator-free expression, the rationale being that if there is a single method to compute something, then numerical stability takes precedence over pure performance. I also believe the trick of getting rid of the denominators is standard enough that maintainers/contributors shouldn't be baffled by it

@Kai-Striega
Member

> Then my suggestion would be to use the denominator-free expression

I agree. I think the numerical stability trumps a minor performance gain.

Is there anything else you would like to add to this PR? I'm happy to merge.

@jlopezpena
Contributor Author

Cool! I am happy with the current PR state, feel free to merge!

@Kai-Striega Kai-Striega merged commit fb63b04 into numpy:main Feb 22, 2023
@Kai-Striega
Member

Merged. Thanks @jlopezpena for taking the time to contribute

@user799595

I worry about the logic that this PR adds for the initial guess.

Consider a bond with coupon rate C. Its cashflows are

values = [-1, C, C, C, C, ..., 1 + C]

The IRR of the above is C.

But the formula for the initial guess approximates it as NC where N is the number of coupon dates. For large N this isn't a good guess.

See

guess = inflow / outflow - 1
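
The concern can be illustrated with a small sketch. The coupon rate `C = 0.05` and the `N = 30` coupon dates are assumed values, and `ratio_guess` and `npv` are hypothetical helpers written for this example rather than library functions.

```python
def ratio_guess(values):
    """The initial guess discussed above: inflow / outflow - 1."""
    inflow = sum(v for v in values if v > 0)
    outflow = -sum(v for v in values if v < 0)
    return inflow / outflow - 1


def npv(values, rate):
    """Net present value of the cashflows at the given rate."""
    return sum(v / (1 + rate) ** t for t, v in enumerate(values))


C, N = 0.05, 30
bond = [-1.0] + [C] * (N - 1) + [1.0 + C]

guess = ratio_guess(bond)
print(guess)         # close to N * C
print(npv(bond, C))  # close to 0, so the IRR is C
```

With these numbers the guess is about 1.5 while the true rate is 0.05, and the gap grows linearly with N.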
