
Incorrect Results for Extra Floating Functions #108

Open
julmb opened this issue Mar 6, 2024 · 19 comments

Comments

@julmb
Contributor

julmb commented Mar 6, 2024

The library calculates incorrect results for at least some of the extra functions log1p, expm1, log1pexp, and log1mexp of the Floating class.

For example:

>>> import Numeric.AD
>>> log1pexp <$> [-1000, -100, -10, -1, 0, 1, 10, 100, 1000]
[0.0,3.720075976020836e-44,4.539889921686465e-5,0.31326168751822286,0.6931471805599453,1.3132616875182228,10.000045398899218,100.0,1000.0]
>>> fst . diff' log1pexp <$> [-1000, -100, -10, -1, 0, 1, 10, 100, 1000]
[0.0,0.0,4.5398899216870535e-5,0.31326168751822286,0.6931471805599453,1.3132616875182228,10.000045398899218,100.0,Infinity]

Near zero, there are only mild inaccuracies, but further away, the results are completely wrong.

I believe that this is due to missing definitions for these functions in the type class instances. This causes them to use the default implementations for those functions, which are pretty bad. For example, log1pexp x = log1p (exp x), which exhausts the precision of Double quite quickly.

I will do some further testing and then most likely start working on a pull request.
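
For reference, the contrast can be sketched in plain Haskell. This is a hypothetical illustration (the function names here are invented), with the piecewise branch structure modeled on what base's Double instance does via log1pexpOrd:

```haskell
import GHC.Float (log1p)

-- The class default: log1p (exp x), which turns into Infinity
-- as soon as exp x overflows (around x > 709 for Double).
naiveLog1pexp :: Double -> Double
naiveLog1pexp x = log1p (exp x)

-- Piecewise sketch: near the ends of the domain, log1pexp is
-- x + exp (-x) or simply x to full Double precision.
log1pexpPiecewise :: (Ord a, Floating a) => a -> a
log1pexpPiecewise x
  | x <= 18   = log1p (exp x)       -- safe region, no overflow
  | x <= 100  = x + exp (negate x)  -- correction term still representable
  | otherwise = x                   -- exp (negate x) underflows entirely

main :: IO ()
main = do
  print (naiveLog1pexp 1000)                  -- Infinity
  print (log1pexpPiecewise (1000 :: Double))  -- 1000.0
```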

@RyanGlScott
Collaborator

I believe this was addressed by #111, so I'll close this. Please re-open if there is more to do here.

@julmb
Contributor Author

julmb commented Mar 12, 2024

All good, thank you very much!

@julmb
Contributor Author

julmb commented Mar 14, 2024

Welp, looks like I will be working on this some more:

>>> import Numeric.AD
>>> diff (diff log1pexp) (-1000)
NaN

This is after #111 has been applied. Is there a way to reopen this issue? As far as I am concerned, the topic still applies. I will make a fresh PR though.

@RyanGlScott reopened this Mar 14, 2024
@julmb
Contributor Author

julmb commented Mar 14, 2024

The problem here seems to be with derivatives of reciprocals of large numbers. This problem is not unique to the new implementations of the extra floating-point functions in #111. For example:

>>> import Numeric.AD
>>> diff' (recip . exp) 1000
(0.0,NaN)
>>> diff' (\ x -> recip $ x * x) 1e400
(0.0,NaN)

In both cases, the function evaluates to recip Infinity, which is just 0, and that would be fine. The derivatives also exist and are well-behaved. However, they are calculated as follows:
$$f(x) = \frac{1}{g(x)} \quad f'(x) = -\frac{g'(x)}{g(x)^2}$$
So we end up with - exp x / exp x ^ 2 and - 2 * x / x ^ 4 respectively. This evaluates to Infinity / Infinity = NaN.
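
The failure mode needs nothing from ad; it is plain IEEE Double arithmetic (a minimal sketch of the quotient-rule blowup described above):

```haskell
main :: IO ()
main = do
  let g  = exp 1000 :: Double  -- overflows to Infinity
      g' = exp 1000 :: Double  -- g' x = exp x overflows as well
  -- quotient rule: f' = -g' / g^2, here Infinity / Infinity
  print (negate g' / (g * g))  -- NaN, although f'(1000) = -exp (-1000) ~ 0
```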

This seems to be a fundamental problem for the ad library. My guess is that it does not usually come up as one would have to be operating with extreme values. However, with log1pexp being just the Softplus function, it is quite reasonable to evaluate it at values like -1000 or 1000, so it is easy to run into this issue.

@cartazio
Collaborator

cartazio commented Mar 14, 2024 via email

@julmb
Contributor Author

julmb commented Mar 14, 2024

How many digits of excess precision would we need to calculate it correctly?

I am not sure what you mean by that, but the derivative of log1pexp includes exp in the denominator. With the largest Double being on the order of $10^{308}$, for a calculation to not overflow to Infinity, we would have to represent numbers as large as $e^{10^{308}}$.

This is neither possible nor necessary, as log1pexp is nearly constant and linear at the far ends, respectively. The instances for Float and Double use this property to define log1pexp piecewise. However, we do not have that option when defining an instance like Floating a => Floating (Forward a) as there is no Ord a constraint in scope. I have not yet found a solution to this problem.

@julmb
Contributor Author

julmb commented Mar 14, 2024

I might be overlooking something, but none of this would even be an issue in the first place if log1pexp and log1mexp were just functions of type (Ord a, Floating a) => a -> a rather than members of the Floating type class.

  1. Their default implementations only yield correct (or even any!) results for a tiny fraction of the domains of these functions
  2. The only two instances that actually implement them are Float and Double and they both use the same (Ord a, Floating a) => a -> a implementation that is partly duplicated
  3. We would get correct derivatives for free from these implementations, without maintenance burden or dealing with issues like https://gitlab.haskell.org/ghc/ghc/-/issues/17125

I understand why log1p and expm1 are class members, they are translated to primops during compilation. But I feel like log1pexp and log1mexp should just be ordinary functions.
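
For illustration, such a free function could look like this sketch (hypothetical name; the branch structure mirrors base's shared log1mexpOrd helper):

```haskell
import GHC.Float (expm1, log1p)

log1mexpFree :: (Ord a, Floating a) => a -> a
log1mexpFree x
  | x > negate (log 2) = log (negate (expm1 x))  -- near 0: expm1 keeps precision
  | otherwise          = log1p (negate (exp x))  -- far from 0: log1p keeps precision

main :: IO ()
main = print (log1mexpFree (-1e-20 :: Double))  -- ~ -46.05, not -Infinity
```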

@julmb
Contributor Author

julmb commented Mar 15, 2024

Yesterday I was wondering why log1pexp and log1mexp were members of the Floating class. So today I took a deep dive and read through https://gitlab.haskell.org/haskell/prime/-/wikis/libraries/proposals/expand-floating and https://mail.haskell.org/pipermail/libraries/2014-April/022667.html to understand the ideas and motivations behind the design.

It turns out that one of the motivations was to support instances for vector spaces (https://mail.haskell.org/pipermail/libraries/2014-April/022741.html). In the meantime, I had also overlooked the fact that types like Complex a cannot use the Ord-based implementations. So there are good reasons for log1pexp and log1mexp to be members of the Floating class. However, base does not seem to make good use of this capability, as instances like Complex a still use the default implementations:

ghci> log1pexp (1000 :+ 0)
NaN :+ NaN

It also looks like @ekmett, who originally made the proposal, already had ad in mind when coming up with it. However, it seems like he never got around to implementing the new functions in the AD mode instances.

@julmb
Contributor Author

julmb commented Mar 15, 2024

As for the actual issue at hand here, I have been considering several ideas:

  1. Implement log1*exp as lift1 log1*exp d in such a way that d and all the higher-order derivatives of it are well-behaved throughout the entire domain. I do not know how to do this.
  2. Implement log1*exp piece-wise to exploit behavior near the extremes of the domain. I do not know how to do this without having access to an Ord instance.
  3. Add an Ord a constraint to the Floating instances of AD modes. This would mean we can no longer have Floating (Forward (Complex Double)).
  4. Add a warning about higher-order derivatives of log1*exp to the documentation and resign ourselves to using out-of-class implementations like log1*expOrd :: (Ord a, Floating a) => a -> a for applications that require AD. This would make functions that use AD needlessly less general, since there are good implementations for types like Complex a despite the lack of an Ord instance. The whole point of having log1*exp in the Floating class is to enable abstraction over these different implementations.

It sounds like this should not be such a hard problem, but I am pretty lost and do not really see a good solution here. I feel like the issue boils down to this:

  • Concrete instances like Floating Double are easy since they have access to all the tools of the concrete types
  • Newtype-like instances like Floating a => Floating (Down a) are easy since they just pass through the values
  • The instance RealFloat a => Floating (Complex a) requires the base type to be RealFloat in order to get access to the necessary tools
  • The instance Floating a => Floating (Forward a) has to squeeze blood from a stone. It has to implement nontrivial functions using nothing but a Floating instance on the base type. And we cannot really strengthen the constraints on the base type, since users really do expect to be able to use instances like Floating (Forward (Complex Double)). In a sense, we have to come up with a single implementation for the derivative of log1*exp that works accurately for both Double and Complex Double simultaneously.

Sorry if I am rambling a bit. Just trying to put my thoughts down before the weekend since I will not have time to work on this until next week. I am curious if anyone has either more ideas or thoughts on the ideas listed here already.

@ekmett
Owner

ekmett commented Mar 16, 2024

I think one part of the answer might be to provide a custom Complex type that has the corrected instance if base can't be brought to fix it and then use it to shame base into correct behavior.

I also vaguely recall that log1mexp might have had wrongly defined behavior in base, which is yet another issue with getting these fixed up to a fully usable state.

@julmb
Contributor Author

julmb commented Mar 18, 2024

Thank you for chiming in!

I think one part of the answer might be to provide a custom Complex type that has the corrected instance if base can't be brought to fix it and then use it to shame base into correct behavior.

I am not using Complex in any of my projects, I just came across this deficiency while trying to understand the design decisions made here. As such, I do not have the time to spearhead an effort to fix the Complex instances, but maybe someone else will.

I also vaguely recall that log1mexp might have had wrongly defined behavior in base, which is yet another issue with getting these fixed up to a fully usable state.

Maybe you mean this https://gitlab.haskell.org/ghc/ghc/-/issues/17125?

I think the Float and Double instances in base are in okay shape for now, although maybe they could use some cleanup. I just do not know how to make proper AD mode instances without having access to an Ord constraint.

@julmb
Contributor Author

julmb commented Mar 18, 2024

In https://mail.haskell.org/pipermail/libraries/2014-April/022741.html, @ekmett writes:

Why incorporate log1pexp, log1mexp? They have to be in the class or can't be properly implemented pointwise for vector spaces.

Could you elaborate on this (I know it has been a while)? I am trying to understand the possible use cases for having log1pexp and log1mexp as part of the Floating class.

Part of me is still convinced that they should not be type class members but instead separate functions. They cause issues in ad, the instances for Float and Double awkwardly share the same implementation, and in the one place where I can see the usefulness, Floating (Complex a), apparently nobody needed them badly enough to override the default implementations.

If there are no compelling use cases, I might consider making a CLC proposal to demote them to regular functions with an Ord constraint.

@ekmett
Owner

ekmett commented Mar 19, 2024

It was a bit of a nightmare even getting things to the half-implemented state they are in base. I stepped down from the core libraries committee sometime in the middle of the process, and that probably didn't help get things across the finish line.

The comment about vector spaces has to do with the fact that, say, V3 Float needs to borrow the definition for log1pexp from Float to get it to work, while V3 (V3 Float) needs to borrow it from V3 Float. This is done pointwise, not using Ord. Ord on V3 Float is useless to this operation. Instead it should lift the underlying definition pointwise. It looks like I never released a version of linear that does so.

That's on me.

The cases that get complicated are when you have things twisted together, e.g. Complex numbers or Quaternions or AD or Compensated arithmetic. Complex a (done properly) needs information not supplied by Ord, even if they never got around to implementing Complex properly in base. (I vaguely recall that there are a bunch of other issues with Complex as specified, but let's keep going here.) There you have to actually know something more about what you are lifting, which is what you are encountering here. I'd be rather vehemently against going back to a world where I can't ever fix these functions on vector spaces and have to implement non-trivial numerical algorithms either monomorphically or with some one-off ad-hoc class instead.

I needed log1p to get rid of the problem with the Taylor series starting with 1 + ... drowning out all information when x is small. From there everything else is gravy.
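
The swamping effect is easy to reproduce with plain Double (a minimal sketch):

```haskell
import GHC.Float (log1p)

main :: IO ()
main = do
  -- 1 + 1e-17 rounds to exactly 1.0 before log ever sees it
  print (log (1 + 1e-17) :: Double)  -- 0.0
  -- log1p keeps the small argument intact
  print (log1p 1e-17 :: Double)      -- 1.0e-17
```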

log1pexp = log1p . exp is definable and is actually the best you can do unless it would cause exp x to overflow. Doing anything better requires knowing more about the numeric type you are working with. If you know nothing about the underlying type 'a' and just need to carve something apart, you could do worse than using this. It has the side-benefit of being branch-free, but that seems mostly like sour-grapes reasoning, defending the status quo, not something overtly desirable, except it does allow for expression types as arguments that can't be compared as if they were a number.

and log1mexp = log1p . negate . exp doesn't require Ord either. It is again only really to dodge that pesky overflow that you need to consider doing anything else.

For types that do offer Ord, log1mexpOrd already exists and provides exactly what you'd be suggesting that base offer (given the definition of log1mexp with the flipped sign relative to C that Haskell adopted, to my consternation).

It is a known issue that any class that seeks to use this extra information e.g. Floating (AD a) is going to have to make a choice, but I didn't want that issue to extend down to the base Float type or preclude V3 Float from getting it right. Both of those cleanly bootstrap. It is only the hard stuff like AD which suffers for the conceit that it can do things in full generality. ;)

Now we can try to tease apart what AD should do here.

Consider the following cases: I'd like to do something sensible for each -- even if that involves splitting Forward into multiple types, or applying an argument tag or separate set of modules to indicate chosen behavior.

  • Forward Float -- this is going to use the log1p . exp definition, right now, which overflows "unnecessarily" for large x. It'd be kind of nice to be able to lean on the Ord-based version, but it makes it branchy. Observing that we only really care whether the exp overflows, one could replace the Ord with something that just checks for NaN after doing exp, which only needs Eq and a self-comparison, but risks wasted work relative to the Ord definition. The problem is that this wouldn't work for expression types.
  • Forward (V3 Float) -- this properly bootstraps Floating (V3 Expr) pointwise then bootstraps something for Forward that uses the default definition. This is actually the best that can be done without knowing the thing we are lifting is a vector space and somehow letting forward work pointwise over its argument.
  • V3 (Forward Float) -- this yields the same state, effectively.
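
One hypothetical way to cash out an Eq-only guard (a variant of the self-comparison idea above: once the 1 is absorbed into exp x, x itself is already log1pexp x to full precision; a sketch only, and useless for symbolic expression types exactly as noted):

```haskell
import GHC.Float (log1p)

log1pexpEq :: (Eq a, Floating a) => a -> a
log1pexpEq x
  | e + 1 == e = x           -- exp x overflowed, or swamps the 1 entirely
  | otherwise  = log1p e
  where e = exp x

main :: IO ()
main = do
  print (log1pexpEq (1000 :: Double))  -- 1000.0 instead of Infinity
  print (log1pexpEq (0 :: Double))     -- ~0.6931 (log 2)
```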

But now let's consider a symbolic "Expr" type:

  • Forward Expr -- Expr lacks a suitable Ord instance, so it'd have to use the boring definition anyways.
  • Forward (V3 Expr) -- this properly bootstraps Floating (V3 Expr) pointwise
  • V3 (Forward Expr) -- this is also the best that can be done.

My personal primary usecase for AD involves about an 80/20 mix of doing 80% Expr-style AD calculations and 20% actual simple floating point operations.

Losing the ability to handle expression types is a complete nonstarter as literally it is the reason I wrote and maintain the library. Situational improvements to the other case are a nice to have, though, and I'm more than open to adding modules that are more robust for that limited scenario or to finding nice ways to tag types to allow us to special case reasoning for it. e.g. we have ForwardDouble. That could benefit, obviously. With a little elbow grease, we could roll more specialized instances, or come up with a general pattern that captures this across multiple numerical types. The best mechanism I have for doing this is backpack though, and backpack runs afoul of stack never having gotten around to supporting it.

The downside of the status quo is that we don't recover from some NaNs that we otherwise could recover from in the case where you use this on large floating point values. Do I have that correct?

@julmb
Contributor Author

julmb commented Mar 21, 2024

First of all, thank you for your reply. From what I know, you are not as active in the Haskell community anymore, so I really appreciate you taking the time to give such a detailed response.

The comment about vector spaces has to do with the fact that, say, V3 Float needs to borrow the definition for log1pexp from Float to get it to work, while V3 (V3 Float) needs to borrow it from V3 Float. This is done pointwise, not using Ord. Ord on V3 Float is useless to this operation. Instead it should lift the underlying definition pointwise.

Alright, I think I understand the problem. Although it feels like this approach is difficult to scale, as every function that does any kind of comparison would need to be moved to one of the numerical type classes.

log1pexp = log1p . exp is definable and is actually the best you can do unless it would cause exp x to overflow.

and log1mexp = log1p . negate . exp doesn't require Ord either. It is again only really to dodge that pesky overflow that you need to consider doing anything else.

The implementation log1mexp = log1p . negate . exp suffers from loss of precision in addition to the overflow issue:

ghci> log1p . negate . exp <$> [-1e-16, -1.5e-16, -1e-20]
[-36.7368005696771,-36.7368005696771,-Infinity]
ghci> log1mexp <$> [-1e-16, -1.5e-16, -1e-20]
[-36.841361487904734,-36.43589637979657,-46.051701859880914]
  • Forward Float -- this is going to use the log1p . exp definition, right now, which overflows "unnecessarily" for large X.

The downside of the status quo is that we don't recover from some NaNs that we otherwise could recover from in the case where you use this on large floating point values. Do I have that correct?

Before #111, it was worse than what you describe here, with overflow, underflow, and catastrophic loss of precision. This was because all four of the extra functions were using their default implementations, both for evaluation and for the first derivative.

After #111, there should be neither loss of precision nor overflows in any of the extra functions or their first derivatives. Only the higher-order derivatives have overflows for reasons described in #108 (comment).

For types that do offer Ord log1mexpOrd already exists and provides exactly what you'd be suggesting that base offer.

Yes, and that is probably what I will be using for my application. It is just regrettable that the functions in the Floating type class do not "just work", and that instead people need to understand a great deal of implementation details in order to pick the correct function to use.

It is only the hard stuff like AD which suffers for the conceit that it can do things in full generality. ;)

This is probably a good insight. It is easy to become accustomed to AD getting everything right, when that is in fact a difficult problem to solve. Maybe it should be more surprising that it works so well in almost all other cases.

My personal primary usecase for AD involves about an 80/20 mix of doing 80% Expr-style AD calculations and 20% actual simple floating point operations.

Losing the ability to handle expression types is a complete nonstarter as literally it is the reason I wrote and maintain the library. Situational improvements to the other case are a nice to have, though, and I'm more than open to adding modules that are more robust for that limited scenario or to finding nice ways to tag types to allow us to special case reasoning for it. e.g. we have ForwardDouble. That could benefit, obviously. With a little elbow grease, we could roll more specialized instances, or come up with a general pattern that captures this across multiple numerical types. The best mechanism I have for doing this is backpack though, and backpack runs afoul of stack never having gotten around to supporting it.

My use case is entirely floating point numerics for an optimization problem. But I am not at all suggesting we drop support for doing AD on arbitrary types. I was just hoping there would be a way to have both.

Unfortunately I do not have the time to do any ambitious restructuring of ad. I am already way behind on my project after trying to adjust the Floating instance upstream in ad rather than just rolling my own log1pexpOrd or similar implementations.

@ekmett
Owner

ekmett commented Mar 21, 2024

I do think we could do a bit better.

e.g. letting log1pexp just delegate to the log1pexp of the underlying number type for the primal would help a lot in both worlds.
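
A toy dual-number sketch of that delegation (hypothetical types, not ad's actual internals): the primal reuses the underlying type's log1pexp, and only the tangent needs the derivative, which is the logistic function:

```haskell
import GHC.Float (log1pexp)

-- primal value paired with its tangent
data Dual = Dual Double Double
  deriving (Eq, Show)

log1pexpDual :: Dual -> Dual
log1pexpDual (Dual p t) =
  Dual (log1pexp p)                -- delegate: Double's piecewise log1pexp
       (t / (1 + exp (negate p)))  -- chain rule with the logistic derivative

main :: IO ()
main = print (log1pexpDual (Dual 1000 1))  -- Dual 1000.0 1.0
```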

I think the "right" fix for this might be to add variants to the constructions here in a separate subdir, maybe "Numeric.AD.Numeric.X.Y.Z" that handles these maybe by using some awful class to describe the derivatives of them that can be applied to the argument type, but that's like 30+ modules to cut and paste. If I got clever with backpack the changes would be contained to a dozen lines, but then I break stack support.

We could do better with another include file hack like we use to generate most of the instances, but then the code becomes even harder to read.

@julmb
Contributor Author

julmb commented Mar 21, 2024

I do think we could do a bit better.

e.g. letting log1pexp just delegate to the log1pexp of the underlying number type for the primal would help a lot in both worlds.

Yes, that is exactly what #111 does.

I think the "right" fix for this might be to add variants to the constructions here in a separate subdir, maybe "Numeric.AD.Numeric.X.Y.Z" that handles these maybe by using some awful class to describe the derivatives of them that can be applied to the argument type, but that's like 30+ modules to cut and paste. If I got clever with backpack the changes would be contained to a dozen lines, but then I break stack support.

We could do better with another include file hack like we use to generate most of the instances, but then the code becomes even harder to read.

I have never used backpack, but all of these options sound like they would add a considerable amount of complexity for perhaps not too great of a payoff. But that is not for me to decide.

@ekmett
Owner

ekmett commented Mar 21, 2024

I basically already paid this complexity once to add the Rank1 variants so that folks who wanted to skip the safety of the API in exchange for being able to do things with multiple passes, etc. could do so if they wanted, so this isn't out of line with what I've been willing to do in the past.

@julmb
Contributor Author

julmb commented Mar 22, 2024

Alright, sounds good to me.

So my conclusion is that we are currently doing the best we can be doing with the current architecture. As mentioned earlier, I do not have the time to do any big changes at the moment, so I will use log1pexpOrd since that will work for my application. As far as I am concerned, this issue could be closed, unless you want to keep it open to track a potential library design change.

@ekmett
Owner

ekmett commented Mar 22, 2024

I'll keep it open to nag me for now.
