
Stacktrace misses calling function if exception is in the expression that is the value to be returned by that function #6357

Closed
pragdave opened this issue Jul 18, 2017 · 18 comments

Comments

@pragdave
Contributor

Environment

  • 1.6.0-dev/20
  • OS X

Symptoms

Function a calls function b. If the call to b is the expression that produces a's return value (the last expression in a, or the last expression of a branch that is itself last), then a does not appear in the stacktrace. If the call is not the final expression in a, then a does appear:

defmodule One do
  def boom do
    raise "boom"
  end
end


defmodule Two do
  def call_boom do
    One.boom
    IO.puts "never get here"
  end
end

# iex(1)> Two.call_boom
# ** (RuntimeError) boom
#     (check) lib/t.ex:3: One.boom/0
#     (check) lib/t.ex:9: Two.call_boom/0  <<<<<<<<<<<<<


############################################################

defmodule Three do
  def call_boom do
    One.boom()
  end
end

# iex(1)> Three.call_boom
# ** (RuntimeError) boom
#     (check) lib/t.ex:3: One.boom/0     <<< no frame for Three.call_boom/0

############################################################

defmodule Four do
  def call_boom do
    if true do
      One.boom
    else
      IO.puts "never get here"
    end
  end
end

# iex(1)> Four.call_boom
# ** (RuntimeError) boom
#     (check) lib/t.ex:3: One.boom/0      <<< no frame for Four.call_boom/0 either

@michalmuskala
Member

This is a consequence of the last call optimisation. When a function call is in tail position, the calling function's stack frame is discarded before the call is made. I'm not sure anything can be done about it.
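Here is a minimal sketch of the effect. It assumes Elixir 1.7 or later, for `__STACKTRACE__`. The names (`Boom`, `Frames`, `frames_of/1`) are made up for illustration. The two callers differ only in whether the call to `Boom.boom/0` is the last thing they do.

```elixir
defmodule Boom do
  def boom, do: raise("boom")
end

defmodule Frames do
  # Last expression: compiled as a last call, so the frame for
  # tail_call/0 is released before Boom.boom/0 starts running.
  def tail_call do
    Boom.boom()
  end

  # Something still has to happen after the call returns, so the frame
  # for non_tail_call/0 stays on the stack while Boom.boom/0 runs.
  def non_tail_call do
    Boom.boom()
    :never_returned
  end

  # Runs `fun` and returns the {module, function} pairs found in the
  # stacktrace of the RuntimeError it raises.
  def frames_of(fun) do
    try do
      fun.()
      []
    rescue
      RuntimeError ->
        for {mod, name, _arity_or_args, _location} <- __STACKTRACE__, do: {mod, name}
    end
  end
end
```

The result of `Frames.frames_of(&Frames.non_tail_call/0)` should include `{Frames, :non_tail_call}`. The result for `&Frames.tail_call/0` should still contain `{Boom, :boom}` but have no `Frames.tail_call` entry, which mirrors the Two/Three difference in the report.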

@pragdave
Contributor Author

pragdave commented Jul 18, 2017 via email

@michalmuskala
Member

michalmuskala commented Jul 18, 2017

It's not possible to change it without breaking the semantics of infinite recursive loops.

If anything, this is an issue with Erlang/OTP itself.

@josevalim
Member

josevalim commented Jul 18, 2017

Any recursive Elixir process (which is the majority of them) relies on this behaviour to avoid stack growth. It is more powerful than tail call optimization because you can jump between functions and are not stuck on a single name/arity. And, as @michalmuskala said, it is not something we can change: this behaviour is part of the VM.

I personally think it could be interesting to keep full stacks for some processes, for example the test process, but even doing so would make function calls more expensive in the whole VM, as every call would need to check whether this feature is enabled. I don't know how to implement this feature without incurring performance penalties in the VM.

On the positive side, the VM does provide tools to address this issue. For example, someone could implement a trace/1 macro that receives an expression and traces all functions called and their arguments:

trace Foo.bar(1, 2, 3)
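The trace idea can be sketched with the VM's built-in call tracing. `CallTrace` below is a rough, hypothetical helper, not an existing library. It uses `:erlang.trace/3`, `:erlang.trace_pattern/3` and `:erlang.trace_delivered/1`, and it takes the modules to trace as an explicit list. Error handling is omitted, so it hangs if the traced expression raises. Tracing records every call as it is made, so calls whose frames later disappear through last call optimisation are still listed.

```elixir
defmodule CallTrace do
  # Hypothetical helper, not part of Elixir. Runs `fun` in a fresh process and
  # returns {result, calls}, where `calls` lists every {module, function, args}
  # call made into `modules`, in order. Tail calls are traced too.
  def run(modules, fun) do
    parent = self()

    pid =
      spawn(fn ->
        receive do
          :go -> send(parent, {:done, self(), fun.()})
        end
      end)

    # Trace calls made by `pid`; trace messages are sent to this process.
    :erlang.trace(pid, true, [:call])
    # Match every function, private ones included, in each module.
    Enum.each(modules, &:erlang.trace_pattern({&1, :_, :_}, true, [:local]))

    send(pid, :go)

    result =
      receive do
        {:done, ^pid, value} -> value
      end

    # Block until every trace message generated so far has been delivered.
    ref = :erlang.trace_delivered(pid)

    receive do
      {:trace_delivered, ^pid, ^ref} -> :ok
    end

    Enum.each(modules, &:erlang.trace_pattern({&1, :_, :_}, false, [:local]))
    {result, collect(pid, [])}
  end

  # Drains the {:trace, pid, :call, mfa} messages already in the mailbox.
  defp collect(pid, acc) do
    receive do
      {:trace, ^pid, :call, mfa} -> collect(pid, [mfa | acc])
    after
      0 -> Enum.reverse(acc)
    end
  end

  # Macro form of the idea: CallTrace.trace([Foo], Foo.bar(1, 2, 3))
  defmacro trace(modules, expr) do
    quote do
      CallTrace.run(unquote(modules), fn -> unquote(expr) end)
    end
  end
end
```

After `require CallTrace`, a call such as `CallTrace.trace([Foo], Foo.bar(1, 2, 3))` would return the result together with a call list starting with `{Foo, :bar, [1, 2, 3]}`.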

@pragdave
Contributor Author

pragdave commented Jul 18, 2017 via email

@michalmuskala
Member

michalmuskala commented Jul 18, 2017

One thing to consider is that every loop is implemented as a recursive function. So, for example, for an error raised inside the fun passed to Enum.reduce or Enum.map on some later element, all the stack entries you'd get would be those recursive calls to map instead of anything useful that happened before. It could very well make understanding what's happening harder rather than easier. It's extremely hard to figure out which stack traces are useful and which ones aren't.

I also don't agree that dev & test would be fine. This would dramatically change the memory usage pattern of the program depending on the environment. An explicit annotation also has downsides: it could become this "magical" thing you use to make your program faster and less memory hungry. Not unlike strict record fields in Haskell or the cut operator in Prolog (funny thing: both use !).

Either way, doing any of it would require either changes to OTP itself or writing a complete compiler for Elixir right down to BEAM bytecode (that's the only stage of compilation where you have control over this behaviour). This is a huge endeavour.

@OvermindDL1
Contributor

> • Have a compilation flag to disable tail call elimination.

The system would die: there is a monstrous number of infinite loops in the system. Take every single GenServer as just one example.

The entire system is built for TCO, disabling that makes the BEAM non-functional in any useful way.

@michalmuskala
Member

With all that I said, I fully agree that omitting the stack entries makes debugging significantly harder. I had to struggle with it myself a lot of times. I'm just not sure there's a way to fix this without breaking other things.

One of the most frustrating places for this is when you create a generic error function while implementing the bang version of a function. Suddenly you lose the stack frame of the actual function, and the trace shows only the error function.
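Here is a small sketch of the bang-function case, using hypothetical `Settings` and `SettingsErrors` modules. The shared helper `missing!/1` raises. When it is called as the last step of `fetch_tail!/2`, that frame is gone from the trace. If the helper only builds the exception and `raise` stays in the bang function itself, then `fetch!/2` is the function on top of the stacktrace. Inspecting the trace with `__STACKTRACE__` assumes Elixir 1.7+.

```elixir
defmodule SettingsErrors do
  # Shared error helpers, kept in their own module.
  def missing!(key), do: raise(missing(key))
  def missing(key), do: %KeyError{key: key, message: "missing setting #{inspect(key)}"}
end

defmodule Settings do
  # The helper raises, and calling it is the last thing fetch_tail!/2 does,
  # so the fetch_tail!/2 frame is gone before the exception is raised.
  def fetch_tail!(map, key) do
    case Map.fetch(map, key) do
      {:ok, value} -> value
      :error -> SettingsErrors.missing!(key)
    end
  end

  # The helper only builds the exception and `raise` happens here, so
  # fetch!/2 is the function reported at the top of the stacktrace.
  def fetch!(map, key) do
    case Map.fetch(map, key) do
      {:ok, value} -> value
      :error -> raise SettingsErrors.missing(key)
    end
  end
end
```

The cost of the second shape is one extra `raise` at each call site. In exchange, the trace points at the function the caller actually used.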

@pragdave
Contributor Author

pragdave commented Jul 18, 2017 via email

@OvermindDL1
Contributor

> Ah, that's exactly the kind of information I was looking for. What are you citing? I'd love to dig deeper.

Well, for GenServer you can just look at its source (or :gen_server for Erlang). It is actually a fascinating setup and shows how OTP processes truly work. They are just an infinitely looping function that listens for messages and calls back into your module. :-)
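Here is a heavily simplified sketch of that shape. It needs nothing beyond `spawn/1`, `send/2` and `receive`. `TinyServer` and `Counter` are hypothetical names. The real `:gen_server` loop also handles casts, system messages, timeouts and monitors. The point is the last line of `loop/2`: the recursive call is a last call, so the process can serve messages forever without its stack growing.

```elixir
defmodule TinyServer do
  # Hypothetical, greatly simplified stand-in for the :gen_server loop.
  # `callback` must define handle_call(request, state) -> {reply, new_state}.
  def start(callback, state) do
    spawn(fn -> loop(callback, state) end)
  end

  # Sends a request and waits for the matching reply.
  def call(pid, request) do
    ref = make_ref()
    send(pid, {:call, self(), ref, request})

    receive do
      {^ref, reply} -> reply
    end
  end

  defp loop(callback, state) do
    receive do
      {:call, from, ref, request} ->
        {reply, new_state} = callback.handle_call(request, state)
        send(from, {ref, reply})
        # Last call: this frame is replaced, so the loop never grows the stack.
        loop(callback, new_state)
    end
  end
end

defmodule Counter do
  def handle_call(:increment, n), do: {n + 1, n + 1}
  def handle_call(:get, n), do: {n, n}
end
```

With `pid = TinyServer.start(Counter, 0)`, successive calls to `TinyServer.call(pid, :increment)` return `1`, then `2`, and so on.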

@pragdave
Contributor Author

pragdave commented Jul 19, 2017 via email

@OvermindDL1
Contributor

> I was wondering what the source of your statement that TCE would cause the system to "die" in development or test. I'd love to see the numbers you're working from.

  1. Everything inside a stackframe stays allocated until the stackframe goes away.
  2. Take a gen_server: it loops infinitely. If it did not do TCO, its stackframes would never go away and would build up endlessly until all memory was consumed.
  3. Once all memory was consumed, the VM would be killed.

And gen servers are not the only place infinite loops are used; they are everywhere in the system. The numbers bit is that 'RAM is limited' and 'The system would then eat infinite RAM'. ^.^;

@pragdave
Contributor Author

pragdave commented Jul 19, 2017 via email

@OvermindDL1
Contributor

OvermindDL1 commented Jul 19, 2017

> Hmmm... how could we determine who's correct?

Easiest way to test, add a no-op try handler around such recursive calls in an infinite loop, those force the stack frames to always exist even in the case of TCO. :-)

@OvermindDL1
Contributor

OvermindDL1 commented Jul 19, 2017

Which I just did:

Wrapping a tail call in a catch handler:

eheap_alloc: Cannot allocate 3936326656 bytes of memory (of type "heap").

Crash dump is being written to: erl_crash.dump...done

Making the call not be TCO'able by doing a no-op after its call:

eheap_alloc: Cannot allocate 3936326656 bytes of memory (of type "heap").

Crash dump is being written to: erl_crash.dump...done
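The same experiment can be run on a small scale by measuring the stack instead of exhausting memory. Here is a sketch with a hypothetical `Depth` module. `Process.info(self(), :stack_size)` reports the current stack size in words. That size stays flat for the tail-recursive version, but it grows with depth once each recursive call is wrapped in a no-op `try`.

```elixir
defmodule Depth do
  # Tail-recursive: each call reuses the caller's frame.
  def tail(0), do: stack_words()
  def tail(n), do: tail(n - 1)

  # The same recursion inside a no-op try. The handler never matches, but
  # the pending try means every level must keep its frame.
  def guarded(0), do: stack_words()

  def guarded(n) do
    try do
      guarded(n - 1)
    catch
      :never_thrown -> :ok
    end
  end

  # Current stack size of the calling process, in words.
  defp stack_words do
    {:stack_size, words} = Process.info(self(), :stack_size)
    words
  end
end
```

`Depth.tail(20_000)` and `Depth.tail(10_000)` should return nearly the same number. `Depth.guarded(20_000)` should exceed `Depth.guarded(10_000)` by more than 10,000 words, and an unbounded loop built the same way keeps growing until allocation fails, as in the crash dumps above.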

@pragdave
Contributor Author

pragdave commented Jul 19, 2017 via email

@OvermindDL1
Contributor

> If there's a function in the user's code where this is not true, then they can flag it with compile: force_tce, and TCE will be reenabled for that function in all environments. However, I can't imagine a circumstance in which a well behaved program would need this.

They can already 'flag' a function to not do TCO (just by doing some no-op operation after the last call or wrapping it in a try/catch, both of which are easily done via a macro that tests whether the environment is :test or :dev). Thus if they want the full stack, they could already get it?
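Here is a sketch of such a macro. It assumes Elixir 1.9+, for `System.get_env/2`. `KeepFrame.keep_frame/1`, the `KEEP_FRAMES` variable and the `Five` module are all hypothetical; they are not part of Elixir or OTP. The macro wraps the final call in `try ... after :ok`. That wrapper leaves the result and any exception unchanged, but the call is no longer a last call. The environment variable stands in for a `Mix.env()` check so the sketch also runs outside a Mix project.

```elixir
defmodule KeepFrame do
  # Hypothetical helper, not part of Elixir or OTP. The switch is checked when
  # the calling module is compiled; a Mix project might test whether Mix.env()
  # is :dev or :test instead. KEEP_FRAMES=0 makes the macro a no-op.
  defmacro keep_frame(expr) do
    if System.get_env("KEEP_FRAMES", "1") != "0" do
      quote do
        # try/after changes neither the result nor any exception, but the
        # pending `after` means the wrapped call is no longer a last call,
        # so the caller's frame stays on the stack.
        try do
          unquote(expr)
        after
          :ok
        end
      end
    else
      expr
    end
  end
end

defmodule One do
  def boom, do: raise("boom")
end

defmodule Five do
  require KeepFrame

  # Same shape as Three.call_boom/0 in the report, but keeps its frame.
  def call_boom, do: KeepFrame.keep_frame(One.boom())
end
```

With the macro enabled, the `RuntimeError` from `Five.call_boom/0` should list `Five.call_boom/0` in its stacktrace, unlike `Three.call_boom/0` in the report. The memory concern raised earlier in the thread still applies. Every wrapped call keeps a frame, so wrapping the recursive call of a function that loops forever would grow its stack without bound.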

@tensiondriven

>> If there's a function in the user's code where this is not true, then they can flag it with compile: force_tce, and TCE will be reenabled for that function in all environments. However, I can't imagine a circumstance in which a well behaved program would need this.
>
> They can already 'flag' a function to not do tco (just by doing some no-op operation after the last call or wrapping it in a try/catch, both of which are easily doable via a macro and testing the environment if it is :test or :dev or not). Thus if they want the full stack then they could already get it?

I've been looking for a way to disable TCO in either Erlang or Elixir but can't find a shred of info on this. What am I missing? Can anyone point me to docs explaining how to disable TCO in Elixir? (Not a description of a technique that would work, but a concrete flag or existing dep that implements disabling TCO?)


5 participants