Unhelpful error when job fails #305

Closed
anthonator opened this issue Aug 3, 2020 · 13 comments · Fixed by #308
Labels
kind:bug Something isn't working

Comments

@anthonator
Contributor

Environment

  • Oban Version: 2.0.0

  • PostgreSQL Version: 12.3

  • Elixir & Erlang/OTP Versions (elixir --version):

Erlang/OTP 23 [erts-11.0.2] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [hipe]
Elixir 1.10.4 (compiled with Erlang/OTP 21)

Current Behavior

We're receiving an unhelpful error when a job fails to execute.

** (ErlangError) Erlang error: %{}
    (elixir 1.10.4) lib/process.ex:765: Process.info/2
    (oban 2.0.0) lib/oban/queue/executor.ex:223: Oban.Queue.Executor.current_stacktrace/0
    (oban 2.0.0) lib/oban/queue/executor.ex:170: Oban.Queue.Executor.perform_inline/1
    (oban 2.0.0) lib/oban/queue/executor.ex:152: Oban.Queue.Executor.perform_inline/1
    (oban 2.0.0) lib/oban/queue/executor.ex:69: Oban.Queue.Executor.call/1
    (elixir 1.10.4) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
    (elixir 1.10.4) lib/task/supervised.ex:35: Task.Supervised.reply/5

We send errors to Sentry using a telemetry instrumenter, but whenever this error occurs nothing gets sent to Sentry.

Expected Behavior

Error messages/stacktraces are helpful and get sent to external error services via the telemetry integration.

Additional Information

Job record

%Oban.Job{
  __meta__: #Ecto.Schema.Metadata<:loaded, "integration_shopify", "oban_jobs">,
  args: %{ ... },
  attempt: 5,
  attempted_at: ~U[2020-08-03 19:30:36.864663Z],
  attempted_by: ["integration_shopify@10.244.1.152", "events", "84kquhuh"],
  completed_at: nil,
  discarded_at: ~U[2020-08-03 19:30:36.881854Z],
  errors: [
    %{
      "at" => "2020-08-03T19:29:03.980467Z",
      "attempt" => 1,
      "error" => "** (ErlangError) Erlang error: %{}\n    (elixir 1.10.4) lib/process.ex:765: Process.info/2\n    (oban 2.0.0) lib/oban/queue/executor.ex:223: Oban.Queue.Executor.current_stacktrace/0\n    (oban 2.0.0) lib/oban/queue/executor.ex:170: Oban.Queue.Executor.perform_inline/1\n    (oban 2.0.0) lib/oban/queue/executor.ex:152: Oban.Queue.Executor.perform_inline/1\n    (oban 2.0.0) lib/oban/queue/executor.ex:69: Oban.Queue.Executor.call/1\n    (elixir 1.10.4) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2\n    (elixir 1.10.4) lib/task/supervised.ex:35: Task.Supervised.reply/5\n"
    },
    %{
      "at" => "2020-08-03T19:29:21.881709Z",
      "attempt" => 2,
      "error" => "** (ErlangError) Erlang error: %{}\n    (elixir 1.10.4) lib/process.ex:765: Process.info/2\n    (oban 2.0.0) lib/oban/queue/executor.ex:223: Oban.Queue.Executor.current_stacktrace/0\n    (oban 2.0.0) lib/oban/queue/executor.ex:170: Oban.Queue.Executor.perform_inline/1\n    (oban 2.0.0) lib/oban/queue/executor.ex:152: Oban.Queue.Executor.perform_inline/1\n    (oban 2.0.0) lib/oban/queue/executor.ex:69: Oban.Queue.Executor.call/1\n    (elixir 1.10.4) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2\n    (elixir 1.10.4) lib/task/supervised.ex:35: Task.Supervised.reply/5\n"
    },
    %{
      "at" => "2020-08-03T19:29:41.807279Z",
      "attempt" => 3,
      "error" => "** (ErlangError) Erlang error: %{}\n    (elixir 1.10.4) lib/process.ex:765: Process.info/2\n    (oban 2.0.0) lib/oban/queue/executor.ex:223: Oban.Queue.Executor.current_stacktrace/0\n    (oban 2.0.0) lib/oban/queue/executor.ex:170: Oban.Queue.Executor.perform_inline/1\n    (oban 2.0.0) lib/oban/queue/executor.ex:152: Oban.Queue.Executor.perform_inline/1\n    (oban 2.0.0) lib/oban/queue/executor.ex:69: Oban.Queue.Executor.call/1\n    (elixir 1.10.4) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2\n    (elixir 1.10.4) lib/task/supervised.ex:35: Task.Supervised.reply/5\n"
    },
    %{
      "at" => "2020-08-03T19:30:04.986102Z",
      "attempt" => 4,
      "error" => "** (ErlangError) Erlang error: %{}\n    (elixir 1.10.4) lib/process.ex:765: Process.info/2\n    (oban 2.0.0) lib/oban/queue/executor.ex:223: Oban.Queue.Executor.current_stacktrace/0\n    (oban 2.0.0) lib/oban/queue/executor.ex:170: Oban.Queue.Executor.perform_inline/1\n    (oban 2.0.0) lib/oban/queue/executor.ex:152: Oban.Queue.Executor.perform_inline/1\n    (oban 2.0.0) lib/oban/queue/executor.ex:69: Oban.Queue.Executor.call/1\n    (elixir 1.10.4) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2\n    (elixir 1.10.4) lib/task/supervised.ex:35: Task.Supervised.reply/5\n"
    },
    %{
      "at" => "2020-08-03T19:30:36.881864Z",
      "attempt" => 5,
      "error" => "** (ErlangError) Erlang error: %{}\n    (elixir 1.10.4) lib/process.ex:765: Process.info/2\n    (oban 2.0.0) lib/oban/queue/executor.ex:223: Oban.Queue.Executor.current_stacktrace/0\n    (oban 2.0.0) lib/oban/queue/executor.ex:170: Oban.Queue.Executor.perform_inline/1\n    (oban 2.0.0) lib/oban/queue/executor.ex:152: Oban.Queue.Executor.perform_inline/1\n    (oban 2.0.0) lib/oban/queue/executor.ex:69: Oban.Queue.Executor.call/1\n    (elixir 1.10.4) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2\n    (elixir 1.10.4) lib/task/supervised.ex:35: Task.Supervised.reply/5\n"
    }
  ],
  id: 13,
  inserted_at: ~U[2020-08-03 19:29:03.844930Z],
  max_attempts: 5,
  priority: 0,
  queue: "events",
  scheduled_at: ~U[2020-08-03 19:30:35.986078Z],
  state: "discarded",
  tags: [],
  unique: nil,
  unsaved_error: nil,
  worker: "MyApp.Workers.CreateOrder"
}
@sorentwo
Owner

sorentwo commented Aug 4, 2020

@anthonator Do you know what your job is returning that may cause this error? This looks like the result of returning {:error, %{}}, which wouldn't have any additional data and won't have a stacktrace available. I'm open to suggestions on how to improve the reporting here (a standard error?).

In the meantime you can improve the error reporting with an explicit match or by raising an error rather than returning a tuple.
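
For illustration, here's a minimal worker sketch showing both options; MyApp.Shopify.create_order/1 is a hypothetical helper standing in for whatever the job actually does:

defmodule MyApp.Workers.CreateOrder do
  use Oban.Worker, queue: :events

  @impl Oban.Worker
  def perform(%Oban.Job{args: args}) do
    # MyApp.Shopify.create_order/1 is hypothetical, used only for this sketch
    case MyApp.Shopify.create_order(args) do
      {:ok, _order} ->
        :ok

      # Explicit match: return a descriptive reason rather than a bare %{}
      {:error, reason} ->
        {:error, reason}

      # Or raise, so the job's error includes a real exception and stacktrace
      other ->
        raise "unexpected response while creating order: #{inspect(other)}"
    end
  end
end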

@sorentwo sorentwo added the kind:bug Something isn't working label Aug 4, 2020
@anthonator
Contributor Author

I guess the reason this was confusing is that it looks like Oban.Queue.Executor itself is failing. That may be normal, but we also don't get errors sent to Sentry even though we have a telemetry instrumenter set up, and I thought this might be why.

@anthonator
Contributor Author

anthonator commented Aug 4, 2020

Here's our instrumenter if that helps.

defmodule MyApp.Instrumenter do
  @moduledoc false

  def setup do
    :ok = Oban.Telemetry.attach_default_logger()

    events = []
    events = events ++ [[:oban, :circuit, :trip]]
    events = events ++ [[:oban, :job, :exception]]

    :telemetry.attach_many("myapp-instrumenter", events, &handle_event/4, nil)
  end

  def handle_event([:oban, :circuit, :trip], _measure, meta, _) do
    Sentry.capture_exception(meta.error, stacktrace: meta.stacktrace, extra: meta)
  end

  def handle_event([:oban, :job, :exception], measure, meta, _) do
    extra = Map.take(meta, [:args, :id, :queue, :worker])
    extra = Map.merge(extra, measure)

    Sentry.capture_exception(meta.error, stacktrace: meta.stacktrace, extra: extra)
  end
end

@sorentwo
Owner

sorentwo commented Aug 4, 2020

The default stacktrace is pretty much useless in this case. It is even potentially harmful because it doesn't come from application code at all and makes it look like Oban itself is erroring.

Do you receive Sentry notices for other failures?

@anthonator
Contributor Author

Yes, we receive Sentry errors in other places in the app.

@sorentwo
Owner

sorentwo commented Aug 4, 2020

I meant: do you receive Sentry notices for other job failures? Though, looking at your config, it seems like this is the only job you'd be reporting for.

@anthonator
Contributor Author

Sorry, I'm not sure what you mean. Given my instrumenter above I'd expect any job that raises an error to get reported to Sentry.

At the moment we don't receive any failures in Sentry from Oban.

@sorentwo
Owner

sorentwo commented Aug 5, 2020

Sorry, I'm not sure what you mean. Given my instrumenter above I'd expect any job that raises an error to get reported to Sentry.

At the moment we don't receive any failures in Sentry from Oban.

You're absolutely right. I don't see any issue with the instrumenter above; it looks correct for Oban 2.0 events. I have a hunch that something is causing your handler to detach and that's preventing the error reports. It's hard to say without doing some debugging.

@sorentwo
Owner

sorentwo commented Aug 5, 2020

@anthonator It seems like this is the result of changes in Sentry, getsentry/sentry-elixir#403. It no longer accepts non-exception values for capture_exception. The meta.error value may be any term, in which case you'll need to use capture_message.
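
As an untested sketch, the handler could check whether meta.error is an exception and fall back to Sentry.capture_message/2 for plain terms:

  def handle_event([:oban, :job, :exception], measure, meta, _) do
    extra =
      meta
      |> Map.take([:args, :id, :queue, :worker])
      |> Map.merge(measure)

    if match?(%{__exception__: true}, meta.error) do
      Sentry.capture_exception(meta.error, stacktrace: meta.stacktrace, extra: extra)
    else
      # Plain terms such as {:error, %{}} aren't exceptions; report them as messages
      Sentry.capture_message("Oban job error: #{inspect(meta.error)}", extra: extra)
    end
  end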

@anthonator
Contributor Author

Is there a way to get the exception from Oban so I can pass it to Sentry?

@sorentwo
Owner

sorentwo commented Aug 5, 2020

In this case (and any other where the job returns {:error, some_value}) there isn't an exception. The value from the error tuple, some_value, is the value passed through as meta.error. That is why it gets formatted as ** (ErlangError) Erlang error: %{}; there isn't a proper exception.

Until the next release (and safely after that) you can use Exception.normalize/3 to ensure you have an exception to report.

  def handle_event([:oban, :job, :exception], measure, meta, _) do
    extra = Map.take(meta, [:args, :id, :queue, :worker])
    extra = Map.merge(extra, measure)

    meta.kind
    |> Exception.normalize(meta.error, meta.stacktrace)
    |> Sentry.capture_exception(stacktrace: meta.stacktrace, extra: extra)
  end
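
(For an {:error, %{}} return, Exception.normalize(:error, %{}, stacktrace) yields an %ErlangError{original: %{}}, which is also where the ** (ErlangError) Erlang error: %{} formatting in the job's errors array comes from.)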

@anthonator
Contributor Author

anthonator commented Aug 6, 2020

Ah, ok. That makes much more sense. Thanks for the clarification. I'm not sure I want to report failed jobs that return {:error, _}. I'd prefer to only report exceptions. Is there a good way to do that?

@sorentwo
Owner

sorentwo commented Aug 6, 2020

Ah, ok. That makes much more sense. Thanks for the clarification. I'm not sure I want to report failed jobs that return {:error, _}. I'd prefer to only report exceptions. Is there a good way to do that?

You can check whether the error value is an exception. You'll probably want to report exits as well, though. Something like this should work (untested):

case {meta.kind, meta.error} do
  {_, %{__exception__: _}} ->
    Sentry.capture_exception(meta.error, stacktrace: meta.stacktrace, extra: extra)

  {:exit, _} ->
    Sentry.capture_exception(meta.error, stacktrace: meta.stacktrace, extra: extra)

  _ ->
    :do_nothing
end

sorentwo added a commit that referenced this issue Aug 7, 2020
This makes several significant changes to how crashes, errors and
timeouts are reported from `perform/1` calls:

* Timeouts are wrapped in `Oban.TimeoutError`
* Error and discard tuples are wrapped in `Oban.PerformError`
* Exits and throws are wrapped in `Oban.CrashError`
* Stacktraces are only included from code that is rescued or caught, not
  from error tuples or timeouts.

The goal is to improve error formatting within a job's error array and to
make error reporting to external services like Sentry entirely
consistent.

Fixes #305
sorentwo added a commit that referenced this issue Aug 13, 2020
This makes several significant changes to how crashes, errors and
timeouts are reported from `perform/1` calls:

* Timeouts are wrapped in `Oban.TimeoutError`
* Error and discard tuples are wrapped in `Oban.PerformError`
* Exits and throws are wrapped in `Oban.CrashError`
* Stacktraces are only included from code that is rescued or caught, not
  from error tuples or timeouts.

The goal is to improve error formatting within a job's error array and to
make error reporting to external services like Sentry entirely
consistent.

Fixes #305
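
With those wrapper exceptions in place (assuming the changes above ship as described), meta.error should always be an exception, so a handler could report it directly and skip jobs that merely returned an error tuple by matching on Oban.PerformError. A rough sketch:

  def handle_event([:oban, :job, :exception], measure, meta, _) do
    extra =
      meta
      |> Map.take([:args, :id, :queue, :worker])
      |> Map.merge(measure)

    case meta.error do
      # Failures from plain {:error, _} or {:discard, _} returns; skip reporting
      %Oban.PerformError{} ->
        :ok

      exception ->
        Sentry.capture_exception(exception, stacktrace: meta.stacktrace, extra: extra)
    end
  end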