# Apache Beam - Error Handling
When we write a Beam pipeline, errors can occur while elements are being processed, and we need to handle them.  In this notebook we look at ways to do that.

In [1]:
%%loadFromPOM

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-core</artifactId>
  <version>2.40.0</version>
</dependency>

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-direct-java</artifactId>
  <version>2.40.0</version>
  <scope>runtime</scope>
</dependency>

<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>2.0.6</version>
</dependency>

Next we import the classes we need and create the pipeline options.

In [2]:
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

String[] args = new String[] {};
var options = PipelineOptionsFactory.fromArgs(args).withValidation().create();

Now we define a `Person` class to carry our data and a DoFn that is executed once per element.  In this example, the DoFn emits a copy of each `Person` with the name converted to upper case.

In [3]:
public class Person implements Serializable {
  private String name;
  private int age;
  
  public Person(String name, int age) {
    this.name = name;
    this.age = age;
  }
  public String getName() {
    return name;
  }
  public int getAge() {
    return age;
  }
  
  public String toString() {
    return "Name: " + name + ", Age: " + age;
  }
} // End of Person

public class PersonToUpperDoFn extends DoFn<Person, Person>  {
  @ProcessElement
  public void processElement(@Element Person person, OutputReceiver<Person> out) {
    out.output(new Person(person.getName().toUpperCase(), person.getAge()));
  }
} // End of PersonToUpperDoFn

Finally, we define a `LoggingDoFn` that writes each element to the output stream (console), then run the pipeline and see the output:

In [4]:
public class LoggingDoFn<T> extends DoFn<T, T>  {
  @ProcessElement
  public void processElement(@Element T element, OutputReceiver<T> out) {
    System.out.println(element);
    out.output(element);
  }
} // End of LoggingDoFn

var pipeline = Pipeline.create(options);
pipeline
  .apply("Create elements", Create.of(
    new Person("Neil", 41),
    new Person("John", 99)
  ))
  .apply("Calculate Upper of Names",ParDo.of(new PersonToUpperDoFn()))
  .apply("Print elements",ParDo.of(new LoggingDoFn<>()));
pipeline.run().waitUntilFinish();

Name: JOHN, Age: 99
Name: NEIL, Age: 41


DONE

Now let's run it again ... but this time it will fail.  Can you see why?

If we look at the input, one `Person` is created with the arguments `null` and `22`.  The person's name is therefore `null`, and `PersonToUpperDoFn` invokes `toUpperCase()` on that name.  Invoking a method on a `null` reference throws a `NullPointerException`.
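The failure can be reproduced without Beam. The following is a minimal plain-Java sketch; the class `NullNameDemo` and its methods are hypothetical names introduced only for illustration, and `upper` simply mirrors the `toUpperCase()` call made in `PersonToUpperDoFn`.

```java
public class NullNameDemo {
  // Mirrors the expression used in PersonToUpperDoFn: name.toUpperCase().
  static String upper(String name) {
    return name.toUpperCase();
  }

  // Returns true when upper(name) fails with a NullPointerException.
  static boolean throwsNpe(String name) {
    try {
      upper(name);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(upper("Neil"));   // a normal name is upper-cased
    System.out.println(throwsNpe(null)); // a null name triggers the exception
  }
}
```

Running `main` prints `NEIL` followed by `true`.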

In [5]:
var pipeline = Pipeline.create(options);
pipeline
  .apply("Create elements", Create.of(
    new Person("Neil", 41),
    new Person(null, 22),
    new Person("John", 99)
  ))
  .apply("Calculate Upper of Names",ParDo.of(new PersonToUpperDoFn()))
  .apply("Print elements",ParDo.of(new LoggingDoFn<>()));
pipeline.run().waitUntilFinish();

EvalException: java.lang.NullPointerException

One solution is to catch the exception.  Once caught, we *could* silently ignore it, but then the failed element is lost without a trace.  Instead, we catch the exception and emit it as a second output of the transform.  A PTransform that can fail then has two outputs: a PCollection of "good" results and a PCollection of caught errors.  The error PCollection can be sent to a *dead letter queue* or some other persistent storage for subsequent correction.
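Before looking at the Beam version, the catch-and-route idea can be sketched in plain Java, with two lists standing in for the two PCollections. This is only an illustration under that assumption: `DeadLetterSketch`, `Result`, and `upperAll` are hypothetical names, and no Beam API is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeadLetterSketch {
  // Holds the two outputs: successful results and descriptions of failures.
  static final class Result {
    final List<String> good = new ArrayList<>();
    final List<String> errors = new ArrayList<>();
  }

  // Upper-cases each name; a failing element is recorded instead of aborting the run.
  static Result upperAll(List<String> names) {
    Result result = new Result();
    for (String name : names) {
      try {
        result.good.add(name.toUpperCase());
      } catch (Exception e) {
        result.errors.add("Error: " + e.getClass().getName() + ", element: " + name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Result result = upperAll(Arrays.asList("Neil", null, "John"));
    System.out.println(result.good);
    System.out.println(result.errors);
  }
}
```

The two valid names end up in `good` and the `null` entry produces a single record in `errors`, so one bad element no longer stops the others from being processed.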

In [6]:
public class Error implements Serializable {
  private Object element;
  private Exception ex;
  
  public Error(Object element, Exception ex) {
    this.element = element;
    this.ex = ex;
  }
  
  public String toString() {
    return "Error: " + ex.toString() + ", element: " + element;
  }
} // End of Error

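public class PersonToUpperDoFn extends DoFn<Person, Person>  {
  // Tags identifying the two outputs. The trailing {} creates an anonymous
  // subclass so that the element type is retained at runtime.
  public static final TupleTag<Person> normalTag = new TupleTag<Person>(){};
  public static final TupleTag<Error> errorTag = new TupleTag<Error>(){};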
  @ProcessElement
  public void processElement(@Element Person person, MultiOutputReceiver outputReceivers) {
    try {
      outputReceivers.get(normalTag).output(new Person(person.getName().toUpperCase(), person.getAge()));
    }
    catch(Exception e) {
      // Send the failing element and its exception to the error output
      // instead of letting the exception fail the pipeline.
      outputReceivers.get(errorTag).output(new Error(person, e));
    }
  } // End of processElement
} // End of PersonToUpperDoFn

var pipeline = Pipeline.create(options);
var multi = pipeline
  .apply("Create elements", Create.of(
    new Person("Neil", 41),
    new Person(null, 22),
    new Person("John", 99)
  ))
  .apply("Calculate Upper of Names",ParDo.of(new PersonToUpperDoFn())
    .withOutputTags(PersonToUpperDoFn.normalTag, TupleTagList.of(PersonToUpperDoFn.errorTag)));

// Handle the normal (non error) output of the data
multi.get(PersonToUpperDoFn.normalTag)
  .apply("Print elements",ParDo.of(new LoggingDoFn<>()));

// Handle the error output; here we only print it, but it could be written to a dead letter queue
multi.get(PersonToUpperDoFn.errorTag)
  .apply("Print errors",ParDo.of(new LoggingDoFn<>()));

pipeline.run().waitUntilFinish();

Name: NEIL, Age: 41
Name: JOHN, Age: 99
Error: java.lang.NullPointerException, element: Name: null, Age: 22


DONE