# Apache Beam - Custom Transform

So far we have been using the PTransforms supplied as part of Beam. We can also write our own PTransforms that encapsulate our own processing logic.

At its simplest, a custom transform is a subclass of `PTransform` that overrides the `expand()` method:

```java
public static class MyTransform extends PTransform<PCollection<String>, PCollection<String>> {
  @Override
  public PCollection<String> expand(PCollection<String> input) {
    // Apply one or more transforms to the input and return the resulting PCollection, e.g.
    // return input.apply("Step name", ParDo.of(new SomeDoFn()));  // SomeDoFn is hypothetical
    return input;
  }
}
```

* [JavaDoc: PTransform](https://beam.apache.org/releases/javadoc/2.42.0/org/apache/beam/sdk/transforms/PTransform.html)
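To make the skeleton more concrete, here is a minimal sketch of a composite transform whose `expand()` chains two steps and changes the element type along the way. `CountWords`, `SplitWordsFn`, `PrintFn`, and `CountWordsExample` are hypothetical names chosen for this illustration. The sketch assumes the Beam Java SDK and the Direct Runner (declared as dependencies below) are on the classpath, and it relies on the built-in `Count.perElement()` transform for the counting step.

```java
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class CountWordsExample {

  // Hypothetical composite transform: lines of text in, (word, count) pairs out.
  public static class CountWords
      extends PTransform<PCollection<String>, PCollection<KV<String, Long>>> {

    // Splits a line into lower-case words, dropping empty tokens.
    static class SplitWordsFn extends DoFn<String, String> {
      @ProcessElement
      public void processElement(@Element String line, OutputReceiver<String> out) {
        for (String word : line.toLowerCase().split("\\W+")) {
          if (!word.isEmpty()) {
            out.output(word);
          }
        }
      }
    }

    @Override
    public PCollection<KV<String, Long>> expand(PCollection<String> lines) {
      // expand() chains two steps; callers only ever see the single CountWords transform.
      return lines
          .apply("Split into words", ParDo.of(new SplitWordsFn()))
          .apply("Count per word", Count.perElement());
    }
  }

  // Prints each (word, count) pair; it emits no output of its own.
  static class PrintFn extends DoFn<KV<String, Long>, Void> {
    @ProcessElement
    public void processElement(@Element KV<String, Long> element) {
      System.out.println(element);
    }
  }

  public static void main(String[] args) {
    // With no --runner argument, Beam uses the Direct Runner when it is on the classpath.
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    pipeline
        .apply("Create lines", Create.of(Arrays.asList("Hello Beam", "hello world, hello Beam")))
        .apply("Count words", new CountWords())
        .apply("Print counts", ParDo.of(new PrintFn()));
    pipeline.run().waitUntilFinish();
  }
}
```

Run locally, this should print `KV{hello, 3}`, `KV{beam, 2}`, and `KV{world, 1}` in no particular order. Because pipelines only refer to `CountWords`, its internal steps can change without touching the code that uses it.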


First, we declare the dependencies to load from the Maven repositories: the Beam Java SDK core, the Direct Runner (used to execute the pipeline locally), and the SLF4J logging API.

```
%%loadFromPOM

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-core</artifactId>
  <version>2.40.0</version>
</dependency>

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-direct-java</artifactId>
  <version>2.40.0</version>
  <scope>runtime</scope>
</dependency>

<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>2.0.6</version>
</dependency>
```

Next, we declare the imports used in this notebook and create the pipeline options.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PDone;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.transforms.join.CoGbkResult;
import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
import org.apache.beam.sdk.transforms.join.CoGroupByKey;
import org.apache.beam.sdk.transforms.Combine.CombineFn;
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Distinct;
import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.transforms.PTransform;

// A notebook has no command line, so build the options from an empty argument array.
String[] args = new String[] {};
var options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
```

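In this example we create a new PTransform called `Upper`. It takes a `PCollection<String>` and returns a new `PCollection<String>` in which every element of the input has been converted to upper case. `Upper` overrides the `expand()` method to do its work, applying a `ParDo` with a private `DoFn`, and offers a static `create()` factory method. The cell also defines a small `LoggingDoFn` that prints each element and passes it through unchanged, so that we can see the result.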

```java
// Prints each element and passes it through unchanged.
public class LoggingDoFn<T> extends DoFn<T, T> {
  @ProcessElement
  public void processElement(@Element T element, OutputReceiver<T> out) {
    System.out.println(element);
    out.output(element);
  }
}

public static class Upper extends PTransform<PCollection<String>, PCollection<String>> {

  // Static nested class, so the DoFn does not capture a reference to the enclosing Upper instance.
  private static class UpperDoFn extends DoFn<String, String> {
    @ProcessElement
    public void processElement(@Element String word, OutputReceiver<String> out) {
      out.output(word.toUpperCase());
    }
  }

  // Static factory method, the usual way Beam transforms are constructed.
  public static Upper create() {
    return new Upper();
  } // End of Upper.create()

  @Override
  public PCollection<String> expand(PCollection<String> inputPCollection) {
    // The transform's work: apply a ParDo that upper-cases each element.
    return inputPCollection.apply("To Upper", ParDo.of(new UpperDoFn()));
  }
} // End of Upper

var pipeline = Pipeline.create(options);
pipeline
  .apply("Create elements", Create.of(Arrays.asList("Hello!", "World!")))
  .apply("To upper case", Upper.create()) // Use the new transform
  .apply("Print elements", ParDo.of(new LoggingDoFn<>()));

pipeline.run().waitUntilFinish();
```

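```
HELLO!
WORLD!

DONE
```

The two elements may be printed in either order, because the Direct Runner does not guarantee processing order. The final `DONE` is the `PipelineResult.State` returned by `waitUntilFinish()`, which the notebook displays as the cell's value.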
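Transforms often need configuration. The following is a minimal sketch, assuming the same Beam SDK and Direct Runner dependencies as above: the hypothetical `Affix` transform receives its settings through a static factory method, keeps them in final fields, and implements `expand()` with the built-in `MapElements` transform instead of a hand-written `DoFn`. `Affix`, `PrintFn`, and `AffixExample` are names invented for this illustration.

```java
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class AffixExample {

  // Hypothetical transform that wraps every element in a configurable prefix and suffix.
  public static class Affix extends PTransform<PCollection<String>, PCollection<String>> {
    private final String prefix;
    private final String suffix;

    private Affix(String prefix, String suffix) {
      this.prefix = prefix;
      this.suffix = suffix;
    }

    // Static factory method, in the same style as Upper.create().
    public static Affix of(String prefix, String suffix) {
      return new Affix(prefix, suffix);
    }

    @Override
    public PCollection<String> expand(PCollection<String> input) {
      // Copy the fields into locals so the lambda captures two Strings, not the transform.
      final String p = prefix;
      final String s = suffix;
      return input.apply(
          "Add affixes",
          MapElements.into(TypeDescriptors.strings()).via((String word) -> p + word + s));
    }
  }

  // Prints each element; it emits no output of its own.
  static class PrintFn extends DoFn<String, Void> {
    @ProcessElement
    public void processElement(@Element String element) {
      System.out.println(element);
    }
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    pipeline
        .apply("Create elements", Create.of(Arrays.asList("Hello!", "World!")))
        .apply("Wrap in brackets", Affix.of("<", ">"))
        .apply("Print elements", ParDo.of(new PrintFn()));
    pipeline.run().waitUntilFinish();
  }
}
```

This should print `<Hello!>` and `<World!>`, in either order. Passing configuration through the factory method keeps the call site readable (`Affix.of("<", ">")`), and copying the fields into local variables keeps the serialized lambda small.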