# Apache Beam - Puzzles

This notebook contains unsolved puzzles.

Since our notebook uses Beam's Google Cloud Platform I/O connectors (for example `BigQueryIO`), we must include them in our dependencies. Specifically, we need:

```xml
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
  <version>2.43.0</version>
</dependency>
```

Normally we would load our dependencies with the IJava cell magic `%%loadFromPom`. Unfortunately, this doesn't work ([issue](https://github.com/SpencerPark/IJava/issues/139)). A workaround is to download the dependencies outside of Jupyter and then launch Jupyter with the downloaded jars on the classpath.

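```shell
# Run from the directory containing pom.xml; copies every dependency jar into ./target/dependency
mvn dependency:copy-dependencies
# IJava adds the jars matched by this pattern to the kernel's classpath
export IJAVA_CLASSPATH="./target/dependency/*"
jupyter notebook
```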
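Before running the cells below, it can help to confirm that the jars were actually picked up. The following is a minimal JDK-only sketch; the class `ClasspathCheck` and its `isOnClasspath` helper are hypothetical, not part of Beam or IJava. It tries to load a class by name without initializing it. Pasted into a cell, `isOnClasspath("org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO")` should return `true` when the classpath is set correctly, assuming IJava loads cell code with a class loader that sees the `IJAVA_CLASSPATH` entries.

```java
public class ClasspathCheck {
  // Hypothetical helper: reports whether a class with the given fully qualified
  // name can be found, without running its static initializers.
  public static boolean isOnClasspath(String className) {
    try {
      // Assumption: the loader that loaded this class also sees the notebook's extra jars.
      Class.forName(className, false, ClasspathCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // BigQueryIO ships in beam-sdks-java-io-google-cloud-platform.
    String beamGcp = "org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO";
    System.out.println(beamGcp + " found: " + isOnClasspath(beamGcp));
  }
}
```

A `false` result usually means Jupyter was started from a shell where `IJAVA_CLASSPATH` was not exported, or before `mvn dependency:copy-dependencies` finished.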

Next, we define the imports required for execution and create the pipeline options.

In [1]:
import java.util.Arrays;
import java.util.List;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.transforms.Sample;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.io.gcp.bigquery.SchemaAndRecord;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;
import com.google.api.services.bigquery.model.TableReference;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;

// A notebook has no command line, so build the options from an empty argument list.
String[] args = new String[] {};
var options = PipelineOptionsFactory.fromArgs(args).withValidation().create();

SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.


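## Unchecked Method Invocation
The following cell fails to compile. `LoggingDoFn` is instantiated as a raw type (`new LoggingDoFn()`), so `ParDo.of(new LoggingDoFn())` is an unchecked method invocation, and so is the `.apply(...)` that receives its raw result. Java erases the return type of an unchecked invocation: the first `.apply("1", ...)` therefore returns a plain `POutput` instead of a `PCollection`, and `POutput` has no `apply` method for step "2".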

In [44]:
public class LoggingDoFn<T> extends DoFn<T, T>  {
  @ProcessElement
  public void processElement(@Element T element, OutputReceiver<T> out) {
    System.out.println(element);
    out.output(element);
  }
}

class Employee implements Serializable {
  private String name;
  private Double salary;
  private Integer tenure;
  
  public Employee(String name, Double salary, Integer tenure) {
    this.name = name;
    this.salary = salary;
    this.tenure = tenure;
  }
  
  public String toString() {
    return "name: " + name + ", salary: " + salary + ", tenure: " + tenure;
  }
}

var pipeline = Pipeline.create(options);
pipeline
  .apply("Create Rows", Create.of(
      new Employee("Neil", 50000.11, 48),
      new Employee("Sue", 75000.99, 12),
      new Employee("Bob", 45000.32, 6)
    )
  )
  .apply("1", ParDo.of(new LoggingDoFn()))  
  .apply("2", ParDo.of(new LoggingDoFn()));

pipeline.run().waitUntilFinish();

CompilationException: 
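The same failure can be reproduced without Beam. In this minimal plain-Java sketch, the hypothetical stand-ins `Coll`, `parDo`, and `LoggingFn` play the roles of `PCollection`, `ParDo.of`, and `LoggingDoFn`. Like `PCollection.apply`, `Coll.apply` returns a bare type variable, so the raw-type chain breaks in the same way. The broken line is left commented out so that the class compiles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class RawTypeDemo {
  // Hypothetical stand-in for LoggingDoFn<T>: records each element, then passes it through unchanged.
  public static class LoggingFn<T> implements UnaryOperator<T> {
    private final List<String> log;

    public LoggingFn(List<String> log) {
      this.log = log;
    }

    @Override
    public T apply(T element) {
      log.add(String.valueOf(element));
      return element;
    }
  }

  // Hypothetical stand-in for PCollection<T>. As in PCollection.apply, the result
  // type is a free type variable, so erasing it leaves only its bound (here Object).
  public static class Coll<T> {
    private final List<T> items;

    public Coll(List<T> items) {
      this.items = items;
    }

    public List<T> items() {
      return items;
    }

    public <OutputT> OutputT apply(Function<? super Coll<T>, OutputT> transform) {
      return transform.apply(this);
    }
  }

  // Hypothetical stand-in for ParDo.of: applies fn to every element of the input.
  public static <T> Function<Coll<T>, Coll<T>> parDo(UnaryOperator<T> fn) {
    return in -> new Coll<>(in.items().stream().map(fn).collect(Collectors.toList()));
  }

  public static Coll<String> run(List<String> log) {
    Coll<String> input = new Coll<>(List.of("Neil", "Sue", "Bob"));

    // Does NOT compile if uncommented: the raw LoggingFn needs an unchecked
    // conversion, so parDo(...) and the first apply(...) have their return types
    // erased; that apply returns Object, which has no apply method.
    // input.apply(parDo(new LoggingFn(log))).apply(parDo(new LoggingFn(log)));

    // Compiles: the diamond lets the compiler infer T = String, so each apply
    // returns Coll<String> and the chain type-checks.
    return input.apply(parDo(new LoggingFn<>(log))).apply(parDo(new LoggingFn<>(log)));
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    System.out.println(run(log).items()); // [Neil, Sue, Bob]
    System.out.println(log);              // [Neil, Sue, Bob, Neil, Sue, Bob]
  }
}
```

Unlike the Beam pipeline, this sketch runs sequentially, so the log order is deterministic.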

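The fix is to give `LoggingDoFn` a type argument. The diamond operator (`new LoggingDoFn<>()`) lets the compiler infer it, so no unchecked conversion is needed and each `.apply(...)` keeps its `PCollection` result type.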

In [45]:
var pipeline = Pipeline.create(options);
pipeline
  .apply("Create Rows", Create.of(
      new Employee("Neil", 50000.11, 48),
      new Employee("Sue", 75000.99, 12),
      new Employee("Bob", 45000.32, 6)
    )
  )
  .apply("1", ParDo.of(new LoggingDoFn<>()))  
  .apply("2", ParDo.of(new LoggingDoFn<>()));

pipeline.run().waitUntilFinish();

name: Sue, salary: 75000.99, tenure: 12
name: Neil, salary: 50000.11, tenure: 48
name: Bob, salary: 45000.32, tenure: 6
name: Neil, salary: 50000.11, tenure: 48
name: Sue, salary: 75000.99, tenure: 12
name: Bob, salary: 45000.32, tenure: 6


DONE
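`Employee` implements `Serializable` because Beam must encode every element of the `PCollection`. We never register a coder for `Employee`, so Beam most likely falls back on `SerializableCoder`, which encodes elements with standard Java serialization. The JDK-only sketch below performs that encode/decode round trip on a copy of `Employee`. The class `SerializationRoundTrip` and its `roundTrip` helper are hypothetical illustrations, not Beam API. A class that does not implement `Serializable` would be rejected by `ObjectOutputStream.writeObject` with a `NotSerializableException`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationRoundTrip {
  // Mirrors the notebook's Employee class (fields made final, serialVersionUID added).
  public static class Employee implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;
    private final Double salary;
    private final Integer tenure;

    public Employee(String name, Double salary, Integer tenure) {
      this.name = name;
      this.salary = salary;
      this.tenure = tenure;
    }

    @Override
    public String toString() {
      return "name: " + name + ", salary: " + salary + ", tenure: " + tenure;
    }
  }

  // Encodes value to bytes and decodes it again with standard Java serialization,
  // the mechanism SerializableCoder is built on.
  @SuppressWarnings("unchecked")
  public static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("serialization round trip failed", e);
    }
  }

  public static void main(String[] args) {
    Employee neil = new Employee("Neil", 50000.11, 48);
    Employee copy = roundTrip(neil);
    System.out.println(copy);         // name: Neil, salary: 50000.11, tenure: 48
    System.out.println(copy != neil); // true: a new object was decoded
  }
}
```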