Java:
- First Apache Beam Application
- ParDo and DoFn: Parallel Processing
- KV + GroupByKey: Aggregation
- MapElements.via(new SimpleFunction) <-> ParDo + DoFn
- KV with Custom Class and GroupIntoBatches
- MultiOutput: Failure Handling
- MultiOutput: with Different Types
- Read from Google PubSub
Python:
- Read and write PubSub
- Read and write PubSub proto message
- Read and write PubSub with deduplication (ToDo)
Study Resources:
- Error Handling:
  - https://medium.com/@vallerylancey/error-handling-elements-in-apache-beam-pipelines-fffdea91af2a
  - https://www.linuxdeveloper.space/retry-apache-beam-flink/
  - https://medium.com/@bravnic/apache-beam-fundamentals-765ea5b59565
  - https://stackoverflow.com/questions/53392311/apache-beam-retrytransienterrors-neverretry-does-not-respect-table-not-found-err
- Examples:
  - IO:
    - Coder
    - KafkaIO+Protobuf: https://selectfrom.dev/apache-beam-python-dataflow-kafkaio-for-protobuf-message-streaming-f349119850ad