• Table of contents
  • Introduction
  • Apache Beam Job Architecture
  • Pipelines
  • PValues
  • PTransforms
  • ParDo and DoFn
  • Map and FlatMap
  • Filter
  • GroupByKey
  • Example of using GroupByKey,Filter, and FlatMap
  • Runners
  • Writing Apache Beam Jobs
  • 1. Subclass the base_jobs.JobBase class and override the run() method
  • 2. Override the run() method to operate on self.pipeline
  • 3. Have the run() method return a PCollection of JobRunResults
  • 4. Add the job module to core/jobs/registry.py
  • Testing Apache Beam Jobs
  • 1. Inherit from JobTestBase and override the class constant JOB_CLASS
  • 2. Run assertions using a assert_job_output_is_* method
  • Running Apache Beam Jobs
  • Local Development Server
  • Production Server
  • Instruction for job testers
  • Deploy to backup server
  • Run the job on backup server through the local dev server
  • Beam guidelines
  • Do not use NDB put/get/delete directly
  • Use get_package_file_contents for accessing files
  • Example
  • Common Beam errors
  • '_UnwindowedValues' object is not subscriptable error
  • Example
  • _namedptransform is not iterable error
  • Example
  • Guidelines for writing beam jobs
  • Planning a job
  • Executing jobs
  • PR guidelines
  • Job testing workflow
  • Case Studies
  • Case Study: SchemaMigrationJob