rishisinghal/BeamPipelineSamples

Beam Data Samples

Samples for Apache Beam/Dataflow

  • PipelineCsvAvroProtobuf - Reads a CSV file from GCS, converts each record to Protobuf and writes the result back to GCS. Uses AvroIO to write the file.

  • PipelineAvroProtobufParquet - Reads a Protobuf file from GCS, converts it to Parquet and writes it back to GCS. The input file was generated by PipelineCsvAvroProtobuf.

  • PipelineProtobufParquet - Reads a Protobuf file from GCS, converts it to Parquet and writes it back to GCS. Unlike the previous sample, the input Protobuf file was NOT written using Avro.

  • PipelineCsvParquet - Reads a CSV file from GCS, converts it to Parquet and writes it back to GCS.

  • PipelinePubSubBtBq - Reads device telemetry data from Pub/Sub and writes it to both BigQuery and Bigtable. Avro is used to define the data schema.

  • PipelineDbBq - Reads from a MySQL database using JdbcIO and writes to BigQuery using BigQueryIO. Uses the employees table of the Employees sample database.

  • PipelineDbNestedBQ - Reads from a MySQL database using JdbcIO, builds rows with nested, repeated fields and writes them to BigQuery using BigQueryIO. Uses the employees table of the Employees sample database.

  • PipelineCsvAvroBq - Reads a CSV file from GCS, parses each line with OpenCSV and writes to BigQuery using BigQueryIO. The CSV data is the employees table of the Employees sample database. Avro is used to define the data schema.

  • PipelineDbInterleaveSpanner - Reads from a MySQL database using JdbcIO and writes to Spanner using SpannerIO. Uses the employees table of the Employees sample database. Two interleaved tables must be created in Spanner beforehand. Avro is used to define the data schema.

      - emp Table with schema:
         *emp_id: INT64 NOT NULL
          birth_date: STRING 
          first_name: STRING 
          
      - dept Table with schema:
          emp_id: INT64 NOT NULL
         *dept: INT64 
          join_date: STRING 
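PipelineDbInterleaveSpanner writes into an interleaved parent/child pair, so each source record has to become one `emp` row plus `dept` rows that repeat the parent key `emp_id` (a child table's primary key must start with its parent's key columns). The plain-Java sketch below shows only that splitting step, outside Beam. The `FlatRow` input shape and the sample values are hypothetical, not the repository's Avro types; in a real pipeline each map would instead become a Spanner `Mutation` (for example via `Mutation.newInsertOrUpdateBuilder("emp")`) passed to `SpannerIO.write()`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: splits hypothetical flat rows (e.g. from a JOIN read with JdbcIO)
// into rows for the interleaved "emp" (parent) and "dept" (child) tables.
// Column names follow the schema above; FlatRow is an assumed input shape.
public class InterleaveSplit {

    static final class FlatRow {
        final long empId;
        final String birthDate;
        final String firstName;
        final long dept;
        final String joinDate;

        FlatRow(long empId, String birthDate, String firstName, long dept, String joinDate) {
            this.empId = empId;
            this.birthDate = birthDate;
            this.firstName = firstName;
            this.dept = dept;
            this.joinDate = joinDate;
        }
    }

    // One "emp" row per distinct emp_id, in first-seen order; an employee listed
    // under several departments still yields a single parent row.
    static List<Map<String, Object>> empRows(List<FlatRow> rows) {
        Map<Long, Map<String, Object>> byId = new LinkedHashMap<>();
        for (FlatRow r : rows) {
            byId.computeIfAbsent(r.empId, id ->
                row("emp_id", id, "birth_date", r.birthDate, "first_name", r.firstName));
        }
        return new ArrayList<>(byId.values());
    }

    // One "dept" row per input row; emp_id is repeated as the key prefix.
    static List<Map<String, Object>> deptRows(List<FlatRow> rows) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (FlatRow r : rows) {
            out.add(row("emp_id", r.empId, "dept", r.dept, "join_date", r.joinDate));
        }
        return out;
    }

    // Builds an ordered column -> value map from alternating name/value arguments.
    private static Map<String, Object> row(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        List<FlatRow> rows = List.of(
            new FlatRow(1, "1970-01-01", "Alice", 10, "2000-01-01"),
            new FlatRow(1, "1970-01-01", "Alice", 20, "2005-01-01"),
            new FlatRow(2, "1980-01-01", "Bob", 10, "2010-01-01"));
        // Parent rows must exist before their child rows are inserted.
        System.out.println(empRows(rows).size());  // 2
        System.out.println(deptRows(rows).size()); // 3
    }
}
```

Deduplicating parents in the same step keeps the child rows from referencing an `emp_id` that was never written.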
    
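PipelineCsvAvroBq parses each CSV line with OpenCSV before writing to BigQuery. As a rough illustration of that per-line step (the logic that would sit in a `DoFn`'s `@ProcessElement` method), the sketch below swaps OpenCSV for a small quote-aware splitter written against the JDK only, then maps the fields to a column map shaped like a BigQuery row. The column order assumes the employees table layout (emp_no, birth_date, first_name, last_name, gender, hire_date); check it against the actual CSV file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a JDK-only stand-in for OpenCSV parsing, not the repository's code.
public class CsvLineToRow {

    // Assumed column order, matching the employees table of the sample database.
    static final String[] COLUMNS = {
        "emp_no", "birth_date", "first_name", "last_name", "gender", "hire_date"
    };

    // Splits one CSV line, honouring double-quoted fields and "" escapes.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"'); // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false; // closing quote
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(cur.toString()); // field boundary outside quotes
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString()); // last field
        return fields;
    }

    // Maps a CSV line to column -> value; emp_no is converted to a number.
    static Map<String, Object> toRow(String line) {
        List<String> f = split(line);
        if (f.size() != COLUMNS.length) {
            throw new IllegalArgumentException(
                "expected " + COLUMNS.length + " fields, got " + f.size());
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.length; i++) {
            row.put(COLUMNS[i], f.get(i));
        }
        row.put("emp_no", Long.parseLong(f.get(0))); // replaces the string value
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = toRow("1,1970-01-01,Alice,\"Smith, Jr\",F,2000-01-01");
        System.out.println(row.get("last_name")); // Smith, Jr
    }
}
```

A plain `String.split(",")` would break on the quoted comma above, which is the main reason to use a real CSV parser such as OpenCSV.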

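PipelineDbNestedBQ writes rows with nested, repeated fields. One common way to produce them in Beam is to key the flat JDBC rows and group them (for example with `GroupByKey`) before building each output row; whether the sample does exactly this is not stated here. The plain-Java sketch below performs the same grouping in memory. The `Flat` input shape and the `titles`/`title`/`from_date` names are hypothetical. With BigQueryIO, a `TableRow` is itself a `Map`, so a repeated RECORD column is typically set to a list of nested `TableRow` objects, which the nested maps here stand in for.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: turns flat rows into one row per employee with a repeated,
// nested "titles" field. All field names are hypothetical.
public class NestRepeated {

    static final class Flat {
        final long empNo;
        final String firstName;
        final String title;
        final String fromDate;

        Flat(long empNo, String firstName, String title, String fromDate) {
            this.empNo = empNo;
            this.firstName = firstName;
            this.title = title;
            this.fromDate = fromDate;
        }
    }

    // Groups rows by emp_no and appends each row's nested record to the
    // parent's repeated "titles" list; parent fields come from the first row seen.
    static List<Map<String, Object>> nest(List<Flat> rows) {
        Map<Long, Map<String, Object>> byEmp = new LinkedHashMap<>();
        for (Flat f : rows) {
            Map<String, Object> parent = byEmp.computeIfAbsent(f.empNo, k -> {
                Map<String, Object> p = new LinkedHashMap<>();
                p.put("emp_no", k);
                p.put("first_name", f.firstName);
                p.put("titles", new ArrayList<Map<String, Object>>());
                return p;
            });
            Map<String, Object> child = new LinkedHashMap<>();
            child.put("title", f.title);
            child.put("from_date", f.fromDate);
            @SuppressWarnings("unchecked")
            List<Map<String, Object>> titles = (List<Map<String, Object>>) parent.get("titles");
            titles.add(child);
        }
        return new ArrayList<>(byEmp.values());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> out = nest(List.of(
            new Flat(1, "Alice", "Engineer", "2000-01-01"),
            new Flat(1, "Alice", "Senior Engineer", "2005-01-01"),
            new Flat(2, "Bob", "Staff", "2010-01-01")));
        System.out.println(out.size()); // 2
        System.out.println(((List<?>) out.get(0).get("titles")).size()); // 2
    }
}
```

The matching BigQuery schema would declare `titles` as a RECORD field with mode REPEATED.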
How to compile

mvn clean package

Deploy in Cloud Composer

See the deployDF.py file.