Skip to content

spotify/gcs-tools

main
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.

GCS Tools

Build Status GitHub license

Raison d'être:

Light weight wrapper that adds Google Cloud Storage (GCS) support to common Hadoop tools, including avro-tools, parquet-cli, proto-tools for Scio's Protobuf in Avro file, and magnolify-tools for Magnolify code generation, so that they can be used from regular workstations or laptops, outside of a Google Compute Engine (GCE) instance.

It uses your existing OAuth2 credentials and allows authentication via a browser.

Usage:

You can install the tools via our Homebrew tap on Mac.

brew tap spotify/public
brew install gcs-avro-tools gcs-parquet-cli gcs-proto-tools gcs-magnolify-tools
avro-tools tojson <GCS_PATH>
parquet-cli cat <GCS_PATH>
proto-tools tojson <GCS_PATH>
magnolify-tools <avro|parquet> <GCS_PATH>

Or build them yourself.

sbt assembly
java -jar avro-tools/target/scala-2.13/avro-tools-*.jar tojson <GCS_PATH>
java -jar parquet-cli/target/scala-2.13/parquet-cli-*.jar cat <GCS_PATH>
java -jar proto-tools/target/scala-2.13/proto-tools-*.jar cat <GCS_PATH>
java -jar magnolify-tools/target/scala-2.13/magnolify-tools-*.jar <avro|parquet> <GCS_PATH>

How it works:

To make avro-tools and parquet-cli work with GCS we need:

GCS connector won't pick up your local gcloud configuration, and instead expects settings in core-site.xml.