Ecclesiastical Latin IPA: /ˈʃi.o/, [ˈʃiː.o], [ˈʃi.i̯o]
Verb: I can, know, understand, have knowledge.
Starting from version 0.3.0, Scio will move from the Dataflow Java SDK to Apache Beam as its core dependency, which will introduce a few breaking changes. See this page for more.
- Scala API close to that of Spark and Scalding core APIs
- Unified batch and streaming programming model [1, 2]
- Fully managed service [2]
- Integration with Google Cloud products: Cloud Storage, BigQuery, Pub/Sub, Datastore, Bigtable [2]
- HDFS source/sink
- Interactive mode with Scio REPL
- Type safe BigQuery
- Integration with Algebird and Breeze
- Pipeline orchestration with Scala Futures
- Distributed cache
[1] Provided by Apache Beam
[2] Provided by Google Cloud Dataflow
The ubiquitous word count example can be run directly with SBT in local mode, using
README.md as input.
sbt "scio-examples/run-main com.spotify.scio.examples.WordCount --input=README.md --output=wc"
cat wc/part-00000-of-00001.txt
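For orientation, a word count pipeline in Scio has roughly the shape below. This is a minimal sketch, not the exact source of `com.spotify.scio.examples.WordCount`: the object name `WordCountSketch` and the helpers `tokenize` and `formatCount` are hypothetical, and the code assumes `scio-core` is on the classpath and that `ContextAndArgs`, `textFile`, `countByValue`, `saveAsTextFile` and `close` behave as in the Scio release you are using.

```scala
import com.spotify.scio._

// Hypothetical object name; not the exact source of the bundled WordCount example.
object WordCountSketch {
  // Split a line into words, dropping the empty strings that split() yields
  // when a line starts with a delimiter.
  def tokenize(line: String): Seq[String] =
    line.split("[^a-zA-Z']+").filter(_.nonEmpty).toSeq

  // Render one (word, count) pair as an output line.
  def formatCount(word: String, count: Long): String = s"$word: $count"

  def main(cmdlineArgs: Array[String]): Unit = {
    // Parse --input=... and --output=... and create the pipeline context.
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    sc.textFile(args("input"))                               // one element per line
      .flatMap(tokenize)                                     // one element per word
      .countByValue                                          // (word, count) pairs
      .map { case (word, count) => formatCount(word, count) }
      .saveAsTextFile(args("output"))                        // writes part files

    // Submit the pipeline for execution.
    sc.close()
  }
}
```

The chain of `flatMap`, `countByValue` and `map` is what makes the API read much like the equivalent Spark or Scalding job. In local mode the results are written as part files under the output directory, which is what the `cat` command above reads.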
- Scio Wiki - wiki page
- ScalaDocs - current API documentation
- Big Data Rosetta Code - comparison of code snippets in Scio, Scalding and Spark
Scio includes the following artifacts:
- scio-core: core library
- scio-test: test utilities, add to your project as a "test" dependency
- scio-bigquery: add-on for BigQuery, included in scio-core but can also be used standalone
- scio-bigtable: add-on for Bigtable
- scio-extra: extra utilities for working with collections, Breeze, etc.
- scio-hdfs: add-on for HDFS
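As an illustration of how these artifacts might be wired into an SBT project, the sketch below adds the core library and the test utilities. The `com.spotify` group ID and the `x.y.z` version are assumptions for the example; substitute the coordinates and version of the release you actually use.

```scala
// build.sbt (sketch)

// Placeholder, not a real version: replace with a released Scio version.
val scioVersion = "x.y.z"

libraryDependencies ++= Seq(
  // Core library; per the list above, it already includes scio-bigquery.
  "com.spotify" %% "scio-core" % scioVersion,
  // Test utilities, scoped to the "test" configuration.
  "com.spotify" %% "scio-test" % scioVersion % "test"
)
```

Add-ons such as scio-bigtable or scio-hdfs would be added the same way, only when the pipeline needs them.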
Copyright 2016 Spotify AB.
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0