Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Spanner package #1491

Merged
merged 8 commits into from Oct 30, 2018

Conversation

@ClaireMcGinty
Copy link
Contributor

commented Oct 29, 2018

Currently supports writes from Mutations and reads into Structs. I have the spanner type macros saved for sometime further down the line.

notes:

  • Spanner integration test is kind of slow, takes 37 seconds
  • SpannerIO's testId doesn't support a unit having 2 spanner reads/writes to the same DB since it's just keyed by the SpannerConfig params, which don't include table name. For writes the table name is keyed into each Mutation object and reads give you a choice of specifying a table name or a literal query
  • Haven't implemented Taps yet

ClaireMcGinty added some commits Oct 29, 2018

@ClaireMcGinty ClaireMcGinty requested review from regadas and nevillelyh Oct 29, 2018

config.getInstanceId.get(),
config.getDatabaseId.get(),
List(
s"CREATE TABLE ${tablePrefix}_1 (\n Key INT64, \n Value STRING(MAX) \n) PRIMARY KEY (Key)",

This comment has been minimized.

Copy link
@nevillelyh

nevillelyh Oct 29, 2018

Member

nit: do we really need the \ns here? Kinda makes it hard to read.

import SpannerIOIT._
implicit val ec: ExecutionContext = ExecutionContext.global

override def beforeAll(): Unit = {

This comment has been minimized.

Copy link
@nevillelyh

nevillelyh Oct 29, 2018

Member

Also make a helper function for table creation and move the logic to each individual tests? We can also change _[123] to something more meaningful in each test to make it more readable.

This comment has been minimized.

Copy link
@ClaireMcGinty

ClaireMcGinty Oct 30, 2018

Author Contributor

I thought it made since to batch all the Spanner CREATE statements for the sake of simplicity/performance -- I refactored it a bit to provide a Spanner table context per test that handles all the other setup logic

This comment has been minimized.

Copy link
@nevillelyh

nevillelyh Oct 30, 2018

Member

👍 no strong preference. Either way works.

.set("Value").to("bar")
.build())

val awaited = Await.result(result, 10.seconds)

This comment has been minimized.

Copy link
@nevillelyh

nevillelyh Oct 29, 2018

Member

What's the reason for these timeouts?

throw new IllegalStateException("SpannerRead is read-only")

override def tap(params: ReadP): Tap[Nothing] =
throw new NotImplementedError("SpannerRead tap is not implemented")

This comment has been minimized.

Copy link
@nevillelyh

nevillelyh Oct 29, 2018

Member

Should be EmptyTap?

This comment has been minimized.

Copy link
@ClaireMcGinty

ClaireMcGinty Oct 30, 2018

Author Contributor

I chose the NotImplementedError because I thought it was something that actually should be implemented in the future for the SpannerRead op. what do you think?

This comment has been minimized.

Copy link
@nevillelyh

nevillelyh Oct 30, 2018

Member

Talked IRL. Database source/sinks are not immutable/atomic so reading a tap is not guaranteed to return the same data in the original SCollection.

.withInstanceId("someInstance")
}

class SpannerIOTest extends PipelineSpec with Matchers {
}

sealed trait ReadMethod
case class FromTable(tableName: String, columns: Seq[String]) extends ReadMethod

This comment has been minimized.

Copy link
@regadas

regadas Oct 29, 2018

Contributor

@ClaireMcGinty

Suggested change
case class FromTable(tableName: String, columns: Seq[String]) extends ReadMethod
final case class FromTable(tableName: String, columns: Seq[String]) extends ReadMethod

sealed trait ReadMethod
case class FromTable(tableName: String, columns: Seq[String]) extends ReadMethod
case class FromQuery(query: String) extends ReadMethod

This comment has been minimized.

Copy link
@regadas

regadas Oct 29, 2018

Contributor

@ClaireMcGinty

Suggested change
case class FromQuery(query: String) extends ReadMethod
final case class FromQuery(query: String) extends ReadMethod

override protected def write(data: SCollection[Mutation],
params: WriteP): Future[Tap[Nothing]] = {
var transform = BSpannerIO.write().withSpannerConfig(config)

This comment has been minimized.

Copy link
@regadas

regadas Oct 29, 2018

Contributor

@ClaireMcGinty can you do it like this?

Suggested change
var transform = BSpannerIO.write().withSpannerConfig(config)
val transform = BSpannerIO
.write()
.withSpannerConfig(config)
.withBatchSizeBytes(params.batchSizeBytes)
.withFailureMode(params.failureMode)
import org.apache.beam.sdk.io.gcp.spanner.SpannerConfig

object Spanner {
lazy val defaultInstance: Spanner = {

This comment has been minimized.

Copy link
@regadas

regadas Oct 29, 2018

Contributor

@ClaireMcGinty

Suggested change
lazy val defaultInstance: Spanner = {
private lazy val defaultInstance: Spanner = {
SpannerOptions.newBuilder().build().getService
}

def getDatabaseClient(config: SpannerConfig,

This comment has been minimized.

Copy link
@regadas

regadas Oct 29, 2018

Contributor
Suggested change
def getDatabaseClient(config: SpannerConfig,
def databaseClient(config: SpannerConfig,
))
}

def getAdminClient(project: String, instance: Spanner = defaultInstance): DatabaseAdminClient =

This comment has been minimized.

Copy link
@regadas

regadas Oct 29, 2018

Contributor
Suggested change
def getAdminClient(project: String, instance: Spanner = defaultInstance): DatabaseAdminClient =
def adminClient(project: String, instance: Spanner = defaultInstance): DatabaseAdminClient =

package object spanner {

implicit class SpannerScioContext(@transient val self: ScioContext) extends Serializable {

This comment has been minimized.

Copy link
@regadas

regadas Oct 29, 2018

Contributor
Suggested change
implicit class SpannerScioContext(@transient val self: ScioContext) extends Serializable {
implicit class SpannerScioContext(@transient private val self: ScioContext) extends Serializable {
}
}

implicit class SpannerSCollection(@transient val self: SCollection[Mutation])

This comment has been minimized.

Copy link
@regadas

regadas Oct 29, 2018

Contributor
Suggested change
implicit class SpannerSCollection(@transient val self: SCollection[Mutation])
implicit class SpannerSCollection(@transient private val self: SCollection[Mutation])
@regadas
Copy link
Contributor

left a comment

Looking good 👍 There are some suggestions.

@codecov

This comment has been minimized.

Copy link

commented Oct 30, 2018

Codecov Report

Merging #1491 into master will increase coverage by 0.07%.
The diff coverage is 88.37%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1491      +/-   ##
==========================================
+ Coverage   79.05%   79.12%   +0.07%     
==========================================
  Files         170      173       +3     
  Lines        5208     5251      +43     
  Branches      415      411       -4     
==========================================
+ Hits         4117     4155      +38     
- Misses       1091     1096       +5
Impacted Files Coverage Δ
...com/spotify/scio/coders/instances/JavaCoders.scala 85.71% <100%> (+0.86%) ⬆️
.../main/scala/com/spotify/scio/spanner/package.scala 100% <100%> (ø)
...cala/com/spotify/scio/spanner/client/Spanner.scala 100% <100%> (ø)
...ain/scala/com/spotify/scio/spanner/SpannerIO.scala 81.48% <81.48%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bef6571...e906126. Read the comment docs.

@ClaireMcGinty ClaireMcGinty merged commit b012c4f into master Oct 30, 2018

2 checks passed

ci/circleci: build_211 Your tests passed on CircleCI!
Details
ci/circleci: build_212 Your tests passed on CircleCI!
Details

regadas added a commit that referenced this pull request Oct 30, 2018

add Spanner package (#1491)
* spanner IO first pass

* clean up

* add to root project

* make scala 2.11 happy

* remove redundant implicits

* address comments

* SpannerRead returns EmptyTap

* refactor integration test

@regadas regadas deleted the claire/spanner_io branch Oct 30, 2018

ClaireMcGinty added a commit that referenced this pull request Oct 31, 2018

add Spanner package (#1491)
* spanner IO first pass

* clean up

* add to root project

* make scala 2.11 happy

* remove redundant implicits

* address comments

* SpannerRead returns EmptyTap

* refactor integration test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.