Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Spanner package #1491

Merged
merged 8 commits into from Oct 30, 2018
Merged

add Spanner package #1491

merged 8 commits into from Oct 30, 2018

Conversation

clairemcginty
Copy link
Contributor

Currently supports writes from Mutations and reads into Structs. I have the spanner type macros saved for sometime further down the line.

notes:

  • Spanner integration test is kind of slow, takes 37 seconds
  • SpannerIO's testId doesn't support a unit having 2 spanner reads/writes to the same DB since it's just keyed by the SpannerConfig params, which don't include table name. For writes the table name is keyed into each Mutation object and reads give you a choice of specifying a table name or a literal query
  • Haven't implemented Taps yet

config.getInstanceId.get(),
config.getDatabaseId.get(),
List(
s"CREATE TABLE ${tablePrefix}_1 (\n Key INT64, \n Value STRING(MAX) \n) PRIMARY KEY (Key)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we really need the \ns here? Kinda makes it hard to read.

import SpannerIOIT._
implicit val ec: ExecutionContext = ExecutionContext.global

override def beforeAll(): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also make a helper function for table creation and move the logic to each individual tests? We can also change _[123] to something more meaningful in each test to make it more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it made since to batch all the Spanner CREATE statements for the sake of simplicity/performance -- I refactored it a bit to provide a Spanner table context per test that handles all the other setup logic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 no strong preference. Either way works.

.set("Value").to("bar")
.build())

val awaited = Await.result(result, 10.seconds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for these timeouts?

throw new IllegalStateException("SpannerRead is read-only")

override def tap(params: ReadP): Tap[Nothing] =
throw new NotImplementedError("SpannerRead tap is not implemented")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be EmptyTap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose the NotImplementedError because I thought it was something that actually should be implemented in the future for the SpannerRead op. what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked IRL. Database source/sinks are not immutable/atomic so reading a tap is not guaranteed to return the same data in the original SCollection.

.withInstanceId("someInstance")
}

class SpannerIOTest extends PipelineSpec with Matchers {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

sealed trait ReadMethod
case class FromTable(tableName: String, columns: Seq[String]) extends ReadMethod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clairemcginty

Suggested change
case class FromTable(tableName: String, columns: Seq[String]) extends ReadMethod
final case class FromTable(tableName: String, columns: Seq[String]) extends ReadMethod


sealed trait ReadMethod
case class FromTable(tableName: String, columns: Seq[String]) extends ReadMethod
case class FromQuery(query: String) extends ReadMethod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clairemcginty

Suggested change
case class FromQuery(query: String) extends ReadMethod
final case class FromQuery(query: String) extends ReadMethod


override protected def write(data: SCollection[Mutation],
params: WriteP): Future[Tap[Nothing]] = {
var transform = BSpannerIO.write().withSpannerConfig(config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clairemcginty can you do it like this?

Suggested change
var transform = BSpannerIO.write().withSpannerConfig(config)
val transform = BSpannerIO
.write()
.withSpannerConfig(config)
.withBatchSizeBytes(params.batchSizeBytes)
.withFailureMode(params.failureMode)

import org.apache.beam.sdk.io.gcp.spanner.SpannerConfig

object Spanner {
lazy val defaultInstance: Spanner = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clairemcginty

Suggested change
lazy val defaultInstance: Spanner = {
private lazy val defaultInstance: Spanner = {

SpannerOptions.newBuilder().build().getService
}

def getDatabaseClient(config: SpannerConfig,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def getDatabaseClient(config: SpannerConfig,
def databaseClient(config: SpannerConfig,

))
}

def getAdminClient(project: String, instance: Spanner = defaultInstance): DatabaseAdminClient =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def getAdminClient(project: String, instance: Spanner = defaultInstance): DatabaseAdminClient =
def adminClient(project: String, instance: Spanner = defaultInstance): DatabaseAdminClient =


package object spanner {

implicit class SpannerScioContext(@transient val self: ScioContext) extends Serializable {
Copy link
Contributor

@regadas regadas Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
implicit class SpannerScioContext(@transient val self: ScioContext) extends Serializable {
implicit class SpannerScioContext(@transient private val self: ScioContext) extends Serializable {

}
}

implicit class SpannerSCollection(@transient val self: SCollection[Mutation])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
implicit class SpannerSCollection(@transient val self: SCollection[Mutation])
implicit class SpannerSCollection(@transient private val self: SCollection[Mutation])

Copy link
Contributor

@regadas regadas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good 👍 There are some suggestions.

@codecov
Copy link

codecov bot commented Oct 30, 2018

Codecov Report

Merging #1491 into master will increase coverage by 0.07%.
The diff coverage is 88.37%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1491      +/-   ##
==========================================
+ Coverage   79.05%   79.12%   +0.07%     
==========================================
  Files         170      173       +3     
  Lines        5208     5251      +43     
  Branches      415      411       -4     
==========================================
+ Hits         4117     4155      +38     
- Misses       1091     1096       +5
Impacted Files Coverage Δ
...com/spotify/scio/coders/instances/JavaCoders.scala 85.71% <100%> (+0.86%) ⬆️
.../main/scala/com/spotify/scio/spanner/package.scala 100% <100%> (ø)
...cala/com/spotify/scio/spanner/client/Spanner.scala 100% <100%> (ø)
...ain/scala/com/spotify/scio/spanner/SpannerIO.scala 81.48% <81.48%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bef6571...e906126. Read the comment docs.

@clairemcginty clairemcginty merged commit b012c4f into master Oct 30, 2018
regadas pushed a commit that referenced this pull request Oct 30, 2018
* spanner IO first pass

* clean up

* add to root project

* make scala 2.11 happy

* remove redundant implicits

* address comments

* SpannerRead returns EmptyTap

* refactor integration test
@regadas regadas deleted the claire/spanner_io branch October 30, 2018 19:44
clairemcginty added a commit that referenced this pull request Oct 31, 2018
* spanner IO first pass

* clean up

* add to root project

* make scala 2.11 happy

* remove redundant implicits

* address comments

* SpannerRead returns EmptyTap

* refactor integration test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants