Add *withAttributes Pubsub I/O #546
val elementCoder = pipeline.getCoderRegistry.getScalaCoder[T]
val outputCoder = pipeline.getCoderRegistry.getScalaCoder[(T, Map[String, String])]
val parseFn = Functions.simpleFn { msg: PubsubMessage =>
  val element = CoderUtils.decodeFromByteArray(elementCoder, msg.getMessage)
This can throw a `CoderException`. It looks like in some places Beam catches this and rethrows it as a `RuntimeException`. Not sure what we should do here.
It only throws an exception for a corrupt message, right? In that case there's not much we can do, and it should be fine to let it escalate.
Yep, only for corrupt messages.
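The two options discussed above (let the decode failure escalate, or catch it and rethrow it unchecked) can be sketched without Beam. This is only an illustration: `decode` is a hypothetical stand-in for `CoderUtils.decodeFromByteArray`, and it relies on `CoderException` being a subclass of `IOException`, so a plain `IOException` plays its role here.

```scala
import java.io.{ByteArrayInputStream, DataInputStream, IOException}

object DecodeSketch {
  // Hypothetical stand-in for CoderUtils.decodeFromByteArray: reads one
  // length-prefixed UTF string and throws IOException on corrupt input.
  def decode(bytes: Array[Byte]): String = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    try in.readUTF() finally in.close()
  }

  // Option A: let the exception escalate unchanged.
  def parseEscalating(bytes: Array[Byte]): String = decode(bytes)

  // Option B: catch and rethrow as an unchecked exception, keeping the cause.
  def parseWrapping(bytes: Array[Byte]): String =
    try decode(bytes)
    catch {
      case e: IOException =>
        throw new RuntimeException(s"Failed to decode ${bytes.length}-byte payload", e)
    }
}
```

Either way a corrupt message fails the element; option B only changes the exception type and adds context, which is consistent with the conclusion here that letting it escalate is acceptable.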
val elementCoder = pipeline.getCoderRegistry.getScalaCoder[T]
val outputCoder = pipeline.getCoderRegistry.getScalaCoder[(T, Map[String, String])]
val parseFn = Functions.simpleFn { msg: PubsubMessage =>
  val element = CoderUtils.decodeFromByteArray(elementCoder, msg.getMessage)
Same as above re: exceptions.
How should I add tests to this? Just reuse
val outputCoder = pipeline.getCoderRegistry.getScalaCoder[(T, Map[String, String])]
val parseFn = Functions.simpleFn { msg: PubsubMessage =>
  val element = CoderUtils.decodeFromByteArray(elementCoder, msg.getMessage)
  val attributes = msg.getAttributeMap.asScala.toMap
You can use a private class that extends `Map[String, String]` to do the lazy wrapping. It's fine since the underlying Java `Map` is never exposed.
Isn't this a little bit too much for Pub/Sub attributes? I mean, if someone is using these methods specifically, they do care about getting the attributes. Seems like an unnecessary optimization here?
They do care about getting them, but modifying that map is probably unlikely, so delaying the conversion makes sense.
Seems pretty common to wrap `j.u.Map` as an immutable `Map` instead of the mutable one from `asScala`. We can move those from `SideInput.scala` to a util file and reuse them.
The wrapper in `SideInput` also converts `JIterable` to `Iterable`, so we would have to specialize for that anyways, no?
No, that's specific to the `IterableSideInput` case. But it wouldn't hurt to pull it into a util file like the regular `Map[K, V]` wrapper.
Since `parseFn` is a `SimpleFunction<T, U>`, which is invariant in `U`, it would have to leak the wrapper. Is that okay, or is there some way to get around that?
What do you mean by leak? `parseFn` and its signature don't leave this method, right? Can you provide a snippet?
The output type of `parseFn` is what's used as the output type of the read transform:
public Read<T> withAttributes(SimpleFunction<PubsubMessage, T> parseFn)
and thus the element type of the returned `SCollection`. Would casting the wrapper to an instance of `Map[String, String]` be the way to go?
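A minimal sketch of the lazy wrapping discussed in this thread, and of why the wrapper does not have to leak. It is not the code from this PR: `LazyJMapWrapper` and `Wrapper` are hypothetical names, and it assumes Scala 2.13+ collections (`updated`/`removed`, `scala.jdk.CollectionConverters`). On the 2.11/2.12 versions current at the time, the abstract members to implement would be `+` and `-` instead. Because `of` is declared to return `Map[K, V]`, the static type seen by callers (and by an invariant `SimpleFunction`) is the plain Scala `Map`, so no explicit cast is needed.

```scala
import java.util.{Map => JMap}
import scala.collection.immutable.AbstractMap
import scala.jdk.CollectionConverters._

object LazyJMapWrapper {
  // The declared return type hides the private class from callers.
  def of[K, V](self: JMap[K, V]): Map[K, V] = new Wrapper(self)

  private final class Wrapper[K, V](self: JMap[K, V]) extends AbstractMap[K, V] {
    // Copied into an immutable Scala Map only if a modification is requested.
    private lazy val copied: Map[K, V] = self.asScala.toMap

    // Reads go straight to the underlying Java map, with no conversion.
    override def get(key: K): Option[V] =
      if (self.containsKey(key)) Some(self.get(key)) else None

    override def iterator: Iterator[(K, V)] =
      self.entrySet().iterator().asScala.map(e => (e.getKey, e.getValue))

    override def size: Int = self.size()

    // Modifications fall back to the lazily built immutable copy.
    override def updated[V1 >: V](key: K, value: V1): Map[K, V1] =
      copied.updated(key, value)

    override def removed(key: K): Map[K, V] = copied.removed(key)
  }
}
```

The wrapper is only safe if, as noted above, the underlying Java map is never exposed or mutated after wrapping.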
 * @group input
 */
def pubsubSubscriptionWithAttributes[T: ClassTag](sub: String,
idLabel: String = null,
Nitpick: align arguments?
} else {
  val elementCoder = pipeline.getCoderRegistry.getScalaCoder[T]
  val outputCoder = pipeline.getCoderRegistry.getScalaCoder[(T, Map[String, String])]
  val parseFn = Functions.simpleFn { msg: PubsubMessage =>
`elementCoder` & `parseFn` are identical in `pubsubSubscriptionWithAttributes` and `pubsubTopicWithAttributes`; put them in a helper method? You can even put almost everything in a helper method and parameterize only the `gio.PubsubIO.read().subscription(topic)` vs `gio.PubsubIO.read().topic(topic)` part.
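The refactoring suggested here could look roughly like the sketch below. Everything in it is a stand-in: `Read` is a hypothetical case class used in place of the Beam read builder (it is not the `gio.PubsubIO` API), and `timestampLabel` is an assumed parameter name. The point is only the shape: one private helper holds the shared logic, and each public method passes in the single step that differs.

```scala
object PubsubReadSketch {
  // Hypothetical stand-in for the Beam read builder; not the real API.
  final case class Read(
    source: String = "",
    idLabel: Option[String] = None,
    timestampLabel: Option[String] = None)

  // Shared logic lives here; only `setSource` differs between callers.
  private def read(idLabel: String, timestampLabel: String)(setSource: Read => Read): Read =
    setSource(Read()).copy(
      idLabel = Option(idLabel),               // null becomes None
      timestampLabel = Option(timestampLabel))

  def fromSubscription(sub: String,
                       idLabel: String = null,
                       timestampLabel: String = null): Read =
    read(idLabel, timestampLabel)(_.copy(source = s"subscription:$sub"))

  def fromTopic(topic: String,
                idLabel: String = null,
                timestampLabel: String = null): Read =
    read(idLabel, timestampLabel)(_.copy(source = s"topic:$topic"))
}
```

In the real methods, the coder lookup and `parseFn` construction would sit in the helper alongside the label handling.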
 * Save this SCollection as a Pub/Sub topic using the given map as message attributes.
 * @group output
 */
def saveAsPubsubWithAttributes[V: ClassTag](topic: String)
I guess you want `idLabel` and `timestamp` args here too?
import scala.collection.JavaConverters._

private[scio] object JMapWrapper {
  def ofMultiMap[A, B](self: JMap[A, JIterable[B]]): Map[A, Iterable[B]] =
`ofJIterable`? 🤷‍♂️
Either way, I don't have a strong opinion. We can keep `ofMultiMap`.
LGTM, but there's some conflict against master. Can you fix it? Thanks.
Somehow squash works. Weird.
Appreciate the work, guys! Would you expect it to be immediately included in an upcoming release? If so, when?
Don't have any immediate plan and we're still waiting on a lot of upstream changes like the Beam release and Scala 2.12 fixes. Can you give the snapshot a try first?
Sure...we can try the snapshot.
Hello, I work with jonopwell and tried to build the current source, it fails with I tried working around that error and hit further errors.
Are you building in IntelliJ? There's a known IntelliJ issue that can be worked around, see #543
I built from the command line, `sbt compile`
They're published to the Sonatype Snapshots repo which you can add to your build.sbt with: resolvers += Resolver.sonatypeRepo("snapshots")
Added this to the wiki FAQ.
There is no unit test for `saveAsPubsubWithAttributes`, and its signature is difficult to read. An example with a unit test would be a good thing.
Filed #590, would be great if you already have something and can submit a PR.
Resolves #535, but by adding new input and output functions, as discussed in #538
Open to renaming. Needs tests.