Unique tuple tags #4711
Should the check also be added here?
scio/scio-smb/src/main/java/org/apache/beam/sdk/extensions/smb/SortedBucketIO.java
Line 107 in aa81b54
return new CoGbk<>(
HashSet<String> inputNames = new HashSet<>();
inputs.stream()
    .forEach(i -> {
      if (!inputNames.add(i.getTupleTag().getId())) {
I think I would just change this to:
assert(inputNames.add(i.getTupleTag().getId()));
and not add IllegalArgumentException to these signatures.
oh ok I can do that. Can you please say why that is preferred?
Sure! It's just a small enough edge case that I don't think it warrants changing a user-facing API signature. Additionally, it's the convention we're using elsewhere in scio-smb (check out the constructors for all BucketMetadata implementations, which all use asserts).
hmm ok I see the signature change is a good reason. But as far as I know the main difference between assert and throwing an exception is the ability to turn assert off in prod (it is a development feature). Isn't that why we use it in other places?
In general it is easy to know where assert should be used and where it should not: e.g. when the input is set by the user, you should NOT use an assertion. In this case "the user" is the developer who chooses the names of the tuple tags? But that is development time from their POV, while they are using our prod version of the lib. Confusing!
Please let me know what you think :)
also this is a runtime exception so no need to have it in the signature. Now the signature is unchanged
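To make the trade-off in this thread concrete, here is a minimal sketch of the two styles of duplicate check. It is not the scio-smb code: the class name TagUniqueness, both method names, and the error messages are hypothetical, and plain strings stand in for tuple tag ids. It relies on two Java facts raised above: IllegalArgumentException is unchecked, so it needs no throws clause, and assert statements are only evaluated when the JVM runs with assertions enabled (-ea).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; names are not taken from scio-smb.
public class TagUniqueness {

  // Unchecked exception: always enforced, and no `throws` clause is
  // required, so the method signature stays unchanged.
  public static void checkWithException(List<String> tagIds) {
    Set<String> seen = new HashSet<>();
    for (String id : tagIds) {
      if (!seen.add(id)) {
        throw new IllegalArgumentException("duplicate tuple tag id: " + id);
      }
    }
  }

  // Assertion: the expression is only evaluated when assertions are enabled.
  // With assertions disabled, seen.add is never called and nothing is checked.
  public static void checkWithAssert(List<String> tagIds) {
    Set<String> seen = new HashSet<>();
    for (String id : tagIds) {
      assert seen.add(id) : "duplicate tuple tag id: " + id;
    }
  }

  public static void main(String[] args) {
    try {
      checkWithException(Arrays.asList("lhs", "rhs", "lhs"));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Whether checkWithAssert catches the duplicate depends on how the caller launches the JVM, which is the concern raised about library users running with assertions off.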
@@ -110,12 +128,14 @@ private CoGbkWithSecondaryBuilder(Class<K1> primaryKeyClass, Class<K2> secondary
 * Returns a new {@link CoGbkWithSecondary} with the given first sorted-bucket source in {@link
 * Read}.
 */
public CoGbkWithSecondary<K1, K2> of(Read<?> read) {
Looks like this method got deleted?
My IDE made a lot of whitespace changes, and I made a bunch of mistakes reverting them :)
aa81b54 to 3ae0da3
Codecov Report
@@            Coverage Diff             @@
##             main    #4711      +/-   ##
==========================================
- Coverage   60.94%   60.93%   -0.02%
==========================================
  Files         286      286
  Lines       10467    10479      +12
  Branches      750      755       +5
==========================================
+ Hits         6379     6385       +6
- Misses       4088     4094       +6
3ae0da3 to efeefcd
@@ -69,7 +69,7 @@ final class SortedBucketScioContext(@transient private val self: ScioContext) ex
     rhs: SortedBucketIO.Read[R],
     targetParallelism: TargetParallelism = TargetParallelism.auto()
   ): SCollection[(K, (L, R))] = {
-    val t = SortedBucketIO.read(keyClass).of(lhs).and(rhs).withTargetParallelism(targetParallelism)
+    val t = SortedBucketIO.read(keyClass).of(lhs, rhs).withTargetParallelism(targetParallelism)
Why did you move away from the approach with and? To make it easier to catch all TupleTags in one method?
that was the convention used in the generated smb multi join file. Also it causes a bit less object overhead, not that it matters much
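The two call shapes compared here can be sketched roughly as follows. This is a toy model, not the SortedBucketIO API: the class CoGroupSketch is hypothetical, strings stand in for reads, and it assumes that each and call builds a new immutable object, which is one plausible reading of the "object overhead" remark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not the SortedBucketIO builder.
public class CoGroupSketch {
  private final List<String> tagIds;

  private CoGroupSketch(List<String> tagIds) {
    this.tagIds = Collections.unmodifiableList(tagIds);
  }

  // Varargs style: all inputs arrive in one call, so a single method
  // sees every tag id and builds exactly one object.
  public static CoGroupSketch of(String... ids) {
    return new CoGroupSketch(new ArrayList<>(Arrays.asList(ids)));
  }

  // Chaining style (assumed behavior): each call copies the current
  // inputs into a fresh instance, leaving one intermediate per step.
  public CoGroupSketch and(String id) {
    List<String> next = new ArrayList<>(tagIds);
    next.add(id);
    return new CoGroupSketch(next);
  }

  public List<String> tagIds() {
    return tagIds;
  }

  public static void main(String[] args) {
    CoGroupSketch chained = CoGroupSketch.of("lhs").and("rhs");
    CoGroupSketch direct = CoGroupSketch.of("lhs", "rhs");
    System.out.println(chained.tagIds().equals(direct.tagIds()));
  }
}
```

Both shapes end up with the same inputs; the varargs form simply skips the intermediate instance and gives one place where all tag ids are visible together.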
132a5fb to 4b31c87
4b31c87 to 2b1dcdb
1ce100a to ae8bb67
ae8bb67 to 85243d9
52380d3 to acc4dd4
nice!
* check tuple tag id uniqueness for smb cogroup
* test cases
* update documentations
#4692