Unique tuple tags #4711

farzad-sedghi · 2023-02-14T07:37:25Z

#4692

clairemcginty

Check should also be added here?

scio/scio-smb/src/main/java/org/apache/beam/sdk/extensions/smb/SortedBucketIO.java

Line 107 in aa81b54

return new CoGbk<>(

clairemcginty · 2023-02-14T15:11:35Z

scio-smb/src/main/java/org/apache/beam/sdk/extensions/smb/SortedBucketIO.java

+    HashSet<String> inputNames = new HashSet<>();
+    inputs.stream()
+        .forEach(i -> {
+              if (!inputNames.add(i.getTupleTag().getId())) {


I think I would just change this to:

assert(inputNames.add(i.getTupleTag().getId()));

And not add IllegalArgumentException to these signatures

oh ok I can do that. Can you please say why that is preferred?

Sure! It's just a small enough edge case that I don't think it warrants changing a user-facing API signature. Additionally, it's the convention we're using elsewhere in scio-smb (check out the constructors for all BucketMetadata implementations, which all use asserts).

hmm ok I see the signature change is a good reason. But as far as I know the main difference between assert and throwing an exception is the ability to turn assert off in prod (it is a development feature). Isn't that why we use it in other places?

In general it is easy to know where assert should be used and where it should not: e.g. when the input is set by the user, you should NOT use an assertion. In this case "the user" is the developer who chooses the names of the tuple tags? But that happens at development time from their POV, while they are using the prod version of our lib. Confusing!
Please let me know what you think :)

also this is a runtime exception so no need to have it in the signature. Now the signature is unchanged
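
To make the trade-off in this thread concrete, here is a minimal sketch of the two variants. It is not the actual SortedBucketIO code: the class `TupleTagIdCheck` and both method names are hypothetical, and plain strings stand in for `i.getTupleTag().getId()`. It relies only on standard Java behavior: `IllegalArgumentException` is unchecked, so it needs no `throws` clause, and `assert` statements are skipped unless the JVM runs with `-ea`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the check discussed above; not the scio-smb implementation.
final class TupleTagIdCheck {

  // Variant 1: unchecked exception. IllegalArgumentException extends RuntimeException,
  // so the method signature needs no `throws` clause, and the check always runs.
  static void requireUniqueIds(List<String> tupleTagIds) {
    Set<String> seen = new HashSet<>();
    for (String id : tupleTagIds) {
      // Set.add returns false when the element is already present.
      if (!seen.add(id)) {
        throw new IllegalArgumentException("Tuple tag IDs must be unique, duplicate: " + id);
      }
    }
  }

  // Variant 2: assert. The expression is evaluated only when assertions are enabled
  // (java -ea); with the default JVM settings duplicates pass through silently.
  static void assertUniqueIds(List<String> tupleTagIds) {
    Set<String> seen = new HashSet<>();
    for (String id : tupleTagIds) {
      assert seen.add(id) : "duplicate tuple tag ID: " + id;
    }
  }

  public static void main(String[] args) {
    requireUniqueIds(Arrays.asList("lhs", "rhs")); // unique IDs are accepted
    try {
      requireUniqueIds(Arrays.asList("lhs", "lhs"));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Because the second variant depends on a JVM flag, a test for it only fails on duplicates when the test runner enables assertions.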

clairemcginty · 2023-02-14T15:15:07Z

scio-smb/src/main/java/org/apache/beam/sdk/extensions/smb/SortedBucketIO.java

@@ -110,12 +128,14 @@ private CoGbkWithSecondaryBuilder(Class<K1> primaryKeyClass, Class<K2> secondary
     * Returns a new {@link CoGbkWithSecondary} with the given first sorted-bucket source in {@link
     * Read}.
     */
-    public CoGbkWithSecondary<K1, K2> of(Read<?> read) {


Looks like this method got deleted?

My IDE made a lot of white space changes, I made a bunch of mistakes fixing them back :)

codecov · 2023-02-14T17:08:22Z

Codecov Report

Merging #4711 (08a8ac2) into main (1c30980) will decrease coverage by 0.02%.
The diff coverage is 33.33%.

❗ Current head 08a8ac2 differs from pull request most recent head dae4316. Consider uploading reports for the commit dae4316 to get more accurate results

@@            Coverage Diff             @@
##             main    #4711      +/-   ##
==========================================
- Coverage   60.94%   60.93%   -0.02%     
==========================================
  Files         286      286              
  Lines       10467    10479      +12     
  Branches      750      755       +5     
==========================================
+ Hits         6379     6385       +6     
- Misses       4088     4094       +6

Impacted Files	Coverage Δ
...n/scala/com/spotify/scio/bigquery/BigQueryIO.scala	`39.13% <0.00%> (-0.50%)`	⬇️
...n/scala/com/spotify/scio/bigtable/BigTableIO.scala	`20.00% <0.00%> (-0.52%)`	⬇️
...scala/com/spotify/scio/datastore/DatastoreIO.scala	`16.66% <0.00%> (+4.16%)`	⬆️
.../main/scala/com/spotify/scio/pubsub/PubsubIO.scala	`9.87% <0.00%> (-0.52%)`	⬇️
...ain/scala/com/spotify/scio/spanner/SpannerIO.scala	`23.07% <0.00%> (-0.93%)`	⬇️
.../src/main/scala/com/spotify/scio/jdbc/JdbcIO.scala	`27.50% <0.00%> (ø)`
.../spotify/scio/jdbc/sharded/JdbcShardedSelect.scala	`16.66% <0.00%> (ø)`
.../smb/syntax/SortMergeBucketScioContextSyntax.scala	`18.32% <66.66%> (ø)`
.../src/main/scala/com/spotify/scio/avro/AvroIO.scala	`93.00% <100.00%> (+0.14%)`	⬆️
...com/spotify/scio/coders/instances/AvroCoders.scala	`82.35% <100.00%> (+2.35%)`	⬆️
... and 6 more


shnapz · 2023-02-14T22:35:00Z

scio-smb/src/main/scala/com/spotify/scio/smb/syntax/SortMergeBucketScioContextSyntax.scala

@@ -69,7 +69,7 @@ final class SortedBucketScioContext(@transient private val self: ScioContext) ex
    rhs: SortedBucketIO.Read[R],
    targetParallelism: TargetParallelism = TargetParallelism.auto()
  ): SCollection[(K, (L, R))] = {
-    val t = SortedBucketIO.read(keyClass).of(lhs).and(rhs).withTargetParallelism(targetParallelism)
+    val t = SortedBucketIO.read(keyClass).of(lhs, rhs).withTargetParallelism(targetParallelism)


Why did you move away from the approach with .and()? To make it easier to catch all TupleTags in one method?

that was the convention used in the generated smb multi join file. Also it causes a bit less object overhead, not that it matters much
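
A rough sketch of the difference between the two call styles, under stated assumptions: `CoGroupSketch` is a hypothetical, simplified builder that tracks tuple tag IDs as strings, and it assumes an immutable builder in which every chained call allocates a new object. The real `SortedBucketIO` classes may be structured differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simplified builder; the names echo the thread but this is not the scio-smb API.
final class CoGroupSketch {
  private final List<String> tagIds;

  private CoGroupSketch(List<String> tagIds) {
    this.tagIds = Collections.unmodifiableList(tagIds);
  }

  // Varargs style: all inputs arrive in one call, so uniqueness is validated once
  // and a single builder object is allocated.
  static CoGroupSketch of(String first, String... rest) {
    List<String> all = new ArrayList<>();
    all.add(first);
    all.addAll(Arrays.asList(rest));
    requireUnique(all);
    return new CoGroupSketch(all);
  }

  // Chained style: each call copies the list, repeats the check,
  // and allocates another immutable builder.
  CoGroupSketch and(String next) {
    List<String> all = new ArrayList<>(tagIds);
    all.add(next);
    requireUnique(all);
    return new CoGroupSketch(all);
  }

  List<String> tagIds() {
    return tagIds;
  }

  private static void requireUnique(List<String> ids) {
    Set<String> seen = new HashSet<>();
    for (String id : ids) {
      if (!seen.add(id)) {
        throw new IllegalArgumentException("duplicate tuple tag ID: " + id);
      }
    }
  }

  public static void main(String[] args) {
    // Both styles end up with the same inputs; they differ in allocations and checks.
    System.out.println(CoGroupSketch.of("lhs", "rhs").tagIds());
    System.out.println(CoGroupSketch.of("lhs").and("rhs").tagIds());
  }
}
```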

clairemcginty

nice!

* check tuple tag id uniqueness for smb cogroup
* test cases
* update documentations

check tuple tag id uniqueness for smb cogroup

5850e51

clairemcginty reviewed Feb 14, 2023


farzad-sedghi force-pushed the unique-tuple-tags branch from aa81b54 to 3ae0da3 on February 14, 2023 16:48

test cases

efeefcd

farzad-sedghi force-pushed the unique-tuple-tags branch from 3ae0da3 to efeefcd on February 14, 2023 18:42

java format

bdde16b

shnapz reviewed Feb 14, 2023


verify uniqeness using assert

7e648df

farzad-sedghi force-pushed the unique-tuple-tags branch 3 times, most recently from 132a5fb to 4b31c87 on February 15, 2023 20:44

enable java assert

2b1dcdb

farzad-sedghi force-pushed the unique-tuple-tags branch from 4b31c87 to 2b1dcdb on February 15, 2023 21:03

farzad-sedghi added 4 commits February 15, 2023 16:15

testing assertion setting on different jvma

5f71c08

testing exclude v 17 java tests

09fa684

test the jvm flag

650dea6

add back all the changes

c15861b

farzad-sedghi force-pushed the unique-tuple-tags branch 3 times, most recently from 1ce100a to ae8bb67 on February 16, 2023 06:06

enable assertion on sbt command line

85243d9

farzad-sedghi force-pushed the unique-tuple-tags branch from ae8bb67 to 85243d9 on February 16, 2023 06:07

back to throwing exceptions instead of assertions

acc4dd4

farzad-sedghi force-pushed the unique-tuple-tags branch from 52380d3 to acc4dd4 on February 16, 2023 16:12

update documentations

dae4316

clairemcginty approved these changes Feb 17, 2023


farzad-sedghi merged commit 19e3b2d into spotify:main Feb 17, 2023

farzad-sedghi added a commit to farzad-sedghi/scio that referenced this pull request Mar 6, 2023

Unique tuple tags (spotify#4711)

f61d69a

* check tuple tag id uniqueness for smb cogroup
* test cases
* update documentations

clairemcginty mentioned this pull request Mar 23, 2023

TupleTag collides in sortMergeTransform of multiple sorted bucket IO #4692

Closed
