Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unique tuple tags #4711

Merged
merged 12 commits into from
Feb 17, 2023
Merged

Unique tuple tags #4711

merged 12 commits into from
Feb 17, 2023

Conversation

farzad-sedghi
Copy link
Contributor

Copy link
Contributor

@clairemcginty clairemcginty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check should also be added here?

HashSet<String> inputNames = new HashSet<>();
inputs.stream()
.forEach(i -> {
if (!inputNames.add(i.getTupleTag().getId())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would just change this to:

assert(inputNames.add(i.getTupleTag().getId())

And not add IllegalArgumentException to these signatures

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh ok I can do that. Can you please say why that is preferred?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! It's just a small enough edge case that I don't think it warrants changing a user-facing API signature for. Additionally it's the convention we're using elsewhere in scio-smb (check out at the constructors for all BucketMetadata implementations, which all use asserts)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm ok I see the signature change is a good reason. But as far as I know the main difference between assert and throwing an exception is the ability to turn assert off in prod (it is a development feature). Isn't that why we use it in other places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general it is easy to know where assert should be used and where it should not. e.g. when the input is set by user, you should NOT use assertion. In this case "the user" is the developer which chooses the name of the tuple tags? But that is at development time from their POV, while they are using our prod version of lib. confusing!
please let me know wyt :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also this is a runtime exception so no need to have it in the signature. Now the signature is unchanged

@@ -110,12 +128,14 @@ private CoGbkWithSecondaryBuilder(Class<K1> primaryKeyClass, Class<K2> secondary
* Returns a new {@link CoGbkWithSecondary} with the given first sorted-bucket source in {@link
* Read}.
*/
public CoGbkWithSecondary<K1, K2> of(Read<?> read) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this method got deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My IDE made a lot of white space changes, I made a bunch of mistakes fixing them back :)

@codecov
Copy link

codecov bot commented Feb 14, 2023

Codecov Report

Merging #4711 (08a8ac2) into main (1c30980) will decrease coverage by 0.02%.
The diff coverage is 33.33%.

❗ Current head 08a8ac2 differs from pull request most recent head dae4316. Consider uploading reports for the commit dae4316 to get more accurate results

@@            Coverage Diff             @@
##             main    #4711      +/-   ##
==========================================
- Coverage   60.94%   60.93%   -0.02%     
==========================================
  Files         286      286              
  Lines       10467    10479      +12     
  Branches      750      755       +5     
==========================================
+ Hits         6379     6385       +6     
- Misses       4088     4094       +6     
Impacted Files Coverage Δ
...n/scala/com/spotify/scio/bigquery/BigQueryIO.scala 39.13% <0.00%> (-0.50%) ⬇️
...n/scala/com/spotify/scio/bigtable/BigTableIO.scala 20.00% <0.00%> (-0.52%) ⬇️
...scala/com/spotify/scio/datastore/DatastoreIO.scala 16.66% <0.00%> (+4.16%) ⬆️
.../main/scala/com/spotify/scio/pubsub/PubsubIO.scala 9.87% <0.00%> (-0.52%) ⬇️
...ain/scala/com/spotify/scio/spanner/SpannerIO.scala 23.07% <0.00%> (-0.93%) ⬇️
.../src/main/scala/com/spotify/scio/jdbc/JdbcIO.scala 27.50% <0.00%> (ø)
.../spotify/scio/jdbc/sharded/JdbcShardedSelect.scala 16.66% <0.00%> (ø)
.../smb/syntax/SortMergeBucketScioContextSyntax.scala 18.32% <66.66%> (ø)
.../src/main/scala/com/spotify/scio/avro/AvroIO.scala 93.00% <100.00%> (+0.14%) ⬆️
...com/spotify/scio/coders/instances/AvroCoders.scala 82.35% <100.00%> (+2.35%) ⬆️
... and 6 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@@ -69,7 +69,7 @@ final class SortedBucketScioContext(@transient private val self: ScioContext) ex
rhs: SortedBucketIO.Read[R],
targetParallelism: TargetParallelism = TargetParallelism.auto()
): SCollection[(K, (L, R))] = {
val t = SortedBucketIO.read(keyClass).of(lhs).and(rhs).withTargetParallelism(targetParallelism)
val t = SortedBucketIO.read(keyClass).of(lhs, rhs).withTargetParallelism(targetParallelism)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you move from approach with and? To easier catch all TupleTags in one method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was the convention used in the generated smb multi join file. Also it causes a bit less object overhead, not that it matters much

@farzad-sedghi farzad-sedghi force-pushed the unique-tuple-tags branch 3 times, most recently from 132a5fb to 4b31c87 Compare February 15, 2023 20:44
@farzad-sedghi farzad-sedghi force-pushed the unique-tuple-tags branch 3 times, most recently from 1ce100a to ae8bb67 Compare February 16, 2023 06:06
Copy link
Contributor

@clairemcginty clairemcginty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@farzad-sedghi farzad-sedghi merged commit 19e3b2d into spotify:main Feb 17, 2023
farzad-sedghi added a commit to farzad-sedghi/scio that referenced this pull request Mar 6, 2023
* check tuple tag id uniqueness for smb cogroup

* test cases

* update documentations
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants