Add counter for SMB Predicate filtering #5221

Merged: 7 commits, Feb 1, 2024

clairemcginty
Contributor

Screenshot 2024-01-31 at 3 44 14 PM

@@ -178,7 +178,7 @@ object SortMergeBucketJoinExample {
     sc.sortMergeJoin(
       classOf[Integer],
       ParquetAvroSortedBucketIO
-        .read(new TupleTag[GenericRecord](), SortMergeBucketExample.UserDataSchema)
+        .read(new TupleTag[GenericRecord]("users"), SortMergeBucketExample.UserDataSchema)

clairemcginty (author) commented:

Unfortunately, if you don't name your TupleTag, it gets a randomly generated ID, and you end up with a counter called SortedBucketSource{1}-PredicateFilteredRecordsCount_com.spotify.scio.examples.extra.SortMergeBucketJoinExample$.pipeline:181#a43f085462b77df0 😬

Another option would be to use the TypeDescriptor of the TupleTag (in this case GenericRecord, so you'd end up with SortedBucketSource{1}-PredicateFilteredRecordsCount_GenericRecord), but I think that's worse, since two sources might have the same parameterized type.

clairemcginty (author) commented:

That said, as far as I know, it's common practice to name the TupleTag used in an SMB read, so this shouldn't be much of an issue.
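
To make the naming trade-off discussed above concrete, here is a minimal, dependency-free Scala sketch. It is not the implementation from this PR: `counterName` is a hypothetical helper, the name format is only inferred from the counter name quoted in the comment, the `SortedBucketSource{1}` prefix is treated as an opaque label, and the `"accounts"` tag is made up for illustration.

```scala
// Hypothetical sketch: not the actual scio/Beam SMB implementation.
object CounterNameSketch {
  // Assumed format, inferred from the counter name quoted in the review comment.
  def counterName(sourceLabel: String, tagId: String): String =
    s"$sourceLabel-PredicateFilteredRecordsCount_$tagId"

  def main(args: Array[String]): Unit = {
    val label = "SortedBucketSource{1}" // opaque prefix, copied from the comment

    // Explicitly named tags ("accounts" is an invented second source)
    // give one readable, distinct counter per source.
    val byTagId = List("users", "accounts").map(counterName(label, _))

    // Keying by element type instead collapses two sources that read
    // the same type into a single counter name.
    val byType = List("GenericRecord", "GenericRecord").map(counterName(label, _))

    byTagId.foreach(println)
    println(s"distinct names by tag id: ${byTagId.distinct.size}")
    println(s"distinct names by type:   ${byType.distinct.size}")
  }
}
```

Under these assumptions, the two named tags produce two distinct counter names, while the type-based variant yields only one, which is the collision the comment warns about.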

codecov bot commented Jan 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (0ca894e) 62.63% compared to head (564e231) 62.65%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5221      +/-   ##
==========================================
+ Coverage   62.63%   62.65%   +0.01%     
==========================================
  Files         301      301              
  Lines       10845    10845              
  Branches      768      768              
==========================================
+ Hits         6793     6795       +2     
+ Misses       4052     4050       -2     


@RustedBones RustedBones merged commit c35e667 into main Feb 1, 2024
11 checks passed
@RustedBones RustedBones deleted the smb-predicate-counter branch February 1, 2024 09:09