-
Notifications
You must be signed in to change notification settings - Fork 513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Magnolify API #5286
Magnolify API #5286
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## v0.15.x #5286 +/- ##
===========================================
- Coverage 62.63% 62.55% -0.09%
===========================================
Files 301 302 +1
Lines 10842 11013 +171
Branches 748 746 -2
===========================================
+ Hits 6791 6889 +98
- Misses 4051 4124 +73 ☔ View full report in Codecov by Sentry. |
scio-avro/src/main/scala/com/spotify/scio/avro/ProtobufIO.scala
Outdated
Show resolved
Hide resolved
data | ||
.map(t => pt.to(t)) | ||
.write(underlying)(params.copy(metadata = metadata)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be wrapped in a transform too ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately when calling write
on an underlying IO, the returned tap obscures the contained steps. Specifically, this doesn't work:
data.transform { scoll =>
scoll
.map(t => pt.to(t))
.write(underlying)(params.copy(metadata = metadata))
}
You can use writeWithContext
directly ala:
underlying.writeWithContext(
data.map(t => pt.to(t)),
params.copy(metadata = metadata)
}
but this doesn't solve our issue as there is a single map
step here so we wouldn't be combining any steps inside a transform (data.transform(_.map(t=>pt.to(t)))
== data.map(t=>pt.to(t))
) and any steps in the underlying transform are chained afterwards.
For these cases we'd need some way to capture or rewrite previous steps in the graph.
scio-avro/src/main/scala/com/spotify/scio/avro/syntax/ScioContextSyntax.scala
Outdated
Show resolved
Hide resolved
scio-examples/src/main/scala/com/spotify/scio/examples/extra/MagnolifyAvroExample.scala
Show resolved
Hide resolved
...oud-platform/src/main/scala/com/spotify/scio/bigquery/dynamic/syntax/SCollectionSyntax.scala
Outdated
Show resolved
Hide resolved
...-google-cloud-platform/src/main/scala/com/spotify/scio/bigquery/syntax/MagnolifySyntax.scala
Outdated
Show resolved
Hide resolved
scio-tensorflow/src/main/scala/com/spotify/scio/tensorflow/syntax/SCollectionSyntax.scala
Outdated
Show resolved
Hide resolved
scio-avro/src/main/scala/com/spotify/scio/avro/syntax/SCollectionSyntax.scala
Outdated
Show resolved
Hide resolved
scio-avro/src/main/scala/com/spotify/scio/avro/AvroTypedIO.scala
Outdated
Show resolved
Hide resolved
scio-avro/src/main/scala/com/spotify/scio/avro/AvroTypedIO.scala
Outdated
Show resolved
Hide resolved
scio-avro/src/main/scala/com/spotify/scio/avro/ProtobufIO.scala
Outdated
Show resolved
Hide resolved
* Read avro data from `path` as `GenericRecord` and convert to `T` via the implicitly-available | ||
* `magnolify.avro.AvroType[T]` | ||
*/ | ||
def typedAvroFileMagnolify[T: MagnolifyAvroType: Coder]( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO this was a mistake to name that typedAvro...
. This should have been avroTyped...
.
This improves developer experience when APIs are grouped by prefix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I've always found the typed
prefix annoying. We could scalafix it I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the question is if we want that change for 0.15? We can discuss this later anyway. I'd better get this change in another PR
scio-google-cloud-platform/src/main/scala/com/spotify/scio/bigquery/BigQueryIO.scala
Outdated
Show resolved
Hide resolved
scio-google-cloud-platform/src/main/scala/com/spotify/scio/bigquery/BigQueryIO.scala
Outdated
Show resolved
Hide resolved
scio-google-cloud-platform/src/main/scala/com/spotify/scio/bigquery/BigQueryIO.scala
Outdated
Show resolved
Hide resolved
scio-google-cloud-platform/src/main/scala/com/spotify/scio/datastore/DatastoreIO.scala
Outdated
Show resolved
Hide resolved
scio-google-cloud-platform/src/main/scala/com/spotify/scio/datastore/DatastoreIO.scala
Outdated
Show resolved
Hide resolved
scio-avro/src/main/scala/com/spotify/scio/avro/ProtobufIO.scala
Outdated
Show resolved
Hide resolved
scio-avro/src/main/scala/com/spotify/scio/avro/ProtobufIO.scala
Outdated
Show resolved
Hide resolved
scio-avro/src/main/scala/com/spotify/scio/avro/ProtobufIO.scala
Outdated
Show resolved
Hide resolved
scio-avro/src/main/scala/com/spotify/scio/avro/syntax/SCollectionSyntax.scala
Outdated
Show resolved
Hide resolved
} | ||
} | ||
|
||
final class TypedMagnolifyAvroSCollectionOps[T](private val self: SCollection[T]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make T: AvroMagnolifyType
part of TypedMagnolifyAvroSCollectionOps
's context bound so that all methods have access to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vouched for the opposite in another comment :)
Reason: I'd rather have the method available to the IDE completion suggestion, even if there is no implicit found. This is also consistent with the coder behavior, required on the method rather that the implicit extensions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense! I'm in favor of making APIs more IDE-friendly 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: since all Scio IOs are public, we should create scalafix migration rules to change to the new names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work.
Deprecation of the BQ api makes sense on the AvroType
and toTable
.
BQ from*
does not really have an equivalent. I feel the redirection to magnolify can be confusing but I don't have better alternative.
successfulInsertsPropagation: Boolean, | ||
extendedErrorInfo: Boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we create overload ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ambivalent to/slightly against providing every possible override combination, so did not do that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with that
Adds magnolify-based 'typed' APIs.
This is a kind of brute-force change that adds several ugly but easily renamed APIs while not conflicting with the existing macro-based APIs, which we hope to deprecate in a future version. I attempted some advanced type gymnastics to allow better naming, but Miles Sabin I am not.
Deprecates
AvroType
andBigQueryType
, breaks binary compatibility.Avro
sc.typedAvroFileMagnolify
, avoiding conflicts withavroFile
and macro-basedtypedAvroFile
. Alternative:typedAvro
(noFile
)coll.saveAsAvroFile
Bigtable
typedBigtable
saveAsBigtable
BigQuery
sc.typedBigQuerySelect
,typedBigQueryTable
;Select
andTable
to avoid conflicts with the macro read variantstypedBigQueryStorageMagnolify
, avoiding conflicts withbigQueryStorage
and macro-basedtypedBigQueryStorage
saveAsBigQueryTable
, avoiding macro-basedsaveAsTypedBigQueryTable
. OthersaveAsBigQueryTable
variants are scoped toTableRow
andGenericRecord
.Datastore
typedDatastore
saveAsDatastore
Protobuf
typedProtobufFile
saveAsProtobufFile
Tensorflow
typedTfRecordFile
saveAsTfRecordFile