Support conversion of String type to Utf8 in AvroWrapper #61

Merged: 3 commits, Nov 10, 2020

Conversation

curtiscwang

When reading Avro, objects of type String may be passed in. Since Transport uses Utf8 types for AvroString, this can lead to incompatibility issues.

@@ -43,32 +43,69 @@ private AvroWrapper() {

  public static StdData createStdData(Object avroData, Schema avroSchema) {
    switch (avroSchema.getType()) {
-     case INT:
+     case INT: {
+       if (!(avroData instanceof Integer)) {

Can you extract the shared code into a method? Something like:
private static void validateData(Object data, Type avroType, Class<?>... acceptableClasses) {
  ...
  // throw an exception here if data is not an instance of any of acceptableClasses; you can use isAssignableFrom for the check
}
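
A minimal sketch of what such a helper could look like, assuming nothing about the code that was eventually merged. The class name `AvroTypeChecks`, the `String avroTypeName` parameter (a stand-in for Avro's `Schema.Type`, so the sketch compiles without the Avro dependency), and the message format are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of the reviewer's suggestion; not Transport's actual code.
final class AvroTypeChecks {
  private AvroTypeChecks() {
  }

  /**
   * Throws IllegalArgumentException unless data is an instance of at least one
   * of acceptableClasses. avroTypeName is an assumed stand-in for Schema.Type.
   * Rejecting null is a choice made for this sketch only.
   */
  static void validateData(Object data, String avroTypeName, Class<?>... acceptableClasses) {
    if (data != null) {
      for (Class<?> acceptable : acceptableClasses) {
        // isAssignableFrom also accepts subclasses and interface implementations
        if (acceptable.isAssignableFrom(data.getClass())) {
          return;
        }
      }
    }
    String actual = (data == null) ? "null" : data.getClass().getName();
    throw new IllegalArgumentException(
        "Unsupported type for Avro " + avroTypeName + ": " + actual
            + " (expected one of " + Arrays.toString(acceptableClasses) + ")");
  }

  public static void main(String[] args) {
    validateData(42, "int", Integer.class);              // accepted, returns normally
    validateData("abc", "string", CharSequence.class);   // String implements CharSequence
    try {
      validateData("42", "long", Long.class);            // rejected: a String is not a Long
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With a helper of this shape, each `case` in the switch would shrink to one validation call followed by the cast, instead of repeating the `instanceof` check and the exception message.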

Comment on lines 53 to 55
if (!(avroData instanceof Long)) {
throw new IllegalArgumentException("Unsupported type for Avro long: " + avroData.getClass());
}

Could you please explain why those checks are required? How can we end up here?

It is just to give a better error message than relying on Java's ClassCastException. This can happen when AvroWrapper is used outside of a Transport UDF.

@wmoustafa wmoustafa Nov 9, 2020

Thanks! Could you please show an example call? This method is mostly internal to the framework. Do you expect the user to call it directly, or will it still be called from the framework? To see what I am referring to, see the usages in other engines (Presto, Spark, etc.).

As I mentioned before, we need to use it for the Limestone portable engine. The engine will need to create StdData from the actual Avro record (read from Kafka or HDFS, for example).

I would recommend removing that check to keep the implementations parallel across engines, since the other engines have no such check. To stay parallel, we either remove it here or add it to the others. The reason not to add it to the others is that it would cost performance (an error is thrown either way), and it would not help much, because all createStdData() calls are made from within Transport, which knows exactly what type is expected (so there is no room for user error). I think Limestone is a similar case: it is an engine that knows the contract, so there is no need to check whether it is making the right call. Such checks are better covered by unit tests.

Comment on lines +65 to +68
if (avroData instanceof Utf8) {
return new AvroString((Utf8) avroData);
} else if (avroData instanceof String) {
return new AvroString(new Utf8((String) avroData));

Note that this PR handles this case better. I will make some time soon to merge it :)
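
For readers without the diff at hand, the String-to-Utf8 branch quoted above can be sketched roughly as follows. To keep the example free of the Avro dependency, `Utf8Stub` is a hypothetical stand-in for `org.apache.avro.util.Utf8`, and `StringNormalizer.toUtf8` is an illustrative name; the final `throw` is an assumption modelled on the "Unsupported type" checks quoted earlier in this thread, not something the quoted lines show.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for org.apache.avro.util.Utf8: stores the UTF-8 bytes of a string.
final class Utf8Stub {
  private final byte[] bytes;

  Utf8Stub(String value) {
    this.bytes = value.getBytes(StandardCharsets.UTF_8);
  }

  @Override
  public String toString() {
    return new String(bytes, StandardCharsets.UTF_8);
  }
}

// Illustrative only; mirrors the shape of the instanceof chain quoted in the review.
final class StringNormalizer {
  private StringNormalizer() {
  }

  /** Returns avroData unchanged if it is already a Utf8Stub, and wraps a java.lang.String. */
  static Utf8Stub toUtf8(Object avroData) {
    if (avroData instanceof Utf8Stub) {
      return (Utf8Stub) avroData;
    } else if (avroData instanceof String) {
      return new Utf8Stub((String) avroData);
    }
    // Assumed fallback; the quoted diff lines stop before this point.
    String actual = (avroData == null) ? "null" : avroData.getClass().getName();
    throw new IllegalArgumentException("Unsupported type for Avro string: " + actual);
  }

  public static void main(String[] args) {
    Utf8Stub fromString = toUtf8("hello");           // String is wrapped
    Utf8Stub existing = new Utf8Stub("world");
    System.out.println(fromString);
    System.out.println(toUtf8(existing) == existing); // already-wrapped value is reused
  }
}
```

Normalizing at the boundary like this means the rest of the wrapper only ever sees one string representation, whichever one the Avro reader happened to produce.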

@khaitranq

LGTM

@wmoustafa wmoustafa left a comment

Thanks @curtiscwang for the patch!

@shardulm94 shardulm94 merged commit 7fc04aa into linkedin:master Nov 10, 2020
wmoustafa pushed a commit that referenced this pull request Oct 12, 2021
* Disable Jacoco for platform tests (#37)

* 0.0.46 release (previous 0.0.45) + release notes updated [ci skip]

* Fixed presto UDF patch broken link (#41)

Co-authored-by: Sushant Raikar <sraikar@sraikar-mn2.linkedin.biz>

* FileSystemUtils: remove an unreliable check for unit testing (#42)

Why: we don't need it

What changed: instead of checking configuration for hints on whether we
are unit testing, we trust the URI's protocol

Tests performed: ./gradlew build

* 0.0.47 release (previous 0.0.46) + release notes updated [ci skip]

* Presto: Pass custom configuration object when using FileSystemUtils (#43)

* 0.0.48 release (previous 0.0.47) + release notes updated [ci skip]

* Presto: Make ScalarFunctionImplementation state independent of StdUdfWrapper (#44)

* 0.0.49 release (previous 0.0.48) + release notes updated [ci skip]

* Upgrade to PrestoSQL 333 (#45)

Some major changes:
 - `SqlFunction`, `SqlScalarFunction` and `ScalarFunctionImplementation` have evolved
   in trinodb/trino#1764
 - `Metadata::getScalarFunctionImplementation` evolved in trinodb/trino#1039
 - Type signature parser was moved to presto-main in trinodb/trino#1738

* 0.0.50 release (previous 0.0.49) + release notes updated [ci skip]

* Add support for StdFloat, StdDouble, and StdBinary (#46)

* Introduce StdFloat, StdDouble, and StdBinary interfaces
* Add implementations of those interfaces in Avro, Hive, Presto, Spark, and Generic type systems
* Add examples of transport UDFs on those new types, and add tests for those UDFs
* Update documentation

* 0.0.51 release (previous 0.0.50) + release notes updated [ci skip]

* Allow users to override main and test source set names, output directories

* 0.0.52 release (previous 0.0.51) + release notes updated [ci skip]

* Empty commit to release new version

* Empty commit to release new version [ci skip-compare-publications]

* 0.0.53 release (previous 0.0.52) + release notes updated [ci skip]

* Plugin: Publish Presto thin jar which allows consumers to control dependency graph (#49)

* 0.0.54 release (previous 0.0.53) + release notes updated [ci skip]

* Hive: Struct data should not be converted to object array during StdStruct creation (#50)

* Remove slf4j-log4j12 from Transport dependency graph (#51)

* Bump shipkit (#54)

* 0.0.55 release (previous 0.0.54) + release notes updated [ci skip]

* Fix test SQL generation for binary inputs (#55)

* 0.0.56 release (previous 0.0.55) + release notes updated [ci skip]

* Spark: Create index-based iterator for non-mutable map keySet and values access (#58)

* 0.0.57 release (previous 0.0.56) + release notes updated [ci skip]

* Avro: Support simple union schemas (#60)

* 0.0.58 release (previous 0.0.57) + release notes updated [ci skip]

* Support conversion of String type to Utf8 in AvroWrapper (#61)

* 0.0.59 release (previous 0.0.58) + release notes updated [ci skip]

* Add Avro ENUM read support and fix String bug (#62)

Co-authored-by: Raymond Lam <ralam@linkedin.com>

* 0.0.60 release (previous 0.0.59) + release notes updated [ci skip]

* Build: Fail if there are checkstyle violations (#64)

- Change the Checkstyle severity level from warning to error
- Eliminate all existing checkstyle violations

Co-authored-by: Carl Steinbach <cwsteinbach@gmail.com>

* 0.0.61 release (previous 0.0.60) + release notes updated [ci skip]

* Add travis-build.sh for pre-commit testing from command line (#65)

* Add travis-build.sh for pre-commit testing from command line

* Fix name of build file in comment

Co-authored-by: Carl Steinbach <cwsteinbach@gmail.com>
Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>

* Upgrade to Gradle 6.7 (#67)

* Support builds with platform specific JDK (#69)

* Bump Avro dependency to 1.10.2 (from 1.7.7). (#71)

There doesn't seem to be any impact to the code. gradle build passes.

* Migrate from PrestoSQL to Trino  (#68)

* Automate artifact publication to Maven Central (#72)

* Update ci.yml java version to 8 (#77)

skip release

* Fix org.pentaho:pentaho-aggdesigner-algorithm sunset problem (#78)

* Remove travis build in favor of github actions (#87)

* Add scala_2.11 and scala_2.12 support (#85)

* Update ci.yml to also build the udf-examples folder (#90)

Co-authored-by: Malini Mahalakshmi Venkatachari <malvenkatachari@linkedin.com>

* Fix running multiple builds in run step in workflow action (#92)

Co-authored-by: Malini Mahalakshmi Venkatachari <malvenkatachari@linkedin.com>

* A solution to fix running multiple UDFs in Spark issue (#93)

Co-authored-by: Kai Xu <kxu@kxu-mn3.linkedin.biz>

Co-authored-by: Shardul Mahadik <smahadik@linkedin.com>
Co-authored-by: shipkit-org <shipkit.org@gmail.com>
Co-authored-by: Sushant Raikar <sraikar@wish.com>
Co-authored-by: Sushant Raikar <sraikar@sraikar-mn2.linkedin.biz>
Co-authored-by: Suren Nihalani <1093911+SurenNihalani@users.noreply.github.com>
Co-authored-by: Xingyuan Lin <xinlin@linkedin.com>
Co-authored-by: Khai Tran <46727493+khaitranq@users.noreply.github.com>
Co-authored-by: John Joyce <jjoyce6@nd.edu>
Co-authored-by: Raymond <13109642+raymondlam12@users.noreply.github.com>
Co-authored-by: curtiscwang <cwang89@gmail.com>
Co-authored-by: Raymond Lam <ralam@linkedin.com>
Co-authored-by: Carl Steinbach <cwsteinbach+github@gmail.com>
Co-authored-by: Carl Steinbach <cwsteinbach@gmail.com>
Co-authored-by: Akshay Rai <akrai@linkedin.com>
Co-authored-by: Sreeram Ramachandran <sramachandran@linkedin.com>
Co-authored-by: Raymond Zhang <razhang@linkedin.com>
Co-authored-by: KAI XU <kxu@linkedin.com>
Co-authored-by: Sushant Raikar <sraikar@linkedin.com>
Co-authored-by: Malini Mahalakshmi Venkatachari <malvenkatachari@linkedin.com>
Co-authored-by: Kai Xu <kxu@kxu-mn3.linkedin.biz>