feat: serialize/deserialize custom coders #25

laysakura · 2023-05-12T23:37:19Z


…ache#26153)

* add parallel number to gradle.properties
* remove unused config, attempt to configure higher parallelism for kafka tests
* Create generic task to represent Integration tests. Then, create sub-projects to enable running tests in parallel
* extend test timeout to reduce flakiness
* add comment for running locally
* run spotless
* factor kafka integration tests up a level

* Adds a reference to new Java RunInference example
* Update website/www/site/content/en/documentation/sdks/python-machine-learning.md

Co-authored-by: Danny McCormick <dannymccormick@google.com>

---------

Co-authored-by: Danny McCormick <dannymccormick@google.com>

… file writes. AsList is backed by a multi-map in an attempt to provide proper indexing semantics, but this can be significantly more expensive for small pipelines (especially as it may require fixed sharding and prevent fusion).

- Reduces the use of Box; in particular, each input element no longer has to be boxed.
- Removes one use of unsafe.
- Reduces the use of Any and hides it behind newtypes.
- Introduces a pattern to distinguish between pipeline generation time, which uses generic types, and pipeline run time, which uses dynamic dispatch.
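The last bullet separates two phases: while the pipeline is being built, each transform is checked with generic types; while it runs, type-erased operators are called through dynamic dispatch. The Rust sketch below illustrates one way that split can look, with `Any` confined to a newtype. It is a minimal single-element sketch under that assumption, not the commit's implementation; `Element`, `DynOperator`, `Map`, and `Pipeline` are hypothetical names that do not come from the SDK.

```rust
use std::any::Any;
use std::marker::PhantomData;

/// Hypothetical newtype that keeps `Box<dyn Any>` out of the public API.
pub struct Element(Box<dyn Any>);

impl Element {
    pub fn new<T: Any>(value: T) -> Self {
        Element(Box::new(value))
    }

    /// Recovers the concrete value; returns `None` on a type mismatch.
    pub fn downcast<T: Any>(self) -> Option<T> {
        self.0.downcast::<T>().ok().map(|boxed| *boxed)
    }
}

/// Run time (hypothetical): operators are type-erased and invoked via dynamic dispatch.
pub trait DynOperator {
    fn process(&self, input: Element) -> Element;
}

/// A typed map step whose input and output types are fixed at generation time.
struct Map<I, O, F> {
    f: F,
    _types: PhantomData<fn(I) -> O>,
}

impl<I: Any, O: Any, F: Fn(I) -> O> DynOperator for Map<I, O, F> {
    fn process(&self, input: Element) -> Element {
        // The generic builder guarantees the input type, so a mismatch would be a bug.
        let value = input
            .downcast::<I>()
            .expect("input type checked at generation time");
        Element::new((self.f)(value))
    }
}

/// Generation time (hypothetical): `I` is the pipeline input type, `T` the current output type.
pub struct Pipeline<I, T> {
    ops: Vec<Box<dyn DynOperator>>,
    _types: PhantomData<fn(I) -> T>,
}

impl<I: Any> Pipeline<I, I> {
    pub fn new() -> Self {
        Pipeline { ops: Vec::new(), _types: PhantomData }
    }
}

impl<I: Any, T: Any> Pipeline<I, T> {
    /// Appends a step; the compiler checks that `f` accepts the current type `T`.
    pub fn map<O: Any, F: Fn(T) -> O + 'static>(mut self, f: F) -> Pipeline<I, O> {
        self.ops.push(Box::new(Map { f, _types: PhantomData::<fn(T) -> O> }));
        Pipeline { ops: self.ops, _types: PhantomData }
    }

    /// Runs every erased operator in order, then restores the static output type.
    pub fn run(&self, input: I) -> T {
        let mut element = Element::new(input);
        for op in &self.ops {
            element = op.process(element);
        }
        element
            .downcast::<T>()
            .expect("output type fixed at generation time")
    }
}
```

For instance, `Pipeline::<i32, i32>::new().map(|x: i32| x * 2).run(21)` returns `42`. Because a step with the wrong input type is rejected by the compiler when the pipeline is built, the `expect` calls on the run-time path should be unreachable; the cost of erasure is one box per element per step, which a real SDK may avoid differently.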

…Unique Keys cases.

Flush all key-values once the map size reaches 12K entries, and keep using the LRU cache.

Micro-benchmark result: 5x to 6x improvement for the uniqueKeys cases of PrecombineGroupingTableBenchmark, minor impact on the other distributions.
End-to-end WordCount pipeline: approx. 25% improvement.

Benchmark: PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine, Mode = thrpt, Cnt = 15, Score ± Error in ops/s.

| (distribution) | (globallyWindowed) | Before | After |
|---|---|---|---|
| uniform | true | 52.584 ± 5.631 | 52.442 ± 1.937 |
| uniform | false | 48.427 ± 4.090 | 44.824 ± 3.504 |
| normal | true | 36.470 ± 3.498 | 33.719 ± 2.688 |
| normal | false | 35.610 ± 0.940 | 30.081 ± 1.278 |
| hotKey | true | 55.111 ± 2.996 | 51.839 ± 3.127 |
| hotKey | false | 49.423 ± 2.859 | 46.264 ± 1.691 |
| uniqueKeys | true | 5.319 ± 0.655 | 32.422 ± 1.269 |
| uniqueKeys | false | 5.094 ± 0.337 | 29.210 ± 1.757 |
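The commit message above changes the pre-combining table to flush every buffered key-value pair once the table reaches its size limit (12K entries in the commit). The Rust sketch below isolates that flush-all rule, using a plain integer sum as the combine function. `PrecombineTable`, `put`, and `flush` are hypothetical names; this is not Beam's Java `PrecombineGroupingTable`, and the LRU cache the commit keeps using is deliberately left out.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical pre-combining table: sums values per key and, once `max_entries`
/// distinct keys are buffered, flushes *all* of them downstream at once.
/// (The LRU cache mentioned in the commit is intentionally omitted.)
pub struct PrecombineTable<K> {
    max_entries: usize,
    accumulators: HashMap<K, i64>,
}

impl<K: Eq + Hash> PrecombineTable<K> {
    pub fn new(max_entries: usize) -> Self {
        PrecombineTable { max_entries, accumulators: HashMap::new() }
    }

    /// Combines `value` into the accumulator for `key`. Returns the flushed
    /// (key, partial sum) pairs if the size limit was reached, else an empty Vec.
    pub fn put(&mut self, key: K, value: i64) -> Vec<(K, i64)> {
        *self.accumulators.entry(key).or_insert(0) += value;
        if self.accumulators.len() >= self.max_entries {
            self.flush()
        } else {
            Vec::new()
        }
    }

    /// Emits every buffered partial result (in arbitrary order) and empties the
    /// table, e.g. at the end of a bundle.
    pub fn flush(&mut self) -> Vec<(K, i64)> {
        self.accumulators.drain().collect()
    }
}
```

One plausible reading of the uniqueKeys speedup, not stated in the commit: when no key repeats, pre-combining cannot shrink the data, so per-entry eviction bookkeeping is pure overhead, whereas a bulk flush amortizes it over the whole table.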

Bumps [com.github.spotbugs.snom:spotbugs-gradle-plugin](https://github.com/spotbugs/spotbugs-gradle-plugin) from 5.0.3 to 5.0.14.
- [Release notes](https://github.com/spotbugs/spotbugs-gradle-plugin/releases)
- [Commits](spotbugs/spotbugs-gradle-plugin@5.0.3...5.0.14)

---
updated-dependencies:
- dependency-name: com.github.spotbugs.snom:spotbugs-gradle-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

aromanenko-dev and others added 30 commits April 13, 2023 18:28

Test Avro extension against multiple Avro versions (apache#25216)

7919c3f

Add output cells to notebook (apache#26265)

183b74d

* Add output cells to notebook
* Edit out personal details
* valid json
* No outputs for markdown
* Remove most of dependency output
* Remove more non-useful output
* Remove more non-useful output

Extract BundleManager to an Interface in SamzaRunner (apache#26268)

5a9ab68

Merge pull request apache#25930: Optimize counters by reducing allocations

241e40f

Add outputs to notebook (apache#26269)

fb545ea

* Add outputs to notebook
* Dummy outputs
* Comma

Allow notebooks to be picked up by tooling (apache#26281)

4a20ea4

* Add outputs to custom inference notebook
* commas
* Add dummy outputs to tfma notebook

Autosharding support for Java is now fixed on Dataflow. Un-sickbaying GroupIntoBatches tests, fixes apache#25675. (apache#26207)

f1fba08

Refactor DoFnOp.FutureCollectorImpl to a top level class in SamzaRunner (apache#26274)

78b5ffb

Vendor grpc 1.54.0 (apache#26271)

a38d9b9

Download codecov uploader since codecov has been removed from pypi (apache#26282)

0bf9d55

* Disable codecov for precommits
* Download codecov
* download curl instead of pip

DLQ support in RunInference (apache#26261)

b9f27f9

* DLQ support in RunInference
* Doc example
* Comment
* CHANGES.md

Implement file write fix in update-compatible way.

33293fe

Add ValidatesContainer tests with installing release candidates (RCs) (apache#26156)

e86486d

Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>

Merge pull request apache#26284: Fix GroupIntoBatches hold

f5f7a47

[AWS] Support usage of StsAssumeRoleWithWebIdentityCredentialsProvider for AWS IOs (resolves apache#26097) (apache#26098)

9c00dca

Merge pull request apache#26289 Use iterable side inputs rather than list side inputs in file writes.

80f1f6c

Set an auth key in multi_process_shared.py (apache#26172)

3424320

* Set an auth key in multi_process_shared.py
* Format
* Lint
* remove todo, handled in apache#26202 (avoid conflicts)
* Lint

Unpin azurite version. (apache#26324)

1dd129c

Generating Unique ID within function so that it's unique for every run. (apache#26270)

479b064

Add more logging to QueryChangeStreamAction exceptions (apache#26219)

7a68348

[Playground] Gradle task to remove snippet by its ID (apache#26102)

f2f2fb8

* Gradle task to remove snippet by its ID
* Improve logging
* Add option to select datastore namespace
* Check if user supplied invalid subcommand

address comments

e61e365

Coalesce sources until compressed serialized bundles under API limit (apache#26267)

51d857f

* Coalesce sources until compressed serialized bundles under API limit
* Address comments: update url

Fix broken links (apache#26322)

57d9029

* Fix broken links
* Update python-machine-learning.md

laysakura closed this May 12, 2023

github-actions bot added build docker examples go infra java kotlin learning model python extensions euphoria jackson kryo protobuf sketching io gcp runners core dataflow direct flink fn-execution jet portability samza spark twister2 labels May 12, 2023
