Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: serialize/deserialize custom coders #25

Closed
wants to merge 1,797 commits into from
Closed

feat: serialize/deserialize custom coders #25

wants to merge 1,797 commits into from

Conversation

laysakura
Copy link
Owner

Please add a meaningful description for your change here


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

aromanenko-dev and others added 30 commits April 13, 2023 18:28
…ache#26153)

* add parallel number to gradle.properties

* remove unused config, attempt to configure higher parallelism for kafka tests

* Create generic task to represent Integration tests. Then, create sub-projects to enable running tests in parallel

* extend test timeout to reduce flakyness

* add comment for running locally

* run spotless

* factor kafka integration tests up a level
* Add output cells to notebook

* Edit out personal details

* valid json

* No outputs for markdown

* Remove most of dependency output

* Remove more non-useful output

* Remove more non-useful output
* Add outputs to notebook

* Dummy outputs

* Comma
* Add outputs to custom inference notebook

* commas

* Add dummy outputs to tfma notebook
…pache#26282)

* Disable codecov for precommits

* Download codecov

* download curl instead of pip
* Adds a reference to new Java RunInference example

* Update website/www/site/content/en/documentation/sdks/python-machine-learning.md

Co-authored-by: Danny McCormick <dannymccormick@google.com>

---------

Co-authored-by: Danny McCormick <dannymccormick@google.com>
* DLQ support in RunInference

* Doc example

* Comment

* CHANGES.md
… file writes.

AsList is backed by a multi-map in an attempt to provide proper
indexing semantics, but this can be significantly more expensive
for small pipelines (especially as it may require fixed sharding
and prevent fusion).
…pache#26156)

Co-authored-by: tvalentyn <tvalentyn@users.noreply.github.com>
- Reduces the use of Box, in particular each input element
  no longer must be boxed.
- Removes one use of unsafe.
- Reduce use and Any, and hide it behind newtypes.
- Introduce pattern to distinguish between pipeline generation time
  with generic types and pipeline run time with dynamic dispatch.
* Set an auth key in multi_process_shared.py

* Format

* Lint

* remove todo, handled in apache#26202 (avoid conflicts)

* Lint
…Unique Keys cases.

Flush all key-values if map size reach to 12K entries + keep using LRU cache

Micro BenchMark Result: 5x to 6x improvement for uniqueKeys PrecombineGroupingTableBenchmark, Minor impact on other
End To End WordCount Pipeline: Approx 25% improvement

Before

Benchmark                                                 (distribution)  (globallyWindowed)   Mode  Cnt   Score   Error  Units
PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine         uniform                true  thrpt   15  52.584 ± 5.631  ops/s
PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine         uniform               false  thrpt   15  48.427 ± 4.090  ops/s
PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine          normal                true  thrpt   15  36.470 ± 3.498  ops/s
PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine          normal               false  thrpt   15  35.610 ± 0.940  ops/s
PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine          hotKey                true  thrpt   15  55.111 ± 2.996  ops/s
PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine          hotKey               false  thrpt   15  49.423 ± 2.859  ops/s
PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine      uniqueKeys                true  thrpt   15   5.319 ± 0.655  ops/s
PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine      uniqueKeys               false  thrpt   15   5.094 ± 0.337  ops/s

After

Benchmark                                                 (distribution)  (globallyWindowed)   Mode  Cnt   Score   Error  Units
PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine         uniform                true  thrpt   15  52.442 ± 1.937  ops/s
PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine         uniform               false  thrpt   15  44.824 ± 3.504  ops/s
PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine          normal                true  thrpt   15  33.719 ± 2.688  ops/s
PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine          normal               false  thrpt   15  30.081 ± 1.278  ops/s
PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine          hotKey                true  thrpt   15  51.839 ± 3.127  ops/s
PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine          hotKey               false  thrpt   15  46.264 ± 1.691  ops/s
PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine      uniqueKeys                true  thrpt   15  32.422 ± 1.269  ops/s
PrecombineGroupingTableBenchmark.sumIntegerBinaryCombine      uniqueKeys               false  thrpt   15  29.210 ± 1.757  ops/s
Bumps [com.github.spotbugs.snom:spotbugs-gradle-plugin](https://github.com/spotbugs/spotbugs-gradle-plugin) from 5.0.3 to 5.0.14.
- [Release notes](https://github.com/spotbugs/spotbugs-gradle-plugin/releases)
- [Commits](spotbugs/spotbugs-gradle-plugin@5.0.3...5.0.14)

---
updated-dependencies:
- dependency-name: com.github.spotbugs.snom:spotbugs-gradle-plugin
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Gradle task to remove snippet by its ID

* Improve logging

* Add option to select datastore namespace

* Check if user supplied invalid subcommand
…pache#26267)

* Coalesce sources until compressed serialized bundles under API limit

* Address comments: update url
* Fix broken links

* Update python-machine-learning.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment