Add specialized TupleCoders #3350
Conversation
scio-core/src/main/scala/com/spotify/scio/coders/instances/ScalaCoders.scala
Force-pushed from 47fa3b6 to 77e8b9c
Codecov Report

@@            Coverage Diff             @@
##           master    #3350      +/-   ##
==========================================
- Coverage   72.71%   69.10%    -3.61%
==========================================
  Files         234      233        -1
  Lines        7710     7431      -279
  Branches      347      326       -21
==========================================
- Hits         5606     5135      -471
- Misses       2104     2296      +192

Continue to review full report at Codecov.
jto left a comment
I'm pretty sure this will actually be slower than the current version. Did you run benchmarks?
scio-core/src/main/scala-2.12/com/spotify/scio/coders/instances/TupleCoders.scala
Hi @jto! Good points, I should have made them clear in the PR description.
To emphasise: the main goal of this PR is to reduce serde time. The quick benchmarks below show improvements between 20% and 50%, which can translate into sizeable savings when using cogroup ops.
master
[info] Benchmark                    Mode  Cnt    Score    Error  Units
[info] CoderBenchmark.tuple3Decode  avgt    5   53.833 ±  0.326  ns/op
[info] CoderBenchmark.tuple3Encode  avgt    5   86.729 ± 10.622  ns/op
[info] CoderBenchmark.tuple4Decode  avgt    5   69.513 ±  1.937  ns/op
[info] CoderBenchmark.tuple4Encode  avgt    5  106.868 ±  1.345  ns/op

PR
[info] Benchmark                    Mode  Cnt    Score    Error  Units
[info] CoderBenchmark.tuple3Decode  avgt    5   26.200 ±  0.136  ns/op
[info] CoderBenchmark.tuple3Encode  avgt    5   65.883 ±  2.037  ns/op
[info] CoderBenchmark.tuple4Decode  avgt    5   40.348 ±  0.209  ns/op
[info] CoderBenchmark.tuple4Encode  avgt    5   72.218 ±  0.524  ns/op
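For readers wondering where the serde win comes from, here is a minimal self-contained sketch. The `SimpleCoder` trait and `IntSC` instance are hypothetical stand-ins for illustration only (the actual PR targets Beam/Scio `Coder` types); the idea is that a hand-written tuple coder just writes the fields back-to-back, with no generic derivation machinery on the hot path:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream,
  DataOutputStream, InputStream, OutputStream}

// Hypothetical stand-in for a coder interface, for illustration only.
trait SimpleCoder[T] {
  def encode(value: T, os: OutputStream): Unit
  def decode(is: InputStream): T
}

// Element coder for Int: 4 big-endian bytes.
object IntSC extends SimpleCoder[Int] {
  def encode(v: Int, os: OutputStream): Unit = new DataOutputStream(os).writeInt(v)
  def decode(is: InputStream): Int = new DataInputStream(is).readInt()
}

// Specialized Tuple3 coder: encodes the three components back-to-back
// with their element coders, and decodes them in the same order.
class Tuple3SC[A, B, C](a: SimpleCoder[A], b: SimpleCoder[B], c: SimpleCoder[C])
    extends SimpleCoder[(A, B, C)] {
  def encode(v: (A, B, C), os: OutputStream): Unit = {
    a.encode(v._1, os); b.encode(v._2, os); c.encode(v._3, os)
  }
  def decode(is: InputStream): (A, B, C) =
    (a.decode(is), b.decode(is), c.decode(is))
}
```

A round trip through a `ByteArrayOutputStream`/`ByteArrayInputStream` pair recovers the original tuple; the real coders do the same work against Beam's stream-based `Coder.encode`/`Coder.decode` contract.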
Regarding compile time, I'm aware there is in fact some degradation (I'm not sure how much in user code bases, but there is some in scio). As mentioned, that's not the main focus here, and perhaps it can be alleviated with PR #3170.
We could however try a few approaches:
1) Apply this only to tuples up to arity 4/5; above that, fall back to Coder.gen.
2) If possible, see whether nested Coder.transform can be optimised in a separate PR.
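Approach 1 can be sketched with standard implicit prioritization: specialized instances live in the companion object and win over a low-priority derived fallback. The toy `Show` typeclass below stands in for `Coder` (all names here are hypothetical; in Scio the fallback role would be played by Coder.gen):

```scala
// Toy typeclass standing in for Coder; everything here is illustrative.
trait Show[T] { def show(t: T): String }

trait LowPriorityShow {
  // Low-priority generic fallback, playing the role of the derived coder
  // used for arities without a specialized instance.
  implicit def fallback[T]: Show[T] = t => s"derived(${t.toString})"
}

object Show extends LowPriorityShow {
  implicit val intShow: Show[Int] = _.toString
  // Specialized instance: being defined directly on the companion object,
  // it is resolved before the inherited fallback.
  implicit def tuple3[A, B, C](implicit a: Show[A], b: Show[B], c: Show[C]): Show[(A, B, C)] =
    t => s"(${a.show(t._1)}, ${b.show(t._2)}, ${c.show(t._3)})"
}

object TupleCoderPriority {
  def describe[T](t: T)(implicit s: Show[T]): String = s.show(t)
}
```

With this layering, `(Int, Int, Int)` resolves to the specialized instance while a `Tuple5` silently falls through to the generic one, which is exactly the cut-off behaviour proposed for arities above 4/5.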
I was also talking about serde time. I think I remember fiddling with nested …
Force-pushed from 3a567d3 to 0d19b9a
Force-pushed from 0d19b9a to 51aee2d
So I was looking at what impacted compile time by 2.8% and found out that it was the removal of …
I recently noticed that Tuple3 / Tuple4 are not that uncommon, so I decided to specialize them to save on serialization. Users should rarely rely on arities above that, but this PR adds specialized coders for the other arities as well (bonus).
We should at some point ditch these Python scripts and use either paiges or scalafix.