Replace mapslices with selectdim #45

Merged
merged 4 commits into from
Mar 9, 2021

@glennmoy glennmoy commented Mar 9, 2021

This PR replaces the call to mapslices with a call to selectdim.

We are making this change now because the parent apply method(s) no longer need to take slices (#42).
selectdim is simpler and much more performant (see below), yet keeps the desired dims convention.

background

This is a follow-up to #42, which removed the need to loop over the slices in apply, and to #25, which aimed to resolve our dims convention (#18).

We originally liked mapslices because we wanted to adopt its dims convention, which was more intuitive to our users.
However, it has some considerable downsides that make it a poor choice in the long term.

Its one technical advantage was allowing multiple dims, but this isn't a necessary feature right now.
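
To make the difference between the two Base functions concrete, here is a minimal sketch using only `Base.mapslices` and `Base.selectdim`. The matrix `A` and the doubling function are made up for illustration; this is not the package's code, and how FeatureTransforms maps its own dims keyword onto selectdim is not shown here.

```julia
A = [1.0 2.0 3.0; 4.0 5.0 6.0]

# mapslices calls the function once per slice (here: once per column)
# and assembles the results into a newly allocated array.
doubled = mapslices(col -> 2 .* col, A; dims=1)

# selectdim returns a view (no copy) of the entries whose index
# along dimension `d` equals `i`.
row2 = selectdim(A, 1, 2)    # same data as view(A, 2, :)
col3 = selectdim(A, 2, 3)    # same data as view(A, :, 3)

# Because row2 is a view, broadcasting into it mutates A in place.
row2 .= 2 .* row2
```

Note that the two calls read their dimension argument differently: `mapslices(f, A; dims=1)` passes each column to `f`, while `selectdim(A, 1, i)` selects row `i`.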

Compared to the previous benchmarks, this significantly reduces allocations (by roughly 100x) for element-wise transforms, although MeanStdScaling is a somewhat unfair comparison since it has been simplified in the meantime. It might also be worth making the remaining transforms more efficient.
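
Each benchmark below lists two timings per operation. A plausible reading, which is an assumption since the benchmark script is not included here, is that each call was timed twice with `@time`, so the first figure includes compilation and the second reflects steady-state cost. A minimal sketch of that pattern, with a hypothetical `scale` function standing in for a transform:

```julia
# Hypothetical stand-in for an element-wise transform; not the package's code.
scale(A) = 2 .* A

A = rand(100, 100)

@time scale(A)   # first call: includes JIT compilation of scale for this type
@time scale(A)   # second call: compilation is cached, so this is the run cost
```

Under that reading, the second line of each pair is the fairer one to compare between branches.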

# Periodic - this branch
# apply
0.333721 seconds (1.46 M allocations: 70.581 MiB, 7.08% gc time)
0.000225 seconds (11 allocations: 234.734 KiB)
# apply!
0.014992 seconds (31.75 k allocations: 1.924 MiB)
0.000282 seconds (10 allocations: 234.703 KiB)

# Periodic - main
# apply
0.780758 seconds (2.84 M allocations: 146.639 MiB, 5.11% gc time)
0.000626 seconds (1.64 k allocations: 380.922 KiB)
# apply!
0.027169 seconds (71.10 k allocations: 4.024 MiB)
0.000667 seconds (1.64 k allocations: 380.922 KiB)

# Power - this branch
# apply
0.080944 seconds (282.02 k allocations: 14.327 MiB)
0.000037 seconds (10 allocations: 78.516 KiB)
# apply!
0.031575 seconds (50.51 k allocations: 2.755 MiB, 39.06% gc time)
0.000034 seconds (9 allocations: 78.500 KiB)

# Power - main
# apply
0.160279 seconds (347.52 k allocations: 18.862 MiB)
0.000445 seconds (1.74 k allocations: 221.500 KiB)
# apply!
0.007062 seconds (5.40 k allocations: 417.305 KiB)
0.000387 seconds (1.74 k allocations: 221.500 KiB)

# MeanStdScaling - this branch
# apply
0.168360 seconds (606.91 k allocations: 31.994 MiB)
0.000052 seconds (13 allocations: 78.734 KiB)
# apply!
0.005720 seconds (2.72 k allocations: 229.367 KiB)
0.000019 seconds (13 allocations: 78.734 KiB)

# MeanStdScaling - main
# apply
0.184251 seconds (419.53 k allocations: 22.622 MiB, 5.67% gc time)
0.000619 seconds (2.04 k allocations: 234.000 KiB)
# apply!
0.007365 seconds (5.70 k allocations: 430.008 KiB)
0.000334 seconds (2.04 k allocations: 234.000 KiB)

# LinearCombination - this branch
# apply
0.342954 seconds (1.24 M allocations: 63.726 MiB, 6.75% gc time)
0.001046 seconds (30.61 k allocations: 812.000 KiB)

# LinearCombination - main
# apply
0.329633 seconds (1.26 M allocations: 64.869 MiB, 3.28% gc time)
0.000814 seconds (30.61 k allocations: 812.000 KiB)

# OneHotEncoding - this branch
# apply
0.057715 seconds (240.52 k allocations: 11.567 MiB)
0.000384 seconds (10.01 k allocations: 254.109 KiB)

# OneHotEncoding - main
# apply
0.021401 seconds (31.42 k allocations: 1.939 MiB)
0.000913 seconds (11.98 k allocations: 968.531 KiB)

codecov bot commented Mar 9, 2021

Codecov Report

Merging #45 (d27a016) into main (f6bb5e4) will decrease coverage by 0.05%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #45      +/-   ##
==========================================
- Coverage   94.84%   94.79%   -0.06%     
==========================================
  Files           9        9              
  Lines          97       96       -1     
==========================================
- Hits           92       91       -1     
  Misses          5        5              
Impacted Files Coverage Δ
src/transformers.jl 100.00% <100.00%> (ø)

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f6bb5e4...99efde3.

@glennmoy changed the title from "WIP: Remove mapslices" to "Replace mapslices with selectdim" on Mar 9, 2021
@@ -35,8 +35,8 @@
@test FeatureTransforms.apply(x, ohe; inds=2:4) == [0 0 1; 0 1 0; 0 0 1]
@test FeatureTransforms.apply(x, ohe; dims=:) == expected

@test_throws DimensionMismatch FeatureTransforms.apply(x, ohe; dims=1)

@glennmoy glennmoy Mar 9, 2021

dims no longer takes slices in apply, so it makes no difference to how OneHotEncoding works unless it is used with inds.

@bencottier bencottier left a comment

Looks good to me. Are we leaving LinearCombination for a follow-up?

glennmoy commented Mar 9, 2021

> Looks good to me. Are we leaving LinearCombination for a follow-up?

yep, I'm working on it now

@glennmoy glennmoy merged commit 99b65ee into main Mar 9, 2021