Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change parse_row! to a generated function to avoid dynamic dispatch #23

Merged
merged 1 commit into from
Oct 21, 2021

Conversation

nickrobinson251
Copy link
Owner

@nickrobinson251 nickrobinson251 commented Oct 1, 2021

currently this is a bit gross... tbh the generic parse_row! function is actually quite nice (for a generated function), but the Transformers one is pretty gross right now. And in general needs a bit of clean-up. But i think something like this is the way to go.

@nickrobinson251 nickrobinson251 marked this pull request as draft October 1, 2021 17:09
@codecov-commenter
Copy link

codecov-commenter commented Oct 1, 2021

Codecov Report

Merging #23 (bee93aa) into main (99ca5ee) will increase coverage by 2.19%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #23      +/-   ##
==========================================
+ Coverage   95.48%   97.67%   +2.19%     
==========================================
  Files           2        3       +1     
  Lines         177      258      +81     
==========================================
+ Hits          169      252      +83     
+ Misses          8        6       -2     
Impacted Files Coverage Δ
src/parsing.jl 95.60% <100.00%> (+0.48%) ⬆️
src/transformers_parsing.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 99ca5ee...bee93aa. Read the comment docs.

@nickrobinson251
Copy link
Owner Author

Some intial timings (on files with ~250,000 rows, although of course we're currently only parsing up the end of transformers data, so ~90% of that)

on main

julia> using PowerFlowData

julia> @time net = parse_network("test/realistic29.raw");
  5.072275 seconds (49.92 M allocations: 1.661 GiB, 5.07% gc time, 27.00% compilation time)

julia> @time net = parse_network("test/realistic29.raw");
  3.600963 seconds (48.55 M allocations: 1.593 GiB, 2.00% gc time)

julia> @time net = parse_network("test/realistic30.raw");
  3.478833 seconds (43.96 M allocations: 1.441 GiB, 4.28% gc time)

julia> @time net = parse_network("test/realistic30.raw");
  3.369844 seconds (43.96 M allocations: 1.441 GiB, 4.23% gc time)

on this branch

(at 3146cb0)

julia> using PowerFlowData

julia> @time net = parse_network("test/realistic29.raw");
  0.363916 seconds (128.33 k allocations: 55.870 MiB, 15.86% gc time, 44.51% compilation time)

julia> @time net = parse_network("test/realistic29.raw");
  0.226292 seconds (444 allocations: 47.671 MiB, 2.50% gc time)

julia> @time net = parse_network("test/realistic30.raw");
  0.211939 seconds (444 allocations: 47.608 MiB, 3.81% gc time)

julia> @time net = parse_network("test/realistic30.raw");
  0.214129 seconds (444 allocations: 47.608 MiB, 3.66% gc time)

@nickrobinson251
Copy link
Owner Author

i think 444 allocations seems a lot more reasonable

Here's how many come from just the constructors

julia> for T in fieldtypes(Network)[2:end]
           @show T fieldcount(T)
           @btime $T(0)
           println("--")
       end
T = Buses
fieldcount(T) = 11
  2.370 μs (27 allocations: 1.50 KiB)
--
T = Loads
fieldcount(T) = 12
  2.590 μs (29 allocations: 1.66 KiB)
--
T = Generators
fieldcount(T) = 20
  4.093 μs (46 allocations: 2.89 KiB)
--
T = Branches
fieldcount(T) = 17
  3.596 μs (40 allocations: 2.48 KiB)
--
T = Transformers
fieldcount(T) = 73
  14.476 μs (152 allocations: 10.02 KiB)
--

@nickrobinson251
Copy link
Owner Author

nickrobinson251 commented Oct 2, 2021

some ideas for when i have another go at refactoring the Transformers bit of this:

  • split the code so we only check is_t2 once, e.g. build up the expressions for both sides of that branch, rather than branching multiple times
  • could we move out of the generated func all the code that isn't just pulling out the fieldtypes? Can we e.g. pull the types out into a Tuple (T1, T2, T3, ...) and then work with that (I suppose that "work with that" bit might still need to be generated... hmm -- invesitgate something like this anyway)

also, i think these are the cases that appear handle:

  1. parse cell
  2. parse end cell (and handle EOL comments)
  3. if t2; do 2; else 1
  4. if t2; fill with missing; else do 1
  5. if t2; fill with missing; else do 2

@nickrobinson251
Copy link
Owner Author

on this branch with the simplified code (at 08d6278)

julia> using PowerFlowData

julia> @time net = parse_network("test/realistic29.raw");
  0.316417 seconds (128.28 k allocations: 55.867 MiB, 3.57% gc time, 33.23% compilation time)

julia> @time net = parse_network("test/realistic29.raw");
  0.212118 seconds (444 allocations: 47.671 MiB, 4.25% gc time)

julia> @time net = parse_network("test/realistic30.raw");
  0.222221 seconds (444 allocations: 47.608 MiB, 3.28% gc time)

julia> @time net = parse_network("test/realistic30.raw");
  0.207607 seconds (444 allocations: 47.608 MiB)

- Doesn't yet do this for Transfomers,
  as they need their own handling

Add initial generate func for parsing Transformers

Handle comments in Transformers data wherever possible

Define generated function after the functions they call

Put `generated` declaration on outside

Remove special handling of EOL comments from generated functions

Simpler generated function for Transformer data

- relies on not having to handle
  end-of-line comments explicitly
@nickrobinson251 nickrobinson251 force-pushed the npr/generated-func-for-type-stability branch from 08d6278 to 24bead9 Compare October 21, 2021 17:38
@nickrobinson251 nickrobinson251 marked this pull request as ready for review October 21, 2021 17:39
@nickrobinson251 nickrobinson251 merged commit 25f0e67 into main Oct 21, 2021
@nickrobinson251 nickrobinson251 deleted the npr/generated-func-for-type-stability branch October 21, 2021 18:45
@nickrobinson251 nickrobinson251 mentioned this pull request Nov 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid type instability/dynamic dispatch for parsing values
2 participants