
Multithreading in forward! #37

Merged — 3 commits merged into main from forward-multithreading on Jul 15, 2024
Conversation

kleincode
Owner

Enable multithreading using the Threads.@threads macro.

  • enabling this on the RoPE loop decreases runtime by about 25%
  • enabling this on the head loop does not decrease runtime and sometimes even increases it slightly -> won't do
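For illustration, the pattern described above looks roughly like the following. This is a hedged sketch, not the actual Llama2.jl source: the function name `rope!`, the argument layout, and the rotation math are illustrative stand-ins for the real RoPE loop in src/transformer.jl; only the use of `Threads.@threads` on the per-head loop reflects the change in this PR.

```julia
# Sketch: parallelize a RoPE-style rotation over heads with Threads.@threads.
# Each head's slice of `q` is disjoint, so threads never write the same index.
function rope!(q::Vector{Float32}, pos::Int, n_heads::Int, head_size::Int)
    Threads.@threads for h in 1:n_heads          # one head per thread (round-robin)
        base = (h - 1) * head_size
        for i in 1:2:head_size                   # rotate consecutive (re, im) pairs
            freq = 1.0f0 / 10000.0f0^((i - 1) / Float32(head_size))
            θ = pos * freq
            c, s = cos(θ), sin(θ)
            a, b = q[base + i], q[base + i + 1]
            q[base + i]     = a * c - b * s
            q[base + i + 1] = a * s + b * c
        end
    end
    return q
end
```

The head loop is safe to parallelize here because each iteration touches a disjoint slice of `q`; the observed slowdown on the attention head loop is consistent with per-iteration work being too small to amortize the task-spawning overhead.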

Benchmarking code

using Llama2
using BenchmarkTools

config, weights = read_karpathy("bin/transformer/stories15M.bin") # or stories110M.bin
state = RunState{Float32}(config)
transformer = Transformer{Float32}(config, weights, state)

@benchmark forward!(transformer, 1, 1)
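Note (an assumption not spelled out in the PR): `Threads.@threads` runs serially when Julia is started with a single thread, so the "with multithreading" numbers below presuppose a multi-threaded session, e.g. one started with `julia --threads=4` or with the `JULIA_NUM_THREADS` environment variable set. A quick sanity check before benchmarking:

```julia
# Threads.@threads only distributes work across the threads Julia was
# started with; if nthreads() == 1 the parallel loop degenerates to serial.
@show Threads.nthreads()
```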

Before multithreading

stories15M.bin

BenchmarkTools.Trial: 1937 samples with 1 evaluation.
 Range (min … max):  2.361 ms …   5.889 ms  ┊ GC (min … max): 0.00% … 56.06%
 Time  (median):     2.499 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.571 ms ± 291.406 μs  ┊ GC (mean ± σ):  0.56% ±  3.66%

   ▂█▇▇▁▁▁▃▁
  ▄██████████▆▆▆▅▄▅▄▃▃▄▃▂▃▂▂▂▂▂▂▂▂▂▂▁▂▁▂▂▁▁▂▂▃▃▂▂▂▃▂▃▂▁▂▂▂▂▂▂ ▃
  2.36 ms         Histogram: frequency by time        3.54 ms <

 Memory estimate: 495.98 KiB, allocs estimate: 24489.

stories110M.bin

BenchmarkTools.Trial: 340 samples with 1 evaluation.
 Range (min … max):  13.248 ms … 32.860 ms  ┊ GC (min … max): 0.00% … 57.67%
 Time  (median):     14.523 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   14.715 ms ±  1.498 ms  ┊ GC (mean ± σ):  0.75% ±  4.36%

                 ▁ ▆█▇▃▁▆▅▄  
  ▃▁▁▁▁▁▁▄▄▇▄█▅▃▄█▆█████████▇▆▅▅▆▇▅▅▄▄▄▅▄▃▃▃▄▄▃▃▃▃▁▃▁▃▁▁▁▃▁▃▃ ▄
  13.2 ms         Histogram: frequency by time        16.6 ms <

 Memory estimate: 2.73 MiB, allocs estimate: 143130.

With multithreading

stories15M.bin

BenchmarkTools.Trial: 2540 samples with 1 evaluation.
 Range (min … max):  1.801 ms …   5.931 ms  ┊ GC (min … max): 0.00% … 63.55%
 Time  (median):     1.890 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.958 ms ± 228.916 μs  ┊ GC (mean ± σ):  0.21% ±  2.14%

   ▅█▅▅▁
  ▅█████▇▆▅▅▄▄▄▄▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▃▃▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▁▂▁▂ ▃
  1.8 ms          Histogram: frequency by time        2.95 ms <

 Memory estimate: 95.95 KiB, allocs estimate: 1209.

stories110M.bin

BenchmarkTools.Trial: 458 samples with 1 evaluation.
 Range (min … max):  10.318 ms …  12.303 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.780 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.912 ms ± 364.027 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

            ▅█▇▁▅
  ▃▁▁▄▃▃▅▅███████▆█▅▇▃▅▅▇▅▅▆▅▅▃▅▅▅▅▅▃▅▆▃▃▁▃▃▃▃▃▃▃▂▃▂▁▃▃▃▂▂▃▂▂▂ ▃
  10.3 ms         Histogram: frequency by time           12 ms <

 Memory estimate: 425.28 KiB, allocs estimate: 4926.

@kleincode added the enhancement (New feature or request) label Jul 13, 2024
@kleincode self-assigned this Jul 13, 2024

codecov bot commented Jul 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Files Coverage Δ
src/transformer.jl 98.52% <100.00%> (-1.48%) ⬇️

@janik072 (Collaborator) left a comment:


You said this was tested with all models. Does that include the 260k model as well? I thought our package is currently unable to read it at all.

@kleincode merged commit f15725e into main Jul 15, 2024
3 checks passed
@kleincode deleted the forward-multithreading branch July 15, 2024 14:34