Performance Suggestion / Benchmarks #66

alxspiker opened this issue May 16, 2023 · 1 comment

Max Threads = Poor Performance on 8 thread processor and GGJT model after convert.py

TL;DR: If you have an 8-thread processor, try setting n_threads to 6 instead of 8. I get consistently faster results with 6 than when trying to use all 8 threads.
I've been testing a GGJT model to get the best performance out of a small laptop. I ran 2 tests for each n_threads setting, with nothing else open during the tests.

Results On an 8 thread CPU

n_threads=1

Test 1

1. Mercury 2. Venus 3. Earth 4. Mars 5. Jupiter 6. Saturn 7. Uranus 8. Neptune
llama_print_timings:        load time = 14464.13 ms
llama_print_timings:      sample time =    20.63 ms /    40 runs   (    0.52 ms per run)
llama_print_timings: prompt eval time = 14463.85 ms /    19 tokens (  761.26 ms per token)
llama_print_timings:        eval time = 38962.48 ms /    39 runs   (  999.04 ms per run)
llama_print_timings:       total time = 57510.54 ms

Test 2

1. Mercury 2. Venus 3. Earth 4. Mars 5. Jupiter 6. Saturn 7. Uranus 8. Neptune
llama_print_timings:        load time = 14054.52 ms
llama_print_timings:      sample time =    24.77 ms /    40 runs   (    0.62 ms per run)
llama_print_timings: prompt eval time = 14054.15 ms /    19 tokens (  739.69 ms per token)
llama_print_timings:        eval time = 50090.37 ms /    39 runs   ( 1284.37 ms per run)
llama_print_timings:       total time = 69022.43 ms

n_threads=2

Test 1

1. Mercury 2. Venus 3. Earth 4. Mars 5. Jupiter 6. Saturn 7. Uranus 8. Neptune
llama_print_timings:        load time =  9662.71 ms
llama_print_timings:      sample time =    22.36 ms /    40 runs   (    0.56 ms per run)
llama_print_timings: prompt eval time =  9662.48 ms /    19 tokens (  508.55 ms per token)
llama_print_timings:        eval time = 25339.74 ms /    39 runs   (  649.74 ms per run)
llama_print_timings:       total time = 39422.48 ms

Test 2

1. Mercury 2. Venus 3. Earth 4. Mars 5. Jupiter 6. Saturn 7. Uranus 8. Neptune
llama_print_timings:        load time = 13699.18 ms
llama_print_timings:      sample time =    27.64 ms /    40 runs   (    0.69 ms per run)
llama_print_timings: prompt eval time = 13698.78 ms /    19 tokens (  720.99 ms per token)
llama_print_timings:        eval time = 27051.24 ms /    39 runs   (  693.62 ms per run)
llama_print_timings:       total time = 46124.61 ms

n_threads=4

Test 1

1. Mercury 2. Venus 3. Earth 4. Mars 5. Jupiter 6. Saturn 7. Uranus 8. Neptune
llama_print_timings:        load time =  9804.36 ms
llama_print_timings:      sample time =    29.62 ms /    40 runs   (    0.74 ms per run)
llama_print_timings: prompt eval time =  9803.58 ms /    19 tokens (  515.98 ms per token)
llama_print_timings:        eval time = 22367.64 ms /    39 runs   (  573.53 ms per run)
llama_print_timings:       total time = 38015.92 ms

Test 2

1. Mercury 2. Venus 3. Earth 4. Mars 5. Jupiter 6. Saturn 7. Uranus 8. Neptune
llama_print_timings:        load time =  7894.51 ms
llama_print_timings:      sample time =    23.41 ms /    40 runs   (    0.59 ms per run)
llama_print_timings: prompt eval time =  7894.35 ms /    19 tokens (  415.49 ms per token)
llama_print_timings:        eval time = 17166.80 ms /    39 runs   (  440.17 ms per run)
llama_print_timings:       total time = 29655.03 ms

n_threads=6

Test 1

1. Mercury 2. Venus 3. Earth 4. Mars 5. Jupiter 6. Saturn 7. Uranus 8. Neptune
llama_print_timings:        load time =  8732.21 ms
llama_print_timings:      sample time =    29.93 ms /    40 runs   (    0.75 ms per run)
llama_print_timings: prompt eval time =  8731.88 ms /    19 tokens (  459.57 ms per token)
llama_print_timings:        eval time = 26798.23 ms /    39 runs   (  687.13 ms per run)
llama_print_timings:       total time = 41384.27 ms

Test 2

1. Mercury 2. Venus 3. Earth 4. Mars 5. Jupiter 6. Saturn 7. Uranus 8. Neptune
llama_print_timings:        load time =  4623.47 ms
llama_print_timings:      sample time =    21.79 ms /    40 runs   (    0.54 ms per run)
llama_print_timings: prompt eval time =  4623.19 ms /    19 tokens (  243.33 ms per token)
llama_print_timings:        eval time = 17870.62 ms /    39 runs   (  458.22 ms per run)
llama_print_timings:       total time = 26962.23 ms

n_threads=7 (Seems better than 8, but not as good as 6)

Test 1

1. Mercury 2. Venus 3. Earth 4. Mars 5. Jupiter 6. Saturn 7. Uranus 8. Neptune
llama_print_timings:        load time = 13266.94 ms
llama_print_timings:      sample time =    22.37 ms /    40 runs   (    0.56 ms per run)
llama_print_timings: prompt eval time = 13266.64 ms /    19 tokens (  698.24 ms per token)
llama_print_timings:        eval time = 31370.05 ms /    39 runs   (  804.36 ms per run)
llama_print_timings:       total time = 49092.33 ms

Test 2

1. Mercury 2. Venus 3. Earth 4. Mars 5. Jupiter 6. Saturn 7. Uranus 8. Neptune
llama_print_timings:        load time =  9676.00 ms
llama_print_timings:      sample time =    30.28 ms /    40 runs   (    0.76 ms per run)
llama_print_timings: prompt eval time =  9675.46 ms /    19 tokens (  509.23 ms per token)
llama_print_timings:        eval time = 51035.98 ms /    39 runs   ( 1308.61 ms per run)
llama_print_timings:       total time = 66633.10 ms

n_threads=8 (Max threads)

Test 1

1. Mercury 2. Venus 3. Earth 4. Mars 5. Jupiter 6. Saturn 7. Uranus 8. Neptune
llama_print_timings:        load time = 31573.62 ms
llama_print_timings:      sample time =    23.12 ms /    40 runs   (    0.58 ms per run)
llama_print_timings: prompt eval time = 31573.35 ms /    19 tokens ( 1661.76 ms per token)
llama_print_timings:        eval time = 80649.37 ms /    39 runs   ( 2067.93 ms per run)
llama_print_timings:       total time = 119573.09 ms

Test 2

1. Mercury 2. Venus 3. Earth 4. Mars 5. Jupiter 6. Saturn 7. Uranus 8. Neptune
llama_print_timings:        load time = 31926.09 ms
llama_print_timings:      sample time =    22.00 ms /    40 runs   (    0.55 ms per run)
llama_print_timings: prompt eval time = 31925.73 ms /    19 tokens ( 1680.30 ms per token)
llama_print_timings:        eval time = 67654.42 ms /    39 runs   ( 1734.73 ms per run)
llama_print_timings:       total time = 103776.36 ms
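
To compare the settings at a glance, here is a rough sketch that averages the two "eval time" figures per n_threads value and converts them to ms per token and tokens per second. The numbers are copied from the logs above; the variable names are just for illustration, and with only two runs per setting the averages are noisy.

```python
from statistics import mean

TOKENS_PER_RUN = 39  # every "eval time" line above reports 39 runs

# Total "eval time" in ms for Test 1 and Test 2, copied from the logs above.
EVAL_MS = {
    1: [38962.48, 50090.37],
    2: [25339.74, 27051.24],
    4: [22367.64, 17166.80],
    6: [26798.23, 17870.62],
    7: [31370.05, 51035.98],
    8: [80649.37, 67654.42],
}

# Mean generation cost per token for each thread count.
mean_ms_per_token = {
    n: mean(total / TOKENS_PER_RUN for total in totals)
    for n, totals in EVAL_MS.items()
}

# Thread count with the lowest mean eval time per token.
best = min(mean_ms_per_token, key=mean_ms_per_token.get)

for n, ms in mean_ms_per_token.items():
    print(f"n_threads={n}: {ms:8.2f} ms/token ({1000 / ms:.2f} tokens/s)")
print("lowest mean eval time: n_threads =", best)
```

On these figures, 4 and 6 threads land close together (about 510 and 570 ms per token) and both are far ahead of 8 (about 1900 ms per token). The run-to-run spread is large, so more repeats would be needed to separate 4 from 6.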
@alxspiker alxspiker changed the title Performance Suggestion Performance Suggestion / Benchmarks May 16, 2023

alxspiker commented May 16, 2023

Script used for benchmarking:
Requires llama-cpp-python==0.1.49

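import json  # only needed for the commented-out debug print below
import argparse

from llama_cpp import Llama

parser = argparse.ArgumentParser()
parser.add_argument("-m", "--model", type=str, default="./newggjt.bin")
args = parser.parse_args()

# Change n_threads here for each benchmark run (1, 2, 4, 6, 7, 8 above).
llm = Llama(model_path=args.model, n_threads=6)

# Stream the completion; generation stops at a newline or at "Q:".
stream = llm(
    "Question: What are the names of the planets in the solar system? Answer: ",
    max_tokens=48,
    stop=["Q:", "\n"],
    stream=True,
)

# Print each streamed chunk as it arrives.
for output in stream:
    print(output["choices"][0]["text"], end="", flush=True)
    #print(json.dumps(output, indent=2))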
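
Rather than editing n_threads by hand for each run, the sweep could be automated along these lines. This is only a sketch: `sweep_threads` and `fastest` are hypothetical helpers, not part of llama-cpp-python, and `run_once` stands in for whatever callable loads the model and runs the prompt with the given thread count. It measures wall-clock time, so it will not match the llama_print_timings breakdown exactly.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable


def sweep_threads(
    run_once: Callable[[int], object],
    thread_counts: Iterable[int],
    repeats: int = 2,
) -> Dict[int, float]:
    """Time run_once(n) `repeats` times per thread count; return mean seconds."""
    results: Dict[int, float] = {}
    for n in thread_counts:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_once(n)  # hypothetical: load model and generate with n threads
            timings.append(time.perf_counter() - start)
        results[n] = mean(timings)
    return results


def fastest(results: Dict[int, float]) -> int:
    """Return the thread count with the lowest mean time."""
    return min(results, key=results.get)
```

With the script above, `run_once` could be something like `lambda n: Llama(model_path=args.model, n_threads=n)(prompt, max_tokens=48)`, where `prompt` is the question string. That would include model load time in each measurement.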
