@gracehonv
Contributor

Moved the prompt-generation call to randomly_sample_sonnet_lines_prompt out of the request send loop so that the loop can send load to the server faster. Previously, building the next prompt between sends added an artificial delay that lowered the benchmark's measured throughput.
Also changed the tokenizer to be instantiated only once, outside the prompt-generation loop, which speeds up the overall test.
With this change I have seen up to a 2x improvement in achieved server throughput on some small workloads. It should allow the benchmark to measure true server throughput more accurately.
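
For readers skimming the thread, here is a minimal sketch of the pattern described above, not the actual diff. The names pregenerate_prompts, run_send_loop, tokenizer_factory, make_prompt, and send_request are hypothetical stand-ins; in the real benchmark the prompt builder would be randomly_sample_sonnet_lines_prompt, whose signature is not shown here.

```python
from typing import Any, Callable, List


def pregenerate_prompts(
    num_requests: int,
    tokenizer_factory: Callable[[], Any],
    make_prompt: Callable[[Any], str],
) -> List[str]:
    """Build every prompt up front, before any request is sent."""
    # Assumption: the tokenizer is expensive to construct, so create it once
    # and reuse it for every prompt instead of rebuilding it per prompt.
    tokenizer = tokenizer_factory()
    return [make_prompt(tokenizer) for _ in range(num_requests)]


def run_send_loop(prompts: List[str], send_request: Callable[[str], None]) -> int:
    """Send the prebuilt prompts; the loop does no prompt construction."""
    for prompt in prompts:
        send_request(prompt)
    return len(prompts)
```

Because all prompt construction happens before the loop starts, the time between consecutive sends depends only on send_request, which is the property this PR is after.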

@gracehonv
Contributor Author

@avnishn or @rickyyx would it be possible to get this PR reviewed? Thank you!

Member

@rickyyx rickyyx left a comment

Thank you!!

@rickyyx rickyyx merged commit 03872a4 into ray-project:main Jun 19, 2024
@gracehonv gracehonv deleted the grace_nv/loadgen branch June 19, 2024 20:00
2 participants