
Proposal to reduce the ranging mode duration #315

Closed
arjunsuresh opened this issue Jun 22, 2023 · 1 comment

@arjunsuresh (Contributor)

An email was sent to the power and inference WGs about this; I'm still adding it as an issue for better tracking.

During the last LLM taskforce meeting, the long runtime of the LLMs was raised as a concern, and this is doubled in the case of power runs. I'm giving one of our systems as an example for the gpt-j model, where we would like to submit some open variants.

  1. Runtime for the offline scenario: ~13 hours (at 450 W GPU power). If we run 4 models, that is more than 2 days for just the offline scenario on a single system, and a complete ranging mode run adds at least another 2 days. This will be far worse for those doing closed models, and we are not even talking about the GPT-3 model here.

To find the optimal duration for the ranging mode, I collected the data below from the 3.0 inference results. With just a 2-minute ranging mode run, the worst-case power delta compared to the full duration is ~10%. Unlike the inference 3.0 round, the current master branch already adds a 10% margin to the current measured during ranging mode, so even if we reduce the ranging mode duration to 2 minutes, the results should not change, and it can benefit all power submitters. Of course, to be extra safe we could raise the margin on the ranging-mode current to 1.25x or extend the ranging mode to 5 minutes, but if we are forced to do a full ranging mode run, we won't be able to do any power submissions for LLMs.
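To make the comparison concrete, here is a minimal sketch (not the actual analysis script; the trace format and numbers are made up) of how the delta between a short ranging window and the full-run average power can be computed from a series of power samples:

```python
import math

def avg_power_delta(samples, window_s, interval_s=1.0):
    """samples: power readings (watts) at fixed interval_s spacing.
    Returns the relative delta between the windowed average and the
    full-run average power."""
    n = max(1, int(window_s / interval_s))
    window = samples[:n]
    window_avg = sum(window) / len(window)
    full_avg = sum(samples) / len(samples)
    return abs(window_avg - full_avg) / full_avg

# Toy trace (assumed, 1 sample/second for ~13 hours): roughly steady
# ~450 W with a slow oscillation.  A 2-minute window then lands close
# to the full-run average.
samples = [450 + 5 * math.sin(i / 600) for i in range(46800)]
delta = avg_power_delta(samples, window_s=120)
print(f"2-minute window delta: {delta:.2%}")
```

Running this across the actual 3.0 logs per system is what produced the worst-case ~10% figure above.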

Action item (AI): No code change is needed to reduce the ranging mode duration, but submitters must be allowed to use different user_conf files for the ranging and testing mode runs. We already have code in power-dev that checks the avg_power delta between the ranging and testing modes, so this change should be completely safe.

Power Data from inference 3.0

In the graph below, the x-axis shows the avg_power for the specified durations, and the y-axis shows the delta of the avg_power for the given duration compared to the average power over the entire run.

[Graph: avg_power delta vs. ranging duration, from the inference 3.0 power data]

@arjunsuresh (Contributor, Author)

This could be done in a much cleaner way inside loadgen, making it transparent to users, if loadgen could identify the ranging and testing mode runs. Since that is not currently possible, we tried this mechanism and it worked well.
