An email was sent to the power and inference WGs about this; I'm also filing it as an issue for better tracking.
During the last LLM taskforce meeting, the long runtime of the LLMs was raised as a concern, and the runtime doubles in the case of power runs. As an example, here are numbers from one of our systems for the gpt-j model, where we would like to submit some open variants.
Runtime for the Offline scenario: ~13 hours (at 450 W GPU power). If we run 4 models, that is more than 2 days for just the Offline scenario on a single system, and a full-length ranging-mode run adds at least another 2 days. This will be far worse for those doing closed models, and we are not even talking about the GPT-3 model here.
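To make the scale of the problem concrete, the arithmetic above can be sketched as follows (illustrative numbers from the gpt-j example; the variable names are mine, not from any MLPerf tooling):

```python
# Back-of-the-envelope runtime estimate for a power submission:
# a full-length ranging run doubles the measurement time.
OFFLINE_RUN_HOURS = 13   # one Offline scenario run (~450 W GPU)
NUM_MODELS = 4

testing_hours = OFFLINE_RUN_HOURS * NUM_MODELS   # 52 h, i.e. >2 days
ranging_hours = testing_hours                    # full-length ranging run
total_hours = testing_hours + ranging_hours

print(f"testing: {testing_hours} h, ranging: {ranging_hours} h, "
      f"total: {total_hours} h (~{total_hours / 24:.1f} days)")
```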
To find the optimal duration of the ranging mode, I collected the data below from the inference 3.0 results. With just a 2-minute ranging run, the worst-case delta in average power compared to the full duration is ~10%. Unlike in the inference 3.0 round, the current master branch already adds a 10% margin to the current measured during the ranging mode, so reducing the ranging duration to 2 minutes would not change any results, and it could benefit all power submitters. To be extra safe, we could instead apply a 1.25 multiplier to the ranging current, or extend the ranging run to 5 minutes; but if a full-length ranging run remains mandatory, we won't be able to make any power submissions for LLMs.
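The analysis behind these numbers can be sketched as comparing the average power over a short initial window against the average over the entire run. This is only an illustration of the method with synthetic data; the real analysis used the inference 3.0 power logs, and the function name here is hypothetical:

```python
# Compare a short ranging window's average power against the full-run average.
def avg_power_delta(samples_w, window_s, sample_period_s=1.0):
    """Relative delta between the windowed average and the full-run average."""
    n = int(window_s / sample_period_s)
    window_avg = sum(samples_w[:n]) / n
    full_avg = sum(samples_w) / len(samples_w)
    return abs(window_avg - full_avg) / full_avg

# Synthetic 10-minute trace, drifting from 440 W up to 460 W over the run:
trace = [440 + 20 * t / 600 for t in range(600)]
delta_2min = avg_power_delta(trace, window_s=120)
# If this delta stays within the 10% margin already added to the ranging
# current, a 2-minute ranging run would not change the measured result.
print(f"2-minute window delta: {delta_2min:.1%}")
```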
Action item: No code change is needed to reduce the ranging-mode duration, but submitters must be allowed to use different user_conf files for the ranging and testing runs. We already have code in power-dev that checks the avg_power delta between the ranging and testing modes, so this change should be completely safe.
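The safety check referred to above can be illustrated with a minimal sketch. power-dev contains the actual implementation of this comparison; the function name and threshold below are assumptions for illustration, not power-dev's API:

```python
# Sketch: a shortened ranging run stays safe as long as the testing-mode
# average power fits under the margin added to the ranging-mode measurement.
def ranging_covers_testing(ranging_avg_w, testing_avg_w, margin=0.10):
    """True if testing-mode avg power is within the headroom (default +10%)
    added on top of the ranging-mode measurement."""
    return testing_avg_w <= ranging_avg_w * (1.0 + margin)

assert ranging_covers_testing(450.0, 470.0)      # within the +10% headroom
assert not ranging_covers_testing(450.0, 510.0)  # exceeds it -> flag the run
```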
Power Data from inference 3.0

In the graph below, the X-axis shows avg_power for the specified durations, and the Y-axis shows the delta between the avg_power over that duration and the average power over the entire run.
This could be done much more cleanly inside LoadGen, making it transparent to users, if LoadGen could distinguish ranging-mode runs from testing-mode runs. Since that is not currently possible, we tried this mechanism and it worked well.