Error when calculating TTC and pilot running time in ra.session #25
Comments
As a follow up, same stack:
This is not as outrageous as 100 hours, but at 17 hours it is still very much an outlier. Digging a bit deeper, I see that TTC is consistent:
And so is the total time spent executing units:
I check the Tq for each pilot:
I see several offenders, among which pilot.0030 seems to be responsible for the 17-hour spike. I check whether any unit has been executing on that pilot:
And I check whether the execution time (Tx) of those units is smaller than Tq, just in case something went wrong with calculating the time overlap among pilots' Tq and units' Tx:
Finally I check whether these Tx are analogous to those of a pilot with a much lower Tq:
So, in summary, it really seems we have hit Tq on OSG after all :) Since it is an outlier by a factor of 100, I would not mind if you had a look too in the context of this ticket. Maybe this kind of digging can also give us some ideas about how to expand our consistency checking.
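For the consistency checking idea, the manual steps above (find pilots with an outlier Tq, list the units that ran on them, compare each unit's Tx with its pilot's Tq) could be sketched roughly as follows. This is only an illustration in plain Python: the `Pilot` and `Unit` records, the function names, the outlier `factor`, and all the numbers are hypothetical and are not taken from radical.analytics or from the experiment data.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Pilot:
    uid: str
    t_queue: float  # Tq in seconds (hypothetical record, not an RA type)


@dataclass
class Unit:
    uid: str
    pilot: str      # uid of the pilot the unit executed on
    t_exec: float   # Tx in seconds


def tq_outliers(pilots, factor=10.0):
    """Return pilots whose Tq exceeds `factor` times the median Tq."""
    threshold = factor * median(p.t_queue for p in pilots)
    return [p for p in pilots if p.t_queue > threshold]


def units_on(pilot_uid, units):
    """Return the units that executed on the given pilot."""
    return [u for u in units if u.pilot == pilot_uid]


def units_with_tx_not_below_tq(pilots, units):
    """Return units whose Tx is not smaller than their pilot's Tq."""
    tq = {p.uid: p.t_queue for p in pilots}
    return [u for u in units if u.pilot in tq and u.t_exec >= tq[u.pilot]]


# Made-up values for illustration only: 61200 s is 17 hours.
pilots = [
    Pilot("pilot.0001", 600.0),
    Pilot("pilot.0002", 900.0),
    Pilot("pilot.0030", 61200.0),
]
units = [
    Unit("unit.000001", "pilot.0030", 850.0),
    Unit("unit.000002", "pilot.0001", 400.0),
]

outliers = tq_outliers(pilots)
suspicious = units_with_tx_not_below_tq(pilots, units)
```

With these made-up numbers, `outliers` contains only pilot.0030 and `suspicious` is empty, which would match the situation described here: a pilot with a genuinely long Tq whose units have ordinary Tx.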
Same stack as above, same location of the experiment data. Confirmed that for the same two sessions described in the first message above, the execution time ( |
I think this is related to radical-cybertools/radical.pilot#1117
Data for replication at: https://github.com/radical-experiments/AIMES-Experience/tree/master/OSG/analysis
Errors:
and
It does feel like 100 years but I doubt that corresponds to actual time ;)