Fixed '0mb' pbs.size bug, added PTL test #2287
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Describe Bug or Feature
Unfortunately the base types library has a routine to_bytes that takes a size but only supports the "k" multiplier.
That routine is used in transform_sizes which is called in many places, and there it is called when a quantity is smaller than 10kb.
No doubt the author of the code thought "if it's smaller than 10kb then surely it can't use a multiplier larger than 'k'" but alas, that is forgetting about zero bytes, which is a round number for all multipliers, i.e. also 'm', 'g', 't' and 'p'.
The easiest way to get into trouble is to try to assign a pbs.size("0mb") to a node resource in a hook. That will crash with an exception:
03/03/2021 21:53:15.824408;0001;pbs_python;Svr;pbs_python;PBS server internal error (15011) in Error evaluating Python script, <class 'ValueError'>
03/03/2021 21:53:15.824413;0001;pbs_python;Svr;pbs_python;PBS server internal error (15011) in Error evaluating Python script, invalid literal for int() with base 10: '0m'
In another hook I have a more complete backtrace (this is 2020.1.1):
03/03/2021 20:01:12.420402;0080;pbs_python;Hook;pbs_python;['Traceback (most recent call last):', ' File "", line 6122, in main', ' File "", line 1016, in invoke_handler', ' File "", line 1228, in _exechost_startup_handler', ' File "", line 2601, in crea
te_vnodes', ' File "/opt/pbs/lib/python/altair/pbs/v1/base_types.py", line 1602, in _setitem', ' setattr(self, resname, resval)', ' File "/opt/pbs/lib/python/altair
/pbs/v1/base_types.py", line 1661, in __setattr', ' super().setattr(name, value)', ' File "/opt/pbs/lib/python/altair/pbs/v1/_base_types.py", line 188, in _set
', ' elif (value is None) or (value == "")
', ' File "/opt/pbs/lib/python/altair/pbs/v1/base_types.py", line 479, in _eq', ' so = transform_sizes(self, other)',
' File "/opt/pbs/lib/python/altair/pbs/v1/_base_types.py", line 358, in transform_sizes', ' s_num = to_bytes(s) + 1073741824', ' File "/opt/pbs/lib/python/altair/pbs/v
1/_base_types.py", line 339, in to_bytes', ' s_num = int(s_str)', "ValueError: invalid literal for int() with base 10: '0m'"]
There is another dangerous construct in there: (value == "") is being tried even when value is not a string. In this case it calls the eq for the size class, and that calls transform_sizes which in turn calls the incomplete to_bytes.
But if we compare to the empty string, we'd better guard against 'value' not being a string in the first place; calling the pbs.size eq is nonsensical.
Describe Your Change
Define basestring if necessary (in Python 3 it's undefined and shoud just be set to be synonymous to str) and see if a value is of that type before comparing it to "" to avoid pbs.size comparisons with the empty string.
Extend to_bytes function to deal with suffixes 'm', 'g', 't' and 'p' (the valid ones as per the Reference Guide), to get rid of the incorrect assumption 'values under 10k cannot use large suffixes", which is true for strictly positive sizes but not 0.
Link to Design Doc
Pure bug fix.
Attach Test and Valgrind Logs/Output
Added a test in pbs_test_pbs_python.py:
Test_pbs_python.test_pbs_python_zero_size
Without the fix:
2021-03-09 11:37:09,364 INFO created test python script /tmp/test_0mbubpzhyrw.py
2021-03-09 11:37:09,364 INFO created dummy input file /tmp/dummyhrglixtg.in
2021-03-09 11:37:09,364 INFO running ['/opt/pbs/bin/pbs_python', '--hook', '-i', '/tmp/dummyhrglixtg.in', '/tmp/test_0mbubpzhyrw.py']
2021-03-09 11:37:09,364 INFOCLI2 dragon: sudo -H /opt/pbs/bin/pbs_python --hook -i /tmp/dummyhrglixtg.in /tmp/test_0mbubpzhyrw.py
[...]
2021-03-09 11:37:09,926 INFO ======================================================================
2021-03-09 11:37:09,926 INFO FAILED: test_pbs_python_zero_size (tests.functional.pbs_python_test.Test_pbs_python)
2021-03-09 11:37:09,926 INFO m_oo_m
2021-03-09 11:37:09,926 INFO Traceback (most recent call last):
File "/home/alexis/git/github/alexis-cousein/pbspro/test/tests/functional/pbs_python_test.py", line 98, in test_pbs_python_zero_size
self.assertTrue(msg in combined_output, "Test hooklet rejected event")
AssertionError: False is not true : Test hooklet rejected event
The rejection is the result of an assertion which is on stderr of the process that ran pbs_python:
pbs_python: PBS server internal error (15011) in Error evaluating Python script, <class 'ValueError'>
pbs_python: PBS server internal error (15011) in Error evaluating Python script, invalid literal for int() with base 10: '0m'
With the fix:
2021-03-09 11:41:01,705 INFO created test python script /tmp/test_0mbe0vyvl3j.py
2021-03-09 11:41:01,705 INFO created dummy input file /tmp/dummy621_qvww.in
2021-03-09 11:41:01,705 INFO running ['/opt/pbs/bin/pbs_python', '--hook', '-i', '/tmp/dummy621_qvww.in', '/tmp/test_0mbe0vyvl3j.py']
2021-03-09 11:41:01,705 INFOCLI2 dragon: sudo -H /opt/pbs/bin/pbs_python --hook -i /tmp/dummy621_qvww.in /tmp/test_0mbe0vyvl3j.py
2021-03-09 11:41:01,733 INFO Test hooklet accepted event