fix timer hostname bug #3043
what a pain to debug! thanks for the fix. we'll backport this for 1.8.2
Thank you! Hopefully it helps someone else
Thanks for the patch. The Linux man page states that
On Windows the maximum length is 256 bytes. Also, one can remove some of the hard-coding. I will file a PR.
Description
This PR fixes a bug that @swarthout and I ran into while running psi4 on AWS.
A non-negligible fraction of our psi4 calculations that run through qcschema (e.g. many body, cbs) result in the following non-deterministic error:
This error tells us that psi4 is unable to read its own `timer.dat` file.

Upon further examination of these problematic `timer.dat` files, we noticed that the "host" field appears to be corrupted. Here is one such corrupted header of a `timer.dat` file, represented w/ a latin-1 encoding (since it can't be read w/ the standard utf-8 encoding):

In all of these problematic `timer.dat` files, the host name is truncated and ends with a random assortment of bytes. In the above example, the full host name should be `ip-172-31-XX-XXX.us-east-2.compute.internal`.

We then examined how psi4 determines and processes the host name. It turns out, psi4 uses the `gethostname` function from the C API to get up to the first 40 bytes of the host name, and then it writes those bytes to `timer.dat`. The host name of this particular compute cluster is over 40 chars/bytes. This is unsafe because if a host name has more than 40 characters, the null byte (`\0`) won't be written to `timer.dat` to signify the end of the string, and psi4 will continue to write whatever is in memory past the 40 chars/bytes until it hits a null byte. This also explains the original error, b/c writing random bytes to a file can lead to non-utf-8-compliant files.

It turns out that Linux defines a maximum host name length of 64, so the easy fix here is to just increase the size of the host name buffer from 40 to 65 (== 64 + 1 for the null byte at the end). I have no idea why this length was previously limited to 40.
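
To make the failure mode concrete, here is a small pure-Python simulation of the buffer behavior described above. It is only a sketch, not psi4's actual code: the helper names `write_host_field` and `read_c_string` and the `GARBAGE` bytes are hypothetical, and the "memory after the buffer" is modeled as an ordinary byte string.

```python
# Simulation only: models a fixed-size C host-name buffer with Python bytes.
# All names and the GARBAGE bytes below are hypothetical illustrations.

def write_host_field(host: str, buf_size: int, memory_after: bytes) -> bytes:
    """Model a truncating copy of `host` into a `buf_size`-byte buffer.

    The result is the buffer followed by `memory_after`, which stands in
    for whatever happens to sit in memory right behind the buffer.
    """
    raw = host.encode("ascii")
    field = raw[:buf_size]                       # at most buf_size bytes fit
    field += b"\x00" * (buf_size - len(field))   # a NUL is present only if there is room
    return field + memory_after


def read_c_string(memory: bytes) -> bytes:
    """Read the way C string routines do: stop at the first NUL byte."""
    end = memory.find(b"\x00")
    return memory if end == -1 else memory[:end]


HOST = "ip-172-31-XX-XXX.us-east-2.compute.internal"  # 43 characters
GARBAGE = b"\xe9\xff\x10\x00"                          # hypothetical stray bytes

# Old behavior: 40-byte buffer, no room for the terminating NUL.
old_field = read_c_string(write_host_field(HOST, 40, GARBAGE))

# Fixed behavior: 65-byte buffer (64 + 1 for the NUL).
new_field = read_c_string(write_host_field(HOST, 65, GARBAGE))


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

In this model, `old_field` holds the truncated name plus the stray bytes up to the next NUL, so `is_valid_utf8(old_field)` is false while `old_field.decode("latin-1")` still succeeds, which matches the symptom seen in the corrupted headers. With the 65-byte buffer, `new_field` is exactly the full host name.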
User API & Changelog headlines
`UnicodeDecodeError` and corrupted `timer.dat` files
Dev notes & details
`timer.dat`
Checklist
Status