Description
This PR fixes a bug that @swarthout and I ran into while running psi4 on AWS.
A non-negligible fraction of our psi4 calculations that run through qcschema (e.g. many body, cbs) result in the following non-deterministic error:
This error tells us that psi4 is unable to read its own `timer.dat` file.
Upon further examination of these problematic `timer.dat` files, we noticed that the "host" field appears to be corrupted. Here is one such corrupted header of a `timer.dat` file, represented w/ a latin-1 encoding (since it can't be read w/ the standard utf-8 encoding):
In all of these problematic `timer.dat` files, the host name is truncated and ends with a random assortment of bytes. In the above example, the full host name should be `ip-172-31-XX-XXX.us-east-2.compute.internal`.
We then examined how psi4 determines and processes the host name. It turns out, psi4 uses the `gethostname` function from the C API to get up to the first 40 bytes of the host name, and then it writes those bytes to `timer.dat`. The host name of this particular compute cluster is over 40 chars/bytes. This is unsafe because if a host name has more than 40 characters, the truncated name in the buffer is not guaranteed to end with a null byte (`\0`), so nothing marks the end of the string, and psi4 will continue to write whatever is in memory past the 40 chars/bytes to `timer.dat` until it hits a null byte. This also explains the original error, b/c writing random bytes to a file can lead to non-utf-8-compliant files.
It turns out that Linux defines a maximum host name length of 64, so the easy fix here is to just increase the size of the host name buffer from 40 to 65 (== 64 + 1 for the null byte at the end). I have no idea why this length was previously limited to 40.
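To make the failure mode concrete, here is a rough Python model of what is described above. It is not psi4 code and does not call the real `gethostname`; the helper names (`read_c_string`, `write_host_field`) and the `GARBAGE` bytes are invented for illustration. It assumes the truncated name is stored without a null terminator (POSIX leaves this unspecified) and that stray non-null bytes happen to sit right after the buffer.

```python
# Rough model of the bug: NOT psi4 code, all names here are hypothetical.
HOST_NAME_MAX = 64  # Linux limit on host name length, excluding the null byte

HOSTNAME = b"ip-172-31-XX-XXX.us-east-2.compute.internal"  # 43 bytes
GARBAGE = b"\xff\xfe\x9c"  # made-up stand-in for whatever follows the buffer


def read_c_string(memory: bytes) -> bytes:
    """Read bytes up to (not including) the first null byte, like C's %s."""
    return memory[: memory.index(b"\x00")]


def write_host_field(hostname: bytes, buf_len: int, trailing: bytes) -> bytes:
    """Return what would be written for the host field with a buf_len buffer.

    The name is truncated to buf_len bytes; null padding is only present
    when the name is shorter than the buffer (assumed behaviour).
    """
    buf = hostname[:buf_len].ljust(buf_len, b"\x00")
    # Memory after the buffer: stray bytes, then eventually a null byte.
    memory = buf + trailing + b"\x00"
    return read_c_string(memory)


if __name__ == "__main__":
    bad = write_host_field(HOSTNAME, 40, GARBAGE)
    print(bad.decode("latin-1"))  # latin-1 can decode any byte sequence
    try:
        bad.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("utf-8 decode failed:", exc)

    # With room for HOST_NAME_MAX bytes plus the terminator, the read
    # stops at the null byte inside the buffer.
    good = write_host_field(HOSTNAME, HOST_NAME_MAX + 1, GARBAGE)
    print(good.decode("utf-8"))
```

With a 40-byte buffer, the model returns the first 40 bytes of the host name followed by the stray bytes, which is the same shape as the corrupted headers described above. With a 65-byte buffer, the full name is followed by null padding, so the read ends cleanly.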
User API & Changelog headlines
`UnicodeDecodeError` and corrupted `timer.dat` files
Dev notes & details
`timer.dat`
Checklist
Status