Hello, I was running the evaluation locally and kept seeing the entire eval crash for branches I knew were correct. I traced it back to container.py:85:
```python
output = result.stdout + result.stderr
```
So every command goes through `execute()`, and every `execute()` call uses `bash -lc`, which starts a login shell. When `bash -lc` runs inside a Docker container without a TTY, sourcing the login profile scripts tends to emit a line or two to stderr, such as `mesg: ttyname failed: No such device` or `bash: cannot set terminal process group (-1): Inappropriate ioctl for device`. These are harmless for most steps, since their output is only logged. But `_run_test_branch` uses `r["output"]` to pull the JUnit XML out of the container (`eval.py:604`), like this:
```python
r = self._run_step("cat eval/results.xml", ..., step_name="results_read", ...)
return r["output"]
```
So the string Python receives looks like:
```
<?xml version="1.0" encoding="utf-8"?>
<testsuites>...</testsuites>
mesg: ttyname failed: No such device
```
`ET.fromstring()` rejects this because any content after the root element's closing tag makes the document invalid XML, so the step fails with `XmlParseError`.
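A minimal standalone reproduction of the parse failure, using only the standard library. The XML body and the helper name `parses` are made up for illustration; the stderr line is one of the messages quoted above.

```python
import xml.etree.ElementTree as ET

# Simulated stdout of `cat eval/results.xml` (contents abbreviated, hypothetical).
stdout = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<testsuites><testsuite name="example" tests="1"/></testsuites>\n'
)
# A line a non-TTY login shell can write to stderr.
stderr = "mesg: ttyname failed: No such device\n"

# What callers currently see as r["output"]: both streams concatenated.
combined = stdout + stderr


def parses(text: str) -> bool:
    """Return True if text is a single well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print("combined parses:", parses(combined))
print("stdout parses:  ", parses(stdout))
```

The stdlib parser reports the trailing stderr line as "junk after document element", while stdout on its own parses cleanly.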
The fix is straightforward: return stdout and stderr separately from `execute()`, so that callers like `_run_test_branch` can parse stdout on its own.
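A rough sketch of what that could look like. `ExecResult` is a hypothetical name, and the real `execute()` presumably wraps something like `["docker", "exec", <container>, "bash", "-lc", cmd]`; that argv is an assumption, so the demo below runs a local Python subprocess that writes XML to stdout and a profile-style warning to stderr.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ExecResult:
    """Hypothetical return type for execute(): the two streams stay separate."""
    returncode: int
    stdout: str
    stderr: str

    @property
    def output(self) -> str:
        # Combined text, still useful for logging; do not parse it.
        return self.stdout + self.stderr


def execute(argv: list[str]) -> ExecResult:
    """Run argv and capture stdout and stderr independently.

    Assumption: in container.py, argv would be a `docker exec ... bash -lc`
    invocation; here any command works.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    return ExecResult(proc.returncode, proc.stdout, proc.stderr)


if __name__ == "__main__":
    # Stand-in for `cat eval/results.xml` under a noisy login shell.
    script = (
        "import sys\n"
        "print('<testsuites><testsuite/></testsuites>')\n"
        "print('mesg: ttyname failed: No such device', file=sys.stderr)\n"
    )
    r = execute([sys.executable, "-c", script])
    root = ET.fromstring(r.stdout)  # parse stdout only
    print(root.tag, "| stderr was:", r.stderr.strip())
```

On the caller side, `_run_test_branch` would then return the stdout field instead of `r["output"]`; the exact key or attribute name depends on how `_run_step` wraps the result. Keeping a combined `output` property means existing log-only steps need no changes.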