AMD64 Debian PGO 3.x, AMD64 Clang UBSan 2.7 buildbots: No space left on device #82450
Comments
https://buildbot.python.org/all/#/builders/47/builds/3578 This issue was found via #60563. The root cause of this failure is a lack of disk space: OSError: [Errno 28] No space left on device: '/tmp/tmpnmcjxia9/bin/python' -> '/tmp/tmpnmcjxia9/bin/python3' |
I think there was a bug in the past in regrtest or tempfile where the temporary files for tests were not deleted, which led to the disk filling up on several buildbots. |
I contacted Gregory P. Smith, the buildbot worker owner, to ask him to have a look.
regrtest now has a --cleanup option to remove all build/test_python_xxx directories. But buildbot cannot run it on a worker that allows multiple jobs in parallel, since it would also remove the temporary directories of the other running jobs... I recently fixed many bugs in regrtest to reduce the risk of leaving temporary files on disk. |
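One way around the parallel-job problem described above would be to remove only directories that look abandoned. The following is a minimal sketch under stated assumptions, not what regrtest's --cleanup does: the function name `remove_stale_test_dirs` and the age threshold are hypothetical, and it relies on the directory's own modification time, which only changes when entries directly inside it are created or removed. A long-running job could therefore still be misjudged as stale, so the threshold would need to be generous.

```python
import shutil
import time
from pathlib import Path


def remove_stale_test_dirs(build_dir, max_age_seconds, now=None):
    """Remove test_python_* directories untouched for max_age_seconds.

    Hypothetical helper: it is not part of regrtest. Returns the list of
    removed paths, in sorted order.
    """
    now = time.time() if now is None else now
    removed = []
    for path in sorted(Path(build_dir).glob("test_python_*")):
        if not path.is_dir():
            continue
        # Age is measured from the directory's own mtime (an assumption:
        # a live job is expected to have touched it recently).
        age = now - path.stat().st_mtime
        if age > max_age_seconds:
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
    return removed
```

For example, `remove_stale_test_dirs("build", 6 * 3600)` would leave alone any directory modified in the last six hours.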
It appears that something in the buildbot configuration (typo?) has changed, which caused an entire new set of directories to be created for the builder:
@clang-ubsan:/var/lib/buildbot/clang-ubsan$ ls -al
Notice the directories named -clang-ubsan.clang-usban that appeared on September 19th.
From twistd.log:
2019-09-19 14:33:20+0000 [Broker,client] Lost connection to buildbot.python.org:9020
2019-09-19 14:33:20+0000 [Broker,client] <twisted.internet.tcp.Connector instance at 0x7fae10e69c68> will retry in 3 seconds
2019-09-19 14:33:20+0000 [-] Stopping factory <buildslave.bot.BotFactory instance at 0x7fae10e697a0>
2019-09-19 14:33:23+0000 [-] Starting factory <buildslave.bot.BotFactory instance at 0x7fae10e697a0>
2019-09-19 14:33:23+0000 [-] Connecting to buildbot.python.org:9020
2019-09-19 14:33:23+0000 [Uninitialized] Connection to buildbot.python.org:9020 failed: Connection Refused
2019-09-19 14:33:23+0000 [Uninitialized] <twisted.internet.tcp.Connector instance at 0x7fae10e69c68> will retry in 8 seconds
2019-09-19 14:33:23+0000 [-] Stopping factory <buildslave.bot.BotFactory instance at 0x7fae10e697a0>
2019-09-19 14:33:32+0000 [-] Starting factory <buildslave.bot.BotFactory instance at 0x7fae10e697a0>
2019-09-19 14:33:32+0000 [-] Connecting to buildbot.python.org:9020
2019-09-19 14:33:32+0000 [Uninitialized] Connection to buildbot.python.org:9020 failed: Connection Refused
2019-09-19 14:33:32+0000 [Uninitialized] <twisted.internet.tcp.Connector instance at 0x7fae10e69c68> will retry in 22 seconds
2019-09-19 14:33:32+0000 [-] Stopping factory <buildslave.bot.BotFactory instance at 0x7fae10e697a0>
2019-09-19 14:33:54+0000 [-] Starting factory <buildslave.bot.BotFactory instance at 0x7fae10e697a0>
2019-09-19 14:33:54+0000 [-] Connecting to buildbot.python.org:9020
2019-09-19 14:33:55+0000 [Broker,client] message from master: attached
2019-09-19 14:33:55+0000 [Broker,client] Peer will receive following PB traceback:
2019-09-19 14:33:55+0000 [Broker,client] Unhandled Error
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/twisted/spread/banana.py", line 173, in gotItem
self.callExpressionReceived(item)
File "/usr/lib/python2.7/dist-packages/twisted/spread/banana.py", line 136, in callExpressionReceived
self.expressionReceived(obj)
File "/usr/lib/python2.7/dist-packages/twisted/spread/pb.py", line 575, in expressionReceived
method(*sexp[1:])
File "/usr/lib/python2.7/dist-packages/twisted/spread/pb.py", line 896, in proto_message
self._recvMessage(self.localObjectForID, requestID, objectID, message, answerRequired, netArgs, netKw)
--- <exception caught here>
2019-09-19 14:33:55+0000 [Broker,client] message from master: buildbot-slave detected, failing back to deprecated buildslave API. (I |
python/buildmaster-config#108 is to blame. |
Yep that was my change as some jobs couldn't be run on the same worker due to the configs using the same directory. |
I'm not going to spend time manually deleting the unused build directories until the typo in the new buildsuffix that caused the disk to fill up is fixed. I don't even understand why the buildsuffix entries were added. It doesn't matter on these systems and the original PR doesn't link to any issue describing why. |
From a yet another one of a plethora of reasons to hate buildbot point of view... A _log message_ saying "i'm not using this anymore, you can delete it" is infinitely worse than just going ahead and automatically deleting it. I shouldn't, as a human, have needed to be involved for this one. :P |
Let me know when PR 111 is deployed on the build master so I can log in and clean up the current typo names. Otherwise, things are probably running fine for the moment. |
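The manual cleanup mentioned above could be preceded by a dry-run listing of leftover directories. This is only a sketch: `find_orphan_dirs` and the list of expected names are hypothetical, and nothing here comes from buildbot itself. The expected names would have to include every current build directory plus any non-build directories the worker keeps in its base directory. It only reports candidates and deletes nothing.

```python
from pathlib import Path


def find_orphan_dirs(base_dir, expected_names):
    """Return sorted names of subdirectories not listed in expected_names.

    Hypothetical helper: it only reports candidates, it never deletes.
    """
    expected = set(expected_names)
    return sorted(
        entry.name
        for entry in Path(base_dir).iterdir()
        if entry.is_dir() and entry.name not in expected
    )
```

A human can then review the returned names before removing anything.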
It was already explained that the build directories were duplicated, though the explanation could indeed be more verbose. When a config is used alongside another config that inherits from it, buildbot aborts with an error because both try to compile CPython in the same directory. |
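The directory clash described above can be illustrated with a small check over builder definitions. This is a sketch under assumptions: the `(builder_name, worker_name, builddir)` tuples and the function `find_builddir_collisions` are hypothetical and do not mirror the actual buildmaster-config layout.

```python
from collections import defaultdict


def find_builddir_collisions(builders):
    """Map each (worker, builddir) pair used by several builders to their names.

    `builders` is an iterable of hypothetical
    (builder_name, worker_name, builddir) tuples.
    """
    users = defaultdict(list)
    for builder_name, worker_name, builddir in builders:
        users[(worker_name, builddir)].append(builder_name)
    # Keep only the pairs that more than one builder would build in.
    return {key: names for key, names in users.items() if len(names) > 1}
```

Giving each builder a distinct suffix makes the result empty, which is presumably what the buildsuffix entries were meant to achieve.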
Done ( python/buildmaster-config#111 ) |
@gregory.p.smith @vstinner |
When it's an issue about specific buildbots, I prefer to first check that the buildbot is back to green. AMD64 Debian PGO 3.x buildbot is back to green. AMD64 Clang UBSan 2.7 is back to green. So yes, it seems like the disk has free space again ;-) |