AMD64 Debian PGO 3.x, AMD64 Clang UBSan 2.7 buildbots: No space left on device #82450

Closed
corona10 opened this issue Sep 25, 2019 · 14 comments

@corona10
Member

BPO 38269
Nosy @gpshead, @vstinner, @zooba, @stratakis, @corona10, @pablogsal, @tirkarthi

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/gpshead'
closed_at = <Date 2019-09-27.09:18:57.932>
created_at = <Date 2019-09-25.07:40:46.139>
labels = ['tests', '3.9']
title = 'AMD64 Debian PGO 3.x, AMD64 Clang UBSan 2.7 buildbots: No space left on device'
updated_at = <Date 2019-09-27.09:18:57.931>
user = 'https://github.com/corona10'

bugs.python.org fields:

activity = <Date 2019-09-27.09:18:57.931>
actor = 'vstinner'
assignee = 'gregory.p.smith'
closed = True
closed_date = <Date 2019-09-27.09:18:57.932>
closer = 'vstinner'
components = ['Tests']
creation = <Date 2019-09-25.07:40:46.139>
creator = 'corona10'
dependencies = []
files = []
hgrepos = []
issue_num = 38269
keywords = []
message_count = 14.0
messages = ['353154', '353155', '353163', '353164', '353229', '353230', '353231', '353232', '353233', '353234', '353235', '353237', '353346', '353350']
nosy_count = 7.0
nosy_names = ['gregory.p.smith', 'vstinner', 'steve.dower', 'cstratak', 'corona10', 'pablogsal', 'xtreak']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue38269'
versions = ['Python 3.9']

@corona10
Member Author

https://buildbot.python.org/all/#/builders/47/builds/3578

This issue was found from #60563

The root cause of this failure is a lack of disk space.

OSError: [Errno 28] No space left on device: '/tmp/tmpnmcjxia9/bin/python' -> '/tmp/tmpnmcjxia9/bin/python3'
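For anyone who wants to catch this condition before a test run starts, here is a minimal sketch of a free-space check. The helper name `has_free_space` and the threshold are my own assumptions for illustration; nothing like this is part of regrtest or buildbot as far as this thread shows.

```python
import shutil


def has_free_space(path, min_free_bytes):
    """Return True if the filesystem holding `path` has at least
    `min_free_bytes` bytes available (hypothetical helper)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return usage.free >= min_free_bytes


if __name__ == "__main__":
    import tempfile

    tmp = tempfile.gettempdir()
    # 1 GiB is an arbitrary threshold chosen only for this example.
    if not has_free_space(tmp, 1024 ** 3):
        print(f"warning: less than 1 GiB free under {tmp}")
```

A check like this only reports the problem earlier; it does not explain why the disk filled up.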

@corona10 added the 3.9 (only security fixes) and tests (Tests in the Lib/test dir) labels on Sep 25, 2019
@tirkarthi
Member

I think there was a bug in the past in regrtest or tempfile where the temporary files for tests were not deleted, which led to disks filling up on several buildbots.

@vstinner
Member

I contacted Gregory P. Smith, the buildbot worker owner, to ask him to have a look.

> I think there was a bug in the past in regrtest or tempfile where the temporary files for tests were not deleted, which led to disks filling up on several buildbots.

regrtest now has a --cleanup option that removes all build/test_python_xxx directories. But buildbot cannot run it on a worker that allows multiple jobs in parallel, since it would also remove the temporary directories of the jobs that are still running.

I fixed many bugs in regrtest recently to reduce the risk of leaving temporary files on the disk.
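One way to work around the parallel-jobs problem would be to delete only directories that have not been touched for a while. The sketch below is not what regrtest's --cleanup does; `remove_stale_test_dirs` and the age threshold are hypothetical, and directory mtime is only a rough heuristic (a long-running job may not modify its top-level directory), so treat it as an illustration rather than a safe recipe.

```python
import os
import shutil
import time


def remove_stale_test_dirs(build_dir, max_age_seconds, now=None):
    """Delete test_python_* directories under `build_dir` whose mtime is
    older than `max_age_seconds`. Returns the sorted names removed."""
    now = time.time() if now is None else now
    # Snapshot the entries first so we never delete while iterating.
    with os.scandir(build_dir) as it:
        entries = list(it)
    removed = []
    for entry in entries:
        if not entry.name.startswith("test_python_"):
            continue
        if not entry.is_dir(follow_symlinks=False):
            continue
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age > max_age_seconds:
            shutil.rmtree(entry.path, ignore_errors=True)
            removed.append(entry.name)
    return sorted(removed)
```

For example, `remove_stale_test_dirs("build", 24 * 3600)` would leave alone anything modified within the last day, which is the part that matters for jobs running concurrently.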

@vstinner changed the title from "test_venv failed on AMD64 Debian PGO 3.x" to "AMD64 Debian PGO 3.x, AMD64 Clang UBSan 2.7 buildbots: No space left on device" on Sep 25, 2019
@gpshead self-assigned this on Sep 25, 2019
@gpshead
Member

gpshead commented Sep 25, 2019

It appears that something in the buildbot configuration (typo?) has changed, which caused an entirely new set of directories to be created for the builder:

@clang-ubsan:/var/lib/buildbot/clang-ubsan$ ls -al
total 68056
drwxr-xr-x 14 buildbot buildbot 4096 Sep 19 14:33 .
drwxr-xr-x 5 buildbot buildbot 4096 Sep 24 02:36 ..
drwx------ 3 buildbot buildbot 4096 May 20 2018 2.7.gps-clang-ubsan
drwx------ 3 buildbot buildbot 4096 Sep 24 02:43 2.7.gps-clang-ubsan.clang-usban
drwx------ 3 buildbot buildbot 4096 May 20 2018 3.6.gps-clang-ubsan
drwx------ 3 buildbot buildbot 4096 May 20 2018 3.7.gps-clang-ubsan
drwx------ 3 buildbot buildbot 4096 Sep 19 16:41 3.7.gps-clang-ubsan.clang-usban
drwx------ 3 buildbot buildbot 4096 Jun 4 20:30 3.8.gps-clang-ubsan
drwx------ 3 buildbot buildbot 4096 Sep 19 16:05 3.8.gps-clang-ubsan.clang-usban
drwx------ 3 buildbot buildbot 4096 May 20 2018 3.x.gps-clang-ubsan
drwx------ 3 buildbot buildbot 4096 Sep 19 14:39 3.x.gps-clang-ubsan.clang-usban
-rw------- 1 buildbot buildbot 1333 May 20 2018 buildbot.tac
drwx------ 3 buildbot buildbot 4096 Jun 2 2018 custom.gps-clang-ubsan
drwx------ 2 buildbot buildbot 4096 Sep 19 14:33 custom.gps-clang-ubsan.clang-usban
drwxr-xr-x 2 buildbot buildbot 4096 May 20 2018 info
-rw------- 1 buildbot buildbot 12 May 15 22:14 twistd.hostname
-rw-r--r-- 1 buildbot buildbot 9574124 Sep 25 20:39 twistd.log
-rw-r--r-- 1 buildbot buildbot 10000179 Jul 30 18:35 twistd.log.1
-rw-r--r-- 1 buildbot buildbot 10000101 Jun 2 22:17 twistd.log.2
-rw-r--r-- 1 buildbot buildbot 10000025 Mar 9 2019 twistd.log.3
-rw-r--r-- 1 buildbot buildbot 10000056 Dec 10 2018 twistd.log.4
-rw-r--r-- 1 buildbot buildbot 10000014 Oct 9 2018 twistd.log.5
-rw-r--r-- 1 buildbot buildbot 10000168 Jul 24 2018 twistd.log.6
-rw------- 1 buildbot buildbot 3 May 15 22:14 twistd.pid

Notice the directories named *-clang-ubsan.clang-usban that appeared on September 19th. From twistd.log:

2019-09-19 14:33:20+0000 [Broker,client] Lost connection to buildbot.python.org:9020
2019-09-19 14:33:20+0000 [Broker,client] <twisted.internet.tcp.Connector instance at 0x7fae10e69c68> will retry in 3 seconds
2019-09-19 14:33:20+0000 [-] Stopping factory <buildslave.bot.BotFactory instance at 0x7fae10e697a0>
2019-09-19 14:33:23+0000 [-] Starting factory <buildslave.bot.BotFactory instance at 0x7fae10e697a0>
2019-09-19 14:33:23+0000 [-] Connecting to buildbot.python.org:9020
2019-09-19 14:33:23+0000 [Uninitialized] Connection to buildbot.python.org:9020 failed: Connection Refused
2019-09-19 14:33:23+0000 [Uninitialized] <twisted.internet.tcp.Connector instance at 0x7fae10e69c68> will retry in 8 seconds
2019-09-19 14:33:23+0000 [-] Stopping factory <buildslave.bot.BotFactory instance at 0x7fae10e697a0>
2019-09-19 14:33:32+0000 [-] Starting factory <buildslave.bot.BotFactory instance at 0x7fae10e697a0>
2019-09-19 14:33:32+0000 [-] Connecting to buildbot.python.org:9020
2019-09-19 14:33:32+0000 [Uninitialized] Connection to buildbot.python.org:9020 failed: Connection Refused
2019-09-19 14:33:32+0000 [Uninitialized] <twisted.internet.tcp.Connector instance at 0x7fae10e69c68> will retry in 22 seconds
2019-09-19 14:33:32+0000 [-] Stopping factory <buildslave.bot.BotFactory instance at 0x7fae10e697a0>
2019-09-19 14:33:54+0000 [-] Starting factory <buildslave.bot.BotFactory instance at 0x7fae10e697a0>
2019-09-19 14:33:54+0000 [-] Connecting to buildbot.python.org:9020
2019-09-19 14:33:55+0000 [Broker,client] message from master: attached
2019-09-19 14:33:55+0000 [Broker,client] Peer will receive following PB traceback:
2019-09-19 14:33:55+0000 [Broker,client] Unhandled Error
        Traceback (most recent call last):
          File "/usr/lib/python2.7/dist-packages/twisted/spread/banana.py", line 173, in gotItem
            self.callExpressionReceived(item)
          File "/usr/lib/python2.7/dist-packages/twisted/spread/banana.py", line 136, in callExpressionReceived
            self.expressionReceived(obj)
          File "/usr/lib/python2.7/dist-packages/twisted/spread/pb.py", line 575, in expressionReceived
            method(*sexp[1:])
          File "/usr/lib/python2.7/dist-packages/twisted/spread/pb.py", line 896, in proto_message
            self._recvMessage(self.localObjectForID, requestID, objectID, message, answerRequired, netArgs, netKw)
        --- <exception caught here> ---
          File "/usr/lib/python2.7/dist-packages/twisted/spread/pb.py", line 913, in _recvMessage
            netResult = object.remoteMessageReceived(self, message, netArgs, netKw)
          File "/usr/lib/python2.7/dist-packages/twisted/spread/flavors.py", line 118, in remoteMessageReceived
            raise NoSuchMethod("No such method: remote_%s" % (message,))
        twisted.spread.flavors.NoSuchMethod: No such method: remote_getWorkerInfo

2019-09-19 14:33:55+0000 [Broker,client] message from master: buildbot-slave detected, failing back to deprecated buildslave API. (Ignoring missing getWorkerInfo method.)
2019-09-19 14:33:55+0000 [Broker,client] changing builddir for builder AMD64 Clang UBSan 2.7 from 2.7.gps-clang-ubsan to 2.7.gps-clang-ubsan.clang-usban
2019-09-19 14:33:55+0000 [Broker,client] changing builddir for builder AMD64 Clang UBSan 3.8 from 3.8.gps-clang-ubsan to 3.8.gps-clang-ubsan.clang-usban
2019-09-19 14:33:55+0000 [Broker,client] changing builddir for builder AMD64 Clang UBSan 3.7 from 3.7.gps-clang-ubsan to 3.7.gps-clang-ubsan.clang-usban
2019-09-19 14:33:55+0000 [Broker,client] changing builddir for builder AMD64 Clang UBSan 3.x from 3.x.gps-clang-ubsan to 3.x.gps-clang-ubsan.clang-usban
2019-09-19 14:33:55+0000 [Broker,client] changing builddir for builder AMD64 Clang UBSan custom from custom.gps-clang-ubsan to custom.gps-clang-ubsan.clang-usban
2019-09-19 14:33:55+0000 [Broker,client] I have a leftover directory '3.7.gps-clang-ubsan' that is not being used by the buildmaster: you can delete it now
2019-09-19 14:33:55+0000 [Broker,client] I have a leftover directory 'custom.gps-clang-ubsan' that is not being used by the buildmaster: you can delete it now
2019-09-19 14:33:55+0000 [Broker,client] I have a leftover directory '3.6.gps-clang-ubsan' that is not being used by the buildmaster: you can delete it now
2019-09-19 14:33:55+0000 [Broker,client] I have a leftover directory '3.x.gps-clang-ubsan' that is not being used by the buildmaster: you can delete it now
2019-09-19 14:33:55+0000 [Broker,client] I have a leftover directory '3.8.gps-clang-ubsan' that is not being used by the buildmaster: you can delete it now
2019-09-19 14:33:55+0000 [Broker,client] I have a leftover directory '2.7.gps-clang-ubsan' that is not being used by the buildmaster: you can delete it now
2019-09-19 14:33:56+0000 [Broker,client] message from master: attached
2019-09-19 14:33:56+0000 [Broker,client] message from master: attached
2019-09-19 14:33:56+0000 [Broker,client] message from master: attached
2019-09-19 14:33:56+0000 [Broker,client] message from master: attached
2019-09-19 14:33:56+0000 [Broker,client] message from master: attached
2019-09-19 14:33:56+0000 [Broker,client] Connected to buildbot.python.org:9020; slave is ready
2019-09-19 14:33:56+0000 [Broker,client] sending application-level keepalives every 600 seconds
2019-09-19 14:39:12+0000 [Broker,client] message from master: ping
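The "leftover directory" messages in the log above are easy to miss by eye. A small sketch like the following could pull them out of twistd.log text. The function name `leftover_directories` is hypothetical, and the regular expression assumes the message wording stays exactly as it appears in this log.

```python
import re

# Matches the worker's message as it appears in the log excerpt above.
LEFTOVER_RE = re.compile(
    r"I have a leftover directory '([^']+)' "
    r"that is not being used by the buildmaster"
)


def leftover_directories(log_text):
    """Return leftover directory names mentioned in `log_text`,
    in first-seen order and without duplicates."""
    seen = []
    for match in LEFTOVER_RE.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen
```

The result is only a list of candidates to review. Whether a directory is really safe to delete still depends on the builder configuration being correct, which was not the case here.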

@gpshead
Member

gpshead commented Sep 25, 2019

python/buildmaster-config#108 is to blame.

@stratakis
Mannequin

stratakis mannequin commented Sep 25, 2019

Yep, that was my change: some jobs couldn't run on the same worker because their configs used the same directory.

@gpshead
Member

gpshead commented Sep 25, 2019

I'm not going to spend time manually deleting the unused build directories until the typo in the new buildsuffix that caused the disk to fill up is fixed.

python/buildmaster-config#111

I don't even understand why the buildsuffix entries were added. It doesn't matter on these systems and the original PR doesn't link to any issue describing why.

@gpshead
Member

gpshead commented Sep 25, 2019

From the "yet another one of a plethora of reasons to hate buildbot" point of view... A _log message_ saying "I'm not using this anymore, you can delete it" is infinitely worse than just going ahead and deleting it automatically. I shouldn't, as a human, have needed to be involved in this one. :P

@gpshead
Member

gpshead commented Sep 25, 2019

Let me know when PR 111 is deployed on the build master so I can log in and clean up the current typo names.

Otherwise, things are probably running fine for the moment.

@stratakis
Mannequin

stratakis mannequin commented Sep 25, 2019

It's already explained that the build directories were duplicated, though the explanation could indeed be more detailed. When a config is used alongside another config that inherits from it, buildbot aborts with an error because both try to compile CPython in the same directory.
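To illustrate the collision being described, here is a sketch under stated assumptions. The naming pattern `<branch>.<worker><suffix>` is inferred only from the "changing builddir" lines in the log earlier in this thread; the functions `builddir` and `find_collisions` are hypothetical and are not the buildmaster-config code.

```python
from collections import defaultdict


def builddir(branch, worker, suffix=""):
    """Build directory name, following the pattern seen in the log
    (e.g. '2.7.gps-clang-ubsan.clang-usban'). Inferred, not official."""
    return f"{branch}.{worker}{suffix}"


def find_collisions(builders):
    """`builders` is an iterable of (name, branch, worker, suffix) tuples.
    Returns {builddir: [builder names]} for directories used more than once."""
    by_dir = defaultdict(list)
    for name, branch, worker, suffix in builders:
        by_dir[builddir(branch, worker, suffix)].append(name)
    return {d: names for d, names in by_dir.items() if len(names) > 1}
```

With two builders on the same worker and branch and no suffix, `find_collisions` reports a shared directory; giving one of them a distinct suffix removes the collision. It also shows why a typo in the suffix is costly: every builder silently gets a brand-new directory while the old one stays on disk.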

@vstinner
Member

> Let me know when PR 111 is deployed on the build master so I can log in and clean up the current typo names.

Done (python/buildmaster-config#111)

@corona10
Member Author

@gregory.p.smith @vstinner
Looks like python/buildmaster-config#111 is merged.
Can we close this issue?

@vstinner
Member

> Can we close this issue?

When it's an issue about specific buildbots, I prefer to first check that the buildbot is back to green.

https://buildbot.python.org/all/#/builders/47/builds/3578

AMD64 Debian PGO 3.x buildbot is back to green.

https://buildbot.python.org/all/#builders/136/builds/311

AMD64 Clang UBSan 2.7 is back to green.

So yes, it seems like the disk has free space again ;-)

@ezio-melotti transferred this issue from another repository on Apr 10, 2022