x/build: freebsd-arm-paulzhol builder is offline #41028

Closed
cagedmantis opened this issue Aug 25, 2020 · 3 comments

Contributor

@cagedmantis cagedmantis commented Aug 25, 2020

Per https://farmer.golang.org:

host-freebsd-arm-paulzhol: 0/0 (1 missing)

/cc @andybons @dmitshur @toothrot @paulzhol

Member

@paulzhol paulzhol commented Aug 25, 2020

It had been down for several months; I only restored it this weekend. As far as I can tell, it keeps running out of memory and dying every few builds.

login: Aug 25 11:47:06 arm-virt-12 shutdown[9186]: power-down by paulzhol:
Aug 25 11:47:06 arm-virt-12 syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 0 0 0 done
Waiting (max 60 seconds) for system thread `bufdaemon' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-0' to stop... done
All buffers synced.
No strategy for buffer at 0xc5c14e98
vnode 0xdcb02cb8: tag none, type VBAD
    usecount 1, writecount 0, refcount 5266
    flags (VI_DOOMED)
    lock type nfs: UNLOCKED
swap_pager: I/O error - pagein failed; blkno 112048,size 4096, error 45
panic: swap_pager_force_pagein: read from swap failed
cpuid = 0
time = 1598356035
KDB: stack backtrace:
db_trace_self() at db_trace_self
         pc = 0xc061f144  lr = 0xc0078724 (db_trace_self_wrapper+0x30)
         sp = 0xc5b70b20  fp = 0xc5b70c38
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
         pc = 0xc0078724  lr = 0xc02bb538 (vpanic+0x16c)
         sp = 0xc5b70c40  fp = 0xc5b70c60
         r4 = 0x00000100  r5 = 0x00000001
         r6 = 0xc0766882  r7 = 0xc0ac8f80
vpanic() at vpanic+0x16c
         pc = 0xc02bb538  lr = 0xc02bb3cc (vpanic)
         sp = 0xc5b70c68  fp = 0xc5b70c6c
         r4 = 0x00000148  r5 = 0x00000005
         r6 = 0xc5ed33f0  r7 = 0xdaf4d5a0
         r8 = 0x00000030  r9 = 0x00000000
        r10 = 0xc1c965d0
vpanic() at vpanic
         pc = 0xc02bb3cc  lr = 0xc05dc104 (swapoff_one+0x9fc)
         sp = 0xc5b70c74  fp = 0xc5b70cd0
         r4 = 0xdaf4d5a0  r5 = 0x00000030
         r6 = 0x00000000  r7 = 0xc1c965d0
         r8 = 0xc5b70c6c  r9 = 0xc02bb3cc
        r10 = 0xc5b70c74
swapoff_one() at swapoff_one+0x9fc
         pc = 0xc05dc104  lr = 0xc05dc264 (swapoff_all+0x154)
         sp = 0xc5b70cd8  fp = 0xc5b70cf8
         r4 = 0xc0ae3fe4  r5 = 0xc0772525
         r6 = 0xc09184d1  r7 = 0xc42e8ac0
         r8 = 0x00000000  r9 = 0xc077252c
        r10 = 0xc0aedca0
swapoff_all() at swapoff_all+0x154
         pc = 0xc05dc264  lr = 0xc0374d7c (bufshutdown+0x354)
         sp = 0xc5b70d00  fp = 0xc5b70d38
         r4 = 0xc0adf800  r5 = 0x00000000
         r6 = 0xc5c156a8  r7 = 0x00000000
         r8 = 0x00000000  r9 = 0x00000000
        r10 = 0xc076d889
bufshutdown() at bufshutdown+0x354
         pc = 0xc0374d7c  lr = 0xc02bae68 (kern_reboot+0x2d8)
         sp = 0xc5b70d40  fp = 0xc5b70d78
         r4 = 0xc09184d1  r5 = 0x00000000
         r6 = 0x00000000  r7 = 0xc5ec791c
         r8 = 0xc08f843c  r9 = 0x00004008
        r10 = 0xdcb783c0
kern_reboot() at kern_reboot+0x2d8
         pc = 0xc02bae68  lr = 0xc02bab90 (kern_reboot)
         sp = 0xc5b70d80  fp = 0xc5b70db0
         r4 = 0xdcb78660  r5 = 0xdcb783c0
         r6 = 0xdda70720  r7 = 0x00000000
         r8 = 0xdcb78658  r9 = 0x00000000
        r10 = 0xdcb783c0
kern_reboot() at kern_reboot
         pc = 0xc02bab90  lr = 0xc0640b2c (swi_handler+0x340)
         sp = 0xc5b70db8  fp = 0xc5b70e40
         r4 = 0xc09184d0  r5 = 0x00000000
         r6 = 0xdda70720  r7 = 0x00000000
         r8 = 0xdcb78658  r9 = 0x00000000
        r10 = 0xdcb783c0
swi_handler() at swi_handler+0x340
         pc = 0xc0640b2c  lr = 0xc0621a2c (swi_exit)
         sp = 0xc5b70e48  fp = 0xbfbfee98
         r4 = 0x00000004  r5 = 0x00010413
         r6 = 0x00004008  r7 = 0x00000037
         r8 = 0x00000320  r9 = 0x00000001
        r10 = 0x00000000
swi_exit() at swi_exit
         pc = 0xc0621a2c  lr = 0xc0621a2c (swi_exit)
         sp = 0xc5b70e48  fp = 0xbfbfee98
KDB: enter: panic
[ thread pid 9186 tid 100821 ]
Stopped at      kdb_enter+0x58: ldrb    r15, [r15, r15, ror r15]!
db>
db> ps
  pid  ppid  pgrp   uid  state   wmesg   wchan       cmd
 9186     1  9186  1001  Rs      CPU 0               halt
   20     0     0     0  DL      kpsusp  0xc426dbb0  [syncer]
   19     0     0     0  DL      kpsusp  0xc426e100  [vnlru]
   18     0     0     0  DL      (threaded)          [bufdaemon]
100041                   D       ktsusp  0xc428b818  [bufdaemon]
100051                   D       ktsusp  0xc4289098  [bufspacedaemon-0]
   17     0     0     0  DL      psleep  0xc0ae42f8  [vmdaemon]
   16     0     0     0  DL      (threaded)          [pagedaemon]
100039                   D       psleep  0xc0af0070  [dom0]
100049                   D       launds  0xc0af007c  [laundry: dom0]
100050                   D       umarcl  0xc05e29e4  [uma]
   15     0     0     0  DL      -       0xc0adf474  [soaiod4]
   14     0     0     0  DL      -       0xc0adf474  [soaiod3]
    9     0     0     0  DL      -       0xc0adf474  [soaiod2]
    8     0     0     0  DL      -       0xc0adf474  [soaiod1]
    7     0     0     0  DL      -       0xc09512dc  [rand_harvestq]
    6     0     0     0  DL      waiting 0xc0aef16c  [sctp_iterator]
    5     0     0     0  DL      (threaded)          [cam]
100025                   D       -       0xc094e140  [doneq0]
100034                   D       -       0xc094e06c  [scanner]
   13     0     0     0  DL      seqstat 0xc40c224c  [sequencer 00]
    4     0     0     0  DL      crypto_ 0xc5ec73d4  [crypto returns 1]
    3     0     0     0  DL      crypto_ 0xc5ec739c  [crypto returns 0]
    2     0     0     0  DL      crypto_ 0xc0ae3a20  [crypto]
   12     0     0     0  DL      (threaded)          [geom]
100015                   D       -       0xc0aed8e4  [g_event]
100016                   D       -       0xc0aed8ec  [g_up]
100017                   D       -       0xc0aed8e0  [g_down]
   11     0     0     0  WL      (threaded)          [intr]
100005                   I                           [swi6: task queue]
100006                   I                           [swi6: Giant taskq]
100008                   I                           [swi5: fast taskq]
100011                   I                           [swi4: clock (0)]
100012                   I                           [swi4: clock (1)]
100013                   I                           [swi3: vm]
100014                   I                           [swi1: netisr 0]
100026                   I                           [gic0,s4: +]
100027                   I                           [gic0,s5: +]
100030                   I                           [swi0: uart]
   10     0     0     0  RL      (threaded)          [idle]
100002                   CanRun                      [idle: cpu0]
100003                   Run     CPU 1               [idle: cpu1]
    1     0     1     0  SLs     wait    0xc5ece000  [init]
    0     0     0     0  DLs     (threaded)          [kernel]
100000                   D       swapin  0xc0aed900  [swapper]
100004                   D       -       0xc5ec8400  [aiod_kick taskq]
100007                   D       -       0xc5ec8280  [thread taskq]
100009                   D       -       0xc5ec8180  [config_0]
100010                   D       -       0xc5ec8100  [kqueue_ctx taskq]
100018                   D       -       0xc5ec7480  [firmware taskq]
100019                   D       -       0xc5ec7400  [crypto_0]
100020                   D       -       0xc5ec7400  [crypto_1]
100028                   D       -       0xc6089900  [vtnet0 rxq 0]
100029                   D       -       0xc6089880  [vtnet0 txq 0]
100033                   D       -       0xc4037b00  [CAM taskq]
100044                   D       -       0xc603cf00  [if_config_tqg_0]
100045                   D       -       0xc603ce80  [if_io_tqg_0]
100046                   D       -       0xc603ce00  [if_io_tqg_1]
100047                   D       -       0xc603cd80  [softirq_0]
100048                   D       -       0xc603cd00  [softirq_1]

I think it's dying on reboot because it can't read from the swapfile: the swapfile is on NFS, and the NFS filesystem is no longer mounted at that point.
I'm going to mark the NFS filesystem as "late", the same as the swapfile, to see if that helps.
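
The change described above amounts to adding the `late` option to the NFS entry in `/etc/fstab`, so that it carries the same option as the swapfile entry. As a rough sketch only: the fstab contents below are hypothetical (the server name, export path, mount point, root device, and `md99` unit are assumptions, not the builder's real configuration), and the helper `entries_missing_late` is made up for illustration. It lists NFS and swap entries that lack `late`; it says nothing about whether the option actually prevents the panic.

```python
# Hypothetical fstab contents; every device, host, and path here is invented
# for illustration and is not taken from the builder.
SAMPLE_FSTAB = """\
# Device            Mountpoint  FStype  Options                     Dump  Pass#
/dev/vtbd0p2        /           ufs     rw                          1     1
nfshost:/export     /mnt/nfs    nfs     rw                          0     0
md99                none        swap    sw,file=/mnt/nfs/swap0,late 0     0
"""


def entries_missing_late(fstab_text):
    """Return (device, fstype) pairs for NFS and swap entries without 'late'."""
    missing = []
    for line in fstab_text.splitlines():
        # Drop comments and skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        device, _mountpoint, fstype, options = fields[:4]
        if fstype in ("nfs", "swap") and "late" not in options.split(","):
            missing.append((device, fstype))
    return missing


if __name__ == "__main__":
    for device, fstype in entries_missing_late(SAMPLE_FSTAB):
        print(f"{device} ({fstype}) is not marked late")
```

In this sample, only the NFS entry is reported, which matches the situation described: the swapfile is already marked late and the filesystem holding it is not.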

Member

@paulzhol paulzhol commented Aug 28, 2020

I think I am hitting https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224479#c32: the work dir, and everything else writable on the builder, lives in a tmpfs /tmp mount, and I was using an NFS swapfile (I tried both with and without mdconfig).
I've switched the builder to partition-based swap. The partition is still backed by an NFS file, but the hypervisor now manages that device.

@paulzhol paulzhol closed this Aug 28, 2020
Contributor Author

@cagedmantis cagedmantis commented Aug 31, 2020

@paulzhol Thank you!
