-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Panic error when running siegfried #172
Comments
thanks for reporting this Miguel, I'll investigate |
@richardlehane any updates on this? Currently, to workaround this issue, we've downgraded Siegfried to 1.7.13 where the issue doesn't seem to affect us. |
hi @luis100 @hmiguim I am wondering if you have any more information about what's triggering the error? I have reinstalled 1.9.1 on Ubuntu 21.04, Go 1.16.4 and I am able to return through GET and POST requests on the command line and through the web UI. Before running I ran Example GET:
Example POST:
Perhaps your command, operating environment, or build are having an impact here? NB. It looks like the error begins here. We can see what happens when we step through it with more info. NB++. I have a windows installation available, running 1.9.1 from the releases, I can also run the following in the browser (I haven't access to cURL on this machine to try that):
|
We are using a docker image based on Debian where we install siegfried and use it in server mode. docker run -it --rm keeps/roda:v4.1.1 bash
# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye
# sf -version
siegfried 1.9.1
/usr/share/siegfried/default.sig (2020-10-06T19:13:40+02:00)
identifiers:
- pronom: DROID_SignatureFile_V97.xml; container-signature-20201001.xml
config:
- multi: 32
# supervisorctl status siegfried
siegfried RUNNING pid 696055, uptime 0:41:37
# ps aux | grep sf
root 696055 8.7 1.1 1886728 187856 ? Sl 14:03 3:39 sf -serve localhost:5138 The issue only occurs when we run many threads against siegfried server using RODA Web application. I tried to reproduce the issue using command-line tools, but could not make it fail in the same way. We use base64 encoding of path, but I tried both just in case. # apt-get update; apt-get install -y parallel libany-uri-escape-perl
# find / -type f -not -path '/proc/*' -not -path '/sys/*' -not -empty | tee /tmp/filelist | wc -l
find: ‘/proc/10/map_files’: Permission denied
19195
# parallel --progress --linebuffer 'wget -q --server-response -O /dev/null "http://localhost:5138/identify/{= use URI::Escape; $_=uri_escape($_) =}?base64=false&format=json" 2>&1 | awk "/^ HTTP/{print $2}" ' :::: /tmp/filelist | sort | uniq -c
Computers / CPU cores / Max jobs to run
1:local / 8 / 8
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:0/19195/100%/0.0s
19195 HTTP/1.1 200 OK
# cat /tmp/filelist | parallel --progress --linebuffer 'echo "http://localhost:5138/identify/{= use MIME::Base64; $_=encode_base64($_, ""); =}?base64=true&format=json" | sed s/\\\\//g | xargs -I% wget -q --server-response -O /dev/null "%" 2>&1 | awk "/^ HTTP/{print $2}"' | sort | uniq -c
Computers / CPU cores / Max jobs to run
1:local / 8 / 8
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:0/19195/100%/0.0s
19195 HTTP/1.1 200 OK
# cat /var/log/supervisor/siegfried-stderr-*.log
No relevant errors But the error occurs consistently in production servers when using siegfried 1.9.1. There is also some correlation between panics and a higher amount of scans in empty files, but I am not sure that is relevant. # grep panic -A 10 /var/log/supervisor/siegfried-stderr---supervisor-70990jtu.log
panic: sync: negative WaitGroup counter
goroutine 20 [running]:
sync.(*WaitGroup).Add(0xc005f55ac0, 0xffffffffffffffff)
/home/travis/.gimme/versions/go1.15.2.linux.amd64/src/sync/waitgroup.go:74 +0x147
sync.(*WaitGroup).Done(...)
/home/travis/.gimme/versions/go1.15.2.linux.amd64/src/sync/waitgroup.go:99
main.printer(0xc00088d440, 0xc00010f7c0)
/home/travis/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:168 +0x145
created by main.main
/home/travis/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:423 +0xceb
--
panic: sync: negative WaitGroup counter
goroutine 5 [running]:
sync.(*WaitGroup).Add(0xc0058859b0, 0xffffffffffffffff)
/home/travis/.gimme/versions/go1.15.2.linux.amd64/src/sync/waitgroup.go:74 +0x147
sync.(*WaitGroup).Done(...)
/home/travis/.gimme/versions/go1.15.2.linux.amd64/src/sync/waitgroup.go:99
main.printer(0xc00080d860, 0xc0001017c0)
/home/travis/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:168 +0x145
created by main.main
/home/travis/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:423 +0xceb
--
panic: sync: negative WaitGroup counter
goroutine 5 [running]:
sync.(*WaitGroup).Add(0xc005f7f970, 0xffffffffffffffff)
/home/travis/.gimme/versions/go1.15.2.linux.amd64/src/sync/waitgroup.go:74 +0x147
sync.(*WaitGroup).Done(...)
/home/travis/.gimme/versions/go1.15.2.linux.amd64/src/sync/waitgroup.go:99
main.printer(0xc00018b4a0, 0xc0001e0f00)
/home/travis/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:168 +0x145
created by main.main
/home/travis/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:423 +0xceb
--
panic: sync: negative WaitGroup counter
goroutine 8 [running]:
sync.(*WaitGroup).Add(0xc0063ff600, 0xffffffffffffffff)
/home/travis/.gimme/versions/go1.15.2.linux.amd64/src/sync/waitgroup.go:74 +0x147
sync.(*WaitGroup).Done(...)
/home/travis/.gimme/versions/go1.15.2.linux.amd64/src/sync/waitgroup.go:99
main.printer(0xc000830c00, 0xc000101770)
/home/travis/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:168 +0x145
created by main.main
/home/travis/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:423 +0xceb
# cat /var/log/supervisor/siegfried-stderr-*.log | grep "empty source" | wc -l
1065 We will continue to try to reproduce the issue in a way we can provide it to you. |
Expanding testing into a production server using a much larger ammount of files, errors start to creep in. The panic errors come in pairs, or tupples, and seems to be this is a unprobable race condition. # find /roda/data/storage/aip/ -type f | parallel --progress --linebuffer 'wget -q --server-response -O /dev/null "http://localhost:5138/identify/{= use URI::Escape; $_=uri_escape($_) =}?base64=false&format=json" 2>&1 | awk "/^ HTTP/{print $2}"' | sort | uniq -c
Computers / CPU cores / Max jobs to run
1:local / 4 / 4
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:0/704393/100%/0.0s
702492 HTTP/1.1 200 OK
1901 HTTP/1.1 404 Not Found
# grep panic /var/log/supervisor/siegfried-stderr---supervisor-70990jtu.log | wc -l
35
# grep panic -A 10 /var/log/supervisor/siegfried-stderr---supervisor-70990jtu.log
panic: sync: negative WaitGroup counter
goroutine 20 [running]:
sync.(*WaitGroup).Add(0xc005f55ac0, 0xffffffffffffffff)
/home/travis/.gimme/versions/go1.15.2.linux.amd64/src/sync/waitgroup.go:74 +0x147
sync.(*WaitGroup).Done(...)
/home/travis/.gimme/versions/go1.15.2.linux.amd64/src/sync/waitgroup.go:99
main.printer(0xc00088d440, 0xc00010f7c0)
/home/travis/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:168 +0x145
created by main.main
/home/travis/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:423 +0xceb
--
panic: sync: negative WaitGroup counter
goroutine 5 [running]:
sync.(*WaitGroup).Add(0xc0058859b0, 0xffffffffffffffff)
/home/travis/.gimme/versions/go1.15.2.linux.amd64/src/sync/waitgroup.go:74 +0x147
sync.(*WaitGroup).Done(...)
/home/travis/.gimme/versions/go1.15.2.linux.amd64/src/sync/waitgroup.go:99
main.printer(0xc00080d860, 0xc0001017c0)
/home/travis/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:168 +0x145
created by main.main
/home/travis/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:423 +0xceb
-- |
Sorry for my delay in resolving this, it is quite a tricky one! I have made some progress:
Hopefully good news soon :) |
I've found a quick fix for you: rather than run with Changing this setting should only impact performance if you are giving siegfried directories to scan. If each of your http requests sends a single filename then it shouldn't make any difference to the speed & you get concurrency anyway because you can be sending multiple requests at a time. |
Thank you for your update, we will try the quick fix for now, let us know when a there's a final solution to the issue. |
I found and fixed a race condition in the code over the weekend that in my testing has resolved this bug. Please re-open if you see it again. You can access the fix in the latest release (v1.9.2). I'll be updating the debian package repository this evening. You download the new release here: https://github.com/richardlehane/siegfried/releases/tag/v1.9.2 Thanks again for taking the time to report this. |
Siegfried version: 1.9.1 in server mode
siegfried-stderr.log:
The text was updated successfully, but these errors were encountered: