Master timeouts during dirAssign volume growth #5213
Comments
I downgraded to 3.59 and the timeouts/failed assigns seem to be gone. Also the log message below disappeared.
As you can see, the traffic on our POST server is much more stable after the downgrade. The disk utilization of the volume servers is also much better. I think it could be related to #5154.
Added a fix for #5154.
I'm still getting the timeouts described by @bvanelst with that fix, and I think I found what's causing it. In seaweedfs/weed/server/master_server_handlers.go Lines 146 to 149 in 439377b
we wait on the seaweedfs/weed/server/master_grpc_server_volume.go Lines 64 to 67 in 439377b
but not if we exit it there seaweedfs/weed/server/master_grpc_server_volume.go Lines 72 to 76 in 439377b
It would be easy enough to explicitly close it in that seaweedfs/weed/server/master_grpc_server_volume.go Lines 30 to 34 in 439377b
@chrislusf What do you think?
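The embedded snippets from the comment above are not reproduced here, so the following is only a generic sketch of the kind of bug it seems to describe: a waiter blocks on a channel that the worker closes on its normal path but not on an early-exit path. Every name in it (`growRequest`, `done`, `processLeaky`, `processFixed`) is hypothetical and not taken from the SeaweedFS source. It assumes the fix amounts to closing the channel on every exit path, done here with `defer`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// growRequest is a hypothetical stand-in for a queued volume-growth request.
// The worker is expected to close done once it has handled the request.
type growRequest struct {
	skip bool          // simulate the early-exit path (nothing to grow)
	done chan struct{} // the waiter blocks on this channel
}

// processLeaky closes done only on the normal path.
// On the early return the waiter is never released.
func processLeaky(req *growRequest) {
	if req.skip {
		return // bug: done stays open on this path
	}
	// ... volume growth would happen here ...
	close(req.done)
}

// processFixed defers the close so that every exit path releases the waiter.
func processFixed(req *growRequest) {
	defer close(req.done)
	if req.skip {
		return
	}
	// ... volume growth would happen here ...
}

// waitFor returns nil if done is closed before the timeout expires.
func waitFor(req *growRequest, timeout time.Duration) error {
	select {
	case <-req.done:
		return nil
	case <-time.After(timeout):
		return errors.New("timed out waiting for growth request")
	}
}

// run hands one request to the worker and waits on it,
// the way a request handler would wait for growth to finish.
func run(process func(*growRequest), skip bool) error {
	req := &growRequest{skip: skip, done: make(chan struct{})}
	go process(req)
	return waitFor(req, 200*time.Millisecond)
}

func main() {
	fmt.Println("leaky worker, normal path:", run(processLeaky, false))
	fmt.Println("leaky worker, early exit: ", run(processLeaky, true))
	fmt.Println("fixed worker, early exit: ", run(processFixed, true))
}
```

With the leaky worker, only the early-exit case times out, which would match timeouts that appear only around volume growth. Whether this is what actually happens in the master is for the maintainers to confirm.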
Describe the bug
Timeouts when requesting a /dir/assign at the master(s).
System Setup
/usr/local/bin/weed -v=3 -logdir=/var/log/seaweedfs master -mdir=/etc/seaweedfs -ip=10.0.9.15 -port=9333 -metrics.address=10.0.9.17:9091 -defaultReplication=010 -volumePreallocate -garbageThreshold=0.3 -volumeSizeLimitMB=20000 -peers=10.0.9.17:9333,10.0.9.14:9333,10.0.9.15:9333
/usr/local/bin/weed -v=3 -logdir=/var/log/seaweedfs volume -index=leveldb -mserver=10.0.9.17:9333,10.0.9.14:9333,10.0.9.15:9333 -dir=/volumes/98fb3388c280,/volumes/LHHGS,/volumes/e000c055cbe4,/volumes/c5a9aff45527,/volumes/619c9a0827f4,/volumes/f8c44345756f,/volumes/eeedca023938,/volumes/cae089cd2dd9,/volumes/20F30GRVRD,/volumes/KWEGS,/volumes/3d5638f4fd34,/volumes/18f39b04390d,/volumes/6a9e8c97ba2a,/volumes/LDTGS,/volumes/a9a6e2d048de,/volumes/20F308T27D,/volumes/19641ea6d6c5,/volumes/20F30JB3JE,/volumes/6e19dd8da77b,/volumes/3d1614841bcc,/volumes/372cb7e5ac18,/volumes/152d865d39ce,/volumes/20F305249D,/volumes/1289675b7f03,/volumes/222079443d03,/volumes/cc66d284719d,/volumes/ca6f98cd3c16,/volumes/6611045c7cf2,/volumes/381ee044d930,/volumes/ff81968af32c,/volumes/9d611128cfed,/volumes/21F306AD4F,/volumes/595892cb8709,/volumes/0553ccb52b90 -max=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 -concurrentDownloadLimitMB=20000 -concurrentUploadLimitMB=20000 -hasSlowRead=true -readBufferSizeMB=8 -compactionMBps=10 -rack=store02 -ip=10.0.9.2
Debian GNU/Linux 12 (bookworm) / Linux store02 6.1.0-17-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30) x86_64 GNU/Linux
weed version
30GB 3.62 59b8af99b0aca1b9e88fec7b5f27c7d15e5e8604 linux amd64
Expected behavior
When we request a single key or multiple keys from the leader master, we don't expect timeouts. Normally we get an instant response, but sometimes (every x minutes) a request such as http://10.0.9.17:9333/dir/assign?collection=nntp&count=10000&replication=001 times out. This also happens when we lower the count. Unfortunately the request itself does not return an error, but whenever this happens I see the following log entry:
seaweedfs-master[459769]: I0116 14:50:10.468052 master_server_handlers.go:125 dirAssign volume growth {"collection":"nntp","replication":{"node":1},"ttl":{"Count":0,"Unit":0}} from 10.0.9.12:40308
It looks like this always happens when there is a dirAssign volume growth.
In parallel there are constantly POST requests directly to the volume servers to store data.
I thought using replication 010 or 002 would be a workaround, but that only worked for a while.
We just started testing SeaweedFS, beginning with 3.60, and we still see this issue after upgrading to 3.62. (I haven't tested older versions.)
Additional context
How I test/reproduce it:
while true; do curl --max-time 3 'http://localhost:9333/dir/assign?collection=nntp&replication=001'; echo "" ; done
Once every few seconds or minutes I get a timeout (also when I increase the max-time):
curl: (28) Operation timed out after 3000 milliseconds with 0 bytes received
At that moment I always see a volume growth message in the log:
seaweedfs-master[459769]: I0116 14:50:10.468052 master_server_handlers.go:125 dirAssign volume growth {"collection":"nntp","replication":{"node":1},"ttl":{"Count":0,"Unit":0}} from 10.0.9.12:40308
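For correlating timeouts with the master log, the curl loop above can be approximated by a small Go probe that prints a timestamp for every request that times out. This is a sketch: the URL and the 3-second timeout are taken from the curl command, while the bounded attempt count and the helper names `isTimeout` and `probe` are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"time"
)

// isTimeout reports whether err is a network-level timeout,
// which includes the http.Client Timeout expiring.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// probe issues one GET, drains the body, and returns the elapsed time.
func probe(client *http.Client, url string) (time.Duration, error) {
	start := time.Now()
	resp, err := client.Get(url)
	if err != nil {
		return time.Since(start), err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return time.Since(start), err
}

func main() {
	// Same request as the curl loop; pass another URL as the first argument.
	url := "http://localhost:9333/dir/assign?collection=nntp&replication=001"
	if len(os.Args) > 1 {
		url = os.Args[1]
	}
	client := &http.Client{Timeout: 3 * time.Second} // mirrors --max-time 3

	const attempts = 20 // assumption: bounded, unlike the endless curl loop
	timeouts := 0
	for i := 0; i < attempts; i++ {
		elapsed, err := probe(client, url)
		now := time.Now().Format(time.RFC3339Nano)
		switch {
		case isTimeout(err):
			timeouts++
			fmt.Printf("%s timeout after %v\n", now, elapsed)
		case err != nil:
			fmt.Printf("%s error: %v\n", now, err)
		}
	}
	fmt.Printf("%d of %d requests timed out\n", timeouts, attempts)
}
```

The printed timestamps can then be compared with the times of the "dirAssign volume growth" entries in the master log.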