Can you reproduce it? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
I have one of these incidents every couple of days, very sporadically. There are around 100 or so truncates done on these files every minute. The truncates generally happen immediately after an append on the file, which is to say, I add some data to the end of the file and then immediately remove it.
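The append-then-truncate pattern described above can be sketched roughly as follows. This is only an illustration of the access pattern, not the actual workload: the function name, the payload size, the iteration count and the temporary file path are all assumptions, and it would need to be pointed at a file on a MooseFS mount to exercise the chunkservers at all.

```python
import os
import tempfile
import time


def append_then_truncate(path: str, payload: bytes, iterations: int,
                         interval_s: float = 0.0) -> int:
    """Append payload to path, then immediately truncate it away again.

    Returns the file size after the last iteration, which should equal
    the size the file had before the first one.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        for _ in range(iterations):
            size_before = os.fstat(fd).st_size
            os.lseek(fd, 0, os.SEEK_END)
            # os.write may write fewer bytes than requested, so loop.
            view = memoryview(payload)
            while view:
                written = os.write(fd, view)
                view = view[written:]
            # Remove the data that was just appended.
            os.ftruncate(fd, size_before)
            if interval_s:
                time.sleep(interval_s)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical target: replace the temporary directory with a
    # directory on the MooseFS mount to generate real truncate traffic.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "truncate-test.bin")
        print(append_then_truncate(target, b"x" * 65536, 100))
```

Running several copies of this against different files, with concurrent readers on the same files, would come closer to the load described here, though there is no guarantee it triggers the problem.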
Include any warning/errors/backtraces from the system logs.
Here's the moosefs-master syslog:
Oct 05 21:25:12 vogar mfsmaster[10840]: (X.X.X.242:9422) chunk: 0000000023348B35 truncate status: Operation not completed
Oct 05 21:25:13 vogar mfsmaster[10840]: (X.X.X.242:9422) chunk: 0000000023348B25 truncate status: Operation not completed
Oct 05 21:25:21 vogar mfsmaster[10840]: (X.X.X.242:9422) chunk: 00000000233489DE truncate status: Operation not completed
Oct 05 21:25:27 vogar mfsmaster[10840]: (X.X.X.242:9422) chunk: 0000000023348B3C truncate status: Operation not completed
Oct 05 21:25:31 vogar mfsmaster[10840]: (X.X.X.242:9422) chunk: 0000000023348ADA truncate status: Operation not completed
Oct 05 21:25:31 vogar mfsmaster[10840]: (X.X.X.242:9422) chunk: 0000000023348A93 truncate status: Operation not completed
Oct 05 21:25:31 vogar mfsmaster[10840]: chunk 0000000023348B3C_00000314: chunk in middle of operation TRUNCATE, but no chunk server is busy - finish operation
Oct 05 21:25:31 vogar mfsmaster[10840]: (X.X.X.245:9422) chunk: 0000000023348B3C truncate status: Operation not completed
Oct 05 21:25:31 vogar mfsmaster[10840]: chunk 0000000023348B3C_00000313: chunk in middle of operation TRUNCATE, but no chunk server is busy - finish operation
Oct 05 21:25:31 vogar mfsmaster[10840]: (X.X.X.245:9422) chunk: 0000000023348B3C truncate status: Operation not completed
Oct 05 21:25:32 vogar mfsmaster[10840]: chunk 0000000023348B3C_00000312: chunk in middle of operation TRUNCATE, but no chunk server is busy - finish operation
Oct 05 21:25:32 vogar mfsmaster[10840]: (X.X.X.245:9422) chunk: 0000000023348B3C truncate status: Operation not completed
Oct 05 21:25:32 vogar mfsmaster[10840]: chunk 0000000023348B3C_00000311: chunk in middle of operation TRUNCATE, but no chunk server is busy - finish operation
Oct 05 21:25:32 vogar mfsmaster[10840]: (X.X.X.245:9422) chunk: 0000000023348B3C truncate status: Operation not completed
Oct 05 21:25:33 vogar mfsmaster[10840]: (X.X.X.242:9422) chunk: 0000000023348B3C truncate status: Operation not completed
Oct 05 21:25:33 vogar mfsmaster[10840]: chunk 0000000023348B3C_00000310: chunk in middle of operation TRUNCATE, but no chunk server is busy - finish operation
Oct 05 21:25:33 vogar mfsmaster[10840]: (X.X.X.245:9422) chunk: 0000000023348B3C truncate status: Operation not completed
Oct 05 21:25:34 vogar mfsmaster[10840]: (X.X.X.242:9422) chunk: 0000000023348B50 truncate status: Operation not completed
Oct 05 21:25:34 vogar mfsmaster[10840]: chunk 0000000023348B3C_00000311: chunk in middle of operation TRUNCATE, but no chunk server is busy - finish operation
Oct 05 21:25:34 vogar mfsmaster[10840]: chunk 0000000023348B3C has only copies with wrong versions (1) - please repair it manually
Oct 05 21:25:34 vogar mfsmaster[10840]: chunk 0000000023348B3C_00000311 - invalid copy on (X.X.X.245 - ver:00000000)
Oct 05 21:25:34 vogar mfsmaster[10840]: chunk 0000000023348B3C_00000311 - copy with wrong version on (X.X.X.242 - ver:00000311)
Oct 05 21:25:34 vogar mfsmaster[10840]: (X.X.X.245:9422) chunk: 0000000023348B3C truncate status: Wrong chunk version
Oct 05 21:25:35 vogar mfsmaster[10840]: chunk 0000000023348B3C has only copies with wrong versions (1) - please repair it manually
Oct 05 21:25:35 vogar mfsmaster[10840]: chunk 0000000023348B3C_00000311 - invalid copy on (X.X.X.245 - ver:00000000)
Oct 05 21:25:35 vogar mfsmaster[10840]: chunk 0000000023348B3C_00000311 - copy with wrong version on (X.X.X.242 - ver:00000311)
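For anyone triaging a log like this, a small script can tally the truncate status lines per chunk and chunkserver. The sketch below assumes the line layout matches the excerpt above exactly; the regular expression and the function name are mine, not part of MooseFS.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... (X.X.X.242:9422) chunk: 0000000023348B35 truncate status: Operation not completed
TRUNCATE_RE = re.compile(
    r"\((?P<server>[^)\s]+)\) chunk: (?P<chunk>[0-9A-Fa-f]{16}) "
    r"truncate status: (?P<status>.+)$"
)


def summarize_truncate_failures(lines):
    """Count truncate status messages per (chunk, server, status)."""
    counts = Counter()
    for line in lines:
        match = TRUNCATE_RE.search(line.strip())
        if match:
            counts[(match["chunk"], match["server"], match["status"])] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: feed the master's syslog on stdin.
    for (chunk, server, status), n in summarize_truncate_failures(sys.stdin).most_common():
        print(f"{n:4d}  {chunk}  {server}  {status}")
```

On the excerpt above this would show chunk 0000000023348B3C failing repeatedly on both X.X.X.242 and X.X.X.245, while the other chunks appear once each.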
@njaard you can try and patch your instance with the new fix we have just submitted and this should solve your problem. However, your load is quite high. How many max workers do you have in your chunk server's config? If the overall system load of your machines is not too high, you can consider upping this number to increase the overall performance of your MooseFS instance.
Have you read through available documentation, open Github issues and Github Q&A Discussions?
Yes
System information
Your moosefs version and its origin (moosefs.com, packaged by distro, built from source, ...).
Debian packages from moosefs.com version 3.0.116.
Operating system (distribution) and kernel version.
Debian bullseye, 5.10.0-8
Hardware / network configuration, and underlying filesystems on master, chunkservers, and clients.
4 chunkservers and 1Gbit or 10Gbit ethernet between them. The same machines (plus two more) have the clients and one of them has the master.
The filesystems are by and large xfs with a smaller number that are ext4.
The chunkservers often report:
connect to X.X.X.X:9422 failed, error: ETIMEDOUT (Operation timed out)
The machines are all quite busy, but the disk drives themselves are exclusively used by moosefs-chunkserver.
How much data is tracked by moosefs master (order of magnitude)?
Describe the problem you observed.
This happens rarely. Occasionally, a file (with potential for concurrent readers) gets truncated with ftruncate, and then the last chunk shows this: