observed in noobaa log as "cancelled due to ctime change" actual upload didn't fail #7833
Comments
Uploading the noobaa.log file (gzip file).
@romayalon or @naveenpaul1, any update here on this?
@rkomandu I tried to reproduce the issue with the same script on my local system but couldn't reproduce it. I will try with a different setup and let you know.
@romayalon and I looked at the code and think this can happen in a race when the upload parts are sent to different endpoints, each of which uses stat() to check the write-file attributes. However, that stat is not really needed for the ctime check, because the code does not use those xattrs other than looking at the encryption xattr. We can address this either by removing that stat call in _finish_upload for the upload-part case, or by disabling the ctime check for that case. Will defer to the next version.
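The ctime-based guard described above can be illustrated with a small shell sketch. This is hypothetical illustration code, not NooBaa code, and it assumes GNU `stat` (Linux): any metadata update by a concurrent writer bumps the file's ctime, which is exactly the condition such a guard trips on.

```shell
# Sketch of the ctime-change condition (assumes GNU stat on Linux).
f=$(mktemp)
t1=$(stat -c %Z "$f")    # ctime before (seconds resolution)
sleep 1
touch "$f"               # stands in for a concurrent writer updating the file
t2=$(stat -c %Z "$f")    # ctime after
if [ "$t2" -ne "$t1" ]; then
  # the guard would abort the operation at this point
  echo "cancelled due to ctime change"
fi
rm -f "$f"
```

This also suggests why the error shows up only under load across multiple endpoints: the guard fires whenever any other process touches the multipart-upload file between the stat calls.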
@rkomandu this issue was fixed, can you please verify?
@romayalon I reproduced this issue with 5.15.4 using s3cat:
the client upload is stuck from that point, and in the endpoint logs I see:
@guymguym I tried to reproduce it 3 times on my Mac without success. Are you running it on Linux?
Yes, on RHEL 8.6 with GPFS 5.1.8.
On my machine the upload completed, though:
ls -lhrt s3user-8402-dir/bucket-7833/
rpm -qa | grep noobaa
kernel: 5.14.0-362.8.1.el9_3.x86_64
After our discussion on the recreate, I uploaded the noobaa.log file: https://ibm.box.com/s/i9qnck3az7nvh08f5poijx52nux9238h @romayalon, you should have access to this.
Closing the issue as a fix was merged: #8158
@romayalon , are we fixing this in 5.15.5 ? |
@rkomandu @madhuthorat The request for a backport to 5.15.5 should come from Madhu. Madhu, please check this issue, and if needed add it to your 5.15.5 candidates and let us know you added it.
This is not a priority for 5.15.5 as of now, but good to have if you can.
Environment info
d/s version 0129 (noobaa-core-5.15.0-20240129.el9.x86_64)
rpm -qi noobaa-core-5.15.0-20240129.el9.x86_64
Name : noobaa-core
Version : 5.15.0
Release : 20240129.el9
Architecture: x86_64
Install Date: Wed 07 Feb 2024 01:19:10 AM CST
Group : Unspecified
Size : 409313992
License : Apache-2.0
Signature : RSA/SHA256, Tue 30 Jan 2024 10:48:47 AM CST, Key ID 1904eb93bd37e2d9
Source RPM : noobaa-core-5.15.0-20240129.el9.src.rpm
Build Date : Mon 29 Jan 2024 08:29:07 AM CST
Build Host : x86-64-07.build.eng.rdu2.redhat.com
Packager : Red Hat, Inc. http://bugzilla.redhat.com/bugzilla
Vendor : Red Hat, Inc.
...
Ran the upload of the 10G/5G/8G files in a loop. 99 matches are expected for the 10G and 8G files, because on the first iteration those files did not yet exist to upload.
grep "10737418240 file_10G" upload.log | wc -l
99
grep "8589934592 file_8G" upload.log | wc -l
99
grep "5368709120 file_5G" upload.log | wc -l
100
noobaa.log
Feb 14 05:30:14 c83f1-app3 node[3420425]: [nsfs/3420425] [L0] core.endpoint.s3.ops.s3_put_object_uploadId:: PUT OBJECT PART newbucket-14feb file_10G 1178
Feb 14 05:30:14 c83f1-app3 node[3420425]: Feb-14 5:30:14.007 [nsfs/3420425] [L0] core.endpoint.s3.ops.s3_put_object_uploadId:: PUT OBJECT PART newbucket-14feb file_10G 1178
Feb 14 05:30:14 c83f1-app3 node[3420425]: [nsfs/3420425] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR
InternalError
We encountered an internal error. Please try again. /newbucket-14feb/file_10G?uploadId=1e01a581-723e-4c90-805b-b6d4aa5aef06&partNumber=1178 lslpn793-1gh7mr-b72 PUT /newbucket-14feb/file_10G?uploadId=1e01a581-723e-4c90-805b-b6d4aa5aef06&partNumber=1178 {"host":"172.20.100.33:6443","accept-encoding":"identity","user-agent":"aws-cli/2.15.19 Python/3.11.6 Linux/4.18.0-240.el8.x86_64 exe/x86_64.rhel.8 prompt/off command/s3.cp","content-md5":"lplbWNTL9qqpBBtPAMf2rg==","expect":"100-continue","x-amz-date":"20240214T113014Z","x-amz-content-sha256":"UNSIGNED-PAYLOAD","authorization":"AWS4-HMAC-SHA256 Credential=qVuzjvWlHiVhcrrXf4uS/20240214/us-east-1/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-date, Signature=6ce9c23abc18309d2de824f4e7d15bcb5456dbe05fc5780915a21c67a123be15","content-length":"8388608"} Error: FileStat: _path=/gpfs/remote-fvt_fs/s3user6002-dir/newbucket-14feb/.noobaa-nsfs_65cc98c7237eb991d649b04c/multipart-uploads/1e01a581-723e-4c90-805b-b6d4aa5aef06 cancelled due to ctime change
Feb 14 05:30:14 c83f1-app3 node[3420425]: Feb-14 5:30:14.009 [nsfs/3420425] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR
InternalError
We encountered an internal error. Please try again. /newbucket-14feb/file_10G?uploadId=1e01a581-723e-4c90-805b-b6d4aa5aef06&partNumber=1178 lslpn793-1gh7mr-b72 PUT /newbucket-14feb/file_10G?uploadId=1e01a581-723e-4c90-805b-b6d4aa5aef06&partNumber=1178 {"host":"172.20.100.33:6443","accept-encoding":"identity","user-agent":"aws-cli/2.15.19 Python/3.11.6 Linux/4.18.0-240.el8.x86_64 exe/x86_64.rhel.8 prompt/off command/s3.cp","content-md5":"lplbWNTL9qqpBBtPAMf2rg==","expect":"100-continue","x-amz-date":"20240214T113014Z","x-amz-content-sha256":"UNSIGNED-PAYLOAD","authorization":"AWS4-HMAC-SHA256 Credential=qVuzjvWlHiVhcrrXf4uS/20240214/us-east-1/s3/aws4_request, SignedHeaders=content-md5;host;x-amz-content-sha256;x-amz-date, Signature=6ce9c23abc18309d2de824f4e7d15bcb5456dbe05fc5780915a21c67a123be15","content-length":"8388608"} Error: FileStat: _path=/gpfs/remote-fvt_fs/s3user6002-dir/newbucket-14feb/.noobaa-nsfs_65cc98c7237eb991d649b04c/multipart-uploads/1e01a581-723e-4c90-805b-b6d4aa5aef06 cancelled due to ctime change
Feb 14 05:30:14 c83f1-app3 node[3420426]: [nsfs/3420426] [L0] core.endpoint.s3.ops.s3_put_object_uploadId:: PUT OBJECT PART newbucket-14feb file_10G 1179
RHEL 9.3, x86_64 arch
Actual behavior
No upload failures were observed, but what is this message? Can it be investigated? Anything highlighted as an Error will catch the eye.
Expected behavior
If this is really not an "ERROR", it should be investigated and reclassified as a "WARN".
Steps to reproduce
The script tried is as follows:
i=1
while [ $i -le 100 ]
do
AWS_ACCESS_KEY_ID=x AWS_SECRET_ACCESS_KEY=x aws --endpoint https://172.20.100.33:6443 --no-verify-ssl s3 cp /root/file_10G s3://newbucket-14feb
AWS_ACCESS_KEY_ID=x AWS_SECRET_ACCESS_KEY=x aws --endpoint https://172.20.100.34:6443 --no-verify-ssl s3 cp /root/file_8G s3://newbucket-14feb
AWS_ACCESS_KEY_ID=x AWS_SECRET_ACCESS_KEY=x aws --endpoint https://172.20.100.31:6443 --no-verify-ssl s3 cp /root/file_5G s3://newbucket-14feb
echo "list the file"
AWS_ACCESS_KEY_ID=x AWS_SECRET_ACCESS_KEY=x aws --endpoint https://172.20.100.33:6443 --no-verify-ssl s3 ls s3://newbucket-14feb
echo "delete the file"
AWS_ACCESS_KEY_ID=x AWS_SECRET_ACCESS_KEY=x aws --endpoint https://172.20.100.36:6443 --no-verify-ssl s3 rm s3://newbucket-13feb/file_5G
AWS_ACCESS_KEY_ID=x AWS_SECRET_ACCESS_KEY=x aws --endpoint https://172.20.100.35:6443 --no-verify-ssl s3 rm s3://newbucket-14feb/file_10G
AWS_ACCESS_KEY_ID=x AWS_SECRET_ACCESS_KEY=x aws --endpoint https://172.20.100.32:6443 --no-verify-ssl s3 rm s3://newbucket-14feb/file_8G
i=`expr $i + 1`
echo "$i is value ====="
sleep 60
done
grep "cancelled due to ctime change" messages-20240218 | grep "file_10G" | wc -l
20
--> the error happened only for the 10G object upload
grep "cancelled due to ctime change" messages-20240218 | grep "file_8G" | wc -l
0
grep "cancelled due to ctime change" messages-20240218 | grep "file_5G" | wc -l
0
More information - Screenshots / Logs / Other output
Will upload the noobaa.log (it is currently logged under /var/log/messages).