Fake upload error #3771

Closed
wivern-co-uk opened this issue Feb 3, 2020 · 6 comments
Labels: Bug, Tracked in Jira

Comments

@wivern-co-uk

Right after updating the storagenode to v31.12, the logs got flooded with upload errors. While going through the logs I found that some pieces which supposedly failed to upload are later being downloaded successfully. That doesn't make sense, so there might be a bug.

03.02.20 23:35:55 (+0100)  main  2020-02-03T22:35:55.805Z       INFO    piecestore      upload started  {"Piece ID": "Y3YG5SWEVXJWCXIQPQF37KKNZMKLQEMFAFH7Y7V2NXN7LQ7SORQQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "PUT"}
03.02.20 23:35:57 (+0100)  main  2020-02-03T22:35:57.328Z       INFO    piecestore      upload failed   {"Piece ID": "Y3YG5SWEVXJWCXIQPQF37KKNZMKLQEMFAFH7Y7V2NXN7LQ7SORQQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "PUT", "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:79\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:483\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:257\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:1066\n\tstorj.io/drpc/drpcserver.(*Server).doHandle:175\n\tstorj.io/drpc/drpcserver.(*Server).HandleRPC:153\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:114\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:147\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
03.02.20 23:36:28 (+0100)  main  2020-02-03T22:36:28.144Z       INFO    piecestore      download started        {"Piece ID": "Y3YG5SWEVXJWCXIQPQF37KKNZMKLQEMFAFH7Y7V2NXN7LQ7SORQQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
03.02.20 23:36:32 (+0100)  main  2020-02-03T22:36:32.655Z       INFO    piecestore      downloaded      {"Piece ID": "Y3YG5SWEVXJWCXIQPQF37KKNZMKLQEMFAFH7Y7V2NXN7LQ7SORQQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
03.02.20 23:37:42 (+0100)  main  2020-02-03T22:37:42.606Z       INFO    piecestore      download started        {"Piece ID": "Y3YG5SWEVXJWCXIQPQF37KKNZMKLQEMFAFH7Y7V2NXN7LQ7SORQQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
03.02.20 23:37:48 (+0100)  main  2020-02-03T22:37:48.298Z       INFO    piecestore      downloaded      {"Piece ID": "Y3YG5SWEVXJWCXIQPQF37KKNZMKLQEMFAFH7Y7V2NXN7LQ7SORQQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
03.02.20 23:38:23 (+0100)  main  2020-02-03T22:38:23.729Z       INFO    piecestore      download started        {"Piece ID": "Y3YG5SWEVXJWCXIQPQF37KKNZMKLQEMFAFH7Y7V2NXN7LQ7SORQQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
03.02.20 23:38:28 (+0100)  main  2020-02-03T22:38:28.556Z       INFO    piecestore      downloaded      {"Piece ID": "Y3YG5SWEVXJWCXIQPQF37KKNZMKLQEMFAFH7Y7V2NXN7LQ7SORQQ", "Satellite ID": "118UWpMCHzs6CvSgWd9BfFVjw5K9pZbJjkfZJexMtSkmKxvvAW", "Action": "GET"}
@MaxMaximus

MaxMaximus commented Feb 4, 2020

Well, maybe the satellite was trying to upload the same piece a few times? The second (and later) attempts would then fail because the node already has that piece, while it can still be downloaded normally.
Try scanning the log for other entries with the same Piece ID.
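For example, here is a minimal Go sketch (just an illustration, assuming the node log is a plain text file such as node.log) that prints every entry mentioning a given Piece ID:

```go
// piecegrep.go: print every log line that mentions a given Piece ID.
// Usage: go run piecegrep.go node.log Y3YG5SWEVXJWCXIQPQF37KKNZMKLQEMFAFH7Y7V2NXN7LQ7SORQQ
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	if len(os.Args) != 3 {
		log.Fatalf("usage: %s <logfile> <piece-id>", os.Args[0])
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // log lines can be long
	for scanner.Scan() {
		line := scanner.Text()
		if strings.Contains(line, os.Args[2]) {
			fmt.Println(line)
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```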

I see a similar but slightly different issue. After updating to v31.12 I also see a huge increase in upload errors (it was ~1-2% before the update and is >20% now).
But on my node I also see the satellites regularly asking the node to delete pieces that were previously reported as upload errors. So the node should not have those pieces stored (since the upload was canceled), yet the satellite thinks the node stores them (and it actually does!):

2020-02-04T03:01:59.639+0300	INFO	piecestore	upload started	{"Piece ID": "C7K6JV2OG7MACS7ITJDPU7SLPKIJ36OSDPQPW5KF2MJFA4AKQKXA", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT"}
2020-02-04T03:02:02.283+0300	INFO	piecestore	upload failed	{"Piece ID": "C7K6JV2OG7MACS7ITJDPU7SLPKIJ36OSDPQPW5KF2MJFA4AKQKXA", "Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Action": "PUT", "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:79\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:483\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:257\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:1066\n\tstorj.io/drpc/drpcserver.(*Server).doHandle:175\n\tstorj.io/drpc/drpcserver.(*Server).HandleRPC:153\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:114\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:147\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-02-04T03:03:01.005+0300	INFO	piecestore	upload started	{"Piece ID": "BEWM33JN5TUYFLT32M34QK5EX42KKIPG5QX3QQNE2U47EFQ3QIIQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT"}
2020-02-04T03:03:02.945+0300	INFO	piecestore	upload failed	{"Piece ID": "BEWM33JN5TUYFLT32M34QK5EX42KKIPG5QX3QQNE2U47EFQ3QIIQ", "Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Action": "PUT", "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:79\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:483\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:257\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:1066\n\tstorj.io/drpc/drpcserver.(*Server).doHandle:175\n\tstorj.io/drpc/drpcserver.(*Server).HandleRPC:153\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:114\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:147\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}
2020-02-04T03:03:13.403+0300	INFO	piecestore	deleted	{"Satellite ID": "12L9ZFwhzVpuEKMUNUqkaTLGzwY9G24tbiigLiXpmZWKwmcNDDs", "Piece ID": "BEWM33JN5TUYFLT32M34QK5EX42KKIPG5QX3QQNE2U47EFQ3QIIQ"}
--------------------- cut ----------------------
2020-02-04T03:10:49.396+0300	INFO	piecestore	deleted	{"Satellite ID": "121RTSDpyNZVcEU84Ticf2L1ntiuUimbWgfATz21tuvgk3vzoA6", "Piece ID": "C7K6JV2OG7MACS7ITJDPU7SLPKIJ36OSDPQPW5KF2MJFA4AKQKXA"}

I also picked the last failed upload from the end of the log:

2020-02-04T03:16:59.306+0300	INFO	piecestore	upload started	{"Piece ID": "MB7C7ASR4JPU4RFPDEEQM3YMRTFQRZPG5EZ5ZXB6YWWICTOAAW2Q", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT"}
2020-02-04T03:17:01.288+0300	INFO	piecestore	upload failed	{"Piece ID": "MB7C7ASR4JPU4RFPDEEQM3YMRTFQRZPG5EZ5ZXB6YWWICTOAAW2Q", "Satellite ID": "12EayRS2V1kEsWESU9QMRseFhdxYxKicsiFmxrsLZHeLUtdps3S", "Action": "PUT", "error": "context canceled", "errorVerbose": "context canceled\n\tstorj.io/common/rpc/rpcstatus.Wrap:79\n\tstorj.io/storj/storagenode/piecestore.(*Endpoint).doUpload:483\n\tstorj.io/storj/storagenode/piecestore.(*drpcEndpoint).Upload:257\n\tstorj.io/common/pb.DRPCPiecestoreDescription.Method.func1:1066\n\tstorj.io/drpc/drpcserver.(*Server).doHandle:175\n\tstorj.io/drpc/drpcserver.(*Server).HandleRPC:153\n\tstorj.io/drpc/drpcserver.(*Server).ServeOne:114\n\tstorj.io/drpc/drpcserver.(*Server).Serve.func2:147\n\tstorj.io/drpc/drpcctx.(*Tracker).track:51"}

Then I scanned the "blobs" folder and found the corresponding piece stored on disk:
\blobs\ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa\mb\7c7asr4jpu4rfpdeeqm3ymrtfqrzpg5ez5zxb6ywwictoaaw2q.sj1

So I think these upload errors are not actual errors during the upload, but successfully uploaded pieces being wrongly reported in the log as failed/canceled uploads.
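Just for illustration (a minimal Go sketch, not storagenode code; it assumes you already know the satellite's blob directory name, like the ukfu6... one in the path above, and the layout shown there), this is roughly how to check whether a piece file exists on disk:

```go
// checkpiece.go: check whether a piece file exists in the blob store.
// The layout assumed here matches the path shown above:
//   <storage>/blobs/<satellite-dir>/<first two chars of lowercased piece id>/<rest>.sj1
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
)

func piecePath(storageDir, satelliteDir, pieceID string) string {
	id := strings.ToLower(pieceID)
	return filepath.Join(storageDir, "blobs", satelliteDir, id[:2], id[2:]+".sj1")
}

func main() {
	// Hypothetical values for illustration; replace with your own.
	storageDir := "/mnt/storj/storage"
	satelliteDir := "ukfu6bhbboxilvt7jrwlqk7y2tapb5d2r2tsmj2sjxvw5qaaaaaa"
	pieceID := "MB7C7ASR4JPU4RFPDEEQM3YMRTFQRZPG5EZ5ZXB6YWWICTOAAW2Q"

	p := piecePath(storageDir, satelliteDir, pieceID)
	if info, err := os.Stat(p); err == nil {
		fmt.Printf("piece exists: %s (%d bytes)\n", p, info.Size())
	} else if os.IsNotExist(err) {
		fmt.Println("piece not found:", p)
	} else {
		log.Fatal(err)
	}
}
```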

@ifraixedes added the Bug and Needs investigation labels on Feb 4, 2020
@wivern-co-uk
Author

IMHO, if the satellite attempts to store a piece that already exists on the node, the node should reply with success rather than an error. Nodes should satisfy the satellites' desired network state rather than exposing internal technical issues.

For example, here is an analogy with deletes:
if the satellite asks to delete a particular piece that is already deleted, the node's response should be positive, since the node is already in the desired state, rather than an error like "pieceId not found, deletion failed".
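As an illustration of that suggestion (a hypothetical Go sketch, not the actual piecestore API), an idempotent delete could look like this:

```go
// Hypothetical sketch of an idempotent delete handler: a piece that is
// already gone is treated as success, because the node is already in the
// state the satellite asked for.
package main

import (
	"errors"
	"fmt"
	"os"
)

// deletePiece removes the piece file at path. It reports success both when
// the file was deleted and when it did not exist in the first place.
func deletePiece(path string) error {
	err := os.Remove(path)
	if err == nil || errors.Is(err, os.ErrNotExist) {
		return nil // desired state reached either way
	}
	return err // a real failure (permissions, I/O error, ...)
}

func main() {
	if err := deletePiece("/tmp/example-piece.sj1"); err != nil {
		fmt.Println("delete failed:", err)
		return
	}
	fmt.Println("piece deleted (or was already gone)")
}
```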

@kaloyan-raev
Member

This issue is also discussed in the forum: https://forum.storj.io/t/context-cancelled-on-all-uploads-downloads/4399

The upload that is logged as failed was indeed a successful one. We are working on fixing the logging.

@kaloyan-raev added the Tracked in Jira label and removed the Needs investigation label on Feb 5, 2020
@phpbg

phpbg commented Feb 23, 2020

Hi,
any update on this?

@egonelbre
Member

egonelbre commented Feb 24, 2020

Change 8f20085 should improve the messages. It distinguishes between cancelled and failed uploads, and it no longer logs an error when the upload is cancelled after the file has already been written to disk (since at that point we may or may not still need to serve it to the satellite).

If I remember correctly, it'll be part of v0.34 release.
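
For illustration only (a minimal Go sketch of the general idea, not the actual change; see the commit for the real implementation), the distinction boils down to checking for context cancellation before deciding how to log the result:

```go
// Hypothetical sketch of logging an upload result: a context cancellation is
// logged as "upload canceled" rather than "upload failed".
package main

import (
	"context"
	"errors"
	"fmt"
)

func logUploadResult(ctx context.Context, pieceID string, err error) {
	switch {
	case err == nil:
		fmt.Printf("INFO  uploaded        piece=%s\n", pieceID)
	case errors.Is(err, context.Canceled) || ctx.Err() != nil:
		// The uplink stopped the transfer (e.g. it already had enough pieces);
		// this is expected long-tail cancellation, not a node-side failure.
		fmt.Printf("INFO  upload canceled piece=%s\n", pieceID)
	default:
		fmt.Printf("ERROR upload failed  piece=%s err=%v\n", pieceID, err)
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // simulate the uplink cancelling the transfer
	logUploadResult(ctx, "MB7C7ASR4JPU4RFPDEEQM3YMRTFQRZPG5EZ5ZXB6YWWICTOAAW2Q", ctx.Err())
}
```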
