Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download fails: Migrate error loop #198

Closed
patrikhermansson opened this issue Mar 13, 2021 · 9 comments
Closed

Download fails: Migrate error loop #198

patrikhermansson opened this issue Mar 13, 2021 · 9 comments
Labels
area: client telegram package issues bug Something isn't working

Comments

@patrikhermansson
Copy link

Hi, sorry if this is not a clearly defined issue. I'm getting a very high number of errors on file downloads using v0.27.0 but this happened with previous versions as well.

Somewhere between 20-60% percent of file downloads fail (depending on time I try) which seems too high but it's notably not all of them, many succeeds as well and I can't differentiate between the ones who succeeds and fails. They usually fail with error: get file: get next chunk: rpcDoRequest: rpc error code 400: FILE_ID_INVALID which could be my fault but info logs looks strange. Once in a while they fail with get file: get next chunk: invoke pool: rpcDoRequest: rpc error code 401: AUTH_KEY_UNREGISTERED.

Before the download errors with either of these messages the INFO log prints, the line below: Got migrate error: Creating sub-connection several hundred times before I usually get 400: FILE_ID_INVALID, occaccionally I get FILE_ID_INVALID right away. However, in some cases it also succeeds but the amount of migrate error seems strange?

This is the line that gets printed hundreds of times, sometimes for up to to several minutes, before the download fails:
INFO telegram/invoke_raw.go:23 Got migrate error: Creating sub-connection {"v": "v0.27.0", "error": "FILE_MIGRATE", "dc": 5}
INFO telegram/invoke_raw.go:23 Got migrate error: Creating sub-connection {"v": "v0.27.0", "error": "FILE_MIGRATE", "dc": 5}
INFO telegram/invoke_raw.go:23 Got migrate error: Creating sub-connection {"v": "v0.27.0", "error": "FILE_MIGRATE", "dc": 5}

I'm calling both downloader.NewDownloader().DownloadDirect(c, loc).Stream(ctx, f) as well as downloader.NewDownloader().Download(c, loc).Stream(ctx, f) so there doesn't appear to be a difference when I use CDN.

Happy to try to provide more info if you tell me what could be useful.

@ernado ernado added area: client telegram package issues bug Something isn't working labels Mar 13, 2021
@ernado
Copy link
Member

ernado commented Mar 13, 2021

Hi! Thank you for your report, we are investigating this problem.

The Got migrate error: Creating sub-connection is fine, because in current architecture we are performing two requests per chunk (e.g. first request to primary dc, then getting a migrate, then target dc). This should improve after refactoring of downloader.

The AUTH_KEY_UNREGISTERED is probably due to some kind of TTL of connections with imported authentication, probably we can handle this error via auto-reconnecting.

The FILE_ID_INVALID is probably due to access hash expiration.

Can you please answer following questions:

  1. Does retries help? E.g. if you are trying to download failed file again, does it eventually succeed?
  2. How long it takes between start of download to failure?
  3. How long it takes between getting file access hash to failure?
  4. What is your session main DC? It usually appear in log.

ernado added a commit that referenced this issue Mar 13, 2021
@ernado
Copy link
Member

ernado commented Mar 13, 2021

I've tried (roughtly) following code for couple of minutes:

func run(ctx context.Context) error {
	client := telegram.NewClient(telegram.TestAppID, telegram.TestAppHash, telegram.Options{})

	d := downloader.NewDownloader()
	raw := tg.NewClient(client)
	sender := message.NewSender(raw)

	return client.Run(ctx, func(ctx context.Context) error {
		peer, err := sender.Resolve("https://t.me/ahangify").AsInputPeer(ctx)
		if err != nil {
			return xerrors.Errorf("resolve: %w", err)
		}

		return messages.NewQueryBuilder(raw).Search(peer).Video().
			ForEach(ctx, func(ctx context.Context, elem messages.Elem) error {
				msg, ok := elem.Msg.(*tg.Message)
				if !ok {
					return nil
				}

				media, ok := msg.Media.(*tg.MessageMediaDocument)
				if !ok {
					return xerrors.Errorf("unexpected type %T", msg.Media)
				}

				doc, ok := media.Document.AsNotEmpty()
				if !ok {
					return xerrors.Errorf("unexpected type %T", media.Document)
				}

				_, err := d.Download(raw, doc.AsInputDocumentFileLocation()).Stream(ctx, io.Discard)
				return err
			})
	})
}

This downloads all videos from @ahangify channel (just selected most popular Iranian channel for test).

I've encountered no errors so far. My primary DC was 4, and channel DC was 2, so I've got lots of Got migrate error: Creating sub-connection in log.

My primary hypothesis is that you are downloading files by references that are already expired.

@patrikhermansson
Copy link
Author

Thank for looking into it! I'm not so sure it's about expired hashes. I've run a test now for a few hours on some active channels to try to answer your questions. Also made some new discoveries. Generally see little difference between time till download from receiving and time to finished download between successful and unsuccessful attempts.

Worth noting is that I'm not scraping specific channels but saving attachments to message that come as update streams. Otherwise what I'm doing looks similar to yours.

Does retries help? E.g. if you are trying to download failed file again, does it eventually succeed?
Almost never, except in one or two cases with timeout errors but that's different. I try all downloads 3 times and they work on the first or none at all after, almost without exceptions.

How long it takes between start of download to failure?
Often less than a second, sometimes slightly more but <2 seconds usually. I realise this contradicts what I wrote earlier about a long list of migrate error but these seems to be exceptions. I've got individual cases of successful downloads up to about 40 seconds after receiving.

How long it takes between getting file access hash to failure?
Also < 5 seconds. Usually <2-3 seconds from receiving a message with media to having download failure, some higher ones around 10 seconds but I've got successes at that those times as well. Generally there is little difference between time to success and time to failure. Almost all cases <5 seconds. Assume hash can't expire that fast but can't find any documentation on this.

What is your session main DC? It usually appear in log.
2 and 4. I've tried with different accounts as well as both connected at the same time.

Having run my test now for a few hours it appears to be getting worse, higher failure rate with FILE_ID_INVALID but also a lot more AUTH_KEY_UNREGISTERED, I can in many cases get a AUTH_KEY_UNREGISTERED only to get an FILE_ID_INVALID on the second attempt but this is still mostly <2 seconds after receiving message. I've not got any FLOOD warnings though.

@ernado
Copy link
Member

ernado commented Mar 14, 2021

I'll try doing same on active channels to reproduce this issue.

My second hypothesis is that telegram doing some weird throttling, so you can try rate limiting your downloads.

@ernado
Copy link
Member

ernado commented Mar 14, 2021

Can you please try v0.27.1 version? I've removed confusing errors (that are not actually errors) from log.

@patrikhermansson
Copy link
Author

Sorry for late reply, I've been trying to figure this out but not gotten really far. 27.1 helped in debugging however. Problem seems to be AUTH_KEY_UNREGISTERED mainly and it starts coming after about 20-40min after starting. Could be that it's a strange error message that actually indicates rate limiting however I don't think so as if I start getting AUTH_KEY_UNREGISTERED on every message and then just restart, it works again right away.

Here's a log that might help, I get a few messages indicating a reconnect / re-auth, and right after I get the first AUTH_KEY_UNREGISTERED and after that I get it on all attempts to download. I get one or two connection errors before this which doesn't cause downloads to break however.

Download start / download failed are log messages i add to keep track of my downloads.

�[36mtelegram_1 |�[0m 2021-03-14 18:44:11.899816 I | Successful document download on attempt: 1
�[36mtelegram_1 |�[0m 2021-03-14 18:44:11.899883 I | Seconds from task add: 1235 from start download: 17
�[36mtelegram_1 |�[0m 2021-03-14 18:44:12.207359 I | Download start: 5942904513734226707
�[36mtelegram_1 |�[0m 2021-03-14 18:44:12.340187 I | Successful photo download on attempt: 1
�[36mtelegram_1 |�[0m 2021-03-14 18:44:12.340240 I | Seconds from task add: 1228 from start download: 0
�[36mtelegram_1 |�[0m 2021-03-14 18:44:13.063043 I | Download start: 5940333064818264036
�[36mtelegram_1 |�[0m 2021-03-14 18:44:18.848617 I | Successful document download on attempt: 1
�[36mtelegram_1 |�[0m 2021-03-14 18:44:18.848768 I | Seconds from task add: 1234 from start download: 5
�[36mtelegram_1 |�[0m 2021-03-14T18:44:19.789Z INFO conn.mtproto.rpc rpc/engine.go:264 Close called {"conn_id": 2, "dc_id": 1, "addr": "149.154.175.60:443"}
�[36mtelegram_1 |�[0m 2021-03-14T18:44:19.790Z INFO conn manager/conn.go:120 Connection dead {"conn_id": 2, "dc_id": 1, "error": "group: task pingLoop: disconnect (pong missed): context deadline exceeded", "errorVerbose": "group:\n github.com/gotd/td/internal/mtproto.(*Conn).Run\n /Users/patrikhermansson/go/pkg/mod/github.com/gotd/td@v0.27.1/internal/mtproto/conn.go:185\n - task pingLoop:\n github.com/gotd/td/internal/tdsync.(*LogGroup).Go.func1\n /Users/patrikhermansson/go/pkg/mod/github.com/gotd/td@v0.27.1/internal/tdsync/log_group.go:48\n - disconnect (pong missed):\n github.com/gotd/td/internal/mtproto.(*Conn).pingLoop\n /Users/patrikhermansson/go/pkg/mod/github.com/gotd/td@v0.27.1/internal/mtproto/ping.go:120\n - context deadline exceeded"}
�[36mtelegram_1 |�[0m 2021-03-14 18:44:21.035112 I | Download start: 2277752265240477760
�[36mtelegram_1 |�[0m 2021-03-14T18:44:21.069Z INFO conn.mtproto.rpc rpc/engine.go:52 Initialized {"conn_id": 4, "dc_id": 1, "addr": "149.154.175.60:443", "retry_interval": "5s", "max_retries": 5}
�[36mtelegram_1 |�[0m 2021-03-14T18:44:21.184Z INFO conn.mtproto mtproto/conn.go:201 Dialed transport {"conn_id": 4, "dc_id": 1, "addr": "149.154.175.60:443", "addr": "1|149.154.175.60:443"}
�[36mtelegram_1 |�[0m 2021-03-14T18:44:21.184Z INFO conn.mtproto mtproto/conn.go:207 Generating new auth key {"conn_id": 4, "dc_id": 1, "addr": "149.154.175.60:443"}
�[36mtelegram_1 |�[0m 2021-03-14 18:44:21.575766 I | Recieved delete: {1346723132 35942}
�[36mtelegram_1 |�[0m 2021-03-14T18:44:22.675Z INFO conn.mtproto mtproto/conn.go:213 Auth key generated {"conn_id": 4, "dc_id": 1, "addr": "149.154.175.60:443", "duration": "1.490696504s"}
�[36mtelegram_1 |�[0m 2021-03-14T18:44:22.787Z INFO conn manager/conn.go:76 SessionInit {"conn_id": 4, "dc_id": 1}
�[36mtelegram_1 |�[0m 2021-03-14 18:44:22.900805 I | Document download failed: get file: get next chunk: invoke pool: rpcDoRequest: rpc error code 401: AUTH_KEY_UNREGISTERED on attempt 1
�[36mtelegram_1 |�[0m 2021-03-14 18:44:22.900844 I | Seconds from task add: 1236 from start download: 1
�[36mtelegram_1 |�[0m 2021-03-14 18:44:22.901125 I | failed download task: get file: get next chunk: invoke pool: rpcDoRequest: rpc error code 401: AUTH_KEY_UNREGISTERED
�[36mtelegram_1 |�[0m 2021-03-14 18:44:22.907163 I | Download start: 2767052720765403161
�[36mtelegram_1 |�[0m 2021-03-14T18:44:23.062Z INFO conn.mtproto mtproto/rpc.go:41 Retrying request after basMsgErr {"conn_id": 1, "dc_id": 1, "addr": "149.154.175.51:443", "msg_id": 6939582508838983484}
�[36mtelegram_1 |�[0m 2021-03-14 18:44:23.434667 I | Successful document download on attempt: 1
�[36mtelegram_1 |�[0m 2021-03-14 18:44:23.434712 I | Seconds from task add: 1231 from start download: 0
�[36mtelegram_1 |�[0m 2021-03-14 18:44:23.588667 I | Download start: 5073395745776730409
�[36mtelegram_1 |�[0m 2021-03-14T18:44:24.087Z INFO conn.mtproto.rpc rpc/engine.go:264 Close called {"conn_id": 1, "dc_id": 1, "addr": "149.154.175.51:443"}
�[36mtelegram_1 |�[0m 2021-03-14 18:44:24.087556 I | Document download failed: get file: get next chunk: invoke pool: rpcDoRequest: engine forcibly closed: context canceled on attempt 1
�[36mtelegram_1 |�[0m 2021-03-14 18:44:24.087580 I | Seconds from task add: 1231 from start download: 0
�[36mtelegram_1 |�[0m 2021-03-14 18:44:24.087770 I | failed download task: get file: get next chunk: invoke pool: rpcDoRequest: engine forcibly closed: context canceled
�[36mtelegram_1 |�[0m 2021-03-14T18:44:24.087Z INFO conn manager/conn.go:120 Connection dead {"conn_id": 1, "dc_id": 1, "error": "group: task pingLoop: disconnect (pong missed): context deadline exceeded", "errorVerbose": "group:\n github.com/gotd/td/internal/mtproto.(*Conn).Run\n /Users/patrikhermansson/go/pkg/mod/github.com/gotd/td@v0.27.1/internal/mtproto/conn.go:185\n - task pingLoop:\n github.com/gotd/td/internal/tdsync.(*LogGroup).Go.func1\n /Users/patrikhermansson/go/pkg/mod/github.com/gotd/td@v0.27.1/internal/tdsync/log_group.go:48\n - disconnect (pong missed):\n github.com/gotd/td/internal/mtproto.(*Conn).pingLoop\n /Users/patrikhermansson/go/pkg/mod/github.com/gotd/td@v0.27.1/internal/mtproto/ping.go:120\n - context deadline exceeded"}
�[36mtelegram_1 |�[0m 2021-03-14 18:44:24.097004 I | Download start: 825482114311913735
�[36mtelegram_1 |�[0m 2021-03-14 18:44:24.119111 I | Recieved delete: {1331374507 208947}
�[36mtelegram_1 |�[0m 2021-03-14T18:44:24.130Z INFO conn.mtproto.rpc rpc/engine.go:52 Initialized {"conn_id": 4, "dc_id": 1, "addr": "149.154.175.51:443", "retry_interval": "5s", "max_retries": 5}
�[36mtelegram_1 |�[0m 2021-03-14T18:44:24.237Z INFO conn.mtproto mtproto/conn.go:201 Dialed transport {"conn_id": 4, "dc_id": 1, "addr": "149.154.175.51:443", "addr": "1|149.154.175.51:443"}
�[36mtelegram_1 |�[0m 2021-03-14T18:44:24.237Z INFO conn.mtproto mtproto/conn.go:207 Generating new auth key {"conn_id": 4, "dc_id": 1, "addr": "149.154.175.51:443"}
�[36mtelegram_1 |�[0m 2021-03-14 18:44:25.161091 I | failed to get input GetMessageFromChannelByID for channel -1001346125173
�[36mtelegram_1 |�[0m 2021-03-14 18:44:25.161135 I | Failed to get replied user for msg 2014963564544 failed to get input GetMessageFromChannelByID for channel
�[36mtelegram_1 |�[0m 2021-03-14T18:44:25.253Z INFO conn.mtproto mtproto/conn.go:213 Auth key generated {"conn_id": 4, "dc_id": 1, "addr": "149.154.175.51:443", "duration": "1.015815551s"}
�[36mtelegram_1 |�[0m 2021-03-14T18:44:25.358Z INFO conn manager/conn.go:76 SessionInit {"conn_id": 4, "dc_id": 1}
�[36mtelegram_1 |�[0m 2021-03-14 18:44:25.467663 I | Document download failed: get file: get next chunk: invoke pool: rpcDoRequest: rpc error code 401: AUTH_KEY_UNREGISTERED on attempt 1
�[36mtelegram_1 |�[0m 2021-03-14 18:44:25.467734 I | Seconds from task add: 1225 from start download: 1
�[36mtelegram_1 |�[0m 2021-03-14 18:44:25.468285 I | failed download task: get file: get next chunk: invoke pool: rpcDoRequest: rpc error code 401: AUTH_KEY_UNREGISTERED
�[36mtelegram_1 |�[0m 2021-03-14 18:44:25.474510 I | Download start: 5001809537284440512
�[36mtelegram_1 |�[0m 2021-03-14 18:44:25.622791 I | Document download failed: get file: get next chunk: invoke pool: rpcDoRequest: rpc error code 401: AUTH_KEY_UNREGISTERED on attempt 1
�[36mtelegram_1 |�[0m 2021-03-14 18:44:25.622827 I | Seconds from task add: 1224 from start download: 0
�[36mtelegram_1 |�[0m 2021-03-14 18:44:25.623372 I | failed download task: get file: get next chunk: invoke pool: rpcDoRequest: rpc error code 401: AUTH_KEY_UNREGISTERED
�[36mtelegram_1 |�[0m 2021-03-14 18:44:25.634789 I | Download start: 5046782848510460180
�[36mtelegram_1 |�[0m 2021-03-14 18:44:25.788814 I | Document download failed: get file: get next chunk: invoke pool: rpcDoRequest: rpc error code 401: AUTH_KEY_UNREGISTERED on attempt 1
�[36mtelegram_1 |�[0m 2021-03-14 18:44:25.788916 I | Seconds from task add: 1221 from start download: 0
�[36mtelegram_1 |�[0m 2021-03-14 18:44:25.789257 I | failed download task: get file: get next chunk: invoke pool: rpcDoRequest: rpc error code 401: AUTH_KEY_UNREGISTERED
�[36mtelegram_1 |�[0m 2021-03-14 18:44:25.795211 I | Download start: 5942904066601388580

@tdakkota
Copy link
Member

Hi @patrikhermansson! Thank you for your report!

I found some bugs related to session management, fixed and released v0.28.1. Can you try it please?

@patrikhermansson
Copy link
Author

Thanks! v0.28.1 solves the problem! I've been running it for a few hours now without a single AUTH_KEY_UNREGISTERED which I've not managed before. Could close this.

@ernado
Copy link
Member

ernado commented Mar 15, 2021

Great! Closing this for now.

@ernado ernado closed this as completed Mar 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: client telegram package issues bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants