Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

After recording for about 5 minutes, the Aeron Client check timeout exception #373

Closed
jordanxlj opened this issue Jul 7, 2017 · 21 comments

Comments

@jordanxlj
Copy link

After recording for about 5 minutes, the Aeron Client check timeout exception. The exception is below:

io.aeron.exceptions.ConductorServiceTimeoutException: Timeout between service calls over 5000000000ns
at io.aeron.ClientConductor.onCheckTimeouts(ClientConductor.java:508)
at io.aeron.ClientConductor.doWork(ClientConductor.java:431)
at io.aeron.ClientConductor.doWork(ClientConductor.java:143)
at org.agrona.concurrent.AgentInvoker.invoke(AgentInvoker.java:88)
at io.aeron.archiver.ArchiveConductor.doWork(ArchiveConductor.java:116)
at org.agrona.concurrent.AgentRunner.run(AgentRunner.java:140)
at java.lang.Thread.run(Thread.java:745)

@jordanxlj
Copy link
Author

Stat is below:
15:12:52 - Aeron Stat (CnC v7), client liveness 5,000,000,000 ns

0: 1,881,375,360 - Bytes sent
1: 8,368,437,024 - Bytes received
2: 0 - Failed offers to ReceiverProxy
3: 0 - Failed offers to SenderProxy
4: 0 - Failed offers to DriverConductorProxy
5: 1,800 - NAKs sent
6: 0 - NAKs received
7: 21,109 - Status Messages sent
8: 4,633 - Status Messages received
9: 6,065 - Heartbeats sent
10: 11,909 - Heartbeats received
11: 0 - Retransmits sent
12: 113 - Flow control under runs
13: 710 - Flow control over runs
14: 0 - Invalid packets
15: 0 - Errors
16: 0 - Short sends
17: 3,888 - Client keep-alives
18: 16 - Sender flow control limits applied
19: 0 - Unblocked Publications
20: 0 - Unblocked Control Commands
21: 0 - Possible TTL Asymmetry
22: 0 - ControllableIdleStrategy status
23: 0 - Loss gap fills
24: 1 - snd-channel: aeron:udp?endpoint=224.110.110.111:8090
25: 0 - snd-pos: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
26: 0 - snd-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
27: 0 - pub-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
28: 1 - snd-channel: aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
29: 1,887,433,888 - snd-pos: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
30: 1,887,436,800 - snd-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
31: 1,887,433,888 - pub-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10

15:12:53 - Aeron Stat (CnC v7), client liveness 5,000,000,000 ns

0: 1,881,375,680 - Bytes sent
1: 8,368,437,024 - Bytes received
2: 0 - Failed offers to ReceiverProxy
3: 0 - Failed offers to SenderProxy
4: 0 - Failed offers to DriverConductorProxy
5: 1,800 - NAKs sent
6: 0 - NAKs received
7: 21,109 - Status Messages sent
8: 4,633 - Status Messages received
9: 6,075 - Heartbeats sent
10: 11,909 - Heartbeats received
11: 0 - Retransmits sent
12: 113 - Flow control under runs
13: 710 - Flow control over runs
14: 0 - Invalid packets
15: 0 - Errors
16: 0 - Short sends
17: 3,894 - Client keep-alives
18: 16 - Sender flow control limits applied
19: 0 - Unblocked Publications
20: 0 - Unblocked Control Commands
21: 0 - Possible TTL Asymmetry
22: 0 - ControllableIdleStrategy status
23: 0 - Loss gap fills
24: 1 - snd-channel: aeron:udp?endpoint=224.110.110.111:8090
25: 0 - snd-pos: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
26: 0 - snd-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
27: 0 - pub-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
28: 1 - snd-channel: aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
29: 1,887,433,888 - snd-pos: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
30: 1,887,436,800 - snd-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
31: 1,887,433,888 - pub-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10

15:12:54 - Aeron Stat (CnC v7), client liveness 5,000,000,000 ns

0: 1,881,376,000 - Bytes sent
1: 8,368,437,024 - Bytes received
2: 0 - Failed offers to ReceiverProxy
3: 0 - Failed offers to SenderProxy
4: 0 - Failed offers to DriverConductorProxy
5: 1,800 - NAKs sent
6: 0 - NAKs received
7: 21,109 - Status Messages sent
8: 4,633 - Status Messages received
9: 6,085 - Heartbeats sent
10: 11,909 - Heartbeats received
11: 0 - Retransmits sent
12: 113 - Flow control under runs
13: 710 - Flow control over runs
14: 0 - Invalid packets
15: 0 - Errors
16: 0 - Short sends
17: 3,900 - Client keep-alives
18: 16 - Sender flow control limits applied
19: 0 - Unblocked Publications
20: 0 - Unblocked Control Commands
21: 0 - Possible TTL Asymmetry
22: 0 - ControllableIdleStrategy status
23: 0 - Loss gap fills
24: 1 - snd-channel: aeron:udp?endpoint=224.110.110.111:8090
25: 0 - snd-pos: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
26: 0 - snd-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
27: 0 - pub-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
28: 1 - snd-channel: aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
29: 1,887,433,888 - snd-pos: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
30: 1,887,436,800 - snd-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
31: 1,887,433,888 - pub-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10

15:12:55 - Aeron Stat (CnC v7), client liveness 5,000,000,000 ns

0: 1,881,376,320 - Bytes sent
1: 8,368,437,024 - Bytes received
2: 0 - Failed offers to ReceiverProxy
3: 0 - Failed offers to SenderProxy
4: 0 - Failed offers to DriverConductorProxy
5: 1,800 - NAKs sent
6: 0 - NAKs received
7: 21,109 - Status Messages sent
8: 4,633 - Status Messages received
9: 6,095 - Heartbeats sent
10: 11,909 - Heartbeats received
11: 0 - Retransmits sent
12: 113 - Flow control under runs
13: 710 - Flow control over runs
14: 0 - Invalid packets
15: 0 - Errors
16: 0 - Short sends
17: 3,906 - Client keep-alives
18: 16 - Sender flow control limits applied
19: 0 - Unblocked Publications
20: 0 - Unblocked Control Commands
21: 0 - Possible TTL Asymmetry
22: 0 - ControllableIdleStrategy status
23: 0 - Loss gap fills
24: 1 - snd-channel: aeron:udp?endpoint=224.110.110.111:8090
25: 0 - snd-pos: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
26: 0 - snd-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
27: 0 - pub-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
28: 1 - snd-channel: aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
29: 1,887,433,888 - snd-pos: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
30: 1,887,436,800 - snd-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
31: 1,887,433,888 - pub-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10

15:12:56 - Aeron Stat (CnC v7), client liveness 5,000,000,000 ns

0: 1,881,376,640 - Bytes sent
1: 8,368,437,024 - Bytes received
2: 0 - Failed offers to ReceiverProxy
3: 0 - Failed offers to SenderProxy
4: 0 - Failed offers to DriverConductorProxy
5: 1,800 - NAKs sent
6: 0 - NAKs received
7: 21,109 - Status Messages sent
8: 4,633 - Status Messages received
9: 6,105 - Heartbeats sent
10: 11,909 - Heartbeats received
11: 0 - Retransmits sent
12: 113 - Flow control under runs
13: 710 - Flow control over runs
14: 0 - Invalid packets
15: 0 - Errors
16: 0 - Short sends
17: 3,912 - Client keep-alives
18: 16 - Sender flow control limits applied
19: 0 - Unblocked Publications
20: 0 - Unblocked Control Commands
21: 0 - Possible TTL Asymmetry
22: 0 - ControllableIdleStrategy status
23: 0 - Loss gap fills
24: 1 - snd-channel: aeron:udp?endpoint=224.110.110.111:8090
25: 0 - snd-pos: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
26: 0 - snd-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
27: 0 - pub-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
28: 1 - snd-channel: aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
29: 1,887,433,888 - snd-pos: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
30: 1,887,436,800 - snd-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
31: 1,887,433,888 - pub-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10

15:12:57 - Aeron Stat (CnC v7), client liveness 5,000,000,000 ns

0: 1,881,376,960 - Bytes sent
1: 8,368,437,024 - Bytes received
2: 0 - Failed offers to ReceiverProxy
3: 0 - Failed offers to SenderProxy
4: 0 - Failed offers to DriverConductorProxy
5: 1,800 - NAKs sent
6: 0 - NAKs received
7: 21,109 - Status Messages sent
8: 4,633 - Status Messages received
9: 6,115 - Heartbeats sent
10: 11,909 - Heartbeats received
11: 0 - Retransmits sent
12: 113 - Flow control under runs
13: 710 - Flow control over runs
14: 0 - Invalid packets
15: 0 - Errors
16: 0 - Short sends
17: 3,918 - Client keep-alives
18: 16 - Sender flow control limits applied
19: 0 - Unblocked Publications
20: 0 - Unblocked Control Commands
21: 0 - Possible TTL Asymmetry
22: 0 - ControllableIdleStrategy status
23: 0 - Loss gap fills
24: 1 - snd-channel: aeron:udp?endpoint=224.110.110.111:8090
25: 0 - snd-pos: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
26: 0 - snd-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
27: 0 - pub-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
28: 1 - snd-channel: aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
29: 1,887,433,888 - snd-pos: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
30: 1,887,436,800 - snd-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
31: 1,887,433,888 - pub-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10

15:12:58 - Aeron Stat (CnC v7), client liveness 5,000,000,000 ns

0: 1,881,377,280 - Bytes sent
1: 8,368,437,024 - Bytes received
2: 0 - Failed offers to ReceiverProxy
3: 0 - Failed offers to SenderProxy
4: 0 - Failed offers to DriverConductorProxy
5: 1,800 - NAKs sent
6: 0 - NAKs received
7: 21,109 - Status Messages sent
8: 4,633 - Status Messages received
9: 6,125 - Heartbeats sent
10: 11,909 - Heartbeats received
11: 0 - Retransmits sent
12: 113 - Flow control under runs
13: 710 - Flow control over runs
14: 0 - Invalid packets
15: 0 - Errors
16: 0 - Short sends
17: 3,924 - Client keep-alives
18: 16 - Sender flow control limits applied
19: 0 - Unblocked Publications
20: 0 - Unblocked Control Commands
21: 0 - Possible TTL Asymmetry
22: 0 - ControllableIdleStrategy status
23: 0 - Loss gap fills
24: 1 - snd-channel: aeron:udp?endpoint=224.110.110.111:8090
25: 0 - snd-pos: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
26: 0 - snd-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
27: 0 - pub-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
28: 1 - snd-channel: aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
29: 1,887,433,888 - snd-pos: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
30: 1,887,436,800 - snd-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
31: 1,887,433,888 - pub-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10

15:12:59 - Aeron Stat (CnC v7), client liveness 5,000,000,000 ns

0: 1,881,377,600 - Bytes sent
1: 8,368,437,024 - Bytes received
2: 0 - Failed offers to ReceiverProxy
3: 0 - Failed offers to SenderProxy
4: 0 - Failed offers to DriverConductorProxy
5: 1,800 - NAKs sent
6: 0 - NAKs received
7: 21,109 - Status Messages sent
8: 4,633 - Status Messages received
9: 6,135 - Heartbeats sent
10: 11,909 - Heartbeats received
11: 0 - Retransmits sent
12: 113 - Flow control under runs
13: 710 - Flow control over runs
14: 0 - Invalid packets
15: 0 - Errors
16: 0 - Short sends
17: 3,930 - Client keep-alives
18: 16 - Sender flow control limits applied
19: 0 - Unblocked Publications
20: 0 - Unblocked Control Commands
21: 0 - Possible TTL Asymmetry
22: 0 - ControllableIdleStrategy status
23: 0 - Loss gap fills
24: 1 - snd-channel: aeron:udp?endpoint=224.110.110.111:8090
25: 0 - snd-pos: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
26: 0 - snd-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
27: 0 - pub-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
28: 1 - snd-channel: aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
29: 1,887,433,888 - snd-pos: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
30: 1,887,436,800 - snd-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
31: 1,887,433,888 - pub-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10

15:13:00 - Aeron Stat (CnC v7), client liveness 5,000,000,000 ns

0: 1,881,377,920 - Bytes sent
1: 8,368,437,024 - Bytes received
2: 0 - Failed offers to ReceiverProxy
3: 0 - Failed offers to SenderProxy
4: 0 - Failed offers to DriverConductorProxy
5: 1,800 - NAKs sent
6: 0 - NAKs received
7: 21,109 - Status Messages sent
8: 4,633 - Status Messages received
9: 6,145 - Heartbeats sent
10: 11,909 - Heartbeats received
11: 0 - Retransmits sent
12: 113 - Flow control under runs
13: 710 - Flow control over runs
14: 0 - Invalid packets
15: 0 - Errors
16: 0 - Short sends
17: 3,936 - Client keep-alives
18: 16 - Sender flow control limits applied
19: 0 - Unblocked Publications
20: 0 - Unblocked Control Commands
21: 0 - Possible TTL Asymmetry
22: 0 - ControllableIdleStrategy status
23: 0 - Loss gap fills
24: 1 - snd-channel: aeron:udp?endpoint=224.110.110.111:8090
25: 0 - snd-pos: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
26: 0 - snd-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
27: 0 - pub-lmt: 1 1279736809 65535 aeron:udp?endpoint=224.110.110.111:8090
28: 1 - snd-channel: aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
29: 1,887,433,888 - snd-pos: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
30: 1,887,436,800 - snd-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
31: 1,887,433,888 - pub-lmt: 5 1279736810 15 aeron:udp?endpoint=225.0.1.3:40464|interface=192.168.0.10
^CExiting...

@nitsanw
Copy link
Contributor

nitsanw commented Jul 7, 2017

Hi,
Given the active development stage of the archiver it would be good if you can state the git-hash of the codebase for which you report this bug.
Also please note how you start the archiver.
I'll look into this issue ASAP.
Thanks

@jordanxlj
Copy link
Author

@nitsanw , I use the master trunk version of 2017.6.8. I write a recording control, it read the recording stream and channel list, then call ArchiverConductor's startRecording.

@jordanxlj
Copy link
Author

jordanxlj commented Jul 7, 2017

@nitsanw , I have about 8 streams for recording, the publishers is over several machines.

@nitsanw
Copy link
Contributor

nitsanw commented Jul 7, 2017

@jordanxlj ArchiverConductor is internal API so no direct interaction with it is intended. You should launch a ArchivingMediaDriver/LightweightArchivingMediaDriver and interact via the control stream.
At the moment the recording is intended for local publications only.
Having said that, you may have hit a genuine bug. Can you share your code please?

@nitsanw
Copy link
Contributor

nitsanw commented Jul 7, 2017

Can you update your version of Aeron too? you are a month behind

@jordanxlj
Copy link
Author

I think about the updating, but I also will locate the bug at the mean time. I will collector more debug information of the media driver.

@jordanxlj
Copy link
Author

Is the reason that I had recorded too may data?

@nitsanw
Copy link
Contributor

nitsanw commented Jul 7, 2017

I think you can save yourself some time by updating(to avoid any bugs we may have fixed), and switch to intended usage(to avoid bugs resulting from unplanned interaction)

@jordanxlj
Copy link
Author

192.168.0.11 debug log, I start recording on the host.

192.168.0.11.txt

@jordanxlj
Copy link
Author

192.168.0.10 debug log.
192.168.0.10.txt

@jordanxlj
Copy link
Author

192.168.0.12 debug log.
192.168.0.12.txt

@jordanxlj
Copy link
Author

I use 225.0.1.3 multicast address to send data information, use 224.110.110.111 to send topology information.

@jordanxlj
Copy link
Author

jordanxlj commented Jul 9, 2017

I send various messages, length from 1.3mb to 200Bytes. The frequency of the different messages is different.

@jordanxlj
Copy link
Author

My code is shared below.

aeron-master-07-06.zip

@jordanxlj
Copy link
Author

@mjpt777 , Could you help see this problem? Thanks.

@mjpt777
Copy link
Contributor

mjpt777 commented Jul 9, 2017

@jordanxlj Have you updated to the latest version of Aeron from master?

@jordanxlj
Copy link
Author

No, I haven't update to the latest version of master, I plan to update after the new release version come out. I thought the problem had no relationship with the version. When offer failed, I would tried 5 times to retry. I found when the retring is serious, the problem would occured more possibly.

@jordanxlj
Copy link
Author

I will change to use no reliability way to see if the problem will occured.

@jordanxlj
Copy link
Author

In 192.168.0.11 debug log, at line 52088, check timeout, and close the archiver. I see the CMD_IN_KEEPALIVE_CLIENT is always recorded, how to find the responding Client of the Recorder?

@mjpt777
Copy link
Contributor

mjpt777 commented Jul 10, 2017

This could happen due to a large GC pause whereby your client was not able to perform the duty cycle for over 5 seconds. It could also happen due to resource starvation in the client or the archive conductor thread blocked waiting on some action such as IO.

You could increase the aeron.client.liveness.timeout system property on the driver to see if it no longer happens.

Given that you are using development code in a non-standard way and not keeping up with the latest build we cannot offer support without a support contract.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants