
LockedTransfer is not being sent #4383

Closed
offerm opened this issue Jul 12, 2019 · 8 comments · Fixed by #4387
Assignees
Labels
Component / Transport Transport related issues Severity / Medium State / Investigating For issues that are currently being looked into before labeling further

Comments

@offerm
Contributor

offerm commented Jul 12, 2019

Have a raiden server with about 24 channels.

I'm trying to make a payment to a peer which is available.

The payment is not arriving.

Actually, I don't even see a send_raw log entry.

Can you please have a look and let me know the reason and best way to get out of it?

 curl -s -X POST http://localhost:5001/api/v1/payments/0x9F50cEA29307d7D91c5176Af42f3aB74f0190dD3/0x283EAD0338f8C202b996146d77a542fDfB60d437 -H 'Content-Type: application/json' --data-raw '{"amount": 3000000000000000000, "identifier": 4444444}'

Thanks
raiden-debug_2019-07-12T13_50_58.575034.log

@offerm
Contributor Author

offerm commented Jul 12, 2019

@hackaugusto
Contributor

hackaugusto commented Jul 12, 2019

@offerm, the logs are normal for the node 0x7ed0299fa1ada71d10536b866231d447cdfa48b9. I need logs of the node 0x283EAD0338f8C202b996146d77a542fDfB60d437 (edit: okay, I will have a look at the new logs)

{"node": "7ed0299f", "registry_address": "0x07679899f46422A2a9e5A3502815131Fe6d0dd0a", "token_address": "0x9F50cEA29307d7D91c5176Af42f3aB74f0190dD3", "target_address": "0x283EAD0338f8C202b996146d77a542fDfB60d437", "amount": 3000000000000000000, "payment_identifier": 4444444, "secret": null, "secret_hash": null, "event": "Initiating payment", "logger": "raiden.api.rest", "level": "debug", "timestamp": "2019-07-12 13:53:18.314654"}
{"initiator": "7ed0299f", "target": "283ead03", "token": "9f50cea2", "amount": 3000000000000000000, "identifier": 4444444, "event": "Initiating transfer", "logger": "raiden.api.python", "level": "debug", "timestamp": "2019-07-12 13:53:18.315370"}
{"node": "7ed0299f", "state_change": "{\"transfer\": {\"payment_network_identifier\": \"0x07679899f46422A2a9e5A3502815131Fe6d0dd0a\", \"payment_identifier\": \"4444444\", \"amount\": \"3000000000000000000\", \"allocated_fee\": \"0\", \"token_network_identifier\": \"0x23a7036a025FEF30A38f1f3464FFbe2b02a5a627\", \"initiator\": \"0x7ed0299Fa1ADA71D10536B866231D447cDFa48b9\", \"target\": \"0x283EAD0338f8C202b996146d77a542fDfB60d437\", \"secret\": \"0x6182ff4452cce8984a72281c8347b67e467e65345d6eda5d244330ca91595b97\", \"secrethash\": \"0x8e321e8b1c727fb2052574733b9e15594fa97119e275a17fead9b6a425b3b4ff\", \"_type\": \"raiden.transfer.mediated_transfer.state.TransferDescriptionWithSecretState\", \"_version\": 0}, \"routes\": [{\"node_address\": \"0x283EAD0338f8C202b996146d77a542fDfB60d437\", \"channel_identifier\": \"3\", \"_type\": \"raiden.transfer.state.RouteState\", \"_version\": 0}, {\"node_address\": \"0x00EE7BF6F8aBF6b44A45493f38B6Ae0d831387a9\", \"channel_identifier\": \"13\", \"_type\": \"raiden.transfer.state.RouteState\", \"_version\": 0}, {\"node_address\": \"0x0D875a3D5E434F1821125DEa6CE598B33966DBcc\", \"channel_identifier\": \"17\", \"_type\": \"raiden.transfer.state.RouteState\", \"_version\": 0}, {\"node_address\": \"0x0F79e0104d6e1B5a7041C54c15B2115D228A6966\", \"channel_identifier\": \"14\", \"_type\": \"raiden.transfer.state.RouteState\", \"_version\": 0}, {\"node_address\": \"0x271B02101075806910B23af7281855df9FF0aD5e\", \"channel_identifier\": \"11\", \"_type\": \"raiden.transfer.state.RouteState\", \"_version\": 0}, {\"node_address\": \"0x2B88992DEd5C96aa7Eaa9CFE1AE52350df7dc5DF\", \"channel_identifier\": \"12\", \"_type\": \"raiden.transfer.state.RouteState\", \"_version\": 0}, {\"node_address\": \"0x69eB3CbB7Ee12BaDf7214E0BADcDbB88f0Ec3814\", \"channel_identifier\": \"10\", \"_type\": \"raiden.transfer.state.RouteState\", \"_version\": 0}, {\"node_address\": \"0x72D67154052955959716C0e70f7a64fc08B162Bf\", \"channel_identifier\": \"15\", \"_type\": 
\"raiden.transfer.state.RouteState\", \"_version\": 0}, {\"node_address\": \"0x843e5974e96452eCEa36381343d18380d2437BDe\", \"channel_identifier\": \"9\", \"_type\": \"raiden.transfer.state.RouteState\", \"_version\": 0}, {\"node_address\": \"0x98258435D51049C0C27acEe527a629Cb21093B0e\", \"channel_identifier\": \"2\", \"_type\": \"raiden.transfer.state.RouteState\", \"_version\": 0}, {\"node_address\": \"0xabB5d243e42b4F79b568534794b135D95301be1B\", \"channel_identifier\": \"6\", \"_type\": \"raiden.transfer.state.RouteState\", \"_version\": 0}, {\"node_address\": \"0xd651e81c9bf3490fE435aBc3Aed58fc85665D58f\", \"channel_identifier\": \"18\", \"_type\": \"raiden.transfer.state.RouteState\", \"_version\": 0}, {\"node_address\": \"0xD9867d04F817Efa7293cfaA93d8A9127421A2903\", \"channel_identifier\": \"8\", \"_type\": \"raiden.transfer.state.RouteState\", \"_version\": 0}, {\"node_address\": \"0xedCE67680a512DFE34c08eBc0cDB836212a923DA\", \"channel_identifier\": \"16\", \"_type\": \"raiden.transfer.state.RouteState\", \"_version\": 0}, {\"node_address\": \"0xEe16b12115542264C745De5feCC7142295b8721d\", \"channel_identifier\": \"4\", \"_type\": \"raiden.transfer.state.RouteState\", \"_version\": 0}, {\"node_address\": \"0xF4C63F471C718C4859143FEE25B4085A930af989\", \"channel_identifier\": \"5\", \"_type\": \"raiden.transfer.state.RouteState\", \"_version\": 0}], \"_type\": \"raiden.transfer.mediated_transfer.state_change.ActionInitInitiator\", \"_version\": 0}", "event": "State change", "logger": "raiden.raiden_service", "level": "debug", "timestamp": "2019-07-12 13:53:18.333513"}
{"node": "7ed0299f", "raiden_events": ["{\"recipient\": \"0x283EAD0338f8C202b996146d77a542fDfB60d437\", \"channel_identifier\": \"3\", \"message_identifier\": \"17062359626986500878\", \"transfer\": {\"payment_identifier\": \"4444444\", \"token\": \"0x9F50cEA29307d7D91c5176Af42f3aB74f0190dD3\", \"balance_proof\": {\"nonce\": 8, \"transferred_amount\": \"18000000000000000000\", \"locked_amount\": \"18000000000000000000\", \"locksroot\": \"0x3a1b6f6f3c60e05e4bb031e254f57eae06b2235686c1016431b4beaea92f9c03\", \"canonical_identifier\": {\"chain_identifier\": \"4321\", \"token_network_address\": \"0x23a7036a025FEF30A38f1f3464FFbe2b02a5a627\", \"channel_identifier\": \"3\"}, \"balance_hash\": \"0x6df38896dafe374f1463689b99c9c6eb754a49a4c2fbdd6207d4d77b07343539\", \"_type\": \"raiden.transfer.state.BalanceProofUnsignedState\", \"_version\": 0}, \"lock\": {\"amount\": 3000000000000000000, \"expiration\": 2152104, \"secrethash\": \"0x8e321e8b1c727fb2052574733b9e15594fa97119e275a17fead9b6a425b3b4ff\", \"_type\": \"raiden.transfer.state.HashTimeLockState\", \"_version\": 0}, \"initiator\": \"0x7ed0299Fa1ADA71D10536B866231D447cDFa48b9\", \"target\": \"0x283EAD0338f8C202b996146d77a542fDfB60d437\", \"_type\": \"raiden.transfer.mediated_transfer.state.LockedTransferUnsignedState\", \"_version\": 0}, \"balance_proof\": {\"nonce\": 8, \"transferred_amount\": \"18000000000000000000\", \"locked_amount\": \"18000000000000000000\", \"locksroot\": \"0x3a1b6f6f3c60e05e4bb031e254f57eae06b2235686c1016431b4beaea92f9c03\", \"canonical_identifier\": {\"chain_identifier\": \"4321\", \"token_network_address\": \"0x23a7036a025FEF30A38f1f3464FFbe2b02a5a627\", \"channel_identifier\": \"3\"}, \"balance_hash\": \"0x6df38896dafe374f1463689b99c9c6eb754a49a4c2fbdd6207d4d77b07343539\", \"_type\": \"raiden.transfer.state.BalanceProofUnsignedState\", \"_version\": 0}, \"_type\": \"raiden.transfer.mediated_transfer.events.SendLockedTransfer\", \"_version\": 0}"], "event": "Raiden events", "logger": 
"raiden.raiden_service", "level": "debug", "timestamp": "2019-07-12 13:53:18.364702"}
{"current_user": "@0x7ed0299fa1ada71d10536b866231d447cdfa48b9:raidentransport.exchangeunion.com", "node": "7ed0299f", "receiver_address": "283ead03", "message": "<LockedTransfer [chainid:4321 msgid:17062359626986500878 paymentid:4444444 token_network:23a7036a channel_identifier:3 nonce:8 transferred_amount:18000000000000000000 locked_amount:18000000000000000000 locksroot:3a1b6f6f hash:30ea905a secrethash:8e321e8b expiration:2152104 amount:3000000000000000000]>", "queue_identifier": "<QueueIdentifier recipient:283ead03 channel_identifier:3>", "event": "Send async", "logger": "raiden.network.transport.matrix.transport", "level": "debug", "timestamp": "2019-07-12 13:53:18.373422"}

@offerm
Contributor Author

offerm commented Jul 12, 2019

from sender:

{
  "current_user": "@0x7ed0299fa1ada71d10536b866231d447cdfa48b9:raidentransport.exchangeunion.com",
  "node": "7ed0299f",
  "receiver_address": "283ead03",
  "message": "<LockedTransfer [chainid:4321 msgid:17062359626986500878 paymentid:4444444 token_network:23a7036a channel_identifier:3 nonce:8 transferred_amount:18000000000000000000 locked_amount:180000000
00000000000 locksroot:3a1b6f6f hash:30ea905a secrethash:8e321e8b expiration:2152104 amount:3000000000000000000]>",
  "queue_identifier": "<QueueIdentifier recipient:283ead03 channel_identifier:3>",
  "event": "Send async",
  "logger": "raiden.network.transport.matrix.transport",
  "level": "debug",
  "timestamp": "2019-07-12 13:53:18.373422"
}
{
  "current_user": "@0x7ed0299fa1ada71d10536b866231d447cdfa48b9:raidentransport.exchangeunion.com",
  "node": "7ed0299f",
  "receiver": "0x283ead0338f8c202b996146d77a542fdfb60d437",
  "event": "Retrying message",
  "logger": "raiden.network.transport.matrix.transport",
  "level": "debug",
  "greenlet_name": "RetryQueue node:7ed0299f recipient:283ead03",
  "timestamp": "2019-07-12 13:53:18.374787"
}
{
  "current_user": "@0x7ed0299fa1ada71d10536b866231d447cdfa48b9:raidentransport.exchangeunion.com",
  "node": "7ed0299f",
  "partner": "283ead03",
  "status": "<AddressReachability.UNREACHABLE: 2>",
  "event": "Partner not reachable. Skipping.",
  "logger": "raiden.network.transport.matrix.transport",
  "level": "debug",
  "greenlet_name": "RetryQueue node:7ed0299f recipient:283ead03",
  "timestamp": "2019-07-12 13:53:18.375298"
}

@offerm
Contributor Author

offerm commented Jul 12, 2019

Looks like this is where retry gave up (sender):

{
  "current_user": "@0x7ed0299fa1ada71d10536b866231d447cdfa48b9:raidentransport.exchangeunion.com",
  "address": "0x283ead0338f8c202b996146d77a542fdfb60d437",
  "prev_state": "<AddressReachability.REACHABLE: 1>",
  "state": "<AddressReachability.UNREACHABLE: 2>",
  "event": "Changing address presence state",
  "logger": "raiden.network.transport.matrix.utils",
  "level": "debug",
  "greenlet_name": "GMatrixClient._sync user_id:@0x7ed0299fa1ada71d10536b866231d447cdfa48b9:raidentransport.exchangeunion.com sync_token:s379062_650629_0_1_78_1_1_1560_1",
  "timestamp": "2019-07-12 13:51:39.844547"
}

@ulope
Collaborator

ulope commented Jul 12, 2019

@offerm I looked through the logs and think I know what happened.

(I'm calling the node of the first log Node A and the second one Node B; in the processed log files the addresses are replaced with <<AAA>> and <<BBB>>)

Processed log files, interesting records are highlighted with a green border:

Timeline from Node A's perspective:

  • (L 80) 13:51:04.45: Node B changes state to REACHABLE
  • (L 91) 13:51:04.76: Existing Room !pUEaqdaRQsYrwtRCpB:raidentransport.exchangeunion.com is discovered
  • (L 93) 13:51:04.77: Matrix transport is started
  • (L110) 13:51:09.48: Retry SecretRequest to B is actually sent
  • (L169) 13:51:39.84: Node B changes state to UNREACHABLE

Timeline from Node B's perspective:

  • (L 46) 13:50:34.14 Existing Room !pUEaqdaRQsYrwtRCpB:raidentransport.exchangeunion.com is discovered
  • (L 47) 13:50:34.15 Matrix transport is started
  • (L 69) 13:51:03.41 Node A changes state to REACHABLE

Particularly interesting are Node A lines 110 and 169. There are almost exactly 30s between those two events.
30s also happens to be the Matrix long polling timeout and the interval after which the server will set a client to offline if it has seen no activity from a node.

Combined with the fact that there are no Matrix-related log entries from Node B after 13:51:03.41, this suggests to me that the sync greenlet in Node B crashed for some reason. This should have been logged, but we did have some bugs with that in the past (and you're still on a relatively old version).

Is there any output on stdout/stderr for Node B?

@offerm
Contributor Author

offerm commented Jul 15, 2019

@ulope I spent some time on this problem over the weekend. I haven't seen any indication of a crash. However, I believe I found the reason for it.

Looks like node B is blocked (forever) at raiden/network/transport/matrix/transport.py:184 on

self.transport._global_send_queue.join()

There is a race with self._prioritize_global_messages = True

This flag forces global messages to be sent first (one time, at startup). After the queued global messages are sent, the flag is set to False.
While the flag is True, self.transport._global_send_queue.join() is called, and if there are tasks in the _global_send_queue it blocks until each of them has been marked task_done().

The scenario is that there are a few global messages (PFS) in the queue when join() is called. When these messages are being sent there is only one call to task_done() (line 570).

The problem is in these lines:

            while self._global_send_queue.qsize() > 0:
                room_name, message = self._global_send_queue.get()
                messages[room_name].append(message)
            for room_name, messages_for_room in messages.items():
                message_text = "\n".join(
                    json.dumps(message.to_dict()) for message in messages_for_room
                )
                _send_global(room_name, message_text)
                self._global_send_queue.task_done()

While there are several messages in _global_send_queue (all for the same room), only one call to task_done() is made, so the join() is never released. If there is only one message in the queue, there is no problem.

I changed the flag to False and the problem was gone.
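The imbalance can be sketched with the standard library's queue.Queue, which has the same join()/task_done() semantics as the gevent JoinableQueue used by the transport (the room name and message values below are made up for illustration, and drain() only mirrors the shape of the loop quoted above, not the real transport code):

```python
# Minimal sketch of the join()/task_done() imbalance described above.
import queue
import threading
from collections import defaultdict


def drain(q, task_done_per_message):
    """Drain the queue, batching messages per room (as the transport loop does)."""
    messages = defaultdict(list)
    while q.qsize() > 0:
        room_name, message = q.get()
        messages[room_name].append(message)
    for room_name, msgs in messages.items():
        # the real code would _send_global(room_name, ...) here
        if task_done_per_message:
            for _ in msgs:        # fixed: acknowledge every dequeued item
                q.task_done()
        else:
            q.task_done()         # buggy: one acknowledgement per room


def join_completes(q, timeout=0.5):
    """True if q.join() returns within `timeout` seconds."""
    t = threading.Thread(target=q.join, daemon=True)
    t.start()
    t.join(timeout)
    return not t.is_alive()


def make_queue(n=3):
    q = queue.Queue()
    for i in range(n):
        q.put(("discovery_room", f"msg-{i}"))  # several messages, one room
    return q


q = make_queue()
drain(q, task_done_per_message=False)
assert not join_completes(q)   # join() blocks: 2 acknowledgements missing

q = make_queue()
drain(q, task_done_per_message=True)
assert join_completes(q)       # join() returns: every item acknowledged
```

With the fix, task_done() is called once per dequeued message, so join() is released exactly when every queued item has been acknowledged.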

@rakanalh rakanalh added State / Investigating For issues that are currently being looked into before labeling further Component / Transport Transport related issues Severity / Medium labels Jul 15, 2019
@rakanalh rakanalh added this to Backlog in Raiden Client via automation Jul 15, 2019
@rakanalh rakanalh added this to the Next Release milestone Jul 15, 2019
@rakanalh rakanalh moved this from Backlog to Planned in Raiden Client Jul 15, 2019
ulope added a commit to ulope/raiden that referenced this issue Jul 15, 2019
If multiple global messages were to be sent to a single recipient during
startup the `_global_send_queue.join()` call would block forever since
the number of calls to `task_done()` was unbalanced with the number of
items in the queue.

Thanks @offerm for the bug report and investigation.
Fixes: raiden-network#4383
@ulope
Collaborator

ulope commented Jul 15, 2019

@offerm Very nice catch!

#4387 should fix it.

@offerm
Contributor Author

offerm commented Jul 15, 2019

I'm sure you have a bug-of-the-month reward :-)

Happy to help.

@rakanalh rakanalh moved this from Planned to In progress in Raiden Client Jul 15, 2019
Raiden Client automation moved this from In progress to Done Jul 15, 2019
ulope added a commit that referenced this issue Jul 15, 2019
If multiple global messages were to be sent to a single recipient during
startup the `_global_send_queue.join()` call would block forever since
the number of calls to `task_done()` was unbalanced with the number of
items in the queue.

Thanks @offerm for the bug report and investigation.
Fixes: #4383
4 participants