Raiden 0.100.5a1.dev161+gf3180af42 not working with geth 1.9.8/1.9.9 on mainnet #5557

Open
kilrau opened this issue Jan 3, 2020 · 16 comments

@kilrau kilrau commented Jan 3, 2020

We tried this on several different setups, some of them with freshly synced geth/raiden instances, to rule out data corruption as the cause.

We found that raiden is not listening on port 5001.

kilrau@K-Yoga:~/.xud-docker/mainnet$ docker exec -it mainnet_raiden_1 bash
bash-5.0# netstat -ant | grep LISTEN
tcp        0      0 127.0.0.11:42425        0.0.0.0:*               LISTEN 
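
For anyone reproducing this without netstat, the same check can be done by attempting a TCP connection to the API port. This is a minimal sketch and not part of Raiden: `is_port_open` is a hypothetical helper, and the host `localhost` assumes the check runs inside the raiden container. Port 5001 is the API port from this report.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: run from inside the raiden container, where the API
    # would be bound to port 5001 once raiden has started successfully.
    print("raiden API reachable:", is_port_open("localhost", 5001))
```

Running it in a loop during the restart cycle should show whether the API ever comes up between crashes.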

Geth is definitely fully synced with all state imported. Raiden appears to run fine for a few seconds, then crashes and gets restarted by supervisord, in an infinite loop (probably the reason why raiden never starts listening on 5001).

Raiden is running in production mode
Checking if the ethereum node is synchronized

You are connected to the 'mainnet' network and the DB path is: /root/.raiden/node_202b9ab1/netid_1/network_d32f5e0f/v23_log.db
Default fee settings are used. If you want use Raiden with mediation fees - flat, proportional and imbalance fees - see https://raiden-network.readthedocs.io/en/latest/overview_and_guide.html#firing-it-up
2019-12-19 14:49:04,542 INFO exited: raiden (exit status 1; not expected)
2019-12-19 14:49:05,546 INFO spawned: 'raiden' with pid 392
2019-12-19 14:49:06,548 INFO success: raiden entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Welcome to Raiden, version 0.100.5a1.dev161+gf3180af42!

Note: while geth was still syncing, the empty line below "Checking if the ethereum node is synchronized" showed the sync progress / target. Now that geth is fully synced, it is just empty, as above. We are using our own mainnet contracts.

Collaborator

@LefterisJP LefterisJP commented Jan 3, 2020

Hey @kilrau thanks for the issue and happy new year!

We already have an issue for this here: #5386

Author

@kilrau kilrau commented Jan 3, 2020

https://github.com/raiden-network/raiden/pull/5382/files (the fix for #5386) just bumps the geth version number. We are trying this on our branch now too. Will keep you posted.

Author

@kilrau kilrau commented Jan 3, 2020

Update: did the version bump to 1.9.9. Now raiden is complaining about an unreachable RPC interface:

The underlying ethereum node does not have the web3 rpc interface enabled. Please run it with --rpcapi eth,net,web3,txpool for geth and --jsonrpc-apis=eth,net,web3,parity for parity.
2020-01-03 12:23:40,987 INFO exited: raiden (exit status 1; not expected)

whereas geth is reachable (output below from raiden container)

bash-5.0# curl -X POST -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' http://geth:8545/
{"jsonrpc":"2.0","id":1,"result":false}

and the required rpc options are set too (output below from geth container)

bash-5.0# ps | cat
PID   USER     TIME  COMMAND
    1 root     20:46 geth --syncmode fast --rpc --rpcaddr 0.0.0.0 --rpcapi eth,net,web3,txpool,personal,admin --rpcvhosts=* --cache=1024 --nousb --datadir.ancient=/root/.ethereum/chaindata
  160 root      0:01 geth attach
  583 root      0:00 bash
  589 root      0:01 geth attach
  615 root      0:00 ps
  616 root      0:00 cat

Author

@kilrau kilrau commented Jan 3, 2020

From the raiden container (output below), geth is reachable and reports all required RPC modules:

bash-5.0# curl -X POST -H 'Content-Type: application/json' --data '{"jsonrpc":"2.0","method":"rpc_modules","params":[],"id":1}' http://geth:8545/
{"jsonrpc":"2.0","id":1,"result":{"admin":"1.0","eth":"1.0","net":"1.0","personal":"1.0","rpc":"1.0","txpool":"1.0","web3":"1.0"}}

This doesn't make any sense to me.
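
The manual curl check above can also be scripted. The sketch below is only an illustration: the required set (eth, net, web3, txpool) is taken from the Raiden error message quoted earlier, while `rpc_modules_request`, `missing_modules` and `fetch_rpc_modules` are hypothetical helper names, not Raiden or geth APIs.

```python
import json
import urllib.request

# Modules named in the Raiden error message for geth.
REQUIRED_MODULES = {"eth", "net", "web3", "txpool"}


def rpc_modules_request(request_id: int = 1) -> bytes:
    """Build the JSON-RPC body for the rpc_modules call used in the curl command."""
    payload = {"jsonrpc": "2.0", "method": "rpc_modules", "params": [], "id": request_id}
    return json.dumps(payload).encode("utf-8")


def missing_modules(response: dict, required=REQUIRED_MODULES) -> set:
    """Return the required modules that the node's response does not list."""
    enabled = set(response.get("result", {}))
    return set(required) - enabled


def fetch_rpc_modules(endpoint: str, timeout: float = 10.0) -> dict:
    """POST rpc_modules to the endpoint (e.g. http://geth:8545/) and decode the reply."""
    request = urllib.request.Request(
        endpoint,
        data=rpc_modules_request(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.load(reply)
```

An empty set from `missing_modules(fetch_rpc_modules("http://geth:8545/"))` would match the output shown above, where all four modules are present.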

Author

@kilrau kilrau commented Jan 3, 2020

Tried 0.200.0rc2. Exactly the same behaviour.

Author

@kilrau kilrau commented Jan 6, 2020

Scratch the RPC error, that was an environment issue. Back to the behavior in the original post.

Collaborator

@LefterisJP LefterisJP commented Jan 6, 2020

@kilrau Can you provide the exact command that runs Raiden with all the arguments?

Also can you let us know if this crash happens when you use an account that already had data in its DB only? What happens if you try to use a completely new account? Does the crash also happen?

Author

@kilrau kilrau commented Jan 6, 2020

Can you provide the exact command that runs Raiden with all the arguments?

It's a docker setup that sets the variables:

  --rpc \
  --accept-disclaimer \
  --no-sync-check \
  --address $addr \
  --keystore-path $KEYSTORE_PATH \
  --resolver-endpoint $RESOLVER_ENDPOINT \
  --eth-rpc-endpoint $ETH_RPC_ENDPOINT \
  --network-id $NETWORK_ID \
  --password-file $PASSWORD_FILE \
  --datadir $DATA_DIR \
  --api-address $API_ADDRESS \
  --environment-type $ENVIRONMENT_TYPE \
  --tokennetwork-registry-contract-address $TOKENNETWORK_REGISTRY_CONTRACT \
  --secret-registry-contract-address $SECRET_REGISTRY_CONTRACT \
  --service-registry-contract-address $SERVICE_REGISTRY_CONTRACT \
  --one-to-n-contract-address $ONE_TO_N_CONTRACT \
  --monitoring-service-contract-address $MONITORING_SERVICE_CONTRACT \
  --gas-price $GAS_PRICE \
  --matrix-server $MATRIX_SERVER \
  --routing-mode $ROUTING_MODE

Also can you let us know if this crash happens when you use an account that already had data in its DB only? What happens if you try to use a completely new account? Does the crash also happen?

Also happens with a "fresh" raiden.

One important update though: we just managed to get our raiden 0.100.5a1.dev161+gf3180af42 working with a freshly synced geth 1.9.9. On an environment with a 3-month old geth it's crashing as described above. That would imply that the recent hardfork caused issues. We'll verify this by resyncing geth in the environment where it's currently crashing and let you know. Any other ideas in the meantime are welcome.

Collaborator

@LefterisJP LefterisJP commented Jan 6, 2020

One important update though: we just managed to get our raiden 0.100.5a1.dev161+gf3180af42 working with a freshly synced geth 1.9.9. On an environment with a 3-month old geth it's crashing as described above. That would imply that the recent hardfork caused issues. We'll verify this by resyncing geth in the environment where it's currently crashing and let you know.

That's interesting. Please let us know.

So from my side the update is that since our mainnet geth node is currently resyncing from scratch I tried it on Goerli with geth v1.9.9.

I could not reproduce any crash with a fresh raiden node and latest develop.

Which raiden version is this? 0.100.5a1.dev161+gf3180af42?
It's not in our nightlies: https://raiden-nightlies.ams3.digitaloceanspaces.com/index.html?prefix=NIGHTLY/

Judging by the commit hash in the name I would say it's this commit: f3180af

Which is not anywhere in our develop, but it's something you guys did on your own? Funny how Github shows it though ...
And it's from Oct 2 2019?

I tried to run with a raiden node on a commit close to that and still could not reproduce a crash. Is there no debug log of Raiden you can provide?

@hackaugusto hackaugusto self-assigned this Jan 7, 2020

Collaborator

@hackaugusto hackaugusto commented Jan 7, 2020

Scratch the RPC error, that was an environment issue. Back to the behavior in the original post.

@kilrau What could we do to make understanding this error a bit easier?


@erkarl erkarl commented Jan 7, 2020

Judging by the commit hash in the name I would say it's this commit: f3180af

That's correct. That has been our "stable" version that works with our deployed contracts. We also tried increasing the HIGHEST_SUPPORTED_GETH_VERSION in ExchangeUnion@87ebbf8

I've got two freshly synced geth 1.9.9 mainnet nodes and am able to reproduce the problem on one of them.

Stdout:

Welcome to Raiden, version 0.100.5a1.dev162+g87ebbf856!
----------------------------------------------------------------------
| This is an Alpha version of experimental open source software      |
| released as a test version under an MIT license and may contain    |
| errors and/or bugs. No guarantee or representations whatsoever is  |
| made regarding its suitability (or its use) for any purpose or     |
| regarding its compliance with any applicable laws and regulations. |
| Use of the software is at your own risk and discretion and by      |
| using the software you acknowledge that you have read this         |
| disclaimer, understand its contents, assume all risk related       |
| thereto and hereby release, waive, discharge and covenant not to   |
| sue Brainbot Labs Establishment or any officers, employees or      |
| affiliates from and for any direct or indirect liability resulting |
| from the use of the software as permissible by applicable laws and |
| regulations.                                                       |
|                                                                    |
| Privacy Warning: Please be aware, that by using the Raiden Client, |
| among others, your Ethereum address, channels, channel deposits,   |
| settlements and the Ethereum address of your channel counterparty  |
| will be stored on the Ethereum chain, i.e. on servers of Ethereum  |
| node operators and ergo are to a certain extent publicly available.|
| The same might also be stored on systems of parties running Raiden |
| nodes connected to the same token network. Data present in the     |
| Ethereum chain is very unlikely to be able to be changed, removed  |
| or deleted from the public arena.                                  |
|                                                                    |
| Also be aware, that data on individual Raiden token transfers will |
| be made available via the Matrix protocol to the recipient,        |
| intermediating nodes of a specific transfer as well as to the      |
| Matrix server operators.                                           |
----------------------------------------------------------------------
Raiden is running in production mode
Checking if the ethereum node is synchronized

You are connected to the 'mainnet' network and the DB path is: /root/.raiden/node_c76b574a/netid_1/network_d32f5e0f/v23_log.db
Default fee settings are used. If you want use Raiden with mediation fees - flat, proportional and imbalance fees - see https://raiden-network.readthedocs.io/en/latest/overview_and_guide.html#firing-it-up
2020-01-07 19:40:39,686 INFO exited: raiden (exit status 1; not expected)

Stderr and raiden-debug*.log:
raiden-debug_2020-01-07T19:30:32.077264.log
raiden-debug_2020-01-07T19:38:10.885991.log
raiden-debug_2020-01-07T19:39:27.230192.log
raiden-debug_2020-01-07T19:40:42.249007.log
raiden-debug_2020-01-07T19:41:57.031366.log
raiden-debug_2020-01-07T19:43:12.340486.log
raiden-debug_2020-01-07T19:44:27.159227.log
raiden-stderr---supervisor-HtFhik.log

Can you provide the exact command that runs Raiden with all the arguments?

bash-5.0# ps -f | more
PID   USER     TIME  COMMAND
    1 root      0:00 {supervisord} /usr/bin/python2 /usr/bin/supervisord -c /etc/supervisor/conf.d/supervisord.conf
   83 root      0:00 bash
  259 root      0:02 python -m raiden --rpc --accept-disclaimer --resolver-endpoint http://xud:8887/resolveraiden --eth-rpc-endpoint http://geth:8545 --password-file /root/.raiden/passphrase.txt --datadir /root/.raiden --api-address 0.0.0.0:5001 --matrix-server https://raidentransport.exchangeunion.com --address 0xRemovedIntentionally --keystore-path /root/.raiden/keystore --network-id mainnet --routing-mode private --environment-type production --tokennetwork-registry-contract-address 0xd32F5E0fF172d41a20b32B6DAb17948B257aa371 --secret-registry-contract-address 0x322681a720690F174a4071DBEdB51D86E7B9FF84 --service-registry-contract-address 0x281937D366C7bCE202481c45d613F67500b93E69 --user-deposit-contract-address 0x4F26957E8fd331D53DD60feE77533FBE7564F5Fe --monitoring-service-contract-address 0x37cC37D7703554--Mo

Geth's runtime arguments:

bash-5.0# ps -f | more
PID   USER     TIME  COMMAND
    1 root     37:36 geth --syncmode fast --rpc --rpcaddr 0.0.0.0 --rpcapi eth,net,web3,txpool,personal,admin --rpcvhosts=* --cache=1024 --nousb --datadir.ancient=/root/.ethereum/chaindata

Collaborator

@LefterisJP LefterisJP commented Jan 7, 2020

Hey @erkarl thank you for the response and the logs.

Looking at the log raiden-stderr---supervisor-HtFhik.log it seems that Raiden times out at the start when trying to query the events to sync with the blockchain. This used to happen also in the past but we addressed it by limiting the blocks that can be queried in a single query and since then it has not happened again.
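
The idea of limiting the blocks per query can be sketched roughly as follows. This is an illustration only and not Raiden's actual implementation: `block_ranges` and `max_span` are hypothetical names, and the span value in the demo is an arbitrary assumption.

```python
from typing import Iterator, Tuple


def block_ranges(from_block: int, to_block: int, max_span: int) -> Iterator[Tuple[int, int]]:
    """Split [from_block, to_block] into inclusive chunks of at most max_span blocks."""
    if max_span < 1:
        raise ValueError("max_span must be at least 1")
    start = from_block
    while start <= to_block:
        # The last chunk is clipped so it never runs past to_block.
        end = min(start + max_span - 1, to_block)
        yield start, end
        start = end + 1


if __name__ == "__main__":
    # Assumed values for illustration; a real client would pick the span
    # based on how long the node takes to answer each query.
    for first, last in block_ranges(0, 9, 4):
        print(first, last)
```

Each (start, end) pair would become the fromBlock/toBlock of one eth_getLogs request, so no single request has to scan the whole chain.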

Question: Do other versions of geth work fine for you? Did it specifically start with geth v1.9.9 version only?

Also please note that since the Raiden version you are using we have considerably changed the part of the code that is querying the blockchain for events and included many optimizations so the reason for the timeout (if it's on the raiden part) may be fixed in latest develop.

Someone from our team will have a look, but it may take some time since our mainnet geth is resyncing.


@erkarl erkarl commented Jan 8, 2020

Hey @LefterisJP thanks for the quick response and for looking into this.

This used to happen also in the past but we addressed it by limiting the blocks that can be queried in a single query and since then it has not happened again.

I also came across that issue, but since it was from Feb 2019 I didn't think much of it.

Question: Do other versions of geth work fine for you? Did it specifically start with geth v1.9.9 version only?

1.9.8 was the same.

Also please note that since the Raiden version you are using we have considerably changed the part of the code that is querying the blockchain for events and included many optimizations so the reason for the timeout (if it's on the raiden part) may be fixed in latest develop.

I also tested this with your latest develop commit c99ce63e7bf0d4ed08e5d21f914e2f49e14dfd42 - getting a similar output.

raiden.conf:

keystore-path = "/home/ar/raiden-mainnet/keystore"
datadir = "/home/ar/raiden-mainnet"
network-id = "mainnet"
environment-type = "production"
routing-mode = "private"
api-address = "0.0.0.0:5001"
rpc = true
eth-rpc-endpoint = "localhost:8545"
log-json = true
[log-config]
"" = "debug"

stdout.log
raiden-exception-2020-01-08T08-42xxe__uru.txt

Thanks again for looking into this. I understand the raiden version we're using is quite old. Ideally, we'd love to switch to one of your upcoming release versions. Until that happens I'd like to cherry-pick a fix for this issue to our "stable" version.

Author

@kilrau kilrau commented Jan 8, 2020

PS: all our Raiden setups which are crashing as described above work perfectly fine when connecting Raiden to Infura instead of a local geth.


@erkarl erkarl commented Jan 8, 2020

I'm also experiencing this in geth logs:

WARN [01-08|09:19:46.907] Served eth_getLogs                       conn=10.0.3.5:43284 reqid=114 t=10.045262131s err="context canceled"

Do you think it's related to: ethereum/go-ethereum#20426?

Looking at raiden's stdout, it seems to crash after making an eth_getLogs request to geth. What is interesting is that I have another instance of geth/raiden running with the same runtime arguments (and versions) and I'm not experiencing crashes there.
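
To compare the two instances, one option is to scan each geth log for served requests and pull out the method, duration and error. The sketch below is a rough helper under stated assumptions: the regular expression is modelled only on the single WARN line quoted above, it handles durations printed as plain seconds (such as `10.045262131s`) and nothing else, and `parse_served_line` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Modelled on the one log line in this thread; other geth log layouts or
# duration units (ms, minutes) are deliberately not handled.
_SERVED = re.compile(
    r'Served (?P<method>\S+)\s+.*?\bt=(?P<seconds>\d+(?:\.\d+)?)s\b.*?err="(?P<err>[^"]*)"'
)


def parse_served_line(line: str) -> Optional[Tuple[str, float, str]]:
    """Return (method, seconds, error) for a matching 'Served ...' line, else None."""
    match = _SERVED.search(line)
    if match is None:
        return None
    return match.group("method"), float(match.group("seconds")), match.group("err")


if __name__ == "__main__":
    sample = (
        'WARN [01-08|09:19:46.907] Served eth_getLogs                       '
        'conn=10.0.3.5:43284 reqid=114 t=10.045262131s err="context canceled"'
    )
    print(parse_served_line(sample))
```

If the crashing node shows eth_getLogs calls that are cancelled after roughly ten seconds while the healthy node does not, that would point at the query duration on that node rather than at the Raiden arguments.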

Author

@kilrau kilrau commented Jan 13, 2020

Any updates?
