Raiden doesn't replay all past state changes before starting the Raiden Service #4498
Comments
As promised, I will take a first look at this.
@eorituz I can't say too much from your logs. I would need:
What I can say from the logs you have posted is that:
Have you tried to deploy a new TokenNetwork in that network? With v0.100.5.dev0 new contracts were deployed.
Thanks for the review @LefterisJP. Answers:
I messed up the logs, so I created new ones:

Problem

The Raiden node itself starts perfectly fine. However, the Raiden service gets started before the replay of all known blocks is completed:
The problem now is that when I use an echo node (or any other software that starts interacting with Raiden as soon as the API/service is available), this leads to unexpected behavior.
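To make the ordering concrete, here is a toy sketch (the method names are made up, this is not Raiden's actual code) of what the logs show versus what I would expect:

```python
import gevent


def start_observed(node):
    node.start_api()                        # "Raiden Service started" is logged here
    gevent.spawn(node.replay_known_blocks)  # replay continues in the background


def start_expected(node):
    node.replay_known_blocks()  # finish replaying all known blocks first...
    node.start_api()            # ...only then let clients such as an echo node in
```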
Ah, @eorituz was faster. Note the highlighted records:

[screenshot of highlighted log records]
Hey @ulope @eorituz, thank you for your additional information. I found out what is wrong, but according to the code comments and the git commits there has been some kind of code reorganization around initial blockchain event polling while I was gone that I have no idea about, so I am pinging the committer, @hackaugusto. Let me describe what happens first.
raiden/raiden/raiden_service.py Lines 375 to 378 in 96af590
raiden/raiden/raiden_service.py Lines 1003 to 1024 in 96af590
raiden/raiden/raiden_service.py Lines 752 to 763 in 96af590
raiden/raiden/raiden_service.py Lines 576 to 623 in 96af590
So @hackaugusto, two questions here:
From the commit:
What does a single transaction here mean?
Lines 206 to 221 in 40fd0b9
If the problem is that the greenlets are not being waited for, then it was introduced by this PR: #2985.
Some more thought has to go into this. We have to poll for all the filters on the first run, including the filters installed because of …
@LefterisJP On a second look, the greenlets are not related to the problem. The blockchain event callbacks are called here: raiden/raiden/raiden_service.py Lines 595 to 596 in 5eeb82e
and this will be called synchronously: raiden/raiden/blockchain_events_handler.py Lines 22 to 38 in 5eeb82e
IOW, the filter is not installed by the Raiden event handler, but by the Raiden service itself (previously the blockchain event handler). Did I miss something from the explanation?
@hackaugusto I missed that line, thanks for pointing it out. But still, looking into the code, the filter is installed but not queried, which I guess is where the problem lies.
Test recreating the problem in raiden-network#4498
Some notes about the event polling: Think of the smart contracts as a collection of trees, where a node is a smart contract and each deployed smart contract has the contract that deployed it as a parent. For the current implementation these trees have depth 2, where the token network registry is the root and the token network is the immediate child.

Because of the above, we can get away with polling the events just twice. The first run polls for the events of the known root smart contracts (the token network registries that the client is configured to use); this installs the filters for the children, and the second run is for the token networks. The above strategy can be described with this pseudocode:

```python
def first_run(self):
    for _ in range(CONTRACTS_DEPTH):
        self.poll()

def poll(self):
    for stateless_filter in self.filters:
        for event in stateless_filter.poll_all_events_until_latest():
            process(event)
```

The problem with the above code is that a filter installed while processing events is only queried on a later run, so events emitted by the new contract in the block range that was just processed are missed. A fix is to go block by block:

```python
def poll(self):
    for curr_block in range(self.latest_processed_block, self.latest_confirmed_block):
        for stateless_filter in self.filters:
            for event in stateless_filter.get_events_until(curr_block):
                process(event)
```

The above would process one block at a time, so as soon as an event for a new subcontract is seen, its filter can be installed and queried starting from the current block. This, however, still has one corner case: when a contract is deployed and a transaction is optimistically sent to it, generating two events at the same block. To fix this, a stack is probably the way to go:

```python
def poll(self):
    for curr_block in range(self.latest_processed_block, self.latest_confirmed_block):
        pending_filters = list(self.filters)
        while pending_filters:
            stateless_filter = pending_filters.pop()
            for event in stateless_filter.get_events_until(curr_block):
                process(event, pending_filters)  # if necessary, add the new filter to the stack
```
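For reference, a small self-contained simulation of the stack-based variant; the `FakeFilter` class and the event names are invented for illustration, only the traversal logic follows the pseudocode above. It demonstrates that a filter installed while processing block N is still queried for block N itself:

```python
class FakeFilter:
    """Stateless filter over a fixed list of (block, event_name) tuples."""

    def __init__(self, events):
        self._events = sorted(events)
        self._cursor = 0

    def get_events_until(self, block):
        taken = [e for e in self._events[self._cursor:] if e[0] <= block]
        self._cursor += len(taken)
        return taken


def poll_with_stack(filters, latest_confirmed_block, child_filter_for):
    """Process one block at a time; newly installed filters are pushed back
    onto the stack so their events for the *same* block are also seen."""
    processed = []
    for curr_block in range(latest_confirmed_block + 1):
        pending_filters = list(filters)
        while pending_filters:
            stateless_filter = pending_filters.pop()
            for block, name in stateless_filter.get_events_until(curr_block):
                processed.append((block, name))
                new_filter = child_filter_for(name)
                if new_filter is not None:
                    filters.append(new_filter)          # poll it on later blocks too
                    pending_filters.append(new_filter)  # and re-check the current block
    return processed


# Block 5 both registers a token network and opens a channel in it, i.e. the
# corner case of two events involving the new contract at the same block.
token_network = FakeFilter([(5, "ChannelOpened")])
registry = FakeFilter([(5, "TokenNetworkCreated")])

events = poll_with_stack(
    filters=[registry],
    latest_confirmed_block=6,
    child_filter_for=lambda name: token_network if name == "TokenNetworkCreated" else None,
)
assert events == [(5, "TokenNetworkCreated"), (5, "ChannelOpened")]
```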
The first run can take a while, depending on the number of token networks registered and on when the original registry smart contract was deployed. This does not improve how long it takes to synchronize with these old and full registries, but it does make sure that less memory is used while doing so, and that the work is not lost if the node is restarted. This was achieved by using the same approach as the StatelessFilter: there the queries are done in batches to avoid timeouts while doing the request, and in the first run this can be used to limit the number of state changes that are held in memory at once. To make sure the batches of the RaidenService and the underlying filters are aligned, the same constant is used for the batch size.

This does fix one bug: the fix for issue raiden-network#4498 addressed the problem of not fetching the events from a newly registered token network in the same batch, but only for initialization and without taking recoverability into account. This commit fixes that bug in a general way, by handling the corner case in the `synchronize_to_confirmed_block_in_batches` method. Once that method returns, it is known that all events for all the smart contracts of interest have been handled, and in the event of a crash it is safe to use the block number from the state machine.

Note that this does not solve the problem in full generality, only for newly registered token networks. The general case is harder, since there may potentially be many layers of smart contracts, forming a tree that would have to be followed recursively.
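A sketch of what the batched synchronization described above could look like; `synchronize_to_confirmed_block_in_batches` is the method named in the commit message, but the batch size and every helper on `node` are placeholders, not Raiden's actual API:

```python
BLOCK_BATCH_SIZE = 100_000  # placeholder; the commit only says the filter's constant is reused


def synchronize_to_confirmed_block_in_batches(node, confirmed_block):
    from_block = node.latest_synced_block + 1
    while from_block <= confirmed_block:
        to_block = min(from_block + BLOCK_BATCH_SIZE - 1, confirmed_block)

        # Only this batch's events are held in memory at any one time.
        events = node.fetch_events_for_range(from_block, to_block)
        node.dispatch_state_changes(events)

        # Persisting per batch means a crash loses at most one batch of work,
        # and a restart can safely resume from the state machine's block number.
        node.save_synced_block(to_block)
        from_block = to_block + 1
```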
This introduces batch event polling. The goal is to collapse all requests into one, so that instead of one request per filter, only a single batch request is made for all filters. This is particularly important for our test environments, since a token network registry can end up with hundreds of registered tokens, where previously that meant an equal number of JSON-RPC requests per block. Additionally, this fixes bug raiden-network#4498 for the runtime of the node too, and not just for initialization. This is achieved by only returning a batch of events after all filters have been installed and fetched.
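To illustrate the collapsing of requests: with web3.py, the `address` parameter of `eth.get_logs` accepts a list, so one JSON-RPC request can cover every tracked contract. This is a sketch of the idea only, not Raiden's code; the endpoint, block range, and addresses are placeholders:

```python
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))  # assumed eth node endpoint

# One filter per contract previously meant one eth_getLogs request each;
# passing a list of addresses collapses them into a single request.
tracked_addresses = [
    "0x0000000000000000000000000000000000000001",  # e.g. the token network registry
    "0x0000000000000000000000000000000000000002",  # e.g. a token network
]

logs = w3.eth.get_logs({
    "fromBlock": 4_800_000,
    "toBlock": 4_800_100,
    "address": tracked_addresses,
})

# The batch of events is only handed to the state machine after the filters
# for every contract discovered inside it have been installed and fetched.
```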
Problem Definition
The Raiden client doesn't replay all state changes before starting the Raiden Service.
(In my case that led to a crash of the echo node.)
raiden.log
2019-07-31 15:44:39.172715 [debug ] Raiden Service started [raiden.raiden_service] node=0xdD84b4E3B4Fb3b7AD427Fa0A0FF2009ca4B363da
This happens well before the state changes from the past are replayed:
2019-07-31 15:44:57.085114 [debug ] State changes [raiden.raiden_service] greenlet_name=AlarmTask._run node:0xdD84b4E3B4Fb3b7AD427Fa0A0FF2009ca4B363da node=0xdD84b4E3B4Fb3b7AD427Fa0A0FF2009ca4B363da state_changes=['{"block_number": "4830525", "gas_limit": "7073246", ...
Expectation
I'd expect the Raiden client to replay all state changes from the past before starting the Raiden service.
Reproduce
Start Raiden with a new datadir on (presumably) any network (tested with Görli and Rinkeby).
System Description
Raiden version
v0.100.5.dev0
Used bbot internal eth nodes.