JDC panics after receiving RequestTransactionDataError message #709
JDC does not handle RequestTransactionDataError; we should decide what to do when it happens. This is not straightforward, because a TP that is unable to provide transaction data for txs that it itself included in the block should likely be considered unreliable, and the connection should be closed. But maybe we'd prefer to try again? Some discussion is needed here.
Removed the bug label, as this is more of a new feature than a bug.
In the specific case mentioned here, JDC received the RequestTransactionDataError for template 1946. That was the first template created after the new block was found. Right after that, a new
Templates are forgotten after a new block is found, so that's probably what happened. It's on my list to address that.
Hi @Sjors, I can confirm that every time we get this error, it's right after a new block is found and a SetNewPrevHash is received.
@GitGab19 the latest https://github.com/Sjors/bitcoin/tree/2024/02/sv2-poll-ellswift will keep the old template around for 10 seconds after a new block arrives. I do still have to implement code that holds on to mempool transactions.
From the dev call:
Have you already tested this branch?
Do we want to go down this path? What do you think about it?
Drop the template and use the new one. I don't see any other possible action.
If we want to manage it in this way, I think there's no need to add any new error codes to the specs at all.
Yes, we need the new error code to know that we are in this case. Other cases still need to drop the connection with upstream.
For example?
We should have at least one for "id not found" and one for "expired id".
Otherwise, how could we know whether to drop the connection or just drop the template and wait for a new one?
I may have misunderstood your opinion because of your suggestion to drop templates in previous comments. So, let me recap what we could have:
How does it sound to you?
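A hedged sketch of the two-error-code scheme discussed above, and how a JDC might react to each code. All names here (`TxDataErrorCode`, `JdcAction`, `handle_tx_data_error`) are illustrative assumptions, not types from the actual spec or the SRI codebase:

```rust
// Hypothetical sketch of the two RequestTransactionDataError cases
// discussed in this thread. Names are assumptions, not the SRI API.
#[derive(Debug, PartialEq)]
enum TxDataErrorCode {
    TemplateIdNotFound, // id was never known to the TP: something is badly wrong
    StaleTemplateId,    // id expired after a new block: recoverable
}

#[derive(Debug, PartialEq)]
enum JdcAction {
    DropTemplateAndWait, // discard the template, keep the upstream connection
    CloseConnection,     // treat the upstream as unreliable and disconnect
}

fn handle_tx_data_error(code: TxDataErrorCode) -> JdcAction {
    match code {
        // A template that simply expired after a SetNewPrevHash is the
        // benign case: drop it and wait for the next template.
        TxDataErrorCode::StaleTemplateId => JdcAction::DropTemplateAndWait,
        // An id the TP never knew about indicates a serious bug somewhere,
        // so closing the connection is the safer reaction.
        TxDataErrorCode::TemplateIdNotFound => JdcAction::CloseConnection,
    }
}
```

With a single generic error code, the `match` above would collapse to one arm and the JDC could no longer distinguish the benign case from the serious one, which is the point being argued here.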
Seems like a good solution.
@Sjors what do you think about it?
I could add I would prefer only supporting
@Sjors I understood that you cache templates; if so, it would make more sense to have the 2 different errors. And drop templates, but if something very bad is happening (either on the TP or the JDC side), we close the connection.
Of course, if JDC is trying to retrieve templates from 1 hour ago, that counts as something very bad.
If we have one error, we implicitly assume that the only case of an old or invalid template id is the one we are seeing now.
Oh wait, I guess I can return
Yep, now I remember: during the call this was the idea.
Alright, I'll work on that next week.
I like the approach, but in this way JDC behaviour (in response to the error code received) still depends on how long the timer for keeping old templates is set.
Trying to get txs for a very old template means that there is a big issue; I think it is fine to drop the connection.
We can have the max time as a config parameter. Btw, I would say anything older than 1 minute.
I agree with you on this.
Yep, for still-valid templates we should just answer the request with the tx data for that template. The miner will start mining on the template as soon as it is received, without waiting for the tx data; that means there will be valid shares that can be sent to the pool for that template, and if we return an error, the job declaration will never complete and the miner will lose some shares.
We could specify a minimum grace period in the spec. I think 10 seconds should be plenty; anything longer implies a serious bug? I'd rather not make this configurable, because holding on to old templates is potentially a memory-DoS vector: we can't release the full transactions from memory until every template referencing them is let go. So imagine a 1 MB inscription with 1000 RBF bumps in a few minutes... Every bump might have enough extra fees to justify a new template (especially once we have cluster mempool and can make templates faster than today). I'd rather have the "freedom" to drop old templates quickly when memory becomes a problem, or keep them longer if memory is fine. And I'd rather not explain all that in the docs for such a setting.
Yep, 10 sec is more than enough IMO. I said 1 min just to pick a gigantic value.
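The grace-period idea above could look something like this minimal sketch; it is an assumption for illustration, not the actual Template Provider code, and the type names (`TemplateStore`, `tx_data`, etc.) are invented here:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Keep templates alive for a short grace period after a new block, so a
// late RequestTransactionData for a recent template can still be answered
// instead of returning an error. 10 s matches the value discussed above.
const GRACE_PERIOD: Duration = Duration::from_secs(10);

struct TemplateStore {
    // template id -> (serialized template bytes, creation time)
    templates: HashMap<u64, (Vec<u8>, Instant)>,
}

impl TemplateStore {
    fn new() -> Self {
        Self { templates: HashMap::new() }
    }

    fn insert(&mut self, id: u64, data: Vec<u8>) {
        self.templates.insert(id, (data, Instant::now()));
    }

    // Called e.g. when a new block arrives: drop only templates older than
    // the grace period, bounding memory instead of keeping them forever.
    fn prune_expired(&mut self) {
        self.templates
            .retain(|_, (_, created)| created.elapsed() < GRACE_PERIOD);
    }

    fn tx_data(&self, id: u64) -> Option<&[u8]> {
        self.templates.get(&id).map(|(data, _)| data.as_slice())
    }
}
```

A fixed spec-level minimum, rather than a config knob, keeps the memory-DoS surface bounded, which is Sjors's argument above.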
My idea would be this:
The Template Provider does not handle shares.
Not sure how to handle this. Does the pool actually have to check the transactions for the share? I assumed it would check the templates immediately, not when it receives a share. As long as the pool does that within 10 seconds there should be no problem. And if it takes longer than that, there's something wrong with the pool?
Mmm, but (2) is not great for the scenario where a pool wants to check a template and shares are already being submitted. I see two options there:
(2) seems like a waste of bandwidth though, when you really want to prioritize the next block.
As soon as the JDC receives the template, (1) it sends the job downstream, and (2) it starts the procedure for job declaration. If the TP sends template A and then template B, and between A and B the miner has done some work but not yet declared it, we want to complete the RequestTransactionData while A is still valid (block_height(A) = block_height(B)). I do not think this is an issue for bandwidth use, as it is very rare. If A is a stale block, jobs for A IMO are not valid anymore, so it is fine to just not complete the job declaration. If instead jobs for A are still valid because A is not stale, e.g. when B has the same height as A but more fees, IMO we want to
Yeah I also see it in the same way as @Fi3.
All other cases are good:
What do you think about this?
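The same-height rule Fi3 describes (complete a pending declaration for an older template A only when block_height(A) = block_height(B)) could be sketched as follows; the `Template` type here is an illustrative assumption, not the SRI one:

```rust
// Sketch of the staleness rule from the discussion: a pending job
// declaration for an older template A is still worth completing only if
// A is not stale, i.e. it targets the same block height as the newest
// template B.
struct Template {
    height: u64,
}

fn should_complete_declaration(a: &Template, newest: &Template) -> bool {
    // Same height: the newer template merely has more fees, so shares
    // mined on A remain valid and the declaration should finish.
    // Lower height: A is stale, so abandoning the declaration is fine.
    a.height == newest.height
}
```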
It sounds like the Job Declarator Server should also have a way to tell the client that the block was rejected* because it was stale? Otherwise the client might automatically switch pools?
It seems to me there isn't (looking at the specs), but actually @Fi3 implemented the pool-switching mechanism, so he surely knows better how it works in SRI.
I added a commit It returns I haven't tested it though.
This is a very good point. The JDS does not know which prev hash the JDC wants to use¹, so it should be the pool that says whether a block is stale or not. One possible way to handle this case could be for the JDC to compare the last prev hash communicated by the pool with that of the current job and, when they differ, stop sending shares to the pool; the pool should have a threshold within which stale shares are accepted, to account for the lag of the system. We also need to define what the JDS should do if it receives txs that are in the last mined block, and what the JDC should do. There are various possibilities:
This issue for sure deserves further discussion. For that, I propose treating it as a separate issue, and for now proceeding with the proposed solution so that we can fix the panic, which is the most urgent thing. Also, I feel like whatever we decide does not belong in the spec, but rather in the implementation guidelines.
Footnotes
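The share-gating idea proposed above (JDC compares the pool's last announced prev hash with the one of the job a share was mined on) reduces to a simple check; a hedged sketch, with invented names:

```rust
// Hypothetical sketch: the JDC stops submitting shares once the pool's
// last communicated prev hash diverges from the current job's prev hash.
// On the pool side, a small time threshold would additionally accept
// slightly-stale shares to account for propagation lag.
type PrevHash = [u8; 32];

fn should_submit_share(pool_prev_hash: &PrevHash, job_prev_hash: &PrevHash) -> bool {
    pool_prev_hash == job_prev_hash
}
```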
JDC panics after receiving a RequestTransactionDataError message.
The line of interest is roles/jd-client/src/template_receiver/message_handler.rs:46:9.
For better context, here's a screenshot of the error; we got it right after the last testnet block was mined: