
JDS mempool management enhancement #772

Merged: 38 commits merged into stratum-mining:main on Mar 21, 2024

Conversation

GitGab19 (Collaborator) commented:
This PR is an enhancement of the mempool instance inside the Job Declarator Server.
As discussed during the last dev call (here are the dev notes), the JDS mempool is currently updated every X seconds (set in the config file) by retrieving all data for every tx contained in the node's mempool. While on testnet this is not a big issue (since there are few txs), on mainnet this method is not sustainable at all.
As pointed out during mainnet testing, this operation is too slow, so after a DeclareMiningJob is received from JDC, JDS answers with a ProvideMissingTransactions message (basically asking for all the txs proposed by JDC, because its mempool is not updated yet). The answer from JDC is huge, so a lot of time is taken to process all the data, leading to delays, which lead to crashes (this is exactly what issue #740 refers to).

My proposal in this PR is a big change in how JDS manages its local mempool (a rough sketch follows the list):

  1. every X seconds (set in the config) JDS updates its local mempool by inserting ONLY the current tx_ids into a hashtable
  2. as soon as a DeclareMiningJob is received, JDS retrieves the txs' data (using the RPC call getrawtransaction) from the Bitcoin node, filling them into the same hashtable
  3. if some txs contained in the DeclareMiningJob are not present in its local mempool (hashtable), JDS asks for them with a ProvideMissingTransactions message (as usual)
  4. JDC sends the txs' data through the answer (ProvideMissingTransactions.Success) and JDS adds that data to its local mempool
  5. txs which are mined in the meantime are automatically deleted from the JDS mempool, since the update is done by calling RPC getrawmempool, so there's no risk that the hashmap blows up
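
A minimal sketch of steps 2 and 3 above (illustrative only, not the actual SRI code): it assumes the bitcoincore_rpc crate and plain txids for simplicity, while the real DeclareMiningJob carries short tx ids and the JDS uses its own RPC client.

```rust
use std::collections::HashMap;

use bitcoin::{Transaction, Txid};
use bitcoincore_rpc::{Client, RpcApi};

/// For each txid declared by JDC: fill its data from the node if the id is known,
/// otherwise record its position so it can be requested via ProvideMissingTransactions.
fn handle_declared_txids(
    mempool: &mut HashMap<Txid, Option<Transaction>>,
    rpc: &Client,
    declared: &[Txid],
) -> Result<Vec<u16>, bitcoincore_rpc::Error> {
    let mut missing = Vec::new();
    for (i, id) in declared.iter().enumerate() {
        match mempool.get_mut(id) {
            Some(slot) => {
                if slot.is_none() {
                    // Step 2: lazily fetch full tx data with `getrawtransaction`.
                    *slot = Some(rpc.get_raw_transaction(id, None)?);
                }
            }
            // Step 3: not in the pseudo-mempool, so ask JDC for it.
            None => missing.push(i as u16),
        }
    }
    Ok(missing)
}
```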

Already tested on both testnet and mainnet, and it seems to work very well.
Fix #740

@GitGab19 GitGab19 changed the title Jds mempool management enhancement JDS mempool management enhancement Feb 23, 2024
@pavlenex pavlenex requested review from Sjors and Fi3 February 23, 2024 11:13
github-actions bot (Contributor) commented Feb 23, 2024

🐰Bencher

Report: Thu, March 21, 2024 at 20:56:43 UTC
Project: Stratum v2 (SRI)
Branch: jds-mempool-enhancement
Testbed: sv1

| Benchmark | Estimated Cycles (Δ%) | Estimated Cycles Upper Boundary (%) | Instructions (Δ%) | Instructions Upper Boundary (%) | L1 Accesses (Δ%) | L1 Accesses Upper Boundary (%) | L2 Accesses (Δ%) | L2 Accesses Upper Boundary (%) | RAM Accesses (Δ%) | RAM Accesses Upper Boundary (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| get_authorize | ✅ 8440.000 (-0.40%) | 8542.976 (98.79%) | ✅ 3746.000 (-0.22%) | 3780.434 (99.09%) | ✅ 5250.000 (-0.27%) | 5309.947 (98.87%) | ✅ 8.000 (+9.86%) | 9.639 (83.00%) | ✅ 90.000 (-0.74%) | 92.729 (97.06%) |
| get_submit | ✅ 95541.000 (-0.04%) | 95670.882 (99.86%) | ✅ 59439.000 (-0.04%) | 59543.638 (99.82%) | ✅ 85351.000 (-0.05%) | 85536.987 (99.78%) | ✅ 57.000 (+6.26%) | 58.120 (98.07%) | ✅ 283.000 (-0.08%) | 286.881 (98.65%) |
| get_subscribe | ✅ 8011.000 (-0.08%) | 8101.227 (98.89%) | ✅ 2841.000 (-0.11%) | 2853.213 (99.57%) | ✅ 3966.000 (-0.20%) | 3989.943 (99.40%) | ✅ 18.000 (+22.73%) | 18.151 (99.17%) | ✅ 113.000 (-0.38%) | 115.975 (97.43%) |
| serialize_authorize | ✅ 12159.000 (-0.74%) | 12407.083 (98.00%) | ✅ 5317.000 (-0.16%) | 5351.434 (99.36%) | ✅ 7414.000 (-0.17%) | 7468.034 (99.28%) | ✅ 11.000 (+3.13%) | 12.838 (85.68%) | ✅ 134.000 (-1.67%) | 139.916 (95.77%) |
| serialize_deserialize_authorize | ✅ 24476.000 (-0.19%) | 24607.424 (99.47%) | ✅ 9898.000 (-0.16%) | 9964.345 (99.33%) | ✅ 13956.000 (-0.20%) | 14073.547 (99.16%) | ✅ 39.000 (+5.26%) | 40.421 (96.49%) | ✅ 295.000 (-0.27%) | 297.732 (99.08%) |
| serialize_deserialize_handle_authorize | ✅ 30165.000 (-0.12%) | 30291.243 (99.58%) | ✅ 12101.000 (-0.07%) | 12135.434 (99.72%) | ✅ 17120.000 (-0.09%) | 17181.284 (99.64%) | ✅ 61.000 (+2.19%) | 64.439 (94.66%) | ✅ 364.000 (-0.22%) | 368.388 (98.81%) |
| serialize_deserialize_handle_submit | ✅ 126434.000 (-0.01%) | 126526.278 (99.93%) | ✅ 73224.000 (-0.03%) | 73326.841 (99.86%) | ✅ 104939.000 (-0.04%) | 105124.473 (99.82%) | ✅ 127.000 (+5.65%) | 128.836 (98.57%) | ✅ 596.000 (-0.03%) | 599.681 (99.39%) |
| serialize_deserialize_handle_subscribe | ✅ 27519.000 (+0.04%) | 27613.563 (99.66%) | ✅ 9643.000 (-0.03%) | 9655.213 (99.87%) | ✅ 13634.000 (-0.05%) | 13657.360 (99.83%) | ✅ 68.000 (+2.04%) | 71.905 (94.57%) | ✅ 387.000 (+0.08%) | 390.428 (99.12%) |
| serialize_deserialize_submit | ✅ 115023.000 (-0.05%) | 115265.888 (99.79%) | ✅ 68001.000 (-0.07%) | 68206.682 (99.70%) | ✅ 97553.000 (-0.09%) | 97911.247 (99.63%) | ✅ 71.000 (+5.37%) | 71.788 (98.90%) | ✅ 489.000 (+0.03%) | 491.808 (99.43%) |
| serialize_deserialize_subscribe | ✅ 22908.000 (-0.09%) | 22985.597 (99.66%) | ✅ 8195.000 (-0.06%) | 8215.047 (99.76%) | ✅ 11538.000 (-0.09%) | 11576.559 (99.67%) | ✅ 41.000 (+5.13%) | 43.466 (94.33%) | ✅ 319.000 (-0.18%) | 321.596 (99.19%) |
| serialize_submit | ✅ 99824.000 (-0.10%) | 100098.886 (99.73%) | ✅ 61483.000 (-0.04%) | 61587.638 (99.83%) | ✅ 88199.000 (-0.05%) | 88377.852 (99.80%) | ✅ 57.000 (+4.17%) | 57.645 (98.88%) | ✅ 324.000 (-0.57%) | 328.950 (98.50%) |
| serialize_subscribe | ✅ 11306.000 (-0.56%) | 11517.467 (98.16%) | ✅ 4188.000 (-0.07%) | 4200.213 (99.71%) | ✅ 5826.000 (-0.10%) | 5844.492 (99.68%) | ✅ 18.000 (+17.59%) | 18.397 (97.84%) | ✅ 154.000 (-1.31%) | 160.136 (96.17%) |


github-actions bot (Contributor) commented Feb 23, 2024

🐰Bencher

Report: Thu, March 21, 2024 at 20:56:42 UTC
Project: Stratum v2 (SRI)
Branch: 772/merge
Testbed: sv1

| Benchmark | Latency, ns (Δ%) | Latency Upper Boundary, ns (%) |
|---|---|---|
| client-submit-serialize | ✅ 6973.600 (+1.00%) | 7157.748 (97.43%) |
| client-submit-serialize-deserialize | ✅ 7829.800 (-0.37%) | 8154.478 (96.02%) |
| client-submit-serialize-deserialize-handle | ✅ 8546.300 (+1.76%) | 8670.552 (98.57%) |
| client-sv1-authorize-serialize-deserialize-handle | ✅ 904.220 (+0.50%) | 931.249 (97.10%) |
| client-sv1-authorize-serialize-deserialize | ✅ 693.130 (-0.84%) | 724.984 (95.61%) |
| client-sv1-authorize-serialize | ✅ 250.060 (+0.43%) | 258.124 (96.88%) |
| client-sv1-get-authorize | ✅ 155.900 (-1.07%) | 162.768 (95.78%) |
| client-sv1-get-submit | ✅ 6760.600 (+1.07%) | 6939.504 (97.42%) |
| client-sv1-get-subscribe | ✅ 284.770 (+1.51%) | 294.747 (96.61%) |
| client-sv1-subscribe-serialize-deserialize-handle | ✅ 761.090 (+1.37%) | 771.808 (98.61%) |
| client-sv1-subscribe-serialize-deserialize | ✅ 609.430 (-1.30%) | 648.095 (94.03%) |
| client-sv1-subscribe-serialize | ✅ 205.930 (-1.24%) | 224.739 (91.63%) |


github-actions bot (Contributor) commented Feb 23, 2024

🐰Bencher

Report: Thu, March 21, 2024 at 20:56:39 UTC
Project: Stratum v2 (SRI)
Branch: jds-mempool-enhancement
Testbed: sv2

| Benchmark | Latency, ns (Δ%) | Latency Upper Boundary, ns (%) |
|---|---|---|
| client_sv2_handle_message_common | ✅ 44.646 (+0.76%) | 45.427 (98.28%) |
| client_sv2_handle_message_mining | ✅ 75.016 (+4.11%) | 82.155 (91.31%) |
| client_sv2_mining_message_submit_standard | ✅ 14.632 (-0.05%) | 14.740 (99.27%) |
| client_sv2_mining_message_submit_standard_serialize | ✅ 262.380 (-3.05%) | 295.355 (88.84%) |
| client_sv2_mining_message_submit_standard_serialize_deserialize | ✅ 605.950 (+2.32%) | 636.405 (95.21%) |
| client_sv2_open_channel | ✅ 164.250 (-1.62%) | 185.529 (88.53%) |
| client_sv2_open_channel_serialize | ✅ 276.330 (-2.92%) | 296.253 (93.28%) |
| client_sv2_open_channel_serialize_deserialize | ✅ 370.760 (-1.82%) | 411.244 (90.16%) |
| client_sv2_setup_connection | ✅ 167.940 (+2.27%) | 177.593 (94.56%) |
| client_sv2_setup_connection_serialize | ✅ 467.950 (-1.03%) | 502.831 (93.06%) |
| client_sv2_setup_connection_serialize_deserialize | ✅ 946.050 (-4.84%) | 1169.751 (80.88%) |


github-actions bot (Contributor) commented Feb 23, 2024

🐰Bencher

Report: Thu, March 21, 2024 at 20:56:37 UTC
Project: Stratum v2 (SRI)
Branch: jds-mempool-enhancement
Testbed: sv2

| Benchmark | Estimated Cycles (Δ%) | Estimated Cycles Upper Boundary (%) | Instructions (Δ%) | Instructions Upper Boundary (%) | L1 Accesses (Δ%) | L1 Accesses Upper Boundary (%) | L2 Accesses (Δ%) | L2 Accesses Upper Boundary (%) | RAM Accesses (Δ%) | RAM Accesses Upper Boundary (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| client_sv2_handle_message_common | ✅ 2115.000 (+1.22%) | 2153.278 (98.22%) | ✅ 473.000 (+0.59%) | 482.130 (98.11%) | ✅ 730.000 (+0.12%) | 745.553 (97.91%) | ✅ 11.000 (+30.79%) | 12.573 (87.49%) | ✅ 38.000 (+0.88%) | 39.288 (96.72%) |
| client_sv2_handle_message_mining | ✅ 8257.000 (+0.97%) | 8338.428 (99.02%) | ✅ 2137.000 (+0.94%) | 2174.981 (98.25%) | ✅ 3157.000 (+0.94%) | 3221.345 (98.00%) | ✅ 40.000 (-0.64%) | 46.475 (86.07%) | ✅ 140.000 (+1.05%) | 142.006 (98.59%) |
| client_sv2_mining_message_submit_standard | ✅ 6330.000 (+0.01%) | 6493.552 (97.48%) | ✅ 1750.000 (-0.01%) | 1759.054 (99.49%) | ✅ 2550.000 (-0.02%) | 2565.561 (99.39%) | ✅ 21.000 (+1.87%) | 25.019 (83.94%) | ✅ 105.000 (-0.02%) | 109.756 (95.67%) |
| client_sv2_mining_message_submit_standard_serialize | ✅ 14977.000 (+1.65%) | 15083.176 (99.30%) | ✅ 4694.000 (-0.00%) | 4703.054 (99.81%) | ✅ 6747.000 (-0.10%) | 6770.446 (99.65%) | ✅ 50.000 (-1.12%) | 55.966 (89.34%) | ✅ 228.000 (+3.26%) | 230.888 (98.75%) |
| client_sv2_mining_message_submit_standard_serialize_deserialize | ✅ 27730.000 (+1.15%) | 27879.919 (99.46%) | ✅ 10545.000 (+0.05%) | 10553.389 (99.92%) | ✅ 15335.000 (-0.01%) | 15352.995 (99.88%) | ✅ 85.000 (-1.34%) | 89.578 (94.89%) | ✅ 342.000 (+2.78%) | 346.243 (98.77%) |
| client_sv2_open_channel | ✅ 4565.000 (+1.30%) | 4654.837 (98.07%) | ✅ 1461.000 (+0.02%) | 1470.956 (99.32%) | ✅ 2150.000 (-0.10%) | 2166.244 (99.25%) | ✅ 14.000 (+7.69%) | 16.792 (83.37%) | ✅ 67.000 (+2.43%) | 69.545 (96.34%) |
| client_sv2_open_channel_serialize | ✅ 14384.000 (+1.51%) | 14490.256 (99.27%) | ✅ 5064.000 (+0.01%) | 5073.956 (99.80%) | ✅ 7309.000 (-0.10%) | 7334.317 (99.65%) | ✅ 43.000 (+4.03%) | 45.596 (94.31%) | ✅ 196.000 (+3.21%) | 198.982 (98.50%) |
| client_sv2_open_channel_serialize_deserialize | ✅ 22844.000 (+1.07%) | 23024.668 (99.22%) | ✅ 7987.000 (+0.07%) | 7996.333 (99.88%) | ✅ 11609.000 (-0.02%) | 11629.720 (99.82%) | ✅ 77.000 (+2.32%) | 83.482 (92.24%) | ✅ 310.000 (+2.22%) | 314.762 (98.49%) |
| client_sv2_setup_connection | ✅ 4741.000 (+0.09%) | 4798.216 (98.81%) | ✅ 1502.000 (+0.02%) | 1511.956 (99.34%) | ✅ 2276.000 (+0.07%) | 2291.225 (99.34%) | ✅ 10.000 (-12.16%) | 14.491 (69.01%) | ✅ 69.000 (+0.41%) | 70.383 (98.04%) |
| client_sv2_setup_connection_serialize | ✅ 16376.000 (+0.86%) | 16474.962 (99.40%) | ✅ 5963.000 (+0.01%) | 5972.956 (99.83%) | ✅ 8651.000 (-0.04%) | 8671.591 (99.76%) | ✅ 47.000 (+0.49%) | 52.774 (89.06%) | ✅ 214.000 (+1.93%) | 216.409 (98.89%) |
| client_sv2_setup_connection_serialize_deserialize | ✅ 35684.000 (+0.37%) | 35778.708 (99.74%) | ✅ 14814.000 (+0.04%) | 14823.333 (99.94%) | ✅ 21744.000 (-0.02%) | 21762.706 (99.91%) | ✅ 107.000 (+7.91%) | 113.638 (94.16%) | ✅ 383.000 (+0.73%) | 385.560 (99.34%) |


Sjors (Collaborator) left a comment:

If the JDS calls getrawmempool and then deletes absent transactions from its local pseudo-mempool, it will delete transactions that have just been mined. But it will also delete transactions that have been (RBF) replaced.

If miners are still working on templates that point to the replaced transaction, then you can't construct the block. The clients themselves can still send SubmitSolution to the Template Provider, and it can still relay the block.

But it would be safer to hold on to transactions in templates that are still pending. Perhaps you could hold on to each template, and have the template hold a reference to the pseudo-mempool entries it needs. Then a background job that throws out old templates some time after a block is found lets go of the references, and the mempool pruning process just needs to make sure no references are held to an entry before deleting it.

I find it difficult to review the code, because I'm not a Rust dev and there are no comments explaining what it's trying to achieve.
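
Sjors's retention idea could look roughly like the sketch below. All types here are hypothetical (MempoolEntry, PendingTemplate, PseudoMempool are not names from this PR); the point is only that pruning skips entries still referenced by a pending template.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Arc;

use bitcoin::{Transaction, Txid};

struct MempoolEntry {
    tx: Transaction,
}

struct PendingTemplate {
    // Holding the Arc keeps the entry alive even if the tx is RBF-replaced or mined.
    // A background job that drops old PendingTemplates after a block is found
    // releases these references.
    txs: Vec<Arc<MempoolEntry>>,
}

struct PseudoMempool {
    entries: HashMap<Txid, Arc<MempoolEntry>>,
}

impl PseudoMempool {
    /// Prune entries that left the node's mempool, but only if no template still needs them.
    fn prune(&mut self, still_in_node_mempool: &HashSet<Txid>) {
        self.entries.retain(|txid, entry| {
            still_in_node_mempool.contains(txid) || Arc::strong_count(&*entry) > 1
        });
    }
}
```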

```rust
    tx: Transaction,
pub struct TransactionWithHash {
    pub id: Txid,
    pub tx: Option<Transaction>,
```
Collaborator:
What is the purpose of storing a txid without its corresponding full transaction?

GitGab19 (Collaborator, author):
When you run the JDS, it starts creating its local mempool (a hashmap) by loading just the txids (which are the keys of the hashmap).
As soon as DeclareMiningJob messages are received, JDS fills those entries by retrieving the tx data.
In the meantime, it keeps updating its local mempool, and the txids (since they are the hashmap keys) are used to carry the tx data already present in the current mempool over to the updated one (so a second fetch from the node is not needed).
Here are the specific lines I'm mentioning: https://github.com/GitGab19/stratum/blob/93901fb4c525ccbedfe0dce7714ba11001d78ad1/roles/jd-server/src/lib/mempool/mod.rs#L86
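
A hedged sketch of the refresh step just described, again using the bitcoincore_rpc crate and simplified types for illustration rather than the actual JDS code:

```rust
use std::collections::HashMap;

use bitcoin::{Transaction, Txid};
use bitcoincore_rpc::{Client, RpcApi};

/// Rebuild the hashmap from `getrawmempool`, carrying over tx data that was already
/// fetched (the txid keys make this a cheap lookup), so no second fetch is needed.
fn refresh_mempool(
    current: &mut HashMap<Txid, Option<Transaction>>,
    rpc: &Client,
) -> Result<(), bitcoincore_rpc::Error> {
    let ids = rpc.get_raw_mempool()?; // ids only, cheap even on mainnet
    let mut updated = HashMap::with_capacity(ids.len());
    for id in ids {
        // Keep data we already have; brand-new txids start as None.
        updated.insert(id, current.remove(&id).flatten());
    }
    // Txids that were mined or evicted simply do not reappear, so the map cannot blow up.
    *current = updated;
    Ok(())
}
```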

GitGab19 (Collaborator, author):

> If the JDS calls getrawmempool and then deletes absent transactions from its local pseudo-mempool, it will delete transactions that have just been mined. But it will also delete transactions that have been (RBF) replaced. […]

Good point, thanks for reviewing!
I double-checked, and every time a DeclareMiningJob is received from JDC, JDS stores a declared_mining_job which contains all the tx data needed to build the block before propagating it. So there shouldn't be any problem with the case you mentioned here. If a miner finds a solution for a template which contains an RBF-replaced tx, JDS is able to construct the block because it stores all the necessary data, even if its local mempool has just been refreshed and no longer contains that specific tx data.

Sjors (Collaborator) commented Feb 23, 2024

> JDS stores a declared_mining_job which contains all the tx data needed to build the block before propagating it

Ok, but then what's the point of the local mempool?

GitGab19 (Collaborator, author):

> Ok, but then what's the point of the local mempool?

A local mempool is needed in order to get tx data faster (taking it from the hashmap) instead of asking every JDC for it through a ProvideMissingTransactions message.

Sjors (Collaborator) commented Feb 23, 2024

I see, so the local mempool feeds the declared_mining_job data structure?

(maybe someone can draw a diagram of the data structures and information flow of the JDS)

GitGab19 (Collaborator, author):

> I see, so the local mempool feeds the declared_mining_job data structure?
>
> (maybe someone can draw a diagram of the data structures and information flow of the JDS)

Yeah, exactly.
Regarding the diagram, I can do it while tackling issue #750, adding it to the JDS README.

```rust
        None => None,
    }
});
let id = Txid::from_str(&id).unwrap();
```
Collaborator:

After rebasing, remember to manage the unhappy path!
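
For illustration, one possible shape of that unhappy path (hypothetical error handling, not the code that ended up in the PR):

```rust
use std::str::FromStr;

use bitcoin::Txid;

/// Propagate the parse error instead of unwrapping, so a malformed id cannot panic the JDS.
fn parse_txid(id: &str) -> Result<Txid, String> {
    Txid::from_str(id).map_err(|e| format!("invalid txid {id}: {e}"))
}
```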

```rust
            }
        }
    },
    None => missing_txs.push(i as u16),
```
Collaborator:

Why don't you try to get the tx data from the node before adding it to missing_txs?

GitGab19 (Collaborator, author):

Because this path is taken when the txid is not present in the current mempool, so it means the tx is not present at all.

lorbax (Collaborator) left a comment:

This is a very good PR!
I left some comments.

```rust
match short_id_mempool.get(&sid_) {
    Some(tx_data) => match &tx_data.tx {
        Some(tx) => {
            if i >= txs_in_job.len() {
```
Collaborator:

Here, why do you resize and then add the transaction? Isn't it enough to use the push() method?

GitGab19 (Collaborator, author):

I need to do it because we need to maintain, in txs_in_job, the same order defined in the DeclareMiningJob message. Using the push() method, if a tx is not present in the current JDS mempool (and so it's added to the missing_txs vec), another tx would end up at that specific index (causing a wrong tx order in declared_mining_job).

lorbax (Collaborator) commented Mar 6, 2024:

I don't think it works the way you are describing. In the main branch they should already be positioned correctly. If you have (0, tx0), (1, tx1), (2, tx2), (3, _), (4, tx4), (5, _) and (6, tx6), then it will be

```rust
tx_in_job = vec![tx0, tx1, tx2, tx4, tx6];
missing_txs = vec![3, 5];
```

Then, when you receive the ProvideMissingTransactions.Success message, tx3 is inserted at tx_in_job[3], which becomes

```rust
assert_eq!(tx_in_job, vec![tx0, tx1, tx2, tx3, tx4, tx6]);
```

and then tx5 is inserted at tx_in_job[5]:

```rust
assert_eq!(tx_in_job, vec![tx0, tx1, tx2, tx3, tx4, tx5, tx6]);
```

It is tricky and we should find a way to manage it better, possibly storing each transaction with its index in the block payload. The way it works now is correct, but it is very error-prone.
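
A rough sketch of the "store each transaction with its index" idea (hypothetical types, not this PR's code): one slot per declared position, so ordering no longer depends on how push/insert interleave with ProvideMissingTransactions.

```rust
use bitcoin::Transaction;

struct JobTransactions {
    // One slot per transaction declared in the job, in block order.
    slots: Vec<Option<Transaction>>,
}

impl JobTransactions {
    fn new(tx_count: usize) -> Self {
        Self { slots: vec![None; tx_count] }
    }

    /// Fill position `index`, whether the data came from the local mempool
    /// or from ProvideMissingTransactions.Success.
    fn set(&mut self, index: usize, tx: Transaction) {
        self.slots[index] = Some(tx);
    }

    /// Indices still to be requested from JDC.
    fn missing(&self) -> Vec<u16> {
        self.slots
            .iter()
            .enumerate()
            .filter_map(|(i, slot)| slot.is_none().then_some(i as u16))
            .collect()
    }

    /// The job is ready for block reconstruction only when every slot is filled.
    fn into_ordered_txs(self) -> Option<Vec<Transaction>> {
        self.slots.into_iter().collect()
    }
}
```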

```diff
@@ -153,7 +168,15 @@ impl ParseClientJobDeclarationMessages for JobDeclaratorDownstream {
                     ),
                 ),
             )))? as usize;
             transactions.insert(index, tx);
             if index >= transactions.len() {
```
Collaborator:

From docs.rs:

```rust
let mut vec = vec!["hello"];
vec.resize(3, "world");
assert_eq!(vec, ["hello", "world", "world"]);
```

This is the vector of the transactions that will be mined, in this order. I think it is used to reconstruct the block. If some transaction is duplicated, this may lead to propagation of an invalid block. Be careful!
I suggest not resizing, as the insert() method a few lines below already takes care of growing the vector.
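
A tiny self-contained example of the duplication concern (using strings instead of transactions):

```rust
fn main() {
    let mut txs = vec!["tx0"];
    txs.resize(4, "filler");  // ["tx0", "filler", "filler", "filler"]
    txs.insert(1, "tx1");     // insert shifts the rest to the right
    assert_eq!(txs, ["tx0", "tx1", "filler", "filler", "filler"]);
    // If "filler" were a real placeholder Transaction, the block built from this
    // vector would contain duplicated placeholder transactions.
}
```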

```diff
@@ -93,6 +104,10 @@ impl ParseClientJobDeclarationMessages for JobDeclaratorDownstream {
             missing_txs.clone(),
         ));

         if !txs_to_retrieve.is_empty() {
```
Collaborator:

Here you are asking the RPC client for the transactions, and you are doing this before sending ProvideMissingTransactions (PMT). My fear is that this way the JD-server must wait for the RPC client's response in order to send the PMT message. Please be careful!
If so, I suggest saving the variable txs_to_retrieve in the declared_mining_job field of JobDeclaratorDownstream and modifying add_tx_data_to_job so that it can be called in main.

GitGab19 (Collaborator, author) commented Mar 6, 2024:

I don't think that's happening, since add_tx_data_to_job is not async and the handler doesn't await anything. Also, analyzing the logs, it seems that the PMT message is created and sent to JDC without waiting for anything.

Collaborator:

Yes, the point is that add_tx_data_to_job is not async. If it were, there wouldn't be any problem, I think.
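
For reference, the non-blocking variant that the later commits move towards could look roughly like this. The names and the tokio runtime are assumptions for the sketch; the real change lives in the linked commits.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

use bitcoin::{Transaction, Txid};
use bitcoincore_rpc::{Client, RpcApi};

/// The handler can send ProvideMissingTransactions right away; the blocking
/// getrawtransaction calls run on tokio's blocking thread pool instead.
fn spawn_tx_fetch(
    mempool: Arc<Mutex<HashMap<Txid, Option<Transaction>>>>,
    rpc: Client,
    txs_to_retrieve: Vec<Txid>,
) {
    let _handle = tokio::task::spawn_blocking(move || {
        for id in txs_to_retrieve {
            if let Ok(tx) = rpc.get_raw_transaction(&id, None) {
                if let Some(slot) = mempool.lock().unwrap().get_mut(&id) {
                    *slot = Some(tx);
                }
            }
        }
    });
}
```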

```rust
let index = tx.1 .1;
let new_tx_data: Result<Transaction, JdsMempoolError> = mempool
    .safe_lock(|x| x.get_client())
    .unwrap()
```
Collaborator:

Is it a big deal to propagate errors instead of panicking?

GitGab19 (Collaborator, author):

Done here: da314ef
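
A minimal sketch of what that propagation can look like, reusing the JdsMempoolError name visible in the snippet above (the variants and surrounding types are illustrative):

```rust
use std::sync::Mutex;

use bitcoin::{Transaction, Txid};
use bitcoincore_rpc::{Client, RpcApi};

#[derive(Debug)]
enum JdsMempoolError {
    PoisonLock,
    Rpc(bitcoincore_rpc::Error),
}

struct JdsMempoolState {
    client: Client,
}

fn fetch_tx(
    mempool: &Mutex<JdsMempoolState>,
    id: &Txid,
) -> Result<Transaction, JdsMempoolError> {
    // `?` plus map_err replaces the .unwrap() chain: a poisoned lock or a failed
    // RPC call becomes a recoverable JdsMempoolError instead of a panic.
    let guard = mempool.lock().map_err(|_| JdsMempoolError::PoisonLock)?;
    guard
        .client
        .get_raw_transaction(id, None)
        .map_err(JdsMempoolError::Rpc)
}
```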

```diff
@@ -114,14 +128,18 @@ impl JDsMempool {
         Ok(())
     }

-    pub fn to_short_ids(&self, nonce: u64) -> Option<HashMap<[u8; 6], Transaction>> {
+    pub fn to_short_ids(&self, nonce: u64) -> Option<HashMap<[u8; 6], TransactionWithHash>> {
```
Collaborator:

The struct TransactionWithHash was introduced to represent transactions in the mempool, but you changed this representation. Did you consider getting rid of it?
You could return Option<HashMap<[u8; 6], (Txid, Transaction)>>.

GitGab19 (Collaborator, author):

It's needed to get the Txid used in handle_declare_mining_job for fetching tx data through the add_tx_data_to_job function.
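
To make the discussion concrete, here is a hedged sketch of a to_short_ids-style map keyed by short id and carrying the Txid next to the optional tx data. The short_id function below is a stand-in, not the SV2 nonce-based derivation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

use bitcoin::{Transaction, Txid};

pub struct TransactionWithHash {
    pub id: Txid,
    pub tx: Option<Transaction>,
}

/// Hypothetical short-id derivation, standing in for the real nonce-based one.
fn short_id(id: &Txid, nonce: u64) -> [u8; 6] {
    let mut hasher = DefaultHasher::new();
    nonce.hash(&mut hasher);
    id.hash(&mut hasher);
    let bytes = hasher.finish().to_le_bytes();
    [bytes[0], bytes[1], bytes[2], bytes[3], bytes[4], bytes[5]]
}

/// The Txid is kept next to the (possibly still missing) tx data, so the
/// handler can later call getrawtransaction for it.
fn to_short_ids(
    mempool: &HashMap<Txid, Option<Transaction>>,
    nonce: u64,
) -> HashMap<[u8; 6], TransactionWithHash> {
    mempool
        .iter()
        .map(|(id, tx)| (short_id(id, nonce), TransactionWithHash { id: *id, tx: tx.clone() }))
        .collect()
}
```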

Fi3 (Collaborator) commented Mar 15, 2024

I think the overall architecture can be improved. From my understanding we clone tx data into too many places, and that can lead to memory issues at scale.

Each JobDeclaratorDownstream has a full copy of the transactions in the last declared block, which means you need at least mempool_size + average_block_size * number_of_clients of memory. It is very easy to see that this scales very badly.

What I suggest instead is to keep tx data only in the JDsMempool and get it when we construct the block: then we get the data and we create the hex block. I don't see any reason to have the data for each JobDeclaratorDownstream instead of just having the ids.

Another issue that I spotted is that get_block_hex, when called, assumes that all tx data has been fetched from the node. I don't think that this is the case, but if it is, maybe we should explain why in a comment.
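
A rough sketch of the single-copy layout Fi3 describes (hypothetical types, not the code from this PR): downstreams keep only txids, and the shared JDsMempool holds the tx data once, read only at block-construction time.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

use bitcoin::{Transaction, Txid};

struct JDsMempool {
    txs: HashMap<Txid, Option<Transaction>>,
}

struct JobDeclaratorDownstream {
    mempool: Arc<Mutex<JDsMempool>>,
    // Ids only: per-client memory is roughly 32 bytes per tx instead of full tx data,
    // so total memory stays close to mempool_size rather than
    // mempool_size + average_block_size * number_of_clients.
    last_declared_job: Vec<Txid>,
}

impl JobDeclaratorDownstream {
    /// Pull the tx data out of the shared mempool only when the block is built.
    fn ordered_txs_for_block(&self) -> Option<Vec<Transaction>> {
        let mempool = self.mempool.lock().ok()?;
        self.last_declared_job
            .iter()
            .map(|id| mempool.txs.get(id).cloned().flatten())
            .collect()
    }
}
```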

GitGab19 (Collaborator, author):

> I think the overall architecture can be improved. From my understanding we clone tx data into too many places, and that can lead to memory issues at scale. […]

You're right.
Done here 4cc2624

GitGab19 (Collaborator, author):

Just mined a block (https://mempool.space/it/testnet/block/000000000000001313d29fcd6627788c457cc4d8a1e5747abe34db0ea2117dae), and everything went right: JDS correctly fetched the txs from its local mempool, reconstructed the block, and propagated it through the node.

```rust
}
// TODO check it
```

Collaborator:

Can we remove this TODO?

GitGab19 and others added 26 commits March 21, 2024 21:52

- The method get_raw_mempool_verbose performs the RPC call `getrawmempool` without verbose; this commit removes this inconsistency.
- Now the JobDeclaratorDownstream has a method that retrieves the recognized transactions in the mempool. This method is called inside JobDeclaratorDownstream::accept_incoming_connections().
- …, get them from jds mempool when a block is found
- As suggested in stratum-mining#772 (comment): a task that before was blocking is now moved to a separate task in main.
- As suggested in stratum-mining#772 (comment): this commit introduces a verification that all the transactions in a job are correctly recognized when a submit block message appears.
- Error management when some transactions are missing: the JDS should not break; instead, it tries to recover by triggering the JDS mempool to retrieve the transactions from its bitcoin node.
- In the message handler, the message that triggers the JDS mempool to fill the transactions was implemented in a bad way. Now it is fixed.
- Add some documentation
@GitGab19 GitGab19 merged commit db994da into stratum-mining:main Mar 21, 2024
13 checks passed
NonsoAmadi10 pushed a commit to NonsoAmadi10/stratum that referenced this pull request Mar 22, 2024
As suggested in stratum-mining#772 (comment): a task that before was blocking is now moved to a separate task in main.
NonsoAmadi10 pushed a commit to NonsoAmadi10/stratum that referenced this pull request Mar 22, 2024
As suggested in stratum-mining#772 (comment): this commit introduces a verification that all the transactions in a job are correctly recognized when a submit block message appears.
Successfully merging this pull request may close these issues.

JDS randomly answers to JDC with delay