
fix(node): implementation of RecordStore::records() libp2p trait fn was not returning all records #271

Closed
wants to merge 2 commits

Conversation

bochaco (Member) commented May 16, 2023

No description provided.

@reviewpad reviewpad bot requested a review from RolandSherwin May 16, 2023 18:07
@reviewpad reviewpad bot added the Small (Pull request is small) and waiting-for-review labels May 16, 2023
reviewpad bot commented May 16, 2023

AI-Generated Summary: This pull request modifies the implementation of the RecordStore::records() function in disk_backed_record_store.rs to fix an issue where not all records were being returned. It replaces the imperative code with a more functional approach, using filter_map to get records, followed by the collect() call to convert them into a vector.

reviewpad bot commented May 16, 2023

Reviewpad Report

⚠️ Warnings

  • Please link an issue to the pull request

reviewpad bot commented May 16, 2023

AI-Generated Summary: This pull request changes the records() function in the DiskBackedRecordStore implementation and introduces a RecordsIterator to read records from disk. The new iterator is now used as the RecordsIter<'a> associated type in the RecordStore trait implementation for DiskBackedRecordStore. The patch also includes some formatting and refactoring changes in the imports and constants.

@bochaco bochaco force-pushed the fix-disk-backed-record-store branch from 2ec7c61 to 8200069 Compare May 16, 2023 20:32
reviewpad bot commented May 16, 2023

AI-Generated Summary: This pull request includes a change in the implementation of the RecordStore::records() function within the safenode/src/domain/storage/disk_backed_record_store.rs file. The previous implementation was not returning all records. The patch adds a new RecordsIterator struct and changes the records() function to use it. Additionally, the code has been refactored for readability, such as by moving the code that reads records from disk into a separate function.

joshuef (Contributor) commented May 17, 2023

I think, while you're right, our impl is off-trait. It's probably what we want just now. And we should try and figure out how we want to push replication going forwards.

Do we want to stay w/ that logic in libp2p (ideally yes)? And if so, what do we need to do to libp2p to make it work for us (I'm thinking PRing upstream)?

For clarity: as I understand things, libp2p pulls in all records every replication interval and sends them to all relevant nodes, regardless of whether they already have them; other nodes may do the same. For us, with larger Records, this is a massive memory hog.

So one option is to just dial back node size and record count. Spread things thinner, so to speak.

If we don't want to do that, something like the random record selection we have reduces message overlap between nodes and does not load as many records into mem each iteration... (while potentially still covering a large Record surface between all nodes)...
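The random-selection idea could be sketched roughly like this, using only the standard library (the helper name and the String key type are hypothetical illustrations, not the actual safenode code): score each key with a hash of the key plus a per-round seed and keep only the k lowest scores, so each node reads a bounded, pseudo-randomly varying subset of records per replication round.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Pick up to `k` keys per round by hashing each key together with a
/// per-round seed and keeping the lowest scores. Only the chosen keys
/// would ever be read from disk, bounding memory per replication round.
fn select_random_subset(keys: &[String], seed: u64, k: usize) -> Vec<String> {
    let mut scored: Vec<(u64, &String)> = keys
        .iter()
        .map(|key| {
            let mut h = DefaultHasher::new();
            seed.hash(&mut h);
            key.hash(&mut h);
            (h.finish(), key)
        })
        .collect();
    // Lowest hash scores win; a different seed reshuffles the ranking.
    scored.sort_by_key(|(score, _)| *score);
    scored.into_iter().take(k).map(|(_, key)| key.clone()).collect()
}
```

With a fresh seed each interval, the chosen subset changes round to round, which is one way to cover a large Record surface over time without loading everything at once.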

@joshuef joshuef force-pushed the fix-disk-backed-record-store branch from 8200069 to 2685439 Compare May 18, 2023 01:00
@reviewpad reviewpad bot added Medium Medium sized PR and removed Small Pull request is small labels May 18, 2023
reviewpad bot commented May 18, 2023

AI-Generated Summary: This pull request includes two patches:

  1. In the first patch, the RecordStore::records() trait function has been fixed, as it was not returning all the records. The implementation now utilizes a newly introduced RecordsIterator, which reads records from the disk at the very moment the consumer/user is iterating each item. Additionally, minor changes have been made to the import statements and formatting.

  2. In the second patch, replication intervals have been staggered to avoid a message surge. Random replication intervals are now generated for each node between REPLICATION_INTERVAL_LOWER_BOUND and REPLICATION_INTERVAL_UPPER_BOUND. This helps ease the intense replication activity across the network. The changes affect disk_backed_record_store.rs, mod.rs, and network/mod.rs.
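A minimal std-only sketch of the lazy-iterator pattern the first patch describes (the file-per-record layout and Vec<u8> record type here are simplifying assumptions; the real implementation yields Cow<'a, Record> from libp2p's RecordStore trait):

```rust
use std::fs;
use std::path::PathBuf;
use std::vec;

/// Lazily yields one record per stored key, reading each file from
/// disk only when the consumer asks for the next item, instead of
/// loading every record into memory up front.
struct RecordsIterator {
    keys: vec::IntoIter<PathBuf>,
}

impl Iterator for RecordsIterator {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Self::Item> {
        // Skip keys whose file has vanished or cannot be read,
        // rather than aborting the whole iteration.
        for path in self.keys.by_ref() {
            if let Ok(bytes) = fs::read(&path) {
                return Some(bytes);
            }
        }
        None
    }
}
```

The point of the loop in next() is that each call advances past unreadable keys until it finds the next record, so the iterator still visits every remaining key across successive calls.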

@joshuef joshuef force-pushed the fix-disk-backed-record-store branch from 2685439 to 1a2d8a8 Compare May 18, 2023 01:49
reviewpad bot commented May 18, 2023

AI-Generated Summary: This pull request includes two patches:

  1. Fixing the implementation of the RecordStore::records() function in the DiskBackedRecordStore module. The previous implementation wasn't returning all records, which has now been fixed by introducing a custom iterator named RecordsIterator. This iterator reads records from the disk for each individual key as the consumer/user iterates over the items.

  2. Introducing staggered replication intervals in the DiskBackedRecordStore module. Previously, all nodes had a fixed replication interval, causing an intense replication activity across the network. To avoid unnecessary message surges, the replication interval is randomly set between an upper and lower bound for each node. This change helps in staggering the replication activity and reducing the load on the network.
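The staggering in the second patch can be sketched as follows. The bound constant names come from the summaries above, but their values here are placeholders, and deriving the offset from an arbitrary u64 jitter (in practice a random value per node) is an illustrative assumption rather than the actual safenode code:

```rust
use std::time::Duration;

// Placeholder values; the real bounds live in disk_backed_record_store.rs.
const REPLICATION_INTERVAL_LOWER_BOUND: Duration = Duration::from_secs(180);
const REPLICATION_INTERVAL_UPPER_BOUND: Duration = Duration::from_secs(540);

/// Derive a per-node replication interval inside the configured bounds,
/// so that nodes do not all fire replication at the same instant.
/// `jitter` stands in for a per-node random value.
fn staggered_interval(jitter: u64) -> Duration {
    let span = (REPLICATION_INTERVAL_UPPER_BOUND - REPLICATION_INTERVAL_LOWER_BOUND).as_secs();
    REPLICATION_INTERVAL_LOWER_BOUND + Duration::from_secs(jitter % (span + 1))
}
```

Mapping the jitter with a modulus over span + 1 keeps every result inside [lower, upper] inclusive, which is the property the patch relies on to spread replication activity out.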

@joshuef joshuef force-pushed the fix-disk-backed-record-store branch from 1a2d8a8 to 82933b0 Compare May 18, 2023 02:04
reviewpad bot commented May 18, 2023

AI-Generated Summary: This pull request includes two patches:

  1. A fix for the implementation of RecordStore::records(). The previous implementation of the libp2p trait function did not return all the records. This patch includes changes to disk_backed_record_store.rs with additional read logic and the iterator implementation for records.

  2. A feature to stagger replication intervals for nodes in order to avoid message surges. This patch modifies disk_backed_record_store.rs, storage/mod.rs, and network/mod.rs, and adds a new file: git_hash.rs. The staggered intervals are introduced as random durations between a lower and an upper bound, which is expected to spread out replication activity across the network.

safenode/src/git_hash.rs (flagged issue fixed)
@joshuef joshuef force-pushed the fix-disk-backed-record-store branch from 82933b0 to 6ebb398 Compare May 18, 2023 02:32
reviewpad bot commented May 18, 2023

AI-Generated Summary: This pull request includes two patches:

  1. Fix(node): The implementation of the RecordStore::records() libp2p trait function was not returning all records. The patch refactors the code, changing the RecordStore::RecordsIter type from vec::IntoIter to a custom RecordsIterator that handles reading records from disk during iteration.

  2. Feat: Stagger replication intervals to avoid message surge. The patch introduces random replication intervals for nodes instead of a fixed one. The replication interval for each node is a random value between specified lower and upper bounds, avoiding simultaneous replication activity across the network.

type Item = Cow<'a, Record>;

fn next(&mut self) -> Option<Self::Item> {
    for key in self.keys.by_ref() {
Member commented:

will this always return the first one?

        records.push(record);
    }
    RecordsIterator {
        keys: self.records.iter(),
Member commented:

If we've decided to no longer use replication_records, then shall it be removed entirely?

Contributor commented:

replication_records

I think this could be useful in a follow-up PR? Although, that's a PR you're looking at @maqi, so let me know what you think

@bochaco bochaco enabled auto-merge May 18, 2023 13:51
@bochaco bochaco added this pull request to the merge queue May 18, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 18, 2023
@maqi maqi mentioned this pull request May 18, 2023
bochaco (Member, Author) commented May 18, 2023

Superseded by #282

@bochaco bochaco closed this May 18, 2023
@bochaco bochaco reopened this May 18, 2023
@bochaco bochaco closed this May 18, 2023
@bochaco bochaco deleted the fix-disk-backed-record-store branch May 18, 2023 20:02
Labels: Medium (Medium sized PR)
3 participants