
fix(node): implementation of RecordStore::records() libp2p trait fn was not returning all records #271

Closed
wants to merge 2 commits

Conversation

bochaco (Member) commented May 16, 2023

No description provided.

@reviewpad reviewpad bot requested a review from RolandSherwin May 16, 2023 18:07
@reviewpad reviewpad bot added the Small (Pull request is small) and waiting-for-review labels May 16, 2023
reviewpad bot commented May 16, 2023

AI-Generated Summary: This pull request modifies the implementation of the RecordStore::records() function in disk_backed_record_store.rs to fix an issue where not all records were being returned. It replaces the imperative code with a more functional approach, using filter_map to get records, followed by the collect() call to convert them into a vector.

reviewpad bot commented May 16, 2023

Reviewpad Report

⚠️ Warnings

  • Please link an issue to the pull request

reviewpad bot commented May 16, 2023

AI-Generated Summary: This pull request changes the records() function in the DiskBackedRecordStore implementation and introduces a RecordsIterator to read records from disk. The new iterator is now used as the RecordsIter<'a> associated type in the RecordStore trait implementation for DiskBackedRecordStore. The patch also includes some formatting and refactoring changes in the imports and constants.

@bochaco bochaco force-pushed the fix-disk-backed-record-store branch from 2ec7c61 to 8200069 Compare May 16, 2023 20:32
reviewpad bot commented May 16, 2023

AI-Generated Summary: This pull request includes a change in the implementation of the RecordStore::records() function within the safenode/src/domain/storage/disk_backed_record_store.rs file. The previous implementation was not returning all records. The patch adds a new RecordsIterator struct and changes the records() function to use it. Additionally, the code has been refactored for readability, such as by moving the code that reads records from disk into a separate function.

joshuef (Contributor) commented May 17, 2023

I think, while you're right, our impl is off-trait. It's probably what we want just now. And we should try and figure out how we want to push replication going forwards.

Do we want to stay w/ that logic in libp2p (ideally yes)? And if so, what do we need to do to libp2p to make it work for us (I'm thinking PRing upstream)?

For clarity: as I understand things, libp2p pulls in all records every replication interval and sends them to all relevant nodes, regardless of whether they already have them; other nodes may do the same. For us, with larger Records, this is a massive memory hog.

So one option is to just dial back node size and record count. Spread things thinner, so to speak.

If we don't want to do that, something like the random record selection we have reduces message overlap between nodes and does not load as many records into mem each iteration... (while potentially still covering a large Record surface between all nodes)...
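The random-selection idea could be sketched roughly like this, using only the standard library (the helper name and the String key type are hypothetical illustrations, not the actual safenode code): score each key with a hash of the key plus a per-round seed and keep only the k lowest scores, so each node reads a bounded, pseudo-randomly varying subset of records per replication round.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Pick up to `k` keys per round by hashing each key together with a
/// per-round seed and keeping the lowest scores. Only the chosen keys
/// would ever be read from disk, bounding memory per replication round.
fn select_random_subset(keys: &[String], seed: u64, k: usize) -> Vec<String> {
    let mut scored: Vec<(u64, &String)> = keys
        .iter()
        .map(|key| {
            let mut h = DefaultHasher::new();
            seed.hash(&mut h);
            key.hash(&mut h);
            (h.finish(), key)
        })
        .collect();
    // Lowest hash scores win; a different seed reshuffles the ranking.
    scored.sort_by_key(|(score, _)| *score);
    scored.into_iter().take(k).map(|(_, key)| key.clone()).collect()
}
```

With a fresh seed each interval, the chosen subset changes round to round, which is one way to cover a large Record surface over time without loading everything at once.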

@joshuef joshuef force-pushed the fix-disk-backed-record-store branch from 8200069 to 2685439 Compare May 18, 2023 01:00
@reviewpad reviewpad bot added Medium Medium sized PR and removed Small Pull request is small labels May 18, 2023
reviewpad bot commented May 18, 2023

AI-Generated Summary: This pull request includes two patches:

  1. In the first patch, the RecordStore::records() trait function has been fixed, as it was not returning all the records. The implementation now utilizes a newly introduced RecordsIterator, which reads records from the disk at the very moment the consumer/user is iterating each item. Additionally, minor changes have been made to the import statements and formatting.

  2. In the second patch, replication intervals have been staggered to avoid a message surge. Random replication intervals are now generated for each node between REPLICATION_INTERVAL_LOWER_BOUND and REPLICATION_INTERVAL_UPPER_BOUND. This helps ease the intense replication activity across the network. The changes affect disk_backed_record_store.rs, mod.rs, and network/mod.rs.
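A minimal std-only sketch of the lazy-iterator pattern the first patch describes (the file-per-record layout and Vec<u8> record type here are simplifying assumptions; the real implementation yields Cow<'a, Record> from libp2p's RecordStore trait):

```rust
use std::fs;
use std::path::PathBuf;
use std::vec;

/// Lazily yields one record per stored key, reading each file from
/// disk only when the consumer asks for the next item, instead of
/// loading every record into memory up front.
struct RecordsIterator {
    keys: vec::IntoIter<PathBuf>,
}

impl Iterator for RecordsIterator {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Self::Item> {
        // Skip keys whose file has vanished or cannot be read,
        // rather than aborting the whole iteration.
        for path in self.keys.by_ref() {
            if let Ok(bytes) = fs::read(&path) {
                return Some(bytes);
            }
        }
        None
    }
}
```

The point of the loop in next() is that each call advances past unreadable keys until it finds the next record, so the iterator still visits every remaining key across successive calls.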

@joshuef joshuef force-pushed the fix-disk-backed-record-store branch from 2685439 to 1a2d8a8 Compare May 18, 2023 01:49
reviewpad bot commented May 18, 2023

AI-Generated Summary: This pull request includes two patches:

  1. Fixing the implementation of the RecordStore::records() function in the DiskBackedRecordStore module. The previous implementation wasn't returning all records, which has now been fixed by introducing a custom iterator named RecordsIterator. This iterator reads records from the disk for each individual key as the consumer/user iterates over the items.

  2. Introducing staggered replication intervals in the DiskBackedRecordStore module. Previously, all nodes had a fixed replication interval, causing an intense replication activity across the network. To avoid unnecessary message surges, the replication interval is randomly set between an upper and lower bound for each node. This change helps in staggering the replication activity and reducing the load on the network.
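The staggering in the second patch can be sketched as follows. The bound constant names come from the summaries above, but their values here are placeholders, and deriving the offset from an arbitrary u64 jitter (in practice a random value per node) is an illustrative assumption rather than the actual safenode code:

```rust
use std::time::Duration;

// Placeholder values; the real bounds live in disk_backed_record_store.rs.
const REPLICATION_INTERVAL_LOWER_BOUND: Duration = Duration::from_secs(180);
const REPLICATION_INTERVAL_UPPER_BOUND: Duration = Duration::from_secs(540);

/// Derive a per-node replication interval inside the configured bounds,
/// so that nodes do not all fire replication at the same instant.
/// `jitter` stands in for a per-node random value.
fn staggered_interval(jitter: u64) -> Duration {
    let span = (REPLICATION_INTERVAL_UPPER_BOUND - REPLICATION_INTERVAL_LOWER_BOUND).as_secs();
    REPLICATION_INTERVAL_LOWER_BOUND + Duration::from_secs(jitter % (span + 1))
}
```

Mapping the jitter with a modulus over span + 1 keeps every result inside [lower, upper] inclusive, which is the property the patch relies on to spread replication activity out.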

@joshuef joshuef force-pushed the fix-disk-backed-record-store branch from 1a2d8a8 to 82933b0 Compare May 18, 2023 02:04
reviewpad bot commented May 18, 2023

AI-Generated Summary: This pull request includes two patches:

  1. A fix for the implementation of RecordStore::records(). The previous implementation of the libp2p trait function did not return all the records. This patch includes changes to disk_backed_record_store.rs with additional read logic and the iterator implementation for records.

  2. A feature to stagger replication intervals for nodes in order to avoid message surges. This patch modifies disk_backed_record_store.rs, storage/mod.rs, and network/mod.rs, and adds a new file: git_hash.rs. The staggered intervals are introduced as random durations between a lower and an upper bound, which is expected to spread out replication activity across the network.

safenode/src/git_hash.rs (flagged issue fixed)
@joshuef joshuef force-pushed the fix-disk-backed-record-store branch from 82933b0 to 6ebb398 Compare May 18, 2023 02:32
reviewpad bot commented May 18, 2023

AI-Generated Summary: This pull request includes two patches:

  1. Fix(node): The implementation of the RecordStore::records() libp2p trait function was not returning all records. The patch refactors the code, changing the RecordStore::RecordsIter type from vec::IntoIter to a custom RecordsIterator that handles reading records from disk during iteration.

  2. Feat: Stagger replication intervals to avoid message surge. The patch introduces random replication intervals for nodes instead of a fixed one. The replication interval for each node is a random value between specified lower and upper bounds, avoiding simultaneous replication activity across the network.

type Item = Cow<'a, Record>;

fn next(&mut self) -> Option<Self::Item> {
    for key in self.keys.by_ref() {
Member commented:

will this always return the first one?

        records.push(record);
    }
    RecordsIterator {
        keys: self.records.iter(),
Member commented:

If we've decided to no longer use replication_records, then shall it be removed entirely?

Contributor commented:

replication_records

I think this could be useful in a follow-up PR? Although, that's a PR you're looking at @maqi, so let me know what you think

@bochaco bochaco enabled auto-merge May 18, 2023 13:51
@bochaco bochaco added this pull request to the merge queue May 18, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 18, 2023
@maqi maqi mentioned this pull request May 18, 2023
bochaco (Member, Author) commented May 18, 2023

Superseded by #282

@bochaco bochaco closed this May 18, 2023
@bochaco bochaco reopened this May 18, 2023
@bochaco bochaco closed this May 18, 2023
@bochaco bochaco deleted the fix-disk-backed-record-store branch May 18, 2023 20:02
Labels: Medium (Medium sized PR)
3 participants