Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix gossip issue - Have FetchEntry include meta aspects from source chain #1998

Merged
merged 8 commits into from Dec 19, 2019

Conversation

@lucksus
Copy link
Member

lucksus commented Dec 19, 2019

PR summary

While diagnosing the issue of pending validations with unresolved links, I discovered something interesting when looking at sim2h logs and comparing hashes with the debug view of my local Holoscape instance:

Sim2h reported several missing aspects for all my running instances. Including the Personas&Profiles instance which is made to be unique: currently there can only be exactly one instance of the same DNA - an installation on another device would create a different random hash.

So every aspect hash Sim2h knows about can only come from my instance. But then how can there be something missing? With these changes we have sim2h_worker make core store/hold everything that is in the authoring list. That is, sim2h_worker first fetches every entry/aspect via a HandleFetchEntry message, and then translates the response into StoreEntryAspects.

The problem was that the HandleFetchEntry handler only asked the DHT/EAV for data to construct all aspects of the requested entry. But if we are relying on FetchEntry to get the meta aspects from the chain into the DHT/EAV, those aspects will never get there.

The latter was the case, and both explains:

  1. why sim2h server was stuck finding missing aspects and had no way to get those
  2. why we could get into a situation where a node gets incomplete data through gossip -> being stuck with entries in the validation queue for which it never gets the dependencies (the node that gossiped those entry/aspects might have had the needed dependencies in its source chain and used those to validate the bogus entries, but nobody else can if those source chain entries never get gossiped)

testing/benchmarking notes

I could verify that point 1 above is fixed with these changes. I am confident that it also fixes point 2, but in order to be sure we need to run according experiments. If CI proves that there are no regressions I would suggest to merge this without the proof for 2. It is still an important improvement and remaining issues with dependent validations could get fixed as a follow up.

followups

( any new tickets/concerns that were discovered or created during this work but aren't in scope for review here )

changelog

  • if this is a code change that effects some consumer (e.g. zome developers) of holochain core, then it has been added to our between-release changelog with the format
- summary of change [PR#1234](https://github.com/holochain/holochain-rust/pull/1234)

documentation

aspects.insert(content_aspect);
for result in &[
get_meta_aspects_from_chain(&address, context.clone()),
get_meta_aspects_from_dht_eav(&address, context.clone()),

This comment has been minimized.

Copy link
@lucksus

lucksus Dec 19, 2019

Author Member

This is where we used to only get meta aspects from the DHT/EAV (the old fn name get_meta_aspects() implies that we were not thinking about getting it from the chain at all). Now we ask both sources for meta aspects for the given entry address and merge the results.


/// The network has requested a DHT entry from us.
/// Lets try to get it and trigger a response.
pub fn handle_fetch_entry(get_dht_data: FetchEntryData, context: Arc<Context>) {
let address = get_dht_data.entry_address.clone();
let mut aspects = vec![];
let mut aspects: HashSet<EntryAspect> = HashSet::new();

This comment has been minimized.

Copy link
@lucksus

lucksus Dec 19, 2019

Author Member

Because of this we need the Eq derivations and the Hash implementation below.

@@ -333,6 +335,42 @@ impl Sim2hWorker {
}
}

fn self_store_authored_aspects(&mut self) {

This comment has been minimized.

Copy link
@lucksus

lucksus Dec 19, 2019

Author Member

The changes in this file make the booting of a conductor faster in the case of already holding/authoring a lot. With yesterday's naive implementation we would always send store (and first fetch-) requests for everything in the authoring list, no matter if we are holding already. This waits until we have both lists so we can actually compare if there is anything in the source chain that needs to be held.
(Which needs to be here in sim2h because it is a full-sync DHT assumption and should go away or get deactivated once we activate sharding)

This comment has been minimized.

Copy link
@maackle

maackle Dec 19, 2019

Member

If there is code here that needs to go away can you make that visible in the comments with a TODO?

This comment has been minimized.

Copy link
@thedavidmeister

thedavidmeister Dec 19, 2019

Contributor

@maackle @lucksus i don't think this is worth blocking a merge here as we are getting fairly overdue for a release and @lucksus is heading off for the year...

a TODO would be nice but also it sounds like sharding simply won't work until this is upgraded so i doubt we'd miss it 🤷‍♂

This comment has been minimized.

Copy link
@maackle

maackle Dec 19, 2019

Member

To me it sounds like sharding will work just fine, but we'll be potentially Holding stuff that the sharded network doesn't need us to hold. Either way I guess it's enough that at least the three of us have taken note.

@zippy
zippy approved these changes Dec 19, 2019
/// base address to which it is meta, if the entry is the source entry of a meta aspect,
/// i.e. a CRUD or link entry.
/// If the entry is not that it returns None.
fn entry_to_meta_aspect(entry: Entry, header: ChainHeader) -> Option<(Address, EntryAspect)> {

This comment has been minimized.

Copy link
@thedavidmeister

thedavidmeister Dec 19, 2019

Contributor

@lucksus when i see things like this my first reaction is that we want a From or TryFrom somewhere so that the rust compiler can help us more, but it's not worth blocking this PR at all

&& self.initial_gossiping_list.is_some()
&& self.initial_authoring_list.is_some()
{
let authoring_list = self.initial_authoring_list.take().unwrap();

This comment has been minimized.

Copy link
@thedavidmeister

thedavidmeister Dec 19, 2019

Contributor

minor point, but this is weird and the kind of thing clippy doesn't like, what would it mean for a list to be None vs. Some-yet-empty, and then why do we do an is_some() in the condition but then need to unwrap() in the body? @lucksus

This comment has been minimized.

Copy link
@maackle

maackle Dec 19, 2019

Member

Yeah this could have been written where the lists are taken and bound to local vars before the condition, to avoid the unwrap. But, minor.

The some-ness indicates whether or not the list got returned from the network, but if it's empty, that's just what was fetched.

This comment has been minimized.

Copy link
@thedavidmeister

thedavidmeister Dec 23, 2019

Contributor

@maackle ah, i see, yeah that does make sense "unknown" vs. "known and empty"

@thedavidmeister thedavidmeister merged commit 8cf24eb into develop Dec 19, 2019
7 checks passed
7 checks passed
ci/circleci: app-spec-tests-sim2h Your tests passed on CircleCI!
Details
ci/circleci: build Your tests passed on CircleCI!
Details
ci/circleci: cli-tests Your tests passed on CircleCI!
Details
ci/circleci: cluster-tests Your tests passed on CircleCI!
Details
ci/circleci: fmt Your tests passed on CircleCI!
Details
ci/circleci: stress-tests-sim2h Your tests passed on CircleCI!
Details
ci/circleci: wasm-conductor-tests Your tests passed on CircleCI!
Details
@zippy zippy deleted the gossip-improvements branch Jan 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.