Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Header Publishing (The Return Of) #1640

Merged
merged 35 commits into from Sep 26, 2019
Merged

Conversation

@willemolding
Copy link
Contributor

willemolding commented Aug 6, 2019

Re-opened header publishing PR

  • There is now a specific action (and corresponding reducer) for publishing a header (PublishHeaderEntry). Headers are no longer published by the Publish action and publish_header must be called in the top level workflow.

  • Dna and AgentId entries are only published when the chain is initialized for the first time rather than at every startup

  • Headers are returned as part of the author list so they are gossiped to new joining agents

changelog

Please check one of the following, relating to the CHANGELOG-UNRELEASED.md

  • this is a code change that effects some consumer (e.g. zome developers) of holochain core so it is added to the CHANGELOG-UNRELEASED.md (linked above), with the format - summary of change [PR#1234](https://github.com/holochain/holochain-rust/pull/1234)
  • this is not a code change, or doesn't effect anyone outside holochain core development
@willemolding willemolding changed the title ensure dna/agent entries are only published on first startup, also ad… Feature - Header Publishing (The Return Of) Aug 6, 2019
@willemolding willemolding changed the title Feature - Header Publishing (The Return Of) WIP - Header Publishing (The Return Of) Aug 6, 2019
@willemolding willemolding changed the title WIP - Header Publishing (The Return Of) Header Publishing (The Return Of) Aug 7, 2019
willemolding added 5 commits Aug 7, 2019
fmt
@willemolding

This comment has been minimized.

Copy link
Contributor Author

willemolding commented Aug 7, 2019

While running the tests it prints out errors around the author list saying:

thread 'handle_authoring_list/puid-26e-0' panicked at 'Error getting entry aspects of authoring list for chain headers: EntryNotFoundLocally'

Despite this the tests still pass. @lucksus @struktured is this expected or does this suggest the tests are not correctly picking up a failure?

@willemolding willemolding requested review from lucksus, zippy and struktured Aug 7, 2019
willemolding added 4 commits Aug 8, 2019
);
},
Err(_err) => log_debug!(context,
"handler/get_authoring_list: Error getting entry aspects of authoring list for entry with address: {}",

This comment has been minimized.

Copy link
@willemolding

willemolding Aug 8, 2019

Author Contributor

I changed this to log the error properly rather than panic the thread

use std::{thread, time};

#[test]
fn test_can_get_chain_header_list() {

This comment has been minimized.

Copy link
@willemolding

willemolding Aug 8, 2019

Author Contributor

@lucksus These two tests show the loading of the header entries and their aspects working fine. I am wondering if the issue is happening only at startup when the Dna and Agent headers are in the chain but the header entries are not committed yet.


use holochain_core_types::error::HcResult;
use holochain_persistence_api::cas::content::Address;

#[derive(Clone, Debug)]
pub enum ActionResponse {
Publish(HcResult<Address>),
PublishHeaderEntry(HcResult<Address>),

This comment has been minimized.

Copy link
@AshantiMutinta

AshantiMutinta Aug 14, 2019

Contributor

Can we use the same publish for this? They seem to share the same type

Copy link
Contributor

AshantiMutinta left a comment

Looks good but I think we could cut out some code if we merge the publish/publish_entry, maybe where the subtle differences show themselves we could use the enum? Seeing a few parallels with the PublishEntry

@zippy

This comment has been minimized.

Copy link
Member

zippy commented Aug 15, 2019

@willemolding do you know why the two tests are failing? Are they part of the flakeyness?

@willemolding

This comment has been minimized.

Copy link
Contributor Author

willemolding commented Aug 15, 2019

Looks good but I think we could cut out some code if we merge the publish/publish_entry, maybe where the subtle differences show themselves we could use the enum? Seeing a few parallels with the PublishEntry

This was previously implemented using the same action as PublishEntry. This PR is a rewrite adding its own action to intentionally keep them seperate. The problems were with the waiter (which waits for a corresponding Hold action for a publish) not doing so correctly for header entries. This way the two do not interfere (although the waiter won't wait for headers at this stage)

That said I think it could probably be DRYed up a bit. I'll take another look

@AshantiMutinta

This comment has been minimized.

Copy link
Contributor

AshantiMutinta commented Aug 15, 2019

Looks good but I think we could cut out some code if we merge the publish/publish_entry, maybe where the subtle differences show themselves we could use the enum? Seeing a few parallels with the PublishEntry

This was previously implemented using the same action as PublishEntry. This PR is a rewrite adding its own action to intentionally keep them seperate. The problems were with the waiter (which waits for a corresponding Hold action for a publish) not doing so correctly for header entries. This way the two do not interfere (although the waiter won't wait for headers at this stage)

That said I think it could probably be DRYed up a bit. I'll take another look

Yeah I was thinking of having
pub enum PublishType
{
Entry(..),
Header(...),
}

then Publish(PublishType)

that way even though they similar they can still use the same futures e.t.c

Would it still cause problems for the waiter? If still just discard my comment. Will still approve :)

Copy link
Member

lucksus left a comment

While running the tests it prints out errors around the author list saying:

thread 'handle_authoring_list/puid-26e-0' panicked at 'Error getting entry aspects of authoring list for chain headers: EntryNotFoundLocally'

Despite this the tests still pass. @lucksus @struktured is this expected or does this suggest the tests are not correctly picking up a failure?

Hm, that actually looks like a problem that our tests are not picking up yet.
Also, as @zippy mentioned, there are two tests failing in app-spec showing that some sources are missing. I don't think that this has to do with the n3h flakyness..

CHANGELOG-UNRELEASED.md Outdated Show resolved Hide resolved
@lucksus

This comment has been minimized.

Copy link
Member

lucksus commented Aug 16, 2019

Hm, actually, I think the problem is that we try get the meta aspects for all the things we author. You applied the same (I think mistaken) thing I did earlier for all chain entries: just calling get_all_aspect_addresses() on chain entries and now also headers. But get_all_aspect_addresses calls get_meta_aspects which we only have if we are DHT-holding the entry. Which is the case in all our current tests because of the full-sync, but it is something that happens after authoring asynchronously.

I think the fix is to only add the content aspect when constructing the authoring list, both for all entries as well as headers.

@lucksus

This comment has been minimized.

Copy link
Member

lucksus commented Aug 16, 2019

get_content_aspect() always gets the content from the DHT and as such is not really catering for the authoring list at all. I'm working on this and will push a fix soon...

Have get_content_aspect() try the source chain first.
Construct content aspects for headers on the fly.
@lucksus

This comment has been minimized.

Copy link
Member

lucksus commented Aug 16, 2019

So I've change get_content_aspect() so that it first tries to get the content from the source chain and if that fails to look it up in the local DHT cas.

Also, I've change it to not use that function for header content aspects as these need to be created on the fly.

The old version of get_content_aspect() very likely have failed in other contexts as well. It might actually be a reasonable explanation for some of the problems we see right now trying to switch over to lib3h @zippy, @struktured.

There isn't really a good way to tests this currently as those lists are only queried (and needed) when we restart a conductor. So this is another reason for having the conductor start/stop feature in try-o-rama soon, @maackle.

@lucksus

This comment has been minimized.

Copy link
Member

lucksus commented Aug 16, 2019

Hm, I've run app-spec on this several times now locally and I have seen various different tests fail. Definitely flaky. I'm running develop right now to compare against the overall n3h flakyness which seemed to be very low recently.

Mainly CRUD tests were failing and I could tell from the logs that there were inconsistencies about the entries being held by the three test agents.

I'm suspecting the following:

  • With header publishing, each Commit results in two Holds now
  • The conistency model and hachiko only wait for one Hold per agent after each commit
  • Holding the header is regarded (by hachiko) as holding the entry and it stops waiting
  • => following test assertions fail non-deterministically dependent on timings

Is that plausible, @maackle, @willemolding?

willemolding and others added 3 commits Aug 17, 2019
Co-Authored-By: Nicolas Luck <nicolas@lucksus.eu>
@willemolding

This comment has been minimized.

Copy link
Contributor Author

willemolding commented Aug 20, 2019

Hm, I've run app-spec on this several times now locally and I have seen various different tests fail. Definitely flaky. I'm running develop right now to compare against the overall n3h flakyness which seemed to be very low recently.

Mainly CRUD tests were failing and I could tell from the logs that there were inconsistencies about the entries being held by the three test agents.

I'm suspecting the following:

* With header publishing, each `Commit` results in two `Hold`s now

* The conistency model and hachiko only wait for one `Hold` per agent after each commit

* Holding the header is regarded (by hachiko) as holding the entry and it stops waiting

* => following test assertions fail non-deterministically dependent on timings

Is that plausible, @maackle, @willemolding?

I'm not sure if this is quite right. The commit workflow will result in a Publish and PublishHeaderEntry action. This will result in two Hold actions but these will have contain different entries (the actual entry and the headerEntry) and so shouldn't confuse the waiter any more than other unrelated entries do right?

The last part I guess really depends on how the waiter is working and I think I need to dig a bit deeper here.

@zippy

This comment has been minimized.

Copy link
Member

zippy commented Sep 10, 2019

@lucksus and I were talking, and it seems that the reason agent ID was removed from publish initially is because that should now be handled by the lib3h GetAuthoringEntryList event that happens. That's how that entry should get initially disseminated

…ature-add-header-publishing
fmt
@willemolding willemolding merged commit 608f1ee into develop Sep 26, 2019
6 of 7 checks passed
6 of 7 checks passed
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
ci/circleci: app-spec-tests Your tests passed on CircleCI!
Details
ci/circleci: build Your tests passed on CircleCI!
Details
ci/circleci: cli-tests Your tests passed on CircleCI!
Details
ci/circleci: cluster-tests Your tests passed on CircleCI!
Details
ci/circleci: fmt Your tests passed on CircleCI!
Details
ci/circleci: wasm-conductor-tests Your tests passed on CircleCI!
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.