Networking bugs & correct network configuration for v0.0.32-alpha2 #6

Closed
pospi opened this issue Oct 17, 2019 · 9 comments

@pospi

pospi commented Oct 17, 2019

I've recently started porting my Diorama tests to Try-o-rama. Operations that previously ran fine now hang the backend.

I can reproduce this on both 0.1.2-beta.4 and 0.1.1-beta.1, in both cases using core v0.0.32-alpha2.

Could this be due to the current state of networking support? These are the results I'm getting:

With an n3h network configuration passed to Config.genConfig, and "n3h": "github:holochain/n3h#0.0.20-alpha" as a dependency, I get an internal crash:

☯☯☯ [[[CONDUCTOR alice]]]
☯ response: Err(ErrorMessage { msg: "bad exit Some(1) \"internal/modules/cjs/loader.js:857\\n  return process.dlopen(module, path.toNamespacedPath(filename));\\n                 ^\\n\\nError: /home/pospi/projects/holo-rea/oce-holo/node_modules/better-sqlite3/build/better_sqlite3.node: undefined symbol: _ZN2v816FunctionTemplate3NewEPNS_7IsolateEPFvRKNS_20FunctionCallbackInfoINS_5ValueEEEENS_5LocalIS4_EENSA_INS_9SignatureEEEiNS_19ConstructorBehaviorE\\n    at Object.Module._extensions..node (internal/modules/cjs/loader.js:857:18)\\n    at Module.load (internal/modules/cjs/loader.js:685:32)\\n    at Function.Module._load (internal/modules/cjs/loader.js:620:12)\\n    at Module.require (internal/modules/cjs/loader.js:723:19)\\n    at require (internal/modules/cjs/helpers.js:14:16)\\n    at Object.<anonymous> (/home/pospi/projects/holo-rea/oce-holo/node_modules/better-sqlite3/lib/database.js:5:21)\\n    at Module._compile (internal/modules/cjs/loader.js:816:30)\\n    at Object.Module._extensions..js (internal/modules/cjs/loader.js:827:10)\\n    at Module.load (internal/modules/cjs/loader.js:685:32)\\n    at Function.Module._load (internal/modules/cjs/loader.js:620:12)\\n\"" }

With lib3h or websocket networks configured, the configuration is rejected:

★★★ [[[CONDUCTOR alice]]]
★ Error while trying to boot from config: IoError("Error loading configuration: unknown variant `websocket`, expected one of `n3h`, `lib3h`, `memory`, `sim1h` for key `network.type`")

With sim1h or sim2h networks configured, it looks like the config is rejected further down the stack:

14:58:35 [try-o-rama] error: Tried to use an invalid value for a complex type and found the following problems:
    - Expecting "n3h" at 0.0.network.0.0 but instead got: "sim1h".
    - Expecting "memory" at 0.0.network.0.1 but instead got: "sim1h".
    - Expecting "websocket" at 0.0.network.0.2 but instead got: "sim1h".
    - Expecting { [K in string]: any } at 0.0.network.1 but instead got: "sim1h".

With the memory network configured, the conductor seems unable to tell when a call through the DNA has completed. I just get "zome call timed out after 90 seconds on conductor 'alice'" and the first request in the test scenario fails. With detailed logging enabled, it appears to loop infinitely through this sequence:

☯ DEBUG 2019-10-17 14:24:29 [lib3h::gateway::gateway_transport] net_worker_thread/puid-5-2f /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/lib3h-0.0.13/src/gateway/gateway_transport.rs:231 gateway_transport: SendMessage, first resolving address Lib3hUri("transportid:HcMcJmwRIZE33p6vee3cXt99rPGUPzospf7cBvn8aarb3znyFx6dNpmVH54tigz")
☯ DEBUG 2019-10-17 14:24:29 [lib3h::engine::network_layer] net_worker_thread/puid-12-2d /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/lib3h-0.0.13/src/engine/network_layer.rs:117 mem-agent-puid-13-0 << handle_network_transport_request: ReceivedData { uri: Lib3hUri("mem://addr_1/"), payload: "�\u{0}�����app_spec_memory��Ktransportid:HcMcJmwRIZE33p6vee3cXt99rPGUPzospf7cBvn8aarb3znyFx6dNpmVH54tigz��Ktransportid:HcMcJ77qfoxbNxgt5hiOhQXTuNu634pc9bbKcrDbWs9du8ghQGRgMVeb4wagaui�\u{650}kgGRk5HZS3RyYW5zcG9ydGlkOkhjTWNKNzdxZm94Yk54Z3Q1aGlPaFFYVHVOdTYzNHBjOWJiS2NyRGJXczlkdThnaFFHUmdNVmViNHdhZ2F1aZGtbWVtOi8vYWRkcl8xL88AAAFt1/WQTQ==" }
☯ 
14:24:29 error: 
☸☸☸ {{{CONDUCTOR alice}}}
☸ DEBUG 2019-10-17 14:24:29 [lib3h::engine::network_layer] net_worker_thread/puid-12-2d /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/lib3h-0.0.13/src/engine/network_layer.rs:140 Received message from: mem://addr_1/ | size: 325
☸ DEBUG 2019-10-17 14:24:29 [lib3h::engine::network_layer] net_worker_thread/puid-12-2d /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/lib3h-0.0.13/src/engine/network_layer.rs:58 mem-agent-puid-13-0 << handle_network_dht_request: GossipTo(GossipToData { peer_name_list: [Lib3hUri("transportid:HcMcJ77qfoxbNxgt5hiOhQXTuNu634pc9bbKcrDbWs9du8ghQGRgMVeb4wagaui")], bundle: "�\u{1}����Ktransportid:HcMcJmwRIZE33p6vee3cXt99rPGUPzospf7cBvn8aarb3znyFx6dNpmVH54tigz��mem://addr_2/�\u{0}\u{0}\u{1}m���a" })
☸ DEBUG 2019-10-17 14:24:29 [lib3h::engine::ghost_engine] net_worker_thread/puid-12-2d /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/lib3h-0.0.13/src/engine/ghost_engine.rs:727 (app_spec_memory) handle_GossipTo: GossipToData { peer_name_list: [Lib3hUri("transportid:HcMcJ77qfoxbNxgt5hiOhQXTuNu634pc9bbKcrDbWs9du8ghQGRgMVeb4wagaui")], bundle: "�\u{1}����Ktransportid:HcMcJmwRIZE33p6vee3cXt99rPGUPzospf7cBvn8aarb3znyFx6dNpmVH54tigz��mem://addr_2/�\u{0}\u{0}\u{1}m���a" }
☸ DEBUG 2019-10-17 14:24:29 [lib3h::gateway::gateway_transport] net_worker_thread/puid-12-2d /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/lib3h-0.0.13/src/gateway/gateway_transport.rs:231 gateway_transport: SendMessage, first resolving address Lib3hUri("transportid:HcMcJ77qfoxbNxgt5hiOhQXTuNu634pc9bbKcrDbWs9du8ghQGRgMVeb4wagaui")
☸ DEBUG 2019-10-17 14:24:29 [lib3h::engine::network_layer] net_worker_thread/puid-5-2f /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/lib3h-0.0.13/src/engine/network_layer.rs:117 mem-agent-puid-6-0 << handle_network_transport_request: ReceivedData { uri: Lib3hUri("mem://addr_2/"), payload: "�\u{0}�����app_spec_memory��Ktransportid:HcMcJ77qfoxbNxgt5hiOhQXTuNu634pc9bbKcrDbWs9du8ghQGRgMVeb4wagaui��Ktransportid:HcMcJmwRIZE33p6vee3cXt99rPGUPzospf7cBvn8aarb3znyFx6dNpmVH54tigz�\u{650}kgGRk5HZS3RyYW5zcG9ydGlkOkhjTWNKbXdSSVpFMzNwNnZlZTNjWHQ5OXJQR1VQem9zcGY3Y0J2bjhhYXJiM3pueUZ4NmROcG1WSDU0dGlnepGtbWVtOi8vYWRkcl8yL88AAAFt1/WYYQ==" }
☸ DEBUG 2019-10-17 14:24:29 [lib3h::engine::network_layer] net_worker_thread/puid-5-2f /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/lib3h-0.0.13/src/engine/network_layer.rs:140 Received message from: mem://addr_2/ | size: 325
☸ DEBUG 2019-10-17 14:24:29 [holochain_core_types::sync] hc_guard_watcher/puid-0-0 core_types/src/sync.rs:152 tracking 1 active guard(s) alive for > 500ms:
☸ KIND       PUID      ELAPSED (ms)
☸ Read   puid-21-0         8393
☸ DEBUG 2019-10-17 14:24:29 [lib3h::engine::network_layer] net_worker_thread/puid-5-2f /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/lib3h-0.0.13/src/engine/network_layer.rs:58 mem-agent-puid-6-0 << handle_network_dht_request: GossipTo(GossipToData { peer_name_list: [Lib3hUri("transportid:HcMcJmwRIZE33p6vee3cXt99rPGUPzospf7cBvn8aarb3znyFx6dNpmVH54tigz")], bundle: "�\u{1}����Ktransportid:HcMcJ77qfoxbNxgt5hiOhQXTuNu634pc9bbKcrDbWs9du8ghQGRgMVeb4wagaui��mem://addr_1/�\u{0}\u{0}\u{1}m���M" })
☸ DEBUG 2019-10-17 14:24:29 [lib3h::engine::ghost_engine] net_worker_thread/puid-5-2f /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/lib3h-0.0.13/src/engine/ghost_engine.rs:727 (app_spec_memory) handle_GossipTo: GossipToData { peer_name_list: [Lib3hUri("transportid:HcMcJmwRIZE33p6vee3cXt99rPGUPzospf7cBvn8aarb3znyFx6dNpmVH54tigz")], bundle: "�\u{1}����Ktransportid:HcMcJ77qfoxbNxgt5hiOhQXTuNu634pc9bbKcrDbWs9du8ghQGRgMVeb4wagaui��mem://addr_1/�\u{0}\u{0}\u{1}m���M" }

...From here, it starts over at net_worker_thread/puid-5-2f, trying to resolve Lib3hUri("transportid:HcMcJmwRIZE33p6vee3cXt99rPGUPzospf7cBvn8aarb3znyFx6dNpmVH54tigz"). To me that suggests gossip payloads are cycling back and forth endlessly and never being treated as final?
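To test the ping-pong hypothesis against a saved debug log, a small script can pull out the sequence of transport IDs passed to "gateway_transport: SendMessage, first resolving address" and measure how long the same two IDs keep alternating. This is a hypothetical diagnostic sketch: the helper names are made up, and it assumes the log has been saved to a text file in the format shown above.

```javascript
const fs = require('fs');

// Matches the lib3h debug line that precedes each outgoing message and
// captures the transport ID being resolved (format copied from the log above).
const RESOLVE_RE = /first resolving address Lib3hUri\("transportid:([A-Za-z0-9]+)"\)/;

// Return the ordered list of transport IDs that SendMessage tried to resolve.
function resolvedAddresses(logText) {
  const ids = [];
  for (const line of logText.split('\n')) {
    const m = RESOLVE_RE.exec(line);
    if (m) ids.push(m[1]);
  }
  return ids;
}

// Length of the longest run in which consecutive IDs alternate between two
// values (A, B, A, B, ...). A long run is consistent with gossip bouncing
// between two agents without ever settling.
function longestAlternation(ids) {
  let best = ids.length > 0 ? 1 : 0;
  let run = best;
  for (let i = 1; i < ids.length; i++) {
    const alternates = ids[i] !== ids[i - 1] && (i < 2 || ids[i] === ids[i - 2]);
    // If the pattern breaks, the last two IDs may still start a new alternation.
    run = alternates ? run + 1 : (ids[i] !== ids[i - 1] ? 2 : 1);
    best = Math.max(best, run);
  }
  return best;
}

// Convenience wrapper for a log saved to disk (the path is up to you).
function analyzeLogFile(path) {
  const ids = resolvedAddresses(fs.readFileSync(path, 'utf8'));
  return { sends: ids.length, longestAlternation: longestAlternation(ids) };
}
```

If the number of sends keeps growing while the longest alternation covers nearly all of them, the two conductors are only ever resending to each other, which matches the loop described above.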

So I guess sub-questions for this issue are:

  • Which networking implementation should I be using with Try-o-rama, and are there dependencies required to set it up?
    • If so, can I request a Holonix configuration to include those dependencies in an app? (CC @thedavidmeister)
  • Am I doing anything wrong in my tests (see link @ start of post) or orchestrator configuration which might be responsible for these hangs, or do I just need to wait for fixes here?

@maackle
Member

maackle commented Oct 17, 2019

Probably best to stick to n3h if it works. It's the only option with a chance of working with zero config; i.e., set globalConfig: {network: 'n3h'} in the orchestrator config.

The n3h error you show above comes from a native Node dependency, specifically:

Error: /home/pospi/projects/holo-rea/oce-holo/node_modules/better-sqlite3/build/better_sqlite3.node: undefined symbol: _ZN2v816FunctionTemplate3NewEPNS_7IsolateEPFvRKNS_20FunctionCallbackInfoINS_5ValueEEEENS_5LocalIS4_EENSA_INS_9SignatureEEEiNS_19ConstructorBehaviorE

Can you try running it in a Holonix nix-shell? That might help. By the way, there's no need to include n3h as a dependency.
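An "undefined symbol" error from a .node file often means the addon was compiled against a different Node.js/V8 version than the one loading it. That is an assumption about this particular failure, not something confirmed in the thread. A quick way to compare environments (inside and outside the nix-shell, say) is to print the running Node's ABI version and try loading the addon. The helper name is hypothetical.

```javascript
// Report the running Node.js version and native-module ABI, and whether a
// given dependency (e.g. 'better-sqlite3') can be loaded under it.
function checkNativeModule(name) {
  const report = {
    node: process.versions.node,
    abi: process.versions.modules, // NODE_MODULE_VERSION a compiled addon must match
    module: name,
    loaded: false,
    error: null,
  };
  try {
    require(name);
    report.loaded = true;
  } catch (err) {
    // A missing package surfaces as MODULE_NOT_FOUND; an ABI mismatch usually
    // surfaces as a dlopen failure (undefined symbol or module version error).
    report.error = err.code || err.message.split('\n')[0];
  }
  return report;
}
```

Running console.log(checkNativeModule('better-sqlite3')) under each Node binary shows whether the addon loads there; if it only fails under one of them, rebuilding it for that Node (for example with npm rebuild) is the usual remedy.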

For the other configs:

  • memory will only work with singleConductorMiddleware (importable from try-o-rama) since memory networking can't connect separate conductor processes. Not recommended at the moment.
  • websocket just plain doesn't work yet
  • sim1h will work if you also spin up a local DynamoDB instance (you can run dynamodb-memory from a holochain-rust nix-shell, but that's a bit of a pain if you don't already have holochain-rust cloned)
  • sim2h won't work at all at the moment, because support for it hasn't made it into a Holochain release yet (and you'll still need to run a local server for it)

sim1h is the most reliable; n3h is the only no-config option that may work.
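Following that advice, a test setup could prefer sim1h when a local DynamoDB endpoint is available and fall back to the zero-config 'n3h' string otherwise. This is a sketch: the DYNAMO_URL environment variable and the chooseNetwork name are hypothetical; only the two values it returns (the 'n3h' string and a { type: 'sim1h', dynamo_url } object) come from this comment.

```javascript
// Prefer sim1h (most reliable, but needs a DynamoDB endpoint); otherwise fall
// back to the zero-config 'n3h' string shortcut.
function chooseNetwork(env = process.env) {
  const dynamoUrl = env.DYNAMO_URL; // hypothetical variable name
  if (dynamoUrl) {
    return { type: 'sim1h', dynamo_url: dynamoUrl };
  }
  return 'n3h';
}

// The result is meant for the `network` key of the orchestrator's globalConfig.
const globalConfig = { network: chooseNetwork() };
```

Keeping the choice in one function means a CI job can opt into sim1h just by exporting the endpoint, while local runs keep the no-config default.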

Bear in mind that the network config can be either a string or an object with a "type" key; the string form is a shortcut. So both of these are valid config values:

globalConfig: {network: 'n3h'}

and

globalConfig: {network: {type: 'sim1h', dynamo_url: 'http://localhost:9000'}}

(There is no string shortcut for sim1h, only for n3h, memory, and websocket.)
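The string-or-object rule can be made explicit with a small normalizer that expands the shortcut strings into object form and rejects string forms that don't exist. This is a hypothetical helper, not part of try-o-rama: it encodes only the rules stated above, which also match the validation error in the original report (a string of "n3h", "memory", or "websocket", or an object). The expanded { type } objects are illustrative; the real conductor config for each type may need more fields.

```javascript
// String shortcuts accepted for `network`, per this thread.
const NETWORK_SHORTCUTS = new Set(['n3h', 'memory', 'websocket']);

// Expand a network config value into { type, ...options } form, or throw.
function normalizeNetwork(network) {
  if (typeof network === 'string') {
    if (!NETWORK_SHORTCUTS.has(network)) {
      throw new Error(`"${network}" has no string shortcut; use { type: "${network}", ... }`);
    }
    return { type: network };
  }
  if (network !== null && typeof network === 'object' && typeof network.type === 'string') {
    return { ...network }; // already in object form; return a shallow copy
  }
  throw new Error('network must be a shortcut string or an object with a "type" key');
}
```

With this, normalizeNetwork('sim1h') fails immediately with a message pointing at the object form, instead of surfacing as the nested type error seen earlier in the thread.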

@pospi
Author

pospi commented Oct 18, 2019

I am running from within Nix; see https://github.com/holo-rea/holo-rea/blob/master/default.nix

It looks like n3h isn't present in the latest Holonix releases? :/

10:08:25 info: 
☯☯☯ [[[CONDUCTOR alice]]]
☯ response: Err(ErrorMessage { msg: "Failed to execute \"n3h\" \"--version\": Os { code: 2, kind: NotFound, message: \"No such file or directory\" }" }
☯ 
☯ 
10:08:25 info: 
☸☸☸ [[[CONDUCTOR alice]]]
☸ stack backtrace:
☸    0: <failure::backtrace::Backtrace as core::default::Default>::default::hf47ec305011f6e9a (0x5569b349e5ff)
☸    1: holochain_net::ipc::n3h::sub_check_n3h_version::h2ab800f5bc859471 (0x5569b2f5e7ea)
☸    2: holochain_net::ipc::n3h::check_n3h_version::h13c40df9f85b4ff1 (0x5569b2f5cc43)
☸    3: holochain_net::ipc::n3h::get_verify_n3h::h25bfa3c137243952 (0x5569b2f58956)
☸    4: holochain_net::ipc::spawn::ipc_spawn::h39c6fce7925fc2ea (0x5569b2f5f94d)
☸    5: holochain_conductor_api::conductor::base::Conductor::boot_from_config::hfd0813cc0f559a5f (0x5569b2c01134)
☸    6: holochain::main::h42015483a858f682 (0x5569b36e2ec4)
☸    7: std::rt::lang_start::{{closure}}::hac2398bb76549806 (0x5569b36f02a3)
☸    8: main (0x5569b36dec8e)
☸    9: __libc_start_main (0x7f03d1fafb8e)
☸   10: _start (0x5569b28ef28a)
☸   11: <unknown> (0x0))
☸ 
10:08:25 info: 
☮☮☮ [[[CONDUCTOR alice]]]
☮ response: Err(ErrorMessage { msg: "Failed to execute \"n3h\" \"--appimage-extract-and-run\" \"--version\": Os { code: 2, kind: NotFound, message: \"No such file or directory\" }" }
☮ 
☮ stack backtrace:
☮    0: <failure::backtrace::Backtrace as core::default::Default>::default::hf47ec305011f6e9a (0x5569b349e5ff)
☮    1: holochain_net::ipc::n3h::sub_check_n3h_version::h2ab800f5bc859471 (0x5569b2f5e7ea)
☮ 
10:08:25 info: 
☉☉☉ [[[CONDUCTOR alice]]]
☉    2: holochain_net::ipc::n3h::check_n3h_version::h13c40df9f85b4ff1 (0x5569b2f5cd67)
☉    3: holochain_net::ipc::n3h::get_verify_n3h::h25bfa3c137243952 (0x5569b2f58956)
☉    4: holochain_net::ipc::spawn::ipc_spawn::h39c6fce7925fc2ea (0x5569b2f5f94d)
☉    5: holochain_conductor_api::conductor::base::Conductor::boot_from_config::hfd0813cc0f559a5f (0x5569b2c01134)
☉    6: holochain::main::h42015483a858f682 (0x5569b36e2ec4)
☉    7: std::rt::lang_start::{{closure}}::hac2398bb76549806 (0x5569b36f02a3)
☉    8: main (0x5569b36dec8e)
☉    9: __libc_start_main (0x7f03d1fafb8e)
☉   10: _start (0x5569b28ef28a)
☉   11: <unknown> (0x0))
☉ 

@maackle
Member

maackle commented Oct 18, 2019

@pospi those aren't actual problems. If n3h isn't available, it will be downloaded from the internet, but not before logging that unsettling error.
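The errors above come from the conductor running n3h --version (and then n3h --appimage-extract-and-run --version) and getting "No such file or directory". To find out before a test run whether n3h is already executable, a preflight check can run the same kind of command from Node. This is a hypothetical helper; it only checks that the binary runs and exits successfully, not that its version is the one the conductor expects.

```javascript
const { spawnSync } = require('child_process');

// Return true if `command ...args` can be executed and exits with status 0.
// spawnSync does not throw when the binary is missing; it sets result.error
// (code ENOENT), the same condition the conductor logs above.
function canRun(command, args = ['--version']) {
  const result = spawnSync(command, args, { stdio: 'ignore', timeout: 10000 });
  return !result.error && result.status === 0;
}
```

Calling canRun('n3h') before starting the orchestrator tells you up front whether the conductor will find n3h on PATH or have to fetch it.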

@pospi
Author

pospi commented Oct 18, 2019

"If there is no n3h available, it will be downloaded from the internet."

Hmm... not for me. Am I supposed to add it as a dependency of my test package's package.json?

@pospi
Author

pospi commented Oct 18, 2019

No, wait. I cleared the modules and re-added "n3h": "github:holochain/n3h#0.0.20-alpha" as a dependency, and it seems to be chewing through tests now.

Maybe it's worth adding n3h to peerDependencies? Is that still a thing?

@pospi
Author

pospi commented Oct 18, 2019

It's not major, but I'm also getting a lot of n3h debug output in my logs, even with logger: false. It begins with the following banner and is quite noisy:

★ ##############################################
★ # BEGIN N3H DEBUG SNAPSHOT DUMP
★ ##############################################

pospi changed the title from "Tests hang when writing to DNA" to "Networking bugs & correct network configuration for v0.0.32-alpha2" on Oct 18, 2019
@maackle
Member

maackle commented Oct 18, 2019

@pospi yeah, I was annoyed by that too. I think the latest n3h release contains holochain/n3h#114, which turns off the debug snapshot when the N3H_QUIET=1 environment variable is set.
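Assuming your n3h includes that fix, the variable only needs to reach the n3h process. Child processes spawned from Node inherit process.env by default, so exporting N3H_QUIET=1 in the shell that runs the tests (or passing it explicitly) should be enough; that try-o-rama and the conductor forward it unchanged to n3h is an assumption here. The sketch below shows the explicit form and checks that a spawned child actually sees the flag. Both helper names are hypothetical.

```javascript
const { spawnSync } = require('child_process');

// Copy an environment and switch off the n3h debug snapshot dump
// (N3H_QUIET=1, per holochain/n3h#114).
function quietN3hEnv(baseEnv = process.env) {
  return { ...baseEnv, N3H_QUIET: '1' };
}

// Spawn a throwaway Node child with that environment and report whether the
// child can read N3H_QUIET, i.e. whether the variable is passed down.
function childSeesQuietFlag() {
  const result = spawnSync(
    process.execPath,
    ['-e', 'process.stdout.write(process.env.N3H_QUIET || "")'],
    { env: quietN3hEnv(), encoding: 'utf8' },
  );
  return result.stdout === '1';
}
```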

@thedavidmeister

@pospi @maackle the next Holonix release that passes tests should include a working sim2h_server binary, so you can run your own server for testing.

@pospi
Author

pospi commented Jan 3, 2020

It looks like we can mostly close this issue. There isn't a solution yet, but this whole API is going to change in 0.3.x anyway, and I'm confident the issues outlined here will be fixed by the new networking backends.

pospi closed this as completed on Jan 3, 2020

3 participants