Skip to content
This repository has been archived by the owner on Feb 3, 2023. It is now read-only.

SPIKE: 2019 01 04 codecov returns #813

Closed
wants to merge 96 commits into from

Conversation

thedavidmeister
Copy link
Contributor

@thedavidmeister thedavidmeister commented Jan 4, 2019

progress report/squirl 2019-01-07

deeper rabbit hole than expected, originally was thinking it would be a simple matter of dependency hell then pinning some versions

something that has changed recently, either in tarpaulin or in our code, is changing the way that state setup/teardown is treated

  • current rust nightly is 2018-12-26
  • only version of tarpaulin that supports this nightly is 0.6.11

originally we had a problem due to nightly features and transitive dependencies, which is why we disabled tarpaulin: xd009642/tarpaulin#179

the transitive deps seem to be resolved

3 new issues appeared:

  • instrumentation clashes reported by the compiler
  • segfaults
  • bad context state access

instrumentation clashes

this appears to be caused by incremental compilation

RN the only solution is to disable incremental compilation, or accept the clashes

apparently the clashes aren't fatal, they indicate potentially inaccurate measurements due to LOC mappings being off, no idea how major/minor the problem might be (didn't get that far tbh)

the instrumentation clash warnings may not be errors as such, a multiple lines of code may map to the same addresses and that just warns on occurrences of that which may affect the accuracy of results

xd009642/tarpaulin#190 (comment)

segfaults

dunno, seeking help upstream xd009642/tarpaulin#190

I hope it has nothing to do with zmq pytorch/pytorch#2507

definitely never seen segfaults in anything else i've done with rust, but here's a guide that looks helpful to debug https://jvns.ca/blog/2017/12/23/segfault-debugging/

bad context state access

this is the bit i think we need to clean up first

actually i'm not sure exactly why our regular cargo test runs aren't failing

what i see is that we initialise a lot of "live" things (loops n' channels n' globals n' shit) in our tests into a "sort of" initialised state, then throw logic at them that assumes something fully initialised - these tests should fail, tarpaulin is doing the correct thing here i believe

some examples of what i mean:

  • Context::new creates None action channels, then reducers unwrap() action channels
  • the "mock" network has live ticks that can panic if the context state isn't ready for network logic
  • the action channels also have live ticks that can panic if the context state isn't ready for actions

this becomes extra gnarly to debug when we throw concurrency and inability to cleanly tear down our own state into the mix (i guess cargo test is forcefully dropping our state for us "fast enough" to prevent it blowing up, while tarpaulin keeps event loops "long enough" to panic 🤷‍♂️ )

on circle, and locally when running hc-tarpaulin i can setup a live ticking network that allows the test it was intended for to pass, then see that same ticking network blow up a None action loop 5 tests later because the context wasn't torn down properly in the original test (or is there some global state being reused here? 🤔 ) - i tested this by creating a mock network with no-ops to the test that died on a network tick

these are pretty common problems and we identified them a while ago, but now it's blocking us for whatever reason

proposal

i'm hoping none of this is controversial... i see this as a relatively straightforward tech debt cleanup thing

real question is when/how to prioritise it considering it seems to be a requirement for getting codecov back up and running (unless we get kcov working, that is also a potential alternative but a lot more complex from an ops perspective because it relies on kcov being installed locally correctly)

  • split everything cleanly into unit tests and integration/scenario tests
    • unit tests are stateless and make up 90% of our coverage
      • fast, reliable, focused, deterministic, etc.
      • is what TDD is talking about
      • use real mocks/fakes/stubs or whatever we want to call it https://en.wikipedia.org/wiki/Mock_object
      • deterministic, stateless, presents only the interface to what we are trying to test (no internals)
      • simplest example is implementing traits (e.g. NetWorker) where every method is a no-op, so we can satisfy the compiler when we're not actively testing that trait (e.g. not testing the network)
      • a mock context would be great, simulating the whole state but always returning known test data but every method is a no-op or at least free of side effects
    • integration/scenario tests may have state but must cleanly teardown any/all state and be minority of tests overall
      • slow, stateful, complex, high maintenance overhead over time
      • "shotgun" approach to testing, often overestimates codecov etc. simply by booting (often test frameworks allow integration tests to be ignored for codecov calculations)
      • harder to make deterministic
      • often need to block/wait on a lot of moving parts to progress to the next step
      • a little goes a long way https://testing.googleblog.com/2015/04/just-say-no-to-more-end-to-end-tests.html
      • clean state teardown is surprisingly not easy with low level rust primitives only (e.g. Arc) because you can't go "the other way" to trigger a teardown for everything holding a reference
      • should use fully initialised state to avoid false positive/negatives caused by simulating impossible states while missing real-world behaviour (e.g. don't have a live network ticking over without an action channel set 😅 ), we shouldn't write tests against things we know can't happen and should be writing tests for things we know will happen
      • no shared/global state for anything that can be setup/torn down between tests!
  • gracefully handle the case where optional state is None without panicking
    • e.g. the unwrap at the top of get_entry_from_dht is pointing to an action channel that defaults to None
    • pretty much any unwrap in the codebase that is pointing to a stateful Option should be treated with suspicion at this point
    • this goes the other way too, if we're unwrap()ing something, is it really optional?
  • deterministic tests
    • if the tests were deterministic, changing the testing harness would not break the test logic

prior art

@codecov
Copy link

codecov bot commented Jan 4, 2019

Codecov Report

❗ No coverage uploaded for pull request base (develop@0887587). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             develop     #813   +/-   ##
==========================================
  Coverage           ?   64.05%           
==========================================
  Files              ?      145           
  Lines              ?     5003           
  Branches           ?        0           
==========================================
  Hits               ?     3204           
  Misses             ?     1799           
  Partials           ?        0
Impacted Files Coverage Δ
net_connection/src/protocol.rs 61.91% <ø> (ø)
core/src/network/reducers/get_entry.rs 69.57% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0887587...2c5126a. Read the comment docs.

@codecov
Copy link

codecov bot commented Jan 4, 2019

Codecov Report

❗ No coverage uploaded for pull request base (develop@cd80cff). Click here to learn what that means.
The diff coverage is 34.62%.

Impacted file tree graph

@@            Coverage Diff             @@
##             develop     #813   +/-   ##
==========================================
  Coverage           ?   67.56%           
==========================================
  Files              ?      154           
  Lines              ?     5490           
  Branches           ?        0           
==========================================
  Hits               ?     3709           
  Misses             ?     1781           
  Partials           ?        0
Impacted Files Coverage Δ
cmd/src/cli/test_context.rs 0% <ø> (ø)
container_api/src/context_builder.rs 92.86% <ø> (ø)
core/src/network/reducers/get_entry.rs 65.39% <ø> (ø)
net/src/mock_worker.rs 77.64% <ø> (ø)
core/src/nucleus/ribosome/callback/genesis.rs 100% <ø> (ø)
core/src/dht/actions/add_link.rs 76.93% <ø> (ø)
core/src/network/reducers/send_direct_message.rs 60% <ø> (ø)
core/src/network/reducers/get_links.rs 72% <ø> (ø)
core/src/network/reducers/publish.rs 47.68% <ø> (ø)
net/src/p2p_config.rs 27.59% <0%> (ø)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cd80cff...115271e. Read the comment docs.

@thedavidmeister thedavidmeister changed the title WIP: 2019 01 04 codecov returns 2019 01 04 codecov returns Jan 4, 2019
@thedavidmeister
Copy link
Contributor Author

seeing this local:

.......................................thread '<unnamed>' panicked at 'Action channel not initialized', src/libcore/option.rs:1008:5
free(): invalid pointer
Error a segfault occured when executing test

first hit on google is something about zmq, hope that's a red herring -_-

pytorch/pytorch#2507

@thedavidmeister thedavidmeister changed the title WIP: 2019 01 04 codecov returns SPIKE: 2019 01 04 codecov returns Jan 7, 2019
@maackle
Copy link
Collaborator

maackle commented Jan 7, 2019

I have some more general thoughts I'd like to share, but I'll do that in a separate comment after this.

To reply to the immediate concerns here, I do feel it's very important to be able to shut things down completely, especially for testing. We're building a creature that is supposed to easily glom onto other similar creatures and start talking to them as soon as they're visible over a resilient, unenclosable network. That's exactly what we don't want for isolated tests! It seems we may already be running into some issues of lingering networks bleeding into each other, and it's only going to get worse.

The two ways that tests can interfere with each other are across time (the next test starts before the last one's network stopped) and across threads. To fully remedy that interference, we need

  • a way to shut down everything (partial work done in WIP: Proper shutdown of network and instances when stopping Container #803)
  • a way to make networks reliably independent per test case
    • this is partially handled by the practice of jostling the DNA UUID per test, but we currently run all mock networks on the same MockSingleton so that's a further way for them to be coupled. Add "network name" as mock network config #820 allows for separate mock networks but is not yet merged. It's not the best because it still depends on the user to make the networks unique by choosing a unique string, but at least this makes it possible.

@maackle
Copy link
Collaborator

maackle commented Jan 7, 2019

As promised, here are a lot more thoughts. TL;DR, this is all stuff that I think will help us prevent more of the kinds of problems in @thedavidmeister's report, but most of it is not immediately actionable in that we could use it to fix these tests in the next few weeks.

clean state teardown is surprisingly not easy with low level rust primitives only (e.g. Arc) because you can't go "the other way" to trigger a teardown for everything holding a reference

It's true, my PR for shutdown behavior was an attempt to shut things down "pretty well", but it's hard to be confident that it's a 100% clean shutdown as soon as we introduce Cloneable references like Arc and (indirectly in networking) Sender. Sometimes these are unavoidable but sometimes they are. In any case we can't be compiler-sure of shutdown unless we can actually deallocate and drop an entity, which is not possible for these cloneable references. The system is not super duper complex yet so someone who knows the architecture in and out could probably write some solid shutdown functionality (without the help of the compiler), but I'm not that person yet.

As things get more complex that shutdown ability might get hairy to maintain. @thedavidmeister and I talked about writing lifecycle management tools eventually. As soon as things get hairy (is that now?) we might consider that.

pretty much any unwrap in the codebase that is pointing to a stateful Option should be treated with suspicion at this point

I actually secretly spent an evening last month trying to see if I could refactor Context to remove state and manually inject the state in places where it's actually needed. It became clear that it's a huge undertaking; the state Arc in Context is deeply rooted. My takeaway was that State and Context are circularly dependent on each other, and that State doesn't really belong in Context.

Everything I've said so far points to a strong lesson I've learned from working with Rust and other languages: when working with a strong type system, trust the type system! If something turns out to be awkward to express, or feels downright wrong (many Option::unwraps), it's easy to blame it on the language and find a way to work around it more conveniently or comfortably. But really those difficulties are a blessing, they give us a chance to rethink the design to make it less stateful and prone to runtime inconsistencies. Rust has been hard to learn, it is a shame if we don't fully leverage its power. It slows us down, but in a good way (unless we don't heed its advice, and then it's just plain slow for no reason).

For instance, any time we have a type that contains an Option that's not really optional, we introduce a possible invalid state. One simple way we can avoid that is by splitting the type into two separate types, for instance Context and ContextWithState, so that we are forced use the one we need, and then all other code will have to reflect those two possible meanings of Context, which is more explicit and verbose, but also perfectly illustrates exactly what's going on with no question. Sometimes by restructuring things it falls out even more cleanly than that. If you look at paritytech's jsonrpc code, they make extensive use of the type system, doing stuff I can barely understand, to make strong compiler guarantees.

With well-placed use of types we can make invalid states uncompilable and hence impossible. That's so much more powerful than just being careful that we always initialize things in the right order. The closer we can get to this ideal the more resilient the code will be, the less we'll have to rely on integration tests, and the more we'll be able to onboard new people more quickly and collaborate in general.

I know a lot of code was written and established while still learning Rust, when it was hard to even get anything to compile. Maybe y'all have learned similar lessons already. I just wanted to say it explicitly even if we all know it now.

This has been a bit of a rant about something seemingly tangential, but I say it to back up @thedavidmeister's point that we could stand to be more disciplined and rigorous in testing. If we're also more rigorous in our use of the type system, we can greatly reduce the number of moving parts and our tests will cover more of the actual space of possible outcomes. If we do it to the utmost, we won't have to worry about whether something is partially or fully or incorrectly initialized; if the object exists then it is good to go. Then we can focus more on unit tests and all the logic the compiler can't check.

real question is when/how to prioritise it considering it seems to be a requirement for getting codecov back up and running

Yes, this is the real question. Most of what I suggested is more long-term and not things we can change in the code right now. I think in the short term, I think deterministic tests are high priority, and to get those we need clean shutdown and truly independent mock networks. As far as the correctness of state initialization, we just have to be careful, and moving forward, I'd like to see more of that carefulness offloaded to the type system.

@thedavidmeister
Copy link
Contributor Author

FYI all following this thread

there is some progress on the upstream segfaults:

xd009642/tarpaulin#190 (comment)

  • minimal repro identified
  • happens when there is more than one test with threads
  • probable cause identified through debug output

@zippy zippy added the upstream-action-needed upstream action need label May 6, 2019
@thedavidmeister
Copy link
Contributor Author

PR to just deal with segfaults and instrumentation issues in tarpaulin at #1405

@zippy zippy deleted the 2019-01-04-codecov-returns branch July 5, 2019 13:29
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
upstream-action-needed upstream action need
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants