Skip to content
This repository has been archived by the owner on Jun 13, 2023. It is now read-only.

First test run usually fails #76

Closed
hannobraun opened this issue Sep 18, 2020 · 3 comments
Closed

First test run usually fails #76

hannobraun opened this issue Sep 18, 2020 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@hannobraun
Copy link
Owner

After connecting the target/assistant and flashing the firmware, the first run of the test suite usually fails.

Output from an example run:

running 2 tests
test it_should_read_input_level ... FAILED
test it_should_set_pin_level ... FAILED

failures:

---- it_should_read_input_level stdout ----
Error: TargetPinRead(Timeout)
thread 'it_should_read_input_level' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /home/hanno/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/macros.rs:16:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- it_should_set_pin_level stdout ----
Error: AssistantPinRead(Timeout)
thread 'it_should_set_pin_level' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /home/hanno/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/macros.rs:16:9


failures:
    it_should_read_input_level
    it_should_set_pin_level

test result: FAILED. 0 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out

Log output of the assistant:

11:57:54.184 Starting assistant.
11:58:06.688 panicked at 'Error receiving from USART0: Usart(Framing)', src/main.rs:564:14

The target shows no sign of problems.

Doing a reset of the target and assistant using the reset button on the boards fixes the problem for me. Subsequent test runs are 100% reliable, as far as I can tell.

I've been aware of this problem for a while, but initially assumed that it is a problem with my machine (at the same time, I started seeing weird USB issues which have since gone away). I've received confirmation that others are having the same problem, so its definitely not something specific to me. If I recall directly, I was doing work on the USART API in LPC8xx HAL when I first saw this, so this might be an issue I introduced there.

@hannobraun hannobraun added the bug Something isn't working label Sep 18, 2020
@hannobraun hannobraun self-assigned this Sep 18, 2020
@hannobraun
Copy link
Owner Author

Here's a screenshot from PulseView that captures the problem:
pulse-view-error

Here's another version, but this time the overall communication is less garbled. You can clearly see the 4-1-1-1-0 sequence, which is a valid message:
pulse-view-error-less-garbled

For comparison, here's how it looks when it's working:
pulse-view-working

Observations:

  • There's a break condition on the line, which causes the framing error on the assistant.
  • The break occurs when opening the serial port, but only on the first open after connecting the USB cables.
  • Resetting the board makes no difference here. If I reset it after connecting and before running the test suite, the break condition still occurs. If I don't reset the board after it ocured once, the host will send valid data on subsequent test runs.
  • Sending data is not required to cause the break condition.

Here's the output, when I quit the test process right after opening the serial port:
pulse-view-open-only

I haven't figure out yet what causes this, but it seems to be some weird thing caused on the host side. I'm wondering what this looks like on another operating system (I'm seeing this on Arch Linux).

@hannobraun
Copy link
Owner Author

Some more notes.

First, I've asked myself if cargo-embed (or the act of flashing the boards in general) could somehow trigger this issue.

  • So far, after the error occurred, I've always reset the boards using the reset buttons to work around that (after that, the test runs work). I've tried to re-run cargo embed for both boards instead, which also clears the error (subsequent test runs work). This seems to indicate that cargo-embed doesn't trigger the error.
  • However, I've also tried unplugging the boards after the failure and plugging them in again. That works just as well as resetting them using the reset buttons.

So to summarize, after the error occurred:

  • If you reset the boards using the reset buttons, the error no longer occurs.
  • If you reset the boards by unplugging/re-plugging the USB cables, the error no longer occurs.
  • If you re-run cargo embed for both boards, the error no longer occurs.
  • But if you unplug/re-plug and run cargo embed for both boards, the error shows up again.

I don't know what to make of this.

Next, I've asked myself why the break condition occurs when communicating with the the assistant, not with the target.

  • If I change the order in which the serial ports are opened, it does not make a difference.
  • If I change the order in which the boards are connected (leading to switched assignment of /dev/ttyACM0//dev/ttyACM1 and update test-stand.toml accordingly, it does not make a difference.

I'm now starting to look into the test cases themselves, and if the order of operations there might affect how the error occurs.

@hannobraun
Copy link
Owner Author

This has gotten really weird :-)

So, the error is somehow caused by the code on the assistant that detects whether the RTS signal has changed. For some reason, it detects changes under the circumstances described above. And for some reason, it trying to notify the host about this, somehow causes the break condition.

My best guess is that the specific circumstances I described before cause electrical noise (and while it would be fascinating to dig into this, I don't think getting a degree in electrical engineering would be the best use of my time right now), that electrical noise causes the assistant to send messages to the host, and because all of this is going on while the host is connecting, it's too early for those messages, to which Linux reacts with the break condition. All of this is speculation, but it kind of makes sense.

I think the most practical way to solve this is to prevent the assistant from sending those messages. Either by somehow filtering out the noise, or by changing the protocol so that the host has to ask for changes, instead of the assistant just sending them.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant