First test run usually fails #76

hannobraun · 2020-09-18T10:05:02Z

After connecting the target/assistant and flashing the firmware, the first run of the test suite usually fails.

Output from an example run:

running 2 tests
test it_should_read_input_level ... FAILED
test it_should_set_pin_level ... FAILED

failures:

---- it_should_read_input_level stdout ----
Error: TargetPinRead(Timeout)
thread 'it_should_read_input_level' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /home/hanno/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/macros.rs:16:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- it_should_set_pin_level stdout ----
Error: AssistantPinRead(Timeout)
thread 'it_should_set_pin_level' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`: the test returned a termination value with a non-zero status code (1) which indicates a failure', /home/hanno/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/macros.rs:16:9


failures:
    it_should_read_input_level
    it_should_set_pin_level

test result: FAILED. 0 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out

Log output of the assistant:

11:57:54.184 Starting assistant.
11:58:06.688 panicked at 'Error receiving from USART0: Usart(Framing)', src/main.rs:564:14

The target shows no sign of problems.

Doing a reset of the target and assistant using the reset button on the boards fixes the problem for me. Subsequent test runs are 100% reliable, as far as I can tell.

I've been aware of this problem for a while, but initially assumed that it is a problem with my machine (at the same time, I started seeing weird USB issues which have since gone away). I've received confirmation that others are having the same problem, so its definitely not something specific to me. If I recall directly, I was doing work on the USART API in LPC8xx HAL when I first saw this, so this might be an issue I introduced there.

The text was updated successfully, but these errors were encountered:

hannobraun · 2020-09-30T13:51:06Z

Here's a screenshot from PulseView that captures the problem:

Here's another version, but this time the overall communication is less garbled. You can clearly see the 4-1-1-1-0 sequence, which is a valid message:

For comparison, here's how it looks when it's working:

Observations:

There's a break condition on the line, which causes the framing error on the assistant.
The break occurs when opening the serial port, but only on the first open after connecting the USB cables.
Resetting the board makes no difference here. If I reset it after connecting and before running the test suite, the break condition still occurs. If I don't reset the board after it ocured once, the host will send valid data on subsequent test runs.
Sending data is not required to cause the break condition.

Here's the output, when I quit the test process right after opening the serial port:

I haven't figure out yet what causes this, but it seems to be some weird thing caused on the host side. I'm wondering what this looks like on another operating system (I'm seeing this on Arch Linux).

hannobraun · 2020-10-02T12:45:24Z

Some more notes.

First, I've asked myself if cargo-embed (or the act of flashing the boards in general) could somehow trigger this issue.

So far, after the error occurred, I've always reset the boards using the reset buttons to work around that (after that, the test runs work). I've tried to re-run cargo embed for both boards instead, which also clears the error (subsequent test runs work). This seems to indicate that cargo-embed doesn't trigger the error.
However, I've also tried unplugging the boards after the failure and plugging them in again. That works just as well as resetting them using the reset buttons.

So to summarize, after the error occurred:

If you reset the boards using the reset buttons, the error no longer occurs.
If you reset the boards by unplugging/re-plugging the USB cables, the error no longer occurs.
If you re-run cargo embed for both boards, the error no longer occurs.
But if you unplug/re-plug and run cargo embed for both boards, the error shows up again.

I don't know what to make of this.

Next, I've asked myself why the break condition occurs when communicating with the the assistant, not with the target.

If I change the order in which the serial ports are opened, it does not make a difference.
If I change the order in which the boards are connected (leading to switched assignment of /dev/ttyACM0//dev/ttyACM1 and update test-stand.toml accordingly, it does not make a difference.

I'm now starting to look into the test cases themselves, and if the order of operations there might affect how the error occurs.

hannobraun · 2020-10-06T12:27:38Z

This has gotten really weird :-)

So, the error is somehow caused by the code on the assistant that detects whether the RTS signal has changed. For some reason, it detects changes under the circumstances described above. And for some reason, it trying to notify the host about this, somehow causes the break condition.

My best guess is that the specific circumstances I described before cause electrical noise (and while it would be fascinating to dig into this, I don't think getting a degree in electrical engineering would be the best use of my time right now), that electrical noise causes the assistant to send messages to the host, and because all of this is going on while the host is connecting, it's too early for those messages, to which Linux reacts with the break condition. All of this is speculation, but it kind of makes sense.

I think the most practical way to solve this is to prevent the assistant from sending those messages. Either by somehow filtering out the noise, or by changing the protocol so that the host has to ask for changes, instead of the assistant just sending them.

hannobraun added the bug Something isn't working label Sep 18, 2020

hannobraun self-assigned this Sep 18, 2020

hannobraun closed this as completed in 7344dc3 Oct 7, 2020

hannobraun mentioned this issue Nov 20, 2020

STM32L4: First USART test after connecting boards often fails #93

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

First test run usually fails #76

First test run usually fails #76

hannobraun commented Sep 18, 2020

hannobraun commented Sep 30, 2020

hannobraun commented Oct 2, 2020

hannobraun commented Oct 6, 2020

First test run usually fails #76

First test run usually fails #76

Comments

hannobraun commented Sep 18, 2020

hannobraun commented Sep 30, 2020

hannobraun commented Oct 2, 2020

hannobraun commented Oct 6, 2020