Artemis ATP firmware compiled with Arduino_Apollo3 versions 2.x.x Hangs after about 1430 seconds (clock time) #388

Closed
nigelb opened this issue May 6, 2021 · 20 comments

nigelb commented May 6, 2021

Board: Sparkfun RedBoard Artemis ATP
Arduino Version: 1.8.13
Sparkfun Apollo3 Boards version 2.0.6, 2.0.3, and I assume the versions in between. This issue does not occur in version 1.2.1.

With the following sketch (derived from this example):

#define BAUD 115200       // any number, common choices: 9600, 115200, 230400, 921600
#define CONFIG SERIAL_8N1 // a config value from HardwareSerial.h (defaults to SERIAL_8N1)
int count = 0;
void setup() {
  Serial.begin(BAUD); // set the baud rate with the begin() method
  Serial.println("\n\nApollo3 - Serial");
}

void loop() {
  Serial.print("Count: ");
  Serial.print(count++);
  Serial.print(" ");
  Serial.println(millis());
  delay(1000);
}

After uploading and running, it hangs at count 1430 (or occasionally 1431):

Apollo3 - Serial
Count: 0 2
Count: 1 1002
Count: 2 2003
Count: 3 3004
.
.
.
Count: 1427 1428428
Count: 1428 1429429
Count: 1429 1430430
Count: 1430 1431431

The hang seems to depend on elapsed running time rather than on the count, because if I change the delay to 100 we get this:

Apollo3 - Serial
Count: 0 2
Count: 1 102
Count: 2 203
.
.
.
Count: 14171 1431272
Count: 14172 1431373
Count: 14173 1431474
Count: 14174 1431575

I have tested this on two different ATP boards and had a friend try it on his as well.
The issue occurred in every case where the sketch was compiled with a v2.x release of Arduino_Apollo3.

Wenn0101 commented May 6, 2021

Thanks for the report. This sounds like a pretty big issue. Testing now.

Wenn0101 commented May 6, 2021

I'm at 3300 right now on the latest release candidate with no hang... interesting...
I used your exact sketch.
I am switching now to straight 2.0.6.

Count: 3300 3303302
Count: 3301 3304303
Count: 3302 3305304
Count: 3303 3306305

Wenn0101 commented May 6, 2021

I am not getting this failure on v2.0.6 either:

Count: 2741 2743742
Count: 2742 2744743
Count: 2743 2745744
Count: 2744 2746745

Let's try to find the difference between what I am doing and what you are doing.

Board: Sparkfun RedBoard Artemis ATP
Arduino Version: 1.8.13
Sparkfun Apollo3 Boards version: 2.0.6
Operating System: Win10
Terminal Program: Arduino Serial Monitor

Is it possible you are having a problem with your serial monitor?

I am running this sketch unedited

#define BAUD 115200       // any number, common choices: 9600, 115200, 230400, 921600
#define CONFIG SERIAL_8N1 // a config value from HardwareSerial.h (defaults to SERIAL_8N1)
int count = 0;
void setup() {
  Serial.begin(BAUD); // set the baud rate with the begin() method
  Serial.println("\n\nApollo3 - Serial");
}

void loop() {
  Serial.print("Count: ");
  Serial.print(count++);
  Serial.print(" ");
  Serial.println(millis());
  delay(1000);
}

Were you having problems with a sketch that was doing something else or something more? I know the first few revisions of v2 had a problem where, after a certain number of BLE messages, the heap would fill and it would stop sending. Anything like that?

paulvha commented May 6, 2021

FYI: I just tried the same. No issues; it keeps counting...

Board: Sparkfun RedBoard Artemis ATP
Arduino Version: 1.8.13
Sparkfun Apollo3 Boards version: 2.0.6
Operating System: Ubuntu 20.04
Terminal Program: Arduino Serial Monitor
Sketch: as was posted by nigelb (copy/paste/compile/run)

Count: 1754 1755756
Count: 1755 1756757
Count: 1756 1757758
Count: 1757 1758759

regards,
Paul

Wenn0101 commented May 6, 2021

Update:
It is the next day and I am still running:

Count: 44615 44659616
Count: 44616 44660617
Count: 44617 44661618
Count: 44618 44662619
Count: 44619 44663620

@nigelb I am still very interested in this, as it was reproducible between you and a friend. Let me know if you can figure out what we are doing differently.

nigelb commented May 11, 2021

Hi All,

In my original post I did not mention that my friend and I are running Windows 10 and are using the Arduino Serial Monitor.

I just tested this issue on my Ubuntu PC:

Board: Sparkfun RedBoard Artemis ATP
Arduino Version: 1.8.13
Sparkfun Apollo3 Boards version: 2.0.6
Operating System: Ubuntu 20.04
Terminal Program: Arduino Serial Monitor

and after compiling and flashing the ATP it did not hang:

.
.
.
Count: 1759 1760760
Count: 1760 1761761
Count: 1761 1762762
Count: 1762 1763763
Count: 1763 1764764
Count: 1764 1765765

Surprisingly, the MD5 sums of the firmware binaries compiled on my Win10 and Linux machines matched:
Windows:

$ md5sum.exe Modified_Serial_2.ino.bin
0f61028df18d8832479b2c49fcb8068a *Modified_Serial.ino.bin

Linux:

$ md5sum Modified_Serial_2.ino.bin
0f61028df18d8832479b2c49fcb8068a  Modified_Serial.ino.bin

So I copied the firmware compiled on my Win10 machine over to my Linux machine and flashed it onto my ATP board:

$ /home/user/.arduino15/packages/SparkFun/hardware/apollo3/2.0.6/tools/uploaders/svl/dist/linux/svl /dev/ttyUSB0 -f /home/user/Modified_Serial.ino.bin -b 921600 -v 


Artemis SVL Bootloader
Script version 1.7

Phase:	Setup
	Cleared startup blip
	Got SVL Bootloader Version: 5
	Sending 'enter bootloader' command

Phase:	Bootload
	have 118944 bytes to send in 59 frames
	Sending frame #1, length: 2048
	Sending frame #2, length: 2048
	Sending frame #3, length: 2048
	Sending frame #4, length: 2048
	Sending frame #5, length: 2048
	Sending frame #6, length: 2048
	Sending frame #7, length: 2048
	Sending frame #8, length: 2048
	Sending frame #9, length: 2048
	Sending frame #10, length: 2048
	Sending frame #11, length: 2048
	Sending frame #12, length: 2048
	Sending frame #13, length: 2048
	Sending frame #14, length: 2048
	Sending frame #15, length: 2048
	Sending frame #16, length: 2048
	Sending frame #17, length: 2048
	Sending frame #18, length: 2048
	Sending frame #19, length: 2048
	Sending frame #20, length: 2048
	Sending frame #21, length: 2048
	Sending frame #22, length: 2048
	Sending frame #23, length: 2048
	Sending frame #24, length: 2048
	Sending frame #25, length: 2048
	Sending frame #26, length: 2048
	Sending frame #27, length: 2048
	Sending frame #28, length: 2048
	Sending frame #29, length: 2048
	Sending frame #30, length: 2048
	Sending frame #31, length: 2048
	Sending frame #32, length: 2048
	Sending frame #33, length: 2048
	Sending frame #34, length: 2048
	Sending frame #35, length: 2048
	Sending frame #36, length: 2048
	Sending frame #37, length: 2048
	Sending frame #38, length: 2048
	Sending frame #39, length: 2048
	Sending frame #40, length: 2048
	Sending frame #41, length: 2048
	Sending frame #42, length: 2048
	Sending frame #43, length: 2048
	Sending frame #44, length: 2048
	Sending frame #45, length: 2048
	Sending frame #46, length: 2048
	Sending frame #47, length: 2048
	Sending frame #48, length: 2048
	Sending frame #49, length: 2048
	Sending frame #50, length: 2048
	Sending frame #51, length: 2048
	Sending frame #52, length: 2048
	Sending frame #53, length: 2048
	Sending frame #54, length: 2048
	Sending frame #55, length: 2048
	Sending frame #56, length: 2048
	Sending frame #57, length: 2048
	Sending frame #58, length: 2048
	Sending frame #59, length: 160

	Upload complete

	Nominal bootload bps: 44413.86

🎉 The hanging issue no longer occurs:

.
.
.
Count: 1628 1629629
Count: 1629 1630630
Count: 1630 1631631
Count: 1631 1632632
Count: 1632 1633633

At this point, on the Win10 machine, I closed Arduino and deleted the C:\Users\user\AppData\Local\Arduino15\ directory.
Then, after reinstalling Sparkfun Apollo3 Boards version 2.0.6 and re-flashing my ATP board:

.
.
.
Count: 1500 1501501
Count: 1501 1502502
Count: 1502 1503503
Count: 1503 1504504

The hanging issue no longer occurs.
I will confirm that this fixes the issue for my friend as well.

Wenn0101 reopened this May 24, 2021

Wenn0101 commented May 24, 2021

Reopened after a report from @nseidle that he was getting the same issue.

He sees a failure after ~23 minutes. His original example was a BLE project that would quit after 23 minutes, but he is able to reproduce the problem with this sketch:

int count = 0;
void setup() {
  Serial.begin(115200); // set the baud rate with the begin() method
  Serial.println("\n\nApollo3 - Serial");
}

void loop() {
  Serial.print("Count: ");
  Serial.print(count++);
  Serial.print(" ");
  Serial.println(micros());
  delay(10);
}

I am trying to reproduce the problem on my end.

paulvha commented May 24, 2021

I'll do the same on 2.1.0.

Wenn0101 commented May 24, 2021

Thanks @paulvha, any luck? I should add that he was running with:
Windows 10
Arduino v1.8.13
v2.1.0 core
Artemis Nano

I used the same platform and version and am not getting the problem. This is an interesting problem.

paulvha commented May 24, 2021 via email

Wenn0101 commented May 24, 2021

I think it's unlikely to be a serial issue. The original observation from Nate was a BLE sketch (not connected over serial) that simply stopped showing up and stopped blinking a heartbeat LED after 23 minutes.

I am also unable to reproduce the problem. I am trying to figure out why this problem seems to happen to only some users. I find it interesting that Nigel was able to fix the problem on his end by re-installing, and that the compiled binaries appeared to be the same between working and non-working builds. Perhaps there is a problem with the tools, but I can't see how it would cause this issue. Mostly just typing my thoughts right now.

paulvha commented May 24, 2021 via email

paulvha commented May 25, 2021

It has been running for more than 12 hours without a problem:
Count: 3665488 919823695
Count: 3665489 919835698
Count: 3665490 919847697
Count: 3665491 919859695
Count: 3665492 919871695
Count: 3665493 919883698
Count: 3665494 919895697
Count: 3665495 919907695
Count: 3665496 919919695
Count: 3665497 919931698
Count: 3665498 919943697
Count: 3665499 919955695
Count: 3665500 919967695
Count: 3665501 919979698
Switching it off now.

regards,
Paul

@Wenn0101

Thanks for the help, Paul! This testing is so valuable in helping me narrow down who this affects.

-Kyle

Wenn0101 commented May 29, 2021

Ok, here is what I have found.

The problem appeared to be mostly board specific - a failing board would typically fail regardless of the computer it was being used on, and a nonfailing board would work on computers originally thought to be suspect.

Interesting information that led to the resolution:
The stimer will roll over in 2^32 / 3 MHz = 1431.65 seconds.
The stimer overflow interrupt is not handled, because the core does not enable that interrupt.
This suggests that, for some reason, the stimer rollover interrupt is firing anyway (only on the affected boards), causing the undefined behavior.

Explicitly disabling this interrupt in the us_ticker setup should resolve this issue on affected boards.

am_hal_stimer_int_disable(AM_HAL_STIMER_INT_OVERFLOW);

I have confirmed this on the 1 board I have that would fail, waiting on Nate to confirm that the problem is resolved on his boards.

paulvha commented May 30, 2021

Interesting... Could it be a leftover from the SVL bootloader, where the interrupt is set and used to autodetect the baud rate? The bootloader does perform a disable before starting the loaded app, but maybe that is not happening on all boards. A timing issue, perhaps? I load the sketches on the ATP board with the ASB uploader.

@Wenn0101

I was thinking maybe an older version of the SVL bootloader doesn't disable it before loading the app. I did notice all of the "problem" boards are older. I haven't looked into it yet, but I like the reasoning that it could be the bootloader. Either way I think the fix is to have the app explicitly disable it, to be safe.

paulvha commented Jun 1, 2021

Look at this post that I saw today: https://forum.sparkfun.com/viewtopic.php?f=168&t=5139. They had the same issue.
regards,
Paul

nigelb commented Jun 7, 2021

@Wenn0101 and @paulvha, my friend actually could not fix the issue by re-installing. When I think back, I believe I tried both the SVL and ASB bootloaders and managed to break the SVL bootloader somehow. I had to "Burn Bootloader" to fix the issue. In the same test iteration I re-installed the "Sparkfun Apollo3 Boards" platform. After this everything was working. My friend is going to try updating the bootloader and see if this solves his problem. We both purchased our boards around 2019.

Wenn0101 added this to the v2.1.1 milestone Jun 10, 2021

@Wenn0101

This should be fixed as of v2.1.1
