Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pyboard SysTick losing ticks: interrupts sometimes locked out for 170ms #2424

Open
kentindell opened this issue Sep 16, 2016 · 8 comments
Open

Comments

@kentindell
Copy link

I've spent a couple of days chasing a timing bug and I think I've nailed the cause: the SysTick handler appears to sporadically miss out a chunk of time (about 170ms). I checked this by modifying the SysTick handler to monitor itself:

uint32_t tick_timestamp;
uint32_t last_tick_timestamp;
uint32_t tick_overruns = 0;
uint32_t last_overrun;

/**
  * @brief  This function handles SysTick Handler.
  * @param  None
  * @retval None
  */
void SysTick_Handler(void) {
    // Instead of calling HAL_IncTick we do the increment here of the counter.
    // This is purely for efficiency, since SysTick is called 1000 times per
    // second at the highest interrupt priority.
    extern volatile uint32_t uwTick;

    last_tick_timestamp = tick_timestamp;
    tick_timestamp = DWT->CYCCNT;
    if((tick_timestamp - last_tick_timestamp) > 336000U) { /* 2ms late */
       last_overrun = tick_timestamp - last_tick_timestamp;
        tick_overruns++;
    }

    uwTick += 1;

Basically it uses the debug CPU clock counter to see how long since it last ran. Obviously this is normally around 168000 clocks (i.e. 1ms), sometimes a bit less, sometimes a bit more (because of ISR jitter). But every now and then it misses a huge chunk of time. Here's the output of one of my test programs:

ERROR: elapsed_ms and ms_delta mismatch, margin=169
overruns=11, overrun by=168001(28862777)

There are a bunch of overruns at boot then everything is fine for a while. Then (in the above example) overrun 11 hit. The key number above is 28862777: when divided by / 168000 it is 171.8ms. Basically the SysTick handler itself sees that it hasn't run for a big chunk of time. Which can only happen if something is locking out all interrupts for a massive amount of time or SysTick has been disabled. I'm pretty sure there's nothing disabling it.

I wondered what might be locking out interrupts for such a huge chunk of time so I wrote a minimal test program:

from pyb import delay, rng
from time import ticks_cpu


def timestamps():
    start = ticks_cpu()
    print("Starting with {}".format(start))

    while True:
        delay_time = rng() % 200
        print("Delay time = {}ms".format(delay_time))
        before = ticks_cpu() & 0x3fffffff
        delay(delay_time)
        after = ticks_cpu() & 0x3fffffff
        realtime = (after - before) & 0x3fffffff
        realtime_ms = realtime / 168000

        if abs(realtime_ms - delay_time) > 1:
            print("ERROR: delay of {}ms actually took {}ms ({:08x},{:08x} = {})".format(delay_time, realtime_ms, after, before, after - before))
            return

I apply masks above to use only the bottom 30 bits to get modulo arithmetic working right for calculating delays.

I had to modify the ticks_cpu() function to return a longer range than just a small int (the clock runs so fast 16 bits enough):

STATIC mp_obj_t time_ticks_cpu(void) {
    static bool enabled = false;
    if (!enabled) {
        CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
        DWT->CYCCNT = 0;
        DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;
        enabled = true;
    }
    return mp_obj_new_int_from_uint(DWT->CYCCNT);
}
STATIC MP_DEFINE_CONST_FUN_OBJ_0(time_ticks_cpu_obj, time_ticks_cpu);

Here's an example run:

Delay time = 22ms
Delay time = 105ms
Delay time = 199ms
Delay time = 170ms
ERROR: delay of 170ms actually took 340.4643ms (1b10cd27,17a80774 = 57198003)

This affects all timing that is based on SysTick. So delay(), millis(), etc. all go wrong.

I investigated further into what is happening and I know what at least one of the culprits is: the flash file system mounted on USB. If you run this test program then copy something to the file system it trips the bug. But it also trips at random (which may be the host OS - Ubuntu in my case - touching the file system for indexing or something).

@dbc
Copy link
Contributor

dbc commented Sep 16, 2016

But it also trips at random (which may be the host OS - Ubuntu in my case - touching the file system for indexing or something).

Somebody who knows better than I will hopefully be along shortly, but I dimly recall from a while back that, yes, your host file system may be periodically hitting the flash mount point. That will lock interrupts. What happens if you unmount the pyboard flash drive?

edit:
I think this is related to the issue that I dimly remembered:
#1389
@dhylands will hopefully be along shortly with a more enlightened reply to your question.

@kentindell
Copy link
Author

That's my next experiment to do. I also want to see if SD vs. internal flash makes a difference, since SD card writes should be asynchronous. I can see why internal flash writes need severe concurrency control but 170ms is a huge chunk of time for all interrupts to be locked out (with knock-on consequences for time measurements, timeouts and timestamps).

@kentindell
Copy link
Author

An update: I ran a soak test with the drive unmounted and there was no SysTick ISR lockout and the delay functions udelay() (and delay()) spin for the correct amount of time.

@dpgeorge
Copy link
Member

There seems to be 2 separate things being reported here:

  1. SysTick losing ticks. This is bad and should not happen. Upon inspection of the flash and SD card code I can't immediately see any problem with them. They do block IRQs, but only priority 2 and above, so SysTick (at prio=0) should be active the entire time.

  2. pyb.delay() taking longer than requested. This is not really a bug: if you ask for a 1ms delay but there's lots of other processing that needs to take place "in the background" (eg servicing USB requests) then pyb.delay can't be expected to return early. (Think of it this way: pyb.delay() may return in the correct time period just before the USB request comes in, then the USB request takes ages but you never know about it.)

A 170ms delay is long and it would be nice to understand why it occurs. Is it just internal flash being mounted that gives the problem, or SD card, or both?

@dhylands
Copy link
Contributor

yeah - I looked through the code as well and could only find 2 places that even disable interrupts, one in extint.c and one in systick.c and they only disable for a very short time period. As @dpgeorge says, systick runs at the highest priority, so the systick interrupt itself should be running.

I can see lots of things that could delay your code doing the measurement (as @dpgeorge suggests).

@dpgeorge
Copy link
Member

I looked through the code as well and could only find 2 places that even disable interrupts

Also note that sdcard.c uses raise_irq_pri, but that should be fine as well.

@kentindell
Copy link
Author

"2) pyb.delay() taking longer than requested. This is not really a bug"

Yes, it's not a bug since such a delay is always "at least". I mentioned it only because it's using the SysTick as a timebase. It's the SysTick not running for 170ms that's the problem.

I will investigate SD vs internal flash next. I will put in more instrumentation to see if it's SysTick itself that's somehow taking 170ms or something else is locking it out.

@ryannathans
Copy link
Contributor

ERROR: delay of 170ms actually took 340.4643ms (1b10cd27,17a80774 = 57198003)

very interesting... I'm currently debugging a problem where my pyb.delay(n) is actually taking 2n. For example, all my 1000ms delays are actually taking 2000ms.

I'm using the STM32F439 compiled on the v1.9.4 tag. The problem does not exist on the pyboard lite running the v1.9.4 downloaded from the website. They have the same clock tree and clock tree config. My PLL settings are different on the F439 but I'm sure they're correct otherwise my USB wouldn't be working. The only difference I can see is I had an SD card on the F439 but not on the pyboard lite. I will test more on Monday...

tannewt pushed a commit to tannewt/circuitpython that referenced this issue Dec 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants