ESP8266 Pin hard interrupt latency #2972

Open
peterhinch opened this issue Mar 21, 2017 · 16 comments

@peterhinch (Contributor) commented Mar 21, 2017

These measurements were similar prior to the soft interrupt firmware change.

The ESP8266 suffers from long and variable interrupt latency in response to pin level changes, even when clocked at 160MHz. The following script, when run for an hour, shows latency ranging from 43us to 739us (>100,000 clocks). @robert-hh has seen similar results using a different measurement approach.

Is there any scope for improvement?

Link pins 4 and 5.

from machine import Pin, freq, disable_irq, enable_irq, Timer
from utime import ticks_us, ticks_diff, sleep

pin = Pin(5, Pin.IN)
pout = Pin(4, Pin.OUT)
pout(0)
freq(160000000)
t = ticks_us()
tmax = 0
tmin = 10000
count = 0

def phan(line):
    global tmax, tmin
    delta = ticks_diff(ticks_us(), t)
    tmax = max(tmax, delta)
    tmin = min(tmin, delta)

def than(timr):
    global t
    t = ticks_us()
    pout(1)
    pout(0)

pin.irq(handler = phan, trigger = Pin.IRQ_RISING, hard = True)
tim = Timer(-1)
tim.init(period = 20, mode = Timer.PERIODIC, callback = than)

while True:
    print(count, tmax, tmin)
    count += 1
    sleep(1)

@dpgeorge (Member) commented Mar 21, 2017

It would be useful to measure the time taken by "one line" of Python code (eg pout(1)) to understand how that compares with the values you measure.

@robert-hh (Contributor) commented Mar 21, 2017

For me it looks as if your script adds up several sources of uncertainty, especially since the creation of the pulse and the response are handled by the same unit. As far as I can tell, timer callbacks can be interrupted. So I guess that the execution of 'than' can be interrupted by the 'phan' ISR, and also that there could be interrupts between taking the time and changing the GPIO level in 'than', which add up to the results you receive but say nothing about the ISR response itself. Nevertheless, using an external interrupt source and just taking the difference between two interrupts, I've seen similar results: rare cases of ~500µs latency.
@dpgeorge: We could measure the pulse width & variation created by a pout(1)/pout(0) pair. Doing that would also give an indication of whether the timer callback is itself interrupted.

@peterhinch (Contributor, Author) commented Mar 21, 2017

The time for a single instruction, tested over just 3 minutes, varied from 174us to 856us.

from machine import Pin, freq, disable_irq, enable_irq, Timer
from utime import ticks_us, ticks_diff, sleep

pin = Pin(5, Pin.IN)
pout = Pin(4, Pin.OUT)
pout(0)
freq(160000000)
t = ticks_us()
tmax = 0
tmin = 10000
t_one_max = 0
t_one_min = 10000
count = 0

def phan(line):
    global tmax, tmin
    delta = ticks_diff(ticks_us(), t)
    tmax = max(tmax, delta)
    tmin = min(tmin, delta)

def than(timr):
    global t, t_one_max, t_one_min
    t = ticks_us()
    pout(1)
    t1 = ticks_us()
    t_one_max = max(t_one_max, ticks_diff(t1, t))
    t_one_min = min(t_one_min, ticks_diff(t1, t))
    pout(0)

pin.irq(handler = phan, trigger = Pin.IRQ_RISING, hard = True)
tim = Timer(-1)
tim.init(period = 20, mode = Timer.PERIODIC, callback = than)

while True:
    print(count, tmax, tmin, t_one_max, t_one_min)
    count += 1
    sleep(1)

Eliminating the instruction - i.e. timing just the ticks_us() call - gave times from 11us to 450us. This suggests a higher-priority interrupt is intervening. The handler could disable interrupts, but that wouldn't help the latency issue.

def than(timr):
    global t, t_one_max, t_one_min
    t = ticks_us()
    t1 = ticks_us()
    t_one_max = max(t_one_max, ticks_diff(t1, t))
    t_one_min = min(t_one_min, ticks_diff(t1, t))
    pout(1)
    pout(0)

@robert-hh (Contributor) commented Mar 21, 2017

So I used another test set-up:
The interrupt source is a pulse generator which produces an 8µs pulse every 4 ms. This triggers the ESP8266 GPIO, which creates a single pulse in response, like this:

from machine import Pin
#
@micropython.native
def resp(p):
    global resp_pin
    resp_pin(0)
    resp_pin(1)

resp_pin = Pin(14, Pin.OUT, Pin.PULL_UP, value=1)
irq_pin = Pin(13, Pin.IN, Pin.PULL_UP)
irq_pin.irq(trigger = Pin.IRQ_FALLING, handler = resp, hard = True)

try:
    while True:
        pass
except:
    pass

irq_pin.irq(handler = None)

The signals were captured with an oscilloscope in persistence mode, and for special events in single-capture mode. I ran the ISR as byte code & native code, with hard & soft interrupts.
The results are, to some extent, as expected:
Native Code & hard IRQ: latency 20µs ... 270 µs, mostly below 80 µs
Native Code & soft IRQ: latency 20 µs .. 410 µs, mostly below 150 µs
Byte code & hard IRQ: latency 20 µs .. 270 µs, mostly below 80 µs
Byte code & soft IRQ: latency 20 µs .. 460 µs, mostly below 150 µs
The response pulse width is about 10 µs. With byte code the response pulse is sometimes extended up to about 300 µs, with native code up to about 25 µs.

@dpgeorge (Member) commented Mar 22, 2017

Doing a similar measurement for PYBv1.0@168MHz, with input pulses of 3us width every 5ms, I get a very stable ExtInt irq latency of 8.2us (up to about 8.6us on rare occasions; USB is disabled). The response pulse is 2us wide. This is all with bytecode, something similar to:

from pyb import Pin, ExtInt

pout = Pin('X1', Pin.OUT)
def pin_irq(pin, p=pout): # cache pout as a local for speed
    p(1)
    p(0)
ExtInt(Pin('X2', Pin.IN), ExtInt.IRQ_RISING, Pin.PULL_NONE, pin_irq)

The esp8266's Xtensa arch is not as efficient as Thumb, so it won't reach down to such low latency and response times. The esp also has a few sources of uncertainty that will add jitter to any realtime process:

  • code executed from external flash may need to be read in to iRAM cache, leading to long and unpredictable delays of code execution
  • if the flash is being read into cache then this may block the execution of interrupts (even if they're executed from iRAM)
  • the esp needs to handle wifi in the background and this can lead to interrupts being disabled, although I don't know what all the priorities are for this case
  • the uPy port of esp8266 has a hook in the VM which is called periodically to run the internal wifi code, and this will lead to variations in the time taken to execute the same bytecode loop (for example)

@peterhinch @robert-hh you may want to try running your timing tests with all wifi disabled, both AP and STA. That may give better results and help to understand where all the uncertainty comes from.
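
For reference, a minimal sketch of disabling both interfaces with the standard network module (the same calls appear in the later test script):

import network

# turn off both WiFi interfaces so background WiFi processing
# doesn't interfere with the latency measurement
network.WLAN(network.STA_IF).active(False)
network.WLAN(network.AP_IF).active(False)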

@robert-hh (Contributor) commented Mar 22, 2017

@dpgeorge : Hello Damien, thanks for the response. I had the impression that the longish delays of about 300 µs are caused by paging code in from flash to RAM. These effects never occurred in the ISR when it was defined as native code in RAM. I thought of repeating the test w/o WiFi, but then the benefit of the ESP is gone.
The minimum response pulse width I've seen should be the time needed for one port output instruction, which was 10µs most of the time, with rare extensions up to 300 µs when defined as byte code.
Lesson learned: do not use the ESP8266 for applications that need fast interrupt response. And at the moment it's for me just something to know & consider. The ESP8266 is cheap in almost all aspects, including its technical features, but still usable if you can cope with that. The STM32 family offers a much better quality.
Question: Is there a method in the ESP API to make an area of code 'sticky'? Or could the native code space in RAM be extended for applications which need more reliable response?

@peterhinch (Contributor, Author) commented Mar 22, 2017

Running my original script with WiFi disabled produced similar results, with latencies of 791us max, 44us min appearing within a few minutes of running. (My original test was performed with station mode active, AP mode off.)

I modified it so that the pin was pulsed in the main loop rather than in a timer ISR, with networking off. Results after 30 minutes were 619us max, 55us min. So outputting the pulse in a timer ISR appears to increase the worst case latency.

The maximum latency goes to ~250us in the first second. Events pushing it beyond that are evidently infrequent; it typically takes ~2 minutes before it jumps to ~600us.

from machine import Pin, freq
from utime import ticks_us, ticks_diff, sleep_ms
import network

wlan = network.WLAN(network.STA_IF)  # WiFi off
wlan.active(False)
ap = network.WLAN(network.AP_IF)
ap.active(False)

pin = Pin(5, Pin.IN)
pout = Pin(4, Pin.OUT)
pout(0)

freq(160000000)

t = 0
tmax = 0
tmin = 10000
count = 0

def phan(line):
    global tmax, tmin
    tn = ticks_us()
    delta = ticks_diff(tn, t)
    tmax = max(tmax, delta)
    tmin = min(tmin, delta)

pin.irq(handler = phan, trigger = Pin.IRQ_RISING, hard = True)

while True:
    for _ in range(50):
        t = ticks_us()
        pout(1)
        pout(0)
        sleep_ms(20)
    print(count, tmax, tmin)
    count += 1

@robert-hh (Contributor) commented Mar 22, 2017

@peterhinch : What happens if you write the function phan() as @micropython.native? I expect the worst-case response to go down to ~350 µs, because there is then no risk of the function not being in RAM when the trigger hits.

@peterhinch (Contributor, Author) commented Mar 22, 2017

@robert-hh 590us max, 69us min within 2 minutes of running (networking off). No significant difference as far as I can see.

@robert-hh (Contributor) commented Mar 22, 2017

@dpgeorge : Thanks for the hint about caching pout as a default parameter. With that, the minimum response time is 16 µs (max 180 µs), and the pulse width is 4 µs at an 80 MHz clock. At 160 MHz, the minimum response time and pulse width are 8/4 µs, but the max response time seen is still 180 µs.
@peterhinch : That's taken with an external trigger and signal capture by an oscilloscope.
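
For reference, a minimal sketch of the ISR pattern measured here - native code with the output pin cached as a default argument - assuming the pin 13/14 wiring of the earlier script:

from machine import Pin

resp_pin = Pin(14, Pin.OUT, value=1)
irq_pin = Pin(13, Pin.IN, Pin.PULL_UP)

@micropython.native
def resp(p, out=resp_pin):  # cache the Pin object as a default argument for speed
    out(0)
    out(1)

irq_pin.irq(trigger=Pin.IRQ_FALLING, handler=resp, hard=True)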

@dpgeorge (Member) commented Mar 22, 2017

@robert-hh are those latest figures (16/4 and 8/4) with native code or bytecode?

@robert-hh (Contributor) commented Mar 23, 2017

Native code & hard interrupt. Native code shortens the response pulse width (which was, b.t.w., 2 µs), and the hard interrupt limits the worst-case response time. A screenshot is attached. The green trace is the trigger pulse, the yellow the response, 20 µs/div. The scope trigger was set on the response pulse, showing both the response latency variation and the response pulse variation. The trigger pulse width is 4 µs. The persistence was set to infinite, so you can see all variants that occurred. Observation time 1 hour.
[screenshot: irq_resp_native_hard_cached160]
For comparison, here is the "worst case" figure, with bytecode ISR and soft interrupt, @80 MHz, 100 µs/div (colors: pink = trigger, yellow = response).
[screenshot: irq_resp_bytecode_soft]
Even if the majority of responses are in the 100µs range, there are outliers at about 420µs, and the response pulse is also at least once extended to about 300 µs, which looks like a cache miss between two instructions (e.g. an interrupt causing the ISR code to be dropped from the RAM).

@dpgeorge (Member) commented Mar 23, 2017

Thanks @robert-hh for the nice analysis and pics (and good tip to trigger on the leading edge of the response pulse).

A further discussion of this probably belongs on the forum. But to answer the original question of the issue:

Is there any scope for improvement?

It doesn't look like it, since in this case you are limited by the hardware (cache loading, and to some extent the need to process wifi events).

Note that the cache is still a problem even with the ISR handler in iRAM: the ISR is calling functions like pin.value() which live in flash and may need a cache line (or many) loaded.

The only way to have truly consistent (and low latency) response in a hard pin IRQ is to have the entire handler, and all the functions it calls, in iRAM. Because there is not much iRAM this is difficult. Even writing a pure asm_xtensa function won't be good enough because the wrapper code to call that asm function is in flash.
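
One partial mitigation, sketched here only as an illustration (not something proposed in this thread): a viper handler that writes the GPIO set/clear registers directly via ptr32, so no flash-resident pin() method is called from the ISR. The register addresses are assumptions taken from the ESP8266 register map, the pin is GPIO4 as in the scripts above, and as noted this still doesn't help when the cache itself blocks the interrupt:

@micropython.viper
def pulse_isr(line):
    gpio = ptr32(0x60000300)  # ESP8266 GPIO base address (assumed from the register map)
    gpio[1] = 1 << 4          # GPIO_OUT_W1TS: drive GPIO4 high
    gpio[2] = 1 << 4          # GPIO_OUT_W1TC: drive GPIO4 low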

@peterhinch (Contributor, Author) commented Mar 23, 2017

Thanks both. Interesting and informative, but I guess we should close this?

@robert-hh (Contributor) commented Mar 23, 2017

Agreed. Another observation: even when I disabled interrupts in the ISR around the pulse low/high instructions, there was some pulse width variation. So either machine.disable_irq does not disable all interrupts, or there was some clock stretching (tested at a 160 MHz clock). I consider that just an observation, since my conclusion is anyhow that the ESP8266 is not suitable for any task with hard timing constraints.
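
For reference, a minimal sketch of the pattern described, assuming the resp()/resp_pin setup from the earlier script:

from machine import disable_irq, enable_irq

@micropython.native
def resp(p):
    state = disable_irq()  # disable interrupts, remembering the previous state
    resp_pin(0)
    resp_pin(1)
    enable_irq(state)      # restore interrupts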

@pfalcon (Contributor) commented Mar 23, 2017

I wasn't able to follow through this thread so far, but early esp8266 hackers discovered that there's some jitter in GPIO operations even when working at the most basic level (iRAM, asm, etc.). See pvvx's posts on bbs.espressif.

uPy-wise, I'd guess the biggest reason for the large jitter seen is:

the uPy port of esp8266 has a hook in the VM which is called periodically to run the internal wifi code, and this will lead to variations in the time taken to execute the same bytecode loop (for example)
