New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISR intervenes micros() #297

Closed
agcaahmet opened this Issue Apr 4, 2017 · 22 comments

Comments

Projects
None yet
4 participants
@agcaahmet

agcaahmet commented Apr 4, 2017

If interrupt service routine is used and working, the micros() value jumps around 18 seconds (~17800000). I think, this value is close to the micros() overflow value that was previously fixed #267

@me-no-dev

This comment has been minimized.

Member

me-no-dev commented Apr 4, 2017

jumps to what? can you provide some code and debug?

@agcaahmet

This comment has been minimized.

agcaahmet commented Apr 4, 2017

The micros value suddenly increases by around 17890000 at random instants(I dont know if there is a defined pattern) . You can use the attached sketch with a pwm signal(50 Hz, 1-2 milliseconds duty) attached on pin 38.

sketch_apr04a.txt
sketch_apr04a_serial_output.xlsx

@AnalysIR

This comment has been minimized.

AnalysIR commented Sep 4, 2017

Just confirming that I see the same behaviour, with the same offset. (~17992004)

I tweaked the example above and can force the issue using an IR receiver & Remote on the input pin.
I also added a check to stop when issue occured/detected.
Using ESP-Wroom-32.

  • Also checked FALLING & RISING...same result.
  • removed the calls to micros() in the ISR & problem went away.
  • However, in my own app, I changed the calling of micros() from within the ISR to the loop() via flag and the problem still persisted. Conclusion, more likely to be related to an ISR & micros, even is micros is not called from ISR.

Updated sketch

volatile long dutyCycle_Thr = 0;
volatile long startTime_Thr = 0;
volatile long lastTime = 0;

void setup() {
  Serial.begin(115200);
  pinMode(18, INPUT);
  attachInterrupt(18, isrTHR, CHANGE);
  //  attachInterrupt(18, isrTHR, FALLING);
  // attachInterrupt(18, isrTHR, RISING);
}

void loop() {
  uint32_t usLoop = micros();
  if (usLoop - lastTime > 200000) { //check if time gap too large
    Serial.println("Boom! : ");
    Serial.println(usLoop - lastTime);
    while (1); //hang here once issue detected
  }
  if (usLoop - lastTime > 100000)
  {
    Serial.println(micros());
    lastTime = micros();
  }
}

void isrTHR(){
  if (digitalRead(18) == HIGH) {
    startTime_Thr = micros();
  }
  else {
    dutyCycle_Thr =  micros() - startTime_Thr;
  }
}
@me-no-dev

This comment has been minimized.

Member

me-no-dev commented Sep 4, 2017

the number 17895697 makes sense. That is the overflow of the ccounter in microseconds. Question is why it happens? micros() is implemented here and in general should not be possible to end up in such situation

@AnalysIR

This comment has been minimized.

AnalysIR commented Sep 4, 2017

@me-no-dev
Yes indeed, very strange. Unfortunately, I am not familiar enough with the ESP32 to help much.

However, it should be relatively easy for me to code a workaround for my own requirements in the short term, now that I know it can wrap on circa 17 secs.

@me-no-dev

This comment has been minimized.

Member

me-no-dev commented Sep 4, 2017

oh! I might have an idea! Can I bother you to try something, since you have a case to test?
Edit [esp32-arduino]/cores/esp32/esp32-hal-misc.c and change the code for micros() with the following:

static unsigned long IRAM_ATTR _micros()
{
    static unsigned long lccount = 0;
    static unsigned long overflow = 0;
    unsigned long ccount;
    __asm__ __volatile__ ( "rsr     %0, ccount" : "=a" (ccount) );
    if(ccount < lccount){
        overflow += UINT32_MAX / CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ;
    }
    lccount = ccount;
    return overflow + (ccount / CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ);
}

unsigned long IRAM_ATTR micros()
{
    unsigned long res;
    portENTER_CRITICAL_ISR(&microsMux);
    res = _micros();
    portEXIT_CRITICAL_ISR(&microsMux);
    return res;
}
@AnalysIR

This comment has been minimized.

AnalysIR commented Sep 4, 2017

I cheated and did it all in the sketch :)
renamend to myMicros()...&.....microsMux2...to avoid conflicts.

....hopefully not a problem. Result: FAIL, same as before :(

volatile long dutyCycle_Thr = 0;
volatile long startTime_Thr = 0;
volatile long lastTime = 0;

portMUX_TYPE microsMux2 = portMUX_INITIALIZER_UNLOCKED;

static unsigned long IRAM_ATTR _micros(){
    static unsigned long lccount = 0;
    static unsigned long overflow = 0;
    unsigned long ccount;
    __asm__ __volatile__ ( "rsr     %0, ccount" : "=a" (ccount) );
    if(ccount < lccount){
        overflow += UINT32_MAX / CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ;
    }
    lccount = ccount;
    return overflow + (ccount / CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ);
}

unsigned long IRAM_ATTR myMicros()
{
    unsigned long res;
    portENTER_CRITICAL_ISR(&microsMux2);
    res = _micros();
    portEXIT_CRITICAL_ISR(&microsMux2);
    return res;
}

void setup() {
  Serial.begin(115200);
  pinMode(18, INPUT);
  attachInterrupt(18, isrTHR, CHANGE);
  //  attachInterrupt(18, isrTHR, FALLING);
  // attachInterrupt(18, isrTHR, RISING);
}

void loop() {
  uint32_t usLoop = myMicros();
  if (usLoop - lastTime > 200000) { //check if time gap too large
    Serial.println("Boom! : ");
    Serial.println(usLoop - lastTime);
    while (1); //hang here once issue detected
  }
  if (usLoop - lastTime > 100000)
  {
    Serial.println(myMicros());
    lastTime = myMicros();
  }
}

void isrTHR(){
  if (digitalRead(18) == HIGH) {
    startTime_Thr = myMicros();
  }
  else {
    dutyCycle_Thr =  myMicros() - startTime_Thr;
  }
}
@AnalysIR

This comment has been minimized.

AnalysIR commented Sep 4, 2017

workaround....I had been working on this in the meantime....FYI: it seems that the wraparound can happen at any stage from the 1st to 240th within the 0xFFFFFFFF lifespan.

      deltaMicros = (micros() - oldMicros); 
      while (deltaMicros > 0x1111111) deltaMicros -= 0x1111111;

0x1111111 = 0xFFFFFFFF/240 (Almost & good enough)

This seems to take care of any issues for my purposes. It seems the wraparound can happen at any point

Thoughts:

  • may not be a wraparound but an extra increment of 'overflow' counter, which would suggest that the ISR is managing to increment it out of turn.
  • However, as it can happen before even 17 secs have elapsed, it suggests a problem with calling micros from an ISR. At this stage I can force it quite consistently.
  • Always happens only when ISR is being used (When I am sending an IR signal)

One set of output shows it happens @ 5 secs after startup

100002
200040
300061
400088
500108
Boom! : 
17955525
Next set shows on first 17 sec transition (coincidence = when I pressed hte remote)
900192
1000213
1100234
1200255
1300276
1400298
1500319
Boom! : 
17969375

Conclusion: its not related to the when the 17 sec boundary occurs, but when it happens it causes a 17 seconds delta in measurements.....I just did another test and when the error occurs, the micros value jumps by 1 overflow count(approx) and it can happen anywhere in the cycle (from 1->17 secs)

You can see the pattern here.....

15603323
15703344
15803365
15903386
Boom! : 
17987787
33891234
Boom! : 
17994742
51886036
Boom! : 
17904897
69790993
Boom! : 
17903445
87694498
87794520
87894541
87994562
Boom! : 
17918337
105912959
106012980
106113002
@me-no-dev

This comment has been minimized.

Member

me-no-dev commented Sep 4, 2017

This is so not the result I expected :( this could only happen if _micros() gets called at the same time from the loop and from the ISR. And that should not be possible now. Before, the second to call would have read the old lccount and trigger the addition of overflow again, thus I wrapped the whole logic inside CRITICAL. That should have taken care of overflow. But now that you say that it can happen at any point... totally speechless. Must be then something else. Maybe something resets ccount? That should not be...
@igrr @projectgus can you guys make sense of this?

@AnalysIR

This comment has been minimized.

AnalysIR commented Sep 4, 2017

I did some more measurements & it seems pretty clear that the issue is that overflow is being incremented - in error....when interrupts happen.
(In my own app, I can recover this by mod'ding any time delta by 0x1111111)

Is there any chance the 2 calls end up running on different cores, with the values for lccount or ccount being out of scope on one???

I have attached a PDF with a trace of what happens to the relevant values. There is a line output every sec approx.
You can see the higlighted lines where the 'jump error' in the overflow variable happens.

I suspect the error is actually what causes overflow to be incremented in error, rather than overflow itself.

which happens here: if(ccount < lccount)

So yes ccount is less than lccount, which shouldn't happen normally - other than a real overflow.

micros.pdf

@igrr

This comment has been minimized.

Member

igrr commented Sep 4, 2017

Which CPU does the ISR triggered by IR remote run on? Could micros also be called on CPU0 somehow?
Adding some ets_printf's of xPortGetCoreID might help answer these questions.

@me-no-dev

This comment has been minimized.

Member

me-no-dev commented Sep 5, 2017

Runs on CPU1 and it's the only interrupt. Still even if it does get called on both cores, the critical section should take care of it. And it overflows randomly... which makes even less sense

@igrr

This comment has been minimized.

Member

igrr commented Sep 5, 2017

Still even if it does get called on both cores, the critical section should take care of it

I don't think this is the case, because ccount values on two CPUs are not the same, but lccount value is shared between CPUs.

I would suggest adding an assert(xPortGetCoreID() == 1); into the micros function to quickly rule out this possibility if the issue is indeed related to something else. @AnalysIR, would you be able to do such a test?

@AnalysIR

This comment has been minimized.

AnalysIR commented Sep 5, 2017

Sounds promising....will try it out later & report back....tnx

@me-no-dev

This comment has been minimized.

Member

me-no-dev commented Sep 5, 2017

ccount values on two CPUs are not the same

Not good :D how should we coop with this?

@igrr

This comment has been minimized.

Member

igrr commented Sep 5, 2017

In general, you can not cope with this, and also there is no need to. In IDF 3.0, we will have a microsecond timer available to the applications, which would be guaranteed to give the correct values, accounting for CPU clock frequency changes, overflows, and so on. These things are pretty hard to get right, so i would suggest reusing this timer once it is exposed as a public API (currently the header file, esp_timer.h, is private).

As a short-term fix (if indeed the issue proves to be related to ccount differences), i would suggest keeping two (portNUM_PROCESSORS) lccount values, and using the one depending on the CPU ID (as returned by xPortGetCoreID).

@AnalysIR

This comment has been minimized.

AnalysIR commented Sep 5, 2017

I would suggest adding an assert(xPortGetCoreID() == 1); into the micros function to quickly rule out this possibility if the issue is indeed related to something else.

Tried that still fails as before. :(

@me-no-dev

This comment has been minimized.

Member

me-no-dev commented Sep 5, 2017

That confirms my point that it's running on the same core (Core1). Otherwise it would have been a bug.
Question why this happens still stands :(

I'll think of implementing that short-term idea, though I can't yet make up the logic that will not cause different micros on both cores or desync in the overflow.

@AnalysIR

This comment has been minimized.

AnalysIR commented Sep 22, 2017

UPDATE: FYI
Just updated to the latest version here (Latest commit d27d297 ) & this problem seems to be gone, based on the test sketches used above. I will now restart working on my own firmware again.
..tnx to everyone involved.
I guess this can now be closed? (for me at least)

@AnalysIR

This comment has been minimized.

AnalysIR commented Sep 22, 2017

Sorry, spoke too soon...the interrupts were not firing as expected....so still the same issue.

@AnalysIR

This comment has been minimized.

AnalysIR commented Sep 24, 2017

OK..I did some more testing.

I double checked the dates on the ESP32 core on my system...after following the instructions to update to the latest and noticed that my file dates were much older than the latest on github.
(the reason I found this is because I wanted to use the HTTPClient library & it was not installed on my system..even after I did an update. This eventually led me to figure out the differences in dates with the latest github version)

So I removed all ESP32 files from my system and went thru the full install again.

This time the issue seems to have gone away....so the problem looks like it was at least in part related to inadvertently using an older version of the repository.

From my perspective, please close this issue & thanks for all the great work on ESP32.

I dont know why the instructions for updating didn't work for me. I will make sure to double-check that in future.

@agcaahmet

This comment has been minimized.

agcaahmet commented Sep 24, 2017

That's great, thanks for all esp32 contributers. I havent had any time to test my code again but I am closing this issue based on your observations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment