Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I2C ACK error counter makes the next i2c session timeout (IDFGH-8295) #9777

Closed
3 tasks done
weblqb opened this issue Sep 13, 2022 · 17 comments
Closed
3 tasks done

I2C ACK error counter makes the next i2c session timeout (IDFGH-8295) #9777

weblqb opened this issue Sep 13, 2022 · 17 comments
Assignees
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally

Comments

@weblqb
Copy link

weblqb commented Sep 13, 2022

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

What I want to do

My application is going to check if a slave device is plugged to the I2C bus by send a write byte operation. My expectation is that if there is no ack from slave after the app calling the address, the app gets to know there is not the specific slave device plugged.

What is the issue

The issue is that after the bus get a I2C_STATUS_ACK_ERROR, the SCL wire is pull down to GND. It makes the next I2C session occurs a timeout error. And only after then the I2C SCL can recover to high level and the bus can properly work. In addition, if there is no more i2c communication after a I2C_STATUS_ACK_ERROR, SCL line is going to be low level permanently.

Maybe I found the cause of the issue, but I'm not sure

I found that there is a RESET COUNTER mechanism dealing with the I2C_STATUS_ACK_ERROR, which is in driver/i2c.c, line 1153 to 1161. The Macro I2C_ACKERR_CNT_MAX, as shown below, is defined as 10 in the same file, which means that only if I2C_STATUS_ACK_ERROR appears 10 times, i2c hardware FSM can be reset. And I think that is why the i2c session next to a I2C_STATUS_ACK_ERROR is always timeout.

} else if (p_i2c->status == I2C_STATUS_ACK_ERROR) {
    clear_bus_cnt[i2c_num]++;
    if (clear_bus_cnt[i2c_num] >= I2C_ACKERR_CNT_MAX) {
        clear_bus_cnt[i2c_num] = 0;
        i2c_hw_fsm_reset(i2c_num);
    }
    ret = ESP_FAIL;
} else {

I changed this part to:

} else if (p_i2c->status == I2C_STATUS_ACK_ERROR) {
    clear_bus_cnt[i2c_num]++;
    if (clear_bus_cnt[i2c_num] >= I2C_ACKERR_CNT_MAX) {
        clear_bus_cnt[i2c_num] = 0;
    }
    i2c_hw_fsm_reset(i2c_num);
    ret = ESP_FAIL;
} else {

in which i totally ignore the RESET COUNTER by reset the FSM whenever I2C_STATUS_ACK_ERROR comes out. And this solve the problem.

What I want to ask

What is the RESET COUNTER for? I wonder whether it is fine to simply reset FSM once a ACK ERROR occurs. Will it cause any other problem with out RESET COUNTER mechanism?

@espressif-bot espressif-bot added the Status: Opened Issue is new label Sep 13, 2022
@github-actions github-actions bot changed the title I2C ACK error counter makes the next i2c session timeout I2C ACK error counter makes the next i2c session timeout (IDFGH-8295) Sep 13, 2022
@o-marshmallow
Copy link
Collaborator

Hi @weblqb ,

Can you please provide a bit more details? Which ESP-IDF version are you using? If master, which commit? On which target are you encountering this issue? Can you provide a small reproductible example?

The reset counter is necessary because the I2C hardware bus can get stuck, thus, if we detect an arbitrary amount of ACK error, it is necessary to perform a hardware reset on the bus.

@KonssnoK
Copy link
Contributor

KonssnoK commented Dec 19, 2022

we are facing the same issue.

v4.4.3-319-g0fb32e11a58
ESP32S3

We have 2 devices on our bus, and depending on the manufacturing board one is populated and the other is not.

the code is


    error_t err = dac121_init();
    if (err == SYS_OK) {
        dac.model = DAC_MODEL_DAC121;
    } else {
        error_t err = max580x_init(0, MAX580X_I2C_ADDR_DFLT);
        if (err == SYS_OK) {
            dac.model = DAC_MODEL_MAX580X;
        } else {
            return SYS_ENODEV;
        }
    }

With the current I2C code we have

                } else if (p_i2c->status == I2C_STATUS_ACK_ERROR) {
                    clear_bus_cnt[i2c_num]++;
                    if (clear_bus_cnt[i2c_num] >= I2C_ACKERR_CNT_MAX) {
                        clear_bus_cnt[i2c_num] = 0;
                        i2c_hw_fsm_reset(i2c_num);
                    }
                    ret = ESP_FAIL;
                } else {

when failing to find the first DAC, the I2C will not be able to find anymore the second DAC because the state machine is fucked up somehow.

by reducing

#define I2C_ACKERR_CNT_MAX             (1)

The second DAC is correctly seen on the I2C bus.

@o-marshmallow here is the I2C sequence.

W (00:00:00.227) hw: Detected hw type 2
READ 0                         -> DEVICEA  OK
READ 0                         -> DEVICEB OK
WRITEREAD 0               -> DEVICEA  OK
WRITEREAD 0               -> DEVICEB  OK

READ -1                        -> DAC1 ESP_FAIL ON NACK -> DEVICE NOT PRESENT
WRITEREAD 263            -> DAC2 ESP_TIMEOUT

Please either fix the I2C fsm
or change the I2C_ACKERR_CNT_MAX retries to 1.

@KonssnoK
Copy link
Contributor

we are facing the same issue.

Which esp-idf version? ("git describe --tags" output) And which SoC/module are you using?

updated

@o-marshmallow
Copy link
Collaborator

o-marshmallow commented Dec 20, 2022

Hi @KonssnoK ,

Thanks for the details, after testing on my side, I was able to reproduce the issue and it seems like it is a problem with the driver indeed. The problem is that after a NACK, the I2C hardware controller doesn't issue a STOP, which blocks the I2C bus. This is why in your case, resetting the bus after each NACK works. But this is a bit overkill and very (CPU) time consuming. The simplest and fastest solution is to force a STOP signal after NACK is received. Here is a patch that does exactly that. Please try the following patch:
i2c_nack.diff.txt

You can apply it with:

cd $IDF_PATH
patch -p1 < ~/path/to/patch/above/i2c_nack.diff.txt

(This patch needs to be applied on top of IDF v4.4(.x) versions, which is your case)

If this works for you, please inform me and I will create an internal merge request to fix it globally.

@o-marshmallow o-marshmallow self-assigned this Dec 20, 2022
@KonssnoK
Copy link
Contributor

yeah we didn't spend so much time on investigating what was wrong once we found a way to going around it.
I'll test and let you know

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Opened Issue is new labels Dec 21, 2022
@KonssnoK
Copy link
Contributor

@o-marshmallow i confirm you patch works as well.
I'll migrate our baseline to your implementation.
Cheers.

@o-marshmallow
Copy link
Collaborator

@KonssnoK Thanks for the confirmation! I am going to create an internal merge request for this fix then.

@AxelLin
Copy link
Contributor

AxelLin commented Jan 15, 2023

@KonssnoK Thanks for the confirmation! I am going to create an internal merge request for this fix then.

The fix is not yet available in any branch, is there anything wrong?

@o-marshmallow
Copy link
Collaborator

@AxelLin Sorry for the delay of response, other high priority tasks came in.

Further tests are required from our side to determine whether this is a hardware issue or a software issue. In fact, this problem doesn't appear on the ESP32 and ESP32-S2 but does appear on other targets I tested, even though the driver is the same.
Once I have determined for sure where the issue is, I will make a "real" fix.

@AxelLin
Copy link
Contributor

AxelLin commented Mar 15, 2023

@AxelLin Sorry for the delay of response, other high priority tasks came in.

Further tests are required from our side to determine whether this is a hardware issue or a software issue. In fact, this problem doesn't appear on the ESP32 and ESP32-S2 but does appear on other targets I tested, even though the driver is the same.

@o-marshmallow
Did you check #10811 ?
SOC_I2C_CMD_REG_NUM is 16 for esp32 and esp32s2, 8 for other SoCs.

@o-marshmallow
Copy link
Collaborator

Hello @AxelLin ,

This was not on my priority list anymore as other tasks came in. I am prioritizing it again, currently checking this could be a hardware issue or not. I will keep you informed.

@espressif-bot espressif-bot added Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally Resolution: Done Issue is done internally and removed Status: In Progress Work is in progress Resolution: NA Issue resolution is unavailable labels Apr 17, 2023
@KaeLL
Copy link
Contributor

KaeLL commented Apr 19, 2023

@o-marshmallow I presume it's it's just a software issue, then? Also ETA on the backports? (Especially release/v5.{0,1}?

@o-marshmallow
Copy link
Collaborator

Hi @KaeLL ,

Fortunately yes! It was a bug in the I2C timing configuration, backports have been created, they will come soon (v5.0, v4.4, v4.3)

@AxelLin
Copy link
Contributor

AxelLin commented Apr 22, 2023

@o-marshmallow
commit a0f8434 causes new issue: #11249

espressif-bot pushed a commit that referenced this issue Apr 22, 2023
* Closes #9777

This bug prevented SCL line to work properly after a NACK was received in master mode.
@KonssnoK
Copy link
Contributor

we are still using the following patch:
image
should we consider removing it after updating to 3cec3a0 ?

@AxelLin
Copy link
Contributor

AxelLin commented May 3, 2023

@o-marshmallow
v4.3 branch supports esp32C3, so I think it needs backport fix as well.

@o-marshmallow
Copy link
Collaborator

o-marshmallow commented May 4, 2023

@AxelLin The fix is on the way, the internal MR has been created already.

@KonssnoK If the patch is working for you, you can keep it, but ideally, it would be better to integrate the new fix as the driver used to put to I2C hardware module into an "undefined" behavior.

espressif-bot pushed a commit that referenced this issue May 19, 2023
* Closes #9777

This bug prevented SCL line to work properly after a NACK was received in master mode.
espressif-bot pushed a commit that referenced this issue May 19, 2023
* Closes #9777

This bug prevented SCL line to work properly after a NACK was received in master mode.
espressif-bot pushed a commit that referenced this issue May 20, 2023
* Closes #9777

This bug prevented SCL line to work properly after a NACK was received in master mode.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

6 participants