Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Threading issue with mcp23017 platform when used by both binary_sensor and switch integration #31867

Closed
jpcornil-git opened this issue Feb 15, 2020 · 31 comments

Comments

@jpcornil-git
Copy link
Contributor

jpcornil-git commented Feb 15, 2020

The problems

When a mcp23017 platform is instantiated by e.g. a binary_sensor and a switch component (running in different threads):

  • there is no protection mechanism/mutex to protect non atomic operations, e.g. a read-modify-write sequences -> inconsistent results when both threads access hardware at the same time (typically when hass starts up and each thread runs its setup_platform)
  • platform is initialized twice and 2nd initialisation/thread may therefore overwrite settings already made by the first one, e.g. switch thread ahead and already configuring pin as output with binary_sensor thread behind and instantiating a new MCP23017 object (that resets all directions to input)

I fixed the issue in my view by adding (See Additional information section below):

  • a (class) mutex in the MCP23017 class instantiated by both binary_sensor and a switch components
  • thread synchronization when accessing MCP23017 using above mutex in both binary_sensor and switch components
  • a class boolean in the MCP23017 class indicating that initialization has already been performed

Note that issue may be generic/affect others components sharing a common platform

Environment

Problem-relevant configuration.yaml

binary_sensor:
  - platform: mcp23017
    i2c_address: 0x27
    scan_interval: 1
    invert_logic: true
    pins:
      8 : Button_0
      9 : Button_1
      10: Button_2
      11: Button_3
      12: Button_4
      13: Button_5
      14: Button_6
      15: Button_7

switch:
  - platform: mcp23017
    i2c_address: 0x27
    pins:
      0 : Sortie_0
      1 : Sortie_1
      2 : Sortie_2
      3 : Sortie_3
      4 : Sortie_4
      5 : Sortie_5
      6 : Sortie_6
      7 : Sortie_7

Traceback/Error logs

None

Additional information

/srv/homeassistant/lib/python3.7/site-packages/adafruit_mcp230xx.py:

  • Addition of :
    • mutex and isInit class variables
    • condition to prevent a second initialization
...
import threading

class MCP23017:
    """Initialize MCP23017 instance on specified I2C bus and optionally
    at the specified I2C address.
    """
    mutex  = threading.Lock()
    isInit = False

    def __init__(self, i2c, address=_MCP23017_ADDRESS):
        self._device = i2c_device.I2CDevice(i2c, address)
        if not MCP23017.isInit:
            MCP23017.isInit = True
            # Reset to all inputs with no pull-ups and no inverted polarity.
            self.iodir = 0xFFFF
            self.gppu = 0x0000
            self._write_u16le(_MCP23017_IPOLA, 0x0000)
...

/srv/homeassistant/lib/python3.7/site-packages/homeassistant/components/mcp23017/binary_sensor.py:

  • Protection of all MCP23017 accesses by a mutex
...

def setup_platform(hass, config, add_devices, discovery_info=None):
    """Set up the MCP23017 binary sensors."""
    pull_mode = config[CONF_PULL_MODE]
    invert_logic = config[CONF_INVERT_LOGIC]
    i2c_address = config[CONF_I2C_ADDRESS]

    i2c = busio.I2C(board.SCL, board.SDA)
    with adafruit_mcp230xx.MCP23017.mutex:
        mcp = adafruit_mcp230xx.MCP23017(i2c, address=i2c_address)

    binary_sensors = []
    pins = config[CONF_PINS]

    for pin_num, pin_name in pins.items():
        pin = mcp.get_pin(pin_num)
        binary_sensors.append(
            MCP23017BinarySensor(pin_name, pin, pull_mode, invert_logic)
        )

    add_devices(binary_sensors, True)

class MCP23017BinarySensor(BinarySensorDevice):
    """Represent a binary sensor that uses MCP23017."""

    def __init__(self, name, pin, pull_mode, invert_logic):
        """Initialize the MCP23017 binary sensor."""
        self._name = name or DEVICE_DEFAULT_NAME
        self._pin = pin
        self._pull_mode = pull_mode
        self._invert_logic = invert_logic
        self._state = None

        with adafruit_mcp230xx.MCP23017.mutex:
            self._pin.direction = digitalio.Direction.INPUT
            self._pin.pull = digitalio.Pull.UP

    @property
    def name(self):
        """Return the name of the sensor."""
        return self._name

    @property
    def is_on(self):
        """Return the state of the entity."""
        return self._state != self._invert_logic

    def update(self):
        """Update the GPIO state."""
        with adafruit_mcp230xx.MCP23017.mutex:
            self._state = self._pin.value

/srv/homeassistant/lib/python3.7/site-packages/homeassistant/components/mcp23017/switch.py:

  • Protection of all MCP23017 accesses by a mutex
def setup_platform(hass, config, add_entities, discovery_info=None):
    """Set up the MCP23017 devices."""
    invert_logic = config.get(CONF_INVERT_LOGIC)
    i2c_address = config.get(CONF_I2C_ADDRESS)

    i2c = busio.I2C(board.SCL, board.SDA)

    with adafruit_mcp230xx.MCP23017.mutex:
        mcp = adafruit_mcp230xx.MCP23017(i2c, address=i2c_address)

    switches = []
    pins = config.get(CONF_PINS)
    for pin_num, pin_name in pins.items():
        pin = mcp.get_pin(pin_num)
        switches.append(MCP23017Switch(pin_name, pin, invert_logic))
    add_entities(switches)


class MCP23017Switch(ToggleEntity):
    """Representation of a  MCP23017 output pin."""

    def __init__(self, name, pin, invert_logic):
        """Initialize the pin."""
        self._name = name or DEVICE_DEFAULT_NAME
        self._pin = pin
        self._invert_logic = invert_logic
        self._state = False

        with adafruit_mcp230xx.MCP23017.mutex:
            self._pin.direction = digitalio.Direction.OUTPUT
            self._pin.value = self._invert_logic

    @property
    def name(self):
        """Return the name of the switch."""
        return self._name

    @property
    def should_poll(self):
        """No polling needed."""
        return False

    @property
    def is_on(self):
        """Return true if device is on."""
        return self._state

    @property
    def assumed_state(self):
        """Return true if optimistic updates are used."""
        return True

    def turn_on(self, **kwargs):
        """Turn the device on."""
        with adafruit_mcp230xx.MCP23017.mutex:
            self._pin.value = not self._invert_logic
        self._state = True
        self.schedule_update_ha_state()

    def turn_off(self, **kwargs):
        """Turn the device off."""
        with adafruit_mcp230xx.MCP23017.mutex:
            self._pin.value = self._invert_logic
        self._state = False
        self.schedule_update_ha_state()
@probot-home-assistant
Copy link

Hey there @jardiamj, mind taking a look at this issue as its been labeled with a integration (mcp23017) you are listed as a codeowner for? Thanks!

@Misiu
Copy link
Contributor

Misiu commented Feb 17, 2020

@jpcornil-git I'm working on PCF8574 integration, so your finding is very useful.
I'll wait to see what others have to say about this problem.
Do you have other hardware to test this problem? pcal9535a?

@jpcornil-git
Copy link
Contributor Author

No I don't but can help eventually, e.g. add debug code to expose the issue/thread activities as I did for my MCP23017-based board (https://github.com/jpcornil-git/RPiHat_GPIO_Expander)
Cheers
jpc

@Misiu
Copy link
Contributor

Misiu commented Feb 17, 2020

@jpcornil-git thank you for the link!
Let's wait for others to look into that issue.

@jardiamj
Copy link
Contributor

@StephenBeirlaen, this is most likely the reason we were having issues using the INTFLAG for implementing the Interrupt logic into the component.
In order to implement this changes to the Adafruit_CircuitPython_MCP230xx would need to be done through a PR first.
Adafruit's i2c_device implementation is designed so you can lock the i2c bus and device address: https://github.com/adafruit/Adafruit_CircuitPython_BusDevice/blob/master/adafruit_bus_device/i2c_device.py
But their implementation of the MCP230xx only locks the device address on every operation (read/write): https://github.com/adafruit/Adafruit_CircuitPython_MCP230xx/blob/master/adafruit_mcp230xx/mcp230xx.py
Some sort of refactoring of the library might be needed, but I haven't put that much thought into it yet. Any suggestions?

@jpcornil-git
Copy link
Contributor Author

My changes aren't generic enough; I would protect/lock by I2C address ideally (when you have multiple identical devices at different addresses you don't want to block access for all of them), same with the initialization (you want to initialize all of them but only once not everytime a new instance of the same {device, address} is made)

@StephenBeirlaen
Copy link

Nice findings, it looks promising! I can take a closer look tomorrow.

@StephenBeirlaen
Copy link

Very nice and detailed post @jpcornil-git! I also found similar problems where multiple I2C signals appeared to 'collide'. I tried to document my findings (with a logic analyser measurement) here: jardiamj/homeassistant#1 (comment)

I now also see why multithreading could be the issue, previously I was confused by the locks already present in the Adafruit code. I could not explain how they weren't sufficient, since I lack some more Python experience to debug this.

My particular project uses switches to momentarily turn on outputs, and these outputs controls pulse relays (more or less, for simplification). The output of the relays are fed back into the MCP as inputs (binary_sensors), to check the current state of the pulse relay (since they are pulsed to toggle them on/off, I have to verify the current state). So it's very similar to RPiHat_GPIO_Expander.

Since the measurement of an input change happens directly after an output changed its state, this leads to simultaneous I2C traffic all the time. That means I can easily reproduce the issue :)

Now I am unsure how we should fix this (universally as @jpcornil-git means). As I understand it right now, the proposed changes above are a global lock, that blocks the I2C bus for every device, not per I2C address? So two I2C devices with different addresses cannot be communicated with in parallel?

I would protect/lock by I2C address ideally (when you have multiple identical devices at different addresses you don't want to block access for all of them)

Wouldn't allowing this make it possible for multiple processes to use the I2C bus at the same time, and potentially mix up two I2C transmissions?

@jpcornil-git
Copy link
Contributor Author

The Adafruit code takes a lock of the I2C bus for a single R/W transaction (to protect that transaction) and this is fine at that level and not hurting (the bus is used for that transaction/can't be used for anything else as i2c is serial, i.e. you can't communicate in parallel).

The issue is when two threads are doing read-modify-write actions, e.g.:

In the MCP23017BinarySensor and MCP23017Switch __init__ code, you have:
self._pin.direction = digitalio.Direction.OUTPUT

The above calls the direction setter method from DigitalInOut class (defined in adafruit_mcp230xx/digital_inout.py)

self._mcp.iodir = _clear_bit(self._mcp.iodir, self._pin)

That line translates into two I2C transactions:

  • self._mcp.iodir (_clear_bit parameter) calls the iodir getter method from MCP23017 class (adafruit_mcp230xx/mcp23017.py)
    -> self._read_u16le(_MCP23017_IODIRA) [1st transaction]
  • [returned value is modified/one bit cleared]
  • self._mcp.iodir (assignation) calls the iodir setter method from MCP23017 class
    -> self._write_u16le(_MCP23017_IODIRA, val) [2nd transaction]

If between these two transactions, another thread is updating the same IODIRA register (because not waiting on a lock) it will be lost/overwritten by the 2nd transaction.

To solve the issue and make such read-modify-write transaction atomic, you have to take the lock on the device for the whole sequence -> need a lock per device instance or per address (one device instance per address).

Issue is not very visible because most functions do a single read or write (but not the case during the platform setup phase)

The second issue to solve is the initialization of the device when multiple instances of the same devices are created (in different threads), it should only happen once as second initialisation may overwrite configuration of the thread that is already ahead/configuring the device (above lock won't address that)

@stale
Copy link

stale bot commented May 19, 2020

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue now has been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label May 19, 2020
@fjufju
Copy link

fjufju commented May 24, 2020

@jpcornil-git any updates? Have problem. Thank you.

@stale stale bot removed the stale label May 24, 2020
@jpcornil-git
Copy link
Contributor Author

jpcornil-git commented May 24, 2020 via email

@stale
Copy link

stale bot commented Aug 22, 2020

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue now has been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 22, 2020
@jpcornil-git
Copy link
Contributor Author

I still plan to work on this/fix this properly as soon as I can find some time
Cheers
jpc

@stale stale bot removed the stale label Aug 24, 2020
@fjufju
Copy link

fjufju commented Aug 30, 2020

Looking forward to that :)

@gcs278
Copy link

gcs278 commented Sep 29, 2020

Thanks @jpcornil-git. For everyone else finding this thread, I'm still patching in what @jpcornil-git provided earlier, but I think I made a couple of tweaks because of library-related errors (can't remember what).

Here is the files that I'm currently using (synced as of HA Core 0.115.2):
mcp23017_patches.zip

There's a patch script in the zip that patches the homeassistant docker container (you need SSH access). It should ask you to vimdiff the files to confirm the changes (I'm using my mac with samba mounts to diff) then it will patch in the files. Diffs should be good for 0.115. There might be a syntax error with the patch script, good luck and your mileage may vary!

@github-actions
Copy link

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 26, 2021
@jpcornil-git
Copy link
Contributor Author

This issue/resolution is pending review of #42928 pull request
Cheers
jpc

@github-actions github-actions bot removed the stale label Jan 26, 2021
@StephenBeirlaen
Copy link

Nice work @jpcornil-git! Did not know about the PR yet. I'll see if I can get my hardware working again and test it out.

@gralin
Copy link
Contributor

gralin commented Jan 26, 2021

I will be happy to help test it too. I have been having some issues which might be caused by threading. All my light switches are connected via MCP23017 inputs and my lights are controlled by MCP23017 outputs. Once in a while it happens that clicking one light switch activates all outputs in MCP23017 causing lights in whole house to turn on 😅 I didn't monitor what is being sent to the module but I assume HA is sending something like 0xFF instead of just one bit for the right output and this is the effect. I'm curious if anyone could confirm that this could happen using the current source code or if it could be prevented using the code from PR. The version I use is modified because I needed to support interrupt pin to get better response time, and this interrupt pin has chained interrupt pins from all my MCP23017 modules, and this scenario was not supported out of the box. But I guess for this I can open separate issue/disucssion.

Anyway, thank you @jpcornil-git for your time and contribution, I hope I can at least help with testing since I'm not an experienced Python developer (I'm .NET).

@jpcornil-git
Copy link
Contributor Author

... and I'll be happy to support too/great to have tests on different platforms :-).

@gralin I guess that with this implementation you won't need interrupt either (polling is happening in a dedicated thread/100ms period, i.e. reactive and not loading HA). I did this because of async refactoring but also because current polling implementation within HA is way too slow for a good user experience when inputs are used for push button, e.g. switching on a light/

@jpcornil-git
Copy link
Contributor Author

For people affected by this issue (or willing to test/use a config flow/lower latency implementation already) and in order to ease testing, I made a 'custom_components version' available under https://github.com/jpcornil-git/HA-mcp23017.

You will need HA2021.2 or later (need #44238 fix) to test it.

Cheers
jpc

@jpcornil-git
Copy link
Contributor Author

Added HACS support to my https://github.com/jpcornil-git/HA-mcp23017 repository as well
=> Just add https://github.com/jpcornil-git/HA-mcp23017 to HACS custom repositories to try it out

@github-actions
Copy link

github-actions bot commented Jul 2, 2021

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jul 2, 2021
@jpcornil-git
Copy link
Contributor Author

Issue is still there/component hasn't been updated since then but a fix is available as a custom_component or a HACS integration here: https://github.com/jpcornil-git/HA-mcp23017
Cheers,
jpc

@github-actions
Copy link

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Sep 30, 2021
@jpcornil-git
Copy link
Contributor Author

Issue is still there/component hasn't been updated since then ... it looks like there is no support/owner for this integration anymore
Cheers,
jpc

@github-actions github-actions bot removed the stale label Oct 1, 2021
@github-actions
Copy link

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 30, 2021
@jpcornil-git
Copy link
Contributor Author

Issue is still present but given that the mcp23017 integration will be deprecated soon cfr. #62484, issue should be closed.

As mentioned above, another version of the mcp23017 integration without that issue will remain available as a custom_component/HACS integration under https://github.com/jpcornil-git/HA-mcp23017

Cheers,
jpc

@github-actions github-actions bot removed the stale label Dec 31, 2021
@thecode
Copy link
Member

thecode commented Mar 29, 2022

Closing this as the code for this integration is already removed from dev branch. I suggest to create a similar issue in ttps://github.com/jpcornil-git/HA-mcp23017

@thecode thecode closed this as completed Mar 29, 2022
@jpcornil-git
Copy link
Contributor Author

Just to correct that https://github.com/jpcornil-git/HA-mcp23017 doesn't suffer from this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants