Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modbus patch, to allow communication with "slow" equipment using tcp #32557

Merged
merged 15 commits into from Mar 29, 2020

Conversation

janiversen
Copy link
Member

@janiversen janiversen commented Mar 7, 2020

Proposed change

Some modbus-tcp servers (like e.g. Huawei sun2000) need time between the connect() and the first send(), in order to prepare themself.

A typical error pattern is that connect() works perfectly, but when requesting data there are no response (apart from the TCP/IP ACK message). Once in this mode, the server cannot recover and later messages are also not responded to.

The modbus configuration (only for modbus-tcp) have been extended with a new optional parameter "delay" with default being 0. When adding e.g. delay: 2, time.sleep(2) will be called directly after the connect() returns. If delay is not present in the configuration it has a value of 0, and time.sleep() are not called.

Another PR have been filed to update the documentation.

There should not be any breaking changes, basically it is "just" a change from
sync to async. However the library version is bumped from 1.5.2 to 2.3.0, and
that might solve/cause problems.

This started as a 2-3 line change, but now something like 30% of the code is touched,
due to the request of making it async.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • [x ] New feature (which adds functionality to an existing integration)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Example entry for configuration.yaml:

# Example configuration.yaml
modbus:
    name: sun2000
    type: tcp
    host: 192.168.0.3
    port: 502
    delay: 2

sensor:
  platform: modbus
  scan_interval: 10
  registers:
    - name: powerMeter
      hub: sun2000
      unit_of_measurement: W
      slave: 0
      register: 37113
      count: 2

Additional information

The PR is done as 2 commits.

First commit adds a fixed time.sleep(2), which is the solution (apart from the 2)
Second commit adds the additional items needed to make time.sleep(_delay) configurable

Checklist

  • [x ] The code change is tested and works locally.
  • [ x] Local tests pass. Your PR cannot be merged unless tests pass
  • [ x] There is no commented out code in this PR.
  • [ x] I have followed the development checklist
  • The code has been formatted using Black (black --fast homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • Untested files have been added to .coveragerc.

The integration reached or maintains the following Integration Quality Scale:

  • No score or internal
  • 🥈 Silver
  • 🥇 Gold
  • 🏆 Platinum

@homeassistant
Copy link
Contributor

Hi @janiversen,

It seems you haven't yet signed a CLA. Please do so here.

Once you do that we will be able to review and accept this pull request.

Thanks!

@probot-home-assistant probot-home-assistant bot added integration: modbus small-pr PRs with less than 30 lines. labels Mar 7, 2020
@project-bot project-bot bot added this to Needs review in Dev Mar 7, 2020
@probot-home-assistant
Copy link

Hey there @adamchengtkc, mind taking a look at this pull request as its been labeled with a integration (modbus) you are listed as a codeowner for? Thanks!

@janiversen
Copy link
Member Author

It would be nice if someone could take a look at this small patch, so it can (hopefully) get merged, and used.

@Emilv2
Copy link
Contributor

Emilv2 commented Mar 12, 2020

I think the devs have said that time.sleep is never allowed, but I could be wrong.

I'm interested in your usecase, since I also had some trouble getting modbus to read out the Huawei sun2000, and I seem to have different results.

If I understand your code correctly, the connect (and delay) is only used once during HA startup, after which there are no problems anymore?
My device frequently gives errors, and the solution I found is to try to read out three times, it always works on the third try. Delay between the three reads does not matter (unless the timeout is small, then more might be needed). Immediately following reads work from the first try, but even after several seconds three tries are needed again. Not really a very elegant solution of course.

Waiting after connect seems to work, but I still need to repeat that frequently (every ~10 seconds) or the problem reoccurs.

@janiversen
Copy link
Member Author

I agree that using time.sleep in the running code is something that should be avoided and are normally not really needed since the worker thread will call every second. However in setup once you return (unless I am mistaken, which very well might be the case) setup is done, so because of that a wait is needed in the function.

If you poll the sun2000 too fast it will fail. Do you use the scan interval as pr my example? It runs for me andI might get a read error 2-3 times a day but not more.

Can you please provide the modbus part of your config then I can try it on my installation and see what can be done

@Emilv2
Copy link
Contributor

Emilv2 commented Mar 12, 2020

I don't use the build-in modbus component but I build my own: https://github.com/Emilv2/huawei_solar
The library doing the actual communication is here: https://gitlab.com/Emilv2/huawei-solar

I quickly tried your solution using pymodbus manually on an interactive python, and your solution seems to work. Close the connection, connect the client, and then wait two seconds to read the registers, but I need to repeat this every time I start to read registers. Polling it very fast doesn't to fail for me, it's only the first read in a sequence that fails.

I would say that if I could get async pymodbus to work that would be perfect, since asyncio.sleep() is non-blocking and allowed. But for some reason I can't get it to work: https://community.home-assistant.io/t/async-pymodbus/177601

@janiversen
Copy link
Member Author

Why do you need to connect for every poll cycle? that creates a bit of stress in the sun2000 and seems to be a lot of overhead.

How does the first one fail? no response at all, only a tcp ack or a error message? I have a non home assistant python3 script that works without problems. The Huawei developers have confirmed that a delay after connect is needed.

I am actually testing a new integration with async, but async.sleep() does not solve the delay after connect since it is in setup(). I read multiple registers in one read and then have a short async.sleep(). So far I have not had problems with the core of home assistant. I made the setup async as well.

@janiversen
Copy link
Member Author

Just had a fast look at your code, I miss a thread lock is that on purpose?

I will try your code this weekend, it looks a lot like my code, with 2 big differences. Based on the configuration I group the registers and read then in blocks and I keep a local value to further limit the number of actual reads.

@Emilv2
Copy link
Contributor

Emilv2 commented Mar 12, 2020

Because otherwise it fails just like if I never did the sleep. I think it was my HA instance reading out at the same time that made it do that, with it disabled it works as you described. multiple connections at the same time don't seem to be supported. I was too fast, it still happens after a while.

Without the sleep or after a while it always fails in the same way like this:

>>> result = client.read_holding_registers(30000,15)
ModbusIOException(InvalidMessageReceivedException('Incomplete message received, expected at 
least 8 bytes (0 received)'), 3)
>>> result = client.read_holding_registers(30000,15)
ModbusIOException('No Response received from the remote unit/Unable to decode response', 3)
>>> result = client.read_holding_registers(30000,15)
>>> result.encode()
b'\x1eSUN2000L-3KTL\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

I'm fairly new to asyncio, where would I need a thread lock?

Which local variables do you keep that I don't? The variables that do not change such as name and serial numbers I poll only once and the ones that change I poll. I thought about reading the registers in blocks, but that seems to make it so much more complicated.

@janiversen
Copy link
Member Author

the thread lock is for both sync and async, it should be around each pymodbus call. Sun2000 do not support multiple connections, and if you open/close-open/close without a larger delay you will rum into problems.

asyncio is a bit more complicated, just as reading multiple registers with on read call, but it pays to do it.

I keep local variables for all registers, because from a automation pow, only current production and the import/value matters. I actually build a third variable “current comsumption” = production - power meter (import/export).

@Emilv2
Copy link
Contributor

Emilv2 commented Mar 12, 2020

I tried on three different devices, and all keep the same problem:

import pymodbus
import time
client = pymodbus.client.sync.ModbusTcpClient('192.168.0.135', '502')
client.connect()
time.sleep(3)
# works
result = client.read_holding_registers(30000,15)
time.sleep(180) # not sure about the exact value needed
# fails again
result = client.read_holding_registers(30000,15)

My code does not get stuck on any async part, it gets stuck on the creation of the modbus client. I can try it, but I don't see how a lock can help there?

@janiversen
Copy link
Member Author

janiversen commented Mar 12, 2020

the lock is not for the connect() but for the read() however since you reconnect every time, you are very likely to run into problems with multiple connects simultaneously, unless you lock the connect().

As I wrote the lock is not to solve your async problem, since a lock is also needed for the sync calls. Your async problems are different, and I will try to compare my async code to see why yours might fail.

I am not at my computer, but I remember I did something about adding keepalive to the socket.

@Emilv2
Copy link
Contributor

Emilv2 commented Mar 12, 2020

I don't think adding a lock would make sense in my sync code? All the updates are done sequentially in the same function.

While the current production is the most useful for automation, the other values can be of use too for monitoring your setup so I poll them too.

Dev automation moved this from Needs review to Review in progress Mar 16, 2020
@Emilv2
Copy link
Contributor

Emilv2 commented Mar 16, 2020

Shouldn't async.sleep() work as well when you're using the lock during the sleep? The component should be first changed to use async.

@Emilv2
Copy link
Contributor

Emilv2 commented Mar 16, 2020

If you are using the lock the update process will have to wait for the sleep to be finished and the lock to be released.

According to this there is a difference between the two. async.sleep() also seems to be allowed in the HA code.

@janiversen
Copy link
Member Author

you are right my comment was wrong (and I deleted it), there is a difference between async and asyncio :-(

I am testing async.sleep() right now.

@janiversen
Copy link
Member Author

Changed time.sleep to asyncio.sleep(), and at the same time learned something :-) it seems time.sleep() blocks the process in some versions of python and only the thread in other versions, at least that how I understand the documentation. @Emilv2 thanks for the suggestion.

@bdraco I hope this solves your request for change, I made it as an extra commit to make the review easier, if you want me to squash it or make other changes please let me know.

@janiversen
Copy link
Member Author

the assert error in sensor.py seems to be in a previous version, at least I cannot find the assert.

found and corrected the problem in switch.py (update --> async_update).

@vzahradnik Lets see what other errors you can come up with :-)

Copy link
Contributor

@vzahradnik vzahradnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at my comments. Thanks!

homeassistant/components/modbus/__init__.py Outdated Show resolved Hide resolved
homeassistant/components/modbus/const.py Outdated Show resolved Hide resolved
homeassistant/components/modbus/const.py Outdated Show resolved Hide resolved
@vzahradnik
Copy link
Contributor

@janiversen I tested and reviewed your latest code. Everything works for me now. I found a few cosmetic issues. Otherwise, I think this code is in a good shape, and from my point of view I will hapilly see this PR merged.

For a reference, I'm putting here log warning, which needs to be fixed in the pymodbus library someday:

/srv/git/hassio/home-assistant/venv/lib/python3.8/site-packages/pymodbus/client/asynchronous/asyncio/__init__.py:250: DeprecationWarning: "@coroutine" decorator is deprecated since Python 3.8, use "async def" instead
  def start(self, host, port=502):

@janiversen
Copy link
Member Author

I will make your suggested changes later. I have seen the warnings, but it is really a warning about it will not work in 3.10 so I do not think the pymodbus people will see it as sn urgent issue.

Thanks for your time so far, once all is cleaned up (tomorrow) I will ask to get merged.

Adjust code based on review comments.
@janiversen
Copy link
Member Author

@bdraco finally this PR is robust and ready to merge (pending if you have new comments).

@vzahradnik have helped a lot with review (as pr. suggestion from the code owner @adamchengtkc) and testing. Now we have an integration a lot more robust and totally async.

Looking forward to get this merged, so both @vzahradnik and myself can continue on follow-up PRs.

Copy link
Member

@bdraco bdraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @janiversen Glad to see this is getting closer!

Would you please ensure all outstanding comments are resolved and mark them as resolved?

I did a quick run though. I'll do a more through pass this weekend when I'm not focused on my day job.

homeassistant/components/modbus/__init__.py Show resolved Hide resolved
homeassistant/components/modbus/climate.py Show resolved Hide resolved
homeassistant/components/modbus/sensor.py Show resolved Hide resolved
homeassistant/components/modbus/switch.py Show resolved Hide resolved
homeassistant/components/modbus/switch.py Show resolved Hide resolved
homeassistant/components/modbus/climate.py Show resolved Hide resolved
@janiversen
Copy link
Member Author

Sorry if I made confusion by doing "git rebase", but it is natural to me, to update the commit and thus not first make something wrong and then correct it in a later commit. This way of working stops for me, when a PR is set for final review (as this one is now).

If "git rebase" causes too much confusion, please tell me, and I will refrain from using it.

@bdraco
Copy link
Member

bdraco commented Mar 27, 2020

Sorry if I made confusion by doing "git rebase", but it is natural to me, to update the commit and thus not first make something wrong and then correct it in a later commit. This way of working stops for me, when a PR is set for final review (as this one is now).

If "git rebase" causes too much confusion, please tell me, and I will refrain from using it.

rebase should be fine. I can't see where you are logging when the device goes unavailable, and logging where it is recovered. I'm looking at 4b1c64f as the sha when I check this out. Can you point me to a specific file / line?

@janiversen
Copy link
Member Author

We are not doing that loging, because it is handled inside the Pymodbus library. It will automatically try to reconnect with a doubling delay for each failed attempt, so there are no spefic log for that.

Pymodbus will return an exception when it gives up, and that is handled in init.py line 255, which also handles spurious read errors.

Pymodbus will return None, if there are no connection, which will set unavailable at entity level.

Pymodbus in asyncio mode do not use twister but still throws a
warning if twister is not installed, this warning goes into
homeassistant.log and can thus cause confusion among users.

However installing twister just to avoid the warning is not
the best solution, therefore removing dependency on twister.
@janiversen
Copy link
Member Author

@bdraco do you have more review comments ?

@bdraco
Copy link
Member

bdraco commented Mar 28, 2020

I’m still planning on running though this today now that we hit the weekend and I can give it more attention

remove commented out code.
@bdraco
Copy link
Member

bdraco commented Mar 28, 2020

@janiversen LGTM. I don't see anything else besides the comments out code. Once that is fixed, I can approve. Someone else will likely do another pass on this at some point before or after merge given the size.

@janiversen
Copy link
Member Author

@bdraco thanks a lot for your review. I am working on the followup PR, so I hope is not going to take ages :-)

Dev automation moved this from Review in progress to Reviewer approved Mar 28, 2020
@vzahradnik
Copy link
Contributor

Somebody should remove the flag "small-pr" 😄

@bdraco bdraco removed the small-pr PRs with less than 30 lines. label Mar 28, 2020
@bdraco bdraco merged commit dd3cd95 into home-assistant:dev Mar 29, 2020
Dev automation moved this from Reviewer approved to Done Mar 29, 2020
@janiversen janiversen deleted the modbuspatch branch March 29, 2020 17:52
Copy link
Member

@MartinHjelmare MartinHjelmare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just some small comments for a future PR.

self._client = modbus_client
self._lock = threading.Lock()
self._name = name
_LOGGER.debug("Preparing setup: %s", client_config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want side effects, eg logging, in init method. Please move it outside, or remove it.

async with self._lock:
kwargs = {"unit": unit} if unit else {}
result = await func(address, count, **kwargs)
if isinstance(result, (ModbusException, ExceptionResponse)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if the client func raised errors instead of returning them. I guess this is the library responsibility.

@@ -56,7 +54,7 @@
)


def setup_platform(hass, config, add_entities, discovery_info=None):
async def async_setup_platform(hass, config, add_entities, discovery_info=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename add_entities to async_add_entities.

@lock lock bot locked and limited conversation to collaborators Apr 2, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
Dev
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

8 participants