-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parser recompiles regex every loop #130
Comments
But the signature is different for every traversal of the loop:
As this is done again for every new telegram offered to the parser, you might still be right that there might be some unnecessary overhead here. I noticed on bulk processing that the parser is very slow. Not sure how much improvement this will bring. |
A quick fix would be an
A better fix would be to store the |
Another option could be to do it in the init |
@bdraco i tested it and it seems to improve the performance by about 5-6%. Do you have a preference for the chosen solution? Im impartial and also fine wit the lru approach. https://github.com/ndokter/dsmr_parser/pull/131/files I suspect there is a big(ger) gain to be made in |
Building them ahead of time and storing them is cleaner than using an LRU 👍 |
import re
# - Match all characters after start of telegram except for the start
# itself again '^\/]+', which eliminates incomplete preceding telegrams.
# - Do non greedy match using '?' so start is matched up to the first
# checksum that's found.
# - The checksum is optional '{0,4}' because not all telegram versions
# support it.
_FIND_TELEGRAMS_REGEX = re.compile(r"\/[^\/]+?\![A-F0-9]{0,4}\0?\r\n", re.DOTALL)
class TelegramBuffer(object):
"""
Used as a buffer for a stream of telegram data. Constructs full telegram
strings from the buffered data and returns it.
"""
def __init__(self):
self._buffer = ""
def get_all(self):
"""
Remove complete telegrams from buffer and yield them.
:rtype generator:
"""
for telegram in _FIND_TELEGRAMS_REGEX.findall(self._buffer):
self._remove(telegram)
yield telegram
def append(self, data):
"""
Add telegram data to buffer.
:param str data: chars, lines or full telegram strings of telegram data
"""
self._buffer += data
def _remove(self, telegram):
"""
Remove telegram from buffer and incomplete data preceding it. This
is easier than validating the data before adding it to the buffer.
:param str telegram:
:return:
"""
# Remove data leading up to the telegram and the telegram itself.
index = self._buffer.index(telegram) + len(telegram)
self._buffer = self._buffer[index:] Maybe something like this. I don't actually use this lib so I can't test it. I was only submitting this on behalf of a user that I was helping with a performance issue with their system. |
Yup, that user is me ;) Thank you all for looking into this! |
Thank you, i have put these performance improvements in version 1.2.2 |
Great work! Would it be possible to merge the update into Home Assistant? |
I don't know how to test the changes in HA. It seems that the version listed here is quite old as well (v0.33): |
home-assistant/core#84097 looks like it's waiting for a reviewer that can test this |
I've made a HACS integration for it now, as the PR keeps hanging in HA: |
Cool, thanks! I'll give it a try today/tomorrow. I'm curous to see the impact of the efficiency-improvement on the cpu-utlization. |
By the way, since home-assistant/core#84097 is from dec 2022, i guess the performance improvement of version 1.2.2 is not in there. Would it be an idea to add the latest improvement in it? |
dsmr_parser/dsmr_parser/parsers.py
Line 83 in de167c8
It would be nice if this cached the compile of the regex and didn't recompile it every loop
The text was updated successfully, but these errors were encountered: