Parser recompiles regex every loop #130

bdraco · 2023-04-02T20:45:44Z

Line 83 in de167c8

pattern = re.compile(signature, re.DOTALL)

It would be nice if this cached the compile of the regex and didn't recompile it every loop

lowdef · 2023-04-02T22:04:17Z

But the signature is different for every traversal of the loop:

for signature, parser in self.telegram_specification['objects'].items():
            pattern = re.compile(signature, re.DOTALL)
            matches = pattern.findall(telegram_data)

As this is done again for every new telegram offered to the parser, you might still be right that there might be some unnecessary overhead here.

I noticed on bulk processing that the parser is very slow. Not sure how much improvement this will bring.
Could be an interesting experiment.

bdraco · 2023-04-02T23:03:41Z

A quick fix would be an lru_cache around the re.compile

_cached_re_compile = lru_cache(maxsize=512)(re.compile). -- max size may be too high

for signature, parser in self.telegram_specification['objects'].items():
            pattern = _cached_re_compile(signature, re.DOTALL)
            matches = pattern.findall(telegram_data)

A better fix would be to store the re.compile objects for each signature

ndokter · 2023-04-03T05:21:37Z

Another option could be to do it in the init

ndokter · 2023-04-07T12:53:09Z

@bdraco i tested it and it seems to improve the performance by about 5-6%. Do you have a preference for the chosen solution? Im impartial and also fine wit the lru approach.

https://github.com/ndokter/dsmr_parser/pull/131/files

I suspect there is a big(ger) gain to be made in TelegramBuffer.get_all. It's a method that is called for every x bytes received, after which it performs a regex to see if a full telegram is received. Maybe it can be optimized to more smartly look at the end of a telegram which often is something like !56DD\r\n

bdraco · 2023-04-07T21:21:42Z

Building them ahead of time and storing them is cleaner than using an LRU 👍

bdraco · 2023-04-07T21:25:16Z

I suspect there is a big(ger) gain to be made in TelegramBuffer.get_all. It's a method that is called for every x bytes received, after which it performs a regex to see if a full telegram is received. Maybe it can be optimized to more smartly look at the end of a telegram which often is something like !56DD\r\n

import re

# - Match all characters after start of telegram except for the start
# itself again '^\/]+', which eliminates incomplete preceding telegrams.
# - Do non greedy match using '?' so start is matched up to the first
# checksum that's found.
# - The checksum is optional '{0,4}' because not all telegram versions
# support it.
_FIND_TELEGRAMS_REGEX = re.compile(r"\/[^\/]+?\![A-F0-9]{0,4}\0?\r\n", re.DOTALL)


class TelegramBuffer(object):
    """
    Used as a buffer for a stream of telegram data. Constructs full telegram
    strings from the buffered data and returns it.
    """

    def __init__(self):
        self._buffer = ""

    def get_all(self):
        """
        Remove complete telegrams from buffer and yield them.
        :rtype generator:
        """
        for telegram in _FIND_TELEGRAMS_REGEX.findall(self._buffer):
            self._remove(telegram)
            yield telegram

    def append(self, data):
        """
        Add telegram data to buffer.
        :param str data: chars, lines or full telegram strings of telegram data
        """
        self._buffer += data

    def _remove(self, telegram):
        """
        Remove telegram from buffer and incomplete data preceding it. This
        is easier than validating the data before adding it to the buffer.
        :param str telegram:
        :return:
        """
        # Remove data leading up to the telegram and the telegram itself.
        index = self._buffer.index(telegram) + len(telegram)

        self._buffer = self._buffer[index:]

Maybe something like this.

I don't actually use this lib so I can't test it. I was only submitting this on behalf of a user that I was helping with a performance issue with their system.

pimw1 · 2023-04-08T10:15:50Z

Yup, that user is me ;) Thank you all for looking into this!

ndokter · 2023-04-12T12:39:55Z

Thank you, i have put these performance improvements in version 1.2.2

pimw1 · 2023-04-16T12:07:11Z

Great work! Would it be possible to merge the update into Home Assistant?

ndokter · 2023-04-16T12:40:03Z

I don't know how to test the changes in HA. It seems that the version listed here is quite old as well (v0.33):
https://github.com/home-assistant/core/blob/dev/homeassistant/components/dsmr/manifest.json

bdraco · 2023-04-16T17:18:57Z

home-assistant/core#84097 looks like it's waiting for a reviewer that can test this

dupondje · 2023-04-17T07:36:06Z

I've made a HACS integration for it now, as the PR keeps hanging in HA:
https://github.com/dupondje/homeassistant-dsmr

pimw1 · 2023-04-18T10:23:37Z

Cool, thanks! I'll give it a try today/tomorrow. I'm curous to see the impact of the efficiency-improvement on the cpu-utlization.

pimw1 · 2023-04-18T10:41:09Z

By the way, since home-assistant/core#84097 is from dec 2022, i guess the performance improvement of version 1.2.2 is not in there. Would it be an idea to add the latest improvement in it?

pimw1 · 2023-04-20T18:54:28Z

Cool, thanks! I'll give it a try today/tomorrow. I'm curous to see the impact of the efficiency-improvement on the cpu-utlization.

I've tested with the new version and compared the performance with the vanila dsmr of home assistant. From performance perspective, i don't see a clear pattern. It might be that vanilla home assistant dsmr performs slightly better, but the difference is too small to tell from the graphs. I've done two runs and rescaled the y-axis for both runs.

ndokter closed this as completed Apr 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parser recompiles regex every loop #130

Parser recompiles regex every loop #130

bdraco commented Apr 2, 2023

lowdef commented Apr 2, 2023

bdraco commented Apr 2, 2023 •

edited

ndokter commented Apr 3, 2023

ndokter commented Apr 7, 2023 •

edited

bdraco commented Apr 7, 2023

bdraco commented Apr 7, 2023

pimw1 commented Apr 8, 2023 •

edited

ndokter commented Apr 12, 2023

pimw1 commented Apr 16, 2023

ndokter commented Apr 16, 2023

bdraco commented Apr 16, 2023

dupondje commented Apr 17, 2023

pimw1 commented Apr 18, 2023

pimw1 commented Apr 18, 2023

pimw1 commented Apr 20, 2023 •

edited

Parser recompiles regex every loop #130

Parser recompiles regex every loop #130

Comments

bdraco commented Apr 2, 2023

lowdef commented Apr 2, 2023

bdraco commented Apr 2, 2023 • edited

ndokter commented Apr 3, 2023

ndokter commented Apr 7, 2023 • edited

bdraco commented Apr 7, 2023

bdraco commented Apr 7, 2023

pimw1 commented Apr 8, 2023 • edited

ndokter commented Apr 12, 2023

pimw1 commented Apr 16, 2023

ndokter commented Apr 16, 2023

bdraco commented Apr 16, 2023

dupondje commented Apr 17, 2023

pimw1 commented Apr 18, 2023

pimw1 commented Apr 18, 2023

pimw1 commented Apr 20, 2023 • edited

bdraco commented Apr 2, 2023 •

edited

ndokter commented Apr 7, 2023 •

edited

pimw1 commented Apr 8, 2023 •

edited

pimw1 commented Apr 20, 2023 •

edited