
Respect UDP maximum packet size when sending batched points #185

Open
wants to merge 14 commits into master

Conversation

simonaldrich
Contributor

This closes #184.

Adds a new mandatory member function to the Transport base class called getMaxMessageSize. This function must return the size of the largest message that may be transmitted using the Transport implementation's send method.

  • For the UDP transport this is set to the minimum of the UDP socket's send buffer size (relevant on macOS) and the maximum data length that is practical in the UDP protocol once header sizes are taken into account (65507 bytes).
    N.B. This may cause packet fragmentation at the IP layer, but since UDP makes no delivery guarantees anyway, I think this is an acceptable compromise when the user has chosen to batch-write points over UDP.
  • For the other transports (HTTP, TCP, UnixSocket) this is effectively unlimited (std::numeric_limits<std::size_t>::max()) to preserve existing behaviour.

The InfluxDB::flushBatch method is updated to construct the largest possible messages which can be successfully sent given the mTransport in use.

If any individual Point is too large to be successfully sent (i.e. its Line Protocol representation is larger than the maximum message size) it is skipped and the other Points in the batch are sent; an exception is raised once the remaining points have been transmitted.
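A minimal sketch of the interface change described above (class and method names follow the PR description; the exact signatures and members in the branch may differ):

```cpp
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical sketch of the Transport interface described in this PR.
class Transport
{
public:
    virtual ~Transport() = default;
    virtual void send(std::string&& message) = 0;
    // Largest message (in bytes) that send() can transmit in one call.
    virtual std::size_t getMaxMessageSize() const = 0;
};

class HTTP : public Transport
{
public:
    void send(std::string&&) override { /* ... */ }
    // Effectively unlimited, preserving existing behaviour.
    std::size_t getMaxMessageSize() const override
    {
        return (std::numeric_limits<std::size_t>::max)();
    }
};

class UDP : public Transport
{
public:
    void send(std::string&&) override { /* ... */ }
    std::size_t getMaxMessageSize() const override
    {
        // 65535 (max UDP datagram) - 8 (UDP header) - 20 (IP header) = 65507
        constexpr std::size_t maxUDPDataSize{65507};
        return maxUDPDataSize; // the PR additionally caps this at the socket's send buffer size
    }
};
```

The caller (flushBatch) can then greedily pack points up to whatever limit the active transport reports.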

…raise an exception when all other points have been sent. Also, ensure that the connection's point batch is definitely cleared even if there is a transmission error.
… size or the UDP theoretical limit (for MacOS)
…try attempts at installing packages

This is a workaround for intermittent 404 errors for the Azure apt mirror
@@ -29,7 +29,9 @@ jobs:
run: script/ci_setup.sh
- name: Install Boost
if: ${{ matrix.boost == true }}
run: apt-get install -y libboost-system-dev
run: |
Owner

Why the switch from apt-get to apt? Did you experience any connection issues?

@@ -103,12 +103,19 @@ namespace influxdb::internal

std::unique_ptr<Transport> withUdpTransport(const http::url& uri)
{
return std::make_unique<transports::UDP>(uri.host, uri.port);
static constexpr std::uint16_t INFLUXDB_UDP_PORT{8089};
Owner

I see a port as something a caller has to provide (like the host address) and not use any assumptions – unless there's a really, really good reason to do so.

(same for tcp below)

@@ -107,6 +109,11 @@ namespace influxdb::transports
checkResponse(response);
}

std::size_t HTTP::getMaxMessageSize() const
{
return (std::numeric_limits<std::size_t>::max)();
Owner

Instead of handling a huge number, we could simply use 0 (= off) and skip any splitting logic.
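One way to read that suggestion (a hypothetical sketch, not code from the branch): treat 0 as a "no limit" sentinel so unlimited transports bypass the splitting path entirely:

```cpp
#include <cstddef>

// Hypothetical helper: maxMessageSize == 0 means "no limit",
// so splitting is only needed when a real limit is exceeded.
bool needsSplitting(std::size_t batchSize, std::size_t maxMessageSize)
{
    return maxMessageSize != 0 && batchSize > maxMessageSize;
}
```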

{
/// Group the points into the largest possible line-protocol messages that can be sent using the transport.
template <typename PointContainer>
void TransmitBatch(std::unique_ptr<Transport>& transport, const std::string& globalTags, PointContainer&& points)
Owner

Since the function doesn't change the unique_ptr itself, pass the transport by const-ref instead.

Also, please use transmitBatch() (lower case t) naming – I know this isn't used consistently everywhere (yet) :-)

(both same for the function below)

#include <string>

namespace influxdb::transports
{
namespace
{
std::size_t GetSocketSendBufferSize(const boost::asio::ip::udp::socket& socket)
Owner

Minor; getSocketSendBufferSize() (see naming comment above).

// this can be changed by setting the sysctl net.inet.udp.maxdgram or setting the
// SO_SNDBUF option on a per socket basis. For our purposes we can just use the
// smaller of maxUDPDataSize and the send buffer size for the socket.
return std::min(maxUDPDataSize, GetSocketSendBufferSize(mSocket));
Owner

How about other systems? Is the behaviour consistent across OS types?
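For reference, the query behind boost::asio's send_buffer_size option boils down to a getsockopt(SO_SNDBUF) call, whose defaults do differ between Linux, macOS and the BSDs. A standalone POSIX sketch (not the PR's code, which goes through boost::asio):

```cpp
#include <cstddef>
#include <stdexcept>
#include <sys/socket.h>
#include <unistd.h>

// Query the kernel's send buffer size for a socket via SO_SNDBUF.
// Note: default sizes vary across OSes, and Linux reports double the
// value that was set (the kernel reserves headroom for bookkeeping).
std::size_t getSendBufferSize(int fd)
{
    int size{0};
    socklen_t len{sizeof(size)};
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, &len) != 0)
    {
        throw std::runtime_error{"getsockopt(SO_SNDBUF) failed"};
    }
    return static_cast<std::size_t>(size);
}
```

Usage would be e.g. `getSendBufferSize(socket(AF_INET, SOCK_DGRAM, 0))`; on macOS the returned value is typically smaller than 65507, which is why the PR takes the minimum of the two.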


namespace http
{
struct url
{
std::string protocol, user, password, host, path, search, url;
static constexpr int PORT_NOT_SET = -1;
Owner

(see port related comment above)

std::string p4Line{FormatPoint(p4)};
// Set transport max message size to accommodate three test points (with newline delimiters)
const std::size_t maxMessageSize{(p4Line.size() * 3) + 2};
//
Owner

Something missing here? :-)

}
}

TEST_CASE("UDP Transport", "[InfluxDBST]")
Owner

Since this is some kind of new test suite, maybe we should extract it to a dedicated source file (to keep each one on its own topic).

@offa offa self-assigned this Mar 22, 2023
@offa
Owner

offa commented Mar 22, 2023

Thanks for your PR!

I have some comments, but my (personal) overall opinion is that this adds quite some complexity for everyone just to handle a single (less used?) transport. Instead, I'm tempted to consider deprecating (and finally removing) UDP support, even more so as v2 won't support it anymore (questdb did too).

@simonaldrich
Contributor Author

simonaldrich commented Mar 22, 2023

Thanks for your PR!

You're welcome. Thank you for the review comments (which I'm happy to address) and the opportunity to discuss the best approach to this 🙂

this adds quite some complexity to everyone just to handle a single (less used?) transport

You're absolutely right, this definitely adds complexity to the transmission of batched (or vectors of) points. As a slight mitigation it does consolidate two code paths (sending unbatched point vectors and flushing batched points) into a single function. Perhaps we can find middle ground which avoids the complexity for Transports with no transmission limits?

Instead, I'm tempted to consider deprecating (and finally removing) UDP support

You'd obviously be completely within your rights to do so. Unfortunately, that would be problematic from my perspective. To explain why, it might be helpful to understand the use-case behind the PR.

My team are replacing a legacy system which sends large quantities of data to another system's v1 InfluxDB using UDP. We want to perform as few writes as possible (hence the batching) so that Influx is likely to flush related batches of points to the measurements together. This mirrors the behaviour of the system being replaced.

During integration testing we noticed the issue with batched points exceeding the UDP packet size limit which would cause data loss as far as the other system was concerned.

We don't have the luxury of replacing the receiving system and I don't think this situation will be unique. I would suggest that a combination of inertia and aversion to change most likely means that Influx v1 and the UDP transport will be around for quite some time especially in embedded and industrial markets.

It would be great if we can maintain UDP support, I think it may be more widely used than you might think 🙂

@offa
Owner

offa commented Mar 23, 2023

You'd obviously be completely within your rights to do so. Unfortunately, that would be problematic from my perspective. To explain why, it might be helpful to understand the use-case behind the PR.

Thanks for the insight, it helps a lot to understand the situation.

Perhaps we can find middle ground which avoids the complexity for Transports with no transmission limits?

An ideal solution would be to stay within UDP transport implementation. 👍
But yes, let's find a good solution for both sides here :-)

You're welcome. Thank you for the review comments (which I'm happy to address) and the opportunity to discuss the best approach to this

👍

@offa
Owner

offa commented Mar 23, 2023

There are two little things that are worth merging independently as they provide benefit already: using uint16 ports (TCP / UDP ctor) and the resolve() related changes (= the try-catch part of the UDP ctor).

If you have some minutes spare, could you submit those as a separate PR?

@offa
Owner

offa commented Mar 23, 2023

Some thoughts (not necessarily good though):

  • Implementing an additional batching strategy beside by-count
  • Implementing send as strategy in general (eg. no batching, by-count, by-size)
  • Since it's a line protocol – terminated by a newline – and tags / fields don't support \n; send as many lines as fit into a packet, then cut and continue with the next packet (simple, stupid, but does it work?)
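The third idea can be sketched in a few lines (a hypothetical illustration of the "cut at newlines" strategy; names like splitIntoPackets are made up here, not taken from the branch):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Pack whole line-protocol lines into chunks no larger than maxSize bytes,
// joining lines within a chunk by '\n'. A single line longer than maxSize
// ends up as its own (oversized) chunk, so the caller can drop or report it.
std::vector<std::string> splitIntoPackets(const std::vector<std::string>& lines,
                                          std::size_t maxSize)
{
    std::vector<std::string> packets;
    std::string current;
    for (const auto& line : lines)
    {
        const std::size_t needed =
            current.empty() ? line.size() : current.size() + 1 + line.size();
        if (!current.empty() && needed > maxSize)
        {
            packets.push_back(current);
            current.clear();
        }
        if (!current.empty())
        {
            current += '\n';
        }
        current += line;
    }
    if (!current.empty())
    {
        packets.push_back(current);
    }
    return packets;
}
```

Because tags and fields cannot contain \n, cutting at line boundaries can never split a point in half, which is what makes this approach so simple.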

@simonaldrich
Contributor Author

Thanks for the feedback, very much appreciated. I will get around to doing the separate PRs over the next few days 🙂

@offa
Owner

offa commented Apr 1, 2023

Since it's a line protocol – terminated by a newline – and tags / fields don't support \n; send as many lines as fit into in a packet, then cut and continue with next packet (simple, stupid, but does it work?)

While playing around with this idea it turned out to be quite promising 👍. I'm going to write a basic PoC so we'll see if it's a feasible solution.

@offa offa mentioned this pull request Apr 5, 2023

Successfully merging this pull request may close these issues.

Write batching is unaware of transport message size limits