Add uint64 support to line protocol serializer #516

drcraig · 2022-10-12T17:54:08Z

Proposal:
Add support for 64 bit unsigned integers to Point.to_line_protocol()

Current behavior:
InfluxDB 2.x supports 64-bit unsigned integers, which the line protocol documentation says should be suffixed with a "u". Point.to_line_protocol() (via _append_fields()) only appends an "i" to integer field values. This works fine for positive integers up to max signed int64, but for positive integers between max signed int64 and max unsigned int64, InfluxDB 2.x rejects the write as "value out of range."

Desired behavior:
Perhaps the simplest backward-compatible solution would be to use a "u" suffix for integer values > max signed int64. I don't have a good sense, though, of whether this would behave as expected on InfluxDB 1.x when uint support is not enabled.
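
The size-based rule described above could look roughly like the sketch below. It is only an illustration, not the client's code: `format_integer_field` is a hypothetical helper, and the limits assume that "i" values are signed 64-bit and "u" values are unsigned 64-bit.

```python
INT64_MAX = 2**63 - 1   # largest value that fits the "i" (signed) type
UINT64_MAX = 2**64 - 1  # largest value that fits the "u" (unsigned) type


def format_integer_field(value: int) -> str:
    """Hypothetical helper: choose the line protocol suffix from the value's size."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("expected an int")
    if value > UINT64_MAX or value < -(2**63):
        raise ValueError(f"{value} does not fit in 64 bits")
    # Values that overflow a signed 64-bit integer fall back to unsigned.
    suffix = "u" if value > INT64_MAX else "i"
    return f"{value}{suffix}"
```

Note that under this rule a field whose values cross the signed 64-bit limit would change type from one write to the next.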

Alternatives considered:

- Append a "u" for all positive integers, regardless of size. I don't know for sure, but I'd worry that InfluxDB will have typing problems if you mix unsigned and signed integers for the same field.
- Write my own line protocol serializer for my specific use case so that I can have deliberate control of the suffixes based on my specific definitions of my measurements.

Use case:
Metrics collector that receives a stream of incoming data, translates it to line protocol, and sends it to a local Telegraf socket listener for ingestion into a remote InfluxDB.

bednar · 2022-10-13T07:50:13Z

Hi @drcraig,

thanks for using our client and for your proposal.

> Append a "u" for all positive integers, regardless of size. I don't know for sure, but I'd worry that influx will have typing problems if you mix unsigned and signed integers for the same field.

You are right, InfluxDB returns an "unprocessable entity" error if the types are mixed:

curl -i -X POST 'http://localhost:8086/api/v2/write?org=my-org&bucket=my-bucket' \
    -H 'Authorization: Token my-token' \
    -d 'h2o,location=west level=1i'

curl -i -X POST 'http://localhost:8086/api/v2/write?org=my-org&bucket=my-bucket' \
    -H 'Authorization: Token my-token' \
    -d 'h2o,location=west level=1u'

>>> 

{"code":"unprocessable entity","message":"failure writing points to database: partial write: field type conflict: input field \"level\" on measurement \"h2o\" is type unsigned, already exists as type integer dropped=1"}

> Write my own line protocol serializer for my specific use case so that I can have deliberate control of the suffixes based on my specific definitions of my measurements.

I think we could support something like this directly in the client API. It would also be useful for cases like this: https://community.influxdata.com/t/pushing-nested-json-into-influxdb-using-python/26940/5?u=bednar.

What do you think about a new option for the client, integer_number_type:

:param integer_number_type: Specifies which number type should be used when serializing integers into line protocol. Possible values are:
                            - ``i`` -  serialize integers as "Signed 64-bit integers" - "9223372036854775807i" (default behaviour)
                            - ``u`` -  serialize integers as "Unsigned 64-bit integers" - "9223372036854775807u"
                            - ``f`` -  serialize integers as "IEEE-754 64-bit floating-point numbers". Useful for unifying number types in your pipeline to avoid field type conflicts - "9223372036854775807.0"

?
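
To make the three proposed values concrete, here is a minimal sketch of how such an option might drive serialization. `serialize_integer` is a hypothetical function written for this illustration; it is not part of the client.

```python
def serialize_integer(value: int, integer_number_type: str = "i") -> str:
    """Hypothetical sketch of the proposed client-wide integer_number_type option."""
    if integer_number_type == "i":
        return f"{value}i"
    if integer_number_type == "u":
        if value < 0:
            raise ValueError("unsigned integers cannot be negative")
        return f"{value}u"
    if integer_number_type == "f":
        # Format the integer text directly so large values keep every digit.
        return f"{value}.0"
    raise ValueError(f"unknown integer_number_type: {integer_number_type!r}")
```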

Regards

drcraig · 2022-10-13T17:04:59Z

@bednar, thank you for the reply and for testing and confirming that "u" for all positive integers is not a viable solution.

I don't think a setting at the client level would work, at least for my use case. We have measurements with many fields of different types, some of which are signed integers and some are unsigned, or for that matter, floats, bools, etc.

What about an optional field_types parameter, either to Point or to to_line_protocol? It could be a dictionary keyed by field names and allow you to explicitly set the type for that field. If not specified, it would use the default inferred types, as it currently does.

For Point.from_dict(), something like:

dict_structure = {
    "measurement": "h2o_feet",
    "tags": {"location": "coyote_creek"},
    "fields": {
        "water_level": 1.0,
        "some_counter": 108913123234
    },
    "field_types": {
        "some_counter": "uinteger"
    },
    "time": 1
}
point = Point.from_dict(dict_structure)

For the Point object itself, the type could be an optional parameter to field(), e.g.

def field(self, field, value, field_type=None):
    ...

Using "uinteger" feels like a clunky name (as opposed to, say, "uint"), but it's consistent with the line protocol docs.
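
As a rough sketch of the per-field idea, the function below serializes numeric fields and lets a field_types mapping override the inferred type. `format_fields` is a hypothetical helper made up for this example; it handles numbers only and leaves out strings, booleans, and escaping.

```python
from typing import Optional


def format_fields(fields: dict, field_types: Optional[dict] = None) -> str:
    """Hypothetical sketch: serialize numeric fields, honoring per-field type overrides."""
    field_types = field_types or {}
    parts = []
    for name, value in sorted(fields.items()):
        declared = field_types.get(name)
        if declared == "uinteger":
            parts.append(f"{name}={int(value)}u")
        elif declared == "integer" or (declared is None and isinstance(value, int)):
            parts.append(f"{name}={int(value)}i")
        else:
            # Floats (declared or inferred) are written without a suffix.
            parts.append(f"{name}={float(value)}")
    return ",".join(parts)
```

Fields without an entry in field_types keep the inferred type, so existing callers would see no change.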

bednar · 2022-10-14T09:07:02Z

@drcraig thanks for the detailed info.

It sounds reasonable to make types configurable per field... we have to figure out how to update our APIs.

What do you think about something like:

dict_structure = {
    "measurement": "h2o_feet",
    "tags": {"location": "coyote_creek"},
    "fields": {
        "water_level": 1.0,
        "some_counter": 108913123234
    },
    "time": 1
}
# number_types maps a field name to its type, so it must be a dict, not a set
point = Point.from_dict(dict_structure, number_types={"some_counter": "uinteger"})

?

> I don't think a setting at the client level would work, at least for my use case. We have measurements with many fields of different types, some of which are signed integers and some are unsigned, or for that matter, floats, bools, etc.

It would apply only to number fields and would be useful if you do not have control over your incoming data. Something like:

[
   {
      "humidity": 62,
      "temperature": 20.88,
      "windSpeed": 7
   },
   {
      "humidity": 62.5,
      "temperature": 20.88,
      "windSpeed": 7
   }
]

The first occurrence of humidity is parsed as an int and the second as a float, which leads to a field type conflict.
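
The mismatch can be reproduced with plain Python, along with one way a pipeline might avoid it by coercing every number to float. This is a sketch only: `inferred_types` and `as_floats` are hypothetical helpers, and it assumes every value in a record is numeric.

```python
readings = [
    {"humidity": 62, "temperature": 20.88, "windSpeed": 7},
    {"humidity": 62.5, "temperature": 20.88, "windSpeed": 7},
]


def inferred_types(record: dict) -> dict:
    """Map each field to the Python type name its value was parsed as."""
    return {name: type(value).__name__ for name, value in record.items()}


def as_floats(record: dict) -> dict:
    """Coerce every numeric value to float so each field keeps one type."""
    return {name: float(value) for name, value in record.items()}


for record in readings:
    print(inferred_types(record)["humidity"], as_floats(record)["humidity"])
```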

drcraig · 2022-10-17T22:55:01Z

> dict_structure = {
>     "measurement": "h2o_feet",
>     "tags": {"location": "coyote_creek"},
>     "fields": {
>         "water_level": 1.0,
>         "some_counter": 108913123234
>     },
>     "time": 1
> }
> point = Point.from_dict(dict_structure, number_types={"some_counter": "uinteger"})

Yeah, that could work. It still feels a little unfortunate to have to separate it from the rest of the dict, but I could be good with it. I hadn't considered the possibility of not controlling your incoming data. In my particular case, the incoming data is in a different format, and we have to construct the dict and populate it with the parsed incoming data. I don't have any sense of which situation is more common, controlling the dict or not, so I'd defer to your expertise there.

bednar added the "enhancement" (New feature or request) label Oct 13, 2022

powersj added the "waiting for response" (waiting for response from contributor) label Oct 17, 2022

telegraf-tiger bot removed the "waiting for response" (waiting for response from contributor) label Oct 17, 2022

bednar mentioned this issue Dec 9, 2022, in "feat: configure types of integers fields when initializing Point from dict structure" #538 (merged)

bednar closed this as completed in #538 Dec 12, 2022

bednar added this to the 1.35.0 milestone Dec 12, 2022
