Add uint64 support to line protocol serializer #516

Closed
drcraig opened this issue Oct 12, 2022 · 4 comments · Fixed by #538
Labels
enhancement New feature or request

drcraig commented Oct 12, 2022

Proposal:
Add support for 64 bit unsigned integers to Point.to_line_protocol()

Current behavior:
InfluxDB 2.x supports 64-bit unsigned integers, which the Line Protocol documentation says should be suffixed with a "u". Point.to_line_protocol() (via _append_fields()) only appends an "i" for integer field values. This works fine for positive integers up to max signed int64, but for positive integers between max signed int64 and max unsigned int64, InfluxDB 2.x rejects the write as "value out of range."

Desired behavior:
Perhaps the simplest backward-compatible solution would be to use a "u" suffix for integer values > max signed int64. I don't have a good sense, though, of whether this would behave as expected on InfluxDB 1.x when uint support is not enabled.
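
To make the magnitude-based idea concrete, here is a minimal sketch. It assumes the threshold is the signed 64-bit maximum, which is the range the line protocol documentation gives for integer fields. The helper name `format_integer_field` is hypothetical and not part of the client.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1


def format_integer_field(value: int) -> str:
    """Hypothetical helper: choose the line protocol suffix from the value's magnitude."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected an int, got {type(value).__name__}")
    if INT64_MIN <= value <= INT64_MAX:
        return f"{value}i"  # fits in a signed 64-bit integer
    if INT64_MAX < value <= UINT64_MAX:
        return f"{value}u"  # only representable as an unsigned 64-bit integer
    raise ValueError(f"{value} does not fit in a 64-bit integer field")
```

A field whose values cross the threshold would change type between writes under this scheme, so it may still run into field type conflicts on the server.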

Alternatives considered:

  1. Append a "u" for all positive integers, regardless of size. I don't know for sure, but I'd worry that influx will have typing problems if you mix unsigned and signed integers for the same field.
  2. Write my own line protocol serializer for my specific use case so that I can have deliberate control of the suffixes based on my specific definitions of my measurements.

Use case:
Metrics collector that receives a stream of incoming data, translates it to line protocol, and sends it to a local Telegraf socket listener for ingestion into a remote InfluxDB.

@bednar bednar added the enhancement New feature or request label Oct 13, 2022
Contributor

bednar commented Oct 13, 2022

Hi @drcraig,

thanks for using our client and for your proposal.

  1. Append a "u" for all positive integers, regardless of size. I don't know for sure, but I'd worry that influx will have typing problems if you mix unsigned and signed integers for the same field.

You are right, InfluxDB returns an "unprocessable entity" error if the types are mixed:

curl -i -X POST 'http://localhost:8086/api/v2/write?org=my-org&bucket=my-bucket' \
    -H 'Authorization: Token my-token' \
    -d 'h2o,location=west level=1i'

curl -i -X POST 'http://localhost:8086/api/v2/write?org=my-org&bucket=my-bucket' \
    -H 'Authorization: Token my-token' \
    -d 'h2o,location=west level=1u'

>>> 

{"code":"unprocessable entity","message":"failure writing points to database: partial write: field type conflict: input field \"level\" on measurement \"h2o\" is type unsigned, already exists as type integer dropped=1"}

  2. Write my own line protocol serializer for my specific use case so that I can have deliberate control of the suffixes based on my specific definitions of my measurements.

I think we could support something like this directly in the client API. It would also be useful for cases like this: https://community.influxdata.com/t/pushing-nested-json-into-influxdb-using-python/26940/5?u=bednar.

What do you think about a new option for the client, integer_number_type:

:param integer_number_type: Specifies which number type should be used when serializing integers into line protocol. Possible values are:
                            - ``i`` - serialize integers as "Signed 64-bit integers" - "9223372036854775807i" (default behaviour)
                            - ``u`` - serialize integers as "Unsigned 64-bit integers" - "9223372036854775807u"
                            - ``f`` - serialize integers as "IEEE-754 64-bit floating-point numbers". Useful for unifying number types in your pipeline to avoid field type conflicts - "9223372036854775807.0"

?

Regards
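
As a rough illustration of how such a client-wide option might behave, here is a minimal sketch. The function `serialize_integer` is hypothetical and only mirrors the three values described in the proposed `integer_number_type` docstring. It is not the client's actual implementation.

```python
def serialize_integer(value: int, integer_number_type: str = "i") -> str:
    """Hypothetical sketch of the proposed option: format one integer field value."""
    if integer_number_type == "i":
        return f"{value}i"  # signed 64-bit integer
    if integer_number_type == "u":
        if value < 0:
            raise ValueError("unsigned integers cannot be negative")
        return f"{value}u"  # unsigned 64-bit integer
    if integer_number_type == "f":
        # Append ".0" to the decimal digits so the text matches the docstring example.
        return f"{value}.0"
    raise ValueError(f"unsupported integer_number_type: {integer_number_type!r}")
```

For example, `serialize_integer(9223372036854775807, "f")` yields the text `9223372036854775807.0`. Because the setting applies to every integer field, it cannot mix signed and unsigned fields within one client.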

Author

drcraig commented Oct 13, 2022

@bednar, thank you for the reply and for confirming that "u" for all positive integers is not a viable solution.

I don't think a setting at the client level would work, at least for my use case. We have measurements with many fields of different types, some of which are signed integers and some are unsigned, or for that matter, floats, bools, etc.

What about an optional field_types parameter, either to Point or to to_line_protocol? It could be a dictionary keyed by field names and allow you to explicitly set the type for that field. If not specified, it would use the default inferred types, as it currently does.

For Point.from_dict(), something like:

dict_structure = {
    "measurement": "h2o_feet",
    "tags": {"location": "coyote_creek"},
    "fields": {
        "water_level": 1.0,
        "some_counter": 108913123234
    },
    "field_types": {
        "some_counter": "uinteger"
    },
    "time": 1
}
point = Point.from_dict(dict_structure)

For the Point object itself, the type could be an optional parameter to field(), e.g.

def field(self, field, value, field_type=None):
    ...

Using "uinteger" feels like a clunky name (as opposed to, say, "uint"), but it's consistent with the line protocol docs.
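
A minimal sketch of how a per-field override could work in a serializer, under the assumptions of this proposal: the function `serialize_fields`, the `field_types` argument, and the type names "integer", "uinteger", and "float" are all hypothetical here. Only numeric and boolean values are handled, to keep it short.

```python
def serialize_fields(fields: dict, field_types: dict = None) -> str:
    """Hypothetical sketch: serialize the field set, honoring explicit per-field types."""
    field_types = field_types or {}
    parts = []
    for key, value in sorted(fields.items()):
        explicit = field_types.get(key)
        if explicit == "uinteger":
            parts.append(f"{key}={int(value)}u")
        elif explicit == "integer":
            parts.append(f"{key}={int(value)}i")
        elif explicit == "float":
            parts.append(f"{key}={float(value)}")
        elif explicit is not None:
            raise ValueError(f"unknown field type for {key!r}: {explicit!r}")
        # No explicit type: fall back to inferring from the Python type.
        elif isinstance(value, bool):
            parts.append(f"{key}={'true' if value else 'false'}")
        elif isinstance(value, int):
            parts.append(f"{key}={value}i")
        elif isinstance(value, float):
            parts.append(f"{key}={value}")
        else:
            raise TypeError(f"unsupported value type for {key!r}")
    return ",".join(parts)
```

With the example above, `serialize_fields({"water_level": 1.0, "some_counter": 108913123234}, {"some_counter": "uinteger"})` gives `some_counter=108913123234u,water_level=1.0`. Fields without an entry keep their inferred type.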

Contributor

bednar commented Oct 14, 2022

@drcraig thanks for the detailed info.

It sounds reasonable to make types configurable per field... we have to figure out how to update our APIs.

What do you think about something like:

dict_structure = {
    "measurement": "h2o_feet",
    "tags": {"location": "coyote_creek"},
    "fields": {
        "water_level": 1.0,
        "some_counter": 108913123234
    },
    "time": 1
}
point = Point.from_dict(dict_structure, number_types={"some_counter": "uinteger"})

?

I don't think a setting at the client level would work, at least for my use case. We have measurements with many fields of different types, some of which are signed integers and some are unsigned, or for that matter, floats, bools, etc.

It would apply only to numeric fields and would be useful if you do not control your incoming data. Something like:

[
   {
      "humidity": 62,
      "temperature": 20.88,
      "windSpeed": 7
   },
   {
      "humidity": 62.5,
      "temperature": 20.88,
      "windSpeed": 7
   }
]

The first occurrence of humidity is parsed as an int and the second as a float, which leads to a field type conflict.
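
One way to avoid this kind of conflict before writing is to promote a field to float everywhere once it appears as a float in any record. The sketch below is only an illustration of that idea. The helper `unify_number_types` is hypothetical and not part of the client.

```python
def unify_number_types(records: list) -> list:
    """Hypothetical sketch: promote ints to floats for keys that hold a float in any record."""
    # Collect the keys that appear with a float value at least once.
    float_keys = {
        key
        for record in records
        for key, value in record.items()
        if isinstance(value, float)
    }

    def promote(key, value):
        # bool is a subclass of int, so leave booleans untouched.
        if key in float_keys and isinstance(value, int) and not isinstance(value, bool):
            return float(value)
        return value

    return [{key: promote(key, value) for key, value in record.items()} for record in records]
```

Applied to the two records above, humidity becomes 62.0 and 62.5, while windSpeed stays an integer in both. This needs all records up front, so it would not help when points are serialized one at a time.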

@powersj powersj added the waiting for response waiting for response from contributor label Oct 17, 2022
Author

drcraig commented Oct 17, 2022

dict_structure = {
    "measurement": "h2o_feet",
    "tags": {"location": "coyote_creek"},
    "fields": {
        "water_level": 1.0,
        "some_counter": 108913123234
    },
    "time": 1
}
point = Point.from_dict(dict_structure, number_types={"some_counter": "uinteger"})

Yeah, that could work. It still feels a little unfortunate to have to separate it from the rest of the dict, but I could be good with it. I hadn't considered the possibility of not controlling your incoming data. In my particular case, the incoming data is in a different format, and we have to construct the dict and populate it with the parsed incoming data. I don't have any sense of which is the more common situation, controlling the dict or not, so I'd defer to your expertise there.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Oct 17, 2022
@bednar bednar added this to the 1.35.0 milestone Dec 12, 2022