
[bug] Inconsistent data type (int/float) within single field/column in results #8707

Closed
bbczeuz opened this issue Aug 16, 2017 · 5 comments

@bbczeuz

bbczeuz commented Aug 16, 2017

Bug report

System info:

  • RHEL7, InfluxDB 1.2.4

Steps to reproduce:

  1. Write some data to the DB using Telegraf's 'cpu' input plugin
  2. Query this data
  3. In the results, the data type of a single field (column) varies from row to row

Expected behavior:
One field shall always have the same data type

Actual behavior:
If a float field's value is zero, it's returned as '0' instead of '0.0'. Re-importing this data into InfluxDB leads to a field type conflict, e.g. 400: {"error":"partial write: field type conflict: input field \"usage_user\" on measurement \"cpu\" is type integer, already exists as type float dropped=1"}

  • Test SELECT usage_user::integer FROM telegraf.raw.cpu: all results arrive as integers
  • Test SELECT usage_user::float FROM telegraf.raw.cpu: some results arrive as ints, some as floats
  • Test SELECT usage_user + 0.0 FROM telegraf.raw.cpu: some results arrive as ints, some as floats
  • Test SELECT usage_user + 0.5 FROM telegraf.raw.cpu: all results arrive as floats
  • Test SELECT usage_user + 1.0 FROM telegraf.raw.cpu: some results arrive as ints, some as floats
  • At least Python's JSON package distinguishes between an integer zero and a float zero: aa=json.loads('{"bla":0.0, "ble":0}')
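The last point can be checked with Python's standard json module. A minimal sketch reusing the payload from the report:

```python
import json

# Same payload as in the report: a float zero and an integer zero.
aa = json.loads('{"bla":0.0, "ble":0}')

print(type(aa["bla"]).__name__)  # float
print(type(aa["ble"]).__name__)  # int
print(aa["bla"] == aa["ble"])    # True: equal values, different types
```

The two zeros compare equal but come back as different Python types, so a client that maps Python types onto InfluxDB field types will write one as a float and the other as an integer.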

Additional info:

curl -G 'http://localhost:8086/query?pretty=true' --data-urlencode "db=telegraf" --data-urlencode "q=SHOW FIELD KEYS FROM cpu"                             
{
    "results": [
        {
            "statement_id": 0,
            "series": [
                {
                    "name": "cpu",
                    "columns": [
                        "fieldKey",
                        "fieldType"
                    ],
                    "values": [
                        [
                            "usage_guest",
                            "float"
                        ],
                        [
                            "usage_guest_nice",
                            "float"
                        ],
                        [
                            "usage_idle",
                            "float"
                        ],
                        [
                            "usage_iowait",
                            "float"
                        ],
                        [
                            "usage_irq",
                            "float"
                        ],
                        [
                            "usage_nice",
                            "float"
                        ],
                        [
                            "usage_softirq",
                            "float"
                        ],
                        [
                            "usage_steal",
                            "float"
                        ],
                        [
                            "usage_system",
                            "float"
                        ],
                        [
                            "usage_user",
                            "float"
                        ]
                    ]
                }
            ]
        }
    ]
}

curl -G 'http://localhost:8086/query?pretty=true' --data-urlencode "db=telegraf" --data-urlencode "q=SELECT usage_user + 0.0 FROM telegraf.raw.cpu WHERE \"cpu\"='cpu3' GROUP BY * ORDER BY time DESC LIMIT 3" 
{
    "results": [
        {
            "statement_id": 0,
            "series": [
                {
                    "name": "cpu",
                    "tags": {
                        "cpu": "cpu3",
                        "host": "salt-probe"
                    },
                    "columns": [
                        "time",
                        "usage_user"
                    ],
                    "values": [
                        [
                            "2017-08-16T11:38:00Z",
                            0.6018054162312702
                        ],
                        [
                            "2017-08-16T11:37:50Z",
                            0
                        ],
                        [
                            "2017-08-16T11:37:40Z",
                            0.799200799217918
                        ]
                    ]
                }
            ]
        }
    ]
}
@e-dard
Contributor

e-dard commented Aug 17, 2017

@bbczeuz the output is being emitted as expected. All values you're seeing are InfluxDB float64 types, not integers. As far as I'm aware (certainly in the case of Go's and JS's), JSON libraries tend to emit natural numbers without any significand. So you would see 1.1 for the result of 11.0/10.0, but 1 for the result of 2.0/2.0.

e-dard closed this as completed Aug 17, 2017
@bbczeuz
Author

bbczeuz commented Aug 18, 2017

Why close the issue? InfluxDB cannot ingest its own query results.
The only solution I see:

  • If a field is of type float64, always return the fractional part, even if it is zero. Response size cannot be an argument, as the results are JSON anyway. Without this, a receiver has no way of knowing the field types.

I considered a more flexible input parser but didn't find a solution: the field type is defined the first time data arrives. If a field's first value was '0', the field will be interpreted as int, and subsequent writes to this field will fail if the data is not int.

We could send the results of 'SHOW FIELD KEYS' when the sender starts up, but that's ugly: we would need to send dummy data, then delete it again. In addition, this is very inflexible, as the sender has to keep track of which schemas exist in its own and the receiver's DB (the receiver cannot send requests to the sender).
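For reference, InfluxDB's line protocol carries the type in the value's syntax: a bare number such as 0 or 0.0 is stored as a float, while an integer needs a trailing i (0i). A writer that already knows each field's type (for example from SHOW FIELD KEYS) can therefore always emit the matching form. A minimal sketch, assuming the type names that SHOW FIELD KEYS reports; format_field_value is a hypothetical helper, not part of any InfluxDB client library:

```python
def format_field_value(value, field_type):
    """Render a field value for line protocol using an explicit type.

    Hypothetical helper: field_type is assumed to be one of the names
    reported by SHOW FIELD KEYS ("float", "integer", "string", "boolean").
    """
    if field_type == "float":
        # No "i" suffix means float; repr() keeps the fraction, so 0 -> "0.0".
        return repr(float(value))
    if field_type == "integer":
        # Integers need an explicit trailing "i" in line protocol.
        return "%di" % int(value)
    if field_type == "boolean":
        return "true" if value else "false"
    if field_type == "string":
        # Escape backslashes and double quotes inside string field values.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return '"%s"' % escaped
    raise ValueError("unknown field type: %r" % field_type)


# A zero read back from a float field is written as a float again.
print("cpu,cpu=cpu3 usage_user=" + format_field_value(0, "float"))
# cpu,cpu=cpu3 usage_user=0.0
```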

@gusutabopb
Copy link

gusutabopb commented Sep 21, 2017

I have just run into the same issue in one of my applications.

InfluxDB's JSON output shows floats as integers, and that is a big issue, IMHO.

As @bbczeuz said, data fetched from InfluxDB can't be written back to the same measurement, as that causes field type conflicts and results in an HTTP 400 error.

Like @bbczeuz, my current workaround is to use SHOW FIELD KEYS to check the type of each column and confirm whether what InfluxDB outputs as "integers" are really integers or integer-looking floats. I then convert types in the client application before feeding any data back into InfluxDB.

Here's a sample of the JSON I am getting:

{'results': [{'series': [{'columns': ['time',
      'delay',
      'status',
      'status_float'],
     'name': 'health',
     'values': [[1505895981094785000, 0.27442002, 'NORMAL', 5],
      [1505895984157558000, 0.05830979, 'NORMAL', 5],
      [1505895987221781000, 0.058213, 'NORMAL', 5],
      [1505895990281719000, 0.05700994, 'NORMAL', 5],
      [1505895993341101000, 0.05793214, 'NORMAL', 5]]}],
   'statement_id': 0}]}

And here's what I believe I should be getting:

{'results': [{'series': [{'columns': ['time',
      'delay',
      'status',
      'status_float'],
     'name': 'health',
     'values': [[1505895981094785000, 0.27442002, 'NORMAL', 5.0],
      [1505895984157558000, 0.05830979, 'NORMAL', 5.0],
      [1505895987221781000, 0.058213, 'NORMAL', 5.0],
      [1505895990281719000, 0.05700994, 'NORMAL', 5.0],
      [1505895993341101000, 0.05793214, 'NORMAL', 5.0]]}],
   'statement_id': 0}]}

And this is what SHOW FIELD KEYS gives me:

fieldKey     fieldType
--------     ---------
delay        float
status       string
status_float float
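The workaround described in this comment (look up field types with SHOW FIELD KEYS, then convert client-side) could be sketched as follows, using the sample data above. The helper names are hypothetical, the SHOW FIELD KEYS table is rewritten in the /query JSON layout, and only the first two result rows are included:

```python
def field_types_by_measurement(show_field_keys_response):
    """Map measurement -> {fieldKey: fieldType} from a SHOW FIELD KEYS response."""
    types = {}
    for result in show_field_keys_response["results"]:
        for series in result.get("series", []):
            fields = types.setdefault(series["name"], {})
            for field_key, field_type in series["values"]:
                fields[field_key] = field_type
    return types


def coerce_float_columns(series, types):
    """Cast integer-looking values back to float in columns typed as float."""
    field_types = types.get(series["name"], {})
    float_columns = [i for i, column in enumerate(series["columns"])
                     if field_types.get(column) == "float"]
    for row in series["values"]:
        for i in float_columns:
            if row[i] is not None:  # empty cells come back as null
                row[i] = float(row[i])
    return series


# Field types from the SHOW FIELD KEYS output above, in /query JSON form.
show_keys = {"results": [{"statement_id": 0, "series": [{
    "name": "health",
    "columns": ["fieldKey", "fieldType"],
    "values": [["delay", "float"], ["status", "string"], ["status_float", "float"]],
}]}]}

# First two rows of the query result above.
series = {
    "name": "health",
    "columns": ["time", "delay", "status", "status_float"],
    "values": [[1505895981094785000, 0.27442002, "NORMAL", 5],
               [1505895984157558000, 0.05830979, "NORMAL", 5]],
}

types = field_types_by_measurement(show_keys)
coerce_float_columns(series, types)
print(series["values"][0])  # [1505895981094785000, 0.27442002, 'NORMAL', 5.0]
```

Keying the types by measurement matters because SHOW FIELD KEYS reports types per measurement, and the same field key could in principle have different types in different measurements.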

@e-dard While you are right about the behavior of JavaScript JSON libraries (I haven't checked Go), that behavior is problematic nevertheless. In Python, the default JSON library does include the decimal part when serializing integer-valued floats:

>>> json.dumps([1.0, 2.0, 3.0, 1, 2, 3])
'[1.0, 2.0, 3.0, 1, 2, 3]'

Converting that list back and forth between JSON and Python causes no type inconsistencies, so I don't see why this should be acceptable in a database.
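The round-trip claim can be checked with the same list and Python's standard json module:

```python
import json

values = [1.0, 2.0, 3.0, 1, 2, 3]
encoded = json.dumps(values)
print(encoded)  # [1.0, 2.0, 3.0, 1, 2, 3]

decoded = json.loads(encoded)
# Each element keeps its original Python type after the round trip.
print([type(v).__name__ for v in decoded])
# ['float', 'float', 'float', 'int', 'int', 'int']
```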

@e-dard
Contributor

e-dard commented Sep 21, 2017

@gusutabopb @bbczeuz sorry, I should have said that we have a proposal in place to remedy this by moving to an alternative output format to JSON: #8330

I think the only current solution is, as you explained @gusutabopb, to check the type first, or to have some client-side error handling that detects the wrong type and inserts accordingly.
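The client-side error handling option could look roughly like this: recognize the field type conflict in the error message, cast the offending field to the existing type, and resend the dropped points. A minimal sketch of the detection step only, assuming the message wording shown in the original report; CONFLICT_RE and parse_type_conflict are hypothetical names, and matching on error text is brittle across InfluxDB versions:

```python
import re

# Pattern for the type-conflict message quoted in this issue.
# Assumption: the wording matches the error shown in the report; it may
# differ in other versions, so a non-match should be treated as "unknown".
CONFLICT_RE = re.compile(
    r'field type conflict: input field "(?P<field>[^"]+)" '
    r'on measurement "(?P<measurement>[^"]+)" '
    r'is type (?P<sent>\w+), already exists as type (?P<existing>\w+)'
)


def parse_type_conflict(message):
    """Return (measurement, field, existing_type) for a type conflict, else None.

    Hypothetical helper: message is the decoded "error" string from the
    HTTP 400 response body.
    """
    match = CONFLICT_RE.search(message)
    if match is None:
        return None
    return match.group("measurement"), match.group("field"), match.group("existing")


msg = ('partial write: field type conflict: input field "usage_user" on '
       'measurement "cpu" is type integer, already exists as type float dropped=1')
print(parse_type_conflict(msg))  # ('cpu', 'usage_user', 'float')
```

A caller would then cast the named field (e.g. with float()) in the rejected points and retry; that retry loop is left out here.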

@lovasoa

lovasoa commented Jul 10, 2019

influxdata/influxdb-python#665 is caused by this issue.

As far as I'm aware (certainly in the case of Go's and JS's), JSON libraries tend to emit natural numbers without any significand.

This is not the case for Python, for instance:

In [1]: import json                                                                                         

In [2]: json.dumps(1)                                                                                       
Out[2]: '1'

In [3]: json.dumps(1.0)                                                                                     
Out[3]: '1.0'

@e-dard: This behavior causes many subtle bugs that are hard to debug. The workaround doubles the number of requests to the server. Fixing the bug would have no impact on JSON parsers that do not distinguish between 1.0 and 1, such as JavaScript's JSON.parse. What would it take to fix this bug? Would you accept a pull request fixing it?
