Writing null values via HTTP API fails #53

Closed
auxesis opened this issue Nov 14, 2013 · 9 comments

@auxesis
Contributor

auxesis commented Nov 14, 2013

According to the HTTP API docs, it is possible to write null values into columns:

As you can see you can write to multiple time series names in a single POST. You can also write multiple points to each series. The values in points must have the same index as their respective column names in columns. However, not all points need to have values for each column, nulls are ok.

I have observed that when you do this, you get an Unknown type <nil> response from the HTTP API.

Here is some test code to reproduce:

set -e

echo "Delete database"
curl -X DELETE 'http://localhost:8086/db/no_nulls?u=root&p=root' \
  -w "%{http_code}\n"

echo "Create database"
curl -X POST 'http://localhost:8086/db?u=root&p=root' \
  -d '{"name": "no_nulls"}' \
  -w "%{http_code}\n"

echo "Create user"
curl -X POST 'http://localhost:8086/db/no_nulls/users?u=root&p=root' \
  -d '{"username": "foo", "password": "bar"}' \
  -w "%{http_code}\n"

echo "Set privileges"
curl -X POST 'http://localhost:8086/db/no_nulls/users/foo?u=root&p=root' \
  -d '{"admin": true}' \
  -w "%{http_code}\n"

echo "Write data"
curl -X POST 'http://localhost:8086/db/no_nulls/series?u=foo&p=bar&time_precision=s' \
  -d '[
  {
    "name": "response_times",
    "columns": ["time", "value"],
    "points": [
      [1382819388, 234.3],
      [1382819389, 120.1],
      [1382819380, 340.9]
    ]
  }
]' \
 -w "%{http_code}\n"

echo "Write null data"
curl -X POST 'http://localhost:8086/db/no_nulls/series?u=foo&p=bar&time_precision=s' -d \
'[
  {
    "name": "response_times",
    "columns": ["time", "value"],
    "points": [
      [1382819388, 234.3],
      [1382819389, null],
      [1382819380, 340.9]
    ]
  }
]' \
-w "%{http_code}\n"

And this is the output I currently see when running that script:

Delete database
204
Create database
201
Create user
200
Set privileges
200
Write data
200
Write null data
Unknown type <nil>400

Is this intended behaviour, or should I actually be able to write null values?

@pauldix
Member

pauldix commented Nov 14, 2013

You should be able to write null values. Writing a null should also be a way to remove a previously existing column value if you specify a timestamp and sequence_number, since writes with those fields specified are either inserts (if the point doesn't exist yet) or updates. If it's an update, nulls should clear out anything that might have been there before. Thanks for the code to reproduce!
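
To make the update case concrete, here is a minimal sketch of what such a payload could look like. It assumes the point is addressed through columns named "time" and "sequence_number" in the same JSON format as the reproduction script; the helper name `clear_column_payload` is hypothetical, and whether the server actually clears the value depends on this issue being fixed.

```python
import json


def clear_column_payload(series, time, sequence_number, column):
    """Build a write body that sets one column of an existing point to null.

    Assumption: the point is identified by its "time" and "sequence_number"
    columns, as described in the comment above.
    """
    entry = {
        "name": series,
        "columns": ["time", "sequence_number", column],
        "points": [[time, sequence_number, None]],  # None serialises to JSON null
    }
    return json.dumps([entry])


if __name__ == "__main__":
    print(clear_column_payload("response_times", 1382819389, 1, "value"))
```

The resulting string could be passed to the same `curl -d` call used in the script above.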

@auxesis
Contributor Author

auxesis commented Nov 14, 2013

No worries at all!

To give you some background on what I'm trying to achieve here, I'm currently extending collectd's write_http plugin to support writing data to InfluxDB.

The format of the data I'm currently writing looks like this:

[
  {
    "name": "memory.memory.free",
    "columns": [
      "host",
      "plugin",
      "plugin_instance",
      "type",
      "type_instance",
      "value",
      "time"
    ],
    "points": [
      [
        "influxdb-01.example.org",
        "memory",
        null,
        "memory",
        "free",
        0,
        1384395663179
      ]
    ]
  }
]

I'm writing null values because the plugin_instance and type_instance fields are optional in collectd's data model.

The "name" field is a concatenation of plugin + plugin_instance + type + type_instance, with the missing members stripped out.
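
As a rough illustration of that naming rule, the sketch below joins the four identifier parts and skips any that are missing. The "." separator is inferred from the "memory.memory.free" example above, and the function name `series_name` is made up for this sketch; the real plugin is written in C.

```python
def series_name(plugin, plugin_instance, type_, type_instance, sep="."):
    """Join the collectd identifier parts, skipping missing (None or empty) ones."""
    parts = [plugin, plugin_instance, type_, type_instance]
    return sep.join(p for p in parts if p)


if __name__ == "__main__":
    # plugin_instance is absent, so it is left out of the name
    print(series_name("memory", None, "memory", "free"))  # memory.memory.free
```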

Do you see anything particularly wrong with this approach?

The only immediate issue I see is the write performance. write_http will buffer up writes, so you end up with ~20 time series per HTTP POST. The caveat is that each of those time series only has one row of points.

It should be possible to batch up multiple rows of points to make them more efficient, but it would probably require building a brand new plugin, as I would have to do some pretty massive reworking of the write_http plugin to support all the different formats.
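
A sketch of what that batching could look like on the client side, in Python rather than the plugin's C. It merges entries that share a series name into a single entry with several rows, and fills columns a row does not have with null. The helper name `batch_series` is hypothetical, and the approach assumes the server accepts null values in multi-row writes, which is exactly what this issue is about.

```python
def batch_series(entries):
    """Merge entries that share a series name into one entry with many rows."""
    merged = {}
    for entry in entries:
        target = merged.setdefault(
            entry["name"],
            {"name": entry["name"], "columns": [], "points": []},
        )
        # Add columns not seen before and pad rows already collected with None.
        for col in entry["columns"]:
            if col not in target["columns"]:
                target["columns"].append(col)
                for row in target["points"]:
                    row.append(None)
        # Re-order each incoming row to match the merged column order.
        for point in entry["points"]:
            values = dict(zip(entry["columns"], point))
            target["points"].append([values.get(c) for c in target["columns"]])
    return list(merged.values())


if __name__ == "__main__":
    single_rows = [
        {"name": "memory.memory.free", "columns": ["time", "value"],
         "points": [[1384395663179, 0]]},
        {"name": "memory.memory.free", "columns": ["time", "value", "host"],
         "points": [[1384395673179, 5, "influxdb-01.example.org"]]},
    ]
    print(batch_series(single_rows))
```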

For a first-pass implementation though, I think the current approach is OK.

@pauldix
Member

pauldix commented Nov 14, 2013

If you're only writing a single point per time series, you shouldn't be
including the null columns since it's just extraneous bytes over the wire
that we'd end up dropping. However, writing null values should be
supported. This is useful in two cases: first, if you're writing multiple
points per time series and some of them don't have those values. Second, if
you're updating a point (so you've included the time and sequence_number), a
null value indicates that we should delete whatever value was there for
that column.
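
For the single-point case, a client could strip the null columns before sending. This is a minimal sketch, not part of any existing plugin; the helper name `drop_null_columns` is invented here, and multi-row entries are deliberately left untouched because their nulls keep the rows aligned with the column list.

```python
def drop_null_columns(entry):
    """For a single-point entry, drop every column whose value is None."""
    if len(entry["points"]) != 1:
        return entry  # multi-row entries may need nulls to keep rows aligned
    point = entry["points"][0]
    # Compare against None explicitly so that 0 and "" are kept.
    kept = [(c, v) for c, v in zip(entry["columns"], point) if v is not None]
    return {
        "name": entry["name"],
        "columns": [c for c, _ in kept],
        "points": [[v for _, v in kept]],
    }


if __name__ == "__main__":
    entry = {
        "name": "memory.memory.free",
        "columns": ["host", "plugin", "plugin_instance", "type",
                    "type_instance", "value", "time"],
        "points": [["influxdb-01.example.org", "memory", None, "memory",
                    "free", 0, 1384395663179]],
    }
    print(drop_null_columns(entry))
```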

@auxesis
Contributor Author

auxesis commented Nov 15, 2013

If you're only writing a single point per time series, you shouldn't be
including the null columns since it's just extraneous bytes over the wire
that we'd end up dropping.

This is a revelation to me. I didn't realise you could just omit columns when you have null values. Thanks for the tip! 🍰

One edge case that just came to mind: what happens when you write a time series the first time with a column omitted, but on subsequent writes you include it?

My guess would be that the columns initially omitted would continue to be omitted, is that right?

@jvshahid
Contributor

No, the column will be null only for the initial points that were written without it.

@obeleh

obeleh commented Nov 17, 2014

I've tried inserting null values, but I don't see them when I run a query. What am I doing wrong?

curl -X POST 'http://localhost:8086/db/no_nulls/series?u=foo&p=bar&time_precision=s' -d \
'[
  {
    "name": "response_times",
    "columns": ["time", "value"],
    "points": [
      [1382819388, 234.3],
      [1382819389, null],
      [1382819380, 340.9]
    ]
  }
]' \
-w "%{http_code}\n"

The query I expected to work:

curl -G 'http://localhost:8086/db/no_nulls/series?u=foo&p=bar&time_precision=s' --data-urlencode "q=select * from response_times where value is null" 

syntax error, unexpected SIMPLE_NAME, expecting $end

What I also tried:

curl -G 'http://localhost:8086/db/no_nulls/series?u=foo&p=bar&time_precision=s' --data-urlencode "q=select * from response_times where value = null" 

Error while filtering points: Cannot find column null

Let's just query without a where clause:

curl -G 'http://localhost:8086/db/no_nulls/series?u=foo&p=bar&time_precision=s' --data-urlencode "q=select * from response_times" 

[]

@obeleh

obeleh commented Nov 17, 2014

Also tried this:

curl -X POST 'http://localhost:8086/db/no_nulls/series?u=foo&p=bar&time_precision=s' -d \
'[
  {
    "name": "response_times",
    "columns": ["time", "value"],
    "points": [
      [1382819388, 234.3],
      [1382819389],
      [1382819380, 340.9]
    ]
  }
]' \
-w "%{http_code}\n"

invalid payload400
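
The docs quoted at the top of this issue say the values in points must line up with the column names, so a row with fewer values than columns is a plausible cause of the "invalid payload" response. Under that assumption, here is a sketch of a client-side check that pads short rows with explicit nulls before sending. The helper name `pad_points` is hypothetical, and padding only fixes the shape of the payload; whether the null is then stored and queryable is still the open question in this thread.

```python
import json


def pad_points(entry):
    """Return a copy of entry whose rows all have one value per column."""
    width = len(entry["columns"])
    padded = []
    for point in entry["points"]:
        if len(point) > width:
            raise ValueError(f"point {point!r} has more values than columns")
        # Missing trailing values become None, which serialises to JSON null.
        padded.append(list(point) + [None] * (width - len(point)))
    return {**entry, "points": padded}


if __name__ == "__main__":
    entry = {
        "name": "response_times",
        "columns": ["time", "value"],
        "points": [[1382819388, 234.3], [1382819389], [1382819380, 340.9]],
    }
    print(json.dumps([pad_points(entry)]))
```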

@dandv
Contributor

dandv commented Sep 26, 2016

@mitar

mitar commented Sep 26, 2016

;-)
