Client Improvements Plan

liec0dez edited this page Jan 26, 2019 · 12 revisions

This is a list of improvements that could form the next major release of the InfluxDB Java client.

1. Serialization of data points to/from Java objects [Not Done]

1.1. Nice deserialization API, efficiency

Representing data as objects is a best practice in the Java world. The current library can map query results to Java beans:

@Measurement(name = "cpu")
public class Cpu {
    @Column(name = "time")
    private Instant time;
    @Column(name = "host", tag = true)
    private String hostname;
    @Column(name = "region", tag = true)
    private String region;
    @Column(name = "idle")
    private Double idle;
    @Column(name = "happydevop")
    private Boolean happydevop;
    @Column(name = "uptimesecs")
    private Long uptimeSecs;
    // getters (and setters if you need)
}

QueryResult queryResult = influxDB.query(new Query("SELECT * FROM cpu", dbName));

InfluxDBResultMapper resultMapper = new InfluxDBResultMapper(); // thread-safe - can be reused
List<Cpu> cpuList = resultMapper.toPOJO(queryResult, Cpu.class);
  • The API is clumsy: a separate InfluxDBResultMapper should not be needed.
  • It is inefficient: the whole result is always instantiated as Java beans at once.
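One way to address the efficiency point is to map rows lazily, as they are consumed, instead of materializing the whole list. The following is a minimal sketch using only the JDK. `LazyRowMapper`, `CpuRow`, and the list-of-lists row model are hypothetical stand-ins for illustration, not part of influxdb-java.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Stream;

// Hypothetical helper: converts raw result rows to objects one at a time.
final class LazyRowMapper {
    private LazyRowMapper() { }

    // Returns a lazy stream; a row is converted only when the consumer pulls it.
    static <T> Stream<T> stream(List<List<Object>> rows, Function<List<Object>, T> rowMapper) {
        return rows.stream().map(rowMapper);
    }
}

// Simplified stand-in for the annotated Cpu bean.
final class CpuRow {
    final String host;
    final double idle;

    CpuRow(String host, double idle) {
        this.host = host;
        this.idle = idle;
    }

    // Assumes the column order [host, idle].
    static CpuRow fromRow(List<Object> row) {
        return new CpuRow((String) row.get(0), ((Number) row.get(1)).doubleValue());
    }
}
```

A caller that only needs the first few rows (for example via `findFirst()` or `limit(n)`) then pays only for the rows it actually reads.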

Solution

See here

1.2. Missing serialization of Java objects to data points

To write a data point you need to serialize it as follows:

Point point = Point.measurement("disk")
        .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
        .addField("used", 80L)
        .addField("free", 1L)
        .build();
influxDB.write(dbName, rpName, point);

There is no way to use the previously defined and annotated Cpu class to write a data point.

Solution

@Measurement(name = "cpu")
public class Cpu {

    public Cpu(Instant time, String hostname, String region, Double idle) {
        this.time = time;
        this.hostname = hostname;
        this.region = region;
        this.idle = idle;
    }

    ....
}
....

// Illustrative values; the arguments must match the constructor above.
Cpu cpu = new Cpu(Instant.now(), "serverA", "us_west", 80.0);
influxDB.write(dbName, rpName, cpu);

See here
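To make the idea concrete, here is a rough sketch of how an annotated object could be turned into a line-protocol string via reflection. It is self-contained: the `Measurement` and `Column` annotations below are local stand-ins that mirror the ones used above, and `PojoLineWriter` is a hypothetical name. Escaping of special characters is left out for brevity.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Local stand-ins for the annotations, so the sketch compiles on its own.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface Measurement { String name(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface Column { String name(); boolean tag() default false; }

@Measurement(name = "cpu")
final class Cpu {
    @Column(name = "time") private final Instant time;
    @Column(name = "host", tag = true) private final String hostname;
    @Column(name = "idle") private final Double idle;

    Cpu(Instant time, String hostname, Double idle) {
        this.time = time;
        this.hostname = hostname;
        this.idle = idle;
    }
}

// Hypothetical serializer: annotated object -> one line of line protocol.
final class PojoLineWriter {
    static String toLine(Object pojo) {
        Measurement m = pojo.getClass().getAnnotation(Measurement.class);
        if (m == null) throw new IllegalArgumentException("missing @Measurement");
        Map<String, String> tags = new TreeMap<>();   // sorted for stable output
        Map<String, String> fields = new TreeMap<>();
        Instant time = null;
        for (Field f : pojo.getClass().getDeclaredFields()) {
            Column c = f.getAnnotation(Column.class);
            if (c == null) continue;
            Object v;
            try {
                f.setAccessible(true);
                v = f.get(pojo);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (v == null) continue;
            if ("time".equals(c.name()) && v instanceof Instant) time = (Instant) v;
            else if (c.tag()) tags.put(c.name(), v.toString());
            else fields.put(c.name(), formatField(v));
        }
        if (fields.isEmpty()) throw new IllegalArgumentException("at least one field is required");
        StringBuilder sb = new StringBuilder(m.name());
        tags.forEach((k, v) -> sb.append(',').append(k).append('=').append(v));
        sb.append(' ').append(fields.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(",")));
        if (time != null) {
            // Timestamp in nanoseconds since the epoch.
            sb.append(' ').append(time.getEpochSecond() * 1_000_000_000L + time.getNano());
        }
        return sb.toString();
    }

    private static String formatField(Object v) {
        if (v instanceof Long || v instanceof Integer) return v + "i"; // integer field
        if (v instanceof String) return "\"" + v + "\"";               // string field
        return v.toString();                                           // Double, Boolean
    }
}
```

A write method accepting such an object could then delegate to a serializer like this one instead of requiring the caller to build a Point by hand.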

1.3. Schema validation

Point point = Point.measurement("disk")
        .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
        .addField("used", 80)
        .addField("free", 1.0)
        .build();
  • The current approach to writing data points requires the user to be careful: the first write into a measurement defines the data types of its fields. In the example above, the 'used' field should have been a float, but submitted as is it becomes an integer. The measurement then has to be dropped and recreated to fix this.

  • A measurement usually has a defined tag and field structure (some tags or fields are required, and so on). With this approach it is easy to forget a tag or field.

  • It is easy to make a typo in a field, tag, or even measurement name.

Solution

Allow the user to define a schema for a measurement and require every write to comply with it. The JavaScript client has solved this:

https://node-influx.github.io/class/src/index.js%7EInfluxDB.html
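As an illustration of what such a schema could look like on the Java side, the sketch below checks a write against declared field types and required tags before anything is sent. `MeasurementSchema` and its methods are hypothetical names. For simplicity every declared tag is treated as required, and fields are checked only when present.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical client-side schema for one measurement.
final class MeasurementSchema {
    private final String measurement;
    private final Map<String, Class<?>> fieldTypes = new HashMap<>();
    private final Set<String> requiredTags = new HashSet<>();

    MeasurementSchema(String measurement) {
        this.measurement = measurement;
    }

    MeasurementSchema field(String name, Class<?> type) {
        fieldTypes.put(name, type);
        return this;
    }

    MeasurementSchema tag(String name) {
        requiredTags.add(name);
        return this;
    }

    // Throws IllegalArgumentException on the first violation found.
    void validate(Map<String, String> tags, Map<String, Object> fields) {
        for (String tag : requiredTags) {
            if (!tags.containsKey(tag)) {
                throw new IllegalArgumentException(measurement + ": missing tag '" + tag + "'");
            }
        }
        for (String tag : tags.keySet()) {
            if (!requiredTags.contains(tag)) { // catches typos in tag names
                throw new IllegalArgumentException(measurement + ": unknown tag '" + tag + "'");
            }
        }
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            Class<?> expected = fieldTypes.get(e.getKey());
            if (expected == null) { // catches typos in field names
                throw new IllegalArgumentException(measurement + ": unknown field '" + e.getKey() + "'");
            }
            if (!expected.isInstance(e.getValue())) { // catches 80 where 80.0 was meant
                throw new IllegalArgumentException(measurement + ": field '" + e.getKey()
                        + "' must be " + expected.getSimpleName());
            }
        }
    }
}
```

With a schema such as `new MeasurementSchema("disk").tag("host").field("used", Double.class)`, writing the integer 80 into 'used' would be rejected on the client instead of silently fixing the field type on the server.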

2. Asynchronous API [Done]

There is a request to provide an async API for the client.

https://github.com/influxdata/influxdb-java/issues/386

This is also related to the error handling problem, since errors need to be signaled correctly in the async scenario. We don't want to introduce a new API that would have to change once the problem below is fixed.

Currently, asynchronous processing is available only for certain use cases. For example, when writing data points you have to explicitly enable batching to get asynchronous behavior.

A couple of issues were reported where users were confused and did not realize that the API can already be used asynchronously.

Solution

Provide asynchronous (callback-based) methods for the missing use cases.
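A callback-based method could have roughly the following shape. This is only a sketch under stated assumptions: `AsyncWriter` and `LineTransport` are hypothetical names, and `LineTransport` stands in for the blocking HTTP call the client would make.

```java
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical stand-in for the blocking HTTP write.
interface LineTransport {
    void send(String lineProtocol) throws Exception;
}

// Hypothetical async facade: runs the write on an executor and reports the outcome.
final class AsyncWriter {
    private final LineTransport transport;
    private final Executor executor;

    AsyncWriter(LineTransport transport, Executor executor) {
        this.transport = transport;
        this.executor = executor;
    }

    // Exactly one of the two callbacks is invoked per write.
    void write(String lineProtocol, Runnable onSuccess, Consumer<Throwable> onFailure) {
        executor.execute(() -> {
            try {
                transport.send(lineProtocol);
            } catch (Exception e) {
                onFailure.accept(e);
                return;
            }
            onSuccess.run();
        });
    }
}
```

Passing the executor in keeps the threading policy with the caller: a thread pool in production, or a direct executor such as `Runnable::run` in tests.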

3. Handling of errors [Partially done]

The error handling of the library is very basic: it only detects errors based on a non-2xx HTTP status code returned by InfluxDB. The error information contained in the response is not analyzed any further.

3.1. Client vs. Server Error distinction [Done]

There is already a request to distinguish between errors caused by the client (HTTP status 4xx) and server failures (HTTP status 5xx):

https://github.com/influxdata/influxdb-java/issues/375

Solution

As a solution, we would implement a hierarchy of exception objects (for example InfluxDBClientException and InfluxDBServerException, inheriting from the existing InfluxDBException).

Also, for 4xx status codes we would parse the JSON delivered in the response and provide the error message sent by InfluxDB as the InfluxDBException message property.
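The hierarchy could be sketched as follows. The three exception classes are declared locally so the example stands alone, and `ErrorMapper` is a hypothetical name. The sketch assumes the response body carries the message in an "error" JSON field and extracts it with a regular expression only to stay dependency-free; a real implementation would use a JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Local stand-ins for the proposed hierarchy.
class InfluxDBException extends RuntimeException {
    InfluxDBException(String message) { super(message); }
}

final class InfluxDBClientException extends InfluxDBException {
    InfluxDBClientException(String message) { super(message); }
}

final class InfluxDBServerException extends InfluxDBException {
    InfluxDBServerException(String message) { super(message); }
}

// Hypothetical mapper from an HTTP response to the matching exception type.
final class ErrorMapper {
    // Assumption: the body looks like {"error":"..."}; escape sequences are not decoded.
    private static final Pattern ERROR_FIELD =
            Pattern.compile("\"error\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    static InfluxDBException fromResponse(int status, String body) {
        Matcher m = ERROR_FIELD.matcher(body == null ? "" : body);
        String message = m.find() ? m.group(1) : "HTTP " + status;
        if (status >= 400 && status < 500) return new InfluxDBClientException(message);
        if (status >= 500 && status < 600) return new InfluxDBServerException(message);
        return new InfluxDBException(message);
    }
}
```

Callers that do not care about the distinction can keep catching InfluxDBException, which preserves backward compatibility.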

3.2. Partial writes [Not Done]

The current client does not account for partial writes. The client should receive error information only for the data points that failed to be written. This applies when batching mode is used.

Solution

As mentioned in the previous section, we would provide an additional method on the exception object that returns information about the data points that failed to be written.
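Such an exception might look like the sketch below. `PartialWriteException` and `getFailedPoints()` are hypothetical names, and points are modeled as line-protocol strings to keep the example self-contained. In the client it would more likely extend InfluxDBException and carry Point objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical exception that reports only the points that were not written.
final class PartialWriteException extends RuntimeException {
    private final List<String> failedPoints;

    PartialWriteException(String message, List<String> failedPoints) {
        super(message);
        // Defensive copy, exposed as a read-only view.
        this.failedPoints = Collections.unmodifiableList(new ArrayList<>(failedPoints));
    }

    List<String> getFailedPoints() {
        return failedPoints;
    }
}
```

A caller could then log or retry just `e.getFailedPoints()` rather than the whole batch.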

3.3. Handling of partially completed writes into a multi-node cluster [Not Done]

The problem of setting the consistency level was recently resolved:

https://github.com/influxdata/influxdb-java/pull/385

However, there is still a problem of propagating the detailed error information to the client.

3.4. Batch Processing Issues [Not Done]

There was a request to handle errors during batch writes, and it has been fulfilled. However, the solution based on the BiConsumer interface might be revised and possibly reworked so that it can carry the error information mentioned above.

Also, the user might be notified not only of errors but also when points have been successfully written into InfluxDB.

Related information:

https://github.com/influxdata/influxdb-java/pull/319

https://github.com/influxdata/influxdb-java/issues/381
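One possible shape for a reworked notification mechanism is a listener with separate success and failure callbacks. The sketch below is illustrative only: `BatchListener`, `BatchSender`, and `NotifyingBatcher` are hypothetical names, points are modeled as strings, and the batcher is deliberately single-threaded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener: notified of both outcomes, per flushed batch.
interface BatchListener {
    void onSuccess(List<String> points);
    void onFailure(List<String> points, Throwable error);
}

// Hypothetical stand-in for the call that sends one batch to the server.
interface BatchSender {
    void send(List<String> points) throws Exception;
}

// Minimal batcher; not thread-safe, for illustration only.
final class NotifyingBatcher {
    private final int batchSize;
    private final BatchSender sender;
    private final BatchListener listener;
    private final List<String> buffer = new ArrayList<>();

    NotifyingBatcher(int batchSize, BatchSender sender, BatchListener listener) {
        this.batchSize = batchSize;
        this.sender = sender;
        this.listener = listener;
    }

    void add(String point) {
        buffer.add(point);
        if (buffer.size() >= batchSize) flush();
    }

    void flush() {
        if (buffer.isEmpty()) return;
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        try {
            sender.send(batch);
        } catch (Exception e) {
            listener.onFailure(batch, e); // the listener sees exactly which points failed
            return;
        }
        listener.onSuccess(batch);
    }
}
```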

3.5. Error signaling for chunked query responses [Not Done]

The current interface doesn't force the user to handle or catch errors that occur while evaluating chunked query responses.

The documentation even shows an example where error handling is completely missing.
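One way to make error handling hard to forget is to require an error callback in the method signature. The following sketch shows the idea with hypothetical names (`ChunkSource`, `ChunkedQueries.consume`); the chunk source stands in for the streamed HTTP response.

```java
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical stand-in for a streamed response; returns null when exhausted.
interface ChunkSource<T> {
    T next() throws Exception;
}

final class ChunkedQueries {
    private ChunkedQueries() { }

    // The error callback is mandatory, so the caller cannot silently ignore failures.
    static <T> void consume(ChunkSource<T> source,
                            Consumer<T> onNext,
                            Consumer<Throwable> onError,
                            Runnable onComplete) {
        Objects.requireNonNull(onError, "onError is mandatory");
        try {
            T chunk;
            while ((chunk = source.next()) != null) {
                onNext.accept(chunk);
            }
        } catch (Exception e) {
            // Failures from the source or from onNext end the stream with an error.
            onError.accept(e);
            return;
        }
        onComplete.run();
    }
}
```

Either `onError` or `onComplete` is called exactly once, which gives the caller a clear end-of-stream signal in both cases.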

4. Cleanup of API write methods [Not Done]

All the improvements above will have an impact on the API, especially write methods. Therefore we should avoid having too many of them and we want them to behave predictably.

Still, we would keep the current API available for backward compatibility, and perhaps deprecate existing methods that are no longer necessary or provide better names for them.

We would also fix the following issue:

https://github.com/influxdata/influxdb-java/issues/378

5. Observer publishing pattern [Not Done]

Right now the user has to build data points and push them to InfluxDB using write methods, that is, by writing an imperative program. In reality, the user is often just monitoring some process and wants the resulting telemetry to be written into InfluxDB.

This monitoring logic can be relatively difficult to implement: for example, watch for changes and, if no change arrives within a certain interval, write the data anyway.

The idea is to let the user declare what they want to monitor, and to ship the imperative code that watches for data out of the box with the Java client. This would make influxdb-java even easier to use.
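The "write on change, or after a maximum interval anyway" rule can be sketched as a small class. `ChangeMonitor` is a hypothetical name and not one of the API proposals referenced in this section. The clock is passed in explicitly so the logic stays deterministic; in practice `poll` would be driven by something like a ScheduledExecutorService.

```java
import java.util.Objects;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical monitor: forwards a value when it changes or when it becomes stale.
final class ChangeMonitor<T> {
    private final Supplier<T> source;   // what to watch
    private final Consumer<T> sink;     // where to write, e.g. a wrapper around the client
    private final long maxIntervalMillis;
    private T lastValue;
    private long lastWriteMillis;
    private boolean written;

    ChangeMonitor(Supplier<T> source, Consumer<T> sink, long maxIntervalMillis) {
        this.source = source;
        this.sink = sink;
        this.maxIntervalMillis = maxIntervalMillis;
    }

    // Returns true if a value was written during this poll.
    boolean poll(long nowMillis) {
        T value = source.get();
        boolean changed = !written || !Objects.equals(value, lastValue);
        boolean stale = written && nowMillis - lastWriteMillis >= maxIntervalMillis;
        if (!changed && !stale) {
            return false;
        }
        sink.accept(value);
        lastValue = value;
        lastWriteMillis = nowMillis;
        written = true;
        return true;
    }
}
```

The user would only declare the source and the interval; the watching loop itself would ship with the client.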

For API proposals take a look here:

See here

We may also consider Spring AOP for this kind of functionality.

6. MessagePack [Not Done]

As of InfluxDB 1.4, we should support MessagePack for query result data transfers. See the issue below:

https://github.com/influxdata/influxdb-java/issues/389