Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Query with filter on an integer column. (JS has number not int and float) #754

Open
vicb opened this issue Nov 12, 2020 · 19 comments
Open
Labels
api: datastore Issues related to the googleapis/nodejs-datastore API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@vicb
Copy link
Contributor

vicb commented Nov 12, 2020

  1. Is this a client library issue or a product issue?

I guess a bug in the client library ("@google-cloud/datastore": "^6.3.0")

Environment details

  • OS: Linux
  • Node.js version: 14.15
  • npm version: 6.14.8
  • @google-cloud/datastore version: 6.3.0

Steps to reproduce

  1. Create an entity in the cloud console with an integer column, named integer_column and set its value to 200.

  2. Executes a query with a filter on that column:

    const query = datastore
      .createQuery(tableName)
      .filter('integer_column', '>', value);

    const [results] = await datastore.runQuery(query);

Expected behavior:

The correct results are returned if value is a valid numerical value, i.e. 123 or 123.456

Actual behavior

The correct results are only returned if value is 123. A float value (123.456) would return an empty result.

Workaround

Update the query to add Math.round().

    const query = datastore
      .createQuery(tableName)
      .filter('integer_column', '>', Math.round(value));

    const [results] = await datastore.runQuery(query);

Thanks!

@product-auto-label product-auto-label bot added the api: datastore Issues related to the googleapis/nodejs-datastore API. label Nov 12, 2020
@stephenplusplus
Copy link
Contributor

Thanks for opening the issue! What would be an ideal solution on our side? Running Math.floor() on the float? Throw an error, specifying only an integer is expected?

@vicb
Copy link
Contributor Author

vicb commented Nov 12, 2020

The ideal solution is that .filter('integer_column', '>', value) works whatever the (numerical) value.

I mean 10 > 5.123 == true is something everybody expects to work, whatever the language/API.

Edit: The same way, I also expect .filter('float_column', '>', 10) to work and I don't have to type .filter('float_column', '>', 10.0)

@stephenplusplus
Copy link
Contributor

The issue I'm seeing is that the API expects an integer value to compare against integers. It cannot compare floats against integers. So, if we're given a float of 123.456, we would have to lose precision to reduce it to 123. Of course, the results could never include a result where integer_column contains 123.456, so a result with 124 will come through in both cases, as that's the next possible value. What I'd be concerned about, is that we'd be adding code that could be misleading-- having the user assume it's possible to have float precision where they cannot. It may be best to continue sanitizing non-integer input in the application layer, to prepare it for the integer operation.

@vicb
Copy link
Contributor Author

vicb commented Nov 12, 2020

  1. Again everybody expects 10 > 5.123 == true. The opposite is very misleading.

  2. Thoughts:

So, if we're given a float of 123.456, we would have to lose precision to reduce it to 123

You have to be careful about the rounding here:
value > 123.001 is equivalent to value > 123 but value < 123.001 is equivalent to value < 124

While it makes sense to round for >, >=, <= and < it is reasonable to say that 123 = 123.456 == false (i.e. do not apply rounding for equality).

An other question I would have is at what level the type is stored ?
From the cloud console UI it seems like the type is store at entity level. i.e. 2 entity might have different types for the same column.
type(entityA.some_col) == integer while type(entityB.some_col) == float
Is this true ?
If it is the API should offer a way to pass a float value otherwise you would have to do 2 queries (one with int and one with float) to do something like some_col > 123.001.

I honestly never asked myself the question as I never set the type of the column explicitly. I just saved entity using the API.
I only noticed this strange behavior when doing some tests and creating entity via the console.

edit: fixed rounding for > and <

@crwilcox
Copy link
Contributor

Note: Other language clients have similar behavior.

from google.cloud import datastore

client = datastore.Client()

kind = "test"
id = "test_id"
key = client.key(kind, id)
entity = datastore.entity.Entity(key)
entity['field'] = 'field_value'
entity['integer_column'] = 200
client.put(entity)

query = client.query(kind=kind).add_filter('integer_column', '>', 123.3)
results = list(query.fetch())
print(f"Query comparing an integer and float (none): {results}")

query = client.query(kind=kind).add_filter('integer_column', '>', 123)
results = list(query.fetch())
print(f"Query comparing an integer and integer (one result): {results}")
Query comparing an integer and float (none): []
Query comparing an integer and integer (one result): [<Entity('test', 'test_id') {'field': 'field_value', 'integer_column': 200}>]

@BenWhitehead
Copy link
Contributor

Similar behavior in Java:

    @Test
    void numberSaving() {
      final Datastore ds = datastoreOptions.getService();
      final String kind = "valueTypeTesting";

      final String long_property = "long_property";
      final String double_property = "double_property";
      final String number_property = "number_property";

      KeyFactory txTesting = ds.newKeyFactory().setKind(kind);
      final Entity a = Entity.newBuilder(txTesting.newKey("A"))
          .set(long_property, 20)
          .set(double_property, 20.0)
          .set(number_property, 30.0)
          .build();
      final Entity b = Entity.newBuilder(txTesting.newKey("B"))
          .set(long_property, 10)
          .set(double_property, 15.0)
          .set(number_property, 18)
          .build();
      ds.put(a);
      ds.put(b);

      final EntityQuery qInt = Query.newEntityQueryBuilder()
          .setKind(kind)
          .setFilter(PropertyFilter.gt(number_property, 5))
          .build();
      final EntityQuery qDouble = Query.newEntityQueryBuilder()
          .setKind(kind)
          .setFilter(PropertyFilter.gt(number_property, 5.0))
          .build();

      final QueryResults<Entity> qIntRun = ds.run(qInt);
      LOGGER.info("qIntProto = {}", protoToString(getQueryProto(qInt)));
      while (qIntRun.hasNext()) {
        Entity next = qIntRun.next();
        LOGGER.debug("entityProto = {}", protoToString(entityProto(next)));
        if ("A".equals(next.getKey().getName())) {
          assertEquals(30.0, next.getDouble(number_property));
        } else if ("B".equals(next.getKey().getName())) {
          assertEquals(18, next.getLong(number_property));
        }
      }

      final QueryResults<Entity> qDoubleRun = ds.run(qDouble);
      LOGGER.info("qDoubleProto = {}", protoToString(getQueryProto(qDouble)));
      while (qDoubleRun.hasNext()) {
        Entity next = qDoubleRun.next();
        LOGGER.debug("entityProto = {}", protoToString(entityProto(next)));
        if ("A".equals(next.getKey().getName())) {
          assertEquals(30.0, next.getDouble(number_property));
        } else if ("B".equals(next.getKey().getName())) {
          assertEquals(18, next.getLong(number_property));
        }
      }
    }
qIntProto = { query { kind { name: "valueTypeTesting" } filter { property_filter { property { name: "number_property" } op: GREATER_THAN value { integer_value: 5 } } } } }
entityProto = { key { partition_id { project_id: "default-223119" } path { kind: "valueTypeTesting" name: "B" } } properties { key: "double_property" value { double_value: 15.0 } } properties { key: "long_property" value { integer_value: 10 } } properties { key: "number_property" value { integer_value: 18 } } }
entityProto = { key { partition_id { project_id: "default-223119" } path { kind: "valueTypeTesting" name: "A" } } properties { key: "double_property" value { double_value: 20.0 } } properties { key: "long_property" value { integer_value: 20 } } properties { key: "number_property" value { double_value: 30.0 } } }
qDoubleProto = { query { kind { name: "valueTypeTesting" } filter { property_filter { property { name: "number_property" } op: GREATER_THAN value { double_value: 5.0 } } } } }
entityProto = { key { partition_id { project_id: "default-223119" } path { kind: "valueTypeTesting" name: "A" } } properties { key: "double_property" value { double_value: 20.0 } } properties { key: "long_property" value { integer_value: 20 } } properties { key: "number_property" value { double_value: 30.0 } } }

@crwilcox
Copy link
Contributor

crwilcox commented Nov 12, 2020

I think the root of the problem is, at query filter creation time, the client has no knowledge of the type of the field being filtered. the type needs to be figured out from the type provided. So, as a user you can round. I don't think the client can do that without side effects.

@crwilcox crwilcox added the type: question Request for information or clarification. Not an issue. label Nov 12, 2020
@vicb
Copy link
Contributor Author

vicb commented Nov 12, 2020

Thanks for checking other clients.

I don't think the client can do that without side effects.

May be it should be done server side then ?

In the meantime it would probably be worth documenting this behavior.
I think we can agree to say that this behavior is really unexpected ?

@vicb
Copy link
Contributor Author

vicb commented Nov 13, 2020

The concern is expressed in my former message is real.

Let's init a table with [0...9] / 2:

  for (let value = 0; value < 10; value++) {
    const key = datastore.key('TEST');
    await datastore.save({
      key,
      data: {
        value: value / 2,
      }
    })
  }  

You can go to the console and you will see that 0,1,2, 3, and 4 are integer 0.5, 1.5, 2.5, 3.5, and 4.5 are floats.

Sanity check:

  const [allEnt] = await datastore.createQuery('TEST').order('value').run();
  console.log(`allEnt -> ${allEnt.map(e => e.value)}`);

  // allEnt -> 0,1,2,3,4,0.5,1.5,2.5,3.5,4.5

Query with an integer value

  const [intEnt] = await datastore.createQuery('TEST').filter('value', '>', 3).order('value').run();
  console.log(`intEnt > 3 -> ${intEnt.map(e => e.value)}`);

 // intEnt > 3 -> 4,0.5,1.5,2.5,3.5,4.5

Ouch !

  • 0.5,1.5,2.5, should not be here,
  • Notice the buggy order too !

Query with float value:

  const [floatEnt] = await datastore.createQuery('TEST').filter('value', '>', 2.5).order('value').run();
  console.log(`floatEnt > 2.5 -> ${floatEnt.map(e => e.value)}`);  

  // floatEnt > 2.5 -> 3.5,4.5

Ouch again !

  • 4 is missing

Maybe the server API is fine for typed language (I haven't checked the client) but it is definitely not for JS (and probably Python).

@stephenplusplus @crwilcox @BenWhitehead - I think we can agree to say that is behavior is bug prone and should be fixed ?

@vicb vicb changed the title [Bug] Query with filter on an integer column. [Bug] Query with filter on an integer column. (JS has number not int and float) Nov 13, 2020
@crwilcox
Copy link
Contributor

I am not sure I would consider this a bug, but more a result of us trying to make things ergonomic in libraries. Filters must specify, via proto, the value type that the filter is evaluating.

So, when you set a filter, it is converted to a proto, and the type is picked at that point. You can see this in @BenWhitehead response if you look at the qProto.

{
  query {
    kind {
      name: "valueTypeTesting"
    }
    filter {
      property_filter {
        property { name: "number_property" }
        op: GREATER_THAN
        value { double_value: 5.0 } 
      }
    }
  }
}

With that said, I understand how as a user it seems that, perhaps, the backend could recognize that an int_value: 10 is greater than double_value: 5.0

@vicb
Copy link
Contributor Author

vicb commented Nov 14, 2020

@crwilcox how is this not a bug.

JS has numbers not int or float.
If you look at my example I save some numbers and because some happen to be ints then there is no way to get back a correct result without having to make 2 queries (1 for the integer values and 1 for the floating points).

So basically if you have a product entities with a price, some prices will be integer ($10) and some will be float ($9.99). You would need two queries + some post-processing (for orders and limits) to get a correct result if you want a list of products with price < $15.

So, when you set a filter, it is converted to a proto, and the type is picked at that point. You can see this in @BenWhitehead response if you look at the qProto.

This seems to indicate that the bug should be fixed server side.
Again if you look at my examples the same code could result in entities having either an integer or a floating point field. So you do not know the type at query time - or more exactly you known that there could be 2 different types.

@crwilcox
Copy link
Contributor

@vicb I think what might be missing is that we have to account for folks using other languages, accessing the database not just from node. For instance, since JS has no separationg of int, float, we can do one of a couple things.

As best I can figure, we have three options on the client side (see later for backend discussion):

  1. The Node.js Client always converts number to int in datastore
  • This is lossy and almost certainly not going to be viable.
  1. The Node.js Client always converts number to float in datastore
  • This isn't lossy, so better than no. 1
  • The issue with this is going to be interoperability. Let's say you use a typed language, and in that typed language, you assume a field is an int, say, a quantity field. Making this decision in the Node.js client could cause breaking behavior for customers.
  1. Current Behavior: Through inference, decide if the given number is best reflected by int, or float value, within the database.
  • This isn't lossy
  • This will interopt with other languages generally, but requires the user from node to take caution, sticking to int or double for a value. I understand that this isn't a node concept.

As for backend, I agree it seems this may be addressable. A discussion could be started around allowing the backend to filter across disparate types, but I am not certain this would be straightforward, but there is no reason not to have that conversation. Though as that is no longer about the Node.js client, I would ask we continue that discussion on the appropriate issue tracker with the backend folks if possible.

If you believe I have missed an option for the client I would be happy to discuss it further, it is quite possible I am not thinking outside the box enough 😄

@vicb
Copy link
Contributor Author

vicb commented Nov 14, 2020

Hey @crwilcox,

I would add the following the client side options (1-3 below refer to the options in your message).

  1. I am not exactly what you mean here.
    If you mean remove any support for floating point from the Datastore, I don't think this is a viable option.
    If you mean converting number to int for queries (as suggested by @stephenplusplus above) then it would still not solve the floating point issue.

  2. Treating all number as floating point.
    It could work with JS because ints are encoding on 53bits only. Not sure about other dynamic langage.
    But there would still be a few problems on top of interoperability:

  • legacy database using integer fields,
  • entities created on the cloud console where the field type could be manually set,
  1. Current Behavior
    As you noted ("sticking to int or double for a value. I understand that this isn't a node concept.") and as far as I know there is no way to force a number to be a float in JS (only integer can be forced as far as I know).

  2. Maybe one solution would be to change the API to have the user specifying the type. But again it will break legacy DB that have mixed integer/float for the same field.

In the end I don't really see a way to fix that on the client side.

As for backend, I agree it seems this may be addressable. A discussion could be started around allowing the backend to filter across disparate types, but I am not certain this would be straightforward, but there is no reason not to have that conversation. Though as that is no longer about the Node.js client, I would ask we continue that discussion on the appropriate issue tracker with the backend folks if possible.

It sounds like a great idea please include me in the discussion.

Thanks !

@vicb
Copy link
Contributor Author

vicb commented Nov 14, 2020

Reading the docs, it looks like this behavior is actually documented

TL;DR Integers are sorted before FPs.

I think this makes working with FP values in JS (and probably other languages)

  1. hard - you would have to issue at least 2 queries to get all numerical values > or < to a limit (one for int, one for fp),
  2. inefficient - querying for fp_colum < value will dump all the integer values. Similarly for int_colum > value dumping all the fps.

@crwilcox
Copy link
Contributor

crwilcox commented Nov 16, 2020

Adding here after chatting a bit further. Because of the inference of the client, it is hard to only insert doubles. If you want to enter values 1,2,3.99, there isn't a way, through this method, to make sure these are all doubles.

At the same time, I think users can send the value as a custom type: https://github.com/googleapis/nodejs-datastore/blob/master/src/entity.ts#L62

This would ensure the typing of the column. I wonder if there shouldn't be a warning on Number type autoconversion? In the case that you only ever use ints, it is probably fine, but it seems difficult to avoid doubles with .00 fractional.

@crwilcox
Copy link
Contributor

crwilcox commented Nov 18, 2020

@vicb would it be fair to characterize the status of this as the following:

As a user you would like to see the untyped entries to not be the preferred way via node as it is difficult (not feasible) to keep type consistency, for example the field values (1.99, 3.99, 4.0 (which will be seen as an int)).

By not preferred I think there are a few solutions

  1. Updates to the product docs to draw attention to this.
  2. Add a client side merge of these, or an explicit client side "or". This could be a bit heavy, and is double the queries.
  3. Add a warning to be emitted by the methods (puts where data includes the auto conversion, queries where we have auto conversion).
  4. deprecate the auto conversion.

The final option is the most extreme to be sure and would mean a major version revision.

What do you think is necessary here as a user? Also, seeing as you seem to have done a good deal of homework on this, are there areas you can point us to in the docs where you think improvements should be made?

@vicb
Copy link
Contributor Author

vicb commented Nov 18, 2020

Yes so the whole problem is that in JS Number.IsInteger(4.0) === true.

This basically makes the main API of datastore unusable for float numbers as you can not rule out some will be considered ints.

// The behavior of this is counter intuitive as soon as you query the value field
datastore.save(
  key,
  data: {
    value: 1.23
  }
};

// The behavior of this is counter intuitive when you deal with a float field
// For the same reason limit and order have counter intuitive behaviors.
datastore.createQuery('Product').filter('price', '<', 1.23);

To answer your points:

  1. IMO updating the docs is not enough, because:
  • who reads the docs ;)
  • there is some kind of doc on the Could Datastore doc - but it is hard to find when you don't know what you are looking for,
  • I also don't really see a good place where this should be documented in the current sets of doc - ie the client API docs and the datastore docs.

That being said, if this should be documented, there should be some examples to make the counter intuitive behavior clear.

  1. The only warning I can imagine would be emitted every time a user saves a numerical value. It might be useful when there is a clear path to remove the current limitation but I am not sure if it would be helpful before that.

  2. If you decide to fix this issue on the client side (also applicable to other dynamically typed languages) then I think deprecating and removing the auto-conversion is a required step. It means that every single client application must be updated (as long as they are dealing with numbers). Note that it would still not solve the problem with values created before the major version - all DB would need to be migrated if users are impacted by the limitation.

Updating the query code would be quite straightforward:

// current unsafe API
const query = datastore.createQuery('Product').filter('price', '<', 4.0);
// next safe API
const query = datastore.createQuery('Product').filter('price', '<', double(4.0));

However updating your entities would be much more of a pain:

{
  price: 4.0
  qty: 10,
  id: 10,
  sale: 0.50,
}

You basically have to wrap every single numeric field in int() or double().

A few comments/ideas:

  1. Instead of having to manually convert all entities in order to wrap numeric field, it might be good to specify a schema:
{
  price: double
  qty: int,
  id: int,
  sale: double,
}

The schema could be easily passed in datastore.save(), same way the current API supports excludeFromIndexes.
It might feel a little awkward to pass the schema to datastore.createQuery(...).
So maybe a good solution would be to create a Schema registry that you would init once ? Maybe as an option to new Datastore() ?

  1. I still think a server side fix would be much easier on the users.

I am sorry not to have a good turnkey solution to propose right now. It would probably be good to investigate more deeply different solutions, their impact on the client code and their cost/feasibility on the API or server side.

@crwilcox
Copy link
Contributor

I think the way forward is to start with exposing, in docs, the double() and int() representations, maybe not highlighting inference for numbers. I think this would be needed on save and query docs.

// current unsafe API
const query = datastore.createQuery('Product').filter('price', '<', 4.0);
// next safe API 
const query = datastore.createQuery('Product').filter('price', '<', double(4.0));

Regarding

I still think a server side fix would be much easier on the users.

From a server standpoint the sort order is defined already. There is potential to add an 'OR' query that would essentially be a merge query. That said, even with that, the way inference is happening that wouldn't solve this interface problem on it's own, since that is really a double() or int()

@meredithslota
Copy link
Contributor

We're considering whether an "OR" query can be added so I'm converting this to a feature request as one of the suggested outcomes of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: datastore Issues related to the googleapis/nodejs-datastore API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

No branches or pull requests

5 participants