Can't store entities with embedded entities which have properties > 1500 bytes #1916

Closed
bedeoverend opened this issue Jan 11, 2017 · 41 comments

@bedeoverend

It seems that excluding a top-level property doesn't exclude an embedded entity's properties from being indexed. On top of that, I don't believe there's a way to exclude an embedded entity's individual properties, which would also solve my problem.

Steps to reproduce

Just use this script (I'm running the Datastore emulator):

const Datastore = require('@google-cloud/datastore');
const datastore = new Datastore({ projectId: 'datastore-test' });

datastore.insert({
  key: datastore.key('User'),
  data: [
    {
      name: 'description',
      value: {
        text: Buffer.alloc(1501, 'a').toString(),
      },
      excludeFromIndexes: true
    }
  ]
}, (err, result) => {
  if (err) {
    console.error(err);
  } else {
    console.log(result.data);
  }
});

I'd expect there to be no error - the embedded entity ({ text: '...' }) shouldn't be indexed, I believe? But it throws an error saying:

Error: The value of property "text" is longer than 1500 bytes.

Environment

  • OS: OS X Sierra
  • Node.js: v6.7.0
  • npm: 3.10.3
  • @google-cloud/datastore: v0.6.0
@stephenplusplus
Contributor

Thanks for reporting.

@pcostell is it possible in Datastore to exclude individual properties within an Entity from indexing?

@stephenplusplus added the api: datastore label on Jan 11, 2017
@bedeoverend
Author

FWIW I've had a play with the datastore save method to support the explicit syntax on nested properties, e.g.:

datastore.insert({
  key: datastore.key('Stuff'),
  data: [{
    name: 'description',
    value: [{
      name: 'text',
      value: Buffer.alloc(1501, 'a').toString(),
      excludeFromIndexes: true
    }, {
      name: 'long',
      value: true
    }]
  }]
}, handler);

Obviously that wouldn't work as an interface, though, as it would break any properties that have array values; e.g. this would break:

datastore.insert({
  key: datastore.key('Stuff'),
  data: [{
    name: 'foo',
    value: [ 1, 2, 3 ]
  }]
}, handler);

For reference, this is the hacked reducer function (previously here) that makes this work:

// Assumes the helpers already used in the library's entity module are in
// scope (`is`, `entity.encodeValue`, and `propAssign`).
function encodeValues(acc, data) {
  var value;

  if (is.array(data.value)) {
    value = {
      entityValue: {
        properties: data.value.reduce(encodeValues, {})
      }
    };
  } else {
    value = entity.encodeValue(data.value);

    if (is.boolean(data.excludeFromIndexes)) {
      var excluded = data.excludeFromIndexes;
      var values = value.arrayValue && value.arrayValue.values;

      if (values) {
        values = values.map(propAssign('excludeFromIndexes', excluded));
      } else {
        value.excludeFromIndexes = data.excludeFromIndexes;
      }
    }
  }

  acc[data.name] = value;

  return acc;
}
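Presumably it would be wired in wherever the library builds the proto properties map from the verbose data array - a hypothetical call along these lines (names assumed, not the library's actual internals):

// Hypothetical wiring (names assumed): build the proto `properties` map
// from the verbose `data` array using the reducer above.
var properties = entityObject.data.reduce(encodeValues, {});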

@pcostell
Contributor

Excluding a top-level entity value will exclude it and all its sub-values from indexing. However, it still must be a valid entity. This means that properties that cannot be indexed should be marked accordingly.

In the Datastore proto API, this is done by passing excludeFromIndexes at the appropriate levels. This is important from an API perspective because it means an entity value is a valid entity (and could be moved to a top-level entity if desired).
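For illustration, here's roughly what that looks like at the proto/JSON level for the opening example - field names follow the Datastore v1 entity format, but the exact shape here is a sketch rather than a verbatim API payload:

// Rough sketch of the Datastore v1 entity JSON for the opening example, with
// excludeFromIndexes carried on the nested property itself (shape assumed):
{
  key: { path: [ { kind: 'User' } ] },
  properties: {
    description: {
      entityValue: {
        properties: {
          text: {
            stringValue: 'aaa...', // the > 1500-byte string
            excludeFromIndexes: true
          }
        }
      }
    }
  }
}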

@bedeoverend
Author

@pcostell so am I right in thinking that if a top-level value is marked as excluded from indexing, it would be safe for this SDK to mark all sub-values as excluded too? Even if they aren't explicitly excluded, they effectively will be anyway, since they're nested inside a value that was excluded.

@pcostell
Contributor

That would be safe, but it would represent a departure from the design of the API.

@bedeoverend
Author

I'm guessing it would be hard to introduce a syntax that would allow excludeFromIndexes on sub-properties while maintaining backwards compatibility, @stephenplusplus?

Until that's achieved, would passing excludeFromIndexes down to sub-values when it's set on the top-level property be viable? I've got the beginnings of it in a branch; happy to make a PR.

@stephenplusplus
Contributor

Yes, this would be breaking; however, we could try something like:

datastore.insert({
+ raw: true,
  key: datastore.key('Stuff'),
  data: [
    {
      name: 'description',
      value: [
        {
          name: 'text',
          value: Buffer.alloc(1501, 'a').toString(),
          excludeFromIndexes: true
        },
        {
          name: 'long',
          value: true
        }
      ]
    }
  ]
}, handler);

If raw is set, it would be required that all values inside of data are specified in the longhand/verbose syntax. Any chance I'm overlooking something that makes this unfeasible?

@bedeoverend
Author

bedeoverend commented Jan 23, 2017

@stephenplusplus I don't think that would work for storing array values?

What if the raw flag required the explicit encoded value returned by encodeValue? I think that's the proto value? You could also open up the encodeValue function to help users, e.g.:

datastore.insert({
  key: datastore.key('Stuff'),
  data: {
    description: {
      text: Buffer.alloc(1501, 'a').toString(),
      long: true
    },
    values: [ 1, 2, 3 ]
  }
})

the user could then transform into a raw value by:

datastore.insert({
  raw: true,
  key: datastore.key('Stuff'),
  data: Datastore.encodeValue({
    description: Datastore.encodeValue({
      text: Datastore.encodeValue(Buffer.alloc(1501, 'a').toString(), { exclude: true }),
      long: Datastore.encodeValue(true)
    }),
    values: Datastore.encodeValue([ 1, 2, 3 ])
  })
})

Not sure if I've got the syntax right, but that's the idea. Also more likely the user would do it programmatically.

@stephenplusplus
Contributor

@stephenplusplus I don't think that would work for storing array values?

Sorry, I've had my head in many APIs lately. Would you mind showing a bit of code that would highlight this problem?

@bedeoverend
Author

@stephenplusplus sure, so if I wanted to store the following JS Object:

var personEntity = {
  name: "Mary",
  metadata: {
    bio: "..." // Some long string > 1500 bytes
  },
  subjects: [{
    name: "history",
    credits: 2
  }, {
    name: "french",
    credits: 4
  }, {
    name: "maths",
    credits: 4
  }]
}

I need to have entity.metadata.bio excluded from indexes. Using the raw flag and syntax provided above, I could do this; however, I couldn't store the subjects prop as an array, e.g.

var rawPersonEntity = [{
  name: "name",
  value: "Mary"
}, {
  name: "metadata",
  value: [{
    name: "bio",
    value: "...", // Some long string > 1500 bytes
    excludeFromIndexes: true
  }]
}, {
  name: "subjects",
  value: [] // What do I do here? An array will indicate explicit syntax of { name, value }
}]

@stephenplusplus
Contributor

Got it, thank you! Would it make sense to recurse the data property?

var rawPersonEntity = {
  key: datastore.key('Person'),
  data: [
    {
      name: 'name',
      value: 'Mary'
    },
    {
      name: 'metadata',
      data: [
        {
          excludeFromIndexes: true,
          name: 'bio',
          value: '...'
        }
      ]
    },
    {
      name: 'subjects',
      value: [
        {
          name: 'history',
          credits: 2
        },
        // ...
      ]
    }
  ]
}

Otherwise, I think the raw flag on a nested level could have the same effect:

var rawPersonEntity = {
  key: datastore.key('Person'),
  data: [
    {
      name: 'name',
      value: 'Mary'
    },
    {
      name: 'metadata',
      raw: true,
      value: [
        {
          excludeFromIndexes: true,
          name: 'bio',
          value: '...'
        }
      ]
    },
    {
      name: 'subjects',
      raw: false, // (the default value, but explicitly set for demonstration)
      value: [
        {
          name: 'history',
          credits: 2
        },
        // ...
      ]
    }
  ]
}

Does either of these have any quirks, and if not, which is better? If neither, we can still keep trying :)

@bedeoverend
Author

bedeoverend commented Jan 25, 2017

@stephenplusplus thanks for the options - I think data makes the most sense to me, i.e. value is for more 'basic' types, and data is for embedded objects.

It's a pretty fringe edge case at this point - and I'm not sure if it's even possible to do, but if you had an array of objects, could you index them?

e.g.

var rawPersonEntity = {
  key: datastore.key('Person'),
  data: [
    {
      name: 'name',
      value: 'Mary'
    },
    {
      name: 'metadata',
      data: [
        {
          excludeFromIndexes: true,
          name: 'bio',
          value: '...'
        }
      ]
    },
    {
      name: 'subjects',
      value: [
        {
          excludeFromIndexes: true, // Is this something that can be done? Is there a reason why it would be done?
          name: 'history',
          credits: 2
        },
        // ...
      ]
    }
  ]
}

Other than that, I can't see any issue with them... Another option would be making a datastore.value-style factory - like datastore.key - where you can specify types and the data associated with them, e.g.

datastore.insert({
  key: datastore.key('Person'),
  data: datastore.value({
    type: datastore.Entity, // Defaults to 'auto'
    data: {
      name: datastore.value({
        type: String,
        value: 'Mary'
      }),
      metadata: datastore.value({
        type: datastore.Entity,
        value: {
          bio: datastore.value({
            type: String,
            excludeFromIndexes: true,
            value: '...'
          })
        }
      }),
      subjects: datastore.value({
        type: Array,
        value: [
          datastore.value({
            type: datastore.Entity,
            value: {
              name: datastore.value({
                type: String,
                value: 'history',
              }),
              credits: datastore.value({
                type: Number,
                value: 2
              })
            }
          }),
          datastore.value({
            type: datastore.Entity,
            value: {
              name: datastore.value({
                type: String,
                value: 'french',
              }),
              credits: datastore.value({
                type: Number,
                value: 4
              })
            }
          }),
          datastore.value({
            type: datastore.Entity,
            value: {
              name: datastore.value({
                type: String,
                value: 'maths',
              }),
              credits: datastore.value({
                type: Number,
                value: 4
              })
            }
          })
        ]
      })
    }
  })
});

Ha - after writing that all out, I realise how incredibly over the top that looks, but it feels robust. Perhaps not as a candidate for quickly writing out simple entities, but perhaps as a way that will work well with a helper utility? e.g. for my use case, I'm converting incoming JSON to an entity, and I want to recursively step through it and add excludeFromIndexes as needed...thoughts?

@danoscarmike added this to Cloud Datastore in First 4 (GA) on Feb 21, 2017
@bjwatson

bjwatson commented Mar 1, 2017

@stephenplusplus Is this a bug in the client, a bug in the service, or user support to workaround a documented limitation?

@bjwatson added the type: question and priority: p2 labels and removed the Type: Enhancement label on Mar 1, 2017
@kzahel

kzahel commented Mar 11, 2017

Looking forward to seeing an improvement on this front. I also noticed that a nested array type is forced to be indexed. I was also hoping the API would let me store nested JSON, but I can't figure out whether that's possible. Exposing something like a datastore.entry factory method would make sense.

@bedeoverend
Author

Any updates on this? Happy to make a PR if needed

Also @kzahel, it will allow nested JSON if you just give it a JSON object as a property; you just can't specify excludeFromIndexes on nested properties - that's where the factory function would help.

@bedeoverend
Author

Looking back over this, how about using a Symbol to specify excludeFromIndexes?

You could then use regular ol' syntax in the data prop, but add in a flag using a Symbol, e.g.:

datastore.insert({
  key: datastore.key('Stuff'),
  data: {
    description: {
      text: Buffer.alloc(1501, 'a').toString(),
      long: true,
      [ datastore.EXCLUDE_FROM_INDEXES ]: true
    },
    values: [ 1, 2, 3 ]
  }
})

much like how datastore.KEY can now be used.

I still think a factory method is more explicit and cleaner, but a Symbol might be a bit easier to integrate with the current setup.

@stephenplusplus
Contributor

I'm intrigued! And thank you for your offer to help. @callmehiphop what do you think about using a Symbol the way @bedeoverend demo'd above?

@kzahel

kzahel commented Apr 5, 2017

Using the symbol is clever and makes sense, I think. I already have to use the datastore.KEY symbol to get the actual key from a result.
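For anyone following along, a minimal sketch of that, assuming the current query API (kind name reused from the example above):

// Minimal sketch: the key of each result lives under the datastore.KEY
// symbol rather than as a regular data property.
const query = datastore.createQuery('User');

datastore.runQuery(query, (err, entities) => {
  if (err) {
    console.error(err);
    return;
  }

  entities.forEach((entity) => {
    console.log(entity[datastore.KEY]);
  });
});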

@bedeoverend
Author

bedeoverend commented Apr 5, 2017

On this point (and this might be diverging from the original issue), could you now just use the KEY symbol, rather than key for insert?

e.g.

datastore.insert({
  [datastore.KEY]: datastore.key('Stuff'),
  foo: 'bar'
});

That would flatten it out. I guess that'd be tricky for backwards compatibility? Technically you could do it, i.e.

if (entity[datastore.KEY]) {
  // Using new syntax
} else {
  // Using old syntax
}

But it's a bit magical...

@lukesneeringer
Contributor

Using a symbol seems fine to me. I prefer the more verbose syntax first proposed over the [datastore.KEY] syntax in the most recent comment.

@chadbr

chadbr commented Apr 11, 2017

We can't write string properties over 1500 chars. Is this expected? Is there some way to explicitly mark a property as 'text' instead of 'string'?

@rsemenov100

  1. Let's say we have an entity with a property of "text" type, i.e. not string.
  2. When the property is updated (with an unindexed string), the type of the property changes from "text" to "string".
  3. This breaks some other software, which expects a "text" property.

The question is: how do we write > 1500 chars as "text"?

@ggprod

ggprod commented Jun 5, 2017

Could I ask what the status of this issue is? I have some entities with embedded entities whose properties are > 1500 bytes. When I try setting excludeFromIndexes: true on the embedded entity, I still get an error thrown for its sub-properties that are > 1500 bytes. Is there some way I can store them?

@stephenplusplus added this to Not Started in State on Jun 12, 2017
@JayGoldberg

Just piping in that I'd opened an issue for this internally several days ago, but I'd routed it to the wrong place. Corrected, and looks like @stephenplusplus is aware.

@swcloud
Contributor

swcloud commented Jun 28, 2017

@stephenplusplus This issue may be related to a fixed issue in python:

googleapis/google-cloud-python#1206

@arliber

arliber commented Jun 29, 2017

For me, it happens in the web console as well - is that related or am I missing something?

@ShahNewazKhan

I followed these steps to exclude the indexed properties of an embedded entity from Datastore's index:

  1. Look up (get) the entity from Cloud Datastore.
  2. Write (update) the entity back to Cloud Datastore with the index exclusion set.

However, when I look at the Entities overview page, each of the properties of the embedded entity is still indexed even after the update. Is this the right behavior?

The docs for excluding indexes mention: "The index entries for any existing entities with that property will continue to exist until the entities are updated or deleted."
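For reference, a minimal sketch of those two steps with the verbose syntax available at the time (the key and property names here are made up):

// Minimal sketch of the lookup-then-update flow above (key/property names
// are made up). Note that, per the rest of this thread, the exclusion flag
// on the embedded-entity property does not currently reach its sub-properties.
const userKey = datastore.key(['User', userId]); // userId assumed known

datastore.get(userKey, (err, user) => {
  if (err) {
    console.error(err);
    return;
  }

  datastore.update({
    key: userKey,
    data: [
      {
        name: 'metadata',
        value: user.metadata,
        excludeFromIndexes: true
      }
    ]
  }, (err) => {
    if (err) {
      console.error(err);
    }
  });
});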

@jniemin

jniemin commented Jul 17, 2017

I just bumped into this issue. We have a data structure where, under the main entity, there is an embedded entity (actually an array of entities) that has a property longer than 1500 bytes. As we can't exclude this > 1500-byte property from indexing, we can't store even the main entity.

Is there any progress on this? Basically, Datastore is unusable for us at this point.

@arliber

arliber commented Jul 20, 2017

@jniemin I had the same issue and found two workarounds. One is utilizing the hierarchy supported in Datastore - one entity can be a child of another - so if the embedded entity is complex, I suggest you take it out into a new entity whose parent is the original one.

Another (sort of) workaround I used is just simplifying the embedded entity by moving some properties back to the parent itself - not ideal, but in my case it made sense.
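A minimal sketch of that first workaround, using an ancestor in the key path so the large embedded entity becomes its own child entity (kinds, names, and values here are made up):

// Minimal sketch of the parent/child workaround (kinds and names made up):
// the oversized metadata is stored as its own entity under the Person key.
const personKey = datastore.key(['Person', 'mary']);

datastore.save([
  {
    key: personKey,
    data: { name: 'Mary' }
  },
  {
    // Child entity - the key path includes the parent ['Person', 'mary'].
    key: datastore.key(['Person', 'mary', 'Metadata']),
    data: [
      {
        name: 'bio',
        value: longBio, // some string > 1500 bytes
        excludeFromIndexes: true
      }
    ]
  }
], (err) => {
  if (err) {
    console.error(err);
  }
});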

@stephenplusplus
Contributor

Thanks for your patience, everyone. There are a couple of solutions we've talked about, but I'd like to get feedback on one more that might be the simplest. Can you see any "gotchas" with using an excludeFromIndexes array at the top level to define where the unindexed properties are?

(cc @bedeoverend)

datastore.insert({
  key: datastore.key('Person'),
  excludeFromIndexes: [
    'description',
    'metadata.bio',
    'subjects.name'
  ],
  value: {
    description: '...',
    metadata: {
      bio: '...'
    },
    subjects: [
      {
        name: '...',
        credits: 2
      }
    ]
  }
})

This seems like it would also be simpler than the current "data"/verbose definition syntax for non-nested properties, i.e.

// before
datastore.insert({
  key: datastore.key('Person'),
  data: [
    {
      name: 'description',
      value: '...',
      excludeFromIndexes: true
    }
  ]
})

// after
datastore.insert({
  key: datastore.key('Person'),
  excludeFromIndexes: ['description'],

  // `data` remains a plain JavaScript object:
  data: {
    description: '...'
  }
})

@lukesneeringer
Contributor

Summoning @JayGoldberg for feedback also. We can implement this if there are no objections to @stephenplusplus's proposed solution.

@stephenplusplus
Contributor

Small amendment to the above suggestion, since subjects is an array:

datastore.insert({
  key: datastore.key('Person'),
  excludeFromIndexes: [
    'description',
    'metadata.bio',
-   'subjects.name'
+   'subjects[].name'
  ],
  value: {
    description: '...',
    metadata: {
      bio: '...'
    },
    subjects: [
      {
        name: '...',
        credits: 2
      }
    ]
  }
})

@lukesneeringer
Contributor

lukesneeringer commented Jul 28, 2017

Is that amendment necessary? It is unambiguous even in your previous construction.

It is not clear to me how this information gets sent up to Datastore. If that is the correct Datastore format, then obviously it makes sense. If we are recursively going through and adding a property, though, there's no reason we need the [].

@stephenplusplus
Contributor

stephenplusplus commented Jul 28, 2017

I think it follows standard embedded property notation, to differentiate between "property of an object" and "property of an object inside of an array". I've also seen this in API response documentation on the various Google API official docs sites. So no, not necessary, but should be more familiar. Willing to be downvoted on that assumption.

@stephenplusplus
Contributor

PR sent in #2497. Please take a look!

@bedeoverend
Author

bedeoverend commented Jul 29, 2017

@stephenplusplus thanks for this! Looks good to me, I like the general concept.

So two potential things:

  1. Does your proposed solution also allow excluding all nested properties? E.g. will excluding metadata exclude all properties / nested entities beneath it? Alternatively, could a metadata.* also be supported? Otherwise we'll still run into the current problem of having to explicitly exclude every property, no matter how large the entity.

  2. Suppose you hold an array of values which don't all follow the same schema:

{
  data: [{
    foo: '...'
  }, {
    bar: '...'
  }]
}

Should this solution also try to address not indexing one particular entity's property, e.g. a syntax like data.0.foo? I'm not sure if this is even supported by Datastore's indexing...? But a scenario where this might be appropriate is something like storing a series of metadata objects and searching for arbitrary values inside them. I'm not sure how valid a use case this is, but just flagging it.

Other than that, I think it looks good 👍

@lukesneeringer
Contributor

I am pretty sure that trying to do data.0.foo would not be great:

  1. It is ambiguous in JavaScript because data = {'0': { 'foo': ... }} is distinct from data = [{'foo': ... }].
  2. In the case of differing schemas, @stephenplusplus's current solution still works, since data[].bar just does not match.

I do agree that some concept of * and ** would be useful, but it is technically orthogonal -- we should get the current PR in, and then maybe do that.
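For illustration only, a hypothetical example of what such a catch-all might look like on top of the proposed syntax - this was not implemented at the time of this thread (see the separate feature request referenced further down):

// Hypothetical catch-all syntax - NOT implemented at the time of this thread;
// shown only to illustrate the * idea discussed above.
datastore.insert({
  key: datastore.key('Person'),
  excludeFromIndexes: [
    'metadata.*' // would exclude every property under `metadata`
  ],
  data: {
    metadata: {
      bio: '...' // some string > 1500 bytes
    }
  }
});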

@lukesneeringer
Contributor

I think it follows standard embedded property notation, to differentiate between "property of an object" and "property of an object inside of an array". I've also seen this in API response documentation on the various Google API official docs sites. So no, not necessary, but should be more familiar. Willing to be downvoted on that assumption.

Prior art, got it. I am convinced.

@bedeoverend
Author

It is ambiguous in JavaScript...
In the case of differing schemas, @stephenplusplus's current solution still works

Fair enough - I guess there could be the scenario of similar / same schemas but wanting to exclude only one item from being indexed. But I think that's a big edge case, and it would be a bad pattern regardless, so happy to ignore that.

we should get the current PR in, and then maybe do that.

Sounds good to me 👍

@mrksbnch

mrksbnch commented Nov 1, 2017

@lukesneeringer @stephenplusplus Thanks for the "fix", but can this please be updated in the documentation? Personally, I think it's weird that I have to exclude every single sub-property from indexing, but the bigger issue is that the documentation currently says that "If you exclude this value from indexing, then all subproperties are also excluded from indexing". This is very misleading (at least it was for me), and it took quite a while to find this issue.

EDIT: Another question that came up. How do you exclude properties that are not written in camel case?

data: {
  test: {
    'This is a key': 'This is a string > 1500 bytes ...'
  }
}

@stephenplusplus
Contributor

Here's the issue for the catch-all feature request: #2510

How do you exclude properties that are not written in camel case?

This worked for me:

datastore.save({
  key: ...,
  data: {
    test: {
      'This is a key': 'This is a string > 1500 bytes ...'
    }
  },
  excludeFromIndexes: [
    'test.This is a key'
  ]
}, ...)
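For the "arbitrary incoming JSON" use case mentioned earlier in the thread, here is a rough sketch of building that excludeFromIndexes list programmatically - the helper below and its 1500-byte check are not part of the library, and the path notation assumes the behavior added in #2497:

// Rough sketch (not part of the library): walk a plain object and collect
// excludeFromIndexes paths for every string over 1500 bytes, using the
// `prop.subProp` / `prop[].subProp` notation from #2497.
function collectLongStringPaths(value, path, paths) {
  paths = paths || [];

  if (typeof value === 'string') {
    if (Buffer.byteLength(value) > 1500) {
      paths.push(path);
    }
  } else if (Array.isArray(value)) {
    value.forEach((item) => collectLongStringPaths(item, path + '[]', paths));
  } else if (value !== null && typeof value === 'object') {
    Object.keys(value).forEach((key) => {
      collectLongStringPaths(value[key], path ? path + '.' + key : key, paths);
    });
  }

  return paths;
}

const data = {
  metadata: { bio: 'a'.repeat(2000) },
  subjects: [{ name: 'b'.repeat(2000), credits: 2 }]
};

datastore.save({
  key: datastore.key('Person'),
  excludeFromIndexes: collectLongStringPaths(data, ''), // ['metadata.bio', 'subjects[].name']
  data: data
}, (err) => {
  if (err) {
    console.error(err);
  }
});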
