Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validate row schema #128

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

validate row schema #128

wants to merge 1 commit into from

Conversation

poonai
Copy link
Collaborator

@poonai poonai commented Feb 19, 2024

No description provided.

@poonai poonai requested a review from marino39 February 19, 2024 06:25
schema := jsonschema.Reflect(&table.valueEmpty)
buf, _ := schema.MarshalJSON()
if err := opt.DB.Set(bondTableKey(opt.TableID, rowSchema), buf, Sync); err != nil {
panic(err)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just return error here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we can use jsonschema here. We are using other serializers here and the tags will not be read.

We may need to use something else. At the same time, we would need to make sure that all of the serializers use the same tags. Or get rid of the serializers altogether and use only one of them. WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it works event if JSON tags are not set. On the same note, we could have problem if the tag is set to json:"-".

I agree with setting one serializer. maybe right time, we introduce capnproto or flatbuffers. We could certainly use it's advantage for filtering.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that the schema will change if the names in the tags will change. So if this is not checked then we have one big hole in our schema checker.

If it comes to serializers, It will be hard to convince Peter to change the serializer from cbor. So must have good arguments and data that justify this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you may investigate how much it would take and how much faster we would get if we switched to capnproto. It probably would simplify schema control as it has schema files. How this would work in this context too?

We have benchmarks that can use any serializer, so it would be the best to compare those numbers.

Copy link
Collaborator

@marino39 marino39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other solution is needed. The jsonschema will not work with other serializers than json.

@pkieltyka
Copy link
Contributor

btw ... I'm just catching up .. the idea of this PR is to query a table metadata of the schema for a row, and then compare it with the app..? this is a cool idea.

And, the issue is we use pluggable serialization formats, as a result, we won't know the exact schema..?

I wonder... with bond tables, don't we pass the serializer a table is using..? if so, then wouldn't we just include the serializer in the metadata, and compare the schema against that...?

another idea, we could add struct tags like bond:"field-name" to each field, and then use mapstructure to grab the schema, hash it, and then compare...?

I don't think we need to rely on capnproto for this stuff..? as our schemas can vary..? but lmk what you guys are thinking

@marino39
Copy link
Collaborator

@pkieltyka
Yeah, serializing to map[string]any would work for names. We still need to think about field types and how they get serialized e.g. prototyp.Hash could change its serialization/deserialization making it incompatible.

I & @poonai discussed capnproto in relation to filtering on the rows without deserializing them. I have been working on the library that can read msgpack without making any allocations and @poonai happened to know the library that worked the same way. Apparently one of protobuf creators after quitting Google came up with a new serialization library that worked along the same lines. We thought that maybe it could be also useful here. However, probably not relevant to this PR.

@pkieltyka
Copy link
Contributor

hey guys -- yea I've heard of capnproto from way back.. it's just a bit weird for us to have so many serialization formats were using.. I know we have a pluggable serialization system which is nice, and then I guess we're potentially deciding to use one of them for this row schema validation?

or are you mentioning something else, where we have a value from a row to compute its schema without deserializing the data..? if so, when is it useful for us to get the schema before we deserialize the entire value..?

@poonai
Copy link
Collaborator Author

poonai commented Feb 21, 2024

we were discussing for different use case. Not specific to schema. The ideas is to filter rows without deserializing.

eg:

query := Table.Query().Filter(cond.Func(func(row *Row) bool {
   if row.Expired {
      return false
   }
    return true
})

Most of our use cases involves filters, and we have to incur additional cost of serialization while filtering the irrelevant records.

I and @marino39 were discussing for long time about the idea of filtering the rows without deserializing the rows.

@marino39
Copy link
Collaborator

I think we will need to settle for one and remove pluggable serializers. Otherwise, it might be really hard to figure out this schema for all of them.

Since we use cbor already it probably will be the one. @poonai lmk if you have all figured out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants