Schemas validation #31

Closed
leplatrem opened this issue Apr 30, 2015 · 5 comments

@leplatrem
Contributor

Let's start a discussion about schema validation!

I believe we want something similar to Daybed, except that we won't
reinvent the wheel with a custom schema formalism.

It looks like there is no way around JSON Schema. We won't rewrite thousands
of lines of existing validation code (see https://github.com/Julian/jsonschema) using Colander.
Instead, we will use existing validators, and frontends will build forms from schemas.

By default, collections could be schemaless.

PUT|PATCH|DELETE /collections/person/schema

{
    "title": "Example Schema",
    "type": "object",
    "properties": {
        "firstName": {
            "type": "string"
        },
        "lastName": {
            "type": "string"
        },
        "age": {
            "description": "Age in years",
            "type": "integer",
            "minimum": 0
        }
    },
    "required": ["firstName", "lastName"]
}
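As a rough illustration of relying on an existing validator rather than Colander, here is a minimal Python sketch that checks records against the person schema above with the `jsonschema` library linked earlier. It assumes draft 4 of JSON Schema (`Draft4Validator`); the helper `record_errors` is hypothetical and simply lists the violated keywords.

```python
from jsonschema import Draft4Validator

# The "person" schema from the request above (assumed to be JSON Schema draft 4).
PERSON_SCHEMA = {
    "title": "Example Schema",
    "type": "object",
    "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "age": {"description": "Age in years", "type": "integer", "minimum": 0},
    },
    "required": ["firstName", "lastName"],
}

# Build the validator once and reuse it for every incoming record.
validator = Draft4Validator(PERSON_SCHEMA)


def record_errors(record):
    """Return the sorted JSON Schema keywords that `record` violates."""
    return sorted(error.validator for error in validator.iter_errors(record))


print(record_errors({"firstName": "Ada", "lastName": "Lovelace", "age": 36}))  # []
print(record_errors({"firstName": "Ada", "age": -1}))  # ['minimum', 'required']
```

Unless the schema sets `"additionalProperties": false`, unknown fields (say, a `nickname`) are accepted.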

It is acceptable if the validation of the schema definition itself is done using Cornice/Colander,
even though meta-schemas exist for that :)
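For comparison, a sketch of what the meta-schema route would look like with the same `jsonschema` library: `Draft4Validator.check_schema` validates a definition against the draft 4 meta-schema and raises `SchemaError` if it is malformed. The wrapper `check_definition` is hypothetical.

```python
from jsonschema import Draft4Validator
from jsonschema.exceptions import SchemaError


def check_definition(schema):
    """Return None if `schema` is a valid draft 4 schema, else an error message."""
    try:
        # Validates the definition itself against the draft 4 meta-schema.
        Draft4Validator.check_schema(schema)
    except SchemaError as error:
        return error.message
    return None


print(check_definition({"type": "object", "required": ["firstName"]}))  # None
print(check_definition({"type": "strng"}) is not None)  # True
```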

If a schema exists for a collection, records that don't match the schema are
refused with 400 Bad Request (on PUT, and after the merge on PATCH).
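A minimal sketch of that rule, assuming PATCH is a shallow merge of the body into the stored record (the exact merge semantics are not specified here). `apply_write` is a hypothetical helper, not Cliquet or Cornice API; validation uses `jsonschema` as in the earlier examples.

```python
from jsonschema import Draft4Validator


def apply_write(method, existing, body, schema=None):
    """Return (status, payload) for a hypothetical PUT or PATCH on one record.

    PATCH is modelled as a shallow merge of `body` into `existing`, and it is
    the merged record that gets validated. On success the payload is the
    record to store; on failure it is the list of validation messages.
    """
    if method == "PUT":
        candidate = dict(body)
    elif method == "PATCH":
        candidate = {**existing, **body}
    else:
        raise ValueError("unsupported method: %s" % method)

    if schema is not None:
        errors = [e.message for e in Draft4Validator(schema).iter_errors(candidate)]
        if errors:
            return 400, errors  # 400 Bad Request: nothing is stored
    return 200, candidate


schema = {"type": "object", "required": ["title"]}
stored = {"title": "Buy milk", "done": False}
print(apply_write("PATCH", stored, {"done": True}, schema))  # (200, {'title': 'Buy milk', 'done': True})
print(apply_write("PUT", stored, {"done": True}, schema)[0])  # 400
```

The two calls show why validation happens after the merge: the body `{"done": True}` lacks `title`, so it is refused as a PUT, but the merged PATCH result is valid.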

Unlike Daybed, I propose that each user owns the schema of the collection,
especially because the schema endpoint will probably be a resource :)
Schemas could be cached to avoid the overhead of reading them from storage for each incoming record.
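A minimal sketch of the caching idea, keyed by `(user_id, collection)` so that each user's schema for a given collection name is cached separately. `SchemaCache` and the `load_schema` callable (standing in for a storage lookup) are hypothetical; with several server processes, invalidation would also need to be propagated, which this in-memory dict does not do.

```python
class SchemaCache:
    """Cache schemas per (user_id, collection) to avoid a storage read per record."""

    def __init__(self, load_schema):
        # load_schema(user_id, collection) -> schema dict, or None if schemaless.
        self._load_schema = load_schema
        self._cache = {}

    def get(self, user_id, collection):
        key = (user_id, collection)
        if key not in self._cache:
            # A None result (schemaless collection) is cached as well.
            self._cache[key] = self._load_schema(user_id, collection)
        return self._cache[key]

    def invalidate(self, user_id, collection):
        """Drop the cached entry, e.g. after a PUT/PATCH/DELETE on the schema."""
        self._cache.pop((user_id, collection), None)
```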

This implies that users can store heterogeneous data under the same collection name.

Hence, on first start, the JS application will have to check that the user has
not set any schema (e.g. at /collections/moz:readinglist:articles/schema).
If she has, the application should ask her to confirm before replacing it. If she hacks it afterwards and stores
invalid records, the JS application may crash, but that's okay.
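The first-start check could look roughly like this. It is sketched in Python to keep one language across the examples, although the client described here is a JS application; `ensure_schema`, `get_schema`, `put_schema` and `confirm` are hypothetical stand-ins for the HTTP calls and the confirmation prompt.

```python
def ensure_schema(collection, expected, get_schema, put_schema, confirm):
    """Install `expected` as the schema of `collection` on first start.

    Returns True if the expected schema is in place afterwards.
    """
    current = get_schema(collection)
    if current == expected:
        return True  # already set up, nothing to do
    if current is not None and not confirm(collection, current):
        # The user keeps her own schema; the app may not work with her data.
        return False
    put_schema(collection, expected)
    return True
```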

Open questions

  • Is the schema using the resource code of Cliquet? If so, how do we avoid overlap with stored collections? We could use an underscore prefix, and prevent public collections from having a name that starts with an underscore :)
  • What happens to existing data when the schema is changed? Probably ignore it and wait for the next
    PUT or PATCH?
  • What happens when records are shared between users? Do we let other users'
    records crash our application when fetching shared records? We could probably
    run client-side validation using JSON Schema on shared records.
  • Do we want to provide a collection of custom formats or even types?
    I'm thinking of recurring needs for uuid4, geohash, GeoJSON objects, phone, postal codes...
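On the last question, custom formats do not necessarily require new machinery: the `jsonschema` library lets a `FormatChecker` register additional format names. A sketch for a hypothetical `uuid4` format, built on the standard `uuid` module; the format name and its strictness rules (canonical lowercase form, version 4 only) are assumptions.

```python
import uuid

from jsonschema import Draft4Validator, FormatChecker

checker = FormatChecker()


@checker.checks("uuid4")
def is_uuid4(value):
    """Hypothetical "uuid4" format: canonical lowercase, hyphenated, version 4."""
    if not isinstance(value, str):
        return True  # formats only constrain strings; "type" handles the rest
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    return parsed.version == 4 and str(parsed) == value


schema = {"type": "object", "properties": {"id": {"type": "string", "format": "uuid4"}}}
# Format checking is opt-in: it only happens when a format_checker is passed.
validator = Draft4Validator(schema, format_checker=checker)

print(validator.is_valid({"id": str(uuid.uuid4())}))  # True
print(validator.is_valid({"id": "not-a-uuid"}))  # False
```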
@leplatrem
Contributor Author

For example, Angular forms can be generated from JSON Schema: http://schemaform.io/

@almet
Member

almet commented May 8, 2015

> Unlike Daybed, I propose that each user owns the schema of the collection,
> especially because the schema endpoint will probably be a resource :)

I don't quite get this. How is that different from Daybed? I would propose that anyone who can create a collection can also create a schema.

@almet
Member

almet commented May 8, 2015

> Is the schema using the resource code of Cliquet? If so, how do we avoid overlap with stored collections? We could use an underscore prefix, and prevent public collections from having a name that starts with an underscore :)

I think this is handled by the "bucket" concept.

> What happens to existing data when the schema is changed? Probably ignore it and wait for the next PUT or PATCH?

That's a good question. I believe in this case it should be possible to iterate over all the records and apply a function to them, maybe?
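A sketch of that idea: apply a migration function to every stored record, then re-validate against the new schema and report the records that still do not match. `migrate_records` and the example migration `rename_name_to_first_name` are hypothetical, and each record is assumed to carry an `id` field.

```python
from jsonschema import Draft4Validator


def migrate_records(records, migrate, new_schema):
    """Apply `migrate` to a copy of each record and split results by validity.

    Returns (valid_records, invalid_ids); the input records are not modified.
    """
    validator = Draft4Validator(new_schema)
    valid, invalid_ids = [], []
    for record in records:
        migrated = migrate(dict(record))  # shallow copy, so originals stay intact
        if validator.is_valid(migrated):
            valid.append(migrated)
        else:
            invalid_ids.append(record["id"])
    return valid, invalid_ids


def rename_name_to_first_name(record):
    """Example migration (hypothetical): rename the "name" field to "firstName"."""
    if "name" in record:
        record["firstName"] = record.pop("name")
    return record
```

Records reported as invalid could then be left untouched until their next PUT or PATCH, as the opening post suggests.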

> What happens when records are shared between users? Do we let other users' records crash our application when fetching shared records? We could probably run client-side validation using JSON Schema on shared records.

If we download data from somewhere, we can assume it has already been validated by the server, so I don't see where the problem lies.

> Do we want to provide a collection of custom formats or even types? I'm thinking of recurring needs for uuid4, geohash, GeoJSON objects, phone, postal codes...

I think we should do that, but we would need to explore the JSON Schema spec further to understand how.

Also, I think JSON Schema has one big problem: its complexity. It doesn't seem simple to use. As such, we could probably provide an easy way to create schemas, which would then map to the standard?
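To make the idea concrete, here is one possible shorthand that compiles down to standard JSON Schema. The shorthand itself (a type name per field, with a trailing `?` marking optional fields) and the function `to_json_schema` are purely hypothetical.

```python
SIMPLE_TYPES = {"string", "integer", "number", "boolean"}


def to_json_schema(fields, title=None):
    """Map a hypothetical shorthand {name: "type" or "type?"} to JSON Schema.

    A trailing "?" marks the field as optional; all other fields are required.
    """
    properties, required = {}, []
    for name, spec in fields.items():
        optional = spec.endswith("?")
        type_name = spec[:-1] if optional else spec
        if type_name not in SIMPLE_TYPES:
            raise ValueError("unknown type %r for field %r" % (type_name, name))
        properties[name] = {"type": type_name}
        if not optional:
            required.append(name)
    schema = {"type": "object", "properties": properties}
    if title is not None:
        schema["title"] = title
    if required:
        # Draft 4 requires "required" to list at least one field.
        schema["required"] = required
    return schema


person = to_json_schema(
    {"firstName": "string", "lastName": "string", "age": "integer?"},
    title="Example Schema",
)
```

The trade-off shows up immediately: this shorthand cannot express `"minimum": 0` from the person schema, so every such constraint would need new syntax.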

@leplatrem
Contributor Author

> > Unlike Daybed, I propose that each user owns the schema of the collection,
> > especially because the schema endpoint will probably be a resource :)

> I don't quite get this. How is that different from Daybed? I would propose that anyone who can create a collection can also create a schema.

Unlike Daybed, the collections are not global. This means that, as a user, I can associate a schema with my todo collection, even if someone else has already set a different schema for her own todo collection.

> > Is the schema using the resource code of Cliquet? If so, how do we avoid overlap with stored
> > collections? We could use an underscore prefix, and prevent public collections from having a name that
> > starts with an underscore :)

> I think this is handled by the "bucket" concept.

Nope, what I meant was mozilla-services/cliquet#243,
and that the schema endpoint is built using the cliquet.resource.BaseResource class (CRUD).

> > What happens when records are shared between users? Do we let other users' records crash our
> > application when fetching shared records? We could probably run client-side validation using JSON
> > Schema on shared records.

> If we download data from somewhere, we can assume it has already been validated by the server, so I
> don't see where the problem lies.

I was wondering what happens if two users have different schemas for the same collection name.
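One way to handle that concern on the client side is to re-validate shared records against the local schema before using them. A Python sketch with `jsonschema`; `split_shared_records` is hypothetical, and in the JS application the same idea would rely on a JavaScript JSON Schema validator instead.

```python
from jsonschema import Draft4Validator


def split_shared_records(records, local_schema):
    """Separate shared records that match our local schema from those that don't.

    Shared records come from another user's collection, which may have a
    different schema (or none), so they are re-validated before use.
    """
    validator = Draft4Validator(local_schema)
    usable, rejected = [], []
    for record in records:
        (usable if validator.is_valid(record) else rejected).append(record)
    return usable, rejected
```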

> Also, I think JSON Schema has one big problem: its complexity. It doesn't seem simple to use.
> As such, we could probably provide an easy way to create schemas, which would then
> map to the standard?

I wouldn't go that way. If it's too complex, maybe we can imagine a WYSIWYG JSON Schema builder?

@almet
Member

almet commented May 11, 2015

> Unlike Daybed, the collections are not global. This means that, as a user, I can associate a schema with my todo collection, even if someone else has already set a different schema for her own todo collection.

Then we agree :-) However, the notion of "own" differs a little: with buckets, a resource can have multiple owners.

Got it about how we should store the schemas. This should be handled by mozilla-services/cliquet#243 then.

> I was wondering what happens if two users have different schemas for the same collection name.

We need some kind of namespacing here (and I believe this is achieved through buckets), like on GitHub: leplatrem/cliquet differs from ametaireau/cliquet.
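A toy, in-memory sketch of that namespacing: schemas are keyed by `(bucket, collection)`, a bucket can have several owners, and only owners may set a schema. `SchemaRegistry` and its methods are hypothetical and do not reflect how buckets are actually implemented.

```python
class SchemaRegistry:
    """Hypothetical registry of schemas namespaced by (bucket, collection)."""

    def __init__(self):
        self._owners = {}   # bucket -> set of owner user ids
        self._schemas = {}  # (bucket, collection) -> schema dict

    def create_bucket(self, bucket, owner):
        if bucket in self._owners:
            raise ValueError("bucket %s already exists" % bucket)
        self._owners[bucket] = {owner}

    def _check_owner(self, user, bucket):
        if user not in self._owners.get(bucket, set()):
            raise PermissionError("%s is not an owner of %s" % (user, bucket))

    def add_owner(self, user, bucket, new_owner):
        # A bucket, and thus its schemas, can have multiple owners.
        self._check_owner(user, bucket)
        self._owners[bucket].add(new_owner)

    def set_schema(self, user, bucket, collection, schema):
        self._check_owner(user, bucket)
        self._schemas[(bucket, collection)] = schema

    def get_schema(self, bucket, collection):
        return self._schemas.get((bucket, collection))
```

With this layout, a "todo" collection in one bucket and a "todo" collection in another are simply different keys, so their schemas never clash.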

> I wouldn't go that way. If it's too complex, maybe we can imagine a WYSIWYG JSON Schema builder?

I don't know the JSON Schema spec well enough to make a call, but it seems that it would be harder to do it that way than to allow a simpler format.

@leplatrem modified the milestone: 1.4.0 on Aug 14, 2015
Natim added a commit that referenced this issue Aug 28, 2015
Validate JSONSchema for collections (ref #31)

2 participants