Feathers Schema: A common data model definition format #2312

Closed
daffl opened this issue Apr 22, 2021 · 29 comments

@daffl
Member

daffl commented Apr 22, 2021

This issue consolidates the work from feathersjs/schema and the discussion from feathersjs-ecosystem/schema#20 into an up-to-date proposal in the main repository. For now I think it makes the most sense to include @feathersjs/schema here, with the option to turn it into a standalone module later once it has been used for a bit.

Problem

A very common problem we are seeing in web application development at the moment, especially in the Node.js world, is that data model definitions are duplicated in many different places: ORM models, JSON schema validators, Swagger/OpenAPI specifications, TypeScript type definitions or GraphQL schemas. It is difficult to keep them synchronized, and any new protocol or database is probably going to need the same work done again.

You can find converters from almost any specific format to any other format on npm, but they come with many challenges, from relying on a compilation step (generating duplicate code that doesn't really have to exist in the first place) to only covering a small subset of functionality. It also appears that the majority of those modules are no longer actively maintained. ORMs try to address the challenge by locking you into their way of doing things, with database- or ORM-specific idiosyncrasies dictating how flexible you can be in defining your model.

Feathers schema

Feathers schema provides a common schema definition format for JavaScript and TypeScript that lets you target different formats like GraphQL, JSON schema (Swagger/OpenAPI), SQL tables, validations and more with a single definition. Similar to how Feathers services allow different transport mechanisms to access their API without having to change your application code, schemas are Feathers' answer to the ever-expanding list of data schema definition and validation formats.

It addresses not only the definition of data types but also how to resolve them within the context of your application. This is a different approach from most ORMs, where you define your data model based on the database (or other ORM-specific convention). Some examples where schemas are useful:

  • Ability to add read and write permissions on the property level
  • Complex validations and type conversions
  • Shared model definition types between the client and server (when using TypeScript)
  • Associations and loader optimizations
  • Query string conversion, validation (no more type mismatches) and protections
  • Automatic API docs

How it works

@feathersjs/schema uses JSON schema as the main definition format with the addition of property resolvers. Resolvers are functions that are called with the application context and return the actual value of a property. This can be anything from the user associated with a message to the hashed password that should be stored in the database.

Schema definitions

import { schema } from '@feathersjs/schema';

const UserSchema = schema({
  type: 'object',
  required: [ 'email', 'password' ],
  additionalProperties: false,
  properties: {
    id: { type: 'number' },
    email: { type: 'string', format: 'email' },
    password: {
      type: 'string',
      minLength: 8
    }
  }
});

const MessageSchema = schema({
  type: 'object',
  required: [ 'text' ],
  additionalProperties: false,
  properties: {
    id: { type: 'number' },
    userId: { type: 'number' },
    text: {
      type: 'string',
      maxLength: 400
    }
  }
});

// This defines a `data` schema (for `create`, `update` and `patch`)
// based on UserSchema that hashes the password before putting it into the db
const UserDataSchema = schema(UserSchema, {
  properties: {
    password: {
      resolve (password) {
        return hashPassword(password);
      }
    }
  }
});

// A new schema based on MessageSchema that includes the associated user
const MessageResultSchema = schema(MessageSchema, {
  properties: {
    user: {
      resolve (value, message, context) {
        const { app, params } = context;

        return app.service('users').get(message.userId, params);
      }
    }
  }
});
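
To make the resolver idea concrete, here is a minimal, self-contained sketch of how resolvers like the ones above could be run. It is not the @feathersjs/schema implementation: `resolveProperties`, the in-memory `users` object and the stubbed `app` are all assumptions made up for illustration.

```javascript
// Hypothetical helper, not part of @feathersjs/schema: runs every
// `resolve` function found on a schema's properties and collects the results.
async function resolveProperties (properties, data, context) {
  const result = { ...data };

  for (const [name, definition] of Object.entries(properties)) {
    if (typeof definition.resolve === 'function') {
      // Same argument order as the examples above: (value, data, context)
      result[name] = await definition.resolve(data[name], data, context);
    }
  }

  return result;
}

// Stand-in for a Feathers app with a `users` service
const users = { 1: { id: 1, email: 'hello@feathersjs.com' } };
const context = {
  app: { service: () => ({ get: async (id) => users[id] }) },
  params: {}
};

const messageResultProperties = {
  user: {
    resolve (value, message, context) {
      return context.app.service('users').get(message.userId, context.params);
    }
  }
};

resolveProperties(messageResultProperties, { id: 10, userId: 1, text: 'Hi' }, context)
  .then(message => console.log(message.user.email));
```

Properties without a resolver are copied through unchanged, so only the fields that need application context pay for it.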

TypeScript

TypeScript types can be inferred from schema definitions. This is where, to me, TypeScript is finally starting to make real sense 😸 You get type definitions, dynamic validations and resolvers all in one place:

import { schema, Infer } from '@feathersjs/schema';

const UserSchema = schema({
  type: 'object',
  required: [ 'email', 'password' ],
  additionalProperties: false,
  properties: {
    id: { type: 'number' },
    email: { type: 'string' },
    password: {
      type: 'string',
      minLength: 8
    }
  }
} as const);

type User = Infer<typeof UserSchema>;

const user = UserSchema.resolve({
  email: 'hello@feathersjs.com',
  password: 'supersecret'
});

Both the User type and the user variable will have the following type:

type User = {
  id?: number;
  email: string;
  password: string;
}

Using schemas with Feathers

In a Feathers application, schemas can be passed when registering a service via the options introduced in v5. Different schemas can be passed for validating data and params.query as well as for resolving the result object returned to the client.

import { feathers } from '@feathersjs/feathers';

const app = feathers();

// One schema for everything
app.use('/messages', new MessageService(), {
  schema: MessageSchema
});

// Different schema for `result`, `data` and `query`
app.use('/messages', new MessageService(), {
  schema: {
    result: MessageResultSchema,
    data: MessageDataSchema,
    query: MessageQuerySchema
  }
});

// With override for a specific method
app.use('/messages', new MessageService(), {
  schema: {
    result: MessageResultSchema,
    data: MessageDataSchema,
    query: MessageQuerySchema,
    methods: {
      patch: {
        data: MessagePatchSchema
      }
    }
  }
});

Feedback wanted

Development is currently happening in the schema branch and will eventually be moved to dove. I will start adding documentation in the Dove API docs section once a pre-release PR is ready. Many thanks already for the feedback from @DaddyWarbucks, @DesignByOnyx and @mrfrase3, please continue to challenge my assumptions if you see anything else that this proposal is missing 😄

@mrfrase3
Contributor

Funny you post this on the day that I'm making the schedule for our backend refactor. I'm planning on starting in the next few weeks, so if you want help implementing this, I'm down.

Relations

Something that would be nice and potentially reduce a lot of boilerplate is defining the relation of an ID in the schema.

In your examples above, you had userId in your MessageSchema. To a developer reading it, it is obvious that it points to your users service, but programmatically it isn't.

Maybe something along the lines of:

const MessageSchema = schema({
  type: 'object',
  required: [ 'text' ],
  additionalProperties: false,
  properties: {
    id: { type: 'number' },
    userId: { type: 'number', relation: 'users' },
    text: {
      type: 'string',
      maxLength: 400
    }
  }
});

You could then internally validate that a userId passed actually exists in users.

Extendability

What would be nice to see is people extending the base @feathersjs/schema with their own resolver libraries, which means people could add the features they want to their schemas without having to define resolvers on every service and every field.

Take the point about relations above, what if I could specify that as an extension:

const MessageDataSchema = schema(MessageSchema, {
  extensions: [{
    async resolve (value, data, context, fieldName, fieldSchema) {
      if (fieldSchema.relation) {
        const { app, params } = context;
        const ids = Array.isArray(value) ? value : [value];

        // Service `find` takes a single params object, so the query is merged into it
        const { total } = await app.service(fieldSchema.relation).find({
          ...params,
          query: { id: { $in: ids }, $limit: 0 }
        });
        if (total !== ids.length) {
          throw new BadRequest(`${fieldName} provided does not exist in service ${fieldSchema.relation}`);
        }
      }

      return value;
    },
  }],
});

or maybe more simply:

const UserSchema = schema({
  type: 'object',
  required: [ 'email', 'password' ],
  additionalProperties: false,
  properties: {
    id: { type: 'number' },
    email: { type: 'string', format: 'email' },
    password: {
      type: 'string',
      format: 'password',
      minLength: 8
    }
  }
});

const UserDataSchema = schema(UserSchema, {
  extensions: [{
    resolve (value, data, context, fieldName, fieldSchema) {
      if (fieldSchema.format === 'password') return hashPassword(value);
      return value;
    },
  }],
});

// or

import { passwordExtension, relationExistsExtension } from '@feathersjs/common-schema-extensions';

const UserDataSchema = schema(UserSchema, {
  extensions: [
    passwordExtension(),
    relationExistsExtension({ idField: 'id' }),
  ],
});

The way I picture it, an extension has a resolver that is run on every field, with the fieldName and fieldSchema passed through. The resolver looks at the fieldSchema to see if it should do anything.

I'm not sure how the order of operations would work out on that... 🤔
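
One possible answer to the ordering question, as a rough sketch only: run the extensions in array order and feed each one the value returned by the previous one. Everything here (`runExtensions`, `trimStrings`, `maskPasswords`, and the `hashed:` prefix standing in for a real `hashPassword`) is hypothetical and only illustrates the proposed five-argument signature.

```javascript
// Hypothetical sketch: `extensions` and the five-argument resolver signature
// come from the proposal above, not from an existing API.
async function runExtensions (schema, extensions, data, context) {
  const result = {};

  for (const [fieldName, fieldSchema] of Object.entries(schema.properties)) {
    let value = data[fieldName];

    // Extensions run in array order; each one sees the previous one's output
    for (const extension of extensions) {
      value = await extension.resolve(value, data, context, fieldName, fieldSchema);
    }

    if (value !== undefined) {
      result[fieldName] = value;
    }
  }

  return result;
}

const trimStrings = {
  resolve (value, data, context, fieldName, fieldSchema) {
    return fieldSchema.type === 'string' && typeof value === 'string' ? value.trim() : value;
  }
};

const maskPasswords = {
  resolve (value, data, context, fieldName, fieldSchema) {
    // Placeholder for a real hashPassword() call
    return fieldSchema.format === 'password' ? `hashed:${value}` : value;
  }
};

const userSchema = {
  properties: {
    email: { type: 'string', format: 'email' },
    password: { type: 'string', format: 'password' }
  }
};
```

With this approach the order in the `extensions` array matters: `[trimStrings, maskPasswords]` hashes the trimmed password, while the reverse order would trim the already hashed value.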

@KidkArolis
Contributor

Hi, glad to see the use of JSON Schema. I think a lot of work has gone into standardising it and it will provide good flexibility in the context of Feathers.

Some questions and observations.

Resolving

Can't quite wrap my head around resolving yet. When would you use schema-level resolvers vs. doing the same kind of thing in hooks? Also, won't you get into circular situations, where post resolves user, and user resolves posts, and each post resolves user again, and so on?

New options API

How are you thinking about these new service options (that you can now access via getServiceOptions) in terms of mixing them with the old options? When would one pass things into the service vs. attach them as this new option? In our app, we typically pass a set of standard(ish) options into our services like so:

  const options = {
    app,
    name: 'equipment',
    Model,
    paginate,
    multi: ['create'],
    owner: 'person',
    scope: 'private',
    softDelete: true,
    schemas,
    docs,
    encrypted,
  }

app.use('/api/equipment', new Equipment(options))

But now, there would be another way to pass options:

app.use('/api/equipment', new Equipment(options), alsoOptions)

I guess all options could be moved to the new option format, but I'm wondering what the benefit is. Is it that this would be standard across Feathers apps, so more functionality can be built on top of this new structure?

Dynamic schema switching

Sometimes in our app we pick one of the query/data/result schemas on the fly based on context/data. E.g. context.user.role or data.authorId might dictate which schema we pick for validating the input (admins can do more than regular users, owners of a piece of data can do more, etc.) and similarly for filtering the result (based on the role we show more or fewer of the fields). I suppose this can be achieved with a custom resolver hook that accounts for more intricate app-specific needs. E.g. we break the result schema (we call it the read schema) into a per-role schema object {owner, admin, user, self}. Not sure there's anything to do here, but I wanted to bring this point to your attention in case you see some connection to how you're thinking about these things.

@daffl
Member Author

daffl commented Apr 22, 2021

@KidkArolis True, resolvers could be hooks but in many of the apps I saw I noticed that often things that are part of the same data model end up spread throughout many different locations and it can be difficult to discern what is actually happening to your data. For example, some business logic ends up in a custom JSON schema formatter while the password is hashed in a hook, the user id is added in another hook and a null check is in the ORM model. Often this also means that errors come back in many different formats.

Schema resolving still happens in a hook but what I'd like to see is hooks being mainly used for workflows (like logging or sending an email). Resolvers are one of the (few) things I think GraphQL does well which is why this is also a pragmatic choice since otherwise a GraphQL transport would be more complicated (with this it can basically happen automatically). As for circular dependencies, nested properties that need to be resolved would need to be passed explicitly (just like in GraphQL) with an additional guard that makes sure that the same resolver function can't run twice on the same path.
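
As a rough illustration of that guard (a sketch under assumptions, not the planned implementation), the path can be tracked as the list of resolver functions that already ran, and a resolver is skipped when it shows up again. The names `resolveWithGuard`, `postResolvers`, `userResolvers` and the in-memory `db` are hypothetical, and the explicit passing of nested properties is left out to keep it short.

```javascript
// Hypothetical sketch of the guard described above: a resolver is skipped
// when the same function already ran higher up on the current path.
async function resolveWithGuard (resolvers, data, path = []) {
  const result = { ...data };

  for (const [name, resolver] of Object.entries(resolvers)) {
    if (path.includes(resolver)) {
      continue; // would be the second run on this path: stop the cycle
    }
    result[name] = await resolver(data, [...path, resolver]);
  }

  return result;
}

const db = {
  users: { 1: { id: 1, name: 'Dave', postId: 7 } },
  posts: { 7: { id: 7, title: 'Schemas', userId: 1 } }
};

// post resolves user and user resolves post: circular without the guard
const postResolvers = {
  user: (post, path) => resolveWithGuard(userResolvers, db.users[post.userId], path)
};
const userResolvers = {
  post: (user, path) => resolveWithGuard(postResolvers, db.posts[user.postId], path)
};

resolveWithGuard(postResolvers, db.posts[7])
  .then(post => console.log(JSON.stringify(post)));
```

The post gets its user, the user gets its post, and the nested post stops there because the `user` resolver is already on the path.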

The options are a good point, I was also wondering if this could be confusing. The problem with service options is that it's up to the service how to handle them. What I'd like to accomplish is to move everything that is Feathers-specific out of those (like custom events or methods). For schemas, the alternative would be to just register a hook with the schema you want to resolve, but there are cases where you might need access to all schemas (e.g. for a GraphQL integration or when using fast-json-stringify), which isn't easily possible with a hook.

@daffl
Member Author

daffl commented Apr 22, 2021

@mrfrase3 Yes, there definitely needs to be some indication for relationships. I was hoping to mostly rely on the $ref JSON schema keyword and a collection of pre-canned resolver functions very similar to your extensions, for example:

import { checkExists, resolveWith } from '@feathersjs/schema-feathers';

const MessageSchema = schema({
  type: 'object',
  required: [ 'text' ],
  additionalProperties: false,
  properties: {
    id: { type: 'number' },
    userId: {
      type: 'number'
    },
    text: {
      type: 'string',
      maxLength: 400
    }
  }
});

const MessageDataSchema = schema(MessageSchema, {
  properties: {
    userId: {
      resolve: [
        (value, message, context) => {
          const { user } = context.params;

          if (value !== undefined && isAdmin(user)) {
            return value;
          }

          // Non admins can only send messages as themselves
          return user.id;
        },
        checkExists({
          service: 'users',
          method: 'get',
          id: 'userId'
        })
      ]
    }
  }
});

const MessageResultSchema = schema(MessageSchema, {
  properties: {
    user: {
      resolve: resolveWith({
        service: 'users',
        method: 'get',
        id: 'userId'
      })
    },
    reactions: {
      resolve: resolveWith({
        service: 'reactions',
        method: 'find',
        query (value, message) {
          return {
            messageId: message.id
          }
        }
      })
    }
  }
});
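
For the `resolve: [...]` form, one way it could work is as a pipeline where each function receives the value returned by the previous one. The sketch below is only an assumption about the semantics: `runResolverChain`, the simplified `checkExists`, `isAdmin`, and the stubbed `app` are illustrative stand-ins, not the actual @feathersjs/schema-feathers helpers.

```javascript
// Hypothetical sketch: run a `resolve` array as a pipeline where each
// function receives the value returned by the previous one.
async function runResolverChain (resolvers, value, data, context) {
  for (const resolver of [].concat(resolvers)) {
    value = await resolver(value, data, context);
  }
  return value;
}

// Illustrative stand-in for the proposed checkExists helper
const checkExists = ({ service }) => async (value, data, context) => {
  await context.app.service(service).get(value); // expected to throw if missing
  return value;
};

// Illustrative admin check and in-memory users service
const isAdmin = user => user.role === 'admin';
const knownUsers = new Set([1, 2]);
const app = {
  service: () => ({
    async get (id) {
      if (!knownUsers.has(id)) throw new Error(`No user with id ${id}`);
      return { id };
    }
  })
};

const userIdResolvers = [
  (value, message, context) => {
    const { user } = context.params;
    // Non-admins can only send messages as themselves
    return value !== undefined && isAdmin(user) ? value : user.id;
  },
  checkExists({ service: 'users' })
];
```

Calling `runResolverChain(userIdResolvers, 2, {}, { app, params: { user } })` would then first normalize the id and only afterwards check that it exists, so the existence check always runs on the final value.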

@mrfrase3
Contributor

I think the main issue I was trying to resolve was that I, like many other developers, am lazy. I also maintain an application with massive amounts of data, and I really cannot be bothered registering resolvers on every field. (One of the many things that scared me away from GraphQL)

The other issue is that we seem to be taking data validation out of the schema again. IMO, you should be declaring the properties of validation on the schema, and then just have background resolvers/validators that read the declaration and run on it.

In my example with the password, if we translated it into swagger docs, it would display format: password, whereas if we did all of this manually in resolvers, that documentation would be lost. From a glance I can tell what's going to happen to that data without having to read any code.

Furthermore, if I then were building a form dynamically from the schema, the builder would know to make it a password field, because it's right there on the schema.

Maybe we are getting too far ahead of ourselves. These schemas really should only be declaring how the data should be structured and validated, no more, no less. The rest can be thrown into hooks. (I wrote this after I wrote the part below.)

Population

Regarding @KidkArolis's question, it's also my opinion that population should probably stay out of this and be in hooks. Whilst it serves as a good example, people should probably use one of the many perfectly fine existing solutions for this.

What would be cool though is if we took a page out of Marshall's book with feathers-vuex and registered methods directly on the data.

On our front end, I set up population "methods" on the data, so if, say, I wanted to get the user of a message, I'd call message.user(). These are nice because you're not doing any calls to the database until you actually need the data, they also drop off when the data gets converted to JSON for transport, and they remove a lot of boilerplate code.
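
A minimal sketch of that pattern, assuming a Feathers-like `app.service(name).get(id)` API. The helper name `withLazyRelations` and the shape of its `relations` argument are made up for illustration and are not taken from feathers-vuex.

```javascript
// Hypothetical helper: attaches lazy "population" methods to a record.
// The methods are non-enumerable, so they do not show up in Object.keys or
// object spread; JSON.stringify skips function values either way.
function withLazyRelations (record, relations, app) {
  for (const [name, { service, key }] of Object.entries(relations)) {
    Object.defineProperty(record, name, {
      enumerable: false,
      // Nothing is fetched until the method is actually called
      value: () => app.service(service).get(record[key])
    });
  }
  return record;
}

// Stubbed app for illustration
const app = {
  service: () => ({ get: async (id) => ({ id, email: 'hello@feathersjs.com' }) })
};

const message = withLazyRelations(
  { id: 1, text: 'Hi', userId: 5 },
  { user: { service: 'users', key: 'userId' } },
  app
);

message.user().then(user => console.log(user.email));
```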

Also being able to call .save() would be cool.

I actually don't like the idea of server side population as it leads to being very complicated and there being a lot of redundant queries. Why do I need message.user when I already have the user object in my client cache, being auto updated by channels?

@daffl
Member Author

daffl commented Apr 23, 2021

I missed the format: 'password' part. I like that because based on that you could add rules like hashing it before adding to the db and making sure it doesn't get returned to the client. I believe it's also a part of the JSON schema spec so no reason that couldn't be a thing.

As for resolvers/population (which to me is the same thing):

  • I concede that, usage-wise, it should be more explicitly separated from the JSON schema definition itself.
  • A resolver for every field shouldn't be needed if it doesn't do anything to change the field.
  • I don't see a reason not to use them on the client instead if you want to. You could even define virtual getters that allow lazy loading by calling a client side resolver via something like const user = await message.user. From what I understand that would be very similar to the client side methods you are mentioning (I did spec parts of the resolver mechanism out with @marshallswain earlier on because of his work on client side population).
  • It's likely that everybody in this discussion has their preferred way of populating/fetching associated data, but it is and always has been a huge pain point for beginners (there were another two new questions about it on Stack Overflow just today) and is one of the most common questions from companies I talk to:
    • Mongoose $populate and Sequelize includes can be difficult to implement via hooks, and nested data does not run through Feathers mechanisms, which can cause security problems when you are not aware of what you are populating (like the user with password).
    • There are many different hooks, again with different population schema formats. feathers-hooks-common alone has two different ones (which regularly confuses people). Since there is no longer a maintainer for them, I'd much rather take the lessons learned and incorporate them into a more official Feathers module (these common hooks were initially started because there have been discussions about the need for a better schema/association tool for almost 5 years now).

I'm definitely all for "if it ain't broke don't fix it" but in this case it would be great to work towards a common solution that works for the 95% of use cases since it has been such a pervasive challenge for so long.

@DaddyWarbucks
Member

** EDIT: I initially made this comment under a different account that I used and did not recognize it until after posting. The initial comment has been deleted and reposted under this account **

Hi, seems like some good feedback so far!

David can correct me if I am wrong here, but I think it's important to keep in mind that feathers-schema is not really just about "data" or "ORM style" validation/population...it's about "service" query/data/population. And I think David is challenging the traditional, data-centric view. As a newcomer to Feathers I had to wrap my head around the idea that the service interface replaces the ORM interface. Pre-Feathers, when using just Express and Mongoose for example, there was no question where validation/population happened...you stuffed it on the model. And when you had an endpoint that wasn't explicitly tied to a model...you winged it with some other validation/population tool. But once that Feathers/Hooks SOA idea clicked, I recognized the decoupling of the payload (in or out) and the model interface. The service is the interface, and that includes the query, data, and population...all of which are not necessarily tied to this service's "model".

For example, I think a feature that has been mentioned by David but maybe not fully appreciated is the query validation/resolvers. For example,

// When using REST, the query `messages?userId=1` is going to result in a String(1) on the server, not a Number(1).
// So we can use something like this that will cast the string back into a number. Some ORMs handle this coercion,
// but many don't, so I like how this schema can handle that. And many services aren't tied to an ORM.
const MessageQuerySchema = schema({
  // ...other stuff
  properties: {
    userId: { type: 'number' },
  }
});

// We could also define cross-service queries where we search another service and return some IDs for this service
// See: https://daddywarbucks.github.io/feathers-fletching/hooks.html#joinquery
// The linked hook above works great, but that is better defined on some schema, which I think is
// part of the value prop of feathers-schema.
const MessageQuerySchema = schema({
  // ...other stuff
  properties: {
    id: {
      type: 'number',
      resolve: joinQuery
    },
  }
});
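
To illustrate the string-to-number coercion in isolation, here is a small hand-rolled sketch. `coerceQuery` is a hypothetical helper written for this example; a real integration would more likely lean on the validator for this (Ajv, for instance, has a `coerceTypes` option).

```javascript
// Hypothetical sketch of schema-driven query coercion; a real implementation
// would more likely delegate this to the validator.
function coerceQuery (properties, query) {
  const result = { ...query };

  for (const [name, { type }] of Object.entries(properties)) {
    const value = query[name];

    if (type === 'number' && typeof value === 'string' && value.trim() !== '') {
      const number = Number(value);
      if (Number.isNaN(number)) {
        throw new TypeError(`Query property '${name}' must be a number`);
      }
      result[name] = number;
    }
  }

  return result;
}

// `messages?userId=1` arrives on the server as { userId: '1' }
console.log(coerceQuery({ userId: { type: 'number' } }, { userId: '1' }));
```

Properties that are not described by the schema are passed through untouched here; whether they should be rejected instead is a separate validation decision.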

All in all I like where this is going. I can't tell you how many times I have seen:
Q: What is the "feathers way" to do Mongoose (or whatever ORM) populate?
A: Well......you can just use Mongoose populate as per the Mongoose docs, but if you do that it's not going through the service and you may accidentally join on the user PW, etc, etc, etc....So doing it the Mongoose way is the easy answer, but you should really try Hook-X blah blah.

I think many Feathers newbies expect some full-featured Feathers ORM when using feathers-mongoose, feathers-sequelize, etc. and are confused about some "Feathers way" of using those. But there currently isn't one...because the "Feathers way" is not tied to the ORM, it's tied to the service interface, and that can be hard to describe to new learners. So I think feathers-schema is a good answer to that. As David described earlier, we do all of these operations all over the place, and I think feathers-schema is a step towards the "Feathers way" of doing them.

@mrfrase3
Contributor

@daffl I think if population is such an issue for beginners, then we should go full ham and provide a baked-in solution that is "beginner proof", i.e. as simple and straightforward as possible, which likely means limited in scope. If people want advanced population options, they can stop using this method and use the other previously mentioned options/methods.

Looking at it, $ref is just for being able to clone a sub-schema from elsewhere, not really population. I can't seem to find anything about population definitions in JSON Schema, but I do remember how this was resolved in feathers-plus-cli with GraphQL population resolvers. We could do something similar:

const MessageSchema = schema({
  type: 'object',
  required: [ 'text' ],
  additionalProperties: false,
  properties: {
    id: { type: 'number' },
    userId: {
      type: 'number'
    },
    text: {
      type: 'string',
      maxLength: 400
    }
  },
  populate: {
    user: { service: 'users', relation: 'userId' },
    reactions: { service: 'reactions', relation: { ourTable: 'id', otherTable: 'messageId' } },
  },
});

Maybe on the format front, we could allow extending with "formatters", something like:

schema.registerFormatters({
  password: {
    data(value, item, context) {
      return hashPassword(value);
    },
    result(value, item, context) {
      return context.params.provider ? undefined : value;
    },
    query(value, item, context) {
      // Forbidden is the @feathersjs/errors class for HTTP 403
      if (context.params.provider) throw new Forbidden('Ahhh, no');
      return value;
    },
  },
  json: {
    data(value, item, context) {
      return JSON.stringify(value);
    },
    result(value, item, context) {
      return JSON.parse(value);
    },
  },
  objectId: {
    data(value, item, context) {
      return new mongo.ObjectId(value);
    },
    result(value, item, context) {
      return `${value}`;
    },
  },
  s3: { // this one might be pushing it :P
    async data(value, item, context) {
      if (value.length < 4096) return JSON.stringify(value);
      await s3.putObject({
        Bucket: 'large-fields',
        Key: `${item.id}.txt`,
        Body: Buffer.from(value),
        ContentEncoding: 'base64',
        ContentType: 'text/plain',
      }).promise();
      // Must reference the same bucket the object was written to above
      return JSON.stringify({ Bucket: 'large-fields', Key: `${item.id}.txt` });
    },
    async result(value, item, context) {
      const val = JSON.parse(value);
      if (typeof val === 'string') return val;
      const { Bucket, Key } = val;
      const { Body } = await s3.getObject({ Bucket, Key }).promise();
      return Body.toString('utf8');
    },
  },
});
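
How such a registry might be applied is sketched below, purely as an assumption about the semantics: look up each property's `format`, then run the formatter for the current phase (`data`, `result` or `query`). `registerFormatters` here is a local stand-in, and `applyFormatters` is a hypothetical name.

```javascript
// Hypothetical sketch of a formatter registry keyed by `format`.
const formatters = {};

function registerFormatters (definitions) {
  Object.assign(formatters, definitions);
}

// `phase` is 'data', 'result' or 'query'
async function applyFormatters (schema, phase, item, context) {
  const result = { ...item };

  for (const [name, { format }] of Object.entries(schema.properties)) {
    const formatter = formatters[format] && formatters[format][phase];

    // Fields without a matching formatter are left untouched
    if (formatter && item[name] !== undefined) {
      result[name] = await formatter(item[name], item, context);
    }
  }

  return result;
}

registerFormatters({
  json: {
    data: (value) => JSON.stringify(value),
    result: (value) => JSON.parse(value)
  }
});

const settingsSchema = {
  properties: {
    id: { type: 'number' },
    settings: { type: 'string', format: 'json' }
  }
};
```

Running the `data` phase before storage and the `result` phase on the way out would then round-trip `settings` through a JSON string without any per-service resolver.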

@L1lith

L1lith commented May 2, 2021

I have been focusing a lot of my time on Sandhands. I believe it is very well tested and has lots of neat sanitization features, and I've recently created a Feathers adapter for it called Sand Feathers. It may be a good option for users, or if the Feathers team finds a custom implementation more suitable, they could simply use Sandhands directly and reference Sand Feathers as a starting point.

I'm not a feathers expert but I believe it has been implemented properly (I would appreciate anyone willing to open issues or pull requests to make sure the adapter is functioning properly and securely), and I wrote a series of mock-ups in the test cases to simulate API calls.

Overall, I believe it may be beneficial to the Feathers team to consider Sandhands for effective general purpose data validation in JS.

P.S. One cool trick about SandFeathers is that within the format if you do not define the format for the _id property, it automatically populates that field of the format for you since it's almost always going to be expected to be an ObjectID.

@DaddyWarbucks
Member

Should params replace the query keyword? For example:

app.use('/messages', new MessageService(), {
  schema: {
    result: MessageResultSchema,
    data: MessageDataSchema,
    params: MessageParamsSchema
    // query: MessageQuerySchema
  }
});

It feels odd that there is a query keyword instead of params when data, params, and result are the three "building blocks" of context. While I agree that params.query is the most common use case, there are still cases where a user may want to resolve/validate other properties in params.

@L1lith

L1lith commented May 6, 2021

@DaddyWarbucks The way I handled this was by creating multiple hooks for each data input/output (seen here). The query hook is automatically made to not be strict, so you need not supply a fully populated object, but on the output validation it's strict by default (so you don't end up missing any properties). It might seem a bit redundant but I think enforcing data validation at both the exit and entry points of data is important as it can prevent corrupted or malicious data from being sent to or being received from the user.

@DaddyWarbucks
Member

@L1lith This is an RFC for feathers-schema. My question was aimed at feathers-schema specifically. Can we please keep this thread on topic? If you would like feedback on Sandhands, please open another RFC in its own repo, or you can visit the Feathers Slack channel and ask for feedback.

@L1lith

L1lith commented May 6, 2021

I still thought the conversation was relevant to the general topic. I was just referencing the way I implemented it, as you're basically doing the same thing and I thought it could be helpful as an example. I'll drop it since my feedback isn't helpful for you at this time.

@DaddyWarbucks
Member

Your feedback is definitely welcome and appreciated! I think feathers-schema has landed on AJV and JSON Schema for validation.

@DesignByOnyx
Contributor

@daffl - thanks so much for your work on this. I really really appreciate the separation of the schema from the resolvers - this will really help teams share JSON Schemas while allowing individual teams to expand upon those definitions as they see fit.

Speaking of extensibility... instead of overloading the schema function as shown in the examples above, can we consider keeping schema simple and allowing extensions to implement a common interface? I'd even almost consider resolvers the "first extension".

// schema-setup.js (or some other high-level file)
import { schema } from 'feathers-schema';
import { relationships, relatesTo } from 'third-party-library';

// Decorates the core schema with additional keywords, resolvers, and/or whatever else...
// This is similar to AJV's API - SEE: https://ajv.js.org/packages/
relationships(schema, { /* optional config */ })

// Or something like this (also kind of based on AJV - https://ajv.js.org/guide/user-keywords.html)
schema.addKeyword('relatesTo', relatesTo);

Then later on...

// UserSchema.js
import { schema, resolvers } from 'feathers-schema';
import { someOtherExtension } from 'third-party-library';

const UserSchema = schema({
    // Only JSON Schema goes here.
    // Any additional keywords can be used per the configured extensions.
})

resolvers(UserSchema, { /* define resolvers here, optionally overriding ones added by extensions */ })
someOtherExtension(UserSchema, { /* some common interface here: data/query/result/etc */ })

@DaddyWarbucks
Member

It appears I am the only one that likes the concept of the schema and resolver being one. (sad face emoji)

@daffl I need a clarification on the scope of this idea. @feathersjs/hooks brings middleware to any JS function, not necessarily just feathers. Is the goal of this project to similarly bring validation, resolving, typing to any JS object, not necessarily just feathers?

From my understanding, I think there is some overlap of two ideas/conversations here.
1 - Feathers Dove will offer a new way of storing and retrieving service options, including a schema.
2 - feathers-schema is a new idea that validates/resolves/types objects.

And the overlap of those two conversations is "How do we use this new feathers-schema in a feathers service including methods, data, query, result, hooks, context, population, etc".

I am interested in the idea of a generic object validate/resolve/type that exists outside of the scope of Feathers. I know in a previous post I ranted about the "Feathers way", but I think that this generic schema thing still aligns with that. The decoupled/agnostic nature of Feathers services (which are not inherently API routes, database ORMs, etc.) lends itself to a generic schema. And a generic "resolver" on that schema can be used for anything from data massaging to population.

So what are we talking about here? An object schema (despite its name) or a feathers service schema?

@daffl
Member Author

daffl commented May 24, 2021

@DesignByOnyx Should be doable. It's looking like schema is really just a convenience wrapper for AJV at the moment (and for helping with TypeScript type inference) so maybe we can start with the direct AJV functionality?

@DaddyWarbucks That's what I was thinking initially and why I put it into a separate module at https://github.com/feathersjs/schema. Basically

  • @feathersjs/schema - Generic object schema definitions and resolvers
  • @feathersjs/schema-hooks - Hooks to integrate those definitions and resolvers into your Feathers app. I think the best way is to keep it all explicitly in hooks - I know I've been flip flopping about that (e.g. adding schemas to the service options) - but I think that makes it most clear on what is happening.
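As a rough illustration of keeping everything explicitly in hooks, a hook factory could wrap any object that exposes a validation function. The `validateData` name and the `{ validate }` shape below are assumptions made for this sketch, not the actual @feathersjs/schema-hooks API.

```javascript
// Hypothetical hook factory: `schema` is any object with an async
// `validate(data)` method that returns the validated data or throws.
const validateData = (schema) => async (context) => {
  context.data = await schema.validate(context.data);
  return context;
};

// Stand-in schema used only for this example.
const messageSchema = {
  async validate(data) {
    if (typeof data.text !== 'string') {
      throw new Error('text must be a string');
    }
    return { ...data, text: data.text.trim() };
  }
};

// The resulting function can be registered like any other before hook.
const validateMessage = validateData(messageSchema);
```

Because the schema is only referenced inside the hook, the same schema object stays usable outside of a Feathers service.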

So yeah, good point, maybe it should stay in that other repository after all.

This is super helpful, thank you everybody for all your input!

@DaddyWarbucks
Member

DaddyWarbucks commented Sep 15, 2021

I initially opened a PR outlining a resolver pattern as a conversation starter for feathers-schema. But it was super basic and not really complete. It can be seen here: #2341 (comment)

Feathers-schema has changed quite a bit since I last looked at it, so rather than updating a PR, I decided to throw it in a code sandbox which can be run easily. The main point of this little example is to offer the resolver a way to resolve its "siblings". It also offers a way for the developer to add additional methods to the resolver for their own uses.

See the sandbox here: https://codesandbox.io/s/keen-sanne-306ot?file=/src/index.ts

And a basic usage below

const data = {
  userId: "456"
};

const messageSchema = new Schema({
  user: async function (value, data, context) {
    console.log("Resolving user");

    // Ensure userId exists on the payload and is
    // valid before calling a service with the id.
    // const userId = await this.validateAt("userId", data.userId);

    // return context.loader.service('load').load(userId);

    return {
      name: "Rogue"
    };
  },
  isRogue: async function (value, data, context) {
    const user = await this.ref("user");
    return user.name === "Rogue";
  },
  isNotRogue: async function (value, data, context) {
    const isRogue = await this.ref("isRogue");
    return !isRogue;
  }
});

messageSchema.resolve(data, {}).then(console.log).catch(console.error);

This code is miles away from the current state of the resolver seen here: https://github.com/feathersjs/feathers/blob/dove/packages/schema/src/resolver.ts

But I can see how this feature could be added. We are passing the resolverStatus to each resolver

return resolver(value, data, context, resolverStatus);

So we could add some memoized/toposorted function to it that can be used in each resolver.

user: async function (value, data, context) {
    console.log("Resolving user");
    // return context.loader.service('load').load(userId);

    return {
      name: "Rogue"
    };
  },
  isRogue: async function (value, data, context, resolverStatus) {
    const user = await resolverStatus.resolveProperty("user");
    return user.name === "Rogue";
  },
  isNotRogue: async function (value, data, context, resolverStatus) {
    const isRogue = await resolverStatus.resolveProperty("isRogue");
    return !isRogue;
  }
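To make the memoization idea concrete, here is a minimal standalone sketch. `resolveAll` and the shape of the status object are hypothetical and are not the current resolver.ts implementation; the sketch caches one promise per property so that each resolver runs at most once, and it does not detect circular references.

```javascript
// Hypothetical helper: resolves every property in `resolvers` against `data`.
// Each resolver receives a status object whose `resolveProperty(name)` returns
// the memoized promise for a sibling property.
async function resolveAll(resolvers, data, context) {
  const cache = new Map();

  const status = {
    resolveProperty(name) {
      if (!cache.has(name)) {
        // The async wrapper turns synchronous throws into rejections.
        const run = async () => resolvers[name](data[name], data, context, status);
        cache.set(name, run());
      }
      return cache.get(name);
    }
  };

  const names = Object.keys(resolvers);
  const values = await Promise.all(names.map((name) => status.resolveProperty(name)));
  return Object.fromEntries(names.map((name, index) => [name, values[index]]));
}

const messageResolvers = {
  user: async () => ({ name: 'Rogue' }),
  isRogue: async (value, data, context, status) => {
    const user = await status.resolveProperty('user');
    return user.name === 'Rogue';
  },
  isNotRogue: async (value, data, context, status) => {
    return !(await status.resolveProperty('isRogue'));
  }
};

resolveAll(messageResolvers, { userId: '456' }, {}).then(console.log);
```

Since the cache stores promises rather than values, the order in which properties are declared does not matter: whichever resolver asks for a sibling first starts it, and everyone else awaits the same promise.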

@DaddyWarbucks
Member

DaddyWarbucks commented Sep 26, 2021

So I am a few hours into using feathers-schema and have some feedback. I have a laundry list of complaints about JSON schema and AJV as a whole. I know why they were used, because we can infer types, we can generate docs, and it's a standard. But damn they are cumbersome.

  • I think we need pick and omit methods on the schema. These methods are available on most validation libraries and I use them daily. For example
// This is a hook I use to validate some data before sending it to a third party service.
// So I don't need to validate the whole payload because I don't have it. For example,
// after this hook I upload a file to cloudinary that returns some data that is then
// merged with context.data and then that whole payload is validated by the whole schema

// Validates data needed before cloudinary upload
const preValidateData = async (context) => {
  const schema = context.service.schema.pick([
    'entity_id',
    'entity_type',
    'document_type'
  ]);
  await schema.validateData(context.data, { context });
  return context;
};
  • I need synchronous validation. I plan to use schemas for both server and client side validation. I use hooks on both the client and server and those are perfectly fine async, of course. But I also use schemas in form validation as the user types and this should be synchronous. Using AJV, a schema that uses asynchronous keywords must be marked as $async. See https://ajv.js.org/guide/async-validation.html . So if you have any async keywords, you can't validate that schema synchronously at all. I believe Joi does this and I could have sworn Yup did too, but I just checked the docs and it does not. Ooof, maybe I am on an old version of yup. So I will likely need to extend the Schema class with a validateSync method that clones the schema, recursively walks the properties to remove the async keywords, and validates against that cloned schema? I am not sure how I will accomplish that.

  • I am not sure we can pass context to custom validations? I have been reviewing adding custom keywords and I don't see how I can pass context to the validation function...that's kind of a deal breaker. I need this:

const validated = await schema.validate(data, { context })
  • It takes sooo much code to add a label to validation messages. Even when using ajv-errors, it's a pain to write a custom validation error for every property. Again, I am used to yup where this is built in.
{
  user_name: yup.string().required().label('User Name')
}

// error: 'User Name is a required field'
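The pick and omit methods from the first bullet could be approximated on plain JSON Schema objects. The following is only a sketch under that assumption: `pickSchema` and `omitSchema` are hypothetical helpers, not part of feathers-schema or AJV, and they only handle top-level properties of an object schema.

```javascript
// Hypothetical helper: returns a new object schema limited to `keys`.
function pickSchema(schema, keys) {
  // Drop `$id` so the derived schema does not clash with the original
  // if both are registered with the same validator instance.
  const { $id, ...rest } = schema;
  const properties = {};

  for (const key of keys) {
    if (schema.properties && key in schema.properties) {
      properties[key] = schema.properties[key];
    }
  }

  return {
    ...rest,
    properties,
    // Only keep required entries that survived the pick.
    required: (schema.required || []).filter((key) => keys.includes(key))
  };
}

// Hypothetical helper: the inverse of pickSchema.
function omitSchema(schema, keys) {
  const kept = Object.keys(schema.properties || {}).filter((key) => !keys.includes(key));
  return pickSchema(schema, kept);
}
```

The derived schema is a regular JSON Schema object, so it can be compiled and validated on its own before the full payload exists.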

Lots of this is my discomfort with AJV and JSON Schema, but I wanted to capture my initial thoughts as I am starting to use it. Maybe you guys can give me some AJV pointers. I also want to share my current solution for some context on where my opinions are coming from. The below code is how I currently do validations.

const schema = yup.object({
  user_id: yup
     // ._id() is a custom method I wrote for mongo ObjectId's
    ._id()
    //.belongsTo() is a custom method for validating this id belongs to this service
    .belongsTo('api/users')
    // Dynamic required based on method
    .requiredOnMethod(['create', 'update'])
     // use meta/describe to store all kinds of stuff. This is where I typically store config for stuff
     // like joinQuery hook and resolvers. I also store some ORM specific stuff here
    .meta({ service: 'api/users', foreignKey: '_id', as: 'user', mongoose: {...} })
    .label('User ID') // so easy...
})

// Using schema.describe() it's pretty easy to get everything you need to convert a schema or
// get to the meta data to config hooks, etc
const mongooseSchema = convertSchema(schema)

Yup also infers TS types... just saying 😉 . And there is probably a yup-to-json library out there.

@DesignByOnyx
Contributor

Having used both yup and JSONSchema (AJV), I have the following opinions:

  • JSONSchema is not as rich, but you can usually add the richness you need. Yup is a whole lot of API and very difficult for developers to understand or implement the advanced stuff.
  • In JSONSchema, the schema is all about describing the data. AJV allows you to extend this for custom validation and other stuff dealing directly with the data. Anything more should be separate from the schema. For me, this includes relationships, metadata, form labels, etc. This is definitely my trauma and PTSD with ORMs. I found it much easier to have schemas describe the data, a relationship manager manage that stuff, form managers for all that stuff, etc.
  • JSONSchema is shareable across teams. This was invaluable for projects where different teams used different languages - both teams needed to validate against the core schema while each branched off with their own logic, error messages, business rules, etc.

In general, I found AJV's approach to favor multiple instances as opposed to one giant schema and a bunch of API to act on that one schema. This results in faster performance and reduced API in exchange for a little more memory and file size (that might not even be entirely true either... definitely more schemas, but each with its own clear purpose). This solves your pick/omit problem as well as your async/sync one. It's more code, but it's very clear what each schema is for, and it can be done in a DRY fashion. It's possible to automatically generate async schemas from sync ones, and shared fields can be defined in a common place and reused/extended in multiple schemas.
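A rough sketch of the "shared fields, multiple schemas" approach, using plain objects only. The field names are borrowed from the cloudinary example earlier in this thread; the enum values, the schema ids and the `buildSchema` helper are made up for illustration.

```javascript
// Shared field definitions, declared once.
const fields = {
  entity_id: { type: 'string' },
  entity_type: { type: 'string', enum: ['user', 'company'] }, // illustrative values
  document_type: { type: 'string' },
  url: { type: 'string' }
};

// Hypothetical helper: builds a purpose-specific object schema from shared fields.
function buildSchema($id, keys, required = keys) {
  return {
    $id,
    type: 'object',
    additionalProperties: false,
    required: [...required],
    properties: Object.fromEntries(keys.map((key) => [key, fields[key]]))
  };
}

// One small schema per purpose instead of one schema with pick/omit.
const preUploadSchema = buildSchema('DocumentPreUpload', [
  'entity_id',
  'entity_type',
  'document_type'
]);
const createSchema = buildSchema('DocumentCreate', Object.keys(fields));
const patchSchema = buildSchema('DocumentPatch', Object.keys(fields), []);
```

Each of these objects would then be compiled separately by the validator, so every compiled function has exactly one job.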

@DesignByOnyx
Contributor

Also, just learned about JSON Type Definition (JTD) - it's an actual RFC as of January (unlike JSON Schema), looks a lot like JSON Schema, but is easier, more concise, and optimized for code generators (unlike JSON Schema). It also has a "metadata" field which can be used to hold anything you want. AJV already works with JTD.

@DaddyWarbucks
Member

^^ Solid feedback.

My main hangup is with AJV more than JSON Schema. JSON Schema (or JTD) just makes sense. My main hangup is not being able to pass context/options to the validate method. I think we would be doing ourselves a disservice to create a feathers validation tool that can't use context during its validation. There is this potential solution: https://ajv.js.org/options.html#passcontext
Everything else with AJV I can get over.

I have the opposite PTSD as @DesignByOnyx where the PTSD stems from managing X different places for the relationships, metadata, form labels, etc. I tend to keep my ORM very thin because I rely on the service level schema so heavily, so thin that it's pretty easy to transform JSON schema to a model. And a "label" or "nice name" is the only thing I expect from the "form manager" piece. So I actually do want to be able to describe all of those things in the schema. To be fair, I have had a bad experience similar to what @DesignByOnyx mentioned here too. It's easy to create a "god class" for sure. That's why I swung to the opposite side and now have to describe everything 5 times over...so I am kinda burned on that side of things ATM.

I suggested to David in another conversation a "feathers" property in the schema, and from that junk drawer of a property we could manage lots of other stuff...if we wanted to.

{
  type: "object",
  properties: {
    user_id: {
      type: "string",
      feathers: {
         // ... maybe some relationship stuff, maybe some ORM stuff, etc
      }
    }
  }
}

That feathers prop could be plucked off before being shared to clients, keeping it a true JSON schema and not leaking server side info. But it would also give us the ability to more easily/accurately create the schemaToSequelize and schemaToMongoose that we have talked about, if that is still a goal. I have done basically this same thing with the yup example above

.meta({ service: 'api/users', foreignKey: '_id', as: 'user', mongoose: {...} })

and it works well.
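Plucking that property off before sharing the schema with clients could look roughly like this. `stripFeathers` is a hypothetical helper and `feathers` is only the keyword name proposed above, not an existing JSON Schema or AJV keyword.

```javascript
// Recursively removes the (proposed) `feathers` keyword from a schema
// and returns a new object, leaving the original untouched.
function stripFeathers(schema) {
  if (Array.isArray(schema)) {
    return schema.map(stripFeathers);
  }
  if (schema === null || typeof schema !== 'object') {
    return schema;
  }

  const result = {};
  for (const [key, value] of Object.entries(schema)) {
    if (key === 'feathers') {
      continue;
    }
    if (key === 'properties' && value !== null && typeof value === 'object') {
      // Keys under `properties` are field names, not keywords, so keep them all.
      result[key] = Object.fromEntries(
        Object.entries(value).map(([name, sub]) => [name, stripFeathers(sub)])
      );
    } else {
      result[key] = stripFeathers(value);
    }
  }
  return result;
}
```

The server would keep the full schema for resolvers and ORM conversion and only ever serialize the stripped copy.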

Maybe I am trying to shoehorn too much into this too.

@DaddyWarbucks
Member

DaddyWarbucks commented Oct 26, 2021

Some more feedback. I don't love the resolver function signature.

export type PropertyResolver<T, V, C> = (
  value: V|undefined,
  obj: any,
  context: C,
  status: ResolverStatus<T, C>
) => Promise<V|undefined>;

I have found that I am more often using resolvers to join properties that do not yet exist on the object. This means that the value argument is generally undefined. And I don't mind having to pluck the value off of the object when I do need it.

For example, I think I prefer this

// Note there is no `value` argument, instead we just get what we need off of `obj`
resolve (obj, context, status) {
  return context.app.service('users').get(obj.userId);
}

or more concisely we can just use object destructuring.

resolve ({ userId }, context, status) {
  return context.app.service('users').get(userId);
}
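Until the signature changes, the shorter form can be layered on top of the existing one with a small adapter. This is only a sketch: `withoutValue` is a hypothetical helper, and the `app` object is a stand-in for a real Feathers application.

```javascript
// Hypothetical adapter: wraps an `(obj, context, status)` resolver so it can be
// used where the `(value, obj, context, status)` signature is expected.
const withoutValue = (resolver) => (value, obj, context, status) =>
  resolver(obj, context, status);

// Stand-in for a Feathers app, for illustration only.
const app = {
  service: (name) => ({
    get: async (id) => ({ id, from: name })
  })
};

// The resolver itself only destructures what it needs from the object.
const user = withoutValue(({ userId }, context) =>
  context.app.service('users').get(userId)
);

user(undefined, { userId: 1 }, { app }).then(console.log);
```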

@fratzinger
Member

+1 on everything @DaddyWarbucks said!

I'm late to the party but want to share my two cents. I try to keep it short. I'm using the following libraries and, although it's an effort to keep them in sync, they are all amazing:

  • server models: feathers-sequelize, feathers-graph-populate and feathers-fletching/joinQuery
  • client models: customized feathers-vuex and some yup

What excites me the most about @feathersjs/schema is this point:

Shared model definition types between the client and server (when using TypeScript)

I would like to keep the focus on the feathers-vuex application. I would love to bake in @feathersjs/schema to feathers-vuex/feathers-pinia to make it a full blown typed frontend ORM. Actually I'm close to that without @feathersjs/schema.

feathers-vuex uses Models. I added a custom Model.init(schema) function to make it an ORM. I shared the gist with @marshallswain once. There was a discussion at feathersjs-ecosystem/feathers-vuex#397 (comment). In my mind this 'schema' should be replaced by @feathersjs/schema. To accomplish that, I need the keyHere/keyThere pattern or belongsTo/hasMany pattern for @feathersjs/schema. Here is why. What is my current setup? Bear with me:

service /users with User model

type User = {
  id: number
  name: string
  todos: Todo[]
  getTodos: Function
}

service /todos with Todo model

type Todo = {
  id: number
  text: string
  userId: number
  user: User
  completedAt: Date
  getUser: Function
}

My schema for Todo.init(schema) looks like the following:

Todo.init({
  id: {
    type: Number,
    default: null,
  },
  text: {
    type: String,
    default: ""
  },
  user: {
    belongsTo: "User",
    secondaryKey: "userId",
  },
  completedAt: {
    type: Date,
    default: () => new Date()
  }
});

The init does some custom magic with Object.defineProperty and leverages instanceDefaults and setupInstance from feathers-vuex. My models act like the following:

  1. When I fetch a todo with a populated user object from the server, the user becomes a User instance automatically because of Object.defineProperty('user', { get() {}, set(val) {} }).
  2. todo.user is fully reactive. The user is stored in the central store. If I edit the user.name elsewhere, todo.user.name changes as well.
  3. When I assign todo.user = existingUser, the todo.userId also gets replaced.
  4. When I assign todo.userId = existingUserId, the todo.user also gets replaced
  5. feathers-vuex has temporary items with idTemp, which totally works with this ORM
  6. todo.completedAt is a native Date object by default.
  7. todo.getUser() can be called (same as @mrfrase3 mentioned above)
  8. The points described above are a belongsTo pattern. My 'schema' also works for the hasMany pattern.
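Points 3 and 4 can be sketched without feathers-vuex. The `belongsTo` helper and the `Map` used as a store are assumptions for illustration; the real implementation builds on instanceDefaults/setupInstance and a reactive store, neither of which is reproduced here.

```javascript
// Minimal stand-in for the central store: users indexed by id.
const userStore = new Map();

// Hypothetical helper: defines `prop` on `target` as a view over the store,
// backed by the foreign key `keyHere`.
function belongsTo(target, prop, keyHere, store) {
  Object.defineProperty(target, prop, {
    enumerable: false, // keep the populated object out of JSON payloads
    get() {
      return store.get(target[keyHere]) ?? null;
    },
    set(value) {
      if (value == null) {
        target[keyHere] = null;
        return;
      }
      store.set(value.id, value);
      target[keyHere] = value.id;
    }
  });
  return target;
}

const alice = { id: 1, name: 'Alice' };
const bob = { id: 2, name: 'Bob' };
userStore.set(alice.id, alice);
userStore.set(bob.id, bob);

const todo = belongsTo(
  { id: 10, text: 'Write docs', userId: 1 },
  'user',
  'userId',
  userStore
);

todo.user = bob;  // also updates todo.userId
todo.userId = 1;  // todo.user now reads Alice from the store
```

Because the getter always reads from the store, editing a user elsewhere is visible through todo.user without copying any data onto the todo.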

This behavior of my custom implementation of feathers-vuex works pretty well for me. I would replace Todo.init(schema) with something like the following:

defineModel({
  $id: 'Todo',
  type: 'object',
  additionalProperties: false,
  required: ['text', 'userId'],
  properties: {
    text: { type: 'string' },
    user: { $ref: 'User', foreignKey: 'userId' },
    completedAt: { type: 'date' }
  }
});

Please can we have a keyThere/keyHere or belongsTo/hasMany pattern baked into @feathersjs/schema?
If we could have this, I will make it work natively with feathers-vuex/feathers-pinia and make my custom solution completely open source and documented.

I'm also perfectly fine with this approach from @DaddyWarbucks mentioned above:

{
  type: "object",
  properties: {
    user_id: {
      type: "string",
      feathers: {
         // ... maybe some relationship stuff, maybe some ORM stuff, etc
      }
    }
  }
}

@daffl
Member Author

daffl commented Jan 14, 2022

I'd still be hesitant to bake that in because in my experience declarative associations are what make pretty much every ORM such a pain to use. I do think, however, that it should be possible with what we already have to create a separate module with utility functions that return a resolver that does exactly that. Something like:

import { schema, resolve, Infer } from '@feathersjs/schema'
import { associate } from '@feathersjs/schema-associations'

const todoSchema = schema({
  $id: 'Todo',
  type: 'object',
  additionalProperties: false,
  required: ['text', 'userId'],
  properties: {
    text: { type: 'string' },
    user: { $ref: 'User' },
    completedAt: { type: 'date' }
  }
})
 
type Todo = Infer<typeof todoSchema> & {
  user: User
}

const todoResultResolver = resolve<Todo, HookContext>({
  properties: {
    user: associate({
      service: 'users',
      foreignKey: 'userId'
    })
  }
})

This utility function could also come with @DaddyWarbucks's batch loader functionality built in. I'd expect it to perform better than e.g. Mongoose with MongoDB or even Sequelize in some cases. It would also keep things cleanly separated. Your data schema should only really declare the properties, not where the data is coming from.
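For illustration, such a utility might boil down to something like the following. This is a sketch only: `@feathersjs/schema-associations` does not exist as far as this thread is concerned, batching is left out, and the `app` object is a stand-in for a real Feathers application.

```javascript
// Hypothetical `associate`: returns a property resolver using the
// `(value, obj, context)` signature discussed in this thread.
const associate = ({ service, foreignKey }) => async (value, obj, context) => {
  const id = obj[foreignKey];

  // Nothing to look up if the foreign key is not set.
  if (id === undefined || id === null) {
    return undefined;
  }

  return context.app.service(service).get(id);
};

// Stand-in for a Feathers app, for illustration only.
const app = {
  service: (name) => ({
    get: async (id) => ({ id, service: name })
  })
};

const resolveUser = associate({ service: 'users', foreignKey: 'userId' });

resolveUser(undefined, { text: 'Write docs', userId: 7 }, { app }).then(console.log);
```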

@DaddyWarbucks
Member

DaddyWarbucks commented Jan 15, 2022

I actually am already using a concept similar to what @daffl is describing with the associate function. It looks like this

module.exports.loaders = function loaders(service) {
  return {
    load: (id, params) => {
      return (data, context) => {
        const [idKey, dataKey] = Object.entries(id)[0];
        const value = data[dataKey];
        if (value) {
          return context.params.loader.service(service).load(
            {
              [idKey]: value
            },
            params
          );
        }
      };
    },
    loadMulti: (id, params) => {
      return (data, context) => {
        const [idKey, dataKey] = Object.entries(id)[0];
        const value = data[dataKey];
        if (value) {
          return context.params.loader.service(service).loadMulti(
            {
              [idKey]: value
            },
            params
          );
        }
      };
    }
  };
};

And it is used like

const { loaders } = require('@lib');

module.exports = {
  user: loaders('api/users').load({ _id: 'user_id' })
};

It's doing basically exactly the same thing as @daffl describes, but with a slightly different function signature that makes it very similar to the loader syntax.

@DaddyWarbucks
Member

I can understand the desire to separate the relationship definition from the data schema. It keeps the schema pure and cross team/language/environment. That makes sense and is valuable. But, I do also believe that stuff has to be defined somewhere...in order to use resolvers and to convert schemas to ORM models, those relationships have to be defined at some point. Right now we are leaning towards "defining" them in function arguments to the functions that will map them to resolvers and ORM models.

// I don't love "defining" the relationship here
associate({
  service: 'users',
  foreignKey: 'userId'
})

What about some kind of RelationshipSchema? We would use what is now simply defined as a Schema as more of a "data schema" for validation, types, etc. But perhaps we also offer a RelationshipSchema as a standard way of defining relationships in Feathers. Then these utility functions that create resolvers/models would take an instance of this relationship schema as its argument. Then we could also similarly get some typings, documentation, etc out the relationship.
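To sketch what I mean: the relationship definitions live in their own plain object, and a utility turns them into resolvers. Every name here (`todoRelationships`, `resolversFrom`, the `belongsTo`/`hasMany` types, `keyHere`/`keyThere`) is hypothetical, and the sketch assumes `find` returns a plain array when pagination is disabled.

```javascript
// Hypothetical relationship definitions, kept apart from the data schema.
const todoRelationships = {
  user: { type: 'belongsTo', service: 'users', keyHere: 'userId', keyThere: 'id' },
  comments: { type: 'hasMany', service: 'comments', keyHere: 'id', keyThere: 'todoId' }
};

// Turns each relationship into a resolver with the (value, obj, context) signature.
function resolversFrom(relationships) {
  const resolvers = {};

  for (const [name, rel] of Object.entries(relationships)) {
    resolvers[name] = async (value, obj, context) => {
      const query = { [rel.keyThere]: obj[rel.keyHere] };
      // Assumes an unpaginated `find` that resolves to an array of records.
      const found = await context.app
        .service(rel.service)
        .find({ query, paginate: false });

      return rel.type === 'belongsTo' ? found[0] ?? null : found;
    };
  }

  return resolvers;
}

const todoResolvers = resolversFrom(todoRelationships);
```

The same `todoRelationships` object could also be handed to whatever converts schemas to ORM models, so the relationship is still only written down once.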

@daffl
Member Author

daffl commented Jan 15, 2022

That is a valid point. I also understand why everybody seems to want this. The problem is that all those ORMs have been spending years arguing over this kind of thing and, from everything I can tell, have not really come to a conclusion that has been widely adopted as... well, actually working. I feel like we'd be starting from scratch trying to find a format that can accommodate all the different ways all the different ORMs do their thing ™️ - and essentially end up writing our own ORM for ORMs.

My thought around this was to just leave that kind of thing to the ORM or database adapter you picked. For example with sequelize you'd have something like createModel(mySchema, sequelizeSpecificOptions). So it'd create the basic model from mySchema and then tack on whatever additional Sequelize options (like foreign keys etc.) you want.

@daffl
Member Author

daffl commented Jun 5, 2022

Schemas are now available in the v5 prerelease and documented here. I am going to close this issue and we can create new ones for anything to follow up on.

@daffl daffl closed this as completed Jun 5, 2022