Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

be able to resolve iglu schemas in validator #210

Open
hamnis opened this issue Oct 24, 2022 · 8 comments
Open

be able to resolve iglu schemas in validator #210

hamnis opened this issue Oct 24, 2022 · 8 comments

Comments

@hamnis
Copy link
Contributor

hamnis commented Oct 24, 2022

The underlying library support uri resolving for custom schemes.
Attached is a possible patch that can enable this feature

circevalidator.patch

@istreeter
Copy link
Contributor

This is a very interesting idea! If I understand it right, this means we can have schemas with properties like this:

"properties: {
  "xyz": { "$ref": "iglu:com.snowplowanalytics.iglu/anything-a/jsonschema/1-0-0" }
}

Until now, I have always discouraged people from using the $ref keyword in iglu schemas. And it was for three reasons:

  1. If the ref is a non-versioned url (e.g. http://example.com/path/to/schema) then it bypasses Iglu's strict versioning rules. It is a principle of Iglu that schemas are static once published, and that means we should not allow them to point to just any url.
  2. Snowplow's repo schema-ddl contains the code we use for converting from json schemas into warehouse table definitions. Currently that repo does not support the $ref keyword. It means if a schema uses $ref then for events in a Snowplow pipeline, the fields will not get loaded into the warehouse.
  3. When json-schema-validator looks up a $ref, it blocks the jvm thread. In your patch, this happens on the line where you call dispatcher.unsafeRunSync(...). In Snowplow apps, which are required to be high throughput, we have to be very careful about which threads get blocked and when.

Your patch addresses problem 1, and I think that is a huge improvement. It is a big step in the right direction.

But I think we need to properly address problems 2 and 3 before we can claim that Iglu fully supports this type of external lookup.


A couple of smaller implementation comments:

I don't think you need the ValidatorF trait. I think your new validator implementation could simply implement the old Validator trait.

If we add this feature, I would prefer to amend the old validators instead of adding a new validator. It would be much simpler if we can say that all iglu validators support external refs, instead of saying that some do and some don't depending on how you use it. This might need a breaking change to the code, so it would be a new major release of the library (version 3).

@voropaevp
Copy link
Contributor

If iglu-client substitutes $ref for its contents when json body is returned, it would solve the point 2.

@istreeter
Copy link
Contributor

@voropaevp that's a good idea. It would solve the point 3 too, because if we do it in the resolver then we have better control over how threads/fibers get used.

@voropaevp
Copy link
Contributor

Implementation would need to be careful about resolution depth and possible loops.

@hamnis
Copy link
Contributor Author

hamnis commented Jan 19, 2023

A way to fix the blocking issue is to change Sync[F].delay to Sync[F].blocking or Sync[F].interruptable if this is a cancelable operation.

@stanch
Copy link

stanch commented Jan 20, 2023

Hi @hamnis, I like how this brings us closer to JSON Schema compatibility, although it would be interesting to understand the use cases behind this feature. That would help us think through if any changes are needed in our UI flows as well.

I assume you’d like to reuse the same “subschema” in a set of schemas? Would this be primarily for event schemas or for entity schemas? Could you give some examples? In particular, if the goal is to add the same “subschema” to multiple event schemas, it’s already possible just add that “subschema” as an entity/context to all such events. But I imagine there is something you prefer in the $ref approach?

@hamnis
Copy link
Contributor Author

hamnis commented Jan 20, 2023

This would be reuse in other entity schemas.

A particular use case is that we have a concept called Plugs, each plug displays a Content item.
There may be multiple plugs visible plugs on a page, which also May represent a Content item.

We would typically attach a set of plugs to an impression event.

For now we have embedded the Content definition within our Plug definition.

I can see other usecases, but this is what we currently have.

@marcin-j
Copy link

marcin-j commented Apr 4, 2023

Hey!

Any updates on this?

Our use cases:

  1. Attach same set of reusable, required fields to events. I know we could use contexts, but we would really like to have those fields as non-optional and contexts are actually optional AFAIK.
  2. Be able to extend events with additional fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants