Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

json sum constructors as sibling record fields #25

Merged
merged 5 commits into from
Nov 28, 2014

Conversation

dsheets
Copy link
Collaborator

@dsheets dsheets commented Nov 4, 2014

This patch adds support for a new annotation, <json constr="json_object_field">, on record fields with sum types which indicates another field in the same JSON object to use as a sum constructor for this field.

This functionality requires ocaml-community/yojson#11 in order to defer parsing the sum's payload JSON string until the sum's constructor has been read.

Binding a record's constructor tag field is not required. This case is called "implicit" in this patch set and a special field name prefixed with 0jic_ is used to denote these fields and keep them separate from normal OCaml record fields (0 is not a valid indentifier start character).

The constructor field itself may be anything that serializes to a string and constructor name comparison is done via strings. This means that the constructor field type may be a polymorphic variant and the <json name>s of the constructors will be used for type-directed parsing of the constructor's payload. Nullary constructors may have their payload field omitted. Multiple payload fields may key on the same constructor field. Default tag and payload types are supported but optional tag and payload types are not.

I would like to proceed as follows:

  1. Discussion of the semantics and nomenclature of this feature (I don't really like constr.).
  2. Discussion of the various error conditions that can arise at run-time and how they should be handled.
  3. Discussion of this specific implementation of this feature.

If a future revision of this patchset is merged, I would be happy to write the documentation for this feature. This functionality is required for upcoming Events API support in avsm/ocaml-github.

Thanks for your time reviewing this work. :-)

@mjambon
Copy link
Owner

mjambon commented Nov 4, 2014

Thanks for putting efforts into this, it's going to be very useful.

I had some thoughts about this in the past. I remember being not clear on how broad our support should be, although different approaches could be implemented without conflicting, e.g. maybe sometime in the future allow a full inspection of the JSON AST before deciding which type it should have.

I'm going to take a look at your design and I'll get back to you.

@dsheets
Copy link
Collaborator Author

dsheets commented Nov 4, 2014

This patch only takes care of the product of sum case (existential field type) not the sum of product case (universal field types). I think the universal case is easier and perhaps more common where an object contains a field (e.g. "type") and that field is used to determine the type of all of the other fields. In my current terminology, I think this could be exposed as a <json constr> annotation on a sum type each with a record type payload (or parse a record like { "type": "null" } in the case of a nullary constructor).

@mjambon
Copy link
Owner

mjambon commented Nov 4, 2014

For clarity, could you please tell me which of the following cases is handled, and which is not:

(*
  Case 1 - no common fields, one generic field

  current atd definition:
*)
type 'a event = {
  type_: string; (* "t1" or "t2" *)
  data: 'a;
}

(* we want the following OCaml type: *)
type event = [ `T1 of t1 | `T2 of t2]
(*
  Case 2 - some common fields, one generic field
  such as https://stripe.com/docs/api#event_object

  current atd definition:
*)
type 'a event = {
  type_: string; (* "t1" or "t2" *)
  x: int;
  data: 'a;
}

(* we want the following OCaml type: *)
type event = {
  x: int;
  data: [ `T1 of t1 | `T2 of t2];
}

which is different from:

(*
  Case 3 - no common field, no generic field

  current atd definition
*)
type event = {
  ?t1: t1 option;
  ?t2: t2 option;
}

(* we want the following OCaml type: *)
type event = [ `T1 of t1 | `T2 of t2]

and both different from the messier case:

(*
  Case 4 - some shared fields, some type-specific fields, no generic field

  current atd definition
*)
type event = {
  type_: string; (* "t1" or "t2" *)
  x: int;
  ?y: int option; (* exists always in t2, never in t1 *)
}

(* we want the following OCaml types: *)

type t1 = {
  x: int;
}

type t2 = {
  x: int;
  y: int;
}

type event = [ `T1 of t1 | `T2 of t2 ]

or even worse, a hybrid of 3 and 4:

(*
  Case 5 - some shared fields, some type-specific fields, no generic field, no "type" field

  current atd definition
*)
type event = {
  x: int;
  ?y: int option; (* exists always in t2, never in t1 *)
}

(* we want the following OCaml types: *)

type t1 = {
  x: int;
}

type t2 = {
  x: int;
  y: int;
}

type event = [ `T1 of t1 | `T2 of t2 ]

@dsheets
Copy link
Collaborator Author

dsheets commented Nov 4, 2014

(*
  Case 1 - no common fields, one generic field

  current atd definition:
*)
type 'a event = {
  type_: string; (* "t1" or "t2" *)
  data: 'a;
}

(* we want the following OCaml type: *)
type event = [ `T1 of t1 | `T2 of t2]

This would be represented like:

type event_constr = [ `T1 <json name="t1"> of t1 | `T2 <json name="t2"> of t2]

type event = {
  data <json constr="type_">: event_constr;
}

There will be an intermediate record because the underlying JSON representation is an object (and more fields could be added or present in cases we don't model and would rather just ignore).

(*
  Case 2 - some common fields, one generic field
  such as https://stripe.com/docs/api#event_object

  current atd definition:
*)
type 'a event = {
  type_: string; (* "t1" or "t2" *)
  x: int;
  data: 'a;
}

(* we want the following OCaml type: *)
type event = {
  x: int;
  data: [ `T1 of t1 | `T2 of t2];
}

This would be represented like:

type event_constr = [ `T1 <json name="t1"> of t1 | `T2 <json name="t2"> of t2]

type event = {
  x: int;
  data <json constr="type_">: event_constr;
}
(*
  Case 3 - no common field, no generic field

  current atd definition
*)
type event = {
  ?t1: t1 option;
  ?t2: t2 option;
}

(* we want the following OCaml type: *)
type event = [ `T1 of t1 | `T2 of t2]

This is some horrible collision-prone schema of the dual case (sums of products) that isn't currently representable. Lots of schemas contain nonsense like this, though, so it should be supported in some shining future.

(*
  Case 4 - some shared fields, some type-specific fields, no generic field

  current atd definition
*)
type event = {
  type_: string; (* "t1" or "t2" *)
  x: int;
  ?y: int option; (* exists always in t2, never in t1 *)
}

(* we want the following OCaml types: *)

type t1 = {
  x: int;
}

type t2 = {
  x: int;
  y: int;
}

type event = [ `T1 of t1 | `T2 of t2 ]

This is currently (barely) representable but yields a different type scheme:

type event_constr = [ `T1 <json name="t1"> | `T2 <json name="t2"> of int]

type event = {
  x: int;
  y <json constr="type_">: event_constr;
}
(*
  Case 5 - some shared fields, some type-specific fields, no generic field, no "type" field

  current atd definition
*)
type event = {
  x: int;
  ?y: int option; (* exists always in t2, never in t1 *)
}

(* we want the following OCaml types: *)

type t1 = {
  x: int;
}

type t2 = {
  x: int;
  y: int;
}

type event = [ `T1 of t1 | `T2 of t2 ]

This is fairly insane (but obviously realistic, pragmatic, and deployed). It is not currently representable.

I believe all of the cases that can't be represented or can only be represented poorly can be fixed with the implementation of the dual sums of products construction. The feature that would be necessary is <json constr> annotation of specific constructors to select the (unique) field to use for type-direction. I believe this is dual to the products of sums feature that allows you to set the same record field as a constructor for multiple payload fields however this will result in multiple (possibly out-of-sync) variants. This would satisfy case 3 and maybe case 4. To satisfy case 4 generally, we'd need to compose sum-of-products with product-of-sums to key off the value in the type field if the relationship between t2 and y weren't unique. I actually think that case 4 is easier to support than case 3 because of this. Case 5 can be solved with the dual annotation required by case 3.

Something like this:

type t1 = {
  x: int;
}

type t2 = {
  x: int;
  y: int;
}

type event = [
  | `T1 <json name="t1"> of t1
  | `T2 <json name="t2" constr="y"> of t2
]

Where you now need to do a set union algorithm (sums) rather than the current topological sort (products) in order to compute a parsing schedule.

Of course, this dual relationship suggests another case:

(*
  Case 6 - some shared fields, some type-specific fields, some generic fields, a "type" field

  proposed atd definition
*)
type a_subset = {
  name: string;
  body: string;
}

type b_subset = {
  name: string;
  count: int;
}

type subset_constr = [
  | `A of a_subset
  | `B of b_subset
]

type event = {
  title: string;
  specific <json subset constr="type">: subset_constr;
}

(* we want essentially the same OCaml types but from parsing: *)
{ "title": "XOR and Peas", "type": "B", "name": "xor_and_peas", "count": 3 }
(* we get: *)
{ title = "XOR and Peas"; specific = `B { name = "xor_and_peas"; count = 3 } }

dsheets added a commit to dsheets/ocaml-github that referenced this pull request Nov 4, 2014
org is now the base type for users and orgs
user_type has been removed
organization and team types have been added
comment type hierarchy fleshed out
event constructors defined (except Deployment, DeploymentStatus, PageBuild and TeamAdd)
atd uses new <json constr> annotation from mjambon/atdgen#25
web_hook_config's insecure_ssl field changed to boolean now that yojson supports quoted booleans (>= 1.1.7)
hook types now use <json constr> for configuration (keyed on hook name, only "web" for now)
@mjambon
Copy link
Owner

mjambon commented Nov 8, 2014

Very good.

You mentioned that you don't like the name "constr". I'm not particularly opposed to it. Perhaps "discr" would be more fitting as it allows to discriminate between different cases.

I haven't looked into the implementation yet, I'm hoping to do that over the weekend. If I don't, please bug me.

@mjambon
Copy link
Owner

mjambon commented Nov 8, 2014

Secondary comments:

Case 3 could be implemented in the future as an alternative syntax to the current variants. The ATD definition would be:

type t = [ T1 <json name="t1"> of string | T2 <json name="t2"> of int ]

Legal JSON representations would include:

["t1", "hi"]   /* current */
{ t1: "hi" }   /* proposed */

For the other, more complicated cases, we may just give an easy access to the JSON AST and let a user-given function decide the type before returning it to the parser. At this point, I don't think optimal performance is a big issue anyway.

@dsheets
Copy link
Collaborator Author

dsheets commented Nov 9, 2014

Hmm... I hadn't considered discr and was thinking more like cons, tag, sum, or tcons. I'm not terribly concerned about the name so long as it is memorable for the feature and fits with the rest of the names in the project.

Regarding variants-as-objects, I'd like that annotation to support sibling fields to the constructor field at the least. I don't have a good idea about how to do that beyond treating a cons annotation on the variant type as a tag field and a cons annotation on a specific constructor as a field existence check (with a static uniqueness property to avoid ambiguity).

I had considered adding another annotation called parse or something which would be used to specify a constructor with a JSON AST payload that would be fallen back on when a parse error occurred or an unknown tag was encountered. There are some additional complexities with this approach which is why I removed my preliminary support. In addition to user-definable wrappers/views, I think this would provide a way to express the more complicated cases. With that said, I would really like to a see a coherent set of these sum-of-products and product-of-sum annotations available with high performance so atdgen can be used to describe JSON schemas which use these constructions to make up for the lack of sums in JSON's type vocabulary. For instance, we will soon be able to process much of the GitHub firehose archive http://www.githubarchive.org/ with ocaml-github and it would be nice if the standard features of the schema could be used without constructing huge intermediate structures which need to be garbage-collected almost immediately. On the other hand, I really don't have a good feel for the slowdown we're talking about until we have some perf numbers.

| None -> init_f
| Some _constr_i ->
let oname = field.ocamlf.Ag_ocaml.ocaml_fname in
`Block [ (* prepare to defer parsing *)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to avoid unnecessary indentation, use Inline` instead of Block`. Or return a list, which is the most convenient and the current convention (this code may predate it, and it's not very important anyway).

@mjambon
Copy link
Owner

mjambon commented Nov 16, 2014

Regarding the name for the annotation, I think I prefer tag over all the other suggestions because it's more likely to be understood by a reader who's not familiar with the feature.

@mjambon
Copy link
Owner

mjambon commented Nov 16, 2014

The feature makes sense to me. The implementation looks good to me too.

Support for more exotic cases can be added as needed in the future, I don't think there's a need to design solutions for these today.

If you're good with "tag" instead of "constr", I'll let you change that. Then I'll merge the branch and give you write access to the manual, which now has its own repository (https://github.com/mjambon/atdgen-doc). I'll also make you a member of atdgen, yojson, and friends, which makes it easier to collaborate on large feature branches.

@avsm
Copy link

avsm commented Nov 16, 2014

Very much looking forward to using this!

@mjambon
Copy link
Owner

mjambon commented Nov 28, 2014

I'm merging the branch. Expect the name in the annotation to change from "constr" to something more descriptive like "tag_field".

mjambon added a commit that referenced this pull request Nov 28, 2014
json sum constructors as sibling record fields
@mjambon mjambon merged commit 62b4144 into mjambon:master Nov 28, 2014
@dsheets
Copy link
Collaborator Author

dsheets commented Nov 28, 2014

Thanks, Martin! I've been totally hosed recently with other work (doc gen...) but I will come back to this eventually (~weeks). Documenting the feature shouldn't be more than a couple hours so I will commit to this. Sorry I dropped the ball on changing the name of the feature. :-/

Thank you, again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants