Code-generate a golang semantic layer on top of arbitrary JSON-LD contexts (Formerly: compile time) #48

Closed
ppwfx opened this issue Jun 9, 2018 · 12 comments

@ppwfx
Contributor

ppwfx commented Jun 9, 2018

Hey,

I had a look into vocab.go. On my machine it adds about 40s of compile time.

I have a couple of questions and ideas on how to reduce the file size, but I'm not exactly sure how they would actually affect the compile time.

Why are there interfaces for all the types? Is there more than one implementation of them?

What about adding more generic interfaces, like Serializer and Deserializer?

e.g.

type AltitudeTrait interface {
	IsAltitude() (ok bool)
	GetAltitude() (v float64)
	SetAltitude(v float64)
	IsAltitudeIRI() (ok bool)
	GetAltitudeIRI() (v *url.URL)
	SetAltitudeIRI(v *url.URL)
}

What about dumping all the types that can be represented as an Object (like Note, Page)?

@cjslep
Member

cjslep commented Jun 9, 2018

Thanks for filing this!

Yes, vocab is a very heavyweight library. Unfortunately, it also adds a bit of binary bloat.

I've been thinking on and off about what a 1.0 implementation would look like. So far the design choices I've imagined are:

  • It would have iterators instead of requiring indices.
  • Remove would be smart enough to remove the correct type at that index.
  • Property ("trait", as you mentioned) oriented interfaces.

The original idea for why interfaces like vocab.ObjectType exist is to:

  • allow other implementations to conform to it and be supported by pub
  • allow passing around these 4 top level types: vocab.ObjectType, vocab.LinkType, vocab.CollectionType, vocab.OrderedCollectionType

Now those reasons have turned out to be fairly brittle. The first one would make sense if they were defined in the pub package (they are not). The second is non-obvious, so for things like Create and Tombstone detection (ActivityType and ObjectType respectively) the actual type property has to be inspected.

All the above would be fixable in a major release, in 1.0.0.

Luckily, the compilation time & resource issue may be addressable in major version 0.

The bulk of the compilation time by far should be the fact that the library generates intermediate types to encapsulate that ActivityPub permits a property to be an IRI, value, Object, or array. And in the case of an array, it applies recursively. So under most conditions, a property is stored as this intermediate type.

The way I initially implemented intermediate types was incredibly naive: every property for every type gets its own intermediate type. This means for N properties and M types, instead of N intermediate types, there's N*M. This should be fixable without any major interface changes. This may also solve #42. This is definitely something on my radar, but I haven't tackled it yet because:

  • I've prioritized getting pub as compliant and tested as possible.
  • The code-generation code is unpleasant. It's not something I'm proud of, I need to break parts of it out into more utilitarian libraries, and it has a lot of edge cases built up over time as I learned about the ActivityStream spec. It's a product of my active on-the-fly learning rectifying ActivityStreams with the Go language. Touching this code is high cost, mentally.
  • Because of the above difficulty and finite time, I've set its priority rather low at the moment. It may be worth refactoring the code-generation code before tackling this. However, that turns the effort into high-cost mentally and physically.

If you want me to tackle this sooner, please say so. The only 2 issues not from me on this library are about this topic, so I am now very receptive to prioritizing this highly.

Finally, FYI there are vocab.Serializer and vocab.Deserializer interfaces, that all types implement.

@ppwfx
Contributor Author

ppwfx commented Jun 9, 2018

Touching this code is high cost, mentally.

Haha yeah, that's what I thought when I looked into the gen part. Maybe just use text/template.

allow other implementations to conform to it and be supported by pub

It would indeed be really nice to have a highly abstract representation. Maybe even something that is capable of generating the representation from JSON-LD specs. This is something I'm very open to collaborating on.

Finally, FYI there are vocab.Serializer and vocab.Deserializer interfaces, that all types implement.

yeah I saw that, that's why I mentioned them (in an ambiguous way tho)..

I don't want to implement anything on top of ActivityPub right now. I simply want to support the overall adoption for now. So there's no need to change your prioritisation.

I think you didn't answer this question

What about dumping all the types that can be represented as an Object (like Note, Page)?

And how would a developer be able to add custom types?

@cjslep
Member

cjslep commented Jun 9, 2018

The file tools/defs/defs.go is probably what you are looking for. It is my hand-captured details on all the types and properties defined in the ActivityStreams spec. At the time, I had enough on my plate just learning about ActivityStreams and ActivityPub that I didn't want to also add learning the nuances of JSON-LD from the get-go. But this file and its data structures would be the one to replace to be able to read directly from a JSON-LD context to generate definitions. Unfortunately, the code generation methods are directly tied to these data structures. Also, JSON-LD specs introduced a chicken-and-egg (bootstrapping) problem: I would have needed to read in JSON-LD for the code-generation library which was generating the code to do the reading-in.

And how would a developer be able to add custom types?

Right now, it requires modification of tools/defs/defs.go, and then rerunning go generate in the vocab and streams subpackages. To modify the file, add new extended types and properties, then add those new types and properties to the AllExtendedTypes and AllPropertyTypes. The go generate binary should then pick up those types and generate the implementation code for them. Note that the values of the properties can leverage existing ones (like xsdStringValueType), but if you need new values to store which require new serialization/deserialization constraints, then you may need to add the value type and insert it into AllValueTypes too so the library knows how to do so. Finally, there's an init function towards the end of the file that hooks up which properties belong to which type. Changing it should only be necessary if subtypes define new properties not held by the parent types.

What about dumping all the types that can be represented as an Object (like Note, Page)?

The tools/defs/defs.go has types that Extends so it should be straightforward to iterate through AllCoreTypes and AllExtendedTypes and check the chain of Extends up to objectType.
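A small sketch of that chain walk, under the assumption that each definition carries a list of the types it extends. The `typeDef` struct and `extendsFrom` helper are hypothetical stand-ins; the real data structures in tools/defs/defs.go may be shaped differently.

```go
package main

import "fmt"

// typeDef is a hypothetical stand-in for a type definition that records
// which types it extends.
type typeDef struct {
	Name    string
	Extends []*typeDef
}

// extendsFrom reports whether t is ancestor itself or reaches ancestor by
// following the Extends chain upwards.
func extendsFrom(t, ancestor *typeDef) bool {
	if t == ancestor {
		return true
	}
	for _, parent := range t.Extends {
		if extendsFrom(parent, ancestor) {
			return true
		}
	}
	return false
}

func main() {
	object := &typeDef{Name: "Object"}
	link := &typeDef{Name: "Link"}
	document := &typeDef{Name: "Document", Extends: []*typeDef{object}}
	page := &typeDef{Name: "Page", Extends: []*typeDef{document}}
	note := &typeDef{Name: "Note", Extends: []*typeDef{object}}
	mention := &typeDef{Name: "Mention", Extends: []*typeDef{link}}

	// Print every type that can be represented as an Object.
	for _, t := range []*typeDef{object, link, document, page, note, mention} {
		if extendsFrom(t, object) {
			fmt.Println(t.Name)
		}
	}
}
```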

I understand this is a bit of a complex set up. I don't think this is the setup I would use a second time. However, my logic when I first set out to do the implementation was along the lines of "It's OK if this is difficult because I don't want ActivityPub apps creating new types willy-nilly". I had no idea that the passion for federating git servers would come so quickly nor so strongly, so in hindsight it probably was a wrong decision.

I'm happy for collaboration; I just haven't sat down to create a CONTRIBUTING doc yet. I can do so. The full elegant end-to-end JSON-LD-to-code solution probably looks like a massive refactor or rewrite of the code-generation code (since the bootstrap problem is now solved, but not yet transitioned off the bootstrap solution).

Would you be fine if I split out the vocab interface aspect of this issue to a separate one (which would be addressed in major version 1) and kept this one focused on improving the compile time for the existing implementation (doable in major version 0)? If you would like to file a new issue for the new generation method from JSON-LD context definitions, we can continue that conversation there as well. Since the generated code would keep API compatibility, it is also technically doable for major version 0.

@zet4

zet4 commented Jun 9, 2018

I am also interested in direct JSON-LD to Go generation.
It might also be worth looking into using https://github.com/dave/jennifer for the actual code generation.

@ppwfx
Contributor Author

ppwfx commented Jun 10, 2018

Also, JSON-LD specs introduced a chicken-and-egg (bootstrapping) problem: I would have needed to read in JSON-LD for the code-generation library which was generating the code to do the reading-in.

I think the initial step could be handled with github.com/kazarena/json-gold

Would you be fine if I split out the..

I really don't want to disrupt your workflow, and I guess the generation process will be kinda experimental in the beginning. It will take a while until we agree on things, as there are multiple ways to implement it.

Since the generated code would keep API compatibility

That's the question. Your interfaces are very implementation specific, which is totally fine, but in order for others to be able to write custom en/decoding it would be nice to have the highest possible abstraction. Which is something we might not even need interfaces for.

Let's say you want to iterate over a list of links. In a very abstract fashion that could be done like so

for _, item := range actor.Followers {
   println(item.Link.Href)
}

This would work regardless of whether the Link was originally encoded as an IRI, a string, or even an object. That detail would be handled by an en/decoding implementation.

@cjslep
Member

cjslep commented Jun 10, 2018

It might also be worth looking into using https://github.com/dave/jennifer for the actual code generation.

Great suggestion @zet4! I will definitely look into this library.

I think the initial step could be handled with github.com/kazarena/json-gold

Another good suggestion, @21stio. I did look into that initially, however from what I could tell it is meant to operate on JSON-LD data and transform it using some known algorithms. It does not solve the semantic layer problem. That is where the vocab code-generating library comes into play; rather than try to understand JSON-LD, I basically hardcoded the context into a data format that is understandable by the code generation ("The data format has semantic meaning").

Now that the code generation can create semantic layers, it could itself be applied to JSON-LD, which may require preprocessing with json-gold to create a normalized (some kind of "expanded") context. If that's even the best path forward.

I really don't want to disrupt your workflow, and I guess the generation process will be kinda experimental in the beginning. It will take a while until we agree on things, as there are multiple ways to implement it.

That's fine, since this issue has turned out to be very focused on a feature I can only best describe as "Code-generate a golang semantic layer on top of arbitrary JSON-LD contexts" I will rename it as such.

I'll break out the compile time and intermediate fix into another issue.

I'll add the API bit as another issue. I've addressed it below, but would welcome tackling it after this problem is solved (as the API would be a subset of the API being code-generated).

Finally, will add an issue to add a CONTRIBUTING doc to this project.

That's the question. Your interfaces are very implementation specific, which is totally fine, but in order for others to be able to write custom en/decoding it would be nice to have the highest possible abstraction. Which is something we might not even need interfaces for.

The interfaces are needed only for the pub package. Unfortunately, because ActivityPub imposes some basic constraints such as:

  • Auto-wrapping Object with Create activities
  • Knowing about to, bto, cc, bcc on Activity to perform delivery
  • Ability to strip bto, bcc from Activity before serving it
  • Copying around audience, to, etc. between an Activity and Object
  • Auto-creating Accept and Reject activities
  • Creating, copying data to, and specially serving Tombstone
  • etc

The "most abstract interface" for someone wanting to still leverage the pub library is still quite demanding. I agree though that in its current form, it is not the most abstract it can be.
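To make one of these constraints concrete, here is a minimal sketch of stripping bto and bcc before serving, operating on a generically decoded JSON map. The `stripHidden` helper is hypothetical and is not how pub implements it; it only handles the top level, so a nested object carrying its own bto or bcc would need the same treatment.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stripHidden returns a shallow copy of a decoded activity without the
// bto and bcc properties.
func stripHidden(activity map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(activity))
	for k, v := range activity {
		if k == "bto" || k == "bcc" {
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	raw := `{"type": "Create", "to": ["https://example.com/a"], "bcc": ["https://example.com/b"]}`
	var activity map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &activity); err != nil {
		panic(err)
	}
	served, err := json.Marshal(stripHidden(activity))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(served))
}
```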

cjslep changed the title from "compile time" to "Code-generate a golang semantic layer on top of arbitrary JSON-LD contexts (Formerly: compile time)" on Jun 10, 2018
@cjslep
Member

cjslep commented Jun 10, 2018

I'll break out the compile time and intermediate fix into another issue.

Refocused #42.

I'll add the API bit as another issue. I've addressed it below, but would welcome tackling it after this problem is solved (as the API would be a subset of the API being code-generated).

Created #50.

Finally, will add an issue to add a CONTRIBUTING doc to this project.

Created #51.

@cjslep
Member

cjslep commented Jun 10, 2018

Quick summary after adjusting the issues.

The current system is:

  1. JSON-LD Context Normalization - Skipped
  2. JSON-LD Context Semantic Layer - Bypassed, hardcoded in tools/defs/defs.go
  3. Code Generation for ActivityPub Types' Semantic Layer - Implemented in tools subdirectory
  4. ActivityPub Types' Semantic Layer - vocab package

The next step is to scope which areas, and identify which specific problems, should be addressed as part of this effort. I welcome everyone's input.

Note that if this gets sufficiently generic, I am not opposed to breaking it out of the activity package.

@ppwfx
Contributor Author

ppwfx commented Jun 24, 2018

I think I spotted a little issue in the way type is unserialised. The normalised version of ActivityPub turns type into @type.

By now I think decoding is impossible without normalising the input first.
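A small sketch of the mismatch, assuming generically decoded JSON. In the compact form the key is type with a short name, while JSON-LD expansion produces @type with an array of full IRIs. The `typeNames` helper is hypothetical; it reads both keys, but the returned strings still differ between the two forms, so a decoder keyed on short names would still need a mapping or a normalisation step.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// typeNames returns the declared types of a decoded node, reading the
// compact "type" key first and falling back to the expanded "@type" key.
func typeNames(node map[string]interface{}) []string {
	v, ok := node["type"]
	if !ok {
		v, ok = node["@type"]
	}
	if !ok {
		return nil
	}
	switch t := v.(type) {
	case string:
		return []string{t}
	case []interface{}:
		var names []string
		for _, e := range t {
			if s, ok := e.(string); ok {
				names = append(names, s)
			}
		}
		return names
	}
	return nil
}

func main() {
	docs := []string{
		`{"type": "Note"}`,
		`{"@type": ["https://www.w3.org/ns/activitystreams#Note"]}`,
	}
	for _, doc := range docs {
		var node map[string]interface{}
		if err := json.Unmarshal([]byte(doc), &node); err != nil {
			panic(err)
		}
		fmt.Println(typeNames(node))
	}
}
```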

@cjslep
Member

cjslep commented Dec 17, 2018

FYI, this is a part of the v1.0.0 release.

To quickly summarize, I've been busy creating a different version of the code-generation tool that will be able to take in a special @context JSON-LD definition file (and any other files it needs to resolve dependencies, including the ActivityStreams spec) in order to generate golang code that will automatically serialize/deserialize the RDF types and have a saner golang API than the current tool creates.

@cjslep
Member

cjslep commented Dec 17, 2018

In a previous comment I mentioned:

The current system is:

  1. JSON-LD Context Normalization - Skipped
  2. JSON-LD Context Semantic Layer - Bypassed, hardcoded in tools/defs/defs.go
  3. Code Generation for ActivityPub Types' Semantic Layer - Implemented in tools subdirectory
  4. ActivityPub Types' Semantic Layer - vocab package

What it will look like now:

  1. JSON-LD Context Normalization - Still skipped (use better RDF tooling to transform the @context file as demanded by the tool)
  2. JSON-LD Context Semantic Layer - NEW in v1: Use files from step 1 to autogenerate ActivityStreams extensions (ex: ForgeFed, ValueFlows, MoodleNet, ...)
  3. Code Generation for ActivityPub Types' Semantic Layer - Changed in v1: Much more maintainable code for others who may want to poke and/or prod at this tool
  4. ActivityPub Types' Semantic Layer - Changed in v1: The APIs and resulting packaging layouts will be different for a saner experience and begin addressing the large binaries issue.

@cjslep
Member

cjslep commented Jan 26, 2019

The latest merge into tools/exp does this now.

@cjslep cjslep closed this as completed Jan 26, 2019