Specification for AST #95
Comments
Another alternative way to define a schema that would become JSON is to use Cue: https://cuelang.org/docs/usecases/datadef/ |
Here's a definition of the AST using TypeScript notation: https://github.com/jgm/djot.js/blob/main/src/ast.ts#L4 I don't know if that's the sort of thing you had in mind, @matklad |
Yup, that looks lovely! I would suggest adding some form of that to https://djot.net, to:
In terms of specific things:
interface Section extends HasAttributes, HasChildren<Block> {
tag: "section";
}
interface Section extends HasAttributes {
tag: "section";
children: Block[];
} (and we do that for
|
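For reference, the two spellings of Section quoted above describe the same shape as far as TypeScript's structural typing is concerned. The sketch below uses simplified, hypothetical stand-ins for `HasAttributes`, `HasChildren` and `Block` (the real definitions live in djot.js's ast.ts and are richer), and the names `SectionViaMixin` and `SectionExplicit` are made up for the comparison:

```typescript
// Simplified, hypothetical stand-ins for the helper types in ast.ts.
interface HasAttributes {
  attributes?: Record<string, string>;
}
interface HasChildren<A> {
  children: A[];
}
// Simplification: a real paragraph holds inline nodes, not a plain string.
interface Para extends HasAttributes {
  tag: "para";
  text: string;
}
type Block = Para | SectionViaMixin;

// Form 1: `children` comes from the generic HasChildren<Block> mixin.
interface SectionViaMixin extends HasAttributes, HasChildren<Block> {
  tag: "section";
}

// Form 2: the `children` field is spelled out in place.
interface SectionExplicit extends HasAttributes {
  tag: "section";
  children: Block[];
}

const viaMixin: SectionViaMixin = {
  tag: "section",
  children: [{ tag: "para", text: "hi" }],
};
// Both assignments type-check, so the two forms are interchangeable.
const explicit: SectionExplicit = viaMixin;
const roundTrip: SectionViaMixin = explicit;
```

The choice between them is therefore about how the definitions read in a spec, not about what values they admit.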
Is there now? So this works like the LaTeX itemize environment rather than like HTML definition lists or Pandoc/Markdown definition lists? Then maybe it should have another name ("itemiz{e,ation}"?) even if it is rendered with |
That makes sense. I'm also up for putting it on the website, but I want to fine-tune the AST a bit first.
Not exactly. Bullet lists, for example, simply don't have a We could have separate types for OrderedList and BulletList, as pandoc does in its AST. I don't know if that would be better. I was thinking of making DefinitionList its own type. (And maybe TaskList.)
Yes, these are a bit weird and I'd been thinking of consolidating them. We do want to keep both the original text (e.g. straight quote) and an annotation like
Probably should be, yes. It was originally Symbol but then I realized this is a native JS type.
Is there any way to enforce this in the types? I'm a bit unhappy about this one, as well as the way we include a Caption as one of the children of a table, along with the Rows. One could make a case for something like
But with the current system
Yes, I think I added them recently because I needed handlers for them in the
@bpj the way definition lists currently work, there can only be one definition (it's just everything after the first paragraph, which is treated as the term). I think that's probably okay for most purposes. Segmenting into multiple definitions would require a different syntax; if this is desirable, we should open a new issue to discuss it. |
Is there a way to leverage the TypeScript type checking to produce a program that will validate a JSON document for conformity to the AST? The djot CLI tool in djot.js will read -f ast, but it will happily accept a malformed one. |
0.7 confidence, but, as far as I know, not really. You need to write the “deserialization” code yourself and, last time I looked, the LSP implementation for VS Code (which has the same problem) did just that. The TS type system is fully static; there’s nothing in the compiled code to do runtime validation. Two bad options are:
|
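To make the "write the deserialization code yourself" point concrete, here is a minimal sketch of hand-written runtime checks using TypeScript type predicates. The node shapes `Str` and `Para` are hypothetical and heavily simplified, and the helper names (`isRecord`, `isStr`, `isPara`) are invented for this example; nothing here is taken from djot.js or from the LSP code.

```typescript
// Hypothetical, heavily simplified node shapes; the real AST has many more.
interface Str {
  tag: "str";
  text: string;
}
interface Para {
  tag: "para";
  children: Str[];
}

// The interfaces above vanish at compile time, so every check on parsed JSON
// has to be spelled out by hand as ordinary runtime code.
function isRecord(x: unknown): x is Record<string, unknown> {
  return typeof x === "object" && x !== null && !Array.isArray(x);
}

function isStr(x: unknown): x is Str {
  return isRecord(x) && x["tag"] === "str" && typeof x["text"] === "string";
}

function isPara(x: unknown): x is Para {
  if (!isRecord(x) || x["tag"] !== "para") {
    return false;
  }
  const children = x["children"];
  return Array.isArray(children) && children.every(isStr);
}

// JSON.parse returns untyped data; the guards decide whether it is a Para.
const good: unknown = JSON.parse(
  '{"tag":"para","children":[{"tag":"str","text":"hi"}]}'
);
const bad: unknown = JSON.parse('{"tag":"para","children":[{"tag":"str"}]}');
```

Each guard duplicates information that is already in the interface, which is the maintenance cost that makes generating a schema from the types attractive.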
That LSP thing:
https://github.com/microsoft/vscode-languageserver-node/blob/c91c2f89e0a3d8aa8923355a65a2977b2b3d3b57/types/src/main.ts#L224
…On Monday, 2 January 2023, John MacFarlane ***@***.***> wrote:
Is there a way to leverage the typescript type checking to produce a
program that will validate a JSON document for conformity to the AST?
The djot CLI tool in djot.js will read -f ast, but it will happily accept
a malformed one.
|
Argh, was afraid of that. I'm used to Haskell which is more serious about its types. |
OK, figured out how to validate using |
validate.js:
// Validate a JSON-serialized djot AST (read from stdin) against djot-schema.json.
const fs = require("fs");
const { Validator } = require("jsonschema");

const v = new Validator();
const schema = JSON.parse(fs.readFileSync("djot-schema.json", "utf8"));
const input = JSON.parse(fs.readFileSync("/dev/stdin", "utf8"));

// The result's `errors` array is empty when the document conforms to the schema.
const errs = v.validate(input, schema).errors;
if (errs.length === 0) {
  console.log("Valid");
  process.exit(0);
} else {
  // Print one line per violation, then signal failure to the caller.
  for (const err of errs) {
    console.log(err.stack);
  }
  process.exit(1);
} |
Yes, see above, I'd already tried typescript-json-schema and it seems to work. |
I find this indispensable when writing/tuning JSON schemas: |
uhu, and that’s why I think it makes more sense to use TypeScript for the spec: that’s much more readable. Though we should have a JSON schema as well, because a) people would ask for it and b) it has accumulated a bit more tooling on top. |
True, JSON Schema gets hairy pretty quickly if you want to be more specific, but such is the price for precision in any language: the more precise, the more conditions. I would agree that JSON Schema is a bit on the verbose side. Its way of referencing definitions in the same schema in particular is annoyingly verbose! I actually cheat by writing my schemas in YAML and using my own interpolation engine, e.g. ⁅name⁆ gets expanded to |
Forgot to say I agree there should be a JSON schema because of its greater portability. |
The approach I outline above, using |
Not sure if this is the right place, but is it a goal for djot to move towards a more-or-less full representation of pandoc's AST? i.e. is djot to pandoc AST what asciidoc is to docbook? |
No, djot's AST is djot-specific. However, it is possible to convert between djot's and pandoc's ASTs. |
IMO the conversion to Pandoc AST should wrap non div/span elements with attributes in a div/span which holds the attributes, as I believe Pandoc does with commonmark_x. @jgm would an issue (or even a pull request) for this be welcome? |
sure. |
This still wouldn't give us lossless conversion, unless we adapted a convention like adding a "wrapper" class to the div, so it could be recognized and stripped off in converting from pandoc AST to djot. |
The wrapping could be made optional. |
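The wrapper-class convention discussed above can be sketched as follows. This is only an illustration under stated assumptions: `PBlock` and `PAttr` are hypothetical, much-reduced stand-ins (not pandoc's actual JSON encoding), the marker class name `"wrapper"` is the one floated in the comment rather than an agreed convention, and `wrapWithAttrs` / `unwrap` are invented names.

```typescript
// Hypothetical, much-reduced pandoc-like nodes; not pandoc's real JSON encoding.
type PAttr = { classes: string[]; kv: Record<string, string> };
type PBlock =
  | { t: "Para"; text: string }
  | { t: "Div"; attr: PAttr; content: PBlock[] };

// Assumed marker class that tags Divs created only to carry attributes.
const WRAPPER = "wrapper";

// djot -> pandoc direction: a Para cannot carry attributes here,
// so hang them on a marked Div around it.
function wrapWithAttrs(block: PBlock, kv: Record<string, string>): PBlock {
  return { t: "Div", attr: { classes: [WRAPPER], kv }, content: [block] };
}

// pandoc -> djot direction: strip a marked single-child Div and hand its
// attributes back to the inner block; leave every other block untouched.
function unwrap(block: PBlock): { block: PBlock; kv: Record<string, string> } {
  if (block.t === "Div" && block.attr.classes.indexOf(WRAPPER) !== -1) {
    const inner = block.content[0];
    if (block.content.length === 1 && inner !== undefined) {
      return { block: inner, kv: block.attr.kv };
    }
  }
  return { block, kv: {} };
}
```

A Div that the author wrote by hand, without the marker class, passes through `unwrap` unchanged, which is what keeps the round trip lossless under this convention.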
TL;DR: AST should be specified in the reference. I think the best way to do that is via TypeScript type notation.
I've noticed that markup languages can fail extensibility in two ways:
An example of the latter is AsciiDoctor. Although, like djot, it has a generic block structure on the syntax level, the way to extend AsciiDoctor is by writing plugins against a specific AsciiDoctor implementation. Thus, you get extensions of a particular toolchain, not extensions of a particular syntax.
I think the way to combat that is to specify an AST structure which must be common across all implementations. That way, if extensibility is expressed as an AST -> AST transform, you can mix and match readers, filters, and writers (provided that the AST can be serialized as data).
This is, I think, a somewhat underappreciated idea, so my primary goal here is, by having a "here's the AST" section in the reference, to encourage people to implement djot tools in terms of the AST, so that things like
djot_parser_in_rust paper.djot | djot2pdf_in_haskell
just work. The secondary goal is of course to make sure that separate implementations agree not only on the HTML, but on the AST as well.
How do we define the AST? I think "AST is JSON" is a good start. JSON is ubiquitous, and is a good match for "scripting" languages, which I think are most natural for doing filters and writers. The problem with JSON is that, as far as I know, there's no uncontroversial way to specify or "type" JSON.
The official answer is JSON Schema, but it's objectively unfit for human consumption. What I've found to work much better in practice are just TypeScript definitions (this comes from my experience with LSP). So, practically, I would consider adding a djot_ast.d.ts file with a reasonable subset of TypeScript as a part of the spec, along these lines:
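As a rough illustration only, a tiny slice of such type definitions, together with an AST -> AST filter written against them, might look like the sketch below. Everything in it is hypothetical: the node types are drastically reduced and are not the definitions proposed in this issue or those in djot.js, and `AstFilter`, `mapStr`, `upcase` and `pipeline` are names invented for the example. The JSON round trip stands in for the pipe between two separate tools.

```typescript
// Hypothetical miniature AST; a real spec would cover far more node types.
interface Str {
  tag: "str";
  text: string;
}
interface Emph {
  tag: "emph";
  children: Inline[];
}
type Inline = Str | Emph;
interface Para {
  tag: "para";
  children: Inline[];
}
interface Doc {
  tag: "doc";
  children: Para[];
}

// A filter is any function from a document to a document.
type AstFilter = (doc: Doc) => Doc;

// Rebuild an inline node, applying `f` to every Str leaf below it.
function mapStr(node: Inline, f: (s: Str) => Str): Inline {
  if (node.tag === "str") {
    return f(node);
  }
  const children: Inline[] = node.children.map((c) => mapStr(c, f));
  return { tag: "emph", children };
}

function shout(s: Str): Str {
  return { tag: "str", text: s.text.toUpperCase() };
}

// Example filter: upper-case all text without touching the tree structure.
const upcase: AstFilter = (doc) => {
  const paras: Para[] = doc.children.map(
    (p): Para => ({
      tag: "para",
      children: p.children.map((n) => mapStr(n, shout)),
    })
  );
  return { tag: "doc", children: paras };
};

// Filters compose left to right because they share one input/output type.
function pipeline(...filters: AstFilter[]): AstFilter {
  return (doc) => filters.reduce((d, f) => f(d), doc);
}

const source: Doc = {
  tag: "doc",
  children: [
    {
      tag: "para",
      children: [
        { tag: "str", text: "hello " },
        { tag: "emph", children: [{ tag: "str", text: "world" }] },
      ],
    },
  ],
};

// Serialize and re-parse to imitate handing the AST from one tool to the next.
const wire: string = JSON.stringify(source);
const result: Doc = pipeline(upcase)(JSON.parse(wire) as Doc);
```

Note that the `as Doc` cast on the parsed JSON is unchecked: the type definitions document the shape but do not validate it at runtime.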