Skip to content

Latest commit

 

History

History
307 lines (238 loc) · 11 KB

worklog.org

File metadata and controls

307 lines (238 loc) · 11 KB

Worklog

This is a working surface and nothing in here should be considered documentation.

Parsing

AMAL is supposed to be markdown for the static parts, and use a custom syntax for the dynamic parts. How can we accomplish that?

Parse markdown and then parse AMAL

We could parse markdown to a tree using the existing Racket markdown reader, and then re-parse the parse tree, inserting AMAL nodes.

Pros:

  1. Split up the parsing steps. This allows me to not have to think about markdown syntax in my reader.

Cons:

  1. All AMAL blocks have to be within markdown blocks? It seems like it would get really complicated to try to figure out parsing them across multiple markdown blocks

What do we do about a situation like this:

Where I have some markdown **syntax $wiggle{interspersed with** `AMAL`}

What should the parse-tree look like in that situation?

What does markdown do about similar situations? Seems like it doesn’t really work:

;; Putting markup entirely within other markup works fine
tokenizer.rkt/test> (define xs (parse-markdown "How do we handle *interspersed **text in** here?*"))
tokenizer.rkt/test> xs
'((p
   ()
   "How do we handle "
   (em () "interspersed " (strong () "text in") " here?")))

;; But interspersed markup is messed up
tokenizer.rkt/test> (define xs (parse-markdown "How do we handle *interspersed __text* in__ here?"))
tokenizer.rkt/test> xs
'((p
   ()
   "How do we handle *interspersed "
   (strong () "text* in")
   " here?"))

I also tested on https://markdownlivepreview.com/, with the same results. So maybe we just say it isn’t possible in AMAL?

In that case we could Parse markdown and then parse AMAL That’s still kind of nasty though, right? We’d rather not parse twice if we can help it.

We can’t go all the way jsonic because jsonic compiled to json. We want to compile to HTML, not markdown.

Compile AMAL to markdown, then compile markdown

Okay, we still don’t love two passes of parsing, but I’m on a very limited schedule here. What if we compile AMAL to markdown, replacing the AMAL markup with HTML and associated CSS, and then pass that on to be compiled as markdown? This lets us mostly do the jsonic thing, where we just pass along every character that’s not relevant to AMAL markup. HTML should pass through markdown. Let’s test that here to be sure.

> (define xs (parse-markdown "How does markdown handle *inline <a href='https://www.w3schools.com/html/'>HTML</a>, huh?*"))

> xs

'((p () "How does markdown handle " (em () "inline " (a ((href "https://www.w3schools.com/html/")) "HTML") ", huh?")))

Wow, it converts it to xexpressions! That makes sense. Going a little further and rendering it to HTML gives us:

 > (display-xexpr `(html () (head ()) (body () ,@xs)))

 <html>
   <head></head>
   <body>
     <p>
	How does markdown handle <em>inline <a href="https://www.w3schools.com/html/">HTML</a>, huh?</em>
     </p>
   </body>
 </html>

Looks good! How about markdown within an HTML element?

> (define xs (parse-markdown "How does markdown handle *inline <a href='https://www.w3schools.com/html/'>__HTML__</a>, huh?*"))

> xs

'((p () "How does markdown handle " (em () "inline " (a ((href "https://www.w3schools.com/html/")) (strong () "HTML")) ", huh?")))

> (display-xexpr `(html () (head ()) (body () ,@xs)))

<html>
  <head></head>
  <body>
    <p>How does markdown handle <em>inline <a href="https://www.w3schools.com/html/"><strong>HTML</strong></a>, huh?</em>
    </p>
  </body>
</html>

Just as we’d expect! Let’s go with this method.

What lives in the tokenizer, what lives in the parser?

We are going to compile AMAL to html, then pass it along to get compiled as markdown. Given that design, how do we divide the work between the tokenizer and the parser?

We really only care about identifying AMAL expressions. Everything else can just get passed along. But is there a problem if we mis-tokenize parts of AMAL expressions out of context? For instance, let’s say our grammar is this:

#lang brag
a-program : (a-char | a-expr)*
a-char : CHAR
a-expr : a-expr-name [a-expr-args] a-expr-body
a-expr-name : EXPR-NAME
a-expr-args : "(" [EXPR-KEY ":" EXPR-VALUE ","]* EXPR-KEY ":" EXPR-VALUE ")"
a-expr-body : EXPR-BODY
  • How do we differentiate EXPR-NAME from EXPR-KEY in the tokenizer?
  • Should EXPR-VALUE be typed some how (uh oh)?

What if we just tokenize everything into whitespace delimited strings. Then all of the EXPR-<whatever> become WORD

#lang brag
a-program : (a-char | a-expr)*
a-char : CHAR
a-expr : a-expr-name [a-expr-args] a-expr-body
a-expr-name : "$" WORD
a-expr-args : "(" [a-arg-key ":" a-arg-value ","]* a-arg-key ":" a-arg-value ")"
a-arg-key : WORD
a-arg-value : WORD
a-expr-body : "{" WORD+ "}"

This kind of makes sense right? We still have the problem where a-arg-value should probably be typed or something, but that sounds like a problem for a future language update.

Wait then we have a problem with what all of the symbols get lexed as… I guess they can just get lexed literally and passed on to markdown if they’re not part of an AMAL expression.

Ugh… it’s not working.

Really all I want to do is expand AMAL expressions within markdown and add the associated animation styles to the top of the string. That’s all! What’s the simplest way to do that?

Ultimately what do we want the parse tree to look like? Maybe something like this:

(a-program (a-markdown "<STUFF>")
	     (a-expr (a-expr-name "<name>")
		     (a-expr-args (a-arg-key "<key>")
				  (a-arg-value "<value>")
				  ...)
		     (a-expr-body "<STUFF>"))
	     (a-markdown "<MORE STUFF>"))

But how do we differentiate the stuff within the AMAL blocks from stuff in markdown blocks? If I wasn’t trying to model this as a /”language”/ I would know how to do it… Maybe I just handle the AMAL parsing in the expander. Is that crazy? Maybe not. That way we really can do it like jsonic, but we just have to actually handle the AMAL expressions instead of letting racket handle them.

I spent some more time trying to actually parse this into usable stuff. Let’s stop. Full jsonic, all the way.

Oh yeah, that doesn’t let us nest expressions…………………………………….. What am I going to do?

Idea I had this morning:

  • Use mirrored delimiters
  • Tokenize to delimiters and words
  • Require escapes for literal delimiters

Obviously I’ve thought this before, but I think this would work! At the expense of breaking existing markdown with unescaped delimiters but whatever. I have to finish the damn parser…

WHAT THE FUCK DOES “ENCOUNTERED PARSING ERROR” MEAN!?!?!?!??!?!?!?!?

Guess my whitespace was wrong in the parser…

Note to self: comment out the real parse rule attempt and build up the rule incrementally to debug

I think I can make a programming language out of this!

Expansion

So we need to implement these macros:

  • a-program
  • a-expr
  • a-expr-name
  • a-expr-args
  • a-expr-body
  • a-escaped-at
  • a-word
  • a-whitespace
  • a-colon

(NOT a-at, a-arg-key, or a-arg-value. All of those are pruned from the parse tree)

a-colon, a-whitespace, a-word

These are all just passed along literally for markdown parsing

-> actually I can parse all of these to a-literal!!

a-escaped-at

This becomes an @ and then is passed along literally

a-expr

This is the meat of the language.

We have to:

  1. Place the keyframes definition named a-expr-name for the animation at the top of the file in a style block
  2. Wrap a-expr-body in some block element that uses the animation
  3. Change any args supplied in a-expr-args

a-expr-name

We try to map this to the library of animations. Hopefully throw a helpful error if it can’t be mapped.

a-expr-args

We need to map these either to arguments for the animation in the HTML element, or arguments in the keyframes (if that’s possible) and change the default values to the passed values.

a-expr-body

I don’t know if this does anything…

a-program

I guess this runs the markdown parser and prints the whole thing as HTML?

Back at it again on <2023-06-06 Tue>

Maybe we can just splice the a-expr-* forms and handle everything in a-expr? Is that inelegant? Seems like those subforms only really make sense in the context of a whole expr.

That was pretty easy. And a nice way to get myself back into the codebase. Thank god for all the tests I wrote.

Back at it again on <2023-10-15 Sun>…

Where were we… I want to make this language by the end of the year. That would be cool.

Evening of <2023-10-29 Sun>

On productivity in this project

Feeling like one thing that prevents me from making good progress on this project is that I don’t know the language (Racket) that well. Which is true, this is basically (besides the projects in Beautiful Racket) my first Racket project. Someone smart once said you should either learn a new tool or do a new project, but it’s too hard to do a new project while learning a new tool. I don’t know. Maybe that’s an excuse. I think I should stick with this at least until the end of the year. But maybe then I should read Crafting Interpreters and write along in C/++. Something I know better and arguably more useful.

Parser problems

I think one of the issues I’m having is that nothing is solid. I’m running into issues with my expander, but they seem to stem from the parser. I go back and update the parser, but now it doesn’t work for reasons I don’t really understand. It’s too abstract to understand. I think that’s a valid excuse! I actually just don’t understand it, and the point is for me to understand! I’m actually not becoming an expert this way! Hmm… Or is this just an excuse for changing projects? This is definitely not bringing me the joy of the business card project. Nor even the make-a-face project.

Okay, the point is understanding. The point is learning. The point is actually not the product. So, to address the immediate issue I need to learn more about how BRAG parses. But this is part of the issue, there aren’t that many resources on BRAG. Or, maybe I need to learn more about how to write a good grammar. I’m sure there’s a simple way to write my grammar that BRAG wouldn’t have a problem with. Or even a grammar that I like better (why did I let the implementation difficulty change my grammar?). Or maybe my lexer is causing me problems, I don’t know. The point is, I need to understand better.

End of work

Man, it just hurts. The error messages suck, I don’t understand the language, I don’t understand the libraries. I want to read Crafting Interpreters and write in C. Ugh…