[WIP] migrate to typescript #16

ChristianMurphy · 2020-09-10T00:42:42Z

Micromark has a number of complex/opaque types used in its internals.
This attempts to document and validate these types using TypeScript.

In particular some complex and currently undocumented types include Events, Token, Effects, ok, and nok.

In addition, micromark's current usage pattern includes directly accessing internal files, meaning that for TypeScript users, most (if not all) files will need types.

lib/types.ts

wooorm · 2020-09-10T08:37:26Z

Awesome! I see this effort as useful, as the current code has only seen my eyes and is young, so there’s much that can be better!
Working on types will undoubtedly catch a bunch of things I missed!

Regarding this project as TS, well, we talked about our opinions on TS a lot already, I’m not really for it, in short:

I feel strongly that the bundle size of mm should be as small as possible. The version of mm that lived here before was written in TS and while it only supported a couple of markdown constructs, it was already over the size that mm is now. This may indicate that a project in TS is bigger than a project by hand in JS. (Practically, for ESM the scripts would also need to be fixed)

I also see that fewer folks understand TS compared to JS (including myself). Especially important for folks who want to build extensions and look at this project for examples.

Any reason to not do the jsdoc approach?

lib/types.ts

wooorm · 2020-09-10T09:17:46Z

lib/types.ts

+  hooks: {
+    [key: string]: unknown
+  }
+  flow: (something: unknown) => unknown


Some more:

micromark/lib/parse.js

Line 98 in 307e0a5

content: create(initializeContent),

These are the available tokenizers:

micromark/lib/util/create-tokenizer.js

Line 13 in 307e0a5

function createTokenizer(parser, initialize, from) {

wooorm · 2020-09-10T09:19:45Z

lib/types.ts

+    [key: string]: unknown
+  }
+  flow: (something: unknown) => unknown
+  defined: unknown[]


This contains unique normalized identifiers for definitions:

micromark/lib/tokenize/definition.js

Line 28 in 307e0a5

identifier = normalizeIdentifier(context.sliceSerialize(events[index][1]))

It does some rather complex things to how references are parsed 😓

micromark/lib/tokenize/label-end.js

Line 192 in 307e0a5

return self.parser.defined.indexOf(labelIdentifier) < 0
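
A minimal sketch of how `defined` could be typed and used, based only on the two snippets above. `ParserSketch`, `addDefinition`, `isUndefinedReference`, and `normalizeIdentifierSketch` are hypothetical names, and the normalization shown is a simplified stand-in, not micromark's actual `normalizeIdentifier`.

```typescript
// Simplified stand-in for `normalizeIdentifier` (assumption): collapse runs of
// whitespace, trim, and lowercase. The real helper may do more.
function normalizeIdentifierSketch(value: string): string {
  return value.replace(/[\t\n\r ]+/g, ' ').trim().toLowerCase()
}

// Assumed shape: the parser keeps a list of unique, normalized identifiers,
// so `defined` can be typed as `string[]` rather than `unknown[]`.
interface ParserSketch {
  defined: string[]
}

// Record a definition once, mirroring the normalize-then-store step.
function addDefinition(parser: ParserSketch, label: string): void {
  const identifier = normalizeIdentifierSketch(label)
  if (parser.defined.indexOf(identifier) < 0) {
    parser.defined.push(identifier)
  }
}

// Mirrors the `indexOf(labelIdentifier) < 0` check: true when no matching
// definition has been recorded for the label.
function isUndefinedReference(parser: ParserSketch, label: string): boolean {
  return parser.defined.indexOf(normalizeIdentifierSketch(label)) < 0
}
```

With this shape, a reference such as `[FOO BAR]` resolves against a definition written as `[foo  bar]: ...`, because both sides are normalized before the lookup.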

wooorm · 2020-09-10T09:21:49Z

lib/types.ts

+    _closeFlow: unknown
+    furtherBlankLines: unknown
+  }
+}


Also this stuff?

micromark/lib/util/create-tokenizer.js

Line 36 in 307e0a5

// State and tools for resolving, serializing.

wooorm · 2020-09-10T09:26:32Z

lib/types.ts

+    initialBlankLine: unknown
+    size: number
+    _closeFlow: unknown
+    furtherBlankLines: unknown


containerState is used in the document (container) tokenizer, because generally we can parse in one go (e.g., an atx heading), but for lists and block quotes we parse a part of it (e.g., > ), then we do the rest of the line, and then at the next line look for another marker. As we parse separate “runs” of content, information needs to be stored somewhere, and I came up with this.

All properties are used by lists

_closeFlow is a way to communicate from lists to the tokenizer that the flow is closed, but we do continue the list. That’s useful when finding a new list item, because the previous one needs to be closed, but the list remains open.
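
A toy model of the `_closeFlow` signal described above, not micromark's implementation. The field names come from the diff; their types, the `ContainerStateSketch` interface, the `tokenizeListLines` function, and the event strings are all assumptions made for illustration.

```typescript
// Hypothetical shape: field names are from the diff, the types are guesses.
interface ContainerStateSketch {
  initialBlankLine?: boolean
  size?: number
  furtherBlankLines?: boolean
  _closeFlow?: boolean
}

// Toy document loop: each line is one "run". A list marker (`- `) on a later
// line closes the flow of the previous item but keeps the list itself open.
function tokenizeListLines(lines: string[]): string[] {
  const events: string[] = []
  const containerState: ContainerStateSketch = {}
  let listOpen = false
  let flowOpen = false

  for (const line of lines) {
    if (line.slice(0, 2) === '- ') {
      if (listOpen) {
        // New item in an open list: signal that the flow must be closed.
        containerState._closeFlow = true
      } else {
        listOpen = true
        events.push('enter list')
      }
    }

    // The tokenizer side reads the signal and resets it.
    if (containerState._closeFlow && flowOpen) {
      events.push('exit flow')
      flowOpen = false
    }
    containerState._closeFlow = false

    if (!flowOpen) {
      events.push('enter flow')
      flowOpen = true
    }
    events.push('line: ' + line.replace(/^(- |  )/, ''))
  }

  if (flowOpen) events.push('exit flow')
  if (listOpen) events.push('exit list')
  return events
}
```

For the input `['- a', '  b', '- c']` the list is entered and exited once, while the flow is closed and reopened when the second marker is found.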

ChristianMurphy · 2020-09-10T14:06:46Z

The version of mm that lived here before was written in TS and while it only supported a couple of markdown constructs, it was already over the size that mm is now

The difference in size doesn't necessarily have to do with TS.
You re-architected the parser and compiler.

I also see that fewer folks understand TS compared to JS (including myself).

TypeScript is JavaScript, with annotations.
While it's true that adding annotations technically makes it a superset/new language, it's not nearly as much of a learning curve as you make it out to be.
Take d286abe for example:

function tokenizeDefinition(
  effects,
  ok,
  nok
)

becomes

function tokenizeDefinition(
  effects: Effects,
  ok: Okay,
  nok: NotOkay
)

and

function start(code) {

becomes

function start(code: number) {

That's it: small annotations on parameter types.
The rest of the changes are ESM related (not TypeScript specific).
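
For illustration, a rough sketch of what the types behind those annotations might look like, with a toy tokenizer that uses them. Only the names `Effects`, `Okay`, `NotOkay`, and the `code` parameter appear in this thread; `Code`, `State`, the single `consume` method on `Effects`, `tokenizeHashes`, and the `run` driver are assumptions, and the real types in the PR may differ.

```typescript
/** A character code, or `null` for end of file (assumed from `exports.eof = null`). */
type Code = number | null

/** A state receives one code and returns the next state, or nothing when done. */
type State = (code: Code) => State | undefined

type Okay = State
type NotOkay = State

/** Assumed minimal surface of the `effects` object passed to a tokenizer. */
interface Effects {
  consume: (code: Code) => void
}

/** Toy tokenizer: accepts one or more `#` characters (character code 35). */
function tokenizeHashes(effects: Effects, ok: Okay, nok: NotOkay): State {
  return start

  function start(code: Code): State | undefined {
    if (code !== 35) return nok(code)
    effects.consume(code)
    return more
  }

  function more(code: Code): State | undefined {
    if (code === 35) {
      effects.consume(code)
      return more
    }
    return ok(code)
  }
}

/** Hypothetical driver: feeds codes to the state machine and records the outcome. */
function run(codes: Code[]): {consumed: number; result: 'pending' | 'ok' | 'nok'} {
  const outcome: {consumed: number; result: 'pending' | 'ok' | 'nok'} = {
    consumed: 0,
    result: 'pending'
  }
  const effects: Effects = {
    consume: () => {
      outcome.consumed++
    }
  }
  const ok: Okay = () => {
    outcome.result = 'ok'
    return undefined
  }
  const nok: NotOkay = () => {
    outcome.result = 'nok'
    return undefined
  }

  let state: State | undefined = tokenizeHashes(effects, ok, nok)
  for (const code of codes) {
    if (!state) break
    state = state(code)
  }
  return outcome
}
```

Stripping the annotations and the `type`/`interface` declarations leaves plain JavaScript in the same nested-function style as the existing code.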

Especially important for folks who want to build extensions and look at this project for examples.

It took hours to even partially reconstruct what a tokenizer can accept for this PR.
Most people probably won't be able to look at this code and understand what the undocumented, completely dynamic inputs might be.

Any reason to not do the jsdoc approach?

https://www.typescriptlang.org/docs/handbook/type-checking-javascript-files.html#null-undefined-and-empty-array-initializers-are-of-type-any-or-any
This is problematic for micromark: the code style makes heavy use of uninitialized variables, nullish values, and empty untyped arrays.
That holds even when using JSDoc annotations.
An example:

// in JavaScript mode TS can't tell that this is meant to be a constant, it thinks it is meant to be an initialized export representing another currently unknown type.
exports.eof = null
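
By contrast, a small sketch of how the same intent can be stated in a `.ts` file. The linked handbook section notes that in JavaScript files `null`, `undefined`, and `[]` initializers are inferred as `any` or `any[]`; explicit annotations avoid that. The names `CharacterCode`, `isEof`, and `countBeforeEof` are hypothetical and not taken from the PR.

```typescript
// With an explicit annotation the intent is unambiguous: `eof` is always `null`.
const eof: null = null

// A character code is a number, or `null` at end of file (assumed shape).
type CharacterCode = number | null

// An explicitly typed empty array, instead of an inferred `any[]`.
const buffered: CharacterCode[] = []

// Type guard so callers can narrow a code to `null` at end of file.
function isEof(code: CharacterCode): code is null {
  return code === eof
}

// Count the codes that come before the first end-of-file marker.
function countBeforeEof(input: CharacterCode[]): number {
  let count = 0
  for (const code of input) {
    if (isEof(code)) break
    buffered.push(code)
    count++
  }
  return count
}
```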

wooorm · 2020-09-10T14:28:10Z

The difference in size doesn't necessarily have to do with TS.
You re-architected the parser and compiler.

The difference in size may not necessarily be due to TS, but I’m assuming that lack of access to the output format will result in more bytes (this is also what I remember from coffeescript).
The re-architecting had to happen because the previous approach didn’t work.

While it's true that adding annotations technically makes it a superset/new language, it's not nearly as much of a learning curve as you make it out to be.

I’m afraid of it getting way more complex than a couple of types sprinkled on top (effects, okay, notokay). I foresee that once TS is in, the more powerful and more confusing things will also be used. Regarding the learning curve, I can only speak for myself and for what I’ve seen when others started using TS.

It took hours to even partially reconstruct what a tokenizer can accept for this PR.
Most people probably won't be able to look at this code and understand what the undocumented, completely dynamic inputs might be.

It is complex, there are no docs, and it needs to be better.

the code style makes heavy use of uninitialized variables, nullish values, and empty untyped arrays

Null is only used for the EOF character. Much of the code style is written to work well with minifiers, manglers, and gzip.

ChristianMurphy · 2020-09-10T14:54:25Z

The difference in size may not necessarily be due to TS, but I’m assuming that lack of access to the output format will result in more bytes (this is also what I remember from coffeescript).

Not necessarily, again, it's JavaScript with annotations, remove the annotations and it's back to JavaScript.
Both the TypeScript compiler and https://babeljs.io/docs/en/babel-preset-typescript can do this.

foresee that once TS is in, the more powerful and more confusing things will also be used.

Using babel as the compiler gives pretty fine grained control over what features are/are not enabled.
https://babeljs.io/docs/en/babel-preset-typescript just removes annotations, any other language features need to be run through their respective babel plugins (and therefore can be disabled).
This also isn't limited to TypeScript: there are a lot of JavaScript features which could be used but aren't. This seems like a slippery slope fallacy.

the code style makes heavy use of uninitialized variables, nullish values, and empty untyped arrays

Null is only used for the EOF character. Much of the code style is written to work well with minifiers, manglers, and gzip.

Right, but that is just one example, the code is littered with var something and var something = [], uninitialized values, and untyped empty arrays.

lib/util/flat-map.ts

--wip-- [skip ci]

bd4dc2d

ChristianMurphy requested a review from wooorm September 10, 2020 00:42

ChristianMurphy added 4 commits September 9, 2020 18:30

--wip-- [skip ci]

93b78a9

--wip-- [skip ci]

fad628d

--wip-- [skip ci]

817bcb2

--wip-- [skip ci]

8102cde

--wip-- [skip ci]

d286abe

--wip-- [skip ci]

b12029f

ChristianMurphy mentioned this pull request Sep 10, 2020

types: add type definitions for micromark #17

Closed

ChristianMurphy closed this Sep 11, 2020

ChristianMurphy deleted the refactor/typescript branch September 11, 2020 03:01
