Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] migrate to typescript #16

Closed

Conversation

ChristianMurphy
Copy link
Member

@ChristianMurphy ChristianMurphy commented Sep 10, 2020

Micromark has a number of complex/opaque types used in it's internals.
This attempts to document and validate these types using TypeScript.

In particular some complex and currently undocumented types include Events, Token, Effects, ok, and nok.

In addition, micromark's current usage pattern includes directly accessing internal files, meaning that for TypeScript users, most (if not all) files will need types.

@ChristianMurphy ChristianMurphy added 🦋 type/enhancement This is great to have help wanted 🙏 This could use your insight or help 🕸️ area/tests This affects tests ☂️ area/types This affects typings 🛠 blocked/wip This cannot progress yet, it’s being worked on 💬 type/discussion This is a request for comments labels Sep 10, 2020
lib/types.ts Outdated Show resolved Hide resolved
@wooorm
Copy link
Member

wooorm commented Sep 10, 2020

Awesome! I see this effort as useful, as the current code has only seen my eyes and is young, so there’s much that can be better!
Working on types will undoubtedly catch a bunch of things I missed!

Regarding this project as TS, well, we talked about our opinions on TS a lot already, I’m not really for it, in short:

I feel strongly that the bundle size of mm should be as small as possible. The version of mm that lived here before was written in TS and while it only supported a couple of markdown constructs, it was already over the size that mm is now. This may indicate that a project in TS is bigger than a project by hand in JS. (Practically, for ESM the scripts would also need to be fixed)

I also see that less folks understand TS compared to JS (including myself). Especially important for folks who want to build extensions and look at this project for examples.

Any reason to not do the jsdoc approach?

lib/types.ts Show resolved Hide resolved
lib/types.ts Show resolved Hide resolved
lib/types.ts Show resolved Hide resolved
lib/types.ts Show resolved Hide resolved
lib/types.ts Show resolved Hide resolved
lib/types.ts Show resolved Hide resolved
hooks: {
[key: string]: unknown
}
flow: (something: unknown) => unknown
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more:

content: create(initializeContent),

These are the available tokenizers:

function createTokenizer(parser, initialize, from) {

[key: string]: unknown
}
flow: (something: unknown) => unknown
defined: unknown[]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This containers unique normalized identifiers for definitions:

identifier = normalizeIdentifier(context.sliceSerialize(events[index][1]))

It does some rather complex things to how references are parsed 😓

return self.parser.defined.indexOf(labelIdentifier) < 0

_closeFlow: unknown
furtherBlankLines: unknown
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this stuff?

// State and tools for resolving, serializing.

initialBlankLine: unknown
size: number
_closeFlow: unknown
furtherBlankLines: unknown
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

containerState is used in the document (container) tokenizer, because generally we can parse in one go (e.g., an atx heading), but for lists and block quotes we parse a part of it (e.g., > ), then we do the rest of the line, and then at the next line look for another marker. As we parse separate “runs” of content, information needs to be stored somewhere, and I came up with this.

All properties are used by lists

_closeFlow is a way to communicate from lists to the tokenizer that the flow is closed, but we do continue the list. That’s useful when finding a new list item, because the previous one needs to be closed, but the list remains open.

@ChristianMurphy
Copy link
Member Author

ChristianMurphy commented Sep 10, 2020

The version of mm that lived here before was written in TS and while it only supported a couple of markdown constructs, it was already over the size that mm is now

The difference in size doesn't necessarily have to do with TS.
You re-architected the parser and compiler.

I also see that less folks understand TS compared to JS (including myself).

TypeScript is JavaScript, with annotations.
It's true that adding annotations technically makes it a superset/new language, it's not nearly as much of a learning curve as you make it out to be.
Take d286abe for example:

function tokenizeDefinition(
  effects,
  ok,
  nok
)

becomes

function tokenizeDefinition(
  effects: Effects,
  ok: Okay,
  nok: NotOkay
)

and

function start(code) {

becomes

function start(code: number) {

that's it, little annotations on parameter types.
the rest of the changes are ESM related (not typescript specific)

Especially important for folks who want to build extensions and look at this project for examples.

It took hours to even partially reconstruct what a tokenizer can accept for this PR.
Most people probably wont be able to look at this code and understand what the undocumented, completed dynamic, inputs might be.

Any reason to not do the jsdoc approach?

https://www.typescriptlang.org/docs/handbook/type-checking-javascript-files.html#null-undefined-and-empty-array-initializers-are-of-type-any-or-any
it problematic for micromark, the code style makes heavy use of uninitialized variables, nullish values, and empty untyped arrays.
Even when using JSDoc annotations.
An example

// in JavaScript mode TS can't tell that this is meant to be a constant, it thinks it is meant to be an initialized export representing another currently unknown type.
exports.eof = null

@wooorm
Copy link
Member

wooorm commented Sep 10, 2020

The difference in size doesn't necessarily have to do with TS.
You re-architected the parser and compiler.

The difference in size may not necessarily be due to TS, but I’m assuming that lack of access to the output format will result in more bytes (this is also what I remember from coffeescript).
The re-architecting had to happen because the previous approach didn’t work.

It's true that adding annotations technically makes it a superset/new language, it's not nearly as much of a learning curve as you make it out to be.

I’m afraid of it getting way more complex than a couple of types sprinkled on top (effects, okay, notokay). I foresee that once TS is in, the more powerful and more confusing things will also be used. Regarding the learning curve, I can only speak for myself and for what I’ve seen when others’ started using TS.

It took hours to even partially reconstruct what a tokenizer can accept for this PR.
Most people probably wont be able to look at this code and understand what the undocumented, completed dynamic, inputs might be.

It is complex, there are no docs, and it needs to be better.

the code style makes heavy use of uninitialized variables, nullish values, and empty untyped arrays

Null is only used for the EOF character. Much of the code style is written to work well with minifiers, manglers, and gzip.

@ChristianMurphy
Copy link
Member Author

The difference in size may not necessarily be due to TS, but I’m assuming that lack of access to the output format will result in more bytes (this is also what I remember from coffeescript).

Not necessarily, again, it's JavaScript with annotations, remove the annotations and it's back to JavaScript.
Both the TypeScipt compiler and https://babeljs.io/docs/en/babel-preset-typescript can do this.

foresee that once TS is in, the more powerful and more confusing things will also be used.

Using babel as the compiler gives pretty fine grained control over what features are/are not enabled.
https://babeljs.io/docs/en/babel-preset-typescript just removes annotations, any other language features need to be run through their respective babel plugins (and therefore can be disabled).
This also isn't limited to TypeScript, there are a lot of JavaScript features which could be used, but aren't, this seems like a slippery slope fallacy.

the code style makes heavy use of uninitialized variables, nullish values, and empty untyped arrays

Null is only used for the EOF character. Much of the code style is written to work well with minifiers, manglers, and gzip.

Right, but that is just one example, the code is littered with var something and var something = [], uninitialized values, and untyped empty arrays.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🕸️ area/tests This affects tests ☂️ area/types This affects typings 🛠 blocked/wip This cannot progress yet, it’s being worked on help wanted 🙏 This could use your insight or help 💬 type/discussion This is a request for comments 🦋 type/enhancement This is great to have
Development

Successfully merging this pull request may close these issues.

None yet

2 participants