Current state of the project? #26

Closed
michaelfward opened this issue Jun 12, 2018 · 12 comments
Labels
general discussion Not a bug or enhancement, just a discussion

Comments

@michaelfward

Hello everyone!
Super excited about this project. I'd love to get involved in any way I can.

I tried to install & run tsdoc locally, and ran into issues. So, I wanted to know - is there a roadmap available with planned features / support + implemented features / support? I'm asking both so I can debug the issue I ran into, and so I can reference it when looking for ways to contribute =)

Thanks for the hard work everyone!

@octogonz
Collaborator

octogonz commented Jun 22, 2018

(We'll try to give more frequent updates going forward. Something was broken with my GitHub notifications, so I just noticed your question today.)

Status: I'm currently working on the initial architecture and prototype for the parser library. Once we have a working prototype, it will be in a good state to accept pull requests / feedback from the community. Until then it's a little hard to collaborate unless someone is in the Seattle area and wants to meet up.

I was making pretty good progress until I ran into some questions about how to model the AST, specifically the handling of multiline tokens whose "lines" are embedded inside a doc comment, for example:

  /**
   * `abc
   * def`
   */

(Is abc def a single lexical token? Is the * part of that token?)
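To make the question concrete, the following sketch shows one possible answer (not necessarily the one TSDoc adopts): strip the /**, */ and leading * decorations line by line first, and tokenize only afterwards. Under that assumption the backtick span consists of two content lines, and the * never belongs to any token. The extractCommentLines function is hypothetical and assumes a well-formed /** ... */ comment.

```typescript
// Hypothetical helper: returns the content of each line of a /** ... */ comment,
// with the comment delimiters and each line's leading "*" decoration removed.
function extractCommentLines(comment: string): string[] {
  const trimmed = comment.trim();
  if (trimmed.slice(0, 3) !== "/**" || trimmed.slice(-2) !== "*/") {
    throw new Error("Expected a /** ... */ doc comment");
  }
  // Drop the opening "/**" and the closing "*/".
  const body = trimmed.slice(3, trimmed.length - 2);
  const lines = body
    .split(/\r?\n/)
    // Remove leading whitespace, one optional "*", one optional space, then trailing whitespace.
    .map((line) => line.replace(/^\s*\*? ?/, "").replace(/\s+$/, ""));
  // Discard the empty lines left over when "/**" and "*/" sit on their own lines.
  while (lines.length > 0 && lines[0] === "") lines.shift();
  while (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();
  return lines;
}

const source = ["/**", " * `abc", " * def`", " */"].join("\n");
const lines = extractCommentLines(source);
// The code span covers two content lines; the "*" decorations are gone.
console.log(lines);
```

With this approach, "is abc def a single token?" becomes a question for the tokenizer and parser, which see two clean content lines separated by a newline rather than raw comment text.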

We met with the TypeScript compiler owners, and they gave a bunch of really helpful feedback. They also suggested that we build the tsdoc NPM package as an extension of the compiler's AST, rather than having it act as a standalone parser. It would still be its own library, but the input would be a TypeScript compiler data structure instead of a plain text string (as in the current approach).

Integrating with the compiler has some benefits:

  • More involvement and help from the compiler team
  • Leverage a sophisticated existing framework that includes a lot of helpful utilities
  • Simpler roadmap for integrating TSDoc support into the compiler and VS Code

But there are also drawbacks:

  • A bigger learning curve to get off the ground
  • The tsdoc package would become coupled to a specific release of the compiler
  • If we need to make patches to the compiler to get our problems solved, we'd have to depend on an unstable release of the compiler until our patches finally reached production
  • Any changes we need to make in the compiler would demand a high quality bar, and force us to consider requirements that aren't immediately relevant to a documentation tool (e.g. IntelliSense)

It's a tough call. My coding progress stalled while I researched this decision, and then recently I got sidetracked by some other priorities. TSDoc is now my primary focus again for the rest of June, but then in July I will be away on vacation for a few weeks... so the short answer is that it's moving along, but probably won't take off until later this summer.

I will say that we now feel reasonably confident about the initial language spec for TSDoc, and also the overall parsing strategy.

"I'd love to get involved in any way I can."

The easiest way to contribute right now is to open/answer GitHub issues that help us flesh out the requirements and syntax for the TSDoc notation. There are still plenty of interesting edge cases and cool feature ideas. If you maintain a documentation tool, you can also compare/contrast your syntax and features against what we're proposing for TSDoc and identify any gaps.

@typhonrt

typhonrt commented Jun 22, 2018

Hi folks. I'm just going to jump in here, as I've been following tsdoc since the initial announcement. I've been on a long sabbatical, but I'm about to release a major overhaul of ESDoc, aptly named TJSDoc, since it supports JSDoc and TypeScript transparently (or at least that is the goal). As it happens, the ESDoc maintainer is hostile to outside contributions, hence a major fork. A considerable amount of work (8 months full time) was already done in 2017, during which I drastically overhauled the ESDoc infrastructure for the better, paying down technical debt and adding many new features. I took a break before finishing things off for a public release and will be back on it in a month or so.

Anyway, tsdoc is an important project that really fills a hole in the well-formatted evaluation of doc comments (text only). IMHO tsdoc should take a text string (a comment node) and parse it into an AST, and nothing else: no other dependencies such as a Markdown library, etc. Regarding the idea that "the input would be a TypeScript compiler data structure instead of a plain text string": that would be of no use to me and would likely limit adoption by the wider set of documentation tooling efforts. TJSDoc, including its TypeScript support, is based on Babylon 7. I'll gladly contribute to tsdoc if things progress in a text-parsing direction. My general plan is to create a "sister" project to tsdoc that parses JSDoc text comments and also generates an AST.

So what I see as most beneficial and generalized is tsdoc accepting text / comment nodes which results in an AST for further parsing by the documentation tooling at hand. No outside dependencies.

@octogonz
Collaborator

"So what I see as most beneficial and generalized is tsdoc accepting text / comment nodes which results in an AST for further parsing by the documentation tooling at hand. No outside dependencies."

Thank you for this feedback! I find it highly persuasive. We were unsure whether anyone would want to use the tsdoc library without first invoking the TypeScript compiler engine. Certainly you shouldn't be forced to write your own parser just because your tool isn't based on the TypeScript engine.

BTW if you plan to contribute, feel free to create a PR to add your project to the "Who's involved?" list.

@octogonz
Collaborator

Another way to look at this is, the best projects I've participated in often took the form of a 2.0 reboot of a working 1.0 implementation. Only with the hindsight of having actually coded everything, do we learn how to code everything well. So if standalone-tsdoc requires us to later redo the entire parser inside the compiler's code base, perhaps that's a benefit and not a wasted duplication. :-)

@typhonrt

typhonrt commented Jun 22, 2018

@pgonzal I'll definitely do a PR when I get back into full swing, which should be in August. TJSDoc is event-based, built on inter-module communication. Based on the file type, comment nodes would be dispatched to tsdoc for .ts files, and for JavaScript files to a forthcoming JSDoc module that also generates a compatible AST for comment nodes and, among other things, should support type information in comments. This would address #23 (hah, you CCed me there too!), i.e. the AST nodes from Babylon 7 will provide the type information for TS files.
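The per-file-type dispatch described above could look roughly like the sketch below. Everything in it is hypothetical: the CommentParser interface, the extension table, and the two stub parsers merely stand in for tsdoc and the forthcoming JSDoc module, and an event-based tool such as TJSDoc would route comments over its own event bus rather than through a plain lookup table.

```typescript
// Hypothetical common AST shape that both comment parsers would emit.
interface CommentAst {
  kind: "comment";
  dialect: "tsdoc" | "jsdoc";
  text: string;
}

// Hypothetical contract shared by the tsdoc and JSDoc parser modules.
interface CommentParser {
  parse(commentText: string): CommentAst;
}

// Stand-ins for the real parsers; they only tag the text with its dialect.
const tsdocParser: CommentParser = {
  parse: (text) => ({ kind: "comment", dialect: "tsdoc", text }),
};
const jsdocParser: CommentParser = {
  parse: (text) => ({ kind: "comment", dialect: "jsdoc", text }),
};

// Route each comment node to a parser based on the source file's extension.
const parsersByExtension: Record<string, CommentParser> = {
  ".ts": tsdocParser,
  ".tsx": tsdocParser,
  ".js": jsdocParser,
  ".jsx": jsdocParser,
};

function dispatchComment(fileName: string, commentText: string): CommentAst {
  const dot = fileName.lastIndexOf(".");
  const extension = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
  const parser = parsersByExtension[extension];
  if (parser === undefined) {
    throw new Error(`No comment parser registered for "${fileName}"`);
  }
  return parser.parse(commentText);
}

const ast = dispatchComment("src/widget.ts", "/** Renders a widget. */");
console.log(ast.dialect); // prints: tsdoc
```

Because both parsers return the same CommentAst shape, downstream documentation stages would not need to know which dialect a comment came from.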

I bit my tongue during the discussion of potential Markdown dependencies for tsdoc in other issues, but I definitely had to jump in after your response above regarding dependence on the TypeScript compiler engine. IMHO, regarding Markdown, the documentation tooling should decide on a Markdown module and simply parse tsdoc's Markdown AST nodes with the Markdown module of its choice. I was pleasantly surprised to see TypeScript / AST parsing in Babylon 7 become possible during my sabbatical. I originally built TJSDoc expecting that it might have to support the TypeScript compiler engine, but if I can pull things off for JS / TS using Babylon 7 transparently, that is an ideal scenario.
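Here is a minimal sketch of that division of labor, assuming a hypothetical comment AST in which the parser leaves Markdown-bearing text uninterpreted and the documentation tool passes in whichever Markdown renderer it prefers. The CommentNode type, renderComment function, and the toy bold-only renderer are illustrative stand-ins, not part of tsdoc or of any real Markdown library.

```typescript
// Hypothetical minimal comment AST: the parser separates structure (tags)
// from plain text, but does not interpret any Markdown inside the text.
type CommentNode =
  | { kind: "plainText"; text: string }
  | { kind: "blockTag"; tagName: string; content: CommentNode[] };

// The documentation tool chooses its Markdown module and passes it in.
type MarkdownRenderer = (markdown: string) => string;

function renderComment(nodes: CommentNode[], renderMarkdown: MarkdownRenderer): string {
  return nodes
    .map((node) =>
      node.kind === "plainText"
        ? renderMarkdown(node.text)
        : `<section class="${node.tagName.slice(1)}">${renderComment(node.content, renderMarkdown)}</section>`
    )
    .join("");
}

// A toy renderer standing in for a real Markdown module: it only handles **bold**.
const toyRenderer: MarkdownRenderer = (md) => md.replace(/\*\*(.+?)\*\*/g, "<b>$1</b>");

const html = renderComment(
  [
    { kind: "plainText", text: "Returns the **sum**." },
    { kind: "blockTag", tagName: "@remarks", content: [{ kind: "plainText", text: "Never throws." }] },
  ],
  toyRenderer
);
console.log(html);
// Returns the <b>sum</b>.<section class="remarks">Never throws.</section>
```

Swapping toyRenderer for a real Markdown library would not require any change to the comment parser, which is the point being argued.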

Nonetheless, a well-defined TypeScript comment format and a module that generates an accompanying AST for consumption by documentation tooling are highly needed. This is one of the sore spots in ESDoc and certainly a latent sore spot in TJSDoc (and, I assume, in a lot of other documentation tooling!), as the current implementation is just a bunch of ad hoc regex processing instead of separate modules that provide a well-defined AST for processing comments.

As painful as it is to accept regex processing as the way forward, it is a generalized approach. It's certainly hard to get right, but worth the effort: if there is a standard module implementing this for TypeScript comments, and a separate module for JSDoc that generates a compatible AST for documentation tools to consume, then all tooling can offload this aspect to common efforts.

@octogonz
Collaborator

"IMHO regarding markdown the documentation tooling should decide on a markdown module and simply parse the AST of tsdoc AST markdown nodes with the markdown module of choice"

So far we've come to the same opinion, see #12 (comment). Thanks again for sharing your use case, very informative!

@typhonrt

@pgonzal It has definitely been a pleasure following the progress thus far as potential features get sussed out. I'd say that in an ideal world, tsdoc and any accompanying JSDoc-related module would produce a well-defined and -shared- AST for both angles. Given that this is a major pain point in my efforts, I'd be glad to get to work in earnest on tsdoc and any potential JSDoc-related module that outputs a common AST.

I must say that, for expediency, in the case of creating a JSDoc module I'd likely fork tsdoc and tweak the regex processing as necessary so that it outputs the AST plus type information from comments, etc. I guess the bonus is that I'd finally have to embrace TS myself to implement said module... 😄

@octogonz
Collaborator

Status update:

Today we published the first release of the @microsoft/tsdoc NPM package! This is “alpha quality” and still pretty rough around the edges, but the major approach and model are worked out. The project is now (finally!) in a state where people can give useful feedback on the implementation and API design. I’ve put together a small demo project that illustrates the basic usage. If you get a chance, please try it out and open GitHub issues with your feedback/ideas.

In the coming weeks we’ll be working on updating API Extractor to use this library for parsing doc comments, and fixing lots of bugs and feature gaps along the way.

Thanks again to everyone who’s been contributing ideas and input!

@octogonz
Collaborator

And here's a quick summary of the major ideas:

  1. We start with a TextRange that is similar to ts.TextRange

  2. Then the LineExtractor finds the text ranges for the lines within the comment.

  3. Next the Tokenizer breaks these lines into primitive Token objects, which are really just symbol characters, newlines, whitespace, and blobs of text

  4. The NodeParser uses a TokenReader to build an AST of DocNode subclasses.

  5. The DocNode tree has two roles: For scenarios where you need to generate documentation comments and e.g. emit them into *.d.ts files, the DocNode acts like a high-level DOM API for building up a conceptual tree and transforming it. But it also can provide a detailed grammatical analysis of a parsed input (e.g. tell me the coordinates of the = for an HTML attribute). This is accomplished by including a bunch of DocParticle nodes in the tree. A visitor sees them when using DocNode.getChildNodes(), but otherwise the particles are invisible for the everyday API interactions. You never need to create them explicitly.

  6. Each DocNode has an optional associated Excerpt which tracks the corresponding source file coordinates for a parsed input. The DocNode.excerpt will be undefined for manually constructed nodes, and for abstract nodes (e.g. DocSection) which don't correspond to any input tokens.

  7. The Excerpt class tracks content and an optional spacingAfterContent (similar to TypeScript compiler trivia). These aren't TextRange or Token objects, but are instead represented as TokenSequence objects. The TokenSequence allows very precise highlighting; for example, a TokenSequence might correspond to "Hello," and "world!" in this example, but NOT the newline or * in between:

    /**
     * <data-item description="Hello,
     * world!" />
     */

  8. All of these aspects come back in the ParserContext returned by the TSDocParser main API.

  9. The root of the AST is a DocComment object that has everything nicely rolled up into summary/remarks/parameters/returns and a ModifierTagSet for checking for the presence of modifier tags such as @readonly.

  10. Error messages come in two variants: General errors are reported via ParserContext.log.messages. But if specific tokens cannot be parsed as expected (e.g. <badTag abc=" />), then the first misinterpreted characters (e.g. <) will get represented as a DocErrorText node (i.e. treated as literal text) and parsing will resume with the next token. A log message will also be generated. (This is the "infinite lookahead" aspect of a Markdown parser.)

  11. In this initial prototype, the parsing is very strict. But we've included an optional TSDocParserConfiguration that (1) provides a place to define custom tags, and (2) in the future will provide a rich set of options and switches to flexibly handle degenerate inputs. Later on, we'll open up the NodeParser to support custom syntaxes via plug-in rules. (I'm eager to do this, but the engine will need to be relatively mature if we want a stable API.)
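To make steps 1 through 3 more concrete, here is a heavily simplified sketch of that pipeline. It is not the @microsoft/tsdoc API: SimpleTextRange, SimpleToken, extractLines, and tokenize are hypothetical stand-ins that only mirror the roles described above. Ranges are integer offsets into the original buffer, and the tokenizer emits only the primitive kinds from step 3 (text, spacing, newlines, and single symbol characters). It assumes "\n" line endings and, unlike a real line extractor, simply skips blank lines.

```typescript
// Step 1 (simplified): a range is the whole buffer plus integer offsets, not a copied string.
interface SimpleTextRange {
  buffer: string;
  pos: number; // inclusive start offset
  end: number; // exclusive end offset
}

// Step 3 (simplified): only the primitive token kinds mentioned above.
type SimpleTokenKind = "Text" | "Spacing" | "Newline" | "Symbol";

interface SimpleToken {
  kind: SimpleTokenKind;
  range: SimpleTextRange;
}

// Characters treated as one-character Symbol tokens (an assumed, partial list).
const SYMBOL_CHARS = "@{}<>=\"'`\\/";

function isSpace(ch: string): boolean {
  return ch === " " || ch === "\t";
}

// Step 2 (simplified): find the content of each line inside /** ... */, excluding
// the delimiters and each line's leading "*". Assumes the input is a doc comment.
function extractLines(buffer: string): SimpleTextRange[] {
  const lines: SimpleTextRange[] = [];
  const bodyEnd = buffer.lastIndexOf("*/");
  let lineStart = buffer.indexOf("/**") + 3;
  while (lineStart <= bodyEnd) {
    let lineEnd = buffer.indexOf("\n", lineStart);
    if (lineEnd < 0 || lineEnd > bodyEnd) lineEnd = bodyEnd;
    let pos = lineStart;
    while (pos < lineEnd && isSpace(buffer[pos])) pos++;
    if (pos < lineEnd && buffer[pos] === "*") pos++; // skip the "*" decoration
    if (pos < lineEnd && buffer[pos] === " ") pos++; // and one space after it
    if (pos < lineEnd) lines.push({ buffer, pos, end: lineEnd });
    lineStart = lineEnd + 1;
  }
  return lines;
}

function tokenize(lines: SimpleTextRange[]): SimpleToken[] {
  const tokens: SimpleToken[] = [];
  lines.forEach((line, lineIndex) => {
    const buf = line.buffer;
    let i = line.pos;
    while (i < line.end) {
      const start = i;
      let kind: SimpleTokenKind;
      if (isSpace(buf[i])) {
        kind = "Spacing";
        while (i < line.end && isSpace(buf[i])) i++;
      } else if (SYMBOL_CHARS.indexOf(buf[i]) >= 0) {
        kind = "Symbol";
        i++;
      } else {
        kind = "Text";
        while (i < line.end && !isSpace(buf[i]) && SYMBOL_CHARS.indexOf(buf[i]) < 0) i++;
      }
      tokens.push({ kind, range: { buffer: buf, pos: start, end: i } });
    }
    // Every line except the last ends at a "\n", so a Newline token covers that character.
    if (lineIndex < lines.length - 1) {
      tokens.push({ kind: "Newline", range: { buffer: buf, pos: line.end, end: line.end + 1 } });
    }
  });
  return tokens;
}

const comment = "/**\n * Hello @beta\n * world!\n */";
const tokens = tokenize(extractLines(comment));
const tokenText = (t: SimpleToken) => t.range.buffer.substring(t.range.pos, t.range.end);
console.log(tokens.map((t) => `${t.kind}:${JSON.stringify(tokenText(t))}`).join(" "));
// Text:"Hello" Spacing:" " Symbol:"@" Text:"beta" Newline:"\n" Text:"world!"
```

Note that no token contains a * decoration, and every token is just two integers plus a shared reference to the original buffer.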

If you want to see some examples of all these pieces together, the Jest snapshots are somewhat informative:

@typhonrt

@pgonzal How are memory efficiency and speed looking? Those are my two big concerns with a more "heavyweight" / precise solution like tsdoc, regarding potential integration into a documentation pipeline, which can consume a lot of memory to begin with for large projects. I.e., it seems the project's main impetus is thoroughness / accuracy in parsing, but at the expense of what?

I've done a ton of optimization on both fronts over ESDoc for my forthcoming documentation effort, but it can still be a beast on very large projects. Perhaps tests can be added to stress tsdoc for speed and memory consumption over successive executions?

@octogonz
Collaborator

octogonz commented Aug 17, 2018

Hey @typhonrt

Performance is certainly an important goal for this project, however we have not yet invested in measurements or optimization work in that area. The first priority is still to get the main usage scenarios to be feature complete and running. (If someone else wanted to set up some performance tests, that would be very appreciated!)

That said, the current architecture is designed with performance in mind. Like the TypeScript compiler, the core parser works primarily with integers (i.e. indexes into an array of tokens, or indexes into an array of characters) instead of allocating and comparing text strings. Here I am referring to the TextRange, Token, TokenSequence, and Excerpt classes. By contrast the DocNode tree does allocate strings, because it is intended to support a builder scenario as well as parsing (i.e. DocNode --> comment text, instead of comment text --> DocNode). But it would be straightforward to optimize this for the parser-only scenario.
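The integer-based design can be illustrated with a small sketch. TokenSequenceSketch below is a hypothetical simplification, not the real tsdoc TokenSequence class: a sequence is just two indexes into a shared token array, narrowing it is integer arithmetic, and a string is only allocated when toString() is called.

```typescript
// Simplified token: integer offsets into one shared source buffer.
interface OffsetToken {
  pos: number; // inclusive
  end: number; // exclusive
}

// Hypothetical stand-in for a token sequence: a half-open range of token indexes.
// Creating or narrowing one allocates no strings.
class TokenSequenceSketch {
  readonly buffer: string;
  readonly tokens: readonly OffsetToken[];
  readonly startIndex: number; // first token index (inclusive)
  readonly endIndex: number; // one past the last token index (exclusive)

  constructor(buffer: string, tokens: readonly OffsetToken[], startIndex: number, endIndex: number) {
    this.buffer = buffer;
    this.tokens = tokens;
    this.startIndex = startIndex;
    this.endIndex = endIndex;
  }

  get length(): number {
    return this.endIndex - this.startIndex;
  }

  // Narrowing is pure integer arithmetic; the token array is shared, not copied.
  slice(start: number, end: number): TokenSequenceSketch {
    return new TokenSequenceSketch(this.buffer, this.tokens, this.startIndex + start, this.startIndex + end);
  }

  // A string is only materialized on demand, e.g. for display or diagnostics.
  toString(): string {
    let result = "";
    for (let i = this.startIndex; i < this.endIndex; i++) {
      const token = this.tokens[i];
      result += this.buffer.substring(token.pos, token.end);
    }
    return result;
  }
}

const source = "@param width - the width";
// Tokens: "@", "param", " ", "width", " ", "-", " ", "the", " ", "width"
const tokens: OffsetToken[] = [
  { pos: 0, end: 1 }, { pos: 1, end: 6 }, { pos: 6, end: 7 }, { pos: 7, end: 12 },
  { pos: 12, end: 13 }, { pos: 13, end: 14 }, { pos: 14, end: 15 },
  { pos: 15, end: 18 }, { pos: 18, end: 19 }, { pos: 19, end: 24 },
];

const whole = new TokenSequenceSketch(source, tokens, 0, tokens.length);
const tagName = whole.slice(0, 2); // tokens 0..1
const paramName = whole.slice(3, 4); // token 3
console.log(tagName.toString(), paramName.toString()); // @param width
```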

The parser's algorithm time complexity should be analogous to a CommonMark parser: It sometimes performs infinite lookahead, e.g. parsing "<tag1 abc="<tag2 abc="<tag3 />" may require scanning to the end of the input two times before finally finding a well-formed <tag3 />. But generally the grammar rules and human behavior tend to make those cases rare or even exotic. In the current approach, TSDoc does not support CommonMark nesting blocks at all, so it completely avoids the problem of walking up and down a list of scopes to find the right place to insert a node. (This isn't just an unimplemented feature: When I write up the spec, I will try to argue persuasively that these CommonMark features are counterproductive for a documentation comment, and can be reasonably avoided while ensuring that TSDoc constructs will still generally be handled correctly by CommonMark implementations. Some initial ideas about that were posted in #29.)
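The lookahead and recovery behavior (see also item 10 in the summary above) can be sketched with a toy scanner for self-closing tags of the form <name attr="value" />. The grammar, the ParsedNode type, and the function names are hypothetical and much simpler than TSDoc's. When a tag fails to parse, only its "<" becomes error text and scanning resumes at the next character. With this simplified grammar the <tag1 attempt fails early, while the <tag2 attempt scans to the end of the input looking for a closing quote before the well-formed <tag3 /> is finally found.

```typescript
// Hypothetical node kinds for this sketch only.
type ParsedNode =
  | { kind: "text"; text: string }
  | { kind: "tag"; text: string }
  | { kind: "errorText"; text: string; message: string };

function isNameChar(ch: string | undefined): boolean {
  return ch !== undefined && /[A-Za-z0-9-]/.test(ch);
}

// Try to match a self-closing tag such as <name attr="value" /> where input[start] is "<".
// Returns the index just past "/>", or -1 if no well-formed tag starts there.
function matchTag(input: string, start: number): number {
  let i = start + 1; // skip "<"
  const nameStart = i;
  while (isNameChar(input[i])) i++;
  if (i === nameStart) return -1;
  for (;;) {
    const spaceStart = i;
    while (input[i] === " ") i++;
    if (input[i] === "/" && input[i + 1] === ">") return i + 2;
    if (i === spaceStart) return -1; // attributes must be separated by spaces
    const attrStart = i;
    while (isNameChar(input[i])) i++;
    if (i === attrStart || input[i] !== "=" || input[i + 1] !== '"') return -1;
    // Unbounded lookahead: the closing quote may be anywhere later in the input.
    const closeQuote = input.indexOf('"', i + 2);
    if (closeQuote < 0) return -1;
    i = closeQuote + 1;
  }
}

function parseTags(input: string): ParsedNode[] {
  const nodes: ParsedNode[] = [];
  let textStart = 0;
  let i = 0;
  const flushText = (end: number): void => {
    if (end > textStart) nodes.push({ kind: "text", text: input.slice(textStart, end) });
  };
  while (i < input.length) {
    if (input[i] !== "<") {
      i++;
      continue;
    }
    flushText(i);
    const tagEnd = matchTag(input, i);
    if (tagEnd >= 0) {
      nodes.push({ kind: "tag", text: input.slice(i, tagEnd) });
      i = tagEnd;
    } else {
      // Recovery: only the "<" becomes error text; parsing resumes right after it.
      nodes.push({ kind: "errorText", text: "<", message: `Malformed tag at offset ${i}` });
      i++;
    }
    textStart = i;
  }
  flushText(input.length);
  return nodes;
}

const nodes = parseTags('<tag1 abc="<tag2 abc="<tag3 />');
for (const node of nodes) console.log(node.kind, JSON.stringify(node.text));
// errorText "<"
// text "tag1 abc=\""
// errorText "<"
// text "tag2 abc=\""
// tag "<tag3 />"
```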

The last aspect that comes to mind from a performance standpoint is circular references in the DocNode API. For example, ParserContext.docComment has child nodes with DocNode.excerpt properties that point back to the ParserContext. This means that any reference held to any DocNode object will keep the entire graph alive, which could lead to memory leaks. There are ways to break these loops, but I haven't had time to think about it yet in depth. However I believe it can be solved without significantly altering the current API design.

Hope that helps!

@octogonz octogonz added the general discussion Not a bug or enhancement, just a discussion label Aug 31, 2018
@octogonz
Collaborator

I have added a "Where are we on the roadmap?" section to the root-level README file, which people can use to track our progress until we get a proper newsfeed set up.

Here's what I posted there as of today:

Already completed:

  • Write up all the interesting design questions as "RFC" GitHub issues to collect community feedback
  • Arrive at an initial consensus on the basic approach and strategy for TSDoc
  • Develop an initial feature-complete prototype of the @microsoft/tsdoc library and publish the NPM package
  • Convert Microsoft's API Extractor tool to use @microsoft/tsdoc (replacing its proprietary AEDoc engine); this demonstrates that TSDoc can meet the needs of a large production documentation web site

What's next:

  • Write up an initial draft of the TSDoc spec document, which outlines the proposed standard
  • Collect community feedback and integrate it into the draft, then publish the first "official" 1.0 spec
  • Review the @microsoft/tsdoc API with various integrators, including TypeScript and VS Code
  • Publish the first "1.0.0" stable release of the @microsoft/tsdoc package
  • Help onboard various partners

As such, I'm going to close this GitHub issue.

BTW I'm also excited to announce that we made a cool little TSDoc Playground with an interactive demo of the parser. (Big thanks to @iclanton and @KevinTCoughlin for their work on this!) Enjoy! :-P

3 participants