Skip to content

Add JSDoc parsing#225

Merged
sandersn merged 32 commits intomicrosoft:mainfrom
sandersn:add-jsdoc-parsing
Jan 27, 2025
Merged

Add JSDoc parsing#225
sandersn merged 32 commits intomicrosoft:mainfrom
sandersn:add-jsdoc-parsing

Conversation

@sandersn
Copy link
Member

This PR adds JSDoc parsing. It builds on my previous PRs so it doesn't touch a lot of files for such a large feature. Specifically, the main AST still doesn't have a jsdoc property. Instead, the parser has a jsdocCache map that uses node as keys for jsdoc.

Like Strada,

  • It's disabled for .ts files unless the file contains @see or @link.
  • It moves diagnostics added during jsdoc parsing to a separate array, jsdocDiagnostics, so that JS files without checkJS can ignore them.
  • There are dual "nextToken/nextTokenJSDoc" pairs of functions. The JSDoc versions are used in JSDoc proper, but the parser switches back to the normal versions with parsing types in JSDoc.

Some notes:

  • Like Strada, it's disabled for .ts files unless the file contains a @see or @link.
  • Because I now have actual tests of JSDoc, there are quite a few bug fixes in ast.go — incorrect constructor functions or oorder visitor code mostly.
  • I removed the features I proposed removing in Add CHANGES.md with JSDoc proposal #132.
    • These are mostly deletions of entire functions except for parseJSDocTypeNameWithNamespace. In Corsa, it always returns a name, whereas in Strada it could return undefined, since typedef/enum/callback's names were optional.
    • Including an embarrassing AI artifact: KindJSDocImmediateTag. I think Copilot added this because of the @immediate proposal in 2024.
  • I didn't simplify the indentation algorithm, but I would like to later. Why not now?
    • It would be a complex rewrite compared to removing entire features.
    • I don't think test coverage of jsdoc indentation will be good enough until we have fourslash tests.
  • JSDoc parsing operates on a slice of the source text so that the parser will reliably emit EOF when it hits the end of the comment.
  • getJSDocCommentRanges returns overlapping trailing/leading comments when there are multiple jsdoc beforehand, which is common because of standalone tags like @typedef. I copied the Strada behaviour to Corsa even though it's wrong.
  • 50% of the lines changed in scanner.go is me fixing up my old style scanner *Scanner -> s *Scanner in methods, sorry.

I'll get some performance numbers and post them shortly. I don't expect this to be cheap because the fundamental algorithm hasn't changed from Strada.

@sandersn
Copy link
Member Author

sandersn commented Jan 13, 2025

Performance:

(numbers below)

In the normal mode, empty.ts is 17.7% slower, checker is the same, dom.generated.d.ts is 1.3% faster and Herebyfile.mjs is 0.4% slower. None of these are actually parsing JSDoc, so I don't know why empty.ts is so much slower. Maybe it's overhead from initialising the jsdocCache.

In always-parse mode, everything is slower. The interesting ones are the large ones: checker.ts and dom.generated.d.ts. checker.ts doesn't have much jsdoc and is only 2% slower. The DOM has JSDoc on nearly every declaration and is 58% slower.

My initial opinion is that this isn't worth optimising now because the language service isn't ready; the common case is that jsdoc isn't parsed at all, and parsing is a small part of take taken to boot.
Edit: Also because when I have more time I want to simplify and change the indent rules, which will allow much chunkier parsing per line of text.

In the normal mode:

                                                      │   old.txt   │             newnone.txt             │
                                                      │   sec/op    │   sec/op     vs base                │
Parse/empty.ts-16                                       352.9n ± 0%   415.4n ± 0%  +17.71% (p=0.000 n=10)
Parse/checker.ts-16                                     28.87m ± 2%   28.62m ± 1%        ~ (p=0.436 n=10)
Parse/dom.generated.d.ts-16                             8.919m ± 1%   8.806m ± 0%   -1.27% (p=0.000 n=10)
Parse/Herebyfile.mjs-16                                 396.8µ ± 0%   398.5µ ± 0%   +0.42% (p=0.029 n=10)
Parse/jsxComplexSignatureHasApplicabilityError.tsx-16   149.5µ ± 1%   148.9µ ± 0%        ~ (p=0.218 n=10)
geomean                                                 351.8µ        362.0µ        +2.88%

                                                      │   old.txt    │             newnone.txt             │
                                                      │     B/op     │     B/op      vs base               │
Parse/empty.ts-16                                       2.250Ki ± 0%   2.453Ki ± 0%  +9.03% (p=0.000 n=10)
Parse/checker.ts-16                                     24.34Mi ± 0%   24.34Mi ± 0%       ~ (p=0.796 n=10)
Parse/dom.generated.d.ts-16                             8.626Mi ± 0%   8.625Mi ± 0%       ~ (p=0.280 n=10)
Parse/Herebyfile.mjs-16                                 359.2Ki ± 0%   359.3Ki ± 0%  +0.05% (p=0.001 n=10)
Parse/jsxComplexSignatureHasApplicabilityError.tsx-16   157.0Ki ± 0%   157.2Ki ± 0%  +0.14% (p=0.000 n=10)
geomean                                                 488.9Ki        497.6Ki       +1.78%

                                                      │   old.txt   │             newnone.txt             │
                                                      │  allocs/op  │  allocs/op   vs base                │
Parse/empty.ts-16                                        6.000 ± 0%    8.000 ± 0%  +33.33% (p=0.000 n=10)
Parse/checker.ts-16                                     157.9k ± 0%   157.9k ± 0%        ~ (p=0.456 n=10)
Parse/dom.generated.d.ts-16                             72.84k ± 0%   72.84k ± 0%        ~ (p=0.066 n=10)
Parse/Herebyfile.mjs-16                                 2.606k ± 0%   2.608k ± 0%   +0.08% (p=0.000 n=10)
Parse/jsxComplexSignatureHasApplicabilityError.tsx-16    917.0 ± 0%    919.0 ± 0%   +0.22% (p=0.000 n=10)

In the old-style, always-parse mode:

                                                      │   old.txt   │              newall.txt              │
                                                      │   sec/op    │    sec/op     vs base                │
Parse/empty.ts-16                                       352.9n ± 0%    417.8n ± 0%  +18.39% (p=0.000 n=10)
Parse/checker.ts-16                                     28.87m ± 2%    29.47m ± 1%   +2.07% (p=0.000 n=10)
Parse/dom.generated.d.ts-16                             8.919m ± 1%   14.118m ± 1%  +58.30% (p=0.000 n=10)
Parse/Herebyfile.mjs-16                                 396.8µ ± 0%    453.0µ ± 0%  +14.17% (p=0.000 n=10)
Parse/jsxComplexSignatureHasApplicabilityError.tsx-16   149.5µ ± 1%    263.2µ ± 1%  +76.07% (p=0.000 n=10)
geomean                                                 351.8µ         460.6µ       +30.91%

                                                      │   old.txt    │              newall.txt               │
                                                      │     B/op     │     B/op       vs base                │
Parse/empty.ts-16                                       2.250Ki ± 0%    2.453Ki ± 0%   +9.03% (p=0.000 n=10)
Parse/checker.ts-16                                     24.34Mi ± 0%    24.68Mi ± 0%   +1.37% (p=0.000 n=10)
Parse/dom.generated.d.ts-16                             8.626Mi ± 0%   11.598Mi ± 0%  +34.46% (p=0.000 n=10)
Parse/Herebyfile.mjs-16                                 359.2Ki ± 0%    396.6Ki ± 0%  +10.44% (p=0.000 n=10)
Parse/jsxComplexSignatureHasApplicabilityError.tsx-16   157.0Ki ± 0%    223.5Ki ± 0%  +42.36% (p=0.000 n=10)
geomean                                                 488.9Ki         579.3Ki       +18.50%

                                                      │   old.txt   │              newall.txt               │
                                                      │  allocs/op  │  allocs/op    vs base                 │
Parse/empty.ts-16                                        6.000 ± 0%     8.000 ± 0%   +33.33% (p=0.000 n=10)
Parse/checker.ts-16                                     157.9k ± 0%    161.1k ± 0%    +2.06% (p=0.000 n=10)
Parse/dom.generated.d.ts-16                             72.84k ± 0%   118.11k ± 0%   +62.14% (p=0.000 n=10)
Parse/Herebyfile.mjs-16                                 2.606k ± 0%    3.040k ± 0%   +16.65% (p=0.000 n=10)
Parse/jsxComplexSignatureHasApplicabilityError.tsx-16    917.0 ± 0%    2213.0 ± 0%  +141.33% (p=0.000 n=10)
geomean                                                 2.776k         4.000k        +44.09%

I used the command go test -run=- -bench=. -benchmem -count=10 ./internal/parser | tee old.txt to generate the performance numbers.

@sandersn
Copy link
Member Author

sandersn commented Jan 13, 2025

Indeed making jsdocCache initialisation lazy make empty.ts only slow down by 2.4% and use 4.9% more memory.

                                                      │   old.txt   │          newnone-lazy.txt          │
                                                      │   sec/op    │   sec/op     vs base               │
Parse/empty.ts-16                                       352.9n ± 0%   361.5n ± 0%  +2.44% (p=0.000 n=10)
Parse/checker.ts-16                                     28.87m ± 2%   28.77m ± 1%       ~ (p=0.853 n=10)
Parse/dom.generated.d.ts-16                             8.919m ± 1%   8.726m ± 1%  -2.16% (p=0.000 n=10)
Parse/Herebyfile.mjs-16                                 396.8µ ± 0%   397.8µ ± 1%       ~ (p=0.075 n=10)
Parse/jsxComplexSignatureHasApplicabilityError.tsx-16   149.5µ ± 1%   149.0µ ± 1%       ~ (p=0.436 n=10)
geomean                                                 351.8µ        351.7µ       -0.05%

                                                      │   old.txt    │          newnone-lazy.txt           │
                                                      │     B/op     │     B/op      vs base               │
Parse/empty.ts-16                                       2.250Ki ± 0%   2.359Ki ± 0%  +4.86% (p=0.000 n=10)
Parse/checker.ts-16                                     24.34Mi ± 0%   24.34Mi ± 0%       ~ (p=0.529 n=10)
Parse/dom.generated.d.ts-16                             8.626Mi ± 0%   8.624Mi ± 0%       ~ (p=0.063 n=10)
Parse/Herebyfile.mjs-16                                 359.2Ki ± 0%   359.1Ki ± 0%       ~ (p=0.289 n=10)
Parse/jsxComplexSignatureHasApplicabilityError.tsx-16   157.0Ki ± 0%   157.0Ki ± 0%       ~ (p=0.060 n=10)
geomean                                                 488.9Ki        493.6Ki       +0.95%

                                                      │   old.txt   │           newnone-lazy.txt           │
                                                      │  allocs/op  │  allocs/op   vs base                 │
Parse/empty.ts-16                                        6.000 ± 0%    6.000 ± 0%       ~ (p=1.000 n=10) ¹
Parse/checker.ts-16                                     157.9k ± 0%   157.9k ± 0%       ~ (p=0.135 n=10)
Parse/dom.generated.d.ts-16                             72.84k ± 0%   72.84k ± 0%       ~ (p=0.074 n=10)
Parse/Herebyfile.mjs-16                                 2.606k ± 0%   2.606k ± 0%       ~ (p=1.000 n=10) ¹
Parse/jsxComplexSignatureHasApplicabilityError.tsx-16    917.0 ± 0%    917.0 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                                 2.776k        2.776k       -0.00%
¹ all samples are equal

@jakebailey
Copy link
Member

Retesting but setting JSDocParsingModeParseForTypeErrors everywhere, I don't see a significant change:

goos: linux
goarch: amd64
pkg: github.com/microsoft/typescript-go/internal/parser
cpu: Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz
                                                      │   old.txt   │              new.txt               │
                                                      │   sec/op    │   sec/op     vs base               │
Parse/empty.ts-20                                       750.7n ± 1%   784.3n ± 1%  +4.48% (p=0.000 n=10)
Parse/checker.ts-20                                     45.49m ± 3%   45.23m ± 4%       ~ (p=0.739 n=10)
Parse/dom.generated.d.ts-20                             14.89m ± 2%   14.98m ± 2%       ~ (p=0.481 n=10)
Parse/Herebyfile.mjs-20                                 626.4µ ± 2%   628.6µ ± 0%       ~ (p=0.631 n=10)
Parse/jsxComplexSignatureHasApplicabilityError.tsx-20   256.5µ ± 2%   257.9µ ± 1%       ~ (p=0.393 n=10)
geomean                                                 606.0µ        612.4µ       +1.06%

                                                      │   old.txt    │               new.txt               │
                                                      │     B/op     │     B/op      vs base               │
Parse/empty.ts-20                                       2.328Ki ± 0%   2.438Ki ± 0%  +4.70% (p=0.000 n=10)
Parse/checker.ts-20                                     24.30Mi ± 0%   24.31Mi ± 0%       ~ (p=0.075 n=10)
Parse/dom.generated.d.ts-20                             8.620Mi ± 0%   8.620Mi ± 0%       ~ (p=0.912 n=10)
Parse/Herebyfile.mjs-20                                 360.0Ki ± 0%   360.0Ki ± 0%       ~ (p=0.912 n=10)
Parse/jsxComplexSignatureHasApplicabilityError.tsx-20   157.6Ki ± 0%   157.7Ki ± 0%       ~ (p=0.079 n=10)
geomean                                                 492.7Ki        497.3Ki       +0.94%

                                                      │   old.txt   │               new.txt                │
                                                      │  allocs/op  │  allocs/op   vs base                 │
Parse/empty.ts-20                                        6.000 ± 0%    6.000 ± 0%       ~ (p=1.000 n=10) ¹
Parse/checker.ts-20                                     156.0k ± 0%   156.0k ± 0%       ~ (p=0.115 n=10)
Parse/dom.generated.d.ts-20                             72.32k ± 0%   72.32k ± 0%       ~ (p=0.317 n=10)
Parse/Herebyfile.mjs-20                                 2.599k ± 0%   2.599k ± 0%       ~ (p=1.000 n=10)
Parse/jsxComplexSignatureHasApplicabilityError.tsx-20    917.0 ± 0%    917.0 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                                 2.764k        2.764k       +0.00%
¹ all samples are equal

But, poop doesn't seem to like it:

Benchmark 1 (11 runs): ./built/local-old/tsgo -p _submodules/TypeScript/src/compiler
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           965ms ± 23.9ms     937ms … 1.01s           0 ( 0%)        0%
  peak_rss            454MB ± 4.47MB     447MB …  460MB          0 ( 0%)        0%
  cpu_cycles         17.6G  ±  327M     17.0G  … 18.2G           0 ( 0%)        0%
  instructions       19.9G  ±  200M     19.7G  … 20.3G           0 ( 0%)        0%
  cache_references    458M  ± 4.78M      450M  …  466M           0 ( 0%)        0%
  cache_misses       88.1M  ± 2.80M     82.2M  … 91.2M           0 ( 0%)        0%
  branch_misses      64.2M  ±  490K     63.6M  … 64.8M           0 ( 0%)        0%
Benchmark 2 (11 runs): ./built/local/tsgo -p _submodules/TypeScript/src/compiler
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           985ms ± 28.6ms     946ms … 1.02s           0 ( 0%)          +  2.1% ±  2.4%
  peak_rss            462MB ± 3.56MB     456MB …  467MB          0 ( 0%)        💩+  1.8% ±  0.8%
  cpu_cycles         18.4G  ±  292M     17.9G  … 18.8G           0 ( 0%)        💩+  4.3% ±  1.6%
  instructions       20.6G  ±  200M     20.4G  … 21.1G           0 ( 0%)        💩+  3.3% ±  0.9%
  cache_references    477M  ± 7.65M      465M  …  489M           0 ( 0%)        💩+  4.0% ±  1.2%
  cache_misses       91.4M  ± 3.45M     85.2M  … 96.0M           0 ( 0%)          +  3.7% ±  3.2%
  branch_misses      66.1M  ±  605K     65.2M  … 66.9M           0 ( 0%)        💩+  3.0% ±  0.8%

Also
1. Default to JSDocParsingModeAll everywhere else.
2. Change tests to explicitly use JSDocParsingModeAll.
2. Delete unused parseSourceFile wrapper in program.go.
@sandersn
Copy link
Member Author

I think 4% cost is reasonable for JSDoc parsing -- at least initially. I have some non-backward-compatible ideas for how to make it faster, but it's already cheaper than JSDoc parsing in Strada.

result.ScriptKind = p.scriptKind
result.Flags |= p.sourceFlags
result.SetJSDocCache(p.jsdocCache)
p.jsdocCache = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this; duplicated above and not related to the source file itself, but the memory retention of the Parser.

result.LanguageVersion = p.languageVersion
result.LanguageVariant = p.languageVariant
result.ScriptKind = p.scriptKind
result.Flags |= p.sourceFlags
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a chance that the source flags differ when we reparse for await?

@jakebailey
Copy link
Member

Nathan and I looked at this on a call; there were a couple of bugs (not caused by this PR) that prevented things from working, notably that we never set the ScriptKind on the scanner. That plus some performance improvements leave us at:

Benchmark 1 (11 runs): ./built/local-old/tsgo -p _submodules/TypeScript/src/compiler
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           939ms ± 25.4ms     915ms …  994ms          0 ( 0%)        0%
  peak_rss            454MB ± 3.78MB     445MB …  458MB          2 (18%)        0%
  cpu_cycles         17.5G  ±  242M     17.1G  … 17.8G           0 ( 0%)        0%
  instructions       20.0G  ±  231M     19.8G  … 20.5G           0 ( 0%)        0%
  cache_references    471M  ± 8.49M      458M  …  486M           0 ( 0%)        0%
  cache_misses       81.5M  ± 2.79M     79.5M  … 89.3M           1 ( 9%)        0%
  branch_misses      64.7M  ±  581K     64.0M  … 65.8M           0 ( 0%)        0%
Benchmark 2 (11 runs): ./built/local/tsgo -p _submodules/TypeScript/src/compiler
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           942ms ± 48.9ms     897ms … 1.08s           1 ( 9%)          +  0.4% ±  3.7%
  peak_rss            455MB ± 2.95MB     452MB …  458MB          0 ( 0%)          +  0.4% ±  0.7%
  cpu_cycles         17.3G  ±  335M     16.5G  … 17.8G           0 ( 0%)          -  0.9% ±  1.5%
  instructions       20.1G  ±  171M     19.8G  … 20.3G           0 ( 0%)          +  0.5% ±  0.9%
  cache_references    470M  ± 8.03M      460M  …  485M           0 ( 0%)          -  0.1% ±  1.6%
  cache_misses       82.2M  ± 1.90M     79.5M  … 85.0M           0 ( 0%)          +  0.8% ±  2.6%
  branch_misses      64.4M  ±  523K     63.3M  … 65.0M           1 ( 9%)          -  0.4% ±  0.8%

aka no difference for a tsc run, so that's great.

I think there is sill some performance to be gained when parsing JSDoc but that's not strictly required now that we got over the hurdle of a plain tsc run.

return token == ast.KindGreaterThanToken || tokenIsIdentifierOrKeyword(token)
}

func getJSDocCommentRanges(f *ast.NodeFactory, commentRanges []ast.CommentRange, node *ast.Node, text string) []ast.CommentRange {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this would be better as an iterator, but it's not performant enough until Go 1.24 is released.

@sandersn
Copy link
Member Author

sandersn commented Jan 27, 2025

A quick run of TS 5.7.3 on dom.generated.ts shows a 30% slowdown between ParseNone and ParseAll. I don't think we measured that number after the last round of performance fixes, but that's an initial target at least.

That's in a file where 49% of characters are in JSDoc. I didn't measure the increase in number of nodes.

I'm going to merge this now and re-run my AST tests afterwards -- the overhead of branch management is way lower that way.

@sandersn sandersn merged commit 7acc629 into microsoft:main Jan 27, 2025
14 checks passed
@sandersn sandersn deleted the add-jsdoc-parsing branch January 27, 2025 14:39
@sandersn
Copy link
Member Author

AST tests are still good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants