 # Adding a New Language to Codegraph
 
 This guide walks through every file you need to touch when adding support for a
-new programming language. It covers both the **WASM engine** (main branch) and
-the **native Rust engine** (`feat/rust-core` branch).
+new programming language.
 
 ---
 
@@ -20,6 +19,32 @@ queries are engine-agnostic. When adding a new language you implement the
 extraction logic **twice** — once in JavaScript (WASM) and once in Rust
 (native) — and a parity test guarantees they agree.
 
+### The LANGUAGE_REGISTRY
+
+`LANGUAGE_REGISTRY` in `src/parser.js` is the **single source of truth** for all
+supported languages. Each entry declares:
+
+```js
+{
+  id: 'go',                            // Language identifier
+  extensions: ['.go'],                 // File extensions (auto-derives EXTENSIONS)
+  grammarFile: 'tree-sitter-go.wasm',  // WASM grammar filename
+  extractor: extractGoSymbols,         // Extraction function reference
+  required: false,                     // true = crash if missing; false = skip gracefully
+}
+```
+
+Adding a language to the WASM engine requires **one registry entry** plus an
+extractor function. Everything else — extension routing, parser loading, dispatch
+— is automatic.
+
+- `SUPPORTED_EXTENSIONS` (re-exported as `EXTENSIONS` in `constants.js`) is
+  **derived** from the registry. You never edit it manually.
+- `createParsers()` iterates the registry and builds a `Map<id, Parser>`.
+- `getParser()` uses an extension→registry lookup map (`_extToLang`).
+- `wasmExtractSymbols()` calls `entry.extractor(tree, filePath)` — no ternary chains.
+- `parseFilesAuto()` in `builder.js` handles all dispatch — no per-language routing needed.
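Below is a minimal sketch of how the registry bullets fit together. It is plain JavaScript, so it runs without tree-sitter. It is not codegraph's actual `src/parser.js`. The stub extractors, the `extname` helper and the exact shape of the `_extToLang` map are assumptions made for illustration.

```javascript
// Stub extractors standing in for the real ones (hypothetical, illustration only).
const extractGoSymbols = (tree, filePath) => ({ lang: 'go', filePath });
const extractRustSymbols = (tree, filePath) => ({ lang: 'rust', filePath });

// One entry per language, using the entry shape from the registry example.
const LANGUAGE_REGISTRY = [
  { id: 'go', extensions: ['.go'], grammarFile: 'tree-sitter-go.wasm', extractor: extractGoSymbols, required: false },
  { id: 'rust', extensions: ['.rs'], grammarFile: 'tree-sitter-rust.wasm', extractor: extractRustSymbols, required: false },
];

// Derived, never hand-edited: the union of every entry's extensions.
const SUPPORTED_EXTENSIONS = new Set(LANGUAGE_REGISTRY.flatMap((entry) => entry.extensions));

// Extension -> registry entry, built once.
const _extToLang = new Map();
for (const entry of LANGUAGE_REGISTRY) {
  for (const ext of entry.extensions) _extToLang.set(ext, entry);
}

// Last extension of the file name; '' when there is none or for dotfiles
// (same convention as Node's path.extname).
function extname(filePath) {
  const base = filePath.slice(filePath.lastIndexOf('/') + 1);
  const dot = base.lastIndexOf('.');
  return dot <= 0 ? '' : base.slice(dot);
}

// Table-driven dispatch: look up the entry and call its extractor.
function wasmExtractSymbols(tree, filePath) {
  const entry = _extToLang.get(extname(filePath));
  return entry ? entry.extractor(tree, filePath) : null;
}
```

With this shape, a new language is one more object in `LANGUAGE_REGISTRY`, and the extension set, the lookup map and dispatch all pick it up. A last-dot lookup cannot tell a multi-part extension such as `.d.ts` apart from `.ts`. Check the source to see how the real lookup handles those.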
+
 ---
 
 ## Symbol Model
@@ -40,12 +65,16 @@ FileSymbols {
 
 | Structure | Fields | Notes |
 |-----------|--------|-------|
-| `Definition` | `name`, `kind`, `line`, `endLine`, `decorators?` | `kind` ∈ `function`, `method`, `class`, `interface`, `type` |
+| `Definition` | `name`, `kind`, `line`, `endLine`, `decorators?` | `kind` ∈ `SYMBOL_KINDS` (see below) |
 | `Call` | `name`, `line`, `dynamic?` | |
 | `Import` | `source`, `names[]`, `line`, `<lang>Import?` | Set a language flag like `cInclude: true` |
 | `ClassRelation` | `name`, `extends?`, `implements?`, `line` | |
 | `ExportInfo` | `name`, `kind`, `line` | |
 
+**Symbol kinds:** `function`, `method`, `class`, `interface`, `type`, `struct`,
+`enum`, `trait`, `record`, `module`. Use the language's native kind (e.g. Go
+structs → `struct`, Rust traits → `trait`, Ruby modules → `module`).
+
 Methods inside a class use the `ClassName.methodName` naming convention.
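To make the field table concrete, here is a hypothetical `FileSymbols` value for a small Go file, plus a check that every definition uses an allowed kind. The container field names (`definitions`, `calls`, `imports`, `classes`, `exports`) are assumptions, because the `FileSymbols` declaration itself is outside this excerpt. The kind list is copied from the **Symbol kinds** paragraph. `invalidKinds` is an illustrative helper, not part of codegraph.

```javascript
// Kinds transcribed from the "Symbol kinds" list. The real SYMBOL_KINDS
// definition is not shown in this excerpt, so this copy is an assumption.
const SYMBOL_KINDS = ['function', 'method', 'class', 'interface', 'type',
  'struct', 'enum', 'trait', 'record', 'module'];

// Hypothetical FileSymbols value for a tiny Go file (illustrative data only).
// Container field names are assumed; entry fields follow the table above.
const symbols = {
  definitions: [
    { name: 'Server', kind: 'struct', line: 3, endLine: 5 },
    { name: 'Server.Start', kind: 'method', line: 7, endLine: 10 }, // ClassName.methodName
    { name: 'main', kind: 'function', line: 12, endLine: 15 },
  ],
  calls: [{ name: 'Start', line: 14 }],
  imports: [{ source: 'fmt', names: ['fmt'], line: 1 }],
  classes: [],
  exports: [{ name: 'Server', kind: 'struct', line: 3 }],
};

// Returns the definitions whose kind is not allowed (empty array = all valid).
function invalidKinds(fileSymbols) {
  const allowed = new Set(SYMBOL_KINDS);
  return fileSymbols.definitions.filter((def) => !allowed.has(def.kind));
}
```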
 
 ---
@@ -85,50 +114,19 @@ Build the WASM binary:
 npm run build:wasm
 ```
 
-This generates `grammars/tree-sitter-<lang>.wasm`. Commit this file.
-
-### 3. `src/constants.js` — register file extensions
-
-```js
-export const EXTENSIONS = new Set([
-  // ... existing ...
-  '.<ext>',  // e.g. '.c', '.h'
-]);
-```
-
-### 4. `src/parser.js` — WASM extraction (3 changes)
-
-#### 4a. Load the grammar in `createParsers()`
-
-Follow the graceful-fallback pattern used by every optional language:
-
-```js
-let <lang>Parser = null;
-try {
-  const <Lang> = await Language.load(grammarPath('tree-sitter-<lang>.wasm'));
-  <lang>Parser = new Parser();
-  <lang>Parser.setLanguage(<Lang>);
-} catch (e) {
-  warn(`<Lang> parser failed to initialize: ${e.message}. <Lang> files will be skipped.`);
-}
-```
-
-Return it from the object: `return { ..., <lang>Parser };`
+This generates `grammars/tree-sitter-<lang>.wasm` (gitignored — built from
+devDeps on `npm install`).
 
-#### 4b. Route extensions in `getParser()`
+### 3. `src/parser.js` — add extractor and registry entry
 
-```js
-if ((filePath.endsWith('.<ext>')) && parsers.<lang>Parser)
-  return parsers.<lang>Parser;
-```
+This is the only source file where you need to make changes on the JS side.
+Two things to do:
 
-> Place this *before* the `return null;` at the end of `getParser()`.
+#### 3a. Create `extract<Lang>Symbols(tree, filePath)`
 
-#### 4c. Create `extract<Lang>Symbols(tree, filePath)`
-
-This is where the real work happens. Write a recursive AST walker that matches
-tree-sitter node types for your language. Copy the pattern from an existing
-extractor like `extractGoSymbols` or `extractRustSymbols`:
+Write a recursive AST walker that matches tree-sitter node types for your
+language. Copy the pattern from an existing extractor like `extractGoSymbols` or
+`extractRustSymbols`:
 
 ```js
 /**
@@ -197,53 +195,53 @@ export function extract<Lang>Symbols(tree, filePath) {
 to explore AST node types for your language. Paste sample code and inspect the
 tree to find the right `node.type` strings.
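As a companion to step 3a, here is a minimal recursive walker. It assumes the web-tree-sitter node interface: `type`, `text`, `children`, `startPosition`, `endPosition` and `childForFieldName()`. The node types `function_declaration` and `call_expression` and the field names `name` and `function` are borrowed from tree-sitter-go purely as an example, and your grammar's strings will differ. `extractToySymbols` is a hypothetical name standing in for `extract<Lang>Symbols`, and the result's container fields are assumed.

```javascript
// Sketch of a recursive extractor. Node types and field names come from
// tree-sitter-go for illustration; replace them with your grammar's strings.
// filePath is unused here but kept to match the extractor signature.
function extractToySymbols(tree, filePath) {
  const symbols = { definitions: [], calls: [], imports: [], classes: [], exports: [] };

  function walk(node) {
    switch (node.type) {
      case 'function_declaration': {
        const nameNode = node.childForFieldName('name');
        if (nameNode) {
          symbols.definitions.push({
            name: nameNode.text,
            kind: 'function',
            line: node.startPosition.row + 1, // tree-sitter rows are 0-based
            endLine: node.endPosition.row + 1,
          });
        }
        break;
      }
      case 'call_expression': {
        const fnNode = node.childForFieldName('function');
        if (fnNode) symbols.calls.push({ name: fnNode.text, line: node.startPosition.row + 1 });
        break;
      }
      default:
        break;
    }
    for (const child of node.children) walk(child); // recurse into every child
  }

  walk(tree.rootNode);
  return symbols;
}
```

In a test, `tree` would come from `parsers.get('<lang>').parse(code)`, as in the parser-test helper under step 10. The `+ 1` converts tree-sitter's 0-based rows to 1-based line numbers, the same convention the Rust helpers `start_line`/`end_line` use.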
 
-#### 4d. Add WASM dispatch in `wasmExtractSymbols()` (feat/rust-core only)
+#### 3b. Add an entry to `LANGUAGE_REGISTRY`
 
-On the `feat/rust-core` branch, `parser.js` has a unified `wasmExtractSymbols`
-helper. Add your language before the final `return extractSymbols(...)`:
+Add your language to the `LANGUAGE_REGISTRY` array in `src/parser.js`:
 
 ```js
-if (filePath.endsWith('.<ext>')) return extract<Lang>Symbols(tree, filePath);
+{
+  id: '<lang>',
+  extensions: ['.<ext>'],
+  grammarFile: 'tree-sitter-<lang>.wasm',
+  extractor: extract<Lang>Symbols,
+  required: false,
+},
 ```
 
-### 5. `src/builder.js` — route parsing (main branch only)
-
-On `main`, the builder dispatches manually. On `feat/rust-core` this is
-replaced by `parseFilesAuto`, so **skip this step on the rust branch**.
+Set `required: false` so codegraph still works when the WASM grammar isn't
+available (e.g. in CI without `npm install`). Only JS/TS/TSX are `required: true`.
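The `required` flag suggests a loading loop along the following lines. In this sketch, a hypothetical async `loadParser(grammarFile)` is passed in to replace the real web-tree-sitter loading, so the control flow runs on its own. The warning text is modeled on the per-language message this commit removes. Do not read any of it as the actual body of `createParsers()`.

```javascript
// Sketch of registry-driven parser creation with graceful fallback.
// loadParser is a hypothetical injected async loader: it resolves to a
// parser object or rejects when the grammar cannot be loaded.
async function createParsers(registry, loadParser, warn = console.warn) {
  const parsers = new Map(); // id -> parser
  for (const entry of registry) {
    try {
      parsers.set(entry.id, await loadParser(entry.grammarFile));
    } catch (err) {
      // Required grammars (JS/TS/TSX per the guide) abort startup.
      if (entry.required) throw err;
      // Optional grammars are skipped: no Map entry, files of that language get no parser.
      warn(`${entry.id} parser failed to initialize: ${err.message}. ${entry.id} files will be skipped.`);
    }
  }
  return parsers;
}
```

Under this design, a missing optional grammar produces one warning and no `Map` entry. That is why a test helper should check the result of `parsers.get('<lang>')` before parsing.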
 
-**main branch** — add your language to the import and ternary chain:
+That's it for the WASM engine. The registry automatically:
+- Adds `.<ext>` to `SUPPORTED_EXTENSIONS` (and `EXTENSIONS` in `constants.js`)
+- Registers the parser in `createParsers()`
+- Routes `getParser()` calls via the extension map
+- Dispatches to your extractor in `wasmExtractSymbols()`
+- Handles `builder.js` routing via `parseFilesAuto()`
 
-```js
-// Import
-import { ..., extract<Lang>Symbols } from './parser.js';
+**You do not need to edit `constants.js` or `builder.js`.**
 
-// In the parsing loop, add before `extractSymbols(tree, filePath)`
-const is<Lang> = filePath.endsWith('.<ext>');
-// ... add to the ternary chain:
-  : is<Lang> ? extract<Lang>Symbols(tree, filePath)
-```
-
-### 6. `src/parser.js` — update `normalizeNativeSymbols` (feat/rust-core only)
+### 4. `src/parser.js` — update `normalizeNativeSymbols` (if needed)
 
 If your language's imports use a language-specific flag (e.g. `c_include`), add
-the camelCase mapping:
+the camelCase mapping in `normalizeNativeSymbols`:
 
 ```js
 <lang>Import: i.<lang>Import ?? i.<lang>_import,
 ```
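The `??` mapping lets the JS side accept a language flag under either its camelCase or its snake_case name. The sketch below is self-contained and uses a C-style `cInclude` / `c_include` flag. `normalizeNativeImports` is a hypothetical name, and the object shape is assumed from the `Import` row of the symbol-model table. Treat it as an illustration of the pattern, not as codegraph's `normalizeNativeSymbols`.

```javascript
// Hypothetical helper applying the step 4 mapping to a list of native imports.
// Accepts the flag as cInclude or c_include and always emits camelCase.
function normalizeNativeImports(nativeImports) {
  return nativeImports.map((i) => ({
    source: i.source,
    names: i.names ?? [],
    line: i.line,
    // `??` (not `||`) keeps an explicit false rather than falling through to undefined.
    cInclude: i.cInclude ?? i.c_include,
  }));
}
```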
 
 ---
 
-## Native Engine (feat/rust-core branch)
+## Native Engine (Rust)
 
-### 7. `crates/codegraph-core/Cargo.toml` — add the Rust tree-sitter crate
+### 5. `crates/codegraph-core/Cargo.toml` — add the Rust tree-sitter crate
 
 ```toml
 [dependencies]
 tree-sitter-<lang> = "0.x"
 ```
 
-### 8. `crates/codegraph-core/src/parser_registry.rs` — register the language
+### 6. `crates/codegraph-core/src/parser_registry.rs` — register the language
 
 Three changes in this file:
 
@@ -274,7 +272,7 @@ impl LanguageKind {
 }
 ```
 
-### 9. `crates/codegraph-core/src/extractors/<lang>.rs` — implement the Rust extractor
+### 7. `crates/codegraph-core/src/extractors/<lang>.rs` — implement the Rust extractor
 
 Create a new file following the pattern in `go.rs` or `rust_lang.rs`:
 
@@ -332,7 +330,7 @@ fn walk_node(node: &Node, source: &[u8], symbols: &mut FileSymbols) {
 | `named_child_text(&node, "field", source)` | Shorthand for field text |
 | `start_line(&node)` / `end_line(&node)` | 1-based line numbers |
 
-### 10. `crates/codegraph-core/src/extractors/mod.rs` — wire it up
+### 8. `crates/codegraph-core/src/extractors/mod.rs` — wire it up
 
 ```rust
 // 1. Declare module
@@ -347,7 +345,7 @@ pub fn extract_symbols(...) -> FileSymbols {
 }
 ```
 
-### 11. `crates/codegraph-core/src/types.rs` — add language flag (if needed)
+### 9. `crates/codegraph-core/src/types.rs` — add language flag (if needed)
 
 If your imports need a language-specific flag, add it to the `Import` struct:
 
@@ -361,7 +359,7 @@ And update `Import::new()` to default it to `None`.
 
 ## Tests
 
-### 12. `tests/parsers/<lang>.test.js` — WASM parser tests
+### 10. `tests/parsers/<lang>.test.js` — WASM parser tests
 
 Follow the pattern from `tests/parsers/go.test.js`:
 
@@ -377,7 +375,7 @@ describe('<Lang> parser', () => {
   });
 
   function parse<Lang>(code) {
-    const parser = parsers.<lang>Parser;
+    const parser = parsers.get('<lang>');
     if (!parser) throw new Error('<Lang> parser not available');
     const tree = parser.parse(code);
     return extract<Lang>Symbols(tree, 'test.<ext>');
@@ -394,6 +392,9 @@ describe('<Lang> parser', () => {
 });
 ```
 
+> **Note:** `parsers` is a `Map` — use `parsers.get('<lang>')`, not
+> `parsers.<lang>Parser`.
+
 **Recommended test cases:**
 - Function definitions (regular, with parameters)
 - Class/struct/enum definitions
@@ -403,7 +404,7 @@ describe('<Lang> parser', () => {
 - Type definitions / aliases
 - Forward declarations (if applicable)
 
-### 13. Parity tests (feat/rust-core only)
+### 11. Parity tests — native vs WASM
 
 Add test snippets to `tests/engines/parity.test.js` to verify the native and
 WASM extractors produce identical output for your language.
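How strictly "identical" is enforced depends on the parity test itself. One possible approach is sketched here as an assumption, not as the contents of `tests/engines/parity.test.js`. It canonicalizes both results before comparing them: object keys are sorted, and definitions and calls are sorted by line, then name. That way, the two walkers may emit symbols in a different order. `canonicalize`, `normalizeForParity` and `sameSymbols` are hypothetical helpers, and the `definitions`/`calls` field names are assumed.

```javascript
// Recursively copy a value with object keys in sorted order, so that key
// insertion order does not affect JSON.stringify output.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value && typeof value === 'object') {
    const out = {};
    for (const key of Object.keys(value).sort()) out[key] = canonicalize(value[key]);
    return out;
  }
  return value;
}

const byLineThenName = (a, b) => a.line - b.line || String(a.name).localeCompare(String(b.name));

// Canonical copy with order-insensitive symbol lists (inputs are not mutated).
function normalizeForParity(fileSymbols) {
  const copy = canonicalize(fileSymbols);
  for (const key of ['definitions', 'calls']) {
    if (Array.isArray(copy[key])) copy[key].sort(byLineThenName);
  }
  return copy;
}

function sameSymbols(nativeResult, wasmResult) {
  return JSON.stringify(normalizeForParity(nativeResult)) === JSON.stringify(normalizeForParity(wasmResult));
}
```

Sorting deliberately loosens the comparison. If the engines are also expected to agree on ordering, drop the sort step so that ordering differences are reported too.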
@@ -422,27 +423,34 @@ npx vitest run tests/parsers/<lang>.test.js
 # 3. Run the full test suite
 npm test
 
-# 4. (feat/rust-core) Build native and test parity
+# 4. Build native and test parity
 cd crates/codegraph-core && cargo build
 npx vitest run tests/engines/parity.test.js
+
+# 5. Test on a real project
+codegraph build /path/to/a/<lang>/project
+codegraph map
+codegraph fn someFunction
 ```
 
 ---
 
 ## File Checklist Summary
 
-| # | File | Branch | Action |
+| # | File | Engine | Action |
 |---|------|--------|--------|
-| 1 | `package.json` | both | Add `tree-sitter-<lang>` devDependency |
-| 2 | `scripts/build-wasm.js` | both | Add grammar entry |
-| 3 | `grammars/tree-sitter-<lang>.wasm` | both | Generated by `npm run build:wasm` |
-| 4 | `src/constants.js` | both | Add file extensions |
-| 5 | `src/parser.js` | both | Load grammar, route parser, add `extract<Lang>Symbols()`, add WASM dispatch |
-| 6 | `src/builder.js` | main only | Import + ternary routing (not needed on rust branch) |
-| 7 | `tests/parsers/<lang>.test.js` | both | WASM parser tests |
-| 8 | `crates/codegraph-core/Cargo.toml` | rust | Add tree-sitter crate |
-| 9 | `crates/.../parser_registry.rs` | rust | Register enum + extension + grammar |
-| 10 | `crates/.../extractors/<lang>.rs` | rust | Implement `SymbolExtractor` trait |
-| 11 | `crates/.../extractors/mod.rs` | rust | Declare module + dispatch arm |
-| 12 | `crates/.../types.rs` | rust | Add language flag to `Import` (if needed) |
-| 13 | `tests/engines/parity.test.js` | rust | Cross-engine validation snippets |
+| 1 | `package.json` | WASM | Add `tree-sitter-<lang>` devDependency |
+| 2 | `scripts/build-wasm.js` | WASM | Add grammar entry to array |
+| 3 | `src/parser.js` | WASM | Create `extract<Lang>Symbols()` + add `LANGUAGE_REGISTRY` entry |
+| 4 | `src/parser.js` | WASM | Update `normalizeNativeSymbols` (if language flag needed) |
+| 5 | `crates/codegraph-core/Cargo.toml` | Native | Add tree-sitter crate |
+| 6 | `crates/.../parser_registry.rs` | Native | Register enum + extension + grammar |
+| 7 | `crates/.../extractors/<lang>.rs` | Native | Implement `SymbolExtractor` trait |
+| 8 | `crates/.../extractors/mod.rs` | Native | Declare module + dispatch arm |
+| 9 | `crates/.../types.rs` | Native | Add language flag to `Import` (if needed) |
+| 10 | `tests/parsers/<lang>.test.js` | WASM | Parser extraction tests |
+| 11 | `tests/engines/parity.test.js` | Both | Cross-engine validation snippets |
+
+**Files you do NOT need to touch:**
+- `src/constants.js` — `EXTENSIONS` is derived from the registry automatically
+- `src/builder.js` — `parseFilesAuto()` uses the registry, no manual routing