Commit 421c8bd

Enhance AI Scripts with Intent-Driven JSDoc Comments #7219
1 parent 0a667b3 commit 421c8bd

3 files changed

Lines changed: 58 additions & 3 deletions

buildScripts/ai/createKnowledgeBase.mjs

Lines changed: 15 additions & 3 deletions
@@ -27,9 +27,21 @@ function createContentHash(chunk) {
 }
 
 /**
- * This script processes and unifies the framework's existing knowledge sources into a format suitable for AI consumption.
- * It uses the pre-generated 'docs/output/all.json' for API and JSDoc information, and 'learn/tree.json'
- * to parse the conceptual learning guides.
+ * This script is the first stage in the AI knowledge base pipeline: **Parse**.
+ *
+ * Its primary role is to act as a parser and compiler, reading from various source-of-truth files
+ * (JSDoc JSON output, markdown learning guides) and converting them into a unified, structured format.
+ *
+ * Key characteristics:
+ * - **Input:** Reads from `docs/output/all.json` for API data and `learn/tree.json` for the guide structure.
+ * - **Processing:** It breaks down the content into logical "chunks" (e.g., a class, a method, a section of a guide).
+ * - **Output:** It streams each chunk as a JSON object into the `dist/ai-knowledge-base.jsonl` file.
+ *   This JSONL (JSON Lines) format is crucial for ensuring that downstream processes can read the data
+ *   in a memory-efficient way.
+ *
+ * This script does NOT perform any scoring or data enrichment; its sole focus is on creating a clean,
+ * structured representation of the source knowledge.
+ *
  * @class CreateKnowledgeBase
  */
 class CreateKnowledgeBase {
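
The Parse stage described in this JSDoc (break sources into chunks, stream one JSON object per line) can be illustrated with a minimal sketch. It is not the code in `createKnowledgeBase.mjs`, which this commit does not show; the simplified entry shape (`kind`, `name`, `memberof`, `description`) and the helper names `toChunks`, `toJsonLines`, and `parseJsonLine` are hypothetical.

```javascript
// Hypothetical sketch of the "Parse" stage: not the real createKnowledgeBase.mjs.
// Assumed input: simplified API entries {kind, name, memberof?, description?}.

/**
 * Lazily converts API entries into flat chunks: one per class, one per method.
 * Other kinds (e.g. members) are skipped in this sketch.
 * @param {Iterable<Object>} entries
 */
function* toChunks(entries) {
    for (const entry of entries) {
        if (entry.kind !== 'class' && entry.kind !== 'function') {
            continue;
        }
        const isClass = entry.kind === 'class';
        yield {
            type     : isClass ? 'class' : 'method',
            name     : entry.name,
            // A method records the class it belongs to; a class records itself
            className: isClass ? entry.name : entry.memberof,
            content  : entry.description || ''
        };
    }
}

/**
 * Serializes chunks as JSON Lines: one JSON object per line.
 * Being a generator, it lets a caller write each line to a file stream as it
 * is produced instead of building the whole output in memory.
 * @param {Iterable<Object>} chunks
 */
function* toJsonLines(chunks) {
    for (const chunk of chunks) {
        yield JSON.stringify(chunk) + '\n';
    }
}

/**
 * A downstream reader only ever needs to parse a single line at a time.
 * @param {String} line
 */
function parseJsonLine(line) {
    return JSON.parse(line);
}

// Usage with illustrative data
const sampleEntries = [
    {kind: 'class',    name: 'Demo.Base', description: 'Base class'},
    {kind: 'function', name: 'render',    memberof: 'Demo.Base', description: 'Renders the view'},
    {kind: 'member',   name: 'id',        memberof: 'Demo.Base'}
];

for (const line of toJsonLines(toChunks(sampleEntries))) {
    process.stdout.write(line);
}
```

In a Node.js script, each yielded line could be passed to the `write()` method of a stream from `fs.createWriteStream()` rather than to `process.stdout`, so the full output never has to exist as one string.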

buildScripts/ai/embedKnowledgeBase.mjs

Lines changed: 21 additions & 0 deletions
@@ -7,6 +7,27 @@ import readline from 'readline';
 
 dotenv.config();
 
+/**
+ * This script is the second stage in the AI knowledge base pipeline: **Score & Embed**.
+ *
+ * It takes the structured `ai-knowledge-base.jsonl` file generated by the `create` script
+ * and performs two critical functions:
+ *
+ * 1. **Scoring & Enrichment:** It loads the entire knowledge base into memory to perform holistic analysis.
+ *    Its most important task is to build a class inheritance map and pre-calculate the full
+ *    `inheritanceChain` for every chunk. This is a heavy, one-time operation that saves
+ *    significant processing time during the query phase. The enriched data (e.g., the inheritance chain)
+ *    is added to each chunk.
+ *
+ * 2. **Embedding & Storage:** It sends the content of each chunk to the Google Generative AI API
+ *    to get a vector embedding. It then "upserts" the chunk's content, its vector embedding, and all its
+ *    metadata (including the pre-calculated `inheritanceChain`) into the ChromaDB vector database.
+ *
+ * This script is intentionally memory-intensive, as it needs the full context to perform its analysis.
+ * This is a trade-off to make the query phase as fast and lightweight as possible.
+ *
+ * @class EmbedKnowledgeBase
+ */
 class EmbedKnowledgeBase {
     static async run() {
         console.log('Starting knowledge base embedding...');
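
The enrichment step in this JSDoc (build an inheritance map once, then pre-calculate every chunk's `inheritanceChain`) can be sketched as below. This is a minimal illustration, not the code in `embedKnowledgeBase.mjs`: the chunk fields `type`, `name`, `className`, and `parentClassName`, and the function names `buildInheritanceMap`, `getInheritanceChain`, and `enrichChunks`, are assumptions. The embedding call and the ChromaDB upsert are left out.

```javascript
// Hypothetical sketch of the enrichment step: not the real embedKnowledgeBase.mjs.
// Assumed chunk shape: {type, name, className?, parentClassName?}

/**
 * Maps each class name to its direct parent, using the class chunks only.
 * @param {Object[]} chunks
 * @returns {Map<String, String>}
 */
function buildInheritanceMap(chunks) {
    const map = new Map();
    for (const chunk of chunks) {
        if (chunk.type === 'class' && chunk.parentClassName) {
            map.set(chunk.name, chunk.parentClassName);
        }
    }
    return map;
}

/**
 * Walks up the map and returns the ancestors, nearest first.
 * The `seen` set stops the walk if the data ever contains a cycle.
 */
function getInheritanceChain(className, map) {
    const chain = [];
    const seen  = new Set([className]);
    let parent  = map.get(className);

    while (parent && !seen.has(parent)) {
        chain.push(parent);
        seen.add(parent);
        parent = map.get(parent);
    }
    return chain;
}

/**
 * Adds an `inheritanceChain` to every chunk. Chains are cached per class, so
 * all method chunks of one class share a single computation.
 */
function enrichChunks(chunks) {
    const map   = buildInheritanceMap(chunks);
    const cache = new Map();

    return chunks.map(chunk => {
        const owner = chunk.className;
        if (!owner) {
            return {...chunk, inheritanceChain: []};
        }
        if (!cache.has(owner)) {
            cache.set(owner, getInheritanceChain(owner, map));
        }
        return {...chunk, inheritanceChain: cache.get(owner)};
    });
}

// Usage with illustrative data
const enriched = enrichChunks([
    {type: 'class',  name: 'Demo.Base',      className: 'Demo.Base'},
    {type: 'class',  name: 'Demo.Component', className: 'Demo.Component', parentClassName: 'Demo.Base'},
    {type: 'class',  name: 'Demo.Button',    className: 'Demo.Button',    parentClassName: 'Demo.Component'},
    {type: 'method', name: 'onClick',        className: 'Demo.Button'}
]);

console.log(enriched[3].inheritanceChain); // [ 'Demo.Component', 'Demo.Base' ]
```

The whole chunk list must be in memory before `buildInheritanceMap` runs, which matches the JSDoc's point that this stage is deliberately memory-intensive.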

buildScripts/ai/queryKnowledgeBase.mjs

Lines changed: 22 additions & 0 deletions
@@ -8,6 +8,28 @@ import {hideBin} from 'yargs/helpers';
 
 dotenv.config({quiet: true});
 
+/**
+ * This script is the final stage in the AI knowledge base pipeline: **Query**.
+ *
+ * Its purpose is to provide a fast and efficient way to search the knowledge base.
+ * It takes a user's natural language query, converts it into a vector embedding, and uses that
+ * to find the most relevant documents in the ChromaDB vector database.
+ *
+ * Key architectural features:
+ * - **Lightweight & Fast:** This script is designed to be extremely performant. It does NOT read any
+ *   large JSON files from the filesystem. All necessary data is retrieved directly from the database.
+ * - **Dynamic Scoring:** It applies a scoring algorithm to the results returned by the database.
+ *   This includes:
+ *   - A base score from the semantic similarity search.
+ *   - Dynamic boosts based on matching keywords from the query against the chunk's properties.
+ *   - An inheritance boost, which is calculated quickly by using the pre-computed `inheritanceChain`
+ *     stored in the metadata of each result.
+ *
+ * The design philosophy is to offload all heavy, static pre-processing to the `embed` phase,
+ * allowing this `query` phase to be as quick and responsive as possible.
+ *
+ * @class QueryKnowledgeBase
+ */
 class QueryKnowledgeBase {
     static async run(query) {
         if (!query) {
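
The dynamic scoring described in this JSDoc can be sketched as a pure function over results that have already come back from the vector search. Everything here is an assumption for illustration: the result shape (a `distance` plus a `metadata` object), converting distance to a base score with `1 / (1 + distance)`, the tokenizer, and the weights `KEYWORD_BOOST` and `INHERITANCE_BOOST` are not taken from `queryKnowledgeBase.mjs`. What the sketch does show is that the inheritance boost only reads the stored `inheritanceChain`, so no class hierarchy is rebuilt at query time.

```javascript
// Hypothetical sketch of query-time re-ranking: not the real queryKnowledgeBase.mjs.
// Hypothetical weights; the real values are not part of this commit.
const KEYWORD_BOOST     = 0.5;
const INHERITANCE_BOOST = 0.2;

// Lowercases and splits on anything that is not a letter or digit
function tokenize(text) {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

/**
 * Re-ranks vector search results, highest score first.
 * @param {String} query
 * @param {Array<{distance: Number, metadata: {name: String, className?: String, inheritanceChain?: String[]}}>} results
 */
function rankResults(query, results) {
    const terms = new Set(tokenize(query));

    return results.map(result => {
        const {name = '', className = '', inheritanceChain = []} = result.metadata;

        // Base score: smaller distance means more similar; maps distance >= 0 into (0, 1]
        let score = 1 / (1 + result.distance);

        // Keyword boost: a query term matches the chunk's own name or class name
        const ownTokens = new Set([...tokenize(name), ...tokenize(className)]);
        for (const term of terms) {
            if (ownTokens.has(term)) score += KEYWORD_BOOST;
        }

        // Inheritance boost: a query term names an ancestor class.
        // Cheap, because the chain was pre-computed during the embed phase.
        const ancestorTokens = new Set(inheritanceChain.flatMap(c => tokenize(c)));
        for (const term of terms) {
            if (ancestorTokens.has(term)) score += INHERITANCE_BOOST;
        }

        return {...result, score};
    }).sort((a, b) => b.score - a.score);
}

// Usage with illustrative data
const ranked = rankResults('button click', [
    {distance: 0.1, metadata: {name: 'render',  className: 'Demo.Base',   inheritanceChain: []}},
    {distance: 0.4, metadata: {name: 'onClick', className: 'Demo.Button', inheritanceChain: ['Demo.Component', 'Demo.Base']}}
]);

console.log(ranked.map(r => r.metadata.name)); // [ 'onClick', 'render' ]
```

Here the keyword match on `button` outweighs the closer raw distance of `render`; how the real script balances semantic similarity against keyword and inheritance boosts is not stated in the commit.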
