Enhance AI Scripts with Intent-Driven JSDoc Comments #7219

tobiu · tobiu · commit 421c8bdf1259 · 2025-09-20T12:02:45.000+02:00
diff --git a/buildScripts/ai/createKnowledgeBase.mjs b/buildScripts/ai/createKnowledgeBase.mjs
@@ -27,9 +27,21 @@ function createContentHash(chunk) {
 }
 
 /**
- * This script processes and unifies the framework's existing knowledge sources into a format suitable for AI consumption.
- * It uses the pre-generated 'docs/output/all.json' for API and JSDoc information, and 'learn/tree.json'
- * to parse the conceptual learning guides.
+ * This script is the first stage in the AI knowledge base pipeline: **Parse**.
+ *
+ * Its primary role is to act as a parser and compiler, reading from various source-of-truth files
+ * (JSDoc JSON output, markdown learning guides) and converting them into a unified, structured format.
+ *
+ * Key characteristics:
+ * - **Input:** Reads from `docs/output/all.json` for API data and `learn/tree.json` for the guide structure.
+ * - **Processing:** It breaks down the content into logical "chunks" (e.g., a class, a method, a section of a guide).
+ * - **Output:** It streams each chunk as a JSON object into the `dist/ai-knowledge-base.jsonl` file.
+ *   The JSONL (JSON Lines) format stores one JSON object per line, so downstream processes can
+ *   read the data line by line in a memory-efficient way.
+ *
+ * This script does NOT perform any scoring or data enrichment; its sole focus is on creating a clean,
+ * structured representation of the source knowledge.
+ *
  * @class CreateKnowledgeBase
  */
 class CreateKnowledgeBase {
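
Review note: the JSONL output described in the comment above can be illustrated with a minimal sketch. The chunk fields (`type`, `name`, `content`) and the helper names `toJsonl` and `parseJsonl` are assumptions made for illustration, not taken from the script. The sketch also builds a string in memory for brevity, whereas the script is described as streaming each chunk to the file.

```javascript
// Hypothetical chunk shape; the field names are assumptions for illustration.
const chunks = [
    {type: 'class',  name: 'Example.Base',         content: 'Base class description'},
    {type: 'method', name: 'Example.Base#destroy', content: 'Destroys the instance'}
];

// Serialize: one JSON object per line, each line newline-terminated.
function toJsonl(items) {
    return items.map(item => JSON.stringify(item) + '\n').join('');
}

// Parse: split on newlines, skip blank lines, parse each line on its own.
function parseJsonl(text) {
    return text
        .split('\n')
        .filter(line => line.trim() !== '')
        .map(line => JSON.parse(line));
}

const jsonl        = toJsonl(chunks);
const roundTripped = parseJsonl(jsonl);
```

Because every line is a complete JSON document, a consumer can process one chunk at a time (for example with Node's `readline`) without parsing the whole file at once.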
diff --git a/buildScripts/ai/embedKnowledgeBase.mjs b/buildScripts/ai/embedKnowledgeBase.mjs
@@ -7,6 +7,27 @@ import readline             from 'readline';
 
 dotenv.config();
 
+/**
+ * This script is the second stage in the AI knowledge base pipeline: **Score & Embed**.
+ *
+ * It takes the structured `ai-knowledge-base.jsonl` file generated by the `create` script
+ * and performs two critical functions:
+ *
+ * 1.  **Scoring & Enrichment:** It loads the entire knowledge base into memory to perform holistic analysis.
+ *     Its most important task is to build a class inheritance map and pre-calculate the full
+ *     `inheritanceChain` for every chunk. This is a heavy, one-time operation that saves
+ *     significant processing time during the query phase. The enriched data (e.g., the inheritance chain)
+ *     is added to each chunk.
+ *
+ * 2.  **Embedding & Storage:** It sends the content of each chunk to the Google Generative AI API
+ *     to get a vector embedding. It then "upserts" the chunk's content, its vector embedding, and all its
+ *     metadata (including the pre-calculated `inheritanceChain`) into the ChromaDB vector database.
+ *
+ * This script is intentionally memory-intensive, as it needs the full context to perform its analysis.
+ * This is a trade-off to make the query phase as fast and lightweight as possible.
+ *
+ * @class EmbedKnowledgeBase
+ */
 class EmbedKnowledgeBase {
     static async run() {
         console.log('Starting knowledge base embedding...');
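
Review note: the inheritance pre-calculation described in the comment above might look roughly like the following sketch. Only the `inheritanceChain` property name comes from the comment; the `name` and `extends` fields, the helper names, and the example classes are hypothetical.

```javascript
// Hypothetical class chunks; `name` and `extends` are assumed field names.
const classChunks = [
    {name: 'Example.Base'},
    {name: 'Example.Component', extends: 'Example.Base'},
    {name: 'Example.Button',    extends: 'Example.Component'}
];

// Map each class name to its direct parent (undefined for root classes).
function buildInheritanceMap(items) {
    const map = new Map();

    for (const item of items) {
        map.set(item.name, item.extends);
    }

    return map;
}

// Walk the parent links upward; the `seen` set guards against cyclic data.
function getInheritanceChain(className, map) {
    const chain = [];
    const seen  = new Set([className]);
    let parent  = map.get(className);

    while (parent && !seen.has(parent)) {
        chain.push(parent);
        seen.add(parent);
        parent = map.get(parent);
    }

    return chain;
}

const inheritanceMap = buildInheritanceMap(classChunks);

// Enrich every chunk once, so the query phase never has to rebuild the map.
const enriched = classChunks.map(chunk => ({
    ...chunk,
    inheritanceChain: getInheritanceChain(chunk.name, inheritanceMap)
}));
```

Building the map requires every class to be known up front, which is consistent with the comment's point that this stage loads the full knowledge base into memory.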
diff --git a/buildScripts/ai/queryKnowledgeBase.mjs b/buildScripts/ai/queryKnowledgeBase.mjs
@@ -8,6 +8,28 @@ import {hideBin}            from 'yargs/helpers';
 
 dotenv.config({quiet: true});
 
+/**
+ * This script is the final stage in the AI knowledge base pipeline: **Query**.
+ *
+ * Its purpose is to provide a fast and efficient way to search the knowledge base.
+ * It takes a user's natural language query, converts it into a vector embedding, and uses that
+ * to find the most relevant documents in the ChromaDB vector database.
+ *
+ * Key architectural features:
+ * - **Lightweight & Fast:** This script keeps per-query work small. It does NOT read any
+ *   large JSON files from the filesystem. All necessary data is retrieved directly from the database.
+ * - **Dynamic Scoring:** It applies a scoring algorithm to the results returned by the database.
+ *   This includes:
+ *     - A base score from the semantic similarity search.
+ *     - Dynamic boosts based on matching keywords from the query against the chunk's properties.
+ *     - An inheritance boost, which is calculated quickly by using the pre-computed `inheritanceChain`
+ *       stored in the metadata of each result.
+ *
+ * The design philosophy is to offload all heavy, static pre-processing to the `embed` phase,
+ * allowing this `query` phase to be as quick and responsive as possible.
+ *
+ * @class QueryKnowledgeBase
+ */
 class QueryKnowledgeBase {
     static async run(query) {
         if (!query) {
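
Review note: the dynamic scoring described in the comment above could be sketched as follows. This is not the script's algorithm. The result shape, the boost weights (+10 per keyword, +5 for inheritance), the word-length filter, and the `scoreResult` helper are all assumptions chosen to show how a base score, keyword boosts, and an inheritance boost can be combined using a pre-computed `inheritanceChain`.

```javascript
// Hypothetical scoring helper; the weights and field names are assumptions.
function scoreResult(result, queryWords, boostedClass) {
    let score = result.baseScore; // base score derived from semantic similarity

    // Keyword boost: +10 for each query word found in the chunk's name.
    const name = result.metadata.name.toLowerCase();

    for (const word of queryWords) {
        if (name.includes(word)) {
            score += 10;
        }
    }

    // Inheritance boost: +5 if the chunk's class descends from boostedClass.
    // The lookup is a plain array check because the chain is pre-computed.
    if (result.metadata.inheritanceChain.includes(boostedClass)) {
        score += 5;
    }

    return score;
}

// Hypothetical database results with pre-computed metadata.
const results = [
    {baseScore: 50, metadata: {name: 'Example.Button', inheritanceChain: ['Example.Component', 'Example.Base']}},
    {baseScore: 55, metadata: {name: 'Example.Store',  inheritanceChain: ['Example.Base']}}
];

// Drop very short words so substrings like "to" or "a" do not match by accident.
const queryWords = 'how to style a button'
    .toLowerCase()
    .split(/\s+/)
    .filter(word => word.length > 2);

const ranked = results
    .map(result => ({...result, score: scoreResult(result, queryWords, 'Example.Component')}))
    .sort((a, b) => b.score - a.score);
```

In this sketch the lower base score still ranks first once the boosts are applied, which is the kind of re-ordering the comment attributes to the query stage.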