- 
                Notifications
    You must be signed in to change notification settings 
- Fork 274
metadata-keyword-generator #1408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
         cpengilly
  
      
      
      commented
      
            cpengilly
  
      
      
      commented
        Feb 22, 2025 
      
    
  
- metadata keyword generator for page front matter
- has some automation functionality built in
- set to only run on active files in the PR to allow us to slowly update using smaller PRs
- provides suggestions in coderabbit
- details are in notes/metadata-update.md
- metadata keyword generator for page front matter - has some automation functionality built in - set to only run on active files in the PR to allow us to slowly update using smaller PRs - provides suggestions in coderabbit - details are in notes/metadata-update.md
| ✅ Deploy Preview for docs-optimism ready!
 To edit notification comments on pull requests, go to your Netlify site configuration. | 
| Warning Rate limit exceeded@cpengilly has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 13 minutes and 10 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the  We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
 📝 WalkthroughWalkthroughThis pull request introduces enhancements to the metadata management and linting processes for Markdown files. The CircleCI configuration now includes an improved  Sequence Diagram(s)sequenceDiagram
  participant CircleCI
  participant Lint_Job
  participant GitHub_API
  participant pnpm
  CircleCI->>Lint_Job: Trigger lint job (if CIRCLE_PULL_REQUEST set)
  Lint_Job->>Lint_Job: Extract PR number and check for changed files
  Lint_Job->>GitHub_API: Request list of changed `.mdx` files
  GitHub_API-->>Lint_Job: Return list of changed files
  Lint_Job->>Lint_Job: Export CHANGED_FILES variable
  Lint_Job->>pnpm: Run Markdown linting and execute `pnpm validate-pr-metadata`
sequenceDiagram
  participant User
  participant MetadataCLI
  participant FileSystem
  participant Analyzer
  participant Manager
  User->>MetadataCLI: Execute metadata batch CLI (with options)
  MetadataCLI->>FileSystem: Discover .mdx files matching pattern
  FileSystem-->>MetadataCLI: Return list of files
  MetadataCLI->>Analyzer: Analyze and update frontmatter metadata for each file
  Analyzer-->>MetadataCLI: Provide metadata analysis results
  MetadataCLI->>Manager: Validate and update metadata in files
  Manager-->>MetadataCLI: Return update summary and validation status
  MetadataCLI->>User: Output processing summary and errors (if any)
Possibly related PRs
 Suggested reviewers
 Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit: 
 Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
 Other keywords and placeholders
 Documentation and Community
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (18)
utils/metadata-manager.ts (2)
16-17: Consider using optional chaining for safer checks.
The condition(metadata.content && metadata.content.includes('<Cards>'))can be replaced bymetadata.content?.includes('<Cards>')to avoid potential runtime errors.🧰 Tools
🪛 Biome (1.9.4)
[error] 17-17: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
13-56: Add unit tests for coverage.
ThevalidateMetadatafunction is central to enforcing correct metadata schema. Consider adding dedicated tests to confirm that missing fields, invalid personas, or empty categories are appropriately flagged.Do you want me to generate a skeleton test suite for this function?
🧰 Tools
🪛 Biome (1.9.4)
[error] 17-17: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
utils/metadata-analyzer.ts (4)
128-504: Reduce complexity indetectCategories.
This function relies on numerousifblocks to handle diverse cases, making it tricky to maintain. Splitting this logic into smaller, focused helper functions or adopting a configuration-driven approach may improve readability and flexibility.
509-530: Consider dynamic weighting or a scoring config.
Hardcoded counts incountReferenceSignalscan be brittle. Abstracting the scoring logic into a data-driven model (e.g., external config) makes it easier to fine-tune detection without changing the code.
567-627: Clear logic for content type detection.
The checks for landing pages, references, and tutorials are concise and well-structured. However, there are repeated references to specialized paths likeovervieworindex. Consolidating these into a single helper method could streamline maintenance.
674-683: Offer a structured output format.
Currently,printSummaryprints plain text. If you plan to consume these summaries in other tools or scripts, consider a machine-readable output format such as JSON.utils/types/metadata-types.ts (2)
9-10: Remove or guard debug logging in production.
Excessive console output, especially logging entire config objects, can clutter logs. Consider wrapping this in a debug flag or removing it in production builds.
55-62: Streamline checks with optional chaining.
You can simplify these nested checks by consistently using optional chaining (e.g.,config?.metadata_rules?.persona?.validation_rules?.[0]?.enum). This reduces verbosity and risk of runtime errors.🧰 Tools
🪛 Biome (1.9.4)
[error] 56-58: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
utils/metadata-batch-cli.ts (2)
34-86: Remove unnecessary continue statements.The
continuestatements on lines 62 and 78 are unnecessary as they are the last statements in their respective blocks.Apply this diff to improve code clarity:
} catch (e) { - continue }🧰 Tools
🪛 Biome (1.9.4)
[error] 62-62: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
[error] 78-78: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
123-208: Consider breaking down the main function for better maintainability.The main function is well-implemented but could be more maintainable by extracting the summary generation and printing logic into separate functions.
Consider refactoring like this:
+async function generateSummary(file: string, dryRun: boolean, verbose: boolean): Promise<ProcessingSummary> { + const result = await updateFrontmatter(file, dryRun, verbose) + const relativePath = path.relative(process.cwd(), file) + + const hasUncertainCategories = + !result.categories.length || + (result.categories.length === 1 && result.categories[0] === 'protocol') + + return { + path: relativePath, + categories: result.categories, + uncertainCategories: hasUncertainCategories && !result.isImported, + contentType: result.contentType, + isImported: result.isImported + } +} + +function printSummaries(summaries: ProcessingSummary[], dryRun: boolean): void { + console.log('\nProcessing Summary:') + console.log('==================') + + summaries.forEach(summary => { + console.log(`\n📄 ${colors.blue}${summary.path}${colors.reset}`) + console.log(` Type: ${summary.contentType}`) + if (summary.isImported) { + console.log(' Status: Imported content') + } + console.log(` Categories: ${summary.categories.join(', ') || 'none'}`) + + if (summary.uncertainCategories) { + console.log(` ${colors.yellow}⚠️ Categories may need manual review${colors.reset}`) + } + }) + + const needsReview = summaries.filter(s => s.uncertainCategories).length + const importedCount = summaries.filter(s => s.isImported).length + + console.log('\n=================') + console.log('Final Summary:') + console.log(`${colors.green}✓ Processed ${summaries.length} files${colors.reset}`) + if (importedCount > 0) { + console.log(`ℹ️ ${importedCount} imported files`) + } + if (needsReview > 0) { + console.log(`${colors.yellow}⚠️ ${needsReview} files may need category review${colors.reset}`) + } + if (dryRun) { + console.log(`${colors.blue}(Dry run - no changes made)${colors.reset}`) + } +}.circleci/config.yml (1)
74-81: New Step: Get Changed FilesThe new step to extract changed files from a pull request is well implemented. It correctly checks for the presence of the
CIRCLE_PULL_REQUESTvariable, extracts the PR number, and filters for.mdxfiles.Suggestion: It would be beneficial to add a brief comment explaining the rationale or handling possible edge cases (e.g., if the API call fails) in the command block.
notes/metadata-update.md (3)
13-27: Using the Scripts Section: List IndentationThe “Using the Scripts” section is informative; however, there are some minor markdown lint issues—specifically, the unordered list items (e.g., on lines 16 and 20) appear to have inconsistent indentation.
Suggestion: Adjust the indentation to conform with markdownlint MD007 guidelines for cleaner formatting.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
20-20: Unordered list indentation
Expected: 0; Actual: 3(MD007, ul-indent)
24-24: Unordered list indentation
Expected: 0; Actual: 3(MD007, ul-indent)
40-90: Configuration Files Block FormattingThis section, which introduces the
keywords.config.yamlsample, is very detailed and useful.Suggestion: Please check for any trailing whitespace (as flagged by static analysis) and remove them to ensure consistency. A minor improvement would be to add a comment above the YAML code block to indicate that it is an example configuration.
181-195: Automated PR Checks – CodeRabbit Review SectionThe “Automated PR Checks” section is very thorough. In the “CodeRabbit Review” bullet, there is a slight repetition with verbs "Reviews" and "Checks."
Suggestion: Consider rephrasing to prevent redundant language, e.g., "Validates that required fields are present in frontmatter" or similar, to improve clarity.
🧰 Tools
🪛 LanguageTool
[grammar] ~191-~191: You’ve repeated a verb. Did you mean to only write one of them?
Context: ... validate-pr-metadata` 2. CodeRabbit Review * Reviews frontmatter in modified files * Chec...(REPEATED_VERBS)
keywords.config.yaml (4)
1-17: Metadata Configuration: Persona SectionThe persona section is clear and enforces that at least one valid persona is selected. The structure and descriptions are effective in communicating the rules.
Note: Be sure to remove any trailing whitespaces from these lines (as indicated by static analysis) to maintain a clean YAML file.
18-31: Content Type Section – Spelling and ClarityThe content type block correctly lists all valid content types. However, on line 25, the comment “concept exlainers” contains a typo; it should be “concept explainers.”
45-90: Categories Section Formatting and Trailing WhitespaceThis section is comprehensive with a long list of valid category values. The structure is clear; however, multiple lines in this block have trailing spaces (as pointed out by YAMLlint).
Suggestion: Remove the trailing whitespace from all affected lines to adhere to formatting standards. For example, you can update lines such as 59, 60, 63, 72, etc., by trimming the extra spaces.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 59-59: trailing spaces
(trailing-spaces)
[error] 60-60: trailing spaces
(trailing-spaces)
[error] 63-63: trailing spaces
(trailing-spaces)
[error] 70-70: trailing spaces
(trailing-spaces)
[error] 72-72: trailing spaces
(trailing-spaces)
[error] 79-79: trailing spaces
(trailing-spaces)
180-237: Timeframe Section – Regex and File EndingThe timeframe section covers various versioning and date formats well. A couple of points to consider:
Season Format Regex:
The current pattern on line 203 is^S[6-9]|S[1-9][0-9]+$. This may not be fully anchored as intended. Wrapping the alternatives in parentheses (i.e.^(S[6-9]|S[1-9][0-9]+)$) would ensure that the entire string is validated correctly.
Newline at End-of-File:
The static analysis also flagged that there is no new line character at the end of the file. Please add a newline at the end.🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 181-181: trailing spaces
(trailing-spaces)
[error] 237-237: no new line character at the end of file
(new-line-at-end-of-file)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- pnpm-lock.yamlis excluded by- !**/pnpm-lock.yaml
📒 Files selected for processing (11)
- .circleci/config.yml(1 hunks)
- .coderabbit.yaml(1 hunks)
- keywords.config.yaml(1 hunks)
- notes/README.md(1 hunks)
- notes/metadata-update.md(1 hunks)
- package.json(4 hunks)
- tsconfig.json(2 hunks)
- utils/metadata-analyzer.ts(1 hunks)
- utils/metadata-batch-cli.ts(1 hunks)
- utils/metadata-manager.ts(1 hunks)
- utils/types/metadata-types.ts(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- notes/README.md
🧰 Additional context used
🪛 YAMLlint (1.35.1)
keywords.config.yaml
[error] 59-59: trailing spaces
(trailing-spaces)
[error] 60-60: trailing spaces
(trailing-spaces)
[error] 63-63: trailing spaces
(trailing-spaces)
[error] 70-70: trailing spaces
(trailing-spaces)
[error] 72-72: trailing spaces
(trailing-spaces)
[error] 79-79: trailing spaces
(trailing-spaces)
[error] 92-92: trailing spaces
(trailing-spaces)
[error] 103-103: trailing spaces
(trailing-spaces)
[error] 111-111: trailing spaces
(trailing-spaces)
[error] 119-119: trailing spaces
(trailing-spaces)
[error] 135-135: trailing spaces
(trailing-spaces)
[error] 146-146: trailing spaces
(trailing-spaces)
[error] 159-159: trailing spaces
(trailing-spaces)
[error] 173-173: trailing spaces
(trailing-spaces)
[error] 181-181: trailing spaces
(trailing-spaces)
[error] 237-237: no new line character at the end of file
(new-line-at-end-of-file)
🪛 LanguageTool
notes/metadata-update.md
[uncategorized] ~175-~175: Loose punctuation mark.
Context: ...ion Files  * utils/metadata-manager.ts: Main metadata management system * `util...
(UNLIKELY_OPENING_PUNCTUATION)
[grammar] ~191-~191: You’ve repeated a verb. Did you mean to only write one of them?
Context: ... validate-pr-metadata`  2. CodeRabbit Review    * Reviews frontmatter in modified files    * Chec...
(REPEATED_VERBS)
🪛 markdownlint-cli2 (0.17.2)
notes/metadata-update.md
20-20: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
24-24: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
🪛 Biome (1.9.4)
utils/metadata-batch-cli.ts
[error] 62-62: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
[error] 78-78: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
utils/metadata-manager.ts
[error] 17-17: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
utils/types/metadata-types.ts
[error] 56-58: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Redirect rules - docs-optimism
- GitHub Check: Header rules - docs-optimism
- GitHub Check: Pages changed - docs-optimism
🔇 Additional comments (12)
utils/metadata-manager.ts (2)
130-159: Validate environment variable usage.
CHANGED_FILESmight be undefined or empty. Confirm that the pipeline sets this variable reliably, and consider handling undefined cases gracefully.
160-168: Great CLI integration!
The inclusion of a direct CLI option via--pris convenient for automated CI usage, and the fallback toprocess.exitensures a clear pass/fail outcome.utils/types/metadata-types.ts (1)
68-71: Consider dynamic reload of config.
These constants are loaded once at startup. Ifkeywords.config.yamlis updated, your service won't see changes unless it's restarted. Document this or implement a reloading mechanism if live updates are needed.utils/metadata-batch-cli.ts (3)
1-9: LGTM! Well-structured imports and script setup.The script correctly uses ES module imports and includes all necessary dependencies for file operations and metadata management.
10-32: LGTM! Well-defined interfaces and constants.The interfaces are properly typed, and the color constants improve code readability and maintainability.
88-121: LGTM! Well-implemented frontmatter update function.The function has proper error handling, supports dry run and verbose modes, and returns a well-defined type.
tsconfig.json (1)
9-9: LGTM! Proper TypeScript configuration for ES modules.The changes correctly enable ES module support and synthetic default imports, which are necessary for the new metadata management scripts.
Also applies to: 12-12, 19-19, 29-33
.coderabbit.yaml (1)
13-40: LGTM! Comprehensive metadata validation requirements.The instructions clearly define the required metadata fields for both regular and landing pages, and provide helpful guidance for fixing missing metadata.
package.json (1)
5-5: LGTM! Well-configured package for metadata management.The changes properly set up ES modules support and add necessary scripts and dependencies for metadata management.
Also applies to: 7-8, 17-21, 38-38, 54-54
.circleci/config.yml (2)
69-69: Clarify Lint Job DescriptionThe updated description now indicates that the job both lints Markdown files and validates metadata. This clear description helps future maintainers understand the expanded responsibilities of the job.
85-87: New Step: Validate MetadataThe additional step running
pnpm validate-pr-metadatais a clean integration point for ensuring metadata validity. This aligns perfectly with the PR’s objective of enhancing metadata management and linting.notes/metadata-update.md (1)
1-12: Comprehensive IntroductionThe new file starts with a clear title and a brief overview of what the metadata management system does. This introduction is concise and sets the stage well for the detailed instructions that follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
utils/metadata-analyzer.ts (1)
589-637: 🛠️ Refactor suggestionReplace global variables with a proper state management approach.
Using global variables (
global.totalFiles,global.filesNeedingCategoryReview, etc.) for tracking state is error-prone. Consider using a class or context object to manage state.+interface AnalysisState { + totalFiles: number; + filesNeedingCategoryReview: number; + filesNeedingContentTypeReview: number; +} + +class MetadataAnalyzer { + private state: AnalysisState = { + totalFiles: 0, + filesNeedingCategoryReview: 0, + filesNeedingContentTypeReview: 0 + }; + + analyzeContent(content: string, filepath: string, verbose: boolean = false): MetadataResult { // ... existing implementation ... + this.state.totalFiles++; + } + + printSummary(): void { console.log(` Final Summary: - ✓ Processed ${global.totalFiles} files - ⚠️ ${global.filesNeedingCategoryReview || 0} files may need category review - ⚠️ ${global.filesNeedingContentTypeReview || 0} files may need content type review + ✓ Processed ${this.state.totalFiles} files + ⚠️ ${this.state.filesNeedingCategoryReview} files may need category review + ⚠️ ${this.state.filesNeedingContentTypeReview} files may need content type review (Dry run - no changes made) `); } +}
🧹 Nitpick comments (10)
utils/metadata-manager.ts (2)
21-24: Consider caching and stronger typing for the configuration.You are reloading and parsing the config on every validation, which could affect performance. Also, return a stricter type from
loadConfiginstead ofPromise<any>to avoid potential runtime errors.Also applies to: 229-233
182-189: Leverage concurrency for better performance.You could handle file validations in parallel with
Promise.all, improving speed when many.mdxfiles are changed.- for (const file of modifiedFiles) { - const result = await updateMetadata(file, { validateOnly: true, prMode: true }) - ... - } + await Promise.all(modifiedFiles.map(async (file) => { + const result = await updateMetadata(file, { validateOnly: true, prMode: true }) + if (!result.isValid) { + ... + } + }));utils/types/metadata-types.ts (2)
5-8: Consolidate config loading.You load
keywords.config.yamlhere and also inmetadata-manager.ts. Refactor to maintain a single entry point for reading and parsing the YAML configuration.
10-18: Apply optional chaining per static analysis recommendation.Improve readability by using optional chains instead of multiple
&&checks. For instance:-function isValidConfig(config: any): config is MetadataConfig { - return ( - config && - config.metadata_rules && - config.metadata_rules.persona?.validation_rules?.[0]?.enum && - config.metadata_rules.content_type?.validation_rules?.[0]?.enum && - config.metadata_rules.categories?.values - ) +function isValidConfig(config: any): config is MetadataConfig { + return !!( + config?.metadata_rules?.persona?.validation_rules?.[0]?.enum && + config?.metadata_rules?.content_type?.validation_rules?.[0]?.enum && + config?.metadata_rules?.categories?.values + ) }🧰 Tools
🪛 Biome (1.9.4)
[error] 12-14: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
utils/breadcrumbs.ts (1)
45-61: Optimize file reading with concurrency.You can gather file reads in parallel with
Promise.allto reduce overall execution time, especially for directories with many files.- for (const file of files) { - ... - const content = await fs.promises.readFile(filePath, 'utf-8'); - ... - } + await Promise.all(files.map(async (file) => { + ... + const content = await fs.promises.readFile(filePath, 'utf-8'); + ... + }));utils/metadata-batch-cli.ts (3)
9-10: Consider using a type-safe approach for glob import.Instead of using
@ts-ignore, consider installing and using@types/globfor proper TypeScript support.-// @ts-ignore -const globModule = await import('glob') +import type { glob } from 'glob' +const globModule = await import('glob') as { default: typeof glob }
43-84: Reduce code duplication in findParentMetadata.The file reading and metadata extraction logic is duplicated. Consider extracting it into a helper function.
+async function readMetadataFromFile(filePath: string): Promise<ParentMetadata | null> { + try { + const content = await fs.readFile(filePath, 'utf8') + const { data } = matter(content) + return { + path: filePath, + categories: data.categories || [] + } + } catch (e) { + return null + } +} async function findParentMetadata(filePath: string): Promise<ParentMetadata | null> { try { const dir = path.dirname(filePath) const parentFiles = ['index.mdx', 'README.mdx'] // Try same directory first for (const file of parentFiles) { const sameDirPath = path.join(dir, file) - try { - const content = await fs.readFile(sameDirPath, 'utf8') - const { data } = matter(content) - return { - path: sameDirPath, - categories: data.categories || [] - } - } catch (e) { + const metadata = await readMetadataFromFile(sameDirPath) + if (metadata) { + return metadata + } else { continue } } // Try parent directory const parentDir = path.dirname(dir) for (const file of parentFiles) { const parentPath = path.join(parentDir, file) - try { - const content = await fs.readFile(parentPath, 'utf8') - const { data } = matter(content) - return { - path: parentPath, - categories: data.categories || [] - } - } catch (e) { + const metadata = await readMetadataFromFile(parentPath) + if (metadata) { + return metadata + } else { continue } } return null } catch (e) { return null } }🧰 Tools
🪛 Biome (1.9.4)
[error] 60-60: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
[error] 76-76: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
121-149: Consider adding batch processing for better performance.For large sets of files, processing them sequentially might be slow. Consider implementing concurrent processing with a batch size limit.
async function processFiles(files: string[]): Promise<boolean> { let hasErrors = false let processedCount = 0 + const BATCH_SIZE = 5 - for (const file of files) { + // Process files in batches + for (let i = 0; i < files.length; i += BATCH_SIZE) { + const batch = files.slice(i, i + BATCH_SIZE) + const results = await Promise.all(batch.map(async (file) => { try { const result = await updateMetadata(file, { dryRun: true, prMode: true }) if (!result.isValid) { hasErrors = true console.log(`\n${colors.red}Error in ${file}:${colors.reset}`) result.errors.forEach(error => { console.log(` ${colors.yellow}→${colors.reset} ${error}`) }) } - processedCount++ + return { success: true } } catch (e) { console.log(`\n${colors.red}Failed to process ${file}: ${e}${colors.reset}`) - hasErrors = true + return { success: false } } - } + })) + + processedCount += results.filter(r => r.success).length + hasErrors = hasErrors || results.some(r => !r.success) + }utils/metadata-analyzer.ts (1)
498-551: Optimize regular expressions for better performance.Some regex patterns could be compiled once and reused. Also, consider using more efficient patterns.
+// Compile regex patterns once +const REFERENCE_PATTERNS = { + tables: /\|.*\|/g, + codeBlocks: /```[^`]+```/g, + specialSyntax: /\{.*?\}/g, + technicalTerms: /\b(API|RPC|SDK|JSON|HTTP|URL|ETH)\b/g, + parameters: /^Parameters:/gm, + returns: /^Returns:/gm, + type: /^Type:/gm, + endpoint: /^Endpoint:/gm, + endpoints: /\bend?points?\b/gi +}; function countReferenceSignals(content: string, filepath: string): number { let score = 0; // Technical content indicators - score += (content.match(/\|.*\|/g) || []).length * 2; // Tables + score += (content.match(REFERENCE_PATTERNS.tables) || []).length * 2; // Tablespackage.json (1)
7-19: Enhance script robustness with error handling.The scripts using
git diffand environment variables could fail silently. Consider adding error handling and validation.- "validate-metadata": "CHANGED_FILES=$(git diff --name-only HEAD) node --loader ts-node/esm utils/metadata-manager.ts", + "validate-metadata": "CHANGED_FILES=$(git diff --name-only HEAD || echo '') node --loader ts-node/esm utils/metadata-manager.ts",Also, consider adding a
preandpostscript for the metadata batch operations to ensure proper setup and cleanup.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- dist/tsconfig.tsbuildinfois excluded by- !**/dist/**
📒 Files selected for processing (6)
- package.json(3 hunks)
- utils/breadcrumbs.ts(6 hunks)
- utils/metadata-analyzer.ts(1 hunks)
- utils/metadata-batch-cli.ts(1 hunks)
- utils/metadata-manager.ts(1 hunks)
- utils/types/metadata-types.ts(1 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
utils/types/metadata-types.ts
[error] 12-14: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
utils/metadata-batch-cli.ts
[error] 60-60: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
[error] 76-76: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Redirect rules - docs-optimism
- GitHub Check: Header rules - docs-optimism
- GitHub Check: Pages changed - docs-optimism
🔇 Additional comments (6)
utils/metadata-manager.ts (2)
134-148: Great job implementing atomic writes!Using a temporary file and then renaming it is a proper way to avoid partial writes if an error occurs. This effectively addresses the previous suggestion to ensure file consistency.
169-176:❓ Verification inconclusive
Validate that resolved paths remain within the repository.
Untrusted or malformed
CHANGED_FILEScould resolve to paths outside the repo. Consider restricting paths to a known safe location.To automate, you can run a shell script confirming that all resolved paths share the project root:
🏁 Script executed:
#!/bin/bash # Ensure all resolved .mdx files are within the repository directory REPO_ROOT="$(pwd)" while read -r FILE; do FULL_PATH="$(realpath "$FILE")" if [[ "$FULL_PATH" != "$REPO_ROOT"* ]]; then echo "Potentially unsafe path: $FULL_PATH" exit 1 fi done < <(echo "$CHANGED_FILES" | tr '\n' ' ' | xargs -n1 echo | grep '\.mdx$')Length of output: 208
Attention: Ensure Resolved Paths Remain Within the Repository
The function in
utils/metadata-manager.ts(lines 169–176) processes theCHANGED_FILESvariable by resolving each.mdxfile’s absolute path. However, our validation shell script encountered an error related to process substitution (/dev/fd/63: No such file or directory), which prevents clear confirmation that all resolved paths are safely constrained within the repository root.
- Verify manually that the environment running this script supports process substitution.
- Ensure that the resolved paths from
CHANGED_FILESdo not escape the repository (i.e., they all begin with the repository’s root path).- Consider revising the safety check to use a more robust method or environment-independent approach if needed.
utils/breadcrumbs.ts (1)
7-10: ESM setup looks good.Nicely done converting to ES modules using
fileURLToPathandpath.dirnameto replicate__filenameand__dirname.utils/metadata-batch-cli.ts (1)
86-119: Well-structured implementation with proper error handling and logging!The function effectively handles different modes (dry run, verbose) and provides clear error messages.
utils/metadata-analyzer.ts (1)
36-75: Avoid using a global Set for state management.Relying on a module-scoped
detectedPagescan lead to unintended side effects if the script is reused or extended. Consider making this state local or passing it through function parameters to keep the logic deterministic.package.json (1)
36-36: Dependencies are properly specified with pinned versions.The addition of
js-yamland its type definitions is well-structured and follows best practices.Also applies to: 52-52
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
utils/metadata-analyzer.ts (1)
501-510: 🛠️ Refactor suggestionAvoid using global variables for state management.
The use of global variables (
global.totalFiles,global.filesNeedingCategoryReview,global.filesNeedingContentTypeReview) makes the code harder to test and can lead to unexpected behavior when the module is reused.Consider passing these statistics as parameters or encapsulating them in a class:
-export function printSummary(): void { +export function printSummary(stats: { + totalFiles: number, + filesNeedingCategoryReview: number, + filesNeedingContentTypeReview: number +}): void { console.log(` Final Summary: -✓ Processed ${global.totalFiles} files -⚠️ ${global.filesNeedingCategoryReview || 0} files may need category review -⚠️ ${global.filesNeedingContentTypeReview || 0} files may need content type review +✓ Processed ${stats.totalFiles} files +⚠️ ${stats.filesNeedingCategoryReview} files may need category review +⚠️ ${stats.filesNeedingContentTypeReview} files may need content type review (Dry run - no changes made) `); }
🧹 Nitpick comments (9)
utils/metadata-batch-cli.ts (5)
9-10: Replace @ts-ignore with proper typing.Instead of suppressing TypeScript errors, consider adding proper type definitions for the glob module.
-// @ts-ignore -const globModule = await import('glob') +import type { glob } from 'glob' +const globModule = await import('glob') as { glob: typeof glob }
43-84: Optimize parent metadata lookup.The function can be improved in two ways:
- Remove unnecessary continue statements in catch blocks
- Use Promise.any for concurrent file checks
async function findParentMetadata(filePath: string): Promise<ParentMetadata | null> { try { const dir = path.dirname(filePath) const parentFiles = ['index.mdx', 'README.mdx'] - // Try same directory first - for (const file of parentFiles) { - const sameDirPath = path.join(dir, file) - try { - const content = await fs.readFile(sameDirPath, 'utf8') - const { data } = matter(content) - return { - path: sameDirPath, - categories: data.categories || [] - } - } catch (e) { - continue - } - } + // Try files in same directory concurrently + const sameDirPromises = parentFiles.map(async file => { + const sameDirPath = path.join(dir, file) + const content = await fs.readFile(sameDirPath, 'utf8') + const { data } = matter(content) + return { + path: sameDirPath, + categories: data.categories || [] + } + }) + + try { + return await Promise.any(sameDirPromises) + } catch { + // Try parent directory if same directory fails + const parentDir = path.dirname(dir) + const parentPromises = parentFiles.map(async file => { + const parentPath = path.join(parentDir, file) + const content = await fs.readFile(parentPath, 'utf8') + const { data } = matter(content) + return { + path: parentPath, + categories: data.categories || [] + } + }) - // Try parent directory - const parentDir = path.dirname(dir) - for (const file of parentFiles) { - const parentPath = path.join(parentDir, file) try { - const content = await fs.readFile(parentPath, 'utf8') - const { data } = matter(content) - return { - path: parentPath, - categories: data.categories || [] - } - } catch (e) { - continue + return await Promise.any(parentPromises) + } catch { + return null } } - - return null } catch (e) { return null } }🧰 Tools
🪛 Biome (1.9.4)
[error] 60-60: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
[error] 76-76: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
86-90: Improve type safety with a dedicated interface.Define an interface for the return type to improve type safety and documentation.
+interface FrontmatterUpdateResult { + categories: string[] + contentType: string + isImported: boolean +} + -async function updateFrontmatter(filePath: string, dryRun: boolean = false, verbose: boolean = false): Promise<{ - categories: string[] - contentType: string - isImported: boolean -}> { +async function updateFrontmatter(filePath: string, dryRun: boolean = false, verbose: boolean = false): Promise<FrontmatterUpdateResult> {
121-147: Enhance error handling with custom error types.Consider creating custom error types for better error categorization and handling.
+enum FileValidationErrorType { + NOT_FOUND = 'NOT_FOUND', + NOT_FILE = 'NOT_FILE', + NO_ACCESS = 'NO_ACCESS' +} + +interface FileValidationError { + type: FileValidationErrorType + path: string + message: string +} async function validateFilePaths(files: string[]): Promise<string[]> { const validFiles = [] - const errors = [] + const errors: FileValidationError[] = [] for (const file of files) { try { await fs.access(file, fs.constants.R_OK) const stats = await fs.stat(file) if (stats.isFile()) { validFiles.push(file) } else { - errors.push(`${file} is not a file`) + errors.push({ + type: FileValidationErrorType.NOT_FILE, + path: file, + message: 'Path exists but is not a file' + }) } } catch (error) { - errors.push(`Cannot access ${file}: ${error.message}`) + errors.push({ + type: error.code === 'ENOENT' ? FileValidationErrorType.NOT_FOUND : FileValidationErrorType.NO_ACCESS, + path: file, + message: error.message + }) } } if (errors.length > 0) { console.log(`${colors.yellow}Warning: Some files were skipped:${colors.reset}`) - errors.forEach(error => console.log(` ${colors.yellow}→${colors.reset} ${error}`)) + errors.forEach(error => console.log(` ${colors.yellow}→${colors.reset} ${error.path}: ${error.message}`)) } return validFiles }
149-177: Optimize file processing with parallel execution.Consider processing files concurrently for better performance, while maintaining organized error reporting.
async function processFiles(files: string[]): Promise<boolean> { let hasErrors = false - let processedCount = 0 + const results = await Promise.allSettled( + files.map(async file => { + try { + const result = await updateMetadata(file, { dryRun: true, prMode: true }) + return { file, result } + } catch (e) { + throw { file, error: e } + } + }) + ) - for (const file of files) { - try { - const result = await updateMetadata(file, { dryRun: true, prMode: true }) - if (!result.isValid) { - hasErrors = true - console.log(`\n${colors.red}Error in ${file}:${colors.reset}`) - result.errors.forEach(error => { - console.log(` ${colors.yellow}→${colors.reset} ${error}`) - }) - } - processedCount++ - } catch (e) { - console.log(`\n${colors.red}Failed to process ${file}: ${e}${colors.reset}`) - hasErrors = true - } - } + const processedCount = results.length + const errors = results.reduce((acc, result) => { + if (result.status === 'rejected') { + hasErrors = true + console.log(`\n${colors.red}Failed to process ${result.reason.file}: ${result.reason.error}${colors.reset}`) + return acc + 1 + } else if (!result.value.result.isValid) { + hasErrors = true + console.log(`\n${colors.red}Error in ${result.value.file}:${colors.reset}`) + result.value.result.errors.forEach(error => { + console.log(` ${colors.yellow}→${colors.reset} ${error}`) + }) + return acc + 1 + } + return acc + }, 0) console.log( hasErrors - ? `\n${colors.red}✖ Found metadata issues in some files${colors.reset}` + ? `\n${colors.red}✖ Found metadata issues in ${errors} of ${processedCount} files${colors.reset}` : `\n${colors.green}✓ Validated ${processedCount} files successfully${colors.reset}` ) return hasErrors }package.json (1)
7-20: Add script documentation in package.json.Consider adding a "scripts-info" section to document the purpose and usage of each script, especially the new metadata-related scripts.
"scripts": { "lint": "eslint . --ext mdx --max-warnings 0 && pnpm spellcheck:lint && pnpm check-breadcrumbs && pnpm check-redirects && pnpm validate-metadata", "fix": "eslint . --ext mdx --fix && pnpm spellcheck:fix && pnpm breadcrumbs && pnpm fix-redirects && pnpm metadata-batch-cli", ... }, + "scripts-info": { + "metadata-batch-cli": "Process and update metadata in MDX files", + "metadata-batch-cli:dry": "Preview metadata changes without applying them", + "metadata-batch-cli:verbose": "Process metadata with detailed logging", + "validate-metadata": "Validate metadata in changed files", + "validate-pr-metadata": "Validate metadata in PR files" + },utils/metadata-analyzer.ts (3)
36-75: Eliminate side effects in isLandingPage function.The function modifies the passed
detectedPagesSet, which is a side effect that makes the function harder to test and reason about.Consider returning both the boolean result and the updated Set:
-function isLandingPage(content: string, filepath: string, detectedPages: Set<string>): boolean { +function isLandingPage(content: string, filepath: string, detectedPages: Set<string>): + { isLanding: boolean; updatedPages: Set<string> } { // ... existing checks ... const isLanding = (isOperatorLanding || (isOverviewPage && hasMultipleCards)) && !isTooDeep && !notLandingPage; + const updatedPages = new Set(detectedPages); if (isLanding) { - detectedPages.add(filepath); + updatedPages.add(filepath); console.log(`Landing page detected: ${filepath}`); } - return isLanding; + return { isLanding, updatedPages }; }
331-364: Extract magic numbers and improve category prioritization.The category limit of 5 and the priority order array should be configuration constants.
Consider extracting these as configurable constants:
+const MAX_CATEGORIES = 5; +const CATEGORY_PRIORITY_ORDER = [ + 'protocol', + 'infrastructure', + // ... rest of the categories +] as const; // Limit to 5 most relevant categories -const priorityOrder = [ - 'protocol', - 'infrastructure', - // ... rest of the categories -]; const sortedCategories = Array.from(categories) - .sort((a, b) => priorityOrder.indexOf(a) - priorityOrder.indexOf(b)) - .slice(0, 5); + .sort((a, b) => CATEGORY_PRIORITY_ORDER.indexOf(a) - CATEGORY_PRIORITY_ORDER.indexOf(b)) + .slice(0, MAX_CATEGORIES);
371-424: Improve type safety and configurability of content scoring.The scoring weights are hardcoded and scattered throughout the functions. Consider making them configurable and type-safe.
Consider using an enum or const object for scoring weights:
+const CONTENT_SCORES = { + TABLE_WEIGHT: 2, + CODE_BLOCK_WEIGHT: 1, + SPECIAL_SYNTAX_WEIGHT: 1, + TECHNICAL_TERM_WEIGHT: 1, + PARAMETER_WEIGHT: 3, + RETURN_WEIGHT: 3, + TYPE_WEIGHT: 2, + ENDPOINT_WEIGHT: 2, + API_REFERENCE_WEIGHT: 5 +} as const; function countReferenceSignals(content: string, filepath: string): number { let score = 0; - score += (content.match(/\|.*\|/g) || []).length * 2; // Tables - score += (content.match(/```[^`]+```/g) || []).length; // Code blocks + score += (content.match(/\|.*\|/g) || []).length * CONTENT_SCORES.TABLE_WEIGHT; + score += (content.match(/```[^`]+```/g) || []).length * CONTENT_SCORES.CODE_BLOCK_WEIGHT; // ... rest of the scoring logic }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- package.json(3 hunks)
- utils/metadata-analyzer.ts(1 hunks)
- utils/metadata-batch-cli.ts(1 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
utils/metadata-batch-cli.ts
[error] 60-60: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
[error] 76-76: Unnecessary continue statement
Unsafe fix: Delete the unnecessary continue statement
(lint/correctness/noUnnecessaryContinue)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Redirect rules - docs-optimism
- GitHub Check: Header rules - docs-optimism
- GitHub Check: Pages changed - docs-optimism
🔇 Additional comments (1)
package.json (1)
37-37: LGTM!The js-yaml dependency and its type definitions are correctly added.
Also applies to: 53-53
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
utils/metadata-analyzer.ts (1)
82-121: 🛠️ Refactor suggestionAvoid using console.log in utility functions.
The
isLandingPagefunction directly usesconsole.log, which makes it harder to test and control logging behavior. Consider using the provided logger from the config.- console.log(`Landing page detected: ${filepath}`); + config.logger(`Landing page detected: ${filepath}`);Additionally, consider passing the
detectedPagesSet through the config to maintain better state management.
🧹 Nitpick comments (3)
utils/metadata-analyzer.ts (3)
1-49: Consider using environment variables for configuration.The
DEFAULT_CONFIGcontains hardcoded values that might need to change based on different environments or requirements. Consider making these configurable through environment variables.const DEFAULT_CONFIG: AnalyzerConfig = { - defaultLang: 'en-US', - defaultTitle: '', - defaultDescription: '', - maxCategories: 5, + defaultLang: process.env.DEFAULT_LANG || 'en-US', + defaultTitle: process.env.DEFAULT_TITLE || '', + defaultDescription: process.env.DEFAULT_DESCRIPTION || '', + maxCategories: parseInt(process.env.MAX_CATEGORIES || '5', 10), logger: console.log, priorityOrder: [ // ... existing priorities ... ] };
393-414: Add threshold constants for better maintainability.The
countReferenceSignalsfunction uses magic numbers for scoring. Consider extracting these into named constants for better maintainability.+const REFERENCE_SCORE = { + TABLE_WEIGHT: 2, + CODE_BLOCK_WEIGHT: 1, + SPECIAL_SYNTAX_WEIGHT: 1, + TECHNICAL_TERM_WEIGHT: 1, + PARAMETER_WEIGHT: 3, + RETURN_WEIGHT: 3, + TYPE_WEIGHT: 2, + ENDPOINT_WEIGHT: 2, + API_REFERENCE_WEIGHT: 5 +}; function countReferenceSignals(content: string, filepath: string): number { let score = 0; - score += (content.match(/\|.*\|/g) || []).length * 2; // Tables + score += (content.match(/\|.*\|/g) || []).length * REFERENCE_SCORE.TABLE_WEIGHT; // ... apply similar changes for other scoring rules
539-550: Consider using a test configuration file.The
testingexport exposes internal functions for testing. Consider moving test-specific configurations and exports to a separate test configuration file.Create a new file
utils/__tests__/metadata-analyzer.test-config.ts:import { DEFAULT_CONFIG, detectStackCategories, detectOperatorCategories, detectAppDeveloperCategories, detectCommonCategories, detectCategories, detectContentType, isLandingPage, getLandingPageCategories } from '../metadata-analyzer'; export const testConfig = { DEFAULT_CONFIG, detectStackCategories, detectOperatorCategories, detectAppDeveloperCategories, detectCommonCategories, detectCategories, detectContentType, isLandingPage, getLandingPageCategories };
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Can't validate the typescript, but since an eng has approved the rest of it looks great to me!