@mishig25 mishig25 commented Oct 28, 2025

Removes the title, author, and thumbnail fields from _blog.yml, because those fields already exist inside individual posts. Example from ai-action-wh-2025.md:

---
title: "AI Policy @🤗: Response to the White House AI Action Plan RFI"
thumbnail: /blog/assets/151_policy_ntia_rfc/us_policy_thumbnail.png
authors:
- user: yjernite
- user: evijit
- user: irenesolaiman
---

# AI Policy @🤗: Response to the White House AI Action Plan RFI
Python script that was used:
#!/usr/bin/env python3
"""
Script to remove 'title', 'author', and 'thumbnail' fields from _blog.yml entries
while preserving formatting and blank lines.
"""

import re


def remove_fields_from_blog_yml(file_path):
    """
    Remove 'title', 'author', and 'thumbnail' fields from all entries in the blog YAML file.
    Preserves blank lines and formatting.
    
    Args:
        file_path: Path to the blog YAML file
    """
    # Fields to remove
    fields_to_remove = ["title", "author", "thumbnail"]
    
    # Read the file
    print(f"Reading {file_path}...")
    with open(file_path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    
    # Process lines
    print("Processing lines...")
    new_lines = []
    removed_counts = {field: 0 for field in fields_to_remove}
    
    for line in lines:
        # Check if this line contains one of the fields to remove
        should_remove = False
        for field in fields_to_remove:
            # Match patterns like "  title:" or "  author:" or "  thumbnail:"
            if re.match(rf'^\s*{field}:\s*', line):
                should_remove = True
                removed_counts[field] += 1
                break
        
        # Keep the line if it's not one we want to remove
        if not should_remove:
            new_lines.append(line)
    
    # Write back to the file
    print(f"Writing changes back to {file_path}...")
    with open(file_path, "w", encoding="utf-8") as f:
        f.writelines(new_lines)
    
    # Print summary
    print("✓ Done!")
    print("\nSummary:")
    for field, count in removed_counts.items():
        print(f"  - Removed '{field}' from {count} entries")


if __name__ == "__main__":
    import sys
    
    files = sys.argv[1:] if len(sys.argv) > 1 else ["_blog.yml"]
    
    for file_path in files:
        remove_fields_from_blog_yml(file_path)
        print()

1. Removed fields from _blog.yml files

  • Removed title, author, and thumbnail from:
    • _blog.yml (648 entries)
    • zh/_blog.yml (218 entries)
    • fr/_blog.yml (3 entries)
  • Preserved blank lines between entries for readability
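
As a rough illustration of the removal above, the sketch below applies the same kind of line-level pattern as the script to a single sample entry. The entry itself (the `local`, `date`, and `tags` keys and all values) is hypothetical and only approximates the shape of a `_blog.yml` entry; it is not copied from the repository.

```python
import re

# Same idea as the script: drop any line whose first non-space token is one
# of the removed keys. "authors:" would NOT match, since the colon must
# directly follow "author".
FIELD_RE = re.compile(r"^\s*(title|author|thumbnail):\s*")


def strip_fields(text: str) -> str:
    """Remove lines starting with a dropped key; keep all other lines, blank ones included."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not FIELD_RE.match(line)
    ]
    return "".join(kept)


# Hypothetical entry, for illustration only.
sample = (
    "- local: example-post\n"
    "  title: An example post\n"
    "  author: someuser\n"
    "  thumbnail: /blog/assets/example/thumbnail.png\n"
    "  date: January 1, 2025\n"
    "  tags:\n"
    "    - example\n"
    "\n"
)

result = strip_fields(sample)
print(result)
```

Because the filter works on raw lines rather than on parsed YAML, the blank separator line and the original indentation pass through untouched. The trade-off is that a value spanning several lines would leave its continuation lines behind, so this approach only suits single-line values.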

2. Updated validation script

  • Removed title, author, thumbnail from _blog.yml schema validation
  • Added frontmatter validation for all blog post .md files
  • Validates required fields: title, thumbnail (with extension check), authors array
  • Handles both Windows (\r\n) and Unix (\n) line endings
  • Skips special files like README.md
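
The frontmatter checks listed above could look roughly like the following. This is a minimal sketch and not the validation script from this PR: the function name `validate_frontmatter`, the error messages, and the list of accepted thumbnail extensions are all assumptions, and the parsing is a simple line scan rather than a real YAML parser.

```python
import re

# Assumed set of acceptable thumbnail extensions (not taken from the PR).
THUMBNAIL_EXTS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg")

# Frontmatter must open at the very start of the file; "\r?\n" accepts both
# Windows (\r\n) and Unix (\n) line endings.
FRONTMATTER_RE = re.compile(r"\A---\r?\n(.*?)\r?\n---\r?\n", re.DOTALL)


def validate_frontmatter(markdown: str) -> list[str]:
    """Return a list of problems found in a post's frontmatter (empty if none)."""
    match = FRONTMATTER_RE.match(markdown)
    if match is None:
        return ["frontmatter block not found at the very start of the file"]

    fields = {}
    authors = []
    in_authors = False
    for line in match.group(1).splitlines():
        user = re.match(r"^\s*-\s*user:\s*(\S+)", line)
        if in_authors and user:
            authors.append(user.group(1))
            continue
        key = re.match(r"^(\w+):\s*(.*)$", line)
        if key:
            fields[key.group(1)] = key.group(2).strip()
            in_authors = key.group(1) == "authors"

    errors = []
    if not fields.get("title"):
        errors.append("missing title")
    thumbnail = fields.get("thumbnail", "")
    if not thumbnail:
        errors.append("missing thumbnail")
    elif not thumbnail.lower().endswith(THUMBNAIL_EXTS):
        errors.append("thumbnail has an unexpected extension")
    if not authors:
        errors.append("authors must list at least one user")
    return errors
```

Called on the frontmatter of ai-action-wh-2025.md shown earlier, this sketch should return an empty list. A file with whitespace before the opening `---` is reported as having no frontmatter at all.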

3. Fixed markdown files

  • Fixed 19 files in root directory:
    • Added missing thumbnails (9 files)
    • Added placeholder authors (6 files)
    • Fixed invalid thumbnail extensions (2 files)
    • Removed leading whitespace (3 files)
  • Fixed 2 files in zh/ directory:
    • Added missing authors and thumbnails
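
For the "Removed leading whitespace" fix, one plausible approach is sketched below. It assumes the problem was whitespace or blank lines before the opening `---`, which the PR does not spell out, and the helper name `strip_leading_whitespace` is hypothetical.

```python
def strip_leading_whitespace(markdown: str) -> str:
    """Drop whitespace before the opening frontmatter fence, if there is one."""
    stripped = markdown.lstrip()
    # Leave the file alone unless its first real content is the "---" fence.
    return stripped if stripped.startswith("---") else markdown
```

Files that do not begin with frontmatter are returned unchanged, so the helper would be safe to run over every post.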

@mishig25 mishig25 marked this pull request as ready for review October 28, 2025 10:37

@julien-c julien-c left a comment

lgtm on principle

@mishig25 mishig25 force-pushed the remove_duplicate_fields branch from 09b5a57 to 7b20b58 Compare November 7, 2025 12:52
@mishig25 mishig25 merged commit b93ef3e into main Nov 7, 2025
1 check passed
@mishig25 mishig25 deleted the remove_duplicate_fields branch November 7, 2025 13:30