# AI Vocabulary Components Generator

This notebook:
1. Reads .md files from the vocab content directory
2. Extracts title and summary from each file
3. Uses Anthropic's API to generate React components
4. Saves the components as .tsx files

For testing purposes, it processes only the first 5 files.

In [33]:
# Cell 1 - Imports and Setup
import os
import yaml
import anthropic
import time
import random
from datetime import datetime
from pathlib import Path
import re

print(f"Anthropic version: {anthropic.__version__}")
os.environ['ANTHROPIC_API_KEY'] = "your-api-here"  # Replace with your actual API key


Anthropic version: 0.37.1


In [34]:
# Cell 2 - Prompt for Claude

def create_prompt(title, summary):
    return f'''
Create an intuitive React component that teaches {title} through visual metaphors and real-world examples.

CONCEPT BREAKDOWN:
1. Name of the concept: {title}
2. Core Principle: {summary}

EDUCATIONAL GOALS:
1. Help the human student understand {title} by connecting it to familiar concepts
2. Show practical applications they might encounter in daily life
3. Build understanding through progressive revelation, not all at once

VISUALIZATION APPROACH:
1. Core Metaphor:
   - Choose 2 or 3 central, relatable metaphors that captures the essence of {title}
   - Example: For "Neural Networks" → use a "Learning to Ride a Bike" metaphor, "Learning to Cook" metaphor or "Learning a New Language" metaphor.
   - Create new metaphor examples
   
2. Animation & Interaction:
   - Start with an automatic demo cycle using useEffect with proper cleanup
   - Implement timing logic using useEffect hook with cleanup functions
   - Allow human user interaction when relevant
   - The interactions might include, but not limited to, moving objects, selecting objects given a criteria, sliding objetcts, Scroll or Pinch-to-zoom, Swipe navigation, Drag and Drop Operations, Scroll-Based Interactions, Interactive tutorials
   - Avoid using just a "next/start/pause/play" button as interaction
   - Provide reset capability
   - Use humor and relatable situations when appropriate
   - Show cause-and-effect clearly

3. Visual Elements:
   - Every visual element should map to a real concept
   - Use Lucide icons as meaningful symbols, not decoration

Remember: A successful visualization is one where users can explain the concept to others after using it.
'''

def format_time(seconds):
    """Format seconds into minutes and seconds"""
    return f"{int(seconds // 60)}m {int(seconds % 60)}s"

def extract_frontmatter(content):
    """Extract YAML frontmatter from markdown content"""
    if content.startswith('---'):
        parts = content.split('---', 2)[1:]
        if len(parts) >= 1:
            try:
                return yaml.safe_load(parts[0])
            except yaml.YAMLError:
                return None
    return None

In [35]:
# Cell 3
def extract_frontmatter(content):
    """Extract YAML frontmatter from markdown content."""
    if content.startswith('---'):
        parts = content.split('---', 2)[1:]
        if len(parts) >= 1:
            try:
                return yaml.safe_load(parts[0])
            except yaml.YAMLError:
                return None
    return None

def get_existing_component_names(output_dir):
    """Get names of existing components in the output directory."""
    existing_names = set()
    if os.path.exists(output_dir):
        for file in os.listdir(output_dir):
            if file.endswith('.tsx'):
                # Store exact filename without .tsx extension
                existing_names.add(file[:-4])
    return existing_names

In [28]:
# Cell 4

def validate_component(component_code):
    """Validate the generated component against best practices and requirements."""
    # Required TypeScript patterns
    typescript_patterns = {
        'interface': 'No TypeScript interfaces defined',
        'React.FC': 'Missing React.FC type declaration',
        'useState<': 'Missing type parameters in useState',
        'LucideIcon': 'Missing LucideIcon type import/usage',
    }

    # Required React patterns
    react_patterns = {
        '"use client";': 'Missing "use client" directive at start',
        'import { useState, useEffect } from "react"': 'Missing or incorrect React imports',
        'export default': 'Missing default export',
        'transition': 'No transitions found for animations',
        'className': 'No Tailwind classes found',
        'onClick': 'No interactive elements found',
        'return': 'No return statement found'
    }

    # Anti-patterns to check - Fixed regex patterns
    anti_patterns = {
        r'className="[^"]*\[[^\]]*\]"': 'Found arbitrary values in Tailwind classes',
        'setTimeout': 'Direct setTimeout usage found - use useEffect instead',
        'setInterval': 'Direct setInterval usage found - use useEffect instead',
        'style=\\{\\{': 'Inline styles found instead of Tailwind classes',
        'styled': 'Styled-components usage found',
        '@keyframes': 'Direct CSS keyframes found',
        'framer': 'Framer Motion import found',
        r'motion\.': 'Framer Motion component found',
        '@emotion': 'Emotion styling found',
        'any': 'Avoid using the "any" type',
        r'scenarios\[step\]\.icon': 'Accessing dynamic icon properties directly - use type-safe approach',
        r'<scenarios\.': 'Invalid JSX usage of dynamic components',
        r'function\s+\w+\s*\(\s*\)\s*\{': 'Function missing type declaration'
    }

    # TypeScript-specific code structure requirements - Fixed regex patterns
    structure_requirements = {
        r'const\s+[A-Z_]+\s*=': 'Missing constant declarations outside component',
        r'const\s+\w+Component\s*=': 'Missing dynamic component declaration',
        r'(interface|type)\s+\w+': 'Missing type definitions',
        r'onClick=\{\s*\([^)]*\)\s*=>': 'Missing event handler type declarations'
    }

    issues = [
        f"❌ {message}"
        for pattern, message in (typescript_patterns | react_patterns).items()
        if pattern not in component_code
    ]
    # Check for anti-patterns using regex
    for pattern, message in anti_patterns.items():
        try:
            if re.search(pattern, component_code, re.MULTILINE):
                issues.append(f"❌ {message}")
        except re.error as e:
            print(f"Warning: Invalid regex pattern '{pattern}': {str(e)}")
            continue

    # Check for proper code structure
    for pattern, message in structure_requirements.items():
        try:
            if not re.search(pattern, component_code, re.MULTILINE):
                issues.append(f"⚠️ {message}")
        except re.error as e:
            print(f"Warning: Invalid regex pattern '{pattern}': {str(e)}")
            continue

    # Check for number of hooks
    if component_code.count('useState') > 5:
        issues.append("⚠️ Too many useState hooks (>5) - consider combining related state")

    if component_code.count('useEffect') > 3:
        issues.append("⚠️ Too many useEffect hooks (>3) - consider combining effects")

    # Check component length
    if len(component_code.split('\n')) > 200:
        issues.append("⚠️ Component is too long (>200 lines) - consider breaking it down")

    # Check for useEffect cleanup
    try:
        effect_blocks = re.findall(r'useEffect\(\s*\(\s*\)\s*=>\s*\{[^}]*\}', component_code, re.MULTILINE | re.DOTALL)
        for block in effect_blocks:
            if 'return' not in block:
                issues.append("❌ useEffect missing cleanup function")
    except re.error as e:
        print(f"Warning: Error checking useEffect cleanup: {str(e)}")

    # Check for proper dynamic icon usage
    try:
        if re.search(r'<\w+\.icon', component_code):
            issues.append("❌ Invalid dynamic icon usage - use proper component reference")
    except re.error as e:
        print(f"Warning: Error checking dynamic icon usage: {str(e)}")

    # Check for Props interface
    try:
        if not re.search(r'(interface|type)\s+Props\s*=?\s*\{', component_code, re.MULTILINE):
            issues.append("❌ Missing Props interface/type definition")
    except re.error as e:
        print(f"Warning: Error checking Props interface: {str(e)}")

    # Check state types
    try:
        state_updates = re.findall(r'set(\w+)\(', component_code)
        for state_name in state_updates:
            if f'useState<{state_name}Type>' not in component_code:
                issues.append(f"⚠️ Missing type definition for {state_name} state")
    except re.error as e:
        print(f"Warning: Error checking state types: {str(e)}")

    return issues

def fix_common_typescript_issues(component_code):
    """Fix common TypeScript issues in the generated component."""
    fixes = []
    
    # Add missing imports
    if 'LucideIcon' not in component_code:
        fixes.append(('import { Brain } from "lucide-react";',
                     'import { Brain, LucideIcon } from "lucide-react";'))
    
    # Fix dynamic icon usage
    if re.search(r'<\w+\.icon', component_code):
        fixes.append((r'<(\w+)\.icon', r'{\1.icon && <\1.icon'))
    
    # Add missing type definitions
    if not re.search(r'interface Props', component_code):
        fixes.append(('export default function', 'interface Props {}\n\nexport default function'))
    
    # Fix scenarios array type safety
    scenarios_pattern = r'const scenarios = \['
    if re.search(scenarios_pattern, component_code):
        fixes.append((scenarios_pattern, 'const SCENARIOS: Array<ScenarioType> = ['))
    
    # Apply all fixes
    fixed_code = component_code
    for old, new in fixes:
        fixed_code = re.sub(old, new, fixed_code)
    
    return fixed_code

In [29]:
# Cell 5


def save_tsx_file(content, md_filename, output_dir):
    """
    Save the API response as a .tsx file with TypeScript validation and fixes.
    
    Args:
        content (str): The component code to save
        md_filename (str): Original markdown filename with .md extension
        output_dir (str): Directory to save the TSX file
    """
    os.makedirs(output_dir, exist_ok=True)
    
    # Convert .md to .tsx while preserving exact filename
    tsx_filename = md_filename.replace('.md', '.tsx')
    filepath = os.path.join(output_dir, tsx_filename)
    
    # Clean the content
    cleaned_content = content
    if content.startswith('```'):
        first_newline = content.find('\n')
        if first_newline != -1:
            content = content[first_newline + 1:]
        if content.strip().endswith('```'):
            content = content.strip()[:-3]
    
    # Ensure "use client"; is properly formatted
    cleaned_content = content.strip()
    if cleaned_content.startswith('use client;'):
        cleaned_content = '"use client";' + cleaned_content[len('use client;'):]
    elif not cleaned_content.startswith('"use client";'):
        cleaned_content = '"use client";\n\n' + cleaned_content
    
    # Validate the component
    issues = validate_component(cleaned_content)
    
    # If there are TypeScript-specific issues, try to fix them
    if any('TypeScript' in issue or 'type' in issue.lower() for issue in issues):
        print("  🔧 Attempting to fix TypeScript issues...")
        cleaned_content = fix_common_typescript_issues(cleaned_content)
        # Revalidate after fixes
        issues = validate_component(cleaned_content)
    
    # Print validation results
    if issues:
        print("\n  ⚠️ Component validation issues found:")
        for issue in issues:
            print(f"    {issue}")
    else:
        print("  ✅ Component validation passed")
    
    # Add necessary type imports if missing
    if 'import type { LucideIcon }' not in cleaned_content:
        import_statements = cleaned_content.split('\n', 1)
        cleaned_content = import_statements[0] + '\nimport type { LucideIcon } from "lucide-react";\n' + import_statements[1]
    
    # Save the file
    with open(filepath, 'w', encoding='utf-8') as f:
        f.write(cleaned_content)
    print(f"  ✓ Saved: {tsx_filename}")
    return issues


In [30]:
# Cell 6
def format_time(seconds):
    """Format seconds into minutes and seconds."""
    return f"{int(seconds // 60)}m {int(seconds % 60)}s"

# def generate_component(client, prompt):
#     """Generate component using Claude API."""
#     response = client.messages.create(
#         model="claude-3-5-sonnet-20241022",
#         max_tokens=6000,
#         temperature=0.7,
#         system='''        
        # You are a creative expert React developer and AI professor specializing in educational components for 15 to 18-year-old humans. 
        # Your components must strictly follow these technical requirements:

        # 1. Architecture:
        # - "use client" directive at start (first line)
        # - import { useState, useEffect } from "react"; as the second line
        # - Only useState and useEffect hooks
        # - Only Lucide icons for visuals
        # - Only Tailwind CSS for styling
        # - No external libraries/components
        # - File extension: .tsx

        # 2. TypeScript Implementation:
        # interface ComponentProps {
        #     // Define if needed, empty interface required
        # }
        
        # // All state must use explicit types
        # const [state, setState] = useState<StateType>(initialValue);
        
        # // Event handlers must be typed
        # const handleEvent = (e: React.MouseEvent<HTMLButtonElement>) => {...};
        
        # // Constants outside component
        # const SCENARIOS: ScenarioType[] = [...];

        # 3. Effects & Cleanup:
        # useEffect(() => {
        #     // Effect logic
        #     return () => {
        #     // Cleanup required
        #     };
        # }, [dependencies]);

        # 4. Styling Standards:
        # - Only core Tailwind classes
        # - No arbitrary values (e.g., h-[500px])
        # - Transitions: duration-300 to duration-500
        # - Color scheme:
        #     • Blue (#3B82F6) - active/focus
        #     • Gray (#6B7280) - background
        #     • Green (#22C55E) - success

        # 5. Code Organization:
        # - Max 200 lines per component
        # - Early returns with type guards
        # - JSDoc component documentation
        # - Proper hooks cleanup
        # - No inline styles
        # - No setTimeout/setInterval (use useEffect)

        # Return only raw TSX code without explanations or markdown.
#         ''', 
#         messages=[{"role": "user", "content": prompt}]
#     )
    
#     component_code = response.content[0].text
#     issues = validate_component(component_code)
    
#     return component_code, issues

def generate_component(client, title, prompt):
    """
    Two-stage component generation with iterative refinement.
    
    Args:
        client: Anthropic client instance
        title: Component title for logging
        prompt: Initial generation prompt
    
    Returns:
        tuple: (final_component_code, final_issues)
    """
    print(f"  ⌛ Stage 1: Generating initial component for {title}...")

    # Stage 1: Initial Generation
    component_code, issues = generate_component(client, prompt)

    if not issues:
        print("  ✅ Initial component passed validation")
        return component_code, issues

    # Stage 2: Refinement
    print("  ⌛ Stage 2: Refining component...")

    # Group issues by category for more structured feedback
    typescript_issues = [i for i in issues if 'TypeScript' in i or 'type' in i.lower()]
    react_issues = [i for i in issues if 'React' in i or 'component' in i.lower()]
    pattern_issues = [i for i in issues if 'pattern' in i.lower() or 'usage' in i]

    refinement_prompt = f"""
    Please fix the following issues in this React component while preserving its core functionality and visual design:

    COMPONENT CODE:
    ```tsx
    {component_code}
    ```

    ISSUES TO FIX:
    
    {'TypeScript Issues:' if typescript_issues else ''}
    {chr(10).join(f"- {issue}" for issue in typescript_issues)}
    
    {'React Issues:' if react_issues else ''}
    {chr(10).join(f"- {issue}" for issue in react_issues)}
    
    {'Pattern Issues:' if pattern_issues else ''}
    {chr(10).join(f"- {issue}" for issue in pattern_issues)}

    Return only the complete fixed component code without explanations.
    Ensure all functionality remains the same while fixing these technical issues.
    """

    # Generate refined version
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=6000,
        temperature=0.7,
        messages=[{"role": "user", "content": refinement_prompt}]
    )

    refined_code = response.content[0].text
    refined_issues = validate_component(refined_code)

    # Log refinement results
    if len(refined_issues) < len(issues):
        print(f"  ✅ Refinement fixed {len(issues) - len(refined_issues)} issues")
    else:
        print("  ⚠️ Refinement did not improve component quality")

    return refined_code if len(refined_issues) < len(issues) else component_code, refined_issues

# Usage in main processing loop:
def process_file(client, md_file, metadata):
    """Process a single file with refinement."""
    print(f"  ⌛ Creating prompt for: {metadata['title']}")
    prompt = create_prompt(metadata['title'], metadata['summary'])
    
    print("  ⌛ Generating and refining component...")
    component_code, issues = generate_component(
        client, 
        metadata['title'], 
        prompt
    )
    
    return component_code, issues

In [31]:
# Cell 7 - Main Function
def main():
    # Initialize Anthropic client
    print("🚀 Starting process...")
    start_time_total = time.time()
    client = anthropic.Client(api_key=os.getenv('ANTHROPIC_API_KEY'))
    
    # Configure paths
    input_dir = "/Users/kemi/Documents/GitHub/vocab/src/content/articles"
    output_dir = "/Users/kemi/Documents/GitHub/vocab/src/components/articles"
    
    # Get existing component names
    print("📂 Checking existing components...")
    existing_components = get_existing_component_names(output_dir)
    if existing_components:
        print(f"  Found {len(existing_components)} existing components")
    
    # Get list of all .md files that don't have corresponding .tsx files
    all_md_files = []
    for md_file in Path(input_dir).glob('*.md'):
        # Use exact filename without extension
        component_name = md_file.stem
        if component_name not in existing_components:
            all_md_files.append(md_file)
    
    total_available = len(all_md_files)
    print(f"📁 Found {total_available} unprocessed files")
    
    if total_available == 0:
        print("❌ No new files to process")
        return
    
    # Select 3 random files (or all if less than 3 available)
    num_files = min(3, total_available)
    md_files = random.sample(all_md_files, num_files)
    
    print(f"🎲 Randomly selected {num_files} files to process")
    
    # Process each file
    for index, md_file in enumerate(md_files, 1):
        print(f"\n📝 Processing file {index}/{num_files}: {md_file.name}")
        start_time_file = time.time()
        
        try:
            print("  ⌛ Reading file...")
            with open(md_file, 'r', encoding='utf-8') as f:
                content = f.read()
            
            print("  ⌛ Extracting metadata...")
            metadata = extract_frontmatter(content)
            if not metadata:
                print("  ❌ Could not extract metadata")
                continue
            
            print(f"  ⌛ Creating prompt for: {metadata['title']}")
            prompt = create_prompt(metadata['title'], metadata['summary'])
            
            print("  ⌛ Generating component...")
            component_code, issues = generate_component(client, prompt)
            
            if component_code:
                print("  ⌛ Saving TSX file...")
                # Pass the original markdown filename
                save_tsx_file(component_code, md_file.name, output_dir)
                
                elapsed_time = time.time() - start_time_file
                if issues:
                    print(f"  ⚠️ Processed with issues: {md_file.stem} in {format_time(elapsed_time)}")
                else:
                    print(f"  ✅ Successfully processed: {md_file.stem} in {format_time(elapsed_time)}")
            else:
                print(f"  ❌ Failed to generate valid component for: {md_file.stem}")
            
        except Exception as e:
            elapsed_time = time.time() - start_time_file
            print(f"  ❌ Error processing {md_file.name} after {format_time(elapsed_time)}: {str(e)}")
    
    total_time = time.time() - start_time_total
    print("\n🎉 Process completed!")
    print(f"📊 Summary: Processed {num_files} files in {format_time(total_time)}")


In [32]:
# Cell 8 - Run the Script
if __name__ == "__main__":
    main()

🚀 Starting process...
📂 Checking existing components...
  Found 7 existing components
📁 Found 813 unprocessed files
🎲 Randomly selected 3 files to process

📝 Processing file 1/3: brainoware.md
  ⌛ Reading file...
  ⌛ Extracting metadata...
  ⌛ Creating prompt for: Brainoware
  ⌛ Generating component...
  ❌ Error processing brainoware.md after 0m 0s: generate_component() missing 1 required positional argument: 'prompt'

📝 Processing file 2/3: image-recognition.md
  ⌛ Reading file...
  ⌛ Extracting metadata...
  ⌛ Creating prompt for: Image Recognition
  ⌛ Generating component...
  ❌ Error processing image-recognition.md after 0m 0s: generate_component() missing 1 required positional argument: 'prompt'

📝 Processing file 3/3: dlms-deep-language-models.md
  ⌛ Reading file...
  ⌛ Extracting metadata...
  ⌛ Creating prompt for: DLMs (Deep Language Models)
  ⌛ Generating component...
  ❌ Error processing dlms-deep-language-models.md after 0m 0s: generate_component() missing 1 required positi