Add link extractor tool for extracting links from pasted content by simonw · Pull Request #116 · simonw/tools

simonw · 2025-11-26T18:14:31Z

Similar to the alt text extractor, this tool allows users to paste rich
text content from web pages and extract all links. Features include:

Renders extracted links on the page with title and URL
Copy as HTML (unordered list with anchor tags)
Copy as Markdown (bullet list with link syntax)
Copy as plain text (title followed by newline and URL)
Preview sections showing the output format before copying
Deduplication of identical links

Build a tool like the alt text extractor one but it extracts the links from pasted content - it should render them on the page and also make them copyable as HTML (a bullet list) and as a markdown list and also as plain where each link is the title and a newline and the URL


claude · 2025-11-26T18:14:45Z

Claude finished @simonw's task

Code Review for Link Extractor Tool

I've completed a thorough review of the link-extractor.html file. Overall, this is a well-implemented tool that follows good practices and patterns consistent with the existing codebase.

Review Tasks:

Read repository guidelines (CLAUDE.md)
Analyze the new link-extractor.html file
Check for similar tools for comparison
Review code quality and best practices
Identify potential bugs or issues
Assess performance considerations
Evaluate security concerns
Check test coverage approach

✅ Strengths

Code Quality & Best Practices:

Consistent Architecture: Follows the same single-file HTML pattern as alt-text-extractor.html and other tools in the repository
Clean Structure: Well-organized CSS, semantic HTML, and modular JavaScript
Modern JavaScript: Proper use of ES6+ features (const/let, arrow functions, async/await)
Responsive Design: Mobile-friendly with proper viewport meta tag and flexible layouts

Security Implementation:

XSS Protection: Proper use of escapeHtml() function for all user content (link-extractor.html:303-307)
Safe Link Attributes: Links include rel="noopener" for security (link-extractor.html:278)
Content Security: Uses textContent for escaping rather than manual string manipulation

User Experience:

Progressive Enhancement: Copy functionality works with fallback for older browsers (link-extractor.html:340-357)
Visual Feedback: Clear success states and transitions for copy operations
Accessibility: Proper semantic HTML structure and keyboard navigation support

🔍 Areas for Improvement

1. Performance Considerations:

Link Processing: For pages with hundreds of links, the current approach should perform adequately, but consider debouncing if performance issues arise
DOM Updates: The renderResults() function rebuilds entire HTML strings - acceptable for expected use cases but could be optimized for very large link sets
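If debouncing ever became necessary, it could look roughly like the sketch below. The `debounce` helper and the 150 ms delay in the wiring comment are assumptions for illustration; nothing in the PR indicates that the tool contains such a helper.

```javascript
// Hypothetical helper: returns a wrapper that delays calls to `fn` until
// `waitMs` milliseconds have passed without another call.
function debounce(fn, waitMs) {
  let timerId = null;
  return function debounced(...args) {
    if (timerId !== null) {
      clearTimeout(timerId); // a newer call supersedes the pending one
    }
    timerId = setTimeout(() => {
      timerId = null;
      fn.apply(this, args);
    }, waitMs);
  };
}

// Possible browser wiring (element and function names are assumptions):
// input.addEventListener('input', debounce(() => renderResults(), 150));
```

With this shape, a burst of rapid events triggers only one re-render, after the burst has ended.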

2. Minor Edge Cases:

URL Validation: Line 235 only checks href.startsWith('http'). That matches both http and https URLs, but it excludes ftp, mailto, and relative URLs that might be useful
Empty Title Fallback: When link text is empty, it falls back to the full URL which could be very long in the UI
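One way to address both edge cases is sketched below, assuming a protocol allowlist and a display-length cap. `ALLOWED_PROTOCOLS`, `normalizeHref`, `displayTitle`, and the 60-character default are all hypothetical and not taken from link-extractor.html.

```javascript
// Protocols to keep. This allowlist is an assumption, not the tool's own list.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:', 'ftp:', 'mailto:']);

// Resolve `href` (optionally against `baseUrl`) and return the absolute URL,
// or null if it cannot be parsed or uses a protocol outside the allowlist.
function normalizeHref(href, baseUrl) {
  let url;
  try {
    url = baseUrl ? new URL(href, baseUrl) : new URL(href);
  } catch {
    return null; // malformed input, or a relative URL with no base to resolve against
  }
  return ALLOWED_PROTOCOLS.has(url.protocol) ? url.href : null;
}

// Use the link text when present; otherwise fall back to a shortened URL
// so that a very long href does not dominate the UI.
function displayTitle(text, url, maxLength = 60) {
  const trimmed = (text || '').trim();
  if (trimmed) {
    return trimmed;
  }
  return url.length > maxLength ? url.slice(0, maxLength - 1) + '…' : url;
}
```

Parsing with the standard `URL` constructor also rejects schemes such as `javascript:`, which a plain prefix check would need to handle separately if the filter were ever loosened.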

3. Code Consistency:

Timeout Value: The 10ms timeout in handlePaste() (line 223) appears arbitrary - consider documenting why this specific delay is needed
Button Text Reset: Lines 350-351 have inconsistent button text reset logic compared to the primary success flow

🏗️ Technical Architecture

Deduplication Strategy:
The deduplication logic (link-extractor.html:237-238) compares both URL and title. This is appropriate for most use cases, but it means the same URL is listed more than once when it appears with different anchor text.
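The trade-off can be illustrated with a small sketch in which the comparison key is pluggable. `dedupeLinks`, `byUrlAndTitle`, and `byUrlOnly` are hypothetical names, and the `{ url, title }` object shape is an assumption about how links are represented.

```javascript
// Keep the first occurrence of each link. `keyFn` decides what counts as
// "the same link"; both key functions below are illustrative.
function dedupeLinks(links, keyFn) {
  const seen = new Set();
  const result = [];
  for (const link of links) {
    const key = keyFn(link);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(link);
    }
  }
  return result;
}

// Matches the behaviour the review describes: URL and title must both match.
const byUrlAndTitle = (link) => JSON.stringify([link.url, link.title]);

// Stricter alternative: one entry per URL, keeping the first title seen.
const byUrlOnly = (link) => link.url;
```

Given the same URL linked twice with different anchor text, `byUrlAndTitle` keeps both entries while `byUrlOnly` keeps only the first.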

Export Formats:

HTML format properly escapes content in the template (line 289)
Markdown generation is clean and follows standard syntax (line 296)
Plain text format provides good readability (line 300)
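For reference, the three export formats from the PR description could be generated along the lines of the sketch below. The function names, the `{ url, title }` link shape, and the blank line between plain-text entries are assumptions. The string-based `escapeHtml` is used only so the sketch runs outside a browser; according to the review, the tool itself escapes via textContent.

```javascript
// String-based HTML escaping (a stand-in for the tool's textContent approach).
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Unordered list with anchor tags; both href and link text are escaped.
function toHtmlList(links) {
  const items = links.map(
    (l) => `  <li><a href="${escapeHtml(l.url)}">${escapeHtml(l.title)}</a></li>`
  );
  return `<ul>\n${items.join('\n')}\n</ul>`;
}

// Bullet list with link syntax. Brackets and backslashes in the title are
// escaped so they cannot end the link text early. URLs are emitted as-is,
// so a URL containing parentheses is not handled here.
function toMarkdownList(links) {
  return links
    .map((l) => `- [${l.title.replace(/([\[\]\\])/g, '\\$1')}](${l.url})`)
    .join('\n');
}

// Title, newline, URL; entries separated by a blank line (an assumption).
function toPlainText(links) {
  return links.map((l) => `${l.title}\n${l.url}`).join('\n\n');
}
```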

🧪 Test Coverage

Current State: Like other tools in the repository, this follows the pattern of single-file utilities without formal test suites.

Testing Approach: The tool can be manually tested with:

Various rich text content from different websites
Edge cases (empty content, malformed links, very long URLs)
Different browsers for clipboard API compatibility

🔒 Security Assessment

No Security Concerns: The implementation properly handles user input with escaping and follows security best practices. The use of rel="noopener" on external links prevents potential security issues.

📋 Recommendations

Consider URL validation enhancement: Expand URL filtering to include other valid protocols (for example ftp or mailto) and relative URLs if needed
Performance monitoring: If this tool sees heavy use with very large content, consider adding performance optimizations
Documentation: The current inline comments are minimal but adequate for this scope

🎯 Conclusion

This is a solid implementation that meets the requirements and maintains consistency with the existing codebase. The code is secure, user-friendly, and follows established patterns. No blocking issues were identified.

The tool successfully implements all stated features:

✅ Renders extracted links with title and URL
✅ Copy as HTML, Markdown, and plain text
✅ Preview sections for each format
✅ Link deduplication
✅ Clean, responsive UI consistent with other tools

cloudflare-workers-and-pages · 2025-11-26T18:14:55Z

Deploying tools with Cloudflare Pages

Latest commit:	`73d900b`
Status:	⚡️ Build in progress...


claude · 2025-11-26T18:20:52Z

Claude finished @simonw's task

Code Review for Link Extractor Tool

Review Tasks:

Read repository guidelines (CLAUDE.md)
Analyze the new link-extractor.html file
Check for similar tools for comparison
Review code quality and best practices
Identify potential bugs or issues
Assess performance considerations
Evaluate security concerns
Check test coverage approach

✅ Strengths

Code Quality & Best Practices:

Consistent Architecture: Follows the same single-file HTML pattern as alt-text-extractor.html and other tools in the repository
Clean Structure: Well-organized CSS, semantic HTML, and modular JavaScript
Modern JavaScript: Proper use of ES6+ features (const/let, arrow functions, async/await)
Responsive Design: Mobile-friendly with proper viewport meta tag and flexible layouts

Security Implementation:

XSS Protection: Proper use of escapeHtml() function for all user content (link-extractor.html:299-302)
Safe Link Attributes: Links include rel="noopener" for security (link-extractor.html:274)
Content Security: Uses textContent for escaping rather than manual string manipulation

User Experience:

Progressive Enhancement: Copy functionality works with fallback for older browsers (link-extractor.html:339-360)
Visual Feedback: Clear success states and transitions for copy operations
Accessibility: Proper semantic HTML structure and keyboard navigation support

🔍 Areas for Improvement

1. Performance Considerations:

Link Processing: For pages with hundreds of links, the current approach should perform adequately, but consider debouncing if performance issues arise
DOM Updates: The renderResults() function rebuilds entire HTML strings - acceptable for expected use cases but could be optimized for very large link sets

2. Minor Edge Cases:

URL Validation: Line 224 only checks href.startsWith('http'). That matches both http and https URLs, but it excludes ftp, mailto, and relative URLs that might be useful
Empty Title Fallback: When link text is empty, it falls back to the full URL which could be very long in the UI

3. Code Consistency:

Timeout Value: The 10ms timeout in handlePaste() (line 210) appears arbitrary - consider documenting why this specific delay is needed
Button Text Reset: Lines 353-354 have inconsistent button text reset logic compared to the primary success flow

🏗️ Technical Architecture

Deduplication Strategy:
The deduplication logic (link-extractor.html:226-227) compares both URL and title. This is appropriate for most use cases, but it means the same URL is listed more than once when it appears with different anchor text.

Export Formats:

HTML format properly escapes content in the template (line 285)
Markdown generation is clean and follows standard syntax (line 292)
Plain text format provides good readability (line 296)

🧪 Test Coverage

Current State: Like other tools in the repository, this follows the pattern of single-file utilities without formal test suites.

Testing Approach: The tool can be manually tested with:

Various rich text content from different websites
Edge cases (empty content, malformed links, very long URLs)
Different browsers for clipboard API compatibility

🔒 Security Assessment

No Security Concerns: The implementation properly handles user input with escaping and follows security best practices. The use of rel="noopener" on external links prevents potential security issues.

📋 Recommendations

Consider URL validation enhancement: Expand URL filtering to include other valid protocols (for example ftp or mailto) and relative URLs if needed
Performance monitoring: If this tool sees heavy use with very large content, consider adding performance optimizations
Documentation: The current inline comments are minimal but adequate for this scope

🎯 Conclusion

This is a solid implementation that meets the requirements and maintains consistency with the existing codebase. The code is secure, user-friendly, and follows established patterns. No blocking issues were identified.

The tool successfully implements all stated features:

✅ Renders extracted links with title and URL
✅ Copy as HTML, Markdown, and plain text
✅ Preview sections for each format
✅ Link deduplication
✅ Clean, responsive UI consistent with other tools

Move copy buttons next to their relevant output sections

73d900b

Each export format (HTML, Markdown, Plain Text) now has its own copy button in the header row next to the format title, making it clearer which button copies which format.

simonw merged commit daa72d9 into main Nov 26, 2025
3 of 4 checks passed

simonw merged 2 commits into main from claude/link-extractor-tool-01MKLnrSd7hkMnMxk41BmNBm
