Skip to content

refactor: optimize image handling and simplify module architecture#6

Merged
gupsammy merged 3 commits into
mainfrom
refactor/optimize-image-handling
Nov 30, 2025
Merged

refactor: optimize image handling and simplify module architecture#6
gupsammy merged 3 commits into
mainfrom
refactor/optimize-image-handling

Conversation

@gupsammy
Copy link
Copy Markdown
Owner

Summary

This refactor consolidates image handling logic and eliminates unnecessary module abstraction layers. The primary improvement is moving image analysis into extractContent() to return pre-analyzed image data, eliminating redundant HTML parsing when handling images later.

  • Single-pass image extraction: analyzeImagesInHtml() replaces separate extractImagesFromHtml() and countImagesInHtml() calls
  • Pre-analyzed images in extraction result: Images are analyzed once during content extraction and passed through to content-script.js
  • Unified naming function: generateBaseName() consolidates generateFilename() and generateSubfolder() logic
  • Removed platform.js wrapper: Direct use of EzyCopyFiles module in content-script.js, reducing module layers
  • Fixed popup.js toggle synchronization: Both checkboxes now sync correctly when enforcement rules apply

Key Changes

  • lib/ezycopy.js: Replaced image extraction functions with unified analyzeImagesInHtml(), integrated image analysis into extractContent() flow, consolidated naming functions
  • content-script.js: Updated to consume pre-analyzed images from extraction result, removed redundant image extraction call, uses EzyCopyFiles directly
  • file-helpers.js: Removed generatePageSubfolder() function (replaced by generateBaseName)
  • injection-files.js: Removed lib/platform.js from injection list
  • popup.js: Fixed toggle state synchronization in both change handlers

Files Changed

  • lib/ezycopy.js: Core refactoring of image and naming functions
  • content-script.js: Updated to work with pre-analyzed image data
  • file-helpers.js: Removed duplicate naming function
  • injection-files.js: Simplified injection list
  • popup.js: UI state synchronization fix
  • lib/platform.js: Deleted (functionality moved to direct EzyCopyFiles usage)
  • CLAUDE.md: Updated documentation

This reduces complexity, improves performance by eliminating redundant parsing, and streamlines the data flow from extraction to download.

…tching

- Add calculateTextOverlap() to match Readability output fingerprints against CMS containers
- Refactor findBestContentContainer() to use hybrid approach: text overlap + image count tiebreaker
- Improves container detection on multi-column and gallery-heavy pages by better distinguishing article containers from sidebars
- Add includeImages option to extractContent() and skip fallback when disabled for performance
- Update content-script.js to pass includeImages setting through extraction pipeline
Consolidate image-related functions and eliminate unnecessary module layers:

- Replace separate extractImagesFromHtml() and countImagesInHtml() with unified analyzeImagesInHtml() for single-pass extraction
- Move image analysis into extractContent() to pre-populate images in extraction result (eliminates redundant parsing)
- Unify generateFilename() and generateSubfolder() into single generateBaseName() function
- Remove redundant platform.js wrapper layer, use EzyCopyFiles directly in content-script.js
- Update content-script.js to consume pre-analyzed images from extraction result instead of extracting separately
- Remove generatePageSubfolder() from file-helpers.js (replaced by generateBaseName)
- Remove lib/platform.js from injection list
- Update documentation to reflect simplified architecture
- Fix popup.js toggle synchronization to handle both checkboxes

This reduces code complexity, eliminates redundant DOM parsing, and streamlines the data flow.
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Nov 29, 2025

Claude finished @gupsammy's task —— View job


Code Review

Todo List:

  • Review lib/ezycopy.js refactoring
  • Check content-script.js integration with pre-analyzed images
  • Verify platform.js removal doesn't break functionality
  • Review popup.js toggle fix
  • Check for critical bugs or Chrome Extension API issues

Review Summary

This PR looks good to ship after fixing the image count bug below. The refactoring achieves its goal of eliminating redundant HTML parsing while simplifying the module architecture.


CRITICAL Issues

1. Image count calculation bug in analyzeImagesInHtml()

Location: lib/ezycopy.js:91

Issue: The function returns count: imgElements.length but the actual images array may be shorter due to filtering (data URIs, duplicates). This causes incorrect image loss ratio calculations in extractContent().

// Current code returns total count, but images array is filtered
return { count: imgElements.length, images };

Impact: The image loss ratio calculation at line 335-337 will be wrong:

const containerImageCount = container.querySelectorAll('img').length;
const readabilityAnalysis = analyzeImagesInHtml(article.content);
const imageLossRatio = containerImageCount > 0
  ? (containerImageCount - readabilityAnalysis.count) / containerImageCount  // <- uses total count, not filtered count
  : 0;

If a page has 10 images but 5 are data URIs, readabilityAnalysis.count will be 10 but readabilityAnalysis.images.length will be 5. The image loss calculation will be wrong, potentially triggering the fallback incorrectly.

Fix: Return the filtered count instead:

return { count: images.length, images };

I've left an inline comment with a code suggestion on the specific line.


Architecture Notes (non-blocking)

The refactoring successfully:

  • ✅ Eliminates redundant HTML parsing by doing single-pass image extraction
  • ✅ Removes unnecessary abstraction layer (platform.js)
  • ✅ Consolidates naming functions (generateFilename + generateSubfolder → generateBaseName)
  • ✅ Fixes popup toggle synchronization bug
  • ✅ Passes pre-analyzed images through extraction result

The code is well-structured and the performance optimization is solid. Once the count bug is fixed, this is ready to merge.


Comment thread lib/ezycopy.js Outdated
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Nov 29, 2025

Claude encountered an error —— View job


Re-reviewing PR Changes

Todo List:

  • Check if critical image count bug was fixed
  • Verify image loss ratio calculation is correct
  • Scan for any new issues
  • Provide final review

Critical bug fixed! Line 91 now correctly returns count: images.length (filtered count)

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Nov 29, 2025

Claude encountered an error —— View job


I'll analyze this and get back to you.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Nov 30, 2025

Claude finished @gupsammy's task —— View job


Re-reviewing PR After Latest Push

Todo List:

  • Verify critical image count bug fix at line 91
  • Review lib/ezycopy.js for other critical issues
  • Check content-script.js integration
  • Scan other changed files for bugs
  • Provide final review verdict

Final Review: ✅ Ready to Merge

The critical bug has been fixed correctly. Line 91 in lib/ezycopy.js now returns count: images.length (the filtered count), which matches the actual images array length. This ensures the image loss ratio calculation at lines 335-337 works properly.


Verification Summary

Critical bug fix confirmed:

  • analyzeImagesInHtml() at line 91 now returns filtered count
  • ✅ Image loss ratio calculation will work correctly
  • ✅ Fallback logic will trigger appropriately

Architecture review:

  • ✅ Single-pass image extraction eliminates redundant parsing
  • ✅ Pre-analyzed images correctly consumed in content-script.js:77-92
  • ✅ Direct EzyCopyFiles usage (platform.js removed) works fine
  • ✅ Popup toggle synchronization fixed (lines 52-54, 64-67 in popup.js)
  • ✅ All integration points check out

No critical/high issues found. The refactoring achieves its performance goals without introducing bugs or Chrome Extension API violations.


@gupsammy gupsammy merged commit 1b597bd into main Nov 30, 2025
1 of 3 checks passed
@gupsammy gupsammy deleted the refactor/optimize-image-handling branch November 30, 2025 05:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant