The option that is supposed to group duplicated pictures doesn't seem to work #578
Comments
This option is supposed to detect images that are used multiple times, transform them into background images, and use CSS custom properties to reference them. If I inspect the avatar of a user that appears more than once in the page, I can see that SingleFile worked as expected. You can verify this yourself: right-click an avatar that is used more than once and select "Inspect Element" in the context menu. This opens the developer tools with the corresponding element selected. If you look at the styles panel and scroll down a little, you will find the value of the custom property holding the image data. You can also verify that the custom property is referenced multiple times in the page.
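For illustration, the grouping described above produces output along these lines (the property and class names here are invented for the example, not SingleFile's actual generated names):

```html
<style>
  :root {
    /* the image data is stored only once, in a custom property */
    --img-0: url("data:image/png;base64,iVBORw0K...");
  }
  .img-0 {
    /* every occurrence of the image references the same property */
    background-image: var(--img-0);
  }
</style>
<!-- two avatars sharing a single copy of the payload -->
<div class="img-0"></div>
<div class="img-0"></div>
```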
I think the issue is related to the large images (e.g. the avatar of "Andiandi"). I am forced to cap the width and height of grouped images because Firefox does not support custom property values that are too large. As far as I know, this limit is not really documented, so I chose safe values. That could probably be optimized.
I'm closing the issue since the option works as expected.
I don't know if I can still add a new comment now that the issue has been marked as "closed". Thanks for the detailed explanations. I don't understand them fully, as my knowledge of HTML is very cursory, but I gather that this behavior is due to a compromise made necessary by a limitation in Firefox itself. So, as I understand it, if such large images were replaced by references to the first instance on a given page, the resulting files would not be displayed properly in Firefox; is that what you mean?

Aren't there other methods that could be used to reduce file sizes in such cases? One possibility would be an option to save images resized to their display size (which would be very small in the case of an avatar), instead of saving the actual source images, applied only to images displayed at a size smaller than their actual size.

FWIW, I had a similar issue a few months ago with this page, which was saved as a 263 MB HTML file:

Is there any way I could edit those files to remove the redundant data without breaking the code's compliance? What happens if I simply wipe (using WinHex) the base64 data for all instances of the problematic image: would it be displayed as a blank image, or would the file no longer load properly in Firefox or any compatible utility?

(By the way, is there any standalone utility that can at least view, and possibly edit, files in "enhanced HTML"? With the MHT format, formerly compatible with Firefox, I used BlockNote, which worked quite well. It doesn't properly display files created by SingleFile, but I still use it, for lack of a better option, to view (e)HTML files outside of Firefox.)
Regarding the "side note" part: I looked into SingleFileZ when I discovered SingleFile, but although there are advantages to saving pages in a compressed format, I prefer a plain-text format, which allows, for instance, searching for keywords inside files (I use Total Commander for that; it can also search inside common archive files, but much more slowly). Hence why I opted for SingleFile. Thanks again.
Files produced by SingleFileZ can be indexed; there is an option for that. Edit: this is a quick answer, I'll reply to the other points later.




Describe the bug
When saving a page which contains several instances of the same picture, each instance gets saved as an individual base64 stream, which can result in huge file sizes — even though the specific option meant to prevent that by replacing such redundant copies by references to the first instance is activated.
To Reproduce
For instance, this page:
https://www2.yggtorrent.si/torrent/filmvid%C3%A9o/animation-s%C3%A9rie/553077-dragon+ball+z+int%C3%A9grale+-+broadcast+audio-multi+dvdrip+x264-mirolo
...was saved as a whopping 108 MB HTML file, because each avatar picture got saved as an individual base64 stream, and one particular avatar from a member who posted many messages on that page (nickname "andiandi") has a size of 1124220 bytes in base64.
This happens even though, as I just verified, the option "regrouper les images dupliquées" (group duplicated pictures) was checked. With WinHex, I can verify that the file contains many strictly identical 1124220-byte blocks.
As a side note, saved pages don't seem to retain references to the URLs of the saved images, which would definitely be useful. For instance, I can't find the URL of the aforementioned avatar picture without going back to the online page (apparently the format is PNG, which could explain the large size).
Source code looks like this:
Expected behavior
In this case, as stated in the description for that option, only one instance of each picture in a page should be actually saved as a base64 stream, and any other copy should be saved as a mere reference to the first instance.
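Schematically (with invented names, for illustration only), the difference between what was observed and what the option should produce:

```html
<!-- observed: the same base64 payload inlined once per occurrence -->
<img src="data:image/png;base64,iVBORw0K...">
<img src="data:image/png;base64,iVBORw0K...">

<!-- expected: the payload stored once and referenced everywhere else -->
<style>:root { --img-0: url("data:image/png;base64,iVBORw0K..."); }</style>
<div style="background-image: var(--img-0)"></div>
<div style="background-image: var(--img-0)"></div>
```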
Environment