[not-issue] usage with SingleFile #1

andrey-jef · 2022-09-10T17:15:42Z

Hi.

Just wanna drop in and say thank you. I'm using SingleFile to archive websites into single HTML files. Now with your plugin, I can view the archived HTML files from within Obsidian. Not to mention that these notes also have their own metadata in frontmatter, which is awesome.

nuthrash · 2022-09-11T16:11:59Z

You are welcome! Thanks for your test report on SingleFile.
Besides SingleFile, I also use the "Print Edit WE" (with "Save Page WE") Firefox/Chrome extensions to archive a web page into a single HTML file. It is flexible enough to delete or hide useless HTML visual elements.

gildas-lormeau · 2022-09-12T11:25:50Z

> It is flexible enough to delete or hide useless HTML visual elements.

FYI, SingleFile also includes an annotation editor to do the same thing without installing any additional extension.

nuthrash · 2022-09-12T14:11:14Z

> FYI, SingleFile also includes an annotation editor to do the same thing without installing any additional extension.

That's great! I think SingleFile's annotation editor is well suited to common cases.
However, in my case, I sometimes prefer to cut out useless visual elements without restriction, so that the main text area I want to keep expands to fill almost the full page. It seems that SingleFile cannot do this, or maybe I didn't find the right way?

In fact, I use SingleFile and other tools to save different kinds of HTML files, depending on which tool preserves the data I want to keep. For example, when I wanna keep the image source addresses pointing to the original remote site, I simply use the browser's built-in save feature to save the HTML files.

gildas-lormeau · 2022-09-12T18:12:30Z

> However, in my case, I sometimes prefer to cut out useless visual elements without restriction, so that the main text area I want to keep expands to fill almost the full page. It seems that SingleFile cannot do this, or maybe I didn't find the right way?

I confirm the editor cannot remove margins. The use case I have in mind is rather the removal of ads and other unwanted elements while keeping the style. However, the editor also allows you to save the page as it appears in reader mode. I think this feature could be suitable for your needs.

nuthrash · 2022-09-13T06:34:11Z

> I confirm the editor cannot remove margins. The use case I have in mind is rather the removal of ads and other unwanted elements while keeping the style. However, the editor also allows you to save the page as it appears in reader mode. I think this feature could be suitable for your needs.

Oh, SingleFile's reader mode is the best reader mode I have ever seen; it works perfectly in 99% of cases.

By the way, I think you already know there are some problems with the reader mode:

1. Reader mode is not supported on all websites and pages. (After testing some unsupported pages, I guess SingleFile's reader mode relies on this mechanism.)
2. It also disables many styling effects.

The first problem can be overcome by using "force enable reader mode" extensions.
But the second problem cannot be resolved, because reader mode by definition disables some styling effects.
Therefore, I cannot use reader mode to capture web content, especially when I wanna keep the styling of code snippets.

gildas-lormeau · 2022-09-15T00:26:50Z

Actually, SingleFile uses the Firefox implementation for the reader mode, see https://github.com/mozilla/readability. I agree that the reader mode might be too destructive for your use case.

BTW, I added a link to your project here: https://github.com/gildas-lormeau/SingleFile/blob/master/README.MD#projects-usingcompatible-with-singlefile

nuthrash · 2022-09-15T15:05:21Z

> Actually, SingleFile uses the Firefox implementation for the reader mode, see https://github.com/mozilla/readability.

That's a very useful clue! I might investigate something interesting about the reader mode.

> BTW, I added a link to your project here: https://github.com/gildas-lormeau/SingleFile/blob/master/README.MD#projects-usingcompatible-with-singlefile

It's an honor for my project to be added to SingleFile's compatibility list. This means a lot to me.
And thank you very much for bringing us such an excellent extension; SingleFile makes capturing information simpler.

scruel · 2022-10-19T10:52:06Z

Can you support the Mozilla Archive Format files generated by SingleFileZ? Thanks~

@gildas-lormeau Catch ya! :)

nuthrash · 2022-10-19T16:31:43Z

> Can you support the Mozilla Archive Format files generated by SingleFileZ? Thanks~

Hmm, I think this plugin cannot open the HTML files generated by SingleFileZ.

I am new to the SingleFileZ web extension, so I tried to parse a file generated by it (e.g. abc.zip.html) and a standard .maff (Mozilla Archive Format) file.
I think the file generated by SingleFileZ is not a Mozilla Archive Format file; the two use different document formats to store the HTML and related files.

Take this .maff file: https://www.amadzone.org/mozilla-archive-format/maff-test-cases/test-basic-type-html.maff. It is a pure ZIP file, and its content starts with the "PK" string followed by standard ZIP binary data.

By contrast, abc.zip.html is a pure HTML file, and its content looks like <html> .... <xmp>![CDATA....</xmp></html> . It seems that SingleFileZ compresses the web content into binary data and puts it in the <xmp>...</xmp> section.
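As a rough illustration of the "PK" observation, a check on the first bytes of a file might look like the sketch below. The function name and the sample bytes are hypothetical, and the check only looks at the start of the file, so it says nothing about files that carry other data before the ZIP content.

```javascript
// Local file header signature that opens a non-empty plain ZIP archive: "PK\x03\x04".
const ZIP_LOCAL_HEADER = [0x50, 0x4b, 0x03, 0x04];

// Returns true when the first four bytes match the ZIP local file header signature.
// (Hypothetical helper name, used only for this illustration.)
function startsWithZipSignature(bytes) {
  if (bytes.length < ZIP_LOCAL_HEADER.length) return false;
  return ZIP_LOCAL_HEADER.every((value, index) => bytes[index] === value);
}

// Made-up sample data, not taken from real files.
const plainZip = new Uint8Array([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00]);
const htmlLike = new TextEncoder().encode("<html><xmp>");

console.log(startsWithZipSignature(plainZip)); // true
console.log(startsWithZipSignature(htmlLike)); // false
```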

The SingleFileZ project says it can "save a webpage as a self-extracting HTML file", which I think explains many things.
Obsidian blocks many access operations to avoid XSS attacks, which means the "self-extracting" operation would be blocked.

If you really want to see the content of compressed HTML files in Obsidian, I think the simplest way is to re-save them as plain-text HTML files with the original SingleFile browser extension.

gildas-lormeau · 2022-10-19T17:13:12Z

@nuthrash Files produced by SingleFileZ are not pure HTML files. They are invalid HTML files (the HTML specification does not allow embedding raw binary data in the markup) but in fact 100% valid zip files. Indeed, the zip specification does not require a zip file to begin with "PK". It allows arbitrary data to be stored before (and after) the zip data. I know that because I'm the author of zip.js, see https://github.com/gildas-lormeau/zip.js.
So, files produced by SingleFileZ are zip files but disguised as HTML files. From a technical point of view, this is actually very similar to self-extracting executable files (e.g. driver installation programs on Windows). The main difference is that instead of embedding an additional binary program to unzip the file, the HTML page embeds a JavaScript script to unzip the file (and display the saved page). Thus, if you/Obsidian allow the JavaScript code to run, the page saved with SingleFileZ should simply work. This is typically what happens when you open https://gildas-lormeau.github.io/. Otherwise, you would need to add the code that unzips the file and displays the page in your plugin.
Finally, there are also options in SingleFileZ to save pages as non-extractable zip files (i.e. pure binary files beginning with "PK") and compatible with the MAFF specification. I guess this is what @scruel is referring to. In this case, you would also need to add the code that unzips the file and displays the page in your plugin.
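To illustrate why a zip file need not begin with "PK": zip readers generally locate the end-of-central-directory record by scanning backward from the end of the file. The sketch below is a minimal illustration of that scan only. The helper name and the sample bytes are hypothetical, and this is not code from zip.js or SingleFileZ; a real reader would also validate the record it finds.

```javascript
// End-of-central-directory (EOCD) signature: "PK\x05\x06".
const EOCD_SIGNATURE = [0x50, 0x4b, 0x05, 0x06];
const EOCD_MIN_SIZE = 22; // size of the record without its trailing comment
const MAX_COMMENT_SIZE = 0xffff; // the trailing comment is at most 65535 bytes

// Scans backward from the end and returns the EOCD offset, or -1 if none is found.
// (Hypothetical helper name, used only for this illustration.)
function findEndOfCentralDirectory(bytes) {
  const lowest = Math.max(0, bytes.length - EOCD_MIN_SIZE - MAX_COMMENT_SIZE);
  for (let offset = bytes.length - EOCD_MIN_SIZE; offset >= lowest; offset--) {
    if (EOCD_SIGNATURE.every((value, index) => bytes[offset + index] === value)) {
      return offset;
    }
  }
  return -1;
}

// An empty zip archive is just a 22-byte EOCD record.
const emptyZip = new Uint8Array(EOCD_MIN_SIZE);
emptyZip.set(EOCD_SIGNATURE, 0);

// Made-up "disguised" file: some HTML text followed by the zip data.
const prefix = new TextEncoder().encode("<html><body>");
const disguised = new Uint8Array(prefix.length + emptyZip.length);
disguised.set(prefix, 0);
disguised.set(emptyZip, prefix.length);

console.log(findEndOfCentralDirectory(disguised)); // 12
```

The record is found at offset 12 even though the file starts with "<html>", which is why a check on the first bytes alone cannot rule out zip data.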

nuthrash · 2022-10-20T00:46:51Z

> Thus, if you/Obsidian allow the JavaScript code to run, the page saved with SingleFileZ should simply work.

@gildas-lormeau Obsidian blocks such JavaScript operations in external files (such as .html, .md, etc.) by default. I confirmed this using the most dangerous function, insertAdjacentHTML(), and it shows:

Error: Cannot open the page from the filesystem.
    Chrome: Install SingleFileZ and enable the option "Allow access to file URLs" in the details page of the extension (chrome://extensions/?id=offkdfbbigofcgdokjemgjpdockaafjg).
    Microsoft Edge: Install SingleFileZ and enable the option "Allow access to file URLs" in the details page of the extension (edge://extensions/?id=gofneaifncimeglaecpnanbnmnpfjekk).
    Safari: Select "Disable Local File Restrictions" in the "Develop" menu.

I have some questions:

1. Is the "SingleFileZ" web extension necessary? I opened an xxx.zip.html file in Opera and it shows the same message. (NOTE: Obsidian is based on Electron, which embeds a Chromium browser.)
2. How to detect a .html file made by "SingleFileZ"?
3. Is there an npm package that can convert/decode SingleFileZ's .html content to a standard HTML string?

gildas-lormeau · 2022-10-20T01:10:50Z

> I have some questions:
>
> 1. Is the "SingleFileZ" web extension necessary? I opened an xxx.zip.html file in Opera and it shows the same message. (NOTE: Obsidian is based on [Electron](https://www.electronjs.org/), which embeds a Chromium browser.)

Unfortunately, it is necessary to install SingleFileZ to view pages from the filesystem in Chromium-based browsers, because they don't allow running fetch("") (which retrieves the displayed page as binary data) when the page is opened from the filesystem. It looks like the same limitation applies in Obsidian.

> 2. How to detect a .html file made by "SingleFileZ"?

The file can be unzipped and it contains an index.html file in the root folder or the first folder of the zip file (for MAFF files). In addition, for self-extracting pages, the <html> tag contains the attribute data-sfz.
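As a rough sketch of the second check, assuming the start of the file has already been decoded to text, one could look for the data-sfz attribute on the opening <html> tag. The function name is hypothetical and the regular expressions are a heuristic, not a full HTML parser; only the data-sfz attribute itself comes from the description above.

```javascript
// Returns true when the opening <html> tag carries a data-sfz attribute.
// (Hypothetical helper name; a heuristic rather than a full HTML parser.)
function looksLikeSelfExtractingPage(text) {
  // Find the opening <html ...> tag, ignoring case.
  const match = /<html\b[^>]*>/i.exec(text);
  if (!match) return false;
  // Look for data-sfz as a whole attribute name inside that tag.
  return /\sdata-sfz(?=[\s=>\/])/i.test(match[0]);
}

// Made-up sample inputs.
console.log(looksLikeSelfExtractingPage("<!DOCTYPE html><html data-sfz><head>")); // true
console.log(looksLikeSelfExtractingPage('<html lang="en"><head>')); // false
```

Since these files also contain binary data, it is probably safer to decode only the first part of the file before running such a check.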

> 3. Is there an npm package that can convert/decode SingleFileZ's .html content to a standard HTML string?

The function extractPage in the code below (heavily inspired by this gist) should help you.

import { extract } from "https://raw.githubusercontent.com/gildas-lormeau/SingleFileZ/master/src/single-file/processors/compression/compression-extract.js";
import * as zip from "https://raw.githubusercontent.com/gildas-lormeau/zip.js/master/index.js";
// Expose zip.js as a global before calling extract().
globalThis.zip = zip;

// Takes the saved page as a Blob and resolves to the extracted document content.
async function extractPage(zipBlob) {
  const { docContent } = await extract(zipBlob, { noBlobURL: true });
  return docContent;
}

You can also use local imports instead of retrieving scripts from raw.githubusercontent.com by importing single-filez-core and zip.js from NPM, and replacing "https://raw.githubusercontent.com/gildas-lormeau/SingleFileZ/master/src/single-file" with "single-filez-core" and "https://raw.githubusercontent.com/gildas-lormeau/zip.js/master/index.js" with "@zip.js/zip.js".

nuthrash added the "good first issue" label Sep 12, 2022

nuthrash closed this as completed Sep 22, 2022

nuthrash mentioned this issue Oct 20, 2022: How to correctly parse the contents of the HTML files generated by SingleFileZ #4 (Closed)
