
--file-scope is broken for epub #8741

Closed
BBBalls opened this issue Apr 1, 2023 · 5 comments

BBBalls commented Apr 1, 2023

Linux POP!_OS 22.04
pandoc 3.1.2
Calibre 6.14

When an epub is created from pandoc markdown files using the --file-scope option, the output epub is not broken up into multiple files, even when the --split-level option is given, yet the generated TOC expects a multi-file epub.

To open the demonstration epub outputs, change the .zip extension to .epub; GitHub doesn't allow uploading .epub files.

metadata.md
test1.md
test2.md

Test1:
pandoc -f markdown -t epub --file-scope --metadata-file metadata.md -o file-scope-test.epub test1.md test2.md

output:
file-scope-test.zip

screenshot selecting a chapter:
test1

The output epub is not split into multiple files, despite the default behavior of splitting at level 1 headings, and the footnotes are formatted as in a single-file epub.

Test2:
pandoc -f markdown -t epub --file-scope --split-level=1 --metadata-file metadata.md -o file-scope-test_split-level.epub test1.md test2.md

output:
file-scope-test_split-level.zip

screenshot selecting a chapter:
test2

The output epub is not split into multiple files even when explicitly instructed to split at level 1 headings, and the footnotes are formatted as in a single-file epub.

Test3:
This test is to show that pandoc behaves as expected when the --file-scope option is not used.

pandoc -f markdown -t epub --metadata-file metadata.md -o file-scope-test_no-file-scope.epub test1.md test2.md

output:
[WARNING] Duplicate note reference '1' at test2.md line 7 column 1
[WARNING] Duplicate note reference '2' at test2.md line 9 column 1
file-scope-test_no-file-scope.zip

screenshot selecting a chapter:
test3

The output epub is split into multiple files according to the default behavior of splitting at level 1 headings, and there are conflicting footnotes (as expected).

BBBalls added the bug label Apr 1, 2023
jgm (Owner) commented Apr 2, 2023

That's quite unexpected, because --file-scope should only affect the reader, and --split-level only the writer.
Do you see a difference in native output with and without --file-scope?

BBBalls (Author) commented Apr 2, 2023

Here are the outputs you requested. There are differences between the two native outputs.

pandoc -f markdown -t native --metadata-file metadata.md -o native_file-scope-test_no_file-scope.txt test1.md test2.md

output:
[WARNING] Duplicate note reference '1' at test2.md line 7 column 1
[WARNING] Duplicate note reference '2' at test2.md line 9 column 1
native_file-scope-test_no_file-scope.txt

pandoc -f markdown -t native --file-scope --metadata-file metadata.md -o native_file-scope-test_with_file-scope.txt test1.md test2.md

output:
native_file-scope-test_with_file-scope.txt

Additional Testing
In doing some more testing, I found that removing the --metadata-file option changes the output: most notably, the TOC then works as expected.

Test4:
pandoc -f markdown -t epub --file-scope -o file-scope-test_2.epub test1.md test2.md

output:
[WARNING] This document format requires a nonempty <title> element.
Defaulting to 'test1' as the title.
To specify a title, use 'title' in metadata or --metadata title="...".
file-scope-test_2.zip

The TOC works correctly, but the document was not split into multiple files at level 1 headings. --file-scope took effect, but the output is laid out as a single-file document (unexpected behavior).

Test5:
pandoc -f markdown -t epub --file-scope --split-level=1 -o file-scope-test_split-level_2.epub test1.md test2.md

output:
This document format requires a nonempty <title> element.
Defaulting to 'test1' as the title.
To specify a title, use 'title' in metadata or --metadata title="...".
file-scope-test_split-level_2.zip

The TOC works correctly, but the document was still not split into multiple files at level 1 headings, even with --split-level=1. --file-scope took effect, but the output is laid out as a single-file document (unexpected behavior).

Here is the native output without the --metadata-file option:

pandoc -f markdown -t native --file-scope -o native_file-scope-test_no_metadata_with_file-scope.txt test1.md test2.md

output:
native_file-scope-test_no_metadata_with_file-scope.txt

jgm (Owner) commented Apr 2, 2023

OK, I see the issue. With --file-scope, we get:

, Div
    ( "test2.md" , [] , [] )
    [ Header
        1
        ( "test2.md__test-2" , [] , [] )
        [ Str "test" , Space , Str "2" ]

That extra Div (which keeps track of the source file name) is interfering with the splitting.
That's a diagnosis, not a solution, but at least I understand now.
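To see why a wrapper Div can defeat splitting, here is a toy model in Python. It is not pandoc's makeSections (which is Haskell and handles far more cases); the Header/Para/Div classes and split_chapters are hypothetical simplifications that only start a new chapter at top-level headers and never look inside a Div, roughly mirroring the opaque treatment of the per-file Div described in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical, heavily simplified stand-ins for pandoc's Header, Para and Div blocks.
@dataclass
class Header:
    level: int
    ident: str


@dataclass
class Para:
    text: str


@dataclass
class Div:
    ident: str
    content: List["Block"] = field(default_factory=list)


Block = Union[Header, Para, Div]


def split_chapters(blocks: List[Block], split_level: int = 1) -> List[List[Block]]:
    """Start a new chunk at each top-level Header whose level is <= split_level.

    Headers nested inside a Div are never inspected: the Div is treated as
    one opaque block and simply appended to the current chunk.
    """
    chunks: List[List[Block]] = []
    for b in blocks:
        if not chunks or (isinstance(b, Header) and b.level <= split_level):
            chunks.append([b])
        else:
            chunks[-1].append(b)
    return chunks


# Without --file-scope: headers are top-level blocks.
flat = [Header(1, "test-1"), Para("one"), Header(1, "test-2"), Para("two")]

# With --file-scope: each file's blocks are wrapped in a Div named after the file.
scoped = [
    Div("test1.md", [Header(1, "test1.md__test-1"), Para("one")]),
    Div("test2.md", [Header(1, "test2.md__test-2"), Para("two")]),
]

print(len(split_chapters(flat)))    # 2
print(len(split_chapters(scoped)))  # 1
```

With the flat input the two level 1 headers yield two chapters; once each file's blocks sit inside a Div, the headers are no longer top-level and everything lands in one chapter, which matches the single-file epub reported above.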

aaron-meyers commented

Just ran into this as well. Because all of my content ends up in a single XHTML file in the epub, that file is very large, and at least the Calibre epub viewer hangs completely trying to open it. Without --file-scope I was getting a separate XHTML file for each level 1 heading, as noted above. I'll avoid --file-scope for now, but unfortunately that means I need to renumber the footnotes in my input markdown files, since they currently reuse the same numbers in each file.
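One stopgap for the footnote renumbering is to make labels unique per file mechanically and then build without --file-scope. The Python sketch below is a hypothetical helper, not part of pandoc: it prefixes each `[^label]` reference and `[^label]:` definition with the file's stem, so `[^1]` in test2.md becomes `[^test2-1]`. It is regex-based and does not parse Markdown, so it would also rewrite look-alike text inside code spans or code blocks; check the output before relying on it.

```python
import re
from pathlib import Path

# Matches a footnote label such as [^1] or [^note]. The same pattern covers
# references ("[^1]") and the start of definitions ("[^1]: ...").
FOOTNOTE_LABEL = re.compile(r"\[\^([^\]\s]+)\]")


def prefix_footnotes(text: str, prefix: str) -> str:
    """Rewrite every footnote label in `text` as `<prefix>-<label>`."""
    return FOOTNOTE_LABEL.sub(lambda m: f"[^{prefix}-{m.group(1)}]", text)


def prefix_file(path: Path, out_dir: Path) -> Path:
    """Write a copy of `path` into `out_dir`, prefixing labels with the file stem."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out = out_dir / path.name
    text = path.read_text(encoding="utf-8")
    out.write_text(prefix_footnotes(text, path.stem), encoding="utf-8")
    return out


sample = "Text.[^1]\n\n[^1]: A note."
print(prefix_footnotes(sample, "test2"))
# Text.[^test2-1]
#
# [^test2-1]: A note.
```

The rewritten copies could then be passed to the usual command, for example `pandoc -f markdown -t epub --metadata-file metadata.md -o book.epub build/test1.md build/test2.md`, where `build/` is an arbitrary output directory chosen for this example.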

jgm (Owner) commented Apr 20, 2024

Note: I think the best place to apply a fix would be in makeSections:
https://github.com/jgm/pandoc/blob/main/src/Text/Pandoc/Shared.hs#L546-L560
Currently this will treat the Div displayed in my comment above as opaque (and not split sections within it), because the Div and the Header under it have distinct identifiers, and we want to preserve both of them as anchors. I'm not sure what the workaround should be, though. We don't want to throw away either id or we lose a potential anchor for internal links. But in this case throwing away the outer id (corresponding to the filename) is probably the best approach; I don't think it will be the target of any automatically generated internal links.

To make this robust, we should have file-scope put a special marker on the Divs it generates to hold the sections from a particular file, so that makeSections can ignore them...

Or, even easier: instead of having file-scope add an id, we could have it add something like data-source=test1.md.
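The data-source idea can be sketched with a toy Python model. The classes and functions are hypothetical, not pandoc's Haskell code, and not necessarily what commit 30442b7 actually does: the reader marks each per-file Div with a data-source attribute and no identifier, and a pre-pass splices such Divs into their parent before chapters are split. Because the marked Div carries no id, unwrapping it discards no anchor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


# Hypothetical, simplified stand-ins for pandoc's Header, Para and Div blocks.
@dataclass
class Header:
    level: int
    ident: str


@dataclass
class Para:
    text: str


@dataclass
class Div:
    ident: str
    attrs: Dict[str, str] = field(default_factory=dict)
    content: List["Block"] = field(default_factory=list)


Block = Union[Header, Para, Div]


def unwrap_source_divs(blocks: List[Block]) -> List[Block]:
    """Splice in the contents of id-less Divs marked with data-source."""
    out: List[Block] = []
    for b in blocks:
        if isinstance(b, Div) and not b.ident and "data-source" in b.attrs:
            out.extend(unwrap_source_divs(b.content))
        else:
            out.append(b)  # Divs with an id are kept intact so no anchor is lost
    return out


def split_chapters(blocks: List[Block], split_level: int = 1) -> List[List[Block]]:
    """Start a new chunk at each top-level Header whose level is <= split_level."""
    chunks: List[List[Block]] = []
    for b in blocks:
        if not chunks or (isinstance(b, Header) and b.level <= split_level):
            chunks.append([b])
        else:
            chunks[-1].append(b)
    return chunks


# What the reader might emit if --file-scope used data-source instead of an id.
scoped = [
    Div("", {"data-source": "test1.md"}, [Header(1, "test1.md__test-1"), Para("one")]),
    Div("", {"data-source": "test2.md"}, [Header(1, "test2.md__test-2"), Para("two")]),
]

chapters = split_chapters(unwrap_source_divs(scoped))
print([c[0].ident for c in chapters])  # ['test1.md__test-1', 'test2.md__test-2']
```

Without the unwrap pass, split_chapters(scoped) returns a single chunk; with it, each file's level 1 header starts its own chapter.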

jgm closed this as completed in 30442b7 Apr 20, 2024