
pagedjs throws away text #130

Open
greekcntr opened this issue Apr 3, 2023 · 14 comments

Comments

@greekcntr

You can see on line 28 of the provided example that the text of the footnote “6ωσ 01  - 02 ” is simply thrown away and not displayed at all! This is presumably a problem with the footnote spacing algorithm. I cut this example down to be as minimal as possible while still preserving the fact that it deletes text. Obviously, changing anything that affects the spacing will cause the reading to reappear on the page, because that affects whatever algorithm is used to lay out the footnotes. But then text would be deleted on other pages because of the same problem.

One possible clue: if you remove “text-align: justify”, the vertical spacing is not affected at all, so in theory the footnote spacing algorithm should not be affected either, and yet the missing footnote then appears. Very strange.

I have found other bugs that I plan on reporting, for which I was able to find workarounds, but this bug that deletes text is highly problematic. Out of 500+ pages it only deletes a few footnotes, but I cannot afford for it to randomly delete footnotes. I would very much like to use pagedjs, so please fix this bug, or provide a workaround that prevents it from deleting the text.

problem4.txt

@julientaq
Collaborator

Hi there.
First, your txt file is not valid HTML: the class attributes on the elements are not surrounded by double quotes.

Then, there is a bug in paged.js when a hyphen appears at the end of a page.

You can try removing the last hyphenated word with a script:

class noHyphenBetweenPage extends Paged.Handler {
  constructor(chunker, polisher, caller) {
    super(chunker, polisher, caller);
    this.hyphenToken;
  }

  afterPageLayout(pageFragment, page, breakToken) {
    // find the hyphenated word at the end of the page, if any
    let block = pageFragment.querySelector('.pagedjs_hyphen');

    if (block) {
      block.dataset.ref = this.prevHyphen;

      // length of the last word, used to move the breakToken back
      let finalWord = getFinalWord(block.innerHTML);
      let offsetMove = finalWord.length;

      // move the token accordingly
      page.breakToken = page.endToken.offset - offsetMove;

      // remove the last word; slice from the end, because String.replace
      // would remove the first occurrence of the word, not the last one
      block.innerHTML = block.innerHTML.slice(0, block.innerHTML.length - offsetMove);

      breakToken.offset = page.endToken.offset - offsetMove;
    }
  }
}

Paged.registerHandlers(noHyphenBetweenPage);

// return the last space-separated word of a string
function getFinalWord(str) {
  return str.split(' ').pop();
}
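The workaround above hinges on two string operations: finding the last space-separated word and removing it from the end of the block. A minimal standalone sketch of just that step, runnable outside the browser, might look like this. `stripFinalWord` is a hypothetical helper, not part of paged.js; it slices from the end because `String.prototype.replace` with a string pattern removes the first occurrence of the word, which may sit earlier in the text.

```javascript
// Hypothetical helper (not a paged.js API): split off the last
// space-separated word of a string, mirroring getFinalWord() above.
function stripFinalWord(str) {
  const finalWord = str.split(" ").pop();
  // Cut the word off the end; "one word one".replace("one", "") would
  // instead give " word one", removing the first occurrence.
  const trimmed = str.slice(0, str.length - finalWord.length);
  return { trimmed, removedLength: finalWord.length };
}

const result = stripFinalWord("one word one");
console.log(JSON.stringify(result.trimmed)); // "one word "
console.log(result.removedLength);           // 3
```

If the string ends in a space, the "final word" is empty and nothing is removed, which matches how `getFinalWord` behaves.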



@julientaq
Collaborator

Tell us if that helps. In the meantime, we’ll figure out a way to preserve this hyphenation.

@greekcntr
Author

Thanks for your response. I tried adding the code you provided and there was no difference in the results. I have attached what I did in case I included the code wrong. Regardless, I could not see any place where a “hyphen appears at the end of the page”, because there is no hyphen in the text, only in the footnotes. (And I could not find a hyphen ending a page in the footnotes either.) Thanks for looking into this problem. I very much want to use pagedjs, and love what you are doing. I would appreciate any help you can give me in solving this problem, since the rest of it works great.

P.S. You might want to know: “The HTML standard does not require quotes around attribute values.” https://www.w3schools.com/html/html_attributes.asp They are needed if there is a space in the attribute, but even then “Double quotes around attribute values are the most common in HTML, but single quotes can also be used.”

problem4fix.txt

@greekcntr
Author

Also, I tried replacing all hyphens in it with another character, and the problem still remains.

@julientaq
Collaborator

Ha, I didn’t know that single/double quotes around attribute values are not mandatory. Still feels weird, but OK.

So your problem is that footnotes disappear between pages, right?

I can’t reproduce it. Do you mind sharing a PDF output with the missing note so I can understand a bit more?

@greekcntr
Author

greekcntr commented Apr 5, 2023

It is the first footnote on the page that disappears, not one between pages. This code

6ωσ 01  - 02 
is not being rendered, and should be the first footnote on the page. (Sorry, when I include the HTML it gets interpreted rather than shown as code.) Here is the PDF I get, which is the same whether I use the command line interface, Chrome, Firefox, or Edge.

problem4.pdf

@julientaq
Collaborator

This is what I get with Chrome on OS X, using the 0.4.1 polyfill:
sample.pdf

What OS are you using? Which version of paged.js?

@greekcntr
Author

greekcntr commented Apr 5, 2023

I am also using the 0.4.1 polyfill and running Windows 10. I just reinstalled pagedjs to make sure it was up to date, but the same problem persists. npm says I am running pagedjs 0.4.1 and pagedjs-cli 0.3.4. I have tried it with pagedjs-cli as well as Chrome, Firefox, and Edge, all with the same results. One thing of note is that the font in your PDF example is clearly different from the one in my example. Yours does not appear to be the 10pt font that is specified, so the lines break in different places than in my example. Changing the font, the margins, or anything else that affects the spacing could make that footnote appear, but then another footnote would disappear some pages later. I only showed a minimal snippet to demonstrate the problem, but it occurs several times throughout the 500+ pages I am working with.

@julientaq
Collaborator

Sure, but we need to find a way to reproduce your issue; it’s the first time we’ve had a report of a missing footnote. And it’s the first one in the list, which is even weirder.

So which font are you using that breaks everything?

@greekcntr
Author

Sorry about that. I was using the default font which is "Times New Roman". I added it to the attached html to make it explicit.

problem4.txt

@greekcntr
Author

Were you able to recreate the bug with that html?

@greekcntr
Author

In the meantime, do you have any kind of workaround that will stop it from deleting text? Obviously, changing the formatting could make all the footnote text appear on that page, but the underlying problem remains and it will delete footnote text on a different page, so that does not solve anything. Is there any kind of encoding I could use to make sure the footnote text is not deleted?
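Until the underlying bug is fixed, one way to at least catch silent deletions across 500+ pages is to compare the footnote texts in the source with the footnote texts in the rendered output. The sketch below assumes both lists have already been collected as arrays of strings (for example by querying the footnote elements in the source HTML and in the rendered pages; the selectors depend on your markup and are not shown). `findMissingFootnotes` and `normalize` are hypothetical helpers, not part of paged.js.

```javascript
// Hypothetical helpers (not a paged.js API): report source footnotes whose
// text does not appear among the rendered footnotes.

// Collapse runs of whitespace so line-wrapping differences are ignored.
function normalize(text) {
  return text.replace(/\s+/g, " ").trim();
}

function findMissingFootnotes(sourceTexts, renderedTexts) {
  // Count rendered occurrences so repeated footnote texts are matched one-to-one.
  const counts = new Map();
  for (const t of renderedTexts) {
    const key = normalize(t);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const missing = [];
  sourceTexts.forEach((t, index) => {
    const key = normalize(t);
    const n = counts.get(key) || 0;
    if (n > 0) {
      counts.set(key, n - 1); // consume one rendered match
    } else {
      missing.push({ index, text: t }); // no rendered counterpart left
    }
  });
  return missing;
}

const source = ["6ωσ 01  - 02", "second note", "third note"];
const rendered = ["second  note", "third note"];
console.log(findMissingFootnotes(source, rendered));
// [ { index: 0, text: '6ωσ 01  - 02' } ]
```

This only detects the problem, it does not prevent it. A footnote that is legitimately split across two pages would also be flagged, because neither rendered fragment matches the full source text.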

@NigelCunningham
Contributor

Hi there.

Please see #171, which I've just submitted for consideration. It includes a large rework of the pagination code that should help with this issue. I'd appreciate your feedback.

Regards,

Nigel

@greekcntr
Author

I dropped in the newer v0.4.3 paged.polyfill.js file, and didn't see any change in the problem. It still throws away the first footnote. Is there something else I should try?

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants