MD/LaTeX -> PDF: bibliography running off page when penalties=10000 #3255

Closed
jextxadore opened this issue Nov 24, 2016 · 12 comments

jextxadore commented Nov 24, 2016

Pandoc version: 1.16.0.2

Problem: when \clubpenalty=10000 and/or \widowpenalty=10000 are set in the preamble, the bibliography starts one page later than it should, leaving a blank page before it (in the attached image it starts on p. 13 but should start on p. 12), and it tries to fit on a single page, running past the bottom margin.

Commenting out the penalties results in the bibliography displaying all the entries and respecting page margins, but with poor formatting — the rest of the document is also poorly formatted as a result.

Headers are Markdown-style #; LaTeX formatting commands are also used (\textsc{} etc.). I have not set "sloppy" or anything similar.

Expected: a bibliography that starts on p. 12 and continues onto the following pages, formatted without widows or orphans.

Command (it uses bash variables because I run it from a script that reads the input files from a list):

pandoc \
--latex-engine=xelatex \
-H preamble.tex \
-V fontsize=11pt -V classoption:oneside -V papersize:a4paper -V documentclass:report \
--chapters \
--number-sections \
--bibliography=~/Dropbox/University/Mendeley-bibtexsync/x.bib \
--csl=~/Dropbox/Academic-latex/Bibliography/apa.csl \
$INPUT \
-o "$EXAMNUMBER"_"$WORDCOUNT"\ words_"$TITLE".pdf

preamble.tex:

\usepackage{tabu}
\DeclareTextCommandDefault{\nobreakspace}{\leavevmode\nobreak\ }

\usepackage[T1]{fontenc}
\usepackage[british]{babel}
\usepackage[autostyle=true,english=british]{csquotes}
\usepackage[onehalfspacing]{setspace}
\setlength{\parskip}{8pt}
\setlength{\parindent}{2em}
\usepackage[includeheadfoot,margin=2.5cm]{geometry}
\usepackage{lscape}
\usepackage[font={footnotesize,sf}]{caption}
\usepackage[multiple]{footmisc}

\usepackage{palatino}
\usepackage{helvet}
\usepackage{sectsty}
   \sectionfont{\sffamily}
   \subsectionfont{\sffamily}
   \subsubsectionfont{\sffamily}
\usepackage{fancyhdr}
\fancyhf{}
\pagestyle{fancy}
\fancyhead[L]{examnumber}
\fancyhead[R]{\thepage}
\fancypagestyle{plain}{}

% Orphans & widows
%\interfootnotelinepenalty=10000
\clubpenalty=10000
\widowpenalty=10000
%\brokenpenalty=10000

\setcounter{secnumdepth}{0}
\usepackage{siunitx}
\sisetup{detect-all}
\setcounter{secnumdepth}{3}


jgm commented Nov 25, 2016

The penalties were your addition, not default pandoc output. You say

Commenting out the penalties results in the bibliography displaying all the entries and respecting page margins, but with poor formatting — the rest of the document is also poorly formatted as a result.

Can you be more precise about the "poor formatting" when clubpenalty and widowpenalty aren't used?

Have you tried setting these penalties to lower values?

jextxadore commented Nov 25, 2016

Without \clubpenalty and \widowpenalty, orphans and widows appear. See the last line on the left page and the first line on the right page: ideally at least two lines of the paragraph should be on each page, since a single stranded line goes against convention.


With \widowpenalty and \clubpenalty set to 5000, the page skipping happens again. At 3000 and 4000, though, the output is identical to not setting them at all. On the middle page (11), the text runs far past the bottom margin. It does eventually continue on the next page, but some text is lost (i.e. it is visible on neither page 10 nor page 12). The bibliography should start on page 10.


jgm commented Nov 25, 2016

OK. This seems to be a LaTeX issue. I don't know what the best solution is, but there's no reason to have an issue here unless you can suggest some specific way in which pandoc's latex output should be different.

njbart commented Nov 25, 2016

This seems similar to jgm/pandoc-citeproc#264, and I have also seen such effects myself in the past.

I'm not sure what exactly causes this, but I think a case could be made that pandoc should output a LaTeX list rather than ordinary paragraphs. LaTeX's native bibliography environment, also used by BibTeX and biblatex, is a list environment too, and probably for excellent reasons.

To avoid formatting issues like those described in the OP, and to obtain a hanging-indent format, I have for years been using a filter that transforms LaTeX reference-list entries such as

Doe, J. (2000) \emph{Title.}

Roe, R. (2001) \emph{Title2.}

to

\begin{references}
\item Doe, J. (2000) \emph{Title.}
\item Roe, R. (2001) \emph{Title2.}
\end{references}

which, in combination with a suitable definition of the references environment in a LaTeX preamble:

\newenvironment{references} {\list{}{%
    \leftmargin1.5em%
    \itemindent-\leftmargin%
    \itemsep0.5ex%
    \parsep0pt%
    }}
    {\endlist}

gives a nice hanging indent format without ever running into any of the issues described in the OP.
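
A minimal sketch of this kind of filter, written here as a Python JSON filter using only the standard library (it is not njbart's actual filter). It assumes the JSON AST format of pandoc 1.18 and later, and that pandoc-citeproc emits a top-level Div with id refs containing one Div per entry, each holding a Para, as jgm shows later in this thread. Prefixing each entry with hyperref's \hypertarget, so that citation links keep a target, is an added assumption.

```python
#!/usr/bin/env python3
"""Sketch of a pandoc JSON filter that wraps the citeproc #refs Div in a
LaTeX references list. Assumption: JSON AST of pandoc 1.18+ (top-level
"blocks" array); only a top-level #refs Div is handled."""
import json
import sys


def raw_block(tex):
    # RawBlock content is passed through verbatim by the LaTeX writer
    return {"t": "RawBlock", "c": ["latex", tex]}


def raw_inline(tex):
    return {"t": "RawInline", "c": ["latex", tex]}


def entry_to_item(entry):
    """Unwrap one ref-item Div, prefixing its first Para with \\item."""
    (ident, _classes, _kvs), blocks = entry["c"]
    out = []
    for i, block in enumerate(blocks):
        if i == 0 and block["t"] == "Para":
            # Assumption: keep the entry id as a hyperref anchor for links
            prefix = "\\item\\hypertarget{%s}{}" % ident if ident else "\\item "
            block = {"t": "Para", "c": [raw_inline(prefix)] + block["c"]}
        out.append(block)
    return out


def listify(blocks):
    """Return a new block list with the #refs Div turned into a list."""
    result = []
    for block in blocks:
        if block.get("t") == "Div" and block["c"][0][0] == "refs":
            result.append(raw_block("\\begin{references}"))
            for entry in block["c"][1]:
                if entry.get("t") == "Div":
                    result.extend(entry_to_item(entry))
                else:
                    result.append(entry)
            result.append(raw_block("\\end{references}"))
        else:
            result.append(block)
    return result


if __name__ == "__main__" and not sys.stdin.isatty():
    raw = sys.stdin.read()
    if raw.strip():  # pandoc always sends a document; skip empty input
        doc = json.loads(raw)
        doc["blocks"] = listify(doc["blocks"])
        json.dump(doc, sys.stdout)
```

Since the filter needs the bibliography to exist already, pandoc-citeproc has to run before it; listing it explicitly fixes the order, e.g. pandoc --bibliography=x.bib --filter pandoc-citeproc --filter ./refs_list.py -H preamble.tex in.md -o out.pdf (the file names are hypothetical), with the references environment above defined in preamble.tex.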

I feel it would be a good idea if pandoc tried to implement this natively, both to avoid underfull pages and to enable the hanging indent format required by a vast number of styles.

Even better, pandoc could start outputting references as two distinct elements, as required by second-field-align formats, i.e., separating the "first field" from the following ones. For numbered styles, it would typically be the number that goes into the first element, and the rest of the reference into the second element.

In practical terms, for LaTeX this would mean outputting each reference item as a command with two arguments, say \pandocrefitem{}{}, which would of course have to be defined in the LaTeX preamble, either as

\renewcommand{\pandocrefitem}[2]{\item #1 #2} for hanging indent, or as

\renewcommand{\pandocrefitem}[2]{\item [#1] #2} for second-field-align=flush or =margin,

in combination, of course, with a suitable (re)definition of the references environment.
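
To make the two-argument proposal concrete, here is a hypothetical Python sketch that emits the \pandocrefitem definition for a given CSL alignment setting and renders entries that have already been split into (first field, rest) pairs. It uses \newcommand rather than \renewcommand because nothing defines \pandocrefitem beforehand in this standalone example; the function names are illustrative, and the flush/margin distinction is left to the references environment, as suggested above.

```python
def pandocrefitem_definition(align):
    """Hypothetical: LaTeX definition of \\pandocrefitem for a CSL setting."""
    if align == "hanging-indent":
        return r"\newcommand{\pandocrefitem}[2]{\item #1 #2}"
    if align in ("flush", "margin"):
        # The first field becomes the item label; in this sketch flush vs.
        # margin differ only in how the references environment sets labels.
        return r"\newcommand{\pandocrefitem}[2]{\item[#1] #2}"
    raise ValueError("unsupported alignment: " + align)


def render_refs(entries):
    """entries: (first_field, rest) pairs, already LaTeX-formatted."""
    lines = [r"\begin{references}"]
    for first, rest in entries:
        lines.append(r"\pandocrefitem{%s}{%s}" % (first, rest))
    lines.append(r"\end{references}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(pandocrefitem_definition("flush"))  # belongs in the preamble
    print(render_refs([("[1]", r"Doe, J. (2000) \emph{Title.}"),
                       ("[2]", r"Roe, R. (2001) \emph{Title2.}")]))
```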

jgm commented Nov 25, 2016

Currently pandoc-citeproc gives you a structure like this:

Div ("refs",["references"],[])
 [Div ("ref-item1",[],[])
  [Para [Str "Doe",Str ",",Space,Str "John",Str ".",Space,Str "2005",Str ".",Space,Emph [Str "First",Space,Str "Book"],Str ".",Space,Str "Cambridge",Str ":",Space,Str "Cambridge",Space,Str "University",Space,Str "Press",Str "."]]
 ,Div ("ref-item2",[],[])
  [Para [Str "\8212\8212\8212",Str ".",Space,Str "2006",Str ".",Space,Str "\8220",Str "Article",Str ".",Str "\8221",Space,Emph [Str "Journal",Space,Str "of",Space,Str "Generic",Space,Str "Studies"],Space,Str "6",Str ":",Space,Str "33\8211\&34",Str "."]]]]

One approach (A) would be to make an ad hoc modification to the LaTeX writer, so that a structure matching this is rendered as you suggest above (though with the hypertargets that pandoc would normally support). Such a change would be relatively easy and would only affect pandoc, not pandoc-citeproc. A definition of the references environment would need to be inserted into the default latex (and beamer) template, and preferably made conditional on the actual presence of a bibliography.

A more radical approach (B) would be to have pandoc-citeproc emit a list, something like (schematically)

Div #refs
  BulletList
    Item
      Div #ref-item-1
    Item
      Div #ref-item-2

This would have the drawback of causing bibliographies to be rendered as bullet lists in all formats, unless something special was done to style them differently. In HTML you can just use CSS, but this could be a serious problem in many formats, e.g. docx or plain text formats like Markdown. I think it would be better not to do this, unless (C) we added a generic (neither bulleted nor ordered) list type to pandoc-types. But (C) would be a huge amount of work, requiring changes to all writers and readers.

A fourth possible change (D) would be to have pandoc-citeproc emit something more complex than a Div containing a single Para for each entry. For example, as you suggested, we could have something like

Div #refs
  Div #ref-item1
    Div .first-field
      "[1]"
    Div .second-field second-field-align=flush
      "main citation"

This would require more extensive changes to pandoc-citeproc than B, and changes to all the pandoc writers, which would have to be taught something intelligent to do with these constructions.

njbart commented Nov 27, 2016

I tend to favour "D" as the cleanest approach.

Just note that hanging-indent=true and second-field-align=flush / =margin are per-style settings, so they would be best represented by:

Div #refs .second-field-align=flush
  Div #ref-item1
    Div .first-field
      "[1]"
    Div .second-field
      "main citation"

However, for hanging-indent="true" actually no additional divs are needed at all, so the following would do as well – depending of course on what is easier for the various writers to work with:

Div #refs .hanging-indent=true
  Div #ref-item1
    "full citation"

Spans could work, too (since all of this takes place within one paragraph), and since strictly speaking only the first field needs to be tagged for special treatment, the first example could also look like this:

Div #refs .second-field-align=flush
  Div #ref-item1
    Span .first-field
      "[1]"
    Span .second-field
      "main citation"

or like this:

Div #refs .second-field-align=flush
  Div #ref-item1
    Span .first-field
      "[1]"
    "main citation"

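Under the last variant, a writer only has to find the tagged first field and treat everything else as the second field. A hypothetical Python sketch over pandoc's JSON inline elements (assuming the pandoc 1.18+ JSON format; the first-field and second-field class names are the ones proposed here, not an existing pandoc convention) handles both of the Span variants above:

```python
def has_class(inline, cls):
    # A Span's Attr is [identifier, classes, key-value pairs]
    return inline.get("t") == "Span" and cls in inline["c"][0][1]


def split_first_field(inlines):
    """Split a reference's inlines into (first_field, rest).

    first_field is the content of the Span.first-field, if any; rest is
    everything else, with a Span.second-field unwrapped and leading Space
    elements dropped.
    """
    first, rest = [], []
    for inline in inlines:
        if not first and has_class(inline, "first-field"):
            first = inline["c"][1]
        elif has_class(inline, "second-field"):
            rest.extend(inline["c"][1])
        else:
            rest.append(inline)
    while rest and rest[0].get("t") == "Space":
        rest.pop(0)
    return first, rest
```

The two halves could then be rendered as the two arguments of the \pandocrefitem command proposed earlier in this thread.
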
Unfortunately, there's a further complication: To fully implement the CSL specs, pandoc-citeproc would also have to cater for CSL's "display" attributes (http://docs.citationstyles.org/en/stable/specification.html#display).

Example "B" from the CSL specs, with "notes" added, would have to be represented by something like this:

Div #refs
  Div #ref-item1
    Div .display="block"
      "author"
    Div .display="left-margin"
      "year"
    Div .display="right-inline"
      "main citation"
    Div .display="indent"
      "notes"

What's more, since @rmzelle reported (in jgm/pandoc-citeproc#85) that he thinks the CSL specs do not preclude the bibliography-specific "whitespace" options (hanging-indent, second-field-align) from co-occurring with the "display" attributes, constructs like the following seem to be allowed, too:

Div #refs .second-field-align=flush
  Div #ref-item1
    Div .display="block"
      Span .first-field
        "[1]"
      Span .second-field
        "main citation"
    Div .display="indent"
      "abstract"
    Div .display="indent"
      "notes"

Now, while I tend to think that there might be arguments for using hanging-indent=true and second-field-align=flush / =margin within display="block" (e.g., when a hanging-indent main citation is followed by an indented paragraph containing an abstract or comment), I am much more sceptical when it comes to the others.

Actually, I think that within display="left-margin" (which doesn't even allow line breaks and is obviously reserved for numbers, years, and short labels such as "XYZ16"), hanging-indent=true and second-field-align=flush / =margin do not make the slightest sense, and their use in display="right-inline" or display="indent" is at least highly questionable.

Hence I feel pandoc should permit hanging-indent=true and second-field-align=flush / =margin only within display="block" – if it tries to implement the display stuff at all at the moment, that is.
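
Such a restriction could be checked mechanically. A hypothetical sketch, assuming the display values are carried as key-value attributes (display=block and so on) on Divs or Spans in pandoc's JSON AST, and that first-field/second-field are classes as in the examples above; it only descends through Para, Plain, Div and Span, and flags a tagged field whose nearest enclosing display is set to anything other than block:

```python
# Assumption: class names as proposed in this thread, not a pandoc convention
FIELD_CLASSES = {"first-field", "second-field"}


def violations(elements, display=None):
    """Return (field_class, display) pairs for first/second-field elements
    whose nearest enclosing display value is set but is not "block".

    elements: a list of pandoc JSON blocks or inlines.
    """
    found = []
    for el in elements:
        t = el.get("t")
        if t in ("Para", "Plain"):
            found += violations(el["c"], display)
        elif t in ("Div", "Span"):
            _ident, classes, kvs = el["c"][0]
            here = dict(kvs).get("display", display)
            tagged = FIELD_CLASSES.intersection(classes)
            if tagged and here not in (None, "block"):
                found.append((min(tagged), here))
            found += violations(el["c"][1], here)
    return found
```

A field with no enclosing display at all (as in the first examples, where second-field-align sits on #refs) is accepted, matching the proposal that these options are fine outside the display constructs.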

(To be continued, I'll have to think about some of the details more carefully ...)

jgm commented Feb 22, 2017

Unambitious first step:

  • Have the LaTeX writer interpret the nested Divs output by pandoc-citeproc by creating a list structure. I suppose a default implementation needs to be put in the template too (conditional on the presence of bibliography entries).

Later we can think about changing pandoc-citeproc's output, etc.

wilx commented Mar 2, 2017

Isn't #2704 related? Does its fix also fix this issue?

jgm commented Mar 2, 2017

It would be nice if we had an actual test case so we could see if #2704 helps with this.

wilx commented Mar 2, 2017

@jgm commented on 2 March 2017, 15:47 CET:

It would be nice if we had an actual test case so we could see if #2704 helps with this.

I think it does fix this issue as well. I have constructed a test case with all of my references. Adding the \leavevmode fixed some visible issues.

jgm commented Mar 3, 2017

Great, I'll close this then.
