'p.'/'pp.' abbreviation retained with Roman numerals #173

adunning · 2015-09-25T17:40:43Z

When using a page number with Roman numerals, the abbreviation is not suppressed in version 0.7.4 (compare Chicago Manual of Style 14.154):

pandoc -F pandoc-citeproc << EOT

---
references:
- id: test
...

[@test, p. ix]

[@test, pp. ix–x]

EOT

Actual output:

<p><span class="citation">(n.d., p. ix)</span></p>
<p><span class="citation">(n.d., pp. ix–x)</span></p>
<div id="references" class="references">
<div id="ref-test">
<p>n.d.</p>
</div>
</div>

Expected output:

<p><span class="citation">(n.d., ix)</span></p>
<p><span class="citation">(n.d., ix–x)</span></p>
<div id="references" class="references">
<div id="ref-test">
<p>n.d.</p>
</div>
</div>

jgm · 2015-09-26T15:23:51Z

This is because of the current logic for parsing locators out of the generic suffix.
What we get from pandoc is just a suffix -- the part after the citation key itself -- and pandoc-citeproc needs to separate this into a proper locator and a remainder (the suffix). The heuristic we use for identifying locators is designed to capture locators of all of the following forms:

123, 123A, C22, XVII, 33-44, 22-33; 22-11

What is conspicuously missing is lowercase Roman numerals. Why? Because it was hard to find a simple heuristic for recognizing these that wouldn't also capture regular text. But it should be possible to come up with something more robust that allows lowercase Roman numerals too. The relevant parts of the code, for reference, are pLocator and pWordWithDigits in src/Text/CSL/Pandoc.hs.

jgm · 2015-09-26T15:30:32Z

To give a better idea of the difficulty:

@smith37, pp. 123A, C22, VII, 22-33, and elsewhere

Here we want the locator to include "123A, C22, VII, 22-33"; the rest, ", and elsewhere", is the suffix. So we have to be able to determine where the locator stops. Currently that is done by looking for words that contain Arabic digits or uppercase Roman numerals, which is a very rough heuristic. I'm open to suggestions for something more robust.
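For readers who want to see the splitting problem concretely, here is a minimal Python sketch of the heuristic as described in this comment. It is not the Haskell code in pLocator or pWordWithDigits; the names `looks_like_locator` and `split_locator` are hypothetical, and the sketch assumes the input is the text that follows the "p."/"pp." label.

```python
import re
from typing import Tuple

# Assumption: a token is locator-like if it contains an Arabic digit or,
# with surrounding punctuation stripped, consists only of uppercase
# Roman-numeral letters. This follows the prose description only.
UPPER_ROMAN = re.compile(r"[IVXLCDM]+")


def looks_like_locator(token: str) -> bool:
    core = token.strip(",;.")
    if any(ch in "0123456789" for ch in core):
        return True
    return UPPER_ROMAN.fullmatch(core) is not None


def split_locator(text: str) -> Tuple[str, str]:
    """Return (locator, suffix): the leading run of locator-like tokens, then the rest."""
    tokens = text.split()
    n = 0
    while n < len(tokens) and looks_like_locator(tokens[n]):
        n += 1
    locator = " ".join(tokens[:n])
    suffix = " ".join(tokens[n:])
    # Punctuation trailing the last locator token belongs to the suffix.
    stripped = locator.rstrip(",;")
    trailing = locator[len(stripped):]
    if trailing:
        suffix = trailing + (" " + suffix if suffix else "")
    return stripped, suffix


print(split_locator("123A, C22, VII, 22-33, and elsewhere"))
print(split_locator("ix"))
```

Under these assumptions the first call yields the locator "123A, C22, VII, 22-33" with the suffix ", and elsewhere", while "ix" is not recognized as a locator at all and falls through to the suffix, which matches the behaviour reported above.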

adunning · 2015-09-27T02:39:42Z

The present behaviour allows it to parse things such as [@test, p. 123n5] properly (and technically one could likewise have [@test, p. liin4] in Chicago Style), so I wouldn't change its treatment of words with Arabic numerals.

There are some good regex examples for detecting Roman numerals, but only for uppercase (apologies if you're already using something like this, as I haven't found the code, but it appears to be something similar to their 'flexible' rather than 'strict' example). I suppose this could be extended by looking for words that match this pattern and are entirely uppercase or lowercase (I don't think it's possible to have a mixed-case Roman numeral). I can't presently think of a situation in which this treatment would cause a problem.

jgm · 2015-09-27T04:34:03Z

I was thinking there might also be cases with mixed roman
numerals and, say, letters: ixn5, for example. So it gets
really complicated to detect these reliably.

adunning · 2015-09-27T17:59:56Z

Unless I am missing something, that's not a problem if you're already treating any word including a number as a locator. The example I gave above of [@test, p. liin4] already works, as does something like p. 2:ix.

jgm · 2015-09-27T18:36:31Z

So, is your suggestion to look for (a) words containing
digits, and (b) words that can be wholly interpreted as roman
numerals (excluding punctuation)?

jgm · 2015-09-27T18:38:11Z

PS. there's already a romanNumeral parser in Text.Pandoc.Parsing.

adunning · 2015-09-27T18:47:49Z

Right, (a) being what I understand to be included in the present behaviour. Interpreting entirely lowercase words that can be parsed as Roman numerals might exclude a few edge cases (e.g. a § vii A?); but it would cover the main usage of citing front matter, and adding only this rule shouldn't introduce any adverse effects.
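To make the two rules concrete, here is a rough Python sketch of the test discussed above: (a) any word containing an Arabic digit, or (b) a word that parses wholly as a Roman numeral written in a single case. It is only an illustration, not the change that closed this issue and not pandoc's romanNumeral parser. The names `is_roman` and `is_locator_word` are hypothetical, and treating a hyphen or en dash as a range separator is an added assumption.

```python
import re

# Strict Roman-numeral pattern (uppercase). Every group is optional, so the
# empty string is rejected separately in is_roman.
STRICT_ROMAN = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)


def is_roman(word: str) -> bool:
    """True if word is a well-formed Roman numeral entirely in one case."""
    if not word or not (word.isupper() or word.islower()):
        return False
    return STRICT_ROMAN.fullmatch(word.upper()) is not None


def is_locator_word(token: str) -> bool:
    core = token.strip(",;.")
    # (a) any word containing an Arabic digit, e.g. "123n5", "liin4", "2:ix"
    if any(ch in "0123456789" for ch in core):
        return True
    # (b) a word, or a range of words, wholly made of Roman numerals
    # (assumption: "-" and the en dash separate the ends of a range)
    return all(is_roman(part) for part in re.split(r"[-–]", core))


for word in ["ix", "ix–x", "liin4", "2:ix", "Ix", "A", "elsewhere", "mix"]:
    print(word, is_locator_word(word))
```

In this sketch "ix", "ix–x", "liin4" and "2:ix" are accepted, while "Ix", "A" and "elsewhere" are rejected, so the "§ vii A" edge case stays uncovered. Ordinary words that happen to be valid numerals, such as "mix" (1009) or "i", are also accepted, which is the kind of false positive the heuristic has to tolerate.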

adunning · 2015-11-02T02:12:44Z

Wonderful; thank you!

adunning changed the title from "Roman numerals retains" to "'p.'/'pp.' abbreviation retained with Roman numerals" on Sep 25, 2015

jgm closed this as completed in 5777f36 Nov 2, 2015

seifferth mentioned this issue Sep 1, 2019

Feature request: Allow more options for roman page numbers in citations #415

Closed
