This repository has been archived by the owner on Apr 30, 2021. It is now read-only.

'p.'/'pp.' abbreviation retained with Roman numerals #173

Closed
adunning opened this issue Sep 25, 2015 · 9 comments

@adunning
Contributor

When using a page number with Roman numerals, the abbreviation is not suppressed in version 0.7.4 (compare Chicago Manual of Style 14.154):

pandoc -F pandoc-citeproc << EOT

---
references:
- id: test
...

[@test, p. ix]

[@test, pp. ix–x]

EOT

Actual output:

<p><span class="citation">(n.d., p. ix)</span></p>
<p><span class="citation">(n.d., pp. ix–x)</span></p>
<div id="references" class="references">
<div id="ref-test">
<p>n.d.</p>
</div>
</div>

Expected output:

<p><span class="citation">(n.d., ix)</span></p>
<p><span class="citation">(n.d., ix–x)</span></p>
<div id="references" class="references">
<div id="ref-test">
<p>n.d.</p>
</div>
</div>
@adunning adunning changed the title Roman numerals retains 'p.'/'pp.' abbreviation retained with Roman numerals Sep 25, 2015
@jgm
Owner

jgm commented Sep 26, 2015

This is because of the current logic for parsing locators out of the generic suffix.
What we get from pandoc is just a suffix -- the part after the citation key itself -- and pandoc-citeproc needs to separate this into a proper locator and a remainder (the suffix). The heuristic we use for identifying locators is designed to capture locators of all of the following forms:

123, 123A, C22, XVII, 33-44, 22-33; 22-11

What is conspicuously missing are lower-case roman numerals. Why? Because it was hard to find a simple heuristic for recognizing these that wouldn't capture regular text, too. But it should be possible to come up with something a bit more robust that could allow lowercase roman too. The relevant part of the code, for reference, is pLocator and pWordWithDigits in src/Text/CSL/Pandoc.hs.

@jgm
Owner

jgm commented Sep 26, 2015

To give a better idea of the difficulty:

@smith37, pp. 123A, C22, VII, 22-33, and elsewhere

Here we want the locator to include "123A, C22, VII, 22-33"; the rest, ", and elsewhere", is the suffix. So we have to be able to determine where the locator stops. Currently that's done by looking for words that contain arabic digits or uppercase roman numerals, which is a very rough heuristic. I'm open to suggestions for something more robust.
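
For illustration only, the heuristic described above can be sketched in a few lines of Python. This is not a port of pLocator or pWordWithDigits; the names is_locator_word and split_locator are hypothetical, and the sketch assumes that the label (such as "pp.") has already been removed and that a locator word is any token containing an Arabic digit or made up solely of uppercase Roman-numeral letters.

```python
import re

# Assumption: these two patterns stand in for the real heuristic.
_HAS_DIGIT = re.compile(r"[0-9]")
_UPPER_ROMAN_LETTERS = re.compile(r"[IVXLCDM]+")


def is_locator_word(word):
    """True if the token looks like part of a locator (rough heuristic)."""
    core = word.strip(",;.")  # ignore surrounding punctuation
    if not core:
        return False
    if _HAS_DIGIT.search(core):
        return True
    return _UPPER_ROMAN_LETTERS.fullmatch(core) is not None


def split_locator(text):
    """Split the text after the label into (locator, suffix)."""
    words = text.split()
    n = 0
    # Consume leading tokens for as long as they look like locator words.
    while n < len(words) and is_locator_word(words[n]):
        n += 1
    raw = " ".join(words[:n])
    locator = raw.rstrip(",;")
    rest = " ".join(words[n:])
    # Punctuation trailing the locator is handed back to the suffix.
    suffix = (raw[len(locator):] + " " + rest).strip()
    return locator, suffix


print(split_locator("123A, C22, VII, 22-33, and elsewhere"))
print(split_locator("ix"))
```

Under these assumptions the first call separates "123A, C22, VII, 22-33" from ", and elsewhere", while the second returns an empty locator, which mirrors the behaviour reported in this issue for lowercase numerals.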

@adunning
Contributor Author

The present behaviour allows it to parse things such as [@test, p. 123n5] properly (and technically one could likewise have [@test, p. liin4] in Chicago Style), so I wouldn't change its treatment of words with Arabic numerals.

There are some good regex examples for detecting Roman numerals, but only for uppercase (apologies if you're already using something like this, as I haven't found the code, but it appears to be something similar to their 'flexible' rather than 'strict' example). I suppose this could be extended by looking for words that match this pattern and are entirely uppercase or lowercase (I don't think it's possible to have a mixed-case Roman numeral). I can't presently think of a situation in which this treatment would cause a problem.
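
A minimal Python sketch of that idea, offered as an assumption-laden illustration rather than as the linked examples or the project's code: a commonly used strict pattern for Roman numerals, applied only to words that are entirely uppercase or entirely lowercase. The function name is_roman_numeral is hypothetical.

```python
import re

# Strict pattern: thousands, hundreds, tens, units (uppercase form).
_STRICT_ROMAN = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)


def is_roman_numeral(word):
    """True if word is a well-formed Roman numeral in a single case."""
    # The pattern also matches the empty string, so reject that first.
    if not word:
        return False
    # Reject mixed case such as "Ix".
    if not (word.isupper() or word.islower()):
        return False
    return _STRICT_ROMAN.fullmatch(word.upper()) is not None


for candidate in ["ix", "XVII", "Ix", "and", "mix"]:
    print(candidate, is_roman_numeral(candidate))
```

Note that "mix" is accepted, since MIX is a well-formed numeral (1009). Ordinary words of this kind are the residual risk of any rule that recognises lowercase numerals.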

@jgm
Owner

jgm commented Sep 27, 2015

+++ Andrew Dunning [Sep 26 15 19:39 ]:

> The present behaviour allows it to parse things such as [@test, p.
> 123n5] properly (and technically one could likewise have [@test, p.
> liin4] in Chicago Style), so I wouldn't change its treatment of words
> with Arabic numerals.
>
> There are some good regex examples for detecting Roman numerals, but
> only for uppercase (apologies if you're already using something like
> this, as I haven't found the code, but it appears to be something
> similar to their 'flexible' rather than 'strict' example). I suppose
> this could be extended by looking for words that match this pattern and
> are entirely uppercase or lowercase (I don't think it's possible to
> have a mixed-case Roman numeral). I can't presently think of a
> situation in which this treatment would cause a problem.

I was thinking there might also be cases with mixed roman
numerals and, say, letters: ixn5, for example. So it gets
really complicated to detect these reliably.

@adunning
Contributor Author

Unless I am missing something, that's not a problem if you're already treating any word including a number as a locator. The example I gave above of [@test, p. liin4] already works, as does something like p. 2:ix.

@jgm
Owner

jgm commented Sep 27, 2015

So, is your suggestion to look for (a) words containing
digits, and (b) words that can be wholly interpreted as roman
numerals (excluding punctuation)?

+++ Andrew Dunning [Sep 27 15 10:59 ]:

> Unless I am missing something, that's not a problem if you're already
> treating any word including a number as a locator. The example I gave
> above of [@test, p. liin4] already works, as does something like p.
> 2:ix.


@jgm
Owner

jgm commented Sep 27, 2015

PS. there's already a romanNumeral parser in Text.Pandoc.Parsing.

@adunning
Contributor Author

Right, (a) being what I understand to be included in the present behaviour. Interpreting entirely lowercase words that can be parsed as Roman numerals might exclude a few edge cases (e.g. a § vii A?); but it would cover the main usage of citing front matter, and adding only this rule shouldn't introduce any adverse effects.
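
As a rough sketch of the combined rule, assuming Python rather than the project's Haskell and using the hypothetical name is_locator_word: a word counts as a locator if (a) it contains an Arabic digit, or (b) it can be read wholly as a Roman numeral in a single case. Splitting on a hyphen or en dash so that ranges such as ix–x pass rule (b) is an extra assumption of this sketch, not something settled in the thread.

```python
import re

_ROMAN = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)


def _is_roman(part):
    """Well-formed Roman numeral, entirely uppercase or entirely lowercase."""
    if not part or not (part.isupper() or part.islower()):
        return False
    return _ROMAN.fullmatch(part.upper()) is not None


def is_locator_word(word):
    core = word.strip(",;.")  # punctuation around the word is ignored
    # Rule (a): any word containing an Arabic digit.
    if any(ch in "0123456789" for ch in core):
        return True
    # Rule (b): the whole word, or each end of a range, is a Roman numeral.
    parts = re.split(r"[-–]", core)
    return all(_is_roman(p) for p in parts)


for word in ["ix", "ix–x", "liin4", "2:ix", "XVII", "and", "A"]:
    print(word, is_locator_word(word))
```

With these assumptions "ix" and "ix–x" from the original report are recognised, "liin4" and "2:ix" continue to pass through rule (a), and a trailing "A" as in "§ vii A" is rejected, which is the edge case noted above.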

@jgm jgm closed this as completed in 5777f36 Nov 2, 2015
@adunning
Contributor Author

adunning commented Nov 2, 2015

Wonderful; thank you!
