This repository has been archived by the owner on Apr 30, 2021. It is now read-only.

Date range: March 1,–2, 2011 instead of March 1–2, 2011 #18

Closed
njbart opened this issue Jan 1, 2014 · 12 comments

Comments

@njbart
Contributor

njbart commented Jan 1, 2014

Probably introduced by the fix for #12 (at least I don't remember seeing this before):

pandoc --filter pandoc-citeproc -t markdown-citations <<EOT

Foo [@item1].

---
csl: chicago-fullnote-bibliography.csl
references:
- title: Title
  id: item1
  issued:
  - day: 1
    month: 03
    year: 2011
  - day: 2
    month: 03
    year: 2011
  author:
    given:
    - Ann
    family: Author
  container-title: Journal
  type: article-newspaper
...
EOT

Result:

Foo.[^1]

<div class="references">

Author, Ann. “Title.” *Journal*, March 1,–2, 2011.

</div>

[^1]: Ann Author, “Title,” *Journal*, March 1,–2, 2011.

@jgm
Owner

jgm commented Jan 3, 2014

Yes, this was introduced by #12. Before the fix for #12, pandoc-citeproc simply stripped the suffix entirely from the first day in a date range. The fix for #12 changed that behavior so that the suffix is merely trimmed of trailing space.

In your example, the suffix comes from locales/locales-en-US.xml:

  <date form="text">
    <date-part name="month" suffix=" "/>
    <date-part name="day" suffix=", "/>
    <date-part name="year"/>
  </date>

Here is the corresponding bit of locales/locales-da-DK.xml:

  <date form="text">
    <date-part name="day" suffix=". "/>
    <date-part name="month" suffix=" "/>
    <date-part name="year"/>
  </date>

I'm at a loss as to how pandoc-citeproc can distinguish between these cases, stripping the suffix in the en-US case but leaving it on in the da-DK case. I'm actually wondering why the comma suffix is there in the US case. Is that perhaps an error?
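
A minimal Python sketch of the two behaviors described above, not pandoc-citeproc's actual Haskell code. The function `render_day_range` and its `strategy` argument are hypothetical names introduced only for illustration; the suffix values are the ones from the two locale excerpts.

```python
EN_DASH = "\u2013"

def render_day_range(first_day, last_day, day_suffix, strategy):
    """Render a day range; `strategy` decides what happens to the first day's suffix.

    "strip": drop the suffix entirely (behavior before the fix for #12)
    "trim":  keep the suffix minus trailing whitespace (behavior after it)
    """
    if strategy == "strip":
        first_suffix = ""
    elif strategy == "trim":
        first_suffix = day_suffix.rstrip()
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # The last day always keeps its full suffix.
    return f"{first_day}{first_suffix}{EN_DASH}{last_day}{day_suffix}"

for strategy in ("strip", "trim"):
    # en-US: month first, day suffix ", "
    print(strategy, "en-US:", "March " + render_day_range(1, 2, ", ", strategy) + "2011")
    # da-DK: day first, day suffix ". "
    print(strategy, "da-DK:", render_day_range(10, 23, ". ", strategy) + "Marts 2003")
```

Under these assumptions, "strip" gives the wanted en-US form but loses the dot after the first Danish day, while "trim" keeps the Danish dot but produces the stray comma reported in this issue. Neither rule can satisfy both locales as long as the dot and the comma are both plain suffixes.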

@njbart
Contributor Author

njbart commented Jan 3, 2014

What seems to work is to revert #12 and use a patched locales/locales-da-DK.xml containing

  <date form="text">
    <date-part name="day" suffix=". "  range-delimiter=".–"/>
    <date-part name="month" suffix=" "/>
    <date-part name="year"/>
  </date>

So my suggestion would be to revert #12 and recommend the use of patched locale files for the time being.

I don't think the comma suffix in en-US is an error; without it you wouldn't get "June 17, 2001" but "June 17 2001".

It seems that CSL did not think of distinguishing between suffixes that are directly coupled to date-parts and "other" punctuation such as the comma in en-US. A cleaner solution might have been using something like

  <date form="text">
  <group suffix=", ">
    <date-part name="month" suffix=" "/>
    <date-part name="day" suffix=" "/>
  </group>
    <date-part name="year"/>
  </date>

... but that does not work, at least not with pandoc-citeproc, and since Zotero doesn't have date ranges, I haven't been able to test this elsewhere yet.

@njbart
Contributor Author

njbart commented Jan 3, 2014

[EDIT:] I ran test-citeproc with the post-#12 pandoc-citeproc and two new tests modelled after the ones in the citeproc-js test suite. As expected, date_TextFormFulldateDayRange-NB-da-DK passes, and date_TextFormFulldateDayRange-NB-en-US fails (both without fiddling with range-delimiter):

date_TextFormFulldateDayRange-NB-da-DK.txt:

>>===== MODE =====>>
citation
<<===== MODE =====<<




>>===== RESULT =====>>
10.–23. Marts 2003
<<===== RESULT =====<<


>>===== CSL =====>>
<style 
      xmlns="http://purl.org/net/xbiblio/csl"
      class="note"
      version="1.0">
  <info>
    <id />
    <title />
    <updated>2009-08-10T04:49:00+09:00</updated>
  </info>
  <locale>
    <date form="text">
      <date-part name="day" suffix=". "/>
      <date-part name="month" suffix=" "/>
      <date-part name="year"/>
    </date>
    <date form="numeric">
      <date-part name="day" form="numeric-leading-zeros" suffix="."/>
      <date-part name="month" form="numeric-leading-zeros" suffix="."/>
      <date-part name="year"/>
    </date>
  <terms>
    <term name="month-03">Marts</term>
  </terms>
  </locale>
  <citation>
    <layout>
        <date variable="issued" form="text" date-parts="year-month-day"/>
    </layout>
  </citation>
</style>
<<===== CSL =====<<


>>===== INPUT =====>>
[
    {
        "id": "ITEM-1", 
        "issued": {
            "date-parts": [
                [
                    2003, 
                    3, 
                    10
                ], 
                [
                    2003, 
                    3, 
                    23
                ]
            ]
        }, 
        "title": "Ignore me", 
        "type": "book"
    }
]
<<===== INPUT =====<<

date_TextFormFulldateDayRange-NB-en-US.txt:

>>===== MODE =====>>
citation
<<===== MODE =====<<


>>===== RESULT =====>>
August 10–23, 2003
<<===== RESULT =====<<


>>===== CSL =====>>
<style 
      xmlns="http://purl.org/net/xbiblio/csl"
      class="note"
      version="1.0">
  <info>
    <id />
    <title />
    <updated>2009-08-10T04:49:00+09:00</updated>
  </info>
  <citation>
    <layout>
<!--
      <date prefix="(" suffix=")" variable="issued">
        <date-part name="day" suffix=" " />
        <date-part form="long" name="month" suffix=" "/>
        <date-part name="year" />
      </date>
-->
        <date variable="issued" form="text"/>
    </layout>
  </citation>
</style>
<<===== CSL =====<<


>>===== INPUT =====>>
[
    {
        "id": "ITEM-1", 
        "issued": {
            "date-parts": [
                [
                    2003, 
                    8, 
                    10
                ], 
                [
                    2003, 
                    8, 
                    23
                ]
            ]
        }, 
        "title": "Ignore me", 
        "type": "book"
    }
]
<<===== INPUT =====<<
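
For anyone who wants to inspect such test files programmatically, here is a rough Python sketch that splits a fixture into its named sections. It assumes only the `>>===== NAME =====>>` / `<<===== NAME =====<<` delimiters visible in the two files above; `parse_fixture` and `FIXTURE` are hypothetical names, and this is not how test-citeproc or citeproc-js read the files.

```python
import re

# One section: an opening marker, the body, and the matching closing marker.
SECTION_RE = re.compile(
    r">>=+ (?P<name>[A-Z-]+) =+>>\n(?P<body>.*?)\n?<<=+ (?P=name) =+<<",
    re.DOTALL,
)

def parse_fixture(text):
    """Return a dict mapping section names (MODE, RESULT, ...) to their bodies."""
    return {m.group("name"): m.group("body").strip() for m in SECTION_RE.finditer(text)}

FIXTURE = """\
>>===== MODE =====>>
citation
<<===== MODE =====<<

>>===== RESULT =====>>
August 10\u201323, 2003
<<===== RESULT =====<<
"""

sections = parse_fixture(FIXTURE)
print(sections["MODE"])
print(sections["RESULT"])
```

The CSL and INPUT sections could then be handed to an XML or JSON parser respectively.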

@jgm
Owner

jgm commented Jan 4, 2014

Interesting. What happens if you change the input file just slightly:

-      <date-part name="day" suffix=". "/>
+      <date-part name="day" suffix=", "/>

@njbart
Contributor Author

njbart commented Jan 4, 2014

I realized that what I ran was actually a test of the new pandoc-citeproc, not of citeproc-js (see edit above). Sorry. Concerning pandoc-citeproc, this just confirms what we already knew. Would you happen to know how to run the test suite on citeproc-js?

@njbart
Contributor Author

njbart commented Jan 5, 2014

I managed to test citeproc-js (on installation see http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#getting-the-citeproc-js-sources and http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#running-the-test-suite) with the two test files I posted here earlier, plus a version of date_TextFormFulldateDayRange-NB-da-DK.txt containing <date-part name="day" suffix=". " range-delimiter=".–"/>.

The en-US test passes, but, interestingly enough, citeproc-js fails both da-DK tests, with and without range-delimiter=".–" (pandoc-citeproc passes when range-delimiter=".–" is included).

Again, I'd like to propose reverting the "fix" for #12 for the moment, to make en-US work correctly again. (And at least in pandoc, one should then be able to patch locale files with range-delimiter=".–" to get correct date ranges for da-DK and others until a better solution is found.)

@njbart
Contributor Author

njbart commented Jan 5, 2014

In the end, this turns out not to be a processor bug, but a locale file bug:

In Danish, German, and a number of other languages, days and the numerical form of months are ordinal numbers (see https://en.wikipedia.org/wiki/Ordinal_indicator#Croatian.2C_Czech.2C_Danish.2C_Estonian.2C_Faroese.2C_German.2C_Hungarian.2C_Icelandic.2C_Latvian.2C_Norwegian.2C_Polish.2C_Slovak.2C_Slovene.2C_Serbian.2C_Turkish).

Thus, in CSL, the ordinal form should be used instead of the numerical form plus the suffix ".". Both happen to be rendered identically in locales using a dot as ordinal indicator, except, that is, when date ranges are involved.

This means that, instead of

  <date form="text">
    <date-part name="day" suffix=". "/>
    <date-part name="month" suffix=" "/>
    <date-part name="year"/>
  </date>

(from locales/locales-da-DK.xml) the following should be used.

  <date form="text">
    <date-part name="day" form="ordinal" suffix=" "/>
    <date-part name="month" suffix=" "/>
    <date-part name="year"/>
  </date>

Hence, the "fix" for #12 should definitely be reverted; the pre-#12 pandoc-citeproc seems to work well when the locale files are fixed instead.
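
To illustrate why this works, here is a small Python sketch (again not pandoc-citeproc's real code). It assumes that with form="ordinal" the processor renders the dot as part of the number itself, and that the pre-#12 rule of dropping the first day's suffix applies. `format_day`, `render_day_range`, and the `ordinal_indicator` parameter are hypothetical names.

```python
EN_DASH = "\u2013"

def format_day(day, form, ordinal_indicator="."):
    """Render a day number; for form="ordinal" the indicator belongs to the number."""
    return f"{day}{ordinal_indicator}" if form == "ordinal" else str(day)

def render_day_range(first_day, last_day, form, suffix, ordinal_indicator="."):
    """Pre-#12 rule: the first day of a range gets no suffix at all."""
    first = format_day(first_day, form, ordinal_indicator)
    last = format_day(last_day, form, ordinal_indicator)
    return f"{first}{EN_DASH}{last}{suffix}"

# da-DK with the fixed locale: day form="ordinal", suffix=" "
print(render_day_range(10, 23, "ordinal", " ") + "Marts 2003")
# en-US unchanged: plain numeric day, suffix=", "
print("August " + render_day_range(10, 23, "numeric", ", ") + "2003")
```

With the dot moved out of the suffix and into the number, stripping the suffix no longer removes it, so one rule yields both "10.–23. Marts 2003" and "August 10–23, 2003", the expected results of the two test files above.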

The only thing that will not be possible within the limits of the current CSL specs is to have ordinals with leading zeros.

I will report this issue to the xbiblio list.

@jgm
Owner

jgm commented Jan 6, 2014

Is locales/locales-da-DK.xml the only locale file that needs to be fixed?

@jgm jgm closed this as completed in 84070f7 Jan 6, 2014
@njbart
Contributor Author

njbart commented Jan 6, 2014

> Is locales/locales-da-DK.xml the only locale file that needs to be fixed?

It seems you already fixed
locales-cs-CZ.xml,
locales-da-DK.xml,
locales-de-AT.xml,
locales-de-CH.xml,
locales-de-DE.xml,
locales-et-EE.xml,
locales-fi-FI.xml, and
locales-nb-NO.xml.

Others that contain one or more date-part suffix elements starting with a dot and thus seem likely candidates are
locales-bg-BG.xml,
locales-hr-HR.xml,
locales-hu-HU.xml,
locales-is-IS.xml,
locales-lv-LV.xml,
locales-nn-NO.xml,
locales-pl-PL.xml,
locales-ro-RO.xml,
locales-ru-RU.xml,
locales-sk-SK.xml,
locales-sl-SI.xml, and
locales-sr-RS.xml.
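
A list like this can be produced with a short script. The following Python sketch is one way to do it, assuming the locale files use the same CSL namespace as the style shown earlier in this thread; `dot_suffixed_date_parts` and `SAMPLE` are hypothetical names, and the sample is a cut-down stand-in, not a real locale file.

```python
import xml.etree.ElementTree as ET

# Assumed namespace, taken from the xmlns attribute in the test style above.
CSL_NS = "{http://purl.org/net/xbiblio/csl}"

def dot_suffixed_date_parts(locale_xml):
    """Return (date form, date-part name, suffix) for every suffix starting with a dot."""
    root = ET.fromstring(locale_xml)
    hits = []
    for date in root.iter(f"{CSL_NS}date"):
        for part in date.iter(f"{CSL_NS}date-part"):
            suffix = part.get("suffix", "")
            if suffix.startswith("."):
                hits.append((date.get("form"), part.get("name"), suffix))
    return hits

SAMPLE = """<locale xmlns="http://purl.org/net/xbiblio/csl">
  <date form="text">
    <date-part name="day" suffix=". "/>
    <date-part name="month" suffix=" "/>
    <date-part name="year"/>
  </date>
  <date form="numeric">
    <date-part name="day" form="numeric-leading-zeros" suffix="."/>
    <date-part name="month" form="numeric-leading-zeros" suffix="."/>
    <date-part name="year"/>
  </date>
</locale>"""

for hit in dot_suffixed_date_parts(SAMPLE):
    print(hit)
```

For files on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring`. Reporting the date form alongside each hit helps separate text dates from numeric ones, but whether a given dot is actually an ordinal indicator still has to be judged per language.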

@jgm
Owner

jgm commented Jan 6, 2014

+++ nickbart1980 [Jan 06 14 00:16 ]:

> Others that contain a suffix element starting with a dot and thus seem
> likely candidates are

locales-bg-BG.xml: Here the dot only occurs in numeric dates (not textual).
locales-hr-HR.xml: Leading 0s.
locales-hu-HU.xml: . is not ordinal here.
locales-is-IS.xml: I've fixed this one.
locales-lv-LV.xml: . is not ordinal here.
locales-nn-NO.xml: Fixed.
locales-pl-PL.xml: . only occurs in numeric dates.
locales-ro-RO.xml: . is not ordinal.
locales-ru-RU.xml: . is not ordinal.
locales-sk-SK.xml: . is not ordinal.
locales-sl-SI.xml: . is not ordinal.
locales-sr-RS.xml: . is not ordinal.

@njbart
Contributor Author

njbart commented Jan 6, 2014

Great. However, I tend to think that dots in numerical dates denote ordinals, too. I could not find much on this so far, though, with the exception of de-DE and de-AT: "The format d(d).m(m).(yy)yy (using dots (which denote ordinal numbering)) is the traditional German date format." (http://en.wikipedia.org/wiki/Date_format_by_country)

@jgm
Owner

jgm commented Jan 6, 2014

I guess the question is whether in numerical dates with ranges
the dot should be repeated:

1.-3.4.99

I don't know the style.

