IAST Sanskrit Collation: Letters with diacritics are not sorted properly #765
Comments
Thank you for reporting this. Unfortunately I don't really know about the transliteration Biber stuff and @plk is a bit snowed under at the moment, so I can't promise a quick fix. As far as I know Biber uses an external library to do the transliteration ( I don't suppose you could check that the Full text of the MWE
\documentclass{article}
\listfiles
\usepackage{polyglossia}
\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{Latin Modern Roman}
\usepackage{fontspec}
\usepackage{biblatex}
\usepackage{filecontents}
\addbibresource{\jobname.bib}
\begin{filecontents*}{\jobname.bib}
@misc{aka,
title = {aka},
}
@misc{aṃka,
title = {aṃka},
}
@misc{aca,
title = {aca},
}
@misc{aṃca,
title = {aṃca},
}
@misc{aṭa,
title = {aṭa},
}
@misc{aṃṭa,
title = {aṃṭa},
}
@misc{ata,
title = {ata},
}
@misc{aṃta,
title = {aṃta},
}
@misc{apa,
title = {apa},
}
@misc{aṃpa,
title = {aṃpa},
}
@misc{aya,
title = {aya},
}
@misc{aṃya,
title = {aṃya},
}
@misc{ara,
title = {ara},
}
@misc{aṃra,
title = {aṃra},
}
@misc{ala,
title = {ala},
}
@misc{aṃla,
title = {aṃla},
}
@misc{ava,
title = {ava},
}
@misc{aṃva,
title = {aṃva},
}
@misc{aśa,
title = {aśa},
}
@misc{aṃśa,
title = {aṃśa},
}
@misc{aṣa,
title = {aṣa},
}
@misc{aṃṣa,
title = {aṃṣa},
}
@misc{asa,
title = {asa},
}
@misc{aṃsa,
title = {aṃsa},
}
@misc{aha,
title = {aha},
}
@misc{aṃha,
title = {aṃha},
}
@misc{aḥka,
title = {aḥka},
}
@misc{aḥca,
title = {aḥca},
}
@misc{aḥṭa,
title = {aḥṭa},
}
@misc{aḥta,
title = {aḥta},
}
@misc{aḥpa,
title = {aḥpa},
}
@misc{aḥya,
title = {aḥya},
}
@misc{aḥra,
title = {aḥra},
}
@misc{aḥla,
title = {aḥla},
}
@misc{aḥva,
title = {aḥva},
}
@misc{aḥśa,
title = {aḥśa},
}
@misc{aḥṣa,
title = {aḥṣa},
}
@misc{aḥsa,
title = {aḥsa},
}
@misc{aḥha,
title = {aḥha},
}
@misc{Agnipurāṇa,
title = {Agnipurāṇa},
}
@misc{Agniveśyagṛhyasūtra,
title = {Agniveśyagṛhyasūtra},
}
@misc{Atharvavedapariśiṣṭa,
title = {Atharvavedapariśiṣṭa},
}
@misc{Abhayapaddhati,
title = {Abhayapaddhati},
}
@misc{Amoghapāśakalparāja,
title = {Amoghapāśakalparāja},
}
@misc{Arthaśāstra,
title = {Arthaśāstra},
}
@misc{Alaṃkārakārikā,
title = {Alaṃkārakārikā},
}
@misc{Īśānaśivagurudevapaddhati,
title = {Īśānaśivagurudevapaddhati},
}
@misc{Ṛgvidhāna,
title = {Ṛgvidhāna},
}
@misc{Kalyāṇakāmadhenu,
title = {Kalyāṇakāmadhenu},
}
@misc{Kiraṇatantra,
title = {Kiraṇatantra},
}
@misc{Kubjikāmatatantra,
title = {Kubjikāmatatantra},
}
@misc{Kuṭṭanīmata,
title = {Kuṭṭanīmata},
}
@misc{Kṛṣṇayamāritantrapañjikā,
title = {Kṛṣṇayamāritantrapañjikā},
}
@misc{Guhyasamājatantra,
title = {Guhyasamājatantra},
}
@misc{Guhyasamājamaṇḍalavidhi,
title = {Guhyasamājamaṇḍalavidhi},
}
@misc{Guhyasiddhi,
title = {Guhyasiddhi},
}
@misc{Caṇḍamahāroṣaṇatantra,
title = {Caṇḍamahāroṣaṇatantra},
}
@misc{Caṇḍamahāroṣaṇatantrapañjikā,
title = {Caṇḍamahāroṣaṇatantrapañjikā Padmāvatī},
}
@misc{Chandaḥsaṃgraha,
title = {Chandaḥsaṃgraha},
}
@misc{Chandaḥsāra,
title = {Chandaḥsāra},
}
@misc{Jayākhyasaṃhitā,
title = {Jayākhyasaṃhitā},
}
@misc{Jñānaratnāvalī,
title = {Jñānaratnāvalī},
}
@misc{Jyotiḥsāra,
title = {Jyotiḥsāra},
}
@misc{Tattvaratnāvalī,
title = {Tattvaratnāvalī},
}
@misc{Tantrasadbhāva,
title = {Tantrasadbhāva},
}
@misc{Tantrāloka,
title = {Tantrāloka},
}
@misc{Divyāvadāna,
title = {Divyāvadāna},
}
@misc{Derge,
title = {Derge},
}
@misc{Nityādisaṅgrahābhidhānapaddhati,
title = {Nityādisaṅgrahābhidhānapaddhati},
}
@misc{Niśvāsatattvasaṃhitā,
title = {Niśvāsatattvasaṃhitā},
}
@misc{Niśvāsakārikā,
title = {Niśvāsakārikā},
}
@misc{Parākhyatantra,
title = {Parākhyatantra},
}
@misc{Pārameśvaratantra,
title = {Pārameśvaratantra},
}
@misc{Pūrva-Kāmika,
title = {Pūrva-Kāmika},
}
@misc{Pratiṣṭhālakṣaṇasārasamuccaya,
title = {Pratiṣṭhālakṣaṇasārasamuccaya},
}
@misc{Brahmayāmalatantra,
title = {Brahmayāmalatantra},
}
@misc{Bhairavapadmāvatīkalpa ,
title = {Bhairavapadmāvatīkalpa },
}
@misc{Mañjuśriyamūlakalpa,
title = {Mañjuśriyamūlakalpa},
}
@misc{Mataṅgapārameśvarāgama,
title = {Mataṅgapārameśvarāgama},
}
@misc{Mālinīvijayottaratantra,
title = {Mālinīvijayottaratantra},
}
@misc{Muktāvalī,
title = {Muktāvalī},
}
@misc{Mṛgendratantra,
title = {Mṛgendratantra},
}
@misc{Bṛhatsaṃhitā,
title = {Bṛhatsaṃhitā},
}
@misc{Rauravasūtrasaṅgraha,
title = {Rauravasūtrasaṅgraha},
}
@misc{Laghutantraṭīkā,
title = {Laghutantraṭīkā},
}
@misc{Laghuśaṃvaratantra,
title = {Laghuśaṃvaratantra},
}
@misc{Vajrāvalī,
title = {Vajrāvalī},
}
@misc{Vimalaprabhā,
title = {Vimalaprabhā},
}
@misc{Vīṇāśikhatantra,
title = {Vīṇāśikhatantra},
}
@misc{Śāradātilaka,
title = {Śāradātilaka},
}
@misc{Śivatattvaratnākara,
title = {Śivatattvaratnākara},
}
@misc{Sampuṭatantraprakaraṇārthanirṇaya,
title = {Sampuṭatantraprakaraṇārthanirṇaya},
}
@misc{Sampuṭodbhavatantra,
title = {Sampuṭodbhavatantra},
}
@misc{Sarvajñānottaratantra,
title = {Sarvajñānottaratantra},
}
@misc{Sarvajñānottaravṛtti,
title = {Sarvajñānottaravṛtti},
}
@misc{Sarvatathāgatatattvasaṅgraha,
title = {Sarvatathāgatatattvasaṅgraha},
}
@misc{Sarvatathāgatādhiṣṭhānasattvāvalokanabuddhakṣetrasaṃdarśanavyūha,
title = {Sarvatathāgatādhiṣṭhānasattvāvalokanabuddhakṣetrasaṃdarśanavyūha},
}
@misc{Sādhanamālā,
title = {Sādhanamālā},
}
@misc{Sārdhatriśatikālottara,
title = {Sārdhatriśatikālottara},
}
@misc{Siddhayogeśvarīmata,
title = {Siddhayogeśvarīmata},
}
@misc{Siddhaikavīratantra,
title = {Siddhaikavīratantra},
}
@misc{Saurasaṃhitā,
title = {Saurasaṃhitā},
}
@misc{Svacchandatantra,
title = {Svacchandatantra},
}
@misc{Svāyambhuvapāñcarātra,
title = {Svāyambhuvapāñcarātra},
}
@misc{Svāyambhuvasūtrasaṅgraha,
title = {Svāyambhuvasūtrasaṅgraha},
}
@misc{Harṣacarita,
title = {Harṣacarita},
}
@misc{Hevajratantra,
title = {Hevajratantra},
}
\end{filecontents*}
\DeclareSortTranslit{
\translit[title]{iast}{devanagari}
}
\begin{document}
\nocite{*}
\printbibliography
\end{document} |
Yes, in the discussion linked above I helped @plk create the IAST to Devanāgarī module for Lingua::Translit, and it has already been very useful to me outside of biblatex as well, in Perl scripts for converting e-texts. I will check whether those scripts still work as expected, or whether some bug has crept in since I last used them. Is there a debugging option to make Biber write the full transliterated strings to a file for inspection? I have only seen the sortinit fields in the bbl file containing some Devanāgarī, which I'm afraid is not enough to understand what happened. Well, I have now inserted another item into the test bibliography, Ānanda, and as I feared it got mixed up with the "A" entries. The sortinit field is {अ̄}, which, when copied into gedit, looks like a short a with a bar over it. I suspect that the diacritical combination of a and ¯ was treated as two separate characters: first the a is transliterated to the proper Devanāgarī short a, and then the diacritical mark is added to that, which makes no sense in Devanāgarī. The sortinithash field is the same as for the regular short a. I will now dig out my transliteration Perl script, test it with a current version of Lingua::Translit, and let you know about the results. |
I have tested my scripts using Lingua::Translit, their output so far seems correct. I have also updated the module, which appears to not have changed anything. The Sanskrit collation with biblatex is still broken. |
Thank you for checking that. If it's not a Lingua::Translit problem, we will have to wait for PLK. You can run Biber with the |
Hmm, I will check on this. This must be something to do with macro decoding changes. If you run |
This test file gives me the wrong output according to the above - can you verify:
अग्निपुर̄न्̣अ |
I get the same output, but that could be an input issue. According to https://w3c.github.io/xml-entities/unicode-names.html the code snippet uses the combining accents
#!/opt/local/bin/perl -CS
use v5.24;
use Lingua::Translit;
use utf8;
my $t = new Lingua::Translit('IAST Devanagari');
say $t->translit('Agnipurāṇa');
say $t->translit('Agnipurāṇa'); |
Ah, ok, then it's a Unicode normalisation issue, looking into it. |
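The normalisation problem diagnosed here can be illustrated with a small sketch. It is written in Python rather than Perl purely for illustration, and the two-entry table `TOY_TABLE` and the function `toy_translit` are hypothetical stand-ins, not Lingua::Translit's actual table or Biber's code. The assumption is simply that a transliteration table keyed on precomposed characters cannot match input that arrives in decomposed form, which would produce exactly the अ plus stray macron seen in the sortinit field for Ānanda.

```python
import unicodedata

# Toy IAST -> Devanagari table keyed on precomposed (NFC) characters.
# Hypothetical and tiny; a real transliteration table is far larger.
TOY_TABLE = {
    "a": "\u0905",       # a -> DEVANAGARI LETTER A (short a)
    "\u0101": "\u0906",  # precomposed a-macron -> DEVANAGARI LETTER AA (long a)
}


def toy_translit(text: str) -> str:
    """Map each code point through TOY_TABLE, leaving unknown ones unchanged."""
    return "".join(TOY_TABLE.get(ch, ch) for ch in text)


# The same letter in its two canonical forms.
composed = unicodedata.normalize("NFC", "a\u0304")    # one code point, U+0101
decomposed = unicodedata.normalize("NFD", "\u0101")   # 'a' followed by U+0304

# Decomposed input: the base letter is mapped on its own and the combining
# macron is left stranded on the Devanagari short a.
broken = toy_translit(decomposed)

# Normalising to NFC before the lookup lets the long vowel match the table.
fixed = toy_translit(unicodedata.normalize("NFC", decomposed))

print([hex(ord(c)) for c in broken])
print([hex(ord(c)) for c in fixed])
```

Whether the actual fix normalises the input, the table, or both is not stated in this thread; the sketch only shows why the two forms behave differently under a per-character lookup.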
Please try biber 2.12 dev version from SF. For some reason calls to |
I can confirm that I now get a different order than before, but whether or not that is right is a question for @ppasedach duvud.pdf |
This pdf looks already much better on a quick look, but I am still a bit
surprised by 1-26 being sorted in before everything else, and 107 at the
very end. This might be according to another (Hindi-?)sorting convention,
the treatment of ṃ, ḥ, and the ligature jñ considered as a letter in its
own right. I still have to look at it more carefully.
…On Thu, Jun 28, 2018 at 4:00 PM, moewew ***@***.***> wrote:
I can confirm that I now get a different order than before, but whether or
not that is right is a question for @ppasedach
<https://github.com/ppasedach> duvud.pdf
<https://github.com/plk/biblatex/files/2145594/duvud.pdf>
|
According to
do they look OK? |
In this debugging output the IAST looks garbled: diacritics slide off their base letters onto the following ones. Just compare the strings with those in the input file and you'll see it. I haven't checked every case, but judging from a few it seems to happen throughout. This does not seem to affect the Devanāgarī side of things, which on a cursory look seems o.k., apart from the sorting of ṃ (I would expect aka to be sorted before aṃka etc., but ala and alaṃkāra° seem o.k. again), ḥ, and jñ. I would expect the ligature jñ to be treated as separate letters, not as a letter in its own right, probably coming in the penultimate (?) position. I suspect that the ligature kṣ is then also treated as a single letter by the collation algorithm and sorted in the last position, which you would normally not want, at least for Sanskrit. |
I think you can ignore the IAST, it's copied from the So the only issue left is sorting. I compared Biber's sorting below with various settings in http://anubhav-chattoraj.github.io/indic-tools/devanagari_sorter/
I got consistently different results for ज्ञानरत्नावली/ Jñānaratnāvalī (Biber sorts it at the end, the quoted webpage at position 62 between जयाख्यसंहिता/ Jayākhyasaṃhitā and ज्योतिःसार/ Jyotiḥsāra) and सर्वज्ञानोत्तरतन्त्र/ Sarvajñānottaratantra सर्वज्ञानोत्तरवृत्ति/ Sarvajñānottaravṛtti (Biber sorts them after सर्वतथागततत्त्वसङ्ग्रह/ Sarvatathāgatatattvasaṅgraha and सर्वतथागताधिष्ठानसत्त्वावलोकनबुद्धक्षेत्रसंदर्शनव्यूह/ Sarvatathāgatādhiṣṭhānasattvāvalokanabuddhakṣetrasaṃdarśanavyūha the webpage before). So all of this seems to be only about |
Yes, don't worry too much about what is pasted here or what your text editor/terminal displays in the .blg unless you understand how it handles UTF-8 in terms of composed/decomposed form. What matters is the PDF output. |
The website you linked offers the option to sort jñ as a separate letter (as well as kṣ and tr, for which I should add something to the example); activating it didn't make any difference. But this seems to be a sorting convention used by some people. Here is a new, shorter example, which confirms that kṣ is also sorted as a separate letter at the end, which, at least for Sanskrit, it should not be. tr is sorted in the proper place, so the problem has now boiled down to jñ and kṣ. The sorting of ṃ and ḥ is o.k. as it is.
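The two conventions discussed here can be contrasted with a toy sort key. This is only a sketch under loud assumptions: it compares raw code points, which is not the Unicode Collation Algorithm that Biber uses, and the names `toy_key`, `order`, and `WORDS` are made up for illustration. Nothing here claims to reproduce what any particular locale tailoring does. The labels are plain ASCII stand-ins for kumāra, kṣetra, kha, jīvita, jñāna, and jvara.

```python
# Conjuncts that some conventions treat as letters in their own right.
KSA = "\u0915\u094D\u0937"  # ka + virama + ssa (IAST k.s)
JNA = "\u091C\u094D\u091E"  # ja + virama + nya (IAST jn)

# ASCII label -> Devanagari spelling, written as escapes to avoid
# any ambiguity about the exact code points.
WORDS = {
    "kumara": "\u0915\u0941\u092E\u093E\u0930",
    "ksetra": "\u0915\u094D\u0937\u0947\u0924\u094D\u0930",
    "kha": "\u0916",
    "jivita": "\u091C\u0940\u0935\u093F\u0924",
    "jnana": "\u091C\u094D\u091E\u093E\u0928",
    "jvara": "\u091C\u094D\u0935\u0930",
}


def toy_key(word: str, conjuncts_as_letters: bool = False) -> str:
    """Return a string whose code point order serves as the sort order.

    With conjuncts_as_letters=True the two conjuncts are replaced by
    private-use code points, so they sort after every Devanagari letter.
    """
    if conjuncts_as_letters:
        word = word.replace(KSA, "\uE000").replace(JNA, "\uE001")
    return word


def order(conjuncts_as_letters: bool) -> list:
    """Sort the ASCII labels by the toy key of their Devanagari spelling."""
    return sorted(WORDS, key=lambda name: toy_key(WORDS[name], conjuncts_as_letters))


print(order(False))  # conjuncts sorted by their component letters
print(order(True))   # conjuncts pushed to the end as separate letters
```

In the first ordering kṣetra stays between kumāra and kha, and jñāna between jīvita and jvara, which is the placement asked for above; in the second both words drop to the end, which matches the behaviour reported for Biber's output.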
|
Do you have a source for the complete sorting rules that you would like to see applied? If I understand correctly, Devanāgarī is a script, and a script does not necessarily determine the sorting uniquely: language-specific rules have to be taken into account as well. Take for example the different sorting of Ö in Swedish and German. See also Q16 What about collation of Indic language data? in http://unicode.org/faq/indic.html#16 and http://www.unicode.org/notes/tn1/ (https://www.unicode.org/notes/tn1/Wissink-IndicCollation.pdf), esp. p. 5 |
As far as I can see, there are currently no alternative tailorings for Sanskrit: https://metacpan.org/pod/Unicode::Collate::Locale#A-list-of-tailorable-locales You might look at the references here to see which UCA Sanskrit collation is being used; it is then possible to submit a request to the author of Unicode::Collate::Locale for alternative collations, if they are available in the UCA. |
With
\documentclass{article}
\listfiles
\usepackage{polyglossia}
\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{Latin Modern Roman}
\usepackage{fontspec}
\usepackage[sortlocale=hi]{biblatex}
\usepackage{filecontents}
\addbibresource{\jobname.bib}
\begin{filecontents*}{\jobname.bib}
@misc{kumāra,
title = {kumāra},
}
@misc{kṣetra,
title = {kṣetra},
}
@misc{kha,
title = {kha},
}
@misc{jīvita,
title = {jīvita},
}
@misc{jñāna,
title = {jñāna},
}
@misc{jvara,
title = {jvara},
}
@misc{tyāga,
title = {tyāga},
}
@misc{tridaśa,
title = {tridaśa},
}
@misc{tvid,
title = {tvid},
}
\end{filecontents*}
\DeclareSortTranslit{
\translit[title]{iast}{devanagari}
}
\begin{document}
\nocite{*}
\printbibliography
\end{document}
gives
See also Q16 What about collation of Indic language data? in http://unicode.org/faq/indic.html#16 and http://www.unicode.org/notes/tn1/ (https://www.unicode.org/notes/tn1/Wissink-IndicCollation.pdf), esp. p. 5 |
@plk Would it make sense and be possible to enable the |
Well, it already is because |
Oh yes, I hadn't seen the Can one do something similar for |
Hmm, not trivial to do this. These options are inherently global as they are preamble only. Can you see any use-case for this? Such things seem very global ... |
I can definitely see a use in restricting
\documentclass{article}
\listfiles
\usepackage{polyglossia}
\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{Latin Modern Roman}
\usepackage{fontspec}
\usepackage{biblatex}
\usepackage{filecontents}
\addbibresource{\jobname.bib}
\begin{filecontents*}{\jobname.bib}
@misc{kumāra,
title = {kumāra},
keywords = {indic},
}
@misc{kṣetra,
title = {kṣetra},
keywords = {indic},
}
@misc{kha,
title = {kha},
keywords = {indic},
}
@misc{jīvita,
title = {jīvita},
keywords = {indic},
}
@misc{jñāna,
title = {jñāna},
keywords = {indic},
}
@misc{jvara,
title = {jvara},
keywords = {indic},
}
@misc{tyāga,
title = {tyāga},
keywords = {indic},
}
@misc{tridaśa,
title = {tridaśa},
keywords = {indic},
}
@misc{tvid,
title = {tvid},
keywords = {indic},
}
@misc{aachen,
title = {Aachen},
}
@misc{augsburg,
title = {Augsburg},
}
@misc{arnhem,
title = {Arnhem},
}
@misc{avignon,
title = {Avignon},
}
@misc{aix-en-provence,
title = {Aix-en-Provence},
}
@misc{berlin,
title = {Berlin},
}
@misc{utrecht,
title = {Utrecht},
}
@misc{zeven,
title = {Zeven},
}
\end{filecontents*}
\DeclareSortTranslit{
\translit[title]{iast}{devanagari}
}
\begin{document}
\nocite{*}
\printbibliography[keyword=indic]
\printbibliography[notkeyword=indic]
\end{document}
sorts my Latin sources in their nonsense Devanāgarī form. From trace
|
Right, I see. I don't think a per-refcontext setting will fix this. What about an optional arg that makes transliteration apply only to entries with particular |
Mhhh, yes, the example was a bit too sparse on that front. I could have started a new refcontext for the Latin bibliography and then it would work. I feel that Per- |
Please try dev 3.12 and biber dev 2.12. |
The example works fine with 3.12/2.12 dev. Thank you very much.
\documentclass{article}
\listfiles
\usepackage{polyglossia}
\setdefaultlanguage{sanskrit}
\newfontfamily\sanskritfont{Latin Modern Roman}
\usepackage{fontspec}
\usepackage{biblatex}
\usepackage{filecontents}
\addbibresource{\jobname.bib}
\begin{filecontents*}{\jobname.bib}
@misc{kumāra,
title = {kumāra},
keywords = {indic},
langid = {hi},
}
@misc{kṣetra,
title = {kṣetra},
keywords = {indic},
langid = {hi},
}
@misc{kha,
title = {kha},
keywords = {indic},
langid = {hi},
}
@misc{jīvita,
title = {jīvita},
keywords = {indic},
langid = {hi},
}
@misc{jñāna,
title = {jñāna},
keywords = {indic},
langid = {hi},
}
@misc{jvara,
title = {jvara},
keywords = {indic},
langid = {hi},
}
@misc{tyāga,
title = {tyāga},
keywords = {indic},
langid = {hi},
}
@misc{tridaśa,
title = {tridaśa},
keywords = {indic},
langid = {hi},
}
@misc{tvid,
title = {tvid},
keywords = {indic},
langid = {hi},
}
@misc{aachen,
title = {Aachen},
}
@misc{augsburg,
title = {Augsburg},
}
@misc{arnhem,
title = {Arnhem},
}
@misc{avignon,
title = {Avignon},
}
@misc{aix-en-provence,
title = {Aix-en-Provence},
}
@misc{berlin,
title = {Berlin},
}
@misc{utrecht,
title = {Utrecht},
}
@misc{zeven,
title = {Zeven},
}
\end{filecontents*}
\DeclareSortTranslit{
\translit[hindi]{*}{iast}{devanagari}
}
\begin{document}
\nocite{*}
\printbibliography[keyword=indic]
\printbibliography[notkeyword=indic]
\end{document} |
You may want to change the example for |
Technically, you are right about the other three global sorting macros. However, I would rather wait and see if anyone really needs this and can give a convincing example. I think it's not that likely that anyone needs to vary these things within a document as this would mean that some settings used in one part of the document explicitly did not work with other parts. However, these settings are fairly general and would usually apply generally. |
Fair enough. Do you want me to fix up the example for |
It's done, just have to push it. |
Some months back I installed the development version of biblatex into my ~/texmf/ tree. Now I want to make my project portable: is it enough to keep biblatex.sty in the project directory and use Biber 2.12, or do I need any other files from the development version as well? |
@ppasedach The dev version is in flow and so I don't know which exact version you got. Assuming that everything works fine so far you are probably good with only your version of That all said, I can not recommend using the dev versions for production work. And I strongly recommend not disseminating the development versions to other people (I'm not sure if that is what you ultimately have in mind when you want to make your project portable). |
The project is a book, being developed in a private repository, so no dissemination of biblatex's and biber's development versions to others apart from one more collaborator who hardly touches the LaTeX sources. The point of my question was just about being able to quickly move my book project to some other computer without needing to modify the TeX Live installation there. I am still using the development version from August, but could also update to a newer version, if that's advisable, but of course I also understand the point about better not using dev versions for production work. Or, has the bug fix been incorporated into the stable versions, or could that be done without much effort? Then of course I'd prefer to use the stable versions. |
You should be fine with just If things work for you on your current machines and on the other target machines as well, there is no need to get a newer development version. Of course that only holds if you can use your version of Biber and There has not been an update to the release versions of either Biber or |
(Already noted here)
IAST-transliterated Sanskrit does not sort correctly any more. It appears as if diacritics are stripped off somewhere, or something else happens to them: a and ā, m and ṃ, h and ḥ, ś, ṣ and s, t and ṭ are mixed up in the example, and I would assume it happens to all diacritical combinations.
test_long.pdf
test_long.tex.gz