Use standardized language identifiers for lbx files #160

Open
pauloney opened this issue Sep 20, 2013 · 192 comments

Comments

@pauloney
Collaborator

Is there a "template" one can use to make the translations to be used in language.lbx? Or should that be done on top of one of the existing files?

I would like to create the files for Romanian, Vietnamese, Chinese and Japanese. I have people in the office who are capable of making the translations and have experience with bibliographies, but NONE of them are programmers.

Also: is there a guide on how to add support for a new language? Even though it is easy to understand what goes on inside \DeclareBibliographyStrings{ }, I would like to know, for example, when it is preferable to use TeX encoding as opposed to UTF-8.

Other questions are:

1- Can one add support for a language that is not supported by Babel?

2- When does one use \adddot and when does one use \adddotspace?

3- Why is country support (within language.lbx) limited to Germany, EU, US, France and GB?

4- Are you using a framework to do this? In general it is easier to manage them in a single spreadsheet with the translations to each language in each column and a script that reads the column and writes the LBX files! The translators can then easily compare to "nearby" languages and easily make other translations.

Is work by others on this kind of issue welcomed ?

Thanks for the great package!
Paulo Ney

@aboruvka
Collaborator

aboruvka commented Sep 20, 2013

General instructions can be found at the old SF wiki for biblatex (edit: that page now has an updated version on the GitHub wiki, please use https://github.com/plk/biblatex/wiki/Checklist-for-submitting-a-new-localisation-file-(.lbx)). For testing, see example 03-localization-keys in the documentation. You can use english.lbx as a starting point. Just complete \DeclareBibliographyStrings; we take care of the rest on the basis of your answers to the questions listed in the wiki.

Languages based on the Latin alphabet should be encoded in ASCII. That way they will be supported by any backend (BibTeX variants and biber).

Regarding your other questions:

  1. Not yet. The other maintainers are working on polyglossia support, but this is not an easy task.
  2. The output of X\adddotspace Y is less likely to have a linebreak between "X." and "Y" than X\adddot\ Y.
    Decisions on whitespace and punctuation are ideally made by the translator. Refer to the manual sections on adding punctuation and whitespace for further details.
  3. From the manual: "only a small number of country names is defined by default, mainly to illustrate this scheme". If we support all possible country*, patent* and patreq* strings you can imagine that this can get unwieldy.
  4. There are no further support files aside from the resources I mentioned above. A script might be helpful, but there are a few things to keep in mind: (1) contributors are working on different platforms, (2) version control of the spreadsheet could prove challenging with multiple editors and (3) for testing, working with the lbx file directly is probably more convenient.
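
As a rough illustration of the starting point described above, a new localisation file might begin like the sketch below. The file name mylanguage.lbx and the date string are placeholders, the three keys are written in the style of english.lbx rather than translated, and a real file needs the complete key list plus the bibliography extras settled through the wiki checklist.

```latex
% mylanguage.lbx -- placeholder name; a sketch, not a complete localisation file
\ProvidesFile{mylanguage.lbx}[2013/09/20 placeholder localisation sketch]

% Assumption: reuse the English extras (date formats, ordinals, ...)
% until language-specific ones have been written.
\InheritBibliographyExtras{english}

\DeclareBibliographyStrings{%
  inherit   = {english},% keys not listed here fall back to english.lbx
  % Each value is {{long form}{abbreviated form}}.
  editor    = {{editor}{ed\adddot}},% \adddot: dot after an abbreviation
  byeditor  = {{edited by}{ed\adddotspace by}},% \adddotspace: dot plus space
  andothers = {{et\addabbrvspace al\adddot}{et\addabbrvspace al\adddot}},
}

\endinput
```

A translator would replace the English text inside the inner braces and leave the keys on the left untouched.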

@pauloney
Collaborator Author

Audrey, thanks for the quick answer. I'll build the framework needed to produce the many lbx files that will be necessary to really internationalize Biblatex. The problem with working on a single lbx file at a time is that you can't compare with nearby languages or change your mind later about a better translation, because after you have written a token, the original is gone. I'll read in all existing lbx files and then produce the database that will be needed to drive the manufacturing of the new ones. There are 45 languages in Babel which are not in Biblatex, so it will take an organized effort of more than just programmers to get there.

I understand the problems of keeping track of changes (via GitHub) and testing are serious, but I'll produce the files and send them ready to you.

Supporting all country* strings is fairly easy! I already have most country names in some 200 languages, so I'll produce the files and make them available to you, if you want to use... but I would strongly recommend the use of a separate file for that - so as not to overload the lbx's.

Before I get started I have one small question. You mention the use of ".\isdot" on the Wiki page, but I do not see any occurrence of that on any of the lbx's. Is that really necessary at this level?

Thanks!
Paulo Ney

@plk
Owner

plk commented Sep 20, 2013

I'm for doing this if we can have a way for the releaser(s) (which is currently me) to generate all current .lbx files for a release on demand. I would prefer something like a db and a pull interface in Perl (biber is all in perl ...) which generates the .lbx. If the db was something like SQLite, the db could also be in the biblatex git repo. The problem then is that contributors would either still send diffs against the generated .lbx or would need to look at the db, which is probably out of the question for most people. Text files are easier for this but, as you say, we don't get the coverage or consistency we need in future.

@plk plk reopened this Sep 20, 2013
@aboruvka
Collaborator

Philipp Lehman wrote the wiki page. I'm not sure what use-case he had in mind for .\isdot. AFAIK it isn't necessary. You could consider using \isdot in place of \adddot if the string preceding is the output of some command, which may or may not end with a period.

About overloading the lbx files or separate files for country-specific strings - this is what I meant by "unwieldy". Certain aspects of the core biblatex styles are demonstrative rather than exhaustive. This is one good example. Users can easily extend the lbx files. If you're wanting to share all those extra strings with others, consider an add-on package.
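
For readers wondering what extending the strings from the user side looks like, here is a minimal sketch. The key countrybr and its text are invented for illustration and are not among the default country strings; the sketch assumes an English-language document.

```latex
\documentclass{article}
\usepackage[english]{babel}
\usepackage{biblatex}

% countrybr is a hypothetical key, declared here because it is not predefined.
\NewBibliographyString{countrybr}

% \DefineBibliographyStrings takes one value per key (no long/short pair).
\DefineBibliographyStrings{english}{%
  countrybr = {Brazil},
}

\begin{document}
Patent filed in \bibstring{countrybr}.
\end{document}
```

An add-on package could collect many such declarations in one place without touching the lbx files shipped with biblatex.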

The DB/spreadsheet could be maintained similar to the localization keys document - just an extra resource, but not necessary for contributing lbx files. Note that only a fraction of the current lbx files are actually complete, so between-language comparisons are limited.

@pauloney
Collaborator Author

It will take some building, but I think it is the only way to go! Imagine making a structural change and having to change/test 45 lbx files! I also want to build an interface so one can choose 3 to 4 languages to compare/edit in the DB - as the coverage gets bigger, that will be more important.

Changes to lbx's made by users or entered directly on GitHub should integrate easily back into the DB... because those will continue to happen!

I'll take a detour and come back when I am able to generate all the current 20 essential lbx files exactly the way they are right now.

PN

@pauloney
Collaborator Author

I am almost done with the back-end to produce the lbx files from a DB. I can produce lbx files that are almost identical to the existing ones and get some 50 more languages in the fray.... the problem here will be to get Babel to do the same thing ...but at this point I have an important question:

Why are we using a separate i18n LBX set of files, if we could use the ones from the CSL project at

   https://github.com/citation-style-language/locales

In my (uninformed) view, there are plenty of reasons to use them instead of the lbx's:

  1. They are ready!
  2. They are in XML, making it a lot easier to test consistency, etc ...

Paulo Ney

@plk
Owner

plk commented Sep 30, 2013

I like the idea of using standards like this but there are some things to consider though:

  1. Do they cover all of our strings?
  2. We'd have to convert to .lbx because it's much faster for biblatex to read them since they are TeX. XML parsing in TeX is not something you ever want to do.
  3. We'd have to make sure babel/polyglossia language ids are correct.
  4. We'd have to support things like \adddot etc. somehow since lots of .lbx files do this.
  5. There are special things in some .lbx files - all sorts of biblatex settings - we'd have to insert those.

@pauloney
Collaborator Author

Answering each of your questions/comments:

  1. No! There is a lot in common, but coverage is different: they have some strings that we don't, and vice versa. The interesting question is WHY? They should in fact be almost the same, since the problem is the same! :)
  2. That settles it! I am glad to have a very definite argument! :(. Instead of converting from XML --> LBX and running the danger of not getting a complete lbx file back, what I am doing is parsing all LBX and XML files into the database, sorting out some conflicting areas by hand, and then exporting far more complete LBX files, adding a few languages in the process.
  3. This is an area that deserves some immediate standardization! It is wrong to do it by "language" because of the pt_PT/pt_BR, en_GB/en_US/en_CA,... discrepancies. The files should really be labeled by "locale" (which is a standard), and we could possibly ask the Babel/Polyglossia people to do the same. If you look at the way Babel names the files, there is NO procedure in place; each one was named at a different point in time in a different way - including "portuges.lbx", which was named in this fashion (with two errors) because of the DOS restriction on filenames.
  4. That is the case with the XML files as well since there are abbreviations that use a DOT and some that don't... unless I am missing something here.
  5. I am dealing with it considering that every lbx file has a (fixed) pre-amble and a post-amble, and each of them gets picked up and built at the time the file is generated.

PN

@plk
Owner

plk commented Sep 30, 2013

Well we could consider the CSL route later if they were more to our needs but currently, they're not really. I had this argument with the "generic bib system" people a few years ago - they didn't seem to understand that high-quality bib typesetting needs semantic integration into the typesetting - there is no good "generic" solution ...
If you can generate identical .lbx files to our current ones, let's discuss further ... which database are you using?

@pauloney
Collaborator Author

I can produce identical lbx's already. When they differ, it is because the original lbx's have something wrong - a space out of place, etc ...

I am using MySQL because at the moment it is what I have on one particular server where I am working with someone else on the project, but I am writing very generic code that could be changed to anything.

I would like to add that one more advantage of doing this via the DB is that you can then interface with people all over who are interested in i18n of biblatex. They would just need to enter the data in an interface and their lbx files could be exported and later included in the distribution.

@plk
Owner

plk commented Sep 30, 2013

Ok - what language are you using for data extraction and creation of .lbxs?

@pauloney
Collaborator Author

Perl.

@plk
Owner

plk commented Sep 30, 2013

Good. Biber is all in perl too. Perhaps you could send me a MySQL dump and the perl? I'd like to have a look at it.

@pauloney
Collaborator Author

Sure! Give me some time to wrap it up ... I am sorting out the issues with translations into languages that have "gender" right now (so I can parse in the XML), and I will sort out a few other edge cases and send you the stuff. It is just one script.

@plk
Owner

plk commented Sep 30, 2013

No rush, many thanks. We'd then have to think about hosting this in some way or perhaps using SQLite and keeping just a db file in the git repository etc.

@pauloney
Collaborator Author

pauloney commented Oct 1, 2013

One thing I realized today writing the maps to parse the XML files of CSL, is that they have a nice way to recognize the gender and number (singular or plural) of words in other languages that is NOT present in the lbx file structure!

To translate a phrase like

Translated and Annotated by ...

to languages like Portuguese and Spanish requires one to know the gender of the entity being translated and annotated. If it is a Book or an Album it will be masculine, but if it is a Collection or a Thesis it will be feminine. So I don't really see how this could be done in the realm of the current lbx files.

Would someone mind sharing the wisdom on how these problems will be dealt with?

PN

@plk
Owner

plk commented Oct 1, 2013

@aboruvka - do you have a comment on this?

@aboruvka
Collaborator

aboruvka commented Oct 1, 2013

Gender specific strings come up with idem*. These can be selected on the basis of the gender field.

idemsf feminine singular form of idem
idemsm masculine singular form of idem
idemsn neuter singular form of idem
idempf feminine plural form of idem
idempm masculine plural form of idem
idempn neuter plural form of idem
idempp plural form of idem suitable for a mixed gender list of names

Some languages use masculine or feminine ordinals depending on the gender of item being indexed (e.g. series or edition). These are handled on the translator's end with the bibliography "extras" questions I mentioned earlier.

For the "by" roles, you could simply add gender/number-specific variants provided that the gender/number of the work is strongly tied to the entrytype (e.g. @book entries are always masculine-singular, @mvbook masculine-plural, @collection feminine-plural, etc). Note that album entrytypes are not formally supported and the @thesis entrytype doesn't support the role fields (only one person works on a thesis anyway).

The same problem has been mentioned in #48 for non-"by" roles, where the gender/number would be specific to the people filling the role. The strings already consider number because this is available in name list processing. Gender would have to be indicated explicitly in the entry somehow.
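
To make the idem* mechanism concrete, here is a minimal sketch of an entry carrying a gender field and of a macro that builds the string key from it. The entry data and the macro name demo:idem are invented for illustration; styles that print idem have their own macros, and the sketch only assumes that the gender value (sf, sm, sn, pf, pm, pn or pp) is appended to "idem" to form the key.

```latex
\begin{filecontents}{gender-demo.bib}
@book{demo,
  author = {Example, Alice},
  title  = {A Hypothetical Title},
  date   = {2013},
  gender = {sf},
}
\end{filecontents}

\documentclass{article}
\usepackage[english]{babel}
\usepackage{biblatex}
\addbibresource{gender-demo.bib}

% Hypothetical macro: with gender = {sf} the key becomes idemsf.
% A style would call it with \usebibmacro{demo:idem} inside a citation command.
\newbibmacro*{demo:idem}{%
  \iffieldundef{gender}
    {}% no gender given, so this sketch prints nothing
    {\bibstring{idem\thefield{gender}}}}

\begin{document}
\nocite{*}
\printbibliography
\end{document}
```

Gender-specific "by" strings would need a comparable selector, but keyed on the work rather than on the author, which is the part that does not exist yet.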

@pauloney
Collaborator Author

pauloney commented Oct 1, 2013

Thanks! That should do it.

@aboruvka
Collaborator

aboruvka commented Oct 1, 2013

Not quite. There is work on our end to be done. The bibliography extras questions would also need expanding to ask about the gender and number of @article, @book, @mvbook, @inbook, @collection, @incollection, and @mvcollection.

I'm saying it is probably do-able, but we have to consider work required to get this done, the relative demand for the new feature, and potential issues the feature might open up. If PL knew about this limitation and decided not to implement it, he likely had a very good reason.

@pauloney
Collaborator Author

pauloney commented Oct 1, 2013

PLK, Audrey, I am down to the wire, and about to start the last upload to the db and the last series of tests. Should I grab a set of fresh lbx files from the development branch ? Or use the last public release?

@plk
Owner

plk commented Oct 2, 2013

Always grab from DEV - it's more up to date ...

@pauloney
Collaborator Author

pauloney commented Oct 2, 2013

One of the hardest things I had to deal with in this side project was the fact that "language" and "locale" are mixed inside BibLatex in some unreasonable ways. It is true that most of what it inherits (or uses) from Babel is in the form of language, but the LBX files contain so much about "locale" that it is impossible to do it all in the realm of language only.

When one says that an entry should have "hyphenation = {portuguese}", that is all good and okay, but the entry:

language = {portuguese}

should never be expected to format an entry properly, because Iran, Bahamas, Kazakhstan, ... are written one way in pt_PT and another way in pt_BR.

In order to circumvent my difficulties introducing the translated terms into a DB and importing some new ones, I had to literally introduce locales in my table of languages and vice versa... something a programmer should never have to do!

Now that internationalization is really coming, in order to manage this well and be able to expand into languages that have many, many locales, it would be nicer to split these two roles cleanly. I know that for Portuguese alone there are portuguese.lbx, portuges.lbx, brazil.lbx and brazilian.lbx - but the way it is laid out makes it extremely hard to maintain, eliminate duplicates and deal with inconsistencies. One should have a single file "portuguese.lbx" and a couple of additional files, pt-BR.lbx and pt-PT.lbx, that call the main one and define some small local components.

Labeling of language and locale should follow standards (ISO and IETF) so one can interchange with other Bibliography management software and compatibility with the name space of Babel should be an internal issue and the user should never have to deal with that at a bibliography entry level.
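
The layering proposed above could look roughly like the following sketch of a locale file. The file name pt-BR.lbx comes from the proposal, but the key countryir is hypothetical: it is not a default string and would first have to be declared with \NewBibliographyString{countryir} in a style or preamble.

```latex
% pt-BR.lbx -- locale file as proposed above; a sketch only
\ProvidesFile{pt-BR.lbx}[2013/10/02 hypothetical locale sketch]

% Take the extras from the shared language file ...
\InheritBibliographyExtras{portuguese}

\DeclareBibliographyStrings{%
  inherit   = {portuguese},% ... and all strings not overridden below
  % Hypothetical key; pt-PT.lbx would carry {{Ir\~ao}{IR}} instead.
  countryir = {{Ir\~a}{IR}},
}

\endinput
```

Since biblatex looks up localisation files by babel/polyglossia language name, a file with this name would not be picked up on its own; a mapping such as \DeclareLanguageMapping{brazil}{pt-BR} in the preamble would be one way to try it out.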

Just my 2cents!

Paulo Ney

@plk
Owner

plk commented Oct 2, 2013

With the 2.8 DEV branch, I'm moving away from the hyphenation field and re-naming it langid since that's what it is - it's a language ID in babel (or, with 2.8, polyglossia too). There will be a langidopts for specifying polyglossia language options like variant names ("american" and "british" for the langid "english" etc.). The language field is just a printed field - not used to localise anything - it's misleading, I agree.
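
As a minimal sketch of the renamed field in use (assuming biblatex 2.8 or later with babel; the entry data is invented for illustration):

```latex
\begin{filecontents}{langid-demo.bib}
@book{demo-pt,
  author   = {Example, Ana},
  title    = {Um exemplo},
  date     = {2013},
  langid   = {portuguese},
  language = {portuguese},
}
\end{filecontents}

\documentclass{article}
% The main language goes last; portuguese is loaded so it can be switched to.
\usepackage[portuguese,english]{babel}
% autolang=other switches language (hyphenation and strings) per entry
% according to langid; the language field is only printed.
\usepackage[autolang=other]{biblatex}
\addbibresource{langid-demo.bib}

\begin{document}
\nocite{*}
\printbibliography
\end{document}
```

With polyglossia, an entry could additionally carry something like langidopts = {variant=british} next to langid = {english}; that part is not exercised by this babel-based sketch.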

@pauloney
Collaborator Author

Lines 461-462 of the english.lbx file have a curious entry:

countryeu = {{European Union}{EU}},
countryep = {{European Union}{EP}},

can anyone tell me what the second line means ?

Paulo Ney

@pauloney
Collaborator Author

I should have said that I saw this:

\keyitem{countryeu} The name "European Union", abbreviated as \vrb{EU}.
\keyitem{countryep} Similar to \vrb{countryeu} but abbreviated as \vrb{EP}. This is intended for \bibfield{patent} entries.

in the examples, but I remain puzzled by the meaning of it...

Paulo Ney

@plk
Owner

plk commented Oct 11, 2013

Good question - @aboruvka - any idea? It looks to me like a copy-paste which should read:

countryep = {{European Patent}{EP}},

?

@aboruvka
Collaborator

No idea. I don't think it is a mistake, though, because then countryep would be redundant with patenteu.

@pauloney
Collaborator Author

On top of that, the set of files:

-rw-r--r-- 1 root root 4965 Oct 28  2013 american-apa.lbx
-rw-r--r-- 1 root root 4681 Apr 17 15:19 british-apa.lbx
-rw-r--r-- 1 root root 4963 Apr 17 15:19 english-apa.lbx

should go through the same factorization we did in the other files. The differences between the 3 files above are minuscule and factorization of the keys will greatly simplify things.

In this set:

-rw-r--r-- 1 root root 4555 Apr 17 15:19 austrian-apa.lbx
-rw-r--r-- 1 root root 4525 Apr 17 15:19 german-apa.lbx
-rw-r--r-- 1 root root 4560 Apr 17 15:19 naustrian-apa.lbx
-rw-r--r-- 1 root root 4530 Apr 17 15:19 ngerman-apa.lbx

on top of the differences also being minuscule, there are redundant keys shared with the files they input.

If you could rename them, I could do the clean-up!

Paulo Ney

@pauloney
Collaborator Author

Philip,

Why do the APA definition files have to contain a string like "january" all over again ?

american-apa.lbx: january          = {{January}{January}},
austrian-apa.lbx: january          = {{J\"anner}{J\"anner}},
british-apa.lbx:  january          = {{January}{January}},
dutch-apa.lbx:    january          = {{januari}{januari}},
english-apa.lbx:  january          = {{January}{January}},
french-apa.lbx:   january          = {{janvier}{janvier}},
german-apa.lbx:   january          = {{Januar}{Januar}},
greek-apa.lbx:    january          = {{Ιανουάριος}{Ιανουάριος}},
italian-apa.lbx:  january          = {{Gennaio}{Gennaio}},
naustrian-apa.lbx:  january        = {{J\"anner}{J\"anner}},
ngerman-apa.lbx:  january          = {{Januar}{Januar}},
norsk-apa.lbx:    january          = {{januar}{januar}},
norwegian-apa.lbx:  january        = {{januar}{januar}},
nynorsk-apa.lbx:  january          = {{januar}{januar}},
slovene-apa.lbx:  january          = {{januar}{januar}},
spanish-apa.lbx:  january          = {{enero}{enero}},
swedish-apa.lbx:  january          = {{januari}{januari}},

even though they are already defined in the main file ...

I know that in the main files most of them (except for Finnish and Croatian) use an abbreviation for the short form of the string, and in the APA files all of them are spelled out ... but I thought it was easy to write a style that used the full term and not the abbreviation.

Paulo Ney

@plk
Owner

plk commented Jul 14, 2014

It was a long time ago but I think it was because at the time there wasn't any other way to force a non-abbreviated month form ...

@pauloney
Collaborator Author

You can probably change the code to use the normal standard month names in
the language LBX files - a thousand times faster than I would...

If you do that, you can leave the LBX with me and I'll clean them up from
redundant terms using the database.

Paulo Ney


@plk
Owner

plk commented Jul 14, 2014

Ok, I have used dateabbrev to do this and removed all of the year strings from all biblatex-apa lbx files.
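
For reference, a minimal sketch of the dateabbrev option at the document level. How biblatex-apa applies it internally is not shown in this thread, so this only illustrates the option itself, and the entry data is invented.

```latex
\begin{filecontents}{month-demo.bib}
@article{demo,
  author       = {Example, Alice},
  title        = {A Hypothetical Article},
  journaltitle = {Hypothetical Journal},
  date         = {2014-01},
}
\end{filecontents}

\documentclass{article}
\usepackage[english]{babel}
% dateabbrev=false asks the date macros for the long month string,
% so the style needs no month strings of its own.
% A style file could set the same thing with
% \ExecuteBibliographyOptions{dateabbrev=false}.
\usepackage[dateabbrev=false]{biblatex}
\addbibresource{month-demo.bib}

\begin{document}
\nocite{*}
\printbibliography
\end{document}
```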

@pauloney
Collaborator Author

Great! Where can I get the distribution?

Paulo Ney

@plk
Owner

plk commented Jul 15, 2014

It's also on github biblatex-apa ... I also received an offer to add Bulgarian to biblatex - I should direct the person to you?

@pauloney
Collaborator Author

I'll have the APA file ready soon.

For the Bulgarian, please put him in touch with me. At this stage I
probably will have him do the file and then parse it to the db ... but that
should change soon. My e-mail is pauloney@gmail.com

Paulo Ney


@plk
Owner

plk commented Jul 15, 2014

Thanks - you may already know, but there is a checklist we recommend for people adding translations so that the various options can be set correctly in the .lbx. We should probably make this more official ...

https://sourceforge.net/p/biblatex/oldwiki/Adding_lbx_Files/

@pauloney
Collaborator Author

Yes! I know that and will direct him to it! He is the maintainer of Babel,
so it will be an easy one ....

Thanks,
Paulo Ney


@pauloney
Collaborator Author

Philip, the new APA files are at:

https://drive.google.com/file/d/0B3mOBzjP3W1ndTBPdUlVRWlvdzg/edit?usp=sharing

They are factored among themselves (and the total size has been reduced quite a
bit) but they have not been factored against the main files - I'll do that
sometime and decrease the size even further.

I hope that, with my newly acquired skills, they will work right off the
bat! I am pretty sure you have lots of test files to check if the output is
the same ... let me know!

Paulo Ney


@pauloney
Collaborator Author

Philip,

We interacted quite a bit and he finished the file - which is one of
a kind in terms of previous LBX files. It allows two input encodings in the
SAME lbx file (UTF-8 and ASCII commands)... which is different from what
the Russian and Greek files currently do (which is to force the use of UTF-8).

I don't know how much detail you want on the situation that created this, but
basically, in the 90's there were multiple encodings to deal with Russian
(and Cyrillic in general) and the LaTeX crowd created a form of input,
generally referred to as \cyrxx, which can be used and mixed with any of
the encodings, with the whole thing written in ASCII.

So the current structure of the file right now is:

\lbx@ifutfinput
{% UTF-8 input
\DeclareBibliographyStrings{%
bibliography={{Библиография}{Библиография}},
references={{Литература}{Литература}},
...
}%
}
{% ASCII input via the \cyr... commands
\DeclareBibliographyStrings{%
bibliography={{\CYRB\cyri\cyrb\cyrl\cyri\cyro\cyrg\cyrr\cyra\cyrf\cyri\cyrya}{\CYRB\cyri\cyrb\cyrl\cyri\cyro\cyrg\cyrr\cyra\cyrf\cyri\cyrya}},
references={{\CYRL\cyri\cyrt\cyre\cyrr\cyra\cyrt\cyru\cyrr\cyra}{\CYRL\cyri\cyrt\cyre\cyrr\cyra\cyrt\cyru\cyrr\cyra}},
...
}%
}

The guy who wrote the LBX file (Grigori) is the maintainer of the Babel
Bulgarian package and he says that it is widely used to typeset LaTeX in
Bulgarian, as well as Russian and other Cyrillic-based languages. It is
extremely comfortable for small snippets or if you do not have the right
keyboard at hand... and small snippets of text might be exactly what a
user has in mind if he is typesetting one entry in Bulgarian in the middle
of a large English-language bibliography.

Following BCP47 the best way to handle this would be to have two sets of
files named:

bg-BG.lbx
bg.lbx

that would naturally use UTF-8, and another set:

bg-BG-ASCII.lbx
bg-ASCII.lbx

that would use the \cyrxx input.

What are your thoughts on this ?

Paulo Ney


@plk
Owner

plk commented Jul 21, 2014

This sounds reasonable. To get all this to work, I still need to address the three things above from a couple of weeks ago however. Did you speak to the babel/polyglossia maintainers about potentially supporting BCP47 lang specifiers? In the long term, this will be necessary.

@pauloney
Collaborator Author

Cool!

I am constantly talking to Javier Bezos at Babel and I have - in the last
month - become the maintainer of the Portuguese Babel support files. We are
changing slowly over to BCP 47, and any help from you and others can help.

I am speaking at TUG next week on the subject and if you have the
opportunity to talk to him and to others at Polyglossia, please pass on the
message about how important this is.

And I have one more question for you: in the new set-up, what are the steps
needed for adding support for a new language that is already supported by
Babel? Is it just to add the language.lbx files? I see that the lines in
biblatex.def have been prepared ahead of time, which is nice!

Paulo Ney


@plk
Owner

plk commented Jul 22, 2014

Adding a new language already supported by babel/polyglossia is usually just a matter of having a new .lbx file.

@marczellm
Contributor

I started creating an lbx file for the Hungarian language, see https://bitbucket.org/marczellm/latex-magyar-contrib/src. It's incomplete.

A main reason why it's incomplete is me being only a BSc student, therefore I don't know the Hungarian equivalent for a lot of terms without seeing them in context. In this I would appreciate instructions on how to test my lbx file so that all bibstrings appear in the PDF and I can see them in context, that would help a lot.

Also a lot of extra code seems to be needed because of the nature of the Hungarian language, best demonstrated by some examples:

  • with a foreword by X. Y.

    translates to

    X. Y. előszavával

  • on page 55

    translates to

    az 55. oldalon

  • on page 66

    translates to

    a 66. oldalon

    See the definite article (a/az) changing form based on whether the pronounced number begins with a vowel or a consonant. The Hungarian Babel package has a macro for this (\az{}).

All in all I'd really appreciate some advice on how to complete my lbx file to get it added to official BibLaTeX.

@vgvassilev

Hi @plk,
I just wanted to ask whether there is any progress with bulgarian.lbx. If not, we have invested some effort in translating it and we can donate it to the project.
Cheers,
Vassil

@plk
Owner

plk commented Jul 2, 2015

I haven't received one for inclusion yet but if you have a bulgarian.lbx, you can send it to me and I can ask our .lbx expert @aboruvka to have a look.

@vgvassilev

@mvassilev said he will double-check the translations once again and we could attach it here. Does that make sense?

@plk
Owner

plk commented Jul 2, 2015

Yes, fine.

@ahomansikka

Are you interested in an updated translation of biblatex to Finnish? If you are, where do I send it and how?

@plk
Owner

plk commented Jul 4, 2015

Absolutely, please send it to me. You can send a unified diff if you like.

PK

Dr P Kime


@ahomansikka

Here it is!

--- /usr/local/texlive/2014/texmf-dist/tex/latex/biblatex/lbx/finnish.lbx 2013-10-28 00:52:35.000000000 +0200
+++ new-finnish.lbx 2015-07-04 18:27:19.324149029 +0300
@@ -93,14 +93,14 @@
   compilers = {{koontaneet}{koontaneet}},% FIXME: unsure
   redactor = {{toimittanut}{toim\adddot}},% FIXME: unsure
   redactors = {{toimittaneet}{toim\adddot}},% FIXME: unsure
-% reviser = {{}{}},% FIXME: missing
-% revisers = {{}{}},% FIXME: missing
-% founder = {{}{}},% FIXME: missing
-% founders = {{}{}},% FIXME: missing
-% continuator = {{}{}},% FIXME: missing
-% continuators = {{}{}},% FIXME: missing
-% collaborator = {{}{}},% FIXME: missing
-% collaborators = {{}{}},% FIXME: missing
+  reviser = {{toimittanut}{toim\adddot}},% FIXME: unsure
+  revisers = {{toimittaneet}{toim\adddot}},% FIXME: unsure
+  founder = {{perustaja}{perustaja}},
+  founders = {{perustajat}{perustajat}},
+  continuator = {{jatkaja}{jatkaja}},% FIXME: unsure
+  continuators = {{jatkajat}{jatkajat}},% FIXME: unsure
+  collaborator = {{avustaja}{avustaja}},% FIXME: unsure
+  collaborators = {{avustajat}{avustajat}},% FIXME: unsure
   translator = {{k\"a\"ant\"anyt}{k\"a\"ant\adddot}},
   translators = {{k\"a\"ant\"anyt}{k\"a\"ant\adddot}},
   commentator = {{kommentaarin kirjoittanut}{kommentaarin kirjoittanut}},
@@ -252,11 +252,11 @@
   byeditor = {{toimittanut}{toim\adddot}},
   bycompiler = {{koontanut}{koontanut}},
   byredactor = {{toimittanut}{toim\adddot}},% FIXME: unsure
-% byreviser = {{}{}},% FIXME: missing
-% byreviewer = {{}{}},% FIXME: missing
-% byfounder = {{}{}},% FIXME: missing
-% bycontinuator = {{}{}},% FIXME: missing
-% bycollaborator = {{}{}},% FIXME: missing
+  byreviser = {{toimittanut}{toim\adddot}},% FIXME: unsure
+  byreviewer = {{toimittanut}{toim\adddot}},% FIXME: unsure
+  byfounder = {{perustanut}{perustanut}},% FIXME: unsure
+  bycontinuator = {{jatkanut}{jatkanut}},% FIXME: unsure
+  bycollaborator = {{yhteisty\"oss\"a}{yhteisty\"oss\"a}},% FIXME: Bad translation. Impossible to translate with one word.
   bytranslator = {{\lbx@lfromlang k\"a\"ant\"anyt}{\lbx@sfromlang k\"a\"ant\adddot}},
   bycommentator = {{kommentaarin kirjoittanut}{kommentaarin kirjoittanut}},% FIXME: unsure
   byannotator = {{selityksin varustanut}{selityksin varustanut}},% FIXME: unsure
@@ -336,36 +336,40 @@
   and = {{ja}{ja}},
   andothers = {{et\addabbrvspace al\adddot}{et\addabbrvspace al\adddot}},
   andmore = {{et\addabbrvspace al\adddot}{et\addabbrvspace al\adddot}},
-  volume = {{volyymi}{vol\adddot}},
-  volumes = {{volyymit}{vol\adddot}},
-  involumes = {{tilavuusosaan}{tilavuusosaan}},% FIXME: unsure
+  volume = {{volyymi}{vol\adddot}},% FIXME: Is {{osa}{osa}} a better translation?
+  volumes = {{volyymit}{vol\adddot}},% Incorrect. Correct translation of "volumes" depends on context.
+%% involumes = {{tilavuusosaan}{tilavuusosaan}},% FIXME: unsure INCORRECT!
+%% In this context "volume" does not mean "tilavuus", as in "the volume of this bottle is 1 litre".
+%% Finnish translation of "in 5 volumes" is "5 osassa" or "5 volyymiss\"a" or "5 volyymissa".
+%% Thus the word "in" is not translated but you can not always translate "volumes" as "osassa".
+  involumes = {{}{}},% FIXME: missing
   jourvol = {{volyymi}{vol\adddot}},
   jourser = {{sarja}{sarja}},
-% book = {{}{}},% FIXME: missing
-% part = {{}{}},% FIXME: missing
-% issue = {{}{}},% FIXME: missing
+% book = {{}{}},% FIXME: missing Could be "kirja", "luku", "kappale", "osa", etc. depending on context...
+  part = {{osa}{osa}},% FIXME: unsure
+  issue = {{numero}{numero}},% FIXME: unsure
   newseries = {{uusi sarja}{uusi sarja}},
   oldseries = {{vanha sarja}{vanha sarja}},
   edition = {{painos}{painos}},
   reprint = {{j\"alkipainos}{j\"alkipainos}},
   reprintof = {{julkaistu aiemmin nimell\"a}{julkaistu aiemmin nimell\"a}},
   reprintas = {{julkaistu uudelleen nimell\"a}{julkaistu uudelleen nimell\"a}},
-% reprintfrom = {{}{}},% FIXME: missing
-% translationof = {{}{}},% FIXME: missing
-% translationas = {{}{}},% FIXME: missing
-% translationfrom = {{}{}},% FIXME: missing
-% reviewof = {{}{}},% FIXME: missing
-% origpubas = {{}{}},% FIXME: missing
-% origpubin = {{}{}},% FIXME: missing
-% astitle = {{}{}},% FIXME: missing
-% bypublisher = {{}{}},% FIXME: missing
+  reprintfrom = {{julkaistu aiemmin nimell\"a}{julkaistu aiemmin nimell\"a}},% FIXME: unsure
+  translationof = {{k\"a\"ann\"os teoksesta}{k\"a\"ann\"os teoksesta}},% FIXME: unsure
+  translationas = {{k\"a\"annetty nimell\"a}{k\"a\"annetty nimell\"a}},% FIXME: unsure
+  translationfrom = {{k\"a\"annetty kielest\"a}{k\"a\"annetty kielest\"a}},% Result is very bad Finnish.
+  reviewof = {{arvostelu teoksesta}{arvostelu teoksesta}},% FIXME: unsure. Bad Finnish.
+  origpubas = {{julkaistu ensi kerran nimell\"a}{julkaistu ensi kerran nimell\"a}},
+  origpubin = {{julkaistu ensi kerran vuonna}{julkaistu ensi kerran vuonna}},
+  astitle = {{nimell\"a}{nimell\"a}},% FIXME: unsure
+  bypublisher = {{julkaissut}{julkaissut}},% FIXME: unsure
   page = {{sivu}{s\adddot}},
   pages = {{sivut}{s\adddot}},
   column = {{palsta}{palsta}},% Here "sarake" is wrong!
   columns = {{palstat}{palstat}},% Here "sarakkeet" is wrong!
   line = {{rivi}{rivi}},
   lines = {{rivit}{rivit}},
-  nodate = {{n\adddot d\adddot}{n\adddot d\adddot}},%FIXME
+  nodate = {{ei julkaisup\"aiv\"a\"a}{ei julkaisup\"aiv\"a\"a}},% FIXME: unsure
   verse = {{s\"ae}{s\"ae}},
   verses = {{s\"akeet}{s\"akeet}},
   section = {{kohta}{kohta}},% Bad translation, but "pyk\"al\"a" is no better.
@@ -379,7 +383,7 @@
   chapter = {{luku}{luku}},
   mathesis = {{tutkielma}{tutkielma}},% FIXME: unsure
   phdthesis = {{tohtorinv\"ait\"oskirja}{tohtorinv\"ait\"oskirja}},
-  candthesis = {{kandidat}{kandidat}},% FIXME: unsure
+  candthesis = {{kandidaatintutkielma}{kandidaatintutkielma}},% Literal translation of "Candidate thesis".
   resreport = {{tutkimusraportti}{tutkimusraportti}},
   techreport = {{tekninen raportti}{tekninen raportti}},
   software = {{ohjelmisto}{ohjelmisto}},
@@ -387,15 +391,15 @@
   audiocd = {{\"a\"ani-CD}{\"a\"ani-CD}},
   version = {{versio}{versio}},
   url = {{url}{url}},
-% urlfrom = {{}{}},% FIXME: missing
+  urlfrom = {{saatavilla osoitteesta}{saatavilla osoitteesta}},% FIXME: unsure
   urlseen = {{viitattu}{viitattu}},
-% inpreparation = {{}{}},% FIXME: missing
-% submitted = {{}{}},% FIXME: missing
-% forthcoming = {{}{}},% FIXME: missing
-% inpress = {{}{}},% FIXME: missing
-% prepublished = {{}{}},% FIXME: missing
+  inpreparation = {{valmisteilla}{valmisteilla}},% FIXME: unsure
+  submitted = {{l\"ahetetty}{l\"ahetetty}},% FIXME: unsure
+  forthcoming = {{hyv\"aksytty julkaistavaksi}{hyv\"aksytty julkaistavaksi}},% FIXME: unsure
+  inpress = {{painossa}{painossa}},% FIXME: unsure
+  prepublished = {{esijulkaistu}{esijulkaistu}},% FIXME: unsure
   citedas = {{jatkossa}{jatkossa}},
-% thiscite = {{}{}},% FIXME: missing
+  thiscite = {{t\"am\"a lainaus}{t\"am\"a lainaus}},% FIXME: unsure
   seenote = {{katso viite}{katso viite}},
   quotedin = {{lainattu teoksessa}{lainattu teoksessa}},
   idem = {{idem}{id\adddot}},% It its not necessary to translate Latin phrases.
@@ -443,42 +447,42 @@
   basicdecember = {{joulukuu}{joulukuu}},
   langamerican = {{amerikanenglanti}{amerikanenglanti}},
   langbrazilian = {{brasilianportugali}{brasilianportugali}},
-% langcatalan = {{}{}},% FIXME: missing
-% langcroatian = {{}{}},% FIXME: missing
-% langczech = {{}{}},% FIXME: missing
+  langcatalan = {{katalonia}{katalonia}},
+  langcroatian = {{kroatia}{kroatia}},
+  langczech = {{t\v{s}ekki}{t\v{s}ekki}},
   langdanish = {{tanska}{tanska}},
   langdutch = {{hollanti}{hollanti}},
   langenglish = {{englanti}{englanti}},
-% langfinnish = {{}{}},% FIXME: missing
+  langfinnish = {{suomi}{suomi}},
   langfrench = {{ranska}{ranska}},
   langgerman = {{saksa}{saksa}},
   langgreek = {{kreikka}{kreikka}},
   langitalian = {{italia}{italia}},
   langlatin = {{latina}{latina}},
   langnorwegian = {{norja}{norja}},
-% langpolish = {{}{}},% FIXME: missing
-  langportuguese = {{portugali}{portugali}},
-% langrussian = {{}{}},% FIXME: missing
+  langpolish = {{puola}{puola}},
+  langportuguese = {{portugali}{portugali}},
+  langrussian = {{ven\"aj\"a}{ven\"aj\"a}},
   langspanish = {{espanja}{espanja}},
   langswedish = {{ruotsi}{ruotsi}},
   fromamerican = {{englannin kielest\"a}{englannin kielest\"a}},
   frombrazilian = {{portugalin kielest\"a}{portugalin kielest\"a}},
-% fromcatalan = {{}{}},% FIXME: missing
-% fromcroatian = {{}{}},% FIXME: missing
-% fromczech = {{}{}},% FIXME: missing
+  fromcatalan = {{katalonian kielest\"a}{katalonian kielest\"a}},
+  fromcroatian = {{kroatian kielest\"a}{kroatian kielest\"a}},
+  fromczech = {{t\v{s}ekin kielest\"a}{t\v{s}ekin kielest\"a}},
   fromdanish = {{tanskan kielest\"a}{tanskan kielest\"a}},
   fromdutch = {{hollannin kielest\"a}{hollannin kielest\"a}},
   fromenglish = {{englannin kielest\"a}{englannin kielest\"a}},
-% fromfinnish = {{}{}},% FIXME: missing
+  fromfinnish = {{suomen kielest\"a}{suomen kielest\"a}},
   fromfrench = {{ranskan kielest\"a}{ranskan kielest\"a}},
   fromgerman = {{saksan kielest\"a}{saksan kielest\"a}},
   fromgreek = {{kreikan kielest\"a}{kreikan kielest\"a}},
   fromitalian = {{italian kielest\"a}{italian kielest\"a}},
   fromlatin = {{latinan kielest\"a}{latinan kielest\"a}},
   fromnorwegian = {{norjan kielest\"a}{norjan kielest\"a}},
-% frompolish = {{}{}},% FIXME: missing
+  frompolish = {{puolan kielest\"a}{puolan kielest\"a}},
   fromportuguese = {{portugalin kielest\"a}{portugalin kielest\"a}},
-% fromrussian = {{}{}},% FIXME: missing
+  fromrussian = {{ven\"aj\"an kielest\"a}{ven\"aj\"an kielest\"a}},
   fromspanish = {{espanjan kielest\"a}{espanjan kielest\"a}},
   fromswedish = {{ruotsin kielest\"a}{ruotsin kielest\"a}},
   countryde = {{Saksa}{DE}},


@plk
Owner

plk commented Jul 4, 2015

This (finnish update) has been committed to the DEV branch, thanks.
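
For anyone who wants to check the new strings after picking up the DEV version, a minimal test document might look like the sketch below. It assumes that babel's `finnish` option makes biblatex load the updated finnish.lbx. The keys printed and the `osa` override are illustrations only (the override tries the alternative raised in the FIXME comment for `volume`) and are not part of the committed file.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[finnish]{babel}
\usepackage{biblatex}

% Illustration only: try an alternative translation locally,
% without editing finnish.lbx itself.
\DefineBibliographyStrings{finnish}{%
  volume = {osa},
}

\begin{document}
% \biblstring prints the long form of a key, \bibsstring the short form.
founder: \biblstring{founder} / \bibsstring{founder}\par
translationof: \biblstring{translationof} / \bibsstring{translationof}\par
inpress: \biblstring{inpress} / \bibsstring{inpress}\par
volume (locally overridden): \biblstring{volume}
\end{document}
```

Keys such as `bytranslator` are left out because their definitions use the internal `\lbx@lfromlang` macros, which are best checked through a real bibliography entry.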

@plk changed the title from "Adding support for other languages" to "Use standardized language identifiers for lbx files" on Nov 6, 2016