Disambiguate <emph> to <foreign> for foreign languages #2
Here are some interesting edge cases:
<lb n="070468"/><said>--</said><emph>Thanky vous</emph>, Lenehan said, helping himself.</p>
There is no pre-existing language ID for this mix of French and English. But we might be able to define one in the TEI header like this:
<langUsage> <language ident="franglais">Franglais, a mixture of French and English</language> </langUsage>
Then we can mark it up like this:
<lb n="070468"/><said>--</said><foreign xml:lang="franglais">Thanky vous</foreign>, Lenehan said, helping himself.</p>
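Once custom identifiers like this go into the header, it may be worth checking that every `xml:lang` value used on a `<foreign>` element is actually declared in `<langUsage>`. Here is a rough sketch of such a check using Python's standard `xml.etree.ElementTree`. The sample document is a hypothetical, un-namespaced miniature (a real TEI file carries the TEI namespace, so the element lookups would need adjusting), and the function name `undeclared_languages` is just an illustration, not part of any existing tooling.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this namespace, so ElementTree
# exposes xml:lang under this expanded attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical miniature of a TEI file, without the TEI namespace.
SAMPLE = """<TEI>
  <teiHeader>
    <profileDesc>
      <langUsage>
        <language ident="fr">French</language>
        <language ident="franglais">Franglais, a mixture of French and English</language>
      </langUsage>
    </profileDesc>
  </teiHeader>
  <text>
    <p><foreign xml:lang="franglais">Thanky vous</foreign>, Lenehan said.</p>
    <p><foreign xml:lang="la">Naminedamine.</foreign></p>
  </text>
</TEI>"""


def undeclared_languages(xml_text):
    """Return sorted xml:lang values used on <foreign> but absent from <langUsage>."""
    root = ET.fromstring(xml_text)
    declared = {lang.get("ident") for lang in root.iter("language")}
    used = {el.get(XML_LANG) for el in root.iter("foreign") if el.get(XML_LANG)}
    return sorted(used - declared)


# "la" is used in the text but not declared in the sample header.
print(undeclared_languages(SAMPLE))
```

In this sample the check flags `la`, because the miniature header only declares `fr` and `franglais`.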
I've documented this in CONTRIBUTING.md. Here's another case:
Gifford suggests that "Lacaus esant" is Bloom remembering the line "La cause è santa" from a French opera probably performed in Italian, since Italian performances of the opera were common in the early 20th century. But the French for that, "la cause est sainte," might be rendered the same way in Bloom's mind. I'll leave this as Italian for now, trusting that Gifford knows something I don't.
Gifford writes: "Se reads Sel in the German version of the 'Eighth and Ninth Books of Moses' thus, if the phrase were Sel el yilo, it could be regarded as a phonetic reproduction of the Spanish Cielillo, “Little Heaven”; and “nebrakada” could be Spanish-Arabic for “blessed”. The whole charm would then read: “[My] little heaven of blessed femininity, love only me. Holy! Amen.”"
So I'll define a new language, Spanish-Arabic, in the header.
This one is puzzling:
<p rend="non-indent"><lb n="110043"/><emph>Naminedamine.</emph> Preacher is he.</p>
I'll label this as Latin for now.
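For anyone scripting this kind of relabelling, here is a minimal sketch of turning an `<emph>` into a `<foreign>` with a language code, again with `xml.etree.ElementTree`. It assumes an un-namespaced fragment, and `emph_to_foreign` is a hypothetical helper name. Deciding which `<emph>` elements are actually foreign-language text still has to be done by hand.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def emph_to_foreign(element, lang):
    """Retag an element as <foreign> in place and set its xml:lang.

    The element's text, children, and trailing text are left untouched.
    """
    element.tag = "foreign"
    element.set(XML_LANG, lang)
    return element


# The paragraph quoted above, parsed as a standalone fragment.
p = ET.fromstring(
    '<p rend="non-indent"><lb n="110043"/><emph>Naminedamine.</emph> Preacher is he.</p>'
)
emph_to_foreign(p.find("emph"), "la")
print(ET.tostring(p, encoding="unicode"))
```

Because the tag is changed in place, the text following the element (" Preacher is he.") stays attached to it.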
From the Wikibooks annotations to Ulysses (line references adapted to our edition):
Interestingly enough, while both corpusnomine at 11.1036 and nominedomine at 11.1244 were tagged in
This and the discussion above raise two questions:
This is really interesting. I think you're right--we could mark up "Dominenamine" with
Although we could define a new language for, say, corrupted Latin, that might be a slippery slope, since there is also corrupted (phonetically spelled) Irish, and other nonstandard forms. So I think it's a good idea to add a
<foreign xml:lang="la" type="corrupted" rend="none">Dominenamine</foreign>
Of course, we could also go with something like "Bloomean" instead of "corrupted"! So long as we keep track of the
OK, let’s go with
<lb n="030176"/>devil's name? Paysayenn. P. C. N., you know: <foreign xml:lang="fr">physiques, chimiques et
But we could easily add
I think the 'corrupted' value on
<lb n="080623"/>here. <emph>Lacaus esant tara tara.</emph> Great chorus that. <emph>Taree tara.</emph> Must be washed
<lb n="100849"/><said>--</said><emph>Se el yilo nebrakada femininum! Amor me solo! Sanktus! Amen.</emph></p>
I had a good chat with Hugh Cayless on the Digital Humanities Slack, and he has a few suggestions for improving the language encoding:
I'll go ahead and make these changes now.
Intriguing, Jonathan. More work but more nuance in the markup. I suppose you can tweak your tools to group French and French-ish when they need to and to separate them when that’s preferable?
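As a rough illustration of that kind of grouping, here is a sketch that collects `<foreign>` passages by `xml:lang` and optionally splits them further on the `type` attribute, using the `type="corrupted"` value proposed earlier in this thread. It is only a sketch under assumptions: the sample fragment is made up for illustration and un-namespaced, and `group_foreign` and the `lang/type` key format are hypothetical, not part of the project's tools.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative fragment only; not copied from the project's TEI files.
SAMPLE = """<text>
  <p><foreign xml:lang="la">Introibo ad altare Dei</foreign></p>
  <p><foreign xml:lang="la" type="corrupted" rend="none">Dominenamine</foreign></p>
  <p><foreign xml:lang="es">Que calle es esta?</foreign></p>
</text>"""


def group_foreign(xml_text, split_by_type=False):
    """Group the text of <foreign> elements by language.

    With split_by_type=True, elements carrying a type attribute go into a
    separate "lang/type" bucket; otherwise they are merged into "lang".
    """
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for el in root.iter("foreign"):
        lang = el.get(XML_LANG, "und")  # "und" = undetermined, if no xml:lang
        kind = el.get("type")
        key = f"{lang}/{kind}" if split_by_type and kind else lang
        groups[key].append("".join(el.itertext()))
    return dict(groups)


merged = group_foreign(SAMPLE)
separated = group_foreign(SAMPLE, split_by_type=True)
print(merged)
print(separated)
```

With the default call, the standard and corrupted Latin end up in the same `la` bucket. With `split_by_type=True`, "Dominenamine" moves to its own `la/corrupted` bucket.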
It makes me think too that Bloom’s
<p><foreign xml:lang="es"><lb n="150216"/>Bueñas noches, señorita Blanca. Que calle es esta?</foreign></p></sp>
is probably Spanglish (
Where would something like Stephen’s “demiurgos” at U 3.18 sit? Right now, we have it encoded as one of our lingering
Or is it just English?
Is “demiurgos” then just an English obsoletism?!