Disambiguating <emph> into multiple taggings #7

Open
yellwork opened this issue Jan 17, 2017 · 7 comments
Comments

@yellwork
Collaborator

yellwork commented Jan 17, 2017

Here is an interesting example of typographic distinction opening up into multiple possibilities for tagging:

<p><lb n="030099"/><foreign xml:lang="it">All'erta!</foreign></p>

@JonathanReeve switched the inherited <emph> tagging for a <foreign xml:lang="it">. But the italics also render a quotation (not that every quotation is so distinguished!). Gifford has:

All’erta! (Italian) On guard! Be vigilant! These are the opening words of Giuseppe Verdi’s opera Il Trovatore (The Troubador).

Should this then be encoded as follows?

<p><lb n="030099"/><quote source="Il Trovatore"><foreign xml:lang="it">All'erta!</foreign></quote></p>

Are there other examples in this vein?

@yellwork
Collaborator Author

yellwork commented Jan 17, 2017

Whereas

<lb n="161744"/>admiration of Rossini's <emph>Stabat Mater</emph>, a work simply abounding in

is just an instance of <title>, and not also of <foreign xml:lang="la">?

@JonathanReeve
Member

JonathanReeve commented Jan 18, 2017

I like the embedded <quote> and <foreign> above, with "All'erta". I agree with you about "Stabat Mater," too--I think just <title> is fine with that one.

@yellwork
Collaborator Author

yellwork commented Jan 18, 2017

Is it worth indicating that the bare words of such titles as encountered in the episode are also in Latin &c.? I’m looking through ‘Proteus’ here. On an earlier pass for <emph> disambiguation, you rendered a few likely instances of <title> as <foreign>:

<lb n="030167"/> […] But he must send me <foreign xml:lang="fr">La Vie de Jésus</foreign> by M. Léo Taxil.
<lb n="030196"/>[…] Rich booty you brought back; <foreign xml:lang="fr">Le
<lb n="030197"/>Tutu</foreign>, five tattered numbers of <foreign xml:lang="fr">Pantalon Blanc et Culotte Rouge</foreign>;

Gotcha moment aside (!), this is valuable information that we don’t want clipped in the shift to <title>. How about something like the following?

<lb n="030167"/> […] But he must send me <title type="book" xml:lang="fr">La Vie de Jésus</title> by M. Léo Taxil.

I note, in passing, that there’s also a case to be made for marking up the remainder of the sentence as by <foreign xml:lang="fr" rend="none">M.</foreign> Léo Taxil. (Drawing on our discussion in #2.)

@JonathanReeve
Member

JonathanReeve commented Jan 18, 2017

I like that syntax of embedding the language in the tag. Let's do it.

And thanks for catching those mistakes! I've just corrected them, using your suggested syntax.

@yellwork
Collaborator Author

yellwork commented Feb 13, 2017

I was just finishing the <said> tagging for “Lestrygonians” when I spotted something in the earlier encoding that gave me pause:

<p><lb n="081039"/>He hummed, prolonging in solemn echo the closes of the bars:
<lb n="081040"/><said who="Leopold Bloom">―<foreign xml:lang="it">Don Giovanni, a cenar teco
<lb n="081041"/>M'invitasti.</foreign></said></p>
[...]
<lb n="081051"/><said who="Leopold Bloom">―<foreign xml:lang="it">A cenar teco.</foreign></said></p>
<p><lb n="081052"/>What does that <foreign xml:lang="it">teco</foreign> mean? Tonight perhaps.
<lb n="081053"/><said who="Leopold Bloom">―<emph>Don Giovanni, thou hast me invited
<lb n="081054"/>To come to supper tonight,
<lb n="081055"/>The rum the rumdum.</emph></said></p>
<p><lb n="081056"/>Doesn't go properly.</p>

Really, these instances of <foreign> should all be <quote xml:lang="it">, shouldn’t they? I proposed a double encoding – <quote><foreign> – at the head of this issue, but I’m starting to think <quote xml:lang="it"> (like <title xml:lang="fr"> above) would be neater. What’s anyone else’s sense? This would probably require us to rework a lot of the Latin in the book, currently <foreign xml:lang="la">, as quotation too: <quote xml:lang="la">. See the first line of dialogue, for example. For:

<lb n="010005"/><said who="Buck Mulligan">―<foreign xml:lang="la">Introibo ad altare Dei.</foreign></said></p>

read

<lb n="010005"/><said who="Buck Mulligan">―<quote xml:lang="la">Introibo ad altare Dei.</quote></said></p>

I’m happy to make these changes, but I wanted to run the proposal by the group first. I’m sure if we make our encoding decisions clear in the README, tools like your foreign-language analysis can be tailored to catch non-English quotations, right, Jonathan?

@JonathanReeve
Member

JonathanReeve commented Feb 15, 2017

This sounds great. I think <quote> isn't rendered as italicized by default, though, so if we merge contiguous <quote> and <foreign>, we should probably add @rend, like <quote xml:lang="la" rend="italics"> to preserve the rendering as italicized.
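
For the places where the double encoding already exists, the merge could probably be scripted. Here is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` and a snippet without the TEI namespace; the helper name `merge_quote_foreign` and the default `rend="italics"` value are only illustrative, not something already in the repo. It only touches a <quote> whose sole content is a single <foreign>, and leaves everything else for a human to decide.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def merge_quote_foreign(root, rend="italics"):
    """Collapse <quote><foreign xml:lang="..">..</foreign></quote> into one <quote>.

    Hypothetical helper: returns the number of <quote> elements rewritten.
    """
    merged = 0
    # Materialise the list first so the tree is not modified while iterating.
    for quote in list(root.iter("quote")):
        children = list(quote)
        # Only merge when a single <foreign> is the whole content of <quote>.
        if len(children) != 1 or children[0].tag != "foreign":
            continue
        foreign = children[0]
        if (quote.text or "").strip() or (foreign.tail or "").strip():
            continue  # stray text around <foreign>: leave for manual review
        lang = foreign.get(XML_LANG)
        if lang is None:
            continue
        quote.set(XML_LANG, lang)
        quote.set("rend", rend)  # preserve the italic rendering explicitly
        # Lift the content of <foreign> up into <quote>.
        quote.text = foreign.text
        quote.remove(foreign)
        for child in list(foreign):
            quote.append(child)
        merged += 1
    return merged


# The "All'erta!" line from the top of this issue, in its double encoding.
SAMPLE = (
    '<p><lb n="030099"/><quote source="Il Trovatore">'
    "<foreign xml:lang=\"it\">All'erta!</foreign></quote></p>"
)

root = ET.fromstring(SAMPLE)
merged = merge_quote_foreign(root)
print(ET.tostring(root, encoding="unicode"))
```

With the real files the tag names would need the TEI namespace prefix, and turning a bare <foreign> into a <quote> would still be a manual call, since not every foreign phrase is a quotation.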

@JonathanReeve
Member

JonathanReeve commented Feb 15, 2017

And yep, this won't make too much of a difference in analyses, since we can just look for @xml:lang instead of <foreign>.
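
Roughly what I have in mind, as a sketch only: it assumes Python's standard `xml.etree.ElementTree`, a made-up sample built from lines quoted in this thread, and no TEI namespace on the elements. The function name `tagged_languages` is hypothetical. The point is that the query keys on the attribute, so <quote>, <title>, and <foreign> are all caught the same way.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree stores xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample assembled from encodings proposed in this issue.
SAMPLE = """<div>
<p><said who="Buck Mulligan"><quote xml:lang="la" rend="italics">Introibo ad altare Dei.</quote></said></p>
<p>But he must send me <title type="book" xml:lang="fr">La Vie de Jésus</title> by M. Léo Taxil.</p>
<p>What does that <foreign xml:lang="it">teco</foreign> mean?</p>
</div>"""


def tagged_languages(root):
    """Return (tag, language, text) for every element that carries xml:lang."""
    return [
        (el.tag, el.get(XML_LANG), "".join(el.itertext()))
        for el in root.iter()
        if el.get(XML_LANG) is not None
    ]


root = ET.fromstring(SAMPLE)
hits = tagged_languages(root)
counts = Counter(lang for _, lang, _ in hits)

for tag, lang, text in hits:
    print(f"{lang}\t<{tag}>\t{text}")
```

Filtering `hits` on the tag name would then separate non-English quotations from titles and from plain foreign words, provided the README spells out which element is used for what.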

2 participants