How to create a bilingual dictionary entry #44

Open
boydkelly opened this issue May 15, 2022 · 8 comments

@boydkelly

Hi, thanks for the amazing project. I am interested in this, but I can't see how to make a bilingual entry. In the Rev34.xml file there are 'to' and 'from' language elements in the meta_info, and they indicate translation between the languages en and lv. However, in the ar entries I don't see anything identified as lv. There does seem to be a translation of 'Home', but it appears to be in Russian, with no language specified. Can you point to any other example? Thanks!

@soshial
Owner

soshial commented Jan 15, 2023

Here's an example I created specially for you. I hope it's not too late.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE xdxf SYSTEM "xdxf_strict.dtd">
<xdxf revision="034">
    <meta_info>
        <languages>
            <from xml:lang="en"/>
            <to xml:lang="eo"/>
            <to xml:lang="lv-LV"/>
            <to xml:lang="es-ES"/>
            <to xml:lang="es-AR"/>
            <to xml:lang="zh-cmn-Hant-TW"/>
        </languages>
        <title>Multilingual dictionary</title>
        <full_title>Example of a multilingual dictionary</full_title>
        <description>
            This dictionary shows how to compile a dictionary with one-to-many language translation.
            The "k" tag is in English and is translated into Esperanto, Spanish (Spain), Spanish (Argentina) and Mandarin Chinese (Traditional script, Taiwan).
        </description>
        <file_ver>v1.1b</file_ver>
        <creation_date>15-01-2023</creation_date>
        <last_edited_date>15-01-2023</last_edited_date>
    </meta_info>
    <lexicon>
        <ar>
            <k xml:lang="en">cell phone</k>
            <def>
                <def xml:lang="es-ES">
                    <deftext>móvil</deftext>
                </def>
                <def xml:lang="es-AR">
                    <deftext>celular</deftext>
                </def>
                <def xml:lang="eo">
                    <deftext>poŝtelefono</deftext>
                </def>
                <def xml:lang="zh-cmn-Hant-TW">
                    <deftext>手機</deftext>
                </def>
            </def>
        </ar>
    </lexicon>
</xdxf>
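
For the strictly bilingual case that the issue title asks about, the same pattern can presumably be reduced to one <from> and one <to>. A minimal sketch derived from the example above; the title, description and version values are placeholders, and keeping xml:lang on <def> when there is only one target language is an assumption rather than a documented requirement:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE xdxf SYSTEM "xdxf_strict.dtd">
<xdxf revision="034">
    <meta_info>
        <languages>
            <!-- One source language and one target language -->
            <from xml:lang="en"/>
            <to xml:lang="eo"/>
        </languages>
        <title>Bilingual dictionary</title>
        <full_title>Example of a bilingual dictionary</full_title>
        <description>English headwords translated into Esperanto.</description>
        <file_ver>v1.0</file_ver>
        <creation_date>15-01-2023</creation_date>
        <last_edited_date>15-01-2023</last_edited_date>
    </meta_info>
    <lexicon>
        <ar>
            <!-- Headword in the source language -->
            <k xml:lang="en">cell phone</k>
            <!-- A single definition in the target language, no nested <def> needed -->
            <def xml:lang="eo">
                <deftext>poŝtelefono</deftext>
            </def>
        </ar>
    </lexicon>
</xdxf>
```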

@soshial changed the title from "Feature request: Example bilingual dictionary entry" to "How to create a bilingual dictionary entry" on Jan 15, 2023
@soshial
Owner

soshial commented Jan 15, 2023

I noticed that you have figured out how to create such entries in your dictionary. Let me know if there are cases that my scheme doesn't handle or describe well.

@boydkelly
Author

Yes, thank you very much! It is generally working very well. You noticed I added a couple of extra tags to the DTD. This was really for my use case: I needed to uniquely identify definitions and examples for use with Anki flash cards, and it has also been useful for importing into a neo4j graph database. So I have a uuid field for each definition. For example (from French to English): verres: glasses; lunettes: glasses. OK, this is maybe a dumb example; it's what my brain came up with right now. But there are a lot of situations in the language I am working with where this many-to-one relationship occurs. In hindsight, maybe I could have just used the co tag. But the uuid has worked well.

Same for examples where the same example phrase may be used with several word definitions contained therein. This enabled me to create a question for each time the example phrase occurs in the dictionary but giving a different 'hint' each time for the phrase meaning.

You may also have noticed that I am working with a language from West Africa. This is Jula, which is not very well documented, and there are many spelling variations based on either phonetics or French phonemes. In addition to the spelling variations, this is a tonal language, so it has been challenging to keep the 'headword' in the local language unique. (I know I can use a comment for that too.) But I was also tempted to add a uuid to the ar or k tag.

I have made extensive use of kref/spv, but now I am dealing with many situations where multiple words share the same kref/spv. This is entirely an issue with my use case, but I have awk scripts that search existing docs and add HTTP links to the definitions of words. They also search the spv, but may then link back to the wrong definition. This is not an XDXF issue at all; I just wanted to let you know about my challenges with this.

The one item I would have found useful is source and author tags for the ar (as you have for examples). Again, that would not be so useful for well-documented languages. In my case I have noted separately where I 'heard' a certain word.

And finally, since XML is difficult to work with, especially for non-technical people, I have been working with YAML and converting to XML (and sometimes back). But the conversions (especially back to YAML) are not perfect.

I'd love to get a conversion going directly from XDXF to neo4j. It's possible. But the easy route for me right now is to do an XSLT to CSV and then import that.

I will post back what I may come up with.

Thanks!

@soshial
Owner

soshial commented Jan 16, 2023

> You noticed I added a couple of extra tags to the DTD. This was really for my use case: I needed to uniquely identify definitions and examples for use with Anki flash cards, and it has also been useful for importing into a neo4j graph database. So I have a uuid field for each definition. For example (from French to English): verres: glasses; lunettes: glasses. OK, this is maybe a dumb example; it's what my brain came up with right now. But there are a lot of situations in the language I am working with where this many-to-one relationship occurs. In hindsight, maybe I could have just used the co tag. But the uuid has worked well.

According to the DTD, it is possible to assign IDs to both <k> and <def> via the id attribute. I wonder why you needed to create your own def-id attribute. In the "verres: glasses vs lunettes: glasses" case, both glasses should have a kref tag with an idref attribute. I am trying to understand the exact use case in which you could not use the id attribute and used your own def-id instead.

> Same for examples where the same example phrase may be used with several word definitions contained therein. This enabled me to create a question for each time the example phrase occurs in the dictionary but giving a different 'hint' each time for the phrase meaning.

In the DTD it's currently not possible to add IDs to examples. Would you be so kind as to point to specific use cases in GitLab (like this https://gitlab.com/ci-dict/dyu-xdxf/-/blob/main/mandenkan/dict.xdxf#L29687-29692) where it's needed?

@boydkelly
Author

boydkelly commented Jan 16, 2023

> According to the DTD, it is possible to assign IDs to both <k> and <def> via the id attribute. I wonder why you needed to create your own def-id attribute. In the "verres: glasses vs lunettes: glasses" case, both glasses should have a kref tag with an idref attribute. I am trying to understand the exact use case in which you could not use the id attribute and used your own def-id instead.

It's been a while... I remember now that I had tried to use that id attribute, but I believe it wouldn't accept a uuid as a valid attribute value. (I was already using uuids for my Anki cards.) I'd actually like to use that ID. I'll get back to you on the exact error.
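
A likely cause, assuming the id attribute is declared with the DTD type ID: XML requires ID values to be valid XML names, and a name cannot start with a digit. A uuid such as 9db7c703-6df2-4bc3-ac29-fb5601417eeb would then be rejected by a validating parser, while one that happens to start with a letter from a to f would pass. A minimal sketch of one possible workaround, prefixing each uuid with an underscore; the prefix is an illustrative choice and not something taken from the XDXF spec, and the uuids and glosses are the ones posted further down in this thread:

```xml
<!-- Sketch only: assumes <def> takes an id attribute of DTD type ID, as stated above. -->
<ar>
    <k>tá</k>
    <!-- The bare uuid starts with the digit 9, so it is not a valid XML name;
         the leading underscore makes it one. -->
    <def id="_9db7c703-6df2-4bc3-ac29-fb5601417eeb" xml:lang="fr">
        <deftext>mien, sien, nôtre, vôtre</deftext>
    </def>
</ar>
<ar>
    <k>tɛ́</k>
    <!-- Prefixed for consistency, and because this uuid also starts with a digit. -->
    <def id="_73707939-e5ca-4153-b16c-6f97ed2f7010" xml:lang="fr">
        <deftext>(négation)</deftext>
    </def>
</ar>
```

Any tool that consumes these ids, such as an Anki export, would need to strip the prefix again. ID values must also be unique across the whole document.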

@boydkelly
Author

> In the DTD it's currently not possible to add IDs to examples. Would you be so kind as to point to specific use cases in GitLab (like this https://gitlab.com/ci-dict/dyu-xdxf/-/blob/main/mandenkan/dict.xdxf#L29687-29692) where it's needed?

Yes, there are lots of examples. But I don't think this is a limitation of the spec. It's really just how I am using the data for question-and-answer flash cards.

I have an example phrase, "It's not yours!", used in two different definitions: the definition of the word 'not', and also the word 'yours'.

I need to keep them unique so the phrase is presented to the user twice: on one card with a hint providing the meaning of the word 'yours' and on another card with a hint for the word 'not'.

The uuid produced the required results.

 9db7c703-6df2-4bc3-ac29-fb5601417eeb> tá = mien, sien, nôtre, vôtre>i ta tɛ!> Ce n'est pas le tien!>-
  73707939-e5ca-4153-b16c-6f97ed2f7010> tɛ́ = (négation)>i ta tɛ+!>Ce n'est pas le tien+!> 

See: https://coastsystems.net/docs/fr/slides/5words/
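
In XDXF terms, the two cards above might correspond to something like the following sketch. It assumes the def-id extension attribute mentioned earlier in the thread (which is not part of xdxf_strict.dtd) and assumes that each uuid belongs to a definition rather than to the example itself. The <ex>, <ex_orig> and <ex_tran> element names are written from memory of the XDXF draft and should be checked against the DTD:

```xml
<!-- Sketch only: def-id is a custom attribute and will not validate against the strict DTD. -->
<ar>
    <k>tá</k>
    <def def-id="9db7c703-6df2-4bc3-ac29-fb5601417eeb" xml:lang="fr">
        <deftext>mien, sien, nôtre, vôtre</deftext>
        <!-- Same phrase as below; the parent definition supplies the hint for this card -->
        <ex>
            <ex_orig>i ta tɛ!</ex_orig>
            <ex_tran>Ce n'est pas le tien!</ex_tran>
        </ex>
    </def>
</ar>
<ar>
    <k>tɛ́</k>
    <def def-id="73707939-e5ca-4153-b16c-6f97ed2f7010" xml:lang="fr">
        <deftext>(négation)</deftext>
        <!-- Repeated on purpose, so the phrase yields a second card with a different hint -->
        <ex>
            <ex_orig>i ta tɛ!</ex_orig>
            <ex_tran>Ce n'est pas le tien!</ex_tran>
        </ex>
    </def>
</ar>
```

With this layout the example text is duplicated, and it is the uuid on the enclosing definition that keeps the two occurrences distinct.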

@soshial
Owner

soshial commented Jan 17, 2023

On an unrelated note, some comments on your XDXF file:

  1. I believe you might have missed <?xml version="1.0" encoding="UTF-8" ?> and <!DOCTYPE xdxf SYSTEM "xdxf_strict.dtd"> at the beginning of the file.
  2. You might have confused <creation_date> and <last_edited_date>.

@boydkelly
Author

boydkelly commented Jan 17, 2023

Thanks! You are really keeping me on the ball. Actually, since I maintain the dictionary in YAML and convert to XML on every change, I had temporarily commented out those lines while doing some rearranging. I have put them back, but I was validating against the DTD via script anyway.

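# Convert the YAML source to XML ($yml, $dict, $project and $xml are assumed to be set earlier).
yq -x < "$yml" > "$dict"

#tidy -q -m -xml -indent $dict-
# Prepend the DOCTYPE first, then the XML declaration, so the declaration ends up on line 1.
sed -i '1 i <!DOCTYPE xdxf SYSTEM "xdxf_strict.dtd">' "$dict"
sed -i '1 i <?xml version="1.0" encoding="UTF-8" ?>' "$dict"
# Validate the generated file against the DTD.
xmllint --noout --dtdvalid "$project/xdxf_strict.dtd" "./$dict"

# Hard-link the result into the project directory.
ln -f "$dict" "./$project/$xml"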

For the dates, yes, I was not actually paying too much attention. I will have to automate inserting the current date every time I save or convert my file.

You were also asking about the change I made to categ in the DTD. I am using the categ element for 'tags'. I noted that categ related to Wikipedia or something, which I thought could be repurposed for my use... The change was so that I could use it as a list element to tag or 'categorize' definitions (People; Calendar; Work; etc.). Again, this was not so much for the dictionary as for the Anki cards I produce from the file.

Just as an FYI, I maintain all these scripts in a CI pipeline on GitLab, so when I make a change to the YAML file it converts to XDXF and updates the dictionary, quiz, and Anki cards in one shot!

(My neo4j project is temporarily offline. I have to get back to making some adjustments there. )

https://coastsystems.net/docs/fr/lexique-dyu/
