Import Centre

Sebastian Humenda edited this page Aug 29, 2016 · 5 revisions

Converting Into TEI Format

Unfortunately, many translation database and wordlist projects use their own format and therefore require a specialized converter. Some adhere to a standard, but most do not, and there are too many standards to choose from. FreeDict aims to encourage standardisation of dictionary data on the TEI format. We also hope that upstream projects (i.e. the projects where the actual dictionary data comes from) will make their data available in TEI XML, or even use TEI XML as their primary data format. The advantage of a common format is that every dictionary benefits from the work we do to bring dictionaries to new platforms.

Importing Dictionaries

Dictionaries should be imported by a program or script that is repeatable: the same input source must always produce the same output. The program must be open source, preferably licensed under GPL3+, and it has to export to TEI P5.
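The repeatability requirement can be checked mechanically. The following is a minimal sketch, not part of the FreeDict tools: is_repeatable and stable_convert are hypothetical names, and the converter is assumed to be a plain function from input text to output text.

```python
import hashlib
from typing import Callable


def is_repeatable(convert: Callable[[str], str], source: str, runs: int = 3) -> bool:
    """Run a converter several times on the same input and compare output hashes."""
    digests = {
        hashlib.sha256(convert(source).encode("utf-8")).hexdigest()
        for _ in range(runs)
    }
    # Exactly one distinct digest means every run produced identical output.
    return len(digests) == 1


def stable_convert(source: str) -> str:
    """Hypothetical converter: deduplicate lines, then sort them explicitly.

    Sorting removes any dependence on set iteration order, one common
    reason why importer output differs between runs.
    """
    entries = sorted({line.strip() for line in source.splitlines() if line.strip()})
    return "\n".join(entries)
```

Typical causes of non-repeatable output are embedded timestamps and iteration over unordered containers; an importer that avoids both should pass a check like this one.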

The output dictionary has to be placed in the fd-dictionaries repository and needs to follow the ISO 639-2 naming conventions for both the directory and the file name. A German-French dictionary is hence called deu-fra, and the TEI file is deu-fra.tei (all lower case).
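As an illustration of the naming convention, here is a small sketch that builds the expected location from two language codes. The helper dictionary_path is hypothetical, and it only checks the shape of each code (three lower-case ASCII letters); it does not verify that the code is actually assigned in ISO 639-2.

```python
import re
from pathlib import PurePosixPath

# Shape check only: three lower-case ASCII letters.
_CODE = re.compile(r"[a-z]{3}")


def dictionary_path(source_lang: str, target_lang: str) -> PurePosixPath:
    """Return the <src>-<tgt>/<src>-<tgt>.tei path relative to the repository root."""
    for code in (source_lang, target_lang):
        if not _CODE.fullmatch(code):
            raise ValueError(f"not a three-letter lower-case code: {code!r}")
    name = f"{source_lang}-{target_lang}"
    return PurePosixPath(name) / f"{name}.tei"
```

For example, dictionary_path("deu", "fra") yields deu-fra/deu-fra.tei, matching the German-French case above, while an upper-case or two-letter code raises a ValueError.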

Documented Importers

Please document importers here. Include the language used, usage hints, the location in the Git repository and any other instructions. All importers are located below tools/. If your explanations are too long, please create a separate wiki page and add an entry here pointing to it.

  • dict2tei.py - conversion of an already formatted dictd database into TEI format

  • ding2tei.pl - conversion of the ding database (English/German) into TEI format

  • hd2tei.pl - conversion of the "hd" format (which dictfmt also understands) into TEI format

  • JMdict*.xsl - style sheets to create dictionaries out of the JMdict project. The process is documented here.

  • tab2tei.pl - conversion of a tab-delimited plain text file into TEI format
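To give an idea of what such an importer does, the sketch below converts tab-delimited "headword, translation" lines into TEI-style entries. It is not the logic of tab2tei.pl or of any other tool listed above: the function name tab_to_tei_entries is hypothetical, the entry layout (form/orth for the headword, sense/cit/quote for the translation) is an assumption about a typical TEI P5 dictionary entry, and the mandatory teiHeader is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def tab_to_tei_entries(text: str) -> str:
    """Convert 'headword<TAB>translation' lines into a TEI-style <body> fragment."""
    body = ET.Element("body")
    for line in text.splitlines():
        # Skip blank lines and lines without a tab separator.
        if not line.strip() or "\t" not in line:
            continue
        headword, translation = (part.strip() for part in line.split("\t", 1))
        entry = ET.SubElement(body, "entry")
        form = ET.SubElement(entry, "form")
        ET.SubElement(form, "orth").text = headword
        sense = ET.SubElement(entry, "sense")
        cit = ET.SubElement(sense, "cit", {"type": "trans"})
        ET.SubElement(cit, "quote").text = translation
    # Entries are emitted in input order, so the same input yields the same output.
    return ET.tostring(body, encoding="unicode")


if __name__ == "__main__":
    print(tab_to_tei_entries("Haus\thouse\nBaum\ttree\n"))
```

Building the tree with an XML library rather than by string concatenation means special characters such as & and < in the word list are escaped automatically.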
