Converting Into TEI format
Unfortunately, many translation database and wordlist projects use their own format and therefore require a specialized converter. Some adhere to standards, but most use a format of their own. One problem with standards is that there are too many to choose from. FreeDict aims to promote a degree of standardisation of dictionary data through the TEI format. We also hope that upstream projects (i.e. the projects where the actual dictionary data comes from) will make their data available in TEI XML or even use TEI XML directly as their primary data format. The advantage is a common format for all dictionaries, so every dictionary benefits from the measures we take to bring dictionaries to new platforms.
Dictionaries should be imported by a program or script that reproducibly generates the same output for the same input. The program must be open source, preferably under GPL3+, and has to export to TEI P5.
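A minimal Python sketch of an importer's output stage, assuming the source has already been parsed into (headword, translation) pairs. The element layout (`entry`/`form`/`orth`, `sense`/`cit type="trans"`/`quote`) follows the TEI P5 dictionaries module, but FreeDict's exact markup and the TEI header needed for a complete file are not shown; `build_body` and `to_xml` are hypothetical names. Sorting and de-duplicating the input, and writing no timestamps, is one simple way to meet the repeatability requirement.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace; registering it with an empty prefix makes it the
# default namespace when serialising.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Return the namespace-qualified name for a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def build_body(pairs):
    """Build a TEI <body> from (headword, translation) pairs.

    Duplicates are dropped and entries are sorted, so the same input
    always produces identical output regardless of input order.
    """
    body = ET.Element(tei("body"))
    for headword, translation in sorted(set(pairs)):
        entry = ET.SubElement(body, tei("entry"))
        form = ET.SubElement(entry, tei("form"))
        ET.SubElement(form, tei("orth")).text = headword
        sense = ET.SubElement(entry, tei("sense"))
        cit = ET.SubElement(sense, tei("cit"), type="trans")
        ET.SubElement(cit, tei("quote")).text = translation
    return body


def to_xml(pairs) -> str:
    """Serialise the body to a string (no timestamps, no randomness)."""
    return ET.tostring(build_body(pairs), encoding="unicode")


if __name__ == "__main__":
    print(to_xml([("Haus", "house"), ("Baum", "tree")]))
```

Because the output depends only on the set of pairs, running the importer twice on the same source yields the same TEI text, which makes diffs in the Git repository meaningful.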
The output dictionary has to be placed in the fd-dictionaries repository and needs to follow the ISO 639-2 naming conventions for both the directory and the file name. A German-French dictionary is hence called deu-fra and the TEI file is
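As a rough illustration of the naming rule, a converter could derive the dictionary name from the two language codes as follows. `dictionary_name` is a hypothetical helper; it only checks that each code has the shape of an ISO 639-2 code (three lowercase letters) and does not verify that the code is actually assigned in the standard.

```python
import re

# Shape check only: ISO 639-2 codes are three lowercase ASCII letters.
_ISO_639_2_SHAPE = re.compile(r"[a-z]{3}")


def dictionary_name(source_lang: str, target_lang: str) -> str:
    """Return the name for a source-target dictionary, e.g. 'deu-fra'."""
    for code in (source_lang, target_lang):
        if not _ISO_639_2_SHAPE.fullmatch(code):
            raise ValueError(f"not a three-letter ISO 639-2 code: {code!r}")
    return f"{source_lang}-{target_lang}"


if __name__ == "__main__":
    print(dictionary_name("deu", "fra"))
```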
Please document importers here. Include the programming language used, usage hints, the location in the Git repository and other instructions. All importers are located
If your explanation is too long, please create a separate wiki page and link to it from here.
dict2tei.py - conversion of an already formatted dictd database into TEI format
ding2tei.pl - conversion of the ding database (English/German) into TEI format
hd2tei.pl - conversion of the "hd" format (which dictfmt also understands) into TEI format
*.xsl - style sheets to create dictionaries from the data of the JMdict project. The process is documented here.
tab2tei.pl - conversion of tab delimited plain text file into TEI format
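To illustrate what a tab-delimited importer has to do, here is a small Python sketch. It is not a port of tab2tei.pl, and its input layout is an assumption: one entry per line, with the headword and translation separated by the first tab, and blank lines skipped. `tab_to_tei_entries` is a hypothetical name. The key detail is escaping characters such as `&` and `<` so that the resulting TEI stays well-formed XML.

```python
from xml.sax.saxutils import escape


def tab_to_tei_entries(lines):
    """Convert tab-delimited lines into TEI <entry> strings.

    Assumed input (hypothetical): "headword<TAB>translation" per line.
    Only the first tab splits the line, so any further tabs stay in the
    translation. Blank lines are skipped; lines without a tab raise.
    """
    entries = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if "\t" not in line:
            raise ValueError(f"line {number}: no tab separator")
        headword, translation = (part.strip() for part in line.split("\t", 1))
        entries.append(
            "<entry><form><orth>%s</orth></form>"
            '<sense><cit type="trans"><quote>%s</quote></cit></sense></entry>'
            % (escape(headword), escape(translation))
        )
    return entries


if __name__ == "__main__":
    for entry in tab_to_tei_entries(["Haus\thouse\n", "R&D\tF&E\n"]):
        print(entry)
```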