Handle names with Greek characters #6

Merged
nloko merged 1 commit into nloko:master from dspinellis:master
Oct 20, 2012

@dspinellis
Contributor

Many Greek people spell their name on Facebook with Latin characters,
so that they can also be recognized by friends who don't read Greek.
This change attempts to match these contacts by transcribing names
written with Greek characters in the contact list into names written
in the Latin alphabet, so that they can match the corresponding
Facebook contacts.

The transcription follows ISO 843:1997, which has rules for various
phonetic special cases; for instance, the epsilon-upsilon diphthong
is transcribed as "ef" or "ev" depending on the following character.
Mismatches will nevertheless remain, because some people prefer to
spell their name with a C instead of a K, or replace their first name
with its English equivalent (e.g. George instead of Giorgos, or
Alexander instead of Alexandros).
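
As an illustration of the context-dependent rule mentioned above, here is
a minimal sketch in plain Java rather than JFlex. It is not the code in
this commit. The class name, the tiny letter table, and the exact set of
letters that select "ef" (voiceless consonants, or end of word) are
assumptions for the example; accents and all other special cases are
ignored. Greek letters are written as Unicode escapes so the file
compiles regardless of source encoding.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class EpsilonUpsilonSketch {
    // Assumed set of letters before which epsilon-upsilon becomes "ef":
    // theta, kappa, xi, pi, sigma, final sigma, tau, phi, chi, psi.
    private static final String VOICELESS =
        "\u03B8\u03BA\u03BE\u03C0\u03C3\u03C2\u03C4\u03C6\u03C7\u03C8";

    // Deliberately tiny letter table, just enough for the demo names:
    // alpha, gamma, epsilon, iota, kappa, lambda, nu, omicron, rho,
    // sigma, final sigma, tau, upsilon.
    private static final Map<Character, String> LETTERS = new HashMap<>();
    static {
        String greek = "\u03B1\u03B3\u03B5\u03B9\u03BA\u03BB\u03BD"
                     + "\u03BF\u03C1\u03C3\u03C2\u03C4\u03C5";
        String[] latin = {"a", "g", "e", "i", "k", "l", "n",
                          "o", "r", "s", "s", "t", "y"};
        for (int i = 0; i < greek.length(); i++) {
            LETTERS.put(greek.charAt(i), latin[i]);
        }
    }

    public static String transcribe(String name) {
        String s = name.toLowerCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean isEpsilonUpsilon = c == '\u03B5'
                && i + 1 < s.length() && s.charAt(i + 1) == '\u03C5';
            if (isEpsilonUpsilon) {
                // Look one character past the diphthong to pick f or v.
                boolean atEnd = i + 2 >= s.length();
                boolean beforeVoiceless =
                    !atEnd && VOICELESS.indexOf(s.charAt(i + 2)) >= 0;
                out.append(atEnd || beforeVoiceless ? "ef" : "ev");
                i++; // skip the upsilon that was just consumed
                continue;
            }
            // Letters missing from the table pass through unchanged.
            String latin = LETTERS.get(c);
            out.append(latin != null ? latin : String.valueOf(c));
        }
        return out.toString();
    }
}
```

The one-character lookahead is the part that a lexer generator expresses
more compactly, which is presumably why the rules were written in JFlex.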

The transcription rules are quite complex and have therefore been
expressed using the JFlex lexical analyzer generator. The source file
is encoded in UTF-8, but JFlex apparently doesn't support this encoding
for its input file. The Java code is therefore generated through an
intermediate step in which the Greek characters in the JFlex code are
encoded as Java Unicode escapes. To simplify the compilation process,
this commit includes both the JFlex code and the corresponding
generated Java source code.
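
The intermediate escaping step can be sketched as follows. The commit
message does not say which tool performs it, so this is only an
illustration of the idea, with a hypothetical class and method name:
every non-ASCII character is replaced by a backslash, the letter u, and
its four-digit hexadecimal code unit, leaving an ASCII-only file.

```java
public class UnicodeEscapeSketch {
    // Replace every non-ASCII UTF-16 code unit with a Java Unicode
    // escape; ASCII characters are copied through unchanged.
    public static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }
}
```

Older JDKs shipped a native2ascii tool that performs the same kind of
conversion on whole files.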

See also:
http://transliteration.eki.ee/pdf/Greek.pdf
http://jflex.de/
http://www.spinellis.gr/sw/greek/grconv/

nloko added a commit that referenced this pull request Oct 20, 2012
Handle names with Greek characters
@nloko nloko merged commit b6969fa into nloko:master Oct 20, 2012