Handle names with Greek characters #6

Merged Oct 20, 2012 (1 commit)

dspinellis
Contributor

Many Greek people spell their names on Facebook with Latin characters
so that they can also be recognized by friends who don't read Greek.
This change attempts to match these contacts by transcribing names
written with Greek characters in the contact list into the Latin
alphabet, so that they can match the corresponding Facebook contacts.

The transcription is done according to ISO 843:1997. This has rules
for various phonetic special cases; for instance, the epsilon-upsilon
diphthong is transcribed as "ef" or "ev" depending on the following
character. Nevertheless, there will still be mismatches, because some
people prefer to spell their name with a C instead of a K or change
their first name into the English equivalent (e.g. George instead of
Giorgos, or Alexander instead of Alexandros).
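
To illustrate the "ef"/"ev" special case, here is a minimal sketch in
plain Java. It is not the JFlex grammar from this commit. The class
and method names are hypothetical, and it assumes the usual reading
of the rule: "ef" before a voiceless consonant or at the end of a
word, "ev" otherwise. Accented letters are not handled.

```java
final class EuDiphthong {
    // Assumed voiceless set: theta, kappa, xi, pi, sigma, final sigma,
    // tau, phi, chi, psi. Written as escapes so the file stays ASCII.
    private static final String VOICELESS =
        "\u03b8\u03ba\u03be\u03c0\u03c3\u03c2\u03c4\u03c6\u03c7\u03c8";

    /**
     * Returns the Latin form of the epsilon-upsilon pair that starts at
     * index pos of a lowercase, unaccented Greek word.
     */
    static String latinForEu(String word, int pos) {
        int after = pos + 2;              // index of the character following the pair
        if (after >= word.length()) {
            return "ef";                  // end of word: treated like the voiceless case
        }
        char next = word.charAt(after);
        return VOICELESS.indexOf(next) >= 0 ? "ef" : "ev";
    }
}
```

Under these assumptions, the pair in ευτυχια (followed by tau) comes
out as "ef", while the pair in ευαγγελος (followed by alpha) comes out
as "ev".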

The transcription rules are quite complex and have thus been expressed
using the JFlex lexical analyzer generator. The source file is encoded
in UTF-8, but apparently JFlex doesn't support this encoding for its
input file. Therefore, the Java code generation is performed through
an intermediate step where the Greek characters in the JFlex code are
encoded as Java Unicode escapes. To simplify the compilation process,
this commit includes both the JFlex code and the corresponding generated
Java source code.
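
The intermediate escaping step could look roughly like the sketch
below. This is an assumption about the approach, not the tool used
in this commit, and the class and method names are hypothetical. It
copies ASCII characters unchanged and rewrites every other UTF-16
code unit as a four-digit Java Unicode escape.

```java
final class UnicodeEscaper {
    /** Replaces each non-ASCII UTF-16 code unit with its Java Unicode escape. */
    static String escapeNonAscii(String source) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c < 0x80) {
                out.append(c);            // plain ASCII passes through untouched
            } else {
                // backslash, the letter u, then four lowercase hex digits
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }
}
```

Running the JFlex source text through a function like this yields an
ASCII-only file that denotes the same characters.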

See also:
http://transliteration.eki.ee/pdf/Greek.pdf
http://jflex.de/
http://www.spinellis.gr/sw/greek/grconv/

nloko added a commit that referenced this pull request Oct 20, 2012
Handle names with Greek characters
@nloko nloko merged commit b6969fa into nloko:master Oct 20, 2012