Handle names with Greek characters #6

Merged
nloko merged 1 commit into nloko:master from dspinellis/master on Oct 20, 2012

@dspinellis
Contributor

dspinellis commented Feb 25, 2012

Many Greek people spell their names on Facebook with Latin characters,
so that friends who don't read Greek can also recognize them.
This change attempts to match these contacts by transcribing names
written with Greek characters in the contact list into the Latin
alphabet, so that they match the corresponding Facebook contacts.

The transcription is done according to ISO 843:1997. The standard has
rules for various phonetic special cases; for instance, the epsilon
upsilon diphthong is transcribed as "ef" or "ev" depending on the
following character. Nevertheless, there will still be mismatches,
because some people prefer to spell their name with a C instead of a K,
or change their first name into the English equivalent (e.g. George
instead of Giorgos, or Alexander instead of Alexandros).
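
As an illustration of the epsilon upsilon special case mentioned above,
here is a minimal Java sketch. It is not the jflex code from this commit;
the class name, the method name, and the set of consonants that select
"ef" are assumptions made for the example, and the input is assumed to
be lower-case and unaccented.

```java
final class EpsilonUpsilonSketch {
    // Assumption: consonants after which the diphthong is read as "ef".
    // The escapes spell: theta, kappa, xi, pi, sigma, final sigma, tau, phi, chi, psi.
    private static final String VOICELESS =
        "\u03b8\u03ba\u03be\u03c0\u03c3\u03c2\u03c4\u03c6\u03c7\u03c8";

    /**
     * Returns "ef" or "ev" for an epsilon upsilon pair that starts at
     * index i of word, based only on the character that follows the pair.
     */
    static String transcribeEpsilonUpsilon(String word, int i) {
        int next = i + 2; // index of the character after the diphthong
        if (next >= word.length()) {
            return "ef"; // assumption: end of word behaves like a voiceless context
        }
        return VOICELESS.indexOf(word.charAt(next)) >= 0 ? "ef" : "ev";
    }
}
```

With this sketch, a word that continues with tau after the diphthong
yields "ef", while one that continues with a vowel such as alpha yields
"ev". A full transcriber needs many more rules of this kind, which is
why a lexer generator is a reasonable fit.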

The transcription rules are quite complex and have thus been expressed
using the jflex lexical analyzer generator. The source file is encoded
in UTF-8, but apparently jflex doesn't support this encoding for its
input file. Therefore, the Java code is generated through an
intermediate step, in which the Greek characters in the jflex code are
encoded as Java Unicode escapes. To simplify the compilation process,
this commit includes both the jflex code and the corresponding generated
Java source code.
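
The intermediate escaping step can be pictured with the following
sketch, which rewrites every non-ASCII character of a string as a Java
Unicode escape. This is only an illustration under assumed names
(UnicodeEscaper, escapeNonAscii); it is not necessarily the tool or
script used to produce the files in this commit.

```java
final class UnicodeEscaper {
    /**
     * Returns source with each non-ASCII UTF-16 code unit replaced by a
     * backslash, the letter u, and four lower-case hex digits.
     * ASCII characters are copied through unchanged.
     */
    static String escapeNonAscii(String source) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }
}
```

Applying escapeNonAscii to each line of a UTF-8 source file (read with
an explicit UTF-8 charset) would give a pure-ASCII file that still
denotes the same Greek characters once the escapes are interpreted.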

See also:
http://transliteration.eki.ee/pdf/Greek.pdf
http://jflex.de/
http://www.spinellis.gr/sw/greek/grconv/

Match Greek contacts with English FB names

nloko added a commit that referenced this pull request Oct 20, 2012

Merge pull request #6 from dspinellis/master
Handle names with Greek characters

@nloko nloko merged commit b6969fa into nloko:master Oct 20, 2012
