Add support for latin case compares. #46

Merged 1 commit on Feb 5, 2018

@sheredom (Owner) commented Feb 4, 2018

This fixes issue #45.

@sheredom force-pushed the add_latin_case_support branch 3 times, most recently from a7e684a to a0b2a23, on February 5, 2018 11:01
@giampaolo

I ran some tests through a Python project I'm currently working on. Here is a list of characters that utf8.h can handle with this PR in place. The ones that are commented out are supported by Python (which lowercases them) but not by utf8.h.
I took the list from here:
http://heraultetplus.com/files/CharacterCodes.htm

chars = [
    "À",  # Capital A, accent grave
    "Á",  # Capital A, accent acute
    "Â",  # Capital A, accent circumflex
    "Ã",  # Capital A, accent tilde
    "Ä",  # Capital A, accent umlaut
    "Å",  # Capital A, accent ring
    "Æ",  # Capital AE, ligature
    "Ç",  # Capital C, cedilla
    # "Γ",  # Capital gamma, (Greek)
    # "Δ",  # Capital Delta, (Greek)
    "È",  # Capital E, accent grave
    "É",  # Capital E, accent acute
    "Ê",  # Capital E, accent circumflex
    "Ë",  # Capital E, accent umlaut
    # "Ð",  # Capital Eth, (Icelandic)
    # "Θ",  # Capital theta, (Greek)
    "Ì",  # Capital I, accent grave
    "Í",  # Capital I, accent acute
    "Î",  # Capital I, accent circumflex
    "Ï",  # Capital I, accent umlaut
    # "Λ",  # Capital Lambda, (Greek)
    "Ñ",  # Capital N, accent tilde
    # "Ξ",  # Capital Xi, (Greek)
    "Ò",  # Capital O, accent grave
    "Ó",  # Capital O, accent acute
    "Ô",  # Capital O, accent circumflex
    "Õ",  # Capital O, accent tilde
    "Ö",  # Capital O, accent umlaut
    "Ø",  # Capital O, accent slash
    "Œ",  # Capital OE, ligature
    # "Π",  # Capital Pi, (Greek)
    "Š",  # Capital S, with caron
    # "Σ",  # Capital Sigma, (Greek)
    "Þ",  # Capital THORN, (Icelandic)
    "Ù",  # Capital U, accent grave
    "Ú",  # Capital U, accent acute
    "Û",  # Capital U, accent circumflex
    "Ü",  # Capital U, accent umlaut
    # "Φ",  # Capital Phi, (Greek)
    "Ý",  # Capital Y, accent acute
    "Ÿ",  # Capital Y, accent umlaut
    # "Ψ",  # Capital Psi, (Greek)
    # "Ω",  # Capital Omega, (Greek)
    # "℧",  # Inverted Capital Omega
    "Ž",  # Capital Z, with caron
]
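
The Python side of a comparison like this needs nothing beyond `str.lower()`. The sketch below is not the test script used above; `python_lowercase_report` is a hypothetical helper that only reports what Python itself does with each character, so the output can be compared by hand against what utf8.h returns.

```python
import unicodedata


def python_lowercase_report(chars):
    """Return (char, lowercased, unicode_name, changed) for each character.

    Hypothetical helper for illustration only: it reports Python's own
    case mapping and does not call into utf8.h.
    """
    report = []
    for ch in chars:
        lowered = ch.lower()
        name = unicodedata.name(ch, "UNKNOWN")
        report.append((ch, lowered, name, lowered != ch))
    return report


if __name__ == "__main__":
    # A few entries from the list above, both supported and commented out.
    sample = ["À", "Š", "Γ", "Ω"]
    for ch, lowered, name, changed in python_lowercase_report(sample):
        print(f"U+{ord(ch):04X} {name}: {ch!r} -> {lowered!r} (changed={changed})")
```

Printing the Unicode name next to each code point also makes it easy to see which block (Latin or Greek) an unsupported character belongs to.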

@sheredom (Owner, Author) commented Feb 5, 2018

OK, so it looks like for the most part I'm just missing the Greek symbols. I'll do a follow-up PR to add them in bulk (I'd rather not just add a few here and there!).

@giampaolo

Sounds good!

@sheredom merged commit 841cb2d into master on Feb 5, 2018
@sheredom deleted the add_latin_case_support branch on February 5, 2018 20:19