capitalize_ascii don’t work for accentuated letters #593

Closed
a2line opened this issue Feb 26, 2018 · 8 comments

@a2line
Collaborator

a2line commented Feb 26, 2018

The last change to master, made on 12 February 2018 so that Gw compiles with OCaml > 4.05, replaced the function capitalize with capitalize_ascii. Since that change, e.g. [*event/events]1 (which should produce Évènements in French) is left in lowercase and written évènements. What can be done to have these accented characters capitalized again?

@hgouraud
Collaborator

hgouraud commented Feb 26, 2018

It looks like proper handling of Unicode stuff requires the use of a separate library (Camomile, Batteries, Uchar, ...)
See https://discuss.ocaml.org/t/whats-the-function-of-uchar/939 and ocaml/ocaml#80
I have not yet figured out exactly what needs to be done, but Camomile seems to be the front runner!!
Possibly, we just have to replace our capitale_utf_8 function with its equivalent in Camomile.

@fablhx
Contributor

fablhx commented Feb 26, 2018

Yes, migrating to an external lib (thus creating a dependency) is indeed something that should be done. A long time ago, I was not sure which one to use, but maybe Camomile is the one to go with.

@fablhx
Contributor

fablhx commented Feb 26, 2018

And of course, we could get rid of this (horrible) hack in ged2gwd.

@hgouraud
Collaborator

hgouraud commented Feb 26, 2018

I tried this in util.ml (line 114):

replace:
let c1 = Char.uppercase_ascii (Char.chr (Char.code s.[1] + 0x40)) in
sprintf "%c%c%s" c (Char.chr (Char.code c1 - 0x40))
by
sprintf "%c%c%s" c (Char.chr (Char.code s.[1] - 0x20))

and it seems to work, at least for the é and â I tried!!
Switching to Camomile is a bit beyond my reach!!
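
The one-line change above subtracts 0x20 from the second UTF-8 byte unconditionally. A minimal sketch of a more defensive variant follows, assuming the first character is a two-byte UTF-8 sequence with lead byte 0xC3 (the Latin-1 Supplement letters). The function name capitalize_latin1_utf8 is hypothetical and is not taken from util.ml; the range check and the exclusion of the division sign (U+00F7) are additions made for illustration.

```ocaml
(* Hypothetical helper, not GeneWeb code: capitalise the first character
   of a UTF-8 string when it lies in U+00E0..U+00FE, which UTF-8 encodes
   as 0xC3 followed by a byte in 0xA0..0xBE. U+00F7 (division sign,
   second byte 0xB7) has no uppercase form and is left alone. *)
let capitalize_latin1_utf8 s =
  if String.length s >= 2 && s.[0] = '\xC3' then begin
    let b = Char.code s.[1] in
    if b >= 0xA0 && b <= 0xBE && b <> 0xB7 then begin
      let buf = Bytes.of_string s in
      (* Same arithmetic as the fix in the thread: second byte - 0x20. *)
      Bytes.set buf 1 (Char.chr (b - 0x20));
      Bytes.to_string buf
    end
    else s (* already uppercase, or no uppercase form in this block *)
  end
  else
    (* Plain ASCII (or anything else): fall back to the stdlib. *)
    String.capitalize_ascii s

let () =
  (* "\xC3\xA9" is the UTF-8 encoding of e-acute. *)
  print_endline (capitalize_latin1_utf8 "\xC3\xA9v\xC3\xA8nements")
```

The guard matters because an already capitalized É (0xC3 0x89) would otherwise have 0x20 subtracted again and turn into an invalid sequence. Scripts outside this block, such as Cyrillic, are not handled by this sketch.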

@hgouraud
Collaborator

By the way, what was happening before remains quite mysterious to me!!
If Char.uppercase was subtracting 0x20, then the whole thing was +0x40 - 0x20 - 0x40 = -0x20.
If Char.uppercase_ascii does nothing for characters outside the ASCII range, then we have +0x40 - 0x40 = 0.
The reason why we were going through this +0x40 - 0x40 step escapes me!
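
One possible reading, offered as an assumption rather than a confirmed explanation: for a UTF-8 sequence with lead byte 0xC3, adding 0x40 to the second byte yields the Latin-1 code of the same character, so the old code may have been converting to Latin-1, letting the Latin-1 aware Char.uppercase do the work, and converting back. The sketch below traces the arithmetic for é (0xC3 0xA9). latin1_uppercase is a stand-in written for this example to mimic the old Char.uppercase, and round_trip is a hypothetical name for the step in util.ml.

```ocaml
(* Stand-in for the old Latin-1 aware Char.uppercase (assumption: it
   upper-cased a-z, 0xE0-0xF6 and 0xF8-0xFE by subtracting 0x20 and left
   every other byte unchanged). *)
let latin1_uppercase c =
  let n = Char.code c in
  if (n >= 0x61 && n <= 0x7A)
  || (n >= 0xE0 && n <= 0xF6)
  || (n >= 0xF8 && n <= 0xFE)
  then Char.chr (n - 0x20)
  else c

(* The +0x40 / -0x40 round trip, parameterised by the uppercase function
   so that the old and new behaviours can be compared side by side.
   Only meaningful for continuation bytes 0x80..0xBF. *)
let round_trip upper b =
  let c1 = upper (Char.chr (Char.code b + 0x40)) in
  Char.chr (Char.code c1 - 0x40)

let () =
  let second_byte = '\xA9' in (* second byte of e-acute in UTF-8 *)
  (* 0xA9 + 0x40 = 0xE9 (Latin-1 e-acute); uppercased to 0xC9; back to 0x89. *)
  Printf.printf "Latin-1 uppercase: %02X\n"
    (Char.code (round_trip latin1_uppercase second_byte));
  (* uppercase_ascii leaves 0xE9 alone, so the byte comes back as 0xA9. *)
  Printf.printf "uppercase_ascii:   %02X\n"
    (Char.code (round_trip Char.uppercase_ascii second_byte))
```

Under that reading, the round trip gave the old code the Latin-1 range checks for free, which a bare subtraction of 0x20 does not have.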

@hgouraud
Collaborator

hgouraud commented Feb 26, 2018

Independently of this fix, it appears that the capitalize (and possibly translate) function is broken for Russian!
Two examples:
В браке с в 1995, Ailleurs, Aline Test
instead of
Браке с в 1995, Ailleurs, Aline Test

and

Свидетель:g:--я:a:--я (1985): похороны, Christian Vasseur 1925-1985
(French)
Témoin (1985) : inhumation, Christian Vasseur 1925-1985

@a2line
Collaborator Author

a2line commented Nov 9, 2018

This is fixed but dunno when!

@a2line closed this as completed Nov 9, 2018

@hgouraud
Collaborator

hgouraud commented Nov 9, 2018

Dunno if it is fully fixed, but the small change I made on Feb 26 does the job!
