toAscii: Some Russian texts in Cyrillic (Azbuka) are transliterated incorrectly #203

Closed
janbarasek opened this issue Oct 24, 2019 · 4 comments

@janbarasek
Contributor

Hi,

I speak Russian, and I noticed that the transliteration of Cyrillic (Azbuka) to ASCII is sometimes inaccurate.

For example, the string:

dump(Strings::toAscii('для'));

should produce dlja, not the current dla, because the character я is transliterated as ja (Czech) or ya (English).

Sample:

[Screenshot 2019-10-24 at 16:26:54]

If my suggestion makes sense, I can implement a better transliteration with support for whole syllables and special cases.

Thanks.

@dg
Member

dg commented Oct 29, 2019

Do you have Transliterator enabled?

if ($transliterator === null && class_exists('Transliterator', false)) {
And what implementation of iconv are you using?
if (ICONV_IMPL === 'glibc') {

@janbarasek
Contributor Author

@dg Yes, Transliterator was instantiated and ICONV_IMPL is libiconv.

I checked it by adding a dump to the method:

/**
 * Converts UTF-8 string to ASCII.
 */
public static function toAscii(string $s): string
{
	static $transliterator = null;
	if ($transliterator === null && class_exists('Transliterator', false)) {
		$transliterator = \Transliterator::create('Any-Latin; Latin-ASCII');
	}
	dump(['transliterator' => $transliterator, 'iconv' => ICONV_IMPL]);

Dumped:

[Screenshot 2019-10-29 at 17:43:49]

Thanks.

@dg
Member

dg commented Oct 29, 2019

Transliterator converts я to â (according to ISO 9 https://cs.wikipedia.org/wiki/ISO_9) and then â to a.

A probable solution is to add a strtr call for the characters Я я Ю ю and maybe some others.
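
A minimal sketch of that idea, assuming the replacement runs before the generic Transliterator / iconv step. The function name replaceCyrillicVowels is hypothetical and not part of the library, and the Czech-style ja / ju output is only one possible choice:

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of the library): replaces selected Cyrillic
// letters up front so that a later generic step cannot reduce я to a plain "a".
function replaceCyrillicVowels(string $s): string
{
	// strtr() with an array replaces whole byte sequences, so multi-byte
	// UTF-8 keys are matched correctly.
	return strtr($s, [
		'Я' => 'Ja', 'я' => 'ja',
		'Ю' => 'Ju', 'ю' => 'ju',
	]);
}

// The remaining Cyrillic letters are left untouched for the generic step.
echo replaceCyrillicVowels('для'), "\n"; // prints "длja"
```

Only the listed characters change here; everything else would still go through the existing transliteration path.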

@janbarasek
Contributor Author

I think we can use (for Czech):

[
	'ё' => 'jo',
	'ъ' => '',
	'ы' => 'y',
	'ь' => '',
	'э' => 'eh',
	'ю' => 'ju',
	'я' => 'ja',
]

But for correct behavior it is very important to use the target language (which could be passed as a second parameter, string $language = 'en').

And the table for English:

[
	'ё' => 'yo',
	'ъ' => '',
	'ы' => 'y',
	'ь' => '',
	'э' => 'e',
	'ю' => 'yu',
	'я' => 'ya',
]
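
As a rough sketch of how the language-dependent lookup could work, assuming a hypothetical helper cyrillicToLatin with a $language parameter (neither exists in the library) and using only a subset of the entries from the two tables above:

```php
<?php
declare(strict_types=1);

// Hypothetical helper: picks a per-language replacement table and applies it
// with strtr(). Unknown language codes fall back to the English table.
function cyrillicToLatin(string $s, string $language = 'en'): string
{
	static $tables = [
		'cs' => ['ъ' => '', 'ы' => 'y', 'ь' => '', 'э' => 'eh', 'ю' => 'ju', 'я' => 'ja'],
		'en' => ['ъ' => '', 'ы' => 'y', 'ь' => '', 'э' => 'e', 'ю' => 'yu', 'я' => 'ya'],
	];

	return strtr($s, $tables[$language] ?? $tables['en']);
}

echo cyrillicToLatin('для', 'cs'), "\n"; // prints "длja"
echo cyrillicToLatin('для', 'en'), "\n"; // prints "длya"
```

Letters that are not in the selected table stay Cyrillic in this sketch, so it would still need to be followed by the existing generic transliteration.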
