Function araby.is_arabicword returns False for some Arabic words #57

Closed
OWaheed opened this issue Jun 30, 2022 · 4 comments

@OWaheed

OWaheed commented Jun 30, 2022

`is_arabicword` returns False when the following words are passed to it:
"اﻟﻤﺴﺌﻮﻟﻴﺔ","ﻣﺴﺎﻣﻌﻬﻢ","ﻓﻜﻠﻨﺎ","ﻣﺒﺎدراﺗﻨﺎ","ﻓﻬﻢ","اﻟﻤﻨﻈﻮﻣﺔ"

@linuxscout
Owner

Salam,
I tested the given words with pyarabic, as follows.
The words contain encoded glyphs, not standard letters, so they must be converted to ordinary letters.

To convert a glyph-based word into a string of letters, you can use:
NB: the second `unshaping_word` call is used only to reverse the result word.

    >>> word = "ﻣﺴﺎﻣﻌﻬﻢ"
    >>> from pyarabic.unshape import unshaping_word
    >>> unshaping_word(unshaping_word(word))
    'مسامعهم'
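
If pyarabic's unshape module is not at hand, the standard library offers a rough alternative. This sketch is not the pyarabic method shown above: it relies on Unicode NFKC normalization, which maps Arabic presentation-form glyphs back to their base letters. The helper name `to_standard_letters` is made up for illustration, and NFKC also rewrites other compatibility characters, so treat it as a starting point rather than a drop-in replacement.

```python
import unicodedata


def to_standard_letters(word: str) -> str:
    """Hypothetical helper: fold presentation-form glyphs to ordinary letters.

    Uses NFKC normalization from the standard library (an assumption of this
    sketch, not what pyarabic does internally).
    """
    return unicodedata.normalize("NFKC", word)


# One of the glyph-encoded words from this issue, written as escapes so the
# code points are unambiguous (they match the ord() values listed in the test).
shaped = "\ufee3\ufeb4\ufe8e\ufee3\ufecc\ufeec\ufee2"

plain = to_standard_letters(shaped)
print(plain)
print([hex(ord(c)) for c in plain])
```

After this conversion every character falls in the ordinary Arabic block (U+0600 to U+06FF), which is the kind of input `is_arabicword` is meant to receive.
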
  • The test used to detect the problem

```
>>> import pyarabic.araby as ar
>>> lst=["اﻟﻤﺴﺌﻮﻟﻴﺔ","ﻣﺴﺎﻣﻌﻬﻢ","ﻓﻜﻠﻨﺎ","ﻣﺒﺎدراﺗﻨﺎ","ﻓﻬﻢ","اﻟﻤﻨﻈﻮﻣﺔ"]
>>> for i in lst:
...     print(i, ar.is_arabicword(i))
...
اﻟﻤﺴﺌﻮﻟﻴﺔ False
ﻣﺴﺎﻣﻌﻬﻢ False
ﻓﻜﻠﻨﺎ False
ﻣﺒﺎدراﺗﻨﺎ False
ﻓﻬﻢ False
اﻟﻤﻨﻈﻮﻣﺔ False
>>> for i in lst:
...     print("%s"%i, ar.is_arabicword(i))
...
اﻟﻤﺴﺌﻮﻟﻴﺔ False
ﻣﺴﺎﻣﻌﻬﻢ False
ﻓﻜﻠﻨﺎ False
ﻣﺒﺎدراﺗﻨﺎ False
ﻓﻬﻢ False
اﻟﻤﻨﻈﻮﻣﺔ False
>>> for i in lst:
...     for c in i:
...         print(c, ord(c), ar.name(c))
...
ا 1575 ألف
ﻟ 65247
ﻤ 65252
ﺴ 65204
ﺌ 65164
ﻮ 65262
ﻟ 65247
ﻴ 65268
ﺔ 65172
ﻣ 65251
ﺴ 65204
ﺎ 65166
ﻣ 65251
ﻌ 65228
ﻬ 65260
ﻢ 65250
ﻓ 65235
ﻜ 65244
ﻠ 65248
ﻨ 65256
ﺎ 65166
ﻣ 65251
ﺒ 65170
ﺎ 65166
د 1583 دال
ر 1585 راء
ا 1575 ألف
ﺗ 65175
ﻨ 65256
ﺎ 65166
ﻓ 65235
ﻬ 65260
ﻢ 65250
ا 1575 ألف
ﻟ 65247
ﻤ 65252
ﻨ 65256
ﻈ 65224
ﻮ 65262
ﻣ 65251
ﺔ 65172
```
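
The code points above in the 65xxx range are what give the problem away: they sit in the Unicode Arabic Presentation Forms blocks, not in the ordinary Arabic block where 1575 (alef) lives. A minimal sketch of the same diagnosis using only the standard library might look like this. The helper names and the range table are assumptions made for illustration and are not part of pyarabic.

```python
import unicodedata

# Code point ranges of Arabic Presentation Forms-A and the assigned letters of
# Presentation Forms-B (assumed to be sufficient for this check).
PRESENTATION_RANGES = ((0xFB50, 0xFDFF), (0xFE70, 0xFEFC))


def is_presentation_form(ch: str) -> bool:
    """Hypothetical helper: True if ch is a positional glyph, not a plain letter."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PRESENTATION_RANGES)


def has_presentation_forms(word: str) -> bool:
    """True if any character of word is a presentation-form glyph."""
    return any(is_presentation_form(c) for c in word)


def report(word: str):
    """Return (char, code point, Unicode name, is_glyph) for each character."""
    return [
        (c, ord(c), unicodedata.name(c, ""), is_presentation_form(c))
        for c in word
    ]


# First three characters of the first word in the list: a standard alef
# followed by two glyph-encoded letters.
for row in report("\u0627\ufedf\ufee4"):
    print(row)
```

A check like `has_presentation_forms` could be run before calling `is_arabicword` to decide whether a word needs unshaping first.
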

@OWaheed
Author

OWaheed commented Jul 3, 2022

Now I know the reason. Thank you for your effort, you made my day. One last thing: on Twitter I saw your post about the list of Arabic words (the 192-million-word list). I tried to contact you about it, but your messages are closed on Twitter. Could you please provide a link to that list? Thanks in advance.

@linuxscout
Owner

linuxscout commented Jul 3, 2022

Salam,
Please send me your email address so I can send you the link.
I will publish the list as a new open dataset;
for now it is available only on Google Drive.
taha.zerrouki at gmail.com

@OWaheed
Author

OWaheed commented Jul 3, 2022

Thanks, I will send you an email. Thanks for your efforts.
