Closed
Description
The specification explicitly says that we must validate the number of characters (graphemes would arguably be the more correct term).
Currently scraperlib uses the `len` function, which counts code points rather than graphemes. Graphemes are what we want to validate, because they are what is visually perceived; code points are not.
It looks like (according to ChatGPT, let's be honest) we could use the `grapheme` library. I am not sure this is the right choice, since the library seems barely maintained and is not released in a proper manner.
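```python
import grapheme

title = "विकी मेड मेडिकल इनसाइक्लोपीडिया हिंदी में"

# len() counts code points, not what the reader perceives as characters.
print(len(title))  # Outputs: 41 => Wrong

# grapheme.length() counts grapheme clusters.
print(grapheme.length(title))  # Outputs: 25 => Correct
```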
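If pulling in a barely maintained dependency is a concern, a rough standard-library approximation might be enough to illustrate the difference. The sketch below is not what scraperlib does and is not a full implementation of Unicode grapheme cluster segmentation: it simply skips code points whose general category is a combining mark, which will miscount emoji ZWJ sequences, flag sequences, and similar cases. The names `approx_grapheme_count` and `is_within_limit`, and the limit of 30, are hypothetical and chosen only for illustration.

```python
import unicodedata

TITLE = "विकी मेड मेडिकल इनसाइक्लोपीडिया हिंदी में"


def approx_grapheme_count(text: str) -> int:
    """Approximate the number of visually perceived characters in text.

    Assumption: every code point whose general category starts with "M"
    (Mn, Mc, Me) attaches to the preceding base character. This is only an
    approximation of grapheme clusters, not the full Unicode algorithm.
    """
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))


def is_within_limit(text: str, max_graphemes: int) -> bool:
    """Hypothetical validator: compare the approximate count to a limit."""
    return approx_grapheme_count(text) <= max_graphemes


print(len(TITLE))                    # 41 code points
print(approx_grapheme_count(TITLE))  # 25 approximate graphemes
print(is_within_limit(TITLE, 30))    # True (30 is an illustrative limit)
```

For this particular string the approximation agrees with the count reported for the `grapheme` library above, but that agreement should not be assumed for arbitrary input.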
Metadata
Labels: bug (Something isn't working)