Languages without delimiters - Japanese and Chinese (Simplified, Traditional), and possibly other East Asian languages, don't put any delimiter between number words, e.g. 九千九百九十九 (9999 in Japanese). These have a structure very similar to English, but the lack of a delimiter makes them tougher to parse.
German and Dutch also have no delimiter as such (up to a certain number).
One approach I have in mind for the delimiter problem is to read the text character by character and insert a space as soon as the characters read so far match one of the known number words. After this pre-processing step, we can follow the same logic as for the other languages. This increases the complexity to O(string_length ^ 2), which I believe shouldn't be a major issue. (We can use this function only for the languages without delimiters.)
Concrete example
five thousand nine hundred and thirteen - English (5913)
fünftausendneunhundertdreizehn - German (5913)
nine hundred and thirteen - English (913)
negenhonderddertien - Dutch (913)
To handle this, we first check f, fü, fün and finally hit fünf = 5 (and similarly get negen = 9), insert a space, and then start again from the next character.
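A minimal sketch of that pre-processing step, just to make the idea concrete. The function name `insert_spaces` and the tiny vocabularies are hypothetical and only cover the examples in this issue; a real vocabulary would have to list every number word of the language.

```python
# Hypothetical, deliberately tiny vocabularies (assumptions for illustration only).
GERMAN_WORDS = {"fünf", "tausend", "neun", "hundert", "dreizehn"}
DUTCH_WORDS = {"negen", "honderd", "dertien"}
JAPANESE_WORDS = {"九", "千", "百", "十"}


def insert_spaces(text, vocab):
    """Split `text` into known number words and join them with spaces.

    Grows a prefix one character at a time (f, fü, fün, fünf) and cuts
    as soon as the prefix is a known word, then restarts from the next
    character. Worst case this checks O(len(text) ** 2) substrings.
    """
    tokens = []
    start = 0
    while start < len(text):
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in vocab:
                tokens.append(text[start:end])
                start = end
                break
        else:
            # No prefix starting at `start` is a known number word.
            raise ValueError(f"no known number word at position {start} in {text!r}")
    return " ".join(tokens)


print(insert_spaces("fünftausendneunhundertdreizehn", GERMAN_WORDS))
# fünf tausend neun hundert dreizehn
print(insert_spaces("negenhonderddertien", DUTCH_WORDS))
# negen honderd dertien
print(insert_spaces("九千九百九十九", JAPANESE_WORDS))
# 九 千 九 百 九 十 九
```

Because this cuts at the first match, a vocabulary that contains both drei and dreizehn would stop at drei. Preferring the longest matching word instead of the first one is one possible way around that.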
Just to give another approach for German and Dutch: depending on the number of unique tokens, we could do the inverse process and try to match the known tokens against the number.
As an example (I haven't thought about how to implement it, it's just an idea):
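One possible reading of this idea, which is an assumption and not necessarily what was meant: instead of growing a prefix character by character, start from the list of known tokens and search for them in the number string, longest token first. The sketch below does that with a regular expression alternation. The name `split_by_tokens` and the token list are hypothetical.

```python
import re

# Hypothetical token list (assumption); it includes overlapping words on purpose.
GERMAN_TOKENS = ["drei", "zehn", "dreizehn", "fünf", "neun", "hundert", "tausend"]


def split_by_tokens(text, tokens):
    """Find known tokens in `text`, trying longer tokens before shorter ones."""
    # Longest first, so "dreizehn" is tried before "drei" at the same position.
    ordered = sorted(tokens, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(tok) for tok in ordered))
    found = pattern.findall(text)
    # findall silently skips characters it cannot match, so verify full coverage.
    if "".join(found) != text:
        raise ValueError(f"could not fully split {text!r} into known tokens")
    return found


print(split_by_tokens("fünftausendneunhundertdreizehn", GERMAN_TOKENS))
# ['fünf', 'tausend', 'neun', 'hundert', 'dreizehn']
```

This only makes sense while the number of unique tokens stays small, which seems to be the condition mentioned above.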
Why can't we just translate all other languages to English and then convert them to numbers?
I guess this would reduce the effort. Translation can be done using Googletrans.