Operation failed with UTF8 character #17

Closed
Luvata opened this issue Nov 12, 2019 · 3 comments

Luvata commented Nov 12, 2019

I'm learning pynini to map number characters to syllables, but I always get "Operation failed" when fst2 in a transducer contains the "ộ" character, even though I passed token_type='utf8' to both transducer and stringify.

Here is my code:

import pynini

ones_map = pynini.union(
    pynini.transducer("1", "một", token_type='utf8'),
    pynini.transducer("2", "hai", token_type='utf8'),
    pynini.transducer("3", "ba", token_type='utf8'),
)

chars = [chr(i) for i in range(1, 91)] + [r"\[", r"\\", r"\]"] + [chr(i) for i in range(94, 256)]
sigma_star = pynini.union(*chars).closure()
numbers = pynini.union("1", "2", "3", "4", "5", "6", "7", "8", "9", "0")

num_norm = (pynini.cdrewrite(ones_map, "", "", sigma_star))

def normalize(string):
    return pynini.compose(string.strip(), num_norm).stringify(token_type='utf8')

print(normalize("1"))  # Operation failed
print(normalize("2"))  # Success, output "hai"
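
A quick way to check which output strings might fall outside the alphabet built in chars is sketched below in plain Python, with no pynini calls. It rests on an assumption: with token_type='utf8' each arc label is a Unicode code point, while sigma_star, built without a token_type, only covers labels 1 to 255. The names covered and uncovered_chars are hypothetical helpers introduced for illustration.

```python
# Output strings from ones_map in the report above.
# "\u1ed9" is the precomposed character "ộ".
outputs = ["m\u1ed9t", "hai", "ba"]

# Assumption: labels covered by `chars` are 1..255
# (1..90 and 94..255 via chr, 91..93 via the escaped "[", "\" and "]").
covered = set(range(1, 256))


def uncovered_chars(text):
    """Return (character, code point) pairs that lie outside `covered`."""
    return [(ch, ord(ch)) for ch in text if ord(ch) not in covered]


for word in outputs:
    print(word, uncovered_chars(word))
# một [('ộ', 7897)]
# hai []
# ba []
```

Under that assumption, only "một" carries a label (7897, i.e. U+1ED9) that sigma_star does not accept, which would match "1" failing while "2" succeeds.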

kylebgorman commented Nov 12, 2019 via email

Luvata commented Nov 12, 2019

Thank you for pointing that out. My quick fix is:

chars = [chr(i) for i in range(1, 91)] + [r"\[", r"\\", r"\]"] + [chr(i) for i in range(94, 256)]
chars += [bytes(i, "utf8") for i in "aáàạãảăắằặẵẳâấầậẫẩbcdđeéèẹẽẻêếềệễểghiíìịĩỉklmnoóòọõỏôốồộỗổơớờợỡởpqrstuúùụũủưứừựữửvxyýỳỵỹỷfjzw"]
chars = set(chars)
sigma_star = pynini.union(*chars).closure()

I also removed every token_type='utf8' argument, and it now works seamlessly 💃

Once again, thank you for your awesome library.
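
For anyone reading later, the bytes(i, "utf8") call in the fix can be inspected on its own. The sketch below is plain Python and only shows what the added entries look like at the byte level. It assumes that pynini's default byte mode uses one arc label per byte, so every label stays within 1 to 255. utf8_labels is a hypothetical helper name, not a pynini function.

```python
def utf8_labels(ch):
    """Return the UTF-8 byte values of a single character as a list of ints."""
    return list(bytes(ch, "utf8"))


print(utf8_labels("a"))       # [97]
print(utf8_labels("\u1ed9"))  # "ộ" -> [225, 187, 153]

# Every byte of every Vietnamese letter fits in the range 1..255,
# even though the code point of "ộ" (7897) does not.
sample = "m\u1ed9t"
assert all(1 <= b <= 255 for ch in sample for b in utf8_labels(ch))
```

This would explain why dropping token_type='utf8' and adding the encoded letters to chars is enough: the multi-byte sequence for each letter becomes an accepted path in sigma_star.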

Luvata closed this as completed Nov 12, 2019

kylebgorman commented Nov 12, 2019 via email
