Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python Exception Types #216

Closed
polm opened this issue Jul 4, 2022 · 5 comments
Closed

Python Exception Types #216

polm opened this issue Jul 4, 2022 · 5 comments

Comments

@polm
Copy link

polm commented Jul 4, 2022

I noticed that if input is too long in Python an Exception is thrown, but it's a plain Exception, not a ValueError or something. I see in the Rust code there are a variety of specific error types.

I'm not familiar with Rust, but surely it's possible to have the Python code throw something more specific like a InputTooLongException?

@eiennohito
Copy link
Collaborator

We will try to use better exception types later

@gulldan
Copy link

gulldan commented Apr 24, 2024

return Err(SudachiError::InputTooLong(sz, REALLY_MAX_LENGTH));

is it possible to u16 -> u32 or more?
cant use for large text

@polm
Copy link
Author

polm commented Apr 26, 2024

@gulldan Your question is not actually related to the main issue here, which is not about length but types. You should open a new issue.

Separately, from experience, tokenizers like this aren't designed for long inputs like that and you should split yours up into multiple calls.

@eiennohito
Copy link
Collaborator

eiennohito commented Apr 28, 2024

Adding to @polm, making the max input length to be u32::MAX will make it possible for Sudachi to crash with OOM because memory usage for long sentences will be very significant. In future it would be better to add an API for analyzing long text, as Java version has.

Also, getting to the original issue, I think that I changed all usages of Python Exception type to SudachiError during last 3-4 versions. The next version will fix last couple of usages.

SudachiError will be used instead.

@polm
Copy link
Author

polm commented May 4, 2024

Using a single generic error feels a little too general, but it's much better than a full Exception - thanks! I'll go ahead and close this.

@polm polm closed this as completed May 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants