Biterm extractor fails with "IndexError: list index out of range" #58

Open
o87h opened this issue Feb 23, 2024 · 0 comments

o87h commented Feb 23, 2024

Hi! In the Google Colab notebook, running the biterm extractor on a file above a certain size fails as follows:

[Image: Screenshot 2024-02-23 154800]

The test_bitext_en_es.tmx test file you supplied works fine. If I truncate my test TMX to fewer than about 300 lines, it also works fine.

I also tested this on Windows with Python 3.10, 3.11, and 3.12, and the result is the same: files under roughly 300 lines work, and files over roughly 300 lines fail.
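In case it helps narrow this down, here is a minimal sketch of how the truncation test could be automated so that it cuts on whole translation units rather than raw lines. It assumes the failure is triggered by one particular `<tu>` element, so that any prefix of the file containing it also fails; that is only a guess. The names `truncate_tmx`, `first_failing_count`, and the `fails` callback are hypothetical and not part of this project; `fails` would need to wrap whatever call runs the extractor and return True when it raises the IndexError.

```python
import copy
import xml.etree.ElementTree as ET


def truncate_tmx(root, n_units):
    """Return a deep copy of a TMX root keeping only the first n_units children of <body>."""
    new_root = copy.deepcopy(root)
    body = new_root.find("body")
    # Remove every translation unit after the first n_units.
    for tu in list(body)[n_units:]:
        body.remove(tu)
    return new_root


def first_failing_count(root, fails):
    """Binary-search the smallest number of leading units for which fails(...) is True.

    Assumption: fails() is False for an empty body, True for the full file,
    and stays True once the offending unit is included.
    """
    lo, hi = 0, len(root.find("body"))
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(truncate_tmx(root, mid)):
            hi = mid  # the offending unit is within the first `mid` units
        else:
            lo = mid  # the first `mid` units are fine
    return hi
```

With a real file, `root` would come from `ET.parse(path).getroot()`, and the unit at index `first_failing_count(root, fails) - 1` inside `<body>` would be the one to inspect.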

Thanks and love your work!
