Large blocks of text do not get indexed properly #18
Comments
Do you have a link to a text I can replicate the issue with? |
I was able to reproduce this with the lyrics from a song (after modifying
Edit: Here's the index produced for the above string: https://termbin.com/oad9 |
This problem has to do with the algorithm used in indexer.c (https://github.com/f-prime/fist/blob/master/fist/indexer.c). I haven't had a chance to look deeply yet, but I think there might be a problem with the lookahead logic. Notice that the missing phrase is at the end of the text. This leads me to believe there is something wrong with line 20 (https://github.com/f-prime/fist/blob/master/fist/indexer.c#L20). If someone is currently investigating the problem, that is where I suggest looking. |
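To illustrate the kind of boundary bug suspected here, below is a minimal sketch of phrase indexing with a clamped lookahead. It is not the code from indexer.c. The names `split_words`, `index_phrases`, `find_phrase`, and the limit `MAX_PHRASE_WORDS` are hypothetical and chosen only for this example. The idea is that the lookahead window shrinks near the end of the input instead of skipping those start positions, so trailing phrases still get emitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PHRASE_WORDS 3   /* hypothetical lookahead limit, not taken from fist */
#define MAX_WORDS 256
#define MAX_PHRASE_LEN 256

/* Split text in place on whitespace; returns the number of words stored. */
size_t split_words(char *text, char *words[], size_t max_words) {
    size_t n = 0;
    for (char *tok = strtok(text, " \t\n");
         tok != NULL && n < max_words;
         tok = strtok(NULL, " \t\n")) {
        words[n++] = tok;
    }
    return n;
}

/* Callback invoked once per emitted phrase. */
typedef void (*phrase_cb)(const char *phrase, size_t start_word, void *ctx);

/* Emit every phrase of 1..MAX_PHRASE_WORDS consecutive words.
 * Returns the number of phrases emitted. */
size_t index_phrases(char *words[], size_t n, phrase_cb emit, void *ctx) {
    assert(words != NULL);
    size_t count = 0;
    char phrase[MAX_PHRASE_LEN];

    for (size_t start = 0; start < n; start++) {
        /* Clamp the lookahead to the words that remain, so start positions
         * near the end yield shorter phrases rather than being skipped. */
        size_t remaining = n - start;
        size_t ahead = remaining < MAX_PHRASE_WORDS ? remaining : MAX_PHRASE_WORDS;
        size_t len = 0;
        phrase[0] = '\0';

        for (size_t k = 0; k < ahead; k++) {
            size_t wlen = strlen(words[start + k]);
            /* Room needed: separator + word + terminating NUL. */
            if (len + wlen + 2 > sizeof phrase) break;
            if (k > 0) phrase[len++] = ' ';
            memcpy(phrase + len, words[start + k], wlen + 1);
            len += wlen;
            if (emit != NULL) emit(phrase, start, ctx);
            count++;
        }
    }
    return count;
}

/* Example callback: records whether a target phrase was emitted. */
struct find_ctx { const char *target; int found; };

void find_phrase(const char *phrase, size_t start_word, void *ctx) {
    (void)start_word;
    struct find_ctx *f = ctx;
    if (strcmp(phrase, f->target) == 0) f->found = 1;
}
```

A loop bound such as `start + MAX_PHRASE_WORDS <= n` would drop the last few start positions entirely, which would match the symptom of phrases missing at the end of the text. Whether that is what actually happens at line 20 of indexer.c still needs to be confirmed against the source.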
Also notice that in the example |
@StefanSlehta sorry, that part was just a bad copy and paste. |
Do you think it is worth adding a longer string to the test cases? |
Might be worth adding a case that wasn't caught by the original test |
When indexing large blocks of text (e.g. a movie script), not all phrases get indexed properly. Some phrases and words seem to get indexed at random while others do not.