Exception java.lang.ArrayIndexOutOfBoundsException: -1 gets thrown #3
Comments
The following exception is thrown:
@ilyastam @VibhutiBansal Thank you for these reports. I cannot actively maintain the code at the moment, mainly because the original unit tests are mostly flawed and everything fails. Having said that, if you send a pull request with sensible, passing unit tests, I will be happy to merge the changes. However the
Recently, I ran Mallet on two datasets (one of 100M and the other around 1G). This kind of exception usually occurred with the larger dataset, when I ran it in parallel with a larger number of iterations, such as 100. It threw ArrayIndexOutOfBoundsException in two different files, WorkerRunnable and ParallelTopicModel, at different spots. When an index reaches the end of the array, the program prints “overflow in merging on type” to the logger, and after that point it does nothing to recover from the situation.

I was able to patch these edge cases by checking the index before accessing the array. This lets me run it without crashing, but I am not sure how it might change the output. It also keeps printing the same “overflow in merging on type” message as before, but it carries on and does not throw an exception.

I have uploaded the patches to my GitHub; follow the instructions there. They have resolved the issue for me, as I have not seen it break again under different circumstances. If they do not resolve your issue, you should probably download the latest version from their GitHub, then debug and build it yourself.

I have also uploaded both datasets (datasets), in case you would like to play with them. Both cover four years of data (1 Jan 2015 to 1 Jan 2019): the smaller one is StackExchange (DataScience) and the larger one is Reddit (9 DataScience subreddits). Good luck.
Every once in a while I see the following exception thrown:
java.lang.ArrayIndexOutOfBoundsException: -1
at cc.mallet.topics.WorkerRunnable.sampleTopicsForOneDoc(WorkerRunnable.java:489)
at cc.mallet.topics.WorkerRunnable.run(WorkerRunnable.java:275)
at cc.mallet.topics.ParallelTopicModel.estimate(ParallelTopicModel.java:874)
When I went to the location where the exception is thrown, I saw the following code:
It appears that sample can sometimes be less than zero, which legitimately causes java.lang.ArrayIndexOutOfBoundsException to be thrown when the JVM evaluates newTopic = currentTypeTopicCounts[-1] & topicMask;
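To make the failure mode concrete, here is a minimal, self-contained sketch of how a cumulative-sampling loop can leave its index at -1 when the sample is not positive on entry. It is an illustration only and is not copied from WorkerRunnable: the `pickIndex` helper, the `scores` array and the numbers are assumptions, and only the names `sample`, `currentTypeTopicCounts`, `topicMask` and `newTopic` come from the report above.

```java
public class NegativeIndexDemo {

    // Hypothetical stand-in for a loop that walks per-topic scores until the
    // sampled mass is used up. The index starts at -1 and only advances while
    // sample > 0, so a non-positive sample leaves it at -1.
    static int pickIndex(double sample, double[] scores) {
        int i = -1;
        while (sample > 0) {
            i++;
            sample -= scores[i];
        }
        return i;
    }

    public static void main(String[] args) {
        int[] currentTypeTopicCounts = {5, 9, 12}; // made-up packed counts
        int topicMask = 7;                          // made-up mask
        double[] scores = {0.2, 0.3, 0.5};          // made-up scores

        // A slightly negative sample, e.g. from floating-point round-off.
        int i = pickIndex(-1e-9, scores);
        System.out.println("index = " + i);

        try {
            int newTopic = currentTypeTopicCounts[i] & topicMask;
            System.out.println("newTopic = " + newTopic);
        } catch (ArrayIndexOutOfBoundsException e) {
            // Same exception type as in the stack trace above.
            System.out.println("caught: " + e);
        }
    }
}
```

Whether the negative value really comes from round-off in the actual code is a guess; the sketch only shows that a loop of this shape never advances its index when the sample starts at or below zero.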
This seems like a bug to me. For my purposes I am patching it as follows:
I am not sure about the impact of this on the result, but it seems to fix the immediate problem with the code. Would be great to see a proper fix for this though.
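For readers who want a starting point, the following is one possible shape for such a guard. It is a hedged sketch, not the patch referred to above and not Mallet's code: the `pickTopic` method, its parameters and the choice to clamp the index into range are all assumptions made for illustration.

```java
public class GuardedTopicPick {

    // Hypothetical guarded variant: the index is kept inside [0, length - 1]
    // before the array is read, so a non-positive or oversized sample can no
    // longer cause an ArrayIndexOutOfBoundsException.
    static int pickTopic(double sample, double[] scores,
                         int[] typeTopicCounts, int topicMask) {
        if (scores.length == 0 || scores.length != typeTopicCounts.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int i = -1;
        // Stop at the last valid slot even if sample is still positive.
        while (sample > 0 && i + 1 < scores.length) {
            i++;
            sample -= scores[i];
        }
        // A non-positive sample never enters the loop; fall back to slot 0.
        if (i < 0) {
            i = 0;
        }
        return typeTopicCounts[i] & topicMask;
    }

    public static void main(String[] args) {
        double[] scores = {0.2, 0.3, 0.5};
        int[] counts = {5, 9, 12};
        System.out.println(pickTopic(-0.5, scores, counts, 7));
        System.out.println(pickTopic(0.4, scores, counts, 7));
        System.out.println(pickTopic(10.0, scores, counts, 7));
    }
}
```

Clamping avoids the crash but silently assigns the first or last slot in the edge cases, which slightly biases the sampling; that is the same open question about the impact on results raised above, and a proper fix would look at why the sample falls out of range in the first place.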