This repository has been archived by the owner on Mar 21, 2023. It is now read-only.

Exception java.lang.ArrayIndexOutOfBoundsException: -1 gets thrown #3

Open
ilyastam opened this issue Jan 13, 2014 · 3 comments

Comments

@ilyastam

Every once in a while I see the following exception thrown:

java.lang.ArrayIndexOutOfBoundsException: -1
at cc.mallet.topics.WorkerRunnable.sampleTopicsForOneDoc(WorkerRunnable.java:489)
at cc.mallet.topics.WorkerRunnable.run(WorkerRunnable.java:275)
at cc.mallet.topics.ParallelTopicModel.estimate(ParallelTopicModel.java:874)

When I looked at the location where the exception is thrown, I found the following code:

            i = -1;
            while (sample > 0) {
                i++;
                sample -= topicTermScores[i];
            }

            newTopic = currentTypeTopicCounts[i] & topicMask;

It appears that sample can sometimes be zero or negative when this code is reached. In that case the loop body never executes, i stays at -1, and the JVM legitimately throws java.lang.ArrayIndexOutOfBoundsException when it evaluates newTopic = currentTypeTopicCounts[-1] & topicMask;

This seems like a bug to me. For my purposes I am patching it as follows:

            i = -1;
            while (sample > 0 || i < 0) {
                i++;
                sample -= topicTermScores[i];
            }

            newTopic = currentTypeTopicCounts[i] & topicMask;

I am not sure how this affects the results, but it fixes the immediate problem with the code. It would be great to see a proper fix for this, though.
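The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not MALLET's actual code: the class and method names (SampleIndexSketch, pickIndexUnguarded, pickIndexGuarded) are hypothetical, and the guarded variant is one possible way to keep the index inside the array, which goes slightly beyond the patch above by also stopping at the last element.

```java
// Hypothetical stand-alone sketch; names are illustrative, not MALLET's API.
final class SampleIndexSketch {

    // Mirrors the loop quoted above: if sample <= 0 on entry,
    // the body never runs and -1 is returned.
    static int pickIndexUnguarded(double[] scores, double sample) {
        int i = -1;
        while (sample > 0) {
            i++;
            sample -= scores[i];
        }
        return i;
    }

    // Guarded variant (an assumption, not the project's fix):
    // always consumes the first score, and never walks past the last one.
    static int pickIndexGuarded(double[] scores, double sample) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("scores must not be empty");
        }
        int i = 0;
        sample -= scores[0];
        while (sample > 0 && i < scores.length - 1) {
            i++;
            sample -= scores[i];
        }
        return i;
    }

    public static void main(String[] args) {
        double[] scores = {0.2, 0.3, 0.5};
        System.out.println(pickIndexUnguarded(scores, 0.0)); // -1, the bad index
        System.out.println(pickIndexGuarded(scores, 0.0));   // 0
        System.out.println(pickIndexGuarded(scores, 0.25));  // 1
        System.out.println(pickIndexGuarded(scores, 5.0));   // 2, clamped to the last index
    }
}
```

For positive samples that fall within the total score mass, both variants return the same index; they differ only at the two boundaries. Whether clamping is statistically appropriate for the sampler is a separate question that this sketch does not answer.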

@VibhutiBansal

The following exception is thrown:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at SimpleLDA.main(SimpleLDA.java:563)

@juanmirocks
Owner

@ilyastam @VibhutiBansal Thank you for these reports. I cannot actively maintain the code at the moment, mainly because the original unit tests are mostly flawed and everything fails.

Having said that, if you send a pull request with sensible passing unit tests implemented, I will be happy to merge the changes.

However, the while (sample > 0 || i < 0) { change seems mostly a hack. The likely bug you suggested, the negative index, should be resolved properly instead.

@hkarbasi

hkarbasi commented May 17, 2019

Recently, I ran Mallet on two different datasets (one of about 100M and the other of around 1G). This kind of exception usually occurred with the larger dataset, when I ran it in parallel for a larger number of iterations, such as 100. It threw ArrayIndexOutOfBoundsException in two different files, WorkerRunnable and ParallelTopicModel, at different spots.

What happens is that when an index reaches the end of the array, the code prints "overflow in merging on type" to the logger, and after that point the program does nothing to get out of the situation. I was able to patch these edge cases by checking the index before accessing the array. With the patch the run no longer breaks, although I am not sure how it might change the output. It still prints the same "overflow in merging on type" message as before, but it carries on and doesn't throw an exception.
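The idea of checking the index before accessing the array can be sketched roughly as follows. This is not the uploaded patch and not MALLET's code: the class BoundedSlotSearch, the method findSlot, and the assumed entry layout (a count shifted left with the topic id in the low bits, read back with topicMask as in the snippet earlier in this thread) are illustrative assumptions only.

```java
// Hypothetical sketch of a bounds-checked search; not the actual patch.
final class BoundedSlotSearch {

    // Returns the index of the first slot that is empty (0) or already holds
    // `topic`, or -1 when the array is exhausted. The caller can then log the
    // overflow and skip the update instead of reading past the end.
    static int findSlot(int[] packedCounts, int topic, int topicMask) {
        int index = 0;
        while (index < packedCounts.length) {
            int entry = packedCounts[index];
            if (entry == 0 || (entry & topicMask) == topic) {
                return index;
            }
            index++;
        }
        return -1; // overflow: no matching or empty slot
    }

    public static void main(String[] args) {
        int topicMask = 0b111;          // assumed: 3 low bits hold the topic id
        int a = (5 << 3) | 2;           // count 5, topic 2
        int b = (1 << 3) | 4;           // count 1, topic 4

        System.out.println(findSlot(new int[] {a, b, 0}, 4, topicMask)); // 1
        System.out.println(findSlot(new int[] {a, b, 0}, 6, topicMask)); // 2
        System.out.println(findSlot(new int[] {a, b}, 6, topicMask));    // -1
    }
}
```

Returning a sentinel keeps the run alive, but as noted above, silently skipping an update may change the model output, so the overflow should still be logged.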

I have uploaded the patches to my GitHub, together with instructions to follow. They have resolved the issue for me, as I haven't seen it break again under different circumstances. If they don't resolve your issue, you should probably download the latest version from their GitHub, then debug and build it yourself.

I have also uploaded both datasets (datasets), in case you would like to play with them. Both cover four years of data (1 Jan 2015 to 1 Jan 2019): the smaller one is StackExchange (DataScience) and the larger one is Reddit (9 DataScience subreddits).

Good luck.
