java.lang.ArrayIndexOutOfBoundsException #70

Open
DeepthiKarnam opened this issue Feb 18, 2016 · 3 comments

Comments

@DeepthiKarnam

Feb 18, 2016 11:28:35 AM edu.ucla.sspace.common.GenericTermDocumentVectorSpace processSpace
INFO: performing log-entropy transform
Feb 18, 2016 11:28:35 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the total row counts
Feb 18, 2016 11:28:35 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the entropy of each row
Feb 18, 2016 11:28:35 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Scaling the entropy of the rows
Feb 18, 2016 11:28:35 AM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 300 dimensions
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
at edu.ucla.sspace.matrix.DiagonalMatrix.checkIndices(DiagonalMatrix.java:78)
at edu.ucla.sspace.matrix.DiagonalMatrix.get(DiagonalMatrix.java:94)
at edu.ucla.sspace.matrix.factorization.SingularValueDecompositionLibJ.factorize(SingularValueDecompositionLibJ.java:89)

The number of words in my corpus turns out to be 6000+. Is the code unable to reduce the vectors from 6000+ dimensions to 300? What is the solution?

@davidjurgens
Collaborator

Hi Deepthi,

Which version of the code are you using? It looks like the stack trace
you have is using SvdlibJ, which we haven't supported for some time (their
implementation is known to have errors in its SVD results). The latest
code should definitely support reducing from 6000 dimensions to 300. How
many documents are in your corpus?

Thanks,
David

@DeepthiKarnam
Author

Hi David,
Thanks for your prompt reply. I have a few follow-up questions.

I am using the jar "sspace-wordsi-2.0-jar-with-dependencies.jar". Is this not supported?

Currently, I am running on a sample of 200 documents, but the entire corpus is around 9000 documents. Will this scale?

Each document is a PDF with roughly 500 words (before preprocessing). I do simple preprocessing to remove stopwords and special characters from the text. Do you think additional preprocessing, such as lemmatization, would help?
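
For reference, the "simple preprocessing" described above (lowercasing, stripping special characters, dropping stopwords) might look roughly like the sketch below. This is an illustration only: the class name `SimplePreprocessor`, the method `tokenize`, and the tiny stopword list are all hypothetical and are not part of the S-Space package. It does not perform lemmatization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of S-Space: a minimal text cleaner.
final class SimplePreprocessor {

    // Tiny illustrative stopword list; a real run would load a fuller one.
    private static final Set<String> STOPWORDS = new HashSet<>(
            Arrays.asList("a", "an", "the", "is", "of", "and", "to", "in"));

    /** Lowercases, replaces non-alphanumeric ASCII with spaces, drops stopwords. */
    static List<String> tokenize(String text) {
        // Note: this pattern also discards non-ASCII letters.
        String cleaned = text.toLowerCase(Locale.ROOT)
                             .replaceAll("[^a-z0-9\\s]", " ");
        List<String> tokens = new ArrayList<>();
        for (String token : cleaned.trim().split("\\s+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("The size of the vector, reduced to 300!"));
    }
}
```

Whether lemmatization helps on top of this depends on the corpus; the sketch only shows where such a step would slot in (after tokenizing, before counting terms).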

@DeepthiKarnam
Author

I tried using sspace-2.0.1.jar, but the problem persists :'(

Feb 18, 2016 12:35:44 PM edu.ucla.sspace.common.GenericTermDocumentVectorSpace processSpace
INFO: performing log-entropy transform
Feb 18, 2016 12:35:44 PM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the total row counts
Feb 18, 2016 12:35:44 PM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the entropy of each row
Feb 18, 2016 12:35:44 PM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Scaling the entropy of the rows
Feb 18, 2016 12:35:44 PM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 300 dimensions
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
at edu.ucla.sspace.matrix.DiagonalMatrix.checkIndices(DiagonalMatrix.java:78)
at edu.ucla.sspace.matrix.DiagonalMatrix.get(DiagonalMatrix.java:85)
at edu.ucla.sspace.matrix.factorization.SingularValueDecompositionLibJ.factorize(SingularValueDecompositionLibJ.java:89)
at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:360)
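
One possible reading of this trace, not confirmed anywhere in the thread: the run uses a 200-document sample but requests 300 dimensions. An SVD of a terms-by-documents matrix has at most min(terms, documents) singular values, so a 6000-by-200 matrix can yield at most 200. If the code reads 300 entries from the diagonal matrix of singular values, it could run past the end. The sketch below is plain Java with hypothetical names (`LsaDimensionCheck`, `maxRank`, `safeDimensions`); it calls no S-Space API and only illustrates the bound.

```java
// Hypothetical helper, not part of S-Space: bounds a requested LSA
// dimension count by the maximum possible rank of the input matrix.
final class LsaDimensionCheck {

    /** An m x n matrix has at most min(m, n) singular values. */
    static int maxRank(int numTerms, int numDocuments) {
        return Math.min(numTerms, numDocuments);
    }

    /** Clamps the requested dimension count to that upper bound. */
    static int safeDimensions(int numTerms, int numDocuments, int requested) {
        if (numTerms <= 0 || numDocuments <= 0 || requested <= 0) {
            throw new IllegalArgumentException("all sizes must be positive");
        }
        return Math.min(requested, maxRank(numTerms, numDocuments));
    }

    public static void main(String[] args) {
        // Figures taken from the thread: 6000+ terms, 200 sample documents,
        // 300 requested dimensions.
        int usable = safeDimensions(6000, 200, 300);
        System.out.println("Usable dimensions: " + usable);
    }
}
```

If this is the cause, requesting fewer dimensions than documents on the sample, or running on the full corpus of about 9000 documents, would be a quick way to test it.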
