java.lang.ArrayIndexOutOfBoundsException #70

DeepthiKarnam · 2016-02-18T06:26:17Z

Feb 18, 2016 11:28:35 AM edu.ucla.sspace.common.GenericTermDocumentVectorSpace processSpace
INFO: performing log-entropy transform
Feb 18, 2016 11:28:35 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the total row counts
Feb 18, 2016 11:28:35 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the entropy of each row
Feb 18, 2016 11:28:35 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Scaling the entropy of the rows
Feb 18, 2016 11:28:35 AM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 300 dimensions
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
at edu.ucla.sspace.matrix.DiagonalMatrix.checkIndices(DiagonalMatrix.java:78)
at edu.ucla.sspace.matrix.DiagonalMatrix.get(DiagonalMatrix.java:94)
at edu.ucla.sspace.matrix.factorization.SingularValueDecompositionLibJ.factorize(SingularValueDecompositionLibJ.java:89)

The number of words in my corpus turns out to be 6000+. Is the code unable to reduce the vector size from 6000+ to 300? What is the solution?

davidjurgens · 2016-02-18T06:46:22Z

Hi Deepthi,

Which version of the code are you using? It looks like the stack trace
you have is using SvdlibJ, which we haven't supported for some time (their
implementation is known to have errors in its SVD results). The latest
code should definitely support reducing from 6000 dimensions to 300. How
many documents are in your corpus?

Thanks,
David

DeepthiKarnam · 2016-02-18T06:54:48Z

Hi David,
Thanks for your prompt reply. I have a few more questions following on from the above.

I am using the jar "sspace-wordsi-2.0-jar-with-dependencies.jar". Is this not supported?

Currently, I am running on a sample of 200 documents. However, the entire corpus is around 9000 documents. Will the code scale to that size?

Each document is a PDF with roughly 500 words (before preprocessing). I am doing simple preprocessing to remove stopwords and special characters from the text. Do you think any additional preprocessing, such as lemmatization, would help?

DeepthiKarnam · 2016-02-18T07:07:33Z

Tried using sspace-2.0.1.jar. The problem persists :'(

Feb 18, 2016 12:35:44 PM edu.ucla.sspace.common.GenericTermDocumentVectorSpace processSpace
INFO: performing log-entropy transform
Feb 18, 2016 12:35:44 PM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the total row counts
Feb 18, 2016 12:35:44 PM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the entropy of each row
Feb 18, 2016 12:35:44 PM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Scaling the entropy of the rows
Feb 18, 2016 12:35:44 PM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 300 dimensions
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
at edu.ucla.sspace.matrix.DiagonalMatrix.checkIndices(DiagonalMatrix.java:78)
at edu.ucla.sspace.matrix.DiagonalMatrix.get(DiagonalMatrix.java:85)
at edu.ucla.sspace.matrix.factorization.SingularValueDecompositionLibJ.factorize(SingularValueDecompositionLibJ.java:89)
at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:360)
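
One possible explanation, not confirmed anywhere in this thread: an m x n matrix has at most min(m, n) singular values, so a term-document matrix built from 6000+ terms and a 200-document sample can yield at most 200 of them. Requesting 300 dimensions could then read past the end of the diagonal matrix of singular values, which would be consistent with the DiagonalMatrix.checkIndices frame in the trace. The sketch below only illustrates that bound in plain Java. The class and both helper methods are hypothetical names for illustration and are not part of the S-Space API.

```java
public class SvdDimensionCheck {

    // Hypothetical helper (not part of S-Space): an m x n matrix has at most
    // min(m, n) singular values.
    static int maxSingularValues(int rows, int cols) {
        return Math.min(rows, cols);
    }

    // Hypothetical helper: cap the requested number of dimensions at that bound.
    static int clampDimensions(int requested, int rows, int cols) {
        if (requested <= 0) {
            throw new IllegalArgumentException("requested dimensions must be positive");
        }
        return Math.min(requested, maxSingularValues(rows, cols));
    }

    public static void main(String[] args) {
        int terms = 6000;     // "6000+" words reported in the thread
        int documents = 200;  // sample size reported in the thread
        int requested = 300;  // "reducing to 300 dimensions" from the log

        System.out.println("max singular values: "
                + maxSingularValues(terms, documents));
        System.out.println("usable dimensions:   "
                + clampDimensions(requested, terms, documents));
    }
}
```

With the numbers reported above, the bound is 200, so under this assumption either requesting 200 or fewer dimensions or running on more than 300 documents would avoid asking for more singular values than exist.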
