Exception using Matlab for LSA #26

Open
N2D2 opened this issue Nov 9, 2012 · 16 comments

Comments

@N2D2

N2D2 commented Nov 9, 2012

Hi,
if I pass the -S MATLAB option, Java throws
IllegalArgumentException: dimensions must be positive
at edu.ucla.sspace.matrix.OnDiskMatrix.<init>(OnDiskMatrix.java:98)
at edu.ucla.sspace.matrix.Matrices.create(Matrices.java:216)
at edu.ucla.sspace.matrix.MatrixIO.readDenseTextMatrix(MatrixIO.java:924)

In Matrices.create(Matrices.java:216) I find the lines:

case SPARSE_ON_DISK:
//return new SparseOnDiskMatrix(rows, cols);
// REMDINER: implement me
return new OnDiskMatrix(rows, cols);

Is the MATLAB matrix format not implemented yet or is it a bug?

The log output just before the exception suggests a general problem with reading the Matlab output:

Nov 09, 2012 11:21:52 AM edu.ucla.sspace.matrix.MatrixIO readDenseTextMatrix
FINE: reading in text matrix with 15262 rows and 0 cols
Nov 09, 2012 11:21:52 AM edu.ucla.sspace.matrix.MatrixIO readDenseTextMatrix
FINE: reading in text matrix with 100 rows and 0 cols

And Matlab gives the warning:

Warning: Imaginary part of complex variable 'U' not saved to ASCII file.
Warning: Imaginary part of complex variable 'V' not saved to ASCII file.
But the three Matlab output matrices seem normal.

I use Mac 10.7, Matlab 2012a and the SSpace 2.0-Code (but this happened with earlier code, too)

@N2D2
Author

N2D2 commented Nov 12, 2012

I found the solution: the java.util.Scanner class is used to import the Matlab files, and its number parsing depends on the locale of the Java environment.
The US English JRE interprets the numbers generated by Matlab (e.g. 7.4566000e+07) correctly. In some non-US locales, however, these strings are not recognized as numbers, because the Scanner expects a comma as the decimal separator: 7,4566000e+07.
In future versions you might call useLocale(new Locale("en", "US")) on the Scanner object to overcome this issue.
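
To make the locale dependence concrete, here is a minimal sketch that asks a Scanner whether it recognizes a Matlab-style number under two locales. The class name ScannerLocaleDemo and the helper parsesAsDouble are made up for illustration and are not part of S-Space; only java.util.Scanner and java.util.Locale are real APIs.

```java
import java.util.Locale;
import java.util.Scanner;

// Hypothetical demo class, not part of S-Space.
final class ScannerLocaleDemo {

    /** Returns true if a Scanner using the given locale sees the token as a double. */
    static boolean parsesAsDouble(String token, Locale locale) {
        try (Scanner scanner = new Scanner(token).useLocale(locale)) {
            return scanner.hasNextDouble();
        }
    }

    public static void main(String[] args) {
        // The style of number Matlab writes with "save ... -ASCII".
        String token = "7.4566000e+07";
        System.out.println("en_US: " + parsesAsDouble(token, Locale.US));
        // A locale whose decimal separator is a comma.
        System.out.println("de_DE: " + parsesAsDouble(token, Locale.GERMANY));
    }
}
```

Pinning the locale on each Scanner keeps the fix local to the parsing code, whereas changing the JVM default locale also affects unrelated formatting elsewhere in the program.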

@fozziethebeat
Owner

Awesome find! If you want to send us a pull request, I'll be more than happy to merge this little fix :)

@ganonp

ganonp commented Mar 23, 2013

I seem to be having a similar problem:

Mar 23, 2013 2:32:56 PM edu.ucla.sspace.common.GenericTermDocumentVectorSpace processSpace
INFO: performing log-entropy transform
Mar 23, 2013 2:32:56 PM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the total row counts
Mar 23, 2013 2:32:56 PM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the entropy of each row
Mar 23, 2013 2:32:56 PM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Scaling the entropy of the rows
Mar 23, 2013 2:32:56 PM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 300 dimensions
Exception in thread "main" java.lang.IllegalArgumentException: dimensions must be positive
at edu.ucla.sspace.matrix.OnDiskMatrix.<init>(OnDiskMatrix.java:98)
at edu.ucla.sspace.matrix.Matrices.create(Matrices.java:216)
at edu.ucla.sspace.matrix.MatrixIO.readDenseTextMatrix(MatrixIO.java:924)
at edu.ucla.sspace.matrix.MatrixIO.readMatrix(MatrixIO.java:794)
at edu.ucla.sspace.matrix.MatrixIO.readMatrix(MatrixIO.java:761)
at edu.ucla.sspace.matrix.factorization.SingularValueDecompositionMatlab.factorize(SingularValueDecompositionMatlab.java:137)
at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:439)
at edu.ucla.sspace.mains.GenericMain.processDocumentsAndSpace(GenericMain.java:514)
at edu.ucla.sspace.mains.GenericMain.run(GenericMain.java:443)
at edu.ucla.sspace.mains.LSAMain.main(LSAMain.java:167)

I use Win7, MatLab2010a, and the current version of sspace (2.03).

@ganonp

ganonp commented Mar 23, 2013

All three of the matlab output matrices remain empty. Does anyone know why this might be the case?

@davidjurgens
Collaborator

I think we've tested with 2010a, but I don't think we saw this behavior.
Are you using Windows as well? If possible, is there any way to give us a
test corpus that reproduces the issue? Keith and I are finally getting
around to pushing out some new changes, so we could get this fixed soon
once we're able to reproduce the behavior.

Thanks,
David


@N2D2
Author

N2D2 commented Mar 24, 2013

It might well be that the corpus is too small. If you do not have more documents than requested dimensions, Matlab cannot compute the SVD.

@ganonp

ganonp commented Mar 24, 2013

Thank you for your help!

So the problem originally occurred with a rather large corpus (several thousand lines in a ~150 megabyte .txt file). Since then I've been using a smaller version and modifying the code in an attempt to remedy the problem.

I have actually taken the matrix file (via matrix.getAbsolutePath()) and run it through Matlab manually, using the same code as the Matlab script in SingularValueDecompositionMatlab.java, and it produces all three of the uOutput, sOutput, and vOutput files without problems (at least they are not blank). I've never run a Java program that calls Matlab, so is it possible there is some issue with my environment variables? Every time I run the program, matlab opens up to a command window, no script is visibly implemented, and the uOutput, sOutput, and vOutput files remain blank, so I believe the issue is in actually getting Matlab to run the script...

@davidjurgens
Collaborator

Wow, that really helps track down the issue! My guess is that there is an
issue with how our code is trying to invoke Matlab from the command line on
Win7. We never had a Win7 system to test the code on, so there's a good
chance we might not be handling the command line call correctly. I'll see
if I can track down a machine here at work and do some testing.

We've tried hard to make sure all the code is platform independent, so it's
important to fix this kind of bug. :)

Thanks,
David


@ganonp

ganonp commented Mar 24, 2013

Hey, no problem! I'm thankful someone is putting this type of software out open source, it's going to be really helpful for some projects I'm working on. Let me know if there's anything I can do to help!

Ganon

@N2D2
Author

N2D2 commented Mar 24, 2013

Hi,
I have looked at my code, and it may be the same problem I described. The exception is thrown while reading the term-document matrix; at this point Matlab has not yet been used. If you use a non-US version of Java, you have to insert a line like this: java.util.Locale.setDefault(java.util.Locale.US);
before the term-doc matrix is read in (before MatrixIO.readMatrix(File matrix, Format format, Type matrixType, boolean transposeOnRead)),
or use my pull request #31.
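
When pinning the locale in Java, it may help to remember that the one-argument Locale constructor takes a language code, not a country code. A minimal sketch of the difference follows; the class name LocaleConstructorDemo and the describe helper are made up for illustration.

```java
import java.util.Locale;

// Hypothetical demo class, not part of S-Space.
final class LocaleConstructorDemo {

    /** Renders a locale as "language/country" so that empty parts are visible. */
    static String describe(Locale locale) {
        return locale.getLanguage() + "/" + locale.getCountry();
    }

    public static void main(String[] args) {
        // One argument: interpreted as a language code, the country stays empty.
        System.out.println(describe(new Locale("US")));
        // Two arguments: language and country.
        System.out.println(describe(new Locale("en", "US")));
        // The predefined constant carries both parts.
        System.out.println(describe(Locale.US));
    }
}
```

Using the constant Locale.US, or the two-argument form, avoids ending up with a language-only locale by accident.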

@davidjurgens
Collaborator

Whoops, I thought we had integrated that pull request! I'll make sure that
gets integrated in the next release for certain. There are probably a few
other Locale-based bugs, so I'll scan for others as well.


@ganonp

ganonp commented Mar 24, 2013

See that was my initial thought too, which is why I posted here. I can't imagine why I wouldn't have a US-Version of Java, but I went ahead and changed the scanner objects in the readDenseTextMatrix method in the MatrixIO class to

"Scanner s = new Scanner(line).useLocale(new Locale("en", "US"));" around line 897

and

"Scanner scanner = new Scanner(matrix).useLocale(new Locale("en", "US"));" around line 928

Upon doing this I added:

"System.out.println("reading in text matrix with " + rows +
" rows and " + cols + " cols");"

so I could check whether it was actually counting anything, and it was not, i.e. it reported 0 rows and -1 cols.

From here I looked at what the actual file "matrix" was referring to, and it appears to be the sOutput and the uOutput from the SingularValueDecompositionMatlab class in the factorize method.

Now, these are the two files that are empty; however, they are not empty when I run the Matlab code manually with the exact same MatrixFile used in the factorize method. This is what leads me to believe that the problem lies in the interface between Matlab and Java/the command line. These are also the two files the scanner uses to count rows and columns, unless I missed something.

I've been trying to evaluate if there is another step where the termdocumentvectorspace variable (from the generictermdocumentvectorspace class) is used to determine columns and rows but I can't find one, but I'm also not that great at java :p
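
For anyone debugging the same symptom, a small standalone probe can help tell apart the two failure modes discussed here: an empty file versus tokens that are not recognized as numbers. This is only a sketch under assumptions: DenseTextMatrixProbe and dimensions are hypothetical names, the code is not the S-Space implementation of readDenseTextMatrix, and it assumes whitespace-separated values with one matrix row per line.

```java
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical diagnostic helper, not part of S-Space.
final class DenseTextMatrixProbe {

    /**
     * Returns {rows, cols} for whitespace-separated numeric lines.
     * Blank lines are skipped; no data lines yields {0, 0}.
     * Throws if a token is not a number in the US locale or if rows differ in length.
     */
    static int[] dimensions(List<String> lines) {
        int rows = 0;
        int cols = 0;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            int count = 0;
            try (Scanner scanner = new Scanner(line).useLocale(Locale.US)) {
                while (scanner.hasNextDouble()) {
                    scanner.nextDouble();
                    count++;
                }
                if (scanner.hasNext()) {
                    // Surface unparsable tokens instead of silently reporting 0 columns.
                    throw new IllegalArgumentException(
                        "unparsable token '" + scanner.next() + "' in data line " + (rows + 1));
                }
            }
            if (rows == 0) {
                cols = count;
            } else if (count != cols) {
                throw new IllegalArgumentException(
                    "data line " + (rows + 1) + " has " + count + " values, expected " + cols);
            }
            rows++;
        }
        return new int[] {rows, cols};
    }
}
```

Feeding it the lines of one of the Matlab output files (for example via java.nio.file.Files.readAllLines) would show whether the file is truly empty or merely unreadable under the current parsing rules.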

@N2D2
Author

N2D2 commented Mar 24, 2013

OK, so that means the term-document matrix is read in correctly? Then that is not the issue. I remember it was a struggle to set the environment variables for Matlab. In the end I put an alias of matlab into /usr/bin on Mac OS; without it, I had the same problem. Have you added Matlab to the PATH environment variable? As far as I remember, Matlab started normally from the terminal, but a Java program run in the same terminal did not find it until I created the alias. I may have done something more, e.g. set a path variable, but I do not remember my steps exactly.

@ganonp

ganonp commented Mar 24, 2013

It appears to me that it is read in correctly, though as I said, I could be mistaken. I have set my path environment variable to C:\Program Files (x86)\MATLAB\R2010a Student\bin. I've done some googling on this and can't seem to find anything else.
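
One way to check what the JVM itself can see is to walk the PATH value from inside Java and look for the executable. The sketch below is an illustration only: PathLookup and findOnPath are hypothetical names, and the candidate file names "matlab" and "matlab.exe" are assumptions about how the launcher is named on a given system.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical diagnostic helper, not part of S-Space.
final class PathLookup {

    /** Returns the first executable regular file with one of the given names in a PATH-style string. */
    static Optional<Path> findOnPath(String pathValue, String... candidateNames) {
        if (pathValue == null) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String name : candidateNames) {
                try {
                    Path candidate = Paths.get(dir, name);
                    if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                        return Optional.of(candidate);
                    }
                } catch (InvalidPathException ignored) {
                    // Skip malformed PATH entries rather than failing the whole lookup.
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Candidate names are assumptions; adjust them to the local installation.
        Optional<Path> hit = findOnPath(System.getenv("PATH"), "matlab", "matlab.exe");
        System.out.println(hit.map(Path::toString).orElse("matlab not found on this JVM's PATH"));
    }
}
```

Running this from the same terminal or IDE that launches the LSA job shows the PATH as the Java process sees it, which can differ from the interactive shell's.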

@N2D2
Author

N2D2 commented Mar 26, 2013

  1. "Every time I run the program, matlab opens up to a command window, no script is visibly implemented"
    Does that mean Matlab opens a new window, or does the program drop into a Matlab environment? Either would be wrong: Matlab is started with -nodisplay, so you should see only the Matlab output. The output of a normal run is at the bottom of this comment.

  2. Have you checked that the term-document matrix (in my output, the file matlab-sparse-matrix6106521009675059906.dat.matrix-transform2454813125364315471.dat) was created successfully? The file content looks like:
    1 1 0.637382
    2 1 0.526688
    3 1 0.507444
    4 1 0.491153
    5 1 0.812426
    6 1 0.373287
    7 1 0.651494

  3. I remember my main problem was with the PATH alias. On Mac OS I had to point the alias at the command-line program. On Mac OS all files are bundled inside the .app package, so I had to look inside it and set the reference to /Applications/MATLAB_R2012a.app/bin/matlab (digging into an app bundle is an unusual procedure). I don't know the Windows configuration of Matlab, but maybe you are pointing to the wrong part of Matlab rather than the terminal program.

Below I list other things I changed. They probably have nothing to do with your concrete problem, but may help in the future.
  4. In SingularValueDecompositionMatlab.java I added some options for Matlab, because the default values are not sufficient. I changed this line: "[U, S, V] = svds(A, " + dimensions + " );\n" +
to

"opts.maxit = 2000;\n" +
"opts.tol = 1e-55;\n" +
"[U, S, V] = svds(A, " + dimensions + ",'L',opts);\n" +

  5. See "Problem with handling of Matlab-matrix" #28: if the matrix is too small, Matlab computes fewer singular values than the requested number of dimensions, and the program runs into an exception.
    So I pad the singular values with very small numbers. That is not elegant, but it changes nothing and makes the code stable (this is also the reason for the changes in 4).
    I changed these lines in SingularValueDecompositionMatlab.java:

for (int s = 0; s < dimensions; ++s)
    singularValues[s] = S.get(s, s);

to:

double lastNotNull = 0;
for (int s = 0; s < dimensions; ++s)
{
    if (s < S.rows())
    {
        singularValues[s] = S.get(s, s);
        if (singularValues[s] != 0.0)
        {
            lastNotNull = singularValues[s];
        }
    }
    else
    {
        lastNotNull = lastNotNull - (lastNotNull / 61.0d);
        singularValues[s] = lastNotNull;
    }
}

61 reflects the average decrease for my matrices (each padded value is smaller than the previous one by 1/61 of it). Remember that these are the smallest, least relevant singular values. If this code is reached, you are requesting too many dimensions from too small a corpus!
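
The padding workaround above can also be tried in isolation. The following is a sketch rather than the S-Space code: SingularValuePadding and pad are hypothetical names, the diagonal of S is assumed to have been copied into a plain array first, and the divisor is a parameter so that the value 61 from this comment is just one possible choice.

```java
// Hypothetical standalone version of the padding workaround, not part of S-Space.
final class SingularValuePadding {

    /**
     * Copies the available singular values and fills any remaining slots with a
     * shrinking tail: each padded value is the previous one minus 1/divisor of it.
     */
    static double[] pad(double[] available, int dimensions, double divisor) {
        double[] padded = new double[dimensions];
        double lastNonZero = 0.0;
        for (int s = 0; s < dimensions; ++s) {
            if (s < available.length) {
                padded[s] = available[s];
                if (padded[s] != 0.0) {
                    lastNonZero = padded[s];
                }
            } else {
                lastNonZero -= lastNonZero / divisor;
                padded[s] = lastNonZero;
            }
        }
        return padded;
    }

    public static void main(String[] args) {
        // Two real singular values, four requested dimensions, divisor from the comment above.
        double[] result = pad(new double[] {4.0, 2.0}, 4, 61.0);
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

Note that the padded entries are invented values, so downstream results for those extra dimensions carry no information from the corpus.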

  6. See "Fatal error in S-space saving for LSA?!" #30.
    In AbstractSVD I therefore changed line 103,

dataClasses.set(r, c, U.get(r, c) * singularValues[c]);
to
dataClasses.set(r, c, U.get(r, c) * (1.0d/singularValues[c]));

and line 124,

classFeatures.set(r, c, V.get(r, c) * singularValues[r]);
to
classFeatures.set(r, c, V.get(r, c) * (1.0d/singularValues[r]));

but at this point I am not sure that I am right. Maybe I do not understand the code completely, but only with these changes can I reproduce the LSI example from Landauer et al.

  7. See "Forgotten throw() in SingularValueDecompositionMatlab.factorize() [line 157]" #27.

Those are all my changes; with these adjustments you have a stable LSA implementation.

###########
Here is the output of a successful run:

`Mrz 26, 2013 11:04:27 AM edu.ucla.sspace.mains.LSAMain verbose
FINE: parsed document #74509 in 0,000 seconds
...
Mrz 26, 2013 11:04:27 AM edu.ucla.sspace.mains.LSAMain verbose
FINE: Processed all 74514 documents in 13,716 total seconds
Mrz 26, 2013 11:04:27 AM edu.ucla.sspace.matrix.MatlabSparseMatrixBuilder finish
FINE: Finished writing matrix in MATLAB_SPARSE format with 74449 columns
Mrz 26, 2013 11:04:27 AM edu.ucla.sspace.common.GenericTermDocumentVectorSpace processSpace
INFO: performing log-entropy transform
Mrz 26, 2013 11:04:27 AM edu.ucla.sspace.common.GenericTermDocumentVectorSpace processSpace
FINE: stored term-document matrix in format MATLAB_SPARSE at /var/folders/zg/4q1zhr211175mbhhfbcn45lc0000gn/T/matlab-sparse-matrix6106521009675059906.dat
Mar 26, 2013 11:04:28 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the total row counts
Mar 26, 2013 11:04:36 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the entropy of each row
Mar 26, 2013 11:04:38 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Scaling the entropy of the rows
Mar 26, 2013 11:04:47 AM edu.ucla.sspace.common.GenericTermDocumentVectorSpace processSpace
FINE: transformed matrix to /var/folders/zg/4q1zhr211175mbhhfbcn45lc0000gn/T/matlab-sparse-matrix6106521009675059906.dat.matrix-transform2454813125364315471.dat
Mar 26, 2013 11:04:47 AM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 5 dimensions
Mar 26, 2013 11:04:47 AM edu.ucla.sspace.matrix.factorization.SingularValueDecompositionMatlab factorize
FINE: writing Matlab output to files:
/var/folders/zg/4q1zhr211175mbhhfbcn45lc0000gn/T/matlab-svds-U2486854453852387586.dat
/var/folders/zg/4q1zhr211175mbhhfbcn45lc0000gn/T/matlab-svds-S2720750310155449077.dat
/var/folders/zg/4q1zhr211175mbhhfbcn45lc0000gn/T/matlab-svds-V7780446078228956827.dat

Mar 26, 2013 11:04:47 AM edu.ucla.sspace.matrix.factorization.SingularValueDecompositionMatlab factorize
FINE: matlab -nodisplay -nosplash -nojvm
Mar 26, 2013 11:05:10 AM edu.ucla.sspace.matrix.factorization.SingularValueDecompositionMatlab factorize
FINE: Matlab svds output:
< M A T L A B (R) >
Copyright 1984-2012 The MathWorks, Inc.
R2012a (7.14.0.739) 64-bit (maci64)
February 9, 2012

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

>> >> >> >> >> >> >> >> >> >> >> >> Matlab Finished
>>

Mar 26, 2013 11:05:10 AM edu.ucla.sspace.matrix.factorization.SingularValueDecompositionMatlab factorize
FINE: Matlab svds exit status: 0
Mar 26, 2013 11:05:26 AM edu.ucla.sspace.matrix.MatrixIO readDenseTextMatrix
FINE: reading in text matrix with 68243 rows and 5 cols
Mar 26, 2013 11:05:29 AM edu.ucla.sspace.matrix.MatrixIO readDenseTextMatrix
FINE: reading in text matrix with 5 rows and 5 cols
Mar 26, 2013 11:05:43 AM edu.ucla.sspace.matrix.MatrixIO readDenseTextMatrix
FINE: reading in text matrix with 74449 rows and 5 cols
Mar 26, 2013 11:05:46 AM edu.ucla.sspace.mains.LSAMain verbose
FINE: processed space in 78.274 seconds
output File: m.out
Mar 26, 2013 11:05:46 AM edu.ucla.sspace.common.SemanticSpaceIO writeText
FINE: saving text S-Space with 68243 words with 5-dimensional vectors
Mar 26, 2013 11:05:48 AM edu.ucla.sspace.mains.LSAMain verbose
FINE: printed space in 1.921 seconds`

@jerrygaoLondon

I've encountered the same problem (IllegalArgumentException: dimensions must be positive) when running dimension reduction with SingularValueDecompositionMatlab or SingularValueDecompositionOctave on Windows 7.

I have Matlab installed. The problem is exactly the same as the one described by @ganonp.
I also see that "Every time I run the program, matlab opens up to a command window,". However, no data is written to matlab-svds-VXXX.dat, matlab-svds-SXXX.dat and matlab-svds-UXXX.dat.

When I run the script (below) in Matlab directly, it works well. I suspect the problem lies in feeding the script to Matlab by writing it to the process's output stream (as the code at line 102 of SingularValueDecompositionMatlab.java does). I have tried versions 2.0.4 and 2.0.3 and neither works. Any ideas?

Z=load('C:\Users\jerry\AppData\Local\Temp\matlab-input3613993774554135994.dat','-ascii');
A = spconvert(Z);
clear Z;
[U, S, V] = svds(A, 3);
save C:\Users\jerry\AppData\Local\Temp\matlab-svds-U8365859529350808097.dat U -ASCII
save C:\Users\jerry\AppData\Local\Temp\matlab-svds-S1129545791491312334.dat S -ASCII
save C:\Users\jerry\AppData\Local\Temp\matlab-svds-V4284815455262646654.dat V -ASCII
fprintf('Matlab Finished\n');
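
To test the stdin hypothesis outside S-Space, the general technique of writing a script to a child process's standard input and capturing its output and exit status can be sketched as follows. StdinScriptRunner, Result and run are hypothetical names, this is not the code from SingularValueDecompositionMatlab.java, and the command in main simply reuses the flags shown in the successful log earlier in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical diagnostic helper, not part of S-Space.
final class StdinScriptRunner {

    /** Exit status and combined stdout/stderr of the child process. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /**
     * Starts the command, writes the script to its stdin, closes stdin, then
     * reads all output. Intended for short scripts: a child that prints a lot
     * before consuming its input could block with this simple sequencing.
     */
    static Result run(List<String> command, String script)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        Process process = builder.start();
        try (OutputStream stdin = process.getOutputStream()) {
            stdin.write(script.getBytes(StandardCharsets.UTF_8));
        }
        String output;
        try (InputStream stdout = process.getInputStream()) {
            output = new String(stdout.readAllBytes(), StandardCharsets.UTF_8);
        }
        return new Result(process.waitFor(), output);
    }

    public static void main(String[] args) throws Exception {
        // Flags taken from the log of the successful run in this thread.
        Result result = run(
            List.of("matlab", "-nodisplay", "-nosplash", "-nojvm"),
            "fprintf('Matlab Finished\\n');\n");
        System.out.println("exit status: " + result.exitCode);
        System.out.println(result.output);
    }
}
```

If the captured output never contains "Matlab Finished", that would suggest the script did not reach the interpreter, which is the same signal the "Matlab svds output" and "exit status" log lines give in the successful run.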
