
Differing results between R and CLI #26

Closed
brogie62 opened this issue Nov 30, 2015 · 14 comments

@brogie62

I had been using the CLI version of Somoclu and getting results consistent with other implementations of batch-trained SOMs (the R kohonen package and the Matlab Neural Network Toolbox). When you responded to my request for the inclusion of a bubble neighborhood function, I decided to use the R package to test it rather than recompile the CLI version (which had initially been done for me by a colleague). Following your instructions, I compiled and tested the new version of the R package. I found that I was getting much higher quantization errors than with the CLI. To determine whether the difference was due to R or to the requested changes, I installed the current, unmodified R package and compared the same input file using the same initial codebook. With the CLI I got a quantization error of 5.73, but with R the quantization error was 18.95.

Here is the CLI command:

somoclu -c T7_init_weights_nospace_CRend.wts -e 100 -k 1 -m planar -t linear -r 9 -R 1 -T linear -l 1 -L 0.01 -s 0 -x 18 -y 15 t7_norow_somoclu T_Opt_6

Here is the R script:
library(data.table)  # provides fread
library(Rsomoclu)

dataTemp <- data.frame(fread("t7_norow_somoclu"))
dataSource <- as.matrix(dataTemp)
initTemp <- data.frame(fread("T7_init_weights_nospace_CRend.wts"))
initSource <- as.matrix(initTemp)
nSomX <- 18
nSomY <- 15
nEpoch <- 100
radius0 <- 9
radiusN <- 1
radiusCooling <- "linear"
scale0 <- 1
scaleN <- 0.01
scaleCooling <- "linear"
kernelType <- 0
mapType <- "planar"
gridType <- "rectangular"
compactSupport <- FALSE
codebook <- initSource
res <- Rsomoclu.train(dataSource, nEpoch, nSomX, nSomY,
                      radius0, radiusN, radiusCooling,
                      scale0, scaleN, scaleCooling,
                      kernelType, mapType, gridType,
                      compactSupport, codebook)
head(res$globalBmus)

@xgdgsc
Collaborator

xgdgsc commented Nov 30, 2015

The kernelType is different. Have you tried running both with kernelType=0?

@peterwittek
Owner

The starting learning rate is also different.

@brogie62
Author

I copied the wrong CLI command; they were both done with -l 1. (BTW, I was doing parameter optimization and found that, with gaussian, the starting learning rate has little effect on the quantization error, which I had seen previously.) The CLI run was on the GPU, so the kernel is 1. I had previously run the CPU kernel (kernel = 0) on the CLI and got results identical to the corresponding GPU run.

@peterwittek
Owner

Actually, if it is compiled without CUDA support, the GPU kernel (=1) falls back to the CPU kernel without saying a word.

In any case, the problem is odd. To comply with CRAN, the random number generator of the R version is the one from <R.h>:

(RAND_MAX * unif_rand())

which should be identical in effect to the rand() function in <cstdlib>. The generated integer random number is then transformed to the [0, 1] interval in both cases. I saw major discrepancies if and only if the data coordinates were not normalized to [0, 1]. So let's go through a couple of basic points:

  • Is your data normalized?
  • Did you set the environment variable OMP_NUM_THREADS? It should not have any impact on the actual result, but it is good to know how many cores you are using.
  • How do you evaluate quantization error?
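
On the normalization point, a minimal sketch of what min-max scaling each feature column to [0, 1] looks like (illustrative NumPy code with made-up data; not part of Somoclu itself):

```python
import numpy as np

# Hypothetical data matrix: rows are input vectors, columns are features.
data = np.array([[2.0, 10.0],
                 [4.0, 30.0],
                 [6.0, 50.0]])

# Min-max normalize each column independently to the [0, 1] interval.
mins = data.min(axis=0)
maxs = data.max(axis=0)
normalized = (data - mins) / (maxs - mins)

print(normalized)
```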

@brogie62
Author

The CLI version was compiled with CUDA support.

Randomization also should not matter as I am using an initial codebook.

The data is not normalized and is identical in both cases.

I have not made any changes to OMP_NUM_THREADS.

I evaluate quant error by averaging the euclidean distance between each input vector and its BMU using the rdist function in the fields package in R:

library(fields)  # provides rdist

weights <- res$codebook
inputs <- dataSource
distMatrix <- rdist(inputs, weights)

# Distance from each input vector to its BMU, averaged over all inputs.
result <- apply(distMatrix, 1, min)
MinM <- mean(result)
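
For comparison, the same quantization-error computation can be sketched in NumPy (illustrative only; the data and codebook here are random stand-ins for the real inputs and the 18x15 codebook):

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.random((100, 5))    # stand-in for the input vectors
weights = rng.random((270, 5))   # stand-in for an 18x15 codebook

# Pairwise Euclidean distances between every input and every codebook vector.
dists = np.linalg.norm(inputs[:, None, :] - weights[None, :, :], axis=2)

# Quantization error: mean distance from each input to its BMU.
qe = dists.min(axis=1).mean()
print(qe)
```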

xgdgsc added a commit that referenced this issue Nov 30, 2015
@xgdgsc
Collaborator

xgdgsc commented Nov 30, 2015

For me, the above commit produces a codebook much closer to the CLI version than before. The problem may have been incorrect handling of R's column-major matrix layout when converting arrays between C and R. Please check whether this fixes the issue.
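
To illustrate the class of bug being described (this is not the actual fix in the commit): R stores matrices in column-major order, so a C routine that reads the same memory buffer assuming row-major order sees a scrambled matrix. A minimal NumPy sketch:

```python
import numpy as np

# A logical 2x3 matrix, as the R side would see it.
codebook = np.arange(6).reshape(2, 3)

# R lays the matrix out column-major (Fortran order) in memory.
col_major_buffer = codebook.flatten(order="F")

# A C routine assuming row-major order reads the buffer incorrectly.
wrong_view = col_major_buffer.reshape(2, 3)

# Reading it back with the matching (column-major) order recovers it.
right_view = col_major_buffer.reshape((2, 3), order="F")

print(codebook)
print(wrong_view)   # entries scrambled across rows and columns
```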

@brogie62
Author

I deleted my previous comment with the images. Although they are accurate, I decided I am not ready to have my work in a public forum yet. I hope that is OK.

@brogie62
Author

I'll give that commit a try and let you know. Does that commit include the neighborhood function parameter?

@xgdgsc
Collaborator

xgdgsc commented Nov 30, 2015

That includes the neighborhood function parameter.

@brogie62
Author

brogie62 commented Dec 1, 2015

Tried #26. Gaussian gave a quantization error of 5.78, very much in line with the CLI. Bubble gave an improved quantization error of 5.34. With Matlab and the kohonen R package I was getting < 5, so I may need to optimize parameters.

Thanks

@peterwittek peterwittek added the bug label Dec 1, 2015
@peterwittek
Owner

The R interface is a bit of a mistreated foster child, as we are inexperienced with it. Thanks for pointing out this bug.

@xgdgsc
Collaborator

xgdgsc commented Dec 2, 2015

The fix is on CRAN now.

@peterwittek
Owner

Thanks very much. I did some overdue cleanup and tagged version 1.5.1. The update is released on MLOSS and GitHub. Please update PyPI.

@xgdgsc
Collaborator

xgdgsc commented Dec 2, 2015

OK. Just uploaded source to PyPI, will build the binaries later.
