
Differing results between R and CLI #26

Closed
brogie62 opened this issue Nov 30, 2015 · 14 comments

@brogie62

I had been using the CLI version of Somoclu and getting results consistent with other implementations of batch-trained SOMs (the R kohonen package and the Matlab Neural Network Toolbox). When you responded to my request for the inclusion of a bubble neighborhood function, I decided to use the R package to test it rather than recompile the CLI version (which had initially been done for me by a colleague). Following your instructions, I compiled and tested the new version of the R package. I found that I was getting much higher quantization errors than with the CLI. To determine whether the difference was due to R or to the requested changes, I installed the current, unmodified R package and compared the same input file using the same initial codebook. With the CLI I got a quantization error of 5.73, but with R the quantization error was 18.95.

Here is the CLI command:

somoclu -c T7_init_weights_nospace_CRend.wts -e 100 -k 1 -m planar -t linear -r 9 -R 1 -T linear -l 1 -L 0.01 -s 0 -x 18 -y 15 t7_norow_somoclu T_Opt_6

Here is the R script:
library(data.table)  # provides fread
library(Rsomoclu)

dataTemp <- data.frame(fread("t7_norow_somoclu"))
dataSource <- as.matrix(dataTemp)
initTemp <- data.frame(fread("T7_init_weights_nospace_CRend.wts"))
initSource <- as.matrix(initTemp)
nSomX <- 18
nSomY <- 15
nEpoch <- 100
radius0 <- 9
radiusN <- 1
radiusCooling <- "linear"
scale0 <- 1
scaleN <- 0.01
scaleCooling <- "linear"
kernelType <- 0
mapType <- "planar"
gridType <- "rectangular"
compactSupport <- FALSE
codebook <- initSource
res <- Rsomoclu.train(dataSource, nEpoch, nSomX, nSomY,
                      radius0, radiusN, radiusCooling,
                      scale0, scaleN, scaleCooling,
                      kernelType, mapType, gridType,
                      compactSupport, codebook)
head(res$globalBmus)

@xgdgsc
Collaborator

xgdgsc commented Nov 30, 2015

The kernelType is different. Have you tried running both with kernelType=0?

@peterwittek
Owner

The starting learning rate is also different.

@brogie62
Author

I copied the wrong CLI command; they were both done with -l 1. (BTW, I was doing parameter optimization and found that, with gaussian, the starting learning rate has little effect on the quantization error, which I had seen previously.) The CLI run was on the GPU, so the kernel is 1. I had previously run the CPU kernel (kernel = 0) on the CLI and got results identical to the corresponding GPU run.

@peterwittek
Owner

Actually, if it is compiled without CUDA support, the GPU kernel (=1) falls back to the CPU kernel without saying a word.

In any case, the problem is odd. To comply with CRAN, the random number generator of the R version is the one from <R.h>:

(RAND_MAX * unif_rand())

which should be identical in effect to the rand() function in <cstdlib>. The generated integer random number is then transformed to the [0, 1] interval in both cases. I saw major discrepancies if and only if the data coordinates were not normalized to [0, 1]. So let's go through a couple of basic points:

  • Is your data normalized?
  • Did you set the environment variable OMP_NUM_THREADS? It should not have any impact on the actual result, but it is good to know how many cores you are using.
  • How do you evaluate quantization error?
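
On the normalization point, a minimal sketch of what min-max scaling each feature column to [0, 1] looks like (illustrative NumPy code with made-up data; not part of Somoclu itself):

```python
import numpy as np

# Hypothetical data matrix: rows are input vectors, columns are features.
data = np.array([[2.0, 10.0],
                 [4.0, 30.0],
                 [6.0, 50.0]])

# Min-max normalize each column independently to the [0, 1] interval.
mins = data.min(axis=0)
maxs = data.max(axis=0)
normalized = (data - mins) / (maxs - mins)

print(normalized)
```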

@brogie62
Author

The CLI version was compiled with CUDA support.

Randomization also should not matter as I am using an initial codebook.

The data is not normalized and is identical in both cases.

I have not made any changes to OMP_NUM_THREADS.

I evaluate quant error by averaging the euclidean distance between each input vector and its BMU using the rdist function in the fields package in R:

library(fields)  # provides rdist

weights <- res$codebook
inputs <- dataSource
distMatrix <- rdist(inputs, weights)

# Distance from each input vector to its BMU, averaged over all inputs.
result <- apply(distMatrix, 1, min)
MinM <- mean(result)
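
For comparison, the same quantization-error computation can be sketched in NumPy (illustrative only; the data and codebook here are random stand-ins for the real inputs and the 18x15 codebook):

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.random((100, 5))    # stand-in for the input vectors
weights = rng.random((270, 5))   # stand-in for an 18x15 codebook

# Pairwise Euclidean distances between every input and every codebook vector.
dists = np.linalg.norm(inputs[:, None, :] - weights[None, :, :], axis=2)

# Quantization error: mean distance from each input to its BMU.
qe = dists.min(axis=1).mean()
print(qe)
```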

xgdgsc added a commit that referenced this issue Nov 30, 2015
@xgdgsc
Collaborator

xgdgsc commented Nov 30, 2015

For me, the above commit produces a codebook much closer to the CLI version than before. The problem may have been incorrect handling of R's column-major matrix layout when converting arrays between C and R. Please check whether this fixes the issue.
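
To illustrate the class of bug being described (this is not the actual fix in the commit): R stores matrices in column-major order, so a C routine that reads the same memory buffer assuming row-major order sees a scrambled matrix. A minimal NumPy sketch:

```python
import numpy as np

# A logical 2x3 matrix, as the R side would see it.
codebook = np.arange(6).reshape(2, 3)

# R lays the matrix out column-major (Fortran order) in memory.
col_major_buffer = codebook.flatten(order="F")

# A C routine assuming row-major order reads the buffer incorrectly.
wrong_view = col_major_buffer.reshape(2, 3)

# Reading it back with the matching (column-major) order recovers it.
right_view = col_major_buffer.reshape((2, 3), order="F")

print(codebook)
print(wrong_view)   # entries scrambled across rows and columns
```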

@brogie62
Author

I deleted my previous comment with the images. Although they are accurate, I decided I am not ready to have my work in a public forum yet. I hope that is OK.

@brogie62
Author

I'll give that commit a try and let you know. Does that commit include the neighborhood function parameter?

@xgdgsc
Collaborator

xgdgsc commented Nov 30, 2015

That includes the neighborhood function parameter.

@brogie62
Author

brogie62 commented Dec 1, 2015

Tried #26. Gaussian gave a quantization error of 5.78, very much in line with the CLI. Bubble gave an improved quantization error of 5.34. With Matlab and the kohonen R package I was getting < 5, so I may need to optimize parameters.

Thanks

@peterwittek peterwittek added the bug label Dec 1, 2015
@peterwittek
Owner

The R interface is a bit of a mistreated foster child, as we are inexperienced with it. Thanks for pointing out this bug.

@xgdgsc
Collaborator

xgdgsc commented Dec 2, 2015

The fix is on CRAN now.

@peterwittek
Owner

Thanks very much. I did some overdue cleanup and tagged version 1.5.1. The update is released on MLOSS and GitHub. Please update PyPI.

@xgdgsc
Collaborator

xgdgsc commented Dec 2, 2015

OK. Just uploaded source to PyPI, will build the binaries later.
