The predict function is giving the same values for any test case #14

Closed
rguntha opened this issue Mar 20, 2018 · 13 comments

Comments

@rguntha

rguntha commented Mar 20, 2018

Please help me...

I am working on blood pressure prediction using SVM regression. I have tried both the epsilon and nu regression types, with various values of gamma, nu, cost and epsilon.

=======================================================
The Problem:

I get the same predictions no matter what input data I give.

=======================================================
Code:

var fs = require("fs");

async function runTest() {
    var trainDataString = fs.readFileSync('../src/assets/bp/trainaims.csv')+"";
    var testDataString = fs.readFileSync('../src/assets/bp/testaims.csv')+"";
    testLibSVM(trainDataString.split('\r\n').splice(1,16),testDataString.split('\r\n').splice(1,5));
}

async function testLibSVM(trainData,testData){
    const SVM = await require('libsvm-js');
    var svmSys = new SVM({type:SVM.SVM_TYPES.EPSILON_SVR,gamma:0.01,nu:[0.01,0.125,0.5,1],cost:1,epsilon:0.1});
    var svmDia = new SVM({type:SVM.SVM_TYPES.EPSILON_SVR,gamma:0.01,nu:[0.01,0.125,0.5,1],cost:1,epsilon:0.1});
    let dataArray = [];
    let sysValues = [];
    let diaValues = [];

    for(let i=0;i<trainData.length;i++){
      let line = trainData[i];
      let trainingRecord = line.split(",").map(data => parseFloat(data));
      if(trainingRecord.length === 14){
        let inputParams = trainingRecord.splice(0,12);
        dataArray.push(inputParams);
        sysValues.push(trainingRecord[0]);
        diaValues.push(trainingRecord[1]);
      }
    }
    svmSys.train(dataArray,sysValues);
    svmDia.train(dataArray,diaValues);
    testData.forEach(element => {
        if(element.length > 0){
            console.log(element);
            let dataArrayStr = element.split(",");
            let testDataArray = dataArrayStr.map(data => parseFloat(data));
            let values = [];
            values.push(svmSys.predictOne(testDataArray));
            values.push(svmDia.predictOne(testDataArray));
            console.log(values);
        }
    });
  }
runTest().then(() => console.log('done!'));

=======================================================
Logs:

trying binaryen method: native-wasm
asynchronously preparing wasm
binaryen method succeeded.
done!
*
optimization finished, #iter = 8
nu = 0.925000
obj = -146.760000, rho = -121.500000
nSV = 16, nBSV = 14
*
optimization finished, #iter = 8
nu = 1.000000
obj = -97.400000, rho = -68.000000
nSV = 16, nBSV = 16
2,42,84.8456,0.5617,0.1455,0.7072,3.97E+04,536.6846,-143.0407,146.9154,164.2148,130.1103
[ 121.5, 68 ]
2,29,87.7407,0.543,0.1408,0.6838,8.10E+04,1.20E+03,-297.5299,297.771,345.1783,252.506
[ 121.5, 68 ]
1,28,75.1024,0.64,0.1589,0.7989,4.41E+04,552.0304,-133.7681,172.2191,189.3553,156.166
[ 121.5, 68 ]
2,28,77.4648,0.6695,0.1051,0.7745,4.50E+04,869.1954,-137.7776,127.5441,138.6392,117.5474
[ 121.5, 68 ]
1,25,96.8411,0.5049,0.1147,0.6196,1.37E+05,2.39E+03,-540.3023,557.6924,533.8753,582.0348
[ 121.5, 68 ]

=======================================================
Input Data

The input data is in the attached zip file. The training file contains 16 rows; the last two columns are the two label values. The test file contains 5 rows. The logs above show that every test row produces the same values.

trainaims.zip

@stropitek
Member

stropitek commented Mar 21, 2018

Hi
I haven't had the opportunity to test the regression on a real-world case, only with the demo website's one-dimensional example.

Have you tried to do exactly the same with the original libsvm library? Are the results any different? If you could post here the results you get with it that would help a lot!

Thanks

@rguntha
Author

rguntha commented Mar 21, 2018

Hi, thanks very much for the quick reply.

I have not tried the original libsvm library. I have tried it in R, and it gave varying results.

I have also tried the bodyfat dataset published on the libsvm site:
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression/bodyfat
I used the last 19 records as test records and the rest as training records.
The result is below. Note that all 19 records produced the same value, 1.05195.

==
*
optimization finished, #iter = 0
nu = 0.000000
obj = 0.000000, rho = -1.051950
nSV = 0, nBSV = 0
[[1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195,1.05195]]
done

The code is below. The data files are attached.

var fs = require("fs");

async function runTest() {
    var trainDataString = fs.readFileSync('./src/assets/bp/trainfat.csv')+"";
    var testDataString = fs.readFileSync('./src/assets/bp/testfat.csv')+"";
    testLibSVM(trainDataString.split('\n'),testDataString.split('\n'));
}

async function testLibSVM(trainData,testData){
    const SVM = await require('libsvm-js/asm');
    var svmFat = new SVM({type:SVM.SVM_TYPES.EPSILON_SVR,gamma:0.01,nu:[0.01,0.125,0.5,1],cost:1,epsilon:0.1});
    let dataArray = [];
    let fatValues = [];

    for(let i=0;i<trainData.length;i++){
      let line = trainData[i];
      let trainingRecord = line.split(",").map(data => parseFloat(data));
      if(trainingRecord.length === 15){
        let inputParams = trainingRecord.splice(1);
        dataArray.push(inputParams);
        fatValues.push(trainingRecord[0]);
      }
    }
    let testArrays = []; // declared locally instead of leaking an implicit global
    testData.forEach(element => {
        if(element.length > 0){
            // console.log(element);
            let dataArrayStr = element.split(",");
            let testDataArray = dataArrayStr.map(data => parseFloat(data));
            testArrays.push(testDataArray.splice(1));
        }
    });
    let values = [];
    svmFat.free();
    svmFat.train(dataArray,fatValues);
    values.push(svmFat.predict(testArrays));
    console.log(JSON.stringify(values));
}
runTest().then(() => console.log('done!'));

testfat.zip

@rguntha
Author

rguntha commented Mar 21, 2018

I have also tested the same bodyfat files in R. Below are the commands and the final result.

fatTest<-read.csv("C:/Code/WearableVitals/src/assets/bp/testfat.csv")
fat<-read.csv("C:/Code/WearableVitals/src/assets/bp/trainfat.csv")
fatFitRadialEsp<-svm(Fat~.,data=fat,type="eps-regression",kernel="radial")
predFat<-predict(fatFitRadialEsp,fatTest)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
1.042609 1.042513 1.053037 1.043323 1.031655 1.068895 1.033243 1.064847 1.026723 1.032486 1.036302 1.032156 1.064235 1.036778 1.069251 1.033492 1.034420 1.044378 1.038295

@stropitek
Member

I find it suspicious that the nu parameter is an array. Can you try with a number instead? Unlike some other libraries, libsvm-js does not grid-search hyperparameters for you.
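
Since each hyperparameter has to be a single number, trying several values means writing the loop yourself. A minimal sketch, assuming a caller-supplied scoring callback: `scoreFn` is hypothetical and not part of libsvm-js; in practice it would train one model with the given options and return a validation error (lower is better).

```javascript
// Manual grid search: build one options object per combination of candidate
// values and keep the combination with the lowest score.
// `scoreFn(options)` is a hypothetical callback supplied by the caller.
function gridSearch(grid, scoreFn) {
  const names = Object.keys(grid);
  let best = { options: null, score: Infinity };

  // Recursively expand the Cartesian product of all candidate lists.
  function expand(index, current) {
    if (index === names.length) {
      const score = scoreFn(current);
      if (score < best.score) {
        best = { options: { ...current }, score };
      }
      return;
    }
    const name = names[index];
    for (const value of grid[name]) {
      expand(index + 1, { ...current, [name]: value });
    }
  }

  expand(0, {});
  return best;
}

// Toy scorer for illustration only: it pretends nu = 0.5, cost = 1 is ideal.
const result = gridSearch(
  { nu: [0.01, 0.125, 0.5, 1], cost: [1, 10] },
  ({ nu, cost }) => Math.abs(nu - 0.5) + Math.abs(cost - 1)
);
console.log(result.options); // { nu: 0.5, cost: 1 }
```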

@stropitek
Member

Actually, you are using epsilon regression, so it shouldn't matter.

I'll try to have a look at it soon.

@rguntha
Author

rguntha commented Mar 22, 2018 via email

@stropitek
Member

Hello
Your epsilon value is too high. I tried with 0.001 and it gives something that looks similar to your R result. Have a look at https://mljs.github.io/libsvm/#/SVR to see how the epsilon value affects the regression.
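
The flat output in the bodyfat run (nSV = 0, every prediction equal to -rho = 1.05195) is what epsilon-SVR is expected to do when epsilon is large compared with the spread of the labels: if every training label lies within epsilon of one constant, that constant already has zero loss, so no support vectors are needed. A rough pre-training check, as a sketch only (the helper name `labelsFitInsideTube` is made up for illustration):

```javascript
// Returns true when a single constant can sit within `epsilon` of every
// label, i.e. when the label range is at most 2 * epsilon. In that case a
// constant predictor incurs no epsilon-insensitive loss at all.
function labelsFitInsideTube(labels, epsilon) {
  const min = Math.min(...labels);
  const max = Math.max(...labels);
  return max - min <= 2 * epsilon;
}

// Illustrative values on the scale of the bodyfat target (taken from the R
// predictions earlier in the thread, not the actual training labels).
const sampleLabels = [1.042609, 1.026723, 1.069251];
console.log(labelsFitInsideTube(sampleLabels, 0.1));   // true
console.log(labelsFitInsideTube(sampleLabels, 0.001)); // false
```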

I'm closing this issue. Feel free to reopen if something still seems wrong.

@rguntha
Author

rguntha commented Mar 23, 2018 via email

@rguntha
Author

rguntha commented Mar 31, 2018

Hello,
Your suggestion worked well for the bodyfat data, but it is not working for my BP data. The results from libsvm-js and R do not match at all, even though I am using the same parameters. The difference is very large.

I would really appreciate your help. I am not sure if there are any other parameters I am not considering.

R commands (linear kernel with epsilon = 0.001)

bp<-read.csv("C:/Code/WearableVitals/WearableVitalsApp/src/assets/bp/trainaims.csv")
bptest<-read.csv("C:/Code/WearableVitals/WearableVitalsApp/src/assets/bp/testaims.csv")
traindata<-bp[1:13]
testdata<-bptest[1:12]
bpFitLinearEsp<-svm(bpsys~.,data=traindata,type="eps-regression",kernel="linear",epsilon=0.001)
predBp<-predict(bpFitLinearEsp,testdata)
predBp
1 2 3 4 5
149.2531 139.0186 118.4807 136.4002 130.3284

libsvm-js (linear kernel with epsilon = 0.001)

epsilon = 0.001;
let values = [];
var svmSys = new SVM({type:SVM.SVM_TYPES.EPSILON_SVR,epsilon:epsilon,kernel:SVM.KERNEL_TYPES.LINEAR});
var svmDia = new SVM({type:SVM.SVM_TYPES.EPSILON_SVR,epsilon:epsilon,kernel:SVM.KERNEL_TYPES.LINEAR});
svmSys.free();
svmSys.train(dataArray,sysValues);
values.push(svmSys.predict(testArrays).map(x => math.round(x)));
successes.push(JSON.stringify(values));
console.log("Successes:"+successes.join("\n"));

Results:
optimization finished, #iter = 10000000
nu = 0.875000
obj = -2410.522562, rho = 200.890164
nSV = 16, nBSV = 10
Successes:[[543,-624,539,-32,-2012]]

R commands (linear kernel with epsilon = 1.5)

bpFitLinearEsp<-svm(bpsys~.,data=traindata,type="eps-regression",kernel="linear",epsilon=1.5)
predBp<-predict(bpFitLinearEsp,testdata)
predBp
1 2 3 4 5
122.1728 122.8121 119.0912 119.9891 125.0957

libsvm-js (linear kernel with epsilon = 1.5)

Results:
optimization finished, #iter = 10000000
nu = 0.228737
obj = -48.633458, rho = 16.987639
nSV = 9, nBSV = 1
Successes:[[379,648,415,-94,1244]]

@stropitek stropitek reopened this Mar 31, 2018
@stropitek
Member

@rguntha
I compared the output of the epsilon-SVR in libsvm-js with the output from the original library (C implementation), and it is exactly the same when all parameters are set explicitly. However, I noticed a bug in how the default value for the gamma parameter was chosen: it is supposed to be 1/num_features but was actually hardcoded to 0.1. I fixed that and released a new version of libsvm-js.
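
Setting every parameter explicitly avoids depending on the defaults of whichever version is installed. A minimal sketch, assuming only the option names already used earlier in this thread (`type`, `gamma`, `cost`, `epsilon`); the helper `explicitSvrOptions` is hypothetical and takes the `SVM` class as an argument so that it has no dependency of its own:

```javascript
// Build epsilon-SVR options with gamma set to 1 / num_features, the value
// the default is supposed to take according to the comment above.
// `SVM` is the class obtained from require('libsvm-js').
function explicitSvrOptions(SVM, trainX, epsilon) {
  const numFeatures = trainX[0].length;
  return {
    type: SVM.SVM_TYPES.EPSILON_SVR,
    gamma: 1 / numFeatures,
    cost: 1,
    epsilon: epsilon,
  };
}

// Usage with the real library (not executed here):
//   const SVM = await require('libsvm-js');
//   const svm = new SVM(explicitSvrOptions(SVM, dataArray, 0.001));
```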

Also note that according to the libsvm website, the R package is based on version 3.17 whereas libsvm-js is based on 3.22, so output may slightly differ.

Hope that will fix your issue. I'm closing again, feel free to reopen if you still have issues.

@rguntha
Author

rguntha commented Apr 6, 2018

@stropitek
I have taken your latest version, 0.2.0, and retried the test cases above (linear kernel with epsilon 0.001 and 1.5). Unfortunately the results are the same as before and still very different from the R results mentioned above.

Please note that libsvm-js takes a very long time for these computations, maybe because of the very large number of iterations (10 million, as mentioned in the results in my previous comment).

It would be great if you could rerun my test files attached earlier in the thread (trainaims.csv and testaims.csv).

Thanks very much for your continued help.

@stropitek
Member

Looking in the R documentation, I read:

Per default, data are scaled internally (both x and y variables) to zero mean and unit variance

Indeed, SVM does not work well if the data is not scaled. In libsvm-js, data is not scaled by default; you have to do it yourself.
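
To illustrate why unscaled inputs gave constant predictions with the RBF kernel in the first post: the kernel value is exp(-gamma * ||a - b||^2), and the prediction is a weighted sum of kernel values minus rho. When some features are in the tens of thousands, the squared distance between two distinct rows is so large that the kernel value underflows to 0, leaving only -rho (121.5 and 68 in the first logs). A small sketch using two of the test rows printed in those logs (the function name `rbfKernel` is just for this example):

```javascript
// RBF kernel between two feature vectors of equal length.
function rbfKernel(a, b, gamma) {
  let squaredDistance = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    squaredDistance += d * d;
  }
  return Math.exp(-gamma * squaredDistance);
}

// First two test rows from the logs in the original post.
const rowA = [2, 42, 84.8456, 0.5617, 0.1455, 0.7072, 3.97e4, 536.6846, -143.0407, 146.9154, 164.2148, 130.1103];
const rowB = [2, 29, 87.7407, 0.543, 0.1408, 0.6838, 8.10e4, 1.20e3, -297.5299, 297.771, 345.1783, 252.506];

console.log(rbfKernel(rowA, rowB, 0.01)); // 0 (underflow: the rows look infinitely far apart)
console.log(rbfKernel(rowA, rowA, 0.01)); // 1 (a row compared with itself)
```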

@rguntha
Author

rguntha commented Apr 7, 2018

@stropitek
Thanks for the scaling tip. I have now implemented the scaling using the formula
scaledX = (x - mean(featureVector)) / stdDev(featureVector)

The results now match R exactly.

Thanks very much for your help
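
For anyone landing here with the same problem, the scaling described above could look roughly like the sketch below. It is an illustration, not the code used in this thread: the helper names `fitScaler` and `applyScaler` are made up, the standard deviation uses the n - 1 denominator (as R's `sd` does), and the statistics are computed on the training rows only and then reused for the test rows.

```javascript
// Compute per-column mean and sample standard deviation (n - 1 denominator).
function fitScaler(rows) {
  const n = rows.length;
  const numFeatures = rows[0].length;
  const mean = new Array(numFeatures).fill(0);
  const std = new Array(numFeatures).fill(0);
  for (const row of rows) {
    row.forEach((v, j) => { mean[j] += v; });
  }
  for (let j = 0; j < numFeatures; j++) {
    mean[j] /= n;
  }
  for (const row of rows) {
    row.forEach((v, j) => { std[j] += (v - mean[j]) ** 2; });
  }
  for (let j = 0; j < numFeatures; j++) {
    std[j] = Math.sqrt(std[j] / (n - 1));
    // A constant column would divide by zero; leave it centred only.
    if (std[j] === 0) std[j] = 1;
  }
  return { mean, std };
}

// Apply a previously fitted scaler to any set of rows (training or test).
function applyScaler(rows, scaler) {
  return rows.map(row => row.map((v, j) => (v - scaler.mean[j]) / scaler.std[j]));
}

const trainRows = [[1, 100], [2, 200], [3, 300]];
const scaler = fitScaler(trainRows);
console.log(applyScaler(trainRows, scaler)); // [[-1, -1], [0, 0], [1, 1]]
```

Calling `applyScaler(testRows, scaler)` with the scaler fitted on the training rows keeps the test rows on the same scale as the model saw during training. According to the R documentation quoted above, R also scales the target by default, so reproducing its numbers may additionally require scaling the labels the same way and un-scaling the predictions.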
