Crashes when a feature variance is 0 #10

Closed
zyxue opened this issue Apr 15, 2017 · 5 comments

zyxue commented Apr 15, 2017

It would be preferable for the PCA to skip features with zero variance instead of crashing.

I have written some code to do the check myself, but it feels very inefficient. Suggestions for improvement are welcome.

        // Transpose so that each row is one feature and can be checked on its own.
        const mat = new Matrix(input).transpose();
        const kept = [];
        mat.forEach((vec) => {
            const mean = this.mean(vec);
            const variance = this.variance(vec, mean);
            // Keep only features whose variance is meaningfully above zero;
            // features with (near-)zero variance are skipped.
            if (variance > 1e-7) {
                kept.push(this.standardize(vec, mean, variance));
            }
        });
        // Transpose back so that samples are rows again.
        const mat2 = new Matrix(kept).transpose();

        // Already scaled above to avoid the division by zero caused by zero variance.
        const pca = new Stat.PCA(mat2, {mean: false, scale: false});
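
One way to avoid the two transposes is to compute the column statistics directly on the row-major data. The sketch below is only an illustration on plain arrays: `standardizeDroppingConstant` is a hypothetical helper, not part of the library, and it assumes that standardizing means dividing by the sample standard deviation.

```javascript
// Hypothetical helper (not part of the library): standardize the columns of a
// row-major 2D array and drop columns whose variance is at or below `tol`.
function standardizeDroppingConstant(rows, tol = 1e-7) {
  const nRows = rows.length;
  const nCols = nRows > 0 ? rows[0].length : 0;

  // First pass: per-column means.
  const mean = new Array(nCols).fill(0);
  for (const row of rows) {
    for (let j = 0; j < nCols; j++) mean[j] += row[j];
  }
  for (let j = 0; j < nCols; j++) mean[j] /= nRows;

  // Second pass: per-column sample variance (n - 1 denominator).
  const variance = new Array(nCols).fill(0);
  for (const row of rows) {
    for (let j = 0; j < nCols; j++) {
      const d = row[j] - mean[j];
      variance[j] += d * d;
    }
  }
  for (let j = 0; j < nCols; j++) variance[j] /= Math.max(nRows - 1, 1);

  // Indices of the columns that actually vary.
  const kept = [];
  for (let j = 0; j < nCols; j++) {
    if (variance[j] > tol) kept.push(j);
  }

  // Build the standardized matrix from the kept columns only.
  const data = rows.map((row) =>
    kept.map((j) => (row[j] - mean[j]) / Math.sqrt(variance[j]))
  );
  return { data, kept };
}
```

Returning `kept` alongside `data` makes it possible to map the PCA loadings back to the original feature indices. The `data` array could then be passed to the PCA with centering and scaling disabled, as in the snippet above.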

@loretoparisi

Any news?

Member

targos commented Apr 18, 2019

I'm working on the project again. I'll have a look at this.

Member

targos commented Apr 25, 2019

What should we do when the standard deviation is 0? Subtract the mean without scaling? I'm going to check what R does.
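
For illustration, the "subtract the mean without scaling" option could look like the sketch below. It works on plain arrays, `centerAndSafeScale` and its `tol` parameter are hypothetical names, and it is not a description of what the library does.

```javascript
// Sketch only: center every column, then divide by the sample standard
// deviation unless it is (near) zero, in which case the column is only centered.
function centerAndSafeScale(rows, tol = 1e-12) {
  const nRows = rows.length;
  const nCols = nRows > 0 ? rows[0].length : 0;

  // Per-column means.
  const mean = new Array(nCols).fill(0);
  for (const row of rows) {
    for (let j = 0; j < nCols; j++) mean[j] += row[j];
  }
  for (let j = 0; j < nCols; j++) mean[j] /= nRows;

  // Per-column sample standard deviations.
  const sd = new Array(nCols).fill(0);
  for (const row of rows) {
    for (let j = 0; j < nCols; j++) sd[j] += (row[j] - mean[j]) ** 2;
  }
  for (let j = 0; j < nCols; j++) {
    sd[j] = Math.sqrt(sd[j] / Math.max(nRows - 1, 1));
  }

  // A divisor of 1 leaves constant columns centered but unscaled.
  const divisor = sd.map((s) => (s > tol ? s : 1));
  return rows.map((row) => row.map((x, j) => (x - mean[j]) / divisor[j]));
}
```

With this fallback a constant column becomes a column of zeros rather than `NaN`, so the decomposition can proceed, at the cost of keeping a feature that carries no information.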

Member

targos commented Apr 25, 2019

R also throws an error if the dataset cannot be scaled because of a constant column. Is there precedent in other libraries / languages? What do they do in this case?

Author

zyxue commented Apr 26, 2019

I think if there are multiple features, the one with no variance should just be ignored as it doesn't contribute anything to PCA.
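
To make that concrete: a constant feature has zero covariance with every feature, itself included, so it cannot influence any principal component that explains variance. A minimal check on plain arrays, where `covarianceMatrix` is an illustrative helper rather than a library function:

```javascript
// Illustrative helper: sample covariance matrix of a row-major 2D array.
function covarianceMatrix(rows) {
  const n = rows.length;
  const p = rows[0].length;

  // Per-column means.
  const mean = new Array(p).fill(0);
  for (const row of rows) {
    for (let j = 0; j < p; j++) mean[j] += row[j];
  }
  for (let j = 0; j < p; j++) mean[j] /= n;

  // Accumulate products of deviations, using the n - 1 denominator.
  const cov = Array.from({ length: p }, () => new Array(p).fill(0));
  for (const row of rows) {
    for (let j = 0; j < p; j++) {
      for (let k = 0; k < p; k++) {
        cov[j][k] += ((row[j] - mean[j]) * (row[k] - mean[k])) / (n - 1);
      }
    }
  }
  return cov;
}

// The middle feature is constant, so row 1 and column 1 of the result are all zeros.
const example = covarianceMatrix([
  [1, 4, 2],
  [2, 4, 0],
  [3, 4, 4],
]);
```

Dropping such a column before the decomposition therefore leaves the remaining covariances unchanged.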
