
PCA: Error: [princomp] the covariance matrix has missing values #1441

Closed
Rapsodia86 opened this issue Feb 28, 2024 · 6 comments

Comments

@Rapsodia86

Hi @rhijmans,
I first added this as a comment to #1361, but after some additional checks I thought I had found the culprit in my dataset: some layers have only a few non-NA pixels, and when I tried to plot them, an "empty raster" error showed up.
So I created a new dataset and checked it before running PCA, but the error "[princomp] the covariance matrix has missing values" showed up again.

Problem:
I have a *.vrt with dimensions 1566, 1962, 90 (nrow, ncol, nlyr). Every layer contains NAs. I checked with freq(is.na()) and no layer consists only of NAs (see attached results). I also plotted each layer to make sure none of them triggers the empty-raster error.
Using
pca <- princomp(vrt_r)
I get
Error: [princomp] the covariance matrix has missing values

Could this be because some cells are NA in all layers?

NA_check_vrt.txt

I created a *.tif from the *.vrt, uploaded it to my work OneDrive, and gave your Gmail address access (I cannot make it publicly accessible):
https://michiganstate-my.sharepoint.com/:i:/g/personal/monikat_msu_edu/EY8K8bl37NNBjEa-W3X1DA8BLWfb8IlRONMTj3F-6FXbNw?email=r.hijmans%40gmail.com&e=W2qDyZ

Thanks!

@rhijmans
Member

rhijmans commented Mar 10, 2024

This can happen when, for at least one pair of layers, there are no cells that have values in both layers. For example:

x <- rast(ncol=10, nrow=10)
r1 <- setValues(x, 1:100)
r2 <- setValues(x, c(1:50, rep(NA, 50)))
r3 <- setValues(x, c(rep(NA, 50), 1:50))
r <- c(r1, r2, r3)
names(r) <- c("a", "b", "c")

layerCor(r, "cov", na.rm=TRUE)
#$covariance
#         a     b     c
#a 841.6667 212.5 212.5
#b 212.5000 212.5   NaN
#c 212.5000   NaN 212.5
#
#$mean
#     a    b    c
#a 50.5 25.5 75.5
#b 25.5 25.5  NaN
#c 25.5  NaN 25.5
#
#$n
#     [,1] [,2] [,3]
#[1,]  100   50   50
#[2,]   50   50    0
#[3,]   50    0   50

Is that the case with your data?

@Rapsodia86
Author

Do you mean that all cells in one layer are NA, or that some cells are NA in all layers?

@rhijmans
Member

Neither. See my example. All pairs of layers must have cells that are not NA in both layers.
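To check this condition before calling princomp(), one can count the overlapping non-NA cells for every pair of layers. The sketch below uses plain base R rather than terra and assumes the layer values are available as a matrix with one column per layer (terra's values() returns such a matrix, though for a large raster this can take a lot of memory). It rebuilds the three-layer example from this thread and shows that a per-layer NA check passes while the pairwise check does not.

```r
# Three "layers" from the example above, one column per layer.
m <- cbind(a = 1:100,
           b = c(1:50, rep(NA, 50)),
           c = c(rep(NA, 50), 1:50))

# Per-layer check: TRUE would mean a layer is entirely NA.
all_na_layer <- colSums(!is.na(m)) == 0
print(all_na_layer)  # all FALSE, so this check alone finds nothing wrong

# Pairwise check: with a 0/1 indicator of non-NA cells, crossprod() gives,
# for each pair of layers (i, j), the number of cells where both have values.
ok <- (!is.na(m)) + 0
n_pair <- crossprod(ok)
print(n_pair)

# Pairs with zero overlap have an undefined covariance (NA in base R).
v <- cov(m, use = "pairwise.complete.obs")
print(v)
```

A zero off-diagonal entry in n_pair marks the same layer pair that shows NaN in the covariance matrix returned by layerCor() in the example above.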

@Rapsodia86
Author

Rapsodia86 commented Mar 10, 2024

Yes, that is the case. I ran layerCor() on just the first 10 layers (of 90), and there are NaN values in $covariance and $mean, as in your example.
Here is $n:

$n
      [,1]   [,2] [,3] [,4] [,5]  [,6]    [,7]   [,8]   [,9]   [,10]
 [1,]   37      0    0    0    0    34      37      0      0      37
 [2,]    0 985949   65   33 1803 11840  963395  73850  36457  928551
 [3,]    0     65  518    0    0    88     360      0      0     422
 [4,]    0     33    0   41    0     0      41     38      0      41
 [5,]    0   1803    0    0 6843   279    6693   1304    420    6175
 [6,]   34  11840   88    0  279 73739   73046   3336    354   73277
 [7,]   37 963395  360   41 6693 73046 2835618 324832  99174 2663468
 [8,]    0  73850    0   38 1304  3336  324832 327853  23597  320619
 [9,]    0  36457    0    0  420   354   99174  23597 100516  100342
[10,]   37 928551  422   41 6175 73277 2663468 320619 100342 2706786

@rhijmans
Member

Then you need to remove some of these variables. Layers 1, 3, and 4 do not seem useful, as their non-NA cells hardly overlap with the non-NA cells of the other layers. But the variation in n among the other layer pairs is also very high. It is one thing to have a few missing values, but your data do not seem to be matched very well at all.
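Which layers to remove can be read off the $n matrix. The sketch below copies the 10 x 10 matrix posted above into base R, counts how many zero-overlap partners each layer has, and then applies a simple greedy heuristic: repeatedly drop the layer with the most partners below a minimum overlap until no such pair remains. drop_poor_overlap() and its min_n argument are hypothetical (not part of terra), and greedy removal is only one possible strategy; it does not guarantee that the largest usable set of layers is kept.

```r
# $n matrix for the first 10 layers, copied from the comment above.
n <- matrix(c(
  37,      0,   0,  0,    0,     34,       37,      0,      0,      37,
   0, 985949,  65, 33, 1803,  11840,   963395,  73850,  36457,  928551,
   0,     65, 518,  0,    0,     88,      360,      0,      0,     422,
   0,     33,   0, 41,    0,      0,       41,     38,      0,      41,
   0,   1803,   0,  0, 6843,    279,     6693,   1304,    420,    6175,
  34,  11840,  88,  0,  279,  73739,    73046,   3336,    354,   73277,
  37, 963395, 360, 41, 6693,  73046,  2835618, 324832,  99174, 2663468,
   0,  73850,   0, 38, 1304,   3336,   324832, 327853,  23597,  320619,
   0,  36457,   0,  0,  420,    354,    99174,  23597, 100516,  100342,
  37, 928551, 422, 41, 6175,  73277,  2663468, 320619, 100342, 2706786
), nrow = 10, byrow = TRUE)

# Number of other layers each layer shares no non-NA cells with.
zero_partners <- colSums(n == 0)
print(zero_partners)

# Hypothetical helper (not a terra function): greedily drop the layer with the
# most partners sharing fewer than min_n cells, until no such pair remains.
drop_poor_overlap <- function(n, min_n = 1) {
  keep <- seq_len(ncol(n))
  repeat {
    sub <- n[keep, keep, drop = FALSE]
    bad <- colSums(sub < min_n)
    if (all(bad == 0)) break
    keep <- keep[-which.max(bad)]  # on ties, the lowest-index layer is dropped
  }
  keep
}

keep <- drop_poor_overlap(n, min_n = 1)
print(keep)
```

With min_n = 1, this drops layers 1, 3, and 4, which matches the suggestion above. With terra, the remaining layers could then be selected with something like vrt_r[[keep]] before calling princomp(); for all 90 layers, the $n matrix would have to come from layerCor() on the full raster. Raising min_n is one way to also act on the very uneven overlap counts, but a sensible threshold depends on the data.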

@Rapsodia86
Author

Yes indeed, those layers need cleaning!
Thank you Robert!
