HiClimR other error #2

Closed
fipoucat opened this issue Nov 27, 2019 · 8 comments

@fipoucat

I am still testing HiClimR. After creating the matrix, I used xGrid2D to create lon and lat columns and appended them to x as follows:

lon <- c(xGrid$lon)
lat <- c(xGrid$lat)
x2<-cbind(lon,lat,x1)
print(x2)
        lon   lat 1961 1962 1963 1964 1965 1966
[1,] -30.25 -5.25   NA   NA   NA   NA   NA   NA
[2,] -30.25 -4.75   NA   NA   NA   NA   NA   NA
[3,] -30.25 -4.25   NA   NA   NA   NA   NA   NA
[4,] -30.25 -3.75   NA   NA   NA   NA   NA   NA

It looks like the column names should not be there? How should I do this? When I run an example of simple regionalization following the tutorial, I get an error:
Error in x - t(fitted(lm(t(x) ~ as.integer(colnames(x))))) :
non-conformable arrays
In addition: Warning message:
In eval(predvars, data, env) : NAs introduced by coercion

@fipoucat
Author

Maybe this is not an issue but rather a matter of handling the data to fit HiClimR's data structure. Would it be possible to attach a sample file?

@hsbadr
Owner

hsbadr commented Dec 10, 2019

The observations (time dimension) should not include any missing values. Since HiClimR clusters on correlation distance, every time step for a given location/point must be valid. HiClimR removes the rows (locations/points) that have any missing values, and that could be all rows if one or more years are missing. You need to remove all columns with missing values manually, because otherwise the dissimilarity measure (correlation distance) would represent something else. For example, if you are interested in interannual correlations, it is important to keep valid data for every year rather than providing information at irregular frequencies.

Solution: Make sure that you have enough rows (>2) with no missing values or handle missing values before passing the data to HiClimR.
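
As a rough illustration of handling missing values before the call, the base R sketch below counts NAs per year and per grid point and then drops either the affected years or the affected points. The toy matrix and the variable names (na_per_year, x_years, and so on) are assumptions made for this example; it does not call HiClimR and is not taken from the package.

```r
# Toy matrix: rows are grid points, columns are years; values are made up.
x <- matrix(c(10, 12, NA, 15,
              20, 22, 21, 25,
              30, NA, 31, 35,
              40, 42, 41, 45),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, as.character(1961:1964)))

# Count missing values per year (column) and per grid point (row).
na_per_year  <- colSums(is.na(x))
na_per_point <- rowSums(is.na(x))

# Option A: drop the years that contain any missing value.
x_years <- x[, na_per_year == 0, drop = FALSE]

# Option B: drop the grid points that contain any missing value.
x_points <- x[na_per_point == 0, , drop = FALSE]
```

Which option fits depends on whether the gaps are concentrated in a few years or in a few locations; printing na_per_year and na_per_point first shows where they are.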

@hsbadr hsbadr closed this as completed Dec 10, 2019
@fipoucat
Author

fipoucat commented Dec 11, 2019 via email

@fipoucat
Author

Sorry Hamada,

I had to update the post because R had inserted NAs where the values are actually zero. I fixed that, but some problems remain. The file now looks like this:
                           1961         1962         1963         1964         1965         1966
 [1,] -35.25 -9.75   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000   0.00000000
 [2,] -35.25 -9.25 151.78334045 135.20834351 129.99166870 161.87500000 111.40833282 121.28334045
 [3,] -35.25 -8.75 157.24166870 135.23333740 141.16667175 215.67500305 144.05833435 153.30000305
 [4,] -35.25 -8.25 161.11666870 129.23333740 139.60833740 189.05000305 129.08334351 168.54167175
 [5,] -35.25 -7.75 152.60833740 103.01667023 107.14167023 183.56666565 124.19166565 160.69166565
 [6,] -35.25 -7.25 140.84167480  98.59166718 115.30833435 219.06666565 128.42500305 145.00000000
 [7,] -35.25 -6.75 132.85000610  87.50833893 113.50833130 203.16667175 121.45833588 124.03333282
 [8,] -35.25 -6.25 115.58333588  77.18333435  93.69166565 183.38333130 121.65833282 110.37500000
 [9,] -35.25 -5.75  99.72499847  72.81666565  68.29167175 156.61666870 115.56666565  89.47500610
[10,] -35.25 -5.25  80.71666718  61.26666641  52.80000305 130.15834045  95.60833740  65.12500000
[11,] -35.25 -4.75 0

The command I use ends with an error:

y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE,
             continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE,
             standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL,
             members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE,
             validClimR = TRUE, k = 5, minSize = 1, alpha = 0.01,
             plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)

PROCESSING STARTED

Checking Multivariate Clustering (MVC)...
---> x is a matrix
---> single-variate clustering: 1 variable
Checking data...
---> Checking dimensions...
---> Checking row names...
---> Checking column names...
Data filtering...
---> Computing mean for each row...
---> Checking rows with mean bellow meanThresh...
---> 5697 rows found, mean ≤ 10
---> Computing variance for each row...
---> Checking rows with near-zero-variance...
---> 0 rows found, variance ≤ 0
Data preprocessing...
---> Applying mask...
---> Checking columns with missing values...
---> Removing linear trend...
Error in x - t(fitted(lm(t(x) ~ as.integer(colnames(x))))) :
non-conformable arrays

I extracted a region from a global dataset; is this a problem? I ask because I see you use a continent such as "Africa". How do I do it for a region? What do you think is still causing the non-conformable arrays error?

@hsbadr
Owner

hsbadr commented Dec 11, 2019

What's the size of your matrix? You set the mean threshold to 10, which masks out 5697 rows (try to use meanThresh = 0). Also, check the column names or try to use coarseR (change the steps as you wish, 1 means keeping the original data):

colnames(x) <- NULL
xc <- coarseR(x = x, lon = lon, lat = lat, lonStep = 1, latStep = 1)
lon <- xc$lon
lat <- xc$lat
x <- xc$x

Finally, disable standardization and detrending: detrend = FALSE, standardize = FALSE.

It seems to me that HiClimR can't find valid rows in the matrix you provided.
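
One plausible reading of the "non-conformable arrays" error, sketched in base R only: the detrending expression shown in the error message converts the column names to integers, so any column whose name is not a year (for example lon and lat bound onto the data with cbind) becomes NA, lm() drops those time steps, and the fitted matrix ends up narrower than x. The toy data and names below (x1, x2, yr, fit) are assumptions for illustration; this reproduces the expression from the error message, not HiClimR's full processing.

```r
# Toy data: 4 grid points (rows) x 6 years (columns); values are made up.
set.seed(1)
years <- 1961:1966
x1 <- matrix(runif(4 * length(years), 50, 200), nrow = 4,
             dimnames = list(NULL, as.character(years)))
lon <- rep(-35.25, 4)
lat <- c(-9.25, -8.75, -8.25, -7.75)

# Binding lon/lat onto the data adds two column names that are not years.
x2 <- cbind(lon, lat, x1)
yr <- suppressWarnings(as.integer(colnames(x2)))  # NA for "lon" and "lat"

# lm() omits the NA time steps, so the fitted matrix is two columns short.
fit <- t(fitted(lm(t(x2) ~ yr)))
dim(x2)   # 4 8
dim(fit)  # 4 6
res <- try(x2 - fit, silent = TRUE)  # fails: non-conformable arrays

# Keeping coordinates in separate vectors avoids the mismatch.
yr1 <- as.integer(colnames(x1))
detrended <- x1 - t(fitted(lm(t(x1) ~ yr1)))
```

If this is what happened, passing the data matrix without the coordinate columns, with lon and lat supplied as separate vectors of matching length, should let detrending run.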

@hsbadr hsbadr reopened this Dec 11, 2019
@fipoucat
Author

With the settings you gave, it ran without error and produced a plot. My file has 57 years of rainfall data for a window of -10 to 25 lat and -30 to -25 lon.

y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE,
             continent = "Africa", meanThresh = 0, varThresh = 0, detrend = FALSE,
             standardize = FALSE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL,
             members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE,
             validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01,
             plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)

PROCESSING STARTED

Checking Multivariate Clustering (MVC)...
---> x is a matrix
---> single-variate clustering: 1 variable
Checking data...
---> Checking dimensions...
---> Checking row names...
---> Checking column names...
Data filtering...
---> Computing mean for each row...
---> Checking rows with mean bellow meanThresh...
---> 3735 rows found, mean ≤ 0
---> Computing variance for each row...
---> Checking rows with near-zero-variance...
---> 0 rows found, variance ≤ 0
Data preprocessing...
---> Applying mask...
---> Checking columns with missing values...
Agglomerative Hierarchical Clustering...
---> Computing correlation/dissimilarity matrix...
---> Starting clustering process...
---> Constructing dendrogram tree...
Calling cluster validation...
---> Computing cluster means...
---> Computing inter-cluster correlations...
---> Computing intra-cluster correlations...
---> Computing summary statistics...
Generating region map...

PROCESSING COMPLETED

Running Time:
user system elapsed
5.585 0.518 6.109
Time difference of 6.109582 secs

Maybe I need to adjust the settings so that more rows are considered.

@hsbadr
Owner

hsbadr commented Dec 11, 2019

You should be careful when setting thresholds for data processing. For example, meanThresh will mask out the points that receive less rainfall than the threshold value, which could be all of your data depending on the threshold value and the data range/unit. Invalid data with near-zero variance (roughly constant from year to year) will be excluded too.
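
A quick way to preview the effect of a threshold before running the clustering is to compute the row means and variances yourself. The base R sketch below mirrors the comparisons printed in the verbose log (mean ≤ threshold, variance ≤ threshold); the toy matrix and the names row_mean, row_var, flagged, and constant are assumptions for this example, and it is not HiClimR's own filtering code.

```r
# Toy matrix: 5 grid points x 4 years of rainfall; values are made up.
x <- matrix(c(  0,   0,   0,   0,
                5,   8,   6,   9,
              150, 135, 130, 160,
               80,  61,  53, 130,
               12,  12,  12,  12),
            nrow = 5, byrow = TRUE)

row_mean <- rowMeans(x)
row_var  <- apply(x, 1, var)

# Number of points each candidate mean threshold would flag.
flagged <- sapply(c(0, 10), function(th) sum(row_mean <= th))

# Points whose series is constant from year to year (zero variance).
constant <- which(row_var <= 0)
```

Comparing the flagged counts with nrow(x) shows how much of the domain a given threshold would mask.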

I'm closing this issue now.

@hsbadr hsbadr closed this as completed Dec 11, 2019