HiClimR
HiClimR: Hierarchical Climate Regionalization
Table of Contents
- HiClimR
  - Introduction
  - Features
  - Implementation
  - Installation
  - Source
  - License
  - Citation
  - History
  - Changes
    - 2020-02-22: version 2.1.6
    - 2019-12-10: version 2.1.5
    - 2019-01-20: version 2.1.4
    - 2019-01-10: version 2.1.3
    - 2019-01-04: version 2.1.2
    - 2019-01-02: version 2.1.1
    - 2019-01-01: version 2.1.0
    - 2018-12-22: version 2.0.0
    - 2015-08-05: version 1.2.3
    - 2015-07-21: version 1.2.2
    - 2015-05-24: version 1.2.1
    - 2015-03-27: version 1.2.0
    - 2015-03-01: version 1.1.6
    - 2014-11-12: version 1.1.5
    - 2014-09-01: version 1.1.4
    - 2014-08-28: version 1.1.3
    - 2014-07-26: version 1.1.2
    - 2014-07-14: version 1.1.1
    - 2014-05-15: version 1.1.0
    - 2014-05-07: version 1.0.9
    - 2014-05-06: version 1.0.8
    - 2014-03-30: version 1.0.7
    - 2014-03-25: version 1.0.6
    - 2014-03-18: version 1.0.5
    - 2014-03-14: version 1.0.4
    - 2014-03-12: version 1.0.3
    - 2014-03-09: version 1.0.2
    - 2014-03-08: version 1.0.1
    - 2014-03-07: version 1.0.0
  - Examples
Introduction
HiClimR is a tool for Hierarchical Climate Regionalization applicable to any correlation-based clustering. Climate regionalization is the process of dividing an area into smaller regions that are homogeneous with respect to a specified climatic metric. The package adds several features that facilitate climate regionalization (and spatiotemporal analysis in general), including a cluster validation function with an objective tree cut to find the optimal number of clusters at a user-specified confidence level. It provides options for preprocessing and postprocessing, efficient code execution for large datasets, and options for splitting big data and computing only the upper-triangular half of the correlation/dissimilarity matrix to overcome memory limitations. Hybrid hierarchical clustering reconstructs the upper part of the tree above a cut to combine the strengths of the available methods. Multivariate clustering (MVC) provides options for filtering all variables before preprocessing, detrending and standardizing each variable, and applying weights to the preprocessed variables.
Features
HiClimR adds several features and a new clustering method (called regional linkage) to hierarchical clustering in R (the `hclust` function in the `stats` library), including:
- data regridding
- coarsening spatial resolution
- geographic masking
  - by continents
  - by regions
  - by countries
- contiguity-constrained clustering
- data filtering by thresholds
  - mean threshold
  - variance threshold
- data preprocessing
  - detrending
  - standardization
  - PCA
- faster correlation function
- splitting big data matrix
- computing upper-triangular matrix (see the sketch after this list)
- using an optimized `BLAS` library on 64-bit machines
  - `ATLAS`
  - `OpenBLAS`
  - `Intel MKL`
- different clustering methods
  - `regional` linkage or minimum inter-regional correlation
  - `ward`'s minimum variance or error sum of squares method
  - `single` linkage or nearest neighbor method
  - `complete` linkage or diameter
  - `average` linkage, group average, or UPGMA method
  - `mcquitty`'s or WPGMA method
  - `median`, Gower's or WPGMC method
  - `centroid` or UPGMC method
- hybrid hierarchical clustering
  - the upper part of the tree is reconstructed above a cut
  - the lower part of the tree uses the user-selected method
  - the upper part of the tree uses the `regional` linkage method
- multivariate clustering (MVC)
  - filtering all variables before preprocessing
  - detrending and standardization of each variable
  - applying weights to the preprocessed variables
- cluster validation
  - summary statistics based on raw data or the data reconstructed by PCA
  - objective tree cut using minimum significant correlation between region means
- visualization of regionalization results
- exporting region map and mean timeseries into NetCDF-4
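As referenced in the list above, here is a minimal sketch (not from the package documentation) of the faster correlation options, assuming the packaged `TestCase` data and the `nSplit`/`upperTri` parameters of `fastCor` described in the changelog below:

library(HiClimR)
## Stations (grid points) in rows, time steps in columns
x <- TestCase$x
## Full correlation matrix between stations
xcor <- fastCor(t(x))
## Split the data into chunks and compute only the upper-triangular half;
## the matrix is symmetric, so no information is lost and memory use is roughly halved
xcorUT <- fastCor(t(x), nSplit = 2, upperTri = TRUE)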
The regional linkage method is explained in the context of a spatiotemporal problem, in which N spatial elements (e.g., weather stations) are divided into k regions, given that each element has a time series of length M. It is based on the inter-regional correlation distance between the temporal means of different regions (or elements, at the first merging step). It modifies the update formulae of the average linkage method by incorporating the standard deviation of the merged region timeseries, which is a function of the correlation between the individual regions and of their standard deviations before merging. It is equal to the average of their standard deviations if and only if the correlation between the two merged regions is 100%. In this special case, the regional linkage method reduces to the classic average linkage clustering method.
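To make the standard-deviation argument concrete, here is a small illustrative sketch in plain R (not package code): the standard deviation of the mean of two merged timeseries depends on their correlation, and it equals the average of their individual standard deviations only when that correlation is 1.

## Two synthetic region-mean timeseries with moderate correlation
set.seed(1)
a <- rnorm(100)
b <- 0.5 * a + sqrt(1 - 0.5^2) * rnorm(100)
## Standard deviation of the merged-region mean timeseries
sd((a + b) / 2)
## Average of the individual standard deviations (larger unless cor(a, b) = 1)
(sd(a) + sd(b)) / 2
## With perfect correlation (b = a), the two quantities coincide
sd((a + a) / 2) == sd(a)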
Implementation
Badr et al. (2015) describe the regionalization algorithms, features, and data processing tools included in the package and present a demonstration application in which the package is used to regionalize Africa on the basis of interannual precipitation variability. The figure below shows a detailed flowchart for the package. Cyan blocks represent helper functions, green blocks represent input data or parameters, yellow indicates the agglomeration Fortran code, and purple shows graphics options. For multivariate clustering (MVC), the input data is a list of matrices (one matrix for each variable, with the same number of rows to be clustered; the number of columns may vary per variable). The blue dashed boxes involve a loop over all variables to apply mean and/or variance thresholds, detrending, and/or standardization per variable before weighting the preprocessed variables and binding them by columns into one matrix for clustering. x is the input N x M data matrix; xc is the coarsened N0 x M data matrix, where N0 ≤ N (N0 = N only if lonStep = 1 and latStep = 1); xm is the masked and filtered N1 x M1 data matrix, where N1 ≤ N0 (N1 = N0 only if no stations/points are masked) and M1 ≤ M (M1 = M only if no columns are removed due to missing values); and x1 is the reconstructed N1 x M1 data matrix if PCA is performed.
HiClimR is applicable to any correlation-based clustering.
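As a quick check of the dimension bookkeeping described above, a minimal sketch (using the packaged `TestCase` data and the same `grid2D`/`coarseR` calls shown in the Examples section) confirms that coarsening returns N0 ≤ N rows while keeping M columns:

library(HiClimR)
## N x M input matrix: stations (grid points) in rows, time steps in columns
x <- TestCase$x
xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat))
## Coarsen by a factor of 2 in longitude and latitude: xc$x is N0 x M with N0 <= N
xc <- coarseR(x = x, lon = c(xGrid$lon), lat = c(xGrid$lat), lonStep = 2, latStep = 2)
c(N = nrow(x), N0 = nrow(xc$x), M = ncol(x))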
Installation
There are many ways to install an R package from precompiled binaries or source code. For more details, you may search for how to install an R package, but here are the most convenient ways to install HiClimR:
From CRAN
This is the easiest way to install an R package on Windows, Mac, or Linux. You just fire up an R shell and type:
install.packages("HiClimR")In theory the package should just install, however, you may be asked to select your local mirror (i.e. which server should you use to download the package). If you are using R-GUI or R-Studio, you can find a menu for package installation where you can just search for HiClimR and install it.
From GitHub
This is intended for developers and requires a development environment (compilers, libraries, etc.) to install the latest development release of HiClimR. On Linux and Mac, you can download the source code and use R CMD INSTALL to install it. More conveniently, you may use devtools as follows:
- Install the release version of `devtools` from CRAN:
  `install.packages("devtools")`
- Make sure you have a working development environment:
  - Windows: Install `Rtools`.
  - Mac: Install Xcode from the Mac App Store.
  - Linux: Install a compiler and various development libraries (details vary across different flavors of Linux).
- Install `HiClimR` from GitHub source:
  `devtools::install_github("hsbadr/HiClimR")`

Source
The source code repository can be found on GitHub at hsbadr/HiClimR.
License
HiClimR is licensed under GPL-2 | GPL-3. The code is modified by Hamada S. Badr from src/library/stats/R/hclust.R, part of the R stats package, Copyright © 1995-2020 The R Core Team.
- This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
- This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
A copy of the GNU General Public License is available at https://www.r-project.org/Licenses.
Copyright © 2013-2020 Earth and Planetary Sciences (EPS), Johns Hopkins University (JHU).
Citation
To cite HiClimR in publications, please use:
citation("HiClimR")Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2015): A Tool for Hierarchical Climate Regionalization, Earth Science Informatics, 8(4), 949-958, https://doi.org/10.1007/s12145-015-0221-7.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2014): HiClimR: Hierarchical Climate Regionalization, Comprehensive R Archive Network (CRAN), https://cran.r-project.org/package=HiClimR.
History
| Version | Date | Comment | Author | Email |
|---|---|---|---|---|
| | May 1992 | Original | F. Murtagh | |
| | Dec 1996 | Modified | Ross Ihaka | |
| | Apr 1998 | Modified | F. Leisch | |
| | Jun 2000 | Modified | F. Leisch | |
| 1.0.0 | 03/07/14 | HiClimR | Hamada S. Badr | badr@jhu.edu |
| 1.0.1 | 03/08/14 | Updated | Hamada S. Badr | badr@jhu.edu |
| 1.0.2 | 03/09/14 | Updated | Hamada S. Badr | badr@jhu.edu |
| 1.0.3 | 03/12/14 | Updated | Hamada S. Badr | badr@jhu.edu |
| 1.0.4 | 03/14/14 | Updated | Hamada S. Badr | badr@jhu.edu |
| 1.0.5 | 03/18/14 | Updated | Hamada S. Badr | badr@jhu.edu |
| 1.0.6 | 03/25/14 | Updated | Hamada S. Badr | badr@jhu.edu |
| 1.0.7 | 03/30/14 | Hybrid | Hamada S. Badr | badr@jhu.edu |
| 1.0.8 | 05/06/14 | Updated | Hamada S. Badr | badr@jhu.edu |
| 1.0.9 | 05/07/14 | CRAN | Hamada S. Badr | badr@jhu.edu |
| 1.1.0 | 05/15/14 | Updated | Hamada S. Badr | badr@jhu.edu |
| 1.1.1 | 07/14/14 | Updated | Hamada S. Badr | badr@jhu.edu |
| 1.1.2 | 07/26/14 | Updated | Hamada S. Badr | badr@jhu.edu |
| 1.1.3 | 08/28/14 | Updated | Hamada S. Badr | badr@jhu.edu |
| 1.1.4 | 09/01/14 | Updated | Hamada S. Badr | badr@jhu.edu |
| 1.1.5 | 11/12/14 | Updated | Hamada S. Badr | badr@jhu.edu |
| 1.1.6 | 03/01/15 | GitHub | Hamada S. Badr | badr@jhu.edu |
| 1.2.0 | 03/27/15 | MVC | Hamada S. Badr | badr@jhu.edu |
| 1.2.1 | 05/24/15 | Updated | Hamada S. Badr | badr@jhu.edu |
| 1.2.2 | 07/21/15 | Updated | Hamada S. Badr | badr@jhu.edu |
| 1.2.3 | 08/05/15 | Updated | Hamada S. Badr | badr@jhu.edu |
| 2.0.0 | 12/22/18 | NOTE | Hamada S. Badr | badr@jhu.edu |
| 2.1.0 | 01/01/19 | NetCDF | Hamada S. Badr | badr@jhu.edu |
| 2.1.1 | 01/02/19 | Updated | Hamada S. Badr | badr@jhu.edu |
| 2.1.2 | 01/04/19 | Updated | Hamada S. Badr | badr@jhu.edu |
| 2.1.3 | 01/10/19 | Updated | Hamada S. Badr | badr@jhu.edu |
| 2.1.4 | 01/20/19 | Updated | Hamada S. Badr | badr@jhu.edu |
| 2.1.5 | 12/10/19 | inherits | Hamada S. Badr | badr@jhu.edu |
| 2.1.6 | 02/22/20 | Updated | Hamada S. Badr | badr@jhu.edu |
Changes
2020-02-22: version 2.1.6
- README: Added CRAN downloads badge
- R: Fixed non-informative failure for unsupported vector input
2019-12-10: version 2.1.5
- R: Use `inherits()` to check class inheritance
2019-01-20: version 2.1.4
- Added vignette for HiClimR Bug Reporting
- `HiClimR2nc`: Updated documentation and examples
- man: Use `\code{}` instead of `\bold{}` for classes
2019-01-10: version 2.1.3
- Fixed spelling errors and allowed custom words
- `HiClimR2nc`: Fixed timeseries variable definition
- README: Link `HiClimR` to CRAN package page
2019-01-04: version 2.1.2
- Fixed example ERROR in CRAN checks
- Added example to export NetCDF-4 file
- Updated dependencies and suggested packages
2019-01-02: version 2.1.1
- `fastCor`: Fixed row/col names of the correlation matrix
- `fastCor`: Cleaned up zero-variance data check
- Examples: Minor comment update
2019-01-01: version 2.1.0
- Supported contiguity constraint based on geographic distance
- Exporting region map and mean timeseries into NetCDF-4 file
- Replaced `multi-variate` with `multivariate`
- Renamed `weightedVar` to `weightMVC`
- Updated citation information
- Updated and cleaned up package `DESCRIPTION`
- Updated and cleaned up `README`
2018-12-22: version 2.0.0
- Fixed NOTE: Registering native routines
- `fastCor`: Removed zero-variance data
- `fastCor`: Introduced `optBLAS`
- `fastCor`: Code cleanup
- Reformatted R source code
- Updated and fixed the examples
- Updated CRU TS dataset citation
- Updated `README` and all URLs
2015-08-05: version 1.2.3
- Fixed `geogMask` confusing country codes/names
- Fixed `geogMask` filtering of `InDispute` areas
- Corrected data construction in the user manual
  - `x` should be created using `as.vector(t(x0))`
  - `x0` is the n-by-m original data matrix
  - `n = length(unique(lon))` and `m = length(unique(lat))`
- `coarseR` now returns the original row numbers
- Minor `README` corrections and updates
2015-07-21: version 1.2.2
- Changes for `Undefined global functions`
- Checking geographic masking output
- Minor `README` corrections and updates
2015-05-24: version 1.2.1
- Updating variance for multivariate clustering
- More plotting options (`pch` and `cex`)
- `geogMask` supports ungridded data
- Updated user manual with the following notes:
  - longitudes take values from `-180` to `180` (not `0` to `360`)
  - for gridded data, the rows of the input data matrix for each variable are ordered by longitudes
    - check `rownames(TestCase$x)` for example!
  - each row represents a station (grid point)
    - row name is in the form of `longitude,latitude`
- Minor `verbose` fixes and updates
- Minor `README` corrections and updates
- Citation updated: technical paper has been published
2015-03-27: version 1.2.0
- Multivariate clustering (MVC)
  - the input matrix `x` can now be a list of matrices (one matrix for each variable)
    - `length(x) = nvars`, where `nvars` is the number of variables
    - number of rows `N` = number of objects (e.g., stations) to be clustered
    - number of columns `M` may vary for each variable
      - e.g., different temporal periods or record lengths
  - each variable is separately preprocessed to allow for all possible options
    - preprocessing is specified by lists with length of `nvars` (number of variables)
      - `length(meanThresh) = length(x) = nvars`
      - `length(varThresh) = length(x) = nvars`
      - `length(detrend) = length(x) = nvars`
      - `length(standardize) = length(x) = nvars`
      - `length(weightMVC) = length(x) = nvars`
    - filtering with `meanThresh` and `varThresh` thresholds
    - detrending with the `detrend` option, if requested
    - standardization with the `standardize` option, if requested
      - strongly recommended since variables may have different magnitudes
    - weighting by the new `weightMVC` option (default is `1`)
    - combining variables by column (for each object: spatial points or stations)
    - applying PCA (if requested) and computing the correlation/dissimilarity matrix
- Preliminary big data support
  - the `fastCor` function can now split the data matrix into `nSplit` splits
  - adds a logical parameter `upperTri` to the `fastCor` function
    - computes only the upper-triangular half of the correlation/dissimilarity matrix
    - it includes all required information since the correlation/dissimilarity matrix is symmetric
    - this almost halves memory use, which can be very important for big data
    - fixes "integer overflow" for a very large number of objects to be clustered
- Adds a logical parameter `verbose` for printing processing information
- Adds a logical parameter `dendrogram` for plotting the dendrogram
- Uses `\dontrun{}` to skip time-consuming examples
  - for more examples: https://github.com/hsbadr/HiClimR#examples
- Backward compatibility with previous versions
- The user manual is updated and revised
2015-03-01: version 1.1.6
- Setting minimum `k = 2` for objective tree cutting
  - this addresses an issue caused by undefined `k = NULL` in the `validClimR` function
  - when all inter-cluster correlations are significant at the user-specified significance level
- Code reformatting using `formatR`
- Package description and URLs have been revised
- Source code is now maintained on GitHub by authors
2014-11-12: version 1.1.5
- Updating description, URL, and citation info
2014-09-01: version 1.1.4
- Addresses an issue for a zero-length mask vector: `Error in -mask : invalid argument to unary operator`
  - this error was introduced in v1.1.2+ after fixing the data-mean bug
2014-08-28: version 1.1.3
- The user manual is revised
- `lonSkip` and `latSkip` renamed to `lonStep` and `latStep`, respectively
- Minor bug fixes
2014-07-26: version 1.1.2
- A bug has been fixed where the data mean was added to centered data if `standardize = FALSE`
  - objective tree cut and `data` component are now corrected
    - to match input parameters, especially when clustering raw data
    - centered data was used in previous versions
2014-07-14: version 1.1.1
- Minor bug fixes and memory optimizations, especially for the geographic masking function `geogMask`
- The limit for data size has been removed (use with caution)
- A logical parameter `InDispute` is added to the `geogMask` function to optionally consider areas in dispute for geographic masking by country
2014-05-15: version 1.1.0
- Code cleanup and bug fixes
- An issue with the `fastCor` function that degrades its performance on 32-bit machines has been fixed
  - A significant performance improvement can only be achieved when building R on 64-bit machines with an optimized `BLAS` library, such as `ATLAS`, `OpenBLAS`, or the commercial `Intel MKL`
- The citation info has been updated to reflect the current status of the technical paper
2014-05-07: version 1.0.9
- Minor changes and fixes for CRAN
- For memory considerations:
  - smaller test case with 1 degree resolution instead of 0.5 degree
  - the resolution option (`res` parameter) in geographic masking is removed
  - mask data is only available in 0.1 degree (~10 km) resolution
- `LazyLoad` and `LazyData` are enabled in the description file
- The `worldMask` and `TestCase` data are converted to lists to avoid conflicts of variable names (`lon`, `lat`, `info`, and `mask`) with lazy loading
2014-05-06: version 1.0.8
- Code cleanup and bug fixes
- Region maps are unified for both gridded and ungridded data
2014-03-30: version 1.0.7
- Hybrid hierarchical clustering feature that combines the strengths of the available methods
  - especially the better overall homogeneity of Ward's method and the separation and objective tree cut of the regional linkage method
- The logical parameter `hybrid` is added to enable a second clustering step
  - using `regional` linkage for reconstructing the upper part of the tree at a cut
    - defined by `kH` (the number of clusters to restart with using the `regional` linkage method)
  - If `kH = NULL`, the tree will be reconstructed for the upper part with the first merging cost larger than the mean merging cost for the entire tree
    - merging cost is the loss of overall homogeneity at each merging step
- If hybrid clustering is requested, the updated upper part of the tree will be used for cluster validation
2014-03-25: version 1.0.6
- Code cleanup and bug fixes
2014-03-18: version 1.0.5
- Code cleanup and bug fixes
- Adds support to generate region maps for ungridded data
2014-03-14: version 1.0.4
- Code cleanup and bug fixes
- The `coarseR` function is called inside the core `HiClimR` function
- Adds a `coords` component to the output tree for the longitude and latitude coordinates
  - they may be changed by coarsening
- The `validClimR` function does not require `lon` and `lat` arguments
  - they are now available in the output tree (`coords` component)
2014-03-12: version 1.0.3
- Code cleanup and bug fixes
- One main/wrapper function `HiClimR` internally calls all other functions
- Unified component names for all functions
- Objective tree cut is supported only for the `regional` linkage method
  - Otherwise, the number of clusters `k` should be specified
- The new clustering method has been renamed from `HiClimR` to the `regional` linkage method
2014-03-09: version 1.0.2
- Code cleanup and bug fixes
- Adds a new feature to return the preprocessed data used for clustering, via a logical argument `retData`
  - the data will be returned in a `data` component of the output tree
  - this can be used to utilize `HiClimR` preprocessing options for further analysis
- Ordered regions vector for the selected number of clusters is now returned in the `region` component
  - length equals the number of spatial elements `N`
2014-03-08: version 1.0.1
- Code cleanup and bug fixes
- Adds a new feature in `validClimR` that enables users to exclude very small clusters from the validation indices `interCor`, `intraCor`, `diffCor`, and `statSum`, by setting a value for the minimum cluster size (`minSize > 1`)
  - the excluded clusters can be identified from the `clustFlag` component of the `validClimR` output, which takes a value of `1` for valid clusters or `0` for excluded clusters
  - in the `HiClimR` (currently, `regional` linkage) method, noisy spatial elements (or stations) are isolated in very small clusters or as individuals since they do not correlate well with any other elements
  - this should be followed by a quality control step
- Adds a `coarseR` function for coarsening the spatial resolution of the input matrix `x`
2014-03-07: version 1.0.0
- Initial version of the `HiClimR` package that modifies the `hclust` function in the `stats` library
- Adds a new clustering method to the set of available methods
  - The new method is explained in the context of a spatiotemporal problem, in which `N` spatial elements (e.g., stations) are divided into `k` regions, given that each element has observations (or timeseries) of length `M`
  - minimizes the inter-regional correlation between region means
  - modifies the `average` update formulae by incorporating the standard deviation of the mean of the merged region
    - a function of the correlation between the individual regions, and their standard deviations before merging
    - equals the average of their standard deviations if and only if the correlation between the two merged regions is `100%`
    - in this special case, the new method is reduced to the classic `average` linkage clustering method
- Several features are included to facilitate spatiotemporal analysis applications:
  - options for preprocessing and postprocessing
  - efficient code execution for large datasets
  - cluster validation function `validClimR`
  - implements an objective tree cut to find an optimal number of clusters
- Applicable to any correlation-based clustering
Examples
Single-Variate Clustering
library(HiClimR)
#----------------------------------------------------------------------------------#
# Typical use of HiClimR for single-variate clustering: #
#----------------------------------------------------------------------------------#
## Load the test data included/loaded in the package (1 degree resolution)
x <- TestCase$x
lon <- TestCase$lon
lat <- TestCase$lat
## Generate/check longitude and latitude mesh vectors for gridded data
xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat))
lon <- c(xGrid$lon)
lat <- c(xGrid$lat)
## Single-Variate Hierarchical Climate Regionalization
y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE,
continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE,
standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL,
members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE,
validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01,
plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)
#----------------------------------------------------------------------------------#
# Additional Examples: #
#----------------------------------------------------------------------------------#
## Use Ward's method
y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE,
continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE,
standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL,
members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE,
validClimR = TRUE, k = 5, minSize = 1, alpha = 0.01,
plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)
## Use data splitting for big data
y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE,
continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE,
standardize = TRUE, nPC = NULL, method = "ward", hybrid = TRUE, kH = NULL,
members = NULL, nSplit = 10, upperTri = TRUE, verbose = TRUE,
validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01,
plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)
## Use hybrid Ward-Regional method
y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE,
continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE,
standardize = TRUE, nPC = NULL, method = "ward", hybrid = TRUE, kH = NULL,
members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE,
validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01,
plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)
## Check sensitivity to kH for the hybrid method above

Multivariate Clustering
require(HiClimR)
#----------------------------------------------------------------------------------#
# Typical use of HiClimR for multivariate clustering: #
#----------------------------------------------------------------------------------#
## Load the test data included/loaded in the package (1 degree resolution)
x1 <- TestCase$x
lon <- TestCase$lon
lat <- TestCase$lat
## Generate/check longitude and latitude mesh vectors for gridded data
xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat))
lon <- c(xGrid$lon)
lat <- c(xGrid$lat)
## Test if we can replicate single-variate region map with repeated variable
y <- HiClimR(x=list(x1, x1), lon = lon, lat = lat, lonStep = 1, latStep = 1,
geogMask = FALSE, continent = "Africa", meanThresh = list(10, 10),
varThresh = list(0, 0), detrend = list(TRUE, TRUE), standardize = list(TRUE, TRUE),
nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL,
members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE,
validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01,
plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)
## Generate a random matrix with the same number of rows
x2 <- matrix(rnorm(nrow(x1) * 100, mean=0, sd=1), nrow(x1), 100)
## Multivariate Hierarchical Climate Regionalization
y <- HiClimR(x=list(x1, x2), lon = lon, lat = lat, lonStep = 1, latStep = 1,
geogMask = FALSE, continent = "Africa", meanThresh = list(10, NULL),
varThresh = list(0, 0), detrend = list(TRUE, FALSE), standardize = list(TRUE, TRUE),
weightMVC = list(1, 1), nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL,
members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE,
validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01,
plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)
## You can apply all clustering methods and options

Miscellaneous Examples
require(HiClimR)
#----------------------------------------------------------------------------------#
# Miscellaneous examples to provide more information about functionality and usage #
# of the helper functions that can be used separately or for other applications.   #
#----------------------------------------------------------------------------------#
## Load test case data
x <- TestCase$x
## Generate longitude and latitude mesh vectors
xGrid <- grid2D(lon = unique(TestCase$lon), lat = unique(TestCase$lat))
lon <- c(xGrid$lon)
lat <- c(xGrid$lat)
## Coarsening spatial resolution
xc <- coarseR(x = x, lon = lon, lat = lat, lonStep = 2, latStep = 2)
lon <- xc$lon
lat <- xc$lat
x <- xc$x
## Use fastCor function to compute the correlation matrix
t0 <- proc.time(); xcor <- fastCor(t(x)); proc.time() - t0
## compare with cor function
t0 <- proc.time(); xcor0 <- cor(t(x)); proc.time() - t0
## Check the valid options for geographic masking
geogMask()
## geographic mask for Africa
gMask <- geogMask(continent = "Africa", lon = lon, lat = lat, plot = TRUE,
colPalette = NULL)
## Hierarchical Climate Regionalization without geographic masking
y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = FALSE,
continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE,
standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL,
members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE,
validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01,
plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)
## With geographic masking (you may specify the mask produced above to save time)
y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = TRUE,
continent = "Africa", meanThresh = 10, varThresh = 0, detrend = TRUE,
standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL,
members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE,
validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01,
plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)
## With geographic masking and contiguity constraint
## Change contigConst as appropriate
y <- HiClimR(x, lon = lon, lat = lat, lonStep = 1, latStep = 1, geogMask = TRUE,
continent = "Africa", contigConst = 1, meanThresh = 10, varThresh = 0, detrend = TRUE,
standardize = TRUE, nPC = NULL, method = "ward", hybrid = FALSE, kH = NULL,
members = NULL, nSplit = 1, upperTri = TRUE, verbose = TRUE,
validClimR = TRUE, k = 12, minSize = 1, alpha = 0.01,
plot = TRUE, colPalette = NULL, hang = -1, labels = FALSE)
## Find minimum significant correlation at 95% confidence level
rMin <- minSigCor(n = nrow(x), alpha = 0.05, r = seq(0, 1, by = 1e-06))
## Validation of Hierarchical Climate Regionalization
z <- validClimR(y, k = 12, minSize = 1, alpha = 0.01, plot = TRUE, colPalette = NULL)
## Apply minimum cluster size (minSize = 25)
z <- validClimR(y, k = 12, minSize = 25, alpha = 0.01, plot = TRUE, colPalette = NULL)
## The optimal number of clusters, including small clusters
k <- length(z$clustFlag)
## The selected number of clusters, after excluding small clusters (if minSize > 1)
ks <- sum(z$clustFlag)
## Dendrogram plot
plot(y, hang = -1, labels = FALSE)
## Tree cut
cutTree <- cutree(y, k = k)
table(cutTree)
## Visualization for gridded data
RegionsMap <- matrix(y$region, nrow = length(unique(y$coords[, 1])), byrow = TRUE)
colPalette <- colorRampPalette(c("#00007F", "blue", "#007FFF", "cyan",
"#7FFF7F", "yellow", "#FF7F00", "red", "#7F0000"))
image(unique(y$coords[, 1]), unique(y$coords[, 2]), RegionsMap, col = colPalette(ks))
## Visualization for gridded or ungridded data
plot(y$coords[, 1], y$coords[, 2], col = colPalette(max(y$region, na.rm = TRUE))[y$region], pch = 15, cex = 1)
## Change pch and cex as appropriate!
## Export region map and mean timeseries into NetCDF-4 file
library(ncdf4)
y.nc <- HiClimR2nc(y=y, ncfile="HiClimR.nc", timeunit="years", dataunit="mm")
## The NetCDF-4 file is still open to add other variables or close it
nc_close(y.nc)
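As an optional check (not part of the package examples), the exported file can be reopened with `ncdf4` and summarized; the exact variable names are whatever `HiClimR2nc` wrote, so only the file summary is printed here:

library(ncdf4)
## Reopen the exported file and list its dimensions and variables
nc <- nc_open("HiClimR.nc")
print(nc)
nc_close(nc)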
