This function calculates the sequence difference (SD) view metric. For a point $i$, the formula is:

$$ SD_k(i) = \frac{1}{2} \sum_{j \in V^l_k(i)} [k-\rho^l_i(j)] \cdot |\rho^l_i(j)-\rho^h_i(j)| + \frac{1}{2} \sum_{j \in V^h_k(i)} [k-\rho^h_i(j)] \cdot |\rho^l_i(j)-\rho^h_i(j)| \label{EqSD} $$

where $V^d_k(i)$ is the $k$-neighborhood of $i$ in space $d$ ($l$ for the low-dimensional space, $h$ for the high-dimensional space), and $\rho^d_i(j)$ is the rank of $j$ among the neighbors of $i$ in space $d$.
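As an illustration, the formula can be evaluated for a single point in base R. This is a minimal sketch, not the package’s implementation: it assumes Euclidean distances, ranks starting at 1 for the nearest neighbor with ties broken by row order, and that $i$ is excluded from its own neighborhood. The helper name `sd_point` is hypothetical.

```r
# Minimal sketch of SD_k(i) for one point (hypothetical helper, not the package API).
# Assumptions: Euclidean distances, rank 1 = nearest neighbour, ties broken by
# row order, point i excluded from its own neighbourhood.
sd_point <- function(low, high, i, k) {
  # Rank of every point j around point i in one space (i itself gets the last rank)
  rank_around <- function(X) {
    X <- as.matrix(X)
    d <- sqrt(colSums((t(X) - X[i, ])^2))  # distances from i to every point
    d[i] <- Inf                            # push i to the last rank
    r <- integer(nrow(X))
    r[order(d)] <- seq_len(nrow(X))
    r
  }
  rl <- rank_around(low)   # rho^l_i(j)
  rh <- rank_around(high)  # rho^h_i(j)
  Vl <- which(rl <= k)     # V^l_k(i)
  Vh <- which(rh <= k)     # V^h_k(i)
  0.5 * sum((k - rl[Vl]) * abs(rl[Vl] - rh[Vl])) +
    0.5 * sum((k - rh[Vh]) * abs(rl[Vh] - rh[Vh]))
}

# Four samples on a line; the projection swaps the 2nd and 3rd samples.
high <- matrix(c(0, 1, 2, 3), ncol = 1)
low  <- matrix(c(0, 2, 1, 3), ncol = 1)
sd_point(low, high, i = 1, k = 2)   # 1
sd_point(high, high, i = 1, k = 2)  # 0 (identical neighbourhoods)
```

Because the two sums mirror each other, swapping the roles of the two spaces gives the same value.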
Seq_calcul <- function(l_data, dataRef, listK)
- `l_data`: list of data frames whose structure is:

Sample_ID | x | y | … |
---|---|---|---|
ID1 | x_coord | y_coord | … |

These data frames contain the samples’ coordinates, which can be defined in $\mathbb{R}^n$. **Warning:** it must be a list of data frames, not a list of data tables.

- `dataRef`: reference data frame whose structure is defined above. **Warning:** it must be a data frame, not a data table.
- `listK`: list of k levels.

- An inner join on the samples’ IDs is performed if they differ between the data frames.
- Calculations are run in parallel over the k levels.
A list containing l elements is returned, where l is the number of data frames contained in `l_data`. Each element contains n SD values, where n is the number of sample IDs shared by the reference data frame and the data frames in `l_data`.
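The inner join on sample IDs described above can be sketched in base R as follows. The helper `align_samples` is hypothetical and sorting the shared IDs is an assumption; the point is only that row r must refer to the same sample in every data frame before SD values are compared.

```r
# Sketch: keep only the sample IDs shared by all data frames (inner join) and
# put every data frame in the same row order. Hypothetical helper.
align_samples <- function(l_data, dataRef) {
  all_dfs <- c(list(dataRef), l_data)
  ids_of <- function(df) as.character(df$Sample_ID)
  common <- sort(Reduce(intersect, lapply(all_dfs, ids_of)))  # shared IDs
  reorder <- function(df) {
    out <- df[match(common, ids_of(df)), , drop = FALSE]  # same order everywhere
    rownames(out) <- NULL
    out
  }
  list(ids = common, ref = reorder(dataRef), data = lapply(l_data, reorder))
}

ref  <- data.frame(Sample_ID = c("a", "b", "c", "d"), x = 1:4, y = 4:1)
proj <- data.frame(Sample_ID = c("d", "b", "a"), x = c(0.4, 0.2, 0.1), y = c(1, 2, 3))
aligned <- align_samples(list(proj), ref)
aligned$ids          # "a" "b" "d"
aligned$data[[1]]$x  # 0.1 0.2 0.4
```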
This function calculates the SD values for several data frames and for different k levels. Distributions of the mean SD values by level k, i.e.
Seq_main <- function(l_data, dataRef, listK, colnames_res_df = NULL , filename = NULL , graphics = FALSE, stats = FALSE)
- `l_data`: list of data frames whose structure must be:

Sample_ID | x | y | … |
---|---|---|---|
ID1 | x_coord | y_coord | … |

These data frames contain the samples’ coordinates, which can be defined in $\mathbb{R}^n$.

- `dataRef`: reference data frame whose structure is the same as defined above.
- `listK`: list of k levels.
- `colnames_res_df`: optional argument specifying the column names of the returned data frame, which are also used as the plot legend if a plot is computed. If this argument is unspecified, the default values are V1, V2, …, Vn (where n is the length of `l_data`).
- `filename`: optional argument specifying the file to which the results are written. If this argument is unspecified, the results are returned and not written. If the chosen filename already exists in the current directory, the filename is incremented.
- `graphics`: boolean argument; if TRUE, a plot is computed. This plot represents the mean SD values for the different k levels and the different data frames in `l_data`.
- `stats`: boolean argument; if TRUE, statistical tests are run. This option is available only if at least two methods are defined (i.e. the length of `l_data` is ≥ 2). If exactly two data frames are given in `l_data`, a test compares the distributions of the means by k level of the absolute differences between SD values. If more than two methods are defined, pairwise tests are run. If more than 30 $\overline{SD}_k$ values have been computed, Student’s t-tests are used; otherwise, Wilcoxon tests are preferred.
- An inner join on the samples’ IDs is performed if they differ between the data frames.
Depending on the options activated, the returned list contains the following elements:

- `Seq_df`: data frame containing a column with the samples’ IDs, a column corresponding to the k levels, and n columns corresponding to the SD values. This data frame is written to a file if `filename` is defined.
- `Seq_mean_by_k`: data frame containing the $\overline{SD}_k$ values for each data frame in `l_data`.
- `paired_test`: results of the paired Student or Wilcoxon test.
- `pairwise_tests`: matrix of the p-values of the pairwise Student or Wilcoxon tests.
- `graphics`: ggplot of $\overline{SD}_k$ as a function of the k levels, if the `graphics` option was activated.
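A sketch of the `stats` step under stated assumptions: `compare_methods` is a hypothetical helper, the input is assumed to be a data frame with a `k` column and one SD column per method (like `Seq_df`), and the two-method case is read as a paired test on the per-k means. The 30-value threshold follows the description of the `stats` argument; `t.test`, `wilcox.test`, `pairwise.t.test` and `pairwise.wilcox.test` come from R’s `stats` package.

```r
# Sketch of the statistics step (hypothetical helper; input layout assumed).
compare_methods <- function(seq_df, methods) {
  # Mean SD per k level and per method, i.e. the SD-bar_k values
  means <- aggregate(seq_df[methods], by = list(k = seq_df$k), FUN = mean)
  use_t <- nrow(means) > 30  # Student if more than 30 means, otherwise Wilcoxon
  if (length(methods) == 2) {
    x <- means[[methods[1]]]
    y <- means[[methods[2]]]
    test <- if (use_t) t.test(x, y, paired = TRUE) else wilcox.test(x, y, paired = TRUE)
  } else {
    long <- stack(means[methods])  # columns: values, ind (method as a factor)
    test <- if (use_t) {
      pairwise.t.test(long$values, long$ind, paired = TRUE)$p.value
    } else {
      pairwise.wilcox.test(long$values, long$ind, paired = TRUE)$p.value
    }
  }
  list(means = means, test = test)
}

set.seed(1)
seq_df <- data.frame(Sample_ID = rep(paste0("s", 1:3), times = 5),
                     k = rep(1:5, each = 3),
                     A = runif(15), B = runif(15) + 1, C = runif(15) + 2)
two   <- compare_methods(seq_df, c("A", "B"))       # 5 means: Wilcoxon signed-rank test
three <- compare_methods(seq_df, c("A", "B", "C"))  # matrix of pairwise p-values
```

Pairing by k level is what makes the tests paired: each method’s mean at a given k is compared with the other method’s mean at the same k.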
This function plots the mean sequence difference (SD) values by k level.
Seq_graph_by_k <- function(data_Seq, Names = NULL, data_diff_mean_K = NULL, log = FALSE)
- `data_Seq`: data frame of sequence difference values structured as:

Sample_ID | K | CP1 |
---|---|---|
ID1 | 1Kst_level | CP1_id1 |

- `Names`: optional argument specifying the legend labels. If this argument is unspecified, the legend labels are the column names of `data_Seq`.
- `data_diff_mean_K`: optional data frame containing the mean SD values by K level. If this argument is given, the means are not recalculated.
- `log`: optional boolean argument; if TRUE, a logarithmic scale is used.
A ggplot object is returned.
See also: `Seq_main`.
This function tests the random hypothesis, i.e. are the SD values calculated on the real data set equivalent to those expected on random data? To do this, n simulations are run. According to these simulations, the
seq_permutation_test <- function(data, data_ref, list_K, n=30, graph = TRUE)
- `data`: data frame defined as:

Sample_ID | x | y | … |
---|---|---|---|
ID1 | x_coord | y_coord | … |

- `data_ref`: reference data frame whose structure is the same as the one defined above.
- `list_K`: list of k levels.
- `n`: number of simulations.
- `graph`: optional boolean argument; if TRUE, a graph of the simulation results is computed.
This function returns the results of the Wilcoxon test.
Depending on n, the simulations can take a long time.
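The description does not say how the random data are generated, so the following is only one plausible sketch. It assumes that random data are obtained by shuffling the rows of `data` relative to `data_ref`, that the metric returns one value per sample where lower is better (as for SD, which is 0 when the neighborhoods agree), and that observed and simulated values are compared with a one-sided Wilcoxon rank-sum test. `perm_test` and `toy_metric` are hypothetical names.

```r
# Sketch of a permutation test against random data (assumptions in the text above).
perm_test <- function(data, data_ref, metric, n = 30) {
  observed <- metric(data, data_ref)
  simulated <- unlist(lapply(seq_len(n), function(s) {
    shuffled <- data[sample(nrow(data)), , drop = FALSE]  # break the sample/position link
    metric(shuffled, data_ref)
  }))
  # Are the observed values lower than those obtained on random data?
  wilcox.test(observed, simulated, alternative = "less")
}

# Toy metric standing in for SD: per-sample Manhattan distance between two spaces.
toy_metric <- function(a, b) rowSums(abs(as.matrix(a) - as.matrix(b)))

set.seed(42)
ref  <- matrix(rnorm(40), ncol = 2)
perm <- perm_test(ref + rnorm(40, sd = 0.01), ref, toy_metric, n = 30)
perm$p.value
```

The cost grows linearly with n, since the metric is recomputed once per simulation.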
This function displays a map of the samples’ mean SD values.
SD_map_f <- function(SD_df, Coords_df, legend_pos = "right")
### Arguments
- `SD_df`: data frame defined as:

Sample_ID | k | SD | … |
---|---|---|---|
ID1 | levelk1 | SD_1,k | … |

- `Coords_df`: data frame defined as:

Sample_ID | X | Y | … |
---|---|---|---|
ID1 | x1 | y1 | … |

- `legend_pos`: optional argument defining the legend position.
This function returns a map of the samples’ mean SD values.
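The values behind such a map can be prepared in base R by averaging each sample’s SD values over the k levels and joining them to the coordinates. This sketch assumes the column names shown in the tables above; `mean_sd_by_sample` is a hypothetical helper, and the ggplot2 drawing step is not reproduced.

```r
# Sketch: one mean SD value per sample, attached to its map coordinates.
mean_sd_by_sample <- function(SD_df, Coords_df) {
  means <- aggregate(SD ~ Sample_ID, data = SD_df, FUN = mean)  # mean over k levels
  merge(Coords_df, means, by = "Sample_ID")                     # inner join on IDs
}

SD_df <- data.frame(Sample_ID = rep(c("a", "b"), each = 2),
                    k = rep(c(5, 10), times = 2),
                    SD = c(1, 3, 0, 2))
Coords_df <- data.frame(Sample_ID = c("a", "b", "c"), X = c(0, 1, 2), Y = c(0, 1, 0))
map_df <- mean_sd_by_sample(SD_df, Coords_df)
map_df  # a: mean SD 2, b: mean SD 1; sample c has no SD values and is dropped
```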
This function calculates Moran’s index, a spatial autocorrelation index defined as:

$$ I = \frac{N \sum_{i=1}^N \sum_{j=1}^N W_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^N \sum_{j=1}^N W_{ij} \sum_{i=1}^N (x_i - \bar{x})^2} $$

where $W$ is a binary spatial weight matrix defined with the k-nearest neighbors (KNN) method: $W_{ij}$ equals one if $i$ is among the first $k$ neighbors of $j$, and zero otherwise; $x_i$ is the value of the variable for sample $i$ (and likewise $x_j$ for sample $j$); and $\bar{x}$ is the overall mean of $x$. The values are calculated for several variables, for several projections and for different k levels. Graphs of the Moran’s index distribution for each variable can be drawn. Finally, significance tests based on a Monte Carlo procedure can be computed.

### Usage
moran_I_main <- function(l_coords_data, spatial_att, listK, nsim = 500, Stat = FALSE, Graph = FALSE, methods_name = NULL)
- `l_coords_data`: list of coordinate data frames whose structure is:

Sample_ID | x | y | … |
---|---|---|---|
ID1 | x_coord | y_coord | … |

These data frames contain the samples’ coordinates, which can be defined in $\mathbb{R}^n$.

- `spatial_att`: data frame containing the variables’ values:

Sample_ID | Variable1 | Variable2 | … |
---|---|---|---|
ID1 | V1_id1 | V2_id1 | … |

- `listK`: list of k values.
- `nsim`: number of simulations for the significance test.
- `Stat`: optional boolean argument; if TRUE, the significance test is calculated.
- `Graph`: optional boolean argument; if TRUE, the graph of the Moran’s index distributions is drawn. This graph depicts the Moran’s index distribution for each variable.
- `methods_name`: optional argument specifying the names of the spaces contained in the `l_coords_data` argument.
An inner join on the samples’ IDs is performed if they differ between the data frames. Moran’s indexes and statistics are computed with the `moran_index_HD` and `moran_stat_HD` functions.
Depending on the options activated, the returned list contains the following elements:

- `MI_array`: 3D array containing the Moran’s index for each projection (row i), each variable (column j) and each k level.
- `MS_array`: 3D array containing the p-values of the Moran significance tests for each projection (row i), each variable (column j) and each k level.
- `Graph`: ggplots are printed if the option is activated. The plots show the Moran’s index values of each variable as a function of the k levels, for each space.
See also: `moran_index_HD`, `moran_stat_HD` and `moran_I_scatter_plot`.
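The layout of `MI_array` (projections × variables × k levels) can be sketched as follows. `build_mi_array` is hypothetical, and `index_fun(coords, x, k)` is an assumed stand-in for a Moran’s index function; in the usage example it is a placeholder, not Moran’s index.

```r
# Sketch of the MI_array layout: rows = projections, columns = variables,
# third dimension = k levels. `index_fun` is an assumed stand-in.
build_mi_array <- function(l_coords, att, listK, index_fun) {
  vars <- setdiff(names(att), "Sample_ID")
  out <- array(NA_real_,
               dim = c(length(l_coords), length(vars), length(listK)),
               dimnames = list(names(l_coords), vars, paste0("k=", listK)))
  att_ids <- as.character(att$Sample_ID)
  for (p in seq_along(l_coords)) {
    coords <- l_coords[[p]]
    ids <- intersect(as.character(coords$Sample_ID), att_ids)  # inner join on IDs
    X <- coords[match(ids, as.character(coords$Sample_ID)),
                setdiff(names(coords), "Sample_ID"), drop = FALSE]
    A <- att[match(ids, att_ids), vars, drop = FALSE]
    for (v in seq_along(vars)) {
      for (m in seq_along(listK)) {
        out[p, v, m] <- index_fun(X, A[[v]], listK[m])
      }
    }
  }
  out
}

l_coords <- list(
  PCA  = data.frame(Sample_ID = c("a", "b", "c"), x = 1:3, y = 3:1),
  UMAP = data.frame(Sample_ID = c("c", "a", "b"), x = c(0, 1, 2), y = c(2, 1, 0))
)
att <- data.frame(Sample_ID = c("a", "b", "c"), V1 = c(1, 2, 3), V2 = c(0, 0, 6))
placeholder <- function(X, x, k) k + mean(x)  # NOT Moran's index, layout demo only
arr <- build_mi_array(l_coords, att, listK = c(1, 2), index_fun = placeholder)
dim(arr)  # 2 2 2
```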
This function calculates Moran’s indexes for high-dimensional data by generalizing the procedure used in 2D. The k nearest neighbors of each sample are found with the brute-force method of the KNN algorithm, and these neighborhoods define the spatial weight matrix. Moran’s indexes are then computed in the usual way.
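The procedure described above can be sketched in base R with a brute-force distance matrix. This is a minimal sketch assuming Euclidean distances and ties broken by row order; `moran_knn` is a hypothetical name, not the package’s `moran_index_HD`. It follows the definitions of $W_{ij}$ and $I$ given for `moran_I_main`.

```r
# Minimal sketch of Moran's I with binary k-nearest-neighbour weights, in any
# dimension (hypothetical helper). Brute force: all pairwise distances.
moran_knn <- function(coords, x, k) {
  D <- as.matrix(dist(as.matrix(coords)))  # Euclidean distance matrix
  diag(D) <- Inf                           # a sample is not its own neighbour
  n <- nrow(D)
  W <- matrix(0, n, n)
  for (j in seq_len(n)) {
    W[order(D[, j])[seq_len(k)], j] <- 1   # W[i, j] = 1 if i is a k-NN of j
  }
  z <- x - mean(x)
  (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

coords <- matrix(c(0, 1, 3, 6), ncol = 1)        # four samples on a line
moran_knn(coords, x = c(1, 1, -1, -1), k = 1)  # 0.5
moran_knn(coords, x = c(1, -1, 1, -1), k = 1)  # -1
```

With k = 1 the nonzero weights are $W_{21}$, $W_{12}$, $W_{23}$ and $W_{34}$, so clustered values give a positive index and alternating values a negative one.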
moran_index_HD <- function(data, spatial_att, K, merge = TRUE)
- `data`: data frame defined as:

Sample_ID | x | y | z | … |
---|---|---|---|---|
MYID | x_coords | y_coords | z_coords | … |

- `spatial_att`: data frame containing the variables’ values:

Sample_ID | Variable1 | Variable2 | … |
---|---|---|---|
MYID | V1_myid | V2_myid | … |

- `K`: numeric argument defining the k level.
- `merge`: optional boolean argument; if TRUE, the function checks whether `spatial_att` and `data` contain the same sample IDs. If the IDs differ, an inner join is performed.

The Moran’s index (numeric value) is returned.
Significance tests are computed with a Monte Carlo procedure: n simulations are run, and at each iteration the vector of the variable of interest is shuffled and the Moran’s index is calculated with the `moran_index_HD` function. Finally, the rank of the observed Moran’s index in the resulting vector is computed to infer the p-value. This p-value is the proportion of Moran’s indexes obtained with random data that are greater than the observed Moran’s index.
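The Monte Carlo procedure can be sketched generically: `index_fun` takes the variable in a given order and returns the index, so a Moran’s index function can be plugged in by fixing the coordinates and k. `mc_pvalue` and `toy_index` are hypothetical, and the toy index is a correlation, not Moran’s index.

```r
# Sketch of a Monte Carlo permutation p-value (hypothetical helper).
mc_pvalue <- function(x, index_fun, nsim = 99) {
  observed  <- index_fun(x)
  simulated <- replicate(nsim, index_fun(sample(x)))  # shuffle the variable nsim times
  mean(simulated > observed)  # share of random indexes greater than the observed one
}

set.seed(1)
# Toy index: association between the variable and a fixed spatial ordering 1..10.
toy_index <- function(v) cor(v, 1:10)
mc_pvalue(1:10, toy_index, nsim = 99)  # 0: no shuffle beats a perfect association
```

A common variant also counts the observed index itself, giving $(b + 1)/(n_{sim} + 1)$ where $b$ is the number of simulated indexes at least as large as the observed one; this avoids p-values of exactly 0.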
moran_stat_HD <- function(data, K, spatial_att, obs_moran_I, nsim = 99)
- `data`: data frame defined as:

Sample_ID | x | y | z | … |
---|---|---|---|---|
MYID | x_coords | y_coords | z_coords | … |

- `K`: numeric argument defining a k level.
- `spatial_att`: data frame containing the variable’s values:

Sample_ID | Variable1 |
---|---|
MYID | V1_myid |

- `obs_moran_I`: observed Moran’s index, computed from the real spatial distribution of the variable.
- `nsim`: number of simulations.
The p-value of the Moran significance test is returned.
See also: `moran_I_main`.
This function displays the plot of the Moran’s index values for each variable and each method: a scatter plot, or a boxplot if the Moran’s index values have been calculated for several k levels.
moran_I_scatter_plot <- function(data, Xlab = NULL, Ylab=NULL, Title= NULL)
- `data`: 3D array containing the Moran’s index values, with the following structure:

k = i

| | Variable1 | Variable2 | Variable3 | … |
---|---|---|---|---|
Method1 | MI_v1_m1 | MI_v2_m1 | MI_v3_m1 | … |
Method2 | MI_v1_m2 | … | … | … |
… | … | … | … | … |

k = j

| | Variable1 | Variable2 | Variable3 | … |
---|---|---|---|---|
Method1 | MI_v1_m1 | MI_v2_m1 | MI_v3_m1 | … |
Method2 | MI_v1_m2 | … | … | … |
… | … | … | … | … |

- `Xlab`: optional argument defining the x-axis label.
- `Ylab`: optional argument defining the y-axis label.
- `Title`: optional argument defining the plot title.
This function returns a ggplot object.
See also: `moran_I_main`.
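Before plotting, a 3D array with this structure can be flattened into a long data frame with base R’s `as.data.frame.table`, which gives one row per (method, variable, k) combination. The dimension names and values below are illustrative only.

```r
# Sketch: flatten a methods x variables x k array of Moran's indexes into a long
# data frame (illustrative values), a convenient shape for scatter plots or boxplots.
mi <- array(c(0.10, 0.20, 0.30, 0.40, 0.15, 0.25, 0.35, 0.45), dim = c(2, 2, 2),
            dimnames = list(Method   = c("Method1", "Method2"),
                            Variable = c("Variable1", "Variable2"),
                            k        = c("5", "10")))
long <- as.data.frame.table(mi, responseName = "MI")
head(long)
```

The first dimension varies fastest, so the rows follow the array’s storage order; the resulting columns map directly onto plot aesthetics, e.g. `boxplot(MI ~ Method, data = long)` in base R.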