Sequence difference view

Sequence difference view calculation `Seq_calcul`

Description

This function computes the sequence difference (SD) view metric. For a point i, the formula is: $$ SD_k(i) = \frac{1}{2} \sum_{j \in V^l_k(i)} [k-\rho^l_i(j)] \cdot |\rho^l_i(j)-\rho^h_i(j)| + \frac{1}{2} \sum_{j \in V^h_k(i)} [k-\rho^h_i(j)] \cdot |\rho^l_i(j)-\rho^h_i(j)| $$ where $V_k^d(i)$ is the k-neighborhood of i in the space d (d is either l or h, the low- and high-dimensional spaces), and $\rho_i^d(j)$ is the rank of j in the k-neighborhood of i.
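To make the formula concrete, here is a minimal sketch in base R. It is not the package's implementation: `sd_sketch` is a hypothetical helper, and it assumes Euclidean distances, ranks that start at 1 for the nearest neighbor, and that a point is excluded from its own neighborhood.

```r
# Hypothetical helper (not part of DRMetrics): SD_k(i) for every sample i,
# given numeric coordinate matrices with the same row order.
sd_sketch <- function(low, high, k) {
  rank_matrix <- function(coords) {
    d <- as.matrix(dist(coords))
    diag(d) <- Inf  # assumption: a point is not its own neighbor
    # Row i holds the rank of every j as seen from i (1 = nearest).
    t(apply(d, 1, rank, ties.method = "first"))
  }
  rho_l <- rank_matrix(low)
  rho_h <- rank_matrix(high)
  vapply(seq_len(nrow(rho_l)), function(i) {
    v_l <- which(rho_l[i, ] <= k)        # V^l_k(i)
    v_h <- which(rho_h[i, ] <= k)        # V^h_k(i)
    gap <- abs(rho_l[i, ] - rho_h[i, ])  # |rho^l_i(j) - rho^h_i(j)|
    0.5 * sum((k - rho_l[i, v_l]) * gap[v_l]) +
      0.5 * sum((k - rho_h[i, v_h]) * gap[v_h])
  }, numeric(1))
}

# Four points on a line; samples 2 and 3 swap places in the second space.
low  <- matrix(c(0, 1, 3, 7))
high <- matrix(c(0, 3, 1, 7))
sd_k2 <- sd_sketch(low, high, k = 2)
```

With these conventions, identical spaces give an SD of zero for every sample, and the value grows when close neighbors change rank.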

Usage

Seq_calcul <- function(l_data, dataRef, listK)

Arguments

l_data : list of data frames whose structure is:

Sample_ID	x	y	…
ID1	x_coord	y_coords	…

These data frames contain the samples’ coordinates, which can be defined in ℝ^𝕟. **Warning:** it must be a list of data frames, not a list of data tables.

dataRef : reference data frame whose structure is defined above.
**Warning:** it must be a data frame, not a data table.
listK : list of k levels.

Details

An inner join on the sample IDs is performed if they differ between the data frames.
Calculations are parallelized over the k levels.
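The effect of the inner join can be illustrated with a small base R sketch. The data and variable names below are made up for illustration, and the package may implement the join differently.

```r
# Illustrative data: the projection and the reference share only ID2 and ID3.
proj <- data.frame(Sample_ID = c("ID1", "ID2", "ID3"),
                   x = c(0.1, 0.4, 0.9), y = c(1.2, 0.7, 0.3),
                   stringsAsFactors = FALSE)
ref <- data.frame(Sample_ID = c("ID2", "ID3", "ID4"),
                  x = c(5, 6, 7), y = c(1, 2, 3), z = c(9, 8, 7),
                  stringsAsFactors = FALSE)

# Keep only the samples present in both data frames, in the same order.
common_ids  <- intersect(proj$Sample_ID, ref$Sample_ID)
proj_common <- proj[match(common_ids, proj$Sample_ID), ]
ref_common  <- ref[match(common_ids, ref$Sample_ID), ]
```

Only the samples present in both data frames contribute to the SD values.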

Value

A list containing l elements is returned (where l is the number of data frames contained in l_data). Each element contains n SD values, where n is the number of sample IDs common to the reference data frame and the data frames in l_data.

Main function for sequence difference view `Seq_main`

Description

This function computes the SD values for several data frames and for different k levels. The distributions of the mean SD values by k level, i.e. $\overline{SD}_k$, can be plotted. Finally, statistical tests can be run if at least two low-dimensional projections are given as input.

Usage

Seq_main <- function(l_data, dataRef, listK, colnames_res_df = NULL , filename = NULL , graphics = FALSE, stats = FALSE)

Arguments

l_data : list of data frames, each of which must be structured as follows:

Sample_ID	x	y	…
ID1	x_coord	y_coords	…

These data frames contain samples’ coordinates which could be defined in ℝ^𝕟.

dataRef : reference data frame whose structure is the same as defined above.
listK : list of k levels.
colnames_res_df : optional argument specifying the column names of the returned data frame, and the plot legend if a plot is produced. If this argument is unspecified, the default values are V1, V2, ..., Vn (where n is the length of l_data).
filename : optional argument defining the name of the file to which results are written. If this argument is unspecified, results are returned and not written. If the chosen filename already exists in the current directory, the filename is incremented.
graphics : boolean argument enabling the plot. The plot shows the means of the SD values for the different k levels and for the different data frames in l_data.
stats : boolean argument enabling statistical tests. It is available only if at least two methods are defined (i.e. the length of l_data is ≥ 2). If only two data frames are given in l_data, a test compares the distributions of the means, by k level, of the absolute differences between SD values. If more than two methods are defined, paired tests are run. If more than 30 $\overline{SD}_k$ values have been computed, Student tests are used; otherwise Wilcoxon tests are preferred.
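The choice between the two tests described for stats can be sketched as follows. This is an illustration only: `choose_paired_test` and its `threshold` parameter are hypothetical names, and the inputs are assumed to be two vectors of $\overline{SD}_k$ values of equal length.

```r
# Hypothetical helper: paired Student test above the threshold,
# paired Wilcoxon test otherwise.
choose_paired_test <- function(a, b, threshold = 30) {
  stopifnot(length(a) == length(b))
  if (length(a) > threshold) {
    t.test(a, b, paired = TRUE)
  } else {
    wilcox.test(a, b, paired = TRUE)
  }
}

set.seed(1)
small_res <- choose_paired_test(rnorm(10), rnorm(10))  # 10 values: Wilcoxon
large_res <- choose_paired_test(rnorm(40), rnorm(40))  # 40 values: Student
```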

Details

An inner join on the sample IDs is performed if they differ between the data frames.

Value

Depending on the options activated, the returned list contains the following elements:

Seq_df : data frame containing a column with the sample IDs, a column corresponding to the k levels, and n columns corresponding to the SD values. This data frame is written to a file if filename is defined.
Seq_mean_by_k : data frame containing the $\overline{SD}_k$ for each data frame contained in l_data.
paired_test : results of the Student or Wilcoxon paired test.
pairwise_tests : matrix of the p-values of the Student or Wilcoxon pairwise tests.
graphics : ggplot of the $\overline{SD}_k$ as a function of the k levels, if the graphics option was activated.
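As an illustration of how Seq_mean_by_k relates to Seq_df, the sketch below averages SD values by k level with base R. The column names `K`, `V1` and `V2` and the numbers are assumptions made for the example.

```r
# Illustrative Seq_df-like data frame: two samples, two k levels, two methods.
seq_df <- data.frame(
  Sample_ID = rep(c("ID1", "ID2"), times = 2),
  K  = rep(c(5, 10), each = 2),
  V1 = c(2, 4, 6, 10),
  V2 = c(1, 3, 5, 7),
  stringsAsFactors = FALSE
)

# Mean SD value per k level for each method (one row per k).
mean_by_k <- aggregate(cbind(V1, V2) ~ K, data = seq_df, FUN = mean)
```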

Graphic of means of sequence difference values by k values `Seq_graph_by_k`

Description

This function displays the graphic of means of sequence difference values by k values.

Usage

Seq_graph_by_k <- function(data_Seq, Names = NULL, data_diff_mean_K = NULL, log = FALSE)

Arguments

data_Seq : data frame of sequence difference values structured as follows:

Sample_ID	K	CP1
ID1	1Kst_level	CP1_id1

Names : optional argument specifying the legend labels. If this argument is unspecified, the legend labels are the column names of data_Seq.
data_diff_mean_K : optional data frame containing the means of the SD values by k level. If this argument is specified, the means are not recalculated.
log : optional boolean argument; if set to TRUE, a logarithmic scale is used.

Value

A ggplot object is returned.

Sequence difference values permutation test `seq_permutation_test`

Description

This function tests the random hypothesis, i.e. whether the SD values calculated on the real data set are equivalent to those expected on random data. To do so, n simulations are run and the $\overline{SD}_k$ are calculated for each of them. Finally, a Wilcoxon test compares the mean random distribution with the real one.

Usage

seq_permutation_test <- function(data, data_ref, list_K, n=30, graph = TRUE)

Arguments

data : data frame structured as follows:

Sample_ID	x	y	…
ID1	x_coord	y_coords	…

data_ref : reference data frame whose structure is the same as defined above.
list_K : list of k levels.
n : number of simulations.
graph : optional boolean argument; if TRUE, the plot of the simulation results is produced.

Value

This function returns the Wilcoxon test’s results.

Details

Depending on n, the simulations can take a long time.

Sequence difference map `SD_map_f`

Description

This function displays each sample’s mean $\overline{SD}_k$ on a two-dimensional projection.

Usage

SD_map_f <- function(SD_df, Coords_df, legend_pos = "right")

Arguments

SD_df : data frame structured as follows:

Sample_ID	k	SD	…
ID1	levelk1	SD_1,k	…

Coords_df : data frame structured as follows:

| Sample_ID | X  | Y  | … |
|-----------|----|----|---|
| ID1       | x1 | y1 | … |

legend_pos : optional argument defining the legend’s position.

Value

This function returns a map of the mean $\overline{SD}_k$ per sample.

Spatial autocorrelation

Moran index main function `moran_I_main`

Description

This function computes Moran’s index, a spatial autocorrelation index defined as: $$ I = \frac{N \sum_{i=1}^N \sum_{j=1}^N W_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^N \sum_{j=1}^N W_{ij} \sum_{i=1}^N (x_i - \bar{x})^2}$$

where W is a binary spatial weight matrix defined by the k-nearest neighbors (KNN) method, such that $W_{ij}$ equals one if i belongs to the first k neighbors of j and zero otherwise; $x_i$ is the value of the variable for sample i (likewise $x_j$ for sample j), and $\bar{x}$ is the overall mean of x. The index is calculated for several variables, for several projections and for different k levels. Plots of the Moran index distribution for each variable can be produced. Finally, significance tests based on a Monte Carlo procedure can be run.

Usage

moran_I_main <- function(l_coords_data, spatial_att, listK, nsim = 500, Stat = FALSE, Graph = FALSE, methods_name = NULL)

Arguments

l_coords_data : list of coordinate data frames whose structure is:

Sample_ID	x	y	…
ID1	x_coord	y_coords	…

These data frames contain samples’ coordinates which could be defined in ℝ^𝕟.

spatial_att : data frame containing the variables’ values:

Sample_ID	Variable1	Variable2	…
ID1	V1_id1	V2_id1	…

listK : list of k values.
nsim : number of simulations for the significance test.
Stat : optional boolean argument; if set to TRUE, the significance test is computed.
Graph : optional boolean argument; if set to TRUE, the plot of the Moran index distributions is drawn. This plot shows the Moran index distribution for each variable.
methods_name : optional parameter specifying the names of the spaces included in the l_coords_data argument.

Details

An inner join on the sample IDs is performed if they differ between the data frames. Moran indexes and statistics are computed with the moran_index_HD and moran_stat_HD functions.

Value

Depending on the options activated, the returned list contains the following elements:

MI_array : 3D array containing the Moran index for each projection (row i), each variable (column j) and each k level.
MS_array : 3D array containing the p-values of the Moran significance tests for each projection (row i), each variable (column j) and each k level.
Graph : ggplots are printed if the option is activated. The plots show the Moran index values of each variable as a function of the k levels, for each space.

Calculation of Moran indexes for high-dimensional data `moran_index_HD`

Description

This function computes Moran indexes for high-dimensional data by generalizing the process used in 2D. The k-nearest neighbors of each sample are found with the brute-force method of the KNN algorithm. These k-nearest neighbors are used to define the spatial weight matrix. The Moran index is then computed in the usual way.
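The steps above (brute-force neighbor search, binary weight matrix, Moran index) can be sketched in base R as follows. `moran_knn_sketch` is a hypothetical helper, not the package function; it assumes Euclidean distances, a numeric coordinate matrix without the Sample_ID column, and a single numeric variable x.

```r
# Hypothetical helper: Moran index with a binary k-nearest-neighbor weight matrix.
moran_knn_sketch <- function(coords, x, k) {
  coords <- as.matrix(coords)
  n <- nrow(coords)
  d <- as.matrix(dist(coords))  # brute force: all pairwise distances
  diag(d) <- Inf                # assumption: a sample is not its own neighbor
  w <- matrix(0, n, n)
  for (j in seq_len(n)) {
    nn <- order(d[j, ])[seq_len(k)]  # the k nearest neighbors of j
    w[nn, j] <- 1                    # W_ij = 1 if i is among them
  }
  z <- x - mean(x)
  n * sum(w * outer(z, z)) / (sum(w) * sum(z^2))
}

# Six samples on a line, in any number of dimensions the code is the same.
coords <- matrix(c(0, 1, 2.1, 3.3, 4.6, 6.0))
i_alternating <- moran_knn_sketch(coords, c(1, -1, 1, -1, 1, -1), k = 1)
i_clustered   <- moran_knn_sketch(coords, c(1, 1, 1, -1, -1, -1), k = 1)
```

In this toy example the alternating variable gives a negative index (neighbors disagree) and the clustered variable gives a positive one (neighbors agree).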

Usage

moran_index_HD <- function(data, spatial_att, K, merge = TRUE)

Arguments

data : data frame structured as follows:

Sample_ID	x	y	z	…
MYID	x_coords	y_coords	z_coords	…

spatial_att : data frame containing the variables’ values:

Sample_ID	Variable1	Variable2	…
MYID	V1_myid	V2_myid	…

K : numeric argument defining the k level.
merge : optional boolean argument that checks whether spatial_att and data contain the same sample IDs. If the sample IDs differ, an inner join is performed.

Value

Moran Index (numeric value).

Moran significance test for high dimensional data `moran_stat_HD`

Description

Significance tests are computed with a Monte Carlo procedure: nsim simulations are run; at each iteration the vector of the variable of interest is shuffled and the Moran index is recalculated with the moran_index_HD function. Finally, the rank of the observed Moran index in the resulting vector is used to infer the p-value. This p-value is the proportion of Moran indexes obtained with random data that are greater than the observed Moran index.
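The Monte Carlo procedure can be sketched generically in base R. `mc_p_value_sketch` and its `statistic` argument are hypothetical; in the package the statistic would be the Moran index, but any function of the shuffled vector illustrates the mechanics.

```r
# Hypothetical helper: proportion of statistics computed on shuffled data
# that are greater than the observed statistic.
mc_p_value_sketch <- function(x, statistic, nsim = 99) {
  observed  <- statistic(x)
  simulated <- replicate(nsim, statistic(sample(x)))  # shuffle, then recompute
  mean(simulated > observed)
}

# Toy statistic that is largest when x is sorted in increasing order.
toy_stat <- function(v) sum(v * seq_along(v))

set.seed(42)
p_sorted   <- mc_p_value_sketch(1:10, toy_stat)  # observed value is the maximum
p_reversed <- mc_p_value_sketch(10:1, toy_stat)  # observed value is the minimum
```

A small p-value means that few shuffled configurations reach the observed value.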

Usage

moran_stat_HD <- function(data, K, spatial_att, obs_moran_I, nsim = 99)

Arguments

data : data frame structured as follows:

Sample_ID	x	y	z	…
MYID	x_coords	y_coords	z_coords	…

K : numeric argument defining a k level.
spatial_att : data frame containing the variable’s values:

Sample_ID	Variable1
MYID	V1_myid

obs_moran_I : observed Moran index, computed from the real spatial distribution of the variable.
nsim : number of simulations.

Value

Moran Significance test p.value.

Graphic of Moran Indexes for each variable and each method `moran_I_scatter_plot_by_k`

Description

This function displays the plot of Moran index values for each variable and each method. Either a scatter plot or a boxplot is displayed if the Moran index values have been calculated for several k levels.

Usage

moran_I_scatter_plot <- function(data, Xlab = NULL, Ylab=NULL, Title= NULL)

Arguments

data : 3D array containing Moran index values, structured as follows:

k = i

	Variable1	Variable2	Variable3	…
Method1	MI_v1_m1	MI_v2_m1	MI_v3_m1	…
Method2	MI_v1_m2	…	…	…
…	…	…	…	…

k = j

	Variable1	Variable2	Variable3	…
Method1	MI_v1_m1	MI_v2_m1	MI_v3_m1	…
Method2	MI_v1_m2	…	…	…
…	…	…	…	…

Xlab : this optional argument is used to define the x-axis label.
Ylab : this optional argument is used to define the y-axis label.
Title : This optional argument is used to define the plot title.

Value

This function returns a ggplot object.


emathian/DRMetrics