Purpose: DHOEM densifies (or loosens) a real SNP marker data set by performing statistical learning on local characteristics of the data.

# Examples

The examples below are taken from `Examples_DHOEM.R`.

```r
# Preliminary remark: all the data sets are in the 'Data folder'
# (684 haplotypes with 8336 SNP on 12 chromosomes, of which 7924 SNP have MAF > 1%)

# Clear memory and set the random seed for reproducibility
rm(list=ls())
set.seed(123)

# Adapt this path to the location of the DHOEM source codes on your machine
setwd('~/simulations/libs/DHOEM/Source_codes/')
source('DHOEM.R')
source('Calcul_corr_bernoulli_vector.R')
source('Suppression_SNP_MAF_inf_limit.R')
```
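
The comment above counts 7924 SNP with MAF > 1%, and `MAF_limit_for_all_SNP=0.01` in the calls below uses the same threshold. As a minimal sketch of that filter, assuming the haplotypes are stored as a 0/1 matrix with one row per haplotype and one column per SNP, the minor allele frequency (MAF) of each SNP can be computed as follows. `compute_maf` is a hypothetical helper written for illustration; it is not the function defined in `Suppression_SNP_MAF_inf_limit.R`.

```r
# Hypothetical helper (not part of DHOEM): minor allele frequency of each SNP,
# assuming a 0/1 haplotype matrix with haplotypes in rows and SNP in columns.
compute_maf <- function(haplotypes) {
  p <- colMeans(haplotypes)   # frequency of the allele coded 1 at each SNP
  pmin(p, 1 - p)              # the minor allele is the rarer of the two
}

# Toy example: 4 haplotypes x 3 SNP (matrix() fills column by column)
toy_haplotypes <- matrix(c(0, 0, 0, 0,
                           1, 0, 0, 0,
                           1, 1, 0, 1), nrow = 4)
maf <- compute_maf(toy_haplotypes)
maf
# Keep only the SNP whose MAF exceeds the 1% limit (here, SNP 2 and 3)
toy_haplotypes[, maf > 0.01, drop = FALSE]
```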

## Example 1: calling DHOEM in order to densify an existing real marker data set

```r
# Calling DHOEM to densify: the target of 20000 SNP is above the current number of SNP
New_data = DHOEM('Haplotype_file.txt', 'Physical_map_file.txt', 'Physical_map_centromeres_file.txt',
                 Average_length_Kb_centromeres_low_SNP_coverage=1000, Nb_chromosomes=12, MAF_limit_for_all_SNP=0.01,
                 Nb_more_less_SNP_per_chromo_per_run=50, New_minimum_maximum_nb_SNP_specified=20000 )
```

```r
# Extract the densified haplotypes, genotypes and physical map
Densified_haplotypes = New_data$Simulated_haplotypes
Densified_genotypes = New_data$Simulated_genotypes
Densified_physical_map = New_data$Simulated_physical_map

dim(Densified_haplotypes)
dim(Densified_genotypes)
dim(Densified_physical_map)

# Last rows of the physical map and bottom-right corner of the haplotype matrix
Densified_physical_map[(nrow(Densified_physical_map)-10):nrow(Densified_physical_map), ]
Densified_haplotypes[(nrow(Densified_haplotypes)-10):nrow(Densified_haplotypes),
                     (ncol(Densified_haplotypes)-10):ncol(Densified_haplotypes)]
```

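Last rows of the densified physical map:

| Row | Chromosome | Position (Kb) |
|---:|---:|---:|
| 21114 | 12 | 27502.5 |
| 21115 | 12 | 27502.54 |
| 21116 | 12 | 27502.68 |
| 21117 | 12 | 27502.77 |
| 21118 | 12 | 27502.81 |
| 21119 | 12 | 27502.89 |
| 21120 | 12 | 27504.74 |
| 21121 | 12 | 27504.76 |
| 21122 | 12 | 27504.79 |
| 21123 | 12 | 27504.82 |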


Bottom-right corner of the densified haplotype matrix (0/1 allele codes):

```
0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 1 0 1 0 0 0
1 1 0 1 0 0 0 0 1 0 0
0 0 1 0 0 1 0 1 1 0 0
0 0 1 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 1 0 0
0 0 0 0 0 1 0 0 1 0 0
```
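
The tail of the densified physical map above shows positions increasing along chromosome 12. A quick consistency check on any map returned by DHOEM is that positions never decrease within a chromosome. The sketch below assumes the chromosome is in the first column of the map and the position in Kb in the second, as in the outputs shown here; `is_map_sorted` is a hypothetical helper, not a DHOEM function.

```r
# Hypothetical helper (not part of DHOEM): TRUE if, within every chromosome,
# marker positions are in non-decreasing order.
# Assumes column 1 holds the chromosome and column 2 the position in Kb.
is_map_sorted <- function(physical_map) {
  chrom <- physical_map[, 1]
  pos <- physical_map[, 2]
  # split() groups positions by chromosome; is.unsorted() detects any decrease
  all(vapply(split(pos, chrom),
             function(p) !is.unsorted(p),
             logical(1)))
}

# Toy map built from the last rows of the densified map above
toy_map <- data.frame(chrom = c(12, 12, 12),
                      pos_Kb = c(27502.50, 27502.54, 27502.68))
is_map_sorted(toy_map)
# Usage on the DHOEM output: is_map_sorted(Densified_physical_map)
```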


## Example 2: calling DHOEM in order to loosen an existing real marker data set

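```r
# Calling DHOEM to loosen: the target of 5000 SNP is below the current number of SNP.
# Positional arguments follow the order of the named arguments in Example 1.
New_data = DHOEM('Haplotype_file.txt', 'Physical_map_file.txt', 'Physical_map_centromeres_file.txt', 
                 1000, 12, 0.01, 50, 5000 )
```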

```r
# Extract the loosened haplotypes, genotypes and physical map
Loosened_haplotypes = New_data$Simulated_haplotypes
Loosened_genotypes = New_data$Simulated_genotypes
Loosened_physical_map = New_data$Simulated_physical_map

dim(Loosened_haplotypes)
dim(Loosened_genotypes)
dim(Loosened_physical_map)

# Last rows of the physical map and bottom-right corner of the haplotype matrix
Loosened_physical_map[(nrow(Loosened_physical_map)-10):nrow(Loosened_physical_map), ]
Loosened_haplotypes[(nrow(Loosened_haplotypes)-10):nrow(Loosened_haplotypes),
                    (ncol(Loosened_haplotypes)-10):ncol(Loosened_haplotypes)]
```

Last rows of the loosened physical map:

| Row | Repere_chrom_marq | Repere_pos_Kb |
|---:|---:|---:|
| 4914 | 12 | 26771.75 |
| 4915 | 12 | 26772.85 |
| 4916 | 12 | 27029.51 |
| 4917 | 12 | 27194.83 |
| 4918 | 12 | 27207.65 |
| 4919 | 12 | 27282.82 |
| 4920 | 12 | 27343.4 |
| 4921 | 12 | 27379.65 |
| 4922 | 12 | 27439.3 |
| 4923 | 12 | 27469.74 |


Bottom-right corner of the loosened haplotype matrix (0/1 allele codes):

```
0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
```
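
Because `Nb_more_less_SNP_per_chromo_per_run` is applied chromosome by chromosome, it can help to look at how many SNP each chromosome carries before and after a call. A minimal sketch using base R's `table()`, assuming (as above) that the chromosome is in the first column of the physical map; `snp_per_chromosome` is a hypothetical helper and the toy map uses made-up values.

```r
# Hypothetical helper (not part of DHOEM): number of SNP on each chromosome,
# assuming the chromosome is stored in the first column of the physical map.
snp_per_chromosome <- function(physical_map) {
  table(physical_map[, 1])
}

# Toy map with made-up positions
toy_map <- data.frame(chrom = c(1, 1, 1, 2, 2, 12),
                      pos_Kb = c(10.5, 20.1, 35.0, 5.2, 8.9, 27469.74))
counts <- snp_per_chromosome(toy_map)
counts
# Usage on the DHOEM output:
# snp_per_chromosome(Loosened_physical_map)
# sum(snp_per_chromosome(Loosened_physical_map))  # total number of SNP kept
```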


## Example 3: error cases (densification and loosening) for the parameter Nb_more_less_SNP_per_chromo_per_run

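Note: if every chromosome has at least 100 SNP, small values such as 20 or 50 for `Nb_more_less_SNP_per_chromo_per_run` will work. A value larger than the number of SNP available on a chromosome makes DHOEM stop with an error, as in the two calls below.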
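
The two error messages below suggest the constraint behind this note: a chromosome with fewer SNP than `Nb_more_less_SNP_per_chromo_per_run` cannot be densified or loosened by that amount (for densification, only SNP outside the region involving the centromere count as anchors). A rough pre-check, assuming per-chromosome SNP counts are already available as a named vector or a `table()`, could flag such chromosomes before calling DHOEM. `flag_too_small_chromosomes` is hypothetical and ignores the centromere region, so it only approximates DHOEM's own test.

```r
# Hypothetical pre-check (not part of DHOEM): names of chromosomes carrying fewer
# SNP than Nb_more_less_SNP_per_chromo_per_run. It ignores the centromere region,
# so it only approximates the test DHOEM performs internally.
flag_too_small_chromosomes <- function(snp_counts, nb_more_less) {
  names(snp_counts)[snp_counts < nb_more_less]
}

# Toy counts: chromosome 1 uses the 1026 SNP quoted in the loosening error below,
# the other two values are made up
toy_counts <- c("1" = 1026, "2" = 300, "3" = 120)
flag_too_small_chromosomes(toy_counts, 2000)  # all three chromosomes are too small
flag_too_small_chromosomes(toy_counts, 200)   # only chromosome 3
flag_too_small_chromosomes(toy_counts, 50)    # none: 50 is a safe value here
```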

```r
# Densification with Nb_more_less_SNP_per_chromo_per_run = 2000: too large for chromosome 1
New_data = DHOEM('Haplotype_file.txt', 'Physical_map_file.txt', 'Physical_map_centromeres_file.txt', 
                 1000, 12, 0.01, 2000, 20000 )
```

```
Error in DHOEM("Haplotype_file.txt", "Physical_map_file.txt", "Physical_map_centromeres_file.txt", : 

 DHOEM error message: 

 Impossible to densify chromosome 1 which has 1007 potential SNP ( outside the region involving the centromere) for anchoring 2000 new simulated SNP.

 Please, decrease the parameter Nb_more_less_SNP_per_chromo_per_run 
```
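```r
# Loosening (target of 1000 SNP) with Nb_more_less_SNP_per_chromo_per_run = 2000: too large for chromosome 1
New_data = DHOEM('Haplotype_file.txt', 'Physical_map_file.txt', 'Physical_map_centromeres_file.txt', 
                 1000, 12, 0.01, 2000, 1000 )
```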

```
Error in DHOEM("Haplotype_file.txt", "Physical_map_file.txt", "Physical_map_centromeres_file.txt", : 

 DHOEM error message: 

 Impossible to loosen the marker density of chromosome 1, which has only 1026 SNP, in attempting to suppress 2000 SNP.

 Please, decrease the parameter Nb_more_less_SNP_per_chromo_per_run 
```