## How is the dispersal rate estimation by BEAST affected by sampling area and dispersal model?

This notebook investigates how the **sampling area** and the **underlying dispersal model** influence the accuracy and variability of dispersal rate estimates inferred by BEAST using continuous phylogeographic diffusion models.

---

### 1. GSpace simulation using various sampling areas

We simulate sequence data under controlled spatial population genetic conditions using GSpace. The key variable in this notebook is the **sampling radius** $r$, which defines the sampling area.

#### Sampling area:

The sampling area is defined as a square located at the center of the lattice with area:

$$
A = 2r^2
$$

- A larger $r$ results in broader spatial coverage.
- A smaller $r$ means more spatially restricted sampling.

We vary $r \in \{2, 4, 6, 8\}$, keeping the total number of sampled individuals constant across runs.
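As a quick check of the area formula, the sampling area for each radius and its share of the lattice can be tabulated. This is a minimal sketch: `sampling_area` is a hypothetical helper (not part of the notebook's `utils` modules), and treating the 20 × 20 lattice as 400 unit cells is an assumption made for illustration.

```python
LATTICE_SIDE = 20  # lattice size taken from the fixed parameters


def sampling_area(r):
    """Area of the sampling square, A = 2 * r^2 (hypothetical helper)."""
    return 2 * r ** 2


radii = [2, 4, 6, 8]
areas = [sampling_area(r) for r in radii]
# Assumption: the lattice covers LATTICE_SIDE ** 2 unit cells
fractions = [a / LATTICE_SIDE ** 2 for a in areas]

for r, a, f in zip(radii, areas, fractions):
    print(f"r = {r}: A = {a} ({f:.0%} of the lattice)")
```

Under these assumptions, the largest radius covers roughly a third of the lattice, so even the broadest design samples only part of the simulated population.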

#### Parameters (fixed):

- Lattice size: 20 × 20
- Individuals per node: 30
- Sampled nodes: 4 × 4 (5 individuals per node, 80 haploid individuals in total)
- Chromosome: 1, 1000 bp
- Mutation model: HKY
- Dispersal model: uniform, maximum distance = 1,1
- Output: FASTA + coordinates in header
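To compare BEAST estimates against a known value, it helps to have the per-generation dispersal variance implied by the dispersal model. The sketch below assumes that "uniform, maximum distance = 1,1" means each axis displacement is drawn independently and uniformly from {-1, 0, 1}; this is an assumption for illustration, GSpace's exact definition (for example its handling of the zero step or of the reflecting boundaries) may differ, and all names here are hypothetical.

```python
from fractions import Fraction
from itertools import product

MAX_DIST = 1  # maximum dispersal distance per axis (from the parameter list)

# Assumption: every (dx, dy) with |dx|, |dy| <= MAX_DIST is equally likely
steps = range(-MAX_DIST, MAX_DIST + 1)
moves = list(product(steps, steps))
prob = Fraction(1, len(moves))

# Per-axis variance of the displacement (the mean is zero by symmetry)
sigma2_axis = sum(prob * dx ** 2 for dx, _ in moves)

# Mean squared Euclidean displacement per generation
msd = sum(prob * (dx ** 2 + dy ** 2) for dx, dy in moves)

print(f"per-axis variance: {sigma2_axis}, mean squared displacement: {msd}")
```

Exact fractions are used so the result is not blurred by floating-point rounding; boundary effects are ignored here.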

#### Objective:

To assess whether **increasing the sampling area**:

- Improves the precision or reduces the bias of **diffusion rate estimates**.
- Improves the correlation between true and estimated movement distances.
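One way to make these criteria concrete is to compute a relative bias, a coefficient of variation (as a precision measure), and a Pearson correlation. The sketch below is illustrative only: the function names are hypothetical, the choice of metrics is an assumption rather than the notebook's stated method, and the numbers in the usage lines are made-up placeholders, not results.

```python
import math


def relative_bias(estimates, truth):
    """(mean estimate - true value) / true value."""
    mean_est = sum(estimates) / len(estimates)
    return (mean_est - truth) / truth


def coefficient_of_variation(estimates):
    """Sample standard deviation divided by the mean (lower = more precise)."""
    n = len(estimates)
    mean = sum(estimates) / n
    var = sum((e - mean) ** 2 for e in estimates) / (n - 1)
    return math.sqrt(var) / mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Placeholder values for illustration only (NOT simulation results)
placeholder_estimates = [0.9, 1.1, 1.0]
placeholder_truth = 1.0
print(relative_bias(placeholder_estimates, placeholder_truth))
print(coefficient_of_variation(placeholder_estimates))
```

Computing the same three quantities for each radius would allow the sampling designs to be compared on a common scale.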

### Workflow

In [4]:
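import os      # used below for directory handling (mkdir, chdir, rename)
import shutil
from utils.gspace_utils import *
from utils.beast_utils import *
from utils.file_utils import *

# mutation rate shared by the GSpace simulations and the BEAST XML files
g_mutation_rate = 0.00001
# sampling radii to compare
radii = [2, 4, 6, 8]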

#### 1. Set the working directory

Set the working directory where your GSpaceSettings.txt files will be generated. The results of the analysis will be written here as well.

In [2]:
set_tests_dir()

Moved into 'Tests' directory. Current working directory: /Users/ayoubrayaneaitallaoua/Documents/LIRMM/Pathogen_Dispersal_Rate_Estimation/Tests


#### 2. Generate GSpaceSettings.txt files with variable radii

In [3]:
for radius in radii:
    generate_gspace_settings_variable_sample(r=radius, mutation_rate=g_mutation_rate)

GSpaceSettings_r_2.txt generated with random sampling coordinates in .!
GSpaceSettings_r_4.txt generated with random sampling coordinates in .!
GSpaceSettings_r_6.txt generated with random sampling coordinates in .!
GSpaceSettings_r_8.txt generated with random sampling coordinates in .!


#### 3. Run GSpace simulations

In [4]:
for radius in radii:
    result_dir = f"result_r_{radius}"

    # make a directory for each radius (no error if it already exists)
    os.makedirs(result_dir, exist_ok=True)

    # move GSpaceSettings_r_{radius}.txt into its directory and rename it to
    # GSpaceSettings.txt; the file name case must match the generated file
    shutil.move(
        f"GSpaceSettings_r_{radius}.txt",
        os.path.join(result_dir, "GSpaceSettings.txt"))

    # run GSpace from inside the result directory
    os.chdir(result_dir)
    try:
        run_gspace(gspace_dir="../../../GSpace/build/GSpace")
    finally:
        # always go back to the Tests dir, even if GSpace fails
        os.chdir("..")

reading settings file : GSpaceSettings.txt

Random assignation 1 chromosome MRCA nucleotidic states. Press any key to resume.


         This is  GSpace  v0.1 (Built on Apr 22 2025 at 15:19:09)    
               (Virgoulay et al. 2020 Bioinformatics)                       
         an exact coalescent simulator of genetic /  genomic data           
            under generalized models of isolation by distance               
Settings summary : Generic output filename is simulated_sequences_r_2
 Simulation of 1 data sets
   with 1 chromosomes / independant loci with 1000 linked sites /  loci each. 
 Mutation model is hky
   with a mutation rate of 1e-05 mutations per site per generation.
   and a recombination rate of 0 between adjacent sites per generation.
Homogeneous sample of size (4x4)*5 = 80 haploid individuals 
evolving on a 20 x 20 lattice with reflecting boundaries
  where each node carries 30 individuals.
Dispersal settings are summarized in the simulated_sequences_r_2_GSpace_
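
The loops in this workflow change into each result directory and back again by hand. A small context manager can make that pattern safer, since it restores the previous directory even when a step raises an error. This is a minimal sketch: `working_directory` is a hypothetical helper and is not part of the notebook's `utils` modules.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the current working directory (hypothetical helper)."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # restore the original directory whether or not the body failed
        os.chdir(previous)


# Possible usage inside the loops of this notebook (commented out because
# run_gspace comes from the notebook's own utils):
# for radius in radii:
#     with working_directory(f"result_r_{radius}"):
#         run_gspace(gspace_dir="../../../GSpace/build/GSpace")
```

Recent Python versions also provide `contextlib.chdir` with similar behavior; the hand-written version above works on older interpreters as well.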

#### 4. Generate BEAST XML files

In [5]:
set_tests_dir()

for radius in radii:
    os.chdir(f"result_r_{radius}")
    generate_beast_xml(output_xml=f"r_{radius}.xml", mutation_rate=g_mutation_rate)
    os.chdir("..")

Already in the 'Tests' directory!
Current working directory: /Users/ayoubrayaneaitallaoua/Documents/LIRMM/Pathogen_Dispersal_Rate_Estimation/Tests
For file: simulated_sequences_r_2_Fasta_1.fa BEAST XML generated: r_2.xml
For file: simulated_sequences_r_4_Fasta_1.fa BEAST XML generated: r_4.xml
For file: simulated_sequences_r_6_Fasta_1.fa BEAST XML generated: r_6.xml
For file: simulated_sequences_r_8_Fasta_1.fa BEAST XML generated: r_8.xml


#### 5. Run BEAST for each simulation