# RV-NPL analysis for simulated genes

### Aim
In this notebook I show a workflow that uses RarePedSim to simulate family data, with genotypes generated conditional on affection status, and then uses RV-NPL to analyze the simulated families.

### Workflow process

Here I analyze one gene (MYO7A), whose name is hard-coded in the output paths of this notebook. To run the simulation step:

```
sos run NPL_Simulation.ipynb simulate
```
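The notebook defines three named steps (`simulate`, `collapse`, `npl`), each with a declared output file. As a minimal sketch, assuming each step is launched by its own `sos run` call and that the default `out_dir` is used, the following lists the commands in order and reports which outputs already exist. The helper name `plan` is hypothetical and not part of SoS or RV-NPL.

```python
import shlex
from pathlib import Path

NOTEBOOK = "NPL_Simulation.ipynb"

# Step name -> output path relative to out_dir, copied from each step's `output:` line.
STEPS = {
    "simulate": "MYO7A/rep1.vcf",
    "collapse": "MYO7A/rep1/MERLIN/rep1.chr11.ped",
    "npl": "MYO7A/rep1/pvalue.txt",
}


def plan(out_dir="./data/simulation"):
    """Return (step, command, expected output, exists?) for each step, in order."""
    rows = []
    for step, rel_path in STEPS.items():
        command = shlex.join(["sos", "run", NOTEBOOK, step])
        target = Path(out_dir) / rel_path
        rows.append((step, command, target.as_posix(), target.exists()))
    return rows


if __name__ == "__main__":
    for step, command, target, done in plan():
        status = "done" if done else "todo"
        print(f"{status}  {command}  ->  {target}")
```

The script only prints the commands; it does not run them, so it is safe to use as a checklist before starting the long-running `npl` step.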

First I simulate family data with RarePedSim and write the output as a VCF file. The `collapse` and `npl` steps then run RV-NPL on the compressed VCF. (The parameters are annotated below.)

In [1]:
[global]
# Path to the ped file (6-column PED format)
parameter: ped_file = './data/100extend01.ped'
# Variant information for genes to analyze (sfs format)
parameter: sfs_file = './data/MYO7A.sfs'
# Configuration file with the simulation specification (Mendelian or complex trait; see the RarePedSim documentation)
parameter: conf_file = './data/ComplexPhenotype.conf'
# Output directory for the simulated and analyzed data
parameter: out_dir = './data/simulation'

[simulate]
# Simulate one gene and one replicate, then bgzip-compress and tabix-index the VCF
depends: executable('rarepedsim')
output: f'{out_dir}/MYO7A/rep1.vcf'
bash: expand = '${ }'
    rarepedsim generate -s ${sfs_file} -c ${conf_file} -p ${ped_file} --num_genes 1 --num_reps 1 -o ${out_dir} --vcf --debug -b -1
    bgzip -c ${_output} > ${_output}.gz
    tabix -p vcf ${_output}.gz

[collapse]
# Collapse the rare variants in the simulated VCF; output is written in MERLIN format
depends: executable('rvnpl')
output: f'{out_dir}/MYO7A/rep1/MERLIN/rep1.chr11.ped'
bash: expand = '${ }'
    rvnpl collapse --fam ${ped_file} --vcf ${out_dir}/MYO7A/rep1.vcf.gz --output ${out_dir}/MYO7A/rep1 --freq EVSMAF -c 0.01 --rvhaplo

[npl]
# Run the RV-NPL analysis on the collapsed data and write the p-values
depends: executable('rvnpl')
output: f'{out_dir}/MYO7A/rep1/pvalue.txt'
bash: expand = '${ }'
    rvnpl npl --path ${out_dir}/MYO7A/rep1 --output ${out_dir}/MYO7A/rep1 --exact --info_only --perfect --sall --rvibd --n_jobs 8 -c 0.001 --rep 2000000
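The `ped_file` parameter is described as a 6-column PED file. Below is a minimal sketch of a sanity check to run before simulating. It assumes whitespace-delimited records in the usual order (family, individual, father, mother, sex, phenotype) and the conventional coding in which phenotype `2` means affected. The function names and the example records are made up for illustration; they are not part of RarePedSim or RV-NPL.

```python
from collections import Counter

# Assumed column order of a 6-column PED file.
PED_COLUMNS = ("family", "individual", "father", "mother", "sex", "phenotype")


def read_ped(lines):
    """Parse 6-column PED records, raising ValueError on a malformed row."""
    records = []
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) != len(PED_COLUMNS):
            raise ValueError(
                f"line {number}: expected 6 columns, found {len(fields)}"
            )
        records.append(dict(zip(PED_COLUMNS, fields)))
    return records


def family_sizes(records):
    """Count the members of each family."""
    return Counter(r["family"] for r in records)


def affected_per_family(records):
    """Count members coded as affected (phenotype '2', by assumption)."""
    return Counter(r["family"] for r in records if r["phenotype"] == "2")


# Made-up example records, not taken from 100extend01.ped.
example = """\
fam1 1 0 0 1 1
fam1 2 0 0 2 2
fam1 3 1 2 1 2
fam2 1 0 0 1 1
"""

records = read_ped(example.splitlines())
print(dict(family_sizes(records)))
print(dict(affected_per_family(records)))
```

To check the real pedigree, pass an open file handle instead of the example lines, for instance `read_ped(open('./data/100extend01.ped'))`.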

### Results

<Add Result> 