# xQTL Hierarchical Multiple Testing

This protocol implements a three-step hierarchical multiple testing procedure:
1. Local adjustment: the p-values of all cis-variants are adjusted within each gene.
2. Global adjustment: the minimum adjusted p-value of each gene from Step 1 is further adjusted across all genes.
3. Globally informed identification of significant xQTL: xQTL whose locally adjusted p-value falls below the threshold are reported as significant.
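
The three steps can be sketched in plain Python as follows. This is only an illustration, not the code in `tensorqtl_postprocessor.R`: it assumes Bonferroni for the local adjustment, Benjamini-Hochberg for the global adjustment, and a single `fdr_threshold` applied at both the gene and the variant level. The function names and the toy p-values are hypothetical.

```python
from typing import Dict, List, Tuple


def bonferroni(pvals: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(pvals)
    return [min(1.0, p * n) for p in pvals]


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [1.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def hierarchical_correction(
    pvals_by_gene: Dict[str, Dict[str, float]], fdr_threshold: float = 0.05
) -> Tuple[Dict[str, Dict[str, float]], Dict[str, float], List[Tuple[str, str]]]:
    # Step 1 (assumed Bonferroni): adjust cis-variant p-values within each gene.
    local = {}
    for gene, variants in pvals_by_gene.items():
        names = list(variants)
        local[gene] = dict(zip(names, bonferroni([variants[v] for v in names])))
    # Step 2 (assumed BH): adjust each gene's minimum local p-value across genes.
    genes = list(local)
    gene_min = [min(local[g].values()) for g in genes]
    global_adj = dict(zip(genes, benjamini_hochberg(gene_min)))
    # Step 3 (assumed rule): within globally significant genes, keep variants
    # whose locally adjusted p-value is below the same threshold.
    significant = [
        (g, v)
        for g in genes
        if global_adj[g] < fdr_threshold
        for v, p in local[g].items()
        if p < fdr_threshold
    ]
    return local, global_adj, significant


# Hypothetical nominal p-values, keyed by gene and then by variant.
toy = {
    "gene_a": {"v1": 1e-6, "v2": 0.2, "v3": 0.5},
    "gene_b": {"v4": 0.03, "v5": 0.4},
    "gene_c": {"v6": 0.001, "v7": 0.01},
}
local_adj, gene_adj, hits = hierarchical_correction(toy)
print(hits)  # [('gene_a', 'v1'), ('gene_c', 'v6'), ('gene_c', 'v7')]
```

In this toy input `gene_b` has a nominal p-value of 0.03, but its locally adjusted minimum (0.06) does not pass the global step, so none of its variants are reported.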

It also reorganizes the intermediate files produced by the tensorQTL analysis into a specified archive folder, for bookkeeping or for deletion to save space.

## Example command

```bash
sos run ~/GIT/xqtl-protocol/code/association_scan/qtl_association_postprocessing.ipynb default \
    --gene-coordinates look_up_gene_id.tsv \
    --cwd ~/Downloads/snuc_DeJager_Ast_tensorQTL_MWE --sub-dir LR \
    --output-dir ~/output --archive-dir ~/archive
```

When `maf_cutoff` and `cis_window` are not zero, the program first computes the number of variants that remain after filtering by these criteria and writes files with the `n_variants_suffix` suffix to the same folder as the QTL data. It then uses those counts to create the filtered list of variants for the Bonferroni-adjusted p-values.
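
A minimal sketch of this count-then-adjust idea is shown below. It is not the protocol's implementation: the record layout, the `af` field, and the exact filtering rules (minor allele frequency at least `maf_cutoff`, absolute `start_distance` at most `cis_window`, a value of zero disabling a filter) are assumptions made for illustration.

```python
from collections import Counter


def count_filtered_variants(records, maf_cutoff=0.05, cis_window=1_000_000):
    """Count, per gene, the variants that survive the MAF and cis-window filters."""
    counts = Counter()
    for rec in records:
        # Fold the allele frequency so it is always the minor allele frequency.
        maf = min(rec["af"], 1.0 - rec["af"])
        if maf_cutoff > 0 and maf < maf_cutoff:
            continue
        # Assumed rule: distance is measured from the TSS (start_distance).
        if cis_window > 0 and abs(rec["start_distance"]) > cis_window:
            continue
        counts[rec["gene"]] += 1
    return counts


def bonferroni_with_count(pvalue, n_variants):
    """Bonferroni adjustment using the filtered variant count as the number of tests."""
    return min(1.0, pvalue * n_variants)


# Hypothetical association records.
records = [
    {"gene": "gene_a", "af": 0.30, "start_distance": 5_000},
    {"gene": "gene_a", "af": 0.01, "start_distance": 2_000},      # fails MAF filter
    {"gene": "gene_a", "af": 0.90, "start_distance": -800_000},   # MAF is 0.10, kept
    {"gene": "gene_a", "af": 0.40, "start_distance": 1_500_000},  # outside cis window
    {"gene": "gene_b", "af": 0.50, "start_distance": 100},
]
n_variants = count_filtered_variants(records)
print(dict(n_variants))  # {'gene_a': 2, 'gene_b': 1}
adjusted = bonferroni_with_count(1e-4, n_variants["gene_a"])
```

Counting after filtering matters because the Bonferroni multiplier is the number of tests actually considered for the gene, which shrinks once rare or distant variants are excluded.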

In [None]:
[global]
parameter: cwd = path(".")
parameter: gene_coordinates = path
parameter: output_dir = path
parameter: archive_dir = path
parameter: sub_dir = "LR"
parameter: maf_cutoff = 0.05
parameter: cis_window = 1000000
parameter: tss_dist_col = "start_distance"
parameter: tes_dist_col = "end_distance"
# Selects the subset of data to process for potential signals,
# assuming we drop associations with p-values above this threshold.
# This might lead to underestimates in the qvalue method, since qvalue < 0.05 may contain pvalue > 0.05
parameter: pvalue_cutoff = 0.05
# This is used for both event and variant level significance filter
parameter: fdr_threshold = 0.05
parameter: regional_pattern = "*.cis_qtl.regional.tsv.gz$"
parameter: n_variants_suffix = ".cis_n_variants_stats.tsv.gz"
parameter: qtl_pattern = "*.cis_qtl.pairs.tsv.gz$"

In [1]:
[default]
output: f"{output_dir:a}/{cwd:ab}/{sub_dir}/{sub_dir}_consolidated.rds"
task: trunk_workers = 1, tags = f'tensorqtl_postprocessing_{_output:n}'
R: expand = "${ }"

    params <- list()
    params$workdir            <- "${cwd:a}/${sub_dir}"
    params$sub_dir            <- "${sub_dir}"
    params$maf_cutoff         <- ${maf_cutoff}
    params$cis_window         <- ${cis_window}
    params$pvalue_cutoff      <- ${pvalue_cutoff}
    params$fdr_threshold      <- ${fdr_threshold}
    params$gene_coordinates   <- "${gene_coordinates:a}"
    params$output_dir         <- "${output_dir:a}/${cwd:ab}/${sub_dir}"
    params$archive_dir        <- "${archive_dir:a}/${cwd:ab}"
    params$regional_pattern   <- "${regional_pattern}"
    params$n_variants_suffix  <- "${n_variants_suffix}"
    params$qtl_pattern        <- "${qtl_pattern}"
    params$start_distance_col <- "${tss_dist_col}"
    params$end_distance_col   <- "${tes_dist_col}"

    source("~/GIT/pecotmr/inst/code/tensorqtl_postprocessor.R")
    results <- hierarchical_multiple_testing_correction(params)
    write_results(results, params$output_dir, params$workdir, to_cwd = "regional")
    archive_files(params)
    saveRDS(results, ${_output:r})
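
To see where the consolidated result lands, the `output:` expression of the `[default]` step can be mimicked in plain Python. This sketch assumes that the SoS format specifier `:a` yields the absolute path and `:ab` the basename of the absolute path; the helper name `consolidated_output` is hypothetical and not part of the protocol.

```python
import os
from pathlib import Path


def consolidated_output(output_dir: str, cwd: str, sub_dir: str) -> Path:
    """Mimic f"{output_dir:a}/{cwd:ab}/{sub_dir}/{sub_dir}_consolidated.rds"."""
    # Assumed equivalent of {output_dir:a}: expand "~" and make the path absolute.
    out_abs = Path(os.path.abspath(os.path.expanduser(output_dir)))
    # Assumed equivalent of {cwd:ab}: last component of the absolute cwd.
    cwd_base = Path(os.path.abspath(os.path.expanduser(cwd))).name
    return out_abs / cwd_base / sub_dir / f"{sub_dir}_consolidated.rds"


# Arguments taken from the example command in this notebook.
print(consolidated_output(
    "~/output", "~/Downloads/snuc_DeJager_Ast_tensorQTL_MWE", "LR"
))
```

Under these assumptions the example command writes `LR_consolidated.rds` to the `snuc_DeJager_Ast_tensorQTL_MWE/LR` subfolder of the expanded `~/output` directory, and `params$archive_dir` points to a folder with the same basename under `~/archive`.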