# Univariate Fine-Mapping of Functional (Epigenomic) Data with fSuSiE


## Description

Univariate fine-mapping for functional (epigenomic) data is conducted with fSuSiE. The procedure is similar to standard univariate fine-mapping; the main difference is the use of epigenomic data.


## Input


    
`--genoFile`: path to a text file containing information on the genotype files. For example:
```
#id     #path
21      $PATH/protocol_example.genotype.chr21_22.21.bed
22      $PATH/protocol_example.genotype.chr21_22.22.bed
```
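
A listing like the one above can be generated rather than typed by hand. The sketch below is only an illustration: it assumes one PLINK `.bed` file per chromosome, that the chromosome id is the last dot-separated token before `.bed` (as in the example file names), and that the columns are tab-separated. The temporary directory and the empty placeholder files are hypothetical stand-ins for real genotype data.

```bash
#!/bin/sh
set -eu

# Hypothetical working directory holding one PLINK .bed file per chromosome;
# empty placeholder files stand in for real genotype data here.
geno_dir=$(mktemp -d)
touch "$geno_dir/protocol_example.genotype.chr21_22.21.bed" \
      "$geno_dir/protocol_example.genotype.chr21_22.22.bed"

# Write the header, then one "<chromosome id><TAB><path>" row per .bed file.
geno_list="$geno_dir/genotype_by_chrom_files.txt"
printf '#id\t#path\n' > "$geno_list"
for bed in "$geno_dir"/*.bed; do
    base=$(basename "$bed" .bed)   # strip the directory and the .bed suffix
    chr_id=${base##*.}             # text after the last dot, e.g. 21
    printf '%s\t%s\n' "$chr_id" "$bed" >> "$geno_list"
done

cat "$geno_list"
```

The resulting file can then be passed to `--genoFile`.
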
`--phenoFile`: a tab-delimited file containing the chr, start, end, ID and path of each region. For example:
```
#chr    start   end     ID      path
chr21   0       14120807        TADB_1297       $PATH/protocol_example.ha.bed.gz
chr21   10840000        16880069        TADB_1298       $PATH/protocol_example.ha.bed.gz
```
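
Before launching a long run, it can help to sanity-check the region list. The following is a minimal sketch, not part of the pipeline: it rebuilds the example above in a temporary file (the phenotype paths are placeholders) and counts rows that do not have exactly five tab-separated fields or whose start is not smaller than their end.

```bash
#!/bin/sh
set -eu

# Hypothetical copy of the example above; the phenotype paths are placeholders.
pheno_list=$(mktemp)
printf '#chr\tstart\tend\tID\tpath\n' > "$pheno_list"
printf 'chr21\t0\t14120807\tTADB_1297\tprotocol_example.ha.bed.gz\n' >> "$pheno_list"
printf 'chr21\t10840000\t16880069\tTADB_1298\tprotocol_example.ha.bed.gz\n' >> "$pheno_list"

# Count data rows that do not have exactly 5 tab-separated fields
# or whose start coordinate is not smaller than the end coordinate.
bad_rows=$(awk -F'\t' '
    /^#/ { next }
    NF != 5 || ($2 + 0) >= ($3 + 0) { bad++ }
    END { print bad + 0 }
' "$pheno_list")

# Count the data rows (everything except the header line).
n_regions=$(awk '!/^#/ { n++ } END { print n + 0 }' "$pheno_list")

echo "regions: $n_regions, malformed rows: $bad_rows"
```
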

`--covFile`: path to a gzipped file containing covariates in the rows and sample IDs in the columns.  
`--customized-association-windows`: a tab-delimited file containing the chr, start, end, and ID of each region. For example:
```
#chr    start   end     ID
chr21   0       14120807        TADB_1297
chr21   10840000        16880069        TADB_1298
```
`--region-name`: to analyze a single region, pass the ID of a region listed in the `--customized-association-windows` file.
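
Because `--region-name` must match an ID in the windows file, a quick lookup can catch typos before the run. The sketch below assumes the four-column tab-delimited layout shown above; `region_coords` is a hypothetical helper written for this illustration, not a pipeline function.

```bash
#!/bin/sh
set -eu

# Hypothetical windows file mirroring the example above.
windows=$(mktemp)
printf '#chr\tstart\tend\tID\n' > "$windows"
printf 'chr21\t0\t14120807\tTADB_1297\n' >> "$windows"
printf 'chr21\t10840000\t16880069\tTADB_1298\n' >> "$windows"

# Print the coordinates of a region ID as chr:start-end, or nothing if absent.
# Usage: region_coords <windows file> <region ID>
region_coords() {
    awk -F'\t' -v id="$2" '!/^#/ && $4 == id { print $1 ":" $2 "-" $3 }' "$1"
}

region=TADB_1298
coords=$(region_coords "$windows" "$region")
if [ -n "$coords" ]; then
    echo "$region found at $coords; pass it as --region-name $region"
else
    echo "$region is not listed in $windows" >&2
fi
```
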

## Output

* `*_marks.dataset.rds`
```
> str(readRDS("/restricted/projectnb/xqtl/xqtl_protocol/toy_xqtl_protocol/output/fsusie/fsus/Mic.chr7_139293693_145380632.16_marks.dataset.rds"))
List of 1
 $ chr7:139293693-145380632:List of 13
  ..$ residual_Y       :List of 1
  .. ..$ ROSMAP_Mic_snATACQTL: num [1:65, 1:166] -0.0444 -0.3137 -0.0634 0.0658 0.8817 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
  .. .. .. ..$ : NULL
  ..$ residual_X       :List of 1
  .. ..$ : num [1:65, 1:15857] -0.4025 -0.8767 -0.2054 -0.0908 0.6848 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
  .. .. .. ..$ : chr [1:15857] "chr7:139302775:AACACACACAC:AACACACACACAC" "chr7:139302775:AACACACACAC:AACACACACACACAC" "chr7:139304706:G:GGT" "chr7:139305695:G:T" ...
  ..$ residual_Y_scalar: num 1
  ..$ residual_X_scalar: num 1
  ..$ covar            :List of 1
  .. ..$ : num [1:65, 1:48] 1 1 1 0 0 0 0 0 1 0 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
  .. .. .. ..$ : NULL
  ..$ Y                :List of 1
  .. ..$ : num [1:65, 1:166] 1.99 2.19 2.02 2.48 3.42 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
  .. .. .. ..$ : NULL
  ..$ X_data           :List of 1
  .. ..$ : num [1:65, 1:15857] 0 0 1 1 1 0 2 0 1 1 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
  .. .. .. ..$ : chr [1:15857] "chr7:139302775:AACACACACAC:AACACACACACAC" "chr7:139302775:AACACACACAC:AACACACACACACAC" "chr7:139304706:G:GGT" "chr7:139305695:G:T" ...
  ..$ maf              :List of 1
  .. ..$ : Named num [1:15857] 0.2692 0.1538 0.0692 0.0923 0.4615 ...
  .. .. ..- attr(*, "names")= chr [1:15857] "chr7:139302775:AACACACACAC:AACACACACACAC" "chr7:139302775:AACACACACAC:AACACACACACACAC" "chr7:139304706:G:GGT" "chr7:139305695:G:T" ...
  ..$ grange           : chr [1:2] "139293693" "145380632"
  ..$ Y_coordinates    :List of 1
  .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	166 obs. of  3 variables:
  .. .. ..$ #chr : chr [1:166] "chr7" "chr7" "chr7" "chr7" ...
  .. .. ..$ start: num [1:166] 1.39e+08 1.39e+08 1.39e+08 1.39e+08 1.39e+08 ...
  .. .. ..$ end  : num [1:166] 1.39e+08 1.39e+08 1.39e+08 1.39e+08 1.39e+08 ...
  ..$ dropped_sample   :List of 3
  .. ..$ X    :List of 1
  .. .. ..$ : chr [1:3] "sample_71" "sample_6" "sample_46"
  .. ..$ Y    :List of 1
  .. .. ..$ : chr [1:5] "sample_1" "sample_6" "sample_7" "sample_46" ...
  .. ..$ covar:List of 1
  .. .. ..$ : chr [1:3] "sample_1" "sample_47" "sample_7"
  ..$ X                : num [1:65, 1:15857] 0 0 1 1 1 0 2 0 1 1 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:65] "sample_10" "sample_11" "sample_12" "sample_13" ...
  .. .. ..$ : chr [1:15857] "chr7:139302775:AACACACACAC:AACACACACACAC" "chr7:139302775:AACACACACAC:AACACACACACACAC" "chr7:139304706:G:GGT" "chr7:139305695:G:T" ...
  ..$ chrom            : chr "chr7"
```

* `*_top_pc_weights.rds`
```
> str(readRDS("/restricted/projectnb/xqtl/xqtl_protocol/toy_xqtl_protocol/output/fsusie/fsus/Mic.chr7_139293693_145380632.fsusie_mixture_normal_none__top_pc_weights.rds"),max.level = 3)
List of 1
 $ chr7:139293693-145380632:List of 1
  ..$ ROSMAP_Mic_snATACQTL:List of 10
  .. ..$ susie_on_top_pc           :List of 1
  .. ..$ susie_weights_intermediate:List of 6
  .. ..$ twas_weights              :List of 6
  .. ..$ twas_predictions          :List of 6
  .. ..$ twas_cv_result            :List of 4
  .. ..$ total_time_elapsed        : 'proc_time' Named num [1:5] 31992.49 40.47 32206.65 304.39 4.65
  .. .. ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
  .. ..$ fsusie_result             :List of 34
  .. .. ..- attr(*, "class")= chr "susiF"
  .. ..$ Y_coordinates             :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	166 obs. of  3 variables:
  .. ..$ fsusie_summary            :List of 5
  .. ..$ region_info               :List of 3
```

## Minimal Working Example Steps

### iii. [Run the Fine-Mapping with fSuSiE](https://statfungen.github.io/xqtl-protocol/code/mnm_analysis/mnm_methods/mnm_regression.html#ii-fsusie)

```
sos run pipeline/mnm_regression.ipynb fsusie \
    --cwd output/fsusie/ \
    --name Mic \
    --genoFile data/fsusie/mwe.genotype_by_chrom_files.txt \
    --phenoFile data/fsusie/mwe.pheno.region_list \
    --covFile data/fsusie/mwe.chr7_139293693_145380632.Marchenko_PC.anon.gz \
    --cis-window 0 --max-cv-variants 5000 \
    --susie_top_pc 0 --phenotype-names ROSMAP_Mic_snATACQTL --maf 0.01 \
    --save-data \
    --numThreads 8 \
    --post_processing "none" --small-sample-correction
```
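
Since the elapsed time recorded in the example output suggests that a run can take hours, a short pre-flight check of the input paths may save a failed start. This is a sketch under the assumption that the command is launched from the protocol's top-level directory; `check_inputs` is a hypothetical helper, not part of SoS or the pipeline.

```bash
#!/bin/sh
set -eu

# Report every missing input file; return non-zero if any is absent.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Paths taken from the command above, relative to the protocol directory.
if check_inputs \
    pipeline/mnm_regression.ipynb \
    data/fsusie/mwe.genotype_by_chrom_files.txt \
    data/fsusie/mwe.pheno.region_list \
    data/fsusie/mwe.chr7_139293693_145380632.Marchenko_PC.anon.gz
then
    echo "all inputs found; ready to run sos"
else
    echo "fix the paths listed above before running sos" >&2
fi
```
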

## Anticipated Results

Univariate fine-mapping for functional data produces two files: one containing results for the top hits and one containing the residuals from SuSiE.

`Mic.chr7_139293693_145380632.fsusie_mixture_normal_none__top_pc_weights.rds`:
* For each region of interest, this file contains:
    1. `susie_on_top_pc`
    2. `susie_weights_intermediate`
    3. `twas_weights` - weights for each variant (for the enet, lasso and mrash methods)
    4. `twas_predictions` - predictions for each sample (for the enet, lasso and mrash methods)
    5. `twas_cv_result` - cross-validation results, including information on the best method; the data are split into five parts
    6. `fsusie_result` - the fSuSiE results
    7. `Y_coordinates`
    8. `fsusie_summary`
    9. `total_time_elapsed`
    10. `region_info` - information on the region specified

`Mic.chr7_139293693_145380632.16_marks.dataset.rds`:
* For each region of interest, this file contains the residuals for each sample and phenotype.
* See the [pecotmr code](https://github.com/statfungen/pecotmr/blob/68d87ca1d0a059022bf4e55339621cbddc8993cc/R/file_utils.R#L461) for a description of the fields. fSuSiE uses the `load_regional_functional_data` function; its arguments are explained under the similar `load_regional_association_data` function.