Zeyun edited this page Aug 12, 2022 · 11 revisions

Welcome to the focus wiki!

Here you'll find details on how to import weights, train weights using your own data, clean GWAS summary statistics, and perform fine-mapping on TWAS results in single-ancestry or multi-ancestry settings.

Please see the sidebar for links to each.

Reference LD

We recommend using the reference LD panel from LDSC. It can be downloaded and unpacked from the command line as follows:

wget https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_plinkfiles.tgz
tar -xvzf 1000G_Phase3_plinkfiles.tgz

We recommend using the multi-tissue, multi-panel eQTL weight database downloaded below. It combines GTEx v7 weights from PrediXcan with METSIM, NTR, YFS, and CMC weights from the FUSION software into a single database usable by FOCUS.

wget -O focus.db "https://www.dropbox.com/s/ep3dzlqnp7p8e5j/focus.db?dl=0"

LD Blocks

We recommend using the independent genomic regions generated by LDetect, proposed in

Berisa, T., and Pickrell, J.K. (2016). Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285. DOI: 10.1093/bioinformatics/btv546

and, for multi-ancestry analyses, the independent genomic regions across multiple ancestries generated by a modified LDetect, proposed in

Shi, H., Burch, K.S., Johnson, R., Freund, M.K., Kichaev, G., Mancuso, N., Manuel, A.M., Dong, N., and Pasaniuc, B. (2020). Localizing Components of Shared Transethnic Genetic Architecture of Complex Traits from GWAS Summary Data. Am. J. Hum. Genet. 106, 805–817. DOI: 10.1016/j.ajhg.2020.04.012

The software ships with several sets of genome-wide independent regions:

  • grch37.eur.afr.loci.bed: use --locations 37:EUR-AFR for independent genomic regions across EUR and AFR ancestries, or --locations 38:EUR-AFR for the GRCh38 version.

  • grch37.eur.eas.afr.loci.bed: use --locations 37:EUR-EAS-AFR for independent genomic regions across EUR, EAS, and AFR ancestries, or --locations 38:EUR-EAS-AFR for the GRCh38 version.

  • grch37.eur.eas.loci.bed: use --locations 37:EUR-EAS for independent genomic regions across EUR and EAS ancestries, or --locations 38:EUR-EAS for the GRCh38 version.

  • grch37.eur.loci.bed: use --locations 37:EUR for independent genomic regions in EUR ancestry, or --locations 38:EUR for the GRCh38 version.

You can use your own independent risk region files by specifying the file path directly after --locations. Please make sure that your bed files contain the column names chrom, start, and stop for the chromosome name, region start position, and region stop position. Values in chrom must be integers such as 1, 2, 3 (not chr1, chr2, chr3, ...).
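As a rough sketch of preparing a custom regions file, the shell function below rewrites a headerless BED file with chr-prefixed chromosome names into the layout described above. The function name to_focus_bed, the file names, and the tab-separated output are assumptions made for illustration and are not part of FOCUS. Rows whose chromosome is not an integer after the prefix is stripped (for example chrX) are dropped.

```shell
# Hypothetical helper, not part of FOCUS: rewrite a headerless BED file so it
# has the chrom/start/stop header and integer chromosome names.
# Assumes the first three columns are chromosome, start, and stop.
to_focus_bed() {
    awk 'BEGIN { OFS = "\t"; print "chrom", "start", "stop" }
         { sub(/^chr/, "", $1) }                   # chr1 becomes 1
         $1 ~ /^[0-9]+$/ { print $1, $2, $3 }      # skip X, Y, MT, ...
    ' "$1"
}

# Example (file names are placeholders):
#   to_focus_bed my_regions.bed > my_regions.focus.bed
```

The resulting file path can then be given after --locations. If your file already has a header line, remove it first, since this sketch treats every input line as a region.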

Gencode Files

Default gencode files for both v37 and v38 ship with the software. Select one with --prior-prob gencode38 or --prior-prob gencode37.

You can use your own gencode files by specifying the file path directly after --prior-prob. Please make sure that your files are in tsv format and contain the column names chrom, start, stop, and gene_name for the chromosome name, start position, stop position, and gene name. Values in chrom must be integers such as 1, 2, 3 (not chr1, chr2, chr3, ...).

We recommend making sure that your gencode files do not contain duplicated rows (that is, each gene name appears only once).
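One way to check a custom gencode file for duplicates is sketched below. The function name find_duplicate_genes and the file name are hypothetical and not part of FOCUS. The sketch assumes a tab-separated file whose first line is the header and locates the gene_name column by name.

```shell
# Hypothetical helper, not part of FOCUS: print every gene name that occurs
# more than once in a tab-separated file with a gene_name header column.
# Exits with status 2 if the header has no gene_name column.
find_duplicate_genes() {
    awk -F '\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "gene_name") col = i
            if (!col) exit 2
            next
        }
        seen[$col]++ == 1 { print $col }   # report each duplicate once
    ' "$1"
}

# Example (file name is a placeholder):
#   find_duplicate_genes my_gencode.tsv
```

Empty output means every gene name is unique. Any names that are printed should be resolved before passing the file to --prior-prob.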

Instead of using gencode files, you can also specify the prior probability that a gene is causal directly as a number (e.g., 0.01 or 0.005).