<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Software-required" data-toc-modified-id="Software-required-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Software required</a></span></li><li><span><a href="#Downloading-data" data-toc-modified-id="Downloading-data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Downloading data</a></span><ul class="toc-item"><li><span><a href="#Data-provided-and-folder-hierarchy" data-toc-modified-id="Data-provided-and-folder-hierarchy-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Data provided and folder hierarchy</a></span></li><li><span><a href="#1000Genomes-data" data-toc-modified-id="1000Genomes-data-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>1000Genomes data</a></span></li></ul></li><li><span><a href="#Running-LMI" data-toc-modified-id="Running-LMI-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Running LMI</a></span><ul class="toc-item"><li><span><a href="#Download-the-scripts" data-toc-modified-id="Download-the-scripts-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Download the scripts</a></span></li><li><span><a href="#Format-sumstats-file:" data-toc-modified-id="Format-sumstats-file:-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Format sumstats file:</a></span></li><li><span><a href="#Now-run-the-script-this-like-this." data-toc-modified-id="Now-run-the-script-this-like-this.-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Now run the script this like this.</a></span></li></ul></li><li><span><a href="#Format-&amp;-prepare-GWAS-sumstats-and-LMI-results" data-toc-modified-id="Format-&amp;-prepare-GWAS-sumstats-and-LMI-results-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Format &amp; prepare GWAS sumstats and LMI results</a></span></li></ul></div>

### Software required

Download and install the following software and make sure each tool is on your `PATH`:

- PLINK 1.9: https://www.cog-genomics.org/plink/1.9/
- VCFtools: http://vcftools.sourceforge.net/
- R: https://cran.r-project.org/
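
A small POSIX-shell check can confirm that the tools are reachable before you start. This is a minimal sketch and not part of the LMI scripts. The executable names `plink`, `vcftools` and `Rscript` are assumptions about how the tools are installed on your system.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands cannot be found on $PATH.
# Returns 0 if every command is found, 1 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found:   %s -> %s\n' "$tool" "$(command -v "$tool")"
    else
      printf 'MISSING: %s\n' "$tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_tools plink vcftools Rscript` lists any tool that still needs to be added to your `PATH`.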

### Downloading data 

Download the archive **`data_LMI.tar.gz`** from the following Google Drive link:

https://drive.google.com/file/d/11jaU23jGxW6pdXSkjalrcSe5w1qD9wYf/view?usp=sharing

Move it to your working directory and unpack it with:

In [None]:
tar xvzf data_LMI.tar.gz

#### Data provided and folder hierarchy
This `data` folder contains several files and sub-folders:

**Folders:**
- `results_LMI`: Folder where the LMI calculations will be stored
- `sumstats`: Folder where the script `run_LMI.sh` will place processed files derived from the sumstats (sumstats split by chromosome, cumulative distribution functions, list of SNP codes)
- `snp_files`: Same as above, but for SNP-derived intermediate calculations
- `1000genomes_CEU_MAF001.tar.gz`: Compressed archive with the 1000 Genomes reference files (see next section).

**Files:**
- `sumstats.raw.pv1e4`: List of the 143 SNPs with p-value <= 1e-04
- `CEU_indv.txt`: See below.
- `sumstats.raw`: Summary statistics for the full set of SNPs and all 3 cohorts
- `sumstats.raw.pangen_isblac_epicuro`: Sumstats for the cohort used here to calculate LMI.
- `LMI_pv.snps`: The list of 624 SNPs selected through p-value and LMI.
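
After unpacking, you can check that your folder matches the hierarchy above. A minimal sketch: the expected names are taken from the lists above, while the helper `check_data_layout` itself is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check that an unpacked data folder contains the
# folders and files listed above. Returns 1 and reports what is missing.
check_data_layout() {
  local dir=${1:-.} status=0 d f
  for d in results_LMI sumstats snp_files; do
    [ -d "$dir/$d" ] || { printf 'missing folder: %s\n' "$d" >&2; status=1; }
  done
  for f in 1000genomes_CEU_MAF001.tar.gz sumstats.raw sumstats.raw.pv1e4 \
           sumstats.raw.pangen_isblac_epicuro CEU_indv.txt LMI_pv.snps; do
    [ -f "$dir/$f" ] || { printf 'missing file: %s\n' "$f" >&2; status=1; }
  done
  return "$status"
}
# Example: check_data_layout your_working_dir/data
```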


#### 1000Genomes data
To calculate linkage disequilibrium and population frequencies we need a reference population. You can use a VCF from any population, but our study used the CEU population (n=85; see `CEU_indv.txt` in the `data` folder for the list of individuals) and excluded variants with MAF < 0.01.

Under the `data` folder, the archive `1000genomes_CEU_MAF001.tar.gz` provides files for the 22 autosomes, filtered to:
- Contain only CEU individuals (n=85)
- Contain only common variants, i.e. those with MAF >= 0.01 in the selected population
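
The provided files are already filtered, so this step is optional. If you want to build a comparable reference yourself (for another population, say), VCFtools can apply both filters: `--keep` retains the listed individuals and `--maf` keeps sites with MAF >= the given value. This is a sketch under assumptions: the input is a bgzipped per-chromosome 1000 Genomes VCF (the file name in the example is a placeholder), the individuals file has one sample ID per line, and this is not necessarily how the provided files were produced.

```shell
#!/bin/sh
# Hypothetical helper: keep only the individuals listed in a --keep file
# (e.g. CEU_indv.txt) and only sites with MAF >= 0.01, using vcftools.
#   $1  input per-chromosome VCF (bgzipped, .vcf.gz)
#   $2  file with one individual ID per line
#   $3  output prefix; vcftools writes "$3.recode.vcf"
# Set DRY_RUN=1 to print the command instead of running it.
filter_ref_chr() {
  set -- vcftools --gzvcf "$1" --keep "$2" --maf 0.01 \
         --recode --recode-INFO-all --out "$3"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
# Example (placeholder file name):
#   filter_ref_chr ALL.chr22.vcf.gz CEU_indv.txt chr22_CEU_MAF001
```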

Move into the `your_working_dir/data/` folder you just unpacked and extract this archive as well:

In [None]:
tar xvzf 1000genomes_CEU_MAF001.tar.gz

### Running LMI

#### Download the scripts
Download the repository as a compressed archive from https://github.com/pollicipes/Local-Moran-Index-1D/, or clone it.
Move into that folder, where you will find the scripts.
To calculate LMI, you only have to run the main script `run_LMI.sh` with the commands below.

#### Format sumstats file:
The sumstats columns **MUST** be in this order (the header strings need not be identical, but each column must contain what its header refers to):

`SNP	CHR	BP	OR	P`

These are, respectively: SNP identifier, chromosome, base-pair position, effect size (odds ratio), and p-value.
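
A quick pre-flight check can catch a malformed or misordered file before a long run. This is a sketch with `awk`, assuming a whitespace-delimited file with one header line. The checks (five columns, integer `BP`, numeric `P` in [0, 1]) are our own choices, not something `run_LMI.sh` is known to enforce.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check a sumstats file laid out as SNP CHR BP OR P.
# Prints the first problem found and returns 1; returns 0 if every row passes.
check_sumstats() {
  awk '
    NR == 1 {
      if (NF != 5) { printf "header: %d columns, expected 5\n", NF; bad = 1; exit }
      next
    }
    NF != 5 {
      printf "line %d: %d columns, expected 5\n", NR, NF; bad = 1; exit
    }
    $3 !~ /^[0-9]+$/ {
      printf "line %d: BP is not an integer: %s\n", NR, $3; bad = 1; exit
    }
    $5 !~ /^([0-9]*\.)?[0-9]+([eE][-+]?[0-9]+)?$/ || $5 + 0 > 1 {
      printf "line %d: P is not a number in [0, 1]: %s\n", NR, $5; bad = 1; exit
    }
    END { exit bad }
  ' "$1"
}
```

For example, run `check_sumstats sumstats.raw.pangen_isblac_epicuro` from the `data` folder before launching `run_LMI.sh`.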

In [None]:
# Move into the folder you just downloaded
cd ${YOUR_PATH}/Local-Moran-Index-1D/

#### Now run the script this like this.
Set the variables and options as you prefer. Each call processes a single chromosome, so you can parallelize by running one chromosome per processor.

In [None]:
# Positional arguments for run_lmi.sh, in order:
DATA_DIR=${YOUR_PATH}/data/                               # Starting folder: full path up to the data folder
KG_DIR=${YOUR_PATH}/data/chr_files_1KG_CEU_MAF001/        # Full path to the 1000 Genomes files
SUMSTATS=${YOUR_PATH}/sumstats.raw.pangen_isblac_epicuro  # Summary statistics (should be in the starting folder)
CHR=22                                                    # Chromosome to process
WINDOW=500000                                             # Window over which LMI is calculated
MAF_DIFF=0.05                                             # Difference allowed in MAF
ALIAS=alias                                               # Alias/prefix to identify the sample

./run_lmi.sh "$DATA_DIR" "$KG_DIR" "$SUMSTATS" "$CHR" "$WINDOW" "$MAF_DIFF" "$ALIAS"
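
Because each call handles a single chromosome, the 22 autosomes can be processed in parallel. One option is `xargs -P`, sketched below. The wrapper `run_lmi_all_chr` and its `LMI_SCRIPT`/`NJOBS` variables are hypothetical. The argument order mirrors the single-chromosome call above, with the chromosome filled in by `xargs`.

```shell
#!/bin/sh
# Hypothetical wrapper: run the LMI script once per autosome (1-22),
# NJOBS chromosomes at a time (default 4).
#   $1 data folder   $2 1000 Genomes folder   $3 sumstats
#   $4 window        $5 MAF difference        $6 alias
# LMI_SCRIPT defaults to ./run_lmi.sh.
run_lmi_all_chr() {
  seq 1 22 | xargs -P "${NJOBS:-4}" -I{} \
    "${LMI_SCRIPT:-./run_lmi.sh}" "$1" "$2" "$3" {} "$4" "$5" "$6"
}
```

For example, `NJOBS=8 run_lmi_all_chr ${YOUR_PATH}/data/ ${YOUR_PATH}/data/chr_files_1KG_CEU_MAF001/ ${YOUR_PATH}/sumstats.raw.pangen_isblac_epicuro 500000 0.05 alias`. Keep `NJOBS` within the cores and memory you have available.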

### Format & prepare GWAS sumstats and LMI results

Our sumstats include several cohorts, each with its own p-values and effect sizes. We selected the `pangen_isblac_epicuro` cohort.

In [5]:
# working folder
wd=${YOUR_PATH}/data;
cd ${wd};

In [6]:
# From the full raw sumstats, keep only the pangen_isblac_epicuro columns
# (fields 1-4 and 6), drop the original header and write a new one
# in the required column order.
{ printf 'SNP\tCHR\tBP\tOR\tP\n'; cut -f1-4,6 sumstats.raw | sed 1d; } > sumstats.raw.pangen_isblac_epicuro
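
Selecting columns by position (`-f1-4,6`) depends on the exact layout of `sumstats.raw`. If you prefer to select them by header name, here is a sketch. The helper `select_cols` is hypothetical. It assumes a tab-delimited file with a header line and column names without spaces. The names in the example comment are placeholders for whatever your file actually uses. The output keeps the original header names, which is acceptable because only the column order matters.

```shell
#!/bin/sh
# Hypothetical helper: print the named columns of a tab-delimited file,
# in the order given, looking them up in the header line.
# Usage: select_cols FILE NAME1 NAME2 ... > OUTPUT
select_cols() {
  local file=$1
  shift
  awk -F'\t' -v OFS='\t' -v want="$*" '
    NR == 1 {
      n = split(want, names, " ")
      for (i = 1; i <= NF; i++) idx[$i] = i
      for (j = 1; j <= n; j++)
        if (!(names[j] in idx)) {
          printf "column not found: %s\n", names[j] > "/dev/stderr"
          exit 1
        }
    }
    {
      line = $(idx[names[1]])
      for (j = 2; j <= n; j++) line = line OFS $(idx[names[j]])
      print line
    }
  ' "$file"
}
# Example (placeholder column names):
#   select_cols sumstats.raw SNP CHR BP OR_cohort P_cohort > sumstats.selected
```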

In [7]:
# Count markers with p-value < 1e-4 in this cohort
awk '$5 < 0.0001' sumstats.raw.pangen_isblac_epicuro | wc -l

89
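
The following variant skips the header line explicitly and breaks the count down by chromosome. It assumes the `SNP CHR BP OR P` column order. This is a sketch, and `count_hits_by_chr` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: count markers with P < 1e-4 per chromosome,
# skipping the header line (CHR in field 2, P in field 5).
# Prints "CHR<TAB>count" lines sorted by chromosome number.
count_hits_by_chr() {
  awk 'NR > 1 && $5 < 0.0001 { n[$2]++ }
       END { for (c in n) print c "\t" n[c] }' "$1" | sort -k1,1n
}
# Example: count_hits_by_chr sumstats.raw.pangen_isblac_epicuro
```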


Concatenate the per-chromosome LMI results into a single file:

In [8]:
cat results_LMI/sumstats.raw.pangen_isblac_epicuro_localMoranI.chr* > results_LMI/sumstats.raw.pangen_isblac_epicuro_localMoranI.all
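
A `chr*` glob expands in lexicographic order (chr1, chr10, chr11, ..., chr2, ...). If you prefer the combined file in numeric chromosome order, with a warning when a chromosome's results are missing, the sketch below loops over 1 to 22 explicitly. It assumes the per-chromosome files are named exactly `<prefix>.chrN`. Like the `cat` above, it keeps any header line each file may have. The helper `concat_lmi` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: concatenate PREFIX.chr1 ... PREFIX.chr22 into OUT
# in numeric order; returns 1 and reports any missing chromosome file.
concat_lmi() {
  local prefix=$1 out=$2 status=0 i
  : > "$out"
  for i in $(seq 1 22); do
    if [ -f "$prefix.chr$i" ]; then
      cat "$prefix.chr$i" >> "$out"
    else
      printf 'missing: %s\n' "$prefix.chr$i" >&2
      status=1
    fi
  done
  return "$status"
}
# Example:
#   concat_lmi results_LMI/sumstats.raw.pangen_isblac_epicuro_localMoranI \
#              results_LMI/sumstats.raw.pangen_isblac_epicuro_localMoranI.all
```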

Then proceed to `01_GWAS_LMI_integration`, which uses `R`.

**END**