# Using PLINK to run a GWAS analysis

### Toy data received

Toy data from UK Biobank with 10,000 individuals: unrel_10k_EUR
Alzheimer's disease phenotypes created following the method from the Jansen et al. paper: ukb_alz.pheno
Multiple covariates (sex, location): ukb_alz.covs

### PLINKing

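The following command extracts a single chromosome from the data.

--bfile: tells PLINK the prefix of the binary fileset to read (.bed, .bim and .fam files)

--chr: selects the chromosome(s) to keep

--make-bed: writes a new binary fileset (.bed, .bim, .fam)

--out: sets the prefix of the output files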

In [14]:
%%bash
plink --bfile unrel_10k_EUR --chr 2 --make-bed --out 2
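The later commands read one fileset per chromosome (`--bfile $chr`), so the extraction above has to be repeated for every chromosome. A minimal sketch, assuming chromosomes 1 to 22 are wanted and the output prefix is simply the chromosome number; the file name `split_commands.txt` is made up for this illustration. It only writes the commands to a file so they can be checked before anything is run:

```shell
# Dry run: write one PLINK command per chromosome to a file instead of running it.
# Assumption: chromosomes 1-22; extend the range if further chromosomes are needed.
for chr in $(seq 1 22); do
  echo "plink --bfile unrel_10k_EUR --chr $chr --make-bed --out $chr"
done > split_commands.txt

# Inspect the first commands before running them
head -n 3 split_commands.txt
```

Once the commands look right, `bash split_commands.txt` runs them one after another.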


Extracting only a few selected SNPs for testing (--snps accepts single variant IDs and ID ranges written as first-last)

In [1]:
%%bash
plink --bfile unrel_10k_EUR --snps 10:68564:A_G-10:73537:A_G, 10:82187:C_G, 10:85499:A_G --make-bed --out TESTING


Changing the access permissions (read, write, execute) of a file or folder

In [15]:
%%bash
chmod +rwx alz_cc.plink


Viewing the first lines of an interesting *gzipped* file without unpacking it

In [16]:
%%bash
zcat ukb_alz.pheno.gz | head
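To pick the right value for --pheno-name it helps to list the column names of the phenotype file together with their positions. A small sketch on a made-up file: `toy.pheno.gz`, `toy_columns.txt` and the contents are invented, and the real ukb_alz.pheno.gz may have different columns. `gzip -dc` is used here as an equivalent of `zcat` that behaves the same on systems where `zcat` expects a different file suffix:

```shell
# Build a tiny gzipped phenotype file (invented data, for illustration only)
printf 'FID IID alz_wt\n1 1 0.5\n2 2 NA\n' | gzip > toy.pheno.gz

# Print each column name of the header line together with its column number
gzip -dc toy.pheno.gz | head -n 1 | tr ' ' '\n' | awk '{print NR, $0}' > toy_columns.txt
cat toy_columns.txt
```

The `tr` step assumes a space-delimited header; for a tab-delimited file, `tr '\t' '\n'` would be needed instead.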


Copying a file from one folder to another (scp also works for local copies; plain `cp` does the same here)

In [17]:
%%bash
scp -r alz_matthieu/ukb_alz.pheno.gz unrel_10K_EUR/


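Running PLINK on the phenotypes (in this case only for chromosome 1)

--pheno: reads the phenotype from the given file

--pheno-name: selects the phenotype column to analyse by its header name

--assoc: simple association test; for a quantitative phenotype the results are written to a .qassoc file (important: it does not use covariates, and gives no error message about it!)

--allow-no-sex: keeps the phenotypes of individuals whose sex is missing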


In [None]:
%%bash
plink --bfile 1 --pheno ukb_alz.pheno --pheno-name alz_wt --assoc --allow-no-sex --out test_1

Looping over the chromosomes instead of running the command manually for each one

In [18]:
%%bash
for chr in {1..23}; do
  plink --bfile $chr --pheno ukb_alz.pheno --pheno-name alz_wt \
    --assoc --allow-no-sex --out test_${chr}
done
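After a loop like this it is easy to overlook a chromosome that failed. A small check, assuming the phenotype is quantitative so that --assoc writes `.qassoc` files, and that the output prefixes are `test_1` to `test_23` as in the loop; the file name `missing_chromosomes.txt` is made up:

```shell
# Collect every chromosome whose result file is missing or empty
: > missing_chromosomes.txt
for chr in $(seq 1 23); do
  [ -s "test_${chr}.qassoc" ] || echo "$chr" >> missing_chromosomes.txt
done

# Report the outcome
if [ -s missing_chromosomes.txt ]; then
  echo "No results for chromosome(s): $(tr '\n' ' ' < missing_chromosomes.txt)"
else
  echo "All 23 result files are present"
fi
```

The PLINK `.log` file written next to each output usually explains why a given chromosome produced no results.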


Running PLINK on the full toy data

In [19]:
%%bash
plink --bfile unrel_10k_EUR --pheno ukb_alz.pheno --pheno-name alz_wt \
  --assoc --allow-no-sex --out test_full_data


Extracting only the columns of interest for a Manhattan plot. The output file has four columns:
1 CHR (chromosome), 2 SNP, 3 BP (base-pair position), 4 P (p-value, column 9 of the .qassoc file)

In [20]:
%%bash
awk '{if (NR>1) print $1, $2, $3,$9}' test_full_data.qassoc | grep -v NA > plot.full_data.txt
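Note that `grep -v NA` drops every line containing the letters NA anywhere, including inside a SNP identifier. A slightly stricter sketch tests only the p-value column. The files `toy.qassoc` and `toy_plot.txt` are invented for this illustration and follow the `.qassoc` column order (CHR SNP BP NMISS BETA SE R2 T P):

```shell
# Invented three-variant result file in .qassoc layout
cat > toy.qassoc <<'EOF'
 CHR SNP BP NMISS BETA SE R2 T P
   1 rs_NA1 100 10 0.10 0.05 0.01 2.0 0.04
   1 rs2 200 10 NA NA NA NA NA
   2 rs3 300 10 0.20 0.10 0.02 2.0 0.05
EOF

# Skip the header and keep only rows whose p-value (column 9) is not NA
awk 'NR > 1 && $9 != "NA" {print $1, $2, $3, $9}' toy.qassoc > toy_plot.txt
cat toy_plot.txt
```

Here the variant `rs_NA1` is kept even though its name contains "NA", while the row with a missing p-value is removed.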


Unzipping the covariate file

In [21]:
%%bash
gunzip ukb_alz.covs.gz


When a run takes too long, submit it as a batch job instead.
Create a job file (e.g. tutojob)

In [23]:
%%bash
nano tutojob


Fill the file with the following SLURM settings and the commands you want the job to run

--linear: fits a linear regression for each variant (--assoc cannot be used here because it ignores the covariates)

--covar: reads the covariates from the given file

--memory: sets PLINK's workspace size in MB; the default may be too small (here raised from 2 GB to 4 GB)

In [25]:
%%writefile tutojob
#!/bin/bash
#SBATCH -N 1
#SBATCH -t 48:00:00
#SBATCH --output="job%j.o"
#SBATCH --error="job%j.e"
cd $HOME/toydata/unrel_10K_EUR || exit
module load pre2019
module load plink/1.90b6.9
# Start one PLINK run per chromosome in the background
for i in {1..16}; do
(
# Each run needs its own --out prefix, otherwise the runs overwrite each other's results
plink --bfile $i --pheno ukb_alz.pheno --pheno-name alz_wt --linear --covar ukb_alz.covs --allow-no-sex --memory 4000 --out test_full_data_02_$i
)&
done
# Wait until all background runs have finished
wait
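An alternative to starting all chromosomes in the background of a single job is a SLURM job array, where each chromosome becomes its own task. This is a sketch only: it reuses the modules and paths from the script above, the file name `tutojob_array` and the output prefix are made up, and the resource settings would need checking against the rules of the local cluster.

```shell
# Write the job script with a here-document (no editor needed)
cat > tutojob_array <<'EOF'
#!/bin/bash
#SBATCH -N 1
#SBATCH -t 48:00:00
#SBATCH --array=1-16
#SBATCH --output="job%A_%a.o"
#SBATCH --error="job%A_%a.e"
cd "$HOME/toydata/unrel_10K_EUR" || exit
module load pre2019
module load plink/1.90b6.9
# SLURM sets SLURM_ARRAY_TASK_ID to 1, 2, ..., 16: one task per chromosome
chr=$SLURM_ARRAY_TASK_ID
plink --bfile "$chr" --pheno ukb_alz.pheno --pheno-name alz_wt \
  --linear --covar ukb_alz.covs --allow-no-sex --memory 4000 \
  --out "test_full_data_02_${chr}"
EOF

# Show what would be submitted with: sbatch tutojob_array
cat tutojob_array
```

In the log file names, `%A` stands for the ID of the whole array job and `%a` for the index of the individual task, so every chromosome gets its own log.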


Submit your job

In [26]:
%%bash
sbatch tutojob


Check the status of your jobs using your username (here matthieu); the output also shows the job ID

In [28]:
%%bash
squeue -u matthieu


Cancel your job (replace [jobid] with the job ID shown by squeue)

In [29]:
%%bash
scancel [jobid]


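Script to concatenate multiple result files (here one per chromosome). When grep searches several files it prefixes every matching line with the file name, so the data lines in data.txt carry one extra leading column.

In [1]:
%%bash
head -1 FINAL_17.assoc.linear > head.txt # Take the header line and put it in a separate file
grep ADD FINAL_*.assoc.linear > data.txt # Keep only the SNP effect (ADD) rows, dropping the covariate rows
cat head.txt data.txt > data1.txt # Header first, then the data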


Counting the lines of a file (replace <file> with the file name)

In [2]:
%%bash
wc -l <file>


Extracting only the data of interest (CHR, SNP, BP, P). The field numbers are shifted by one ($2, $3, $4, $10 instead of $1, $2, $3, $9) because grep added the file name as the first column.

In [4]:
%%bash
awk '{if (NR>1) print $2, $3, $4,$10}' data1.txt | grep -v NA > data2.txt
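The concatenation and extraction steps rely on the file-name prefix that grep adds to each line. A sketch that avoids the prefix by letting awk read all files itself, so the columns keep their original numbers (P is column 9). The `toy_*` files are invented and only mimic the `.assoc.linear` column order (CHR SNP BP A1 TEST NMISS BETA STAT P):

```shell
# Two invented per-chromosome result files
printf ' CHR SNP BP A1 TEST NMISS BETA STAT P\n   1 rs1 100 A ADD 10 0.1 1.0 0.3\n   1 rs1 100 A COV1 10 0.2 2.0 0.05\n' > toy_1.assoc.linear
printf ' CHR SNP BP A1 TEST NMISS BETA STAT P\n   2 rs2 200 G ADD 10 NA NA NA\n' > toy_2.assoc.linear

# Keep the header of the first file only, plus every ADD row of every file
awk 'NR == 1 || (FNR > 1 && $5 == "ADD")' toy_1.assoc.linear toy_2.assoc.linear > toy_merged.txt

# CHR, SNP, BP, P without the header and without missing p-values
awk 'NR > 1 && $9 != "NA" {print $1, $2, $3, $9}' toy_merged.txt > toy_manhattan.txt
cat toy_manhattan.txt
```

`NR` counts lines across all input files while `FNR` restarts at 1 for each file, which is what lets awk skip the repeated header lines.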
