Skip to content

Pipeline for running smartpca, ADMIXTURE and ALDER

Notifications You must be signed in to change notification settings

priyamoorjani/CCB293

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

43 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CCB293

Pipeline for running smartpca, ADMIXTURE and ALDER

Enter interactive mode and set up environment

srun --pty -A ic_ccb293 -p savio -t 01:00:00 bash -i
module load gcc/6.3.0
module load gsl
module load openblas
module load fftw
module load r/3.5.1

Datafiles and executables:

/global/home/users/sandrahui/ccb293/    # data
/global/home/users/sandrahui/ccb293/bin # executables

# copy folder to your home directory
cd ~/
cp -rp /global/home/users/sandrahui/ccb293 .

Smartpca

Documentation:

Population Structure and Eigenanalysis. https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.0020190 https://github.com/chrchang/eigensoft/blob/master/POPGEN/README

Command line:

cd ~/ccb293/classEx
./bin/smartpca -p classEx.smartpca.par

Input:

This program requires data in EIGENSTRAT format (See https://reich.hms.harvard.edu/software/InputFileFormats).

The program also requires a parameter file. See format below.

Parameter file arguments:

genotypename: classEx.geno     # file with genotype information
snpname:      classEx.snp      # file with snp information
indivname:    classEx.ind      # file with individual information   
poplistname:  classEx.pop.list # list of pops to include in the run.
evecoutname:  classEx.evec     # output file of eigenvectors.
evaloutname:  classEx.eval     # output file of all eigenvalues
phylipname:   classEx.phyl     # file with Fst values across populations 
numoutevec:   10               # number of PCs to output 
numthreads:   1                # if running interactively, use 1 only
numoutliter: 0                 # number of outlier inds to remove. 

Plot output

Plot the output in R using:

# start R 
R [enter]

# read output files
evecs <- read.table("classEx.evec")
colnames(evecs) <- c("sample", paste("PC", 1:10, sep=""), "pop")

# filter down to only 4 populations
evecs <- evecs[evecs$pop %in% c("YRI", "CEU", "CHB", "ASW"),]
evecs$pop <- factor(evecs$pop)

# plot
# png("classEx.PCA.png") # uncomment to save as png
# pdf("classEx.PCA.pdf") # uncomment to save as pdf
plot(x=evecs$PC1, y=evecs$PC2, col=factor(evecs$pop), xlab="PC1", ylab="PC2", pch=16)
legend("bottomright", legend=levels(evecs$pop), col=1:length(levels(evecs$pop)),pch=16)
# dev.off() # uncomment to save image

# quit R
q() [enter]

ADMIXTURE

Documentation:

Fast model-based estimation of ancestry in unrelated individuals.https://genome.cshlp.org/content/19/9/1655.full http://www.genetics.ucla.edu/software/admixture/admixture-manual.pdf

Command line:

# Set K=1..6
./bin/admixture --cv classEx.bed $K

Input:

This program requires input in plink format (http://zzz.bwh.harvard.edu/plink/data.shtml). Note, K = number of clusters. Run the command for each value of K of interest (for example, 1..6)

Plot output

To read in the ADMIXTURE results:

# start R
R [enter]

# read output files
admix_k1 <- read.table("classEx.1.Q")
admix_k2 <- read.table("classEx.2.Q")
admix_k3 <- read.table("classEx.3.Q")
admix_k4 <- read.table("classEx.4.Q")
admix_k5 <- read.table("classEx.5.Q")
admix_k6 <- read.table("classEx.6.Q")

# plot
# pdf("classEx.admixPlot.pdf") # uncomment to save as pdf
# png("classEx.admixPlot.png") # uncomment to save as png
par(mfrow=c(8,1), mar=c(1,4,1,1))
par(mfrow=c(6,1), mar=c(1,4,1,1))
barplot(t(as.matrix(admix_k1)), col=rainbow(1), border=NA)
barplot(t(as.matrix(admix_k2)), col=rainbow(2), border=NA)
barplot(t(as.matrix(admix_k3)), col=rainbow(3), border=NA)
barplot(t(as.matrix(admix_k4)), col=rainbow(4), border=NA)
barplot(t(as.matrix(admix_k5)), col=rainbow(5), border=NA)
par(mar=c(3,4,1,1))
x <- barplot(t(as.matrix(admix_k6)), col=rainbow(6), border=NA)

# add pop labels
inds <- c("ASW", rep("", 61), "CEU", rep("", 99), "CHB", rep("", 103), "TSI", rep("", 107), "YRI", rep("", 108))
mtext(inds, 1, at=x, las=1, adj=0)
# dev.off() # uncomment to save image

# Quit R
q() [enter]

ALDER

Documentation:

Inferring Admixture Histories of Human Populations Using Linkage Disequilibrium. http://www.genetics.org/content/193/4/1233.full https://github.com/priyamoorjani/CCB293/blob/master/ALDER.txt

Command line:

./bin/alder -p classEx.alder.par &> alder.log

Input:

This program requires data in EIGENSTRAT format (See https://reich.hms.harvard.edu/software/InputFileFormats). The input geno, snp, and ind files must be consistent.

The program also requires a parameter file. See format below.

Parameter file arguments:

genotypename: classEx.geno # file with genotype information
snpname:      classEx.snp  # file with snp information
indivname:    classEx.ind  # file with individual information   
admixpop:     ASW          # name of admixed population
refpops:      YRI;CEU      # name of reference (ancestral) population (seperated by semicolon).
checkmap:     NO           # check genetic map included in snp file.

Plot output

Included in the log file.

About

Pipeline for running smartpca, ADMIXTURE and ALDER

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published