eQTL mapping analysis cookbook (eQTLGen)
- QTL mapping pipeline
- Genotype Harmonizer
- Genotype IO
- ASE
- GADO Command line
- Downstreamer
- GeneNetwork Analysis
Analysis plans
Other
Clone this wiki locally
This is the cookbook for performing the eQTL analysis. This software allows for eQTL mapping using linear models and direct meta-analysis of such data through weighted Z-score analysis. This cookbook illustrates only a part of the capabilities of our software. See for other options the other parts of the wiki.
Questions or suggestions?
You can contact the authors of this manual and software at urmo.vosa @ gmail.com, patrickdeelen @ gmail.com, natalia.tsernikova @ gmail.com, bonder.m.j @ gmail.com, westra.harmjan @ gmail.com, dashazhernakova @ gmail.com or lude @ ludesign.nl for questions or suggestions regarding the software or the manual. This manual was written in cooperation with Marjolein Peters, Joyce van Meurs, Annique Claringbould and Pieter Neerincx.
Manual contents
- Step 1 - Preparation of expression data
- Step 2 - Preparation of genotype data
- Step 3 - Formatting expression data
- Step 4 - MixupMapper
- Step 5 - Removing principal components
- Step 6 - Performing the cis-eQTL analysis
- Step 7 - Getting the account to the sharing server
- Step 8 - Performing the trans-eQTL analysis
- Step 9 - Performing the Polygenic Risk Score eQTL analysis
- Step 10 - Uploading your result files
Downloading the software and reference data
You will need three programs to perform the QTL analysis.
- The PLINK toolcase (v 1.07), to calculate MDS components.
- The software to process and convert the genotype data, please download Genotype Harmonizer version 1.4.9.
- The actual QTL mapping software. The latest version for using is eQTLGen for both cis- and trans/PRS-eQTL phase is version 1.2.4F. This adds additional functionality for PRS-eQTL analysis and is available under the eQTLGen shared account: (Step 7 - Getting the account to the sharing server)
/pipeline/eqtl-mapping-pipeline-1.2.4F-SNAPSHOT-dist.tar.gz
.
Additionally you will need the following reference files:
- Giant genotype reference file, which can be downloaded here.
- The annotation files for the expression probes, which can be downloaded from the FTP. Please contact us if the annotation file for your platform is missing.
Before you start
In this manual we assume that you have basic knowledge on how to work with the java Virtual Machine (VM) and the command line. If you are not sure about working with the command line, please have a look at our introduction page and/or check the information about the Java VM (including the Java memory options). If you run into any problems with heapspace or out of memory errors please supply the -Xmx
and -Xms
command, see Java VM information. At each java command we supply the recomended settings.
Uploading the results
We use an FTP server to collect all the results and log files. You can obtain an account via patrickdeelen @ gmail.com. If you need any advice on how to work with a FTP server feel free to ask for help. Please also make sure to upload the log files of the steps where asked.
Step-by-step eQTL analysis
This is a step-by-step guide which will guide you through the eQTL mapping process using our software. Our general method consists of seven steps described below:
- Step 1 - Preparation of expression data
- Step 2 - Preparation of genotype data
- Step 3 - Formatting expression data
- Step 4 - MixupMapper
- Step 5 - Removing principal components
- Step 6 - Performing the cis-eQTL analysis
- Step 7 - Getting the account to the sharing server
- Step 8 - Performing the trans-eQTL analysis
- Step 9 - Performing the Polygenic Risk Score eQTL analysis
- Step 10 - Uploading your result files
Definitions
Throughout the manual, references to different full paths will be made. Here is an overview of these paths:
- The expression file will be referred to as
traitfile
- The probe/trait/gene annotation file will be referred to as
annotationfile
- The full path of your genotype data will be referred to as
genotypedir
- The file linking expression sample identifiers to genotype sample identifiers will be referred to as
genotypeexpressioncoupling
- The file containing covariates will be defined as
covariatefile
Descriptions of each of these files and their usage is detailed below, and their formats are described in the data formats section which can be found in the general QTL mapping documentation.
Step 1 - Preparation of expression data
Because our software uses a nonparametric test by default, you can use virtually any continuous data as trait values to map a variety of QTL effects. However, currently the normalization tools provided with this package are focused on array-based expression data, array-based (Illumina) expression data, and preprocessed RNA-seq data (e.g. gene or transcript level quantified data) or (GC)-RMA processed Affymetrix data.
Please format your expression data in a simple tab separated text file. The format of this file is described here: [Phenotype file, covariate file] (https://github.com/molgenis/systemsgenetics/wiki/File%20descriptions#phenotype-file-covariate-file).
eQTL mapping will require an annotation of the probes/traits/genes in your traitfile
. We store this annotation in a separate file. The format of this file is described here: Probe annotation file.
Illumina array-based expression data
Because Illumina has several ways to annotate and preprocess their gene expression arrays, we provide detailed instructions below, that will yield raw, untouched gene expression levels.
Please use the GenomeStudio files of your expression arrays.
❗ Important: Please do not use any normalization, transformation, imputation, or background correction method that is offered by Illumina’s GenomeStudio! Just load your raw data, without doing any of the above.
- Export a so-called ‘Final Report’ using GenomeStudio, including array_address_id (as the probe identifier) and the average expression signals per sample for all probes (click 'Analysis' in the top menu bar, then 'Reports').
- If you create more than one Final Report, merge the Final Reports, so that all data will be combined in one Expression Matrix File
- Rows should contain the different probes, and columns should contain the different sample IDs
- Remove any header information that Genome studio might produce: the header of the matrix is expected at the first row
- Create a new header (expected at the first row and column) for both rows and columns, describing the sample names (columns) and the probe IDs (rows). Note that you use the probe array address for your platform as the probe name.
- The final file should look like this format: [Phenotype file, covariate file] (https://github.com/molgenis/systemsgenetics/wiki/File%20descriptions#phenotype-file-covariate-file).
RNA-seq data
If your data contains RNA-seq data please contact us before starting to work with the expression data. It is possible to perform step 2.
Affymetrix array-based expression data
If your data contains Affymetrix data please contact us before starting to work with the expression data. It is possible to perform step 2.
Step 2 - Preparation of genotype data
Our software is able to use both unimputed called genotypes, as well as imputed genotypes and their dosage values. The software, however, requires these files to be in TriTyper format. We provide Genotype Harmonizer to harmonize and convert your genotype files. However, we first need to calculate MDS components to correct the expression data for population stratification.
Step 2A - Correction for population stratification.
You will need to correct the expression data for 4 MDS components during the gene expression normalization step. First we need to get our data into the PLINK format. It is recommend to use the binary PLINK format and you should therefore have these files: mydata.bed, mydata.fam and mydata.bim.
📌 Note: this step should be performed using the nonimputed autosomal SNPs. If you run into problems feel free to contact patrickdeelen @ gmail.com for advice. It is also recommended to use pruned results. Please have a look at the PLINK manual (http://zzz.bwh.harvard.edu/plink/summary.shtml#prune) on how to obtain this.
Because PLINK uses complete linkage agglomerative clustering based on pairwise identity-by-state (IBS) distance, we first need to execute this command:
plink --bfile mydata --genome
which creates the file plink.genome.
This file will have as many rows as there are unique pairs of individuals in the sample. For large samples with thousands of individuals, this file will take a considerable time to generate.
To obtain these MDS components, you can use a simple command using the PLINK tool to create a multidimensional scaling matrix. The command line will look like this:
plink --bfile mydata --read-genome plink.genome --cluster --mds-plot 4
This command creates the file plink.mds which contains one row per individual with the following fields:
- FID - Family ID;
- IID - Individual ID;
- SOL - Assigned solution code;
- C1 - Position on first dimension;
- C2 - Position on second dimension;
- C3 - Position on third dimension;
- C4 - Position on fourth dimension;
You can find more information on PLINK website
✅ Please upload your log file to the central server.
Step 2B - Convert data to the TriTyper fileformat.
For the eQTL mapping the data needs to be in TriTyper format. Using GenotypeHarmonizer one can harmonize and convert genotype formats. In this step we will directly harmonize, filter and convert the genotype data. The data is harmonized to be matched to the GIANT release of 1000G. You can either run this step per each chromosome separately or for all chromosomes at once.
❗ Important: If you are using imputed genotypes, make sure to use as an input format a file that contains the probabilities or dosages and not just genotype calls. In case of doubt please contact patrickdeelen @ gmail.com
📌 Note: please make sure to use verion 1.4.6, 1.4.7 or 1.4.9 of the Genotype Harmoinzer.
java -Xmx10g -Xms10g -jar ./GenotypeHarmonizer.jar -i {locationOfInputData} -I {InputType} -o {Outputlocation} -O TRITYPER -r {locationOf1000G-GiantVcfs} --refType VCF_FOLDER --update-id -cf 0.95 -hf 0.0001 -mf 0.01 -mrf 0.5 -ip 0
Input arguments: -i
location of the input data (without extension), -I
input type, -o
output location, -O
output type TRITYPER, -r
location of the reference data, --refType
type of reference data. See the Genotype Harmonization manual for further details on the input flags.
Note: it is not recommended to exclude samples at this step. This can easily be done at a later stage using the genotype to expression coupling file. Having all samples will help resolve possible sample mixups in step 4 more accurately.
✅ Please upload your log files to the FTP server.
If you ran the GenotypeHarmonizer per chromosome, you now need to merge all TriTyper folders per chromosome to one TriTyper folder containig data from all chromosomes. By using the following command you merge the individual TriTyper folders.
java -Xmx2g -Xms2g -jar ./eqtl-mapping-pipeline.jar --imputationtool --mode concat --in {folder1;folder2;folder3;ETC} --out {OutputFolder}
Input arguments:--in
is a semicolon-separated list of input TriTyper folders with information per chromosome, --out
is the output location of the merged TriTyper data. Note: in case of problems try putting quotes ("
) around your list of input folders
Great! You now have your genotype and expression data ready to go!
File Checklist
Before continuing, now is a good time to check whether your files are in the correct format and whether you have all the required files ready. Please check the following:
- All required TriTyper files are in your
genotypedir
. - The expression files are formatted properly.
- You have an
annotationfile
(you should have downloaded this from the FTP at the start). - The identifiers of samples are identicial in your expression and genotype files. If not, you should make a file to link the proper samples. This file is called a
genotypeexpressioncoupling
file, the format is described here: Genotype - phenotype coupling. This file is not necessary if the names are already matched.
Step 3 - Formatting expression data
Step 3A - Removing outliers from expression data
Before performing the actual normalization it is essential that we remove outlier samples. These can easily be detected by performing a PCA on the data without prior centring of the expression values.
java -Xmx10g -Xms10g -jar eqtl-mapping-pipeline.jar --mode normalize --in traitfile --out {tmpoutdir} --qqnorm --logtransform --adjustPCA --maxnrpcaremoved 0 --stepsizepcaremoval 0 2>&1 | tee {tmpoutdir}/outlierDetection.log
Use the command below to extract the first 2 principle components. Please plot these components. You might observe a few clear outliers: please remove these samples from the analysis (see instruction below). In case of doubt feel free to contact patrickdeelen @ gmail.com for support, please provide a plot of the first component.
zcat ***.QuantileNormalized.Log2Transformed.PCAOverSamplesEigenvectors.txt.gz | awk 'BEGIN {OFS="\t"} NR == 1 {print "Sample","Comp1","Comp2"} NR > 1 {print $1,$2,$3}' > pc1_2.txt
You can use the R code below to create a plot of these components and create a file with samples to include.
pc1_2 <- read.delim("pc1_2.txt")
plot(pc1_2$Comp1, pc1_2$Comp2, xlab = "Component 1", ylab = "Component 2")
#
## Please fill in the outlier cut-off below
#
outlierCutoff <- #
pdf("OutlierDetection.pdf")
palette(c("dodgerblue2", "red2"))
plot(pc1_2$Comp1, pc1_2$Comp2, xlab = "Component 1", ylab = "Component 2", col = as.numeric(pc1_2$Comp1 < outlierCutoff)+1, main = paste("Removed", sum(pc1_2$Comp1 < outlierCutoff), "expression outlier samples"))
abline(v=outlierCutoff, col = "red2")
dev.off()
## NB!: please check on which side of the outlierCutoff the outlier samples are:
# If outlier samples are on the left of the cutoff line run:
write.table(pc1_2$Sample[pc1_2$Comp1 >= outlierCutoff], file = "includedExpressionSamples.txt", row.names = FALSE, col.names = FALSE, quote = FALSE)
# If outlier samples are on the right of the cutoff line run:
# write.table(pc1_2$Sample[pc1_2$Comp1 <= outlierCutoff], file = "includedExpressionSamples.txt", row.names = FALSE, col.names = FALSE, quote = FALSE)
✅ Please upload your log file and the plot of the first two components to the FTP server.
After you uploaded the plot and the log to the FTP server, it is wise to remove most of the data in the tmpoutdir. The data, apart from the log and the pc1_2.txt, is no longer needed.
Step 3B - Expression data normalization
After removal of low quality samples we go for the actual normalization. Generally, continuous trait data need to be normalized prior to applying statistical testing. Our software package provides different ways of normalizing your data contained in the traitfile
.
Our general normalization strategy for Illumina array-based expression data consists of the following steps:
- Quantile normalization
- Log2 transformation
- Probe centering and scaling (Z-transform): (ExpressionProbe,Sample = ExpressionProbe,Sample - MeanProbe) / Std.Dev.Probe
- Removal of covariates - Correct gene expression data for the first 4 MDS of the GWAS data - to remove possible population stratification
- We run a Generalized Linear Model with the probes as dependent variables, and the GWAS PCs as orthogonal covariates. For the remainder of the analysis, we use the residuals of this model.
- Principal component adjustment
- In the Principal Component Analysis, the software tries to convert the normalized and standardized expression results into a set of values of uncorrelated variables called Principal Components (PCs). The number of PCs is equal to the number of samples, and the first PCs explain a higher variance than the last PCs. By adjusting the normalized and standardized expression data for a set of PCs (like we did with the removal of the covariates - use of a Generalized Linear Model), we try to remove batch effects in the data. Note: the PCs will be used in Step 5.
This step should be performed only on samples that are not marked as outliers in the previous step. To do this we need a file with the identifiers of the samples to include. Each sample identifier should be on a separate line and no header should be used. If you used the example R script you can use includedExpressionSamples.txt
or you can make it based on the first column in the pc1_2.txt
file.
Run these steps using the following command:
java -Xmx10g -Xms10g -jar eqtl-mapping-pipeline.jar --mode normalize --in traitfile --out {outdir} --sampleInclude {includeFile} --qqnorm --logtransform --centerscale --adjustcovariates --cov {covariatefile} --adjustPCA --maxnrpcaremoved 0 --stepsizepcaremoval 0 2>&1 | tee {outdir}/normalization.log
For details on the covariate file see this link. We want to use the PLINK MDS results as confounders for our analysis. Before you can use the PLINK MDS results you need to run this command: awk '{print $2,$4,$5,$6,$7}' < plink.mds > plinkMds.txt
.
[] (Additionally, this file can be used for correcting the expression data for other confounders, for example pedigree structure or batch effects. For that, just add the corresponding covariates to the columns of the file.)
❗ Please make sure that the sample identifiers in this file correspond to the sample identifiers in the expression data.
✅ Please upload your log file to the FTP server.
Check your output
Running the general normalization procedure yields a number of files in the directory of your traitfile
, or in the outdir
if you specified one. The default procedure will generate file suffixes listed below. Suffixes will be appended in the default order as described above. Selecting multiple normalization methods will add multiple suffixes. Please do not remove or replace ANY intermediate files from this normalization step as they may be used in subsequent steps.
Suffix | Description |
---|---|
SampleSelection | Data after selection of the samples. |
QuantileNormalized | Quantile Normalized trait data. |
Log2Transformed | Log2 transformed data. |
ProbesCentered | Centered probes. |
SamplesZTransformed | Z-transformed samples. |
CovariatesRemoved | Expression data adjusted for covariates. |
PCAOverSamplesEigenvalues | Eigenvalues created during eigenvalue decomposition of the expression sample correlation matrix (created from the Quantile Normalized, Log2 Transformed, Z-transformed data) |
PCAOverSamplesEigenvectors | Eigenvectors created during eigenvalue decomposition of the expression sample correlation matrix (created from the Quantile Normalized, Log2 Transformed, Z-transformed data) |
PCAOverSamplesEigenvectorsTransposed | Eigenvectors transposed |
PCAOverSamplesPrincipalComponents | Principal Components describing the sample correlation matrix (created from the Quantile Normalized, Log2 Transformed, Z-transformed data) |
Step 4 - MixupMapper
In our paper published in Bioinformatics (Westra et al.: MixupMapper: correcting sample mix-ups in genome-wide datasets increases power to detect small genetic effects) we have shown that sample mix-ups often occur in genetical genomics datasets (i.e. datasets with both genotype and gene expression data). Therefore, we developed a method called MixupMapper, which is implemented in the QTL Mapping Pipeline. This program performs the following steps:
- A cis-eQTL analysis on the dataset:
- using a 250 kb window between the SNP and the mid-probe position
- performing 10 permutations to control the false discovery rate (FDR) at 0.05
- Calculation of how well the expression data matches the genotype data. For details on this process please consult the paper.
Data checklist
- Note down the full path to your TriTyper genotype data directory. We will refer to this directory as
genotypedir
. - Determine the full path of the expression data produced in step 3, the file ending with
CovariatesRemoved.txt.gz
. We will refer to this path astraitfile
. - Locate your
annotationfile
. Also note down theplatformidentifier
, which is written in the first column of theannotationfile
. Illumina HT12-V3: "GPL6947", Illumina HT12-V4: "GPL10558", Illumina HT12-V4 WG-DASL: "GPLGPL13938". Please contact us if the annotation file for your platform is missing. - (Optional) Locate your
genotypeexpressioncoupling
if you have such a file. You can also use this file to test specific combinations of genotype and expression individuals. - Find a location on your hard drive to store the output. We will refer to this directory as
outputdir
.
Commands to be issued
The MixupMapper analysis can be run using the following command:
java -Xmx10g -Xms10g -XX:StringTableSize=319973 -jar eqtl-mapping-pipeline.jar --mode mixupmapper --in {genotypedir} --out {outdir} --inexp {traitfile} --inexpplatform {platformidentifier} --inexpannot {annotationfile} --snps {snpfile} (--testall) (--gte {genotypeexpressioncoupling}) 2>&1 | tee {outdir}/mixupmapping.log
It is more efficient to test a subset of SNPs as testing a full 1000G imputed dataset will take a lot of time. To do this you can use this snpfile and adding this flag: --snps {snpfile}
(remember to use the full path).
If your genotypes and/or expression data includes more samples than those which are matched based on sample names, it is beneficial to test all possible combinations. This can be performed using the following command line switch --testall
.
If you are running the software in a cluster environment, you can specify the number of threads to use (nrthreads
) by appending the command above with the following command line switch --threads nrthreads
(nrthreads should be an integer).
Note that the --gte
, --threads
and --snps
flags are optional, --gte
only applies if you are using a genotypeexpressioncoupling
file.
Check your output
MixupMapper is a two stage approach. Therefore the default procedure creates two directories in the outdir
you specified: cis-eQTLs and MixupMapping. Both directories contain a different set of output files described below.
cis-eQTLs directory
This directory contains output from a default cis-eQTL mapping approach. However, we are not interested in this folder, these files are only necessary to create the actual mixupmapping output which can be found in the MixupMapping directory.
MixupMapping directory
File | Description |
---|---|
Heatmap.pdf | This file contains a visualization of overall Z-scores per assessed pair of samples. The genotyped samples are plotted on the X-axis, and the expression samples are plotted on the Y-axis. The brightness of each box corresponds to the height of the overall Z-score, with lower values having brighter colours. Samples are sorted alphabetically on both axes. |
BestMatchPerGenotype.txt | This file shows the best matching trait (i.e. expression) samples per genotype: the result matrix (MixupMapperScores.txt) is not symmetrical. Therefore scanning for the best sample per genotype may yield other results than scanning for the best sample per trait. |
BestMatchPerTrait.txt | This file shows the best matching genotype sample per trait sample. |
MixupMapperScores.txt | This file contains a matrix showing the scores per pair of samples (combinations of traits and genotypes). |
In the BestMatchPerGenotype.txt and BestMatchPerTrait.txt files, you can find the best matching trait sample for each genotyped sample and vice versa:
- 1st column = genotyped sample ID, or trait sample ID dependent on file chosen (see above)
- 2nd column = trait sample originally linked to genotype sample ID in column 1, or genotype sample originally linked with trait sample in column 1
- 3rd column = the MixupMapper Z-score for the link between the samples in column 1 and 2
- 4th column = best matching trait (for BestMatchPerGenotype.txt) or best matching genotype (for BestMatchPerTrait.txt)
- 5th column = the MixupMapper Z-score for the link between the samples in column 1 and 4
- 6th column = this column determines whether the best matching trait or genotype is identical to the sample found in column 2
Example of BestMatchPerGenotype.txt
Trait OriginalLinkedGenotype OriginalLinkedGenotypeScore BestMatchingGenotype BestMatchingGenotypeScore Mixup Sample1 Sample1 -11.357 Sample1 -11.357 false Sample2 Sample2 -15.232 Sample2 -15.232 false Sample3 Sample3 -3.774 Sample4 -6.341 true Sample4 Sample4 -3.892 Sample3 -12.263 true
In this example there are two mix-ups indentified. If you don't have any mix-ups you are done, otherwise you need to resolve or remove these mix-ups. Please check this separate document for instructions on fixing mix-ups.
Always rerun the sample Mix-up Mapper after removing or changing sample IDs and check if you actually fixed/removed mix-ups instead of creating more!
✅ Please upload your logs file to the FTP server.
Step 5 - Removing principal components
Prior to QTL mapping, you will need to remove the first 20 expression components to increase the power to detect cis- and trans-QTLs. As we have seen that some PCs explain genetic variation, we perform our analysis on data only adjusted for PCs which explain no genetic variation. To identify the PCs that have no genetic association, we perform a QTL mapping on the principal component eigenvector matrix (PCAOverSamplesEigenvectorsTransposed.txt.gz). We call PCs genetically associated when they have a FDR of 0 (thus selecting significantly affected components only).
Data checklist
- Note down the full path to your TriTyper genotype data directory, created in step 2. We will refer to this directory as
genotypedir
. - Determine the full path of the expression data you want to use, the file created in step 3 before PC correction. The file name ends with "*.CovariatesRemoved.txt.gz".
- (Optional) Locate your
genotypeexpressioncoupling
if you have such a file. You can also use this file to test specific combinations of genotype and expression individuals. - Find a location on your hard drive to store the output. We will refer to this directory as
outputdir
.
As with MixupMapper, SNPs will only be tested with a minor allele frequency of > 0.05, a Hardy-Weinberg P-value > 0.001 and a call-rate > 0.95. Examples of these files can be found here: resources
❗ Important: We fixed a bug which caused an error during this step. Please make sure that you use at least version 1.2.4B of the QTL mapping pipeline, if necessary please download the new version from the locations mentioned in chapter 1.
Commands to be issued
Use the following command to run the mapping on the eigenvectors and to subsequently remove the PCs which are not under genetic control:
For eQTL mapping pipeline v1.2.4:
java -Xmx20g -Xms20g -XX:StringTableSize=10000019 -XX:MaxPermSize=512m -jar eqtl-mapping-pipeline.jar --metamode nongeneticPcaCorrection --in {TriTyper location} --inexp {traitfile} --gte {gtefile} --out {outdir} --adjustPCA --stepsizepcaremoval 20 --maxnrpcaremoved 20 --nreqtls 500000 2>&1 | tee {outdir}/removeExpressionComponents.log
For eQTL mapping pipeline newer than v1.4:
java -Xmx20g -Xms20g -XX:StringTableSize=10000019 -XX:MaxPermSize=512m -jar eqtl-mapping-pipeline.jar --mode util --nonGeneticPcaCorrection --in {TriTyper location} --inexp {traitfile} --gte {gtefile} --out {outdir} --adjustPCA --stepsizepcaremoval 20 --maxnrpcaremoved 20 --nreqtls 500000 2>&1 | tee {outdir}/removeExpressionComponents.log
Note that the --gte
switch is required for this step so you do need genotypeexpressioncoupling
file. Please make sure any sample mixups are resolved in this file. If the sample identifiers in your expression and genotype data are identical you can simple create one using the command below.
awk 'BEGIN {OFS="\t"}{print $1,$1}' < Individuals.txt > gte.txt
✅ Please upload your log file to the FTP server.
Check your output
After running the nongeneticPcaCorrection command, the outdir
will contain the output of a full QTL mapping on the eigenvectors for your dataset. At the location of your input expression data the program has written a new PC corrected expression file, which was corrected for all components but the ones under genetic control (FDR 0). So if the process has finished succesfully there should be a new file which ends with 20PCAsOverSamplesRemoved-GeneticVectorsNotRemoved.txt.gz
. This is the traitfile
which will be used in the subsequent cis-QTL mapping.
Step 6 - Performing the cis-eQTL analysis
❗ Please note the following
- For the steps below make sure to use version 1.2.4D, 1.2.4E or 1.2.4F of the eQTL mapping software.
- We have updated the probe mapping files, please use the files starting with: ProbeAnnotation6 for HT12v3 and ProbeAnnotation7 for HT12v4 and HT12-v4 WG-DASL. These files can be downloaded from the FTP server. There is no need to redo previous steps. However, please make sure you use the same file for the cis-eQTL mapping and the SNP-in-Probe detection.
The cis-QTL mapping can be performed using a single command. Standard settings for the cis-eQTL mapping are: HWEP > 0.0001, MAF > 0.01, and Call Rate > 0.95. For cis-eQTL mapping, the maximum distance between the SNP and the middle of the probe is 1,000,000 bp. To control for multiple testing, we perform 10 permutations, thereby shuffling sample labels to calculate the false discovery rate (FDR) controlled at 0.05.
Data checklist
- Note down the full path to your TriTyper genotype data directory, created in step 2. We will refer to this directory as
genotypedir
. - Determine the full path of the trait data you want to use, created in step 5. We will refer to this path as
traitfile
. - Locate your expression probe annotation file:
annotationfile
. Also note down theplatformidentifier
. Illumina HT12-V3: "GPL6947", Illumina HT12-V4: "GPL10558", Illumina HT12-V4 WG-DASL: "GPL13938". Please contact us if the annotation file for your platform is missing. - (Optional) Locate your
genotypeexpressioncoupling
if you have such a file. You can also use this file to test specific combinations of genotype and expression individuals. - Find a location on your hard drive to store the output. We will refer to this directory as
outputdir
.
Commands to be issued
The command needed to be issued for this cis-QTL analysis:
java -jar -Xmx40g -Xms20g -XX:StringTableSize=10000019 -XX:MaxPermSize=512m eqtl-mapping-pipeline.jar --mode metaqtl --in {genotypedir} --out {outdir} --inexp {traitfile} --inexpplatform {platformidentifier} --inexpannot {annotationfile} --cis --binary --perm 10 --maf 0.01 2>&1 | tee cisEQtlMapping.log
If you are running the software in a cluster or a multithreaded environment, you can specify the number of threads to use (nrthreads
) by appending the command above with the following command line switch --threads nrthreads
(nrthreads should be an integer).
Note that you should apply the --gte
switch if you are using a genotypeexpressioncoupling
file.
QTL Mapping output - Binary mode
This section describes the binary output files generated using the --binary
command line switch.
File | Description |
---|---|
Dataset.dat | Zscore matrix for the data analysis. |
Dataset-ColNames.txt.gz | Column names for the Zscore matrix. |
Dataset-RowNames.txt.gz | Row names for the Zscore matrix. |
Dataset-PermutationRound-n.dat | Zscore matrix for the permuted data analysis |
Dataset-PermutationRound-n-ColNames.txt | Column names for the permuted Zscore matrix. |
Dataset-PermutationRound-n-RowNames.txt | Row names for the permuted Zscore matrix. |
excludedSNPsBySNPFilter.txt.gz | SNPs excluded by the user. |
excludedSNPsBySNPProbeCombinationFilter.txt.gz | The file with no gene expression probes located within 250.000bp of this SNP position. |
ProbeQCLog.txt.gz | List of probes excluded from analysis. |
SNPQCLog.txt.gz | QC information for all tested SNPs, including MAF, HWE-pvalue and call-rate (which is always 1.0 for imputed SNPs). |
UsedSettings.txt | The settings used for the eQTL mapping. |
cisEQtlMapping.log | Log file of the mapping. |
✅ Please upload all the above mentioned result files
Checking SNPs with proxies within a probe
To make sure that you can later validate that an eQTL is not an artefact of proxy SNPs within the probe binding region we run the script below. The validCombinations file contains all SNP-probe combinations that are safe to analyse. Make sure you use the same mappingfile for the SNP-in-Probe detection and the cis-eQTL mapping!
java -jar -Xmx10g -Xms10g eqtl-mapping-pipeline.jar --metamode determineSnpProbList -dp 0.2 -r 0.2 -ri {genotypedir} -rt TRITYPER -pm 1 -ws 1000000 -i {annotationfile} -o {outdir}/validCombinations.txt 2>&1 | tee validCombinations.log
✅ Please upload the validCombinations file and the log file.
Cis-eQTL mapping completed
🎉 You are done with the cis-eQTL analysis. Please drop a mail to urmo.vosa @ gmail.com when your uploads are completed.
Step 7 - Getting the account to the sharing server
We will use UMCG Genomics Coordination Center .sftp server named: cher-ami.hpc.rug.nl to deliver your result files and provide you the files and tools needed for next steps.
To get an access to the upload server please follow the steps:
-
Generate the private-public key pair as documented here: http://wiki.gcc.rug.nl/wiki/RequestAccount
-
Send the e-mail to GCC Helpdesk (Helpdesk dot GCC dot Groningen at gmail dot com) and your FrankeSwertzLab collaborator (urmo.vosa @ gmail.com) asking the guest account, specifying the name of the cohort and including your public key. The format is as follows:
Dear GCC Helpdesk, I am working in the collaboration with FrankeSwertzLab, project is eQTLGen, contact is Urmo Võsa (cc’d). The name of the cohort is [ALSPAC/LIFE/Rotterdam, etc]. Please provide me: 1. guest account to the cher-ami SFTP server for three months. 2. access to the umcg-eqtlgen shared account. My public key is attached. Please cc: urmo.vosa@gmail.com and anniqueclaringbould@gmail.com to all the correspondence and provide them access to my guest account. Best regards, [Name] [Institution]
After getting the account you will gain the access to shared eQTLGen account (umcg-eqtlgen) where we supply files needed for running the trans-eQTL and PRS-QTL analyses.
The folder structure for the umcg-eqtlgen account is:
|--pipeline
|--eqtl-mapping-pipeline-1.2.4F-SNAPSHOT-dist.tar.gz # this pipeline version is used for the next steps
|--cis_eQTL_results
|--eQTLsFDR-ProbeLevel.txt.gz
|--eQTLsFDR-ProbeLevel.txt.gz.md5
|--ProbeAnnotation2_CorrectlyMapped_mapping_annotations_added_20170301.txt
|--README
|--trans_eQTL
|--AnnotationFiles
|--ProbeAnnotation6_CorrectlyMapped_HT12V3_Fixed.txt
|--ProbeAnnotation7_CorrectlyMapped_HT12V4_Fixed.txt
|--ProbeAnnotation7_CorrectlyMapped_HT12V4DASL_Fixed.txt
|--README
|--HelpFiles
|--ListOfGeneticRiskFactors_20161121.txt
|--Regress_cis_eQTLs_HT12v3.txt
|--Regress_cis_eQTLs_HT12v4.txt
|--Regress_cis_eQTLs_HT12v4_WGDASL.txt
|--DummySnpsFile.txt
|--README
|--PRS_QTL
|--SummaryStatisticsDatabase_freeze1b # folder with all the harmonised summary statistic files
|--SummaryStatisticsDatabase_freeze1b.md5 # please use this to check the integrity of files after download
|--GeneticRiskScoreCalculator-0.1.2c-SNAPSHOT # folder with PRS calculating tool
|--README
Download instructions are here: http://wiki.gcc.rug.nl/wiki/DataSharing
If the instructions below do not work for your cluster setup, please consult with the extended manual: https://github.com/molgenis/systemsgenetics/wiki/Using-sharing-server-for-eQTLGen-analyses
## one possibility is to use lftp, if this is installed in your environment
## use these commands to download data from cher-ami server
# go to the local folder where you want to run analysis:
cd local/analysis/folder/
# start lftp
lftp
# connect with the shared eQTLGen account:
lftp :~> open -u umcg-eqtlgen,none -p 22 sftp://cher-ami.hpc.rug.nl
# download needed folder:
lftp :~> mirror folder_needed_for_analysis
# exit from the remote server
lftp :~> exit
You will also get the separate guest account (named umcg-guest[0-9]) that will be used to upload your results files in Step 10.
Step 8 - Performing the trans-eQTL analysis
In the trans-eQTL mapping phase we will use the dataset processed by the previous steps of the pipeline.
- If your cohort was part of the cis-eQTL mapping phase please use the same genotype data and preprocessed expression data as in Step 6.
- If your cohort did not participate in the cis-eQTL mapping analysis, please follow the previous pipeline from Step 0 to Step 5, save the noted .log files and plots for later upload to our server. You can skip Step 6 and proceed to Step 7.
- If your cohort has expression data profiled by other platform than Illumina expression arrays, then we will provide you the instructions after trans-eQTL mapping is completed for Illumina-based cohorts.
Standard settings for the trans-eQTL mapping are: HWEP > 0.0001, MAF > 0.01, and Call Rate > 0.95. For trans-eQTL mapping, the minimum distance between the SNP and the middle of the probe equals 5,000,000bp. To control for multiple testing, we perform 10 permutations, thereby shuffling sample labels, to calculate the false discovery rate (FDR) at 0.05.
We have found that removing cis effects greatly enhances the power to detect trans effects. Consequently, our software provides a way to correct your traitfile
data for cis effects. Note down the full path of the file containing the cis-QTL effects to be removed. We will refer to this file as cisqtlfile
. The format of this file is identical to the QTL file.
Due to the limitations on the computational power and file sizes we test only preselected list of SNPs in this phase, consisting of EBI GWAS Catalogue and Immunobase (all GWAS P<=5e-8). We will refer to this file as snpsFile
.
❗ Please note the following
- For the steps below make sure to use version 1.2.4F of the eQTL mapping software.
- As for probe mapping files, please use the files starting with: ProbeAnnotation6 for HT12v3 and ProbeAnnotation7 for HT12v4 and HT12-v4 WG-DASL. These are available under eQTLGen account in cher-ami server (trans_eQTL/AnnotationFiles/)
Data checklist
- Note down the full path to your TriTyper genotype data directory, created in step 2. We will refer to this directory as
genotypedir
. - Determine the full path of the trait data you want to use, created in step 5. We will refer to this path as
traitfile
. - Locate your expression probe annotation file: annotationfile. Also note down the
platformidentifier
. Illumina HT12-V3: "GPL6947", Illumina HT12-V4: "GPL10558", Illumina HT12-V4 WG-DASL: "GPL13938". Please contact us if the annotation file for your platform is missing. - (Optional) Locate your
genotypeexpressioncoupling
if you have such a file. You can also use this file to test specific combinations of genotype and expression individuals. - Use the results of previous cis-eQTL meta-analysis to regress out the cis-eQTL effects. We will refer to this file as
cisqtlfile
and it is available under the umcg-eqtlgen account in the cher-ami server (trans_eQTL/HelpFiles/). Please use the file corresponding starting with 'Regress_cis_eQTLs' and ending with the name of your expression array platform (HT12v3, HT12v4, HT12v4_WGDASL). - Use the list of preselected SNPs for trans-eQTL analysis. This file is referred as
snpsFile
and is available under the umcg-eqtlgen account in the cher-ami server (trans_eQTL/HelpFiles/ListOfGeneticRiskFactors_20161121.txt). - Find a location on your hard drive to store the output. We will refer to this directory as
outdir
.
Commands to be issued
Use the following command to run the trans-eQTL analysis:
java -jar -Xmx40g -Xms20g -XX:StringTableSize=10000019 -XX:MaxPermSize=512m eqtl-mapping-pipeline.jar --mode metaqtl --in {genotypedir} --out {outdir} --inexp {traitfile} --inexpplatform {platformidentifier} --inexpannot {annotationfile} --trans --binary --perm 10 --maf 0.01 --snps {snpsFile} --regressouteqtls {cisqtlfile} 2>&1 | tee {outdir}/transEQtlMapping.log
❗ While you are waiting for the trans-eQTL analysis to complete, please already proceed with calculating the genetic risk scores in Step 9A.
QTL Mapping output - Binary mode
This section describes the binary output files generated using the --binary command line switch.
File | Description |
---|---|
Dataset.dat | Zscore matrix for the data analysis. |
Dataset-ColNames.txt.gz | Column names for the Zscore matrix. |
Dataset-RowNames.txt.gz | Row names for the Zscore matrix. |
Dataset-PermutationRound-n.dat | Zscore matrix for the permuted data analysis. |
Dataset-PermutationRound-n-ColNames.txt | Column names for the permuted Zscore matrix. |
Dataset-PermutationRound-n-RowNames.txt | Row names for the permuted Zscore matrix. |
excludedSNPsBySNPFilter.txt.gz | SNPs excluded by the user. |
excludedSNPsBySNPProbeCombinationFilter.txt.gz | The file with no gene expression probes located within 250.000bp of this SNP position. |
ProbeQCLog.txt.gz | List of probes excluded from analysis. |
SNPQCLog.txt.gz | QC information for all tested SNPs, including MAF, HWE-pvalue and call-rate (which is always 1.0 for imputed SNPs). |
UsedSettings.txt | The settings used for the eQTL mapping. |
transEQtlMapping.log | Log file of the trans mapping. |
check.md5 | md5 file Zscore matrix. |
check-PermutationRound-n.md5 | md5 file for permuted Zscore matrix. |
✅ Please upload all the above mentioned result files
This step produces the file ExpressionData.SampleSelection.QuantileNormalized.Log2Transformed.ProbesCentered.SamplesZTransformed.CovariatesRemoved.20PCAsOverSamplesRemoved-GeneticVectorsNotRemoved.txt.gz-EQTLEffectsRemoved.txt.gz into the same directory where your input expression data is. From this expression file, the cis-eQTL effect are removed and this is the expression input for the next step.
❗ If this file was not written, please check whether you used the correct version 1.2.4F of QTL mapping pipeline.
Step 9 - Performing the Polygenic Risk Score eQTL analysis
In this part of the pipeline we will calculate polygenic risk scores for all the individuals in your eQTL dataset, using publicly available GWAS summary statistics for traits. Those risk scores we will correlate with expression levels of all genes to identify novel trait-associated "hub" genes and pathways.
Step 9A
The software for calculating genetic risk scores is available under umcg-eqtlgen account in cher-ami server (PRS_QTL/GeneticRiskScoreCalculator-0.1.2c-SNAPSHOT). Please make sure to use v0.1.2c of the software. Usage instructions can be seen by the command:
java -jar GeneticRiskScoreCalculator-0.1.2c-SNAPSHOT/GeneticRiskScoreCalculator.jar
NB! This tool is tested and works with Java versions 7 and 8, please make sure that you are using one of those!
The command for PRS calculation:
java -Xmx40g -Xms40g -jar GeneticRiskScoreCalculator-0.1.2c-SNAPSHOT/GeneticRiskScoreCalculator.jar -gi {InputTriTyperFolder} -gt TRITYPER -i {SummaryStatisticsFolder} -o {OutputFolder} -p 5e-8:1e-5:1e-4:1e-3:1e-2 -r 0.1 -w 250000:5000000 -er 6:25000000-35000000 2>&1 | tee {OutputFolder}/PRSCalculation.log
- NB! If you rerun the job, please remove the output folder constructed by the last job before!
✅ Please upload PRSCalculation.log
Arguments:
-gi
- this is your genotype folder constructed in Step 2 (InputTriTyperFolder
), meaning it has to be harmonized to match GIANT 1000G p1v3.
-i
- this is folder consisting gzipped GWAS summary statistic files (SummaryStatisticsFolder
) and is available under cher-ami eQTLGen account (PRS_QTL/SummaryStatisticsDatabase_freeze1b/). These files are preprocessed to match GIANT 1000G p1v3.
❗Please note the following! Most of the datasets included were publicly available. The Terms and Conditions, when available on the corresponding web page of respective GWAS consortum, were checked and adhered. However, these files are meant to use in this project only.
Therefore, if you use those summary statistics in any other project than eQTLGen, it is your responsibility to make sure that you:
1. Adhere to the Terms and Conditions indicated on the respective consortium web page.
2. Cite correctly the respective study in all the resulting publications.
-o
- output folder of the genetic risk score calculation. This incorporates TT folder which consists scaled risk scores in TriTyper format. This folder will be used as input for Step 9B, analogously to trans-eQTL mapping.
-p 5e-8:1e-5:1e-4:1e-3:1e-2
- this flag specifies the GWAS significance thresholds used for constructing the risk score. For this project we construct five risk scores per trait (GWAS P: 5e-8, 1e-5, 1e-4, 1e-3, 1e-2).
-r 0.1
- this flag specifies the R2 threshold for clumping.
-w 250000:5000000
- this flags specifies the clumping strategy. Top SNP is defined by the lowest P-value or the highest absolute effect size if there are several equal P-values. In the first round of clumping all the SNPs in the LD (R2>0.1) are removed from the distance of 250 Kb from the top SNP. Second round of clumping removes all the remaining SNPs in LD (R2>0.1) in the extended distance of 5 Mb.
-er 6:25000000-35000000
- this flag removes the SNPs from HLA region (chr6:25Mb-35Mb) from the calculation of genetic risk scores.
Step 9B
In this step we perform PRS-eQTL analysis on PC-corrected expression matrix.
Data checklist
- Note down the full path to your TriTyper PRS data directory, created in step 9A. We will refer to this directory as
PRSdir
. - Determine the full path of the expression data you want to use, created in step 8. We will refer to this path as
traitfile_cis_removed
. - Locate your expression probe annotation file: annotationfile. Also note down the
platformidentifier
. Illumina HT12-V3: "GPL6947", Illumina HT12-V4: "GPL10558", Illumina HT12-V4 WG-DASL: "GPL13938". Please contact us if the annotation file for your platform is missing. - (Optional) Locate your
genotypeexpressioncoupling
if you have such a file. You can also use this file to test specific combinations of genotype and expression individuals. - Find a location on your hard drive to store the output. We will refer to this directory as
outputdir
.
Commands to be issued
Use the following command to run the PRS-eQTL mapping:
java -jar -Xmx40g -Xms20g -XX:StringTableSize=10000019 -XX:MaxPermSize=512m eqtl-mapping-pipeline-1.2.4F-SNAPSHOT/eqtl-mapping-pipeline.jar --mode metaqtl --in {PRSdir} --out {outdir} --inexp {traitfile_cis_removed} --inexpplatform {platformidentifier} --inexpannot {annotationfile} --trans --binary --hwe 0 --maf 0 --perm 10 2>&1 | tee {outdir}/PRSeQtlMappingAdjPC.log
Data checklist
- Note down the full path to your TriTyper genotype data directory, created in step 2. We will refer to this directory as
genotypedir
. - Determine the full path of the trait data you want to use, created in step 5. We will refer to this path as
traitfile
. - Locate your expression probe annotation file: annotationfile. Also note down the
platformidentifier
. Illumina HT12-V3: "GPL6947", Illumina HT12-V4: "GPL10558", Illumina HT12-V4 WG-DASL: "GPL13938". Please contact us if the annotation file for your platform is missing. - (Optional) Locate your
genotypeexpressioncoupling
if you have such a file. You can also use this file to test specific combinations of genotype and expression individuals.
The output files are named the same as indicated in Step 8 binary output file description.
✅ Please upload all the files in the output directory.
Step 9C
In this step we perform trans-eQTL and PRS-eQTL analyses on expression matrix not corrected for 20 first PCs. We are interested for those results to evaluate the principal component correction effect on the observed eQTLs.
trans-eQTL mapping on PC-uncorrected expression data
For that, we first have to rerun the trans-eQTL analysis using this time the output expression matrix from Step 3 ending with the *.CovariatesRemoved.txt.gz. This is by default in the same folder where PC-corrected expression matrix was written.
Data checklist
- Note down the full path to your TriTyper genotype data directory, created in step 2. We will refer to this directory as
genotypedir
. - Determine the full path of the trait data you want to use, created in step 3. We will refer to this path as
traitfile
. - Locate your expression probe annotation file: annotationfile. Also note down the
platformidentifier
. Illumina HT12-V3: "GPL6947", Illumina HT12-V4: "GPL10558", Illumina HT12-V4 WG-DASL: "GPL13938". Please contact us if the annotation file for your platform is missing. - (Optional) Locate your
genotypeexpressioncoupling
if you have such a file. You can also use this file to test specific combinations of genotype and expression individuals. - Use the results of previous cis-eQTL meta-analysis to regress out the cis-eQTL effects. We will refer to this file as
cisqtlfile
and it is available under the umcg-eqtlgen account in the cher-ami server (trans_eQTL/HelpFiles/). Please use the file corresponding starting with 'Regress_cis_eQTLs' and ending with the name of your expression array platform (HT12v3, HT12v4, HT12v4_WGDASL). - Use the list of preselected SNPs for the second trans-eQTL analysis. This file is referred as
snpsFile
and is available under the umcg-eqtlgen account in the cher-ami server (trans_eQTL/HelpFiles/ListOfGeneticRiskFactors_20161121.txt). - Find a location on your hard drive to store the output. We will refer to this directory as
outdir
.
Use the following command to run the trans-eQTL analysis:
java -jar -Xmx40g -Xms20g -XX:StringTableSize=10000019 -XX:MaxPermSize=512m eqtl-mapping-pipeline.jar --mode metaqtl --in {genotypedir} --out {outdir} --inexp {traitfile} --inexpplatform {platformidentifier} --inexpannot {annotationfile} --trans --binary --perm 10 --maf 0.01 --snps {snpsFile} --regressouteqtls {cisqtlfile} 2>&1 | tee {outdir}/NonPcTransEQtlMapping.log
✅ Please upload the all the files in outdir
, including NonPcTransEQtlMapping.log.
This step produces the file ExpressionData.SampleSelection.QuantileNormalized.Log2Transformed.ProbesCentered.SamplesZTransformed.CovariatesRemoved.txt.gz-EQTLEffectsRemoved.txt.gz into the same directory where your input expression data is. From this expression file, the cis-eQTL effect are removed and this is the expression input for the following command.
PRS-eQTL mapping on PC-uncorrected expression data
We then run again the PRS-eQTL analysis, using the expression matrix not corrected to PCs as an input.
Data checklist
- Note down the full path to your TriTyper PRS data directory, created in step 9A. We will refer to this directory as
PRSdir
. - Determine the full path of the expression data you want to use, created in step 9C. We will refer to this path as
traitfile_PCs_not_removed_cis_removed
. - Locate your expression probe annotation file: annotationfile. Also note down the
platformidentifier
. Illumina HT12-V3: "GPL6947", Illumina HT12-V4: "GPL10558", Illumina HT12-V4 WG-DASL: "GPL13938". Please contact us if the annotation file for your platform is missing. - (Optional) Locate your
genotypeexpressioncoupling
if you have such a file. You can also use this file to test specific combinations of genotype and expression individuals. - Find a location on your hard drive to store the output. We will refer to this directory as
outputdir
.
Commands to be issued
Use the following command to run the second PRS-eQTL mapping:
java -jar -Xmx40g -Xms20g -XX:StringTableSize=10000019 -XX:MaxPermSize=512m eqtl-mapping-pipeline-1.2.4F-SNAPSHOT/eqtl-mapping-pipeline.jar --mode metaqtl --in {PRSdir} --out {outdir} --inexp {traitfile_PCs_not_removed_cis_removed} --inexpplatform {platformidentifier} --inexpannot {annotationfile} --trans --binary --hwe 0 --maf 0 --perm 10 2>&1 | tee {outdir}/PRSeQtlMappingNotAdjPC.log
✅ Please upload all the files in the output directory.
Step 10 - Uploading your result files
If you are ready with the analyses, please upload the the data to the server to your guest account acquired in Step 7 (named umcg-guest[0-9]). Please upload all the output files, .log files and .md5 files from Step 8, Step 9A, Step 9B and Step 9C.
If you analysed several datasets, please organise your analysis results into separate folders (e.g. LIFEa1 and LIFEb3, etc.). Please organise your results into subfolders (i.e. PRS_eQTL_results_PC_corrected and PRS_eQTL_results_PC_uncorrected).
The example of folder structure is following:
|--LIFEa1
|--trans_eQTL_results
|--trans_eQTL_results_PC_corrected
|--trans_eQTL_results_PC_uncorrected
|--PRS_eQTL_results
|--PRS_eQTL_results_PC_corrected
|--PRS_eQTL_results_PC_uncorrected
|--LIFEb3
|--trans_eQTL_results
|--trans_eQTL_results_PC_corrected
|--trans_eQTL_results_PC_uncorrected
|--PRS_eQTL_results
|--PRS_eQTL_results_PC_corrected
|--PRS_eQTL_results_PC_uncorrected
The upload instructions are here: http://wiki.gcc.rug.nl/wiki/DataSharing
If the instructions below do not work for your cluster setup, please consult with the extended manual: https://github.com/molgenis/systemsgenetics/wiki/Using-sharing-server-for-eQTLGen-analyses
## one possibility is to use lftp, if this is installed in your environment
## use these commands to upload/download data from cher-ami server
# go to the local folder where data is stored:
cd local/folder/with/your/data/
# start lftp
lftp
# connect with your guest account:
lftp :~> open -u [your_guest_accountname],none -p 22 sftp://cher-ami.hpc.rug.nl
# make remote upload directories and subdirectories for trans-eQTL and PRS-QTL results. e.g:
lftp :~> mkdir LIFEa1
lftp :~> mkdir LIFEa1/trans_eQTL_results
lftp :~> mkdir LIFEa1/trans_eQTL_results/trans_eQTL_results_PC_corrected
lftp :~> mkdir LIFEa1/trans_eQTL_results/trans_eQTL_results_PC_uncorrected
lftp :~> mkdir LIFEa1/PRS_eQTL_results
lftp :~> mkdir LIFEa1/PRS_eQTL_results/PRS_eQTL_results_PC_corrected
lftp :~> mkdir LIFEa1/PRS_eQTL_results/PRS_eQTL_results_PC_uncorrected
# go to the folder where you upload the results. e.g:
lftp :~> cd LIFEa1/trans_eQTL_results
# use put command for uploading the individual files and/or mirror command for uploading whole directories
lftp :~> put individual_result_file_in_local_server
lftp :~> mirror -R result_folder_in_local_server
# exit from the remote server when uploaded all necessary files:
lftp :~> exit
- Please send additional e-mail to urmo.vosa @ gmail.com when the data has finished uploading.
- We will do the sanity check and you will get the feedback whether the upload of the data succeeded. By default, your access to cher-ami will be active for three months, but it can be extended if needed.
NB! In case of issues with uploading, please contact to GCC Helpdesk with cc: urmo.vosa @ gmail.com.
🎉 You are done with the trans-eQTL and PRS-eQTL analysis.