# Installing Ricopili

Please go through this document step by step.

# 1. Log in to the Broad server

In [None]:
ssh ldomenec@login.broadinstitute.org

use UGER

ish -l h_vmem=8G (or the necessary memory)

# 2. Download Ricopili Dependencies (all necessary binaries and additional datafiles):

2.1. Download Ricopili Dependencies:

wget https://personal.broadinstitute.org/sripke/share_links/JeklRDhPD6FKm8Gnda7JsUOsMan2P2_Ricopili_Dependencies.1118b.tar.gz/Ricopili_Dependencies.1118b.tar.gz

wget https://personal.broadinstitute.org/sripke/share_links/JeklRDhPD6FKm8Gnda7JsUOsMan2P2_Ricopili_Dependencies.1118b.tar.gz/Ricopili_Dependencies.1118b.tar.gz.cksum.txt

Compare the output of “cksum Ricopili_Dependencies.1118b.tar.gz” with the content of “Ricopili_Dependencies.1118b.tar.gz.cksum.txt”.
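The comparison can also be scripted. The following is a minimal sketch, assuming the downloaded checksum file holds plain `cksum` output (CRC and byte count as the first two fields); the function name `verify_cksum` is made up for illustration.

```shell
# Hypothetical helper: compare CRC and byte count of an archive
# against the first line of a checksum file.
# Assumption: the checksum file contains plain `cksum` output.
verify_cksum() {
    local archive="$1" expected_file="$2"
    local actual expected
    # Reading from stdin keeps the file name out of the cksum output.
    actual=$(cksum < "$archive" | awk '{print $1, $2}')
    expected=$(awk 'NR == 1 {print $1, $2}' "$expected_file")
    if [ "$actual" = "$expected" ]; then
        echo "checksum OK: $archive"
    else
        echo "checksum MISMATCH: $archive" >&2
        return 1
    fi
}

# Example call (file names as downloaded above):
#   verify_cksum Ricopili_Dependencies.1118b.tar.gz Ricopili_Dependencies.1118b.tar.gz.cksum.txt
```

If the function reports a mismatch, the download is probably incomplete and should be repeated before unpacking.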

2.2. Test and configure Dependencies. Check that you can start the following binaries:

In [None]:
./eagle/eagle 	(version in this package 2.3.5)
./eigensoft/EIG-6.1.4/bin/smartpca.perl
./impute_v2/impute2 	(version 2.3.2)
./impute_v4/impute4.r265.2
./metal/metal 	(version 2011-03-25)
./liftover/liftOver
./plink/plink	(version v1.90b6.4 64-bit)
./shapeit3/shapeit3.r884.1
./shapeit/shapeit.v2.r837.linux.x86_64
./Minimac3/Minimac3 	(version 2.0.1)
./tabix/tabix	(version 74bcfd7-dirty)
./bgzip/bgzip	(version 74bcfd7-dirty)
./bcftools/bcftools-1.9_bin/bcftools	(version 1.9)
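A first pass over this list can be automated. The sketch below only tests that each file exists and is executable; it does not prove that a binary actually starts on your system, so starting each one by hand is still worthwhile. The function name `check_binaries` is made up for illustration; the relative paths are the ones listed above.

```shell
# Hypothetical helper: report which dependency binaries are present
# and executable.
# Usage: check_binaries <dependency_dir> <relative_path>...
check_binaries() {
    local depdir="$1"
    shift
    local missing=0 rel
    for rel in "$@"; do
        if [ -x "$depdir/$rel" ]; then
            echo "ok       $rel"
        else
            echo "MISSING  $rel"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if nothing was missing.
    [ "$missing" -eq 0 ]
}

# Example call from inside the unpacked dependency directory:
#   check_binaries . eagle/eagle plink/plink metal/metal \
#       liftover/liftOver tabix/tabix bgzip/bgzip
```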

2.3. Get file human_g1k_v37.fasta in place (download this file, then extract and put into directory bcftools/resources)

In [None]:
- cd bcftools/resources
- wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz
- gunzip human_g1k_v37.fasta.gz
- cd -

2.4. Install LDscore

a) Install "conda" for software package management (already installed at Broad server)

b) Follow instructions on LDSC Github page (https://github.com/bulik/ldsc):

- Download ldsc

In [None]:
git clone https://github.com/bulik/ldsc.git

In [None]:
cd ldsc

- In order to install the Python dependencies, you will need the Anaconda Python distribution and package manager. 

In [None]:
use .anaconda2-5.3.1

Comments here:

As of June 26th 2020 the latest version is ".anaconda3-5.3.1", but it looks like "munge_sumstats.py" (which we will need afterwards) is not compatible with Python 3.

The default version invoked when running "use Anaconda3" failed when compiling pandas.

- After installing Anaconda, run the following commands to create an environment with LDSC's dependencies:

In [None]:
conda env create --file environment.yml 

In [None]:
source activate ldsc

- Once the above has completed, you can run:

In [None]:
./ldsc.py -h
./munge_sumstats.py -h

to print a list of all command-line options. If these commands fail with an error, then something has gone wrong during the installation process.

Short tutorials describing the four basic functions of ldsc (estimating LD Scores, h2 and partitioned h2, genetic correlation, the LD Score regression intercept) can be found in the wiki. If you would like to run the tests, please see the wiki.

- Activate ldsc

In [None]:
source activate ldsc

2.5. R and R libraries

Ricopili needs the "rmeta" library for R. To test for that, you need to open R.

In [None]:
use R-3.5
or
use R_program_name_you_need (to know which R programs are available, write "use -l")


Then run:

In [None]:
library(rmeta)

Or install it first if you don't have it:

In [None]:
install.packages("rmeta")
library(rmeta)

Later in the Ricopili installation you will need the path for the installed package, so keep note of it. Type this command within R to get the path:

In [None]:
.libPaths()

On top of keeping note of this path, make sure to make it “readable” for others, so that future users in the same cluster environment can use your custom-file as a template as well as re-use your dependency downloads.
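One possible way to open up a library directory is sketched below. It assumes that read access for group and others is what you want; `share_readonly` is a made-up name, and the path in the example call is hypothetical (use whatever `.libPaths()` reported).

```shell
# Hypothetical helper: give group and others read access below a path.
# The capital X adds execute permission only to directories and to files
# that are already executable, so plain data files stay non-executable.
share_readonly() {
    chmod -R go+rX "$1"
}

# Example call with a hypothetical library path:
#   share_readonly "$HOME/R/library"
```

Other users also need to be able to traverse every parent directory of that path, so it may be necessary to check those permissions as well.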

2.6. TeX Live / pdflatex

- Check whether pdflatex is installed on your system with:

In [None]:
pdflatex --help

If you get output then this section is finished for you. If you get something like "-bash: pdflatex: command not found" you should follow these instructions to get it installed:


- Create subdirectories "local" and "local/bin" in your $HOME directory if not already there.

- In a directory of your choice: download install-tl-unx.tar.gz from https://www.tug.org/texlive/acquire-netinstall.html and unpack (tar -xvf ….)

- From within the newly unpacked directory run the ./install-tl script.
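The three steps above could be sketched as follows. This is only an outline: the CTAN mirror URL is an assumption (it is the usual direct link to the same installer), the function names are made up, and the unpacked directory is assumed to be named `install-tl-<date>`. Nothing runs until the functions are called.

```shell
# Hypothetical helper: create $HOME/local and $HOME/local/bin if missing.
prepare_local_dirs() {
    mkdir -p "$HOME/local/bin"
}

# Hypothetical helper: download and unpack the TeX Live net installer
# into the current directory.
# Assumption: the CTAN mirror redirector serves the installer that is
# linked from the TeX Live download page.
fetch_texlive_installer() {
    wget https://mirror.ctan.org/systems/texlive/tlnet/install-tl-unx.tar.gz
    tar -xzf install-tl-unx.tar.gz
}

# Typical sequence:
#   prepare_local_dirs
#   fetch_texlive_installer
#   cd install-tl-*/ && ./install-tl
```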

2.7. Online Reference of binaries used in Ricopili (no action needed here):

METAL - http://www.sph.umich.edu/csg/abecasis/Metal/download/
TABIX - https://sourceforge.net/projects/samtools/files/tabix/
Bcftools - http://www.htslib.org/download/ (set bgziploc, bcrloc)
IMPUTE4 - https://jmarchini.org/impute-4/ (set i4loc)
Minimac3 - https://genome.sph.umich.edu/wiki/Minimac3
SHAPEIT3 - https://jmarchini.org/shapeit3/
PLINK2 - https://www.cog-genomics.org/plink2/
EAGLE - https://data.broadinstitute.org/alkesgroup/Eagle/downloads/
IMPUTE2 - https://mathgen.stats.ox.ac.uk/impute/impute_v2.html
EIGENSOFT - http://www.hsph.harvard.edu/alkes-price/software/
Liftover - http://genome.ucsc.edu/cgi-bin/hgLiftOver
LDSC - https://github.com/bulik/ldsc
Statistics-Distributions-1.02 - https://www.cpan.org/authors/id/M/MI/MIKEK/Statistics-Distributions-1.02.tar.gz
Spreadsheet-WriteExcel - https://metacpan.org/release/Spreadsheet-WriteExcel

2.8. Mail

If you don’t have the program “mail” in your path, ricopili might fail.

Here is how you can create a “fake” binary, that won’t send you emails but will protect ricopili from dying along the way (assuming ~/bin is included early in your $PATH):

In [None]:
mkdir -p ~/bin && ln -sfT /bin/true ~/bin/mail

2.9. Python 2.7 

On some systems you might have to add some python modules. The following is a system specific solution from Robert Karlsson on Uppmax/bianca. Please consult your IT helpdesk or other sources for installing these modules on your system. In the custom file from Robert Karlsson (see Example Values from Uppmax in the google spreadsheet: https://docs.google.com/spreadsheets/d/1LhNYIXhFi7yXBC17UkjI1KMzHhKYz0j2hwnJECBGZk4/edit#gid=255132922) you will see how this is then embedded into the custom-file. 

On the internet-connected server (rackham), do this:

module add python/2.7.11

mkdir -p python_modules

pip install --target python_modules --no-deps bitarray==0.8 pandas==0.20 pybedtools==0.7 pysam==0.15

tar czf ricopili_python_modules_`date +%y%m%d`.tar.gz python_modules


Then unpack the resulting tar.gz file in the ricopili dependencies folder on the secure server (bianca)


2.10. Other Users

If you have successfully downloaded / installed the Ricopili Dependencies make sure to make the Dependency directory recursively readable/executable for all.

In [None]:
chmod -R go+rx *

# 3. Ricopili Scripts

3.1. Configure Ricopili Scripts

Start the script rp_config. Please read this footnote () since you can start with some historically successful configuration files for various computer cluster environments. 

In [None]:
./rp_config

If Ricopili is already installed in the system under your account, it will ask you if you wish to unset the Ricopili PATH settings first. For first time custom installation it is highly recommended to do so. The configuration script will give you the two commands you have to issue. You just need to copy/paste them into the command line. 

If the configuration script cannot find a configuration file (by default the script is looking for a file named rp_config.custom.txt) an empty file is created, that needs to be filled by you and/or a system-administrator with the knowledge gained in the previous chapters.

This file follows a two column structure, where variable-names are found in the first column and variable-values in the second. “###” means comments, everything after that is discarded.

- Whitespace between the two columns can be as long as necessary
- Spaces are not allowed inside a value. Please use the term _SPACE_ if needed.
- The template file will have no values. Here you find a commented version of this file with example parameters:

https://docs.google.com/spreadsheets/d/1LhNYIXhFi7yXBC17UkjI1KMzHhKYz0j2hwnJECBGZk4/edit#gid=255132922

- If you don’t find a solution to very special environments please do not hesitate to contact us (we are happy to help)
- If you created a successful custom-file for your environment, please share back with us, so we can make them available to the community
- If you don’t want to use a scheduler, please take any value (e.g. SERIAL) for the second half of the file. If you use --serial and --sepa for all Ricopili modules these flags will be discarded.
- Every Variable needs to have a value in the second column. If it doesn’t apply (e.g. no job scheduler, or no need for LDScore) you still need to provide mock values (NA or SERIAL).
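To make the file format concrete, here is a small reader for it. It is a sketch of the format as described above (two columns, "###" comments, _SPACE_ standing in for spaces), not the parser that rp_config itself uses; `parse_custom_file` is a made-up name.

```shell
# Hypothetical helper: print "name=value" for every entry of a custom file.
# Lines that do not have exactly two columns are reported on stderr and
# make the function fail, which catches missing values and stray spaces.
parse_custom_file() {
    awk '
        { sub(/###.*/, "") }                  # drop comments
        NF == 0 { next }                      # skip empty lines
        NF != 2 {
            printf "line %d: expected 2 columns, got %d\n", NR, NF > "/dev/stderr"
            bad = 1
            next
        }
        {
            gsub(/_SPACE_/, " ", $2)          # expand the space placeholder
            printf "%s=%s\n", $1, $2
        }
        END { exit bad + 0 }
    ' "$1"
}

# Example call:
#   parse_custom_file rp_config.custom.txt
```

Running this over a filled-in custom file before restarting rp_config is a cheap way to spot a variable that was left without a value.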

After (rp_config.custom.txt) has been filled by you and/or the administrator you need to restart rp_config and choose “custom” again. The configuration script will configure ricopili now based on the information provided in rp_config.custom.txt.
If configuration ran successfully (after Testing Ricopili below), you can re-use your customfile for any future Ricopili version.

For some users it might be necessary to use another directory for the file ricopili.conf (by default ricopili chooses the $HOME directory). You can change that with flag --rphome.

You can use the term DEPDIR in your customfile and then replace all occurrences of this term with the flag --depdir. This is useful for the dependency directory, which usually gets used multiple times.




In [None]:
When running ./rp_config (in ~/Ricopili/rp_bin), make sure that you are on the cluster and have first run:
use UGER
use ish
use .anaconda2-5.3.1

The first time, it will ask to copy this at the end of your .my.bashrc file:

#Ricopili
PATH=/home/unix/ldomenec/Ricopili/rp_bin:$PATH
PATH=/home/unix/ldomenec/Ricopili/rp_bin/pdfjam:$PATH
export rp_perlpackages=/home/unix/ldomenec/Ricopili_Dependencies.1118b/perl_modules
export RPHOME=/home/unix/ldomenec
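If you prefer not to paste these lines by hand, something like the following could append them once and skip the step on later runs. It is only a sketch: `add_ricopili_env` is a made-up name, and it assumes the "#Ricopili" marker line is a reliable sign that the block is already present.

```shell
# Hypothetical helper: append the Ricopili settings to a shell rc file,
# unless a "#Ricopili" marker line is already there.
# Usage: add_ricopili_env <rcfile> <rp_bin_dir> <dependency_dir> <rphome>
add_ricopili_env() {
    local rcfile="$1" rp_bin="$2" depdir="$3" rphome="$4"
    if grep -q '^#Ricopili$' "$rcfile" 2>/dev/null; then
        echo "Ricopili block already present in $rcfile"
        return 0
    fi
    # \$PATH is escaped so that it is expanded at login, not now.
    cat >> "$rcfile" <<EOF

#Ricopili
PATH=$rp_bin:\$PATH
PATH=$rp_bin/pdfjam:\$PATH
export rp_perlpackages=$depdir/perl_modules
export RPHOME=$rphome
EOF
}

# Example call with the paths shown above:
#   add_ricopili_env ~/.my.bashrc /home/unix/ldomenec/Ricopili/rp_bin \
#       /home/unix/ldomenec/Ricopili_Dependencies.1118b /home/unix/ldomenec
```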

In [None]:
After doing this, close the terminal tab, open a new one, start again with:
use UGER
use ish
use .anaconda2-5.3.1

After that, run ./rp_config again.

If everything is correct, the terminal will return a message like this:

-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
-------------------------   RP_CONFIG    ------------------------------------
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
 Script to ease Downloading, Configuration, Installation of Ricopili
-----------------------------------------------------------------------------
        please use flag --help if you are not familiar with its usage
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------


Using the following cluster: custom
in combination with rp_config.custom.txt

<variable_name> <variable_value>
<rp_dependencies_dir> </home/unix/ldomenec/Ricopili_Dependencies.1118b>
<R_packages_dir> </broad/software/free/Linux/redhat_6_x86_64/pkgs/r_2.14.0/lib64/R/library>
<starting_R> <source_SPACE_/broad/software/scripts/useuse;_SPACE_use_SPACE_R-2.15;_SPACE_R>
<path_to_Perlmodules> </home/unix/ldomenec/Ricopili_Dependencies.1118b/perl_modules>
<path_to_scratchdir> </home/unix/ldomenec/Riopili>
<starting_ldsc> <source_SPACE_/broad/software/scripts/useuse;_SPACE_use_SPACE_.anaconda-2.1.0-no-mkl;_SPACE_python_SPACE_/home/unix/ldomenec/Ricopili_Dependencies.1118b/ldsc>
<ldsc_reference> </home/unix/ldomenec/Ricopili_Dependencies.1118b/ldsc>
<rp_user_initials> <lds>
<rp_user_email> <ldomenechsalgado@mgh.harvard.edu>
<rp_logfiles> </home/unix/ldomenec/Ricopili>
<batch_jobcommand> <qsub>
<batch_memory_request> <-l_SPACE_h_vmem=XXXg>
<batch_walltime> <-l_SPACE_h_rt=HH:MM:SS>
<batch_array> <-t_SPACE_1-XXX>
<batch_max_parallel_jobs_per_one_array> <-tc_SPACE_YYY>
<batch_jobfile> <XXX>
<batch_name> <-N_SPACE_XXX>
<batch_stdout> <-o_SPACE_XXX>
<batch_stderr> <-e_SPACE_XXX>
<batch_job_dependency> <-hold_jid_SPACE_XXX>
<batch_array_task_id> <$SGE_TASK_ID>
<batch_other_job_flags> <NONE>
<batch_job_output_jid> <Your_SPACE_job-array_SPACE_XXX.1-YYY:1_SPACE("ZZZ")_SPACE_has_SPACE_been_SPACE_submitted>
<batch_ncores_per_node> <NA>
<batch_mem_per_node> <NA>


You are using the following shell (is this correct? if not please stop with ctr-c): bash

 - bash detected - 

Required directories found in search path:
	rp_bin/ -- success
	rp_bin/pdfjam/ -- success

Detected latex is installed.

Detected rp_perlpackages as an environmental variable.

Detected RPHOME as an environmental variable.

Using the following default scratch directory:
	/home/unix/ldomenec/Riopili


Creating pipeline status file to /home/unix/ldomenec/Ricopili/preimp_dir_info
Creating pipeline status file to /home/unix/ldomenec/Ricopili/impute_dir_info
Creating pipeline status file to /home/unix/ldomenec/Ricopili/pcaer_info
Creating pipeline status file to /home/unix/ldomenec/Ricopili/idtager_info
Creating pipeline status file to /home/unix/ldomenec/Ricopili/repqc2_info
Creating pipeline status file to /home/unix/ldomenec/Ricopili/areator_info
Creating pipeline status file to /home/unix/ldomenec/Ricopili/merge_caller_info
Creating pipeline status file to /home/unix/ldomenec/Ricopili/postimp_navi_info
Creating pipeline status file to /home/unix/ldomenec/Ricopili/reference_dir_info
Creating pipeline status file to /home/unix/ldomenec/Ricopili/test_info
Creating pipeline status file to /home/unix/ldomenec/Ricopili/clumper_info
-------------------------------------------------------------------
------------- here some tips        -------------------------------
-------------------------------------------------------------------

adding these commands to your ~/.bashrc can be very helpful(you have to logout and login again for these to be in effect)

## for colored output of ls:
alias ls='ls --color=auto'

## for easy copy over to your local machine:
alias c='sed "s#.*#scp ldomenec@login03.broadinstitute.org.broadinstitute.org:$(pwd)/& .#"'

## for list of currently running jobs:
alias q='bjobs -w'

## different prompt:
PS1="ldomenec@login03.broadinstitute.org.broadinstitute.org:"'\w'" "



*********************************************************************
*****  Installation successful                    *******************
*********************************************************************
*********************************************************************
*********************************************************************
*****  you should be able to use the pipeline as described        ***
*****  if you want to use it from a different login:              ***
*****    please logout and relogin so that PATH is set correctly  ***
*****    or if you want to use ricopili in this session, please   ***
*****       run the following from the command line               ***
*********************************************************************
*********************************************************************


If you do not receive a success email with the subject rp_config, please check your email address is entered correctly at /home/unix/ldomenec/ricopili.conf

3.2 Testing Ricopili

Before you start analyzing real data it might be worth using the testing module since it tests for common installation errors.

3.2.1 Preparing the testing environment:

Please take a new empty directory (preferably not below the installation directory).

Copy over (or symlink) the testing daner files from the subdir testing_data (subdir within the dependency download):
- PGC_cohort1.ch.fl.r4.gz
- PGC_cohort2.ch.fl.r4.gz
- PGC_cohort3.ch.fl.r4.gz
- PGC_cohort4.ch.fl.r4.gz
- PGC_meta.r4.gz

(The testing_data folder is in:
/home/unix/ldomenec/Ricopili_Dependencies.1118b/testing_data)

3.2.2. Start the testing module with

In [None]:
rp_test_navi

#When doing that, I got the following error:

    LaTeX package pdfpages.sty is not installed
    
Ricopili instructions already say: 

    "A possible problem with pdfjoin (combining and arranging PDFs is that you get an error like this): ERROR: LaTeX package pdfpages.sty is not installed. We are working on finding a local solution that you can solve by yourself, but meanwhile ask your system admins to install this package systemwide, that probably solves this problem: texlive-latex-extra"

So, I ran this command in my directory:

tlmgr install pdfpages


(I couldn't do 

sudo apt-get install texlive-latex-recommended

because I had no permissions).


Now it is running, and I obtained the following message:

------------------------------------------------------------
20 jobs successfully submitted

please see tail of /home/unix/ldomenec/Ricopili/test_info for regular updates

also check bjobs -w for running jobs

possibly different command on different computer cluster: e.g. qstat -u USER

you will be informed via email if errors or successes occur

------------------------------------------------------------


#############################################################################
 When starting again, make sure you activate everything (ldsc, for example!)
        Make sure that you give enough memory for the testing-module
#############################################################################

# 4. Troubleshooting

BUT IT STOPPED RUNNING!!! THERE WERE SOME PROBLEMS THAT I SOLVED FOLLOWING THESE STEPS:

1. There was a problem with pdflatex/pdfjam/pdfjoin


I had to change pdfjam.config (in /home/unix/ldomenec), adding the path to the pdflatex installed in my directory like this:

###############################################################

##UNUSUAL TEX INSTALLATION, OR SPECIFIC LATEX WANTED?

##Specify the full path to the 'latex' to be used.
##Examples:

#latex='/usr/bin/pdflatex'     ## typical unix installation
#latex='/usr/texbin/pdflatex'  ## for MacTeX on Mac OS X?
#latex='C:/texmf/miktex/bin/pdflatex.exe'    ## Windows??

#latex='/usr/bin/xelatex'     ## if you want xelatex
#latex='/usr/bin/lualatex'    ## if you want lualatex

latex='/home/unix/ldomenec/texttlive/bin/x86_64-linux/pdflatex'

###############################################################



Then I changed the configpath in the pdfjam script (in /home/unix/ldomenec/Ricopili/rp_bin/pdfjam), specifying the directory where the pdfjam.config that I had just modified was.

I changed the configpath like this:

#########################################################################
                                                                     
pdfjam: A shell-script interface to the "pdfpages" LaTeX package                                                                                                                        
Author: David Firth (http://go.warwick.ac.uk/dfirth)               
                                                                     
Usage: see http://go.warwick.ac.uk/pdfjam or "pdfjam --help"       
                                                                  
Relies on:                                                         
-- pdflatex                                                        
-- the 'pdfpages' package for LaTeX (ideally version >= 0.4f)      
                                                                    
License: GPL version 2 or later.  This software comes with         
ABSOLUTELY NO WARRANTY of fitness for any purpose at all; if you   
do not accept that, then you must not use it.                      
                                                                   
The path searched for site-wide configuration files can be set     
by editing the following variable:                                 
                                                                   
#configpath='/etc:/usr/share/etc:/usr/local/share:/usr/local/etc'   

configpath='/home/unix/ldomenec/'                                    

Nothing else in this file should need to be changed.               
                                                                   
#########################################################################


Then I had to install pdflscape (it was missing and producing an error) with:

tlmgr install pdflscape
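To find out in advance which LaTeX style files are missing, kpsewhich (which ships with TeX Live) can be asked for each one. A minimal sketch, with `missing_sty` as a made-up name; if kpsewhich itself is not on the PATH, every file will be reported as missing.

```shell
# Hypothetical helper: print each style file that kpsewhich cannot locate.
missing_sty() {
    local sty
    for sty in "$@"; do
        if ! kpsewhich "$sty" >/dev/null 2>&1; then
            echo "$sty"
        fi
    done
}

# Example call for the two packages that caused errors here:
#   missing_sty pdfpages.sty pdflscape.sty
```

Each name printed can then be installed with tlmgr, as was done for pdfpages and pdflscape.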

2. Now LDSC is not working.

However, I now have an error with the next step (never ending...). The next step of the testing script is ldsc. If you remember (or see in the first messages), to install Ricopili I first had to install ldsc, and I was not able to do it. You explained to me how to do it, and one of the key things was using anaconda2-5.3.1, because munge_sumstats.py is not compatible with Python 3.

Now, the error that I have is this one:

Error with munging: source /broad/software/scripts/useuse; use .anaconda2-5.3.1; python /stanley/scharf_lab_storage/ldomenec/ricopili_laura/Ricopili_Dependencies/ldsc/munge_sumstats.py --sumstats PGC_meta.r4.gz  --daner --out PGC_meta.r4.gz.ldsc --merge-alleles /stanley/scharf_lab_storage/ldomenec/ricopili_laura/Ricopili_Dependencies/ldsc/w_hm3.snplist

 

And I don't know if there's something wrong with the Python that the system is using (I'm a little bit confused by this). So I tried to understand it. I did:

use UGER
ish -l h_vmem=4G
use UGER
source activate ldsc #in Ricopili instructions they say to activate ldsc environment (which I already had) and activate ldsc
use .anaconda2-5.3.1

When I do 'echo $PATH' I obtain:
/broad/software/free/Linux/redhat_7_x86_64/pkgs/anaconda2_5.3.1/bin:/broad/uge/8.5.5/bin/lx-amd64:/home/unix/ldomenec/.conda/envs/ldsc/bin:/broad/software/free/Linux/redhat_7_x86_64/pkgs/anaconda3_5.3.1/condabin:/stanley/scharf_lab_storage/ldomenec/ricopili_laura/Ricopili/rp_bin/pdfjam:/stanley/scharf_lab_storage/ldomenec/ricopili_laura/Ricopili/rp_bin:/home/unix/ldomenec/PRIMUS_v1.8.0/bin:/stanley/scharf_lab_storage/ldomenec/picopili/bin:/home/unix/ldomenec/EIG-master/bin:/home/unix/ldomenec/texttlive/bin/x86_64-linux:/home/unix/ldomenec/google-cloud-sdk/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/puppetlabs/bin:/home/unix/ldomenec/bin

and I see 2 anaconda paths. I don't know if maybe this is what's giving me the error... Could this be the problem?
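A long PATH like this is easier to read when split into one entry per line. The sketch below (with a made-up function name) lists only the entries that mention conda, numbered by their position, so it is clear which installation is searched first.

```shell
# Hypothetical helper: list PATH entries containing "conda",
# prefixed with their position in the search order.
# Uses the current $PATH unless a PATH-like string is given.
conda_path_entries() {
    local path_string="${1:-$PATH}"
    printf '%s\n' "$path_string" | tr ':' '\n' | grep -n -i 'conda'
}

# Example call:
#   conda_path_entries
```

Entries with a lower number win when two installations provide the same command, such as python or conda.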




So I uninstalled the ldsc conda environment:

rm -r /home/unix/ldomenec/.conda/envs/ldsc

rm /home/unix/ldomenec/.conda/environments.txt (take care if you have more than one environment! It will remove all, you can remove only the line of ldsc instead)
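Removing only the ldsc line could look like the sketch below. It assumes that environments.txt lists one environment path per line; `drop_env_line` is a made-up name, and a backup copy is kept in case the assumption is wrong.

```shell
# Hypothetical helper: remove one exact line from an environment listing.
# Assumption: the listing holds one environment path per line.
drop_env_line() {
    local listing="$1" env_path="$2"
    cp "$listing" "$listing.bak"
    # -x matches whole lines only, -F treats the path as a fixed string.
    # grep exits non-zero when no lines remain, which is fine here.
    grep -vxF "$env_path" "$listing.bak" > "$listing" || true
}

# Example call with the paths used above:
#   drop_env_line /home/unix/ldomenec/.conda/environments.txt \
#       /home/unix/ldomenec/.conda/envs/ldsc
```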



And in .bashrc I commented out the whole part referring to the conda environment, as it was defining another version (anaconda3_5.3.1, not anaconda2_5.3.1) as base:

#>>> conda initialize >>>
#!! Contents within this block are managed by 'conda init' !!
#__conda_setup="$('/broad/software/free/Linux/redhat_7_x86_64/pkgs/anaconda3_5.3.1/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
#if [ $? -eq 0 ]; then
#eval "$__conda_setup"
#else
#if [ -f "/broad/software/free/Linux/redhat_7_x86_64/pkgs/anaconda3_5.3.1/etc/profile.d/conda.sh" ]; then
#. "/broad/software/free/Linux/redhat_7_x86_64/pkgs/anaconda3_5.3.1/etc/profile.d/conda.sh"
#else
#export PATH="/broad/software/free/Linux/redhat_7_x86_64/pkgs/anaconda3_5.3.1/bin:$PATH"
#fi
#fi
#unset __conda_setup

BEFORE:

conda info --envs

#conda environments:

base                  *  /broad/software/free/Linux/redhat_7_x86_64/pkgs/anaconda3_5.3.1
ldsc                     /home/unix/ldomenec/.conda/envs/ldsc


AFTER COMMENTING THE CONDA SECTION IN .bashrc:

conda info --envs
#conda environments:

base                  *  /broad/software/free/Linux/redhat_7_x86_64/pkgs/anaconda2_5.3.1
ldsc                     /home/unix/ldomenec/.conda/envs/ldsc


I then did:

use UGER
ish -l h_vmem=4G
use UGER
source activate ldsc #in Ricopili instructions they say to activate ldsc environment (which I already had) and activate ldsc
use .anaconda2-5.3.1

And then:
rp_test_navi --serial --sepa INT


#####And it worked! :D##### 

(btw, if you don't activate ldsc, it doesn't work!)