In [1]:
# You can ignore the stuff on the next lines, it's related to making the formatting nicer.
import os
os.environ['COLUMNS'] = '110'

# Getting Started

<!-- [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/mennowitteveen/pgsbenchmark/main?labpath=nbs/PPB-demonstration.ipynb) -->

<br>
<div style="font-size:20px; line-height:1.4;">
    
<!-- [![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mennowitteveen/pgsbenchmark/blob/main/nbs/PPB-demonstration.ipynb) -->
<!-- or if the Colab messages *"Warning: This notebook was not authored by Google."* scared you, you can alternatively click the following button for a similar cloud experience (=slower though): -->
<!-- [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/mennojw/prstools-release/main?labpath=tutorials/01_getting_started.ipynb)  -->

[![Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mennojw/prstools-release/blob/main/tutorials/01_getting_started.ipynb)  <-- You can run exactly this notebook by clicking this button.

<b> Notes: </b>
  - **Remove the '!' in front of the command if you are running it in a terminal/command line.**
  - **This tutorial was created for Linux/macOS systems. The code works on Windows, but the examples here need to be changed somewhat.**
    
Hi there! <br>

This getting-started guide is part of the prstools tutorial, which will let you
generate polygenic risk scores in no time.

`prstools` provides a powerful and convenient way for you to create polygenic risk scores.
It was created with Python, but it does not require any Python skills and works mostly via the command line.
When you do something with `prstools`, you get adequate and quick feedback on what is happening,
and it tells you what to do to fix things if something goes wrong. Hence, it's OK to press enter often to see where things stand. The tool lets you know, in a context-dependent manner, when things are missing or need to be fixed, in order to minimize busy-work. Additionally, many checks happen behind the scenes to make sure the inputs are right and that the PRS is created correctly. This means that even the PRS-uninitiated can build powerful polygenic risk scores in no time. For those interested, prstools is also designed to be used inside a Python environment.<br>
Every `prstools` version has been tested and benchmarked before uploading.
<!-- [various checks and context dependent warning and instructions to make sure PRSs are generated in the right way, 
    and minimize busy-work, so you can get on with your day.] -->

This getting-started guide is centered around examples that show you interactively how to use the tool with real data.
There is also documentation that explains what every part of the tool does (basically a web version of the command-line `--help`).
    
For this tutorial we will assume you ran `pip install -U prstools`, probably in a conda environment, and that it worked out just fine.
If it did you should now be able to run the following basic example.<br>
<br>
<i>Note: In a Jupyter notebook (like this one), a line that starts with `!` runs a bash/command-line command (for instance, `!ls` lists a directory). </i>
</div>

In [6]:
!pip install -U prstools # Run this line if you have not yet installed prstools (without '!' in front if in terminal)

## Basic Example

<br>
<div style="font-size:20px; line-height:1.4;">
   
The first thing we will do is type `prstools` on the command line, which gives the following output.

In [7]:
!prstools # WARNING! Do not type the exclamation mark if you're using a normal command line, so just `prstools`


Usage:
 prstools <command>   ...

Convenient and powerfull Polygenic Risk Score creation. 
'prst' is a commandline shorthand for 'prstools'

Models & Utility Commands:
 <command>  
  downloadutil  Download and unpack LD reference panels and other data.
  combine       A tool to combine genetics-related text files.
  prscs2        PRS-CS v2: A polygenic prediction method that infers posterior SNP effect sizes under
                continuous shrinkage (CS) priors.


<br>
<div style="font-size:20px; line-height:1.4;">
    
<!-- weird stuff has to be empty line here -->
This is the base command of prstools. You can also type `prst` as a shorthand.
We can type `prst prscs2` to access a new version of PRS-CS for generating PRSs. For now you don't need to read everything it outputs; it is there to give you an overview.
</div>

In [8]:
!prst prscs2 # WARNING! Remove the !


Usage:
 prst prscs2 [-h  --cpus <number-of-cpus>] --ref <dir/refcode> --target <bim-prefix> --sst <file>
                   --out <dir+prefix> [--n_gwas <num>  --chrom <chroms>  --colmap <alternative_colnames>]
                   [--pred  --n_iter <n_iter>  --n_burnin <n_burnin>  --n_slice <n_slice>]
                   [--seed <seed>  --a <a>  --b <b>  --phi <phi>  --clip <clip>  --sampler <sampler>]
                   [--n_jobs <n_jobs>]

PRS-CS v2: A polygenic prediction method that infers posterior SNP effect sizes under continuous shrinkage (CS) priors.

General Options:
 -h, --help                               Show this help message and exit.
 -c, --cpus <number-of-cpus>              The number of cpus to use. Generally most efficient if chosen to
                                          be between 1 and 5. Functionality can be turned-off completely by
                                          setting it to -1. (default: 1)

Data Arguments (first 5 required):
 

<br>
<div style="font-size:20px; line-height:1.4;"> 
    
This printed the help of the `prscs2` model, which is quite a lot to take in. Before we look into all the special things we can do, it's important to
see that there are 3 types of arguments: **General, Data** and **Model** arguments. General and Model arguments are optional.
The necessary ones are among the **Data arguments** (the help marks the first 5 as required). To see how to use them we are going to look at an example, which
is conveniently located at the bottom of the help output. Before we can do anything, however, we need some data to work with.
For this we will run:    

In [11]:
!prst downloadutil --pattern example --destdir ./; cd example
# Comment: You only have to run the next line if you are using a Jupyter notebook, but if you copy everything it's going to be OK too.
%cd example


A pattern was used, which matched the following files:
example.tar.gz

Downloading LD reference data, which might take some time. Data will be stored in: ./
example.tar.gz        : 100%|███████████████████████████████████| 1.83M/1.83M [00:01<00:00, 1.43MB/s]

Finished downloading all LD data. Now we need to unpack all the tar.gz files (takes some time to start):
Extracting: 100%|████████████████████████| 40/40 [00:00<00:00, 130.32it/s, file=./example/ldref_1kg_pop/snpinfo_1kg_hm3]
Deleting the following files: example.tar.gz, 
Completely done with downloading & unpacking

/home/jovyan/proj/repos/prstools/tutorials/example


<br>
<div style="font-size:20px; line-height:1.4;">
This downloaded the example data and moved us into the directory 'example'. If any of it fails, there is information on how to do it manually in the 'Manual download' section below.
Let's have a look at what's inside the example data folder.

In [12]:
!ls

ldgm_1kg_pop  ldref_1kg_pop  sumstats.tsv  target.bed  target.bim  target.fam


<br>
<div style="font-size:20px; line-height:1.4;">
What we have here are: <br>
    
- Two example LD references: `ldgm_1kg_pop` & `ldref_1kg_pop`
- Target dataset in plink format:  `target.{bed/bim/fam}`
- GWAS sumstat: `sumstats.tsv`

Together with the GWAS sample size (N=2565, specified with `--n_gwas`), we now have all the inputs we need to create a PRS.
In addition to the inputs, we only need to tell the tool where we would like our outputs to be stored.
We can do this with the `--out` option. This is where the PRS weights will be stored. Lastly, if we would like a PRS prediction for the target in addition to the weights, we can add the `--pred` flag (Note: this can also be done manually using plink, as described in a section below).
Combining everything gives us the following command, which we will now run.
</div>

In [13]:
!prstools prscs2 --ref ldref_1kg_pop --target target.bim --sst sumstats.tsv --n_gwas 2565 --out ./result-prscs2 -p

PRSTOOLS v0.0.31 (18-02-2025)
Running command: prscs2
Options in effect:
  --ref ldref_1kg_pop
  --target target
  --sst sumstats.tsv
  --out ./result-prscs2
  --n_gwas 2565
  --pred 

Hostname:          45c1c31b873a
Working directory: /home/jovyan/proj/repos/prstools/tutorials/example
Start time:        Thu, 02 Oct 2025 22:46:56 
Random seed:       1759445216
Number of workers: 4 [--n_jobs]
Number of cpus:    1 (per worker)
 
Loading sumstat file.   ->          947 variants sumstat loaded.
Loading target file.    ->          947 variants bim file loaded (used pyarrow).
Creating snp register for efficient operations (only once)[snpinfo_1kg_hm3] @ ldref_1kg_pop/snpregister.tsv -> Done
Loading reference file. ->          936 variants loaded.
Matching sumstat & reference & target. -> 936 common variants after matching reference (100.0% incl.), target (98.8% incl.) and sumstat (98.8% incl.).
Computed beta marginal (=X'y/n) from sumstat using p-values and the sign of beta and sample size 
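
For scripted workflows, the same command can also be assembled from Python. The sketch below only builds the argument list from the flags shown in the help output (`--ref`, `--target`, `--sst`, `--n_gwas`, `--out`, `--pred`) and calls the command-line tool; it does not use the prstools Python API. The helper name `build_prscs2_command` is made up for this illustration.

```python
import shlex


def build_prscs2_command(ref, target, sst, n_gwas, out, pred=True):
    """Return the prscs2 call as an argument list (hypothetical helper, not part of prstools)."""
    cmd = [
        "prstools", "prscs2",
        "--ref", str(ref),        # LD reference directory
        "--target", str(target),  # plink target dataset
        "--sst", str(sst),        # GWAS summary statistics file
        "--n_gwas", str(n_gwas),  # GWAS sample size
        "--out", str(out),        # output prefix for the weights
    ]
    if pred:
        cmd.append("--pred")      # also compute the PRS for the target dataset
    return cmd


cmd = build_prscs2_command(
    ref="ldref_1kg_pop",
    target="target.bim",
    sst="sumstats.tsv",
    n_gwas=2565,
    out="./result-prscs2",
)
print(shlex.join(cmd))

# To actually run it (requires prstools to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Keeping the arguments in a list avoids quoting problems when paths contain spaces, and the printed line can be pasted straight into a terminal.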



In [None]:
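# %cd ..  # Leave this commented out: the following cells still need to run inside the 'example' directory.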

<br>
<div style="font-size:20px; line-height:1.4;">
A good thing to know is that for the target file we could have also specified `target.bed`, `target.fam` or `target`. All of these work, since the software determines on its own where the relevant files are.
Now we have stored the PRS weights and already computed the polygenic risk scores for our target plink dataset. Let's have a look!

</div>

In [16]:
!ls
!head -4 result* # Let's run this to see what's in the result files

ldgm_1kg_pop   result-prscs2_.prspred.tsv      sumstats.tsv  target.bim
ldref_1kg_pop  result-prscs2_.prstweights.tsv  target.bed    target.fam
==> result-prscs2_.prspred.tsv <==
fid	iid	prs
5	5	0.2751176357269287
8	8	0.42913275957107544
9	9	0.41239479184150696

==> result-prscs2_.prstweights.tsv <==
chrom	snp	pos	A1	A2	allele_weight
22	rs5747999	17075353	C	A	-0.003970123524139204
22	rs874836	17301843	A	G	0.0007087665216776075
22	rs7293026	17398800	T	C	-0.0010396795388768344


<br>
<div style="font-size:20px; line-height:1.4;">

So we have the weights for the SNPs and the PRS itself, with the fid and iid matched. 
You don't have to follow along with this last bit, but I will show you that one can also use plink
on these weight files and that it gives the same PRS result:

</div>

In [17]:
!plink --bfile target --out prspred --keep-allele-order --score ./result-*prstweights.tsv 2 4 6 sum

PLINK v1.90b7 64-bit (16 Jan 2023)             www.cog-genomics.org/plink/1.9/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to prspred.log.
Options in effect:
  --bfile target
  --keep-allele-order
  --out prspred
  --score ./result-prscs2_.prstweights.tsv 2 4 6 sum

7939 MB RAM detected; reserving 3969 MB for main workspace.
947 variants loaded from .bim file.
4275 people (2101 males, 1988 females, 186 ambiguous) loaded from .fam.
Ambiguous sex IDs written to prspred.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 4275 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is exactly 1.
947 variants and 4275 people pass filters and QC.
Note: No phenotypes present.
allele code mism

In [20]:
!head -4 prspred.profile

 FID  IID  PHENO    CNT   CNT2 SCORESUM
   5    5     -9   1872    529 0.275118
   8    8     -9   1872    569 0.429133
   9    9     -9   1872    553 0.412395


<div style="font-size:20px; line-height:1.4;">

As we can see the plink result in the column SCORESUM neatly matches the earlier computed PRS.
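
This comparison can also be done programmatically. The sketch below parses the first rows of both outputs (copied from the cells above into strings, so that it runs without the files) and reports the largest absolute difference. The function names are made up for this illustration; with real files you would pass open file handles instead of `io.StringIO` objects.

```python
import csv
import io

# First rows copied from the prstools prediction file and the plink .profile file.
PRSTOOLS_PRED = (
    "fid\tiid\tprs\n"
    "5\t5\t0.2751176357269287\n"
    "8\t8\t0.42913275957107544\n"
    "9\t9\t0.41239479184150696\n"
)
PLINK_PROFILE = """ FID  IID  PHENO    CNT   CNT2 SCORESUM
   5    5     -9   1872    529 0.275118
   8    8     -9   1872    569 0.429133
   9    9     -9   1872    553 0.412395
"""


def read_prstools_pred(handle):
    """Read the tab-separated prediction file into {(fid, iid): prs}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {(row["fid"], row["iid"]): float(row["prs"]) for row in reader}


def read_plink_profile(handle):
    """Read the whitespace-separated plink .profile file into {(FID, IID): SCORESUM}."""
    header = handle.readline().split()
    scores = {}
    for line in handle:
        if line.strip():
            row = dict(zip(header, line.split()))
            scores[(row["FID"], row["IID"])] = float(row["SCORESUM"])
    return scores


def max_abs_difference(pred, profile):
    """Largest absolute score difference over individuals present in both inputs."""
    shared = pred.keys() & profile.keys()
    return max(abs(pred[key] - profile[key]) for key in shared)


pred = read_prstools_pred(io.StringIO(PRSTOOLS_PRED))
profile = read_plink_profile(io.StringIO(PLINK_PROFILE))
diff = max_abs_difference(pred, profile)
print(f"largest difference: {diff:.2e}")
```

The remaining difference comes from plink printing the score with fewer decimals.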

With this we have a basic understanding of how to use prstools. In the next section, we go through a brief case study that shows how to use prstools in a real-world scenario and shows
you some tricks that can greatly speed up your workflow.
<br>
</div>

In [23]:
%cd ..
#!rm -r example

/home/jovyan/proj/repos/prstools/tutorials


## Case Study A - Getting a Sumstat from GWAS Catalog and predicting into 1KG

<br>
<div style="font-size:20px; line-height:1.4;">
    
The previous basic example, although insightful, does not provide a sense of how to use prstools in a more realistic case.<br>
Therefore, we will now download a GWAS sumstat from the GWAS Catalog, train an example model, and predict into the 1000 Genomes cohort, for which
we will also download the relevant data. <br>
First we create a case study directory to which we can download our data:


In [4]:
!pwd
# !rm -r casestudy-a
!mkdir casestudy-a
%cd casestudy-a

/home/jovyan/proj/repos/prstools/tutorials/casestudy-a
/home/jovyan/proj/repos/prstools/tutorials/casestudy-a/casestudy-a


### Get a GWAS Sumstat from GWAS Catalog

<br>
<div style="font-size:20px; line-height:1.4;">
    
Then we find this GWAS sumstat for height on GWAS Catalog:
    

In [32]:
!wget https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90245001-GCST90246000/GCST90245992/harmonised/GCST90245992.h.tsv.gz

--2025-10-02 23:34:21--  https://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90245001-GCST90246000/GCST90245992/harmonised/GCST90245992.h.tsv.gz
Resolving ftp.ebi.ac.uk (ftp.ebi.ac.uk)... 193.62.193.165
Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.193.165|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39622810 (38M) [application/x-gzip]
Saving to: ‘GCST90245992.h.tsv.gz’


2025-10-02 23:34:44 (1.70 MB/s) - ‘GCST90245992.h.tsv.gz’ saved [39622810/39622810]



<br>
<div style="font-size:20px; line-height:1.4;">
    
From the GWAS Catalog site we learned that this sumstat is from a European population and that the sample size is 1,597,374, which is information we need later. <br>
When we take a quick peek into the file, we can see it is still compressed and not directly readable, but we will address this shortly.

In [33]:
!head -2 GCST90245992.h.tsv.gz

�     � BC �ZlZ�ndG<��S,��܎����rwn�-yԲ���� �KU��T|zQ�L2���?���|������������������>�||}{�n��~����������z����>?�v�x����~yy��|{{������o��������_���|��v�����������������oo�_���|����?����R}��v�ӗ�+^5G��t�r����s�~�#�GH?R���O����Ur��+�-&`�����)֖bR�����C*x1�$��v����=]?_�*��R�����u���^+_L�J��(y���=�,p?ے�H��7+Ĵz�r�q�t�XFN�ņ�{H/�����*��2������P��#�ሣ4��`$�8��D,[zYob�a���hJ�	��1��r
~�y�v	��Gz�����%I>��Ċ�s��ciRb�5��A?��J��7�E�iw'��)1bk��O�	�m��S(����$���֭/�P:t#)��~\n�y��n=�ŕ��4	�S"B��F�O)X�Bt�¾Cغ�]�CȤ�2G�i�όPB�8���t�"VX`��&bέ֤x9,�(1w�G�/�6���x#��


<!-- leave empty line -->
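
If you want to see the column names before going further, a compressed text file can be read without unpacking it first. The sketch below uses Python's standard `gzip` module to return the first few lines; the function name `peek_gzip_text` is made up for this illustration, and the file is assumed to be UTF-8 text.

```python
import gzip
from itertools import islice


def peek_gzip_text(path, n_lines=2):
    """Return the first n_lines of a gzip-compressed text file, decompressing on the fly."""
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, n_lines)]


# Usage in this case study (file downloaded with wget above):
# for line in peek_gzip_text("GCST90245992.h.tsv.gz"):
#     print(line)
```

Only the requested lines are decompressed, so this stays fast even for large sumstat files.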
 
### Get the relevant LD reference

In [37]:
!prst downloadutil


Usage:
 prst downloadutil [-h  --destdir <destdir>  --pattern <pattern>  --list  --keeptar]

Download and unpack LD reference panels and other data.
The files that can be downloaded and unpacked with this command includes the standard reference files for PRS-CS and PRS-CSx.
Additional information can be found at https://github.com/getian107/PRScsx

Options:
 -h, --help           Show this help message and exit.
 --destdir <destdir>  Directory in which all the data will be downloaded. Option required if you want the
                      download to start.
 --pattern <pattern>  A string pattern that retrieves every file that it matches. Matches everything by
                      default. Without --destdir option (required to start downloading) one can see which
                      files get matched. (default: ALL)
 --list               Show which files can optionally be downloaded and exit.
 --keeptar            Keep the tar.gz files in the destdir. If this option is

In [36]:
!prst downloadutil --list


Files available for downloading & unpacking:

filename              description                                         url                                                                                                               
 snpinfo_mult_1kg_hm3 1000G multi-ancestry SNP info (for PRS-CSx) (~106M)                                                https://www.dropbox.com/s/rhi806sstvppzzz/snpinfo_mult_1kg_hm3?dl=1
snpinfo_mult_ukbb_hm3  UKBB multi-ancestry SNP info (for PRS-CSx) (~108M)                                               https://www.dropbox.com/s/oyn5trwtuei27qj/snpinfo_mult_ukbb_hm3?dl=1
 ldblk_1kg_afr.tar.gz              1000G AFR Population LD panel (~4.44G)                                                https://www.dropbox.com/s/mq94h1q9uuhun1h/ldblk_1kg_afr.tar.gz?dl=1
 ldblk_1kg_amr.tar.gz              1000G AMR Population LD panel (~3.84G)                                                https://www.dropbox.com/s/uv5ydr4uv528lca/ldblk_1kg_amr.tar.gz?dl=1
 ldblk_1

<br>
<div style="font-size:20px; line-height:1.4;">

We can use `downloadutil` to download and unpack the European LD reference. By adding a pattern and a destdir we can start the download.

In [38]:
!prst downloadutil --pattern 1kg_eur --destdir ./


A pattern was used, which matched the following files:
ldblk_1kg_eur.tar.gz

Downloading LD reference data, which might take some time. Data will be stored in: ./
ldblk_1kg_eur.tar.gz  : 100%|███████████████████████████████████| 4.56G/4.56G [07:07<00:00, 10.7MB/s]

Finished downloading all LD data. Now we need to unpack all the tar.gz files (takes some time to start):
Extracting: 100%|███████████████████████████████████| 24/24 [02:36<00:00,  6.52s/it, file=ldblk_1kg_eur/snpinfo_1kg_hm3]
Deleting the following files: ldblk_1kg_eur.tar.gz, 
Completely done with downloading & unpacking



<!-- leave empty line -->

### Download a target dataset: 1KG

<div style="font-size:20px; line-height:1.4;">
    
In addition to the LD reference data we will also need a target dataset. For this we will use a EUR 1000 Genomes dataset, which can be downloaded with:

In [39]:
!prst downloadutil --pattern g1000 --destdir ./


A pattern was used, which matched the following files:
g1000.tar.gz

Downloading LD reference data, which might take some time. Data will be stored in: ./
g1000.tar.gz          : 100%|███████████████████████████████████| 64.0M/64.0M [00:05<00:00, 10.9MB/s]

Finished downloading all LD data. Now we need to unpack all the tar.gz files (takes some time to start):
Extracting: 100%|███████████████████████████████████████████████████| 4/4 [00:06<00:00,  1.61s/it, file=g1000/g1000.bed]
Deleting the following files: g1000.tar.gz, 
Completely done with downloading & unpacking



### Combining Everything and creating a model

<div style="font-size:20px; line-height:1.4;">
    
Let's see what we have gathered in our case study directory:

In [41]:
!ls

g1000  GCST90245992.h.tsv.gz  ldblk_1kg_eur


In [44]:
!ls g1000

g1000.bed  g1000.bim  g1000.fam


<div style="font-size:20px; line-height:1.4;">
    
Now we have everything we need, and we can feed these inputs to prstools to create the PRS. In the command below we make use of the escape character `\`, which allows you to write a command on multiple lines that the command-line interface will interpret as one single line. Useful in many cases! Note that we added `--pred` at the end, since we want to immediately generate the PRS for our target dataset.

In [94]:
!prst prscs2 \
    --ref ./ldblk_1kg_eur/ \
    --target ./g1000/g1000.bed \
    --sst GCST90245992.h.tsv.gz \
    --n_gwas 1597374 \
    --out ./prscs2_ \
    --pred

PRSTOOLS v0.0.31 (18-02-2025)
Running command: prscs2
Options in effect:
  --ref ./ldblk_1kg_eur/
  --target /home/jovyan/proj/data/g1000/g1000_hm3_eur
  --sst GCST90245992.h.tsv.gz
  --out ./prscs2_
  --n_gwas 1597374
  --pred 

Hostname:          45c1c31b873a
Working directory: /home/jovyan/proj/repos/prstools/tutorials/casestudy-a
Start time:        Mon, 29 Sep 2025 20:35:38 
Random seed:       1759178138
Number of workers: 4 [--n_jobs]
Number of cpus:    1
 
Loading sumstat file. Current --colmap is SNP,A1,A2,BETA,OR,P,SE,N. This is the default. The colmap argument should list the column names as they appear in your input file. For instance --colmap rsid,Allele1,Allele2,BETA,,Pval,StdErr,Ntotal. Mind that not all positions need to have a column name and can be left empty. With this example colmap we will get the following column mapping:
[colmap column-name conversions:  rsid -> SNP, Allele1 -> A1, Allele2 -> A2, Pval -> P, StdErr -> SE, Ntotal -> N ]
Also, if the conversion column

 
<div style="font-size:20px; line-height:1.4;">
    
Ok, that did not go completely according to plan!
When we first ran the command, prstools gave us a lot of diagnostic information and then an error at the bottom:

`Missing required column(s) 'snp'/'A1'/'A2' (alternative name(s): 'SNP'/'A1'/'A2'), please add the column(s) to the sumstat or use --colmap option.`

Reading the output from top to bottom, we see that the tool shows us how it tried to map the columns from our GWAS summary statistics file into the required format. It even prints out a preview of the file contents (transposed so you can read it more easily).

From this we learn two things:

1. The input file uses different column names (rsid, effect_allele, other_allele, beta, p_value, etc.).
2. prstools expects specific names (snp, A1, A2, beta, P, SE, N).

So the solution is to tell prstools how to translate the column names using the `--colmap` option. This way, the software knows which column in your file corresponds to which required input. The names are listed in the order of the default colmap (`SNP,A1,A2,BETA,OR,P,SE,N`), and a position can be left empty, as is done for OR in the command below.

Since this is just a tutorial, we’ll also limit the number of iterations (`--n_iter 4`) so the run finishes quickly. Lastly, keep in mind that you can also paste the entire error message into a chatbot to get the corrected command or a suitable `--colmap` argument.

Now we will run the corrected command:

In [46]:
!prst prscs2 \
    --ref ./ldblk_1kg_eur/ \
    --target ./g1000/g1000.bed \
    --sst GCST90245992.h.tsv.gz \
    --n_gwas 1597374 \
    --out ./prscs2_ \
    --pred \
    --n_iter 4 \
    --colmap rsid,effect_allele,other_allele,beta,,p_value,standard_error,n

PRSTOOLS v0.0.31 (18-02-2025)
Running command: prscs2
Options in effect:
  --ref ./ldblk_1kg_eur/
  --target ./g1000/g1000.bed
  --sst GCST90245992.h.tsv.gz
  --out ./prscs2_
  --n_gwas 1597374
  --colmap rsid,effect_allele,other_allele,beta,,p_value,standard_error,n
  --pred 
  --n_iter 4

Hostname:          45c1c31b873a
Working directory: /home/jovyan/proj/repos/prstools/tutorials/casestudy-a
Start time:        Fri, 03 Oct 2025 00:19:02 
Random seed:       1759450742
Number of workers: 4 [--n_jobs]
Number of cpus:    1 (per worker)
 
[colmap column-name conversions:  rsid -> SNP, effect_allele -> A1, other_allele -> A2, beta -> BETA, p_value -> P, standard_error -> SE, n -> N ]
Loading sumstat file.   ->    1,369,840 variants sumstat loaded.
Loading target file.    ->    1,360,142 variants bim file loaded (used pyarrow).
Creating snp register for efficient operations (only once)[snpinfo_1kg_hm3] @ ./ldblk_1kg_eur/snpregister.tsv -> Done
Loading reference file. ->    1,120,696 variant
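
The `--colmap` value used in this command is positional: each entry names the column in your file that plays the role at the same position of the default colmap (`SNP,A1,A2,BETA,OR,P,SE,N`, as printed in the earlier output). Writing such a string by hand is error-prone, so below is a minimal sketch that builds it from a dictionary. The helper `build_colmap` is made up for this illustration and is not part of prstools.

```python
# Order of the default colmap as printed by prstools.
DEFAULT_ORDER = ("SNP", "A1", "A2", "BETA", "OR", "P", "SE", "N")


def build_colmap(file_columns, order=DEFAULT_ORDER):
    """Join the file's column names in colmap order; unmapped positions stay empty."""
    unknown = set(file_columns) - set(order)
    if unknown:
        raise ValueError(f"Not a colmap position: {sorted(unknown)}")
    return ",".join(file_columns.get(name, "") for name in order)


colmap = build_colmap({
    "SNP": "rsid",
    "A1": "effect_allele",
    "A2": "other_allele",
    "BETA": "beta",
    "P": "p_value",
    "SE": "standard_error",
    "N": "n",
})
print(colmap)
```

Because no entry is given for OR, the fifth position stays empty, which reproduces the string passed to `--colmap` in the command.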

<div style="font-size:20px; line-height:1.4;">

As we can see, there is also a "Number of workers" specified. If you are running this code on a powerful system you can specify a large number of workers. For instance, giving the option `--n_jobs 22` would spin up 22 workers, which can then do the PRS model creation for the 22 chromosomes in parallel, leading to a big decrease in overall runtime. Increasing the number of workers above 22 will not decrease the runtime further, since there are only 22 tasks in this case. If we set `--n_jobs 22`, the 22 workers would require about 22\*1=22 cores and approximately 2GB\*22=44GB of RAM to run. Increasing the number of cores per worker using the `--cpus` option will only lead to a modest decrease in runtime, and only if enough cores are available. If one would set `--cpus 2`, this would require 22 workers * 2 cpus = 44 total cores. If the required number of cores is higher than the number of available cores, this can substantially decrease computational efficiency.
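
The arithmetic in this paragraph can be wrapped in a small helper to check a planned run against your machine. This is a rough sketch: the 2 GB per worker is the approximate figure quoted above and not a measured value, the 22 tasks correspond to the 22 chromosomes, and the function name is made up for this illustration.

```python
def estimate_resources(n_jobs, cpus_per_worker=1, ram_gb_per_worker=2.0, n_tasks=22):
    """Rough resource estimate for a run with --n_jobs workers and --cpus per worker."""
    return {
        # Workers beyond the number of tasks have nothing to do.
        "useful_workers": min(n_jobs, n_tasks),
        "total_cores": n_jobs * cpus_per_worker,
        "approx_ram_gb": n_jobs * ram_gb_per_worker,
    }


print(estimate_resources(22))                     # --n_jobs 22
print(estimate_resources(22, cpus_per_worker=2))  # --n_jobs 22 --cpus 2
```

Comparing `total_cores` with the number of cores on your machine (for instance via `os.cpu_count()`) shows whether a setting would oversubscribe it.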


### Other Cool Stuff the tool can do

<div style="font-size:20px; line-height:1.4;">

<!-- As a little side note, it's good to know that one can copy paste the options in effect and add a `\` at the and put .. -->
Use {insertions} such as `{ref}` or `{n_gwas}` in `--out` to construct output filenames from the option values:

In [None]:
!prst prscs2 \
  --ref /home/jovyan/proj/data/ldrefs/prscs/ldblk_1kg_eur/ \
  --target /home/jovyan/proj/data/g1000/g1000_hm3_eur \
  --sst GCST90245992.h.tsv.gz \
  --out ./{command}_{ref}_{sst}_{n_iter}_{target}_ngwas={n_gwas}_chrom={chrom} \
  --n_gwas 1597374 \
  --pred \
  --chrom 21,22 \
  --n_iter 4 \
  --colmap rsid,effect_allele,other_allele,beta,,p_value,standard_error,n

# Extra information

## Manual download

In [None]:
# Not yet created

## Manual PRS prediction from weights using plink

In [None]:
# Not yet created

In [61]:
# plink --bfile target --out prspred --keep-allele-order --score ./result-{cmdname}_* 2 4 6      # Make predictions from weights (plink must be installed).
# plink --bfile target --out prspred --keep-allele-order --score ./result-{cmdname}_* 2 4 6 sum  # Add 'sum' to get score sums (SCORESUM) that match the prstools prediction.