In [9]:
# You can ignore the next lines; they only make the output formatting nicer.
import os
os.environ['COLUMNS'] = '110'

# Getting Started

<br>
<div style="font-size:20px; line-height:1.4;">
    
Hi there! <br>
This is the getting-started part of the prstools tutorial, which will have you
generating polygenic risk scores (PRSs) in no time. prstools is written in Python, but it was designed so that
you need no Python skills and can do most things from the command line.
It can also be used inside a Python environment, for which there will be a separate tutorial. <br>
<br>
For this tutorial we assume you ran `pip install -U prstools`, probably in a conda environment, and that it worked without problems.
If so, you should now be able to run the following basic example.<br>
<i> Good to know: if you see !something in a Jupyter notebook, it just means a bash/command-line command is being run (for instance !ls to list a directory). </i> <br>
</div>
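
For readers who are curious what the `!` prefix roughly corresponds to in plain Python, here is a minimal sketch using the standard library. The helper name `run_command` is made up for this illustration, and unlike `!`, this version does not go through a shell.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of strings and return its printed output."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout


# The current Python interpreter is used here because it exists on every system.
output = run_command([sys.executable, "-c", "print('hello from the command line')"])
print(output.strip())
```

Inside a notebook the `!` form is usually more convenient; the function above is mostly useful in scripts.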

In [None]:
!pip install -U prstools # Run this line if you have not yet installed prstools
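
If you are unsure whether the installation worked, a quick check from Python is to ask whether the `prstools` and `prst` commands can be found on your PATH. This is only a sketch: `find_executable` is a helper name invented here, and finding the command does not prove that it runs correctly.

```python
import shutil


def find_executable(name):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


for name in ("prstools", "prst"):
    path = find_executable(name)
    print(f"{name}: {path if path else 'not found on PATH'}")
```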

## Basic Example

<br>
<div style="font-size:20px; line-height:1.4;">
   
We can start by running the `prstools` command:

In [7]:
!prstools


Usage:
 prstools <command>   ...

Convenient and powerfull Polygenic Risk Score creation. 
'prst' is a commandline shorthand for 'prstools'

Models & Utility Commands:
 <command>  
  downloadutil  Download and unpack LD reference panels and other data.
  combine       A tool to combine genetics-related text files.
  prscs2        PRS-CS v2: A polygenic prediction method that infers posterior
                SNP effect sizes under continuous shrinkage (CS) priors.


<br>
<div style="font-size:20px; line-height:1.4;">
    
<!-- weird stuff has to be empty line here -->
This is the base command of prstools; you can also type `prst` as a shorthand.
It is best to follow along step by step so you can see how everything works.
We can type `prst prscs2` to access a new version of PRS-CS, which we will use to generate PRSs. You don't have to read everything it prints right now; it is only meant to give you an overview.
</div>

In [10]:
!prst prscs2


Usage:
 prst prscs2 [-h  --cpus <number-of-cpus>] --ref <dir/refcode> --target <bim-prefix> --sst <file>
                   --out <dir+prefix> [--n_gwas <num>  --chrom <chroms>  --colmap <alternative_colnames>]
                   [--pred  --n_iter <n_iter>  --n_burnin <n_burnin>  --n_slice <n_slice>]
                   [--seed <seed>  --a <a>  --b <b>  --phi <phi>  --clip <clip>  --sampler <sampler>]

PRS-CS v2: A polygenic prediction method that infers posterior SNP effect sizes under continuous shrinkage (CS) priors.

General Options:
 -h, --help                               Show this help message and exit.
 -c, --cpus <number-of-cpus>              The number of cpus to use. Generally most efficient if chosen to
                                          be between 1 and 5. Functionality can be turned-off completely by
                                          setting it to -1. (default: 1)

Data Arguments (first 5 required):
 -r, --ref_dir, --ref <dir/refcode>      

<br>
<div style="font-size:20px; line-height:1.4;"> 
    
This printed the help of the `prscs2` model. That is quite a lot of information, but before we look into all the special things we can do, it is important to
see that there are 3 types of arguments: **General, Data** and **Model** arguments. General and Model arguments are optional.
The required ones are the Data arguments (the help marks the first 5 as required). To see how to use them we are going to look at an example, which
is conveniently located at the bottom of the help output. Before we can do anything, though, we need some data to work with.
For this we will run:    

In [31]:
!prst downloadutil --pattern example --destdir ./; cd example
# Comment: You only have to run the next line if you are using a Jupyter notebook, but if you copy everything it will be fine too.
%cd example 


A pattern was used, which matched the following files:
example.tar.gz

Downloading LD reference data, which might take some time. Data will be stored in: ./
example.tar.gz        : 100%|███████████████████████████████████| 1.83M/1.83M [00:01<00:00, 1.32MB/s]

Finished downloading all LD data. Now we need to unpack all the tar.gz files (takes some time to start):
Extracting: 100%|████████████████████████| 40/40 [00:00<00:00, 150.75it/s, file=./example/ldref_1kg_pop/snpinfo_1kg_hm3]
Deleting the following files: example.tar.gz, 
Completely done with downloading & unpacking

/home/jovyan/proj/repos/prstools/tutorials/example


<br>
<div style="font-size:20px; line-height:1.4;">
This downloaded the example data and moved us into the 'example' directory. If any of this fails, the 'Manual download' section below has information on how to do it manually.
Let's have a look at what's inside the example data folder.

In [34]:
!ls

ldgm_1kg_pop  ldref_1kg_pop  sumstats.tsv  target.bed  target.bim  target.fam
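
The same listing can be checked from Python, which is handy in a script. The sketch below assumes the example folder should contain exactly the six entries printed above; `missing_entries` is a helper name invented for this illustration.

```python
from pathlib import Path

# Names taken from the `ls` output of the example folder.
EXPECTED = [
    "ldgm_1kg_pop",
    "ldref_1kg_pop",
    "sumstats.tsv",
    "target.bed",
    "target.bim",
    "target.fam",
]


def missing_entries(folder, expected=EXPECTED):
    """Return the expected files or directories that are absent from `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).exists()]


missing = missing_entries(".")
print("All example files found." if not missing else f"Missing: {missing}")
```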


<br>
<div style="font-size:20px; line-height:1.4;">
What we have here are: <br>
    
    - Two LD references (ldgm_1kg_pop & ldref_1kg_pop), 
    - A GWAS summary statistics file (sumstats.tsv) and a target dataset in plink format ( target.{bed/bim/fam} ).
With these files we can now run `prscs2`:

In [37]:
!prstools prscs2 --ref ldref_1kg_pop --target target --sst sumstats.tsv --n_gwas 2565 --out ./result-prscs2 -p

PRSTOOLS v0.0.2 (18-02-2025)
Running command: prscs2
Options in effect:
  --ref ldref_1kg_pop
  --target target
  --sst sumstats.tsv
  --out ./result-prscs2
  --n_gwas 2565
  --pred 

Hostname:          ae633bb73cbb
Working directory: /home/jovyan/proj/repos/prstools/tutorials/example
Start time:        Sat, 28 Jun 2025 06:27:22 
Random seed:       1751092042
Number of cpus:    1
 
Loading sumstat file.   ->          947 variants sumstat loaded.
Loading target file.    ->          947 variants bim file loaded (used pyarrow).
Loading reference file. ->          936 variants loaded.
Matching sumstat & reference & target. -> 936 common variants after matching reference (100.0% incl.), target (98.8% incl.) and sumstat (98.8% incl.).
Computed beta marginal (=X'y/n) from sumstat using p-values and the sign of beta and sample size 

Starting iterations of model(s):
100%|██████████████████████████████████████████████████████████████████████| 1000/1000 [00:49<00:00, 20.14it/s]
Savi
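
The log line about the "beta marginal" mentions p-values, the sign of beta and the sample size. A common way to do such a conversion is to turn the two-sided p-value into a z-score, give it the sign of beta, and divide by the square root of the sample size. The sketch below shows that idea only. Whether prstools computes it exactly like this is an assumption, and the function name `standardized_marginal_beta` and the input numbers are made up for illustration.

```python
from math import copysign, sqrt
from statistics import NormalDist


def standardized_marginal_beta(beta, p_value, n_gwas):
    """Approximate a standardized marginal effect from a p-value, the sign of beta and n.

    Assumption: the p-value is two-sided and comes from a normally distributed test statistic.
    """
    # Keep the p-value strictly above 0 so the inverse normal CDF is defined.
    p_value = min(max(p_value, 1e-300), 1.0)
    z_magnitude = abs(NormalDist().inv_cdf(p_value / 2.0))
    return copysign(z_magnitude, beta) / sqrt(n_gwas)


# Hypothetical variant: positive effect, p = 0.05, with the sample size used in the example run.
print(standardized_marginal_beta(beta=0.02, p_value=0.05, n_gwas=2565))
```

Note that only the sign of beta is used here, not its size, which matches the wording of the log line.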

<br>
<div style="font-size:20px; line-height:1.4;">
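
The `prscs2` run shown earlier can also be assembled from Python, for instance when you want to loop over several summary statistics files. This is a minimal sketch: `build_prscs2_command` is a helper invented here, and it only uses flags that appear in the help output and in the example command.

```python
import shlex


def build_prscs2_command(ref, target, sst, out, n_gwas=None, pred=False):
    """Build the argument list for a `prst prscs2` run."""
    command = ["prst", "prscs2", "--ref", ref, "--target", target, "--sst", sst, "--out", out]
    if n_gwas is not None:
        command += ["--n_gwas", str(n_gwas)]
    if pred:
        command.append("--pred")
    return command


command = build_prscs2_command(
    ref="ldref_1kg_pop",
    target="target",
    sst="sumstats.tsv",
    out="./result-prscs2",
    n_gwas=2565,
    pred=True,
)
# Print the command instead of running it, so nothing happens by accident.
print(shlex.join(command))
```

To actually run it, the list can be passed to `subprocess.run(command, check=True)` from the standard library.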

In [30]:
%cd ..
!rm -r example

/home/jovyan/proj/repos/prstools/tutorials


In [29]:
!ls

ldgm_1kg_pop  ldref_1kg_pop  sumstats.tsv  target.bed  target.bim  target.fam


## Case Study A

In [5]:
!mkdir casestudy-a
%cd casestudy-a

/home/jovyan/proj/repos/prstools/tutorials/casestudy-a


# Extra information

## Manual download

In [None]:
# Not yet created