# Getting started with Rhapsody

## Installation
Follow the instructions in the [README file](https://github.com/luponzo86/rhapsody/blob/master/README.md) of the git repository.

## Initial configuration
The standard configuration procedure creates a `rhapsody/` folder in your home directory, trains the default classifiers, and downloads all necessary data (i.e. precomputed EVmutation scores). See the [documentation](https://rhapsody.readthedocs.io/en/latest/) for how to change the default configuration parameters.

In [1]:
import rhapsody as rd

In [2]:
summary = rd.initialSetup()

@> You are running Rhapsody v0.9.2
@> Pre-existing working directory detected: /home/luca/rhapsody
@> Pre-existing classifiers found: /home/luca/rhapsody/default_classifiers-sklearn_v0.21.2
@> Pre-existing EVmutation metrics found.
@> EVmutation folder found: /home/luca/rhapsody/EVmutation_mutation_effects
@> DSSP is installed on the system.
@> Setup complete.


## Testing
Here we show how to obtain pathogenicity predictions for a small set of five Single Amino acid Variants (SAVs):

In [3]:
test_SAVs = [
    'O00294 496 A T',  # known neutral SAV used for training
    'O00238 31 R H',   # SAV with no PDB structure
    'P01112 58 T R',   # unknown SAV
    'P01112 30 D E',   # unknown SAV
    'P01112 170 K I',  # unknown SAV with no Pfam domain
]
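
Each SAV above is written as four space-separated fields: UniProt accession, residue position, wild-type amino acid, and mutant amino acid. As a minimal sketch, the format could be checked before submitting a query. `parse_sav` and `AMINO_ACIDS` are hypothetical names introduced here for illustration and are not part of the Rhapsody API.

```python
# Hypothetical helper (not part of Rhapsody): validate the
# 'ACCESSION POSITION WT MUT' format used for SAV coordinates.
AMINO_ACIDS = set('ACDEFGHIKLMNPQRSTVWY')  # 20 standard one-letter codes


def parse_sav(sav):
    """Split an SAV string into (accession, position, wt, mut).

    Raises ValueError if the string is malformed.
    """
    fields = sav.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {sav!r}")
    acc, pos, wt, mut = fields
    if not pos.isdigit() or int(pos) < 1:
        raise ValueError(f"position must be a positive integer: {sav!r}")
    if wt not in AMINO_ACIDS or mut not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code: {sav!r}")
    return acc, int(pos), wt, mut


example_SAVs = ['O00294 496 A T', 'P01112 58 T R']
parsed = [parse_sav(s) for s in example_SAVs]
print(parsed)
```

A check like this only catches formatting mistakes; it does not verify that the wild-type residue matches the UniProt sequence.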

In [4]:
import os

# Create the nested working directory if it does not exist yet
os.makedirs('local/getting_started', exist_ok=True)

os.chdir('local/getting_started')

In [5]:
rh = rd.rhapsody(test_SAVs)

@> Logging into file: rhapsody-log.txt
@> Logging started at 2019-08-08 17:42:30.486571
@> Imported feature set:
@>    'wt_PSIC'* 
@>    'Delta_PSIC'* 
@>    'SASA'* 
@>    'ANM_MSF-chain'* 
@>    'ANM_effectiveness-chain'* 
@>    'ANM_sensitivity-chain'* 
@>    'stiffness-chain'* 
@>    'entropy' 
@>    'ranked_MI' 
@>    'BLOSUM'* 
@>    (* auxiliary feature set)
@> Submitting query to PolyPhen-2...
@> Query to PolyPhen-2 started in 1.1s.
@> PolyPhen-2 is running...
@> Query to PolyPhen-2 completed in 13.2s.
@> PolyPhen-2's output parsed.
@> Sequence-conservation features have been retrieved from PolyPhen-2's output.
@> Mapping SAVs to PDB structures...
@> [1/5] Mapping SAV 'O00238 31 R H' to PDB...
@> Pickle 'UniprotMap-O00238.pkl' recovered.
@> [2/5] Mapping SAV 'O00294 496 A T' to PDB...
@> Pickle 'UniprotMap-O00238.pkl' saved.
@> Pickle 'UniprotMap-O00294.pkl' recovered.
@> [3/5] Mapping SAV 'P01112 58 T R' to PDB...
@> Pickle 'UniprotMap-O00294.pkl' saved.
@> Pickle 'UniprotMap-

All results and predictions are saved to files in the current directory:

In [6]:
ls

pph2-completed.txt     rhapsody-log.txt
pph2-full.txt          rhapsody-log.txt.1
pph2-log.txt           rhapsody-pickle.pkl
pph2-short.txt         rhapsody-predictions-full_vs_reduced.txt
pph2-snps.txt          rhapsody-predictions.txt
pph2-started.txt       rhapsody-SAVs.txt
rhapsody-features.txt  rhapsody-Uniprot2PDB.txt


...and can also be accessed through the Rhapsody object:

In [7]:
rh.getPredictions()

array([('O00294 496 A T', 'known_neu', 0.10066666, 0.03931291, 'neutral', 0.351, 'neutral', -3.1479, 'neutral'),
       ('O00238 31 R H', 'new',        nan,        nan, '?', 0.219, 'neutral', -2.4718, 'neutral'),
       ('P01112 58 T R', 'new', 0.9533333 , 0.8964852 , 'deleterious', 1.   , 'deleterious', -9.7604, 'deleterious'),
       ('P01112 30 D E', 'new', 0.118     , 0.04702793, 'neutral', 0.001, 'neutral',  0.2196, 'neutral'),
       ('P01112 170 K I', 'new', 0.45533332, 0.27731383, 'neutral', 0.   , 'neutral',     nan, '?')],
      dtype=[('SAV coords', '<U50'), ('training info', '<U12'), ('score', '<f4'), ('path. prob.', '<f4'), ('path. class', '<U12'), ('PolyPhen-2 score', '<f4'), ('PolyPhen-2 path. class', '<U12'), ('EVmutation score', '<f4'), ('EVmutation path. class', '<U12')])

In [8]:
rh.getPredictions(SAV='P01112 30 D E')['path. class']

'neutral'
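
Judging from the output printed above, `rh.getPredictions()` returns a NumPy structured array, so the usual field indexing and boolean masks should apply. The sketch below does not call Rhapsody. Instead, it rebuilds a reduced copy of that array (three of the nine fields, with values copied from the output above) to show how predictions might be filtered. The variable names `preds`, `deleterious`, and `unscored` are illustrative only.

```python
import numpy as np

# Reduced copy of the predictions array shown above (3 of the 9 fields)
dtype = [('SAV coords', 'U50'), ('path. prob.', 'f4'), ('path. class', 'U12')]
preds = np.array([
    ('O00294 496 A T', 0.03931291, 'neutral'),
    ('O00238 31 R H', np.nan, '?'),
    ('P01112 58 T R', 0.8964852, 'deleterious'),
    ('P01112 30 D E', 0.04702793, 'neutral'),
    ('P01112 170 K I', 0.27731383, 'neutral'),
], dtype=dtype)

# SAVs predicted to be deleterious
deleterious = preds[preds['path. class'] == 'deleterious']

# SAVs without a Rhapsody prediction (NaN probability)
unscored = preds[np.isnan(preds['path. prob.'])]

print(deleterious['SAV coords'])
print(unscored['SAV coords'])
```

With the real object, the same masks could presumably be applied directly to the array returned by `rh.getPredictions()`.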