# Getting started with Rhapsody

## Installation
Follow the instructions in the [README file](https://github.com/luponzo86/rhapsody/blob/master/README.md) of the Git repository.

## Initial configuration
The standard configuration procedure creates a `rhapsody/` folder in your home directory, trains the default classifiers, and downloads all necessary data (i.e. precomputed EVmutation scores). See the [documentation](https://rhapsody.readthedocs.io/en/latest/) for how to change the default configuration parameters.

In [1]:
import rhapsody as rd

In [2]:
summary = rd.initialSetup()

@> You are running Rhapsody v0.9.7

@> Pre-existing working directory detected: /home/luca/rhapsody
@> Pre-existing classifiers found: /home/luca/rhapsody/default_classifiers-sklearn_v0.21.3
@> Pre-existing EVmutation metrics found.
@> EVmutation folder found: /home/luca/rhapsody/EVmutation_mutation_effects
@> DSSP is installed on the system.
@> Setup complete.


## Testing
Here we show how to obtain pathogenicity predictions for a small set of five single amino acid variants (SAVs):

In [3]:
test_SAVs = [
    'O00294 496 A T',  # known neutral SAV used for training
    'O00238 31 R H',   # SAV with no PDB structure
    'P01112 58 T R',   # unknown SAV
    'P01112 30 D E',   # unknown SAV
    'P01112 170 K I',  # unknown SAV with no Pfam domain
]
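Each entry in `test_SAVs` appears to follow the layout `accession position wild-type-residue mutant-residue`, separated by spaces. As a minimal sketch under that assumption, the hypothetical helper `parse_sav` below (not part of Rhapsody) checks a string against this layout before it is submitted:

```python
# Hypothetical helper, not part of the Rhapsody API. It assumes the
# 'accession position wt_aa mut_aa' layout seen in test_SAVs.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard one-letter codes


def parse_sav(sav):
    """Split an SAV string into (accession, position, wt_aa, mut_aa)."""
    fields = sav.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {sav!r}")
    accession, position, wt_aa, mut_aa = fields
    if not position.isdigit() or int(position) < 1:
        raise ValueError(f"position must be a positive integer: {sav!r}")
    if wt_aa not in AMINO_ACIDS or mut_aa not in AMINO_ACIDS:
        raise ValueError(f"unrecognized amino acid code: {sav!r}")
    if wt_aa == mut_aa:
        raise ValueError(f"wild-type and mutant residues are identical: {sav!r}")
    return accession, int(position), wt_aa, mut_aa


print(parse_sav('P01112 58 T R'))
```

A check like this only catches formatting mistakes; whether the wild-type residue actually matches the sequence at that position is not verified here.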

In [4]:
import os

# create the output folder if it does not exist yet
os.makedirs('local', exist_ok=True)

Let's run Rhapsody in the folder we just created:

In [5]:
os.chdir('local')
rh = rd.rhapsody(test_SAVs)
os.chdir('..')

@> Logging into file: rhapsody-log.txt
@> Logging started at 2019-12-13 14:29:57.750729
@> Imported feature set:
@>    'wt_PSIC'* 
@>    'Delta_PSIC'* 
@>    'SASA'* 
@>    'ANM_MSF-chain'* 
@>    'ANM_effectiveness-chain'* 
@>    'ANM_sensitivity-chain'* 
@>    'stiffness-chain'* 
@>    'entropy' 
@>    'ranked_MI' 
@>    'BLOSUM'* 
@>    (* auxiliary feature set)
@> Submitting query to PolyPhen-2...
@> Query to PolyPhen-2 started in 0.5s.
@> PolyPhen-2 is running...
@> Query to PolyPhen-2 completed in 19.6s.
@> PolyPhen-2's output parsed.
@> Sequence-conservation features have been retrieved from PolyPhen-2's output.
@> Mapping SAVs to PDB structures...
Mapping SAV 'O00238 31 R H' to PDB:   0%|          | 0/5 [00:00<?]
@> Pickle 'UniprotMap-O00238.pkl' recovered.
Mapping SAV 'O00294 496 A T' to PDB:   0%|          | 0/5 [00:00<?]
@> Pickle 'UniprotMap-O00238.pkl' saved.
@> Pickle 'UniprotMap-O00294.pkl' recovered.
Mapping SAV 'P01112 58 T R' to PDB:  40%|████      | 2/5 [00:00<00:00] @
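
The pair of `os.chdir` calls around the Rhapsody run leaves the session inside `local/` if the run raises an exception. One possible alternative is a small context manager that always restores the original directory. The name `working_directory` is an assumption made for this sketch and is not provided by Rhapsody:

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch to `path`, creating it if it does not exist."""
    previous = os.getcwd()
    os.makedirs(path, exist_ok=True)
    os.chdir(path)
    try:
        yield
    finally:
        # runs even if the body raises, so the session never stays in `path`
        os.chdir(previous)


# Usage, assuming `rd` and `test_SAVs` are defined as in the cells above:
# with working_directory('local'):
#     rh = rd.rhapsody(test_SAVs)
```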

All results and predictions have been saved to files in the `local` folder:

In [6]:
!ls local

pph2-completed.txt     rhapsody-log.txt
pph2-full.txt          rhapsody-log.txt.1
pph2-log.txt           rhapsody-pickle.pkl
pph2-short.txt         rhapsody-predictions-full_vs_reduced.txt
pph2-snps.txt          rhapsody-predictions.txt
pph2-started.txt       rhapsody-SAVs.txt
rhapsody-features.txt  rhapsody-Uniprot2PDB.txt


...and can also be accessed through the Rhapsody object:

In [7]:
rh.getPredictions()

array([('O00294 496 A T', 'known_neu', 0.10733333, 0.04343577, 'neutral', 0.351, 'neutral', -3.1479, 'neutral'),
       ('O00238 31 R H', 'new',        nan,        nan, '?', 0.219, 'neutral', -2.4718, 'neutral'),
       ('P01112 58 T R', 'new', 0.952     , 0.8950683 , 'deleterious', 1.   , 'deleterious', -9.7604, 'deleterious'),
       ('P01112 30 D E', 'new', 0.122     , 0.0495388 , 'neutral', 0.001, 'neutral',  0.2196, 'neutral'),
       ('P01112 170 K I', 'new', 0.43533334, 0.25916612, 'neutral', 0.   , 'neutral',     nan, '?')],
      dtype=[('SAV coords', '<U50'), ('training info', '<U12'), ('score', '<f4'), ('path. prob.', '<f4'), ('path. class', '<U12'), ('PolyPhen-2 score', '<f4'), ('PolyPhen-2 path. class', '<U12'), ('EVmutation score', '<f4'), ('EVmutation path. class', '<U12')])

In [8]:
rh.getPredictions(SAV='P01112 30 D E')['path. class']

'neutral'
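
The output of `getPredictions()` is a NumPy structured array, so it can be filtered by field name with ordinary boolean masks. The sketch below does not call Rhapsody: it rebuilds the array by hand from the values and `dtype` printed above (the variable name `preds` is an assumption of this example), then shows a few typical queries.

```python
import numpy as np

# Same fields as in the dtype printed by rh.getPredictions()
dtype = [('SAV coords', 'U50'), ('training info', 'U12'),
         ('score', 'f4'), ('path. prob.', 'f4'), ('path. class', 'U12'),
         ('PolyPhen-2 score', 'f4'), ('PolyPhen-2 path. class', 'U12'),
         ('EVmutation score', 'f4'), ('EVmutation path. class', 'U12')]

# Values copied from the output above (stand-in for rh.getPredictions())
nan = np.nan
preds = np.array([
    ('O00294 496 A T', 'known_neu', 0.10733333, 0.04343577, 'neutral',
     0.351, 'neutral', -3.1479, 'neutral'),
    ('O00238 31 R H', 'new', nan, nan, '?',
     0.219, 'neutral', -2.4718, 'neutral'),
    ('P01112 58 T R', 'new', 0.952, 0.8950683, 'deleterious',
     1.0, 'deleterious', -9.7604, 'deleterious'),
    ('P01112 30 D E', 'new', 0.122, 0.0495388, 'neutral',
     0.001, 'neutral', 0.2196, 'neutral'),
    ('P01112 170 K I', 'new', 0.43533334, 0.25916612, 'neutral',
     0.0, 'neutral', nan, '?'),
], dtype=dtype)

# SAVs for which no Rhapsody score is available (NaN score)
no_score = preds[np.isnan(preds['score'])]['SAV coords']

# SAVs classified as deleterious by Rhapsody
deleterious = preds[preds['path. class'] == 'deleterious']['SAV coords']

# Single-row lookup by SAV coordinates
row = preds[preds['SAV coords'] == 'P01112 30 D E'][0]

print(list(no_score), list(deleterious), row['path. class'])
```

In this test set, the only SAV without a Rhapsody score is the one annotated as having no PDB structure.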