# RNA synthesis based on a given RNA family

Application scenario: we want to design new RNA sequences whose traits match those of a given RNA family.
To do this, we use EDeN to derive a notion of "importance" in existing sequences, compute sequence constraints from this importance, and then use antaRNA for RNA inverse folding under these constraints.
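
The constraint step can be pictured roughly as follows. This is a minimal sketch, not the code used by EDeN or antaRNA: the function name `importance_to_constraint`, the example scores, and the rule itself are assumptions. The rule keeps a nucleotide only if it lies in a run of at least `min_adjacent` positions scoring above `threshold`, and otherwise emits the wildcard `N`. It is chosen to mirror the `nt_importance_threshold` and `nmin_important_nt_adjaceny` parameters defined further down.

```python
def importance_to_constraint(sequence, importance, threshold=0.0, min_adjacent=1):
    """Hypothetical helper: turn per-nucleotide importance into a constraint string.

    Positions in sufficiently long runs of important nucleotides keep their
    letter; every other position becomes the wildcard 'N'.
    """
    if len(sequence) != len(importance):
        raise ValueError("sequence and importance must have the same length")

    # Assumption: a position is "important" when its score exceeds the threshold.
    keep = [score > threshold for score in importance]

    constraint = ['N'] * len(sequence)
    run_start = None
    # The trailing False is a sentinel that closes a run ending at the last position.
    for i, flag in enumerate(keep + [False]):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start >= min_adjacent:
                constraint[run_start:i] = sequence[run_start:i]
            run_start = None
    return ''.join(constraint)


# Made-up scores, for illustration only.
example_sequence = "GCGGAUUUAG"
example_scores = [0.5, 0.7, -0.1, 0.2, 0.3, 0.4, -0.2, -0.3, 0.9, -0.1]
example_constraint = importance_to_constraint(
    example_sequence, example_scores, threshold=0.0, min_adjacent=2)
print(example_constraint)
```

With `min_adjacent=2`, the isolated important position near the end is dropped, so only the two longer runs survive as fixed nucleotides in the constraint.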

In [15]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [16]:
import logging
from eden.util import configure_logging
configure_logging(logging.getLogger(), verbosity=2, filename='rna.log')

In [17]:
from evaluation.PerformanceEvaluation import experiment

Define experiment-specific parameters.

In [18]:
params = {'rfam_id': 'RF00005',
          'antaRNA_params': '../evaluation/antaRNA.ini',
          'nt_importance_threshold': 0,
          'nmin_important_nt_adjaceny': 1,
          'bp_importance_threshold': 0,
          'nmin_important_bp_adjaceny': 1,
          'nmin_unpaired_nt_adjacency': 1,
          'multi_sequence_size': 1,
          'filtering_threshold': 0,
          'batch_proportion': 1,
          'epoch_instances': 10,
          'experiment_runs': 10,
          'split_ratio': 0.2}

Run the experiment.

In [None]:
%%time
roc_t, roc_s, apr_t, apr_s = experiment(params)

Starting RNA Synthesis experiment for RF00005 ...
Starting new HTTP connection (1): rfam.xfam.org
"GET /family/RF00005/alignment?acc=RF00005&format=fastau&download=0 HTTP/1.1" 200 90476
Starting epoch 1:

Classifier:
SGDClassifier(alpha=6.82678658642e-05, average=True, class_weight='auto',
       epsilon=0.1, eta0=0.682073602043, fit_intercept=True, l1_ratio=0.15,
       learning_rate='constant', loss='hinge', n_iter=83, n_jobs=1,
       penalty='l2', power_t=0.130099752293, random_state=None,
       shuffle=True, verbose=0, warm_start=False)

Predictive performance:
            accuracy: 0.777 +- 0.046
           precision: 0.333 +- 0.471
              recall: 0.056 +- 0.079
                  f1: 0.095 +- 0.135
   average_precision: 0.775 +- 0.175
             roc_auc: 0.873 +- 0.088

Classifier:
SGDClassifier(alpha=0.000916785429376, average=True, class_weight='auto',
       epsilon=0.1, eta0=0.504818228496, fit_intercept=True, l1_ratio=0.15,
       learning_rate='optimal', loss='hin

Plot exponential-decay learning curves for the results.
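
To make the idea of an exponential-decay learning curve concrete, here is a minimal sketch that fits a saturating curve to a series of scores. It does not reproduce `pltk.xpDecay`, whose internals are not shown here. The model form `y = a - b * exp(-c * x)`, the function name `fit_exp_decay`, the search grid for `c`, and the synthetic data are all assumptions made for illustration.

```python
import math


def fit_exp_decay(xs, ys, rates=None):
    """Fit y = a - b * exp(-c * x) by scanning c and solving a, b in closed form.

    For a fixed rate c the model is linear in z = exp(-c * x), so a and b
    follow from ordinary least squares; the rate with the smallest squared
    error wins.
    """
    if rates is None:
        rates = [0.01 * k for k in range(1, 501)]  # hypothetical grid for c
    n = len(xs)
    y_mean = sum(ys) / n
    best = None
    for c in rates:
        zs = [math.exp(-c * x) for x in xs]
        z_mean = sum(zs) / n
        z_var = sum((z - z_mean) ** 2 for z in zs)
        if z_var == 0:
            continue  # degenerate: all z identical, slope undefined
        slope = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys)) / z_var
        a = y_mean - slope * z_mean   # intercept, the plateau of the curve
        b = -slope                    # model uses a - b * z
        sse = sum((a - b * z - y) ** 2 for z, y in zip(zs, ys))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    if best is None:
        raise ValueError("no usable rate in the search grid")
    return best[1], best[2], best[3]


# Synthetic learning curve: the score rises towards a plateau of 0.9.
epochs = list(range(1, 11))
scores = [0.9 - 0.4 * math.exp(-0.5 * x) for x in epochs]
a_fit, b_fit, c_fit = fit_exp_decay(epochs, scores)
print(a_fit, b_fit, c_fit)
```

In this form, `a` is the plateau the score approaches, `b` is the gap closed over training, and `c` controls how quickly the gap closes.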

In [None]:
from evaluation.PerformanceEvaluation import PlotKit as pltk

ROC learning curve comparison for True and Mixed samples:

In [None]:
pltk.xpDecay(iterable_t=roc_t, iterable_s=roc_s, measure='ROC')

APR learning curve comparison for True and Mixed samples:

In [None]:
pltk.xpDecay(iterable_t=apr_t, iterable_s=apr_s, measure='APR')
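
For reference, the sketch below shows how the two plotted measures can be computed from labels and scores. It assumes that ROC refers to the `roc_auc` value and APR to the `average_precision` value reported in the experiment output. The function names and the toy data are hypothetical, and the average-precision helper assumes distinct scores (ties are not handled specially).

```python
def roc_auc(labels, scores):
    """Area under the ROC curve as the fraction of correctly ordered
    (positive, negative) pairs; tied pairs count as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive,
    with examples ranked by decreasing score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one positive example")
    return sum(precisions) / len(precisions)


# Toy data, for illustration only.
toy_labels = [1, 0, 1, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_apr = average_precision(toy_labels, toy_scores)
print(toy_auc, toy_apr)
```

The pairwise formulation is quadratic in the number of examples, which is fine for a small illustration but not for large test sets.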