# RNA synthesis of an ncRNA family

We want to design new RNA sequences whose traits comply with a given RNA family.
To do this, we use EDeN to derive a notion of "importance" for the existing
sequences, calculate sequence constraints from this importance, and then use antaRNA to perform RNA inverse folding under these constraints.
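
The step from importance to sequence constraints can be illustrated with a minimal sketch. It is not the code used by this notebook. It assumes that importance is available as one score per nucleotide, that a "connected component" can be approximated by a contiguous run of positions above the threshold, and that unconstrained positions are written as the IUPAC wildcard `N`. The function name `sequence_constraint` and the example scores are hypothetical; the two keyword arguments mirror `importance_threshold_sequence_constraint` and `min_size_connected_component_sequence_constraint` from the parameters below.

```python
def sequence_constraint(sequence, importance, threshold=0.0, min_size=1):
    """Keep nucleotides in important runs; write 'N' everywhere else.

    Hypothetical helper: a run is a maximal stretch of consecutive
    positions whose importance is strictly above `threshold`. Runs
    shorter than `min_size` are dropped.
    """
    if len(sequence) != len(importance):
        raise ValueError('sequence and importance must have equal length')

    keep = [score > threshold for score in importance]
    constraint = ['N'] * len(sequence)

    start = None
    # The trailing False is a sentinel that closes a run ending at the
    # last position of the sequence.
    for pos, flag in enumerate(keep + [False]):
        if flag and start is None:
            start = pos
        elif not flag and start is not None:
            if pos - start >= min_size:
                constraint[start:pos] = sequence[start:pos]
            start = None
    return ''.join(constraint)


# Made-up importance scores for illustration only.
seq = 'GCGGAUUUAGC'
imp = [0.9, 0.8, -0.1, 0.2, -0.3, -0.5, 0.7, 0.6, 0.5, -0.2, 0.1]
print(sequence_constraint(seq, imp, threshold=0, min_size=2))
```

With `min_size=1`, every position above the threshold is kept, which corresponds to the permissive setting chosen in the parameters of this experiment.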

In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

In [2]:
import logging
from eden.util import configure_logging
configure_logging(logging.getLogger(), verbosity=2, filename='rna.log')

*Define experimental parameters*

In [3]:
import numpy as np
data_fractions = list(np.linspace(0.01, 0.05, 10))
print(data_fractions)

params = {'rfam_id': 'RF00005',
          'antaRNA_params': '../evaluation/antaRNA.ini',
          'importance_threshold_sequence_constraint': 0,
          'min_size_connected_component_sequence_constraint': 1,
          'importance_threshold_structure_constraint': 0,
          'min_size_connected_component_structure_constraint': 1,
          'min_size_connected_component_unpaired_structure_constraint': 1,
          'n_synthesized_sequences_per_seed_sequence': 1,
          'instance_score_threshold': 0,
          'data_fractions': data_fractions,
          'n_experiment_repetitions': 5,
          'train_to_test_split_ratio': 0.2,
          'vectorizer_complexity': 2,
          'negative_shuffle_ratio': 2}

[0.01, 0.014444444444444444, 0.018888888888888889, 0.023333333333333331, 0.027777777777777776, 0.032222222222222222, 0.036666666666666667, 0.041111111111111112, 0.045555555555555557, 0.050000000000000003]


*Run the experiment*

In [None]:
%%time
from evaluation.PerformanceEvaluation import compute_learning_curves
roc_t, roc_s, apr_t, apr_s, data_fractions = compute_learning_curves(params)

Starting RNA Synthesis experiment for RF00005 ...
Starting new HTTP connection (1): rfam.xfam.org
"GET /family/RF00005/alignment?acc=RF00005&format=fastau&download=0 HTTP/1.1" 200 90476
Training on data chunk 0/10 (data fraction: 0.010)
--------------------------------------------------------------------------------
run 1/5
Fit estimator on original data
Positive data: Instances: 1 ; Features: 1048577 with an avg of 572 features per instance
Negative data: Instances: 3 ; Features: 1048577 with an avg of 581 features per instance
Elapsed time: 0.4 secs
Evaluate estimator:


*Plot the computed learning curves*

In [None]:
from evaluation.draw_utils import draw_learning_curve
draw_learning_curve(data_A=roc_t, data_B=roc_s, x=data_fractions, measure='ROC')
draw_learning_curve(data_A=apr_t, data_B=apr_s, x=data_fractions, measure='APR')
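
Because each data fraction is evaluated over several repetitions (`n_experiment_repetitions`), a learning curve is usually read as a mean with a spread per data fraction. The sketch below shows one way to compute such a summary with NumPy. It assumes the scores are laid out as one row per data fraction and one column per repetition; the actual layout of `roc_t`, `roc_s`, `apr_t` and `apr_s` is not shown in this notebook, so check it before reusing this. The helper name `summarize_curve` and the numbers are hypothetical and are not results of this experiment.

```python
import numpy as np


def summarize_curve(scores):
    """Return (mean, std) per data fraction.

    Assumes `scores` is a rectangular nested list or array with one row
    per data fraction and one column per repetition.
    """
    scores = np.asarray(scores, dtype=float)
    return scores.mean(axis=1), scores.std(axis=1)


# Made-up scores: 2 data fractions, 2 repetitions each.
example_scores = [[0.5, 0.7],
                  [0.8, 0.9]]
mean, std = summarize_curve(example_scores)
print(mean)
print(std)
```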