## Benchmark structure-prediction algorithms across test sets

One violin plot showing the three test sets (PDB, lncRNA, viral_fragments) for each algorithm (RNAstructure, EternaFold, MXFold2, UFold).

Group by algorithm.
Color by test set.

**Assigned to**: Alberic

Use Plotly with a white background.

In [18]:
import pandas as pd

results = pd.read_feather('results_benchmark_algos.feather').set_index('reference')
results.loc[results['dataset']=='viral_fragments', 'dataset'] = 'viral mRNA'
results.loc[results['dataset']=='lncRNA', 'dataset'] = 'long ncRNA'
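
The plotting cell further down labels each test set with the number of rows of a single model (RNAstructure), which is only representative if every model was evaluated on the same number of sequences per test set. A minimal sketch of how that could be checked is given here; the `toy` DataFrame and its values are made up for illustration, and only the column names (`dataset`, `model`, `F1`) are taken from the notebook.

```python
import pandas as pd

# Hypothetical toy data with the same columns the notebook relies on;
# the values are invented and are not benchmark results.
toy = pd.DataFrame({
    "dataset": ["PDB", "PDB", "PDB", "PDB", "long ncRNA", "long ncRNA"],
    "model": ["RNAstructure", "RNAstructure", "UFold", "UFold",
              "RNAstructure", "UFold"],
    "F1": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
})

# Number of rows per (dataset, model): one row per dataset, one column per model.
counts = toy.groupby(["dataset", "model"]).size().unstack(fill_value=0)

# A dataset is consistent when all models have the same number of rows.
consistent = counts.nunique(axis=1).eq(1)

print(counts)
print(consistent)
```

Replacing `toy` with `results` would run the same check on the real table; any test set flagged `False` would mean the count shown in the legend does not hold for every model.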

In [32]:
# Create a grouped violin plot with Plotly
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px
fig = make_subplots(rows=1, cols=1, shared_xaxes=True)

colors = px.colors.qualitative.Set2
for i, dataset in enumerate(results['dataset'].unique()):
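    results_dataset = results[results['dataset']==dataset]
    # Legend label: dataset name plus its number of sequences, counted on the
    # RNAstructure rows (assumes every model was run on the same sequences).
    n_sequences = (results_dataset['model'] == 'RNAstructure').sum()
    fig.add_trace(go.Violin(x=results_dataset['model'], y=results_dataset['F1'],
                            name=f'{dataset} ({n_sequences})', marker_color=colors[i],
                            meanline_visible=True, points=False))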
    
fig.update_layout(
    # title='F1 score distribution for each model and dataset',
    yaxis_title='F1 score', xaxis_title='Model',
    violinmode='group', yaxis_range=[0, 1],
    width=1000, height=380,
    template='plotly_white', font_size=15, font_color='black',
)
# Fix the left-to-right order of the models on the x-axis
fig.update_xaxes(categoryorder='array', categoryarray=['RNAstructure', 'EternaFold', 'MXFold2', 'UFold'])
fig.show()

In [28]:
# Save the figure as a PDF (Plotly's static image export relies on the kaleido package)
fig.write_image("images/a_algo_benchmark.pdf")
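
The export above writes into `images/`, which is expected to fail if that directory does not exist yet. A small sketch of a wrapper that creates the directory first is shown below. `save_figure` is a hypothetical helper, not part of Plotly; it only assumes that the object passed in has a `write_image` method, as Plotly figures do.

```python
from pathlib import Path


def save_figure(fig, path):
    """Create the parent directory if needed, then call fig.write_image.

    Hypothetical helper: `fig` can be any object exposing write_image(path),
    such as a plotly.graph_objects.Figure.
    """
    out = Path(path)
    # Create intermediate directories; do nothing if they already exist.
    out.parent.mkdir(parents=True, exist_ok=True)
    fig.write_image(str(out))
    return out
```

With the figure built earlier, this could be called as `save_figure(fig, "images/a_algo_benchmark.pdf")`.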