# Nucleic Transformer
The Nucleic Transformer models are deep learning models developed to study and understand DNA/RNA using publicly available datasets. You can check out [the paper on bioRxiv](https://www.biorxiv.org/content/10.1101/2021.01.28.428629v1)
and the [open-source code on GitHub](https://github.com/Shujun-He/Nucleic-Transformer). The model architecture is simple but effective, outperforming previous results in DNA promoter and virus classification; additionally,
we used it to place 7th in the [OpenVaccine challenge](https://www.kaggle.com/c/stanford-covid-vaccine).

This notebook shows how to use the RNA degradation prediction model and how to visualize the attention weights of the Nucleic Transformer.

In [None]:
!cp -r ../input/nucleictransformerrnainference/RNA_Inference .


In [None]:
import RNA_Inference
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load sequences and features to predict degradation properties
First, let's load the sequences you want to make predictions for. Here I just use the test data from OpenVaccine, but you can change it to anything you like.

In [None]:
df_path='../input/stanford-covid-vaccine/post_deadline_files/new_sequences.csv'
df=pd.read_csv(df_path)

In [None]:
sequences=df.sequence
structures=df.structure
loops=df.bpRNA_string
# Load the precomputed base-pairing probability matrix of each sequence
bpps=[]
for i in range(len(df)):
    bpps.append(np.load(f'../input/stanford-covid-vaccine/post_deadline_files/new_sequences_bpps/{df.id[i]}.npy'))


### Let us take a look at what kind of features we are dealing with

In [None]:
print(sequences[0])
print(structures[0])
print(loops[0])
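
The `structure` strings are written in dot-bracket notation: `.` marks an unpaired base and a matching `(` and `)` mark a base pair. As a rough illustration of how to read them, here is a minimal sketch that recovers the paired positions with a stack. The helper name `dot_bracket_pairs` and the toy string are made up for this example and are not part of `RNA_Inference`; the sketch also assumes the string contains only `.`, `(` and `)`.

```python
def dot_bracket_pairs(structure):
    """Return sorted (i, j) index pairs for matched parentheses in a dot-bracket string.

    Hypothetical helper for illustration only; assumes the characters '.', '(' and ')'.
    """
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for pos, char in enumerate(structure):
        if char == '(':
            stack.append(pos)
        elif char == ')':
            if not stack:
                raise ValueError(f'unmatched ")" at position {pos}')
            pairs.append((stack.pop(), pos))
        elif char != '.':
            raise ValueError(f'unexpected character {char!r} at position {pos}')
    if stack:
        raise ValueError(f'unmatched "(" at position {stack[-1]}')
    return sorted(pairs)


# Toy structure (not taken from the dataset): a small hairpin followed by two unpaired bases
toy_structure = '((..))..'
print(dot_bracket_pairs(toy_structure))
```

You could run the same helper on `structures[0]` to see which positions of the first sequence are paired.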

# Load models and make predictions

In [None]:
inference_tool=RNA_Inference.RNA_Inference()
inference_tool.load_models('RNA_Inference/best_weights')
outputs, attention_weights=inference_tool.predict(sequences,structures,loops,bpps)

In [None]:
results_df=pd.DataFrame(columns=
['idseqpos','reactivity','deg_Mg_pH10','deg_pH10','deg_Mg_50C','deg_50C'])

cat_outputs=np.concatenate(outputs,0)

idseqpos=[]

# Label each row as '<sequence id>_<position in sequence>'
for i in range(len(df)):
    for j in range(len(outputs[i])):
        idseqpos.append(f'{df.id[i]}_{j}')

results_df.idseqpos=idseqpos
results_df.iloc[:,1:]=cat_outputs
results_df.to_csv('predictions.csv',index=False)
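
The predictions are stored in long format, with one row per position of each sequence and a row label that combines the sequence id with the position index. A minimal sketch of that labeling on toy data follows; the helper name `make_id_seqpos` and the ids are hypothetical and only meant to show the expected layout.

```python
def make_id_seqpos(ids, lengths):
    """Build one '<id>_<position>' label per position of every sequence.

    Hypothetical helper for illustration; `lengths[k]` is the number of
    predicted positions for the sequence with id `ids[k]`.
    """
    return [f'{seq_id}_{pos}'
            for seq_id, n_positions in zip(ids, lengths)
            for pos in range(n_positions)]


# Toy example: two made-up sequence ids with 2 and 3 predicted positions
labels = make_id_seqpos(['id_a', 'id_b'], [2, 3])
print(labels)
```

The number of labels should equal the number of rows in the concatenated predictions, which is a quick sanity check before writing the CSV.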

In [None]:
results_df.head()