# Prediction output data explorer

This notebook explores the output files written by a DeepFRI prediction run such as the following:
```shell
$ python predict.py --cmap ./examples/pdb_cmaps/1S3P-A.npz -ont mf --verbose

### Computing predictions on a single protein...
Protein GO-term/EC-number Score GO-term/EC-number name
query_prot GO:0005509 0.99999 calcium ion binding
### Saving predictions to *.json file...
```

The run writes these prediction output files:
```
DeepFRI_MF_pred_scores.json
DeepFRI_MF_predictions.csv
```

This notebook assumes both files exist in the current working directory.
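
Before running the cells below, it can help to confirm that both files are present. The following is a minimal sketch using only the standard library; the helper name `missing_outputs` is hypothetical, and the file names are the ones listed above.

```python
from pathlib import Path

# File names taken from the output listing above.
EXPECTED_OUTPUTS = ("DeepFRI_MF_pred_scores.json", "DeepFRI_MF_predictions.csv")


def missing_outputs(directory="."):
    """Return the expected output files that are not present in `directory`."""
    base = Path(directory)
    return [name for name in EXPECTED_OUTPUTS if not (base / name).is_file()]


missing = missing_outputs()
if missing:
    print(f"Missing prediction files: {missing}")
else:
    print("Both prediction files found.")
```

If anything is reported missing, rerun `predict.py` from this directory or pass the directory that holds the files to `missing_outputs`.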

## `predictions.csv`

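The cell below loads the full contents of `DeepFRI_MF_predictions.csv`. The file's first line is the comment `### Predictions made by DeepFRI.`, so pandas takes it as the header and the real column names appear as the first row.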

```python
import pandas as pd

predictions_csv_df = pd.read_csv('DeepFRI_MF_predictions.csv')
predictions_csv_df
```

| | | | ### Predictions made by DeepFRI. |
|---|---|---|---|
| Protein | GO_term/EC_number | Score | GO_term/EC_number name |
| query_prot | GO:0005509 | 0.99999 | calcium ion binding |
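
To get the real column names as the header, the leading comment line has to be skipped. The sketch below does that with the standard library's `csv` module on an in-memory copy of the contents shown above; the helper name `read_predictions` is hypothetical. With pandas, passing `skiprows=1` to `pd.read_csv` should have the same effect.

```python
import csv
import io

# In-memory copy of the file contents shown above; in practice, open
# 'DeepFRI_MF_predictions.csv' and pass the file handle instead.
sample_csv = """### Predictions made by DeepFRI.
Protein,GO_term/EC_number,Score,GO_term/EC_number name
query_prot,GO:0005509,0.99999,calcium ion binding
"""


def read_predictions(handle):
    """Parse prediction rows, skipping every line that starts with '#'."""
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = []
    for row in csv.DictReader(data_lines):
        row["Score"] = float(row["Score"])  # scores are stored as text in the CSV
        rows.append(row)
    return rows


predictions = read_predictions(io.StringIO(sample_csv))
print(predictions[0]["GO_term/EC_number"], predictions[0]["Score"])
```

Each row comes back as a dictionary keyed by the column names from the second line of the file.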


## `pred_scores.json`

```python
import json

import numpy as np
import pandas as pd

with open('DeepFRI_MF_pred_scores.json') as json_file:
    predictions_json = json.load(json_file)

print(f"Keys: {predictions_json.keys()}")

Y_hat = np.asarray(predictions_json['Y_hat'])
goterms = np.asarray(predictions_json['goterms'])
gonames = np.asarray(predictions_json['gonames'])

print(f"pdb_chains: {predictions_json['pdb_chains']}")
print(f"Y_hat shape: {Y_hat.shape}")
print(f"goterms shape: {goterms.shape}")
print(f"gonames shape: {gonames.shape}")

# This run has a single chain, so flattening Y_hat gives one score per GO term.
pred_scores_df = pd.DataFrame({
    'pdb_chain': predictions_json['pdb_chains'][0],  # Broadcast the single pdb_chain value to all rows.
    'goterms': goterms,
    'gonames': gonames,
    'Y_hat': Y_hat.flatten(),
})

# Sort by descending Y_hat (highest probability score first)
pred_scores_df = pred_scores_df.sort_values(by='Y_hat', ascending=False)
pred_scores_df
```

```
Keys: dict_keys(['pdb_chains', 'Y_hat', 'goterms', 'gonames'])
pdb_chains: ['query_prot']
Y_hat shape: (1, 942)
goterms shape: (942,)
gonames shape: (942,)
```


| | pdb_chain | goterms | gonames | Y_hat |
|---|---|---|---|---|
| 496 | query_prot | GO:0005509 | calcium ion binding | 9.999923e-01 |
| 51 | query_prot | GO:0046914 | transition metal ion binding | 6.063118e-03 |
| 646 | query_prot | GO:0003677 | DNA binding | 1.624436e-03 |
| 166 | query_prot | GO:0005506 | iron ion binding | 1.039369e-03 |
| 492 | query_prot | GO:0042802 | identical protein binding | 9.971423e-04 |
| ... | ... | ... | ... | ... |
| 457 | query_prot | GO:0004536 | deoxyribonuclease activity | 9.556396e-13 |
| 430 | query_prot | GO:0000987 | cis-regulatory region sequence-specific DNA bi... | 7.696111e-13 |
| 354 | query_prot | GO:0034212 | peptide N-acetyltransferase activity | 5.356901e-13 |
| 872 | query_prot | GO:0016796 | exonuclease activity, active with either ribo-... | 4.803297e-13 |
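
The printed shapes suggest that `Y_hat` holds one row per entry in `pdb_chains` and one column per GO term. Under that assumption, the sketch below flattens the JSON into per-chain records and keeps only scores at or above a cutoff. The sample data is a hypothetical miniature of the file: the keys match the real output, but the chain names, scores, and the `0.5` cutoff are illustrative, and `top_predictions` is not part of DeepFRI.

```python
import json

# Hypothetical miniature of the JSON layout: two chains, three GO terms.
# The keys match the real file; chain names and scores are illustrative only.
sample_json = json.dumps({
    "pdb_chains": ["chain_a", "chain_b"],
    "Y_hat": [[0.98, 0.02, 0.61], [0.05, 0.73, 0.10]],
    "goterms": ["GO:0005509", "GO:0003677", "GO:0046914"],
    "gonames": ["calcium ion binding", "DNA binding",
                "transition metal ion binding"],
})


def top_predictions(scores, cutoff=0.5):
    """Return (chain, term, name, score) tuples with score >= cutoff, best first."""
    records = []
    for chain, row in zip(scores["pdb_chains"], scores["Y_hat"]):
        # Assumes row[j] is the score for goterms[j], as the shapes suggest.
        for term, name, score in zip(scores["goterms"], scores["gonames"], row):
            if score >= cutoff:
                records.append((chain, term, name, score))
    return sorted(records, key=lambda record: record[3], reverse=True)


for record in top_predictions(json.loads(sample_json)):
    print(record)
```

To apply this to the real file, pass the `predictions_json` dictionary loaded earlier in place of the sample.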
