## Biotrainer Inference example

After training a model, you can use the out.yml file and an input sequence file to make predictions.

In [1]:
from biotrainer.utilities import read_config_file
from biotrainer.inference import Inferencer

In [2]:
out_config_path = '../residue_to_class/output/out.yml'
out_config = read_config_file(out_config_path)

Let's find out how well the model performs on the test set.

In [3]:
print(f"For the {out_config['model_choice']}, the metrics on the test set are:")
for metric in out_config['test_iterations_results']['metrics']:
    print(f"\t{metric} : {out_config['test_iterations_results']['metrics'][metric]}")

For the CNN, the metrics on the test set are:
	- f1_score class 0 : 0.0
	- f1_score class 1 : 0.0
	- f1_score class 2 : 0.0
	- f1_score class 3 : 0.0
	- f1_score class 4 : 0.0
	- precission class 0 : 0.0
	- precission class 1 : 0.0
	- precission class 2 : 0.0
	- precission class 3 : 0.0
	- precission class 4 : 0.0
	- recall class 0 : 0.0
	- recall class 1 : 0.0
	- recall class 2 : 0.0
	- recall class 3 : 0.0
	- recall class 4 : 0.0
	accuracy : 0.0
	loss : 1.623467206954956
	macro-f1_score : 0.0
	macro-precision : 0.0
	macro-recall : 0.0
	matthews-corr-coeff : -0.3000600337982178
	micro-f1_score : 0.0
	micro-precision : 0.0
	micro-recall : 0.0
	spearmans-corr-coeff : -0.14046210050582886
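
The per-class entries make the listing long. As a minimal sketch, assuming the per-class keys always start with "- " as they do in the output above, the aggregate metrics can be separated out for a quicker overview. The dictionary below is a hand-copied excerpt of the printed values, and `split_metrics` is a hypothetical helper, not part of biotrainer.

```python
# Hand-copied excerpt of the metrics printed above (not the full dictionary).
metrics = {
    "- f1_score class 0": 0.0,
    "- recall class 0": 0.0,
    "accuracy": 0.0,
    "loss": 1.623467206954956,
    "macro-f1_score": 0.0,
    "matthews-corr-coeff": -0.3000600337982178,
}


def split_metrics(metrics):
    """Separate per-class entries (assumed to start with '- ') from aggregate ones."""
    per_class = {k: v for k, v in metrics.items() if k.startswith("- ")}
    aggregate = {k: v for k, v in metrics.items() if not k.startswith("- ")}
    return aggregate, per_class


aggregate, per_class = split_metrics(metrics)
for name, value in aggregate.items():
    print(f"{name:<22}{value:.4f}")
```

In the notebook itself, the same helper could presumably be applied to `out_config['test_iterations_results']['metrics']` instead of the excerpt.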


First, we need to create the embeddings for the sequences we are interested in.

In [4]:
from biotrainer.embedders import OneHotEncodingEmbedder

In [5]:
embedder = OneHotEncodingEmbedder()

In [6]:
sequences = [
    "PROVTEIN",
    "SEQVENCESEQVENCE"
]

In [7]:
embeddings = list(embedder.embed_many(sequences))
# Note that for per-sequence embeddings, you would have to reduce the embeddings now:
# embeddings = [[embedder.reduce_per_protein(embedding)] for embedding in embeddings]
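
To illustrate the difference between per-residue and per-sequence embeddings, here is a minimal pure-Python sketch of one-hot encoding followed by mean pooling. The alphabet, the handling of unknown residues, and the use of a mean for the reduction are assumptions made for illustration; the actual `OneHotEncodingEmbedder` and `reduce_per_protein` may differ in all three.

```python
# Hypothetical alphabet: the 20 standard amino acids plus "X" for anything else.
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot(sequence):
    """Return one row per residue with a single 1.0 at the residue's index."""
    rows = []
    for residue in sequence:
        row = [0.0] * len(ALPHABET)
        # Residues outside the alphabet fall back to the "X" column.
        row[INDEX.get(residue, INDEX["X"])] = 1.0
        rows.append(row)
    return rows


def mean_pool(per_residue):
    """Average over the residue axis to get one fixed-size vector per sequence."""
    n = len(per_residue)
    return [sum(column) / n for column in zip(*per_residue)]


per_residue = one_hot("PROVTEIN")      # shape: sequence length x alphabet size
per_sequence = mean_pool(per_residue)  # shape: alphabet size
print(len(per_residue), len(per_residue[0]), len(per_sequence))
```

The per-residue form keeps one vector per position, which is what a residue_to_class model consumes; the pooled form has the same length for every sequence regardless of how long it is.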

Next, we create an Inferencer object from the output config of our training run.

In [8]:
inferencer = Inferencer(**out_config)

Got 1 split(s): hold_out




In [9]:
predictions = inferencer.from_embeddings(embeddings, split_name="hold_out")

We can inspect the predictions:

In [10]:
for sequence, prediction in zip(sequences, predictions["mapped_predictions"].values()):
    print(sequence)
    print(prediction)

PROVTEIN
FFFDFDFF
SEQVENCESEQVENCE
FFEFFFFFDEFFFFEF
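
For longer sequences, a per-class count is often easier to read than the raw label string. The following sketch uses only the standard library and the two predictions shown above, copied by hand into a plain dictionary rather than taken from `predictions["mapped_predictions"]`.

```python
from collections import Counter

# Sequences and predicted label strings copied from the output above.
results = {
    "PROVTEIN": "FFFDFDFF",
    "SEQVENCESEQVENCE": "FFEFFFFFDEFFFFEF",
}

for sequence, prediction in results.items():
    # A per-residue model is expected to emit one label per residue.
    assert len(sequence) == len(prediction)
    counts = Counter(prediction)
    summary = ", ".join(f"{label}: {n}" for label, n in counts.most_common())
    print(f"{sequence}: {summary}")
```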


**If your model uses dropout, you can also use inferencer.from_embeddings_with_monte_carlo_dropout to get predictions with Monte Carlo dropout, a method for quantifying the uncertainty of your model's predictions.**

In [11]:
predictions_mcd = inferencer.from_embeddings_with_monte_carlo_dropout(embeddings, n_forward_passes=30, confidence_level=0.05, split_name="hold_out")

In [12]:
# Show predictions for first sequence:
for idx, residue in enumerate(sequences[0]):
    print(f"Residue: {residue}, MCD Prediction: {predictions_mcd['0'][idx]}")
    # prediction: Class prediction based on the mean over 30 forward passes
    # mcd_mean: Average over 30 forward passes
    # mcd_lower_bound: Lower bound of confidence interval using normal distribution with the given confidence level
    # mcd_upper_bound: Upper bound of confidence interval using normal distribution with the given confidence level

Residue: P, MCD Prediction: {'prediction': 'F', 'mcd_mean': tensor([0.1805, 0.2024, 0.2090, 0.2164, 0.1918], device='cuda:0'), 'mcd_lower_bound': tensor([0.1795, 0.2012, 0.2075, 0.2155, 0.1906], device='cuda:0'), 'mcd_upper_bound': tensor([0.1814, 0.2037, 0.2104, 0.2173, 0.1929], device='cuda:0')}
Residue: R, MCD Prediction: {'prediction': 'F', 'mcd_mean': tensor([0.1854, 0.2050, 0.2000, 0.2168, 0.1927], device='cuda:0'), 'mcd_lower_bound': tensor([0.1841, 0.2032, 0.1985, 0.2156, 0.1916], device='cuda:0'), 'mcd_upper_bound': tensor([0.1868, 0.2068, 0.2016, 0.2181, 0.1939], device='cuda:0')}
Residue: O, MCD Prediction: {'prediction': 'F', 'mcd_mean': tensor([0.1977, 0.2039, 0.1943, 0.2077, 0.1964], device='cuda:0'), 'mcd_lower_bound': tensor([0.1967, 0.2023, 0.1930, 0.2057, 0.1951], device='cuda:0'), 'mcd_upper_bound': tensor([0.1988, 0.2054, 0.1955, 0.2097, 0.1978], device='cuda:0')}
Residue: V, MCD Prediction: {'prediction': 'D', 'mcd_mean': tensor([0.1929, 0.2117, 0.2044, 0.1997, 0.1
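
To make the four fields more concrete, the sketch below summarises a set of simulated forward passes for a single residue. It is not biotrainer's implementation: the simulated probabilities are made up, and computing the bounds as mean ± z · standard error under a normal approximation is an assumption about how the interval is built. `mcd_summary` is a hypothetical helper, and it returns the class index rather than a mapped label.

```python
import random
from statistics import NormalDist, mean, stdev


def mcd_summary(passes, confidence_level=0.05):
    """Summarise stochastic forward passes for one residue.

    passes: list of per-class probability lists, one list per forward pass.
    """
    n = len(passes)
    # Two-sided z-value of the standard normal for the given confidence level.
    z = NormalDist().inv_cdf(1 - confidence_level / 2)
    means, lower, upper = [], [], []
    for class_values in zip(*passes):
        m = mean(class_values)
        # Assumed interval: z times the standard error of the mean.
        margin = z * stdev(class_values) / n ** 0.5
        means.append(m)
        lower.append(m - margin)
        upper.append(m + margin)
    # The predicted class is the index with the highest mean probability.
    predicted_class = max(range(len(means)), key=means.__getitem__)
    return {"prediction": predicted_class, "mcd_mean": means,
            "mcd_lower_bound": lower, "mcd_upper_bound": upper}


# Simulated passes (made-up numbers): noisy copies of one probability vector.
rng = random.Random(0)
base = [0.18, 0.20, 0.21, 0.22, 0.19]
passes = [[p + rng.gauss(0, 0.005) for p in base] for _ in range(30)]
summary = mcd_summary(passes)
print(summary["prediction"], summary["mcd_mean"])
```

Narrow intervals around nearly uniform means, as in the output above, suggest the passes agree with each other even though no class is strongly preferred.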

**To compute error margins for your model's metrics, you can use the bootstrapping functionality. You must provide the corresponding targets for this. In this example, we will use some arbitrary values.**

In [14]:
targets = ["FDFDFDFE", "FFEFEEFFDEFFFFEF"]
bootstrapping_result = inferencer.from_embeddings_with_bootstrapping(embeddings, targets, split_name="hold_out", iterations=30, confidence_level=0.05, seed=42)
print(bootstrapping_result)

{'loss': {'mean': 1.5665744543075562, 'error': 0.000751411949750036}, 'accuracy': {'mean': 0.7902777791023254, 'error': 0.036831799894571304}, 'macro-precision': {'mean': 0.4960740804672241, 'error': 0.03868114948272705}, 'micro-precision': {'mean': 0.7902777791023254, 'error': 0.036831799894571304}, '- precission class 0': {'mean': 0.0, 'error': 0.0}, '- precission class 1': {'mean': 1.0, 'error': 0.0}, '- precission class 2': {'mean': 0.7333333492279053, 'error': 0.16094747185707092}, '- precission class 3': {'mean': 0.7470370531082153, 'error': 0.03337482735514641}, '- precission class 4': {'mean': 0.0, 'error': 0.0}, 'macro-recall': {'mean': 0.4265555441379547, 'error': 0.03321339190006256}, 'micro-recall': {'mean': 0.7902777791023254, 'error': 0.036831799894571304}, '- recall class 0': {'mean': 0.0, 'error': 0.0}, '- recall class 1': {'mean': 0.8027777671813965, 'error': 0.0485801137983799}, '- recall class 2': {'mean': 0.39666667580604553, 'error': 0.08839632570743561}, '- recall
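
The idea behind the `mean` and `error` entries can be sketched for a single metric. The example below resamples (prediction, target) pairs with replacement and reports the mean accuracy together with a normal-approximation error margin. Resampling at the residue level and the exact error formula are assumptions made for illustration and may not match what `from_embeddings_with_bootstrapping` does internally; `bootstrap_accuracy` is a hypothetical helper.

```python
import random
from statistics import NormalDist, mean, stdev


def bootstrap_accuracy(predictions, targets, iterations=30,
                       confidence_level=0.05, seed=42):
    """Estimate accuracy and an error margin by resampling with replacement."""
    pairs = list(zip(predictions, targets))
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    scores = []
    for _ in range(iterations):
        # Draw as many pairs as in the original data, with replacement.
        sample = [rng.choice(pairs) for _ in pairs]
        scores.append(sum(p == t for p, t in sample) / len(sample))
    z = NormalDist().inv_cdf(1 - confidence_level / 2)
    # Assumed error: z times the standard error over the bootstrap iterations.
    return {"mean": mean(scores), "error": z * stdev(scores) / iterations ** 0.5}


# Per-residue labels flattened from the predictions and arbitrary targets above.
predicted = "FFFDFDFF" + "FFEFFFFFDEFFFFEF"
target = "FDFDFDFE" + "FFEFEEFFDEFFFFEF"
result = bootstrap_accuracy(predicted, target)
print(result)
```

With only 24 residues the margin is dominated by sampling noise, so numbers like these are mainly useful for checking the workflow rather than for judging the model.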