## Inference with SMURFF

In this notebook we continue the first example. After running a training session in SMURFF again, we look more closely at how to use SMURFF to make predictions.

### Saving models

We run a `Macau` training session using side information (`ecfp`) from the ChEMBL dataset.
We save every sample (`save_freq = 1`) so that we can load the model afterwards. This run takes a few minutes.

In [None]:
import smurff

ic50_train, ic50_test, ecfp = smurff.load_chembl()

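session = smurff.MacauSession(
    Ytrain      = ic50_train,
    Ytest       = ic50_test,
    side_info   = [ecfp, None],      # side information for the rows only
    num_latent  = 16,
    burnin      = 200,               # burn-in iterations before sampling
    nsamples    = 10,                # samples collected after burn-in
    save_freq   = 1,                 # save a model every save_freq samples
    save_prefix = "ic50-macau/save",
    verbose     = 1)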

predictions = session.run()

### Saved files

The saved files are indexed in a root ini-file, whose name is derived from `save_prefix`. The listings below come from a run whose files are prefixed with `ic50-save`, so its root ini-file is `ic50-save-root.ini`.
This file lists everything saved during the training run. For example:

```ini
options = ic50-save-options.ini
sample_step_10 = ic50-save-sample-10-step.ini
sample_step_20 = ic50-save-sample-20-step.ini
sample_step_30 = ic50-save-sample-30-step.ini
sample_step_40 = ic50-save-sample-40-step.ini
```
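
Because the root file is a flat list of `key = value` pairs, it can be inspected with a few lines of plain Python. The sketch below is not part of the SMURFF API: `read_flat_ini` and `sample_steps` are hypothetical helper names, and the code assumes the file has no section headers, as in the listing above.

```python
import re

# Example content copied from the root ini-file shown above.
ROOT_INI = """\
options = ic50-save-options.ini
sample_step_10 = ic50-save-sample-10-step.ini
sample_step_20 = ic50-save-sample-20-step.ini
sample_step_30 = ic50-save-sample-30-step.ini
sample_step_40 = ic50-save-sample-40-step.ini
"""

def read_flat_ini(text):
    """Parse section-less 'key = value' lines into a dict (hypothetical helper)."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries

def sample_steps(entries):
    """Return (step number, step file) pairs sorted by step number."""
    steps = []
    for key, value in entries.items():
        match = re.fullmatch(r"sample_step_(\d+)", key)
        if match:
            steps.append((int(match.group(1)), value))
    return sorted(steps)

entries = read_flat_ini(ROOT_INI)
steps = sample_steps(entries)
for step, filename in steps:
    print(step, filename)
```

To read a real run, pass the text of the root ini-file instead of the `ROOT_INI` string.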

Each step ini-file lists the files saved in that step (latent matrices, predictions and priors). For example, for sample 50:

```ini
#models
num_models = 2
model_0 = ic50-save-sample-50-U0-latents.ddm
model_1 = ic50-save-sample-50-U1-latents.ddm
#predictions
pred = ic50-save-sample-50-predictions.csv
pred_state = ic50-save-sample-50-predictions-state.ini
#priors
num_priors = 2
prior_0 = ic50-save-sample-50-F0-link.ddm
prior_1 = ic50-save-sample-50-F1-link.ddm
```

### Making predictions from a `TrainSession`

The easiest way to make predictions is from an existing `TrainSession`:

In [None]:
predictor = session.makePredictSession()
print(predictor)


Once we have a `PredictSession`, there are several ways to make predictions:

 * From a sparse matrix
 * For all possible elements in the matrix (the complete $U \times V$)
 * For a single point in the matrix
 * Using only side-information
 
#### Predict all elements

We can make predictions for every combination of rows $\times$ columns in our matrix:

In [None]:
p = predictor.predict_all()
print(p.shape) # p is a numpy array of size: (num samples) x (num rows) x (num columns)
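
Since the first axis of `p` indexes the saved samples, averaging over it gives one prediction per cell, and the standard deviation over the same axis gives a rough measure of uncertainty. A minimal sketch, using a small random array as a stand-in for the output of `predict_all()` (the sizes and values are made up for illustration):

```python
import numpy as np

# Stand-in for predictor.predict_all():
# (num samples) x (num rows) x (num columns)
rng = np.random.default_rng(0)
p = rng.normal(loc=6.0, scale=1.0, size=(10, 5, 3))

pred_mean = p.mean(axis=0)  # average prediction per cell
pred_std = p.std(axis=0)    # spread of the samples per cell

print(pred_mean.shape, pred_std.shape)
```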

#### Predict elements in a sparse matrix

We can make predictions for the cells of a sparse matrix, for example our `ic50_test` matrix:

In [None]:
p = predictor.predict_some(ic50_test)
print(len(p), "predictions")  # p is a list of prediction objects
print("first prediction:", p[0])
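
A list of predictions like this can be summarised with a root-mean-square error. The sketch below assumes only the two attributes used elsewhere in this notebook, `val` (the actual value) and `pred_all` (the per-sample predictions). `FakePrediction` is a hypothetical stand-in so that the example runs without SMURFF, and the numbers are made up.

```python
import math
from collections import namedtuple
from statistics import fmean

# Hypothetical stand-in for SMURFF's prediction objects; only the two
# attributes used elsewhere in this notebook (val, pred_all) are modelled.
FakePrediction = namedtuple("FakePrediction", ["val", "pred_all"])

predictions = [
    FakePrediction(val=5.0, pred_all=[4.0, 5.0, 6.0]),
    FakePrediction(val=7.0, pred_all=[8.0, 9.0, 10.0]),
]

def rmse(preds):
    """Root-mean-square error of the sample-averaged predictions."""
    squared_errors = [(fmean(q.pred_all) - q.val) ** 2 for q in preds]
    return math.sqrt(fmean(squared_errors))

print("RMSE:", rmse(predictions))
```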

#### Predict just one element

We can also predict a single element. Let's predict the first element of our `ic50_test` matrix:

In [None]:
from scipy.sparse import find
(i,j,v) = find(ic50_test)
p = predictor.predict_one((i[0],j[0]),v[0])
print(p)

Next, we plot a histogram of the sampled predictions for this element:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

# Plot a histogram of the samples.
plt.subplot(111)
plt.hist(p.pred_all, bins=10, density=True, label="histogram of predictions")
plt.plot(p.val, 1., 'ro', markersize=5, label='actual value')
plt.legend()
plt.title('Histogram of ' + str(len(p.pred_all)) + ' predictions')
plt.show()
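
The same samples can be summarised numerically. The sketch below computes the mean and standard deviation of the sampled predictions and checks whether the actual value falls within two standard deviations of the mean. The numbers are made up and stand in for `p.pred_all` and `p.val`; with only a handful of samples, such an interval is a rough indication at best.

```python
from statistics import fmean, stdev

# Made-up sample predictions and actual value for one matrix element,
# standing in for p.pred_all and p.val.
pred_all = [5.8, 6.1, 6.4, 5.9, 6.3, 6.0, 6.2, 6.5, 5.7, 6.1]
val = 6.3

mean = fmean(pred_all)
spread = stdev(pred_all)  # sample standard deviation
low, high = mean - 2 * spread, mean + 2 * spread

print(f"mean = {mean:.2f}, std = {spread:.2f}")
print(f"actual value inside mean +/- 2 std: {low <= val <= high}")
```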

#### Make predictions using side information

We can make predictions for rows/columns not in our train matrix, using only side info:

In [None]:
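import numpy as np
from scipy.sparse import find

(i,j,v) = find(ic50_test)

# side information (ecfp features) of the first test row
row_side_info = ecfp.tocsr().getrow(i[0])

# for illustration: project the side information onto the latent space
# using the link matrix (beta) of the first saved sample
beta = predictor.samples[0].betas[0]
uhat = row_side_info.dot(beta.transpose())
print("side = ", row_side_info.shape)
print("uhat = ", uhat.shape)
uhat_squeezed = np.squeeze(uhat)
print("uhat squeezed = ", uhat_squeezed.shape)

# pass the side information in place of the row index
p = predictor.predict_one((row_side_info,j[0]),v[0])
print(p)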
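
The projection step in the cell above can be illustrated with small dense matrices. This is a simplified sketch with made-up numbers, not SMURFF's implementation: it assumes `beta` is laid out as (num latent) x (num features), matching the transpose used above, assumes the column latents `V` are (num latent) x (num columns), and ignores any mean or offset terms SMURFF may apply.

```python
import numpy as np

num_latent, num_features, num_cols = 2, 3, 4

# Made-up link matrix, laid out (num_latent x num_features).
beta = np.array([[1.0, 0.0, 2.0],
                 [0.0, 1.0, 1.0]])
# Made-up column latents, assumed to be (num_latent x num_cols).
V = np.arange(num_latent * num_cols, dtype=float).reshape(num_latent, num_cols)

x = np.array([[1.0, 1.0, 0.0]])  # side information for one unseen row
uhat = x.dot(beta.T)             # estimated latent vector, shape (1, num_latent)
pred = np.squeeze(uhat).dot(V)   # one prediction per column

print("uhat =", uhat)
print("pred =", pred)
```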

### Accessing the saved model itself

The latent matrices for all samples are stored in the `PredictSession` as `numpy` arrays:

In [None]:
# print the shapes of the latent matrices for each sample
for i,s in enumerate(predictor.samples):
    print("sample", i, ":", [ (m, u.shape) for m,u in enumerate(s.latents) ])
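
With the latent matrices at hand, a prediction for a single cell can be rebuilt as the inner product of a row latent vector and a column latent vector, once per sample. The sketch below uses random data and a hypothetical `FakeSample` stand-in. The (num latent) x (num rows or columns) layout is an assumption to check against the shapes printed above, and any mean or offset terms are ignored.

```python
import numpy as np
from collections import namedtuple

# Hypothetical stand-in for a saved sample; only `latents` is modelled.
FakeSample = namedtuple("FakeSample", ["latents"])

rng = np.random.default_rng(42)
num_latent, num_rows, num_cols = 4, 6, 5
samples = [
    FakeSample(latents=[rng.normal(size=(num_latent, num_rows)),
                        rng.normal(size=(num_latent, num_cols))])
    for _ in range(3)
]

i, j = 2, 1  # element to predict
# One inner product per sample: row latent i times column latent j.
per_sample = np.array(
    [s.latents[0][:, i].dot(s.latents[1][:, j]) for s in samples]
)

print("per-sample predictions:", per_sample)
print("average prediction:", per_sample.mean())
```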

### Making predictions from a saved run

One can also create a `PredictSession` from the root ini-file of a saved run:

In [None]:
import smurff

predictor = smurff.PredictSession.fromRootFile("ic50-macau-save-root.ini")
print(predictor)