## BPMF using posterior propagation


### Downloading the data files

In these examples we use the ChEMBL dataset of compound-protein activities (IC50). The IC50 values and the ECFP fingerprints of the compounds can be downloaded with this smurff function:

```python
import smurff

# Returns the IC50 training data, the IC50 test data and the ECFP fingerprints
ic50_train, ic50_test, ecfp = smurff.load_chembl()
```
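The IC50 data form a sparse matrix in which most compound/protein pairs have no measurement. The log that SMURFF prints further down reports the training data as 47424 values in a 15073 x 346 matrix (0.91%), with the rows presumably being compounds. The fill ratio is simply the number of observed values divided by rows × columns. The minimal sketch below shows that arithmetic on a made-up toy matrix, stored as COO-style triplets in plain NumPy. It does not assume anything about the exact type `load_chembl` returns.

```python
import numpy as np

# Made-up toy data: (row, col, value) triplets for a 4 x 3 matrix
rows = np.array([0, 0, 1, 2, 3])
cols = np.array([0, 2, 1, 1, 0])
vals = np.array([6.1, 5.4, 7.2, 6.8, 5.9])
shape = (4, 3)

# Dense view with NaN marking unobserved cells
dense = np.full(shape, np.nan)
dense[rows, cols] = vals

def fill_ratio(n_observed, shape):
    """Fraction of matrix cells that hold an observed value."""
    n_rows, n_cols = shape
    return n_observed / (n_rows * n_cols)

n_obs = np.count_nonzero(~np.isnan(dense))
print(f"toy: {fill_ratio(n_obs, shape):.2%}")                  # toy: 41.67%
# Same arithmetic with the sizes reported in the SMURFF log
print(f"ChEMBL train: {fill_ratio(47424, (15073, 346)):.2%}")  # ChEMBL train: 0.91%
print(f"ChEMBL test: {fill_ratio(11856, (15073, 346)):.2%}")   # ChEMBL test: 0.23%
```

The same numbers also show that the test set holds 11856 of the 59280 observed values, i.e. 20%.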

### Running SMURFF

Next, we create a BPMF training session and call `run`. The `run` function builds the model and
returns the `predictions` for the test data.

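```python
session = smurff.BPMFSession(
    Ytrain=ic50_train,
    Ytest=ic50_test,
    num_latent=16,       # number of latent dimensions
    burnin=40,           # burn-in iterations, not used for the averaged prediction
    nsamples=20,         # posterior samples collected after burn-in
    verbose=1,
    checkpoint_freq=1,   # checkpoint the state every second
    save_freq=1,         # save the model every iteration
)

predictions = session.run()
```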

```text
PythonSession {
  Data: {
    Type: ScarceMatrixData [with NAs]
    Component-wise mean: 6.35272
    Component-wise variance: 1.88772
    Noise: Fixed gaussian noise with precision: 5.00
    Size: 47424 [15073 x 346] (0.91%)
  }
  Model: {
    Num-latents: 16
  }
  Priors: {
    0: NormalPrior
    1: NormalPrior
  }
  Result: {
    Test data: 11856 [15073 x 346] (0.23%)
  }
  Config: {
      Iterations: 40 burnin + 20 samples
      Save model: every 1 iteration
      Checkpoint state: every 1 seconds
      Save prefix: /var/folders/d4/zbkxjlq94pq0523x6sd7v3mw0000gn/T/tmpeymfrr4g/
      Save extension: .ddm
  }
}

Initial:   0/0 RMSE: nan (1samp: nan) U: [ 0: 0.00,1: 0.00 ] took 0.0s
 Burnin:   1/40 RMSE: nan (1samp: 6.68) U: [ 0: 154.85,1: 115.05 ] took 0.1s
 Burnin:   2/40 RMSE: nan (1samp: 4.19) U: [ 0: 300.94,1: 124.48 ] took 0.1s
 Burnin:   3/40 RMSE: nan (1samp: 4.05) U: [ 0: 383.10,1: 122.45 ] took 0.1s
 Burnin:   4/40 RMSE: nan (1samp: 4.03) U: [ 0: 432.74,1: 118.89 ] took 0.1s
```
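The run is configured as 40 burn-in iterations followed by 20 sampling iterations. In BPMF the burn-in samples are discarded. The prediction for a test cell is the average, over the retained samples, of the inner product of the corresponding row and column latent vectors. The sketch below illustrates only that averaging logic and is not SMURFF's implementation. `fake_gibbs_step` is a hypothetical stand-in that draws random factors instead of running a real Gibbs sweep. The matrix sizes and test cells are made up, and any centering of the data is ignored.

```python
import numpy as np

rng = np.random.default_rng(0)

num_latent, burnin, nsamples = 16, 40, 20
n_rows, n_cols = 5, 4                   # toy sizes, not the ChEMBL dimensions
test_cells = [(0, 1), (2, 3), (4, 0)]   # hypothetical test coordinates

def fake_gibbs_step(rng):
    """Hypothetical stand-in for one Gibbs sweep: returns row and column factors."""
    U = rng.normal(size=(n_rows, num_latent))
    V = rng.normal(size=(n_cols, num_latent))
    return U, V

pred_sum = np.zeros(len(test_cells))
for it in range(burnin + nsamples):
    U, V = fake_gibbs_step(rng)
    if it < burnin:
        continue                        # burn-in samples are discarded
    # Prediction of a single sample: inner product of the two latent vectors
    pred_sum += np.array([U[i] @ V[j] for i, j in test_cells])

pred_avg = pred_sum / nsamples          # posterior-mean prediction per test cell
print(pred_avg.shape)                   # (3,)
```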


We can use the `calc_rmse` function to calculate the root-mean-square error (RMSE) of the predictions on the test data.

```python
rmse = smurff.calc_rmse(predictions)
rmse
```
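To make the metric explicit, here is a minimal sketch of an RMSE over posterior-mean predictions. It assumes that each prediction exposes the measured value as `val` and the averaged prediction as `pred_avg`, the two attributes used in the plotting cell. It also assumes that `calc_rmse` scores the averaged prediction. `Pred` is a hypothetical stand-in for SMURFF's prediction objects, and the values are made up.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for a SMURFF prediction record; only the two
# attributes used in this notebook (val, pred_avg) are modelled.
Pred = namedtuple("Pred", ["val", "pred_avg"])

def rmse(predictions):
    """Root-mean-square error between measured and posterior-mean values."""
    sq_err = [(p.pred_avg - p.val) ** 2 for p in predictions]
    return math.sqrt(sum(sq_err) / len(sq_err))

toy = [Pred(6.0, 5.5), Pred(7.0, 7.5), Pred(5.0, 5.0)]
print(round(rmse(toy), 4))  # 0.4082
```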

### Plotting predictions versus actual values
Besides the RMSE, we can also plot the predicted values against the measured values to see how well the model performs.

```python
%matplotlib notebook

import numpy
from matplotlib.pyplot import subplots, show

# Measured test values and the model's averaged predictions
y = numpy.array([p.val for p in predictions])
predicted = numpy.array([p.pred_avg for p in predictions])

fig, ax = subplots()
ax.scatter(y, predicted, edgecolors=(0, 0, 0))
# Dashed diagonal: points on this line are predicted exactly
ax.plot([y.min(), y.max()], [y.min(), y.max()], 'k--', lw=4)
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
show()
```
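The scatter plot can be complemented by a single number. One common choice, not part of the original workflow, is Pearson's correlation between measured and predicted values. The sketch below computes it with `numpy.corrcoef` on made-up arrays.

```python
import numpy as np

# Made-up measured and predicted values, for illustration only
y = np.array([5.0, 6.0, 7.0, 8.0])
predicted = np.array([5.2, 5.9, 7.1, 7.8])

# The off-diagonal entry of the 2 x 2 correlation matrix is Pearson's r
r = np.corrcoef(y, predicted)[0, 1]
print(f"Pearson r = {r:.3f}")  # Pearson r = 0.994
```

With the arrays `y` and `predicted` built in the plotting cell, the same call `numpy.corrcoef(y, predicted)[0, 1]` applies unchanged.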