# A quick start tutorial

## A minimal example
We analyze a simulated data-set `example_data.rds` to show the basic workflow with `pfar`. 
### Data-set

```r
dat = readRDS('../test/example_data.rds')
names(dat)
names(dat$X)
rbind(dim(dat$X[[1]]), dim(dat$X[[2]]), dim(dat$X[[3]]))
dat$loglik
```

```
     [,1] [,2]
[1,]  100  300
[2,]  100  300
[3,]  100  300
```


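The simulated data-set consists of three blocks of 100 samples each, i.e., 300 samples in total, measured on 300 features. Because the data are simulated, the log-likelihood of the generating model is known: -253747.5 (the value of `dat$loglik`).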
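Later in this tutorial the blocks in `dat$X` are stacked by row with `do.call(rbind, ...)` before fitting. A minimal base-R sketch of that step, assuming three random placeholder matrices with the same 100 x 300 shape (they stand in for the real `dat$X` and carry no meaning), shows how the dimensions combine:

```r
# Placeholder data (assumption): three random 100 x 300 blocks standing in
# for dat$X; the real values live in example_data.rds.
set.seed(1)
blocks <- replicate(3, matrix(rnorm(100 * 300), nrow = 100, ncol = 300),
                    simplify = FALSE)

# Dimensions of each block, one row per block (like the rbind(dim(...)) call)
do.call(rbind, lapply(blocks, dim))

# Stack the blocks by row: 3 x 100 = 300 samples, still 300 features
X <- do.call(rbind, blocks)
dim(X)
```

`do.call(rbind, list_of_matrices)` is equivalent to calling `rbind(m1, m2, m3)` and works for any number of blocks.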

### PFA model fit
We run a basic PFA using the standard EM algorithm, with a uniform prior on the distribution of samples across edges and a random initialization of the factors.

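```r
set.seed(999)
# Number of factors: one more than the number of data blocks
K = length(dat$X) + 1
# Stack the data blocks by row into a single 300 x 300 matrix
X = do.call(rbind, dat$X)
control = list(logfile = 'example_data', verbose = FALSE, alpha = NULL)
res = pfar::pfa(X, K = K, control = control)
# Save the fitted result for later use
saveRDS(res, 'example_result.rds')
```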

```r
print(res)
```

```
$F_init
           [,1]       [,2]       [,3]      [,4]       [,5]       [,6]
[1,] -13.348998  -6.135172 -0.2410379 11.148072 -27.134831  0.5598765
[2,]  -7.863766 -16.793477 -0.8523309 -3.953813 -19.079478 -0.2074457
[3,]  21.553799  -2.795115  0.6323632  9.772271  -8.718292 -7.3564686
[4,] -12.099822  -2.057069  2.1463089  5.418662 -11.954877 -4.2176940
           [,7]        [,8]       [,9]     [,10]      [,11]     [,12]     [,13]
[1,]   5.735254 -1.25255505 -10.073224  41.28228 -37.117860 0.7823492  5.613040
[2,]   8.767105 -0.04930917   2.801243  94.40522 -14.885702 0.7714685  6.803914
[3,] -52.307295  6.76842230 -14.100849 -10.18895  34.490508 0.2554594 -9.832734
[4,]  17.368005  0.82637636   5.976363 -12.98749  -1.903003 0.1782108 -2.472395
          [,14]      [,15]     [,16]     [,17]      [,18]      [,19]      [,20]
[1,] -17.314087 -1.2006322 -1.093316 39.707172 -0.6484892 -18.204571   1.424071
[2,]   2.841044 -1.0222291 -4.175962  8.051926  1.2533393 -10.710817  -2.779842
...
```

The algorithm converged within 102 iterations, with a fitted log-likelihood of -254090.0, somewhat lower than the known log-likelihood of the generating model (-253747.5).

## Search for best initial values
The previous run used a random factor initialization. In some cases we may worry about convergence, i.e., the EM algorithm getting trapped in a local optimum. One remedy is to start the algorithm from several different initial factors and keep the initialization that gives the best log-likelihood.

Here we add `init = c(20, 5)` to the `control` list. This tries 20 random initial factors, runs 5 iterations from each, and picks the one with the highest log-likelihood:

```r
set.seed(999)
# init = c(20, 5): try 20 random initializations, 5 iterations each
control = list(logfile = 'example_data', verbose = FALSE, alpha = NULL, init = c(20, 5))
res = pfar::pfa(X, K = K, control = control)
res
```

```
$F_init
          [,1]      [,2]        [,3]      [,4]       [,5]      [,6]       [,7]
[1,] -1.779343 -1.648705 -0.54478104 -1.739496  4.9511441 -1.423721   5.394134
[2,] -2.577555 -2.095564  0.08257832 -0.923324 -2.3667268  2.767845   8.539143
[3,] -3.023605 -1.656210  0.23060192 -2.066017 -0.1884845  1.353863  -1.993595
[4,]  2.085944  1.964491  0.51729482  1.870755 -1.1152618 -1.365222 -10.348029
           [,8]        [,9]     [,10]     [,11]       [,12]      [,13]
[1,]  2.0389172 -0.10411703 -6.868646 -0.821511 -0.15643893 -3.4024710
[2,]  0.7140160  1.37308833  5.399136  4.339226  0.76845140 -4.7144721
[3,]  1.5283138  0.83875048 -2.753111 -3.420314  0.04768409 -3.9941015
[4,] -0.1491795 -0.04432678  5.649249 -4.515974  0.79939268  0.7593258
         [,14]      [,15]       [,16]      [,17]      [,18]      [,19]
[1,] -4.114082 -0.9480789 -2.62226713  0.8886139  0.8016255 -0.5326564
[2,]  5.432922 -1.1240489 -0.07893925 -5.3639914 -0.2433817  0.5195113
[3,]  9.153406 -0.0632237 ...
```

In this example we obtain an identical final result to the single random initialization, suggesting that convergence is probably not an issue for this model fit.
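To make the multi-start idea concrete, here is a toy sketch in base R. It is not how `pfar` implements `init`: as an assumption for illustration, a two-component 1-D Gaussian mixture with unit variances stands in for the PFA model, and `mix_loglik()` and `run_em()` are hypothetical helpers. The sketch runs 20 short EM runs of 5 iterations from random starting means, keeps the one with the best log-likelihood, and then continues EM from it.

```r
# Toy model (assumption, NOT PFA): two well-separated Gaussian clusters
set.seed(999)
x <- c(rnorm(150, mean = -2), rnorm(150, mean = 3))

# Hypothetical helper: log-likelihood of a 2-component, unit-variance mixture
mix_loglik <- function(x, mu, w) {
  sum(log(w * dnorm(x, mu[1]) + (1 - w) * dnorm(x, mu[2])))
}

# Hypothetical helper: run n_iter EM iterations from means mu and weight w
run_em <- function(x, mu, w = 0.5, n_iter = 5) {
  for (i in seq_len(n_iter)) {
    # E-step: responsibility of component 1 for each observation
    d1 <- w * dnorm(x, mu[1])
    d2 <- (1 - w) * dnorm(x, mu[2])
    r <- d1 / (d1 + d2)
    # M-step: update the mixing weight and the two component means
    w <- mean(r)
    mu <- c(sum(r * x) / sum(r), sum((1 - r) * x) / sum(1 - r))
  }
  list(mu = mu, w = w, loglik = mix_loglik(x, mu, w))
}

# Analogue of init = c(20, 5): 20 random starts (means drawn from the data),
# each run for only 5 EM iterations
short_runs <- lapply(1:20, function(i) run_em(x, mu = sample(x, 2), n_iter = 5))
best <- which.max(sapply(short_runs, function(run) run$loglik))

# Continue EM from the best short run for many more iterations
fit <- run_em(x, mu = short_runs[[best]]$mu, w = short_runs[[best]]$w, n_iter = 200)
round(sort(fit$mu), 2)
fit$loglik
```

Because each EM iteration never decreases the log-likelihood, continuing from the best short run can only improve on it; the cheap 5-iteration screen is what guards against committing early to a poor starting point.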