# Getting Started

A simple example that demonstrates how to (a) load existing data or (b) generate your own dataset, followed by a learning task with four models.

### Load other modules

```python
import argparse
import pandas
import numpy.random as random
import sklearn.metrics
import time

# Silence pandas' SettingWithCopyWarning for chained assignments
pandas.options.mode.chained_assignment = None
```

### RML-specific modules

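The first three imports load or generate the datasets. The remaining three provide a single conditional model (`RelationalNaiveBayes`) and the collective inference (`VariationalInference`) and semi-supervised (`ExpectationMaximization`) methods that wrap it.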

```python
from rmllib.data.load import BostonMedians
from rmllib.data.generate import BayesSampleDataset
from rmllib.data.generate import edge_rejection_generator
from rmllib.models.conditional import RelationalNaiveBayes
from rmllib.models.collective_inference import VariationalInference
from rmllib.models.semi_supervised import ExpectationMaximization

# Seed numpy's global random number generator
random.seed(16)
```


### Two example datasets

One augments the Boston housing dataset with network connections; the other is a fully generated large network with 1M nodes (the output below reports an average degree of about 20).

```python
DATASETS = []

DATASETS.append(BostonMedians(name='Boston Medians', subfeatures=['RM', 'AGE'], sparse=True).node_sample_mask(.1))
DATASETS.append(BayesSampleDataset(name='Sparse 1,000,000', n_rows=1000000, n_features=3,
                                   generator=edge_rejection_generator, density=.00002,
                                   sparse=False).node_sample_mask(.01))
```

Average Degree: 19.998794
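The `density` argument is not documented here. As a rough, hypothetical reading, suppose it is the fraction of ordered node pairs that become edges; the expected average degree is then density × n_rows = 0.00002 × 1,000,000 = 20, which is consistent with the reported 19.998794. The sketch below is *not* rmllib's `edge_rejection_generator`: it is a hypothetical helper, `sample_edges_with_rejection`, that draws random directed edges and rejects self-loops and duplicates until that target count is reached, on a much smaller graph.

```python
import numpy as np


def sample_edges_with_rejection(n_nodes, density, rng):
    """Hypothetical sketch (not rmllib's generator): draw about
    density * n_nodes**2 directed edges uniformly at random,
    rejecting self-loops and duplicate pairs."""
    target = int(round(density * n_nodes * n_nodes))
    if target > n_nodes * (n_nodes - 1):
        raise ValueError("density too high for a simple directed graph")
    edges = set()
    while len(edges) < target:
        # Propose a batch of candidate (source, destination) pairs.
        src = rng.integers(0, n_nodes, size=target)
        dst = rng.integers(0, n_nodes, size=target)
        for s, d in zip(src.tolist(), dst.tolist()):
            if s == d:
                continue  # reject self-loops
            edges.add((s, d))  # the set silently rejects duplicates
            if len(edges) == target:
                break
    return edges


rng = np.random.default_rng(16)
edges = sample_edges_with_rejection(n_nodes=2000, density=0.002, rng=rng)
# Average (out-)degree = directed edges / nodes = 0.002 * 2000 = 4.0
print(len(edges) / 2000)
```

Under this assumed definition, the average degree scales as density × n_rows, which is why a tiny density still gives a well-connected graph at 1M nodes.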


### Compare several models

The following set of models will be compared. Note that the VI and EM modules are *wrapped* around an underlying model: for VI, this overrides the `predict_proba` method (of RNB), and for EM it effectively overrides the `.fit` method.

```python
MODELS = []

MODELS.append(RelationalNaiveBayes(name='NB', learn_method='iid', infer_method='iid', calibrate=False))
MODELS.append(RelationalNaiveBayes(name='RNB', learn_method='r_iid', infer_method='r_iid', calibrate=False))
MODELS.append(VariationalInference(RelationalNaiveBayes)(name='RNB_VI', learn_method='r_iid', calibrate=True))
MODELS.append(ExpectationMaximization(VariationalInference(RelationalNaiveBayes))(name='RNB_EM_VI', learn_iter=3, calibrate=True))
```
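One way to picture this wrapping is a class factory: a function takes a model class and returns a subclass with one method overridden. The sketch below illustrates that pattern with hypothetical toy classes (`ToyConditionalModel`, `ToyVariationalInference`, `ToyExpectationMaximization`). It is not necessarily how rmllib implements VI or EM; the "inference" and "EM" steps are placeholders, and the meaning given to `learn_iter` here is an assumption.

```python
class ToyConditionalModel:
    """Hypothetical stand-in for a conditional model such as RelationalNaiveBayes."""

    def __init__(self, name, **kwargs):
        self.name = name
        self.options = kwargs
        self.fit_calls = 0

    def fit(self, data):
        self.fit_calls += 1  # a real model would estimate parameters here
        return self

    def predict_proba(self, data):
        return [0.5 for _ in data]  # uninformative placeholder probabilities


def ToyVariationalInference(base_cls):
    """Return a subclass of base_cls with predict_proba overridden."""

    class Wrapped(base_cls):
        def predict_proba(self, data):
            # Placeholder for collective inference: a real version would
            # iterate, feeding neighbours' current estimates back in.
            base = super().predict_proba(data)
            return [min(1.0, p + 0.1) for p in base]

    Wrapped.__name__ = "VI_" + base_cls.__name__
    return Wrapped


def ToyExpectationMaximization(base_cls):
    """Return a subclass of base_cls with fit overridden."""

    class Wrapped(base_cls):
        def __init__(self, name, learn_iter=3, **kwargs):
            super().__init__(name, **kwargs)
            self.learn_iter = learn_iter

        def fit(self, data):
            super().fit(data)  # initial fit on the labelled data
            for _ in range(self.learn_iter):
                # E-step: soft labels from the (possibly wrapped) predict_proba
                self.soft_labels = self.predict_proba(data)
                # M-step: refit; this toy base class ignores soft_labels
                super().fit(data)
            return self

    Wrapped.__name__ = "EM_" + base_cls.__name__
    return Wrapped


model = ToyExpectationMaximization(ToyVariationalInference(ToyConditionalModel))(
    name="toy_EM_VI", learn_iter=3
)
model.fit([0, 1, 2])
print(type(model).__name__, model.fit_calls, model.soft_labels)
```

Because EM wraps the VI-wrapped class, the EM `fit` loop calls the VI `predict_proba` whenever it re-estimates soft labels, mirroring how `RNB_EM_VI` stacks both wrappers.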


### Do the actual evaluation

All of our datasets and models have been set up; now perform the evaluations.
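In the evaluation loop, predictions are compared only with the labels of nodes in `dataset.mask.Unlabeled`, so the label vector and the prediction column must be masked the same way (the trailing `[1]` on the labels appears to select the positive-class column). The sketch below uses hypothetical toy arrays and a hypothetical `auc_from_scores` helper to compute ROC AUC directly as the probability that a randomly chosen positive outscores a randomly chosen negative, which is the quantity `sklearn.metrics.roc_auc_score` reports for binary labels.

```python
import numpy as np


def auc_from_scores(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative),
    with ties counted as one half."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true]
    neg = scores[~y_true]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    # Compare every positive score with every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Hypothetical toy data: 1 = positive label; True in the mask = unlabelled node
labels = np.array([1, 0, 1, 1, 0, 0, 1, 0])
unlabeled = np.array([True, True, False, True, True, False, True, True])
# One predicted probability per unlabelled node, in mask order
scores_for_unlabeled = np.array([0.9, 0.3, 0.4, 0.6, 0.8, 0.2])

auc = auc_from_scores(labels[unlabeled], scores_for_unlabeled)
print(auc)  # 8 of the 9 positive/negative pairs are ordered correctly
```

Masking the labels with the same boolean mask that selected the predicted nodes keeps the two arrays aligned element by element; misaligned arrays would still produce a number, just a meaningless one.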

```python
print('Begin Evaluation')
for dataset in DATASETS:
    TRAIN_DATA = dataset.create_training()

    for model in MODELS:
        prefix = "(" + dataset.name + ") " + model.name
        print('\n' + prefix + ": Begin Train")
        # Fit on a fresh copy so every model starts from the same training data
        train_data = TRAIN_DATA.copy()
        start_time = time.time()
        model.fit(train_data)
        print(prefix, 'Training Time:', time.time() - start_time)
        model.predictions = model.predict_proba(train_data)
        print(prefix, 'Total Time:', time.time() - start_time)
        # AUC against the true labels of the unlabeled nodes only
        true_labels = dataset.labels.Y[dataset.mask.Unlabeled][1]
        auc = sklearn.metrics.roc_auc_score(true_labels, model.predictions[:, 1])
        print(prefix, 'Average Prediction:', model.predictions[:, 1].mean(), 'AUC:', auc)
        print(prefix + ": End Train")

print('End Evaluation')
```

Begin Evaluation

(Boston Medians) NB: Begin Train
(Boston Medians) NB Training Time: 0.012000560760498047
(Boston Medians) NB Total Time: 0.013000726699829102
(Boston Medians) NB Average Prediction: 0.42285631281812075 AUC: 0.8292069557080475
(Boston Medians) NB: End Train

(Boston Medians) RNB: Begin Train
(Boston Medians) RNB Training Time: 0.028999805450439453
(Boston Medians) RNB Total Time: 0.037995338439941406
(Boston Medians) RNB Average Prediction: 0.4211773768087786 AUC: 0.8507096069868997
(Boston Medians) RNB: End Train

(Boston Medians) RNB_VI: Begin Train
(Boston Medians) RNB_VI Training Time: 0.025997161865234375
(Boston Medians) RNB_VI Total Time: 0.1549978256225586
(Boston Medians) RNB_VI Average Prediction: 0.5967295120169761 AUC: 0.8853321896444167
(Boston Medians) RNB_VI: End Train

(Boston Medians) RNB_EM_VI: Begin Train
(Boston Medians) RNB_EM_VI Training Time: 0.4390103816986084
(Boston Medians) RNB_EM_VI Total Time: 0.5580120086669922
(Boston Medians) RNB_EM_VI A