## Incorporating Neural Networks

author: Jacob Schreiber <br>
contact: jmschreiber91@gmail.com

Neural networks have become exceedingly popular recently due, in part, to their ability to achieve state-of-the-art performance on a variety of tasks without requiring complicated feature extraction pipelines. These models are frequently applied to domains where there is a great deal of raw structured data, such as computer vision, where neighboring pixels are strongly correlated, and natural language processing, where words are organized and modified in specific ways to convey meaning.

There is some overlap between neural networks and probabilistic models. For example, in a deep hidden Markov model (DHMM), a neural network takes an observation, such as an image, as input and outputs a probability for each state of the hidden Markov model that the observation could belong to. These probabilities are then treated as the likelihood function P(D|M) by the model, regularized using the transition matrix, and re-normalized to get the posterior probabilities. Another example is a deep mixture model, where expectation-maximization is used to train the model on unlabeled images.
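To make the DHMM description concrete, here is a minimal NumPy sketch of the forward-backward algorithm in which per-step class probabilities stand in for the likelihood. The function name `posterior_from_network` and the toy `emissions`, `transitions`, and `start` arrays are made up for illustration. This is not pomegranate's implementation, only a sketch of the idea under the assumption of a fully connected two-state model.

```python
import numpy as np

def posterior_from_network(emissions, transitions, start):
    """Forward-backward smoothing with classifier outputs as emissions.

    emissions:   (T, K) per-step class probabilities (e.g. softmax outputs),
                 used here in place of the likelihood P(D|M).
    transitions: (K, K) row-stochastic transition matrix.
    start:       (K,) initial state distribution.
    Returns a (T, K) array of posterior state probabilities.
    """
    T, K = emissions.shape
    forward = np.zeros((T, K))
    backward = np.ones((T, K))

    # Forward pass, rescaled at every step to avoid underflow.
    forward[0] = start * emissions[0]
    forward[0] /= forward[0].sum()
    for t in range(1, T):
        forward[t] = forward[t - 1].dot(transitions) * emissions[t]
        forward[t] /= forward[t].sum()

    # Backward pass, rescaled the same way.
    for t in range(T - 2, -1, -1):
        backward[t] = transitions.dot(emissions[t + 1] * backward[t + 1])
        backward[t] /= backward[t].sum()

    # Combine both passes and re-normalize each time step.
    posterior = forward * backward
    return posterior / posterior.sum(axis=1, keepdims=True)

# Hypothetical network outputs for a sequence of four observations.
emissions = np.array([[0.60, 0.40],
                      [0.55, 0.45],
                      [0.20, 0.80],
                      [0.60, 0.40]])
transitions = np.array([[0.9, 0.1],
                        [0.7, 0.3]])
start = np.array([0.9, 0.1])

posterior = posterior_from_network(emissions, transitions, start)
print(posterior.round(3))
```

The per-step rescaling only changes each row by a constant factor, so it cancels in the final normalization.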

Thus far, pomegranate has stuck to probabilistic models that are not coupled with a neural network. However, with the recent inclusion of custom distributions, a quick hack can turn many of pomegranate's models into deep models: wrap a neural network in an object that exposes the methods pomegranate expects of a distribution.

In [3]:
%matplotlib inline
import numpy
import seaborn; seaborn.set_style('whitegrid')

from pomegranate import *

numpy.random.seed(0)

%load_ext watermark
%watermark -m -n -p numpy,scipy,pomegranate

Mon Dec 03 2018 

numpy 1.14.2
scipy 1.0.0
pomegranate 0.10.0

compiler   : GCC 7.2.0
system     : Linux
release    : 4.15.0-36-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 4
interpreter: 64bit


### Deep Mixture Models

In [55]:
from keras.models import Sequential
from keras.layers import Dense

from sklearn.cluster import KMeans

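class NeuralNetworkWrapper():
    """Expose output unit `i` of a shared classifier as a custom distribution."""

    def __init__(self, model, i):
        self.d = 10         # dimensionality of each sample
        self.model = model  # network shared by all components
        self.i = i          # output unit this component reads
        # The summary buffers live on the shared network, so every
        # wrapper appends to the same lists.
        self.model.X = []
        self.model.y = []
        self.model.w = []
    
    def log_probability(self, X):
        # Log of this component's column of the softmax output.
        return numpy.log(self.model.predict(X)[:,self.i])
    
    def summarize(self, X, w):
        # Store the batch with its weights, labeled as belonging to component i.
        self.model.X.append(X.copy())
        self.model.w.append(w.copy())
        
        y = numpy.zeros((X.shape[0], 2))  # one-hot targets for the two components
        y[:,self.i] = 1
        self.model.y.append(y)
        
    def from_summaries(self, inertia=0.0):
        # Only the first wrapper trains, so the shared network is updated
        # once per iteration on the pooled summaries.
        if self.i == 0:
            X = numpy.concatenate(self.model.X)
            w = numpy.concatenate(self.model.w)
            y = numpy.concatenate(self.model.y)
        
            self.model.train_on_batch(X, y, sample_weight=w)
        
        self.clear_summaries()
    
    def clear_summaries(self):
        # Reset the shared buffers for the next iteration.
        self.model.X = []
        self.model.y = []
        self.model.w = []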
        
n = 10000

X = numpy.random.randn(n, 10)
X[::2] += 0.5

X_valid = numpy.random.randn(10, 10)
X_valid[::2] += 0.5

y_init = KMeans(2).fit_predict(X)
y = numpy.zeros((X.shape[0], 2))
y[numpy.arange(n), y_init] = 1

nn = Sequential([Dense(128, input_dim=10, activation='relu'), Dense(2, activation='softmax')])
nn.compile(loss='categorical_crossentropy', optimizer='adam')
nn.fit(X, y, epochs=1, verbose=0)

d1 = NeuralNetworkWrapper(nn, 0)
d2 = NeuralNetworkWrapper(nn, 1)

model = GeneralMixtureModel([d1, d2])
model.fit(X, max_iterations=100, verbose=True)

print(model.predict_proba(X_valid))

model2 = GeneralMixtureModel.from_samples(MultivariateGaussianDistribution, 2, X)
print(model2.predict_proba(X_valid))

[1] Improvement: 0.0123739647152	Time (s): 0.1254
Total Improvement: 0.0123739647152
Total Time (s): 0.1869
[[0.57102217 0.42897783]
 [0.96611289 0.03388711]
 [0.2672109  0.7327891 ]
 [0.00080146 0.99919854]
 [0.00012695 0.99987305]
 [0.99900971 0.00099029]
 [0.7510968  0.2489032 ]
 [0.99801552 0.00198448]
 [0.15337515 0.84662485]
 [0.16009661 0.83990339]]
[[0.54443732 0.45556268]
 [0.6270334  0.3729666 ]
 [0.15921108 0.84078892]
 [0.21760758 0.78239242]
 [0.08559682 0.91440318]
 [0.39204585 0.60795415]
 [0.07883944 0.92116056]
 [0.85338964 0.14661036]
 [0.35588869 0.64411131]
 [0.06725062 0.93274938]]


In [54]:
X_valid[1]

array([ 1.85009706, -0.36001267, -0.3307358 ,  1.07383233, -0.41560461,
       -0.43283055,  1.48240643, -0.73301856, -1.44795206, -1.16381234])

In [28]:
model2.log_probability(X_valid[:10] + 4)

array([-65.22359289, -48.00506255, -75.39049114, -59.00727455,
       -63.98556007, -55.92679917, -58.53411466, -51.00950827,
       -64.21108064, -47.85357562])

### Deep Hidden Markov Models

In [None]:
class NeuralNetworkWrapper():
    def __init__(self, model, i):
        self.d = 10
        self.model = model
        self.i = i
    
    def log_probability(self, X):
        return numpy.log(self.model.predict(X)[:,self.i])

X = numpy.random.randn(1000, 20, 10)
X[:, ::2] += 1

y = numpy.zeros((20000, 2))
y[::2, 0] = 1
y[1::2, 1] = 1

nn = Sequential([Dense(128, input_dim=10, activation='relu'), Dense(2, activation='softmax')])
nn.compile(loss='categorical_crossentropy', optimizer='adam')
nn.fit(X.reshape(20000, 10), y, epochs=10, verbose=0)

s1 = State(NeuralNetworkWrapper(nn, 0))
s2 = State(NeuralNetworkWrapper(nn, 1))

model = HiddenMarkovModel()
model.add_states(s1, s2)
model.add_transition(model.start, s1, 0.9)
model.add_transition(model.start, s2, 0.1)
model.add_transition(s1, s1, 0.9)
model.add_transition(s1, s2, 0.1)
model.add_transition(s2, s1, 0.7)
model.add_transition(s2, s2, 0.3)
model.bake()

model.predict_proba(X[0])
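
One caveat with the wrappers above: `numpy.log` returns `-inf` whenever the network outputs a probability of exactly zero, which can happen when a float32 softmax underflows. Below is a minimal sketch of one possible guard. The helper `safe_log` and the `eps` floor are hypothetical and not part of pomegranate or Keras, and clipping is only one way to handle this.

```python
import numpy as np

def safe_log(probs, eps=1e-12):
    """Log of class probabilities, with values floored at eps to avoid -inf."""
    probs = np.asarray(probs, dtype=np.float64)
    return np.log(np.clip(probs, eps, 1.0))

# Hypothetical network output in which one probability has underflowed to zero.
probs = np.array([[1.0, 0.0],
                  [0.25, 0.75]], dtype=np.float32)

log_probs = safe_log(probs)
print(log_probs)
```

Inside a wrapper, `log_probability` could return `safe_log(self.model.predict(X))[:, self.i]` instead of calling `numpy.log` directly.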