.. _theano_to_pylearn2_tutorial:
=======================
Your models in Pylearn2
=======================
Who should read this
====================
We recommend you spend some time with Pylearn2 and read some of our other tutorials before starting with this minimalistic technique.
If you are completely new to Pylearn2, have a look at the
`softmax regression tutorial <http://nbviewer.ipython.org/github/lisa-lab/pylearn2/blob/master/pylearn2/scripts/tutorials/softmax_regression/softmax_regression.ipynb>`_.
Pylearn2 is great for many things; we’ll highlight two here.
* It allows you to experiment with new ideas without much implementation
overhead. The library was built to be modular, and it aims to be usable
without an extensive knowledge of the codebase. Writing a new model from
scratch is usually pretty fast once you know what to do and where to look.
* It has an interface (YAML) that allows one to decouple implementation from
experimental choices, enabling experiments to be constructed in a light
and readable fashion.
Obviously, there is always a trade-off between being user-friendly and being
flexible, and Pylearn2 is no exception. For instance, users looking for a way to
work with sequential data might have a harder time getting started (although
we’re working to make this experience better).
In this post, we will assume that you have built a regression or classification
model with Theano and that the training data can be cast into two
matrices, one for training examples and one for training targets. People with
different requirements may need to work a little more (e.g. by figuring out how to put
their data inside Pylearn2). This tutorial contains
useful information for anyone interested in porting a model to Pylearn2.
How is Pylearn2 used?
========================
While many researchers use Pylearn2 as their primary research tool, this doesn't necessarily mean they know or use every feature in Pylearn2. In fact, you can prototype new models in a very
Theano-like fashion: write a model as a big monolithic block of hard coded
Theano expressions, and wrap that up in the minimal amount of code necessary
to be able to plug a model into Pylearn2. **This bare minimum is what we’ll explain here.**
The resulting model may be hard to extend, but it represents a good starting point. As you
explore new ideas and change the code, you can gradually make it more flexible:
a hard coded input dimension gets factored out as a constructor argument,
functions being composed are separated into layers, etc.
Our point: **it is alright to stick to the
bare minimum when developing a model for Pylearn2**. Your code probably won't
satisfy any other use cases than your own, but this is something that you can
change gradually as you go. There's no need to overcomplicate things when you start.
The bare minimum
================
Let's look at that *bare minimum*. It involves writing exactly two subclasses:
* One subclass of `pylearn2.costs.cost.Cost`
* One subclass of `pylearn2.models.model.Model`
Need more than that? Nope. That's it! Let's have a look.
It all starts with a cost expression
------------------------------------
In the scenario we’re describing, your model maps an input to an output, the
output is compared with some ground truth using some measure of dissimilarity,
and the parameters of the model are changed to reduce this measure using
gradient information.
It is therefore natural that the object that interfaces between the model and
the training algorithm represents a cost. The base class for this object is
`pylearn2.costs.cost.Cost` and does three main things:
* It describes what data it needs to perform its duty and how it should be
formatted.
* It computes the cost expression by feeding the input to the model and
receiving its output.
* It differentiates the cost expression with respect to the model parameters and
returns the gradients to the training algorithm.
What's nice about `Cost` is that if you follow the guidelines we’re about to describe,
you only have to worry about the cost expression; the gradient part is all
handled by the `Cost` base class, and a very useful `DefaultDataSpecsMixin`
mixin subclass is defined to handle the data description part (more about that
when we look at the `Model` subclass).
Let's look at how the subclass should look:

.. code-block:: python

    from pylearn2.costs.cost import Cost, DefaultDataSpecsMixin

    class MyCostSubclass(DefaultDataSpecsMixin, Cost):
        # Here it is assumed that we are doing supervised learning
        supervised = True

        def expr(self, model, data, **kwargs):
            # Check that the data matches what the cost requested
            space, source = self.get_data_specs(model)
            space.validate(data)

            inputs, targets = data
            outputs = model.some_method_for_outputs(inputs)
            # Placeholder: replace with some loss measure involving
            # outputs and targets
            loss = None
            return loss

The `supervised` class attribute is used by `DefaultDataSpecsMixin` to know how
to specify the data requirements. If it is set to `True`, the cost will expect
to receive inputs and targets, and if it is set to `False`, the cost will expect
to receive inputs only. In the example, it is assumed that we are doing
supervised learning, so we set `supervised` to `True`.
The first two lines of `expr` do some basic input checking and should always be
included at the beginning of your `expr` method. Without going too much into
detail, `space.validate(data)` will make sure that the data you get is the data
you requested (e.g. if you do supervised learning, you need an input tensor
variable and a target tensor variable). How to determine "what you need" will be
covered when we look at the `Model` subclass.
In this case, `data` is a tuple containing the inputs as the first element and
the targets as the second element.
We then get the model output by calling its `some_method_for_outputs` method,
whose name and behaviour are really for you to decide, as long as your `Cost`
subclass knows which method to call on the model.
Finally, we compute some loss measure on `outputs` and `targets` and return that
as the cost expression.
Note that things don't have to be *exactly* like this. For instance, you could
ask the model to have a method that takes inputs and targets as arguments and
returns the loss directly, and that would be perfectly fine. All you need is
some way to make your `Model` and `Cost` subclasses work together to produce
a cost expression in the end.
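The contract between the two classes can be sketched without Pylearn2 or Theano at all. In the NumPy sketch below, `ToyModel` and `ToyCost` are hypothetical stand-ins (they are not Pylearn2 classes) and the squared-error loss is only an assumption chosen for illustration. The point is that the cost only needs to know which method to call on the model.

```python
import numpy as np


class ToyModel(object):
    """Hypothetical stand-in for a `Model` subclass: a single linear map."""

    def __init__(self, n_in, n_out):
        self.W = np.zeros((n_in, n_out))

    def some_method_for_outputs(self, inputs):
        return np.dot(inputs, self.W)


class ToyCost(object):
    """Hypothetical stand-in for a `Cost` subclass: mean squared error."""

    def expr(self, model, data):
        # `data` is a tuple: inputs first, targets second
        inputs, targets = data
        outputs = model.some_method_for_outputs(inputs)
        return ((outputs - targets) ** 2).sum(axis=1).mean()


model = ToyModel(n_in=3, n_out=2)
data = (np.ones((4, 3)), np.ones((4, 2)))
loss = ToyCost().expr(model, data)
```

In real Pylearn2 code, `expr` returns a symbolic Theano expression rather than a number, and the base classes take care of data checking and gradients.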
Defining the model
------------------
Now it's time to make things more concrete by writing the model itself. The
model will be a subclass of `pylearn2.models.model.Model`, which is responsible
for the following:
* Defining what its parameters are
* Defining what its data requirements are
* Doing something with the input to produce an output
As is the case with `Cost`, the `Model` base class does many useful things on its own,
provided you set the appropriate instance attributes. Let's have a look at a
subclass example:

.. code-block:: python

    from pylearn2.models.model import Model

    class MyModelSubclass(Model):
        def __init__(self, *args, **kwargs):
            super(MyModelSubclass, self).__init__()

            # Some parameter initialization using *args and **kwargs
            # ...
            self._params = [
                # List of all the model parameters
            ]

            # Placeholder: replace with some `pylearn2.space.Space` subclass
            self.input_space = None
            # This one is necessary only for supervised learning
            # Placeholder: replace with some `pylearn2.space.Space` subclass
            self.output_space = None

        def some_method_for_outputs(self, inputs):
            # Some computation involving the inputs
            raise NotImplementedError("return your Theano expression here")

The first thing you should do if you're overriding the constructor is call the
superclass' constructor. Pylearn2 checks for that and will scold you if you
don't.
You should then initialize your model parameters **as shared variables**:
Pylearn2 will build an updates dictionary for your model variables using
gradients returned by your cost. **Protip: the `pylearn2.utils.sharedX` function
initializes a shared variable with the value and an optional name you provide.
This allows your code to be GPU-compatible without putting too much thought into
it.** For instance, a weights matrix can be initialized this way:

.. code-block:: python

    import numpy
    from pylearn2.utils import sharedX

    # Inside your model's constructor; `size1` and `size2` are the
    # dimensions of the weights matrix
    self.W = sharedX(numpy.random.normal(size=(size1, size2)), 'W')

Put all your parameters in a list as the `_params` instance attribute. The
`Model` superclass defines a `get_params` method which returns `self._params`
for you, and that is the method that is called to get the model parameters when
`Cost` is computing the gradients.
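To make the idea of an updates dictionary concrete, here is a minimal NumPy sketch of one plain gradient-descent step over a set of parameters. The parameter values, gradients and learning rate are made up for illustration, and this is not Pylearn2's actual update code (which works on Theano shared variables and may add momentum and other refinements).

```python
import numpy as np

# Hypothetical parameters and their gradients, keyed by name
params = {'W': np.ones((2, 2)), 'b': np.zeros(2)}
grads = {'W': np.full((2, 2), 0.5), 'b': np.array([1.0, -1.0])}
learning_rate = 0.1

# One entry per parameter: its new value after a gradient step
updates = dict(
    (name, value - learning_rate * grads[name])
    for name, value in params.items()
)
```

Because every parameter appears in one list, the training algorithm can build such updates without knowing anything else about the model.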
Your `Model` subclass should also describe the data format it expects as inputs
(`self.input_space`) and the data format of the model's output
(`self.output_space`), which is required only if you're doing supervised
learning. These attributes should be instances of `pylearn2.space.Space` (and
generally are instances of `pylearn2.space.VectorSpace`, a subclass of
`pylearn2.space.Space` used to represent batches of vectors). Broadly, this
mechanism allows for automatic conversion between
different `data formats <http://deeplearning.net/software/pylearn2/internal/data_specs.html#data-specs>`_ (e.g. if your targets are stored as integer indexes in
the dataset but are required to be one-hot encoded by the model).
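As an illustration of the kind of conversion mentioned above, the NumPy sketch below turns a column of integer class indexes into one-hot rows. The helper `to_one_hot` is a hypothetical name written for this example; Pylearn2 performs its own conversion internally, and this is not its implementation.

```python
import numpy as np


def to_one_hot(labels, n_classes):
    """Convert integer class indexes into one-hot encoded rows."""
    labels = np.asarray(labels).reshape(-1)
    one_hot = np.zeros((labels.shape[0], n_classes), dtype='float32')
    # Set a single 1 per row, in the column given by the label
    one_hot[np.arange(labels.shape[0]), labels] = 1.0
    return one_hot


y = np.array([[2], [0], [1]])   # column matrix of integer targets
Y = to_one_hot(y, n_classes=3)  # one row per example, one column per class
```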
The `some_method_for_outputs` method is really where all the magic happens. Remember,
the name of the method doesn't really matter, as long as your
`Cost` subclass knows that it's the one it has to call. This method expects a
tensor variable as input and returns a symbolic expression involving the input
and its parameters. What happens in between is up to you, and this is where you
can put all the Theano code you could possibly hope for, just like you would do
in pure Theano scripts.
Examples
================
Let's demonstrate these ideas by writing two
models, one which does supervised learning and one which does unsupervised
learning.
The data you train these models on is up to you, as long as it is represented in
a matrix of features (each row being an example) and a matrix of targets (where each
row is a target for an example). Obviously this second matrix is only required for
supervised learning. While this is not the only way to store data in Pylearn2,
it is probably the most common method, so we will use it in the remainder of this discussion.
For the purposes of this tutorial, we will train models on the venerable
MNIST dataset, which you can download at:
.. code-block:: bash
wget http://deeplearning.net/data/mnist/mnist.pkl.gz
To make things easier to manipulate, we will decompress the archive (e.g. with
`gunzip mnist.pkl.gz`) and split its contents into six different files:
.. code-block:: bash
python -c "from pylearn2.utils import serial; \
data = serial.load('mnist.pkl'); \
serial.save('mnist_train_X.pkl', data[0][0]); \
serial.save('mnist_train_y.pkl', data[0][1].reshape((-1, 1))); \
serial.save('mnist_valid_X.pkl', data[1][0]); \
serial.save('mnist_valid_y.pkl', data[1][1].reshape((-1, 1))); \
serial.save('mnist_test_X.pkl', data[2][0]); \
serial.save('mnist_test_y.pkl', data[2][1].reshape((-1, 1)))"
Supervised learning using logistic regression
---------------------------------------------
Let's keep things simple by porting to Pylearn2 the *Hello
World!* of supervised learning: logistic regression. For a refresher, we suggest that you first
read the `deeplearning.net tutorial <http://www.deeplearning.net/tutorial/logreg.html#logreg>`_ on logistic regression. Here is
what we need to do:
* Implement the negative log-likelihood (NLL) loss in our `Cost` subclass
* Initialize the model parameters W and b
* Implement the model's logistic regression output
Let's start with the `Cost` subclass:

.. code-block:: python

    import theano.tensor as T
    from pylearn2.costs.cost import Cost, DefaultDataSpecsMixin

    class LogisticRegressionCost(DefaultDataSpecsMixin, Cost):
        supervised = True

        def expr(self, model, data, **kwargs):
            # Check that the data matches what the cost requested
            space, source = self.get_data_specs(model)
            space.validate(data)

            inputs, targets = data
            outputs = model.logistic_regression(inputs)
            # Negative log-likelihood of the one-hot targets, per example
            loss = -(targets * T.log(outputs)).sum(axis=1)
            return loss.mean()

We assumed our model has a `logistic_regression` method which
accepts a batch of examples and computes the logistic regression output. We will
implement that method in just a moment. We also computed the loss as the average
negative log-likelihood of the targets given the logistic regression output, as
described in the deeplearning.net tutorial. Also, notice how we set `supervised`
to `True`.
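The loss expression can be checked by hand on a tiny batch. The NumPy sketch below mirrors the Theano expression used in `expr`; the probabilities and one-hot targets are made-up values. Since each target row has a single 1, the sum over classes simply picks out the log-probability of the correct class.

```python
import numpy as np

# Hypothetical model outputs (rows sum to 1) and one-hot targets
outputs = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.8, 0.1]])
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])

# Same formula as in the cost: NLL per example, then batch average
per_example = -(targets * np.log(outputs)).sum(axis=1)
loss = per_example.mean()

# Equivalent computation using only the correct-class probabilities
expected = -(np.log(0.7) + np.log(0.8)) / 2
```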
Now for the `Model` subclass:

.. code-block:: python

    import numpy
    import theano.tensor as T
    from pylearn2.models.model import Model
    from pylearn2.space import VectorSpace
    from pylearn2.utils import sharedX

    class LogisticRegression(Model):
        def __init__(self, nvis, nclasses):
            super(LogisticRegression, self).__init__()

            self.nvis = nvis
            self.nclasses = nclasses

            # Weights matrix and bias vector, stored as shared variables
            W_value = numpy.random.uniform(size=(self.nvis, self.nclasses))
            self.W = sharedX(W_value, 'W')
            b_value = numpy.zeros(self.nclasses)
            self.b = sharedX(b_value, 'b')
            self._params = [self.W, self.b]

            self.input_space = VectorSpace(dim=self.nvis)
            self.output_space = VectorSpace(dim=self.nclasses)

        def logistic_regression(self, inputs):
            return T.nnet.softmax(T.dot(inputs, self.W) + self.b)

The model's constructor receives the dimensionality of the input and the number
of classes. It initializes the weights matrix and the bias vector with
`sharedX`. It also sets its input space to an instance of `VectorSpace` of
the dimensionality of the input (meaning it expects the input to be a batch of
examples which are all vectors of size `nvis`) and its output space to an
instance of `VectorSpace` of dimension `nclasses` (meaning it produces an output
corresponding to a batch of probability vectors, one element for each possible
class).
The `logistic_regression` method does pretty much what you would expect: it
returns a linear transformation of the input followed by a softmax
non-linearity.
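For readers who want to see the shapes involved, here is a NumPy-only sketch of the same forward pass. The `softmax` helper is written out by hand for this example (it is not a Pylearn2 or Theano function), and the batch size and random inputs are arbitrary.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


rng = np.random.RandomState(0)
nvis, nclasses, batch_size = 784, 10, 5

W = rng.uniform(size=(nvis, nclasses))   # weights, as in the constructor
b = np.zeros(nclasses)                   # biases
X = rng.uniform(size=(batch_size, nvis)) # a batch of fake inputs

# Linear transformation followed by a softmax non-linearity
probs = softmax(np.dot(X, W) + b)
```

Each row of `probs` is a probability vector over the `nclasses` classes, which is what the `VectorSpace(dim=nclasses)` output space describes.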
How about we give it a try? Save those two code snippets in a single file (e.g.
`log_reg.py`) and save the following in `log_reg.yaml`:
.. code-block:: yaml
!obj:pylearn2.train.Train {
dataset: &train !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
X: !pkl: 'mnist_train_X.pkl',
y: !pkl: 'mnist_train_y.pkl',
y_labels: 10,
},
model: !obj:log_reg.LogisticRegression {
nvis: 784,
nclasses: 10,
},
algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
batch_size: 200,
learning_rate: 1e-3,
monitoring_dataset: {
'train' : *train,
'valid' : !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
X: !pkl: 'mnist_valid_X.pkl',
y: !pkl: 'mnist_valid_y.pkl',
y_labels: 10,
},
'test' : !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
X: !pkl: 'mnist_test_X.pkl',
y: !pkl: 'mnist_test_y.pkl',
y_labels: 10,
},
},
cost: !obj:log_reg.LogisticRegressionCost {},
termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
max_epochs: 15
},
},
}
Run the following command:
.. code-block:: bash
python -c "from pylearn2.utils import serial; \
train_obj = serial.load_train_file('log_reg.yaml'); \
train_obj.main_loop()"
Congratulations, you just implemented your first model in Pylearn2!
*(By the way, the targets you used to initialize `DenseDesignMatrix` instances
were column matrices, yet your model expects to receive one-hot encoded vectors.
You can do that because Pylearn2 does the conversion for you
via the `data_specs` mechanism. That's why specifying the model's `input_space`
and `output_space` is important.)*
Unsupervised learning using an autoencoder
------------------------------------------
Let's now have a look at an unsupervised learning example: an autoencoder with
tied weights. Once again, we recommend that you first read the
`deeplearning.net denoising autoencoder tutorial <http://www.deeplearning.net/tutorial/dA.html>`_. Here's what we'll do:
* Implement the binary cross-entropy reconstruction loss in our `Cost` subclass
* Initialize the model parameters W and b
* Implement the model's reconstruction logic
Let's start again with the `Cost` subclass:

.. code-block:: python

    import theano.tensor as T
    from pylearn2.costs.cost import Cost, DefaultDataSpecsMixin

    class AutoencoderCost(DefaultDataSpecsMixin, Cost):
        supervised = False

        def expr(self, model, data, **kwargs):
            # Check that the data matches what the cost requested
            space, source = self.get_data_specs(model)
            space.validate(data)

            X = data
            X_hat = model.reconstruct(X)
            # Binary cross-entropy between input and reconstruction, per example
            loss = -(X * T.log(X_hat) + (1 - X) * T.log(1 - X_hat)).sum(axis=1)
            return loss.mean()

We assumed our model has a `reconstruct` method which encodes and decodes its
input. We also computed the loss as the average binary cross-entropy between the
input and its reconstruction. This time, however, we set `supervised` to
`False`.
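As a sanity check of the formula, the NumPy sketch below evaluates the same binary cross-entropy on a tiny made-up batch. The inputs and reconstructions are arbitrary values chosen so that no logarithm is taken at 0.

```python
import numpy as np

# Hypothetical binary inputs and their reconstructions in (0, 1)
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
X_hat = np.array([[0.9, 0.2, 0.8],
                  [0.1, 0.3, 0.6]])

# Same formula as in the cost: sum over features, then batch average
per_example = -(X * np.log(X_hat) + (1 - X) * np.log(1 - X_hat)).sum(axis=1)
loss = per_example.mean()

# First example by hand: features 1, 0, 1 contribute
# log(0.9), log(1 - 0.2) and log(0.8) respectively
first_by_hand = -(np.log(0.9) + np.log(0.8) + np.log(0.8))
```

The loss is only defined when the reconstruction lies strictly between 0 and 1, which is what a sigmoid output provides.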
Now for the `Model` subclass:

.. code-block:: python

    import numpy
    import theano.tensor as T
    from pylearn2.models.model import Model
    from pylearn2.space import VectorSpace
    from pylearn2.utils import sharedX

    class Autoencoder(Model):
        def __init__(self, nvis, nhid):
            super(Autoencoder, self).__init__()

            self.nvis = nvis
            self.nhid = nhid

            # Tied weights: W is shared by the encoder and the decoder
            W_value = numpy.random.uniform(size=(self.nvis, self.nhid))
            self.W = sharedX(W_value, 'W')
            b_value = numpy.zeros(self.nhid)
            self.b = sharedX(b_value, 'b')
            c_value = numpy.zeros(self.nvis)
            self.c = sharedX(c_value, 'c')
            self._params = [self.W, self.b, self.c]

            self.input_space = VectorSpace(dim=self.nvis)

        def reconstruct(self, X):
            h = T.tanh(T.dot(X, self.W) + self.b)
            return T.nnet.sigmoid(T.dot(h, self.W.T) + self.c)

The constructor looks quite similar to the logistic regression example, except
that this time we don't need to specify the model's output space.
The `reconstruct` method simply encodes and decodes its input.
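The effect of tying the weights is easiest to see from the shapes. The NumPy sketch below reproduces the same encode/decode steps on a small random batch; the dimensions are arbitrary and the `sigmoid` helper is written by hand for this example.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


rng = np.random.RandomState(0)
nvis, nhid, batch_size = 6, 3, 4

W = rng.uniform(size=(nvis, nhid))  # shared by encoder and decoder
b = np.zeros(nhid)                  # hidden biases
c = np.zeros(nvis)                  # visible biases
X = rng.uniform(size=(batch_size, nvis))

h = np.tanh(np.dot(X, W) + b)         # encode: (batch, nvis) -> (batch, nhid)
X_hat = sigmoid(np.dot(h, W.T) + c)   # decode with the transposed weights
```

The decoder reuses `W.T`, so the model has a single weights matrix and the reconstruction has the same shape as the input.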
Let's try to train it. Save the two code snippets in a single file (e.g.
`autoencoder.py`), then save the following in `autoencoder.yaml`:
.. code-block:: yaml
!obj:pylearn2.train.Train {
dataset: &train !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
X: !pkl: 'mnist_train_X.pkl',
},
model: !obj:autoencoder.Autoencoder {
nvis: 784,
nhid: 200,
},
algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
batch_size: 200,
learning_rate: 1e-3,
monitoring_dataset: {
'train' : *train,
'valid' : !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
X: !pkl: 'mnist_valid_X.pkl',
},
'test' : !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
X: !pkl: 'mnist_test_X.pkl',
},
},
cost: !obj:autoencoder.AutoencoderCost {},
termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
max_epochs: 15
},
},
}
Run the following command:
.. code-block:: bash
python -c "from pylearn2.utils import serial; \
train_obj = serial.load_train_file('autoencoder.yaml'); \
train_obj.main_loop()"
What have we gained?
====================
At this point you might be thinking *"There's still boilerplate code to write;
what have we gained?"*
The answer is that we gained access to the plethora of scripts, model parts, costs and
training algorithms which are built into Pylearn2. You don't have to reinvent the
wheel anymore when you wish to train using SGD and momentum. If you want to switch
from SGD to BGD, then Pylearn2 makes this as simple as changing the training
algorithm description in your YAML file.
As we pointed out earlier, this demonstrates only the **bare minimum** needed to
implement a model in Pylearn2. Nothing prevents you from digging deeper in the
codebase and overriding some methods to gain new functionality.
Here's an example of how a few more lines of code can do a lot for you in
Pylearn2.
Monitoring various quantities during training
---------------------------------------------
Let's monitor the classification error of our logistic regression classifier.
To do so, you will have to override `Model`'s `get_monitoring_data_specs` and
`get_monitoring_channels` methods. The former specifies what the model needs for
its monitoring, and in which format they should be provided. The latter does the
actual monitoring by returning an `OrderedDict` mapping string identifiers to
their quantities.
Let's look at how it's done. Add the following to `LogisticRegression`:

.. code-block:: python

    # Keeps things compatible for Python 2.6
    from theano.compat.python2x import OrderedDict
    from pylearn2.space import CompositeSpace

    class LogisticRegression(Model):
        # (Your previous code)

        def get_monitoring_data_specs(self):
            # Request a (features, targets) tuple for monitoring
            space = CompositeSpace([self.get_input_space(),
                                    self.get_target_space()])
            source = (self.get_input_source(), self.get_target_source())
            return (space, source)

        def get_monitoring_channels(self, data):
            space, source = self.get_monitoring_data_specs()
            space.validate(data)

            X, y = data
            y_hat = self.logistic_regression(X)
            # Fraction of examples whose predicted class differs from the target
            error = T.neq(y.argmax(axis=1), y_hat.argmax(axis=1)).mean()

            return OrderedDict([('error', error)])

The content of `get_monitoring_data_specs` may look cryptic at first.
Documentation for data specs can be found
`here <http://deeplearning.net/software/pylearn2/internal/data_specs.html>`_.
All you really need to know is that this is the standard method in Pylearn2 to request a
tuple whose first element represents features and second element represents
targets.
The content of `get_monitoring_channels` should look more familiar. We start by
checking `data` just as in `Cost` subclasses' implementation of `expr`, and we
separate `data` into features and targets. We then get predictions by
calling `logistic_regression` and compute the average error the standard way.
We return an `OrderedDict` mapping `'error'` to the Theano expression for the
classification error.
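The error expression can be verified on a few hand-picked rows. The NumPy sketch below applies the same argmax comparison to made-up one-hot targets and predicted probabilities.

```python
import numpy as np

# Hypothetical one-hot targets and predicted class probabilities
y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [0, 1, 0]])
y_hat = np.array([[0.8, 0.1, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.5, 0.2, 0.3],
                  [0.1, 0.7, 0.2]])

# An example is misclassified when the most probable class
# differs from the target class
error = (y.argmax(axis=1) != y_hat.argmax(axis=1)).mean()
```

Here only the third example is misclassified (target class 2, predicted class 0), so one example out of four is counted as an error.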
If you launch training again using
.. code-block:: bash
python -c "from pylearn2.utils import serial; \
train_obj = serial.load_train_file('log_reg.yaml'); \
train_obj.main_loop()"
then you'll see the classification error being displayed with the other monitored
quantities.
What's next?
============
The examples given in this tutorial are obviously very simplistic and could be
easily replaced by existing parts of Pylearn2. However, they show a path that
one can take to implement arbitrary ideas in Pylearn2.
In order to avoid reinventing the wheel, it is often useful to dig into
Pylearn2's codebase to see what has already been implemented. For example, the VAE framework
relies on the MLP framework to represent the mapping from inputs to
conditional distribution parameters.
While it is often desirable to reuse code, the inherent difficulty of this
depends on your knowledge of Pylearn2, and also how
similar your model is to what is already implemented. You should never feel ashamed to dump
Theano code inside a `Model` subclass' method like we
showed here. The modularity of your code can be
improved gradually, and at your own pace. In the meantime you can
still benefit from Pylearn2's features, like human-readable descriptions of experiments, automatic monitoring
of various quantities, easily-interchangeable
training algorithms, and so on.