Normalizing flows #2362

Merged: 36 commits merged into pymc-devs:master, Jul 10, 2017

Conversation (6 participants)
@ferrine (Member, Author) commented Jun 28, 2017

Normalizing flows on top of recent refactoring

@junpenglao (Member) commented Jun 28, 2017

need rebase?

@ferrine (Member, Author) commented Jun 28, 2017

Sure, that's done. BTW, I have problems with the convergence sanity check. Not sure if it's because of my math or because the approximation family doesn't include the normal distribution.

@ferrine (Member, Author) commented Jun 28, 2017

I'm thinking of a hash that depends on the shared params of Approximations, etc., so that setting new shared variables there rebuilds the graph in @node_property. How do you feel about that, @aseyboldt?
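
For illustration, a rough sketch of the caching idea (hypothetical code, not pymc3's actual node_property implementation; the shared_params attribute is assumed for the example):

    import functools

    def node_property(fn):
        """Cache a symbolic node and rebuild it when the shared params change."""
        cache_name = '_cached_' + fn.__name__

        @property
        @functools.wraps(fn)
        def wrapper(self):
            # Key the cache on the identity of the current shared variables,
            # so swapping in new shared variables invalidates the cached graph.
            key = tuple(sorted(id(v) for v in self.shared_params.values()))
            cached = getattr(self, cache_name, None)
            if cached is None or cached[0] != key:
                cached = (key, fn(self))  # rebuild the graph for the new params
                setattr(self, cache_name, cached)
            return cached[1]

        return wrapper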

@ferrine (Member, Author) commented Jun 28, 2017

A list of ideas for discussion. I don't think they are necessary in this PR, so I'm just pinning them here (see the parsing sketch below):

  • Implement richer flow formulas like '(planar*2-radial)*3', but I'm not so strong in parsing.
  • Implement AEVB with flows (the encoder maps to flow parameters).
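
A minimal sketch of how such a formula could be expanded into a flat list of flows (hypothetical parser written only to illustrate the idea; the name expand_flow_spec is made up):

    import re

    TOKEN = re.compile(r'\(|\)|\*|-|\d+|[a-z]+')

    def expand_flow_spec(spec):
        """Expand e.g. '(planar*2-radial)*3' into a flat list of flow names."""
        def parse(tokens, pos):
            flows = []
            while pos < len(tokens) and tokens[pos] != ')':
                if tokens[pos] == '-':          # separator between flows
                    pos += 1
                    continue
                if tokens[pos] == '(':          # parenthesized sub-formula
                    inner, pos = parse(tokens, pos + 1)
                    pos += 1                    # skip the closing ')'
                else:                           # a single flow name
                    inner, pos = [tokens[pos]], pos + 1
                if pos < len(tokens) and tokens[pos] == '*':
                    inner, pos = inner * int(tokens[pos + 1]), pos + 2
                flows.extend(inner)
            return flows, pos

        flows, _ = parse(TOKEN.findall(spec.replace(' ', '')), 0)
        return flows

    # expand_flow_spec('(planar*2-radial)*3')
    # -> ['planar', 'planar', 'radial'] repeated three times
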
@ferrine (Member, Author) commented Jun 28, 2017

Seems like the forward pass is bad. This is the picture for a planar flow unit:
[image: planar flow unit]

@ferrine (Member, Author) commented Jun 28, 2017

After thinking it over, I realized that I was not doing the multiplication for q0.

@taku-y (Contributor) commented Jun 29, 2017

In the current API, you compute determinants for the flows and then take their logs. That works for normalizing flows, but computing the log determinant directly is sometimes numerically more stable than separating the determinant and the log. An example is the inverse autoregressive flow: its determinant is prod(sigma_i(x)), where i indexes the dimensions. For high-dimensional variables this product can underflow to nearly zero, and its log becomes NaN/-inf. The log determinant, sum(log(sigma_i(x))), is numerically stable.

The current API computes determinants via def det(); I recommend def logdet() instead.
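
A small numpy illustration of the point, with made-up sigmas for a hypothetical high-dimensional IAF layer:

    import numpy as np

    sigma = np.random.uniform(0.1, 0.9, size=1000)   # per-dimension sigmas

    det = np.prod(sigma)             # underflows to 0.0 in float64
    print(det, np.log(det))          # 0.0 -inf

    logdet = np.sum(np.log(sigma))   # stable, roughly -800 for these sigmas
    print(logdet)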

@ferrine (Member, Author) commented Jun 29, 2017

@taku-y I sum the logdets for logq.

@ferrine (Member, Author) commented Jun 29, 2017

I've seen suggestions to apply a linear transformation here, i.e. start with mean field and then do the flow transformations. That point seems reasonable, as the flows alone don't seem able to "move" mass linearly.

@ferrine (Member, Author) commented Jul 8, 2017

I have important updates here that confirm the math is correct. There is a new test that computes the logdet of a transformation using Theano's jacobian and compares it with the analytical, vectorized derivation.
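
Something along these lines, sketched here as a standalone check on a planar flow with made-up parameters (not the PR's actual test code):

    import numpy as np
    import theano
    import theano.tensor as tt
    from theano.gradient import jacobian
    from theano.tensor import nlinalg

    z = tt.dvector('z')
    u = theano.shared(np.array([0.5, -0.3]))
    w = theano.shared(np.array([1.0, 2.0]))
    b = theano.shared(0.1)

    # planar flow: f(z) = z + u * tanh(w^T z + b)
    h = tt.tanh(tt.dot(w, z) + b)
    f = z + u * h

    # brute-force logdet via the full jacobian
    logdet_numeric = tt.log(abs(nlinalg.det(jacobian(f, z))))

    # analytical logdet: |det J| = |1 + u^T psi(z)|, with psi(z) = (1 - h^2) * w
    psi = (1 - h ** 2) * w
    logdet_analytic = tt.log(abs(1 + tt.dot(u, psi)))

    check = theano.function([z], [logdet_numeric, logdet_analytic])
    print(check(np.array([0.2, -1.0])))   # the two values should agree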

I also changed the way the compute_test_value flag is treated. I was getting annoying problems when doing graph replacements for the implementation, so here I set it to 'off' for VI-internal graph construction. I think that is a good solution: the internal graph is now independent of the context and of the shape of the test value, and problems can arise only when the user does some math with the symbolic attributes of an Approximation.
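
For reference, toggling the flag around the internal graph construction looks roughly like this (simplified sketch, not the actual opvi code):

    import theano

    _old = theano.config.compute_test_value
    theano.config.compute_test_value = 'off'   # build the VI-internal graph without test values
    try:
        pass  # ... symbolic graph construction goes here ...
    finally:
        theano.config.compute_test_value = _old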

@ferrine (Member, Author) commented Jul 9, 2017

I have a weird bug in test_fit_oo[ADVI-full-scale] (see the log), but the test passes locally for me. Do we have an updated Theano/pytest?

@ferrine (Member, Author) commented Jul 9, 2017

An interesting insight is that the Normalizing Flow is faster than ADVI in the profiling test.

Note that ADVI() == NF('scale-loc').
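
For context, a rough usage sketch of the comparison (hypothetical snippet using the eventual NFVI name for the inference class; the 'scale-loc' formula builds a rescale-and-shift flow over the standard-normal base, i.e. the same family as mean-field ADVI):

    import pymc3 as pm

    with pm.Model():
        pm.Normal('x', mu=0., sd=1., shape=10)
        advi_approx = pm.ADVI().fit(1000)            # mean-field ADVI
        nf_approx = pm.NFVI('scale-loc').fit(1000)   # equivalent flow parameterization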

runtime + profiling
ADVI (full data): 4681.67it/s

100%|##########| 100/100 [00:00<00:00, 4681.67it/s]
Function profiling
==================
  Message: /Users/ferres/dev/pymc3/pymc3/variational/opvi.py:272
  Time in 100 calls to Function.__call__: 9.346962e-03s
  Time in Function.fn.__call__: 7.677078e-03s (82.134%)
  Time in thunks: 4.420519e-03s (47.294%)
  Total compile time: 1.511350e+00s
    Number of Apply nodes: 83
    Theano Optimizer time: 1.385350e+00s
       Theano validate time: 3.779817e-02s
    Theano Linker time (includes C, CUDA code generation/compiling): 9.025884e-02s
       Import time 1.806593e-02s
       Node make_thunk time 8.375001e-02s
           Node Elemwise{Composite{(i0 + (i1 * sqr(i2)))}}[(0, 2)](TensorConstant{(1,) of -4.0351}, TensorConstant{(1,) of -0.111111}, Elemwise{sub,no_inplace}.0) time 6.552935e-03s
           Node Elemwise{Composite{((i0 * i1) + (i2 * i3))}}[(0, 3)](TensorConstant{0.25}, Elemwise{add,no_inplace}.0, TensorConstant{-0.111111111939}, Sum{acc_dtype=float64}.0) time 5.337954e-03s
           Node Elemwise{Composite{(((-i0) / i1) + (i2 * (i3 / i1)) + i4)}}[(0, 1)](Sum{axis=[0], acc_dtype=float64}.0, Elemwise{sqr,no_inplace}.0, InplaceDimShuffle{x}.0, Sum{axis=[0], acc_dtype=float64}.0, Sum{axis=[0], acc_dtype=float64}.0) time 4.230976e-03s
           Node Elemwise{Composite{((i0 * scalar_sigmoid(i1) * ((((-i2) / i3) * sgn(i4)) + (i5 * i6 * i4))) + (scalar_sigmoid(i7) * ((-i8) / i9)) + (scalar_sigmoid(i7) * i10))}}[(0, 2)](InplaceDimShuffle{x}.0, GradScale.0, Sum{axis=[0], acc_dtype=float64}.0, Elemwise{abs_,no_inplace}.0, softplus.0, TensorConstant{(1,) of 4.0}, Sum{axis=[0], acc_dtype=float64}.0, rho, Sum{axis=[0], acc_dtype=float64}.0, Elemwise{sqr,no_inplace}.0, Sum{axis=[0], acc_dtype=float64}.0) time 4.002094e-03s
           Node Elemwise{Composite{scalar_identity((i0 - ((i1 * (i2 + (i3 * sqr(i4)))) + (i5 * i6))))}}[(0, 0)](Sum{acc_dtype=float64}.0, TensorConstant{0.5}, TensorConstant{-3.22417140007}, TensorConstant{-0.25}, Elemwise{add,no_inplace}.0, TensorConstant{0.5}, Sum{acc_dtype=float64}.0) time 3.746986e-03s

Time in all call to theano.grad() 3.770552e-01s
Time since theano import 9.146s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  28.5%    28.5%       0.001s       3.71e-07s     C     3400      34   theano.tensor.elemwise.Elemwise
  22.6%    51.2%       0.001s       8.34e-07s     C     1200      12   theano.tensor.elemwise.Sum
  20.1%    71.2%       0.001s       2.22e-06s     C      400       4   theano.tensor.subtensor.IncSubtensor
   5.7%    77.0%       0.000s       2.54e-06s     C      100       1   theano.tensor.basic.Join
   4.8%    81.7%       0.000s       2.63e-07s     C      800       8   theano.tensor.elemwise.DimShuffle
   4.5%    86.2%       0.000s       6.57e-07s     C      300       3   theano.tensor.basic.Alloc
   3.1%    89.3%       0.000s       3.39e-07s     C      400       4   theano.tensor.subtensor.Subtensor
   2.4%    91.7%       0.000s       3.60e-07s     C      300       3   theano.tensor.basic.Reshape
   2.0%    93.7%       0.000s       8.87e-07s     C      100       1   theano.sandbox.rng_mrg.mrg_uniform
   1.4%    95.1%       0.000s       1.99e-07s     C      300       3   theano.tensor.basic.ScalarFromTensor
   1.1%    96.2%       0.000s       2.50e-07s     C      200       2   theano.compile.ops.Shape_i
   1.1%    97.3%       0.000s       2.35e-07s     C      200       2   theano.tensor.opt.MakeVector
   1.0%    98.3%       0.000s       1.51e-07s     C      300       3   theano.compile.ops.Rebroadcast
   1.0%    99.3%       0.000s       4.32e-07s     C      100       1   theano.tensor.elemwise.Prod
   0.7%   100.0%       0.000s       1.63e-07s     C      200       2   theano.gradient.GradScale
   ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)

Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  11.6%    11.6%       0.001s       5.12e-06s     C      100        1   IncSubtensor{InplaceInc;int64:int64:}
   9.9%    21.5%       0.000s       7.32e-07s     C      600        6   Sum{axis=[0], acc_dtype=float64}
   8.7%    30.3%       0.000s       1.29e-06s     C      300        3   Sum{acc_dtype=float64}
   5.7%    36.0%       0.000s       2.54e-06s     C      100        1   Join
   4.7%    40.7%       0.000s       2.09e-06s     C      100        1   IncSubtensor{InplaceInc;int64}
   4.5%    45.2%       0.000s       6.57e-07s     C      300        3   Alloc
   4.0%    49.2%       0.000s       5.87e-07s     C      300        3   Sum{axis=[1], acc_dtype=float64}
   3.8%    52.9%       0.000s       8.30e-07s     C      200        2   IncSubtensor{InplaceSet;::, int32}
   3.4%    56.3%       0.000s       3.00e-07s     C      500        5   Elemwise{sqr,no_inplace}
   3.0%    59.3%       0.000s       2.67e-07s     C      500        5   InplaceDimShuffle{x,0}
   2.0%    61.3%       0.000s       8.87e-07s     C      100        1   mrg_uniform{TensorType(float32, vector),inplace}
   1.7%    63.0%       0.000s       7.39e-07s     C      100        1   Elemwise{Composite{((i0 * scalar_sigmoid(i1) * ((((-i2) / i3) * sgn(i4)) + (i5 * i6 * i4))) + (scalar_sigmoid(i7) * ((-i8) / i9)) + (scalar_sigmoid(i7) * i10))}}[(0, 2)]
   1.7%    64.7%       0.000s       7.37e-07s     C      100        1   Elemwise{Composite{((i0 * i1) + i2)}}
   1.6%    66.2%       0.000s       6.99e-07s     C      100        1   Elemwise{sub,no_inplace}
   1.6%    67.8%       0.000s       6.87e-07s     C      100        1   Elemwise{Composite{((i0 * i1) / i2)}}
   1.5%    69.3%       0.000s       3.29e-07s     C      200        2   Reshape{0}
   1.4%    70.6%       0.000s       1.99e-07s     C      300        3   ScalarFromTensor
   1.3%    72.0%       0.000s       5.89e-07s     C      100        1   Elemwise{Composite{(i0 - ((i1 * i2) / i3))}}[(0, 2)]
   1.3%    73.3%       0.000s       2.91e-07s     C      200        2   Elemwise{Mul}[(0, 0)]
   1.3%    74.6%       0.000s       2.85e-07s     C      200        2   Subtensor{:int64:}
   ... (remaining 30 Ops account for  25.42%(0.00s) of the runtime)

Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
  11.6%    11.6%       0.001s       5.12e-06s    100    63   IncSubtensor{InplaceInc;int64:int64:}(Alloc.0, Reshape{1}.0, Constant{0}, Constant{1})
   5.7%    17.3%       0.000s       2.54e-06s    100    40   Join(TensorConstant{0}, Elemwise{Composite{(i0 * cos(i1))}}.0, Elemwise{Composite{(i0 * sin(i1))}}[(0, 0)].0)
   4.7%    22.1%       0.000s       2.09e-06s    100    64   IncSubtensor{InplaceInc;int64}(Rebroadcast{0}.0, IncSubtensor{InplaceInc;int64:int64:}.0, Constant{0})
   4.1%    26.2%       0.000s       1.82e-06s    100    52   Sum{acc_dtype=float64}(Elemwise{sub,no_inplace}.0)
   3.3%    29.5%       0.000s       1.47e-06s    100    58   Sum{acc_dtype=float64}(Elemwise{Composite{(i0 + (i1 * sqr(i2)))}}[(0, 2)].0)
   2.4%    31.9%       0.000s       1.07e-06s    100    13   Alloc(TensorConstant{(1, 1) of 1.0}, TensorConstant{1}, Shape_i{0}.0)
   2.3%    34.2%       0.000s       1.02e-06s    100    61   Sum{axis=[0], acc_dtype=float64}(Elemwise{Mul}[(0, 0)].0)
   2.3%    36.5%       0.000s       9.99e-07s    100    72   IncSubtensor{InplaceSet;::, int32}(<TensorType(float32, matrix)>, Elemwise{sqr,no_inplace}.0, ScalarFromTensor.0)
   2.2%    38.7%       0.000s       9.68e-07s    100    27   Sum{axis=[0], acc_dtype=float64}(Rebroadcast{0}.0)
   2.0%    40.7%       0.000s       8.87e-07s    100     6   mrg_uniform{TensorType(float32, vector),inplace}(<TensorType(int32, matrix)>, TensorConstant{(1,) of 2})
   1.9%    42.5%       0.000s       8.20e-07s    100    65   Sum{axis=[0], acc_dtype=float64}(IncSubtensor{InplaceInc;int64}.0)
   1.7%    44.2%       0.000s       7.39e-07s    100    68   Elemwise{Composite{((i0 * scalar_sigmoid(i1) * ((((-i2) / i3) * sgn(i4)) + (i5 * i6 * i4))) + (scalar_sigmoid(i7) * ((-i8) / i9)) + (scalar_sigmoid(i7) * i10))}}[(0, 2)](InplaceDimShuffle{x}.0, GradScale.0, Sum{axis=[0], acc_dtype=float64}.0, Elemwise{abs_,no_inplace}.0, softplus.0, TensorConstant{(1,) of 4.0}, Sum{axis=[0], acc_dtype=float64}.0, rho, Sum{axis=[0], acc_dtype=float64}.0, Elemwise{sqr,no_inplace}.0, Sum{axis=[0], acc_dtype=float64}.0)
   1.7%    45.9%       0.000s       7.37e-07s    100    45   Elemwise{Composite{((i0 * i1) + i2)}}(Elemwise{Composite{log1p(exp(i0))}}.0, Rebroadcast{0,0}.0, InplaceDimShuffle{x,0}.0)
   1.6%    47.5%       0.000s       6.99e-07s    100    50   Elemwise{sub,no_inplace}(TensorConstant{[ -3.50985..25160e+00]}, InplaceDimShuffle{x}.0)
   1.6%    49.0%       0.000s       6.87e-07s    100    74   Sum{axis=[1], acc_dtype=float64}(Elemwise{Composite{(i0 - ((i1 * i2) / i3))}}[(0, 2)].0)
   1.6%    50.6%       0.000s       6.87e-07s    100    59   Elemwise{Composite{((i0 * i1) / i2)}}(TensorConstant{(1, 1) of 0.25}, Elemwise{sqr,no_inplace}.0, Elemwise{sqr,no_inplace}.0)
   1.5%    52.1%       0.000s       6.60e-07s    100    79   IncSubtensor{InplaceSet;::, int32}(<TensorType(float32, matrix)>, Elemwise{sqr,no_inplace}.0, ScalarFromTensor.0)
   1.3%    53.4%       0.000s       5.89e-07s    100    71   Elemwise{Composite{(i0 - ((i1 * i2) / i3))}}[(0, 2)](Elemwise{Composite{(i0 - log(i1))}}[(0, 1)].0, TensorConstant{(1, 1) of 0.5}, Elemwise{sqr,no_inplace}.0, InplaceDimShuffle{x,0}.0)
   1.3%    54.7%       0.000s       5.79e-07s    100    62   Sum{axis=[0], acc_dtype=float64}(Elemwise{Composite{((i0 * i1) / i2)}}.0)
   1.3%    56.0%       0.000s       5.67e-07s    100    77   Sum{acc_dtype=float64}(Sum{axis=[1], acc_dtype=float64}.0)
   ... (remaining 63 Apply instances account for 44.02%(0.00s) of the runtime)

Here are tips to potentially make your code run faster
                 (if you think of new ones, suggest them on the mailing list).
                 Test them first, as they are not guaranteed to always provide a speedup.
We don't know if amdlibm will accelerate this scalar op. scalar_identity
  - Try installing amdlibm and set the Theano flag lib.amdlibm=True. This speeds up only some Elemwise operation.
  - With the default gcc libm, exp in float32 is slower than in float64! Try Theano flag floatX=float64, or install amdlibm and set the theano flags lib.amdlibm=True

NF scale+loc (full data): 5435.36it/s

100%|##########| 100/100 [00:00<00:00, 5435.36it/s]
Function profiling
==================
  Message: /Users/ferres/dev/pymc3/pymc3/variational/opvi.py:272
  Time in 100 calls to Function.__call__: 7.875681e-03s
  Time in Function.fn.__call__: 6.223917e-03s (79.027%)
  Time in thunks: 3.683567e-03s (46.771%)
  Total compile time: 7.491031e-01s
    Number of Apply nodes: 66
    Theano Optimizer time: 6.213710e-01s
       Theano validate time: 2.046061e-02s
    Theano Linker time (includes C, CUDA code generation/compiling): 5.623817e-02s
       Import time 3.078938e-03s
       Node make_thunk time 5.080509e-02s
           Node Elemwise{Composite{scalar_identity((i0 - ((i1 * (i2 + (i3 * sqr(i4)))) + (i5 * i6))))}}[(0, 0)](Sum{acc_dtype=float64}.0, TensorConstant{0.5}, TensorConstant{-3.22417140007}, TensorConstant{-0.25}, Elemwise{Add}[(0, 1)].0, TensorConstant{0.5}, Sum{acc_dtype=float64}.0) time 7.087946e-03s
           Node Elemwise{Composite{(i0 + (-sqr(i1)))}}[(0, 1)](TensorConstant{(1, 1) of -1.83788}, Rebroadcast{0,0}.0) time 3.867149e-03s
           Node Elemwise{Composite{(i0 - ((i1 * i2) / sqrt((i3 + i4))))}}[(0, 0)](loc, TensorConstant{(1,) of 0.001}, Sum{axis=[0], acc_dtype=float64}.0, TensorConstant{(1,) of 0.1}, Sum{axis=[1], acc_dtype=float64}.0) time 1.600981e-03s
           Node Elemwise{Composite{((i0 * i1) - i2)}}[(0, 1)](TensorConstant{(1,) of 0.5}, Sum{axis=[1], acc_dtype=float64}.0, Rebroadcast{0}.0) time 1.478195e-03s
           Node Elemwise{Composite{(i0 + (i1 * i2))}}[(0, 1)](TensorConstant{(1,) of -1.0}, Sum{axis=[0], acc_dtype=float64}.0, Elemwise{exp}.0) time 1.471996e-03s

Time in all call to theano.grad() 4.406681e-01s
Time since theano import 10.584s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  26.7%    26.7%       0.001s       2.46e-06s     C      400       4   theano.tensor.subtensor.IncSubtensor
  22.3%    49.0%       0.001s       3.42e-07s     C     2400      24   theano.tensor.elemwise.Elemwise
  22.0%    71.0%       0.001s       9.01e-07s     C      900       9   theano.tensor.elemwise.Sum
   6.3%    77.2%       0.000s       2.30e-06s     C      100       1   theano.tensor.basic.Join
   4.6%    81.8%       0.000s       2.39e-07s     C      700       7   theano.tensor.elemwise.DimShuffle
   4.4%    86.2%       0.000s       4.08e-07s     C      400       4   theano.tensor.subtensor.Subtensor
   3.4%    89.7%       0.000s       2.54e-07s     C      500       5   theano.tensor.basic.Reshape
   2.2%    91.8%       0.000s       7.99e-07s     C      100       1   theano.tensor.basic.Alloc
   2.1%    93.9%       0.000s       7.65e-07s     C      100       1   theano.sandbox.rng_mrg.mrg_uniform
   1.7%    95.6%       0.000s       1.60e-07s     C      400       4   theano.compile.ops.Rebroadcast
   1.6%    97.3%       0.000s       2.01e-07s     C      300       3   theano.tensor.basic.ScalarFromTensor
   1.2%    98.5%       0.000s       4.34e-07s     C      100       1   theano.tensor.elemwise.Prod
   0.8%    99.3%       0.000s       3.12e-07s     C      100       1   theano.compile.ops.Shape_i
   0.7%   100.0%       0.000s       2.55e-07s     C      100       1   theano.tensor.opt.MakeVector
   ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)

Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  14.9%    14.9%       0.001s       5.50e-06s     C      100        1   IncSubtensor{Inc;int64:int64:}
  13.0%    27.9%       0.000s       1.19e-06s     C      400        4   Sum{acc_dtype=float64}
   7.7%    35.6%       0.000s       2.85e-06s     C      100        1   IncSubtensor{Inc;int64}
   6.3%    41.9%       0.000s       2.30e-06s     C      100        1   Join
   5.2%    47.1%       0.000s       6.41e-07s     C      300        3   Sum{axis=[1], acc_dtype=float64}
   4.0%    51.1%       0.000s       7.44e-07s     C      200        2   IncSubtensor{InplaceSet;::, int32}
   3.8%    55.0%       0.000s       7.05e-07s     C      200        2   Sum{axis=[0], acc_dtype=float64}
   2.5%    57.4%       0.000s       2.28e-07s     C      400        4   Reshape{0}
   2.2%    59.6%       0.000s       7.99e-07s     C      100        1   Alloc
   2.1%    61.7%       0.000s       3.92e-07s     C      200        2   Subtensor{:int64:}
   2.1%    63.8%       0.000s       7.65e-07s     C      100        1   mrg_uniform{TensorType(float32, vector),inplace}
   2.0%    65.8%       0.000s       7.39e-07s     C      100        1   Elemwise{Composite{((i0 * i1) + i2)}}
   2.0%    67.8%       0.000s       7.37e-07s     C      100        1   Elemwise{sub,no_inplace}
   1.7%    69.5%       0.000s       3.11e-07s     C      200        2   Elemwise{Cast{int32}}
   1.7%    71.2%       0.000s       3.05e-07s     C      200        2   InplaceDimShuffle{x,0}
   1.6%    72.8%       0.000s       2.01e-07s     C      300        3   InplaceDimShuffle{x}
   1.6%    74.4%       0.000s       2.01e-07s     C      300        3   ScalarFromTensor
   1.4%    75.8%       0.000s       5.08e-07s     C      100        1   Elemwise{Composite{(i0 + (i1 * sqr(i2)))}}[(0, 2)]
   1.3%    77.2%       0.000s       4.96e-07s     C      100        1   Subtensor{int64}
   1.3%    78.5%       0.000s       4.89e-07s     C      100        1   Elemwise{Composite{sqrt((i0 * log(i1)))}}
   ... (remaining 25 Ops account for  21.50%(0.00s) of the runtime)

Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
  14.9%    14.9%       0.001s       5.50e-06s    100    47   IncSubtensor{Inc;int64:int64:}(TensorConstant{(1,) of 0.0}, Rebroadcast{0}.0, Constant{0}, Constant{1})
   7.7%    22.7%       0.000s       2.85e-06s    100    48   IncSubtensor{Inc;int64}(TensorConstant{(1, 1) of 0.0}, IncSubtensor{Inc;int64:int64:}.0, Constant{0})
   6.3%    28.9%       0.000s       2.30e-06s    100    29   Join(TensorConstant{0}, Elemwise{Composite{(i0 * cos(i1))}}.0, Elemwise{Composite{(i0 * sin(i1))}}[(0, 0)].0)
   5.7%    34.6%       0.000s       2.09e-06s    100    40   Sum{acc_dtype=float64}(Elemwise{sub,no_inplace}.0)
   4.0%    38.6%       0.000s       1.48e-06s    100    43   Sum{acc_dtype=float64}(Elemwise{Composite{(i0 + (i1 * sqr(i2)))}}[(0, 2)].0)
   2.4%    41.1%       0.000s       8.96e-07s    100    62   IncSubtensor{InplaceSet;::, int32}(<TensorType(float32, matrix)>, Elemwise{sqr,no_inplace}.0, ScalarFromTensor.0)
   2.3%    43.3%       0.000s       8.39e-07s    100    49   Sum{axis=[0], acc_dtype=float64}(IncSubtensor{Inc;int64}.0)
   2.2%    45.6%       0.000s       8.20e-07s    100    54   Sum{axis=[1], acc_dtype=float64}(Elemwise{Composite{(i0 + (-sqr(i1)))}}[(0, 1)].0)
   2.2%    47.7%       0.000s       7.99e-07s    100    17   Alloc(InplaceDimShuffle{0,x}.0, TensorConstant{1}, TensorConstant{1})
   2.1%    49.8%       0.000s       7.65e-07s    100     4   mrg_uniform{TensorType(float32, vector),inplace}(<TensorType(int32, matrix)>, TensorConstant{(1,) of 2})
   2.0%    51.8%       0.000s       7.39e-07s    100    34   Elemwise{Composite{((i0 * i1) + i2)}}(Rebroadcast{0,0}.0, InplaceDimShuffle{x,0}.0, InplaceDimShuffle{x,0}.0)
   2.0%    53.8%       0.000s       7.37e-07s    100    38   Elemwise{sub,no_inplace}(TensorConstant{[ -3.50985..25160e+00]}, InplaceDimShuffle{x}.0)
   1.7%    55.6%       0.000s       6.41e-07s    100    64   Sum{axis=[1], acc_dtype=float64}(IncSubtensor{InplaceSet;::, int32}.0)
   1.7%    57.2%       0.000s       6.10e-07s    100    60   Sum{acc_dtype=float64}(Elemwise{Composite{((i0 * i1) - i2)}}[(0, 1)].0)
   1.6%    58.8%       0.000s       5.94e-07s    100     2   Sum{acc_dtype=float64}(log_scale)
   1.6%    60.4%       0.000s       5.91e-07s    100    55   IncSubtensor{InplaceSet;::, int32}(<TensorType(float32, matrix)>, Elemwise{sqr,no_inplace}.0, ScalarFromTensor.0)
   1.5%    62.0%       0.000s       5.70e-07s    100    53   Sum{axis=[0], acc_dtype=float64}(Elemwise{Mul}[(0, 0)].0)
   1.4%    63.4%       0.000s       5.08e-07s    100    41   Elemwise{Composite{(i0 + (i1 * sqr(i2)))}}[(0, 2)](TensorConstant{(1,) of -4.0351}, TensorConstant{(1,) of -0.111111}, Elemwise{sub,no_inplace}.0)
   1.3%    64.7%       0.000s       4.96e-07s    100    35   Subtensor{int64}(Elemwise{Composite{((i0 * i1) + i2)}}.0, Constant{0})
   1.3%    66.0%       0.000s       4.89e-07s    100    25   Elemwise{Composite{sqrt((i0 * log(i1)))}}(TensorConstant{(1,) of -2.0}, Subtensor{:int64:}.0)
   ... (remaining 46 Apply instances account for 33.97%(0.00s) of the runtime)

Here are tips to potentially make your code run faster
                 (if you think of new ones, suggest them on the mailing list).
                 Test them first, as they are not guaranteed to always provide a speedup.
We don't know if amdlibm will accelerate this scalar op. scalar_identity
  - Try installing amdlibm and set the Theano flag lib.amdlibm=True. This speeds up only some Elemwise operation.
  - With the default gcc libm, exp in float32 is slower than in float64! Try Theano flag floatX=float64, or install amdlibm and set the theano flags lib.amdlibm=True

@ferrine (Member, Author) commented Jul 9, 2017

Tests pass :) CC @taku-y, @twiecki. I think the PR is ready to merge. I still have some ideas for improvements, but they can be done later.

  import pymc3 as pm
  from pymc3 import Model, Normal
  from pymc3.variational import (
-     ADVI, FullRankADVI, SVGD,
+     ADVI, FullRankADVI, SVGD, NF,

@fonnesbeck (Member), Jul 9, 2017

What about having a more meaningful name for the class? NF seems a little overzealous. Maybe NormFlow or similar?

@ferrine (Member, Author), Jul 9, 2017

I want to avoid obscuring words as much as possible

@fonnesbeck (Member) commented Jul 9, 2017

This is a ton of great work, thanks @ferrine! I'm wondering about the naming of the classes: currently it's NF for the inference object and NormalizingFlow for the approximation, which might be a little confusing. What about something like FlowApproximation for the approximation, reserving NormalizingFlow for the inference class?

@ferrine (Member, Author) commented Jul 9, 2017

I want the Inference name to be short, as we do with ADVI/SVGD.

@fonnesbeck (Member) commented Jul 9, 2017

I agree the names should be short, but two letters is too short.

@ferrine (Member, Author) commented Jul 9, 2017

I don't like FlowApproximation because it contains the word Approximation, which we leave out of names by convention. Flow is a free name but could be confusing too, since I have a flows module. We already have short abbreviations like GP, SMC, and AR1 that are easy to understand, so I see no problem with NF if it's equally easy.

I think the main concern is that NormalizingFlow is a long name for NF. It could be renamed to FlowSequence, which has the same meaning and is easy to understand.

@ferrine (Member, Author) commented Jul 9, 2017

I'd like more people to weigh in on the names of the new classes. I see @fonnesbeck's point about the disadvantage of a short name, but I still don't have a better alternative. A long name is bad for an inference class, and FullRankADVI is the exception here. Papers use NF by default, and that seems OK.

@junpenglao (Member) commented Jul 9, 2017

Personally I am fine with NF and NormalizingFlow; I don't think they will be too confusing. However, NF-scale-loc doesn't seem like a good name.

@ferrine (Member, Author) commented Jul 9, 2017

@junpenglao where did you find the NF-scale-loc name? BTW, it could be a good feature for pm.fit:

approx = pm.fit(method='NF-scale-hh*5-loc-planar*3')

or, better, with '=' after NF:

approx = pm.fit(method='NF=scale-hh*5-loc-planar*3')
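
A tiny sketch of how the part after '=' could be expanded (hypothetical helper, no parentheses support; the name is made up for illustration):

    def expand_simple_spec(formula):
        """'scale-hh*5-loc-planar*3' -> ['scale', 'hh', 'hh', 'hh', 'hh', 'hh', 'loc', 'planar', 'planar', 'planar']."""
        flows = []
        for part in formula.split('-'):
            name, _, times = part.partition('*')
            flows.extend([name] * (int(times) if times else 1))
        return flows
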
  dict(cls=ADVI, init=dict()),
  dict(cls=FullRankADVI, init=dict()),
  dict(cls=SVGD, init=dict(n_particles=500, jitter=1)),
  dict(cls=ASVGD, init=dict(temperature=1.)),
- ], ids=lambda d: d['cls'].__name__)
+ ], ids=[
+     'NF-scale-loc',

@junpenglao (Member), Jul 9, 2017

@ferrine I think this is not a good name.

@ferrine (Member, Author), Jul 9, 2017

This is an identifier for the test, just to tell from the log what's going on.

@ferrine (Member, Author), Jul 9, 2017

Maybe 'NF=scale-loc' will be better

@junpenglao (Member), Jul 9, 2017

Oh, I see. In that case, yes, NF=scale-loc would be better.

@junpenglao (Member) commented Jul 9, 2017

@ferrine I commented on the code.
Yep, that would be quite a cool feature.

@ferrine (Member, Author) commented Jul 9, 2017

@junpenglao Done:)

@taku-y (Contributor) commented Jul 10, 2017

All the code looks fine. I think it is ready to merge. What do you think, @twiecki?

NF could be NFVI to indicate that it is an Inference subclass, though I'm fine with NF.

@twiecki (Member) commented Jul 10, 2017

This is tremendous. I also see the naming issue. On the one hand NF seems a bit terse, but we already have ADVI, so it is consistent; I'm fine with NF.

@fonnesbeck (Member) commented Jul 10, 2017

Fair enough. I'm fine leaving NF in place if I'm the only objection. I personally find longer names easier to read (based mostly on my experience with teaching scikit-learn), and with tab completion, name length is less of an issue than it used to be.

I look forward to testing this on some applied problems when I get back from holiday.

@ferrine (Member, Author) commented Jul 10, 2017

I'm fine with NFVI, since all our other abbreviations end with "GD" or "VI": ADVI, FullRankADVI, ASVGD, SVGD.

@junpenglao (Member) commented Jul 10, 2017

fine with NFVI as well

@springcoil (Member) left a comment

LGTM

@springcoil (Member) commented Jul 10, 2017

Great work @ferrine. Sorry about that other PR - I screwed it up :)

@taku-y merged commit b4e068e into pymc-devs:master, Jul 10, 2017

2 checks passed:

  • continuous-integration/travis-ci/pr: The Travis CI build passed
  • coverage/coveralls: Coverage increased (+0.04%) to 87.565%

@taku-y (Contributor) commented Jul 10, 2017

Thanks for your great effort @ferrine!
