# SVI Part IV: Tips and Tricks

The three SVI tutorials leading up to this one ([Part I](http://pyro.ai/examples/svi_part_i.html), [Part II](http://pyro.ai/examples/svi_part_ii.html), & [Part III](http://pyro.ai/examples/svi_part_iii.html)) go through
the various steps involved in using Pyro to do variational
inference.
Along the way we defined models and guides (i.e. variational distributions),
set up variational objectives (in particular [ELBOs](https://docs.pyro.ai/en/dev/inference_algos.html?highlight=elbo#module-pyro.infer.elbo)),
and constructed optimizers ([pyro.optim](http://docs.pyro.ai/en/dev/optimization.html)).
The effect of all this machinery is to cast Bayesian inference as a *stochastic optimization problem*. 
This is all very useful, but in order to arrive at our ultimate goal (learning model parameters, inferring approximate posteriors, making predictions with the posterior predictive distribution, etc.) we need to successfully solve this optimization problem.
Depending on the particular problem (for example, the dimensionality of the latent space, whether we have discrete latent variables, and so on) this can be easy or hard.
In this tutorial we cover a few tips and tricks we expect to be generally useful for users doing variational inference in Pyro. ELBO not converging? Running into NaNs? Look below for possible solutions!  

### 1. Start with small learning rates

While large learning rates might be appropriate for some problems, it's usually good practice to start with small learning rates like $10^{-3}$
or $10^{-4}$:
```python
optimizer = pyro.optim.Adam({"lr": 0.001})
```
This is because ELBO gradients are *stochastic* and potentially high-variance, so a large learning rate can quickly move the parameters into regions of model/guide parameter space that are numerically unstable or otherwise undesirable.
You can try a larger learning rate once you have achieved stable
ELBO optimization using a smaller learning rate.
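
To build some intuition for why noisy gradients favor small step sizes, here is a toy sketch in plain Python. It is not Pyro code and not an ELBO: we assume a one-dimensional objective $f(x) = \frac{1}{2}x^2$, plain gradient descent instead of Adam, and Gaussian noise added to each gradient. The function name `noisy_sgd` and all of its parameters are hypothetical choices made for illustration only.

```python
import random


def noisy_sgd(lr, steps=2000, noise_std=1.0, x0=5.0, seed=0):
    """Minimize f(x) = 0.5 * x**2 using gradients corrupted by Gaussian noise.

    Toy stand-in for stochastic ELBO optimization (an assumption, not Pyro code).
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        # The exact gradient of f is x; we add noise to mimic a stochastic estimate.
        grad = x + rng.gauss(0.0, noise_std)
        x -= lr * grad
        if abs(x) > 1e6:
            # Treat this as divergence and stop early.
            break
    return x


small = noisy_sgd(lr=0.01)
large = noisy_sgd(lr=2.5)
print(f"lr=0.01 -> final x = {small:.4f}")
print(f"lr=2.5  -> final |x| = {abs(large):.3e}")
```

With the small learning rate the iterate settles close to the minimum at zero, whereas with the large one it grows until the divergence check triggers. Real ELBO landscapes are more complicated than this quadratic, but a similar mechanism can be at play.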

### 2. Make sure your model and guide distributions have the same support

Suppose you have a distribution in your `model` with constrained support, e.g. a LogNormal distribution, which has support on the positive real axis:
```python
def model():
    pyro.sample("x", dist.LogNormal(0.0, 1.0))
``` 
Then you need to ensure that the accompanying `sample` site in the `guide` has the same support:
```python
def good_guide():
    loc = pyro.param("loc", torch.tensor(0.0))
    pyro.sample("x", dist.LogNormal(loc, 1.0))
``` 
If you fail to do this and use, for example, the following inadmissible guide:
```python
def bad_guide():
    loc = pyro.param("loc", torch.tensor(0.0))
    pyro.sample("x", dist.Normal(loc, 1.0))
```
you will likely run into NaNs very quickly. This is because the `log_prob` of a LogNormal distribution evaluated at a sample `x` that satisfies `x<0` is undefined, and the `bad_guide` is likely to produce such samples.
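
A small plain-Python sketch can make the mismatch concrete. It does not use Pyro: the helper `lognormal_log_prob` is a hypothetical hand-written log density, and returning NaN outside the support is an assumption chosen to mimic an unvalidated computation (depending on the version and validation settings, Pyro or PyTorch may raise an error instead). We draw samples from a Normal(0, 1) distribution, which corresponds to `bad_guide` at its initial `loc`, and count how many land outside the support of the model's LogNormal(0, 1).

```python
import math
import random


def lognormal_log_prob(x, loc=0.0, scale=1.0):
    """Log density of LogNormal(loc, scale); NaN outside the support x > 0."""
    if x <= 0.0:
        # Assumption: signal "undefined" with NaN rather than raising an error.
        return float("nan")
    z = (math.log(x) - loc) / scale
    return -0.5 * z * z - math.log(x * scale * math.sqrt(2.0 * math.pi))


rng = random.Random(0)
# Samples from Normal(0, 1), i.e. what bad_guide proposes when loc = 0.
samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]
log_probs = [lognormal_log_prob(x) for x in samples]
num_nan = sum(math.isnan(lp) for lp in log_probs)
print(f"{num_nan} of {len(samples)} guide samples are outside the model's support")
```

Since a Normal distribution centered at zero puts half of its mass on negative values, roughly half of the proposed samples have no valid log density under the model, and a single such sample is enough to corrupt an ELBO estimate.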
