# Lognormal Distribution – A simple explanation
> How to calculate the PDF's parameters (μ & σ), the mode, mean, median & variance

- toc: true 
- badges: true
- hide: false
- comments: true
- categories: [lognormal, distribution]
- image: images/chart-preview.png

```python
#hide
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

## About
We will briefly look at the definition of the log-normal distribution and then go on to calculate the distribution’s parameters μ and σ from simple data. We will then look at how to calculate the mean, mode, median and variance from this probability distribution.

## Informal Definition
The log-normal distribution is a right-skewed continuous probability distribution, meaning it has a long tail towards the right. It is used for modelling various natural phenomena such as income distributions, the length of chess games or the time to repair a maintainable system.

![](2022-01-05-lognorm/Figure1.PNG "Figure 1")

The probability density function for the log-normal is defined by the two parameters **μ** and **σ**, where x > 0:

$$f(x) = \frac{1}{x\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left( \frac{\ln x -\mu}{\sigma}\right)^2 }$$


μ is the location parameter and σ the [scale parameter](https://en.wikipedia.org/wiki/Scale_parameter) of the distribution. _Caution here!_ These two parameters should not be mistaken for the more familiar mean and standard deviation of a normal distribution. Once our log-normal data is transformed using logarithms, **μ** can be viewed as the mean _(of the transformed data)_ and **σ** as the standard deviation _(of the transformed data)_. Without this transformation, μ and σ are simply two parameters that define our log-normal, not the mean or standard deviation of the log-normal data itself!
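As a minimal sketch, the density above can be written directly in NumPy. The function name `lognormal_pdf` and the parameter values are illustrative assumptions, not part of any data set used in this post. The check at the end approximates the area under the curve, which should come out close to 1 for a valid density.

```python
import numpy as np

def lognormal_pdf(x, mu, sigma):
    """Log-normal density f(x) for x > 0, following the formula above."""
    x = np.asarray(x, dtype=float)
    z = (np.log(x) - mu) / sigma
    return np.exp(-0.5 * z**2) / (x * sigma * np.sqrt(2.0 * np.pi))

# Illustrative parameter values (assumptions, not estimated from data).
mu, sigma = 0.0, 0.5

# Approximate the area under the curve with the trapezoid rule on a fine grid.
grid = np.linspace(1e-6, 50.0, 200_001)
density = lognormal_pdf(grid, mu, sigma)
area = np.sum((density[1:] + density[:-1]) / 2.0 * np.diff(grid))

print(round(area, 4))  # should be close to 1
```

Note that μ = 0 here does not mean the data is centred on 0: the density is only defined for x > 0.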


**Let’s look a bit more closely at the relationship between the log-normal and the normal distribution that we just mentioned.**


The name of the “log-normal” distribution reveals that it relates to logarithms as well as the normal distribution. How? Let’s say your data fits a log-normal distribution. If you then take the logarithm of all your data points, the transformed points will fit a normal distribution. See the figure below.

![](2022-01-05-lognorm/Figure2.PNG "Figure 2")

The data points for our log-normal distribution are given by the X variable. When we log-transform that X variable $(Y=\ln(X))$ we get a Y variable which is normally distributed.
We can reverse this thinking and look at Y instead. If Y has a normal distribution and we take the exponential of Y $(X=\exp(Y))$, then we get back to our X variable, which has a log-normal distribution. This visual is helpful to keep in mind when analysing important properties of the log-normal distribution:

> “The most efficient way to analyse log-normally distributed data consists of applying the well-known methods based on the normal distribution to logarithmically transformed data and then to back-transform results if appropriate.”
> [Lognormal wiki](https://en.wikipedia.org/wiki/Log-normal_distribution#Statistics)
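The relationship between X and Y can be checked with a small simulation. This is only a sketch: the values of `mu` and `sigma`, the sample size and the seed are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed, chosen arbitrarily

mu, sigma = 3.0, 0.5  # illustrative values, not estimated from data

# Y is normally distributed; X = exp(Y) is then log-normally distributed.
y = rng.normal(loc=mu, scale=sigma, size=100_000)
x = np.exp(y)

# Log-transforming X takes us back to the normal variable Y.
log_x = np.log(x)

print(log_x.mean(), log_x.std())  # close to mu and sigma
print(x.mean(), x.std())          # clearly different from mu and sigma
```

The second line of output illustrates the caution from earlier: the mean and standard deviation of the untransformed data are not μ and σ.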

## Estimate μ & σ from data
We can estimate our log-normal parameters μ and σ using maximum likelihood estimation (MLE). This is a popular approach for estimating distribution parameters as it finds the parameter values under which our observed data is ‘most likely’, given the probability distribution we have assumed.

If you want to understand how MLE works in more detail, [StatQuest](https://www.youtube.com/watch?v=Dn6b9fCIUpM&t=1016s) explains the approach in a fun, intuitive way and also derives the estimators for the __normal distribution.__

The maximum likelihood estimators for the normal distribution are:

$\hat\mu = \frac{\sum_{i} x_i}{n}$ and $\hat\sigma^2 = \frac{\sum_{i} \left( x_i - \hat\mu \right)^2 }{n}$

We, however, want the maximum likelihood estimators of μ and σ² for the log-normal distribution, which are:

$\hat\mu = \frac{\sum_{i} \ln (x_i)}{n}$ and $\hat\sigma^2 = \frac{\sum_{i} \left(\ln (x_i) - \hat\mu \right)^2 }{n}$


These formulas are nearly identical. We can use the same approach as with the normal distribution and simply transform our data with a logarithm first. If you are curious about how we get our log-normal estimators, here is a link to the [derivation.]()
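The two log-normal estimators can be sketched as a small helper. The name `lognormal_mle` is hypothetical, and the synthetic sample (true parameters, seed, size) is an assumption used only to check that the estimators recover values close to the ones the data was generated with.

```python
import numpy as np

def lognormal_mle(data):
    """Return (mu_hat, sigma_hat) from the MLE formulas above (divisor n)."""
    log_data = np.log(np.asarray(data, dtype=float))
    mu_hat = log_data.mean()
    sigma2_hat = np.mean((log_data - mu_hat) ** 2)
    return mu_hat, np.sqrt(sigma2_hat)

# Synthetic log-normal sample with known parameters (illustrative values).
rng = np.random.default_rng(seed=0)
true_mu, true_sigma = 1.0, 0.75
sample = rng.lognormal(mean=true_mu, sigma=true_sigma, size=50_000)

mu_hat, sigma_hat = lognormal_mle(sample)
print(mu_hat, sigma_hat)  # should land close to 1.0 and 0.75
```

The helper is just the normal-distribution estimator applied to the logged data, which is the point the formulas make.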

__Where is the simple example?!__

Let’s take a look at 5 values of income that follow a log-normal distribution. Our fictitious person 1 earns 20k, person 2 earns 22k and so on:


| Person 1 | Person 2 | Person 3 | Person 4 | Person 5 |
| ----------- | ----------- | ----------- | ----------- | ----------- |
| 20      | 22       | 25       | 30       | 60       |

We can now __estimate μ__ with the logic from above. First, we take the natural log of each of our income data points and then calculate the average of the 5 transformed data points, see below:

![](2022-01-05-lognorm/Table1.PNG "Table 1")

This gives us a value of __3.36__ for our location parameter μ.
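The same calculation can be reproduced in a few lines of NumPy. The variable names are illustrative; the income values are the five from the table above, in thousands.

```python
import numpy as np

# Incomes of the five fictitious people, in thousands.
incomes = np.array([20, 22, 25, 30, 60], dtype=float)

# Step 1: log-transform each data point.
log_incomes = np.log(incomes)

# Step 2: average the transformed values to estimate mu.
mu_hat = log_incomes.mean()

print(np.round(log_incomes, 2))
print(round(mu_hat, 2))  # 3.36
```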


-----


We can then use our estimated μ to __estimate σ__ with the following formula:

$\hat\sigma = \sqrt{\frac{\sum_{i} \left(\ln (x_i) - \hat\mu \right)^2 }{n-1}}$

Rather than calculating σ², we take the square root of the σ² formula above to estimate σ. The formula also uses n-1 instead of just n to get a less biased estimator. If you want to understand more about this change, have a look at the corrected sample variance (also known as Bessel’s correction).
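A short NumPy sketch of this step, again using the five income values. The variable names are illustrative. `np.std` with `ddof=1` applies the same n-1 divisor and is included as a cross-check, and the `ddof=0` version shows the plain MLE for comparison.

```python
import numpy as np

# Incomes of the five fictitious people, in thousands.
incomes = np.array([20, 22, 25, 30, 60], dtype=float)
log_incomes = np.log(incomes)

mu_hat = log_incomes.mean()
n = len(log_incomes)

# Square root of the summed squared deviations divided by n - 1.
sigma_hat = np.sqrt(np.sum((log_incomes - mu_hat) ** 2) / (n - 1))

# Cross-checks using NumPy's built-in standard deviation.
sigma_check = log_incomes.std(ddof=1)  # n - 1 divisor, same as sigma_hat
sigma_mle = log_incomes.std(ddof=0)    # n divisor, the plain MLE

print(round(sigma_hat, 2))  # about 0.44
print(round(sigma_mle, 2))  # slightly smaller, since it divides by n
```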