# Fitting a distribution to waiting times

### How long should we wait for the bus before giving up on it and starting to walk?

First, we need to observe some historical arrival times of the bus and fit a distribution to them. Note, however, that some of our data will be incomplete: when we give up on the bus after x minutes, we only know that it took longer than x minutes to arrive, not exactly how long. These are called censored observations.

Let's generate sample data, both complete observations (ti) and censored observations (xi), and fit a distribution!
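
To make the role of the two kinds of observations concrete: under the standard treatment of right-censored data, a complete observation contributes the density at the observed time, while a censored observation contributes only the probability of exceeding the censoring time. The symbols $f$, $F$, $S$, $\theta$, $n$ and $m$ are notation introduced here for illustration and do not appear in the notebook's code.

$$
\log L(\theta) = \sum_{i=1}^{n} \log f(t_i;\theta) + \sum_{j=1}^{m} \log S(x_j;\theta), \qquad S(x;\theta) = 1 - F(x;\theta)
$$

Here $t_1,\dots,t_n$ are the complete waiting times, $x_1,\dots,x_m$ are the censoring times, and $\theta$ stands for the parameters of whichever distribution is being fitted.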


In [1]:
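import matplotlib.pyplot as plt
import numpy as np  # used below (np.ones); imported explicitly rather than relying on the wildcard imports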
from distributions.lomax import *
from distributions.loglogistic import *

In [22]:
# Define parameters for the Lomax distribution and the simulation
k, lmb = 10.0, 0.5
sample_size, censor_level, prob = 5000, 2.0, 1.0  # censor_level: minutes after which we give up

In [23]:
# Let's assume the arrival times of the bus follow a Lomax distribution.
l = Lomax(k=k, lmb=lmb)

### What is the Lomax distribution?

It is essentially a Pareto distribution that has been shifted so that its support begins at zero. It is a heavy-tailed distribution for a non-negative random variable.
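
As a rough illustration, the density and survival function can be written out directly. This is a minimal sketch that assumes a parametrization in which `lmb` acts as a rate, i.e. $S(t) = (1 + \text{lmb}\cdot t)^{-k}$; whether the `Lomax` class used in this notebook follows exactly this convention is an assumption, and the function names `lomax_pdf` and `lomax_survival` are hypothetical helpers, not part of that library.

```python
import numpy as np

def lomax_pdf(t, k, lmb):
    """Density under the assumed form f(t) = k*lmb / (1 + lmb*t)**(k + 1)."""
    t = np.asarray(t, dtype=float)
    return k * lmb / (1.0 + lmb * t) ** (k + 1.0)

def lomax_survival(t, k, lmb):
    """P(T > t) under the same assumed parametrization."""
    t = np.asarray(t, dtype=float)
    return (1.0 + lmb * t) ** (-k)

# With k = 10, lmb = 0.5, the chance of waiting more than 2 minutes is 2**-10.
p_censored = lomax_survival(2.0, k=10.0, lmb=0.5)
print(p_censored)         # roughly 0.001
print(5000 * p_censored)  # expected count above 2 minutes in 5000 draws, about 4.9
```

Under this assumption, roughly five out of 5000 simulated waiting times would exceed a censoring level of 2 minutes.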

<img src = "https://www.statisticshowto.datasciencecentral.com/wp-content/uploads/2016/06/lomax-pdf.png">

<img src = "https://www.statisticshowto.datasciencecentral.com/wp-content/uploads/2016/06/PDF.png">

In [24]:
# Generate waiting times from Lomax distribution.
samples = l.samples(size=sample_size)
samples

array([0.13420315, 0.28097822, 0.07061329, ..., 0.3900251 , 0.35156226,
       0.05107625])

In [25]:
# Since we never wait for the bus more than x minutes (x = censor_level),
# the fully observed samples are the ones that took at most x minutes.
ti = samples[samples <= censor_level]
ti

array([0.13420315, 0.28097822, 0.07061329, ..., 0.3900251 , 0.35156226,
       0.05107625])

In [26]:
# The samples that took more than x minutes (censor_level) go into the censored array:
# all we know is that they took more than x minutes, not exactly how long.
samples > censor_level


array([False, False, False, ..., False, False, False])

In [27]:
# One entry per censored sample, each recorded at the censoring level.
xi = np.full(np.count_nonzero(samples > censor_level), censor_level)
xi

array([2., 2., 2., 2., 2.])
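
With `ti` and `xi` in hand, the fitting step amounts to maximizing a log-likelihood in which complete observations contribute the log-density and censored observations contribute the log-survival probability. Below is a minimal sketch of that objective, assuming the Lomax parametrization $f(t) = k\,\text{lmb}/(1+\text{lmb}\cdot t)^{k+1}$ and $S(t) = (1+\text{lmb}\cdot t)^{-k}$. The function `censored_loglik` and the demo arrays are hypothetical illustrations, not the library's own fitting routine or the notebook's data.

```python
import numpy as np

def censored_loglik(ti, xi, k, lmb):
    """Log-likelihood of complete times ti and right-censored times xi
    under the assumed Lomax parametrization."""
    ti = np.asarray(ti, dtype=float)
    xi = np.asarray(xi, dtype=float)
    # Complete observations contribute log f(t) = log k + log lmb - (k+1) log(1 + lmb*t).
    complete = np.sum(np.log(k) + np.log(lmb) - (k + 1.0) * np.log1p(lmb * ti))
    # Censored observations contribute log S(x) = -k log(1 + lmb*x).
    censored = np.sum(-k * np.log1p(lmb * xi))
    return complete + censored

# Tiny hand-made example (illustrative values only).
ti_demo = np.array([0.1, 0.3, 0.05])
xi_demo = np.array([2.0, 2.0])
ll_demo = censored_loglik(ti_demo, xi_demo, k=10.0, lmb=0.5)
print(ll_demo)
```

Passing the negative of this function to a numerical optimizer over `k` and `lmb` would be one way to obtain fitted parameters; the library imported at the top of the notebook may well do this differently.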