# Introduction

Often, a user finds it more insightful to have a confidence interval alongside a point prediction.

Confidence intervals provide a proxy for the range of deviation in the point prediction.

A wider confidence interval may signify that the prediction is not very reliable. 

In this notebook, we estimate the confidence interval of a prediction using a quantile objective function.

A quantile objective function can also be used to generate a distribution of the prediction. This may be a source of additional features for downstream tasks.  

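The log cosh function is used as a smooth approximation to the quantile (pinball) loss. Writing $x = y - \hat{y}$ for the residual (true value minus prediction), the objective function for the $\alpha$ quantile is given by

$$
L_\alpha(x) =
\begin{cases}
  (1-\alpha) \log(\cosh(x)) & \text{if } x < 0 \\
  \alpha \log(\cosh(x)) & \text{if } x \geq 0
\end{cases}
$$

The code in this notebook works with `err = y_pred - y_true`, which equals $-x$, so the two branches appear swapped there.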
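
As a rough sanity check of the formula above, the sketch below evaluates the piecewise loss at a few hypothetical values of $x$ and compares a finite-difference derivative with the analytic one, weight times $\tanh(x)$. The function name `log_cosh_quantile_loss` and the sample points are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def log_cosh_quantile_loss(x, alpha):
    """Piecewise log-cosh loss; x is the argument of the formula above."""
    weight = np.where(x < 0, 1 - alpha, alpha)
    return weight * np.log(np.cosh(x))

alpha = 0.95
x = np.array([-2.0, -0.5, 0.5, 2.0])   # hypothetical sample points, away from 0

loss = log_cosh_quantile_loss(x, alpha)

# Central finite difference of the loss with respect to x.
h = 1e-5
numeric_grad = (log_cosh_quantile_loss(x + h, alpha)
                - log_cosh_quantile_loss(x - h, alpha)) / (2 * h)

# Analytic derivative: d/dx log(cosh(x)) = tanh(x), scaled by the branch weight.
analytic_grad = np.where(x < 0, 1 - alpha, alpha) * np.tanh(x)

print(loss)
print(np.max(np.abs(numeric_grad - analytic_grad)))
```

For $\alpha = 0.95$ the loss at $x = 2$ is 19 times the loss at $x = -2$, which is the asymmetry that pushes the fitted value towards the upper quantile.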

A case study involving the California housing data is also provided.

# Importing packages

In [1]:
import pandas as pd
import numpy as np
from xgboost.sklearn import XGBRegressor
from sklearn.model_selection import ShuffleSplit
import matplotlib.pyplot as plt

# log cosh quantile function

The problem with using np.cosh directly is overflow: in float64 it returns inf once the magnitude of its argument exceeds roughly 710.

We therefore wrap np.cosh in a helper that clips its argument before evaluating it.

In [2]:
def cosh(x):
    # Clip to [-700, 700] so np.cosh stays finite for large errors of either sign.
    return np.cosh(np.clip(x, -700, 700))
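
As an alternative to clipping, the quantity that is actually needed later, $\operatorname{sech}^2(x)$, can be computed without calling cosh at all, using the identity $\operatorname{sech}^2(x) = 1 - \tanh^2(x)$. The sketch below is not the approach used in this notebook; the helper name `sech_sq_stable` and the test values are assumptions made for illustration.

```python
import numpy as np

def sech_sq_stable(x):
    # sech^2(x) = 1 - tanh^2(x); np.tanh saturates at +/-1 instead of overflowing.
    return 1.0 - np.tanh(x) ** 2

x = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])   # hypothetical errors

# The naive call overflows to inf for large |x| of either sign.
with np.errstate(over="ignore"):
    naive_cosh = np.cosh(x)

hess_like = sech_sq_stable(x)

print(naive_cosh)
print(hess_like)
```

The trade-off is a small loss of precision when tanh is very close to 1, which is usually harmless because the Hessian is close to zero there anyway.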

The modified cosh function was still giving me an overflow problem. After two hours of debugging, I realized that I had reversed the order of y_true and y_pred in the _log_cosh_quantile function. The scikit-learn interface of XGBoost calls a custom objective as objective(y_true, y_pred).

In [3]:
def log_cosh_quantile(alpha):
    """Return a custom XGBoost objective for the smoothed alpha quantile."""
    def sech_sq(x):
        # sech^2(x) is the second derivative of log(cosh(x)).
        return (1 / cosh(x)) ** 2

    def _log_cosh_quantile(y_true, y_pred):
        err = y_pred - y_true
        # Under-predictions (err < 0) are weighted by alpha, over-predictions by 1 - alpha.
        weight = np.where(err < 0, alpha, 1 - alpha)
        grad = weight * np.tanh(err)
        hess = weight * sech_sq(err)
        return grad, hess

    return _log_cosh_quantile
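
To make the sign and weighting conventions concrete, here is a small self-contained check. It restates the objective in compact form (using np.cosh directly, which is safe for the small toy values) and evaluates it on two hypothetical points, one under-predicted and one over-predicted by the same amount.

```python
import numpy as np

def log_cosh_quantile(alpha):
    # Compact restatement of the objective above, for illustration only.
    def objective(y_true, y_pred):
        err = y_pred - y_true
        weight = np.where(err < 0, alpha, 1 - alpha)
        grad = weight * np.tanh(err)
        hess = weight / np.cosh(err) ** 2
        return grad, hess
    return objective

y_true = np.array([2.0, 2.0])
y_pred = np.array([1.0, 3.0])   # first is under-predicted, second over-predicted

grad, hess = log_cosh_quantile(0.95)(y_true, y_pred)
print(grad)
print(hess)
```

With alpha = 0.95 the under-prediction receives a gradient 19 times larger in magnitude than the over-prediction, so boosting is pushed to raise the prediction, which is the behaviour expected of an upper quantile. Swapping y_true and y_pred flips the sign of err and therefore swaps the two weights.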

# XGBoost model

A custom objective function can be passed to the XGBoost model as follows:

In [4]:
alpha = 0.95
clf = XGBRegressor(objective = log_cosh_quantile(1-alpha),
                  n_estimators = 125,
                  max_depth = 5,
                  n_jobs = 6,
                  learning_rate = 0.05)

# Data loading

In [5]:
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()

In [6]:
X = pd.DataFrame(housing.data, columns = housing.feature_names)
y = pd.DataFrame(housing.target, columns = housing.target_names)

# Train test split

In [7]:
splitter = ShuffleSplit(n_splits = 1, test_size = 0.25, random_state = 1)

In [8]:
for train_index, test_index in splitter.split(X):
    X_train = X.iloc[train_index]
    y_train = y.iloc[train_index]
    X_test = X.iloc[test_index]
    y_test = y.iloc[test_index]

# Training and prediction

Calculating the upper quantile

In [9]:
alpha = 0.95

clf = XGBRegressor(objective = log_cosh_quantile(alpha),
                  n_estimators = 200,
                  max_depth = 3,
                  n_jobs = 4,
                  learning_rate = 0.05)

In [10]:
clf.fit(X_train, y_train)

In [11]:
y_upper_smooth = clf.predict(X_test)

In [12]:
print(y_upper_smooth[:10])
print(y_test['MedHouseVal'].to_list()[:10])

[4.6515775 1.1099726 3.0944874 2.6100302 3.4845743 4.978835  3.3166888
 2.206825  1.8518007 2.4240296]
[3.55, 0.707, 2.294, 1.125, 2.254, 2.63, 2.268, 1.662, 1.18, 1.563]


Calculating the lower quantile

In [13]:
clf = XGBRegressor(objective = log_cosh_quantile(1-alpha),
                  n_estimators = 200,
                  max_depth = 3,
                  n_jobs = 4,
                  learning_rate = 0.05)

In [14]:
clf.fit(X_train, y_train)

In [15]:
y_lower_smooth = clf.predict(X_test)

In [16]:
print(y_lower_smooth[:10])
print(y_test['MedHouseVal'].to_list()[:10])

[2.1034167 0.5202606 2.0245087 0.8863784 2.1141813 2.451071  1.5790187
 1.2442524 1.0692693 1.2456285]
[3.55, 0.707, 2.294, 1.125, 2.254, 2.63, 2.268, 1.662, 1.18, 1.563]


Calculating the median

In [17]:
clf = XGBRegressor(objective = log_cosh_quantile(0.5),
                  n_estimators = 200,
                  max_depth = 3,
                  n_jobs = 4,
                  learning_rate = 0.05)

In [18]:
clf.fit(X_train, y_train)

In [19]:
y_median = clf.predict(X_test)

In [20]:
print(y_median[:10])
print(y_test['MedHouseVal'].to_list()[:10])

[3.2429554 0.757352  2.5868504 1.3783662 2.8943105 3.8976893 2.5790272
 1.5965589 1.3985851 1.6484576]
[3.55, 0.707, 2.294, 1.125, 2.254, 2.63, 2.268, 1.662, 1.18, 1.563]


# Results

In [21]:
output = {}
output['TrueMedHouseVal'] = y_test['MedHouseVal'].to_list()
output['MedianPrediction'] = y_median
output['0.95quantile'] = y_upper_smooth
output['0.05quantile'] = y_lower_smooth
result = pd.DataFrame(output)
result.head()

| | TrueMedHouseVal | MedianPrediction | 0.95quantile | 0.05quantile |
|---|---|---|---|---|
| 0 | 3.55 | 3.242955 | 4.651577 | 2.103417 |
| 1 | 0.707 | 0.757352 | 1.109973 | 0.520261 |
| 2 | 2.294 | 2.58685 | 3.094487 | 2.024509 |
| 3 | 1.125 | 1.378366 | 2.61003 | 0.886378 |
| 4 | 2.254 | 2.89431 | 3.484574 | 2.114181 |
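
One way to judge such an interval is its empirical coverage, the fraction of true values that fall between the lower and upper predictions. The sketch below computes it for the first ten test values printed earlier in this notebook; with the 0.05 and 0.95 quantiles the nominal coverage would be 90%. Ten rows are far too few to assess calibration, so this only illustrates the calculation.

```python
import numpy as np

# First ten test-set values and predictions, copied from the printed outputs above.
y_true = np.array([3.55, 0.707, 2.294, 1.125, 2.254, 2.63, 2.268, 1.662, 1.18, 1.563])
y_upper = np.array([4.6515775, 1.1099726, 3.0944874, 2.6100302, 3.4845743,
                    4.978835, 3.3166888, 2.206825, 1.8518007, 2.4240296])
y_lower = np.array([2.1034167, 0.5202606, 2.0245087, 0.8863784, 2.1141813,
                    2.451071, 1.5790187, 1.2442524, 1.0692693, 1.2456285])

# A point is covered when it lies inside [lower, upper].
inside = (y_true >= y_lower) & (y_true <= y_upper)
coverage = inside.mean()
mean_width = (y_upper - y_lower).mean()

print(f"empirical coverage: {coverage:.2f}, mean interval width: {mean_width:.3f}")
```

All ten of these points fall inside their intervals. On the full test set the same expression can be applied to `y_test['MedHouseVal']`, `y_lower_smooth` and `y_upper_smooth`.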


# Additional Work

We now try the gradient and Hessian of the log cosh function as given in the book (page 152).

In [22]:
def log_cosh_quantile(alpha):
    def _log_cosh_quantile(y_true, y_pred):
        err = y_pred - y_true
        err = np.where(err < 0, alpha * err, (1-alpha) * err)
        grad = np.tanh(err)
        hess = 1 / cosh(err) ** 2
        
        return grad, hess
    
    return _log_cosh_quantile

In [23]:
alpha = 0.95

clf = XGBRegressor(objective = log_cosh_quantile(alpha),
                  n_estimators = 200,
                  max_depth = 3,
                  n_jobs = 4,
                  learning_rate = 0.05)

clf.fit(X_train, y_train)

y_upper_smooth = clf.predict(X_test)

In [24]:
clf = XGBRegressor(objective = log_cosh_quantile(1-alpha),
                  n_estimators = 200,
                  max_depth = 3,
                  n_jobs = 4,
                  learning_rate = 0.05)

clf.fit(X_train, y_train)

y_lower_smooth = clf.predict(X_test)

In [25]:
clf = XGBRegressor(objective = log_cosh_quantile(0.5),
                  n_estimators = 200,
                  max_depth = 3,
                  n_jobs = 4,
                  learning_rate = 0.05)

clf.fit(X_train, y_train)

y_median = clf.predict(X_test)

In [26]:
output = {}
output['MedianPrediction2'] = y_median
output['0.95quantile2'] = y_upper_smooth
output['0.05quantile2'] = y_lower_smooth
result2 = pd.concat([result, pd.DataFrame(output)], axis = 1)
result2.head()

| | TrueMedHouseVal | MedianPrediction | 0.95quantile | 0.05quantile | MedianPrediction2 | 0.95quantile2 | 0.05quantile2 |
|---|---|---|---|---|---|---|---|
| 0 | 3.55 | 3.242955 | 4.651577 | 2.103417 | 3.169382 | 4.050434 | 1.352701 |
| 1 | 0.707 | 0.757352 | 1.109973 | 0.520261 | 0.800896 | 1.864278 | 0.609783 |
| 2 | 2.294 | 2.58685 | 3.094487 | 2.024509 | 2.416822 | 2.724847 | 1.01456 |
| 3 | 1.125 | 1.378366 | 2.61003 | 0.886378 | 1.437977 | 2.357894 | 0.647745 |
| 4 | 2.254 | 2.89431 | 3.484574 | 2.114181 | 2.932159 | 3.368453 | 1.517753 |


# Conclusion

The gradient and Hessian expressions used in the book and at the beginning of this notebook provide similar results.

This may be because they are only off by a constant factor.
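
A quick numerical comparison can help gauge how far the "constant factor" description goes. The sketch below evaluates both formulations on a few hypothetical error values for a single branch weight `w` (both `w` and `err` are assumptions chosen for illustration): the first applies the weight outside tanh and sech², the second applies it to the error first.

```python
import numpy as np

w = 0.05                                 # branch weight, e.g. 1 - alpha for alpha = 0.95
err = np.array([0.01, 0.1, 1.0, 3.0])    # hypothetical positive errors

# Formulation from the beginning of the notebook: weight outside tanh / sech^2.
grad_a = w * np.tanh(err)
hess_a = w / np.cosh(err) ** 2

# Formulation from the "Additional Work" section: weight applied to the error.
grad_b = np.tanh(w * err)
hess_b = 1 / np.cosh(w * err) ** 2

grad_ratio = grad_b / grad_a
hess_ratio = hess_b / hess_a

print(grad_ratio)
print(hess_ratio)
```

In this sketch the gradients nearly coincide for small errors while the Hessians differ by about 1/w, so the two versions are close to a constant rescaling there. For larger errors both ratios drift, which suggests the relationship is only approximately a constant factor.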

The log cosh objective function does provide an upper and a lower value for the prediction. This is verified by a case study on the California housing prices.

An interesting application of quantile prediction is in the area of time series. We will explore this in future notebooks.