This notebook shows how to run the toy logistic regression model on the German credit data from [1]. The data set contains 24 predictors for each of 1000 individuals, together with an outcome variable indicating whether or not each individual should be given credit.


[1] A. Frank and A. Asuncion, "UCI Machine Learning Repository", 2010. https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)

In [1]:
import os
os.chdir("../../")
import pints
import pints.toy
import pints.plot
import numpy as np
import matplotlib.pyplot as plt
import io
import urllib.request  # the request submodule must be imported explicitly
from scipy import stats

To run this example, we first need to download the data from [1] and process it so that the outputs are dichotomous, $y\in\{-1,1\}$, and the matrix of predictors is standardised. We also add a column of 1s, which corresponds to a constant term in the regression.

In [2]:
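url = "http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data-numeric"
# download the raw file (use a separate name so the URL string is not shadowed)
with urllib.request.urlopen(url) as response:
    raw_data = response.read()
# an integer delimiter tells genfromtxt to read fixed-width fields of 4 characters;
# keep the first 25 columns (24 predictors followed by the outcome)
a = np.genfromtxt(io.BytesIO(raw_data), delimiter=4)[:, :25]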
# get output
y = a[:, -1]
y[y==1] = -1
y[y==2] = 1

# get inputs and standardise
x = a[:, :-1]
x = stats.zscore(x)
# prepend a column of ones for the constant term
x = np.column_stack((np.ones(x.shape[0]), x))
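
As a quick check of the preprocessing logic, the sketch below repeats the standardisation and intercept steps on a small synthetic matrix, so it runs without downloading anything. The names `x_demo`, `x_std` and `x_design` are made up for this illustration. The standardisation is written out in NumPy with the population standard deviation (`ddof=0`), which is the default used by `scipy.stats.zscore`.

```python
import numpy as np

# Synthetic stand-in for the predictor matrix (hypothetical data, not the credit data)
rng = np.random.default_rng(0)
x_demo = rng.normal(loc=5.0, scale=3.0, size=(100, 4))

# Column-wise z-score: subtract each column's mean, divide by its standard deviation
x_std = (x_demo - x_demo.mean(axis=0)) / x_demo.std(axis=0)

# Prepend a column of ones for the constant term
x_design = np.column_stack((np.ones(x_std.shape[0]), x_std))

# Each standardised column should have mean ~0 and standard deviation ~1
print(x_design.shape)
print(np.allclose(x_design[:, 1:].mean(axis=0), 0.0))
print(np.allclose(x_design[:, 1:].std(axis=0), 1.0))
```

Running the same two checks on `x[:, 1:]` from the cell above would be one way to confirm that the real predictors were standardised as intended.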

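Create the toy model and evaluate its log-density at a test point: a vector of 25 ones, that is, one coefficient for the constant term plus one for each of the 24 predictors.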

In [3]:
model = pints.toy.GermanCreditLogPDF(x, y)
model(np.ones(25))

-2887.6292678483533
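
To see why the outcomes are coded as $y\in\{-1,1\}$, here is a minimal sketch of a logistic regression log-likelihood written for that coding, $\sum_i -\log(1 + \exp(-y_i x_i^\top \beta))$. The function `logistic_log_likelihood` is a hypothetical helper written for this illustration and is not part of PINTS. `GermanCreditLogPDF` may include further terms, such as a prior on the coefficients, so the two values are not expected to match exactly.

```python
import numpy as np


def logistic_log_likelihood(beta, x, y):
    """Log-likelihood of a logistic regression with labels y in {-1, 1}.

    Hypothetical helper for illustration only, not the PINTS implementation.
    """
    # With +/-1 labels the likelihood depends only on the margins y_i * (x_i . beta)
    margins = y * (x @ beta)
    # logaddexp(0, -m) evaluates log(1 + exp(-m)) without overflow
    return -np.sum(np.logaddexp(0.0, -margins))


# Tiny made-up example: with beta = 0 every outcome has probability 1/2
x_small = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 0.3]])
y_small = np.array([1.0, -1.0, 1.0])
print(logistic_log_likelihood(np.zeros(2), x_small, y_small))
print(3 * np.log(0.5))
```

The two printed values should agree, since each of the three observations contributes $\log(1/2)$ when all coefficients are zero.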