# Kelleher 2015, Chapter 7, Exercise 3

In this problem, we're given the weights for a multivariate logistic regression, and we're asked to make predictions for new observations.

In [6]:
import numpy as np
import pandas as pd

weights = np.array([-3.82398, -0.02990, -0.09089, -0.19558, 0.02999, 0.74572])

# Read in the new observations
input_file = "ex3data.csv"
X = pd.read_csv(input_file)

# Add a constant column of 1s as the first column, to pair with the intercept weight
X.insert(0, "Int", 1)
X

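|   | Int | Age | Economic | ShopFreq | ShopValue |
|---|-----|-----|----------|----------|-----------|
| 0 | 1   | 56  | b        | 1.60     | 109.32    |
| 1 | 1   | 21  | c        | 4.92     | 11.28     |
| 2 | 1   | 48  | b        | 1.21     | 161.19    |
| 3 | 1   | 37  | c        | 0.72     | 170.65    |
| 4 | 1   | 32  | a        | 1.08     | 165.39    |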


Logistic regression handles continuous features just fine.

To deal with the categorical (probably ordinal) variable Economic (which captures the socioeconomic band to which the customer belongs), we're apparently going to use indicators for levels 'b' and 'c', where zeros for both indicates the customer is at level 'a'. Thankfully, this is easy to do in pandas:

In [7]:
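# dtype=int keeps the indicators numeric (recent pandas versions default to bool,
# which would turn the mixed-type frame into an object array in the np.dot call later)
X = pd.get_dummies(X, drop_first=True, dtype=int)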
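
To make the encoding concrete, here is a minimal sketch that applies the same call to just the `Economic` values from the five rows shown earlier. The `demo` frame is a stand-in built by hand, not part of the exercise data file, and `dtype=int` is added only so the indicators print as 0/1.

```python
import pandas as pd

# Economic values typed in from the five observations displayed above
demo = pd.DataFrame({"Economic": ["b", "c", "b", "c", "a"]})

# drop_first=True drops the first level in sorted order ('a'),
# so a row with zeros in both remaining columns is a level-'a' customer
dummies = pd.get_dummies(demo, drop_first=True, dtype=int)
print(dummies)
```

The result has two columns, `Economic_b` and `Economic_c`, and the last row (the level 'a' customer) is zero in both.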

In [8]:
# Reorder the columns so that they line up with the order of the weights.
# Note that in the weights provided, ShopValue precedes ShopFreq, but in the data ShopFreq comes first, so flip 'em.
X = X[["Int", "Age", "Economic_b", "Economic_c", "ShopValue", "ShopFreq"]]

Finally, we're ready to apply the logistic regression model to our data...
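
For reference, the model being applied can be written out as follows. This is a sketch that pairs each weight with a column in the order used above. The symbols $\mathbf{w}$ and $\mathbf{x}$ are my notation rather than the exercise's, and the positive class is assumed to be the "give a free gift" outcome discussed at the end.

```latex
P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w} \cdot \mathbf{x}}},
\qquad
\mathbf{w} \cdot \mathbf{x} =
  -3.82398
  - 0.02990\,\text{Age}
  - 0.09089\,\text{Economic}_b
  - 0.19558\,\text{Economic}_c
  + 0.02999\,\text{ShopValue}
  + 0.74572\,\text{ShopFreq}
```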

In [11]:
from scipy.special import expit as logistic
logistic(np.dot(X, weights))

array([ 0.24645465,  0.34519446,  0.59540115,  0.6292153 ,  0.72802866])
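
Since `ex3data.csv` isn't reproduced here, the following self-contained sketch rebuilds the five rows from the table shown earlier and recomputes the probabilities. Two things are assumptions of mine rather than part of the exercise: the labels attached to the weights (inferred from the column order used above) and the plain-NumPy logistic function used in place of `scipy.special.expit`.

```python
import numpy as np
import pandas as pd

# The five observations, typed in from the table displayed earlier
X_demo = pd.DataFrame({
    "Age": [56, 21, 48, 37, 32],
    "Economic": ["b", "c", "b", "c", "a"],
    "ShopFreq": [1.60, 4.92, 1.21, 0.72, 1.08],
    "ShopValue": [109.32, 11.28, 161.19, 170.65, 165.39],
})
X_demo.insert(0, "Int", 1)  # constant column for the intercept
X_demo = pd.get_dummies(X_demo, drop_first=True, dtype=int)

# Weight labels are an assumption inferred from the column ordering above
named_weights = pd.Series({
    "Int": -3.82398, "Age": -0.02990, "Economic_b": -0.09089,
    "Economic_c": -0.19558, "ShopValue": 0.02999, "ShopFreq": 0.74572,
})

# Selecting columns by the weight labels aligns features and weights by name
z = X_demo[named_weights.index].to_numpy(dtype=float) @ named_weights.to_numpy()
probs = 1.0 / (1.0 + np.exp(-z))  # logistic function
print(np.round(probs, 4))
```

Indexing the frame by `named_weights.index` means the column order no longer has to be fixed by hand, which is one way to guard against the ShopFreq/ShopValue mix-up noted above.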

The actual predictions made by this model will depend on where we draw the line between "don't give a free gift" and "give a free gift," but if the line is at 0.5, then the last three customers in our dataset will receive the gift and the first two won't.
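
As a small sketch of that last step, the probabilities can be turned into decisions by comparing them with a cut-off. The 0.5 threshold is the one assumed in the paragraph above, and the `threshold` and `gets_gift` names are mine.

```python
import numpy as np

# Probabilities copied from the model output above
probs = np.array([0.24645465, 0.34519446, 0.59540115, 0.6292153, 0.72802866])

threshold = 0.5  # assumed cut-off; a different choice changes who gets the gift
gets_gift = probs >= threshold
print(gets_gift)
```

Raising the threshold to, say, 0.6 would leave only the last two customers with a gift, since the third customer's probability is about 0.595.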