Using Lasso regression to choose features for predicting customer spend

You are given the following data about each customer:
how much they spent in the previous year
(prev_year_spend), the number of days since their last
purchase (days_since_last_purchase), the number of days
since their first purchase (days_since_first_purchase),
the total number of transactions (total_transactions),
the customer’s age (age), the customer’s income (income),
and a customer engagement score (engagement_score), which
is based on the customer’s engagement with previous
marketing offers. You are asked to investigate which of
these features are related to customer spend in the
current year (cur_year_spend), and to create a simple
linear model that describes these relationships.

1. Import pandas, read customer_spend.csv into a DataFrame named df, and display the first five rows.
2. Use train_test_split from sklearn to split the data into training and test sets, with random_state=100 and cur_year_spend as the y variable.
3. Import Lasso from sklearn and fit a lasso model (with normalize=True and random_state=10) to the training data.
4. Get the coefficients from the lasso model, and store the names of the features that have nonzero coefficients, along with their coefficient values, in the selected_features and selected_coefs variables, respectively.
5. Print out the names of the features with nonzero coefficients and their associated coefficient values.
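
For context on why a lasso fit can be used to choose features, the objective that scikit-learn documents for Lasso is reproduced below. The symbols are assumptions borrowed from that documentation rather than names used in this exercise: n is the number of training rows, w is the coefficient vector, and α is the penalty strength (alpha, which stays at its default of 1.0 because step 3 does not set it).

```latex
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
```

The L1 term penalizes the absolute size of each coefficient, which tends to push the coefficients of weakly related features to exactly zero. The features left with nonzero coefficients are the ones treated as selected in steps 4 and 5.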

In [4]:
import pandas as pd

df = pd.read_csv('data_science/customer_spend.csv')
df.head()

   cur_year_spend  prev_year_spend  days_since_last_purchase  days_since_first_purchase  total_transactions  age    income  engagement_score
0         5536.46          1681.26                         7                         61                  34   61  97914.93         -0.652392
1          871.41          1366.74                        12                         34                  33   68  30904.69          0.007327
2         2046.74          1419.38                        10                         81                  22   54  48194.59          0.221666
3         4662.70          1561.21                        12                         32                  34   49  93551.98          1.149641
4         3539.46          1397.60                        17                         72                  34   66  66267.57          0.835834


In [5]:
from sklearn.model_selection import train_test_split

# df.columns[0] is the target, cur_year_spend; the remaining columns are the features.
cols = df.columns[1:]
X = df[cols]
y = df['cur_year_spend']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=100)

In [6]:
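from sklearn.linear_model import Lasso

# Note: normalize=True works only on scikit-learn < 1.2; the parameter
# was deprecated in 1.0 and removed in 1.2.
lasso_model = Lasso(normalize=True, random_state=10)
lasso_model.fit(X_train, y_train)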

Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000, normalize=True,
      positive=False, precompute=False, random_state=10, selection='cyclic',
      tol=0.0001, warm_start=False)
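
On scikit-learn 1.2 and later, Lasso no longer accepts normalize, so the cell above raises an error there. One common substitute is to scale the features in a pipeline before the lasso step. The sketch below is an illustration only: the data is synthetic, the names X_demo, y_demo and scaled_lasso are hypothetical, and alpha=0.5 is an arbitrary value. It is not an exact replacement, because the old normalize=True option divided each centered column by its L2 norm while StandardScaler divides by the standard deviation, so the same alpha penalizes differently and the coefficients printed in this exercise should not be expected to reproduce.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): 200 rows, 3 features on very
# different scales; only the first two features drive the target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3)) * np.array([1.0, 50.0, 1000.0])
y_demo = 3.0 * X_demo[:, 0] + 0.2 * X_demo[:, 1] + rng.normal(size=200)

# Standardize each feature (zero mean, unit variance), then fit the lasso.
# alpha=0.5 is an arbitrary illustrative value, not a tuned one.
scaled_lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.5, random_state=10))
scaled_lasso.fit(X_demo, y_demo)

# These coefficients are expressed per standard deviation of each feature,
# not per original unit.
demo_coefs = scaled_lasso.named_steps["lasso"].coef_
print(demo_coefs)
```

Because the pipeline owns the scaler, calling scaled_lasso.predict on new rows applies the same scaling that was learned from the training data.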

In [7]:
coefs = lasso_model.coef_
# Use != 0 (not > 0) so that negative coefficients are kept as well.
selected_features = cols[coefs != 0]
selected_coefs = coefs[coefs != 0]
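
Step 4 asks for the features with nonzero coefficients, which is not the same as the features with positive coefficients. The toy example below shows the difference between the two boolean masks. The feature names and coefficient values are made up for illustration and do not come from the customer data.

```python
import numpy as np
import pandas as pd

# Hypothetical feature names and coefficients, for illustration only.
demo_cols = pd.Index(["feat_a", "feat_b", "feat_c", "feat_d"])
demo_coefs = np.array([0.8, 0.0, -2.5, 0.0])

nonzero_mask = demo_coefs != 0   # keeps positive and negative coefficients
positive_mask = demo_coefs > 0   # drops feat_c, whose coefficient is negative

print(list(demo_cols[nonzero_mask]))   # ['feat_a', 'feat_c']
print(list(demo_cols[positive_mask]))  # ['feat_a']
```

A feature with a negative coefficient is still one the lasso kept, so a nonzero test matches the wording of step 4.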

In [9]:
for coef, feature in zip(selected_coefs, selected_features):
    print(feature + ' coefficient: ' + str(coef))

prev_year_spend coefficient: 0.7986123135389838
days_since_first_purchase coefficient: 14.244498212235905
total_transactions coefficient: 46.312327266441415
income coefficient: 0.05781233517079364
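
The exercise creates a test split but stops before using it. A possible next step is to compare the model's R² on the training and test sets. The sketch below assumes nothing about the customer data: it uses synthetic data and hypothetical names (X_demo, y_demo, demo_model), with alpha=0.1 picked arbitrarily, and only illustrates the calls involved.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): 4 features, 2 of them informative.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 4))
y_demo = 2.0 * X_demo[:, 0] - 1.5 * X_demo[:, 2] + rng.normal(scale=0.5, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=100)

# alpha=0.1 is an arbitrary illustrative value.
demo_model = Lasso(alpha=0.1, random_state=10).fit(X_tr, y_tr)

# score() returns R^2; a large gap between the two would suggest overfitting.
train_r2 = demo_model.score(X_tr, y_tr)
test_r2 = demo_model.score(X_te, y_te)
print(f"train R^2: {train_r2:.3f}, test R^2: {test_r2:.3f}")
```

With the notebook's own variables, the equivalent calls would be lasso_model.score(X_train, y_train) and lasso_model.score(X_test, y_test).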
