# Coursework 2
# General Instructions
In this CW, we apply predictive modelling in building investment strategies. 

The data required to run this notebook (including the notebook itself) is shared on Moodle-->coursework. The data is saved in a pickle format with the file name "clean_data_v2.pickle". You can find this in CW 1. 

You need to save this file on your PC, and then load it using an appropriate file path. There are five exercises for this CW. 

For this CW, no preliminary codes are provided and you must build the entire notebook by yourself.

No approximated number of lines for this CW is provided. 

Marks for each exercise are shown in brackets; note that these marks are provisional and they might be changed. 

If you need some parameters which are not specified to you, you can choose them at your will but the choice should be justified.

Where applicable, for simplicity, in forming training, validation, and testing set, use only continuous features.

No short selling is allowed.

The weight of a selected loan in a portfolio is either zero or one, i.e. no partial weights are allowed.

It is advised to support your coding with brief comments.

You can copy some of the codes from CW1 if required.


# Provide all the necessary preliminaries such as importing libraries, dataset, etc after this block and before Exercise 3.1.

In [1]:
import pandas as pd
import numpy as np
import pickle
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, LogisticRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor, MLPClassifier
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import GridSearchCV

default_seed = 1
data, discrete_features, continuous_features = pickle.load( 
    open( "clean_data_v2/clean_data_v2.pickle", "rb" ) )


data = data.rename({'ret_PESS': 'return_1', 'ret_OPT': 'return_2', 'ret_INTa':'return_3a', 'ret_INTb':'return_3b', 'ret_INTc':'return_3c'}, axis=1)
return_features = ['return_1', 'return_2', 'return_3a', 'return_3b', 'return_3c']

data

Unnamed: 0,id,loan_amnt,funded_amnt,term,int_rate,installment,grade,emp_length,home_ownership,annual_inc,...,total_pymnt,last_pymnt_d,recoveries,loan_length,term_num,return_1,return_2,return_3a,return_3b,return_3c
0,40390412,5000.0,5000.0,36 months,12.39,167.01,C,< 1 year,RENT,48000.0,...,5475.140000,2015-12-01,0.0,10.973531,36,0.031676,0.103917,0.031155,0.050634,0.086751
2,40401108,17000.0,17000.0,36 months,12.39,567.82,C,1 year,RENT,53000.0,...,20452.099120,2018-03-01,0.0,37.947391,36,0.067688,0.064215,0.050574,0.066334,0.094950
3,40501689,9000.0,9000.0,36 months,14.31,308.96,C,6 years,RENT,39000.0,...,9792.560000,2015-11-01,0.0,9.987885,36,0.029354,0.105803,0.029798,0.049345,0.085622
4,40352737,14000.0,14000.0,36 months,11.99,464.94,B,6 years,RENT,44000.0,...,16592.911300,2018-01-01,0.0,36.008953,36,0.061736,0.061721,0.047093,0.063007,0.091937
5,40431323,10000.0,10000.0,60 months,19.24,260.73,E,10+ years,MORTGAGE,130000.0,...,15122.079970,2018-10-01,0.0,44.978336,60,0.102442,0.136655,0.113866,0.131897,0.164518
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1048430,139391862,25000.0,25000.0,36 months,19.92,928.08,D,10+ years,RENT,54173.0,...,27257.514580,2019-03-01,0.0,5.946734,36,0.030100,0.182219,0.030735,0.051117,0.089089
1048442,139388430,15000.0,15000.0,60 months,14.47,352.69,C,10+ years,MORTGAGE,115000.0,...,15879.803690,2019-02-01,0.0,5.026797,60,0.011731,0.140018,0.024141,0.044108,0.081340
1048471,139248973,10000.0,10000.0,36 months,20.89,376.19,D,5 years,RENT,75000.0,...,10854.179110,2019-02-01,0.0,5.026797,36,0.028473,0.203910,0.029807,0.050279,0.088453
1048488,138986745,12800.0,12800.0,36 months,24.37,504.68,E,6 years,RENT,60000.0,...,13033.951780,2018-10-01,0.0,0.985647,36,0.006092,0.222524,0.016028,0.035984,0.073344


# Random Based Strategy
# Exercise 3.1 --- [8/30]

In this part, you have to implement the random based strategy, i.e. choose loans completely at random (uniform distribution), and build a portfolio made of these randomly chosen loans. You then calculate the average return (using the test dataset) of an investor if they use this strategy in long run. In what follows, more details
are provided.

Split the dataset into training set, test set, and cross validation (though you will not use the training and the cross validation set for this exercise): 60% training, 20% cross validation, and 20% testing. For default seed use 1. 

In this exercise, use the following notions of returns ["return_1","return_2","return_3a","return_3b","return_3c"], these returns are exactly the same as CW1. For return_3 use the solution of CW1.

The goal is to estimate the average returns (using the test set) that an investor might obtain following this random based strategy. In order to do that fix the number of iteration to be 1000, then for each iteration, you build a random portfolio by randomly selecting 100 loans from the test dataset. Note that in your random selection, no selection with partial weights is allowed. More precisely, if the returns of 100 randomly selected loans are $(r_1, r_2,..., r_{100})$, the weight of each $r_i$, $i=1,2,...,100$ in the random portfolio will be 0.01, however, in the initial random selection the whole $r_i$ is selected.  Based on this dataset and for each notion of return in ['return_1', 'return_2', 'return_3a', 'return_3b', 'return_3c'], calculate the average returns that an investor might obtain following this random strategy.

Provide your solution after this block. Do not forget to print your result.


In [2]:
# Standardize
df = data.copy()
df[continuous_features] = StandardScaler().fit_transform(data[continuous_features] )

# add the bias column
df['one'] = 1

# binarize dault 
df['loan_status'] = df['loan_status'].map(lambda x: 0 if x == "Fully Paid" else 1)

# Split dataset
df_train, df_CV = train_test_split(df, test_size=0.4, random_state=1)
df_CV, df_test = train_test_split(df_CV, test_size=0.5, random_state=1)
df_train['group'] = 'train'
df_CV['group'] = 'CV'
df_test['group'] = 'test'

df_all = pd.concat([df_train, df_CV, df_test])


# strategy
def strategy_random(ret, df_test):
    iter_series = pd.Series(range(1000))
    return(pd.Series({
        'strategy': 'random',
        'name': ret,
        'performance': iter_series.apply( lambda i: df_test[ret].sample(100, random_state=i).mean() ).mean()
    }))


def predict_model(regressor, df_train, df_test, X_columns, y_columns, predict_method, evaluate_method):
    # unravel data
    X_train = df_train[X_columns].values
    X_test = df_test[X_columns].values
    y_train = df_train[y_columns].values.ravel()
    y_test = df_test[y_columns].values.ravel()
    #print(y_test.shape)
    # train 
    regressor.fit(X_train, y_train)
    
    # predict
    if predict_method == 'predict':
        y_pred = regressor.predict(X_test)
    if predict_method == 'predict_prod':
        y_pred = regressor.predict_proba(X_test)[:,0] # proba of 0, proba of 1

    # evaluate    
    if evaluate_method == 'mse':
        mse = mean_squared_error(y_test, y_pred)
        return(y_pred, mse)
    if evaluate_method == 'auc':
        auc = roc_auc_score(y_test, regressor.predict(X_test))
        return(y_pred, auc)


def strategy(return_name, strategy_name, regressor, df_train, df_test, X_columns, y_columns, predict_method, evaluate_method): 
    # Predict model
    y_pred, evaluate_score = predict_model(regressor, df_train, df_test, X_columns, y_columns, predict_method, evaluate_method)
    df_test['predict'] = y_pred

    # Construct portfolio
    df_portfolio = df_test.nlargest(100, columns='predict', keep='all')
    
    # Report 
    return( pd.Series({
        'strategy': strategy_name,
        'name': return_name,
        "predict_meodel_score":evaluate_score,
        'predict_model_evaluate_method': evaluate_method,
        'strategy_performance':df_portfolio[return_name].mean(),
        'regressor': regressor,
        'predict': y_pred,
        'df_portfolio': df_portfolio
    }))

In [3]:
# grid search

def grid_decect(model, parameters, scoring, df_train, df_test, X_columns, y_columns ):
    df_grid = pd.concat([df_train, df_test])
    grid = GridSearchCV(estimator=model, param_grid=parameters, scoring=scoring, cv= 3, n_jobs=-1)
    grid.fit(df_grid[X_columns].values, df_grid[y_columns].values.ravel())
    return(
        grid.best_estimator_, grid
    )
# model = MLPRegressor(hidden_layer_sizes=(20,10), activation='relu', solver='adam', alpha=0.001)

# parameters = {"hidden_layer_sizes": [(200,100),(300,100),(500,100)], 
#               # "activation": ['logistic','tanh','relu'],
#               "alpha": [1,0.0001]
#               }


# regressor_3 = LogisticRegression(fit_intercept=True)

# parameters_3 = {'C':[0.01,0.1,1,10,50,100]}

# grid_decect(model=regressor_3, 
#             parameters=parameters_3,
#             #scoring='neg_mean_squared_error', 
#             scoring='roc_auc',
#             df_train=df_train, 
#             df_test=df_test, 
#             X_columns=continuous_features, “
#             y_columns='loan_status')
            


In [4]:
# randon_strategy
def strategy_random(ret, df_test):
    iter_series = pd.Series(range(1000))
    return(pd.Series({
        'strategy': 'random',
        'name': ret,
        'strategy_performance': iter_series.apply( lambda i: df_test[ret].sample(100, random_state=i).mean() ).mean()
    }))

report_random = pd.Series(return_features).apply(lambda x: strategy_random(x, df_test) )

print(f"The results of all random-select strategy:")
report_random

The results of all random-select strategy:


Unnamed: 0,strategy,name,strategy_performance
0,random,return_1,0.00579
1,random,return_2,0.045715
2,random,return_3a,0.012561
3,random,return_3b,0.028646
4,random,return_3c,0.058246


# Return Based Strategy
# Exercise 3.2 ---- [5/30]

In this section, you implement a return based strategy, i.e. you will estimate the return of the loans (here using linear regression) and choose 100 loans with the highest return. More specifically, use Ridge regression and suppose that you have done your cross validation and it turns out the optimal hyperparameter for alpha is 240.

Use the same dataset and notion of returns as in Exercise 3.1 and only use continuous features. The goal is to estimate the  return (for each notion of return as in Exercise 3.1) that an investor will obtain following this strategy. 

Provide your solution after this block. Do not forget to print your result.




In [5]:
# return_Ridge_strategy
regressor =  Ridge(alpha=240)
report_return_Ridge = pd.Series(return_features).apply( lambda x: strategy(
                                    return_name= x, 
                                    strategy_name='return_Ridge', 
                                    regressor = regressor, 
                                    df_train = df_train, 
                                    df_test = df_test, 
                                    X_columns = continuous_features, 
                                    y_columns = [x],
                                    predict_method = 'predict', 
                                    evaluate_method ='mse'))
print(f"The results of all return-based Ridge strategy:")
report_return_Ridge

The results of all return-based Ridge strategy:


Unnamed: 0,strategy,name,predict_meodel_score,predict_model_evaluate_method,strategy_performance,regressor,predict,df_portfolio
0,return_Ridge,return_1,0.006633,mse,0.021146,Ridge(alpha=240),"[0.018881418402383755, 0.01733544766075227, -0...",id loan_amnt funded_amnt ...
1,return_Ridge,return_2,0.011517,mse,0.069572,Ridge(alpha=240),"[0.0406693590171077, 0.04826630543998791, 0.04...",id loan_amnt funded_amnt ...
2,return_Ridge,return_3a,0.003566,mse,0.024176,Ridge(alpha=240),"[0.022123666267938075, 0.02231995087234273, 0....",id loan_amnt funded_amnt ...
3,return_Ridge,return_3b,0.004077,mse,0.040748,Ridge(alpha=240),"[0.03831642481659679, 0.039642248146672034, 0....",id loan_amnt funded_amnt ...
4,return_Ridge,return_3c,0.00512,mse,0.070931,Ridge(alpha=240),"[0.06802772967868444, 0.07159489394918847, 0.0...",id loan_amnt funded_amnt ...


# Exercise 3.3 ---- [5/30]

Repeat Exercise 3.2 but instead of a linear model use a neural network. Your neural network will have
two hidden layers. You can choose the rest of the parameters of the network as appropriate. Since 
using a cross validation to choose the best architecture takes time, you are not required to perform
any cross validation to choose the optimal architecture. However, the return of your model should be reasonable and at 
least as good as the linear model in the previous exercise.  

Use the same dataset and notions of returns as in Exercise 3.2.

Provide your solution after this block. Do not forget to print your result.


In [7]:
# return_Ridge_strategy
regressor =  MLPRegressor(hidden_layer_sizes=(400,100), alpha=0.001, max_iter=200)
report_return_MLP = pd.Series(return_features).apply( lambda x: strategy(
                                    return_name= x, 
                                    strategy_name='return_MLP', 
                                    regressor = regressor, 
                                    df_train = df_train, 
                                    df_test = df_test, 
                                    X_columns = continuous_features, 
                                    y_columns = [x],
                                    predict_method = 'predict', 
                                    evaluate_method='mse' ) )
print(f"The results of all return-based MLP strategy:")
report_return_MLP 

The results of all return-based MLP strategy:


Unnamed: 0,strategy,name,predict_meodel_score,predict_model_evaluate_method,strategy_performance,regressor,predict,df_portfolio
0,return_MLP,return_1,0.006583,mse,0.019543,"MLPRegressor(alpha=0.001, hidden_layer_sizes=(...","[0.015730082684909833, 0.012932180879288487, -...",id loan_amnt funded_amnt ...
1,return_MLP,return_2,0.011386,mse,0.14104,"MLPRegressor(alpha=0.001, hidden_layer_sizes=(...","[0.030164852457325024, 0.0470829601550298, 0.0...",id loan_amnt funded_amnt ...
2,return_MLP,return_3a,0.003533,mse,0.022841,"MLPRegressor(alpha=0.001, hidden_layer_sizes=(...","[0.017914278465173492, 0.01939347008397902, -0...",id loan_amnt funded_amnt ...
3,return_MLP,return_3b,0.004032,mse,0.039694,"MLPRegressor(alpha=0.001, hidden_layer_sizes=(...","[0.03466666219747323, 0.03972890812677457, 0.0...",id loan_amnt funded_amnt ...
4,return_MLP,return_3c,0.005065,mse,0.068993,"MLPRegressor(alpha=0.001, hidden_layer_sizes=(...","[0.06623340914886669, 0.07045240686231397, 0.0...",id loan_amnt funded_amnt ...


# Default Based Strategy
# Exercise 3.4 --- [8/30]

In this exercise, you will implement a default based strategy, i.e. to select 100 loans with the highest credit quality (a loan with PD of zero has the highest credit quality). For the feature space, use only the same continuous features as Exercise 3.2; you will need to determine the output as well. Split the dataset: 60% training, 20% cross validation (though we don't perform cross validation here), and 20% testing.

Train three machine learning models (of your choice) to estimate the probability of default for these loans. Note that for simplicity, ignore any cross validation analysis that might have been done or required but use reasonable parameters. For instance for a logistic regression model, one can use l2 penalty with C=1, and an appropriate solver of your choice. 

Although in practice, you should provide justification for the choice of the models, you are not required to provide such justification for this exercise.

For each notion of the returns ["return_1","return_2","return_3a","return_3b","return_3c"] estimate the return of this strategy.

Provide your solution after this block. Do not forget to print your result.


In [6]:
# default_RF_strategy
regressor_1 = RandomForestClassifier(max_depth=15, n_estimators=40, random_state=0)
report_default_1 = pd.Series(return_features).apply( lambda x: strategy(
                                    return_name= x, 
                                    strategy_name='default_RF', 
                                    regressor = regressor_1, 
                                    df_train = df_train, 
                                    df_test = df_test, 
                                    X_columns = continuous_features, 
                                    y_columns = ['loan_status'],
                                    predict_method = 'predict_prod', 
                                    evaluate_method='auc' ) )

regressor_2 = MLPClassifier(hidden_layer_sizes=(200,100), activation='relu', solver='adam', alpha=0.01)
report_default_2 = pd.Series(return_features).apply( lambda x: strategy(
                                    return_name= x, 
                                    strategy_name='default_MLP', 
                                    regressor = regressor_2, 
                                    df_train = df_train, 
                                    df_test = df_test, 
                                    X_columns = continuous_features, 
                                    y_columns = ['loan_status'],
                                    predict_method = 'predict_prod', 
                                    evaluate_method='auc' ) )
                                    
regressor_3 = LogisticRegression(C=1,fit_intercept=True)
report_default_3 = pd.Series(return_features).apply( lambda x: strategy(
                                    return_name= x, 
                                    strategy_name='default_LG', 
                                    regressor = regressor_3, 
                                    df_train = df_train, 
                                    df_test = df_test, 
                                    X_columns = continuous_features, 
                                    y_columns = ['loan_status'],
                                    predict_method = 'predict_prod', 
                                    evaluate_method='auc' ) )

regressor_4 = GaussianNB(var_smoothing=1e-8)
report_default_4 = pd.Series(return_features).apply( lambda x: strategy(
                                    return_name= x, 
                                    strategy_name='default_GaussianNB', 
                                    regressor = regressor_4, 
                                    df_train = df_train, 
                                    df_test = df_test, 
                                    X_columns = continuous_features, 
                                    y_columns = ['loan_status'],
                                    predict_method = 'predict_prod', 
                                    evaluate_method='auc' ) )

report_default = pd.concat([report_default_1, report_default_2, report_default_3, report_default_4])
print(f"The results of all default-based strategy:")
report_default

The results of all default-based strategy:


Unnamed: 0,strategy,name,predict_meodel_score,predict_model_evaluate_method,strategy_performance,regressor,predict,df_portfolio
0,default_RF,return_1,0.5,auc,0.014388,"(DecisionTreeClassifier(max_depth=6, max_featu...","[0.8366270972323159, 0.9498385286561117, 0.781...",id loan_amnt funded_amnt ...
1,default_RF,return_2,0.5,auc,0.042119,"(DecisionTreeClassifier(max_depth=6, max_featu...","[0.8366270972323159, 0.9498385286561117, 0.781...",id loan_amnt funded_amnt ...
2,default_RF,return_3a,0.5,auc,0.0196,"(DecisionTreeClassifier(max_depth=6, max_featu...","[0.8366270972323159, 0.9498385286561117, 0.781...",id loan_amnt funded_amnt ...
3,default_RF,return_3b,0.5,auc,0.037146,"(DecisionTreeClassifier(max_depth=6, max_featu...","[0.8366270972323159, 0.9498385286561117, 0.781...",id loan_amnt funded_amnt ...
4,default_RF,return_3c,0.5,auc,0.069576,"(DecisionTreeClassifier(max_depth=6, max_featu...","[0.8366270972323159, 0.9498385286561117, 0.781...",id loan_amnt funded_amnt ...
0,default_MLP,return_1,0.543199,auc,0.020928,"MLPClassifier(alpha=0.01, hidden_layer_sizes=(...","[0.8743538157277829, 0.9588647453303993, 0.768...",id loan_amnt funded_amnt ...
1,default_MLP,return_2,0.538541,auc,0.043048,"MLPClassifier(alpha=0.01, hidden_layer_sizes=(...","[0.8738600874604643, 0.9603061965536426, 0.750...",id loan_amnt funded_amnt ...
2,default_MLP,return_3a,0.548463,auc,0.020422,"MLPClassifier(alpha=0.01, hidden_layer_sizes=(...","[0.8228179236041214, 0.972975713392218, 0.7487...",id loan_amnt funded_amnt ...
3,default_MLP,return_3b,0.539942,auc,0.040689,"MLPClassifier(alpha=0.01, hidden_layer_sizes=(...","[0.8316139130072766, 0.9627460329449162, 0.753...",id loan_amnt funded_amnt ...
4,default_MLP,return_3c,0.549677,auc,0.072042,"MLPClassifier(alpha=0.01, hidden_layer_sizes=(...","[0.8441265712656183, 0.970753534333086, 0.7433...",id loan_amnt funded_amnt ...


# Exercise 3.5 [Max 300 words] --- [4/30]

Compare the last three strategies, i.e. random, return (as in the two models of Exercises 3.2, 3.3), and default; which strategy you will pick and why.

Provide and explain a new investment strategy that is different from those strategies explained above.

Your explanations should be right to the point, concise, clear, and free from spelling error and grammatically correct.

Write your answer below:


In [8]:
report_all = pd.concat([report_random, report_return_Ridge, report_return_MLP ,report_default])
report_all_pretty = report_all[['strategy_performance', 'predict_meodel_score','predict_model_evaluate_method']]


header = pd.MultiIndex.from_product([report_all['strategy'].unique().tolist(), report_all['name'].unique().tolist()])

report_all_pretty.index=header

# report_all_pretty
report_all_pretty2 = report_all_pretty[['strategy_performance']]


print(f"The results of all strategy:")
report_all_pretty2.unstack(level=0)


The results of all strategy:


Unnamed: 0_level_0,strategy_performance,strategy_performance,strategy_performance,strategy_performance,strategy_performance,strategy_performance,strategy_performance
Unnamed: 0_level_1,default_GaussianNB,default_LG,default_MLP,default_RF,random,return_MLP,return_Ridge
return_1,0.021133,0.016522,0.020928,0.014388,0.00579,0.019543,0.021146
return_2,0.078838,0.042125,0.043048,0.042119,0.045715,0.14104,0.069572
return_3a,0.025707,0.019009,0.020422,0.0196,0.012561,0.022841,0.024176
return_3b,0.043314,0.035275,0.040689,0.037146,0.028646,0.039694,0.040748
return_3c,0.075799,0.065171,0.072042,0.069576,0.058246,0.068993,0.070931


#### Answer:


**1. I will pick the return-based strategy for two reasons:**

    (1) The return-based strategies have a better return than the default-based strategies. In other words, the return-based strategies tend to select non-default loans with high returns. (In our dataset, default loans have low or even negative returns. Thus, in some way, the returns has already been risk-adjusted, and the return-based strategies selected the safe enough loans with high returns) 

    (2) I have a preference for high return than low risk. But I admit the default-based strategies may have lower default risk. So risk averters may prefer the default-based strategies. 



**2. New strategy: loan-length-based strategy (a.k.a liquidity-based strategy):**

    (1) What is the loan-length-based strategy:
    
        Some investors may want to manage their liquidity strictly. To be more specific, in order to prepare enough capital for future investment opportunities and payment plans, investors may want to get back the principal and interest within a predetermined period, like six months. 

        So, the loan-length-based strategy is to choose loans which can guarantee the repayment of interest within the predetermined period and, at the same time, has high returns. The strategy can be represented mathematically:

$$
\text{Strateg:  }\max_{\{\text{loan}_1, \text{ loan}_2... \text{ loan}_n\}} \frac{1}{n} \sum \limits_{1}^{n}  \text{adjusted-return}_i

= \max_{\{\text{loan}_1, \text{ loan}_2... \text{ loan}_n\}} \frac{1}{n} \sum \limits_{1}^{n} 
 ( \text{return}_i * e^{\text{PL}_i} * e^{(1-\text{PD}_i )}  )
$$

$$
\text{PD}_i \text{ is the default probability of loan}_i \\
\text{PL}_i \text{ is the probability of the loan}_i \text{ get back the repayment with the predetermined period}
$$

    (2) How to implement the strategy:

        step1: Predict the default probability

        step2: Predict the probability of getting back the repayment within the predetermined period. And the predetermined period is given by investors according to their request or needs.  

        step3: Predict the return

        step4: build the portfolio by selecting the top 100 loans according to the adjusted return
        


## A case of loan-length-based strategy return

In [9]:
def strategy_loan_lenth(required_loan_lenth, return_name, strategy_name, regressor, classifier, df_train, df_test, X_columns): 
    # Predict model
    non_PD_pred, evaluate_score = predict_model(classifier, df_train, df_test, X_columns, ['loan_status'], 'predict_prod', 'auc')

    #print( [return_name])
    return_pred, evaluate_score = predict_model(regressor, df_train, df_test, X_columns, [return_name], 'predict', 'mse')

    df_train_c =df_train.copy()
    df_test_c =df_test.copy()
    df_test_c['loan_length'] = 1 * (df_test_c['loan_length']<=required_loan_lenth)
    df_train_c['loan_length'] = 1 * (df_train_c['loan_length']<=required_loan_lenth)
    X_columns = [ i for i in X_columns if i !='loan_length'] 

    non_PL_pred, evaluate_score = predict_model(classifier, df_train_c, df_test_c, X_columns, ['loan_length'], 'predict_prod', 'auc')

    y_pred = return_pred * np.exp(1-non_PL_pred) * np.exp(non_PD_pred)
    df_test['predict'] = y_pred

    # Construct portfolio
    df_portfolio = df_test.nlargest(100, columns='predict', keep='all')
    
    # Report 
    return( pd.Series({
        'strategy': strategy_name,
        'name': return_name,
        #"predict_meodel_score":evaluate_score,
        #'predict_model_evaluate_method': evaluate_method,
        'strategy_performance':df_portfolio[return_name].mean(),
        'regressor': regressor,
        'predict': y_pred,
        'df_portfolio': df_portfolio
    }))



regressor= Ridge(alpha=240)
classifier = LogisticRegression()
report_loan_lenth = pd.Series(return_features).apply( lambda x: strategy_loan_lenth(
                                    required_loan_lenth =  12,
                                    return_name= x, 
                                    strategy_name='loan_lenth_GaussianNB', 
                                    regressor = regressor, 
                                    classifier= classifier,
                                    df_train = df_train, 
                                    df_test = df_test, 
                                    X_columns = continuous_features ) )
report_loan_lenth

Unnamed: 0,strategy,name,strategy_performance,regressor,predict,df_portfolio
0,loan_lenth_GaussianNB,return_1,0.019595,Ridge(alpha=240),"[0.054329655690071796, 0.06483817446754951, -0...",id loan_amnt funded_amnt ...
1,loan_lenth_GaussianNB,return_2,0.068822,Ridge(alpha=240),"[0.11702257878340458, 0.18052600626560145, 0.1...",id loan_amnt funded_amnt ...
2,loan_lenth_GaussianNB,return_3a,0.023097,Ridge(alpha=240),"[0.0636589447531857, 0.08348125165781219, 0.01...",id loan_amnt funded_amnt ...
3,loan_lenth_GaussianNB,return_3b,0.038268,Ridge(alpha=240),"[0.1102522132181241, 0.1482702409490747, 0.068...",id loan_amnt funded_amnt ...
4,loan_lenth_GaussianNB,return_3c,0.067393,Ridge(alpha=240),"[0.19574393470109208, 0.2677797721585774, 0.17...",id loan_amnt funded_amnt ...
