### Linear Models in TensorFlow: Timing Factors and Smart-Beta Strategies

**Project Goal**: To build a neural network, under a linear model, that aims to predict the future returns of the momentum factor.

**Data Source**

- We take past returns of the momentum factor as inputs and use them to predict next-period momentum factor returns.

- Data source is Prof. Ken French's Data Library: [Daily returns of 10 Portfolios Formed Daily on Momentum.](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/det_10_port_form_pr_12_2_daily.html)

In [12]:
import numpy as np
import pandas as pd

**Timing Momentum with (Linear Regression) Neural Networks**

In [13]:
route = "10_Portfolios_Prior_12_2_Daily.CSV"

In [14]:
# Read the CSV file, using the first column (dates) as the index
df = pd.read_csv(route, index_col=0)
# Format the date index
df.index = pd.to_datetime(df.index, format="%Y%m%d")
# Build the MOM strategy: Long "Hi PRIOR" and Short "Lo PRIOR"
df["Mom"] = df["Hi PRIOR"] - df["Lo PRIOR"]
df.head()

Unnamed: 0,Lo PRIOR,PRIOR 2,PRIOR 3,PRIOR 4,PRIOR 5,PRIOR 6,PRIOR 7,PRIOR 8,PRIOR 9,Hi PRIOR,Mom
1926-11-03,-0.12,0.6,-0.09,0.3,-0.51,-0.22,-0.12,0.5,0.13,1.28,1.4
1926-11-04,0.65,1.82,1.34,0.61,1.01,0.64,0.82,0.44,0.48,0.4,-0.25
1926-11-05,-0.84,-0.77,-0.22,-0.15,-0.02,-0.02,-0.07,0.36,0.2,0.08,0.92
1926-11-06,1.03,0.28,0.24,0.4,0.19,0.64,0.1,0.1,0.39,-0.68,-1.71
1926-11-08,-0.06,0.11,1.78,0.28,0.36,0.23,0.3,1.17,0.58,-0.18,-0.12


- **Inputs and Outputs**

In [15]:
df["Ret"] = df["Mom"]

# Compounded returns over trailing windows of 10 to 240 observations
# (daily returns are in percent, hence the division by 100)
windows = [10, 25, 60, 120, 240]
for source, suffix in [("Mom", "MOMi"), ("Hi PRIOR", "hi"), ("Lo PRIOR", "Low")]:
    for w in windows:
        df[f"Ret{w}_{suffix}"] = (
            df[source].rolling(w).apply(lambda x: np.prod(1 + x / 100) - 1)
        )

# Target: the momentum return compounded over the next 60 observations
df["Ret60"] = df["Ret60_MOMi"].shift(-60)
df = df.dropna()
df.tail(10)

df = df.drop(
    [
        "Lo PRIOR",
        "PRIOR 2",
        "PRIOR 3",
        "PRIOR 4",
        "PRIOR 5",
        "PRIOR 6",
        "PRIOR 7",
        "PRIOR 8",
        "PRIOR 9",
        "Hi PRIOR",
        "Mom",
    ],
    axis=1,
)

In [16]:
df.head()

Unnamed: 0,Ret,Ret10_MOMi,Ret25_MOMi,Ret60_MOMi,Ret120_MOMi,Ret240_MOMi,Ret10_hi,Ret25_hi,Ret60_hi,Ret120_hi,Ret240_hi,Ret10_Low,Ret25_Low,Ret60_Low,Ret120_Low,Ret240_Low,Ret60
1927-08-19,1.6,-0.007189,-0.011632,0.029566,0.20159,0.275852,-0.003897,0.010958,0.031462,0.181792,0.458303,0.003376,0.02196,-0.000112,-0.021026,0.125174,0.102616
1927-08-20,0.74,-0.016077,-0.002022,0.028443,0.191185,0.267548,-0.006644,0.027704,0.043832,0.192865,0.458015,0.009704,0.02892,0.013091,-0.002974,0.132384,0.08773
1927-08-22,0.71,-0.008596,0.002957,0.033884,0.216675,0.279747,-0.002775,0.025256,0.054955,0.209346,0.459903,0.005975,0.021349,0.018498,-0.010311,0.123046,0.081898
1927-08-23,0.97,0.033152,0.008149,0.038926,0.213431,0.280381,0.031562,0.018638,0.056751,0.207897,0.45932,-0.001583,0.009427,0.015219,-0.00881,0.122026,0.040216
1927-08-24,0.91,0.061988,0.007949,0.066728,0.225576,0.31451,0.07062,0.015491,0.086507,0.208018,0.470046,0.007666,0.006483,0.016347,-0.01861,0.101036,0.036711


- **Train-Test Samples and Scaling**

In [17]:
from sklearn.model_selection import train_test_split

df.reset_index(inplace=True)
df.rename(columns={"index": "Date"}, inplace=True)
df.head()

Unnamed: 0,Date,Ret,Ret10_MOMi,Ret25_MOMi,Ret60_MOMi,Ret120_MOMi,Ret240_MOMi,Ret10_hi,Ret25_hi,Ret60_hi,Ret120_hi,Ret240_hi,Ret10_Low,Ret25_Low,Ret60_Low,Ret120_Low,Ret240_Low,Ret60
0,1927-08-19,1.6,-0.007189,-0.011632,0.029566,0.20159,0.275852,-0.003897,0.010958,0.031462,0.181792,0.458303,0.003376,0.02196,-0.000112,-0.021026,0.125174,0.102616
1,1927-08-20,0.74,-0.016077,-0.002022,0.028443,0.191185,0.267548,-0.006644,0.027704,0.043832,0.192865,0.458015,0.009704,0.02892,0.013091,-0.002974,0.132384,0.08773
2,1927-08-22,0.71,-0.008596,0.002957,0.033884,0.216675,0.279747,-0.002775,0.025256,0.054955,0.209346,0.459903,0.005975,0.021349,0.018498,-0.010311,0.123046,0.081898
3,1927-08-23,0.97,0.033152,0.008149,0.038926,0.213431,0.280381,0.031562,0.018638,0.056751,0.207897,0.45932,-0.001583,0.009427,0.015219,-0.00881,0.122026,0.040216
4,1927-08-24,0.91,0.061988,0.007949,0.066728,0.225576,0.31451,0.07062,0.015491,0.086507,0.208018,0.470046,0.007666,0.006483,0.016347,-0.01861,0.101036,0.036711


In [18]:
df.reset_index(inplace=True, drop=True)

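# Hold out the last 40% of the observations as the test sample
ts = int(0.4 * len(df))
split_time = len(df) - ts  # position of the first test observation
test_time = df.iloc[split_time:, 0:1].values  # dates of the test sample
Ret_vector = df.iloc[split_time:, 1:2].values  # daily momentum returns in the test sample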
df.tail()

Unnamed: 0,Date,Ret,Ret10_MOMi,Ret25_MOMi,Ret60_MOMi,Ret120_MOMi,Ret240_MOMi,Ret10_hi,Ret25_hi,Ret60_hi,Ret120_hi,Ret240_hi,Ret10_Low,Ret25_Low,Ret60_Low,Ret120_Low,Ret240_Low,Ret60
25475,2024-08-29,-4.21,-0.035453,0.063937,-0.114656,-0.021273,0.18238,-0.034336,0.038463,0.002863,0.112082,0.463052,0.001078,-0.023261,0.124712,0.119166,0.183476,0.117438
25476,2024-08-30,0.88,-0.026478,0.078585,-0.125749,0.016932,0.172962,-0.022781,0.041748,-0.018133,0.147941,0.483963,0.003772,-0.03335,0.114849,0.112311,0.210427,0.059617
25477,2024-09-03,-2.0,-0.056702,0.062432,-0.133705,-0.043668,0.150308,-0.101268,-0.013252,-0.069597,0.046629,0.39717,-0.046576,-0.069687,0.067058,0.079807,0.163062,0.098866
25478,2024-09-04,0.49,-0.048273,0.106131,-0.140717,-0.033861,0.162805,-0.093159,0.019845,-0.073135,0.047156,0.41221,-0.046576,-0.076586,0.07182,0.069334,0.162827,0.075027
25479,2024-09-05,0.18,-0.049032,0.051948,-0.157206,-0.045109,0.168755,-0.101854,-0.034501,-0.084556,0.059955,0.449385,-0.054984,-0.080997,0.079483,0.095512,0.18728,0.080071


In [19]:
Xdf, ydf = df.iloc[:, 2:-1], df.iloc[:, -1]
X = Xdf.astype("float32")
y = ydf.astype("float32")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=ts, shuffle=False
)
n_features = X_train.shape[1]
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(15288, 15) (10192, 15) (15288,) (10192,)
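The printed shapes follow from the split arithmetic. The sketch below reproduces it with plain slicing, as a stand-in for what `train_test_split(..., shuffle=False)` does with an integer `test_size`: the first rows go to training and the last `ts` rows to testing, so the test sample lies strictly after the training sample in time. The row count is read off the `tail()` output above, and the names `n_obs`, `row_ids`, `train_ids`, and `test_ids` are illustrative.

```python
import numpy as np

n_obs = 25480            # rows in df (the last row index shown above is 25479)
ts = int(0.4 * n_obs)    # test size, computed as in the notebook
split_time = n_obs - ts  # position of the first test observation

row_ids = np.arange(n_obs)  # stand-in for the chronologically ordered rows
train_ids, test_ids = row_ids[:split_time], row_ids[split_time:]

print(len(train_ids), len(test_ids))     # 15288 10192
print(train_ids.max() < test_ids.min())  # True: no test row precedes a training row
```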


In [20]:
# Scaling

from sklearn.preprocessing import MinMaxScaler

scaler_input = MinMaxScaler(feature_range=(-1, 1))
scaler_input.fit(X_train)
X_train = scaler_input.transform(X_train)
X_test = scaler_input.transform(X_test)

mean_ret = np.mean(y_train)  # Training-sample mean of the target; useful for computing performance (R2)

scaler_output = MinMaxScaler(feature_range=(-1, 1))
y_train = y_train.values.reshape(len(y_train), 1)
y_test = y_test.values.reshape(len(y_test), 1)
scaler_output.fit(y_train)
y_train = scaler_output.transform(y_train)
y_test = scaler_output.transform(y_test)
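The comment on `mean_ret` points to an R2-type performance measure. One common convention, assumed here rather than taken from the notebook, is an out-of-sample R2 that compares the model's squared errors with those of a naive forecast equal to the training-sample mean. The function name `r2_vs_train_mean` and the toy numbers below are hypothetical.

```python
import numpy as np


def r2_vs_train_mean(y_true, y_pred, train_mean):
    """Out-of-sample R2 against a constant forecast equal to the training mean.

    Assumed definition: 1 - SSE(model) / SSE(benchmark). All inputs must be in
    the same units, so predictions made on the scaled target should first be
    mapped back, e.g. with scaler_output.inverse_transform(...).
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    sse_model = np.sum((y_true - y_pred) ** 2)
    sse_benchmark = np.sum((y_true - train_mean) ** 2)
    return 1.0 - sse_model / sse_benchmark


# Toy illustration with made-up numbers
y_true = np.array([0.10, -0.05, 0.02, 0.08])
train_mean = 0.03

r2_naive = r2_vs_train_mean(y_true, np.full_like(y_true, train_mean), train_mean)
r2_perfect = r2_vs_train_mean(y_true, y_true, train_mean)
```

Under this definition a value of zero means the model does no better than always predicting the training mean, and negative values mean it does worse.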

**MLP Model and Training**

- **Activation function** - The rectified linear unit (**ReLU**).

- **Hidden layers and units within layers** - A total of 3 hidden layers with 50, 30, and 10 units respectively, in order from the input layer.

- **Output layer** - A fully connected layer for the output.

- **Learning rate** - A learning rate of $10^{-5}$. 

- **Optimizer** - The Adam optimizer.

- **Loss function** - Unlike the linear regression case, we select a loss function based on the **mean absolute error (MAE)**:
$$
\begin{equation*}
    L(y, \hat{y}) = \frac{1}{n}\sum_{i=1}^n |y_i - \hat{y}_i| 
\end{equation*}
$$
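To make the formula concrete, the following sketch evaluates the MAE on a small made-up example and, for comparison, the mean squared error on the same data. The arrays are illustrative only. Because MAE grows linearly with the size of an error rather than quadratically, a single large miss weighs less on it than on the MSE.

```python
import numpy as np

# Made-up targets and predictions, for illustration only
y = np.array([0.2, -0.1, 0.4, 0.0])
y_hat = np.array([0.1, 0.1, 0.4, -0.3])

abs_errors = np.abs(y - y_hat)  # |y_i - y_hat_i|: roughly 0.1, 0.2, 0.0, 0.3
mae = abs_errors.mean()         # (1/n) * sum of absolute errors
mse = ((y - y_hat) ** 2).mean() # mean squared error, for comparison

# Share of each loss attributable to the largest single error
mae_share = abs_errors.max() / abs_errors.sum()
mse_share = (abs_errors.max() ** 2) / (abs_errors ** 2).sum()
```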

In [21]:
import tensorflow as tf

tf.random.set_seed(12345)

act_fun = "relu"  # Activation function
hp_units = 50  # Units in the first hidden layer
hp_units_2 = 30  # Units in the second hidden layer
hp_units_3 = 10  # Units in the third hidden layer

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(units=hp_units, activation=act_fun))
model.add(tf.keras.layers.Dense(units=hp_units_2, activation=act_fun))
model.add(tf.keras.layers.Dense(units=hp_units_3, activation=act_fun))
model.add(tf.keras.layers.Dense(1))

hp_lr = 1e-5

adam = tf.keras.optimizers.Adam(learning_rate=hp_lr)
model.compile(optimizer=adam, loss="mean_absolute_error")

Once we have defined our model, we can train it.

In [22]:
model.fit(X_train, y_train, epochs=30, batch_size=32, verbose=2)

Epoch 1/30
478/478 - 2s - 5ms/step - loss: 0.1380
Epoch 2/30
478/478 - 1s - 1ms/step - loss: 0.1308
Epoch 3/30
478/478 - 1s - 1ms/step - loss: 0.1283
Epoch 4/30
478/478 - 1s - 2ms/step - loss: 0.1274
Epoch 5/30
478/478 - 1s - 1ms/step - loss: 0.1269
Epoch 6/30
478/478 - 1s - 1ms/step - loss: 0.1265
Epoch 7/30
478/478 - 1s - 2ms/step - loss: 0.1262
Epoch 8/30
478/478 - 1s - 1ms/step - loss: 0.1259
Epoch 9/30
478/478 - 1s - 1ms/step - loss: 0.1256
Epoch 10/30
478/478 - 1s - 1ms/step - loss: 0.1254
Epoch 11/30
478/478 - 1s - 1ms/step - loss: 0.1251
Epoch 12/30
478/478 - 1s - 1ms/step - loss: 0.1249
Epoch 13/30
478/478 - 1s - 1ms/step - loss: 0.1247
Epoch 14/30
478/478 - 1s - 1ms/step - loss: 0.1245
Epoch 15/30
478/478 - 1s - 1ms/step - loss: 0.1243
Epoch 16/30
478/478 - 1s - 1ms/step - loss: 0.1240
Epoch 17/30
478/478 - 1s - 1ms/step - loss: 0.1238
Epoch 18/30
478/478 - 1s - 1ms/step - loss: 0.1235
Epoch 19/30
478/478 - 1s - 1ms/step - loss: 0.1232
Epoch 20/30
478/478 - 1s - 1ms/step - lo

<keras.src.callbacks.history.History at 0x2e6cf79eea0>

In [23]:
model.summary()

In the first layer of the model, there are 800 parameters. This equals the number of units in the layer (50) times the number of inputs (15), because there is one weight for each input-unit pair, plus one bias term $b$ per unit (50). Thus, $15 \times 50 + 50 = 800$.

Where does the number of $1530$ parameters from the second layer come from?
This is equal to the number of "inputs" to this layer, which is the number of units in the previous layer (50), times the number of units in the layer (30), plus the bias term for each unit (30). Thus, $50 \times 30 + 30 = 1530$.
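The same rule applies to every fully connected layer, so the counts can be checked with a few lines of arithmetic. The sketch below uses the layer sizes defined earlier (15 inputs, then 50, 30, 10, and 1 units); the helper `dense_param_count` is an illustrative name, not a Keras function.

```python
def dense_param_count(n_inputs: int, n_units: int) -> int:
    """Weights (one per input-unit pair) plus one bias per unit."""
    return n_inputs * n_units + n_units


n_features = 15                # inputs to the first layer
layer_units = [50, 30, 10, 1]  # three hidden layers and the output layer

counts = []
n_in = n_features
for units in layer_units:
    counts.append(dense_param_count(n_in, units))
    n_in = units  # this layer's units are the next layer's inputs

print(counts)       # [800, 1530, 310, 11]
print(sum(counts))  # 2651
```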

**Validation and Early stopping**

- **How many epochs should we use when training our model?** - The number of epochs we have used so far (30) is discretionary. We are going to change this by including **Early stopping** in our training, that is, instructing Keras to stop model training when some condition is met. This is done via the **callback API** in Keras.

- **When should we stop training?** - After each epoch of the training process, we will check whether (and by how much) the loss function in the validation set decreases. We'll also define a parameter, **patience**, that indicates the number of epochs with no improvement in the validation set that we tolerate before stopping training.

- Defining the characteristics of Early stopping:

1. **The quantity/set to monitor**: in our case the validation set loss function.

2. **The 'mode'**: by setting this to 'min' we ensure training will stop when the quantity set in (1) has stopped decreasing.

3. **Patience**: we will allow for 10 epochs with no improvement in minimizing the loss function of the validation set before we stop training. 

4. **restore_best_weights**: because training is iterative, the last epoch before training stops may not yield the model weights that achieve the lowest loss in validation. By setting this option to 'True' we ensure that we keep the weights that achieved the best (lowest) validation loss.
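The interaction between patience and restoring the best weights can be illustrated without training a model. The sketch below is a simplified simulation of the stopping rule on a made-up sequence of validation losses. It is not the Keras implementation and ignores further options such as `min_delta`; the function name `simulate_early_stopping` is hypothetical.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a list of validation losses.

    Simplified rule, assuming mode 'min': stop once `patience` consecutive
    epochs pass without a new lowest validation loss. stop_epoch is None if
    training runs through every epoch. Epochs are numbered from 0.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Made-up validation losses, for illustration only
val_losses = [0.50, 0.40, 0.45, 0.41, 0.42, 0.39]

print(simulate_early_stopping(val_losses, patience=2))  # (3, 1)
print(simulate_early_stopping(val_losses, patience=4))  # (None, 5)
```

With a patience of 2, training stops at epoch 3 even though the lowest loss so far was reached at epoch 1, which is the case where restoring the best weights matters. With a patience of 4, training survives long enough to reach the improvement at epoch 5.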