# Create New Features

This notebook implements cyclical encoding to transform raw dates into periodic features. A standard integer representation (1–12 for months) fails to show that December and January are adjacent. By applying sine and cosine transformations, we map each time value onto the unit circle, which preserves chronological continuity across the end of each cycle.

**Mathematical Approach: Cyclical Temporal Encoding**

To help the LSTM model interpret the periodic nature of time, we transform discrete temporal variables into a continuous two-dimensional space. For a given temporal feature $T$ with a known period $P$, the encoding is defined as follows:

$$
\begin{cases} 
x_{\text{sin}} = \sin\left(\frac{2\pi \cdot T}{P}\right) \\
x_{\text{cos}} = \cos\left(\frac{2\pi \cdot T}{P}\right) 
\end{cases}
$$

where the range of $T$ and the period $P$ depend on the temporal cycle:
* **Day of the Week:** $T \in \{0, 1, \dots, 6\}$ (0 = Monday, 6 = Sunday) and $P = 7$
* **Month of the Year:** $T \in \{1, 2, \dots, 12\}$ and $P = 12$
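
The adjacency argument can be checked numerically. The sketch below is illustrative only: `encode` is a hypothetical helper, not part of the notebook, that applies the formula above to a single value. It compares the gap between December and January in the integer representation with the Euclidean distance between their encoded points.

```python
import numpy as np


def encode(t, period):
    """Map a cyclic value t with the given period to (sin, cos) on the unit circle."""
    angle = 2 * np.pi * t / period
    return np.array([np.sin(angle), np.cos(angle)])


# Integer months: December (12) and January (1) look 11 steps apart.
int_gap_dec_jan = abs(12 - 1)

# Encoded months: neighbouring months are equally far apart everywhere
# on the circle, including across the year boundary.
gap_dec_jan = np.linalg.norm(encode(12, 12) - encode(1, 12))
gap_jan_feb = np.linalg.norm(encode(1, 12) - encode(2, 12))

# Months half a year apart sit on opposite sides of the circle.
gap_jan_jul = np.linalg.norm(encode(1, 12) - encode(7, 12))

print(int_gap_dec_jan, gap_dec_jan, gap_jan_feb, gap_jan_jul)
```

In the encoded space, December to January is the same distance as January to February, while January and July are separated by the full diameter of the circle.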

### Imports and Data Loading

In [1]:
# Libraries
import pandas as pd
import numpy as np

# Load the data
x_train = pd.read_csv('train_f_x.csv')
y_train = pd.read_csv('y_train_sncf.csv')
x_test = pd.read_csv('x_test.csv')

# Verification
print("x_train before: ", x_train.head())
print("x_test before: ", x_test.head())

x_train before:           date station  job  ferie  vacances
0  2015-01-01     1J7    1      1         1
1  2015-01-01     O2O    1      1         1
2  2015-01-01     8QR    1      1         1
3  2015-01-01     UMC    1      1         1
4  2015-01-01     FK3    1      1         1
x_test before:              index        date station  job  ferie  vacances
0  2023-01-01_1J7  2023-01-01     1J7    0      1         1
1  2023-01-01_O2O  2023-01-01     O2O    0      1         1
2  2023-01-01_8QR  2023-01-01     8QR    0      1         1
3  2023-01-01_L58  2023-01-01     L58    0      1         1
4  2023-01-01_UMC  2023-01-01     UMC    0      1         1


### Encoding Function

In [2]:
def add_cyclical_features(df):
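    """
    Add sine and cosine encodings of the day of the week and the month
    so that both are represented as cyclical features.

    Returns a new DataFrame; the input is left unmodified.
    """
    # Work on a copy so the caller's DataFrame is not mutated
    df = df.copy()

    # Ensure date is datetime
    df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')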
    
    # 1. Day of the Week (0-6)
    # 0 = Monday, 6 = Sunday
    dow = df['date'].dt.dayofweek
    df['dow_sin'] = np.sin(2 * np.pi * dow / 7)
    df['dow_cos'] = np.cos(2 * np.pi * dow / 7)
    
    # 2. Month of the Year (1-12)
    month = df['date'].dt.month
    df['month_sin'] = np.sin(2 * np.pi * month / 12)
    df['month_cos'] = np.cos(2 * np.pi * month / 12)
    
    return df
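
As a sanity check on the encoding, the sketch below recomputes the four features for two dates taken from the printed samples (the first row of `x_train` and of `x_test`). It does not call `add_cyclical_features`. It repeats the same formulas inline so the block runs on its own, and the `sample` and `features` frames are illustrative names, not objects from the notebook.

```python
import numpy as np
import pandas as pd

# Two dates taken from the printed samples of x_train and x_test.
sample = pd.DataFrame({"date": ["2015-01-01", "2023-01-01"]})
sample["date"] = pd.to_datetime(sample["date"], format="%Y-%m-%d")

dow = sample["date"].dt.dayofweek  # 0 = Monday ... 6 = Sunday
month = sample["date"].dt.month    # 1 = January ... 12 = December

features = pd.DataFrame({
    "dow_sin": np.sin(2 * np.pi * dow / 7),
    "dow_cos": np.cos(2 * np.pi * dow / 7),
    "month_sin": np.sin(2 * np.pi * month / 12),
    "month_cos": np.cos(2 * np.pi * month / 12),
})

# Every (sin, cos) pair should lie on the unit circle.
dow_radius = np.hypot(features["dow_sin"], features["dow_cos"])
month_radius = np.hypot(features["month_sin"], features["month_cos"])

print(dow.tolist())
print(features.round(6))
```

2015-01-01 was a Thursday (`dayofweek` 3) and 2023-01-01 a Sunday (`dayofweek` 6), so the two rows should have different day-of-week features but identical month features.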

### Create the new features

In [3]:
# Add the new features
x_train_new = add_cyclical_features(x_train)
x_test_new = add_cyclical_features(x_test)

# Verification
print("x_train: ", x_train_new.head())
print("x_test: ", x_test_new.head())

x_train:          date station  job  ferie  vacances   dow_sin   dow_cos  month_sin  \
0 2015-01-01     1J7    1      1         1  0.433884 -0.900969        0.5   
1 2015-01-01     O2O    1      1         1  0.433884 -0.900969        0.5   
2 2015-01-01     8QR    1      1         1  0.433884 -0.900969        0.5   
3 2015-01-01     UMC    1      1         1  0.433884 -0.900969        0.5   
4 2015-01-01     FK3    1      1         1  0.433884 -0.900969        0.5   

   month_cos  
0   0.866025  
1   0.866025  
2   0.866025  
3   0.866025  
4   0.866025  
x_test:              index       date station  job  ferie  vacances   dow_sin  dow_cos  \
0  2023-01-01_1J7 2023-01-01     1J7    0      1         1 -0.781831  0.62349   
1  2023-01-01_O2O 2023-01-01     O2O    0      1         1 -0.781831  0.62349   
2  2023-01-01_8QR 2023-01-01     8QR    0      1         1 -0.781831  0.62349   
3  2023-01-01_L58 2023-01-01     L58    0      1         1 -0.781831  0.62349   
4  2023-01-01_UMC 2023-

### Export the new datasets to CSV files

In [None]:
# Export to CSV files
# x_train_new.to_csv('x_train_new.csv', index=False)
# x_test_new.to_csv('x_test_new.csv', index=False)
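
One point to keep in mind when the export is enabled: CSV stores the `date` column as text, so it needs to be parsed again on reload. The sketch below shows a round trip through an in-memory buffer rather than a file on disk. The `frame` DataFrame is a hypothetical one-row stand-in for `x_train_new`, not the real data.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical one-row stand-in for x_train_new (2015-01-01 is a Thursday, dayofweek 3).
frame = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01"]),
    "station": ["1J7"],
    "dow_sin": [np.sin(2 * np.pi * 3 / 7)],
    "dow_cos": [np.cos(2 * np.pi * 3 / 7)],
})

# Write to an in-memory buffer, mirroring to_csv(..., index=False).
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)

# Without parse_dates, the date column would come back as plain strings.
reloaded = pd.read_csv(buffer, parse_dates=["date"])
print(reloaded.dtypes)
```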