# MLE challenge - Feature engineering

### Notebook 1

In this notebook we compute five features for the **credit risk** dataset.
Each row in the dataset represents a loan granted to a user on a given date.

These features are roughly defined as follows:

**nb_previous_loans:** number of loans granted to a given user, before the current loan.

**avg_amount_loans_previous:** average amount of the loans granted to a user before the current loan.

**age:** user age in years.

**years_on_the_job:** number of years since the user's job start date.

**flag_own_car:** flag that indicates whether the user owns a car.



In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('data/input/dataset_credit_risk.csv')

In [None]:
df.shape

In [None]:
# Parse the dates first so that sorting is chronological rather than lexicographic.
df["loan_date"] = pd.to_datetime(df["loan_date"])
df = df.sort_values(by=["id", "loan_date"]).reset_index(drop=True)
df.head(2)

#### Feature nb_previous_loans

In [None]:
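# Rank each user's loans chronologically (ties broken by row order).
# The first loan has rank 1, so subtracting 1 gives the number of earlier loans.
df_grouped = df.groupby("id")
df["nb_previous_loans"] = df_grouped["loan_date"].rank(method="first") - 1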
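
As a cross-check on the cell above, the same count can be obtained with `cumcount()`, which numbers the rows of each group in their current order. The sketch below uses a small made-up frame (the values are hypothetical, not taken from the dataset) and assumes the rows are already sorted by `id` and `loan_date`.

```python
import pandas as pd

# Toy data (hypothetical values), already sorted by id and loan_date.
toy = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "loan_date": pd.to_datetime(
        ["2020-01-01", "2020-03-01", "2020-06-01", "2019-05-01", "2021-02-01"]
    ),
})

# Rank-based count, as in the notebook cell (returns floats).
by_rank = toy.groupby("id")["loan_date"].rank(method="first") - 1

# cumcount() numbers the rows of each group 0, 1, 2, ... (returns integers).
by_cumcount = toy.groupby("id").cumcount()
```

On sorted data the two approaches agree. `cumcount()` returns integers, whereas the rank-based version returns floats.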

#### Feature avg_amount_loans_previous

In [None]:
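# shift() excludes the current loan; expanding().mean() averages all earlier ones.
# transform() keeps the original index, so the result aligns with df on assignment.
df['avg_amount_loans_previous'] = (
    df.groupby('id')['loan_amount'].transform(lambda x: x.shift().expanding().mean())
)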
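
To see what the shift-then-expanding-mean pattern produces, here is a minimal sketch on made-up loan amounts (hypothetical values). A user's first loan has no history, so its average is missing.

```python
import pandas as pd

# Toy data (hypothetical values), sorted by id and chronological order.
toy = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "loan_amount": [100.0, 200.0, 600.0, 50.0, 150.0],
})

# For each user: drop the current loan with shift(), then take the
# running mean of everything that came before it.
prev_avg = toy.groupby("id")["loan_amount"].transform(
    lambda x: x.shift().expanding().mean()
)
```

For user 1 the values are missing, 100 and 150 (the mean of 100 and 200). For user 2 they are missing and 50.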

#### Feature age

In [None]:
from datetime import datetime, date

In [None]:
df['birthday'] = pd.to_datetime(df['birthday'], errors='coerce')


In [None]:
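# Approximate age in whole years as of today; dividing by 365 ignores leap days.
# Rows with an unparseable birthday (NaT) yield a missing age.
df['age'] = (pd.to_datetime('today').normalize() - df['birthday']).dt.days // 365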
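
Because the cell above depends on today's date, its output changes from run to run. A minimal sketch of the same calculation against a fixed reference date is shown below. The name `reference_date`, its value, and the birthdays are all hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical fixed reference date, used instead of "today" for reproducibility.
reference_date = pd.Timestamp("2024-01-01")

# Toy birthdays (hypothetical); the last entry cannot be parsed and becomes NaT.
birthdays = pd.to_datetime(
    pd.Series(["1990-06-15", "2000-01-01", "not a date"]), errors="coerce"
)

# Same approximation as in the notebook: whole days divided by 365.
age = (reference_date - birthdays).dt.days // 365
```

The same pattern would apply to `years_on_the_job`, with `job_start_date` in place of the birthday.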

#### Feature years_on_the_job

In [None]:
df['job_start_date'] = pd.to_datetime(df['job_start_date'], errors='coerce')

In [None]:
# Approximate tenure in whole years as of today; dividing by 365 ignores leap days.
df['years_on_the_job'] = (pd.to_datetime('today').normalize() - df['job_start_date']).dt.days // 365

#### Feature flag_own_car

In [None]:
# Encode 'N' as 0 and every other value (including missing ones) as 1.
df['flag_own_car'] = (df['flag_own_car'] != 'N').astype(int)
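
Note that the encoding above maps every value other than `'N'` to 1, including missing values. The sketch below contrasts this with an explicit mapping. It assumes the raw column only contains `'Y'` and `'N'`, which the notebook does not confirm, and the sample values are made up.

```python
import pandas as pd

# Toy flags (hypothetical values), including one missing entry.
flags = pd.Series(["Y", "N", None, "N"])

# Notebook behaviour: anything that is not 'N' becomes 1, so missing becomes 1.
permissive = (flags != "N").astype(int)

# Explicit mapping (assumes only 'Y'/'N' are valid): missing values stay missing.
strict = flags.map({"Y": 1, "N": 0})
```

Which behaviour is appropriate depends on how missing flags should be treated downstream. The explicit mapping makes such rows visible instead of silently counting them as car owners.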

## Save dataset for model training

In [None]:
df = df[['id', 'age', 'years_on_the_job', 'nb_previous_loans', 'avg_amount_loans_previous', 'flag_own_car', 'status']]


In [None]:
df.to_csv('train_model.csv', index=False)