# ML Approach

Manual decision rules are useful, but they only get you so far.
When combinations of different variables matter, the space of rules to test by hand becomes huge.

**Idea**: Generate a carefully selected set of features and train a very light model to "learn" decision rules.

**Idea**: Our features will probably not be able to explain all the rejections, which results in a lot of "noise" in the labels. To make the model a useful component, an asymmetric loss that penalizes incorrect rejections more heavily could be viable.
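One way to read the asymmetric-loss idea is as a weighted log loss in which samples whose true label is Accept count more, so that rejecting them by mistake is expensive. The sketch below is only an illustration: the function name `asymmetric_log_loss` and the cost factor of 5 are assumptions, not values taken from this notebook.

```python
import math

def asymmetric_log_loss(y_true, p_accept, false_reject_cost=5.0):
    """Weighted mean log loss; true accepts (label 1) weigh `false_reject_cost` times more.

    y_true:   iterable of 0 (Reject) / 1 (Accept) ground-truth labels
    p_accept: iterable of predicted probabilities for the Accept class
    """
    eps = 1e-12  # keep log() away from 0
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_accept):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            weight = false_reject_cost       # hypothetical cost of a wrong rejection
            total += -weight * math.log(p)
        else:
            weight = 1.0
            total += -weight * math.log(1.0 - p)
        weight_sum += weight
    return total / weight_sum

# Two toy predictors that make mirror-image mistakes on uncertain samples.
y_true = [1, 1, 0, 0]
lenient = [0.9, 0.6, 0.6, 0.2]  # leans toward accepting
strict = [0.9, 0.4, 0.4, 0.2]   # leans toward rejecting

loss_lenient = asymmetric_log_loss(y_true, lenient)
loss_strict = asymmetric_log_loss(y_true, strict)
```

With a cost factor of 1 the two toy predictors score the same; with a larger factor the one that leans toward rejecting is penalized more. In scikit-learn a similar effect can usually be obtained through `class_weight` or `sample_weight` rather than a hand-written loss.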

**Idea**: Boosting. Only train on the samples that we cannot reject with our manual rules.


### Crafting the dataset

In [21]:
import my_parser
import my_classifier
import pandas as pd

all_data = my_parser.get_all()
all_predictions = my_classifier.predict(all_data)
all_labels = [my_classifier.label_map[item['label']['label']] for item in all_data]

In [17]:
indices_accepted = [i for i, pred in enumerate(all_predictions) if pred == 1]
labels = [all_labels[i] for i in indices_accepted]
data = [all_data[i] for i in indices_accepted]
print(len(data))

7454


In [34]:
def get_investment_horizon(x):
    # Map any missing or unexpected value to 'Short' so the feature keeps three levels.
    horizon = x['client_profile']['investment_horizon']
    if horizon not in ['Medium', 'Short', 'Long-Term']:
        horizon = 'Short'
    return horizon

In [42]:
feature_functions = {
    'risk_profile': lambda x: x['client_profile']['investment_risk_profile'],
    'investment_horizon': get_investment_horizon,
    'investment_experience': lambda x: x['client_profile']['investment_experience'],
    'type_of_mandate': lambda x: x['client_profile']['type_of_mandate']
}

feature_data = []
for entry in data:
    mapped_entry = {key: func(entry) for key, func in feature_functions.items()}
    feature_data.append(mapped_entry)

df = pd.DataFrame(feature_data)
df.head()

Unnamed: 0,risk_profile,investment_horizon,investment_experience,type_of_mandate
0,Considerable,Long-Term,Inexperienced,Advisory
1,High,Medium,Inexperienced,Advisory
2,Moderate,Medium,Expert,Advisory
3,Considerable,Long-Term,Inexperienced,Advisory
4,Considerable,Long-Term,Experienced,Advisory


In [43]:
df.describe(include=['object'])

Unnamed: 0,risk_profile,investment_horizon,investment_experience,type_of_mandate
count,7454,7454,7454,7454
unique,8,3,3,5
top,Moderate,Medium,Experienced,Advisory
freq,3130,3677,4073,3714


In [45]:
# Ensure that each feature is sensible
#print(df['risk_profile'].value_counts())
#print(df['investment_horizon'].value_counts())
#print(df['investment_experience'].value_counts())
print(df['type_of_mandate'].value_counts())

type_of_mandate
Advisory          3714
Discretionary     3661
                    36
Hybrid              24
Execution-Only      19
Name: count, dtype: int64


### Model

In [14]:
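# Placeholder: no model is trained yet. Using the ground-truth labels as
# predictions gives an upper bound on what the combined pipeline can reach.
predictions = labels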
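
The `predictions = labels` cell stands in for a model. A minimal sketch of the "very light model" idea could one-hot encode the four categorical features and fit a shallow decision tree, as below. The toy frame and labels are invented stand-ins for `df` and `labels`, and the tree depth and class weights are assumptions rather than tuned values.

```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data with the same columns as `df` (not real client data).
toy_df = pd.DataFrame({
    "risk_profile": ["Moderate", "High", "Moderate", "Considerable",
                     "High", "Moderate", "Considerable", "High"],
    "investment_horizon": ["Medium", "Short", "Long-Term", "Long-Term",
                           "Medium", "Short", "Medium", "Long-Term"],
    "investment_experience": ["Experienced", "Inexperienced", "Expert", "Inexperienced",
                              "Experienced", "Expert", "Experienced", "Inexperienced"],
    "type_of_mandate": ["Advisory", "Discretionary", "Advisory", "Advisory",
                        "Discretionary", "Hybrid", "Discretionary", "Advisory"],
})
toy_labels = [1, 0, 1, 1, 0, 1, 1, 0]  # 0 = Reject, 1 = Accept

model = make_pipeline(
    # Categories unseen during fit are encoded as all zeros instead of raising.
    OneHotEncoder(handle_unknown="ignore"),
    # A shallow tree stays readable as decision rules; weighting class 1 more
    # makes wrongly rejecting a true Accept costlier (assumed weights).
    DecisionTreeClassifier(max_depth=3, class_weight={0: 1.0, 1: 5.0}, random_state=0),
)
model.fit(toy_df, toy_labels)
toy_predictions = model.predict(toy_df).tolist()
```

On the real data the model would be fitted on a training split of `df` and `labels` and evaluated on a held-out split; predicting on the training rows, as done here for brevity, overstates performance.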

### Evaluation

In [18]:
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    confusion_matrix,
    classification_report
)

#def combined_model(X):
#    y = my_classifier.predict(all_data) 
# 

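# Stage 1: keep every rejection made by the manual rules, default the rest to accept.
combined_predictions = [0 if pred == 0 else 1 for pred in all_predictions]
# Stage 2: overwrite the samples the rules accepted with the second-stage predictions.
for j, prediction in zip(indices_accepted, predictions):
    combined_predictions[j] = prediction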


In [19]:
print("📊 Classification Report:\n")
print(classification_report(all_labels, combined_predictions, target_names=["Reject", "Accept"]))

📊 Classification Report:

              precision    recall  f1-score   support

      Reject       1.00      1.00      1.00      4992
      Accept       1.00      1.00      1.00      5008

    accuracy                           1.00     10000
   macro avg       1.00      1.00      1.00     10000
weighted avg       1.00      1.00      1.00     10000
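
The report rounds to two decimals, so a small number of incorrectly rejected samples would not show up in it. Since those are the errors the notes care about most, it may help to count them directly from the confusion matrix. The helper name `false_rejection_rate` and the toy labels below are illustrative assumptions.

```python
from sklearn.metrics import confusion_matrix

def false_rejection_rate(y_true, y_pred):
    """Share of true Accept samples (label 1) that were predicted Reject (label 0)."""
    # Rows are true labels, columns are predictions, both ordered [0, 1].
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    # With Accept as the positive class, `fn` counts incorrectly rejected samples.
    return float(fn) / (fn + tp) if (fn + tp) else 0.0

# Toy example: four true accepts, one of which is rejected by mistake.
toy_true = [1, 1, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 1]
rate = false_rejection_rate(toy_true, toy_pred)
```

In the notebook this would be called as `false_rejection_rate(all_labels, combined_predictions)`, assuming labels are encoded as 0 for Reject and 1 for Accept, as the `target_names` order suggests.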

