# Investment portfolio design

This notebook explores whether an investment portfolio can be designed from the output of machine learning models that predict the stock market behavior of securities.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn.metrics import confusion_matrix, classification_report

import warnings
warnings.filterwarnings("ignore")

In [2]:
df = pd.read_csv("../tablas/dataformodel.csv", usecols=['Price_d', 'Price_d+180',
                                                        'quantile_PER', 'var_quantile_PER',
                                                        'quantile_PBC', 'var_quantile_PBC',
                                                        'quantile_ROA',
                                                        'quantile_Dot_dudosos', 'var_quantile_Dot_dudosos',
                                                        'Etiqueta', 'Periodo','Ticker'])

# Treat infinite values as missing, then drop every row with a missing value
df = df.replace([np.inf, -np.inf], np.nan).dropna().reset_index(drop=True)

## Model preparation

In [3]:
trim_seleccionado = '2018Q3'

In [4]:
df_train = df[df.Periodo<trim_seleccionado]
df_test = df[df.Periodo==trim_seleccionado]

In [5]:
X_train=df_train[['quantile_PER', 'var_quantile_PER',
                  'quantile_PBC', 'var_quantile_PBC',
                  'quantile_ROA',
                  'quantile_Dot_dudosos', 'var_quantile_Dot_dudosos']].values
y_train=df_train['Etiqueta'].values

In [6]:
len(X_train)

884

In [7]:
X_test=df_test[['quantile_PER', 'var_quantile_PER',
                  'quantile_PBC', 'var_quantile_PBC',
                  'quantile_ROA',
                  'quantile_Dot_dudosos', 'var_quantile_Dot_dudosos']].values
y_test=df_test['Etiqueta'].values

In [8]:
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import recall_score
from sklearn.metrics import make_scorer

### KNeighbors classifier

In [9]:
# Load the library
from sklearn.neighbors import KNeighborsClassifier
# Create an instance of the classifier
clfk=KNeighborsClassifier(n_neighbors=5)
# Fit the data
clfk.fit(X_train,y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')

In [10]:
from sklearn.model_selection import GridSearchCV

In [11]:
knGrid = GridSearchCV(clfk,cv=5,scoring="accuracy",param_grid={'n_neighbors':np.arange(1,20)})
knGrid.fit(X_train,y_train)
knGrid.best_params_

{'n_neighbors': 13}

In [12]:
best_param=knGrid.best_params_.get('n_neighbors')

In [13]:
clfk=KNeighborsClassifier(n_neighbors=best_param)
clfk.fit(X_train,y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=13, p=2,
                     weights='uniform')

In [14]:
cross_val_score(clfk,X_train,y_train,cv=5,scoring="accuracy").mean()

0.5146892655367232

In [15]:
print(classification_report(y_test,clfk.predict(X_test)))

              precision    recall  f1-score   support

       Mejor       0.55      0.30      0.39        20
        Peor       0.30      0.55      0.39        11

    accuracy                           0.39        31
   macro avg       0.42      0.42      0.39        31
weighted avg       0.46      0.39      0.39        31



In [16]:
clfk.score(X_test, y_test, sample_weight=None)

0.3870967741935484

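We create a DataFrame with the probability the model assigns to each record in the test set. Column 0 of `predict_proba` corresponds to `clfk.classes_[0]`, which is 'Mejor' because scikit-learn sorts the class labels, so `Probabilidad` is the estimated probability of 'Mejor'.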

In [17]:
df_prob=pd.DataFrame({'Probabilidad':clfk.predict_proba(X_test)[:,0],'Prediction':clfk.predict(X_test),'Actual':y_test})
df_prob.index=df_test.index
len(df_prob)

31
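As a minimal sketch of the column-to-class mapping, the following uses hypothetical toy data (`X_toy`, `y_toy` and the two query points are not part of the notebook) to show that the 'Mejor' column can be looked up through `classes_` rather than hard-coded as index 0.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one feature, same two string labels as the notebook
X_toy = np.array([[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]])
y_toy = np.array(['Peor', 'Peor', 'Peor', 'Mejor', 'Mejor', 'Mejor'])

clf = KNeighborsClassifier(n_neighbors=3).fit(X_toy, y_toy)

# classes_ holds the sorted labels; predict_proba columns follow this order
print(clf.classes_)

# Look up the 'Mejor' column by label instead of assuming it is column 0
mejor_col = list(clf.classes_).index('Mejor')
proba_mejor = clf.predict_proba(np.array([[0.05], [0.95]]))[:, mejor_col]
print(proba_mejor)
```

Looking the column up by label keeps the `Probabilidad` column meaningful even if the label names change.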

We select the 10 records with the highest probability of being classified as 'Mejor' and the 10 with the highest probability of being classified as 'Peor'.

In [18]:
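# Probabilidad is P('Mejor'): the lowest values are the most likely 'Peor'
ranked = df_prob.sort_values('Probabilidad')
worst = ranked.head(10)
best = ranked.tail(10)
# DataFrame.append was removed in pandas 2.0; pd.concat keeps the same row order
cartera = pd.concat([worst, best])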

In [19]:
cartera

,Probabilidad,Prediction,Actual
886,0.076923,Peor,Peor
57,0.230769,Peor,Mejor
393,0.230769,Peor,Mejor
199,0.307692,Peor,Peor
683,0.307692,Peor,Mejor
625,0.307692,Peor,Mejor
335,0.307692,Peor,Mejor
915,0.384615,Peor,Peor
828,0.384615,Peor,Mejor
596,0.384615,Peor,Mejor


Evaluation of the selected portfolio

In [20]:
# Label each position, treating 'Mejor' as the positive class
def resultado(row):
    if row['Prediction']=='Mejor' and row['Actual']=='Mejor':
        return 'True positive'
    elif row['Prediction']=='Peor' and row['Actual']=='Peor':
        return 'True negative'
    elif row['Prediction']=='Peor' and row['Actual']=='Mejor':
        return 'False negative'
    else:
        return 'False positive'

In [21]:
cartera['Resultado']=cartera.apply(resultado,axis=1)
cartera.groupby('Resultado').count()

Resultado,Probabilidad,Prediction,Actual
False negative,7,7,7
False positive,4,4,4
True negative,3,3,3
True positive,6,6,6
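The same breakdown can be obtained with `confusion_matrix`, which is imported at the top of the notebook. The sketch below rebuilds label vectors from the four counts reported above (the `counts` dictionary and the derived arrays are illustrative, not the notebook's own variables) and treats 'Mejor' as the positive class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# (actual, predicted) -> number of positions, taken from the table above
counts = {
    ('Mejor', 'Mejor'): 6,  # true positives
    ('Mejor', 'Peor'): 7,   # false negatives
    ('Peor', 'Mejor'): 4,   # false positives
    ('Peor', 'Peor'): 3,    # true negatives
}
actual = np.array([a for (a, p), n in counts.items() for _ in range(n)])
predicted = np.array([p for (a, p), n in counts.items() for _ in range(n)])

# Rows are actual labels, columns are predicted labels, both in `labels` order
cm = confusion_matrix(actual, predicted, labels=['Mejor', 'Peor'])
tp, fn, fp, tn = cm.ravel()

precision = tp / (tp + fp)    # share of 'Mejor' picks that were right
recall = tp / (tp + fn)       # share of actual 'Mejor' that were picked
accuracy = (tp + tn) / cm.sum()
print(cm)
print(precision, recall, accuracy)
```

With these counts, 9 of the 20 positions are classified correctly, so the portfolio accuracy is 0.45.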


Calculating potential gains/losses

In [22]:
# Bring in the prices of each security in our portfolio

cartera = cartera.join(df_test)

#df_valores.drop(['Precio'], axis=1, inplace=True)

#df_valores.rename(columns={'Price_b100':'Price_d+180_100'}, inplace=True)

In [23]:
# .copy() so that adding a column later does not write to a slice of cartera
res_cartera=cartera[['Prediction','Price_d','Price_d+180']].copy()
res_cartera

,Prediction,Price_d,Price_d+180
886,Peor,49.55,45.79
57,Peor,87.19,99.38
393,Peor,103.92,112.44
199,Peor,35.69,33.95
683,Peor,247.36,250.57
625,Peor,123.26,132.06
335,Peor,73.58,79.31
915,Peor,38.87,38.31
828,Peor,61.09,68.24
596,Peor,27.2,28.58


In [24]:
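def resultado(row):
    """Fractional return over 180 days: short if predicted 'Peor', long otherwise."""
    if row['Prediction'] == 'Peor':
        # Short position: gains when the price falls
        return (row['Price_d'] - row['Price_d+180'])/row['Price_d']
    else:
        # Long position: gains when the price rises
        return (row['Price_d+180'] - row['Price_d'])/row['Price_d']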

In [25]:
# Amount invested in each position; Resultado is the gain or loss on that amount
inversion = 1000
res_cartera['Resultado']=res_cartera.apply(resultado,axis=1)*inversion

In [26]:
res_cartera

,Prediction,Price_d,Price_d+180,Resultado
886,Peor,49.55,45.79,75.882947
57,Peor,87.19,99.38,-139.809611
393,Peor,103.92,112.44,-81.986143
199,Peor,35.69,33.95,48.753152
683,Peor,247.36,250.57,-12.977038
625,Peor,123.26,132.06,-71.393802
335,Peor,73.58,79.31,-77.874422
915,Peor,38.87,38.31,14.406998
828,Peor,61.09,68.24,-117.040432
596,Peor,27.2,28.58,-50.735294


In [27]:
res_cartera.Resultado.sum()

-315.8447204942669
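The per-position result can also be computed without a row-wise `apply`. The following is a minimal vectorized sketch using only the ten 'Peor' positions listed in the tables above; the names `res` and `direction` are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

# The ten short ('Peor') positions shown in the tables above
res = pd.DataFrame({
    'Prediction': ['Peor'] * 10,
    'Price_d': [49.55, 87.19, 103.92, 35.69, 247.36,
                123.26, 73.58, 38.87, 61.09, 27.20],
    'Price_d+180': [45.79, 99.38, 112.44, 33.95, 250.57,
                    132.06, 79.31, 38.31, 68.24, 28.58],
}, index=[886, 57, 393, 199, 683, 625, 335, 915, 828, 596])

inversion = 1000  # amount invested per position, as in the notebook

# -1 for short positions ('Peor'), +1 for long positions ('Mejor')
direction = np.where(res['Prediction'] == 'Peor', -1, 1)

# Signed fractional return times the amount invested
price_change = (res['Price_d+180'] - res['Price_d']) / res['Price_d']
res['Resultado'] = direction * price_change * inversion

print(res['Resultado'].round(2))
print(res['Resultado'].sum())
```

For these ten rows alone the sum is about -412.77; the -315.84 reported above covers all 20 positions in the portfolio.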