# Designing investment portfolios

The goal of this notebook is to explore whether an investment portfolio can be designed from the output of machine learning models that predict how individual stocks will perform on the market.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn.metrics import confusion_matrix, classification_report

In [2]:
df = pd.read_csv("../tablas/dataformodel.csv", usecols=['Price_d', 'Price_d+180',
                                                        'quantile_PER', 'var_quantile_PER',
                                                        'quantile_PBC', 'var_quantile_PBC',
                                                        'quantile_ROA',
                                                        'quantile_Dot_dudosos', 'var_quantile_Dot_dudosos',
                                                        'Etiqueta', 'Periodo','Ticker'])

# Treat infinite values as missing, then drop every row with a missing value
df = df.replace([np.inf, -np.inf], np.nan).dropna().reset_index(drop=True)

## Model preparation

In [3]:
trim_seleccionado = '2016Q1'

In [4]:
df_train = df[df.Periodo<trim_seleccionado]
df_test = df[df.Periodo==trim_seleccionado]
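The split above compares period labels as strings. A minimal sketch of why that works, assuming every `Periodo` label has the fixed form `YYYYQn` (the toy periods and the `value` column below are made up for illustration):

```python
import pandas as pd

# Toy data: hypothetical periods, not taken from the notebook's dataset
toy = pd.DataFrame({"Periodo": ["2015Q3", "2015Q4", "2016Q1", "2016Q2"],
                    "value": [1, 2, 3, 4]})
cutoff = "2016Q1"

# String comparison is lexicographic; it matches chronological order
# only because every label has the same fixed-width YYYYQn form
train_mask = toy["Periodo"] < cutoff
test_mask = toy["Periodo"] == cutoff

train_periods = toy.loc[train_mask, "Periodo"].tolist()
test_periods = toy.loc[test_mask, "Periodo"].tolist()
print(train_periods, test_periods)
```

Quarters after the cutoff fall in neither set, so the model never sees data from the future of the test quarter.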

In [5]:
X_train=df_train[['quantile_PER', 'var_quantile_PER',
                  'quantile_PBC', 'var_quantile_PBC',
                  'quantile_ROA',
                  'quantile_Dot_dudosos', 'var_quantile_Dot_dudosos']].values
y_train=df_train['Etiqueta'].values

In [6]:
len(X_train)

564

In [7]:
X_test=df_test[['quantile_PER', 'var_quantile_PER',
                  'quantile_PBC', 'var_quantile_PBC',
                  'quantile_ROA',
                  'quantile_Dot_dudosos', 'var_quantile_Dot_dudosos']].values
y_test=df_test['Etiqueta'].values

In [8]:
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import recall_score
from sklearn.metrics import make_scorer

### K-nearest neighbors

In [9]:
# Load the library
from sklearn.neighbors import KNeighborsClassifier
# Create an instance of the classifier
clfk=KNeighborsClassifier(n_neighbors=5)
# Fit the data
clfk.fit(X_train,y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')

In [10]:
from sklearn.model_selection import GridSearchCV

In [11]:
knGrid = GridSearchCV(clfk,cv=5,scoring="accuracy",param_grid={'n_neighbors':np.arange(1,20)})
knGrid.fit(X_train,y_train)
knGrid.best_params_

{'n_neighbors': 3}

In [12]:
best_param=knGrid.best_params_.get('n_neighbors')

In [13]:
clfk=KNeighborsClassifier(n_neighbors=best_param)
clfk.fit(X_train,y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=3, p=2,
                     weights='uniform')
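With its default `refit=True`, `GridSearchCV` refits the best configuration on the whole training set, so the fitted model is also available as `best_estimator_`. The sketch below illustrates this on made-up, deterministic data; the toy `X`, `y` and the parameter range are assumptions, not the notebook's data:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Toy data: one feature, label decided by its sign (illustrative only)
X = np.linspace(-1, 1, 60).reshape(-1, 1)
y = np.where(X[:, 0] > 0, "Mejor", "Peor")

grid = GridSearchCV(KNeighborsClassifier(), cv=5, scoring="accuracy",
                    param_grid={"n_neighbors": np.arange(1, 10)})
grid.fit(X, y)

# Already fitted on all of X, y with the winning n_neighbors
best_model = grid.best_estimator_
print(grid.best_params_, grid.best_score_)
```

`best_score_` is the mean cross-validated score of the winning configuration, which is the same quantity a separate `cross_val_score(...).mean()` call would estimate with the same folds.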

In [14]:
cross_val_score(clfk,X_train,y_train,cv=5,scoring="accuracy").mean()

0.471665613147914

In [15]:
print(classification_report(y_test,clfk.predict(X_test)))

              precision    recall  f1-score   support

       Mejor       0.67      0.67      0.67        18
        Peor       0.57      0.57      0.57        14

    accuracy                           0.62        32
   macro avg       0.62      0.62      0.62        32
weighted avg       0.62      0.62      0.62        32



In [16]:
clfk.score(X_test, y_test, sample_weight=None)

0.625
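As a quick consistency check, the accuracy can be rebuilt from the classification report. The sketch below assumes the reported recalls are rounded values and recovers integer counts from them:

```python
# Supports and (rounded) recalls read from the classification report
support_mejor, support_peor = 18, 14
correct_mejor = round(0.67 * support_mejor)  # correctly predicted 'Mejor'
correct_peor = round(0.57 * support_peor)    # correctly predicted 'Peor'

# Accuracy = correct predictions / all test records
accuracy = (correct_mejor + correct_peor) / (support_mejor + support_peor)
print(correct_mejor, correct_peor, accuracy)
```

The result agrees with the value returned by `clfk.score`.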

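We build a DataFrame with the probability the model assigns to each record in the test set. `predict_proba` orders its columns as in `clfk.classes_`, which is sorted, so column 0 holds the probability of 'Mejor'.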

In [17]:
df_prob=pd.DataFrame({'Probabilidad':clfk.predict_proba(X_test)[:,0],'Prediction':clfk.predict(X_test),'Actual':y_test})
df_prob.index=df_test.index
len(df_prob)

32
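Which class a `predict_proba` column refers to is given by the classifier's `classes_` attribute. A minimal sketch on made-up data (the toy `X` and `y` are assumptions for illustration) that looks the column up by label instead of hard-coding its position:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy, made-up data: one feature, two well-separated groups
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = np.array(["Peor", "Peor", "Peor", "Mejor", "Mejor", "Mejor"])

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# classes_ is sorted, so 'Mejor' comes before 'Peor'
classes = list(clf.classes_)
proba = clf.predict_proba(np.array([[1.05]]))

# Select the column by label rather than by a fixed index
p_mejor = proba[0, classes.index("Mejor")]
print(classes, p_mejor)
```

With uniform weights and 3 neighbors, each probability can only be 0, 1/3, 2/3 or 1, which is why the values in the notebook's table are so coarse.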

We select the 10 records with the highest probability of being classified as 'Mejor' and the 10 with the highest probability of being classified as 'Peor'.

In [18]:
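# 'Probabilidad' is P('Mejor'): ascending order puts the likeliest 'Peor' first
ordenado = df_prob.sort_values('Probabilidad')
worst = ordenado.head(10)   # lowest P('Mejor'): likeliest 'Peor'
best = ordenado.tail(10)    # highest P('Mejor'): likeliest 'Mejor'
# DataFrame.append was removed in pandas 2.0; pd.concat keeps the same row order
cartera = pd.concat([worst, best])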

In [19]:
cartera

,Probabilidad,Prediction,Actual
876,0.0,Peor,Peor
160,0.0,Peor,Mejor
218,0.0,Peor,Peor
905,0.333333,Peor,Peor
470,0.333333,Peor,Peor
325,0.333333,Peor,Mejor
296,0.333333,Peor,Mejor
586,0.333333,Peor,Mejor
268,0.333333,Peor,Peor
189,0.333333,Peor,Peor
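The two ends of the probability ranking can also be picked with `nlargest` and `nsmallest`. This is only a sketch on hypothetical probabilities (the toy values and index labels are made up), not the notebook's selection:

```python
import pandas as pd

# Hypothetical probabilities of the 'Mejor' class for six test rows
toy_prob = pd.DataFrame({"Probabilidad": [0.0, 1.0, 0.33, 0.67, 1.0, 0.0]},
                        index=[10, 11, 12, 13, 14, 15])
n = 2

likely_mejor = toy_prob.nlargest(n, "Probabilidad")   # highest P('Mejor')
likely_peor = toy_prob.nsmallest(n, "Probabilidad")   # lowest P('Mejor')
selection = pd.concat([likely_mejor, likely_peor])
print(selection)
```

Because a 3-neighbor classifier yields only four distinct probability values, many rows tie, and which tied rows make the cut depends on row order rather than on the model.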


Evaluating the selected portfolio

In [20]:
def resultado(row):
    """Label a row as a true/false positive/negative, with 'Mejor' as the positive class."""
    if row['Prediction']=='Mejor' and row['Actual']=='Mejor':
        return 'True positive'
    elif row['Prediction']=='Peor' and row['Actual']=='Peor':
        return 'True negative'
    elif row['Prediction']=='Peor' and row['Actual']=='Mejor':
        return 'False negative'
    else:
        return 'False positive'

In [21]:
cartera['Resultado']=cartera.apply(resultado,axis=1)
cartera.groupby('Resultado').count()

Resultado,Probabilidad,Prediction,Actual
False negative,4,4,4
False positive,4,4,4
True negative,6,6,6
True positive,6,6,6
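The same four counts can be read from a cross-tabulation of actual against predicted labels. A minimal sketch on made-up labels (the toy rows are assumptions, not the portfolio's records):

```python
import pandas as pd

# Hypothetical predictions and outcomes for five positions
toy = pd.DataFrame({
    "Prediction": ["Mejor", "Mejor", "Peor", "Peor", "Peor"],
    "Actual":     ["Mejor", "Peor",  "Peor", "Mejor", "Peor"],
})

# Rows are actual labels, columns are predicted labels
table = pd.crosstab(toy["Actual"], toy["Prediction"])
print(table)
```

With 'Mejor' as the positive class, `table.loc["Mejor", "Mejor"]` counts true positives and `table.loc["Peor", "Mejor"]` counts false positives.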


Calculating potential gains and losses

In [22]:
# Attach the prices of each stock in our portfolio

cartera = cartera.join(df_test)

#df_valores.drop(['Precio'], axis=1, inplace=True)

#df_valores.rename(columns={'Price_b100':'Price_d+180_100'}, inplace=True)

In [23]:
res_cartera=cartera[['Prediction','Price_d','Price_d+180']]
res_cartera

,Prediction,Price_d,Price_d+180
876,Peor,42.43,40.21
160,Peor,15.52,19.46
218,Peor,10.41,10.72
905,Peor,28.15,29.58
470,Peor,107.17,110.15
325,Peor,54.44,59.81
296,Peor,22.34,29.37
586,Peor,17.54,21.04
268,Peor,18.96,19.29
189,Peor,22.84,24.74


In [24]:
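def resultado(row):
    """Fractional return of a position: short for 'Peor', long for 'Mejor'.

    Note: this redefines the earlier `resultado` labelling function.
    """
    if row['Prediction'] == 'Peor':
        return (row['Price_d'] - row['Price_d+180'])/row['Price_d']
    else:
        return (row['Price_d+180'] - row['Price_d'])/row['Price_d']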

In [25]:
inversion = 1000  # amount invested in each position
# assign returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
res_cartera = res_cartera.assign(Resultado=res_cartera.apply(resultado, axis=1) * inversion)


In [26]:
res_cartera

,Prediction,Price_d,Price_d+180,Resultado
876,Peor,42.43,40.21,52.321471
160,Peor,15.52,19.46,-253.865979
218,Peor,10.41,10.72,-29.779059
905,Peor,28.15,29.58,-50.79929
470,Peor,107.17,110.15,-27.806289
325,Peor,54.44,59.81,-98.640705
296,Peor,22.34,29.37,-314.682184
586,Peor,17.54,21.04,-199.5439
268,Peor,18.96,19.29,-17.405063
189,Peor,22.84,24.74,-83.187391


In [27]:
res_cartera.Resultado.sum()

215.0299657157165
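The per-position result is the relative price change over 180 days, with its sign flipped for 'Peor' predictions (treated as short positions), times the amount invested. A vectorised sketch of that rule, using two rows from the notebook's results table; the helper names `ret` and `sign` are illustrative:

```python
import numpy as np
import pandas as pd

# Two rows taken from the notebook's results table
toy = pd.DataFrame({"Prediction": ["Peor", "Peor"],
                    "Price_d": [42.43, 15.52],
                    "Price_d+180": [40.21, 19.46]},
                   index=[876, 160])
inversion = 1000

# Relative price change between day d and day d+180
ret = (toy["Price_d+180"] - toy["Price_d"]) / toy["Price_d"]
# A 'Peor' prediction is treated as a short position, so its sign is flipped
sign = np.where(toy["Prediction"] == "Peor", -1.0, 1.0)
toy["Resultado"] = sign * ret * inversion
print(toy)
```

The values agree with the `Resultado` column computed row by row with `apply`.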