## Integer Programming
### Replicating an Index Fund

This notebook shows an example of how to choose a set of stocks to
replicate the IBEX 35. The idea is to choose, for each component of the index,
a representative, which can be the stock itself or a similar stock.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import cvxpy as cp
from collections import defaultdict

### Data
We will use a one-year window of IBEX closing prices to build a
correlation matrix of log returns, which measures how similarly
the stocks behave.

In [2]:
import pickle
with open('../data/stock_data.pkl', 'rb') as handle:
    stock_data = pickle.load(handle)

In [3]:
close_dict = {tk: df.close for tk, df in stock_data.items()}
close_df = pd.DataFrame(close_dict)

In [4]:
close_year = close_df.loc['2019-01-02':'2019-12-31']
close_year = close_year.dropna(axis=1)

In [5]:
returns = np.log(close_year).diff()
stock_corr = returns.corr()

In [6]:
stock_corr

Ticker,SAN,BKIA,NTGY,ACX,FER,ACS,ELE,SAB,AMS,AENA,...,BKT,ENC,IDR,SGRE,MEL,TL5,REE,COL,TEF,CABK
SAN,1.0,0.739687,0.007766,0.576811,0.156128,0.507313,-0.04779,0.747315,0.325301,0.317107,...,0.768438,0.372061,0.394102,0.299583,0.471852,0.31502,-0.051466,0.075263,0.592652,0.724967
BKIA,0.739687,1.0,-0.037723,0.515162,0.050746,0.342619,-0.118702,0.808096,0.185658,0.190344,...,0.826206,0.430492,0.373373,0.233646,0.39394,0.241668,-0.117732,0.030248,0.39161,0.793073
NTGY,0.007766,-0.037723,1.0,-0.000156,0.268261,0.181864,0.643046,-0.069175,0.221182,0.285174,...,-0.006571,-0.003958,0.047715,0.207665,0.032794,0.095785,0.600533,0.281042,0.141321,-0.045406
ACX,0.576811,0.515162,-0.000156,1.0,0.145747,0.380135,-0.076886,0.502492,0.427649,0.236377,...,0.472346,0.42431,0.449797,0.303884,0.517624,0.302951,-0.074667,-0.025816,0.384621,0.510502
FER,0.156128,0.050746,0.268261,0.145747,1.0,0.396293,0.404612,-0.000216,0.272936,0.40403,...,0.041739,0.120055,0.239131,0.202117,0.125389,0.264052,0.21738,0.287724,0.256369,0.025261
ACS,0.507313,0.342619,0.181864,0.380135,0.396293,1.0,0.151146,0.312811,0.406966,0.391628,...,0.380408,0.273991,0.422968,0.370304,0.418018,0.325581,0.140501,0.247896,0.44589,0.300909
ELE,-0.04779,-0.118702,0.643046,-0.076886,0.404612,0.151146,1.0,-0.198834,0.195898,0.258302,...,-0.085288,0.01552,0.017415,0.196946,-0.050329,0.116414,0.623898,0.280932,0.233385,-0.122462
SAB,0.747315,0.808096,-0.069175,0.502492,-0.000216,0.312811,-0.198834,1.0,0.121441,0.200316,...,0.795612,0.359751,0.354379,0.150455,0.37675,0.251105,-0.14811,0.013639,0.379635,0.84603
AMS,0.325301,0.185658,0.221182,0.427649,0.272936,0.406966,0.195898,0.121441,1.0,0.332126,...,0.207229,0.276904,0.426153,0.363843,0.427691,0.25475,0.157691,0.132502,0.318767,0.151253
AENA,0.317107,0.190344,0.285174,0.236377,0.40403,0.391628,0.258302,0.200316,0.332126,1.0,...,0.261332,0.158896,0.269359,0.289383,0.314481,0.254561,0.171544,0.300223,0.38767,0.170134


___

### Proposed Exercise

- Analyze the result of the previous example for different numbers of stocks in the replicating fund.
- Modify the problem so that no stock can represent more than 3 assets at a time.
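
The model built in the cells below can be written compactly as follows. The symbols are introduced here for illustration and are not part of the original notebook: $\rho_{ij}$ is the correlation between stocks $i$ and $j$ (`stock_corr`), $x_{ij} = 1$ when stock $j$ represents stock $i$ (`x`), $y_j = 1$ when stock $j$ is held in the fund (`y`), and $q$ is the fund size (`n_fund`). The cap of 3 corresponds to the second exercise item.

```latex
\begin{aligned}
\max_{x,\,y}\quad & \sum_{i=1}^{n}\sum_{j=1}^{n} \rho_{ij}\, x_{ij} \\
\text{s.t.}\quad & \sum_{j=1}^{n} y_j = q, \\
& \sum_{j=1}^{n} x_{ij} = 1, && i = 1,\dots,n, \\
& x_{ij} \le y_j, && i, j = 1,\dots,n, \\
& \sum_{i=1}^{n} x_{ij} \le 3, && j = 1,\dots,n, \\
& x_{ij},\, y_j \in \{0, 1\}.
\end{aligned}
```

Because each of the $n$ stocks needs exactly one representative and each representative covers at most 3 stocks, the capped problem is feasible only when $3q \ge n$. This is worth keeping in mind when trying smaller fund sizes for the first exercise item.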

In [7]:
# number of stocks in the index
n = stock_corr.shape[0]

In [9]:
# number of stocks chosen for the replicating fund
n_fund = 16

In [10]:
x = cp.Variable(stock_corr.shape, boolean=True)
y = cp.Variable(stock_corr.shape[0], boolean=True)

In [11]:
# Objective function: total correlation between each stock and its representative
objective = cp.sum(cp.multiply(x, stock_corr))

In [12]:
# the entries of y must sum to the number of stocks selected for the fund
constraints = [
    cp.sum(y) == n_fund,
]

In [14]:
# Each row must select exactly one entry: every stock gets exactly one representative
for i in range(n):
    c_i = cp.sum(x[i,:]) == 1
    constraints.append(c_i)

In [15]:
# If a stock is not selected in y, its corresponding column
# in the x matrix must be empty
for i in range(n):
    for j in range(n):
        c_ij = x[i,j] <= y[j]
        constraints.append(c_ij)

In [16]:
# Constraint: no column may sum to more than 3, i.e. no representative covers more than 3 stocks
for j in range(n):
    c_j = cp.sum(x[:, j]) <= 3
    constraints.append(c_j)

In [17]:
prob = cp.Problem(cp.Maximize(objective), constraints)
# boolean variables require a mixed-integer solver to be installed
res = prob.solve()
res

26.736190220116168
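
Before interpreting the solution, it can help to confirm that the values returned by the solver satisfy every constraint of the model. The following is a minimal sketch using only NumPy; the helper `check_assignment` and the toy arrays `x_toy` and `y_toy` are hypothetical names introduced for illustration and are not part of the original notebook.

```python
import numpy as np

def check_assignment(x_val, y_val, n_fund, max_per_rep=3):
    """Return one boolean per model constraint (hypothetical helper)."""
    # Solver output is floating point, so threshold it to 0/1 first.
    x_bin = np.asarray(x_val) > 0.5
    y_bin = np.asarray(y_val) > 0.5
    return {
        # sum(y) == n_fund
        "fund_size": int(y_bin.sum()) == n_fund,
        # every row of x selects exactly one representative
        "one_rep_per_stock": bool(np.all(x_bin.sum(axis=1) == 1)),
        # x[i, j] <= y[j]: only selected stocks may represent others
        "rep_is_selected": bool(np.all(x_bin <= y_bin[np.newaxis, :])),
        # no column of x sums to more than max_per_rep
        "cap_respected": bool(np.all(x_bin.sum(axis=0) <= max_per_rep)),
    }

# Toy data: 4 stocks, with stocks 0 and 2 acting as representatives.
x_toy = np.array([[1, 0, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 1, 0]], dtype=float)
y_toy = np.array([1, 0, 1, 0], dtype=float)

report = check_assignment(x_toy, y_toy, n_fund=2)
print(report)
```

In the notebook, the same check could be run as `check_assignment(x.value, y.value, n_fund)` once `prob.solve()` has returned.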

In [18]:
# x.value holds floats, so threshold instead of testing == 1 exactly
col_represent = np.argwhere(x.value.T > 0.5)
col_represent

array([[ 0,  0],
       [ 0, 14],
       [ 0, 24],
       [ 4,  4],
       [ 4,  9],
       [ 4, 21],
       [ 6,  6],
       [ 6, 17],
       [ 6, 30],
       [ 7,  1],
       [ 7,  7],
       [ 7, 33],
       [ 8,  8],
       [ 8, 10],
       [ 8, 26],
       [11,  3],
       [11, 11],
       [11, 25],
       [12, 12],
       [13, 13],
       [15,  5],
       [15, 15],
       [15, 32],
       [16, 16],
       [16, 19],
       [16, 28],
       [18,  2],
       [18, 18],
       [20, 20],
       [22, 22],
       [27, 27],
       [29, 29],
       [31, 23],
       [31, 31]])

In [19]:
tickers = stock_corr.columns
group_represent = defaultdict(list)
# x was transposed above, so each pair is (representative index, represented stock index)
for pair in col_represent:
    irep = tickers[pair[0]]
    istock = tickers[pair[1]]
    group_represent[irep].append(istock)
group_represent

defaultdict(list,
            {'SAN': ['SAN', 'BBVA', 'BKT'],
             'FER': ['FER', 'AENA', 'GRF'],
             'ELE': ['ELE', 'IBE', 'REE'],
             'SAB': ['BKIA', 'SAB', 'CABK'],
             'AMS': ['AMS', 'ITX', 'IDR'],
             'MTS': ['ACX', 'MTS', 'ENC'],
             'IAG': ['IAG'],
             'VIS': ['VIS'],
             'MAP': ['ACS', 'MAP', 'TEF'],
             'CIE': ['CIE', 'REP', 'MEL'],
             'ENG': ['NTGY', 'ENG'],
             'ANA': ['ANA'],
             'CLNX': ['CLNX'],
             'SGRE': ['SGRE'],
             'TL5': ['TL5'],
             'COL': ['MRL', 'COL']})
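
For the first exercise item, a tiny instance can be solved by exhaustive search to see how the objective changes with the fund size, which also gives a solver-free cross-check of the model. This is only a sketch and not the notebook's method: the function `best_fund` and the 4x4 matrix `toy_corr` are hypothetical, and the search is only practical for very small `n`.

```python
from itertools import combinations, product

def best_fund(corr, n_fund, max_per_rep=3):
    """Exhaustively search representatives and assignments (tiny n only)."""
    n = len(corr)
    best_value, best_assign = float("-inf"), None
    for reps in combinations(range(n), n_fund):
        # assign[i] is the representative chosen for stock i
        for assign in product(reps, repeat=n):
            # enforce the cap: no representative covers too many stocks
            if any(assign.count(r) > max_per_rep for r in reps):
                continue
            value = sum(corr[i][assign[i]] for i in range(n))
            if value > best_value:
                best_value, best_assign = value, assign
    return best_value, best_assign

# Hypothetical symmetric correlation matrix for 4 stocks.
toy_corr = [
    [1.0,  0.75, 0.25,  0.0],
    [0.75, 1.0,  0.125, 0.25],
    [0.25, 0.125, 1.0,  0.5],
    [0.0,  0.25, 0.5,   1.0],
]

# Sweep the fund size; a result of (-inf, None) means no feasible assignment.
results = {q: best_fund(toy_corr, q) for q in range(1, 5)}
for q, (value, assign) in results.items():
    print(q, value, assign)
```

With 4 stocks and a cap of 3, a single representative cannot cover the whole index, so the smallest fund size is reported as infeasible. On the real data, the equivalent experiment would re-solve the cvxpy problem for several values of `n_fund` and compare the resulting objective values.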