## Portfolio Construction


Instead of trying to limit the exposure each day, I am just looking to ensure that we do not make too many trades based on any one factor. This involves initially setting up our portfolio to match an exposure level, and from there, on future days, simply maintaining roughly the same number of trades towards each factor. So, for instance, if we want equal exposure to each of the eight factors, we initially limit each factor's exposure to 1/8 = 12.5%. 


One thing to note here is that, with each trade worth 250,000, this corresponds to at most 5 trades (12.5% of 10,000,000 is 1,250,000, or 5 × 250,000), so we will never have more than 5 open trades towards one factor at any point in time. Even if the net exposure remained low (say three buys and two short sells), we do not have infinite money: if every factor took more trades than this, up to its 12.5% net exposure threshold, we would exceed our 10,000,000 limit.

For this reason, though it may mean missing out on trades in practice, for now I set a "hard cap" of 5 trades per factor which will not be exceeded. 
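The arithmetic behind the hard cap can be checked in a few lines. This is a minimal sketch, assuming eight factors (which is what an equal 12.5% allocation implies) and the capital and trade size used in the set-up cell:

```python
total_val = 10_000_000   # total capital available
trade_val = 250_000      # capital committed to each trade
n_factors = 8            # assumption: equal 1/8 = 12.5% allocation per factor

factor_budget = total_val / n_factors                # capital available to each factor
trades_per_factor = int(factor_budget // trade_val)  # whole trades that fit in that budget
capital_used = n_factors * trades_per_factor * trade_val

print(factor_budget, trades_per_factor, capital_used)  # 1250000.0 5 10000000
```

One more trade per factor would need 8 × 6 × 250,000 = 12,000,000, which is over the limit.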

## Imports & Pre-Processing

In [1]:
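#General formatting. I use Japan as it is relatively small. The only important point here is that the date
#columns need converting to datetime for easy filtering. "Exit Date" is converted once here as well, so trades
#later drawn from `data` already carry datetime exit dates instead of strings mixed with timestamps.
#I was initially only working with 2017 for practical reasons, but that filter is currently commented out,
#so `japan_2017` actually holds every year in the file.
import numpy as np
import pandas as pd
from datetime import datetime

japan_trades = pd.read_csv('Japan Trades.csv')
#the price and FX data are needed by the price cells at the end of the notebook; uncomment to load them.
#price_data = pd.read_pickle('price_df.pkl')
#fx_data = pd.read_excel('FX.xlsx',sheet_name='data', skiprows=4, index_col=0)
#mapping = pd.read_csv('mapping.csv')

japan_trades['Date'] = pd.to_datetime(japan_trades['Date'])
japan_trades['Exit Date'] = pd.to_datetime(japan_trades['Exit Date'])
#japan_2017 = japan_trades[japan_trades['Date'].dt.year == 2017]
japan_2017 = japan_trades.drop(columns=['returns', 'company_name'])
#sorted, so that Days[0] really is the first trading day
Days = np.sort(japan_2017.Date.unique())

data = japan_2017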

## Variable Set Up



Note: lots of NAs remain in the data. These are related to the "Buy Cover" and "Sell" rows, which I deal with by setting the end date to be the same as the start date; otherwise filtering out NAs removes this data. They appear in the "Exit Date" and "returns" columns, but I never explicitly use these rows. The reason is that on a single day there may be multiple closing trades associated with the same fsym_id, which is ambiguous. So I simply do not use the "Buy Cover" or "Sell" values of "Side"; instead I rely on the "Exit Date", assuming every trade is closed on its exit date (as I believe these are not options...?).
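A minimal sketch of the two points in this note, on a small made-up frame (the fsym_ids and dates are hypothetical): keeping only the sides that open a position, and spotting days on which one fsym_id has more than one closing row.

```python
import pandas as pd

# Hypothetical signals in the same layout as Japan Trades.csv.
signals = pd.DataFrame({
    "fsym_id": ["AAA-R", "AAA-R", "AAA-R", "BBB-R", "BBB-R"],
    "Side": ["Buy", "Sell", "Sell", "Sell Short", "Buy Cover"],
    "Date": pd.to_datetime(["2017-01-09", "2017-01-17", "2017-01-17",
                            "2017-01-09", "2017-01-17"]),
})

# Keep only trades that open a position; closing is driven by "Exit Date" instead.
opening = signals[signals["Side"].isin(["Buy", "Sell Short"])]

# Closing rows sharing an fsym_id on the same day are the ambiguous case.
closing = signals[signals["Side"].isin(["Sell", "Buy Cover"])]
repeat_closes = closing.groupby(["Date", "fsym_id"]).size()
repeat_closes = repeat_closes[repeat_closes > 1]

print(len(opening))    # 2
print(repeat_closes)
```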

In [2]:
total_val = 10000000 
trade_val_init = 250000

#This is effectively the weight of a single trade: for the given numbers it is 1/40 = 0.025. If we want our
#maximum net exposure to each factor to be 12% (see below), we cannot have a "5 trade swing" in favour of
#buys or short sells, since 5*0.025 = 0.125 > 0.12. For example, 14 buys and 10 short sells is fine, but 15
#buys and 10 short sells would be over-exposed to the given factor.

stock_weighting = np.round(trade_val_init/total_val,3)

#Unique factors and sectors across all days. On any single day a factor or sector may be absent,
#so these need to be re-derived daily.

factors = list(data['Factor'].unique())
sectors = list(data['Sector'].unique())

#Built-in maximum net exposure (as a fraction) and sector exposure (in percent). Can be changed.
maximum_net_exposure = np.round(1/len(factors),2)
maximum_sector_exposure = np.round(100/len(sectors),0)

#This is fixed for now, but as prices change in the future we may be able to change it. With the values above
#it allows 40 trades in total, i.e. 5 per factor with 8 factors.
maximum_trades = np.round(total_val/trade_val_init,3)
maximum_factor_trades = int(maximum_trades/len(factors))

#This will contain the portfolio for each given day 
portfolios = []
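One subtlety in the set-up above: np.round rounds halves to even, so with eight factors np.round(1/8, 2) gives 0.12 rather than 0.125. A quick check of what that means for the largest allowed one-sided swing, assuming eight factors and a per-trade weight of 0.025:

```python
import numpy as np

stock_weighting = 250_000 / 10_000_000       # 0.025 of capital per trade
maximum_net_exposure = np.round(1 / 8, 2)    # half-to-even rounding gives 0.12

# Largest imbalance between buys and short sells that stays within the cap.
max_swing = max(k for k in range(0, 41) if k * stock_weighting <= maximum_net_exposure)

print(maximum_net_exposure, max_swing)  # 0.12 4
```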


## Valid Exposure Checker

In [3]:
#To make the code more efficient, valid_exposure is called first to see whether our input is already valid.
#It checks that the number of trades for a given factor does not exceed the allocation (for instance, the
#ML models may have given us 10 signals for a given factor / day in the Japan Trades.csv file), and that we
#do not allocate too much of our portfolio to one single factor. 

def valid_exposure(data,maximum_net_exposure,max_trades):
    n_buys = (data.Side == "Buy").sum()
    n_shorts = (data.Side == "Sell Short").sum()
    total_trades = n_buys + n_shorts

    #net exposure of the factor: each buy counts +stock_weighting and each short sell -stock_weighting.
    #Only the magnitude matters, so the sign convention is arbitrary.
    exposure = stock_weighting*(n_buys - n_shorts)

    return bool(np.abs(exposure) <= maximum_net_exposure and total_trades <= max_trades)
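To illustrate the check with the 14-vs-10 and 15-vs-10 example from the set-up cell, here is a self-contained sketch that mirrors valid_exposure on made-up trades. The toy_factor and net_exposure_ok helpers are hypothetical, and the first two calls use a loose trade cap of 40 so that only the exposure part is tested:

```python
import numpy as np
import pandas as pd

stock_weighting = 0.025
maximum_net_exposure = 0.12

def toy_factor(n_buys, n_shorts):
    """Hypothetical helper: one factor's signals for a day with the given sides."""
    return pd.DataFrame({"Side": ["Buy"] * n_buys + ["Sell Short"] * n_shorts})

def net_exposure_ok(data, max_net, max_trades):
    n_buys = (data.Side == "Buy").sum()
    n_shorts = (data.Side == "Sell Short").sum()
    exposure = stock_weighting * (n_buys - n_shorts)
    return bool(np.abs(exposure) <= max_net and n_buys + n_shorts <= max_trades)

print(net_exposure_ok(toy_factor(14, 10), maximum_net_exposure, 40))  # True
print(net_exposure_ok(toy_factor(15, 10), maximum_net_exposure, 40))  # False
print(net_exposure_ok(toy_factor(3, 2), maximum_net_exposure, 5))     # True
print(net_exposure_ok(toy_factor(4, 2), maximum_net_exposure, 5))     # False (6 trades)
```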


## Return Valid Initial Portfolio Function

In [4]:
#data                 : should be a dataframe filtered to a single day and factor.
#maximum_net_exposure : the maximum net exposure to a given factor. Initially I take them all as 12%.
#max_trades           : the maximum number of trades we can make for each factor. 

def get_valid(data,maximum_net_exposure,max_trades):
    #only opening trades are used; "Buy Cover" and "Sell" rows are ignored (see the note above).
    data = data[data.Side.isin(["Buy", "Sell Short"])].drop_duplicates()
    #calls simpler function for efficiency: if the input is already valid, keep all of it.
    if valid_exposure(data,maximum_net_exposure,max_trades):
        return data

    #extracts positions that are buy or sell short. 
    buys = data[data.Side == "Buy"]
    sell_shorts = data[data.Side == "Sell Short"]

    #first fill the portfolio with matched buy / sell short pairs, which leave net exposure unchanged.
    n_pairs = min(len(buys), len(sell_shorts))
    portfolio = pd.concat([buys.iloc[:n_pairs], sell_shorts.iloc[:n_pairs]])

    #the difference in length can be seen as the "over-exposure": for instance 80 buys and 70 short sells
    #give a difference of 10, and with the (initially equal) weighting of 0.025 per trade that is
    #10*0.025 = 0.25, i.e. over-exposed! So only add as many unmatched trades from the larger side as
    #both the trade cap and the net exposure cap allow.
    difference = np.abs(len(buys) - len(sell_shorts))
    max_excess = int(maximum_net_exposure/stock_weighting + 1e-9)
    trade_limit = min(difference, max_trades, max_excess)
    if len(buys) >= len(sell_shorts):
        majority, minority, larger_side = "Buy", "Sell Short", buys
    else:
        majority, minority, larger_side = "Sell Short", "Buy", sell_shorts
    portfolio = pd.concat([portfolio, larger_side.iloc[n_pairs:n_pairs + trade_limit]])

    #this bit ensures we do not make too many trades: while the portfolio is too long, randomly drop one
    #trade from the larger side (or, on a tie, from the minority side), so the portfolio stays as
    #balanced as possible and ends with max_trades trades.
    while len(portfolio) > max_trades:
        n_major = (portfolio.Side == majority).sum()
        side = majority if n_major > len(portfolio) - n_major else minority
        portfolio = portfolio.drop(portfolio[portfolio.Side == side].sample(n=1).index)

    portfolio = portfolio.reset_index(drop = True)
    return portfolio
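The pair-first selection can be summarised by how many buys and short sells it keeps. A minimal sketch of that counting only, assuming the unmatched excess is capped by the net exposure limit (0.12 / 0.025, i.e. 4 trades) and that trimming removes trades from whichever side is larger; the target_counts function is hypothetical:

```python
def target_counts(n_buys, n_shorts, max_trades, max_excess):
    """Return (buys, short_sells) kept: matched pairs first, then a capped excess
    on the majority side, then trimming the larger side until the trade cap is met."""
    pairs = min(n_buys, n_shorts)
    excess = min(abs(n_buys - n_shorts), max_trades, max_excess)
    major, minor = pairs + excess, pairs      # the majority side receives the excess
    while major + minor > max_trades:
        if major > minor:
            major -= 1
        else:
            minor -= 1                        # on a tie, the majority side keeps the extra trade
    return (major, minor) if n_buys >= n_shorts else (minor, major)

max_excess = int(0.12 / 0.025 + 1e-9)         # 4 unmatched trades at most
print(target_counts(10, 8, 5, max_excess))    # (3, 2)
print(target_counts(7, 0, 5, max_excess))     # (4, 0)
print(target_counts(3, 9, 5, max_excess))     # (2, 3)
```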

## Initial Portfolio Construction

In [6]:
#gets the data for the first day
first_day = Days[0]
data_initial = data[data['Date'] == first_day]

initial_portfolio = [] 

#gets the factors for which we may trade for a current day. Most days in the current data we would not be trading every factor.
factors_current_day = data_initial.Factor.unique()

#this is just creating the portfolio from the already defined functions. 
for i in factors_current_day:
    current_factor = data_initial[data_initial.Factor == i ] # get trades based on factor
    initial_portfolio.append(get_valid(current_factor,maximum_net_exposure,maximum_factor_trades))  

day_one = pd.concat(initial_portfolio).reset_index(drop = True) # concat single factor dfs together
day_one['Actual Date'] = Days[0]
day_one

| Unnamed: 0.1 | Unnamed: 0 | fsym_id | Side | Date | Exit Date | Country | Sector | Factor | Actual Date |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 135 | C7P0T8-R | Buy | 2017-01-09 | 2017-01-17 | Japan | All_sectors | Momentum | 2017-01-09 |
| 1 | 49 | HLXGRS-R | Sell Short | 2017-01-09 | 2017-01-17 | Japan | All_sectors | Momentum | 2017-01-09 |
| 2 | 158 | HTLNY7-R | Buy | 2017-01-09 | 2017-01-17 | Japan | All_sectors | Momentum | 2017-01-09 |
| 3 | 162 | GW6XYJ-R | Sell Short | 2017-01-09 | 2017-01-17 | Japan | All_sectors | Momentum | 2017-01-09 |
| 4 | 175 | D2WLKV-R | Sell Short | 2017-01-09 | 2017-01-17 | Japan | All_sectors | Momentum | 2017-01-09 |
| 5 | 54 | QH66VR-R | Sell Short | 2017-01-09 | 2017-01-17 | Japan | Consumer_Discretionary | Behavioural | 2017-01-09 |
| 6 | 13 | D66200-R | Buy | 2017-01-09 | 2017-01-17 | Japan | Consumer_Discretionary | Behavioural | 2017-01-09 |
| 7 | 14 | RCHFSX-R | Buy | 2017-01-09 | 2017-01-17 | Japan | Consumer_Discretionary | Behavioural | 2017-01-09 |
| 8 | 92 | T0LBNC-R | Sell Short | 2017-01-09 | 2017-01-17 | Japan | Consumer_Discretionary | Behavioural | 2017-01-09 |
| 9 | 22 | NGSSGB-R | Buy | 2017-01-09 | 2017-01-17 | Japan | Consumer_Discretionary | Behavioural | 2017-01-09 |
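A quick way to confirm that a day's portfolio respects both caps is to count sides per factor. A minimal sketch, using the Factor and Side values copied from the day-one output above and assuming the 0.025 per-trade weight and 0.12 cap from the set-up cell:

```python
import pandas as pd

# Factor and Side values copied from the day-one output above.
day_one = pd.DataFrame({
    "Factor": ["Momentum"] * 5 + ["Behavioural"] * 5,
    "Side": ["Buy", "Sell Short", "Buy", "Sell Short", "Sell Short",
             "Sell Short", "Buy", "Buy", "Sell Short", "Buy"],
})

stock_weighting = 0.025
counts = pd.crosstab(day_one["Factor"], day_one["Side"])   # rows: factors, columns: sides
counts["Trades"] = counts["Buy"] + counts["Sell Short"]
counts["Net exposure"] = stock_weighting * (counts["Buy"] - counts["Sell Short"])
print(counts)
```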


***
# Future Days



## Close Trades Function 

This closes trades. It is simple right now but will get more complicated when prices are incorporated. 

In [7]:
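#Day           : the date we are checking (a Timestamp / datetime64 value). Use Days[i] if possible. 
#prev_day_data : a dataframe with the portfolio for the previous day, filtered from the original dataframe. 

def close_trades(prev_day_data,Day):
    prev_day_data['Exit Date'] = pd.to_datetime(prev_day_data['Exit Date'])
    #trades due on or before this day are closed. Using <= rather than == means a trade whose exit date is
    #not itself in Days (a date with no signals) is closed on the next day we do have, instead of never.
    closed_trades = prev_day_data[prev_day_data['Exit Date'] <= Day]
    return closed_trades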
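Matching on Exit Date == Day only works if every exit date also appears in Days, which only contains dates that have signals. A small sketch with made-up dates (2017-01-14 is assumed not to be in Days) comparing an exact match with closing everything due on or before the current day:

```python
import pandas as pd

# Hypothetical open positions; 2017-01-14 is assumed to be missing from Days.
open_trades = pd.DataFrame({
    "fsym_id": ["AAA-R", "BBB-R", "CCC-R"],
    "Exit Date": pd.to_datetime(["2017-01-14", "2017-01-17", "2017-01-31"]),
})
day = pd.Timestamp("2017-01-17")

exact = open_trades[open_trades["Exit Date"] == day]   # misses AAA-R entirely
due = open_trades[open_trades["Exit Date"] <= day]     # catches the overdue trade too

print(exact["fsym_id"].tolist())  # ['BBB-R']
print(due["fsym_id"].tolist())    # ['AAA-R', 'BBB-R']
```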

## Make Trades Function

In [8]:
def make_trades(prev_day_data,date,maximum_factor_trades):
    trades = [] 
    
    #we now want to investigate if we can make any new trades, after we have closed the other trades.
    #(`data` is the global dataframe of all signals set up in the first cell.)
    current_day_data = data[data['Date'] == date]
    #only opening trades are considered; "Buy Cover" and "Sell" rows are filtered out here.
    current_day_data = current_day_data[current_day_data['Side'].isin(["Buy", "Sell Short"])]
    
    #looping over factors, I check whether each factor has spare capacity under the cap. If it does, new
    #trades are randomly selected from the trades available for that factor on the current day.
    for factor in current_day_data['Factor'].unique():
        open_trades = int((prev_day_data['Factor'] == factor).sum())
        current_day_factor = current_day_data[current_day_data['Factor'] == factor]
        trades_to_make = min(maximum_factor_trades - open_trades, len(current_day_factor))
        if trades_to_make > 0:
            trades.append(current_day_factor.sample(n= trades_to_make))
    if len(trades) == 0:
        return []
    else:
        return pd.concat(trades)
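The capacity check inside make_trades amounts to counting open trades per factor and subtracting from the cap. A vectorised sketch of just that count, on made-up data ("Value" is a hypothetical factor with no open trades):

```python
import pandas as pd

maximum_factor_trades = 5
open_book = pd.DataFrame({"Factor": ["Momentum"] * 5 + ["Behavioural"] * 2})
todays_factors = ["Momentum", "Behavioural", "Value"]   # factors with signals today

# Open trades per factor, with 0 for factors that have nothing open yet.
open_counts = open_book["Factor"].value_counts().reindex(todays_factors, fill_value=0)
spare = (maximum_factor_trades - open_counts).clip(lower=0)

print(spare.tolist())  # [0, 3, 5]
```

The number of trades actually made per factor would then be the smaller of this spare capacity and the number of signals available that day.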
    

# Future Day Portfolio Construction

In this cell the bulk of the work is done, through lots of calls to the functions defined above. 

In [14]:
Days = np.sort(japan_2017.Date.unique())

portfolio_list = []
ct = []

portfolio_list.append(day_one)
check = []

#builds portfolios for the first 60 days in Days (day one plus 59 more), or fewer if the data is shorter
for i in range (1,min(60,len(Days))):
    date = Days[i]
    prev_day_data = portfolio_list[i-1].copy()
    closed_trades = close_trades(prev_day_data,date)
    ct.append(closed_trades)
    if len(closed_trades) >0:
        #concatenating and dropping every duplicated row removes the closed trades from the portfolio
        prev_day_data = pd.concat([prev_day_data, closed_trades]).drop_duplicates(keep=False)
        
    check.append(prev_day_data)   
    trades = make_trades(prev_day_data,date,maximum_factor_trades)
    
    if len(trades) != 0:
        portfolio_current_day = pd.concat([prev_day_data,trades])
    else:
        portfolio_current_day = prev_day_data 
    portfolio_current_day['Actual Date'] = Days[i]
    portfolio_list.append(portfolio_current_day)
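Since the whole point of the hard cap is that it is never exceeded, it may be worth checking it across every simulated day. A minimal sketch of such a check; check_caps is a hypothetical helper, and the toy portfolios stand in for portfolio_list:

```python
import pandas as pd

def check_caps(portfolios, max_per_factor):
    """Return the (day index, factor) pairs whose open-trade count exceeds the cap."""
    breaches = []
    for day, portfolio in enumerate(portfolios):
        for factor, n in portfolio["Factor"].value_counts().items():
            if n > max_per_factor:
                breaches.append((day, factor))
    return breaches

# Toy stand-ins for portfolio_list: day 1 breaks the 5-trade cap for Momentum.
toy_list = [
    pd.DataFrame({"Factor": ["Momentum"] * 5}),
    pd.DataFrame({"Factor": ["Momentum"] * 6 + ["Behavioural"]}),
]
print(check_caps(toy_list, 5))  # [(1, 'Momentum')]
```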

In [15]:
closed_trades = pd.concat(ct)
closed_trades.to_csv('closed.csv')

In [16]:
big_frame = pd.concat(portfolio_list)
big_frame = big_frame.reset_index(drop = True)
big_frame = big_frame.drop(['Unnamed: 0'],axis = 1)

In [17]:
big_frame.to_csv('open.csv')


### Getting Price Data

Type portfolio_list[i] to check out the portfolio on day i. We now use the price data and FX files to get the proper values in USD. 

In [76]:
#this creates one big dataframe with all of our portfolios in it. This is for simplicity, to extract
#the number of shares of each company that we will buy. 

big_frame = pd.concat(portfolio_list)
big_frame = big_frame.reset_index(drop = True)
#properly setting up the dates of our big dataframe. 
#TODO: this still needs doing; it does not work at the moment.

#gets the fx data and price data from the pkl file in the correct format

# fx_data2 = fx_date_conversion(fx_data)
# price_data2 = pickle_conversion(price_data)

#calls function to return a dataframe of the prices and fx rate at the point at which we bought our 
#shares. 

#big_data = get_usd_prices(big_frame,price_data2,fx_data2)
#big_data['Price USD per Share'] = big_data['Price']/big_data['fx_rate']
#big_data['# Shares'] = np.floor(trade_val_init/big_data['Price USD per Share']) 
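The share count is the trade value divided by the USD price per share, rounded down to whole shares. A toy sketch with made-up prices, assuming fx_rate is quoted as yen per US dollar (so dividing converts yen to dollars) and that trade_val_init is in USD; the column names follow the commented code above:

```python
import numpy as np
import pandas as pd

trade_val_init = 250_000   # trade size, assumed to be in USD

# Hypothetical prices in yen and FX rates in yen per USD.
big_data = pd.DataFrame({"Price": [11_500.0, 34_500.0], "fx_rate": [115.0, 115.0]})

big_data["Price USD per Share"] = big_data["Price"] / big_data["fx_rate"]
# Whole shares only, so round down.
big_data["# Shares"] = np.floor(trade_val_init / big_data["Price USD per Share"])

print(big_data["# Shares"].tolist())  # [2500.0, 833.0]
```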



In [79]:
#this code was working but has stopped. The issue is to do with the dates: there are so many formats that
#extracting the right info has been super annoying, as pandas sometimes appears to auto-format a string.
#This is a job for tomorrow, but the code will work if it can correctly extract the date from price_data
#(which it cannot right now).
#Note: get_usd_prices, fx_date_conversion and pickle_conversion are not defined in this part of the
#notebook, and price_data / fx_data are only loaded if the reads in the first cell are uncommented.


# fx_data2 = fx_date_conversion(fx_data)
# price_data2 = pickle_conversion(price_data)


big_data = get_usd_prices(big_frame,price_data,fx_data)
big_data['Price USD per Share'] = big_data['Price']/big_data['fx_rate']
big_data['# Shares'] = np.floor(trade_val_init/big_data['Price USD per Share'])

      fsym_id        Side       Date            Exit Date Country  \
0    NSRD0Y-R         Buy 2017-01-09  2017-01-17 00:00:00   Japan   
1    GC5BYK-R  Sell Short 2017-01-09  2017-01-17 00:00:00   Japan   
2    RMHHXQ-R         Buy 2017-01-09  2017-01-17 00:00:00   Japan   
3    SL0J1F-R  Sell Short 2017-01-09  2017-01-17 00:00:00   Japan   
4    JJ5HJX-R  Sell Short 2017-01-09  2017-01-17 00:00:00   Japan   
..        ...         ...        ...                  ...     ...   
243  DH0XYN-R  Sell Short 2017-01-23           2017-01-31   Japan   
244  V3PN5M-R  Sell Short 2017-01-23           2017-01-31   Japan   
245  N0GYLF-R  Sell Short 2017-01-23           2017-01-31   Japan   
246  V5Z7TW-R         Buy 2017-01-23           2017-01-31   Japan   
247  JXDLKP-R  Sell Short 2017-01-23           2017-01-31   Japan   

               Sector    Factor Actual Date  
0         All_sectors  Momentum  2017-01-09  
1         All_sectors  Momentum  2017-01-09  
2         All_sectors  Momentum  

KeyError: "None of [Index(['G'], dtype='object')] are in the [index]"
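On the date formats: the output above prints Exit Date both as 2017-01-17 00:00:00 and as 2017-01-31, which is what a column mixing parsed timestamps and raw strings looks like. One way to avoid this, sketched here on made-up values, is to parse each value on its own and normalise to midnight before any filtering or merging:

```python
import pandas as pd

# Made-up column mixing Timestamp objects and plain strings, as can happen after
# concatenating frames whose date columns were parsed to different extents.
exit_dates = pd.Series([pd.Timestamp("2017-01-17"), "2017-01-31", "2017-01-17 00:00:00"])

# Parse element by element, then normalise every value to midnight.
parsed = pd.to_datetime(exit_dates.map(pd.Timestamp)).dt.normalize()

print(parsed.dt.strftime("%Y-%m-%d").tolist())  # ['2017-01-17', '2017-01-31', '2017-01-17']
```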