<a href="https://colab.research.google.com/github/maxoverhere/hardcore/blob/main/4_real.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Constants


In [None]:
RUN = False # create graphs and run tests
COLAB = True # running on Google Colab (affects save and load file locations)

# data generation config
MAX_VOLUME = 5
DAY_LEN = 1440
NUM_CCYS = 3

# environment config 
NUM_BUCKETS = 12
ACTION_BUCKETS = 2
NUM_TIME_PERIODS = 24 * 3
LEN_TIME_PERIOD = DAY_LEN // NUM_TIME_PERIODS
SMALLEST_BUCKET = 1
NUM_WAIT_BUCKET = 2

# Data Generation

The data is generated at runtime, so no large training files are needed. All of the training data is also fully reproducible, because it is generated from a seed. What does need to be defined is the method used to generate the data.
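Here is a minimal sketch of the reproducibility claim. It assumes NumPy's legacy global generator, which is what this notebook uses through `np.random.seed`. Reseeding with the same value replays the same draws. `sample_day_sizes` is a hypothetical helper. Because the generator is global, the data is only reproduced if every random call also happens in the same order.

```python
import numpy as np

def sample_day_sizes(seed: int, n: int = 5) -> np.ndarray:
    """Draw n pseudo-random trade sizes after seeding NumPy's legacy global RNG."""
    np.random.seed(seed)  # same global-seed style as the notebook
    return np.random.randint(0, 100, size=n)

first = sample_day_sizes(seed=3)
second = sample_day_sizes(seed=3)

# identical seeds reproduce identical draws
print(np.array_equal(first, second))
```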

A run is 100 days, so the data is generated to match that. The data is based on the behaviour of a number of clients, so what needs to be generated for training is the features of these clients. The clients are generated to reflect a number of currencies and the characteristics of each currency.

### Client Characteristics

variable | value | meaning
--- | --- | ---
buy_ccy | 0 - (NUM_CCYS - 1) | the currency the client buys from the brokerage
sell_ccy | 0 - (NUM_CCYS - 1) | the currency the client sells to the brokerage
trading_time | 0 - 1439 | minute of the day the client is most likely to trade at
volume | 0 - 5 | the total volume a client trades, on a logarithmic (base 2) scale
frequency | 0 - 2 | how often the client trades, from 0 (trading on roughly a quarter of days) to 2 (many trades a day)
time_regular | true / false | whether the client's trade times are regular
volume_regular | true / false | whether the client's trade volumes are regular
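The sketch below makes the `frequency` column concrete by computing the expected number of trades per day at each level. It assumes the probabilities used by `Day.get_clients_active` and `Day.get_times` further down:

- a 98% chance of trading on a given day;
- an extra 25% gate for frequency-0 clients;
- `np.random.randint(0, 20)` trades for frequency-2 clients.

The constant and function names are illustrative.

```python
# Probabilities copied from Day.get_clients_active and Day.get_times in this notebook.
P_ACTIVE = 0.98           # any client skips about 2% of days
P_LOW_FREQ_ACTIVE = 0.25  # frequency-0 clients only trade on about 25% of days
MEAN_MANY_TRADES = sum(range(20)) / 20  # np.random.randint(0, 20) has mean 9.5

def expected_trades_per_day(frequency: int) -> float:
    """Expected trades per day for a client with the given frequency level."""
    p_active = P_ACTIVE * (P_LOW_FREQ_ACTIVE if frequency == 0 else 1.0)
    trades_if_active = 1.0 if frequency < 2 else MEAN_MANY_TRADES
    return p_active * trades_if_active

for f in range(3):
    print(f, expected_trades_per_day(f), 7 * expected_trades_per_day(f))
```

Under these assumptions, a frequency-0 client trades about 1.7 times a week and a frequency-2 client about 9.3 times a day.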

### Currency Characteristics

variable | value | meaning
--- | --- | ---
id | 0 - (NUM_CCYS - 1) | the identity; ids are assigned in descending order of volume traded
volume | 0 - 5 | how much a currency is traded (logarithmic, base 2)
trading_time | 0 - 1439 | the minute of the day at which a client based in that country is most likely to trade

### Currency Bias

A strictly lower triangular matrix storing, for each pair of currencies, the probability that a generated client exchanging those two currencies takes a particular side.


## Currency Data
The data will have two hubs. This emulates the way real markets have hubs that are active at different times, such as Asia peaking at different hours from the Americas.

To emulate this, I will have one group around 8:00 and the other around 15:00, with the two biggest currencies by volume in different groups.

There will be two major currencies, with volumes of 5 and 4; all other currencies will have smaller volumes.

If a currency has volume 5, then compared with a currency of volume 0, the odds of it being used in a random transaction should be 32 : 1 (2 to the power 5). Note that volume has no effect on the average size of each transaction in that currency.
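The 2^volume weighting can be checked numerically. The sketch below normalises 2^volume into selection probabilities. `get_buy_sell_ccy` applies this same weighting when choosing a client's counter-currency, and the number of clients per home currency (`10 * 2 ** volume`) uses the same factor. The volumes `[5, 4, 0]` are an illustrative assumption.

```python
import numpy as np

def selection_probabilities(volumes):
    """Chance of each currency being picked when weighted by 2 ** volume."""
    weights = 2.0 ** np.asarray(volumes)
    return weights / weights.sum()

p = selection_probabilities([5, 4, 0])
print(p.round(3))
print(p[0] / p[2])  # ratio between a volume-5 and a volume-0 currency
```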

In [None]:
from dataclasses import dataclass
import numpy as np
import math
import matplotlib.pyplot as plt

np.random.seed(3)

@dataclass
class Currency:
  volume : int
  trading_time : int
  id : int = 0

def get_currencies():
  # the two biggest currencies
  currencies = [Currency(volume=5, trading_time=8*60), 
                Currency(volume=4, trading_time=14*60)] 
  
  timezones = [8*60, 15*60]
  for i in range(NUM_CCYS - 2):
    currencies.append(Currency(volume=np.random.randint(0, MAX_VOLUME - 1),
                               trading_time=timezones[i % 2] + np.random.randint(-2*60, 2*60)))
    

  currencies.sort(key=lambda x: x.volume, reverse=True)
  currencies = np.array(currencies)
  for i in range(NUM_CCYS):
    currencies[i].id = i
  return currencies

currencies = get_currencies()

In [None]:
if RUN:
  plt.scatter(x=list(map(lambda x: x.trading_time/60, currencies)), y=list(map(lambda x: x.volume,currencies)))
  plt.xlabel('hour of day')
  plt.ylabel('currency volume')
  plt.title("Currency peak trading time vs volume traded")
  plt.show()

## Currency Relations
In reality, not all currency relationships are symmetric. The amounts bought and sold of a currency are not always equal, because countries specialise in different industries, such as tourism, heavy industry or natural-resource production.

To represent this, I will use a 2D strictly lower triangular matrix holding the relationship between each pair of currencies. A function returns the relation between two currencies, and returns its complement (1 - p) if the currencies are swapped.
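Below is a small sketch of the bias matrix and its lookup, mirroring `currency_matrix` and `get_ccy_bias` in the next cell. It uses `np.tril(..., k=-1)` to keep only the entries strictly below the diagonal. This is equivalent to the notebook's loop that zeroes every `(x, y)` with `x <= y`. It also swaps in NumPy's `default_rng` for the global seed. The final check confirms that the two directions of any pair always sum to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ccys = 3
# keep only entries strictly below the diagonal (row index > column index)
bias = np.tril(rng.choice([.25, .4, .5, .6, .75], size=(n_ccys, n_ccys)), k=-1)

def ccy_bias(ccy1: int, ccy2: int) -> float:
    # stored entry when ccy1 > ccy2, otherwise the complement of the mirrored entry
    return bias[ccy1, ccy2] if ccy1 > ccy2 else 1 - bias[ccy2, ccy1]

for a in range(n_ccys):
    for b in range(n_ccys):
        if a != b:
            assert abs(ccy_bias(a, b) + ccy_bias(b, a) - 1) < 1e-12
```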

In [None]:
currency_matrix = np.random.choice(a=[.25, .4, .5, .6, .75], size=(NUM_CCYS, NUM_CCYS))

for point in [(x, y) for x in range(NUM_CCYS) for y in range(NUM_CCYS) if x <= y]:
  currency_matrix[point[0], point[1]] = 0

def get_ccy_bias(ccy1: int, ccy2: int):
  return currency_matrix[ccy1, ccy2] if ccy1 > ccy2 else 1 - currency_matrix[ccy2, ccy1]

if RUN:
  print(currency_matrix)

## Client Data
Each client is associated with a home currency and is biased towards trading during that currency's trading hours. Each client also has a target currency, but is not affected by that currency's trading hours. This reflects the fact that, in the real world, a company is generally based in one country and does its trading during that country's working hours, when its staff are awake.
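As a rough sketch of how a client's preferred trading time can follow its home currency, the snippet below samples times in two ways. Most are drawn from a normal distribution centred on the home hub, with the 100-minute standard deviation that `get_trading_time` uses. About 2% are drawn uniformly over the day. The sketch departs from the notebook in two assumed ways:

- it uses NumPy's `default_rng` rather than the global seed;
- it wraps times into the day immediately, whereas the notebook only wraps them later in `Day.get_times`.

`sample_trading_times` is a hypothetical helper.

```python
import numpy as np

DAY_LEN = 1440

def sample_trading_times(hub_minute: int, n: int, rng: np.random.Generator) -> np.ndarray:
    """Most clients cluster around their home hub; about 2% trade at a uniform random minute."""
    n_uniform = n // 50
    clustered = rng.normal(loc=hub_minute, scale=100, size=n - n_uniform).astype(int)
    uniform = rng.integers(0, DAY_LEN, size=n_uniform)
    # wrap into [0, DAY_LEN) so every time is a valid minute of the day
    return np.append(clustered, uniform) % DAY_LEN

rng = np.random.default_rng(1)
times = sample_trading_times(hub_minute=8 * 60, n=1000, rng=rng)
share_near_hub = np.mean(np.abs(times - 8 * 60) <= 200)  # within two standard deviations
```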

In [None]:
@dataclass
class Client:
  buy_ccy: int
  sell_ccy: int
  trading_time: int 
  volume: int 
  frequency: int
  time_regular: bool
  volume_regular: bool

def get_clients():
  def get_buy_sell_ccy(ccy: int, num_clients: int):
    ccys = list(range(NUM_CCYS))
    ccys.remove(ccy)
    total_vol = sum(map(lambda x: 2 ** currencies[x].volume, ccys))
    probablities = list(map(lambda x: 2 ** currencies[x].volume/total_vol, ccys))
    matching_ccys = np.random.choice(a=ccys, size=num_clients, p=probablities)
    directions = map(lambda x: np.random.random() < get_ccy_bias(ccy, x), matching_ccys)
    matching_ccy_and_direction = list(zip(iter(matching_ccys), directions))
    sell_ccys = np.fromiter(map(lambda x: ccy if x[1] else x[0], matching_ccy_and_direction), int)
    buy_ccys = np.fromiter(map(lambda x: x[0] if x[1] else ccy, matching_ccy_and_direction), int)
    return buy_ccys, sell_ccys

  def get_direction(num_clients: int):
    return np.random.choice(a=[True, False], size=num_clients, p=[.5,.5])

  def get_random_with_default(default_num: int, num_clients: int):
    random = np.random.randint(0, 1000, num_clients)
    return [x if x < DAY_LEN else default_num for x in random]

  def get_trading_time(ccy: int, num_clients: int):
    num_clients_not_normal = int(num_clients/50)
    times = np.random.normal(loc=currencies[ccy].trading_time, scale=100, size=num_clients-num_clients_not_normal)
    return np.append(times.astype(int), np.random.randint(low=0, high=DAY_LEN, size=num_clients_not_normal))

  def get_trading_volumes(num_clients: int):
    return np.random.choice(a=[0,1,2,3], size=num_clients, p=[.5,.3,.15,.05])

  def get_frequencies(num_clients: int):
    return np.random.choice(a=[0,1,2], size=num_clients, p=[.4,.3,.3])

  def get_regularities(num_clients: int):
    return np.random.choice(a=[True,False], size=num_clients, p=[.5,.5])

  clients = np.array([Client(buy_ccy=0, sell_ccy=1, trading_time=14*60,volume=5, frequency=1, volume_regular=True, time_regular=True)])

  for ccy in range(NUM_CCYS):
    num_clients = 10 * 2 ** currencies[ccy].volume
    buy_ccys, sell_ccys = get_buy_sell_ccy(ccy=ccy, num_clients=num_clients)
    directions = get_direction(num_clients=num_clients)
    trading_times = get_trading_time(ccy=ccy, num_clients=num_clients)
    volumes = get_trading_volumes(num_clients=num_clients)
    frequencies = get_frequencies(num_clients=num_clients)
    time_regularities = get_regularities(num_clients=num_clients)
    volume_regularities = get_regularities(num_clients=num_clients)
    new_clients = np.array([Client(buy_ccy=buy_ccys[i], sell_ccy=sell_ccys[i],
                                  trading_time=trading_times[i], volume=volumes[i], frequency=frequencies[i],
                                  time_regular=time_regularities[i], volume_regular=volume_regularities[i])
                            for i in range(num_clients)])

    clients = np.append(clients, new_clients)
  return clients

In [None]:
if RUN:
  clients = get_clients()
  print("generated " + str(len(clients)) + " clients in accordance with the desired patterns\n")

  plt.hist(list(map(lambda x: x.trading_time/60, clients)), bins=24*2)
  plt.xlabel('hour of day')
  plt.ylabel('frequency per 30 mins')
  plt.title("trading time of clients frequency")
  plt.show()


  plt.hist(list(map(lambda x: x.trading_time/60, 
                    clients[np.fromiter(map(lambda x: x.buy_ccy == 1 or x.sell_ccy == 1, clients), bool)])), bins=24*2)
  plt.xlabel('hour of day')
  plt.ylabel('frequency per 30 mins')
  plt.title("trading time of clients trading currency 1")
  plt.show()

  from collections import Counter
  ccy_volumes_dict = list(Counter(map(lambda x: (x.buy_ccy, x.sell_ccy, x.volume), clients)).items())

  ccy_buy_totals = np.zeros(NUM_CCYS)
  ccy_sell_totals = np.zeros(NUM_CCYS)
  ccy0_buy_totals = np.zeros(NUM_CCYS - 1)
  ccy0_sell_totals = np.zeros(NUM_CCYS - 1)

  for i in ccy_volumes_dict:
    ccy_buy_totals[i[0][0]] += i[1] * (2**i[0][2])
    ccy_sell_totals[i[0][1]] += i[1] * (2**i[0][2])
    if i[0][0] == 0:
      ccy0_buy_totals[i[0][1]-1] += i[1] * (2**i[0][2])
    if i[0][1] == 0:
      ccy0_sell_totals[i[0][0]-1] += i[1] * (2**i[0][2])

  plt.title("Volume of each currency purchased")
  plt.pie(x=ccy_buy_totals, labels=range(NUM_CCYS));
  plt.show()

  plt.title("Volume of each currency sold")
  plt.pie(x=ccy_sell_totals, labels=range(NUM_CCYS));
  plt.show()

  plt.title("Volume of currency 0 purchased in each currency")
  plt.pie(x=ccy0_buy_totals, labels=range(1, NUM_CCYS));
  plt.show()

  plt.title("Volume of currency 0 sold in each currency")
  plt.pie(x=ccy0_sell_totals, labels=range(1, NUM_CCYS));
  plt.show()

## Generating Days
Each day is generated by using the client data to build that day's trades.

There are two functions to get a day. `next` returns a list of position changes throughout the day and is intended for the AI. `next_clients` returns the client trades grouped by currency pair, which the algorithmic benchmark needs. The benchmarking algorithm is too naive to work without this information, because otherwise it cannot know which currency to hedge against.
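The position-change matrix that `Day.next` builds can be illustrated with a few hand-written trades. The sketch assumes, as the notebook does, that volumes are unit-free, with no exchange rate applied. Every trade therefore subtracts its volume from the currency the client buys and adds it to the currency the client sells, so each minute's row sums to zero. `position_changes` and the example trades are hypothetical.

```python
import numpy as np

DAY_LEN, NUM_CCYS = 1440, 3

def position_changes(trades):
    """trades: iterable of (minute, buy_ccy, sell_ccy, volume), seen from the brokerage."""
    day = np.zeros((DAY_LEN, NUM_CCYS), dtype=np.int64)
    for minute, buy_ccy, sell_ccy, volume in trades:
        day[minute, buy_ccy] -= volume   # brokerage hands over the currency the client buys
        day[minute, sell_ccy] += volume  # and receives the currency the client sells
    return day

day = position_changes([(480, 0, 1, 32), (480, 2, 0, 5), (900, 1, 2, 7)])
print(day[480], day.sum(axis=0))  # per-minute change and end-of-day net position
```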

In [None]:
class Day:
  def __init__(self, clients):
    self.clients = clients

  def get_clients_active(self, day: int):
    return self.clients[list(map(
        lambda x: (x.frequency != 0 or np.random.random() < 0.25) and np.random.random() < 0.98
        ,self.clients))]

  def get_times(self, trading_time, time_regular: bool, frequency: int):
    num_times = 1 if frequency < 2 else np.random.randint(0, 20)
    scale = 10 if time_regular else 120
    return np.array([int(np.random.normal(loc=trading_time, scale=scale)) % DAY_LEN for _ in range(num_times)])

  def get_volumes(self, num_trades: int, frequency: int, volume_regular: bool, volume: int):
    # parentheses are required: without them this parses as the 3-tuple (.6, 1.4 or 0, 2)
    distribution = (.6, 1.4) if volume_regular else (0, 2)
    frequency_scaler = 10 if frequency < 2 else 1
    return np.array([int(np.random.uniform(distribution[0], distribution[1])* (2 ** volume) * frequency_scaler)
     for _ in range(num_trades)], dtype=np.int32)


  def get_client_trades(self, day_num: int, client: Client):
    times = self.get_times(client.trading_time, client.time_regular, client.frequency)
    volumes = self.get_volumes(len(times), client.frequency, client.volume_regular, client.volume)
    return times, volumes

  def next_clients(self, day_num: int, use_seed: bool = False):
    if use_seed:
      np.random.seed(day_num)

    day = {}
    for ccy_pair in [(x, y) for x in range(NUM_CCYS) for y in range(NUM_CCYS) if x > y]:
      day[ccy_pair] = []
    for client in self.get_clients_active(day_num):
      times, volumes = self.get_client_trades(day_num, client)
      if client.buy_ccy > client.sell_ccy:
        day[client.buy_ccy, client.sell_ccy] += zip(times, volumes)
      else:
        day[client.sell_ccy, client.buy_ccy] += zip(times, -1 * volumes)
    return day

  def next(self, day_num: int, use_seed: bool = False):
    if use_seed:
      np.random.seed(day_num)

    day = np.zeros((DAY_LEN, NUM_CCYS), dtype=np.int32)
    for client in self.get_clients_active(day_num):
      times, volumes = self.get_client_trades(day_num, client)
      for i in range(len(times)):
        day[times[i], client.buy_ccy] -= volumes[i]
        day[times[i], client.sell_ccy] += volumes[i]
    return day

day_generator = Day(get_clients())

In [None]:
if RUN:
  average_day = day_generator.next(0)
  for i in range(1,100):
    average_day += day_generator.next(i)

  average_day = average_day/100

  plt.plot(average_day[:,0])
  plt.xlabel('minute of day')
  plt.ylabel('average position change per minute')
  plt.title("average position change throughout the day for currency 0")
  plt.show()

  plt.bar(list(range(NUM_CCYS)), average_day.sum(axis=0))
  plt.xlabel('currency')
  plt.ylabel('average net position at end of day')
  plt.title("currency average net position at end of day")
  plt.show()

# Environment

## Base Environment

In [None]:
from typing import Any, List, Sequence, Tuple

class Environment:
  def __init__(self, use_seed=False):
    self.day_num = 0
    self.beta = -1
    self.alpha = -1
    self.use_seed = use_seed
    np.random.seed(1)

  def reset(self):
    self.day = day_generator.next(self.day_num, use_seed=self.use_seed)
    self.min = 0
    self.day_num += 1
    self.position = np.zeros(NUM_CCYS, dtype=np.int32)
    return self.get_state()
  
  def get_alpha_reward(self):
    return np.array([self.alpha * np.absolute(self.position).sum(), 0], np.int32)

  def get_beta_reward(self, amount_heged):
    return np.array([0, self.beta * (amount_heged + 1)], np.int32)
    
  def get_bucket(self, pos: int) -> int:
    return min(math.frexp(abs(pos)/SMALLEST_BUCKET)[1], NUM_BUCKETS - 1)

  def amount_heged(self, ccy, bucket):
    if bucket == 1:
      return self.position[ccy]
    else:
      return self.position[ccy] // 2 if self.position[ccy] > 0 else (self.position[ccy] - 1) // 2
  
  def wait(self):
    reward = self.get_alpha_reward()
    self.position += self.day[self.min]
    self.min += 1
    return reward
  
  def hedge(self, ccy1, ccy2, bucket):
    amount_heged = self.amount_heged(ccy1, bucket)
    self.position[ccy1] -= amount_heged
    self.position[ccy2] += amount_heged
    return self.get_beta_reward(abs(amount_heged))
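`get_bucket` turns a position into a logarithmic bucket using `math.frexp`, which returns the mantissa and exponent of a float. The standalone copy below uses the same constants as the notebook and shows the mapping:

- position 0 falls in bucket 0;
- with `SMALLEST_BUCKET = 1`, bucket k ≥ 1 holds absolute positions from 2^(k-1) up to 2^k - 1;
- everything from 1024 upwards is clamped into the last bucket, 11.

```python
import math

SMALLEST_BUCKET = 1
NUM_BUCKETS = 12

def get_bucket(pos: int) -> int:
    # math.frexp(x) returns (m, e) with x == m * 2**e and 0.5 <= m < 1, or (0.0, 0) for x == 0,
    # so for x >= 1 the exponent e equals floor(log2(x)) + 1
    return min(math.frexp(abs(pos) / SMALLEST_BUCKET)[1], NUM_BUCKETS - 1)

buckets = {pos: get_bucket(pos) for pos in [0, 1, 2, 3, 4, 7, 8, -8, 1000, 10**6]}
print(buckets)
```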

## Neural Net Environments

In [None]:
class EnvNN(Environment):
  def __init__(self):
    super().__init__()
    self.action_space =[2, NUM_CCYS, NUM_CCYS, ACTION_BUCKETS] 
    self.state_space = NUM_CCYS * NUM_BUCKETS * 2 + NUM_TIME_PERIODS

  def encode_position(self, ccy):
    start = ccy * 2 * NUM_BUCKETS 
    return [start + (0 if self.position[ccy] < 0 else 1), start + self.get_bucket(self.position[ccy]) * 2 \
                 + (1 if self.position[ccy] < 0 else 0)] 

  def print_state(self):
    state = self.get_state()
    print(state[:NUM_CCYS * NUM_BUCKETS * 2].reshape((NUM_CCYS, NUM_BUCKETS * 2)))
    print(state[NUM_CCYS * NUM_BUCKETS * 2:])

  def get_state(self):
    state = np.zeros(self.state_space, dtype=np.int32)
    for ccy in range(NUM_CCYS):
      state[self.encode_position(ccy)] = 1
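    # time features start right after the position block; the final minute is clamped into the last period
    state[NUM_CCYS * NUM_BUCKETS * 2 + min(self.min // LEN_TIME_PERIOD, NUM_TIME_PERIODS - 1)] = 1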
    return state

  def hedge(self, ccy1, ccy2, bucket_size):
    if self.position[ccy1] <= 0 or self.position[ccy2] >= 0:
      return -np.ones(2)
    if abs(self.position[ccy1]) < abs(self.position[ccy2]):
      return super().hedge(ccy1, ccy2, bucket_size)
    else:
      return super().hedge(ccy2, ccy1, bucket_size)  

  def step(self, action):
    if action[0] == 0:
      reward = np.zeros(2, np.int32)
    else:
      reward = self.hedge(action[1], action[2], action[3])
    reward += self.wait()
    return self.get_state(), reward, np.int64(self.min >= DAY_LEN)
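The `EnvNN` state is a fixed-size 0/1 vector: 2 × `NUM_BUCKETS` position slots per currency, followed by one slot per 20-minute time period. The sketch below uses the notebook's default constants to compute this layout and the index of the time feature for a given minute. It offsets from the end of the position block and clamps the final minute (1440) into the last period. A negative index such as `state[-(self.min // LEN_TIME_PERIOD)]` would instead send period 0 to index 0, inside the position block. `time_index` is a hypothetical helper.

```python
NUM_CCYS, NUM_BUCKETS = 3, 12
NUM_TIME_PERIODS = 24 * 3
LEN_TIME_PERIOD = 1440 // NUM_TIME_PERIODS      # 20 minutes per period
POSITION_FEATURES = NUM_CCYS * NUM_BUCKETS * 2  # 72 position slots
STATE_SPACE = POSITION_FEATURES + NUM_TIME_PERIODS

def time_index(minute: int) -> int:
    """Index of the one-hot time-period feature; minute 1440 maps to the last period."""
    period = min(minute // LEN_TIME_PERIOD, NUM_TIME_PERIODS - 1)
    return POSITION_FEATURES + period

print(STATE_SPACE, time_index(0), time_index(1439))
```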

In [None]:
class EnvNNShort(EnvNN):  
  def step(self, action):
    if action[0] == 0:
      reward = np.zeros(2, np.int32)
    else:
      reward = self.hedge(action[1], action[2], action[3])
    for i in range(10):
      reward += self.wait()
    return self.get_state(), reward, np.int64(self.min >= DAY_LEN)

I use a simple algorithmic strategy, which I have dubbed X-minute hedging, as a benchmark.

The algorithm has not been fully implemented; only the calculations needed to get its reward are.

The algorithm keeps track of the net position change between every pair of currencies and neutralizes the position every X minutes.
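The greedy triangular neutralisation step can be sketched on a raw position vector, without the environment. At each step the sketch takes the smallest open position and moves it entirely into the smallest position of the opposite sign. It stops when no long/short pair remains. It assumes each hedge fully closes the smaller of the two legs, which is what `EnvNN.hedge` does with bucket 1. `neutralize` is a hypothetical helper, not the notebook's `neutralize_pos_triangular`.

```python
import numpy as np

def neutralize(position: np.ndarray):
    """Greedy pairwise hedging on a copy of `position`.
    Returns the final position and a list of (from_ccy, to_ccy, amount) hedges."""
    position = position.copy()
    hedges = []
    while (position > 0).any() and (position < 0).any():
        nonzero = np.flatnonzero(position)
        ccy1 = nonzero[np.argmin(np.abs(position[nonzero]))]  # smallest open position
        opposite = np.flatnonzero(np.sign(position) == -np.sign(position[ccy1]))
        ccy2 = opposite[np.argmin(np.abs(position[opposite]))]  # smallest opposite-sign position
        amount = position[ccy1]
        position[ccy1] -= amount  # close the smaller leg entirely
        position[ccy2] += amount  # and offset it against the opposite-sign currency
        hedges.append((int(ccy1), int(ccy2), int(amount)))
    return position, hedges

final, hedges = neutralize(np.array([30, -10, -25, 5]))
print(final, hedges)
```

Because every hedge preserves the sum of the positions, a book whose positions sum to zero ends fully flat. Each entry in `hedges` corresponds to one beta-reward charge in the environment.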

In [None]:
def neutralize_pos_triangular(env, state):
  def smallest_nonzero(state: np.ndarray) -> int:
    valid_idx = np.where(state != 0)[0]
    smallest = None
    if valid_idx.size != 0:
      smallest = valid_idx[np.argmin(np.absolute(state[valid_idx]))]
    return smallest

  def hedge_against(state: list, idx: int) -> int:
    chosen_ccy = None
    if idx != None:
      counter_ccys = np.where(state < 0 if state[idx] > 0 else state > 0)[0]
      if counter_ccys.size != 0:
        chosen_ccy = counter_ccys[smallest_nonzero(state[counter_ccys])]
    return chosen_ccy

  reward_sum = np.zeros(2)
  ccy1 = smallest_nonzero(state)
  ccy2 = hedge_against(state, ccy1)
  while ccy2 != None:
    state, reward, finished = env.step(Action(ccy1=ccy1, ccy2=ccy2, bucket=1))
    state[-1] = 0
    reward_sum += reward
    ccy1 = smallest_nonzero(state)
    ccy2 = hedge_against(state, ccy1)
  return reward_sum

def x_minute_hedging_triangular(between_hedging_time: int = 5, number_of_days=5):
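  np.random.seed(0)
  reward_sum = np.zeros(2)
  # NOTE: EnvTabular and Action are not defined in this notebook and must be provided before this cell runs
  env = EnvTabular(use_seed=True)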
  for _ in range(number_of_days):
    state = env.reset()
    finished = False
    while not(finished):
      for _ in range(between_hedging_time):
        state, reward, finished = env.step(Action(wait=True))
        state[-1] = 0
        reward_sum += reward
        if finished:
          break
      reward_sum += neutralize_pos_triangular(env, state)

  scaler = 1 / number_of_days
  return (scaler * reward_sum).round(1)

In [None]:
if RUN:
  print("alpha and beta rewards per day from the triangular X-minute hedging algorithm:")
  print("only hedging once a day " + str(x_minute_hedging_triangular(between_hedging_time=1440)))
  print("hedging every hour " + str(x_minute_hedging_triangular(between_hedging_time=60)))
  print("hedging every 20 minutes " + str(x_minute_hedging_triangular(between_hedging_time=20)))
  print("hedging every 5 minutes " + str(x_minute_hedging_triangular(between_hedging_time=5)))
  print("hedging every minute " + str(x_minute_hedging_triangular(between_hedging_time=1)))

In [None]:
if RUN:
  times = [1440, 720, 360, 180, 90, 45, 22, 11, 5, 2, 1]
  results = []
  for time in times:
    results.append(-x_minute_hedging_triangular(between_hedging_time=time))
  results = np.array(results)

  plt.rcParams["figure.figsize"] = [10,6]
  plt.plot(times, results[:, 0], label="alpha")
  plt.plot(times, results[:, 1], label="beta")
  plt.yscale('log')
  plt.legend()
  plt.xlabel('Minutes between hedges')
  plt.ylabel('Negative reward')
  plt.title("Rewards for hedging triangularly every x minutes")
  plt.show()

# A2C

In [None]:
import sys
import torch  
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.autograd import Variable
from torch.distributions import Categorical
import matplotlib.pyplot as plt
import pandas as pd

class ActorCritic(nn.Module):
  def __init__(self, num_inputs, num_actions_list):
    super(ActorCritic, self).__init__()
    self.shared_layers = nn.Sequential(
          nn.Linear(num_inputs, 1024),
          nn.ReLU(),
          nn.Linear(1024, 512),
          nn.ReLU(),
          nn.Linear(512, 256),
          nn.ReLU()
      )
    self.critic_linear = nn.Linear(256, 1)
    self.actors_linear = nn.ModuleList([nn.Linear(256, num_actions) for num_actions in num_actions_list])
    
  def forward(self, state):
    shared_nodes_value = Variable(torch.from_numpy(state).float().unsqueeze(0))
    shared_nodes_value = self.shared_layers(shared_nodes_value)
    value = self.critic_linear(shared_nodes_value)
    policy_dists = []
    for actor_layer in self.actors_linear:
      policy_dists.append(F.softmax(F.relu(actor_layer(shared_nodes_value)), dim=1))

    return policy_dists, value

In [None]:
class A2C():
  def __init__(self):
    self.env = EnvNN() # MyCartPole2()
    self.actor_critic = ActorCritic(self.env.state_space, self.env.action_space)
    self.ac_optimizer = optim.Adam(self.actor_critic.parameters(), lr=1e-4)
    self.eps = np.finfo(np.float32).eps.item()
    
  def learn(self, num_days):
    day_reward = np.zeros((num_days,2))
    self.entropy_term = 0
    for i in range(num_days):
      reward = self.train_episode(batch_size=np.random.randint(5,50))
      day_reward[i] += (np.sum(reward[:-1]), reward[-1])
      if i % 100 == 0:
        print("rewards " + str(day_reward[i]/1000)  + ", sum rewards " + str(np.sum(day_reward[i])/1000))
    return day_reward

  def train_episode(self, n_step=50, batch_size=10):
    tot_rewards = np.zeros(2)
    log_probs = []
    values = []
    rewards = []
    done = False
    state = self.env.reset()
    step = 0
    
    while not done:
      policy_dist, value = self.actor_critic.forward(state)
      values.append(value.squeeze())  # scalar tensor, so torch.stack below gives shape (batch,)
      dists = list(map(lambda x: x.detach().numpy() + self.eps, policy_dist))
      actions = list(map(lambda dist: np.random.choice(dist.shape[1], p=np.squeeze(dist)), dists))
      self.entropy_term += sum(map(lambda dist: np.sum(np.mean(dist) * np.log(dist)), dists))
      if actions[0] == 0:
        # waiting: only the wait/hedge head affected the environment
        log_probs.append(torch.log(policy_dist[0].squeeze(0)[actions[0]]))
      else:
        # hedging: every head (both currencies and the bucket) contributed to the action
        log_probs.append(sum(map(lambda i: torch.log(policy_dist[i].squeeze(0)[actions[i]]), range(len(actions)))))
      state, reward, done = self.env.step(actions)
      tot_rewards += reward
      rewards.append(reward.sum())
      step += 1

      if step % (batch_size + n_step) == 0 or done:
        if done and step % (batch_size + n_step) != 0:
          final_update_size = step % (batch_size + n_step)
          if final_update_size < batch_size:
            batch_size = final_update_size
          rewards += [0 for _ in range(batch_size + n_step - len(rewards))]
          values += [0 for _ in range(batch_size + n_step - len(values))]

        batch_rewards = torch.FloatTensor([sum(rewards[t: t + n_step]) for t in range(batch_size)])
        batch_values_t = torch.stack(values[: batch_size])
        # V(s_{t+n}) is a bootstrap target, so it is used as a constant (padded entries are 0)
        batch_values_t_plus_n = torch.FloatTensor([float(v) for v in values[n_step: batch_size + n_step]])
        log_probs = torch.stack(log_probs[: batch_size])

        advantage = batch_rewards - batch_values_t + batch_values_t_plus_n
        # detach so the actor loss does not push gradients into the critic
        actor_loss = (advantage.detach() * log_probs).mean()
        critic_loss = 0.2 * (advantage ** 2).mean()
        ac_loss = -actor_loss + critic_loss + 0.0001 * self.entropy_term

        self.ac_optimizer.zero_grad()        
        ac_loss.backward()
        self.ac_optimizer.step()

        rewards = []
        values = []
        log_probs = []
        
    return tot_rewards
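The bootstrapped n-step advantage used in `train_episode` can be checked in isolation. The sketch below recomputes it with plain NumPy. It assumes, as the notebook does, no discount factor, and zero reward and zero value for steps padded past the end of the episode. `n_step_advantages` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def n_step_advantages(rewards, values, n_step: int, batch_size: int):
    """Undiscounted A_t = sum(r_t .. r_{t+n-1}) + V_{t+n} - V_t, with zero-padding past the episode end."""
    pad = batch_size + n_step
    rewards = list(rewards) + [0.0] * (pad - len(rewards))
    values = list(values) + [0.0] * (pad - len(values))
    returns = np.array([sum(rewards[t:t + n_step]) for t in range(batch_size)])
    v_t = np.array(values[:batch_size])
    v_t_plus_n = np.array(values[n_step:n_step + batch_size])
    return returns + v_t_plus_n - v_t

adv = n_step_advantages(rewards=[-1, -2, -3, -4], values=[10, 8, 5, 1], n_step=2, batch_size=2)
print(adv)
```

In a standard A2C update, the critic is trained on the squared advantage. The actor instead uses the advantage as a fixed weight on the log-probabilities, which is why both the bootstrap value and the actor's copy of the advantage are normally detached from the graph.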

# Testing


## Admin

Code to load and save models and data

In [None]:
if COLAB:
  from google.colab import drive
  drive.mount('/content/gdrive')



In [None]:
import json
import os
from skimage.measure import block_reduce

def gen_path(file_name, beta_scale=1):
  return os.path.dirname(os.path.realpath("__file__")).replace("\\", "/") + \
   ("/gdrive/MyDrive/Colab/" if COLAB else "/") + file_name + "-beta-" + str(beta_scale)

def load_rewards(file_name, beta_scale, average_over=20):
  directory_path = gen_path(file_name, beta_scale)
  if not os.path.exists(directory_path):
    print("no such directory : " + directory_path)
  rewards = np.loadtxt(directory_path + "/rewards.txt")
  return block_reduce(rewards, block_size=(average_over,1), func=np.mean)

def graph_results(filename="AI-Models/Q-learning-Parallel", beta=2, x_min_comp=None, title="V-learning", average=20): 
  rewards = load_rewards(filename, beta, average_over=average) * -1
  plt.plot(rewards[:,0], label='alpha')
  plt.plot(rewards[:,1], label='beta')

  if x_min_comp != None:
    comparison = x_minute_hedging_triangular(between_hedging_time=x_min_comp, number_of_days=x_min_comp) * -1
    plt.plot(np.full((rewards.shape[0], 1), comparison[0]), label="algo alpha")
    plt.plot(np.full((rewards.shape[0], 1), comparison[1]), label="algo beta")

  plt.rcParams["figure.figsize"] = [10,6]
  plt.yscale('log')
  plt.legend()
  plt.ylabel('reward per episode')
  plt.xlabel(f'{average} episode average')
  plt.title(f'Rewards for {title},  beta scale {beta}')
  plt.savefig(gen_path(filename, beta) + '/graph.png', bbox_inches='tight')
  print(gen_path(filename, beta) + '/graph.png')

def graph_results_multi(filenames, title="V-learning", average=20):
  for filename in filenames:
    rewards = load_rewards(filename, 25, average_over=average) * -1
    rewards[:,1] = rewards[:,1] * 25
    plt.plot(rewards.sum(axis=1), label=filename)

  plt.rcParams["figure.figsize"] = [10,6]
  plt.yscale('log')
  plt.legend()
  plt.ylabel('reward per episode')
  plt.xlabel(f'{average} episode average')
  plt.title(f'Rewards for {title},  beta scale 25')
  plt.savefig(gen_path(title, 25) + '.png', bbox_inches='tight')
  print(gen_path(title, 25) + '.png')
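`load_rewards` smooths the reward curves with `skimage.measure.block_reduce`. When the number of episodes is not a multiple of `average_over`, that function pads the array with `cval` (0 by default), so the last averaged point can be pulled towards zero. If that matters, the sketch below is a NumPy-only alternative that simply drops the incomplete final block. `block_mean` is a hypothetical helper.

```python
import numpy as np

def block_mean(rewards: np.ndarray, average_over: int) -> np.ndarray:
    """Average consecutive rows in blocks of `average_over`, dropping any incomplete final block."""
    n_blocks = rewards.shape[0] // average_over
    trimmed = rewards[: n_blocks * average_over]
    return trimmed.reshape(n_blocks, average_over, *rewards.shape[1:]).mean(axis=1)

rewards = np.arange(10, dtype=float).reshape(5, 2)  # 5 episodes, (alpha, beta) columns
smoothed = block_mean(rewards, average_over=2)
print(smoothed)
```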

## Test Neural Nets

In [None]:
def net_learn_with_backup(algo, filename, beta_scale=1, num_batches=100, batch_size=1000):
  directory_path = gen_path(filename, beta_scale)
  os.makedirs(directory_path, exist_ok=True)  # also creates missing parent folders
  algo.env.beta = algo.env.beta * beta_scale
  rewards = np.zeros((0,2))
  print(algo.env.action_space)
  for batch in range(num_batches):
    print("\nbatch " + str(batch))
    rewards = np.append(rewards, algo.learn(batch_size) * (1, 1/beta_scale), axis=0)
    torch.save({'network': algo}, directory_path + "/network.txt")
    np.savetxt(directory_path + "/rewards.txt", rewards)

def net_load_model(filename, beta_scale=1):
  directory_path = gen_path(filename, beta_scale)
  if not os.path.exists(directory_path):
    print("no such file")
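  # the whole A2C object is pickled, so full (non weights-only) loading is required
  return torch.load(directory_path + "/network.txt", weights_only=False)['network']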

In [None]:
if RUN:
  filename = "A2C"
  beta = 25
  # net_learn_with_backup(algo=A2C(),filename=filename,beta_scale=beta, num_batches=10,batch_size=20)
  graph_results(filename=filename, beta=beta, x_min_comp=3, title=filename, average=5)
  #graph_results(filename=filename, beta=beta, average=2) # for cartpole testing

## Neural network sanity test code

In [None]:
# import gym
# import sys
# import torch  
# import gym
# import numpy as np  
# import torch.nn as nn
# import torch.optim as optim
# import torch.nn.functional as F
# from torch.autograd import Variable
# import matplotlib.pyplot as plt
# import pandas as pd


# class ActorCritic(nn.Module):
#   def __init__(self, state_space, action_space_list):
#     super(ActorCritic, self).__init__()
#     self.shared_layers = nn.Sequential(nn.Linear(state_space, 256), nn.ReLU())
#     self.critic_linear = nn.Linear(256, 1)
#     self.actors_linear = nn.ModuleList([nn.Linear(256, num_actions) for num_actions in action_space_list])
    
#   def forward(self, state):
#     state = Variable(torch.from_numpy(state).float().unsqueeze(0))
#     shared_nodes = self.shared_layers(state)
#     critic = self.critic_linear(shared_nodes)
#     policy_blah = []
#     for actor_shizle in self.actors_linear:
#       policy_blah.append(F.softmax(actor_shizle(shared_nodes), dim=1))
#     return policy_blah, critic

# class ReinforceNetwork(nn.Module):
#   def __init__(self, state_space, action_space_list):
#     super(ReinforceNetwork, self).__init__()
#     self.shared_layers = nn.Sequential(nn.Linear(state_space, 256), nn.ReLU())
#     self.actors_linear = nn.ModuleList([nn.Linear(256, num_actions) for num_actions in action_space_list])
    
#   def forward(self, state):
#     state = Variable(torch.from_numpy(state).float().unsqueeze(0))
#     shared_nodes = self.shared_layers(state)
#     policy_blah = []
#     for actor_shizle in self.actors_linear:
#       policy_blah.append(F.softmax(actor_shizle(shared_nodes), dim=1))
#     return policy_blah

# class MyCartPole2():
#   def __init__(self):
#     self.env = gym.make("CartPole-v0")
#     self.env2 = gym.make("CartPole-v0")
#     self.state_space = self.env.observation_space.shape[0] * 2
#     self.action_space = [2, 2, 2]
#     self.beta = 0

#   def reset(self):
#     return np.append(self.env.reset(), self.env2.reset())

#   def step(self, action):
#     a1, b1, c1, _ = self.env.step(action[1])
#     a2, b2, c2, _ = self.env2.step(action[2])
#     return np.append(a1, a2), np.array([b1, b2]), c1 or c2

# class MyCartPole():
#   def __init__(self):
#     self.env = gym.make("CartPole-v0")
#     self.state_space = self.env.observation_space.shape[0]
#     self.action_space = [2]
#     self.beta = 0

#   def reset(self):
#     return self.env.reset()

#   def step(self, action):
#     a, b, c, _ = self.env.step(action[0])
#     return a, np.array([b, 0]), c


# def graph_results(filename, beta=1, average=5): 
#   rewards = load_rewards(filename, beta, average)
#   plt.plot(rewards[:,0])
#   plt.plot(rewards[:,1])
#   plt.rcParams["figure.figsize"] = [10,6]
#   plt.show()