In [9]:
!pip install datasets
!pip install transformers==4.30
!pip install transformers[torch]
!pip install accelerate -U



In [10]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Dataset Description



Data: https://data.world/opensnippets/bbc-uk-news-dataset

The dataset I will be using for this homework is a BBC UK news dataset with roughly 16,000 entries. It provides tags, titles, publication dates, authors, language, and of course the actual content of the articles and their summaries. The dataset is somewhat old, however, having been created three years ago.

# Data Analysis and Processing


In [11]:
import os

import json
import pandas as pd
import numpy as np
import seaborn as sns
import torch

import datasets
from datasets import Dataset, DatasetDict

from transformers import BartTokenizer, BartForConditionalGeneration
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer
from sklearn.model_selection import train_test_split

In [12]:
file_path = '/content/drive/MyDrive/Datasets/BBC_dataset/bbc_news_list_uk.json'

with open(file_path, 'r') as file:
    data = json.load(file)

df = pd.DataFrame(data)  # convert the loaded JSON data into a pandas DataFrame

In [13]:
df.keys()

Index(['tags', 'title', 'news_post_date', 'raw_content', 'content', 'url',
       'author', 'language', '_id', 'region', 'short_description', 'category',
       'crawled_at'],
      dtype='object')

In [14]:
df = df[['content', 'short_description']] # Isolate the relevant columns
df = df.dropna() # drop rows with empty values
df = df.sample(n=10) #sample 10 rows from the entire dataframe to limit training time

In [15]:
# reset index and drop old index column
df.reset_index(drop=True, inplace=True)
train_df, test_df = train_test_split(df, test_size=0.2)


# convert the dataframe into a dataset for hugging face
train_dataset = Dataset.from_pandas(train_df)
test_dataset = Dataset.from_pandas(test_df)

# Model



In [16]:
# define a tokenizer to tokenize the datasets
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")

# Tokenize the input data
def tokenize(example):
    content = tokenizer(example['content'], truncation=True, padding='max_length', max_length=512)
    summary = tokenizer(example['short_description'], truncation=True, padding='max_length', max_length=128)

    # Replace padding token ids in the labels with -100 so the loss ignores them
    labels = [
        [tok if tok != tokenizer.pad_token_id else -100 for tok in ids]
        for ids in summary['input_ids']
    ]

    return {
        'input_ids': content['input_ids'],
        'attention_mask': content['attention_mask'],
        'labels': labels
    }

# Tokenize the train and test datasets
train_dataset = train_dataset.map(tokenize, batched=True)
test_dataset = test_dataset.map(tokenize, batched=True)

Map:   0%|          | 0/8 [00:00<?, ? examples/s]

Map:   0%|          | 0/2 [00:00<?, ? examples/s]

In [None]:
# initialize the model
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# set the hyperparameters
training_args = Seq2SeqTrainingArguments(
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    output_dir='./results',
    logging_dir='./logs',
    num_train_epochs=2,
    logging_steps=1000,
    evaluation_strategy='steps',
    eval_steps=2000,
    # save_steps=000,
    warmup_steps=200,
    weight_decay=0.05,
    predict_with_generate=True
)

# initialize the trainer
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

trainer.train()

# Evaluation

No matter what I do, I cannot get this model to train without crashing. I tried adjusting the hyperparameters, changing the model, and even decreasing the sample size down to only ten entries. Regardless, the model still crashes during training. Given that this is an LLM, the computational intensity involved in training it makes this not very surprising, though. I suspect the main reason training fails is that the tokenized text is just too large.
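
One way to sanity-check the memory suspicion is a back-of-the-envelope estimate. The sketch below is only an approximation under stated assumptions: it takes roughly 406 million parameters for `facebook/bart-large` (an approximate figure, not measured in this notebook) and assumes full fine-tuning with AdamW in 32-bit precision, which keeps weights, gradients, and two optimizer moment buffers in memory. The helper names are hypothetical. Activations are not counted, and they grow with the number of tokens per step.

```python
# Rough memory estimate for full fine-tuning with AdamW in 32-bit precision.
# ASSUMPTION: about 406 million parameters for facebook/bart-large (approximate).
BART_LARGE_PARAMS = 406_000_000
BYTES_PER_FLOAT32 = 4


def training_state_gib(num_params: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> float:
    """Memory for weights, gradients, and the two AdamW moment buffers, in GiB."""
    copies = 4  # weights + gradients + first moment + second moment
    return num_params * bytes_per_value * copies / 1024**3


def tokens_per_batch(batch_size: int, max_length: int) -> int:
    """Number of padded encoder tokens processed in one training step."""
    return batch_size * max_length


state_gib = training_state_gib(BART_LARGE_PARAMS)
print(f"Model + optimizer state: about {state_gib:.1f} GiB before any activations")
print("Encoder tokens per step at batch size 10:", tokens_per_batch(10, 512))
print("Encoder tokens per step at batch size 1:", tokens_per_batch(1, 512))
```

If memory really is the cause, a smaller `per_device_train_batch_size`, a shorter `max_length`, or a smaller checkpoint such as `facebook/bart-base` would each reduce the footprint. This is a hypothesis to test rather than a confirmed diagnosis.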

# Part 2: Reinforcement Learning

## Task 2

One real-world application of reinforcement learning that can be modeled as an MDP is stock trading. For the sake of explanation, and given my own limited understanding of stock trading, our trader will only trade a single stock, with a limited number of variables taken into account. In this model, a state consists of only two variables: the current price of the stock and the trader's current position (the amount of that stock the trader possesses). At any given moment, the trader can either buy or sell the stock. The choice of whether to buy or sell, together with the quantity, makes up the action space of the MDP. The transition model is a function which, given the trader's current state (their position and the current stock price), the action taken (how much stock they bought or sold), and a candidate next state, returns the probability of transitioning to that next state. Finally, the reward in this model is measured by the current value of the trader's position combined with whatever profit has been made from selling stock.
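
The MDP described above can be sketched in a few lines of Python. This is a minimal illustration, not a realistic trading model. The names `State`, `sample_next_state`, and `reward` are hypothetical, and the price dynamics (a 1% move up or down with equal probability) are an assumption made only so the transition model has something concrete to sample from. Following the description, the reward counts the value of the position plus proceeds from sales; a fuller model would also track cash spent on purchases.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    price: float    # current price of the stock
    position: int   # number of shares the trader holds


def sample_next_state(state: State, action: int, rng: random.Random) -> State:
    """Sample s' from P(s' | s, a); action > 0 buys shares, action < 0 sells.

    ASSUMPTION: the price moves up or down by 1% with equal probability.
    """
    shares = max(action, -state.position)  # cannot sell more than is held
    move = rng.choice([0.99, 1.01])
    return State(price=state.price * move, position=state.position + shares)


def reward(state: State, action: int, next_state: State) -> float:
    """Value of the new position plus proceeds from any shares sold."""
    shares = max(action, -state.position)
    sale_proceeds = max(-shares, 0) * state.price
    return next_state.position * next_state.price + sale_proceeds


# Short rollout: buy 3 shares, hold, then sell 2.
rng = random.Random(0)
state = State(price=100.0, position=0)
for action in [3, 0, -2]:
    next_state = sample_next_state(state, action, rng)
    print(action, next_state, round(reward(state, action, next_state), 2))
    state = next_state
```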

## Task 3

One problem in trading and finance that reinforcement learning has been applied to is risk management for stock portfolios, more specifically the risk-return balancing problem. This problem, as its name implies, focuses on maximizing the projected return from stocks while minimizing the risk that holding those stocks poses. Reinforcement learning, and more specifically deep reinforcement learning, can be used to address this problem. DeepTrader is a model that does just this. To quote the paper in which the model is described: "DeepTrader embeds macro market conditions as an indicator to dynamically adjust the proportion between long and short funds, to lower the risk of market fluctuations". Essentially, the model uses broad trends in the stock market as a basis for the reward function of the model.
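
To make the risk-return trade-off concrete, here is a small generic sketch. It is not DeepTrader's method or reward function. It simply scores a series of returns as the mean return minus a penalty on volatility, where the function name `risk_adjusted_score`, the `risk_aversion` weight, and the sample return series are all made up for illustration.

```python
import statistics


def risk_adjusted_score(returns, risk_aversion=1.0):
    """Mean return penalized by volatility (population standard deviation)."""
    mean_return = statistics.mean(returns)
    volatility = statistics.pstdev(returns)
    return mean_return - risk_aversion * volatility


# Hypothetical per-period returns for two portfolios.
steady = [0.010, 0.012, 0.009, 0.011]
volatile = [0.08, -0.06, 0.07, -0.04]

for name, series in [("steady", steady), ("volatile", volatile)]:
    print(name,
          "mean:", round(statistics.mean(series), 4),
          "score:", round(risk_adjusted_score(series), 4))
```

In this toy data the volatile portfolio has the higher average return, but once risk is penalized the steady portfolio scores better, which is the kind of balance the problem is about.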

Although I couldn't find any code for DeepTrader alone, the model is implemented in TradeMaster, an open-source platform for trading that covers more than just portfolio management. The TradeMaster platform is composed of six different modules. The first module is the data, consisting of multi-modality market data for different financial assets at multiple granularities. The second module is the data processing pipeline. The third module comprises multiple high-fidelity, data-driven market simulators used for mainstream quantitative trading. The fourth module is a series of implementations of RL-based trading algorithms, 13 in total. The fifth is the set of evaluation toolkits, which incorporate 17 different measures, all of which feed into the final module: the interface, designed to be used by interdisciplinary users.


DeepTrader: https://ojs.aaai.org/index.php/AAAI/article/view/16144

TradeMaster: https://github.com/TradeMaster-NTU/TradeMaster?tab=readme-ov-file#trademaster-an-rl-platform-for-trading

