# UBS Challenge Preliminary Attempts

------

#### Download the files

In [1]:
!pip install requests --quiet
import os, sys; sys.path.insert(0, os.path.join(sys.path[0], '..'))
from scripts.functions import download_files
download_files()

The file market-data-swap-rates.csv has already been downloaded. 
The file market-data-swaption-vols.csv has already been downloaded. 
The file trade-information.csv has already been downloaded. 
The file trade-price-ir-vegas.csv has already been downloaded. 


## Project Motivation

### The Challenge

- The ```structure of the product is relatively complex```, its ```interval accumulation (range accrual) characteristics are relatively idiosyncratic```, and there is ```no simple closed-form analytical solution```. Such products are usually priced, and their risk exposures calculated, with numerical simulation methods such as Monte Carlo.

- Monte Carlo results are ```more accurate```, but the method suffers from ```slow calculation speed and slow reaction to changes in the external environment```.

### The Task

Based on the provided ```simulated product characteristics, historical risk factor data and market environment```, build a model (for example, an AI model) that can ```accurately and efficiently predict the risk exposure``` of this type of product, and test ```the efficiency and accuracy``` of the model.

## Understand the Question

### What are the product characteristics from the provided .csv files? 

The product characteristics can be obtained from the ```trade-information.csv``` file. This file provides information about the trades: 

- ```Trade Name``` (identifier)

- ```Underlying```

- ```Pay Frequency```

- ```Lower Bound```

- ```Upper Bound```

### What are the product historical risk factors from the provided .csv files? 

The historical risk factors can be found in the ```trade-price-ir-vegas.csv``` file. 

This file contains time-series data for each trade, including two risk measures: 

- ```TV``` (Total Value)

- ```Vega```

Both measures are reported together with the following fields in the same ```trade-price-ir-vegas.csv``` file: 

- ```Zero Rate Shock```

- ```Expiry Bucket```

- ```Tenor Bucket```

### What are the market environment factors from the provided .csv files? 

The market environment factors can be found in the ```market-data-swap-rates.csv``` and ```market-data-swaption-vols.csv``` files. These files provide time-series data for ```Swap Rate``` and ```Vols``` (implied normal volatilities). 

Regarding the factors influencing the ```Swap Rate```: 

- ```Start Date``` from ```market-data-swap-rates.csv```. The start date on its own is unlikely to be informative, but the gap between ```Date``` and ```Start Date``` may be useful. 

- ```Tenor``` (i.e., the length of the swap) from ```market-data-swap-rates.csv```
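
The date-gap feature mentioned above could be derived along the following lines. This is a minimal sketch: the sample rows are invented for illustration, and the ISO ```YYYY-MM-DD``` date format and the helper name ```days_to_start``` are assumptions, not something confirmed by ```market-data-swap-rates.csv```.

```python
from datetime import date


def days_to_start(as_of: str, start: str) -> int:
    """Return the calendar days from the quote date to the swap start date."""
    return (date.fromisoformat(start) - date.fromisoformat(as_of)).days


# Hypothetical (Date, Start Date) pairs, invented for illustration only.
sample_rows = [
    ("2022-01-03", "2022-01-05"),
    ("2022-01-03", "2023-01-05"),
]

# One numeric feature per row: how far in the future the swap starts.
forward_days = [days_to_start(d, s) for d, s in sample_rows]
print(forward_days)
```

If the real file uses a different date format, only the parsing step would need to change.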

Regarding the factors influencing the ```Vols```: 

- ```Expiry``` from ```market-data-swaption-vols.csv```

- ```Tenor``` from ```market-data-swaption-vols.csv```

- ```Strike``` from ```market-data-swaption-vols.csv```

Both ```Swap Rate``` and ```Vols``` are market indicators that may influence investors' decisions. For the preliminary stage, it is suggested to use only ```Swap Rate``` and ```Vols``` to represent market influences. 

### What risk exposure indicators should the model predict? 

The model should predict the ```TV``` (Total Value) and ```Vega``` of the trades. These are the main risk exposure indicators provided in the ```trade-price-ir-vegas.csv``` file. 

To summarize the analysis above: after aggregating the data into fixed time intervals (daily at the finest), the relationship between ```TV``` and ```Vega``` and each group of candidate inputs can be analyzed with methods such as regression, Monte Carlo methods, and others. The candidate inputs fall into three aspects: 

- Aspect 1 (from the product characteristic perspective): 

  - Input (Independent Variables): ```Underlying```, ```Pay Frequency```, ```Lower Bound``` and ```Upper Bound``` \[AGGREGATED\] (in ```trade-information.csv```)

  - Predict (Dependent Variables): ```TV```, ```Vega```

- Aspect 2 (from the historical risk exposure indicator perspective): 

  - Input (Independent Variables): ```Zero Rate Shock```, ```Expiry Bucket```, and ```Tenor Bucket``` \[AGGREGATED\] (in ```trade-price-ir-vegas.csv```)

  - Predict (Dependent Variables): ```TV```, ```Vega```

- Aspect 3 (from the market environment factor perspective): 

  - Input (Independent Variables): ```Swap Rate``` together with its ```Start Date``` and ```Tenor``` \[AGGREGATED\] (in ```market-data-swap-rates.csv```), and ```Vols``` \[AGGREGATED\] together with its ```Expiry```, ```Tenor``` and ```Strike``` (in ```market-data-swaption-vols.csv```)

  - Predict (Dependent Variables): ```TV```, ```Vega```
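
As a rough illustration of the aggregation step, the sketch below averages a target per trade and day, pivots a swap curve into one column per tenor, and joins the two by date. The column names follow the text above, but every value is invented, the tenor labels (```1y```, ```5y```) are hypothetical, and taking the daily mean is only one possible aggregation.

```python
import pandas as pd

# Hypothetical sample rows; values are invented for illustration only.
trades = pd.DataFrame({
    "Date": ["2022-01-03", "2022-01-03", "2022-01-04"],
    "Trade Name": ["T1", "T1", "T1"],
    "TV": [1.0, 3.0, 2.0],
})
swaps = pd.DataFrame({
    "Date": ["2022-01-03", "2022-01-03", "2022-01-04", "2022-01-04"],
    "Tenor": ["1y", "5y", "1y", "5y"],
    "Swap Rate": [0.010, 0.015, 0.011, 0.016],
})

# Aggregate the target to one row per trade and day (mean is an assumption).
daily_tv = trades.groupby(["Date", "Trade Name"], as_index=False)["TV"].mean()

# Pivot the swap curve so that each tenor becomes one feature column per day.
curve = swaps.pivot(index="Date", columns="Tenor", values="Swap Rate")
curve.columns = [f"swap_{tenor}" for tenor in curve.columns]

# Join the market features onto the daily targets by date.
dataset = daily_tv.merge(curve.reset_index(), on="Date", how="left")
print(dataset)
```

The same pattern would extend to ```Vega``` and to the swaption vols, with ```Expiry```, ```Tenor``` and ```Strike``` forming the pivoted columns.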

### How to evaluate the model efficiency and accuracy? 

Use standard machine learning evaluation metrics: 

- ```Mean Squared Error``` (MSE)

- ```R-squared```

- ```Other relevant metrics``` depending on the model.
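
To make the two accuracy metrics concrete, and to add a simple efficiency measure, here is a minimal pure-Python sketch. The helper names (```mse```, ```r_squared```, ```timed```) and the sample numbers are hypothetical; in practice ```sklearn.metrics.mean_squared_error``` and ```sklearn.metrics.r2_score``` compute the same two quantities.

```python
import time


def mse(y_true, y_pred):
    """Mean of the squared residuals; lower is better."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot; 1.0 means a perfect fit."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def timed(predict, inputs):
    """Return the predictions and the wall-clock seconds the call took."""
    start = time.perf_counter()
    preds = predict(inputs)
    return preds, time.perf_counter() - start


# Hypothetical targets and a stand-in "model" that is always off by 0.5.
y_true = [1.0, 2.0, 3.0, 4.0]
preds, seconds = timed(lambda xs: [x + 0.5 for x in xs], y_true)

error = mse(y_true, preds)
fit = r_squared(y_true, preds)
print(error, fit, seconds)
```

Timing the prediction call in this way gives a number that can be compared directly against the run time of the Monte Carlo pricer.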

### How can the provided data be used for machine prediction alongside Monte Carlo methods? 

The provided data can be used to train a machine learning model, such as a regression model, to predict the TV (Total Value) and Vega of the trades. Here is the execution plan: 

- Attempt 1: ```Regression```

- Attempt 2: ```Monte Carlo Methods``` (previous work applicable)

- Attempt 3: ```Deep Learning Methods``` (previous work applicable)
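
Attempt 1 could start from an ordinary least-squares baseline such as the sketch below. The feature matrix and target are synthetic (an exactly linear target built from invented numbers), so the fit is perfect by construction; this only illustrates the mechanics and is not a result on the challenge data.

```python
import numpy as np

# Hypothetical features (e.g. a swap rate and a vol); values are invented.
X = np.array([
    [0.010, 0.50],
    [0.012, 0.55],
    [0.015, 0.60],
    [0.011, 0.52],
])
# Synthetic linear target standing in for TV.
y = 2.0 + 100.0 * X[:, 0] - 1.0 * X[:, 1]

# Add an intercept column and solve the least-squares problem.
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

# In-sample predictions from the fitted coefficients.
y_hat = X1 @ coef
print(coef)
```

On real data the residuals would be non-zero, and the fit should be judged on held-out dates rather than in-sample.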

## Project Examination

## Preliminary Attempts

Using sklearn

### Data Processing

------

### XGBoost Models

### Random Forest Models

### Regression Models