# Basic Setup
- Create the directories;
- Download the dependencies: [GitHub repository where the code was developed](https://github.com/felipedmz/oraculus);

In [1]:
"""
1) Create the directories used by the script
2) Download the files from the development repository for use in Colab
"""
import os
import requests

data_dir = 'data/'
src_dir = 'src/'
api_path = 'src/api.py'
bot_path = 'src/pycaret_robot.py'
train_path = 'data/quotation.csv'

base_url = 'https://raw.githubusercontent.com/felipedmz/oraculus/main/'

# Create the working directories if they do not exist yet
os.makedirs(data_dir, exist_ok=True)
os.makedirs(src_dir, exist_ok=True)

# Download each file only when it is not already present locally
for path in (api_path, bot_path, train_path):
  if not os.path.exists(path):
    response = requests.get(base_url + path)
    response.raise_for_status()  # fail early instead of saving an error page
    with open(path, 'wb') as f:
      f.write(response.content)

# Loading the API
- The **'dev'** environment uses the default test token; the **'prd'** environment uses the group's token
- [To understand how it works, see the complete code of **api.py** (GitHub)](https://github.com/felipedmz/oraculus/blob/main/src/api.py)

In [2]:
from src.api import Client
api = Client('dev')

> Inicializando API Client em=dev
> Utilizando o token=token_dummy_001


# Train - Installing dependencies
- Required by the *pycaret* library, which runs a set of tests and elects the best model to use.

In [3]:
!pip install statsmodels
!pip install pyarrow
!pip install scikit-learn
!pip install pycaret
!pip install cuml

Collecting cuml
  Using cached cuml-0.6.1.post1.tar.gz (1.1 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: cuml
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  Building wheel for cuml (setup.py) ... error
  ERROR: Failed building wheel for cuml
  Running setup.py clean for cuml
Failed to build cuml
ERROR: Could not build wheels for cuml, which is required to install pyproject.toml-based projects

## Train Step
- The train method encapsulates the **feature engineering** step;
- It uses the **'quotation.csv'** file, with 1747 rows of captured occurrences;

## Feature Engineering

### Target Column

1. **value_class**: First, the difference (delta) in the cryptocurrency's value between the previous occurrence and the current one is computed and stored in the **value_variation** column, so the first row is always zero. That numeric variation is then turned into a categorical variable: depending on whether it is positive or negative, and whether it is strong, moderate, or weak, the matching category is assigned to the *value_class* column.

**Possible values:** *positive_strong, positive, positive_soft, no_variation, negative_strong, negative, negative_soft*
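
The labelling rule can be sketched as follows. This is a minimal illustration, not the code from `pycaret_robot.py`: the function names, the use of the `close` price as the value, and the thresholds `soft_limit` and `strong_limit` are all assumptions.

```python
def classify_variation(delta, soft_limit=5.0, strong_limit=50.0):
    """Map a numeric price delta to one of the seven value_class labels.

    soft_limit and strong_limit are hypothetical thresholds, chosen only
    for illustration.
    """
    if delta == 0:
        return "no_variation"
    sign = "positive" if delta > 0 else "negative"
    magnitude = abs(delta)
    if magnitude < soft_limit:
        return f"{sign}_soft"
    if magnitude >= strong_limit:
        return f"{sign}_strong"
    return sign  # moderate variation keeps the plain label


def value_classes(closes):
    """Return (value_variation, value_class) per row; the first delta is zero."""
    rows = []
    previous = None
    for close in closes:
        delta = 0.0 if previous is None else close - previous
        rows.append((delta, classify_variation(delta)))
        previous = close
    return rows


# Illustrative prices, not taken from quotation.csv
rows = value_classes([26500.0, 26502.0, 26480.0, 26480.0, 26560.0])
```

With these assumed thresholds, a delta of 2 is labelled `positive_soft`, a delta of -22 is labelled `negative`, and a delta of 80 is labelled `positive_strong`.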

### Date features

2. **year**: parsed from the datetime column
3. **month**: parsed from the datetime column
4. **day**: parsed from the datetime column
5. **hour**: parsed from the datetime column
6. **minute**: parsed from the datetime column
7. **week_day**: parsed from the datetime column
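
As a rough sketch of this parsing step, the snippet below extracts the six fields from a timestamp string. The helper name `date_features`, the timestamp format, and the Monday=0 convention for `week_day` are assumptions, not details confirmed by the repository.

```python
from datetime import datetime


def date_features(timestamp):
    """Split a 'YYYY-MM-DD HH:MM:SS' timestamp into the date features."""
    moment = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "year": moment.year,
        "month": moment.month,
        "day": moment.day,
        "hour": moment.hour,
        "minute": moment.minute,
        "week_day": moment.weekday(),  # assumption: Monday=0 ... Sunday=6
    }


features = date_features("2023-09-22 10:38:00")
```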

### Computed features

8. **amplitude**: high - low
9. **candle**: close - open
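
A tiny illustration of the two formulas on a single row. The helper name `candle_features` and the sample prices are hypothetical.

```python
def candle_features(row):
    """Compute amplitude (high - low) and candle (close - open) for one row."""
    return {
        "amplitude": row["high"] - row["low"],
        "candle": row["close"] - row["open"],
    }


# Illustrative values only
row = {"open": 100.0, "high": 112.0, "low": 95.0, "close": 108.0}
features = candle_features(row)
```

A positive `candle` means the period closed above its opening price; a negative one means it closed below.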

### Features based on the Hurst Coefficient

A set of features was created based on the [Hurst exponent](https://en.wikipedia.org/wiki/Hurst_exponent), a measure that gives an "index of dependence" or "index of long-range dependence", quantifying the relative tendency of a time series either to regress strongly to the mean or to cluster in one direction. [More details on how the Hurst exponent is calculated.](https://www.youtube.com/watch?v=fEM_642WWok)
Here, a Hurst value is assigned to each row of the dataset, computed from the ten (10) preceding rows. As a result, the first 10 rows always have these features set to zero and are dropped at the end of the transformation.

10. **h_value_variation**: Hurst coefficient computed for value_variation
11. **h_amplitude**: Hurst coefficient computed for amplitude
12. **h_candle**: Hurst coefficient computed for candle

The main idea is to find the model that best captures the trend expressed by the Hurst coefficient.
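
A minimal sketch of how a per-row Hurst feature could be derived from the ten preceding rows. It uses a crude single-window rescaled-range (R/S) estimate, which is an assumption made for illustration and not necessarily the calculation used in `pycaret_robot.py`; `hurst_rs` and `rolling_hurst` are hypothetical names.

```python
import math


def hurst_rs(window):
    """Crude single-window rescaled-range (R/S) estimate of the Hurst exponent."""
    n = len(window)
    mean = sum(window) / n
    # Running sum of deviations from the window mean
    cumulative, running = [], 0.0
    for x in window:
        running += x - mean
        cumulative.append(running)
    value_range = max(cumulative) - min(cumulative)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0 or value_range == 0:
        return 0.0  # flat window: no estimate, zero is used as a placeholder
    return math.log(value_range / std) / math.log(n)


def rolling_hurst(values, lookback=10):
    """Estimate per row from the `lookback` preceding rows; early rows get 0."""
    result = []
    for i in range(len(values)):
        if i < lookback:
            result.append(0.0)  # not enough history yet
        else:
            result.append(hurst_rs(values[i - lookback:i]))
    return result


# Illustrative series, not real market data
series = [float((i * 7) % 11) for i in range(15)]
h_series = rolling_hurst(series)
```

With only ten points per window the estimate is very noisy, so it is best read as a rough trend indicator and not as a precise measurement.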

[For the full details, see the complete code of feature_eng() in **pycaret_robot.py** (GitHub)](https://github.com/felipedmz/oraculus/blob/920be60fbe751c6b248a3be01f0de446f99e1ac8/src/pycaret_robot.py#L67C9-L67C20).

## Training
- 80% of the sample is used for training and 20% for validation, with the option *fold_shuffle=True*;
- [For the full details, see the complete code of train() in **pycaret_robot.py** (GitHub)](https://github.com/felipedmz/oraculus/blob/adf5f5876197e753a1fece05aba9e72077b2aa92/src/pycaret_robot.py#L166);
- A series of models is run through PyCaret, and the best one is elected and saved as the knowledge base in **data/pycaret_best.pickle**;
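
The 80/20 split can be checked with quick arithmetic against the shapes reported by the setup step in the training output. The variable names are illustrative, and rounding the training share down is an assumption that happens to match those reported shapes.

```python
import math

captured_rows = 1747   # rows in quotation.csv
dropped_rows = 10      # first rows removed after the Hurst features are built
train_fraction = 0.8   # 80% for training, the rest for validation

usable_rows = captured_rows - dropped_rows
train_rows = math.floor(train_fraction * usable_rows)  # assumption: round down
test_rows = usable_rows - train_rows

print(usable_rows, train_rows, test_rows)
```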

In [4]:
from src.pycaret_robot import PycaretRobot

pycaret_robot = PycaretRobot(api)
pycaret_robot.train()


>>> PycaretRobot

>>> Etapa Treinamento
... carregando treinamento

>>> Feature Eng Entrada=Index(['symbol', 'datetime', 'open', 'high', 'low', 'close', 'volume',
       'number_of_trades'],
      dtype='object')
... Trabalhando a partir dos dados de 2023-09-22 10:38:00
>>> Feature Eng Saída=Index(['symbol', 'open', 'high', 'low', 'close', 'volume', 'value_variation',
       'value_class', 'year', 'month', 'day', 'hour', 'minute', 'week_day',
       'amplitude', 'candle', 'h_value_variation', 'h_amplitude', 'h_candle'],
      dtype='object')

... debug de features salvo em = data/temp.features.csv
... setup


| | Description | Value |
|---|---|---|
| 0 | Session id | 123 |
| 1 | Target | value_class |
| 2 | Target type | Multiclass |
| 3 | Target mapping | negative: 0, negative_soft: 1, negative_strong: 2, no_variation: 3, positive: 4, positive_soft: 5, positive_strong: 6 |
| 4 | Original data shape | (1737, 19) |
| 5 | Transformed data shape | (1737, 42) |
| 6 | Transformed train set shape | (1389, 42) |
| 7 | Transformed test set shape | (348, 42) |
| 8 | Ordinal features | 1 |
| 9 | Numeric features | 14 |


... compare models


| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| ridge | Ridge Classifier | 0.2649 | 0.0 | 0.2649 | 0.228 | 0.2292 | 0.0983 | 0.102 | 0.139 |
| rf | Random Forest Classifier | 0.2505 | 0.6128 | 0.2505 | 0.2272 | 0.2326 | 0.0928 | 0.0941 | 0.787 |
| lda | Linear Discriminant Analysis | 0.2505 | 0.6207 | 0.2505 | 0.236 | 0.2354 | 0.0944 | 0.0958 | 0.147 |
| lr | Logistic Regression | 0.2498 | 0.6205 | 0.2498 | 0.2054 | 0.1939 | 0.0704 | 0.0797 | 1.194 |
| lightgbm | Light Gradient Boosting Machine | 0.2469 | 0.5959 | 0.2469 | 0.2298 | 0.2309 | 0.0879 | 0.089 | 4.532 |
| gbc | Gradient Boosting Classifier | 0.2462 | 0.6058 | 0.2462 | 0.229 | 0.2305 | 0.087 | 0.0883 | 4.497 |
| et | Extra Trees Classifier | 0.2333 | 0.6122 | 0.2333 | 0.2203 | 0.2234 | 0.079 | 0.0796 | 0.435 |
| ada | Ada Boost Classifier | 0.2311 | 0.5713 | 0.2311 | 0.2036 | 0.2083 | 0.0723 | 0.074 | 0.334 |
| xgboost | Extreme Gradient Boosting | 0.2253 | 0.5874 | 0.2253 | 0.2144 | 0.2149 | 0.0661 | 0.0667 | 1.43 |
| knn | K Neighbors Classifier | 0.2188 | 0.5633 | 0.2188 | 0.2213 | 0.2139 | 0.0645 | 0.0652 | 0.284 |



... best model
RidgeClassifier(alpha=1.0, class_weight=None, copy_X=True, fit_intercept=True,
                max_iter=None, positive=False, random_state=123, solver='auto',
                tol=0.0001)


| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.2806 | 0.0 | 0.2806 | 0.3482 | 0.2526 | 0.119 | 0.1228 |
| 1 | 0.259 | 0.0 | 0.259 | 0.1883 | 0.214 | 0.0885 | 0.0924 |
| 2 | 0.2518 | 0.0 | 0.2518 | 0.1889 | 0.2129 | 0.083 | 0.0861 |
| 3 | 0.2806 | 0.0 | 0.2806 | 0.2634 | 0.2456 | 0.1172 | 0.1213 |
| 4 | 0.2806 | 0.0 | 0.2806 | 0.278 | 0.2552 | 0.1219 | 0.1267 |
| 5 | 0.2302 | 0.0 | 0.2302 | 0.1893 | 0.2025 | 0.0629 | 0.0644 |
| 6 | 0.295 | 0.0 | 0.295 | 0.2218 | 0.2509 | 0.1343 | 0.1391 |
| 7 | 0.259 | 0.0 | 0.259 | 0.2043 | 0.2278 | 0.0951 | 0.0969 |
| 8 | 0.2734 | 0.0 | 0.2734 | 0.2089 | 0.2256 | 0.1018 | 0.1077 |
| 9 | 0.2391 | 0.0 | 0.2391 | 0.1884 | 0.2053 | 0.0598 | 0.0626 |




>>> Avaliação de modelo



Transformation Pipeline and Model Successfully Saved

>>> Export realizado para: data/pycaret_best.pickle


# Executing trades
- We run 1 trade per minute, so we define how many minutes (or trades) the robot will execute;
- The execute method encapsulates loading the trained model from **data/pycaret_best.pickle**;
- It predicts the **last occurrence** as a **'positive'** or **'negative'** trend;
- Based on that trend, it assigns the **'sell'** and **'buy'** actions for the robot to execute;
- [For the full details, see the complete code of execute() in **pycaret_robot.py** (GitHub)](https://github.com/felipedmz/oraculus/blob/920be60fbe751c6b248a3be01f0de446f99e1ac8/src/pycaret_robot.py#L204);
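
The mapping from predicted trend to action might look roughly like the sketch below. `decide_action` is a hypothetical helper, not the method from `pycaret_robot.py`. Treating `no_variation` as "no trade", mapping negative trends to `sell`, and the exact behaviour of the strong-only mode are all assumptions.

```python
def decide_action(prediction, only_strong=False):
    """Translate a predicted value_class into a trade action, or None to skip."""
    if only_strong and not prediction.endswith("_strong"):
        return None  # strong-only mode ignores moderate and soft trends
    if prediction.startswith("positive"):
        return "buy"
    if prediction.startswith("negative"):
        return "sell"
    return None  # assumption: no_variation triggers no trade


action = decide_action("positive_soft")
strong_only_action = decide_action("positive_soft", only_strong=True)
```

In this sketch `positive_soft` yields `buy` in the default mode and no trade in strong-only mode.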

In [None]:
minutos = 1 # number of minutes to run (1 trade per minute)
pycaret_robot.execute(minutos)
# pycaret_robot.execute(minutos, True) # trade only on a strong trend


>>> Realizando trades
Transformation Pipeline and Model Successfully Loaded

>>> Feature Eng Entrada=Index(['symbol', 'datetime', 'open', 'high', 'low', 'close', 'volume',
       'number_of_trades'],
      dtype='object')
... Trabalhando a partir dos dados de 2023-10-15 05:37:00
>>> Feature Eng Saída=Index(['symbol', 'open', 'high', 'low', 'close', 'volume', 'value_variation',
       'value_class', 'year', 'month', 'day', 'hour', 'minute', 'week_day',
       'amplitude', 'candle', 'h_value_variation', 'h_amplitude', 'h_candle'],
      dtype='object')

... Prevendo as ultimas 75 ocorrencias...


| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|---|
| 0 | Ridge Classifier | 0.24 | 0 | 0.24 | 0.4481 | 0.2209 | 0.0795 | 0.1032 |


... last_prediction=positive_soft -> ACTION= buy
>>> Comprando 0.00056 BTC
    ticker      quantity
0      BTC  3.134100e+00
1  BTCUSDT  7.437300e-02
2    money  9.920720e+06
3    total  1.000498e+07
...... Aguardando 1 min
