# Basic Setup
- Create the working directories;
- Download the dependencies from the [GitHub repository where the code was developed](https://github.com/felipedmz/oraculus);

In [1]:
"""
1) Create the directories used by the script
2) Download the files from the development repository for use in Colab
"""
import os
import requests

base_url = 'https://raw.githubusercontent.com/felipedmz/oraculus/main/'
data_dir = 'data/'
src_dir = 'src/'
api_path = 'src/api.py'
bot_path = 'src/pycaret_robot.py'
train_path = 'data/quotation.csv'

# Create the directories only if they are missing
os.makedirs(data_dir, exist_ok=True)
os.makedirs(src_dir, exist_ok=True)

# Download each file only if it is not already present locally
for path in (api_path, bot_path, train_path):
  if not os.path.exists(path):
    response = requests.get(base_url + path)
    response.raise_for_status()  # fail early instead of saving an error page
    with open(path, 'wb') as f:
      f.write(response.content)

# Loading the APIs
- The **'dev'** environment uses the default test token; the **'prd'** environment uses the Group's token
- [To understand how it works in full, see the complete code of **api.py** (github)](https://github.com/felipedmz/oraculus/blob/main/src/api.py)

In [2]:
from src.api import Client
api = Client('dev')

> Inicializando API Client em=dev
> Utilizando o token=token_dummy_001


# Train - Installing dependencies
- Required to use the *pycaret* library, which runs a set of tests and elects the best model to use.

In [3]:
!pip install statsmodels
!pip install pyarrow
!pip install scikit-learn
!pip install pycaret
!pip install cuml

Collecting pycaret
  Downloading pycaret-3.1.0-py3-none-any.whl (483 kB)
Collecting scipy~=1.10.1 (from pycaret)
  Downloading scipy-1.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.4 MB)
Collecting pyod>=1.0.8 (from pycaret)
  Downloading pyod-1.1.0.tar.gz (153 kB)
  Preparing metadata (setup.py) ... done
Collecting category-encoders>=2.4.0 (from pycaret)
  Downloading category_encoders-2.6.2-py2.py3-none-any.whl (81 kB)
Collecting deprecation>=2.1.0 (from pycaret)
  D

## Train Step
- The train method encapsulates the **feature engineering** step;
- It uses the **'quotation.csv'** file, with 1747 rows of captured occurrences;

## Feature Engineering

### Target Column

1. **value_class**: The difference (delta) in the cryptocurrency's value between the previous occurrence and the current one is first computed into the **value_variation** column, so the first row is always zero. That numeric variation is then turned into a categorical variable: depending on whether the variation is positive or negative, and whether it is strong, moderate or weak, the corresponding category is assigned to the *value_class* column.

**Possible values:** *positive_strong, positive, positive_soft, no_variation, negative_strong, negative, negative_soft*
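
The labelling described above could be sketched roughly as follows. This is only an illustration: the cut-offs `soft` and `strong`, the helper name `classify_variation`, and the choice of the closing price as the value being compared are assumptions, not values taken from **pycaret_robot.py**.

```python
def classify_variation(delta, soft=1.0, strong=10.0):
    """Map a numeric variation to one of the seven value_class labels.

    `soft` and `strong` are hypothetical cut-offs chosen for illustration.
    """
    if delta == 0:
        return "no_variation"
    sign = "positive" if delta > 0 else "negative"
    size = abs(delta)
    if size >= strong:
        return f"{sign}_strong"
    if size <= soft:
        return f"{sign}_soft"
    return sign  # moderate variation keeps the bare sign as its label


# Assumed input: consecutive closing prices, oldest first.
closes = [100.0, 100.0, 100.5, 95.0, 120.0]

# value_variation: delta against the previous row; the first row is zero.
deltas = [0.0] + [cur - prev for prev, cur in zip(closes, closes[1:])]
classes = [classify_variation(d) for d in deltas]
print(classes)
```

Keeping the cut-offs as parameters makes it easy to check how sensitive the class balance is to them before training.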

### Date features

2. **year**: parsed from the datetime column
3. **month**: parsed from the datetime column
4. **day**: parsed from the datetime column
5. **hour**: parsed from the datetime column
6. **minute**: parsed from the datetime column
7. **week_day**: parsed from the datetime column
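
A minimal sketch of this parsing step, using only the standard library. The timestamp format and the convention Monday = 0 for `week_day` are assumptions; the actual implementation may use a different weekday numbering.

```python
from datetime import datetime


def date_features(timestamp):
    """Split a 'YYYY-MM-DD HH:MM:SS' string into the six date features."""
    dt = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "hour": dt.hour,
        "minute": dt.minute,
        "week_day": dt.weekday(),  # assumed convention: Monday = 0 ... Sunday = 6
    }


features = date_features("2023-09-22 10:38:00")
print(features)
```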

### Calculated features

8. **traded_volume**: number_of_trades / volume
9. **amplitude**: high - low
10. **candle**: close - open
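
The three formulas above, applied to a single made-up candle, could look like this. The sample values and the zero-volume guard are assumptions added for illustration.

```python
def calculated_features(row):
    """Derive traded_volume, amplitude and candle from one quotation row."""
    volume = row["volume"]
    return {
        # Guard against division by zero (assumed behaviour, not from the source).
        "traded_volume": row["number_of_trades"] / volume if volume else 0.0,
        "amplitude": row["high"] - row["low"],
        "candle": row["close"] - row["open"],
    }


# Hypothetical one-minute candle.
row = {"open": 100.0, "high": 101.5, "low": 99.0, "close": 100.75,
       "volume": 4.0, "number_of_trades": 10}
features = calculated_features(row)
print(features)
```

A positive `candle` means the minute closed above its opening price, while `amplitude` is always non-negative.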

### Features based on the Hurst Coefficient

A set of features was created based on the [Hurst exponent](https://en.wikipedia.org/wiki/Hurst_exponent), a measure that gives us the "index of dependence" or "index of long-range dependence", quantifying the relative tendency of a time series either to regress strongly to the mean or to cluster in one direction. [More details on computing the Hurst exponent.](https://www.youtube.com/watch?v=fEM_642WWok).
Here we assign a Hurst value to each row of the dataset, computed from the ten (10) preceding rows. As a result, the first 10 rows always have these features set to zero and are dropped at the end of the transformation.

11. **h_value_variation**: Hurst coefficient computed for value_variation
12. **h_traded_volume**: Hurst coefficient computed for traded_volume
13. **h_amplitude**: Hurst coefficient computed for amplitude
14. **h_candle**: Hurst coefficient computed for candle

The main idea is to find the model that best captures the trend expressed by the Hurst coefficient.
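
To make the rolling computation concrete, here is a rough sketch. It uses a crude single-window rescaled-range estimate that assumes R/S ≈ n^H, which is a simplification and not necessarily the estimator used in **pycaret_robot.py**. The function names and the sample series are hypothetical.

```python
import math


def hurst_rs(window):
    """Crude Hurst estimate for one window: H = log(R/S) / log(n)."""
    n = len(window)
    mean = sum(window) / n
    cumulative = lowest = highest = 0.0
    for x in window:
        cumulative += x - mean          # cumulative deviation from the mean
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
    r = highest - lowest                # range of the cumulative deviations
    s = math.sqrt(sum((x - mean) ** 2 for x in window) / n)  # std deviation
    if r == 0 or s == 0:
        return 0.0                      # constant window: no estimate
    return math.log(r / s) / math.log(n)


def rolling_hurst(values, lookback=10):
    """Estimate per row from the `lookback` previous rows; early rows stay 0."""
    result = [0.0] * len(values)
    for i in range(lookback, len(values)):
        result[i] = hurst_rs(values[i - lookback:i])
    return result


# Hypothetical amplitude series.
amplitude = [2.5, 1.0, 3.0, 2.0, 4.5, 1.5, 2.0, 3.5, 1.0, 2.5, 3.0, 2.0, 4.0]
h_amplitude = rolling_hurst(amplitude)
print(h_amplitude)
```

With only ten points per window any Hurst estimate is noisy, so these values are better read as a rough trend indicator than as a precise measurement.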

[To understand how it works in detail, see the complete code of feature_eng() in **pycaret_robot.py** (github)](https://github.com/felipedmz/oraculus/blob/920be60fbe751c6b248a3be01f0de446f99e1ac8/src/pycaret_robot.py#L67C9-L67C20);

## Training
- 80% of the sample is used for training and 20% for validation, with the *fold_shuffle=True* option;
- [To understand how it works in detail, see the complete code of train() in **pycaret_robot.py** (github)](https://github.com/felipedmz/oraculus/blob/adf5f5876197e753a1fece05aba9e72077b2aa92/src/pycaret_robot.py#L166);
- We run a series of models through PyCaret, and the best model is elected to build the knowledge base saved in **data/pycaret_best.pickle**;
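
As a quick check of the 80/20 arithmetic, the plain-Python sketch below splits the 1737 rows that remain after feature engineering (1747 minus the 10 dropped rows). The helper names are hypothetical, and the shuffling is Python's own rather than PyCaret's; the seed 123 simply mirrors the session id shown in the setup output.

```python
import math
import random


def split_sizes(n_rows, train_size=0.8):
    """Return (n_train, n_test) for a hold-out split of n_rows."""
    n_train = math.floor(n_rows * train_size)
    return n_train, n_rows - n_train


def shuffled_split(rows, train_size=0.8, seed=123):
    """Shuffle a copy of the rows, then cut it into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train, _ = split_sizes(len(shuffled), train_size)
    return shuffled[:n_train], shuffled[n_train:]


rows = list(range(1737))
train_rows, test_rows = shuffled_split(rows)
print(len(train_rows), len(test_rows))  # 1389 348
```

These sizes agree with the train and test set shapes reported in the setup output of this notebook.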

In [4]:
from src.pycaret_robot import PycaretRobot

pycaret_robot = PycaretRobot(api)
pycaret_robot.train()


>>> PycaretRobot

>>> Etapa Treinamento
... carregando treinamento

>>> Feature Eng Entrada=Index(['symbol', 'datetime', 'open', 'high', 'low', 'close', 'volume',
       'number_of_trades'],
      dtype='object')
... Trabalhando a partir dos dados de 2023-09-22 10:38:00
>>> Feature Eng Saída=Index(['symbol', 'open', 'high', 'low', 'close', 'volume', 'number_of_trades',
       'value_variation', 'value_class', 'year', 'month', 'day', 'hour',
       'minute', 'week_day', 'traded_volume', 'amplitude', 'candle',
       'h_value_variation', 'h_traded_volume', 'h_amplitude', 'h_candle'],
      dtype='object')

... debug de features salvo em = data/temp.features.csv
... setup


| | Description | Value |
|---|---|---|
| 0 | Session id | 123 |
| 1 | Target | value_class |
| 2 | Target type | Multiclass |
| 3 | Target mapping | negative: 0, negative_soft: 1, negative_strong: 2, no_variation: 3, positive: 4, positive_soft: 5, positive_strong: 6 |
| 4 | Original data shape | (1737, 22) |
| 5 | Transformed data shape | (1737, 45) |
| 6 | Transformed train set shape | (1389, 45) |
| 7 | Transformed test set shape | (348, 45) |
| 8 | Ordinal features | 1 |
| 9 | Numeric features | 17 |


... compare models


| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| ridge | Ridge Classifier | 0.2599 | 0.0 | 0.2599 | 0.228 | 0.2257 | 0.093 | 0.0964 | 0.145 |
| rf | Random Forest Classifier | 0.2548 | 0.6198 | 0.2548 | 0.2338 | 0.2312 | 0.0933 | 0.0955 | 0.891 |
| lda | Linear Discriminant Analysis | 0.2469 | 0.6179 | 0.2469 | 0.2307 | 0.2314 | 0.0906 | 0.0919 | 0.313 |
| lr | Logistic Regression | 0.2441 | 0.6263 | 0.2441 | 0.1889 | 0.1931 | 0.0644 | 0.0711 | 1.485 |
| gbc | Gradient Boosting Classifier | 0.2404 | 0.599 | 0.2404 | 0.2213 | 0.2222 | 0.0783 | 0.0796 | 5.201 |
| lightgbm | Light Gradient Boosting Machine | 0.2397 | 0.5858 | 0.2397 | 0.2231 | 0.2232 | 0.0782 | 0.0793 | 5.905 |
| knn | K Neighbors Classifier | 0.2311 | 0.5701 | 0.2311 | 0.22 | 0.2197 | 0.0795 | 0.0805 | 0.322 |
| ada | Ada Boost Classifier | 0.2311 | 0.5617 | 0.2311 | 0.2202 | 0.2129 | 0.0778 | 0.0793 | 0.37 |
| et | Extra Trees Classifier | 0.2304 | 0.6117 | 0.2304 | 0.2112 | 0.217 | 0.0725 | 0.0731 | 0.585 |
| xgboost | Extreme Gradient Boosting | 0.2275 | 0.5871 | 0.2275 | 0.2164 | 0.2151 | 0.0667 | 0.0674 | 1.699 |



... best model
RidgeClassifier(alpha=1.0, class_weight=None, copy_X=True, fit_intercept=True,
                max_iter=None, positive=False, random_state=123, solver='auto',
                tol=0.0001)


| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.2518 | 0.0 | 0.2518 | 0.3279 | 0.2284 | 0.0842 | 0.0869 |
| 1 | 0.2734 | 0.0 | 0.2734 | 0.2378 | 0.2343 | 0.1073 | 0.1118 |
| 2 | 0.259 | 0.0 | 0.259 | 0.1981 | 0.2213 | 0.0919 | 0.0951 |
| 3 | 0.2734 | 0.0 | 0.2734 | 0.2504 | 0.2361 | 0.1104 | 0.1139 |
| 4 | 0.2806 | 0.0 | 0.2806 | 0.2811 | 0.253 | 0.1219 | 0.1271 |
| 5 | 0.2086 | 0.0 | 0.2086 | 0.1709 | 0.1842 | 0.0365 | 0.0373 |
| 6 | 0.2734 | 0.0 | 0.2734 | 0.2064 | 0.2339 | 0.1097 | 0.1131 |
| 7 | 0.2446 | 0.0 | 0.2446 | 0.1965 | 0.2173 | 0.0786 | 0.08 |
| 8 | 0.2878 | 0.0 | 0.2878 | 0.2228 | 0.2393 | 0.12 | 0.1267 |
| 9 | 0.2464 | 0.0 | 0.2464 | 0.1876 | 0.2091 | 0.0696 | 0.0726 |




>>> Avaliação de modelo



Transformation Pipeline and Model Successfully Saved

>>> Export realizado para: data/pycaret_best.pickle


# Trade execution
- We run 1 trade per minute, so we set how many minutes (that is, how many trades) the robot will execute;
- The execute method encapsulates loading the trained model from **data/pycaret_best.pickle**;
- It predicts the **latest occurrence** as a **'positive'** or **'negative'** trend;
- Based on that trend, it assigns the **'sell'** and **'buy'** actions for the robot to execute;
- [To understand how it works in detail, see the complete code of execute() in **pycaret_robot.py** (github)](https://github.com/felipedmz/oraculus/blob/920be60fbe751c6b248a3be01f0de446f99e1ac8/src/pycaret_robot.py#L204);
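
One possible way to turn a predicted class into an action is sketched below. The direction of the mapping (positive trend to 'buy', negative trend to 'sell'), the 'hold' fallback for `no_variation`, and the function name `action_for` are all assumptions made for illustration; the real rule is in execute().

```python
def action_for(predicted_class):
    """Map a predicted value_class label to a trade action (assumed rule)."""
    if predicted_class.startswith("positive"):
        return "buy"    # assumption: an expected rise triggers a buy
    if predicted_class.startswith("negative"):
        return "sell"   # assumption: an expected fall triggers a sell
    return "hold"       # hypothetical fallback for 'no_variation'


predictions = ["positive_strong", "negative_soft", "no_variation", "positive"]
actions = [action_for(p) for p in predictions]
print(actions)
```

Matching on the prefix keeps the strong, moderate and soft variants of each trend on the same action; a stricter strategy could act only on the strong classes.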

In [21]:
minutos = 1  # number of minutes to run; the robot makes 1 trade per minute
"""
from src.pycaret_robot import PycaretRobot

pycaret_robot = PycaretRobot(api)
api.cripto_quotation()
"""
pycaret_robot.execute(minutos)



>>> Realizando trades
Transformation Pipeline and Model Successfully Loaded

>>> Feature Eng Entrada=Index(['symbol', 'datetime', 'open', 'high', 'low', 'close', 'volume',
       'number_of_trades'],
      dtype='object')
... Trabalhando a partir dos dados de 2023-10-15 03:46:00


KeyError: ignored