<h1> Tests for the Liga de Bolsa (Stock Market League) </h1>

<h2> Downloading quote data </h2>

The quote data will be stored in a small dataframe that holds as little information as possible, in order to save memory.
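
One way to keep the dataframe small is to drop columns that repeat the same value on every row and to store the remaining columns in narrower dtypes. The sketch below is only an illustration: `shrink_quotes` and its `drop` parameter are hypothetical names, and the sample rows are copied from the UBIP output shown further down in this notebook.

```python
import numpy as np
import pandas as pd


def shrink_quotes(df: pd.DataFrame, drop=("Currency",)) -> pd.DataFrame:
    """Return a copy without the `drop` columns and with narrower numeric dtypes.

    Hypothetical helper: the column names follow the investpy output
    (Open, High, Low, Close, Volume, Currency).
    """
    out = df.drop(columns=[c for c in drop if c in df.columns])
    for col in out.columns:
        if pd.api.types.is_float_dtype(out[col]):
            # float32 halves the memory of the default float64
            out[col] = out[col].astype(np.float32)
        elif pd.api.types.is_integer_dtype(out[col]):
            # pick the smallest integer type that can hold the values
            out[col] = pd.to_numeric(out[col], downcast="integer")
    return out


sample = pd.DataFrame({
    "Open": [62.08, 62.40, 62.36],
    "High": [62.90, 63.46, 62.44],
    "Low": [62.02, 61.82, 60.96],
    "Close": [62.38, 62.38, 62.10],
    "Volume": [328549, 330640, 371954],
    "Currency": ["EUR", "EUR", "EUR"],
})
small = shrink_quotes(sample)
print(sample.memory_usage(deep=True).sum(), "->", small.memory_usage(deep=True).sum())
```

Note that float32 keeps roughly seven significant digits, which is enough for plotting but may not be enough for every calculation.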

### Next steps

- Show how the stock market analysis works at the code level.

- Try running the Jupyter Notebook on a Google Cloud virtual machine.

> Look for a tutorial on this and work out how to choose the most suitable VM.

- Try the Interactive Brokers API (intraday trading).

> Check compatibility with Google Cloud.

> Look for alternatives if necessary.

- Develop an algorithm based on technical indicators.

> Check the statistical significance of the indicators and of other values (for example, beta).

> Use cross-validation methods.

> Select stocks by their buy signals, by their beta values, or by both?

> In this step and the following ones, it will be worth running a principal component analysis to optimize the code.

- Integrate fundamental indicators into the previous algorithm.

> Make sure these fundamental indicators have values that match their period.

> Check whether they add anything to the analysis.

- Building on the above, integrate machine learning.

> Perhaps a classifier that determines, with positive statistical expectancy, whether a buy or a sell will be profitable.

- With all of the above, develop an AI algorithm that takes into account as many parameters as possible.

> These can be combined with sentiment analysis for an asset (I think it can be downloaded from the internet).

> Develop one AI per type of stock (beware of overfitting)? Or one per type of market?
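
For the cross-validation step, price series are ordered in time, so a plain random split would let the model see the future. A common alternative is a walk-forward (expanding window) split, sketched below in plain Python. `walk_forward_splits` and its parameters are hypothetical names used only for this illustration; nothing here is part of the notebook yet.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(n_samples: int, n_folds: int,
                        min_train: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs where the test block always
    comes right after the training block (expanding training window)."""
    if n_folds < 1 or not 0 < min_train < n_samples:
        raise ValueError("need n_folds >= 1 and 0 < min_train < n_samples")
    test_size = (n_samples - min_train) // n_folds
    if test_size == 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        train_end = min_train + fold * test_size
        test_end = train_end + test_size
        yield list(range(train_end)), list(range(train_end, test_end))


splits = list(walk_forward_splits(n_samples=10, n_folds=3, min_train=4))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Each fold trains on everything up to a cut-off and tests on the block that follows, so no test row is older than a training row.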

---

In [None]:
# Install investpy package. This first command does not work.
# ! pip install git+https://github.com/alvarobartt/investpy.git@master
# This one works!
# ! pip install investpy

# Interactive Brokers API
# ! pip install ibapi

In [1]:
import investpy
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statistics

### Investing API

In [5]:
# Load historical data for the UBIP stock (France). Date format is dd/mm/yyyy.
    # Frequency: daily
    # Open, high, low and close prices, volume and currency.
ubip = investpy.get_stock_historical_data(stock='UBIP', country='France', from_date='01/01/2020', to_date='10/11/2021')
ubip

# Get the first five values of volume from the ubip dataframe
# with either of these options:
# 1. ubip.iloc[0:5]["Volume"]
# 2. ubip.head()["Volume"]

# The same, but with the last five values:
# 1. ubip.iloc[-5:]["Volume"]
# 2. ubip.tail()["Volume"]

Date,Open,High,Low,Close,Volume,Currency
2020-01-02,62.08,62.90,62.02,62.38,328549,EUR
2020-01-03,62.40,63.46,61.82,62.38,330640,EUR
2020-01-06,62.36,62.44,60.96,62.10,371954,EUR
2020-01-07,62.40,63.06,61.04,61.04,430724,EUR
2020-01-08,61.30,63.90,61.20,63.90,637279,EUR
...,...,...,...,...,...,...
2021-11-04,46.18,47.20,45.61,47.00,627690,EUR
2021-11-05,47.20,47.56,46.47,47.30,468885,EUR
2021-11-08,47.02,47.50,46.65,47.27,377660,EUR
2021-11-09,47.55,47.58,46.91,47.00,442059,EUR


In [6]:
# Load further data for a stock

# The search result for the UBIP stock is saved in ubip_info
ubip_info = investpy.search_quotes(text='ubip', products=['stocks'], countries=['france'], n_results=1)
# Save the information in a dict variable
ubip_info = ubip_info.retrieve_information()
print(ubip_info)

# Technical information:
    # prevClose, dailyRange, open (most recent), weekRange (most recent), volume (unknown), avgVolume (it could be weekly)
# Fundamental information:
    # revenue, eps (earnings per share), marketCap, dividend (last), ratio (P/E).
# Other or both:
    # beta, oneYearReturn, sharesOutstanding, nextEarningDate

'''
# avgVolume: 74388303

for x in range(1, len(df["Volume"])):
    print(f"{x}: {df.tail(x)['Volume'].mean()}")
    # mean output 29 is similar

for x in range(1, len(df["Volume"])):
    print(f"{x}: {df.tail(x)['Volume'].median()}")
    # median output 30 is similar
'''


{'prevClose': 47, 'dailyRange': '46.96-47.61', 'revenue': 2980000000, 'open': 46.96, 'weekRange': '43.54-88.16', 'eps': 0.67, 'volume': 316892, 'marketCap': 5830000000, 'dividend': 'N/A(N/A)', 'avgVolume': 469093, 'ratio': 70.37, 'beta': 0.27, 'oneYearReturn': '-38.08%', 'sharesOutstanding': 123909503, 'nextEarningDate': '28/01/2022'}
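
Some of the returned values are strings (`'46.96-47.61'`, `'-38.08%'`), so they need parsing before they can be used in calculations. The sketch below shows one possible way to do it. `parse_range` and `parse_percent` are hypothetical helpers, and the sample dict only repeats three entries from the output above.

```python
def parse_range(text: str) -> tuple:
    """Split a 'low-high' string into two floats.

    Assumes both numbers are positive, as in the output above;
    a negative bound would need a more careful parser.
    """
    low, high = text.split("-")
    return float(low), float(high)


def parse_percent(text: str) -> float:
    """Turn a string such as '-38.08%' into a fraction (-0.3808)."""
    return float(text.rstrip("%")) / 100.0


# Three entries copied from the printed ubip_info dict
info = {
    "dailyRange": "46.96-47.61",
    "weekRange": "43.54-88.16",
    "oneYearReturn": "-38.08%",
}

daily_low, daily_high = parse_range(info["dailyRange"])
week_low, week_high = parse_range(info["weekRange"])
one_year_return = parse_percent(info["oneYearReturn"])
print(daily_low, daily_high, week_low, week_high, one_year_return)
```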



In [None]:
# import investpy

# s_results = investpy.search_quotes(text='a', products=['stocks'], countries=['united states'], n_results=10)

# s_results = map(lambda x: print(x), s_results)

### Interactive Brokers API

It will allow us to work with intraday data.
- IB Gateway is mandatory.
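
Since the API only works while IB Gateway is running, it can help to check that something is listening on the gateway port before calling `connect`. The following is a minimal sketch using only the standard library. `port_is_open` is a hypothetical helper, and the host and port are the ones used in the cell below (`127.0.0.1`, `4002`).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out or unreachable: treat all of them as "not ready"
        return False


gateway_ready = port_is_open("127.0.0.1", 4002)
print("IB Gateway reachable:", gateway_ready)
```

This only tells us that a process accepts connections on that port; it does not prove that the process is IB Gateway or that the API is enabled in its settings.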

In [None]:
from ibapi.client import EClient
from ibapi.wrapper import EWrapper
from ibapi.contract import Contract
from datetime import datetime, timedelta
import pandas as pd

class IBapi(EWrapper, EClient):
    def __init__(self):
        EClient.__init__(self,self)
        cols = ['date', 'open', 'high', 'low', 'close']
        self.df = pd.DataFrame(columns=cols)
    
    def historicalData(self, reqId, bar):
        print(" Date:", bar.date, "Open:", bar.open, "High:", bar.high, "Low:", bar.low, "Close:", bar.close) #, "Volume: ", bar.volume, "Count: ", bar.barCount)
        dftemp = pd.DataFrame({'date':bar.date,'open':bar.open,'high':bar.high,'low':bar.low, 'close':bar.close}, index=[0])
        self.df = pd.concat([self.df, dftemp], axis=0)
        
    def historicalDataEnd(self, reqId: int, start: str, end: str):
        super().historicalDataEnd(reqId, start, end)
        print("HistoricalDataEnd. ReqId:", reqId, "from", start, "to", end)
        self.df.to_csv("GBP_USD_1Y_15mins.csv",index=False)
        self.disconnect()

app = IBapi()
app.connect('127.0.0.1', 4002, 0)

#Create contract object
def defineContract(symbol,secType,exchange,currency='USD'):
    contract = Contract()
    contract.symbol = symbol
    contract.secType = secType
    contract.exchange = exchange
    contract.currency = currency
    return contract

contract = defineContract(symbol='GBP',secType='CASH',exchange='IDEALPRO')
queryTime = (datetime.today() - timedelta(days=30)).strftime("%Y%m%d %H:%M:%S")
#queryTime = ""
duration = '1 Y'
barsize = '15 mins'
priceType = 'MIDPOINT'

app.reqHistoricalData(1, contract, queryTime, duration, barsize, priceType, 1, 1, False, [])
app.run()
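
The `historicalData` callback above builds a one-row dataframe and concatenates it for every bar, which gets slower as the number of bars grows. A possible alternative, sketched here without the `ibapi` dependency, is to collect plain dicts and build the dataframe once at the end. `Bar` and `BarCollector` are hypothetical stand-ins, and the two bars are synthetic values used only to exercise the code.

```python
from collections import namedtuple

import pandas as pd

# Stand-in for the bar objects passed to historicalData (illustration only)
Bar = namedtuple("Bar", ["date", "open", "high", "low", "close"])

COLUMNS = ["date", "open", "high", "low", "close"]


class BarCollector:
    """Accumulate bars in a list and convert them to a dataframe once."""

    def __init__(self):
        self.rows = []

    def add(self, bar) -> None:
        # Appending to a list is cheap compared with pd.concat per bar
        self.rows.append({name: getattr(bar, name) for name in COLUMNS})

    def to_frame(self) -> pd.DataFrame:
        return pd.DataFrame(self.rows, columns=COLUMNS)


collector = BarCollector()
# Synthetic bars, not real market data
collector.add(Bar("20210101 10:00:00", 1.0, 1.2, 0.9, 1.1))
collector.add(Bar("20210101 10:15:00", 1.1, 1.3, 1.0, 1.2))
bars = collector.to_frame()
print(bars)
```

In the class above, `add` would be called from `historicalData` and `to_frame` from `historicalDataEnd`, just before writing the CSV file.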

## Technical analysis
Further information [here](https://medium.com/codex/this-python-library-will-help-you-get-stock-technical-indicators-in-one-line-of-code-c11ed2c8e45f) (stockstats) and [here](https://towardsdatascience.com/technical-analysis-library-to-financial-datasets-with-pandas-python-4b2b390d3543) (ta).

In [None]:
# ! pip3 install stockstats
# ! pip3 install ta

In [7]:
# Technical analysis libraries
from stockstats import StockDataFrame
import ta


### stockstats library

In [9]:
# Convert the DataFrame to a stockstats StockDataFrame
ubip = StockDataFrame(ubip)
ubip.columns = ubip.columns.str.lower()
ubip

Date,open,high,low,close,volume,currency
2020-01-02,62.08,62.90,62.02,62.38,328549,EUR
2020-01-03,62.40,63.46,61.82,62.38,330640,EUR
2020-01-06,62.36,62.44,60.96,62.10,371954,EUR
2020-01-07,62.40,63.06,61.04,61.04,430724,EUR
2020-01-08,61.30,63.90,61.20,63.90,637279,EUR
...,...,...,...,...,...,...
2021-11-04,46.18,47.20,45.61,47.00,627690,EUR
2021-11-05,47.20,47.56,46.47,47.30,468885,EUR
2021-11-08,47.02,47.50,46.65,47.27,377660,EUR
2021-11-09,47.55,47.58,46.91,47.00,442059,EUR


In [10]:
# Show three moving averages (10, 20 and 50 periods), RSI (14 periods), MACD and its signal line
ubip[['close_10_sma', 'close_20_sma', 'close_50_sma', 'rsi_14', 'macd', 'macds']]


Date,close_10_sma,close_20_sma,close_50_sma,rsi_14,macd,macds
2020-01-02,62.380000,62.380000,62.380000,,0.000000,0.000000
2020-01-03,62.380000,62.380000,62.380000,,0.000000,0.000000
2020-01-06,62.286667,62.286667,62.286667,0.000000,-0.008683,-0.003559
2020-01-07,61.975000,61.975000,61.975000,0.000000,-0.050690,-0.019524
2020-01-08,62.360000,62.360000,62.360000,70.000000,0.041357,-0.001414
...,...,...,...,...,...,...
2021-11-04,45.888000,46.441000,51.051800,45.857305,-1.331921,-1.691999
2021-11-05,45.848000,46.373000,50.923000,47.411496,-1.162271,-1.586053
2021-11-08,45.932000,46.361500,50.806400,47.265382,-1.018502,-1.472543
2021-11-09,46.113000,46.341500,50.668000,45.894511,-0.915794,-1.361193
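
In the first rows of this output the three moving averages are identical, even though fewer than 10 closes are available. That is what a rolling mean over partial windows would give. The sketch below reproduces those values with plain pandas from the first five closes printed earlier; using `min_periods=1` is an assumption inferred from the printed numbers, not a statement about how stockstats is implemented.

```python
import pandas as pd

# First five close prices from the UBIP output above
close = pd.Series([62.38, 62.38, 62.10, 61.04, 63.90])

# min_periods=1 averages whatever is available until the window fills up
sma_10 = close.rolling(window=10, min_periods=1).mean()
sma_50 = close.rolling(window=50, min_periods=1).mean()

print(sma_10.round(6).tolist())
```

With only five rows both windows are still filling up, so `sma_10` and `sma_50` are the same expanding mean. This is why the early values of any SMA should be read with care.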


In [None]:
# Plot size in inches
plt.rcParams['figure.figsize'] = [10, 5]

# Plot the last 90 periods
plt.plot(ubip.iloc[-90:]['close'], linewidth = 2, label = 'UBIP')
plt.plot(ubip.iloc[-90:]['close_10_sma'], linewidth = 2, alpha = 0.6, label = 'SMA 10')
plt.plot(ubip.iloc[-90:]['close_20_sma'], linewidth = 2, alpha = 0.6, label = 'SMA 20')
plt.plot(ubip.iloc[-90:]['close_50_sma'], linewidth = 2, alpha = 0.6, label = 'SMA 50')
plt.legend(loc = 'upper left')
plt.show()

In [None]:
ubip['close_50_sma_xd_close_20_sma']

# Detect where the moving averages cross each other.
# Create buy and sell signals from the close price, SMA 50 and SMA 20.
# This could be interesting for testing methods.
buy_cross = ubip['close_50_sma_xd_close_20_sma'].eq(True)
sell_cross = ubip['close_20_sma_xd_close_50_sma'].eq(True)

# Keep the close price on crossover days and NaN elsewhere,
# so that only the crossover days are drawn as markers.
buy_signals = ubip['close'].where(buy_cross)
sell_signals = ubip['close'].where(sell_cross)

# Set the style before plotting so that it takes effect.
plt.style.use('bmh')
plt.rcParams['figure.figsize'] = [10, 5]

# Plot stock, indicators and signals
plt.plot(ubip['close'], linewidth = 2.5, label = 'UBIP')
plt.plot(ubip['close_20_sma'], linewidth = 2.5, alpha = 0.6, label = 'SMA 20')
plt.plot(ubip['close_50_sma'], linewidth = 2.5, alpha = 0.6, label = 'SMA 50')
plt.plot(ubip.index, buy_signals, marker = '^', markersize = 10, color = 'green', linewidth = 0, label = 'BUY SIGNAL')
plt.plot(ubip.index, sell_signals, marker = 'v', markersize = 10, color = 'r', linewidth = 0, label = 'SELL SIGNAL')
plt.legend(loc = 'upper left')
plt.title('UBIP SMA 20,50 CROSSOVER STRATEGY SIGNALS')
plt.show()
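
The `_xd_` columns used in this cell mark the rows where the first series crosses below the second one. The sketch below shows the same idea with plain pandas on two tiny synthetic series. `crosses_below` is a hypothetical helper, and its handling of ties (equal values count as "not below") is an assumption that may differ from what stockstats does.

```python
import pandas as pd


def crosses_below(a: pd.Series, b: pd.Series) -> pd.Series:
    """True on rows where `a` goes from at-or-above `b` to strictly below it."""
    below_now = a < b
    # Comparisons against the NaN created by shift() are False,
    # so the first row can never be flagged as a crossover.
    not_below_before = a.shift(1) >= b.shift(1)
    return below_now & not_below_before


# Synthetic series: `fast` rises above `slow` and then falls back under it
fast = pd.Series([1.0, 2.0, 3.0, 2.0, 1.0])
slow = pd.Series([2.0, 2.0, 2.0, 2.5, 2.5])

sell = crosses_below(fast, slow)  # fast drops under slow
buy = crosses_below(slow, fast)   # slow drops under fast, i.e. fast rises above slow
print(buy.tolist(), sell.tolist())
```

This matches the reading used in the cell above: a buy signal when the 50-period SMA crosses below the 20-period SMA, and a sell signal in the opposite case.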

In [None]:
# FIXME: MACD must be plotted in a different subplot.
# It is also necessary to check other indicators besides MACD and MACDS.
# It may also be interesting to try some different plot styles.

buy_cross = ubip['macds_xd_macd'].eq(True)
sell_cross = ubip['macd_xd_macds'].eq(True)

# Keep the close price on crossover days and NaN elsewhere,
# so that only the crossover days are drawn as markers.
buy_signals = ubip['close'].where(buy_cross)
sell_signals = ubip['close'].where(sell_cross)

# Set the style before plotting so that it takes effect.
plt.style.use('bmh')
plt.rcParams['figure.figsize'] = [10, 5]

# Plot stock, indicators and signals
plt.plot(ubip['close'], linewidth = 2.5, label = 'UBIP')
plt.plot(ubip['macd'], linewidth = 2.5, alpha = 0.6, label = 'MACD')
plt.plot(ubip['macds'], linewidth = 2.5, alpha = 0.6, label = 'MACD_SIGNAL')
plt.plot(ubip.index, buy_signals, marker = '^', markersize = 10, color = 'green', linewidth = 0, label = 'BUY SIGNAL')
plt.plot(ubip.index, sell_signals, marker = 'v', markersize = 10, color = 'r', linewidth = 0, label = 'SELL SIGNAL')

plt.legend(loc = 'upper left')
plt.title('UBIP MACD CROSSOVER STRATEGY SIGNALS')
plt.show()
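
For reference, the `macd` and `macds` columns can be approximated with plain pandas as the difference between a 12-period and a 26-period exponential moving average, followed by a 9-period EMA of that difference. The sketch below uses the first five closes from the UBIP output. The spans (12, 26, 9) and the default `adjust=True` are assumptions chosen because they reproduce the first values printed earlier in this notebook.

```python
import pandas as pd

# First five close prices from the UBIP output above
close = pd.Series([62.38, 62.38, 62.10, 61.04, 63.90])

ema_fast = close.ewm(span=12).mean()   # pandas default: adjust=True
ema_slow = close.ewm(span=26).mean()
macd = ema_fast - ema_slow             # MACD line
signal = macd.ewm(span=9).mean()       # signal line (macds)

print(macd.round(6).tolist())
print(signal.round(6).tolist())
```

The MACD values stay close to zero while the price is around 60, which is why both should not share one axis, as the FIXME in the cell above points out.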

### ta library

## Artificial Intelligence example

Example [here](https://www.thepythoncode.com/article/stock-price-prediction-in-python-using-tensorflow-2-and-keras).

In [None]:
# ! pip3 install tensorflow pandas numpy matplotlib yahoo_fin scikit-learn

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from yahoo_fin import stock_info as si
from collections import deque

import os
import numpy as np
import pandas as pd
import random

In [None]:
# set seed, so we can get the same results after rerunning several times
np.random.seed(206)
tf.random.set_seed(206)
random.seed(206)

In [None]:
# Used to correct an error with model.add()
! pip install numpy==1.18.5 --user

### Define functions

In [None]:
def shuffle_in_unison(a, b):
    # shuffle two arrays in the same way
    state = np.random.get_state()
    np.random.shuffle(a)
    np.random.set_state(state)
    np.random.shuffle(b)

def load_data(ticker, n_steps=50, scale=True, shuffle=True, lookup_step=1, split_by_date=True,
                test_size=0.2, feature_columns=['adjclose', 'volume', 'open', 'high', 'low']):
    """
    Loads data from Yahoo Finance source, as well as scaling, shuffling, normalizing and splitting.
    Params:
        ticker (str/pd.DataFrame): the ticker you want to load, examples include AAPL, TSLA, etc.
        n_steps (int): the historical sequence length (i.e window size) used to predict, default is 50
        scale (bool): whether to scale prices from 0 to 1, default is True
        shuffle (bool): whether to shuffle the dataset (both training & testing), default is True
        lookup_step (int): the future lookup step to predict, default is 1 (e.g next day)
        split_by_date (bool): whether we split the dataset into training/testing by date, setting it 
            to False will split datasets in a random way
        test_size (float): ratio for test data, default is 0.2 (20% testing data)
        feature_columns (list): the list of features to use to feed into the model, default is everything grabbed from yahoo_fin
    """
    # see if ticker is already a loaded stock from yahoo finance
    if isinstance(ticker, str):
        # load it from yahoo_fin library
        df = si.get_data(ticker)
    elif isinstance(ticker, pd.DataFrame):
        # already loaded, use it directly
        df = ticker
    else:
        raise TypeError("ticker must be either a str or a `pd.DataFrame` instance")
    # this will contain all the elements we want to return from this function
    result = {}
    # we will also return the original dataframe itself
    result['df'] = df.copy()
    # make sure that the passed feature_columns exist in the dataframe
    for col in feature_columns:
        assert col in df.columns, f"'{col}' does not exist in the dataframe."
    # add date as a column
    if "date" not in df.columns:
        df["date"] = df.index
    if scale:
        column_scaler = {}
        # scale the data (prices) from 0 to 1
        for column in feature_columns:
            scaler = preprocessing.MinMaxScaler()
            df[column] = scaler.fit_transform(np.expand_dims(df[column].values, axis=1))
            column_scaler[column] = scaler
        # add the MinMaxScaler instances to the result returned
        result["column_scaler"] = column_scaler
    # add the target column (label) by shifting by `lookup_step`
    df['future'] = df['adjclose'].shift(-lookup_step)
    # the last `lookup_step` rows contain NaN in the future column
    # get them before dropping NaNs
    last_sequence = np.array(df[feature_columns].tail(lookup_step))
    # drop NaNs
    df.dropna(inplace=True)
    sequence_data = []
    sequences = deque(maxlen=n_steps)
    for entry, target in zip(df[feature_columns + ["date"]].values, df['future'].values):
        sequences.append(entry)
        if len(sequences) == n_steps:
            sequence_data.append([np.array(sequences), target])
    # get the last sequence by appending the last `n_step` sequence with `lookup_step` sequence
    # for instance, if n_steps=50 and lookup_step=10, last_sequence should be of 60 (that is 50+10) length
    # this last_sequence will be used to predict future stock prices that are not available in the dataset
    last_sequence = list([s[:len(feature_columns)] for s in sequences]) + list(last_sequence)
    last_sequence = np.array(last_sequence).astype(np.float32)
    # add to result
    result['last_sequence'] = last_sequence
    # construct the X's and y's
    X, y = [], []
    for seq, target in sequence_data:
        X.append(seq)
        y.append(target)
    # convert to numpy arrays
    X = np.array(X)
    y = np.array(y)
    if split_by_date:
        # split the dataset into training & testing sets by date (not randomly splitting)
        train_samples = int((1 - test_size) * len(X))
        result["X_train"] = X[:train_samples]
        result["y_train"] = y[:train_samples]
        result["X_test"]  = X[train_samples:]
        result["y_test"]  = y[train_samples:]
        if shuffle:
            # shuffle the datasets for training (if shuffle parameter is set)
            shuffle_in_unison(result["X_train"], result["y_train"])
            shuffle_in_unison(result["X_test"], result["y_test"])
    else:    
        # split the dataset randomly
        result["X_train"], result["X_test"], result["y_train"], result["y_test"] = train_test_split(X, y, 
                                                                                test_size=test_size, shuffle=shuffle)
    # get the list of test set dates
    dates = result["X_test"][:, -1, -1]
    # retrieve test features from the original dataframe
    result["test_df"] = result["df"].loc[dates]
    # remove duplicated dates in the testing dataframe
    result["test_df"] = result["test_df"][~result["test_df"].index.duplicated(keep='first')]
    # remove dates from the training/testing sets & convert to float32
    result["X_train"] = result["X_train"][:, :, :len(feature_columns)].astype(np.float32)
    result["X_test"] = result["X_test"][:, :, :len(feature_columns)].astype(np.float32)
    return result
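
The core of `load_data` is the sliding window: each sample is the last `n_steps` rows, and its target is the value `lookup_step` rows after the end of the window. The sketch below isolates that logic on a list of integers so the pairing is easy to follow. `make_windows` is a hypothetical helper written for this illustration and is not part of the function above.

```python
from collections import deque


def make_windows(values, n_steps, lookup_step):
    """Pair every full window of `n_steps` values with the value found
    `lookup_step` positions after the last element of the window."""
    # The last `lookup_step` values have no future target, so they are skipped
    usable = len(values) - lookup_step
    window = deque(maxlen=n_steps)  # old items fall out automatically
    samples = []
    for i in range(usable):
        window.append(values[i])
        if len(window) == n_steps:
            samples.append((list(window), values[i + lookup_step]))
    return samples


samples = make_windows(list(range(10)), n_steps=3, lookup_step=2)
for window, target in samples:
    print(window, "->", target)
```

With 10 values, a window of 3 and a lookup step of 2, this gives 6 samples. In general the count is `len(values) - lookup_step - n_steps + 1`.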

In [None]:
def create_model(sequence_length, n_features, units=256, cell=LSTM, n_layers=2, dropout=0.3,
                loss="mean_absolute_error", optimizer="rmsprop", bidirectional=False):
    model = Sequential()
    for i in range(n_layers):
        if i == 0:
            # first layer
            if bidirectional:
                model.add(Bidirectional(cell(units, return_sequences=True), batch_input_shape=(None, sequence_length, n_features)))
            else:
                model.add(cell(units, return_sequences=True, batch_input_shape=(None, sequence_length, n_features)))
        elif i == n_layers - 1:
            # last layer
            if bidirectional:
                model.add(Bidirectional(cell(units, return_sequences=False)))
            else:
                model.add(cell(units, return_sequences=False))
        else:
            # hidden layers
            if bidirectional:
                model.add(Bidirectional(cell(units, return_sequences=True)))
            else:
                model.add(cell(units, return_sequences=True))
        # add dropout after each layer
        model.add(Dropout(dropout))
    model.add(Dense(1, activation="linear"))
    model.compile(loss=loss, metrics=["mean_absolute_error"], optimizer=optimizer)
    return model
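
To get a feeling for the size of the network, the number of trainable parameters can be worked out by hand. The sketch below assumes the default arguments of `create_model` (two unidirectional LSTM layers of 256 units) with 5 input features, and uses the standard parameter count of an LSTM layer: four gates, each with input weights, recurrent weights and a bias. The helper names are hypothetical.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Trainable parameters of a standard LSTM layer."""
    # 4 gates x (input weights + recurrent weights + bias) per unit
    return 4 * (input_dim + units + 1) * units


def dense_params(input_dim: int, outputs: int) -> int:
    """Trainable parameters of a fully connected layer with bias."""
    return (input_dim + 1) * outputs


n_features, units = 5, 256
first_lstm = lstm_params(n_features, units)   # fed by the 5 features
second_lstm = lstm_params(units, units)       # fed by the first LSTM layer
output_layer = dense_params(units, 1)         # single predicted price
total = first_lstm + second_lstm + output_layer  # Dropout layers add none
print(first_lstm, second_lstm, output_layer, total)
```

`model.summary()` should report the same total for that configuration. A bidirectional model would roughly double the LSTM counts, because each layer is duplicated for the two directions.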

In [None]:
# model = Sequential()
# cell=LSTM
# model.add(cell(256, return_sequences=True, batch_input_shape=(None, 50, 5)))
# # print(type(cell(256, return_sequences=True, batch_input_shape=(None, 50, 5))))

### Model creation

In [None]:
import os
import time
from tensorflow.keras.layers import LSTM

# Window size or the sequence length
N_STEPS = 50
# Lookup step: how many days ahead to predict (1 is the next day)
LOOKUP_STEP = 15
# whether to scale feature columns & output price as well
SCALE = True
scale_str = f"sc-{int(SCALE)}"
# whether to shuffle the dataset
SHUFFLE = True
shuffle_str = f"sh-{int(SHUFFLE)}"
# whether to split the training/testing set by date
SPLIT_BY_DATE = False
split_by_date_str = f"sbd-{int(SPLIT_BY_DATE)}"
# test ratio size, 0.2 is 20%
TEST_SIZE = 0.2
# features to use
FEATURE_COLUMNS = ["adjclose", "volume", "open", "high", "low"]
# date now
date_now = time.strftime("%Y-%m-%d")
### model parameters
N_LAYERS = 2
# LSTM cell
CELL = LSTM
# 256 LSTM neurons
UNITS = 256
# 40% dropout
DROPOUT = 0.4
# whether to use bidirectional RNNs
BIDIRECTIONAL = False
### training parameters
# mean absolute error loss
# LOSS = "mae"
# huber loss
LOSS = "huber_loss"
OPTIMIZER = "adam"
BATCH_SIZE = 64
EPOCHS = 500
# Amazon stock
ticker = "AMZN"
ticker_data_filename = os.path.join("data", f"{ticker}_{date_now}.csv")
# model name to save, making it as unique as possible based on parameters
model_name = f"{date_now}_{ticker}-{shuffle_str}-{scale_str}-{split_by_date_str}-\
{LOSS}-{OPTIMIZER}-{CELL.__name__}-seq-{N_STEPS}-step-{LOOKUP_STEP}-layers-{N_LAYERS}-units-{UNITS}"
if BIDIRECTIONAL:
    model_name += "-b"

In [None]:
# create these folders if they do not exist
for folder in ("results", "logs", "data"):
    os.makedirs(folder, exist_ok=True)

In [None]:
# load the data
data = load_data(ticker, N_STEPS, scale=SCALE, split_by_date=SPLIT_BY_DATE, 
                shuffle=SHUFFLE, lookup_step=LOOKUP_STEP, test_size=TEST_SIZE, 
                feature_columns=FEATURE_COLUMNS)
# save the dataframe
data["df"].to_csv(ticker_data_filename)
# construct the model
model = create_model(N_STEPS, len(FEATURE_COLUMNS), loss=LOSS, units=UNITS, cell=CELL, n_layers=N_LAYERS,
                    dropout=DROPOUT, optimizer=OPTIMIZER, bidirectional=BIDIRECTIONAL)
# some tensorflow callbacks
checkpointer = ModelCheckpoint(os.path.join("results", model_name + ".h5"), save_weights_only=True, save_best_only=True, verbose=1)
tensorboard = TensorBoard(log_dir=os.path.join("logs", model_name))
# train the model and save the weights whenever we see 
# a new optimal model using ModelCheckpoint
history = model.fit(data["X_train"], data["y_train"],
                    batch_size=BATCH_SIZE,
                    epochs=EPOCHS,
                    validation_data=(data["X_test"], data["y_test"]),
                    callbacks=[checkpointer, tensorboard],
                    verbose=1)
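
Because `SCALE` is `True`, the model is trained on prices scaled to the range 0 to 1, so its predictions come out in scaled units and have to be mapped back to prices. In the notebook this would normally be done with the fitted scaler stored in `data["column_scaler"]` and its `inverse_transform` method. The sketch below only shows the arithmetic behind it with NumPy, on synthetic prices; the function names are hypothetical.

```python
import numpy as np


def minmax_fit(values):
    """Return the (min, max) pair used for scaling."""
    arr = np.asarray(values, dtype=np.float64)
    return float(arr.min()), float(arr.max())


def minmax_transform(values, lo, hi):
    """Map values from [lo, hi] to [0, 1]."""
    return (np.asarray(values, dtype=np.float64) - lo) / (hi - lo)


def minmax_inverse(scaled, lo, hi):
    """Map values from [0, 1] back to the original price range."""
    return np.asarray(scaled, dtype=np.float64) * (hi - lo) + lo


prices = [10.0, 15.0, 20.0]           # synthetic prices, not market data
lo, hi = minmax_fit(prices)
scaled = minmax_transform(prices, lo, hi)
restored = minmax_inverse(scaled, lo, hi)
print(scaled.tolist(), restored.tolist())
```

A scaled prediction of 0.5 would therefore correspond to the midpoint between the lowest and highest price seen when the scaler was fitted.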