---
# Report 01 - Sector Analysis of Assets - Bloomberg Challenge
---

This report aims to identify the sectors and assets to analyze and invest in during the Bloomberg Challenge of October 2024.

## 1. Libraries and Setup

### 1.1. Libraries used

In [2]:
import pandas as pd 
import numpy as np 

import plotly.graph_objects as go 
import matplotlib.pyplot as plt 

import yfinance as yf
import requests
from datetime import datetime as dt 
from dateutil.relativedelta import relativedelta
import os 

from sklearn.preprocessing import StandardScaler 
from sklearn.cluster import KMeans 
from sklearn.decomposition import PCA 

### 1.2. Importing data on investable assets

This step uses the file "WSL as of Sep 25 20241.xlsx", shared in the Mack IA Finance group. The file lists roughly 10,000 assets that are candidates for investment.


#### 1.2.1. Enriching the investable-asset data

Since the file originally contained neither the sector nor the Yahoo Finance ticker name of each asset, a few functions were written to pull this data from Yahoo Finance via URL and then enrich the file.


In [None]:
def search_ticker(company_name):
    """ Look up the symbol and related information for a ticker """
    url = "https://query1.finance.yahoo.com/v1/finance/search"

    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:122.0) Gecko/20100101 Firefox/122.0"
    }

    params = {
        "q": f"{company_name}",
        "lang": "en-US",
        "region": "US",
    }

    try:
        # A timeout keeps a stalled request from blocking the whole enrichment run
        data = requests.get(url, params=params, headers=headers, timeout=10).json()
        df = pd.DataFrame(data["quotes"])
        first_quote = df.iloc[0]
        return first_quote.get('symbol', 'N/A'), first_quote.get('sector', 'N/A'), first_quote.get('industry', 'N/A'), first_quote.get('shortname', 'N/A')
    except Exception:
        # Catch Exception rather than using a bare except, so Ctrl+C still interrupts the loop
        print(f"No info found for {company_name}")
        return 'N/A', 'N/A', 'N/A', 'N/A'
    
def add_tickers_to_excel(input_file, output_file):
    """ Enrich the original file """

    df = pd.read_excel(input_file)
    
    df['MainTicker'] = [x.split(' ')[0] for x in df['Ticker']]  # Use the first token of the Ticker column
    df[['YFTicker', 'SetorEconomico', 'Industria', 'NomeCompletoParaAuditoria']] = df['MainTicker'].apply(lambda x: pd.Series(search_ticker(x)))
    
    # Write the enriched Excel file
    df.to_excel(output_file, index=False)
    print(f"Updated file saved as {output_file}")

# Set the input file and the name of the enriched file (raw strings keep the backslashes literal)
input_file = r'data\WSL as of Sep 25 20241.xlsx'
output_file = r'data\WSL as of Sep 25 20241 - Modificado.xlsx'

# Enrich the file
add_tickers_to_excel(input_file, output_file)
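The `search_ticker` function above keeps only the first entry of the `quotes` list in the search response. As a minimal offline sketch of that selection step, the helper below (a hypothetical name, `first_quote_fields`, not part of the notebook) applies the same "first quote, else `'N/A'`" rule to a made-up payload, so the fallback behaviour can be checked without calling Yahoo Finance.

```python
def first_quote_fields(payload, fields=("symbol", "sector", "industry", "shortname")):
    """Return the requested fields from the first quote, or 'N/A' when missing."""
    quotes = payload.get("quotes") or []
    if not quotes:
        # No match at all: same placeholder tuple that search_ticker returns
        return tuple("N/A" for _ in fields)
    first = quotes[0]
    return tuple(first.get(field, "N/A") for field in fields)


# Hypothetical payload, shaped like the "quotes" list read by search_ticker
sample = {"quotes": [{"symbol": "ALAB", "sector": "Technology", "shortname": "Astera Labs, Inc."}]}

print(first_quote_fields(sample))          # the missing 'industry' falls back to 'N/A'
print(first_quote_fields({"quotes": []}))  # empty result falls back to all 'N/A'
```

Because only the first match is kept, a short ticker can resolve to a different company than the one in the original file, which is why the `NomeCompletoParaAuditoria` column is worth comparing against `Nome`.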


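Cleaning the data to exclude assets not found on Yahoo Finance (`pd.read_excel` parses the `'N/A'` placeholder written by `search_ticker` as a missing value, so `isna()` catches those rows):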

In [None]:
ativos = pd.read_excel(r'data\WSL as of Sep 25 20241 - Modificado.xlsx')
ativos = ativos[~ativos['YFTicker'].isna()]
ativos.head()

Besides the economic sector, it is also useful to identify the main exchange/country where the asset trades, as well as its currency.

In [None]:
h = 0  # progress counter, updated by add_more_infos

def add_more_infos(company_name):
    """ Return the country where the asset trades and the currency it uses """
    global h 

    h += 1 
    try:
        # Accessing .info is what triggers the request, so it belongs inside the try
        info = yf.Ticker(company_name).info
    except Exception:
        print(company_name)
        raise
    country = info.get('country', 'N/A')
    currency = info.get('currency', 'N/A')
    if h % 100 == 0:
        print(h)  # report progress every 100 tickers
    return country, currency 

def add_infos_to_excel(ativos, output_file):
    """ Enrich the original file """
    ativos[['Pais', 'Moeda']] = ativos['YFTicker'].apply(lambda x: pd.Series(add_more_infos(x)))
    
    # Write the enriched Excel file
    ativos.to_excel(output_file, index=False)
    print(f"Updated file saved as {output_file}")

output_file = r'data\WSL as of Sep 25 20241 - Modificado_2.xlsx'

add_infos_to_excel(ativos, output_file)

#### 1.2.2. Importing OHLCV data

With the assets enriched, the next step is to extract OHLCV data (<i>Open, High, Low, Close, Volume</i>), specifically <i>Close</i> and <i>Volume</i>.

To enable a sector-level analysis, rather than working with each asset's data individually, the data is processed as a stream to extract:
- Daily change 
- Volume 

The daily changes are then reduced to a single daily-change vector per industry / currency / country.
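As a rough illustration of the "one vector per industry / currency / country" idea, the sketch below groups a toy table by those three keys and collects the tickers in each group. The table `ativos_demo` is made up for the example (the `NVDA` row in particular is an illustrative assumption), and `groupby` is shown only as one possible way to enumerate the groups.

```python
import pandas as pd

# Toy stand-in for the enriched asset table (values are illustrative only)
ativos_demo = pd.DataFrame({
    "YFTicker": ["ALAB", "NVDA", "7282.T", "9793.T", "601198.SS"],
    "Moeda": ["USD", "USD", "JPY", "JPY", "CNY"],
    "Pais": ["United States", "United States", "Japan", "Japan", "China"],
    "Industria": ["Semiconductors", "Semiconductors", "Auto Parts",
                  "Waste Management", "Capital Markets"],
})

# One entry per (currency, country, industry) group, holding that group's tickers
grupos = {
    chave: grupo["YFTicker"].tolist()
    for chave, grupo in ativos_demo.groupby(["Moeda", "Pais", "Industria"])
}

for chave, tickers in grupos.items():
    print(chave, tickers)
```

Each key of `grupos` corresponds to one sector vector to be computed from the listed tickers.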

In [3]:
ativos = pd.read_excel(r'data\WSL as of Sep 25 20241 - Modificado_2.xlsx')
ativos = ativos[~ativos['Pais'].isna()] # Keep only assets with country information
ativos = ativos[~ativos['Moeda'].isna()] # Keep only assets with currency information
ativos.head()

Unnamed: 0,Ticker,Nome,Ponderação,Ações,Preço,MainTicker,YFTicker,SetorEconomico,Industria,NomeCompletoParaAuditoria,Pais,Moeda
0,PROT NO Equity,Protector Forsikring ASA,0.001278,50.451,237.5,PROT,PROT,Healthcare,Biotechnology,PROTEONOMIX INC,United States,USD
1,ALAB UW Equity,Astera Labs Inc,0.001276,21.729,52.1,ALAB,ALAB,Technology,Semiconductors,"Astera Labs, Inc.",United States,USD
2,7282 JT Equity,Toyoda Gosei Co Ltd,0.001276,65.137,2514.5,7282,7282.T,Consumer Cyclical,Auto Parts,TOYODA GOSEI,Japan,JPY
3,9793 JT Equity,Daiseki Co Ltd,0.001276,41.825,3915.0,9793,9793.T,Industrials,Waste Management,DAISEKI CO LTD,Japan,JPY
4,601198 C1 Equity,Dongxing Securities Co Ltd,0.001275,905.08488,8.79,601198,601198.SS,Financial Services,Capital Markets,DONGXING SECURITIES CO LTD,China,CNY


To build this single vector, I used the following step-by-step statistic:

1. Obtain the asset's daily traded financial volume (${Volume_{FinanceiroDiarioAtivo}}$)
2. Obtain the daily change in the asset's closing price (${VariacaoFechamento_{Ativo}}$)
3. Multiply the first two values (1 and 2) 
4. Obtain the total daily financial volume of the asset's sector (${Volume_{FinanceiroDiarioTotal}}$)
5. Sum the volume-weighted changes (item 3) across the sector's assets 
6. Divide item 5 by item 4 

The result is a daily change in the sector's closing price, weighted by financial volume:

$$ {VariacaoDiaria_{Ponderada}} = \frac{\sum_{k=1}^{n} {Volume_{FinanceiroDiarioAtivo}}^{(k)} \cdot {VariacaoFechamento_{Ativo}}^{(k)}}{{Volume_{FinanceiroDiarioTotal}}}  $$

where $k$ indexes the $n$ assets in the sector.
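To make the weighting concrete, here is a small worked example with two hypothetical assets, `A` and `B`, over three days. The prices and volumes are invented for illustration; only the arithmetic follows the steps listed above.

```python
import pandas as pd

# Hypothetical closing prices and traded share volumes for two assets in one sector
close = pd.DataFrame({"A": [10.0, 11.0, 12.1], "B": [100.0, 90.0, 99.0]})
volume = pd.DataFrame({"A": [1000, 1000, 1000], "B": [100, 300, 100]})

dollar_volume = close * volume                       # step 1: financial volume per asset
returns = close.pct_change()                         # step 2: daily change in the close
weighted = (returns * dollar_volume).sum(axis=1)     # steps 3 and 5: weighted sum
total_volume = dollar_volume.sum(axis=1)             # step 4: sector financial volume
sector_return = weighted / total_volume              # step 6

print(sector_return)
```

On the second day, `A` gains 10% on 11,000 of volume while `B` loses 10% on 27,000, so the sector return is (1100 - 2700) / 38000, roughly -4.2%: the more heavily traded asset dominates. The first row is 0 because the first percentage change is undefined and `sum` skips missing values.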

In [3]:
def obtain_sector_performance(lista_symbols):
    """ Extract and transform the data as a stream, reducing it to one return vector per sector """
    data_inicial = dt(2002, 1, 1)
    data_final = dt(2024, 10, 1)

    dados_setoriais = pd.DataFrame()

    # Build the sector's representative vector one year at a time
    while data_inicial <= data_final:
        data_chunk = min(data_inicial + relativedelta(years=1), data_final)
        volume_total = pd.Series(dtype=float)
        dados_anuais = pd.DataFrame()

        for symbol in lista_symbols:
            data = yf.download(symbol, start=data_inicial.strftime('%Y-%m-%d'), end=data_chunk.strftime('%Y-%m-%d'), progress=False)[['Close', 'Volume']]
            
            if not data.empty:

                # Avoid SettingWithCopyWarning
                data = data.copy()

                # Financial volume: shares traded times closing price
                data['DollarVolume'] = data['Volume'] * data['Close']

                # Daily return weighted by financial volume
                # (pct_change() is NaN on the first row of each yearly chunk, so that day adds 0 to the sum below)
                data[symbol] = data['Close'].pct_change() * data['DollarVolume']

                # Accumulate the sector's financial volume
                volume_total = volume_total.add(data['DollarVolume'], fill_value=0)
                
                # Append this asset's weighted return as a new column
                dados_anuais = pd.concat([dados_anuais, data[[symbol]]], axis=1)
        
        volume_total.replace(0, np.nan, inplace=True)

        # Sector return: sum of weighted returns divided by the sector's total financial volume on each day
        dados_anuais['RetornoSetor'] = dados_anuais.sum(axis=1) / volume_total
        
        # Append to the combined dataframe
        dados_setoriais = pd.concat([dados_setoriais, dados_anuais[['RetornoSetor']]])

        # Move on to the next year
        data_inicial += relativedelta(years=1)
    
    return dados_setoriais

In [9]:
def pipeline(ativos):
    """ Run the ETL pipeline """
    print("Starting pipeline")
    for moeda in ativos['Moeda'].unique():
        for pais in ativos[ativos['Moeda']==moeda]['Pais'].unique():
            for industria in ativos[(ativos['Moeda']==moeda)&(ativos['Pais']==pais)]['Industria'].unique():
                nome_arquivo = f'{moeda}_{pais}_{industria}_setoriado.csv'
                # Skip groups already written to disk by a previous run
                if nome_arquivo not in os.listdir('data'):
                    ativos_chunk = ativos[(ativos['Moeda']==moeda)&(ativos['Pais']==pais)&(ativos['Industria']==industria)]

                    if not ativos_chunk.empty:
                        lista_ativos = list(ativos_chunk['YFTicker'])
                        temp = obtain_sector_performance(lista_ativos)
                        temp.to_csv(os.path.join('data', nome_arquivo))
                        print(f"Finished {moeda}_{pais}_{industria}")
    print("Pipeline finished!")
    return None 

Running the pipeline:

In [10]:
pipeline(ativos)

## 2. Exploratory Data Analysis

With the data extracted and transformed in the previous sections, the exploratory analysis begins.

### 2.1. Assets that make up the sectors

In [4]:
ativos = pd.read_excel(r'data\WSL as of Sep 25 20241 - Modificado_2.xlsx')
ativos = ativos[~ativos['Pais'].isna()] # Keep only assets with country information
ativos = ativos[~ativos['Moeda'].isna()] # Keep only assets with currency information
ativos.head()


ativos.info()

<class 'pandas.core.frame.DataFrame'>
Index: 8541 entries, 0 to 9906
Data columns (total 12 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Ticker                     8541 non-null   object 
 1   Nome                       8541 non-null   object 
 2   Ponderação                 8541 non-null   object 
 3   Ações                      8541 non-null   float64
 4   Preço                      8541 non-null   object 
 5   MainTicker                 8541 non-null   object 
 6   YFTicker                   8541 non-null   object 
 7   SetorEconomico             8435 non-null   object 
 8   Industria                  8435 non-null   object 
 9   NomeCompletoParaAuditoria  8539 non-null   object 
 10  Pais                       8541 non-null   object 
 11  Moeda                      8541 non-null   object 
dtypes: float64(1), object(11)
memory usage: 867.4+ KB


The output above shows that, for some assets, the enrichment functions did not find the corresponding Industria. These assets will be discarded.
In addition, the "Ponderação" column must be converted to type "float".

While iterating on the data, an inconsistency was found in the Ponderação column, as shown below. It was handled by setting the value to 0.

In [5]:
display(ativos[ativos['Ponderação']=='--'])
ativos.loc[ativos['Ponderação']=='--', 'Ponderação'] = 0

ativos['Ponderação'] = ativos['Ponderação'].astype(float)

Unnamed: 0,Ticker,Nome,Ponderação,Ações,Preço,MainTicker,YFTicker,SetorEconomico,Industria,NomeCompletoParaAuditoria,Pais,Moeda
4971,TCNSBR IS Equity,TCNS Clothing Co Ltd,--,12.98,--,TCNSBR,TCNSBRANDS.BO,Consumer Cyclical,Apparel Manufacturing,TCNS Clothing Co. Limited,India,INR
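An alternative to matching the `'--'` placeholder by hand is to let pandas coerce anything non-numeric. The sketch below is only an illustration on a made-up series, not the notebook's method: `pd.to_numeric` with `errors="coerce"` turns unparseable entries into `NaN`, which are then filled with 0 to reproduce the choice made above.

```python
import pandas as pd

# Made-up sample mixing numeric strings, a float and the '--' placeholder
ponderacao = pd.Series(["0.001278", "--", 0.001276], name="Ponderação")

# Unparseable values become NaN, then are replaced by 0 as in the cell above
ponderacao_num = pd.to_numeric(ponderacao, errors="coerce").fillna(0.0)

print(ponderacao_num)
print(ponderacao_num.dtype)
```

The same call would also catch placeholders other than `'--'`, at the cost of silently zeroing them, so it is worth inspecting the coerced rows first.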


In [6]:
ativos.dropna(inplace=True)

### 2.2. Financial market share analysis

In [7]:
fig = go.Figure()


contagem_paises = ativos.Pais.value_counts()[:10].sort_values(ascending=True)
fig.add_trace(go.Bar(y = contagem_paises.index, x = contagem_paises.values, orientation='h'))

fig.update_xaxes(title_text='<b>Number of assets</b>')

fig.update_layout(title_text='Asset distribution by country - Top 10', width=800, height=600)
fig.show()

Assets listed on United States exchanges predominate.

In [8]:
fig = go.Figure()


contagem_setores = ativos.SetorEconomico.value_counts()[:30].sort_values(ascending=True)
fig.add_trace(go.Bar(y = contagem_setores.index, x = contagem_setores.values, orientation='h'))

fig.update_xaxes(title_text='<b>Number of assets</b>')

fig.update_layout(title_text='Asset distribution by economic sector', width=800, height=600)
fig.show()

In [9]:
fig = go.Figure()


contagem_industrias = ativos.Industria.value_counts()[:30].sort_values(ascending=True)
fig.add_trace(go.Bar(y = contagem_industrias.index, x = contagem_industrias.values, orientation='h'))

fig.update_xaxes(title_text='<b>Number of assets</b>')

fig.update_layout(title_text='Asset distribution by industry - Top 30', width=800, height=600)
fig.show()

This analysis, however, ignores the financial volume traded, which matters for the study. The code must be changed to obtain the share of each industry/sector/country.

In [10]:
part_industria = ativos.groupby(['Industria'])['Ponderação'].sum() 

In [11]:
fig = go.Figure()

part_industria.sort_values(ascending=True, inplace=True)
fig.add_trace(go.Bar(y = part_industria.index[-10:], x = part_industria.values[-10:], orientation='h'))

fig.update_xaxes(title_text='<b>Global market share</b>')

fig.update_layout(title_text='Asset distribution by industry - Top 10', width=800, height=600)
fig.show()

In [12]:
fig = go.Figure()

part_setor = ativos.groupby(['SetorEconomico'])['Ponderação'].sum() 
part_setor.sort_values(ascending=True, inplace=True)
fig.add_trace(go.Bar(y = part_setor.index, x = part_setor.values, orientation='h'))

fig.update_xaxes(title_text='<b>Global market share</b>')

fig.update_layout(title_text='Asset distribution by economic sector', width=800, height=600)
fig.show()

The result cannot be read as a percentage or as an absolute value: the <i>Ponderação</i> field, taken from the Excel file originally shared with the MackIA group, does not appear to be a share of the global market, and may instead represent a share within each asset's stock exchange.

Even so, the prevalence of certain sectors and industries suggests that they are important overall, since they hold larger shares "within each stock exchange". 

### 2.3. Analysis of relationships between sectors

To uncover possible dependence between sectors (and reduce risk by selecting less correlated sectors), we now examine the vectors for each country/currency/industry.

In [46]:
lista_csv = os.listdir('data')
arquivos_csv_organizados = {'Moeda':[], 'Pais':[], 'Industria': [], 'Endereco': []}

for arquivo in lista_csv:
    if arquivo.endswith('.csv'):
        dados = arquivo.split('_')
        arquivos_csv_organizados['Moeda'].append(dados[0])
        arquivos_csv_organizados['Pais'].append(dados[1])
        arquivos_csv_organizados['Industria'].append(dados[2])
        arquivos_csv_organizados['Endereco'].append(arquivo)

arquivos_organizados = pd.DataFrame(arquivos_csv_organizados)

In [47]:
arquivos_organizados

Unnamed: 0,Moeda,Pais,Industria,Endereco
0,BRL,Brazil,Agricultural Inputs,BRL_Brazil_Agricultural Inputs_setoriado.csv
1,BRL,Brazil,Airlines,BRL_Brazil_Airlines_setoriado.csv
2,BRL,Brazil,Aluminum,BRL_Brazil_Aluminum_setoriado.csv
3,BRL,Brazil,Auto Parts,BRL_Brazil_Auto Parts_setoriado.csv
4,BRL,Brazil,Banks—Regional,BRL_Brazil_Banks—Regional_setoriado.csv
...,...,...,...,...
509,USD,United States,Utilities—Renewable,USD_United States_Utilities—Renewable_setoriad...
510,USD,United States,Waste Management,USD_United States_Waste Management_setoriado.csv
511,USD,Uruguay,Internet Retail,USD_Uruguay_Internet Retail_setoriado.csv
512,USD,Uruguay,Restaurants,USD_Uruguay_Restaurants_setoriado.csv


#### 2.3.1. Sector analysis - United States

In [16]:
dados = pd.DataFrame()

arquivos_a_importar = arquivos_organizados[arquivos_organizados['Pais']=='United States']['Endereco']

for arquivo in arquivos_a_importar:
    nome = arquivo.split('_')
    temp = pd.read_csv(rf'data\{arquivo}', parse_dates=[0])
    
    temp.columns = ['Date', nome[2]]
    
    temp.set_index(keys='Date', inplace=True, drop=True)
    
    dados = pd.concat([dados, temp], axis=1)
    # break

dados.head()

Date,Advertising Agencies,Aerospace & Defense,Agricultural Inputs,Airlines,Airports & Air Services,Aluminum,Apparel Manufacturing,Apparel Retail,Asset Management,Auto & Truck Dealerships,...,Travel Services,Trucking,Uranium,Utilities—Diversified,Utilities—Independent Power Producers,Utilities—Regulated Electric,Utilities—Regulated Gas,Utilities—Regulated Water,Utilities—Renewable,Waste Management
2002-01-02,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,,0.0,0.0,0.0,,0.0
2002-01-03,-0.011206,-0.000951,0.031262,0.036819,,0.014425,0.033868,0.017662,0.011451,-0.019202,...,0.009379,0.023555,0.0,-0.013019,,-0.004352,0.003142,0.009752,,-0.01455
2002-01-04,4.3e-05,0.006005,-1.7e-05,0.036488,,0.031894,0.036881,0.068347,0.01797,-0.021046,...,0.055739,0.03856,0.030769,0.035431,,-0.027157,0.005161,0.00929,,-0.000916
2002-01-07,0.004174,-0.023803,0.000678,0.002513,,0.022895,0.000763,-0.010559,0.00011,-0.000911,...,-0.013652,0.00867,0.0,0.01025,,0.00778,-0.00462,-0.010903,,0.001564
2002-01-08,0.006195,-0.009609,-0.007456,0.001862,,-0.021272,0.003146,0.001636,-0.011418,0.000553,...,-0.037218,0.025765,-0.002714,-0.00028,,-0.018859,-0.008311,0.007064,,0.007016


One problem identified in the data is the presence of stock splits and reverse splits. As a result, some stocks may show very large nominal returns in absolute value that are not real. To mitigate this, daily returns are limited to the range from -50% to +50% with the `clip` function, as below:

In [17]:
dados = dados.clip(lower=-0.5, upper=0.5)
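As a small illustration of what the clipping does, the sketch below applies the same bounds to a made-up return series. The values -0.9 and 1.0 are assumptions chosen to mimic the raw price jump of a 10-for-1 split and a 1-for-2 reverse split on unadjusted prices.

```python
import pandas as pd

# Made-up daily returns: two ordinary days and two split-like artifacts
retornos = pd.Series([0.02, -0.9, 1.0, -0.03])

# Same bounds as in the cell above: anything beyond +/-50% is capped
retornos_clip = retornos.clip(lower=-0.5, upper=0.5)

print(retornos_clip.tolist())
```

Clipping caps the artifact rather than removing it, so an affected day still shows up as a 50% move; it limits the distortion of means and standard deviations but does not eliminate it.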

Looking at the statistics for each sector:

In [18]:
dados.fillna(0, inplace=False).describe()

Unnamed: 0,Advertising Agencies,Aerospace & Defense,Agricultural Inputs,Airlines,Airports & Air Services,Aluminum,Apparel Manufacturing,Apparel Retail,Asset Management,Auto & Truck Dealerships,...,Travel Services,Trucking,Uranium,Utilities—Diversified,Utilities—Independent Power Producers,Utilities—Regulated Electric,Utilities—Regulated Gas,Utilities—Regulated Water,Utilities—Renewable,Waste Management
count,5725.0,5725.0,5725.0,5725.0,5725.0,5725.0,5725.0,5725.0,5725.0,5725.0,...,5725.0,5725.0,5725.0,5725.0,5725.0,5725.0,5725.0,5725.0,5725.0,5725.0
mean,0.000594,0.00071,0.001088,0.000904,0.000116,0.000416,0.001022,0.000987,0.000744,0.001797,...,0.00106,0.001227,0.001377,0.000487,0.00078,0.000225,0.00038,0.00059,0.00068,0.000605
std,0.019619,0.019401,0.025426,0.030085,0.019303,0.030177,0.023603,0.021684,0.020848,0.028887,...,0.026597,0.022357,0.04948,0.017017,0.022825,0.013304,0.013289,0.014409,0.018643,0.014178
min,-0.188206,-0.189898,-0.366824,-0.248595,-0.173067,-0.19384,-0.170007,-0.192521,-0.278536,-0.18304,...,-0.244476,-0.229955,-0.5,-0.305439,-0.209075,-0.111353,-0.114935,-0.10385,-0.250693,-0.143441
25%,-0.00818,-0.007239,-0.010787,-0.013373,0.0,-0.014384,-0.009852,-0.009342,-0.007994,-0.010365,...,-0.009764,-0.010475,-0.021849,-0.006387,-0.008955,-0.005443,-0.005825,-0.006795,0.0,-0.005516
50%,0.000557,0.00061,0.000882,0.000658,0.0,0.000111,0.000862,0.000786,0.000884,0.000986,...,0.000663,0.000905,-4.8e-05,0.000788,0.0,0.000744,0.000735,0.000786,0.0,0.000894
75%,0.009426,0.008707,0.013534,0.014599,0.0,0.014964,0.011751,0.011217,0.009776,0.012919,...,0.011351,0.012911,0.020935,0.008003,0.010363,0.006793,0.007107,0.008211,0.0,0.007179
max,0.179128,0.208051,0.176612,0.309452,0.401453,0.257729,0.171137,0.200728,0.267245,0.466774,...,0.300456,0.197286,0.5,0.150377,0.293326,0.149336,0.151889,0.186648,0.5,0.141429


To assess potentially profitable sectors, a <i>proxy</i> for the Sharpe ratio was used. Unlike the standard Sharpe ratio, it does not subtract a risk-free rate; it is simply the mean of the daily returns divided by their standard deviation:

$$ {ÍndiceSharpe_{Modificado}} = \frac{{MédiaAritmética_{Retornos}}}{{DesvioPadrão_{Retornos}}}  $$

With this indicator, the 10 best-ranked sectors were identified for each year. The purpose of this analysis is to extract sectors that have historically shown an adequate risk-return trade-off. 
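The ranking step can be sketched on toy data. The three sector names and their returns below are invented for illustration; the computation is just mean over standard deviation per column, followed by a descending sort.

```python
import pandas as pd

# Invented daily returns for three hypothetical sectors
retornos = pd.DataFrame({
    "Steady": [0.01, 0.02, 0.01, 0.02],
    "Volatile": [0.10, -0.08, 0.09, -0.07],
    "Flat": [0.00, 0.01, 0.00, 0.01],
})

# Mean divided by (sample) standard deviation, best ratio first
sharpe_proxy = (retornos.mean() / retornos.std()).sort_values(ascending=False)
top_2 = sharpe_proxy.head(2).index.tolist()

print(sharpe_proxy)
print(top_2)
```

`Volatile` has a higher mean return than `Flat` but ranks last, because its much larger standard deviation dominates the ratio.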


In [19]:
def extrair_indice_sharpe(dados_historicos):
    """ Compute the modified Sharpe ratio from a set of historical returns """

    tabela_descricoes = dados_historicos.describe().T

    # Approximate average Sharpe ratio of each sector
    tabela_descricoes['IndiceSharpeAproximado'] = tabela_descricoes['mean'] / tabela_descricoes['std']
    tabela_descricoes.reset_index(inplace=True)
    return tabela_descricoes.sort_values(by='IndiceSharpeAproximado', ascending=False)

data_inicial = dados.index[0]
lista_indices = pd.DataFrame()

while data_inicial < dados.index[-1]:
    data_final_pesquisa = min(data_inicial + relativedelta(years=1), dados.index[-1])   # End of the one-year window being analyzed

    dados_anuais = dados.fillna(0).loc[data_inicial:data_final_pesquisa]
    temp = extrair_indice_sharpe(dados_anuais)
    temp['Ano'] = data_inicial.strftime('%Y')
    lista_indices = pd.concat([lista_indices, temp[:10]])

    data_inicial += relativedelta(years=1)



Based on these indicators, we look at sectors that performed well in periods marked by:

- Armed conflicts involving the US:

    - 2001 - War in Afghanistan
    - 2003 - Iraq War 
    - 2004 - War in North-West Pakistan
    - 2010 - Al-Qaeda insurgency in Yemen
    - 2011 - Military intervention in Libya 
    - 2014 - War against the Islamic State 

- Presidential elections (on the 4-year cycle)

    - 2024
    - 2020
    - 2016
    - 2012
    - 2008
    - 2004

- Interest rate cuts

    - 2007
    - 2008
    - 2019 
    - 2020


In [20]:
lista_indices_adaptada = lista_indices[lista_indices['Ano'].isin(['2001', '2003', '2004', '2010', '2011', '2014', '2024', '2020', '2016', '2012', '2008', '2007', '2019', '2020', '2023', '2024'])]
lista_indices_adaptada.sort_values(by='index')

Unnamed: 0,index,count,mean,std,min,25%,50%,75%,max,IndiceSharpeAproximado,Ano
1,Aerospace & Defense,251.0,0.002543,0.014654,-0.053113,-0.006475,0.002091,0.010767,0.069798,0.173510,2023
2,Agricultural Inputs,252.0,0.005843,0.030009,-0.134553,-0.008283,0.005493,0.024533,0.099406,0.194719,2007
3,Airlines,254.0,0.002339,0.068513,-0.239602,-0.038129,0.000000,0.034060,0.309452,0.034134,2008
3,Airlines,253.0,0.003044,0.021113,-0.063823,-0.009791,0.001414,0.016703,0.070755,0.144164,2014
6,Apparel Manufacturing,252.0,0.002199,0.024900,-0.094753,-0.011119,0.001857,0.016220,0.082084,0.088302,2011
...,...,...,...,...,...,...,...,...,...,...,...
135,Trucking,253.0,0.002358,0.016143,-0.047481,-0.007122,0.002122,0.010388,0.106768,0.146095,2014
135,Trucking,254.0,0.002152,0.038237,-0.129036,-0.019716,-0.000426,0.022322,0.160102,0.056275,2008
136,Uranium,252.0,0.006729,0.061352,-0.166637,-0.020544,0.001769,0.026927,0.500000,0.109685,2016
138,Utilities—Independent Power Producers,188.0,0.006428,0.032724,-0.106938,-0.009112,0.007299,0.025955,0.142941,0.196422,2024


In [21]:
lista_indices_adaptada['index'].value_counts()

index
Biotechnology                      8
Semiconductors                     4
Health Information Services        4
Restaurants                        3
Consumer Electronics               3
                                  ..
Tobacco                            1
Marine Shipping                    1
Healthcare Plans                   1
Apparel Manufacturing              1
Information Technology Services    1
Name: count, Length: 80, dtype: int64

From this point, the sectors that appear most often among the best ranked on risk-return are the following:

In [22]:
lista_indices_adaptada['index'].value_counts()[:10]

index
Biotechnology                  8
Semiconductors                 4
Health Information Services    4
Restaurants                    3
Consumer Electronics           3
Auto Parts                     3
Diagnostics & Research         2
Insurance—Specialty            2
Trucking                       2
Airlines                       2
Name: count, dtype: int64

##### 2.3.1.1. Correlation analysis between selected sectors

In [23]:
dados[lista_indices_adaptada['index'].value_counts()[:20].index].corr()

Unnamed: 0,Biotechnology,Semiconductors,Health Information Services,Restaurants,Consumer Electronics,Auto Parts,Diagnostics & Research,Insurance—Specialty,Trucking,Airlines,Lumber & Wood Production,Scientific & Technical Instruments,Leisure,Residential Construction,Medical Care Facilities,Travel Services,Insurance Brokers,Building Materials,Medical Devices,Gambling
Biotechnology,1.0,0.1977,0.142845,0.137191,0.159749,0.152742,0.2241,0.137925,0.132765,0.125863,0.156234,0.183217,0.123034,0.169792,0.131685,0.144086,0.129555,0.16148,0.186382,0.170561
Semiconductors,0.1977,1.0,0.28987,0.397183,0.519584,0.377527,0.44282,0.330852,0.413632,0.342591,0.400296,0.55818,0.350168,0.399439,0.285964,0.425152,0.373633,0.405129,0.366852,0.36503
Health Information Services,0.142845,0.28987,1.0,0.230637,0.223934,0.217962,0.301322,0.168638,0.232768,0.18837,0.238077,0.281625,0.247948,0.218513,0.183552,0.222283,0.172676,0.218167,0.233525,0.240041
Restaurants,0.137191,0.397183,0.230637,1.0,0.360962,0.348332,0.380641,0.359235,0.383121,0.38412,0.419755,0.390016,0.352569,0.446517,0.310696,0.454674,0.370456,0.438592,0.358147,0.352535
Consumer Electronics,0.159749,0.519584,0.223934,0.360962,1.0,0.31582,0.39169,0.292603,0.333169,0.295003,0.344344,0.417822,0.299724,0.360182,0.241185,0.355208,0.304799,0.35151,0.322682,0.308308
Auto Parts,0.152742,0.377527,0.217962,0.348332,0.31582,1.0,0.354585,0.386721,0.383377,0.382422,0.403948,0.39907,0.316976,0.411868,0.279221,0.442604,0.329382,0.419177,0.292025,0.342453
Diagnostics & Research,0.2241,0.44282,0.301322,0.380641,0.39169,0.354585,1.0,0.341526,0.371336,0.327837,0.389679,0.437968,0.33446,0.400654,0.336331,0.369874,0.367588,0.402896,0.441288,0.336114
Insurance—Specialty,0.137925,0.330852,0.168638,0.359235,0.292603,0.386721,0.341526,1.0,0.352891,0.407433,0.426794,0.366155,0.299611,0.501807,0.280177,0.395229,0.38877,0.472446,0.3091,0.322363
Trucking,0.132765,0.413632,0.232768,0.383121,0.333169,0.383377,0.371336,0.352891,1.0,0.407266,0.464465,0.426942,0.330016,0.441246,0.280632,0.407263,0.37309,0.461413,0.332568,0.335803
Airlines,0.125863,0.342591,0.18837,0.38412,0.295003,0.382422,0.327837,0.407433,0.407266,1.0,0.392605,0.369944,0.300346,0.453328,0.287958,0.542358,0.349004,0.431393,0.282465,0.340912


The very low correlations may indicate problems in the data. Therefore, older data will be removed, since it is more likely to contain missing values (NA) that ended up recorded as 0.

In [24]:
dados_recentes = dados.loc[dt(2023, 1, 1): dt(2024, 10, 1)]
dados_recentes.head()


Date,Advertising Agencies,Aerospace & Defense,Agricultural Inputs,Airlines,Airports & Air Services,Aluminum,Apparel Manufacturing,Apparel Retail,Asset Management,Auto & Truck Dealerships,...,Travel Services,Trucking,Uranium,Utilities—Diversified,Utilities—Independent Power Producers,Utilities—Regulated Electric,Utilities—Regulated Gas,Utilities—Regulated Water,Utilities—Renewable,Waste Management
2023-01-03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2023-01-04,0.021243,0.0325,0.008704,0.058471,0.025633,0.012198,0.05746,0.03265,0.018358,0.042362,...,0.062015,0.011034,-0.020756,-0.003962,0.016267,0.010928,0.01222,0.015925,0.019555,-0.001463
2023-01-05,0.00841,0.004883,0.012635,0.02338,-0.03612,0.010257,-0.002083,-0.005321,-0.016682,-0.018266,...,0.012674,-0.015172,0.009781,-0.026534,-0.016096,-0.022934,-0.01645,-0.021638,-0.026516,-0.019297
2023-01-06,0.01624,0.022759,0.021777,0.023306,0.025242,0.026335,0.02363,0.017734,0.036062,0.025173,...,0.018884,0.061737,0.053051,0.013197,0.010253,0.02048,0.030264,0.01938,0.007359,0.026939
2023-01-09,-0.002974,-0.019723,0.017543,0.022836,0.086624,0.080097,0.002505,0.012833,0.01618,0.001066,...,0.023651,0.018073,0.007498,0.016281,0.005629,0.005882,0.008376,-0.004459,0.032746,-0.003524


In [25]:
dados_recentes[lista_indices_adaptada['index'].value_counts()[:20].index].corr()

Unnamed: 0,Biotechnology,Semiconductors,Health Information Services,Restaurants,Consumer Electronics,Auto Parts,Diagnostics & Research,Insurance—Specialty,Trucking,Airlines,Lumber & Wood Production,Scientific & Technical Instruments,Leisure,Residential Construction,Medical Care Facilities,Travel Services,Insurance Brokers,Building Materials,Medical Devices,Gambling
Biotechnology,1.0,0.076758,0.060961,0.107531,0.083948,0.116922,0.146048,0.117496,0.047156,0.098066,0.160134,0.1044,0.094707,0.156853,-0.031509,0.120885,0.013216,0.155145,-0.032313,0.079808
Semiconductors,0.076758,1.0,0.172686,0.314434,0.377447,0.247248,0.199112,0.043871,0.280346,0.265419,0.249828,0.485575,0.178458,0.320013,0.100266,0.406561,0.022018,0.385575,0.15408,0.321778
Health Information Services,0.060961,0.172686,1.0,0.305532,0.249886,0.306846,0.392355,0.276289,0.284723,0.33166,0.346768,0.283542,0.375288,0.363282,0.260865,0.40655,0.090071,0.327596,0.235593,0.353358
Restaurants,0.107531,0.314434,0.305532,1.0,0.274427,0.186399,0.279728,0.146478,0.181286,0.244065,0.281508,0.250156,0.20066,0.309243,0.241837,0.323386,0.100887,0.36805,0.223089,0.284575
Consumer Electronics,0.083948,0.377447,0.249886,0.274427,1.0,0.204371,0.211145,0.131039,0.218863,0.228439,0.279157,0.295139,0.190877,0.266387,0.157194,0.292501,0.086249,0.323725,0.164734,0.241142
Auto Parts,0.116922,0.247248,0.306846,0.186399,0.204371,1.0,0.299771,0.278436,0.2949,0.346546,0.42467,0.299645,0.352594,0.416788,0.318473,0.384767,0.073309,0.367514,0.151312,0.250614
Diagnostics & Research,0.146048,0.199112,0.392355,0.279728,0.211145,0.299771,1.0,0.302526,0.256602,0.324899,0.341646,0.323163,0.314486,0.400373,0.383387,0.297543,0.121612,0.331329,0.30343,0.318448
Insurance—Specialty,0.117496,0.043871,0.276289,0.146478,0.131039,0.278436,0.302526,1.0,0.258682,0.313313,0.401373,0.237039,0.273374,0.377074,0.248062,0.263082,0.191271,0.274034,0.107929,0.220194
Trucking,0.047156,0.280346,0.284723,0.181286,0.218863,0.2949,0.256602,0.258682,1.0,0.334437,0.414417,0.298283,0.257991,0.391514,0.257217,0.261455,0.119506,0.439679,0.265812,0.249406
Airlines,0.098066,0.265419,0.33166,0.244065,0.228439,0.346546,0.324899,0.313313,0.334437,1.0,0.411081,0.36585,0.329853,0.380888,0.224094,0.505678,0.0748,0.406551,0.157124,0.315612


Looking at the US sectors above, Biotechnology and Semiconductors were judged attractive to trade: both rank among the best on the Sharpe ratio proxy and they appear to be nearly uncorrelated (linear correlation of about 0.08 in the recent data).

To meet the requirement of 3 sectors per country, Insurance—Specialty and Insurance Brokers were also chosen, because they are weakly correlated with the sectors above.
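One way to make "weakly correlated with the others" more systematic is to rank each sector by its average absolute correlation with the rest. The sketch below does this on synthetic data; the sector names, the random seed and the construction of the series are all assumptions made for the example, and this ranking is an illustration rather than the selection rule used in this report.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=500)

# Synthetic returns: SectorA and SectorB share a common driver, SectorC is independent
demo = pd.DataFrame({
    "SectorA": base + 0.1 * rng.normal(size=500),
    "SectorB": base + 0.1 * rng.normal(size=500),
    "SectorC": rng.normal(size=500),
})

corr = demo.corr()

# Mask the diagonal (self-correlation of 1) and average the absolute values per column
off_diagonal = ~np.eye(len(corr), dtype=bool)
mean_abs_corr = corr.abs().where(off_diagonal).mean().sort_values()

print(mean_abs_corr)
```

The sector at the top of `mean_abs_corr` is the one that, on average, moves least with the others, which makes it a natural diversification candidate.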

In [26]:
setores_selecionados = ['Biotechnology', 'Semiconductors', 'Insurance Brokers', 'Insurance—Specialty']
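
The selection rule described above (high Sharpe ratio, low pairwise linear correlation) can be sketched as a simple greedy filter. This is only an illustration on synthetic returns: the helper `pick_low_correlation_sectors`, the threshold `max_corr`, and the data are assumptions and were not part of the original analysis, where the sectors were chosen by inspection.

```python
import numpy as np
import pandas as pd


def pick_low_correlation_sectors(returns, n_sectors=3, max_corr=0.3):
    """Hypothetical helper: walk sectors in descending approximate Sharpe
    order and keep a sector only if its absolute correlation with every
    sector already kept is at most max_corr."""
    sharpe = (returns.mean() / returns.std()).sort_values(ascending=False)
    corr = returns.corr()
    selected = []
    for sector in sharpe.index:
        if all(abs(corr.loc[sector, kept]) <= max_corr for kept in selected):
            selected.append(sector)
        if len(selected) == n_sectors:
            break
    return selected


# Synthetic daily returns: B is almost a copy of A, C and D are independent
rng = np.random.default_rng(0)
common = rng.normal(0, 0.01, 500)
demo_returns = pd.DataFrame({
    "A": 0.002 + common,
    "B": 0.0015 + common + rng.normal(0, 0.001, 500),
    "C": 0.001 + rng.normal(0, 0.01, 500),
    "D": 0.0005 + rng.normal(0, 0.01, 500),
})

picked = pick_low_correlation_sectors(demo_returns, n_sectors=3)
print(picked)
```

Because A and B move almost identically, only one of them survives the filter; the remaining slots go to the weakly correlated sectors.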

##### 2.3.1.2. Historical Growth Trend Analysis of the Sectors

Now that the sectors to observe have been identified, their historical time series can be checked for a forecast or general upward trend around the month of October.

In [28]:
dados_selecionados = dados[setores_selecionados]

In [41]:
# Helper function to visualize the trend in returns
def plot_daily_return_mean(df, symbols, window=7):
    """Plot the rolling mean of daily returns for the given symbols."""
    
    pct_changes = df[symbols].dropna().rolling(window=window).mean()
    
    fig = go.Figure()
    
    for symbol in symbols:
        fig.add_trace(go.Scatter(
            x=pct_changes.index,
            y=pct_changes[symbol],
            mode='lines',
            name=symbol,
            line=dict(width=2)
        ))
    
    fig.update_layout(
        title="Rolling Mean of Daily Returns",
        xaxis_title="Date",
        yaxis_title="Daily Return",
        hovermode="x unified",
        template="plotly_white",
        height=600,
        width=1000
    )
    
    fig.add_shape(type="line",
                  x0=pct_changes.index.min(), x1=pct_changes.index.max(),
                  y0=0, y1=0,
                  line=dict(color="Red", width=1, dash="dash"))
    
    fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='LightGray', tickangle=45)
    fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='LightGray')
    
    return fig


In [43]:
plot_daily_return_mean(dados.loc[dt(2023, 1, 1):dt(2024, 10, 1)], setores_selecionados, window = 7)

Returns show a recent downward trend. Even so, since this is an intermediate cycle (60-day moving average), these sectors could be interesting for a possible correction move.

It is worth looking at other sectors as well.
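
To make the effect of the window length concrete, the sketch below compares a 7-day and a 60-day rolling mean on synthetic daily returns. The data and variable names are illustrative assumptions; only the `rolling(window=...).mean()` call mirrors what `plot_daily_return_mean` does.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns on business days (illustrative data only)
rng = np.random.default_rng(42)
dates = pd.bdate_range("2023-01-02", periods=400)
daily_returns = pd.Series(rng.normal(0.0005, 0.02, len(dates)), index=dates)

short_ma = daily_returns.rolling(window=7).mean()
long_ma = daily_returns.rolling(window=60).mean()

# The first (window - 1) observations have no full window, so they are NaN
print("NaN at start:", short_ma.isna().sum(), "vs", long_ma.isna().sum())

# A longer window averages out more noise, so the curve is smoother
print("std of 7-day mean: ", short_ma.std())
print("std of 60-day mean:", long_ma.std())
```

A short window reacts quickly but is noisy; a long window is smoother but lags and discards more of the start of the sample.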

In [44]:
plot_daily_return_mean(dados.loc[dt(2023, 1, 1):dt(2024, 10, 1)], lista_indices_adaptada['index'].value_counts()[:10].index, window = 60)

#### 2.3.2. Sector Analysis - Brazil

The same sequence of steps used for the United States is followed here.

In [155]:
dados = pd.DataFrame()

arquivos_a_importar = arquivos_organizados[(arquivos_organizados['Pais']=='Brazil')&(arquivos_organizados['Moeda']=='BRL')]['Endereco']

for arquivo in arquivos_a_importar:
    nome = arquivo.split('_')
    temp = pd.read_csv(os.path.join('data', arquivo), parse_dates=[0])  # portable path instead of a Windows-only backslash
    

    temp.columns = ['Date', nome[2]]
    temp['Date'] = pd.to_datetime(temp['Date'])
    temp.set_index(keys='Date', inplace=True, drop=True)

    dados = pd.concat([dados, temp], axis=1)

# For some reason, some assets are imported more than once; keep only the first occurrence of each column
dados = dados.loc[:, ~dados.columns.duplicated()]
# dados.head()

In [157]:
dados = dados.clip(lower=-0.5, upper=0.5)

In [158]:
dados.fillna(0, inplace=False).describe()


Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`



Unnamed: 0,Agricultural Inputs,Airlines,Aluminum,Auto Parts,Banks—Regional,Beverages—Brewers,Communication Equipment,Conglomerates,Department Stores,Drug Manufacturers—Specialty & Generic,...,Software—Application,Software—Infrastructure,Specialty Business Services,Specialty Chemicals,Specialty Industrial Machinery,Specialty Retail,Steel,Utilities—Diversified,Utilities—Regulated Electric,Utilities—Renewable
count,5695.0,5695.0,5695.0,5695.0,5695.0,5695.0,5695.0,5695.0,5695.0,5695.0,...,5695.0,5695.0,5695.0,5695.0,5695.0,5695.0,5695.0,5695.0,5695.0,5695.0
mean,-6e-06,3e-06,-3.1e-05,0.000902,0.000663,0.000532,0.000288,0.000748,0.000698,0.000395,...,0.00064,0.000185,0.00012,0.000505,0.001135,0.000759,0.000988,0.000751,0.000798,0.000625
std,0.008763,0.0235,0.013282,0.022102,0.022602,0.017476,0.036022,0.025307,0.026944,0.019399,...,0.020692,0.019387,0.008555,0.022346,0.021042,0.027983,0.029343,0.023055,0.021699,0.02813
min,-0.136738,-0.368677,-0.165323,-0.204878,-0.420701,-0.224263,-0.219638,-0.15,-0.5,-0.152096,...,-0.163054,-0.2,-0.072507,-0.323583,-0.206197,-0.198879,-0.415812,-0.168571,-0.384,-0.283467
25%,0.0,0.0,0.0,-0.001842,-0.011082,-0.008051,-0.014925,-0.010831,-0.010719,-0.006427,...,-0.008411,0.0,0.0,-0.007856,-0.006089,-0.004717,-0.014655,-0.011819,-0.00536,-0.013046
50%,0.0,0.0,0.0,0.0,3.6e-05,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,0.0,0.0,0.0,0.00391,0.011788,0.009304,0.012481,0.011715,0.011103,0.006233,...,0.009648,0.0,0.0,0.007884,0.007884,0.003779,0.015764,0.01333,0.006567,0.013147
max,0.122727,0.379834,0.147122,0.304369,0.223398,0.15972,0.369863,0.251471,0.5,0.211429,...,0.197996,0.294118,0.142063,0.252688,0.322035,0.372883,0.173688,0.205732,0.5,0.48665


There is a lot of missing data across sectors, so both of the following fixes cause problems:

- Dropping null values: some sectors have no data for 2023 and 2024, so dropping rows with nulls makes the sectors eliminate each other's observations
- Filling with zeros: some sectors are recent additions to the Brazilian stock exchange, so zero-filling distorts the computed statistics (see the Airlines sector above)

This does not, however, prevent handling them year by year, by extracting a Sharpe index for each year.
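
The two problems listed above can be reproduced on a small synthetic example. The sketch below uses made-up sector names and data (assumptions, not the report's dataset). It also shows one possible alternative, not the approach used in this report: pandas `mean` and `std` skip NaN by default, so a per-year ratio can be computed without zero-filling.

```python
import numpy as np
import pandas as pd

# Two hypothetical sectors; "NewSector" only has data for the last 100 days
rng = np.random.default_rng(1)
dates = pd.bdate_range("2020-01-01", periods=500)
demo = pd.DataFrame({
    "OldSector": rng.normal(0.001, 0.02, 500),
    "NewSector": rng.normal(0.001, 0.02, 500),
}, index=dates)
demo.loc[dates[:400], "NewSector"] = np.nan

# Problem 1: row-wise dropna also discards 400 valid rows of OldSector
rows_after_dropna = len(demo.dropna())

# Problem 2: zero-filling shrinks the volatility of the recent sector
std_zero_filled = demo["NewSector"].fillna(0).std()
std_skipping_nan = demo["NewSector"].std()  # NaN values are skipped by default

# Per-year mean/std ratio without zero-filling; years with no data stay NaN
by_year = demo.groupby(demo.index.year)
sharpe_by_year = by_year.mean() / by_year.std()

print(rows_after_dropna)
print(std_zero_filled, std_skipping_nan)
print(sharpe_by_year)
```

Leaving years without data as NaN keeps them out of any ranking instead of giving them artificial zero-return statistics.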

In [None]:
def extrair_indice_sharpe(dados_historicos):
    """Compute a modified (approximate) Sharpe index from a set of historical return data."""

    tabela_descricoes = dados_historicos.describe().T

    # Approximate mean Sharpe index of each sector (mean / std, no risk-free rate subtracted)
    tabela_descricoes['IndiceSharpeAproximado'] = tabela_descricoes['mean'] / tabela_descricoes['std']
    tabela_descricoes.reset_index(inplace=True)
    return tabela_descricoes.sort_values(by='IndiceSharpeAproximado', ascending=False)

data_inicial = dados.index[0]
lista_indices = pd.DataFrame()

while data_inicial < dados.index[-1]:
    data_final_pesquisa = min(data_inicial + relativedelta(years=1), dados.index[-1])   # End of the current one-year chunk

    

    dados_anuais = dados.fillna(0, inplace=False).loc[data_inicial:data_final_pesquisa]
    temp = extrair_indice_sharpe(dados_anuais)
    temp['Ano'] = data_inicial.strftime('%Y')
    lista_indices = pd.concat([lista_indices, temp[:10]])

    data_inicial += relativedelta(years=1)


In [160]:
lista_indices_adaptada = lista_indices[lista_indices['Ano'].isin(['2001', '2003', '2004', '2010', '2011', '2014', '2024', '2020', '2016', '2012', '2008', '2007', '2019', '2020', '2023', '2024'])]
lista_indices_adaptada = lista_indices_adaptada.dropna().sort_values(by='index').drop_duplicates()
lista_indices_adaptada

Unnamed: 0,index,count,mean,std,min,25%,50%,75%,max,IndiceSharpeAproximado,Ano
2,Aluminum,190.0,0.000985,0.032886,-0.125000,-0.019810,-0.001562,0.020850,0.147122,0.029954,2024
3,Auto Parts,247.0,0.002233,0.014880,-0.066689,0.000000,0.000000,0.006397,0.068796,0.150091,2010
3,Auto Parts,249.0,0.000623,0.021179,-0.056818,-0.005495,0.000000,0.006674,0.111112,0.029412,2011
3,Auto Parts,248.0,0.001924,0.020985,-0.059406,-0.010767,0.000000,0.015496,0.056667,0.091702,2023
3,Auto Parts,262.0,0.003160,0.037898,-0.183680,0.000000,0.000000,0.000000,0.304369,0.083376,2003
...,...,...,...,...,...,...,...,...,...,...,...
46,Utilities—Regulated Electric,248.0,0.000819,0.016033,-0.044416,-0.009374,0.000581,0.010071,0.050962,0.051093,2014
46,Utilities—Regulated Electric,249.0,0.002465,0.013591,-0.031120,-0.005907,0.001975,0.010112,0.065998,0.181382,2016
47,Utilities—Renewable,249.0,0.006319,0.033188,-0.076056,-0.013393,0.005173,0.022099,0.124510,0.190416,2016
47,Utilities—Renewable,262.0,0.003335,0.034366,-0.087379,-0.019857,0.000000,0.024381,0.136646,0.097051,2003


In [161]:
lista_indices_adaptada['index'].value_counts()

index
Specialty Industrial Machinery            7
Packaged Foods                            7
Pharmaceutical Retailers                  7
Farm & Heavy Construction Machinery       6
Banks—Regional                            6
Utilities—Diversified                     6
Real Estate Services                      5
Real Estate—Development                   5
Oil & Gas Refining & Marketing            5
Steel                                     5
Beverages—Brewers                         5
Auto Parts                                5
Other Industrial Metals & Mining          4
Rental & Leasing Services                 4
Conglomerates                             4
Utilities—Regulated Electric              4
Paper & Paper Products                    4
Oil & Gas Integrated                      4
Specialty Retail                          3
Specialty Chemicals                       3
Utilities—Renewable                       3
Oil & Gas E&P                             3
Insurance—Diversified     

In [162]:
lista_indices_adaptada['index'].value_counts()[:10]

index
Specialty Industrial Machinery         7
Packaged Foods                         7
Pharmaceutical Retailers               7
Farm & Heavy Construction Machinery    6
Banks—Regional                         6
Utilities—Diversified                  6
Real Estate Services                   5
Real Estate—Development                5
Oil & Gas Refining & Marketing         5
Steel                                  5
Name: count, dtype: int64

##### 2.3.2.1. Correlation Analysis between Selected Sectors

In [164]:
dados[lista_indices_adaptada['index'].value_counts()[:20].index].corr()

Unnamed: 0,Specialty Industrial Machinery,Packaged Foods,Pharmaceutical Retailers,Farm & Heavy Construction Machinery,Banks—Regional,Utilities—Diversified,Real Estate Services,Real Estate—Development,Oil & Gas Refining & Marketing,Steel,Beverages—Brewers,Auto Parts,Other Industrial Metals & Mining,Rental & Leasing Services,Conglomerates,Utilities—Regulated Electric,Paper & Paper Products,Oil & Gas Integrated,Specialty Retail,Specialty Chemicals
Specialty Industrial Machinery,1.0,0.269784,0.263295,0.279027,0.311455,0.269493,0.322427,0.316447,0.316647,0.27937,0.281027,0.188722,0.247387,0.341664,0.299232,0.285883,0.27088,0.255405,0.251547,0.297265
Packaged Foods,0.269784,1.0,0.193128,0.246345,0.353449,0.312212,0.31447,0.318813,0.338509,0.315967,0.273447,0.126983,0.287479,0.31468,0.298659,0.287378,0.316617,0.302468,0.248552,0.308093
Pharmaceutical Retailers,0.263295,0.193128,1.0,0.189241,0.241008,0.228745,0.266045,0.235953,0.226776,0.176497,0.235954,0.123353,0.147223,0.303067,0.247932,0.23355,0.15861,0.170214,0.233603,0.195854
Farm & Heavy Construction Machinery,0.279027,0.246345,0.189241,1.0,0.390516,0.319491,0.38064,0.379258,0.309897,0.31112,0.269048,0.19035,0.253575,0.397026,0.317528,0.233438,0.184108,0.32794,0.292991,0.290419
Banks—Regional,0.311455,0.353449,0.241008,0.390516,1.0,0.486083,0.486805,0.435807,0.455443,0.463051,0.383971,0.155093,0.417488,0.484266,0.669653,0.304626,0.29055,0.539668,0.345856,0.337046
Utilities—Diversified,0.269493,0.312212,0.228745,0.319491,0.486083,1.0,0.440507,0.401906,0.397509,0.392933,0.30933,0.168864,0.298835,0.421301,0.38541,0.343279,0.230529,0.428358,0.320363,0.306556
Real Estate Services,0.322427,0.31447,0.266045,0.38064,0.486805,0.440507,1.0,0.493612,0.445053,0.356091,0.374211,0.203882,0.283807,0.490462,0.455299,0.406882,0.158177,0.394627,0.367499,0.324482
Real Estate—Development,0.316447,0.318813,0.235953,0.379258,0.435807,0.401906,0.493612,1.0,0.392313,0.342434,0.309474,0.247149,0.258573,0.486415,0.377978,0.359192,0.168487,0.366377,0.393621,0.297908
Oil & Gas Refining & Marketing,0.316647,0.338509,0.226776,0.309897,0.455443,0.397509,0.445053,0.392313,1.0,0.379553,0.344068,0.156667,0.339459,0.439709,0.367432,0.319331,0.23725,0.388044,0.338987,0.393826
Steel,0.27937,0.315967,0.176497,0.31112,0.463051,0.392933,0.356091,0.342434,0.379553,1.0,0.283499,0.173556,0.562014,0.390782,0.378945,0.226051,0.316012,0.469057,0.291182,0.314206


In [165]:
dados_recentes = dados.loc[dt(2023, 1, 1): dt(2024, 10, 1)]
dados_recentes.head()

Unnamed: 0_level_0,Agricultural Inputs,Airlines,Aluminum,Auto Parts,Banks—Regional,Beverages—Brewers,Communication Equipment,Conglomerates,Department Stores,Drug Manufacturers—Specialty & Generic,...,Software—Application,Software—Infrastructure,Specialty Business Services,Specialty Chemicals,Specialty Industrial Machinery,Specialty Retail,Steel,Utilities—Diversified,Utilities—Regulated Electric,Utilities—Renewable
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2023-01-02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2023-01-03,-0.05,-0.013592,-0.038532,0.002165,-0.039901,-0.001409,0.003226,-0.02512,-0.023797,-0.04322,...,-0.034376,-0.068536,-0.049558,-0.034043,-0.024934,-0.021764,-0.000488,-0.016391,-0.019098,-0.018769
2023-01-04,0.027864,0.03937,0.012405,-0.021598,0.004141,0.005646,0.03537,0.008589,0.006359,0.003346,...,0.0292,0.001672,0.013035,0.008811,0.006729,0.010269,0.002407,0.030651,0.003285,0.024549
2023-01-05,0.008032,0.09375,0.02262,0.015452,0.040627,0.002807,-0.012422,0.014599,-0.015271,0.012863,...,0.045472,0.09182,0.011949,-0.021397,0.002139,0.032618,0.037187,0.00583,0.002996,0.002978
2023-01-06,0.065737,0.025108,0.023041,0.023913,0.021787,0.002099,-0.040881,0.016787,0.036898,0.014346,...,0.016357,-0.006116,0.023615,0.006247,0.014674,0.031213,0.008432,0.015556,0.013641,1.9e-05


In [166]:
dados_recentes[lista_indices_adaptada['index'].value_counts()[:20].index].corr()

Unnamed: 0,Specialty Industrial Machinery,Packaged Foods,Pharmaceutical Retailers,Farm & Heavy Construction Machinery,Banks—Regional,Utilities—Diversified,Real Estate Services,Real Estate—Development,Oil & Gas Refining & Marketing,Steel,Beverages—Brewers,Auto Parts,Other Industrial Metals & Mining,Rental & Leasing Services,Conglomerates,Utilities—Regulated Electric,Paper & Paper Products,Oil & Gas Integrated,Specialty Retail,Specialty Chemicals
Specialty Industrial Machinery,1.0,0.15168,0.180078,0.051985,0.105388,0.225586,0.215499,0.153281,0.144921,0.120217,0.186107,0.253222,0.149041,0.231409,0.156506,0.154841,0.065107,0.047163,0.025386,0.217445
Packaged Foods,0.15168,1.0,0.211396,0.151187,0.271882,0.242138,0.26615,0.316973,0.298171,0.213937,0.244764,0.195055,0.174233,0.253341,0.32248,0.249387,0.179652,0.085724,0.285821,0.249797
Pharmaceutical Retailers,0.180078,0.211396,1.0,0.135115,0.360152,0.357876,0.399473,0.345722,0.325753,0.076892,0.271457,0.240103,0.055225,0.373421,0.338983,0.337043,0.096274,-0.041611,0.282713,0.155029
Farm & Heavy Construction Machinery,0.051985,0.151187,0.135115,1.0,0.217643,0.197645,0.313022,0.324534,0.218943,0.1227,0.117894,0.267288,0.105831,0.297488,0.207674,0.212299,0.04274,0.074346,0.198006,0.167465
Banks—Regional,0.105388,0.271882,0.360152,0.217643,1.0,0.401026,0.458716,0.39837,0.455607,0.219743,0.383146,0.177359,0.160073,0.351802,0.765949,0.342282,0.083602,0.184517,0.35887,0.264909
Utilities—Diversified,0.225586,0.242138,0.357876,0.197645,0.401026,1.0,0.573905,0.481397,0.477139,0.256118,0.288758,0.222173,0.151187,0.417295,0.477235,0.627405,0.124277,0.042423,0.377412,0.264624
Real Estate Services,0.215499,0.26615,0.399473,0.313022,0.458716,0.573905,1.0,0.581147,0.495871,0.204386,0.349928,0.324186,0.143316,0.476052,0.533748,0.546806,0.078535,0.063992,0.431883,0.340531
Real Estate—Development,0.153281,0.316973,0.345722,0.324534,0.39837,0.481397,0.581147,1.0,0.457549,0.268002,0.300007,0.357145,0.16995,0.529323,0.482051,0.443118,0.099742,0.039169,0.448285,0.272786
Oil & Gas Refining & Marketing,0.144921,0.298171,0.325753,0.218943,0.455607,0.477139,0.495871,0.457549,1.0,0.29365,0.387217,0.230846,0.307607,0.484582,0.592335,0.397097,0.142995,0.241996,0.457965,0.335128
Steel,0.120217,0.213937,0.076892,0.1227,0.219743,0.256118,0.204386,0.268002,0.29365,1.0,0.137986,0.211437,0.479435,0.237681,0.258224,0.152305,0.277043,0.110983,0.23345,0.153251


##### 2.3.2.2. Historical Growth Trend Analysis of the Sectors

In [167]:
# Helper function to visualize the trend in returns
def plot_daily_return_mean(df, symbols, window=7):
    """Plot the rolling mean of daily returns for the given symbols."""
    
    pct_changes = df[symbols].dropna().rolling(window=window).mean()
    
    fig = go.Figure()
    
    for symbol in symbols:
        fig.add_trace(go.Scatter(
            x=pct_changes.index,
            y=pct_changes[symbol],
            mode='lines',
            name=symbol,
            line=dict(width=2)
        ))
    
    fig.update_layout(
        title="Rolling Mean of Daily Returns",
        xaxis_title="Date",
        yaxis_title="Daily Return",
        hovermode="x unified",
        template="plotly_white",
        height=600,
        width=1000
    )
    
    fig.add_shape(type="line",
                  x0=pct_changes.index.min(), x1=pct_changes.index.max(),
                  y0=0, y1=0,
                  line=dict(color="Red", width=1, dash="dash"))
    
    fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='LightGray', tickangle=45)
    fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='LightGray')
    
    return fig

In [168]:
plot_daily_return_mean(dados.loc[dt(2023, 1, 1):dt(2024, 10, 1)], lista_indices_adaptada['index'].value_counts()[:10].index, window = 60)