# PAXPE - Ingestão de dados para banco de dados SQL do Azure usando serviços do Azure

## Visão geral

Este notebook demonstra como agendar um script Python para ingerir dados em um Banco de Dados SQL do Postgres, orquestrado pelo airflow.

# Documentação do Processo de Criação de Tabelas com Dados do Yahoo Finance

## Objetivo
O objetivo deste processo é obter dados financeiros, de mercado, dividendos, valuation e informações gerais de empresas listadas na bolsa, utilizando a API do Yahoo Finance. Os dados são coletados para um ou mais tickers e organizados em DataFrames utilizando PySpark para garantir performance e escalabilidade, especialmente ao lidar com uma grande quantidade de tickers.

1. **Coleta de Dados**: 
   - Para cada ticker fornecido, as informações relevantes foram extraídas da API do Yahoo Finance utilizando a biblioteca `yfinance`. Os dados foram organizados em dicionários para posterior conversão em DataFrames.

2. **Criação dos DataFrames**:
   - **Tabela Geral** (`df_geral`): Contém informações gerais da empresa, como setor, indústria, número de empregados, localização e resumo das atividades.
   - **Tabela Financeira** (`df_financeira`): Contém dados financeiros da empresa, como capitalização de mercado, receita, lucro líquido, EBITDA, dívida total, entre outros.
   - **Tabela de Mercado** (`df_mercado`): Inclui dados relacionados ao mercado, como preço atual, preço de abertura, volume de negociação, beta, entre outros.
   - **Tabela de Dividendos** (`df_dividendos`): Contém informações sobre dividendos, incluindo taxa de dividendos, data ex-dividendo e índice de distribuição.
   - **Tabela de Valuation** (`df_valuation`): Inclui dados de valuation da empresa, como índices P/E (Price to Earnings), P/B (Price to Book) e PEG (Price/Earnings to Growth).
   - **Tabela de Retorno Mensal** (`df_retorno_mensal`): Retorno mensal da ação com base em preço da ação, dividendos e percentual

## Considerações Finais
Este processo permite a coleta eficiente e escalável de dados financeiros de várias empresas, facilitando análises complexas em grandes volumes de dados. O uso do PySpark garante que mesmo listas extensas de tickers possam ser processadas rapidamente, gerando tabelas estruturadas e prontas para análise.


# Changelog

| Responsável | Data       | Change Log                                                                                      |
|-------------|------------|--------------------------------------------------------------------------------------------------|
| IGOR MENDES | 10-08-24 | Criação do script em spark                   |
| IGOR MENDES | 28-08-24 | Criação da logica de upsert com o postgresSQL                |
| IGOR MENDES | 20-10-24 | Sprint 3 - adicionando indices a um dataframe               |

In [1]:
#fontes - yahoo finance api
!pip install yahoofinance
!pip install yahooquery
!pip install --upgrade yfinance
!pip install psycopg2-binary

Defaulting to user installation because normal site-packages is not writeable







[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


Defaulting to user installation because normal site-packages is not writeable









[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


Defaulting to user installation because normal site-packages is not writeable









[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


Defaulting to user installation because normal site-packages is not writeable





[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [2]:
import findspark
findspark.init()  # Inicializa o Spark
findspark.find()  # Verifica se o Spark está corretamente configurado

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

spark = (
    SparkSession
    .builder
    .appName("PAXPE")
    .config("spark.sql.session.timeZone", "America/Sao_Paulo")  # Define o fuso horário para São Paulo
    .config("spark.driver.memory", "16g")  # Memória do driver
    .config("spark.executor.memory", "12g")  # Memória para cada executor (ajuste conforme a carga)
    .config("spark.executor.cores", "8")  # Núcleos por executor
    .config("spark.cores.max", "24")  # Total de núcleos disponíveis
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "10")
    .config("spark.dynamicAllocation.initialExecutors", "4")
    .config("spark.default.parallelism", "24")  # Nível de paralelismo
    .config("spark.memory.fraction", "0.8")  # Memória usada para armazenamento e execução
    .config("spark.memory.storageFraction", "0.5")  # Memória usada para armazenamento
    .config("spark.jars", "/opt/airflow/jars/postgresql-42.7.4.jar")
    .getOrCreate()
)


spark

/home/airflow/.local/lib/python3.8/site-packages/pyspark/bin/load-spark-env.sh: line 68: ps: command not found


25/04/06 15:10:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


In [3]:
import pandas as pd

import yfinance as yf
from yfinance import Ticker
#api yahoo
from yahooquery import Screener, Ticker


#criar timestamps e automatizar a safra de tempo da análise
# apagar depois que tiver usando a api do spark sql
from datetime import datetime, timedelta

from pyspark.sql.functions import col, lit, when, lag, current_timestamp , date_format, from_utc_timestamp
from pyspark.sql.types import StructType, StructField, StringType, FloatType, LongType, DateType, DoubleType,IntegerType
from pyspark.sql.window import Window


import psycopg2
from psycopg2 import OperationalError
import sys


In [4]:
def obter_empresas_ativas():
    screener = Screener()
    dados = screener.get_screeners('most_actives', count=200)
    # print(dados)  # Linha de depuração para inspecionar a estrutura dos dados retornados
    empresas = dados['most_actives']['quotes']
    
    # Criar um DataFrame a partir dos dados
    df = spark.createDataFrame(empresas)
    
    # Colunas para corresponder ao site
    colunas = [
        'symbol', 'shortName', 'displayName', 'regularMarketPrice', 'regularMarketChange', 
        'regularMarketChangePercent', 'regularMarketVolume', 'marketCap', 
        'fullExchangeName', 'quoteSourceName'
    ]
    df = df.select(*colunas)
    
    # Renomear colunas para português
    df = df.withColumnRenamed('symbol', 'ticker') \
           .withColumnRenamed('shortName', 'nome_curto') \
           .withColumnRenamed('displayName', 'nome_exibicao') \
           .withColumnRenamed('regularMarketPrice', 'preco_mercado_regular') \
           .withColumnRenamed('regularMarketChange', 'mudanca_mercado_regular') \
           .withColumnRenamed('regularMarketChangePercent', 'mudanca_percentual_mercado_regular') \
           .withColumnRenamed('regularMarketVolume', 'volume_mercado_regular') \
           .withColumnRenamed('marketCap', 'capitalizacao_mercado') \
           .withColumnRenamed('fullExchangeName', 'nome_exchange_completa') \
           .withColumnRenamed('quoteSourceName', 'nome_fonte_cotacao')
    
    # Adicionar coluna com data e hora atual
    df = df.withColumn('dt_ptcao', date_format(current_timestamp(), 'yyyy-MM-dd'))
    df = df.withColumn('dthr_igtao', current_timestamp())
    
    # Garantir que 'ticker' não tenha valores nulos
    df = df.withColumn('ticker', col('ticker').cast('string'))
    df = df.dropna(subset=['ticker'])

    # Ordenar por capitalizacaoMercado
    df = df.orderBy(col('capitalizacao_mercado').desc())
    
    return df

In [5]:
def obter_dados_historicos(symbols, start_date, end_date):
    dados = {}
    for symbol in symbols:
        ticker = yf.Ticker(symbol)
        historico = ticker.history(start=start_date, end=end_date, interval='1mo')
        dados[symbol] = historico
    
    return dados

In [6]:

from concurrent.futures import ThreadPoolExecutor, as_completed

def obter_dados_historicos_ticker(symbol, start_date, end_date):
    ticker = yf.Ticker(symbol)
    historico = ticker.history(start=start_date, end=end_date, interval='1mo')
    return (symbol, historico)

def obter_dados_historicos(symbols, start_date, end_date):
    dados = {}
    
    # Usar ThreadPoolExecutor para executar as solicitações em paralelo
    with ThreadPoolExecutor(max_workers=10) as executor:
        # Submit tarefas para o executor
        futuros = [executor.submit(obter_dados_historicos_ticker, symbol, start_date, end_date) for symbol in symbols]
        
        # Coletar os resultados conforme as tarefas são concluídas
        for futuro in as_completed(futuros):
            symbol, historico = futuro.result()
            dados[symbol] = historico
    
    return dados

In [7]:
def retorno_mensal(dados):
    # Inicializar uma lista vazia para armazenar dados estruturados
    dados_estruturados = []
    
    # Iterar sobre os dados históricos de cada símbolo
    for symbol, df in dados.items():
        # Converter DataFrame do Pandas para PySpark
        df['symbol'] = symbol

        df_spark = spark.createDataFrame(df.reset_index())
        
        
        # Renomear colunas para português
        df_spark = df_spark.withColumnRenamed('symbol', 'ticker') \
                        .withColumnRenamed('Date', 'data') \
                        .withColumnRenamed('Open', 'abertura') \
                        .withColumnRenamed('High', 'alta') \
                        .withColumnRenamed('Low', 'baixa') \
                        .withColumnRenamed('Close', 'fechamento') \
                        .withColumnRenamed('Volume', 'volume') \
                        .withColumnRenamed('Dividends', 'dividendos') \
                        .withColumnRenamed('Stock Splits', 'desdobramentos')

        janela = Window.partitionBy('ticker').orderBy('Data')


        # Calcular preço de fechamento do mês anterior (deslocar uma linha para cima)
        df_spark = df_spark.withColumn('fechamento_mes_anterior', lag('fechamento').over(janela))

        # Calcular Retorno em valor (diferença absoluta)
        df_spark = df_spark.withColumn('valor_retorno', col('fechamento') - col('fechamento_mes_anterior'))

        # Calcular Retorno em porcentagem
        df_spark = df_spark.withColumn('porcentagem_retorno', (col('valor_retorno') / col('fechamento_mes_anterior')) * 100)

        # Adicionar coluna com data atual no formato desejado
        df_spark = df_spark.withColumn('dt_ptcao', date_format(current_timestamp(), 'yyyy-MM-dd'))
        df_spark = df_spark.withColumn('dthr_igtao', current_timestamp())

        # Reordenar colunas
        df_spark = df_spark.select(
            'ticker',               # 'symbol' traduzido para 'ticker'
            'data',                 # 'date' traduzido para 'Data'
            'abertura',             # 'open' traduzido para 'Abertura'
            'alta',                 # 'high' traduzido para 'Alta'
            'baixa',                # 'low' traduzido para 'Baixa'
            'fechamento',           # 'close' traduzido para 'Fechamento'
            'volume',               # 'volume' mantido como 'Volume'
            'dividendos',           # 'dividends' traduzido para 'Dividendos'
            'desdobramentos',       # 'splits' traduzido para 'Desdobramentos'
            'fechamento_mes_anterior', # 'Close_Last_Month' traduzido para 'Fechamento_Mes_Anterior'
            'valor_retorno',        # 'Return_Value' traduzido para 'Valor_Retorno'
            'porcentagem_retorno',  # 'Return_Percentage' traduzido para 'Porcentagem_Retorno'
            'dt_ptcao',             # 'dt_ptcao' mantido como está
            'dthr_igtao'            # 'DTHR_IGTAO' mantido como está
        )
        
        # Adicionar o DataFrame à lista de dados estruturados
        dados_estruturados.append(df_spark)

    # Unir todos os DataFrames em um único DataFrame
    df_final = dados_estruturados[0]
    for df in dados_estruturados[1:]:
        df_final = df_final.union(df)
    
    return df_final

# Tabela fato -  maiores empresas segundo a api do yahoo finance

In [8]:
df_ativas = obter_empresas_ativas()

df_ativas.show()
df_ativas.printSchema()

25/04/06 15:10:25 WARN SparkStringUtils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.


[Stage 0:>                                                         (0 + 8) / 24]



+------+--------------------+--------------------+---------------------+-----------------------+----------------------------------+----------------------+---------------------+----------------------+--------------------+----------+--------------------+
|ticker|          nome_curto|       nome_exibicao|preco_mercado_regular|mudanca_mercado_regular|mudanca_percentual_mercado_regular|volume_mercado_regular|capitalizacao_mercado|nome_exchange_completa|  nome_fonte_cotacao|  dt_ptcao|          dthr_igtao|
+------+--------------------+--------------------+---------------------+-----------------------+----------------------------------+----------------------+---------------------+----------------------+--------------------+----------+--------------------+
|  AAPL|          Apple Inc.|               Apple|               188.38|             -14.809998|                        -7.2887435|             124921508|        2829863354368|              NasdaqGS|Nasdaq Real Time ...|2025-04-06|2025-04-06

# Dimensão - Retornos mensais 10 anos

In [9]:
# 10 anos passados
start_date = datetime.today() - timedelta(days=10*365)

# hoje
end_date = datetime.today()

# 'YYYY-MM-DD'
start_date_str = start_date.strftime('%Y-%m-%d')
end_date_str = end_date.strftime('%Y-%m-%d')

print(f"start_date: {start_date_str}")
print(f"end_date: {end_date_str}")


start_date: 2015-04-09
end_date: 2025-04-06


In [10]:
# Selecionar a coluna 'symbol' e coletar os valores como uma lista
#100 maiores para dimensão das 100 maiores e retornos
symbol_list = df_ativas.select('ticker').rdd.flatMap(lambda x: x).collect()

# Converter a lista para uma tupla
df_tickers = tuple(symbol_list)

print(df_tickers)


('AAPL', 'MSFT', 'NVDA', 'AMZN', 'GOOG', 'GOOGL', 'META', 'TSLA', 'TSM', 'AVGO', 'WMT', 'JPM', 'XOM', 'BABA', 'KO', 'BAC', 'CVX', 'CSCO', 'MRK', 'WFC', 'T', 'VZ', 'GE', 'PLTR', 'MS', 'DIS', 'PDD', 'QCOM', 'AMD', 'NEE', 'UBER', 'BSX', 'MUFG', 'PFE', 'CMCSA', 'SCHW', 'BMY', 'C', 'BA', 'SHOP', 'MO', 'SBUX', 'INTC', 'PBR-A', 'PBR', 'KKR', 'NKE', 'BP', 'ANET', 'MSTR', 'LRCX', 'APP', 'MU', 'APH', 'INFY', 'EPD', 'WMB', 'CMG', 'SE', 'JD', 'PYPL', 'USB', 'ITUB', 'KMI', 'ET', 'LYG', 'CSX', 'BCS', 'DELL', 'NEM', 'SLB', 'HLN', 'NU', 'TFC', 'GM', 'TGT', 'MRVL', 'KVUE', 'FCX', 'VALE', 'ABEV', 'OXY', 'F', 'PCG', 'KHC', 'BKR', 'VST', 'GOLD', 'RKT', 'HOOD', 'STLA', 'GEHC', 'EQT', 'NOK', 'ERIC', 'OWL', 'DAL', 'VOD', 'TTD', 'VRT', 'BBD', 'CCL', 'HPQ', 'CVE', 'XPEV', 'DOW', 'WBD', 'ASX', 'MCHP', 'CTRA', 'DVN', 'UAL', 'HBAN', 'SMCI', 'PINS', 'TEVA', 'HAL', 'UMC', 'RF', 'HPE', 'DKNG', 'LUV', 'GRAB', 'KEY', 'KGC', 'ITCI', 'ARCC', 'SNAP', 'CNH', 'AMCR', 'ONON', 'RIVN', 'YMM', 'AFRM', 'NLY', 'WDC', 'GME', 'SOF

In [11]:
historical_data = obter_dados_historicos(df_tickers, start_date_str, end_date_str)
df_retorno_mensal = retorno_mensal(historical_data)

# Mostrar o DataFrame final
df_retorno_mensal.show()

[Stage 6:=>  (9 + 8) / 24][Stage 7:>   (0 + 0) / 24][Stage 8:>   (0 + 0) / 24]

[Stage 6:==>(21 + 3) / 24][Stage 7:>   (0 + 5) / 24][Stage 8:>   (0 + 0) / 24]

[Stage 7:==>(21 + 3) / 24][Stage 8:>   (0 + 5) / 24][Stage 9:>   (0 + 0) / 24][Stage 8:==>(19 + 5) / 24][Stage 9:>   (0 + 3) / 24][Stage 10:>  (0 + 0) / 24]

[Stage 9:==>(20 + 4) / 24][Stage 10:>  (0 + 4) / 24][Stage 11:>  (0 + 0) / 24][Stage 10:>  (6 + 9) / 24][Stage 11:>  (0 + 0) / 24][Stage 12:>  (0 + 0) / 24]

[Stage 10:> (10 + 8) / 24][Stage 11:>  (0 + 0) / 24][Stage 12:>  (0 + 0) / 24]

[Stage 10:=>(21 + 3) / 24][Stage 11:>  (1 + 6) / 24][Stage 12:>  (0 + 0) / 24]

[Stage 11:=>(17 + 7) / 24][Stage 12:>  (0 + 1) / 24][Stage 13:>  (0 + 0) / 24][Stage 12:>  (6 + 8) / 24][Stage 13:>  (0 + 0) / 24][Stage 14:>  (0 + 0) / 24]

[Stage 12:=>(19 + 5) / 24][Stage 13:>  (0 + 3) / 24][Stage 14:>  (0 + 0) / 24][Stage 13:=>(14 + 8) / 24][Stage 14:>  (0 + 0) / 24][Stage 15:>  (0 + 0) / 24]

[Stage 14:=> (9 + 8) / 24][Stage 15:>  (0 + 0) / 24][Stage 16:>  (0 + 0) / 24]

[Stage 15:> (7 + 10) / 24][Stage 16:>  (0 + 0) / 24][Stage 17:>  (0 + 0) / 24][Stage 16:> (10 + 8) / 24][Stage 17:>  (0 + 0) / 24][Stage 18:>  (0 + 0) / 24]

[Stage 17:=> (8 + 9) / 24][Stage 18:>  (0 + 0) / 24][Stage 19:>  (0 + 0) / 24]

[Stage 18:>  (2 + 8) / 24][Stage 19:>  (0 + 0) / 24][Stage 20:>  (0 + 0) / 24][Stage 19:>  (4 + 8) / 24][Stage 20:>  (0 + 0) / 24][Stage 21:>  (0 + 0) / 24]

[Stage 20:>  (7 + 8) / 24][Stage 21:>  (0 + 0) / 24][Stage 22:>  (0 + 0) / 24]

[Stage 21:>  (4 + 9) / 24][Stage 22:>  (0 + 0) / 24][Stage 23:>  (0 + 0) / 24][Stage 22:> (3 + 10) / 24][Stage 23:>  (0 + 0) / 24][Stage 24:>  (0 + 0) / 24]

[Stage 23:>  (1 + 8) / 24][Stage 24:>  (0 + 0) / 24][Stage 25:>  (0 + 0) / 24][Stage 24:>  (5 + 8) / 24][Stage 25:>  (0 + 0) / 24][Stage 26:>  (0 + 0) / 24]

[Stage 25:>  (5 + 8) / 24][Stage 26:>  (0 + 0) / 24][Stage 27:>  (0 + 0) / 24][Stage 26:=>(12 + 8) / 24][Stage 27:>  (0 + 0) / 24][Stage 28:>  (0 + 0) / 24]

[Stage 27:=> (8 + 8) / 24][Stage 28:>  (0 + 0) / 24][Stage 29:>  (0 + 0) / 24][Stage 28:> (10 + 8) / 24][Stage 29:>  (0 + 0) / 24][Stage 30:>  (0 + 0) / 24]

[Stage 29:=>(14 + 8) / 24][Stage 30:>  (0 + 0) / 24][Stage 31:>  (0 + 0) / 24][Stage 30:=>(20 + 4) / 24][Stage 31:>  (0 + 4) / 24][Stage 32:>  (0 + 0) / 24]

[Stage 31:=>(19 + 5) / 24][Stage 32:>  (0 + 4) / 24][Stage 33:>  (0 + 0) / 24][Stage 33:>  (2 + 8) / 24][Stage 34:>  (0 + 0) / 24][Stage 35:>  (0 + 0) / 24]

[Stage 33:=>(21 + 3) / 24][Stage 34:>  (0 + 6) / 24][Stage 35:>  (0 + 0) / 24][Stage 34:=>(16 + 8) / 24][Stage 35:>  (0 + 0) / 24][Stage 36:>  (0 + 0) / 24]

[Stage 35:=>(20 + 4) / 24][Stage 36:>  (0 + 4) / 24][Stage 37:>  (0 + 0) / 24][Stage 36:=>(20 + 4) / 24][Stage 37:>  (0 + 4) / 24][Stage 38:>  (0 + 0) / 24]

[Stage 37:=>(23 + 1) / 24][Stage 38:>  (0 + 7) / 24][Stage 39:>  (0 + 0) / 24][Stage 39:>  (2 + 8) / 24][Stage 40:>  (0 + 0) / 24][Stage 41:>  (0 + 0) / 24]

[Stage 40:>  (2 + 8) / 24][Stage 41:>  (0 + 0) / 24][Stage 42:>  (0 + 0) / 24]

[Stage 41:>  (4 + 9) / 24][Stage 42:>  (0 + 0) / 24][Stage 43:>  (0 + 0) / 24][Stage 42:>  (3 + 8) / 24][Stage 43:>  (0 + 0) / 24][Stage 44:>  (0 + 0) / 24]

[Stage 42:=>(23 + 1) / 24][Stage 43:>  (0 + 7) / 24][Stage 44:>  (0 + 0) / 24][Stage 43:=>(23 + 1) / 24][Stage 44:>  (0 + 7) / 24][Stage 45:>  (0 + 0) / 24]

[Stage 45:>  (5 + 8) / 24][Stage 46:>  (0 + 0) / 24][Stage 47:>  (0 + 0) / 24]

[Stage 46:>  (5 + 8) / 24][Stage 47:>  (0 + 0) / 24][Stage 48:>  (0 + 0) / 24][Stage 47:>  (0 + 8) / 24][Stage 48:>  (0 + 0) / 24][Stage 49:>  (0 + 0) / 24]

[Stage 47:=>(21 + 3) / 24][Stage 48:>  (0 + 5) / 24][Stage 49:>  (0 + 0) / 24]

[Stage 48:=>(22 + 2) / 24][Stage 49:>  (1 + 6) / 24][Stage 50:>  (0 + 0) / 24]

[Stage 50:>  (2 + 8) / 24][Stage 51:>  (0 + 0) / 24][Stage 52:>  (0 + 0) / 24]

[Stage 51:=> (9 + 8) / 24][Stage 52:>  (0 + 0) / 24][Stage 53:>  (0 + 0) / 24]

[Stage 52:=> (9 + 8) / 24][Stage 53:>  (0 + 0) / 24][Stage 54:>  (0 + 0) / 24][Stage 53:=> (8 + 8) / 24][Stage 54:>  (0 + 0) / 24][Stage 55:>  (0 + 0) / 24]

[Stage 54:> (10 + 9) / 24][Stage 55:>  (0 + 0) / 24][Stage 56:>  (0 + 0) / 24][Stage 55:=>(13 + 8) / 24][Stage 56:>  (0 + 0) / 24][Stage 57:>  (0 + 0) / 24]

[Stage 56:=>(17 + 7) / 24][Stage 57:>  (0 + 1) / 24][Stage 58:>  (0 + 0) / 24][Stage 57:=>(21 + 3) / 24][Stage 58:>  (0 + 5) / 24][Stage 59:>  (0 + 0) / 24]

[Stage 59:> (0 + 10) / 24][Stage 60:>  (0 + 0) / 24][Stage 61:>  (0 + 0) / 24][Stage 60:>  (5 + 8) / 24][Stage 61:>  (0 + 0) / 24][Stage 62:>  (0 + 0) / 24]

[Stage 61:> (10 + 8) / 24][Stage 62:>  (0 + 0) / 24][Stage 63:>  (0 + 0) / 24][Stage 62:=>(17 + 7) / 24][Stage 63:>  (0 + 1) / 24][Stage 64:>  (0 + 0) / 24]

[Stage 63:=>(20 + 4) / 24][Stage 64:>  (0 + 4) / 24][Stage 65:>  (0 + 0) / 24][Stage 65:>  (2 + 8) / 24][Stage 66:>  (0 + 0) / 24][Stage 67:>  (0 + 0) / 24]

[Stage 66:=> (9 + 8) / 24][Stage 67:>  (0 + 0) / 24][Stage 68:>  (0 + 0) / 24][Stage 67:> (10 + 8) / 24][Stage 68:>  (0 + 0) / 24][Stage 69:>  (0 + 0) / 24]

[Stage 68:=>(17 + 7) / 24][Stage 69:>  (0 + 1) / 24][Stage 70:>  (0 + 0) / 24][Stage 69:=>(20 + 4) / 24][Stage 70:>  (0 + 4) / 24][Stage 71:>  (0 + 0) / 24]

[Stage 71:>  (1 + 8) / 24][Stage 72:>  (0 + 0) / 24][Stage 73:>  (0 + 0) / 24][Stage 72:>  (4 + 8) / 24][Stage 73:>  (0 + 0) / 24][Stage 74:>  (0 + 0) / 24]

[Stage 73:>  (4 + 9) / 24][Stage 74:>  (0 + 0) / 24][Stage 75:>  (0 + 0) / 24][Stage 74:=>(16 + 8) / 24][Stage 75:>  (0 + 0) / 24][Stage 76:>  (0 + 0) / 24]

[Stage 75:=>(21 + 3) / 24][Stage 76:>  (0 + 5) / 24][Stage 77:>  (0 + 0) / 24][Stage 77:>  (1 + 8) / 24][Stage 78:>  (0 + 0) / 24][Stage 79:>  (0 + 0) / 24]

[Stage 78:>  (7 + 8) / 24][Stage 79:>  (0 + 0) / 24][Stage 80:>  (0 + 0) / 24][Stage 79:=>(12 + 8) / 24][Stage 80:>  (0 + 0) / 24][Stage 81:>  (0 + 0) / 24]

[Stage 80:=>(17 + 7) / 24][Stage 81:>  (0 + 1) / 24][Stage 82:>  (0 + 0) / 24][Stage 81:=>(23 + 1) / 24][Stage 82:>  (0 + 7) / 24][Stage 83:>  (0 + 0) / 24]

[Stage 83:>  (6 + 8) / 24][Stage 84:>  (0 + 0) / 24][Stage 85:>  (0 + 0) / 24][Stage 84:> (10 + 8) / 24][Stage 85:>  (0 + 0) / 24][Stage 86:>  (0 + 0) / 24]

[Stage 85:=>(15 + 8) / 24][Stage 86:>  (0 + 0) / 24][Stage 87:>  (0 + 0) / 24]

[Stage 86:=>(15 + 8) / 24][Stage 87:>  (0 + 0) / 24][Stage 88:>  (0 + 0) / 24][Stage 87:=>(23 + 1) / 24][Stage 88:>  (0 + 7) / 24][Stage 89:>  (0 + 0) / 24]

[Stage 89:>  (2 + 8) / 24][Stage 90:>  (0 + 0) / 24][Stage 91:>  (0 + 0) / 24][Stage 90:> (10 + 8) / 24][Stage 91:>  (0 + 0) / 24][Stage 92:>  (0 + 0) / 24]

[Stage 91:=>(16 + 8) / 24][Stage 92:>  (0 + 0) / 24][Stage 93:>  (0 + 0) / 24][Stage 92:=>(22 + 2) / 24][Stage 93:>  (0 + 6) / 24][Stage 94:>  (0 + 0) / 24]

[Stage 93:=>(23 + 1) / 24][Stage 94:>  (0 + 7) / 24][Stage 95:>  (0 + 0) / 24][Stage 95:>  (5 + 8) / 24][Stage 96:>  (0 + 0) / 24][Stage 97:>  (0 + 0) / 24]

[Stage 96:=>(12 + 8) / 24][Stage 97:>  (0 + 0) / 24][Stage 98:>  (0 + 0) / 24][Stage 97:=>(16 + 8) / 24][Stage 98:>  (0 + 0) / 24][Stage 99:>  (0 + 0) / 24]

[Stage 99:>  (0 + 8) / 24][Stage 100:> (0 + 0) / 24][Stage 101:> (0 + 0) / 24][Stage 100:> (2 + 8) / 24][Stage 101:> (0 + 0) / 24][Stage 102:> (0 + 0) / 24]

[Stage 101:> (8 + 8) / 24][Stage 102:> (0 + 0) / 24][Stage 103:> (0 + 0) / 24][Stage 102:>(12 + 8) / 24][Stage 103:> (0 + 0) / 24][Stage 104:> (0 + 0) / 24]

[Stage 103:>(14 + 8) / 24][Stage 104:> (0 + 0) / 24][Stage 105:> (0 + 0) / 24][Stage 104:>(20 + 4) / 24][Stage 105:> (0 + 5) / 24][Stage 106:> (0 + 0) / 24]

[Stage 106:> (1 + 9) / 24][Stage 107:> (0 + 0) / 24][Stage 108:> (0 + 0) / 24][Stage 107:> (8 + 8) / 24][Stage 108:> (0 + 0) / 24][Stage 109:> (0 + 0) / 24]

[Stage 108:>(12 + 8) / 24][Stage 109:> (0 + 0) / 24][Stage 110:> (0 + 0) / 24][Stage 109:>(16 + 8) / 24][Stage 110:> (0 + 0) / 24][Stage 111:> (0 + 0) / 24]

[Stage 110:>(20 + 4) / 24][Stage 111:> (0 + 5) / 24][Stage 112:> (0 + 0) / 24][Stage 112:> (1 + 8) / 24][Stage 113:> (0 + 0) / 24][Stage 114:> (0 + 0) / 24]

[Stage 113:> (7 + 8) / 24][Stage 114:> (0 + 0) / 24][Stage 115:> (0 + 0) / 24]

[Stage 114:>(15 + 8) / 24][Stage 115:> (0 + 0) / 24][Stage 116:> (0 + 0) / 24][Stage 115:>(23 + 1) / 24][Stage 116:> (0 + 7) / 24][Stage 117:> (0 + 0) / 24]

[Stage 117:> (8 + 8) / 24][Stage 118:> (0 + 0) / 24][Stage 119:> (0 + 0) / 24]

[Stage 118:(13 + 10) / 24][Stage 119:> (0 + 0) / 24][Stage 120:> (0 + 0) / 24][Stage 120:> (0 + 8) / 24][Stage 121:> (0 + 0) / 24][Stage 122:> (0 + 0) / 24]

[Stage 121:> (5 + 8) / 24][Stage 122:> (0 + 0) / 24][Stage 123:> (0 + 0) / 24][Stage 122:> (8 + 8) / 24][Stage 123:> (0 + 0) / 24][Stage 124:> (0 + 0) / 24]

[Stage 123:>(13 + 8) / 24][Stage 124:> (0 + 0) / 24][Stage 125:> (0 + 0) / 24][Stage 124:>(20 + 4) / 24][Stage 125:> (0 + 4) / 24][Stage 126:> (0 + 0) / 24]

[Stage 125:>(22 + 2) / 24][Stage 126:> (1 + 6) / 24][Stage 127:> (0 + 0) / 24][Stage 127:> (5 + 8) / 24][Stage 128:> (0 + 0) / 24][Stage 129:> (0 + 0) / 24]

[Stage 128:>(10 + 8) / 24][Stage 129:> (0 + 0) / 24][Stage 130:> (0 + 0) / 24][Stage 129:>(14 + 8) / 24][Stage 130:> (0 + 0) / 24][Stage 131:> (0 + 0) / 24]

[Stage 130:>(19 + 5) / 24][Stage 131:> (0 + 3) / 24][Stage 132:> (0 + 0) / 24][Stage 132:> (2 + 8) / 24][Stage 133:> (0 + 0) / 24][Stage 134:> (0 + 0) / 24]

[Stage 133:> (8 + 8) / 24][Stage 134:> (0 + 0) / 24][Stage 135:> (0 + 0) / 24][Stage 134:>(16 + 8) / 24][Stage 135:> (0 + 0) / 24][Stage 136:> (0 + 0) / 24]

[Stage 135:>(21 + 3) / 24][Stage 136:> (0 + 5) / 24][Stage 137:> (0 + 0) / 24][Stage 137:> (7 + 9) / 24][Stage 138:> (0 + 0) / 24][Stage 139:> (0 + 0) / 24]

[Stage 138:>(12 + 8) / 24][Stage 139:> (0 + 0) / 24][Stage 140:> (0 + 0) / 24][Stage 139:>(17 + 7) / 24][Stage 140:> (0 + 1) / 24][Stage 141:> (0 + 0) / 24]

[Stage 140:>(21 + 3) / 24][Stage 141:> (0 + 5) / 24][Stage 142:> (0 + 0) / 24][Stage 142:> (3 + 8) / 24][Stage 143:> (0 + 0) / 24][Stage 144:> (0 + 0) / 24]

[Stage 143:> (7 + 8) / 24][Stage 144:> (0 + 0) / 24][Stage 145:> (0 + 0) / 24][Stage 144:> (9 + 8) / 24][Stage 145:> (0 + 0) / 24][Stage 146:> (0 + 0) / 24]

[Stage 145:>(14 + 8) / 24][Stage 146:> (0 + 0) / 24][Stage 147:> (0 + 0) / 24][Stage 146:>(22 + 2) / 24][Stage 147:> (0 + 6) / 24][Stage 148:> (0 + 0) / 24]

[Stage 148:> (0 + 8) / 24][Stage 149:> (0 + 0) / 24][Stage 150:> (0 + 0) / 24][Stage 149:> (7 + 8) / 24][Stage 150:> (0 + 0) / 24][Stage 151:> (0 + 0) / 24]

[Stage 150:>(11 + 8) / 24][Stage 151:> (0 + 0) / 24][Stage 152:> (0 + 0) / 24][Stage 151:>(11 + 8) / 24][Stage 152:> (0 + 0) / 24][Stage 153:> (0 + 0) / 24]

[Stage 152:>(15 + 8) / 24][Stage 153:> (0 + 0) / 24][Stage 154:> (0 + 0) / 24][Stage 153:>(19 + 5) / 24][Stage 154:> (0 + 3) / 24][Stage 155:> (0 + 0) / 24]

[Stage 154:>(20 + 4) / 24][Stage 155:> (0 + 6) / 24][Stage 156:> (0 + 0) / 24][Stage 156:> (3 + 8) / 24][Stage 157:> (0 + 0) / 24][Stage 158:> (0 + 0) / 24]

[Stage 157:>(10 + 8) / 24][Stage 158:> (0 + 0) / 24][Stage 159:> (0 + 0) / 24][Stage 158:>(15 + 8) / 24][Stage 159:> (0 + 0) / 24][Stage 160:> (0 + 0) / 24]

[Stage 159:>(20 + 4) / 24][Stage 160:> (0 + 4) / 24][Stage 161:> (0 + 0) / 24][Stage 160:=(24 + 0) / 24][Stage 161:> (0 + 8) / 24][Stage 162:> (0 + 0) / 24]

[Stage 162:> (7 + 8) / 24][Stage 163:> (0 + 0) / 24][Stage 164:> (0 + 0) / 24][Stage 163:>(15 + 8) / 24][Stage 164:> (0 + 0) / 24][Stage 165:> (0 + 0) / 24]

[Stage 164:>(23 + 1) / 24][Stage 165:> (0 + 7) / 24][Stage 166:> (0 + 0) / 24][Stage 165:>(23 + 1) / 24][Stage 166:> (0 + 7) / 24][Stage 167:> (0 + 0) / 24]

[Stage 167:> (7 + 8) / 24][Stage 168:> (0 + 0) / 24][Stage 169:> (0 + 0) / 24][Stage 168:>(10 + 9) / 24][Stage 169:> (0 + 0) / 24][Stage 170:> (0 + 0) / 24]

[Stage 169:>(17 + 7) / 24][Stage 170:> (0 + 2) / 24][Stage 171:> (0 + 0) / 24][Stage 171:> (1 + 8) / 24][Stage 172:> (0 + 0) / 24][Stage 173:> (0 + 0) / 24]

[Stage 172:> (9 + 8) / 24][Stage 173:> (0 + 0) / 24][Stage 174:> (0 + 0) / 24]

[Stage 173:>(17 + 7) / 24][Stage 174:> (0 + 1) / 24][Stage 175:> (0 + 0) / 24][Stage 175:> (1 + 8) / 24][Stage 176:> (0 + 0) / 24][Stage 177:> (0 + 0) / 24]

[Stage 176:> (7 + 8) / 24][Stage 177:> (0 + 0) / 24][Stage 178:> (0 + 0) / 24][Stage 177:>(15 + 8) / 24][Stage 178:> (0 + 0) / 24][Stage 179:> (0 + 0) / 24]

[Stage 178:>(22 + 2) / 24][Stage 179:> (0 + 6) / 24][Stage 180:> (0 + 0) / 24][Stage 180:> (1 + 8) / 24][Stage 181:> (0 + 0) / 24][Stage 182:> (0 + 0) / 24]

[Stage 181:> (7 + 8) / 24][Stage 182:> (0 + 0) / 24][Stage 183:> (0 + 0) / 24][Stage 182:> (8 + 8) / 24][Stage 183:> (0 + 0) / 24][Stage 184:> (0 + 0) / 24]

[Stage 183:>(13 + 8) / 24][Stage 184:> (0 + 0) / 24][Stage 185:> (0 + 0) / 24][Stage 184:>(17 + 7) / 24][Stage 185:> (0 + 1) / 24][Stage 186:> (0 + 0) / 24]

[Stage 185:>(23 + 1) / 24][Stage 186:> (0 + 7) / 24][Stage 187:> (0 + 0) / 24][Stage 187:> (1 + 8) / 24][Stage 188:> (0 + 0) / 24][Stage 189:> (0 + 0) / 24]

[Stage 188:> (3 + 8) / 24][Stage 189:> (0 + 0) / 24][Stage 190:> (0 + 0) / 24][Stage 189:>(10 + 8) / 24][Stage 190:> (0 + 0) / 24][Stage 191:> (0 + 0) / 24]

[Stage 190:>(16 + 8) / 24][Stage 191:> (0 + 0) / 24][Stage 192:> (0 + 0) / 24][Stage 191:>(18 + 6) / 24][Stage 192:> (0 + 2) / 24][Stage 193:> (0 + 0) / 24]

[Stage 192:>(19 + 5) / 24][Stage 193:> (0 + 3) / 24][Stage 194:> (0 + 0) / 24][Stage 193:>(15 + 8) / 24][Stage 194:> (0 + 0) / 24][Stage 195:> (0 + 0) / 24]

[Stage 194:>(18 + 6) / 24][Stage 195:> (0 + 2) / 24][Stage 196:> (0 + 0) / 24]

[Stage 195:>(23 + 1) / 24][Stage 196:> (1 + 7) / 24][Stage 197:> (0 + 0) / 24][Stage 196:>(18 + 6) / 24][Stage 197:> (0 + 4) / 24][Stage 198:> (0 + 0) / 24]

[Stage 197:>(23 + 1) / 24][Stage 198:> (0 + 7) / 24][Stage 199:> (0 + 0) / 24][Stage 199:> (4 + 8) / 24][Stage 200:> (0 + 0) / 24][Stage 201:> (0 + 0) / 24]

[Stage 200:> (3 + 8) / 24][Stage 201:> (0 + 0) / 24][Stage 202:> (0 + 0) / 24][Stage 201:> (7 + 8) / 24][Stage 202:> (0 + 0) / 24][Stage 203:> (0 + 0) / 24]





25/04/06 15:11:32 WARN DAGScheduler: Broadcasting large task binary with size 3.8 MiB
                                                                                

+------+-------------------+------------------+------------------+------------------+------------------+----------+----------+--------------+-----------------------+--------------------+--------------------+----------+--------------------+
|ticker|               data|          abertura|              alta|             baixa|        fechamento|    volume|dividendos|desdobramentos|fechamento_mes_anterior|       valor_retorno| porcentagem_retorno|  dt_ptcao|          dthr_igtao|
+------+-------------------+------------------+------------------+------------------+------------------+----------+----------+--------------+-----------------------+--------------------+--------------------+----------+--------------------+
|  AMZN|2015-05-01 01:00:00| 21.19099998474121|21.950000762939453|20.727500915527344| 21.46150016784668|1039660000|       0.0|           0.0|                   NULL|                NULL|                NULL|2025-04-06|2025-04-06 12:10:...|
|  AMZN|2015-06-01 01:00:00|21.520000457

# dimensões gerais 

2. **Criação dos DataFrames**:
   - **Tabela Geral** (`df_financeira`): Contém informações gerais da empresa, como setor, indústria, número de empregados, localização e resumo das atividades.
   - **Tabela Financeira** (`df_financeira`): Contém dados financeiros da empresa, como capitalização de mercado, receita, lucro líquido, EBITDA, dívida total, entre outros.
   - **Tabela de Mercado** (`df_mercado`): Inclui dados relacionados ao mercado, como preço atual, preço de abertura, volume de negociação, beta, entre outros.
   - **Tabela de Dividendos** (`df_dividendos`): Contém informações sobre dividendos, incluindo taxa de dividendos, data ex-dividendo e índice de distribuição.
   - **Tabela de Valuation** (`df_valuation`): Inclui dados de valuation da empresa, como índices P/E (Price to Earnings), P/B (Price to Book) e PEG (Price/Earnings to Growth).
   - **Tabela de Retorno Mensal** (`df_retorno_mensal`): Retorno mensal da ação com base em preço da ação, dividendos e percentual

In [12]:
from pyspark.sql.functions import from_unixtime, col

def criar_tabelas_spark(tickers):
    if isinstance(tickers, str):
        tickers = (tickers,)
    
    # Esquema para a tabela geral
    schema_geral = StructType([
        StructField('ticker', StringType(), False),
        StructField('setor', StringType(), True),
        StructField('industria', StringType(), True),
        StructField('funcionarios', IntegerType(), True),
        StructField('cidade', StringType(), True),
        StructField('estado', StringType(), True),
        StructField('pais', StringType(), True),
        StructField('website', StringType(), True),
        StructField('resumo_negocios', StringType(), True),
        StructField('exchange', StringType(), True)
    ])
    
    # Esquema para a tabela financeira
    schema_financeira = StructType([
        StructField('ticker', StringType(), False),
        StructField('capitalizacao_mercado', LongType(), True),
        StructField('valor_empresa', LongType(), True),
        StructField('receita', LongType(), True),
        StructField('lucros_brutos', LongType(), True),
        StructField('lucro_liquido', LongType(), True),
        StructField('ebitda', LongType(), True),
        StructField('divida_total', LongType(), True),
        StructField('caixa_total', LongType(), True),
        StructField('dividend_yield', DoubleType(), True)
    ])
    
    # Esquema para a tabela de mercado
    schema_mercado = StructType([
        StructField('ticker', StringType(), False),
        StructField('preco_atual', DoubleType(), True),
        StructField('fechamento_anterior', DoubleType(), True),
        StructField('abertura', DoubleType(), True),
        StructField('minimo_dia', DoubleType(), True),
        StructField('maximo_dia', DoubleType(), True),
        StructField('minimo_52_semanas', DoubleType(), True),
        StructField('maximo_52_semanas', DoubleType(), True),
        StructField('volume', LongType(), True),
        StructField('volume_medio', LongType(), True),
        StructField('beta', DoubleType(), True)
    ])
    
    # Esquema para a tabela de dividendos
    schema_dividendos = StructType([
        StructField('ticker', StringType(), False),
        StructField('taxa_dividendo', DoubleType(), True),
        StructField('data_exdividendo', StringType(), True),  # Temporariamente como StringType
        StructField('indice_distribuicao', DoubleType(), True)
    ])
    
    # Esquema para a tabela de valuation
    schema_valuation = StructType([
        StructField('ticker', StringType(), False),
        StructField('pl_futuro', DoubleType(), True),
        StructField('pl_retroativo', DoubleType(), True),
        StructField('preco_booking', DoubleType(), True),
        StructField('indice_preco_lucro_cresc', DoubleType(), True)
    ])
    
    # Inicializa as listas de dicionários para cada tabela
    geral = []
    financeira = []
    mercado = []
    dividendos = []
    valuation = []
    
    for ticker in tickers:
        try:
            empresa = yf.Ticker(ticker)
            info = empresa.info
            
            # Filtra e trata valores infinitos
            def safe_get(key, default=None):
                value = info.get(key)
                if isinstance(value, str) and value in ('Infinity', '-Infinity'):
                    return default
                return value
            
            # Preencher dados da tabela geral
            geral.append({
                'ticker': ticker,
                'setor': info.get('sector'),
                'industria': info.get('industry'),
                'funcionarios': info.get('fullTimeEmployees'),
                'cidade': info.get('city'),
                'estado': info.get('state'),
                'pais': info.get('country'),
                'website': info.get('website'),
                'resumo_negocios': info.get('longBusinessSummary'),
                'exchange': info.get('exchange')
            })
            
            # Preencher dados da tabela financeira
            financeira.append({
                'ticker': ticker,
                'capitalizacao_mercado': safe_get('marketCap', 0),
                'valor_empresa': safe_get('enterpriseValue', 0),
                'receita': safe_get('revenue', 0),
                'lucros_brutos': safe_get('grossProfits', 0),
                'lucro_liquido': safe_get('netIncome', 0),
                'ebitda': safe_get('ebitda', 0),
                'divida_total': safe_get('totalDebt', 0),
                'caixa_total': safe_get('totalCash', 0),
                'dividend_yield': safe_get('dividendYield', 0.0)
            })
            
            # Preencher dados da tabela de mercado
            mercado.append({
                'ticker': ticker,
                'preco_atual': safe_get('currentPrice', 0.0),
                'fechamento_anterior': safe_get('previousClose', 0.0),
                'abertura': safe_get('open', 0.0),
                'minimo_dia': safe_get('dayLow', 0.0),
                'maximo_dia': safe_get('dayHigh', 0.0),
                'minimo_52_semanas': safe_get('fiftyTwoWeekLow', 0.0),
                'maximo_52_semanas': safe_get('fiftyTwoWeekHigh', 0.0),
                'volume': safe_get('volume', 0),
                'volume_medio': safe_get('averageVolume', 0),
                'beta': safe_get('beta', 0.0)
            })
            
            # Preencher dados da tabela de dividendos
            dividendos.append({
                'ticker': ticker,
                'taxa_dividendo': safe_get('dividendRate', 0.0),
                'data_exdividendo': safe_get('exDividendDate'),  # Unix timestamp
                'indice_distribuicao': safe_get('payoutRatio', 0.0)
            })
            
            # Preencher dados da tabela de valuation
            valuation.append({
                'ticker': ticker,
                'pl_futuro': safe_get('forwardPE', 0.0),
                'pl_retroativo': safe_get('trailingPE', 0.0),
                'preco_booking': safe_get('priceToBook', 0.0),
                'indice_preco_lucro_cresc': safe_get('pegRatio', 0.0)
            })
            
        except Exception as e:
            print(f"Erro ao processar o ticker {ticker}: {e}")
    
    # Criar DataFrames Spark com esquema definido
    df_geral = spark.createDataFrame(geral, schema=schema_geral)
    df_financeira = spark.createDataFrame(financeira, schema=schema_financeira)
    df_mercado = spark.createDataFrame(mercado, schema=schema_mercado)
    df_dividendos = spark.createDataFrame(dividendos, schema=schema_dividendos)
    df_valuation = spark.createDataFrame(valuation, schema=schema_valuation)

    # Converter Unix timestamp para data legível
    df_dividendos = df_dividendos.withColumn('data_exdividendo', from_unixtime(col('data_exdividendo').cast('bigint')))
    
    # Adicionar colunas de timestamp
    df_geral = df_geral.withColumn('dt_ptcao', date_format(current_timestamp(), 'yyyy-MM-dd'))
    df_geral = df_geral.withColumn('dthr_igtao', current_timestamp())

    df_financeira = df_financeira.withColumn('dt_ptcao', date_format(current_timestamp(), 'yyyy-MM-dd'))
    df_financeira = df_financeira.withColumn('dthr_igtao', current_timestamp())

    df_mercado = df_mercado.withColumn('dt_ptcao', date_format(current_timestamp(), 'yyyy-MM-dd'))
    df_mercado = df_mercado.withColumn('dthr_igtao', current_timestamp())

    df_dividendos = df_dividendos.withColumn('dt_ptcao', date_format(current_timestamp(), 'yyyy-MM-dd'))
    df_dividendos = df_dividendos.withColumn('dthr_igtao', current_timestamp())

    df_valuation = df_valuation.withColumn('dt_ptcao', date_format(current_timestamp(), 'yyyy-MM-dd'))
    df_valuation = df_valuation.withColumn('dthr_igtao', current_timestamp())
    
    return df_geral, df_financeira, df_mercado, df_dividendos, df_valuation


In [13]:
df_geral, df_financeira, df_mercado, df_dividendos, df_valuation = criar_tabelas_spark(df_tickers)
# Exibir os DataFrames
df_geral.show()
df_financeira.show()
df_mercado.show()
df_dividendos.show()
df_valuation.show()

+------+--------------------+--------------------+------------+-------------+------+-------------+--------------------+--------------------+--------+----------+--------------------+
|ticker|               setor|           industria|funcionarios|       cidade|estado|         pais|             website|     resumo_negocios|exchange|  dt_ptcao|          dthr_igtao|
+------+--------------------+--------------------+------------+-------------+------+-------------+--------------------+--------------------+--------+----------+--------------------+
|  AAPL|          Technology|Consumer Electronics|      150000|    Cupertino|    CA|United States|https://www.apple...|Apple Inc. design...|     NMS|2025-04-06|2025-04-06 12:13:...|
|  MSFT|          Technology|Software - Infras...|      228000|      Redmond|    WA|United States|https://www.micro...|Microsoft Corpora...|     NMS|2025-04-06|2025-04-06 12:13:...|
|  NVDA|          Technology|      Semiconductors|       36000|  Santa Clara|    CA|United

+------+-----------+-------------------+--------+----------+----------+-----------------+-----------------+---------+------------+-----+----------+--------------------+
|ticker|preco_atual|fechamento_anterior|abertura|minimo_dia|maximo_dia|minimo_52_semanas|maximo_52_semanas|   volume|volume_medio| beta|  dt_ptcao|          dthr_igtao|
+------+-----------+-------------------+--------+----------+----------+-----------------+-----------------+---------+------------+-----+----------+--------------------+
|  AAPL|     188.38|             203.19| 193.925|   187.345|    199.88|           164.08|            260.1|124921508|    54722618|1.259|2025-04-06|2025-04-06 12:13:...|
|  MSFT|     359.84|             373.11| 364.125|    359.49|    374.59|           359.48|           468.35| 48685644|    23734163|  1.0|2025-04-06|2025-04-06 12:13:...|
|  NVDA|      94.31|              101.8|   98.94|     92.11|  100.1241|           75.606|           153.13|525608136|   286926860|1.958|2025-04-06|2025-04-

+------+---------+-------------+-------------+------------------------+----------+--------------------+
|ticker|pl_futuro|pl_retroativo|preco_booking|indice_preco_lucro_cresc|  dt_ptcao|          dthr_igtao|
+------+---------+-------------+-------------+------------------------+----------+--------------------+
|  AAPL|22.669073|    29.949127|     42.44705|                    NULL|2025-04-06|2025-04-06 12:13:...|
|  MSFT|24.069565|    28.995972|     8.838671|                    NULL|2025-04-06|2025-04-06 12:13:...|
|  NVDA|22.890778|    32.078228|    29.099043|                    NULL|2025-04-06|2025-04-06 12:13:...|
|  AMZN|27.804878|     30.97826|     6.334272|                    NULL|2025-04-06|2025-04-06 12:13:...|
|  GOOG|16.507263|    18.375622|     5.549546|                    NULL|2025-04-06|2025-04-06 12:13:...|
| GOOGL|    16.25|    18.109453|     5.469161|                    NULL|2025-04-06|2025-04-06 12:13:...|
|  META|19.949804|     21.14495|     7.002845|                  

In [14]:
df_retorno_mensal.printSchema()
df_ativas.printSchema()
df_geral.printSchema()
df_financeira.printSchema()
df_mercado.printSchema()
df_dividendos.printSchema()
df_valuation.printSchema()


root
 |-- ticker: string (nullable = true)
 |-- data: timestamp (nullable = true)
 |-- abertura: double (nullable = true)
 |-- alta: double (nullable = true)
 |-- baixa: double (nullable = true)
 |-- fechamento: double (nullable = true)
 |-- volume: long (nullable = true)
 |-- dividendos: double (nullable = true)
 |-- desdobramentos: double (nullable = true)
 |-- fechamento_mes_anterior: double (nullable = true)
 |-- valor_retorno: double (nullable = true)
 |-- porcentagem_retorno: double (nullable = true)
 |-- dt_ptcao: string (nullable = false)
 |-- dthr_igtao: timestamp (nullable = false)

root
 |-- ticker: string (nullable = true)
 |-- nome_curto: string (nullable = true)
 |-- nome_exibicao: string (nullable = true)
 |-- preco_mercado_regular: double (nullable = true)
 |-- mudanca_mercado_regular: double (nullable = true)
 |-- mudanca_percentual_mercado_regular: double (nullable = true)
 |-- volume_mercado_regular: long (nullable = true)
 |-- capitalizacao_mercado: long (nullable =

# Sprint 3 - Indices

In [15]:
indices = {
    '^BVSP': 'Ibovespa',
    'BOVA11.SA': 'BOVA11',
    '^GSPC': 'S&P 500',
    '^DJI': 'Dow Jones',
    '^IXIC': 'NASDAQ',
    '^FTSE': 'FTSE 100',
    '^GDAXI': 'DAX 30',
    '^N225': 'Nikkei 225'
}

# Definindo o esquema do DataFrame
schema = StructType([
    StructField("indice", StringType(), True),
    StructField("abertura", DoubleType(), True),
    StructField("alta", DoubleType(), True),
    StructField("baixa", DoubleType(), True),
    StructField("fechamento", DoubleType(), True),
    StructField("fechamento_ajustado", DoubleType(), True),
    StructField("volume", LongType(), True),  # Alterado para LongType
    StructField("nome_indice", StringType(), True),
    StructField("data", DateType(), True)
])

# Criar uma lista para armazenar os dados
data_list = []

# Adicionando dados históricos (exemplo)
for ticker, name in indices.items():
    try:
        data = yf.download(ticker, start=start_date_str, end=end_date_str)
        if not data.empty:
            for index, row in data.iterrows():
                # Converter os valores para float ou inteiro antes de adicionar à lista
                data_list.append((ticker, float(row['Open']), float(row['High']), float(row['Low']), 
                                  float(row['Close']), float(row['Adj Close']), int(row['Volume']),  # Converte para inteiro
                                  name, index.date()))  # Usa index.date() para pegar apenas a parte da data
    except Exception as e:
        print(f"Falha ao baixar dados para {ticker}: {e}")

# Criar um DataFrame Spark
df_indices = spark.createDataFrame(data_list, schema)

janela = Window.partitionBy('indice').orderBy('data')

# Calcular preço de fechamento do mês anterior (deslocar uma linha para cima)
df_indices = df_indices.withColumn('fechamento_mes_anterior', lag('fechamento').over(janela))

# Calcular Retorno em valor (diferença absoluta)
df_indices = df_indices.withColumn('valor_retorno', col('fechamento') - col('fechamento_mes_anterior'))

# Calcular Retorno em porcentagem
df_indices = df_indices.withColumn('porcentagem_retorno', (col('valor_retorno') / col('fechamento_mes_anterior')) * 100)

# Adicionar colunas dt_ptcao e dthr_igtao
df_indices = df_indices.withColumn('dt_ptcao', date_format(current_timestamp(), 'yyyy-MM-dd'))
df_indices = df_indices.withColumn('dthr_igtao', current_timestamp())

# Exibir o DataFrame final
df_indices = df_indices.orderBy('indice','data')
df_indices.show()

YF.download() has changed argument auto_adjust default to True


[*********************100%***********************]  1 of 1 completed


  data_list.append((ticker, float(row['Open']), float(row['High']), float(row['Low']),
  float(row['Close']), float(row['Adj Close']), int(row['Volume']),  # Converte para inteiro


Falha ao baixar dados para ^BVSP: 'Adj Close'


[*********************100%***********************]  1 of 1 completed


  data_list.append((ticker, float(row['Open']), float(row['High']), float(row['Low']),
  float(row['Close']), float(row['Adj Close']), int(row['Volume']),  # Converte para inteiro


Falha ao baixar dados para BOVA11.SA: 'Adj Close'


[*********************100%***********************]  1 of 1 completed


  data_list.append((ticker, float(row['Open']), float(row['High']), float(row['Low']),
  float(row['Close']), float(row['Adj Close']), int(row['Volume']),  # Converte para inteiro


Falha ao baixar dados para ^GSPC: 'Adj Close'


[*********************100%***********************]  1 of 1 completed


  data_list.append((ticker, float(row['Open']), float(row['High']), float(row['Low']),
  float(row['Close']), float(row['Adj Close']), int(row['Volume']),  # Converte para inteiro


Falha ao baixar dados para ^DJI: 'Adj Close'


[*********************100%***********************]  1 of 1 completed


  data_list.append((ticker, float(row['Open']), float(row['High']), float(row['Low']),
  float(row['Close']), float(row['Adj Close']), int(row['Volume']),  # Converte para inteiro


Falha ao baixar dados para ^IXIC: 'Adj Close'


[*********************100%***********************]  1 of 1 completed


  data_list.append((ticker, float(row['Open']), float(row['High']), float(row['Low']),
  float(row['Close']), float(row['Adj Close']), int(row['Volume']),  # Converte para inteiro


Falha ao baixar dados para ^FTSE: 'Adj Close'


[*********************100%***********************]  1 of 1 completed


  data_list.append((ticker, float(row['Open']), float(row['High']), float(row['Low']),
  float(row['Close']), float(row['Adj Close']), int(row['Volume']),  # Converte para inteiro


Falha ao baixar dados para ^GDAXI: 'Adj Close'


[*********************100%***********************]  1 of 1 completed


  data_list.append((ticker, float(row['Open']), float(row['High']), float(row['Low']),
  float(row['Close']), float(row['Adj Close']), int(row['Volume']),  # Converte para inteiro


Falha ao baixar dados para ^N225: 'Adj Close'


+------+--------+----+-----+----------+-------------------+------+-----------+----+-----------------------+-------------+-------------------+--------+----------+
|indice|abertura|alta|baixa|fechamento|fechamento_ajustado|volume|nome_indice|data|fechamento_mes_anterior|valor_retorno|porcentagem_retorno|dt_ptcao|dthr_igtao|
+------+--------+----+-----+----------+-------------------+------+-----------+----+-----------------------+-------------+-------------------+--------+----------+
+------+--------+----+-----+----------+-------------------+------+-----------+----+-----------------------+-------------+-------------------+--------+----------+



# Postgres upsert

In [16]:
SRVNAME = "postgres"
USER = "airflow"
PASSWORD = "airflow"
HOST = "postgres"
PORT = "5432"
DBNAME = "paxpedb"

# Parâmetros de conexão usando as variáveis
conn_params = {
    'dbname': DBNAME,
    'user': USER,
    'password': PASSWORD,
    'host': HOST,
    'port': PORT
}

In [17]:
def test_connection():
    try:
        # Connect to the PostgreSQL server using variables
        connection = psycopg2.connect(
            dbname=SRVNAME,
            user=USER,
            password=PASSWORD,
            host=HOST,
            port=PORT
        )
        print("Conexão com postgres sucedida")
        return True
    except OperationalError as e:
        print(f"Error: {e}")
        return False
    finally:
        if connection:
            connection.close()
if not test_connection():
    sys.exit(1)

Conexão com postgres sucedida


### para inserir no postgres é necessário
> 1. alimentar as tabelas em stage --- temp_nometabela
> 2. realizar o upsert comparando as tabelas

In [18]:
def write_to_postgres(df, table_name, schema_name="paxpestg"):
  
    df.write \
        .format("jdbc") \
        .option("url", f"jdbc:postgresql://{HOST}:{PORT}/{DBNAME}") \
        .option("dbtable", f"{schema_name}.{table_name}") \
        .option("user", USER) \
        .option("password", PASSWORD) \
        .option("driver", "org.postgresql.Driver") \
        .mode("overwrite") \
        .save()

In [19]:
# escrita das tabelas em stage
write_to_postgres(df_retorno_mensal, "temp_retorno_mensal")
write_to_postgres(df_ativas, "temp_captacao_mercado")
write_to_postgres(df_geral, "temp_cadastro")
write_to_postgres(df_financeira, "temp_financas")
write_to_postgres(df_mercado, "temp_mercado")
write_to_postgres(df_dividendos, "temp_dividendos")
write_to_postgres(df_valuation, "temp_valuation")
write_to_postgres(df_indices, "temp_indices")

[Stage 420:>(14 + 8) / 24][Stage 421:> (0 + 0) / 24][Stage 422:> (0 + 0) / 24][Stage 421:>(15 + 8) / 24][Stage 422:> (0 + 0) / 24][Stage 423:> (0 + 0) / 24]

[Stage 422:>(19 + 5) / 24][Stage 423:> (0 + 3) / 24][Stage 424:> (0 + 0) / 24][Stage 423:>(22 + 2) / 24][Stage 424:> (0 + 6) / 24][Stage 425:> (0 + 0) / 24]

[Stage 425:> (6 + 8) / 24][Stage 426:> (0 + 0) / 24][Stage 427:> (0 + 0) / 24][Stage 426:> (1 + 8) / 24][Stage 427:> (0 + 0) / 24][Stage 428:> (0 + 0) / 24]

[Stage 427:> (4 + 9) / 24][Stage 428:> (0 + 0) / 24][Stage 429:> (0 + 0) / 24][Stage 428:>(11 + 8) / 24][Stage 429:> (0 + 0) / 24][Stage 430:> (0 + 0) / 24]

[Stage 429:>(16 + 8) / 24][Stage 430:> (0 + 0) / 24][Stage 431:> (0 + 0) / 24][Stage 430:>(20 + 4) / 24][Stage 431:> (0 + 4) / 24][Stage 432:> (0 + 0) / 24]

[Stage 432:> (1 + 8) / 24][Stage 433:> (0 + 0) / 24][Stage 434:> (0 + 0) / 24][Stage 433:> (7 + 8) / 24][Stage 434:> (0 + 0) / 24][Stage 435:> (0 + 0) / 24]

[Stage 434:>(15 + 8) / 24][Stage 435:> (0 + 0) / 24][Stage 436:> (0 + 0) / 24][Stage 435:>(21 + 3) / 24][Stage 436:> (0 + 5) / 24][Stage 437:> (0 + 0) / 24]

[Stage 436:>(21 + 3) / 24][Stage 437:> (0 + 5) / 24][Stage 438:> (0 + 0) / 24]

[Stage 438:> (4 + 8) / 24][Stage 439:> (0 + 0) / 24][Stage 440:> (0 + 0) / 24]

[Stage 439:> (7 + 9) / 24][Stage 440:> (0 + 0) / 24][Stage 441:> (0 + 0) / 24][Stage 440:>(14 + 8) / 24][Stage 441:> (0 + 0) / 24][Stage 442:> (0 + 0) / 24]

[Stage 441:>(19 + 5) / 24][Stage 442:> (0 + 3) / 24][Stage 443:> (0 + 0) / 24][Stage 443:> (2 + 8) / 24][Stage 444:> (0 + 0) / 24][Stage 445:> (0 + 0) / 24]

[Stage 444:> (5 + 8) / 24][Stage 445:> (0 + 0) / 24][Stage 446:> (0 + 0) / 24][Stage 445:>(10 + 8) / 24][Stage 446:> (0 + 0) / 24][Stage 447:> (0 + 0) / 24]

[Stage 446:>(16 + 8) / 24][Stage 447:> (0 + 0) / 24][Stage 448:> (0 + 0) / 24][Stage 447:>(20 + 4) / 24][Stage 448:> (0 + 4) / 24][Stage 449:> (0 + 0) / 24]

[Stage 448:>(23 + 1) / 24][Stage 449:> (0 + 7) / 24][Stage 450:> (0 + 0) / 24][Stage 450:> (3 + 8) / 24][Stage 451:> (0 + 0) / 24][Stage 452:> (0 + 0) / 24]

[Stage 451:> (6 + 8) / 24][Stage 452:> (0 + 0) / 24][Stage 453:> (0 + 0) / 24][Stage 452:>(11 + 8) / 24][Stage 453:> (0 + 0) / 24][Stage 454:> (0 + 0) / 24]

[Stage 453:>(14 + 9) / 24][Stage 454:> (0 + 0) / 24][Stage 455:> (0 + 0) / 24]

[Stage 454:>(20 + 4) / 24][Stage 455:> (0 + 4) / 24][Stage 456:> (0 + 0) / 24][Stage 456:> (1 + 8) / 24][Stage 457:> (0 + 0) / 24][Stage 458:> (0 + 0) / 24]

[Stage 457:> (5 + 8) / 24][Stage 458:> (0 + 0) / 24][Stage 459:> (0 + 0) / 24][Stage 458:>(12 + 8) / 24][Stage 459:> (0 + 0) / 24][Stage 460:> (0 + 0) / 24]

[Stage 459:>(15 + 8) / 24][Stage 460:> (0 + 0) / 24][Stage 461:> (0 + 0) / 24][Stage 460:>(19 + 5) / 24][Stage 461:> (0 + 3) / 24][Stage 462:> (0 + 0) / 24]

[Stage 462:> (1 + 8) / 24][Stage 463:> (0 + 0) / 24][Stage 464:> (0 + 0) / 24]

[Stage 463:>(5 + 10) / 24][Stage 464:> (0 + 0) / 24][Stage 465:> (0 + 0) / 24][Stage 464:>(11 + 8) / 24][Stage 465:> (0 + 0) / 24][Stage 466:> (0 + 0) / 24]

[Stage 465:>(18 + 6) / 24][Stage 466:> (0 + 2) / 24][Stage 467:> (0 + 0) / 24][Stage 466:>(23 + 1) / 24][Stage 467:> (1 + 8) / 24][Stage 468:> (0 + 0) / 24]

[Stage 468:> (5 + 8) / 24][Stage 469:> (0 + 0) / 24][Stage 470:> (0 + 0) / 24][Stage 469:>(13 + 8) / 24][Stage 470:> (0 + 0) / 24][Stage 471:> (0 + 0) / 24]

[Stage 470:>(20 + 4) / 24][Stage 471:> (0 + 4) / 24][Stage 472:> (0 + 0) / 24][Stage 472:> (1 + 8) / 24][Stage 473:> (0 + 0) / 24][Stage 474:> (0 + 0) / 24]

[Stage 473:> (5 + 8) / 24][Stage 474:> (0 + 0) / 24][Stage 475:> (0 + 0) / 24][Stage 474:>(10 + 8) / 24][Stage 475:> (0 + 0) / 24][Stage 476:> (0 + 0) / 24]

[Stage 475:(14 + 10) / 24][Stage 476:> (0 + 1) / 24][Stage 477:> (0 + 0) / 24][Stage 476:>(21 + 3) / 24][Stage 477:> (0 + 5) / 24][Stage 478:> (0 + 0) / 24]

[Stage 478:> (3 + 8) / 24][Stage 479:> (0 + 0) / 24][Stage 480:> (0 + 0) / 24][Stage 479:> (5 + 8) / 24][Stage 480:> (0 + 0) / 24][Stage 481:> (0 + 0) / 24]

[Stage 480:>(13 + 8) / 24][Stage 481:> (0 + 0) / 24][Stage 482:> (0 + 0) / 24][Stage 481:>(20 + 4) / 24][Stage 482:> (0 + 4) / 24][Stage 483:> (0 + 0) / 24]

[Stage 482:>(23 + 1) / 24][Stage 483:> (0 + 7) / 24][Stage 484:> (0 + 0) / 24][Stage 484:> (5 + 8) / 24][Stage 485:> (0 + 0) / 24][Stage 486:> (0 + 0) / 24]

[Stage 485:>(10 + 8) / 24][Stage 486:> (0 + 0) / 24][Stage 487:> (0 + 0) / 24][Stage 486:>(16 + 8) / 24][Stage 487:> (0 + 0) / 24][Stage 488:> (0 + 0) / 24]

[Stage 487:>(19 + 5) / 24][Stage 488:> (0 + 3) / 24][Stage 489:> (0 + 0) / 24][Stage 489:> (0 + 8) / 24][Stage 490:> (0 + 0) / 24][Stage 491:> (0 + 0) / 24]

[Stage 490:> (7 + 8) / 24][Stage 491:> (0 + 0) / 24][Stage 492:> (0 + 0) / 24][Stage 491:>(12 + 8) / 24][Stage 492:> (0 + 0) / 24][Stage 493:> (0 + 0) / 24]

[Stage 492:>(16 + 8) / 24][Stage 493:> (0 + 0) / 24][Stage 494:> (0 + 0) / 24][Stage 493:>(23 + 1) / 24][Stage 494:> (0 + 7) / 24][Stage 495:> (0 + 0) / 24]

[Stage 495:> (7 + 8) / 24][Stage 496:> (0 + 0) / 24][Stage 497:> (0 + 0) / 24][Stage 496:>(11 + 8) / 24][Stage 497:> (0 + 0) / 24][Stage 498:> (0 + 0) / 24]

[Stage 497:>(16 + 8) / 24][Stage 498:> (0 + 0) / 24][Stage 499:> (0 + 0) / 24][Stage 498:>(23 + 1) / 24][Stage 499:> (0 + 7) / 24][Stage 500:> (0 + 0) / 24]

[Stage 500:> (5 + 8) / 24][Stage 501:> (0 + 0) / 24][Stage 502:> (0 + 0) / 24][Stage 501:> (8 + 8) / 24][Stage 502:> (0 + 0) / 24][Stage 503:> (0 + 0) / 24]

[Stage 502:>(16 + 8) / 24][Stage 503:> (0 + 0) / 24][Stage 504:> (0 + 0) / 24][Stage 503:>(20 + 4) / 24][Stage 504:> (0 + 4) / 24][Stage 505:> (0 + 0) / 24]

[Stage 505:> (4 + 8) / 24][Stage 506:> (0 + 0) / 24][Stage 507:> (0 + 0) / 24][Stage 506:> (7 + 9) / 24][Stage 507:> (0 + 0) / 24][Stage 508:> (0 + 0) / 24]

[Stage 507:>(14 + 8) / 24][Stage 508:> (0 + 0) / 24][Stage 509:> (0 + 0) / 24][Stage 508:>(21 + 3) / 24][Stage 509:> (0 + 5) / 24][Stage 510:> (0 + 0) / 24]

[Stage 510:> (1 + 8) / 24][Stage 511:> (0 + 0) / 24][Stage 512:> (0 + 0) / 24][Stage 511:> (8 + 8) / 24][Stage 512:> (0 + 0) / 24][Stage 513:> (0 + 0) / 24]

[Stage 512:> (9 + 8) / 24][Stage 513:> (0 + 0) / 24][Stage 514:> (0 + 0) / 24][Stage 513:>(17 + 7) / 24][Stage 514:> (0 + 1) / 24][Stage 515:> (0 + 0) / 24]

[Stage 514:>(18 + 6) / 24][Stage 515:> (0 + 2) / 24][Stage 516:> (0 + 0) / 24][Stage 516:> (1 + 8) / 24][Stage 517:> (0 + 0) / 24][Stage 518:> (0 + 0) / 24]

[Stage 517:> (2 + 8) / 24][Stage 518:> (0 + 0) / 24][Stage 519:> (0 + 0) / 24][Stage 518:> (9 + 8) / 24][Stage 519:> (0 + 0) / 24][Stage 520:> (0 + 0) / 24]

[Stage 519:>(13 + 8) / 24][Stage 520:> (0 + 0) / 24][Stage 521:> (0 + 0) / 24][Stage 520:>(17 + 7) / 24][Stage 521:> (0 + 1) / 24][Stage 522:> (0 + 0) / 24]

[Stage 521:>(21 + 3) / 24][Stage 522:> (0 + 5) / 24][Stage 523:> (0 + 0) / 24][Stage 523:> (1 + 8) / 24][Stage 524:> (0 + 0) / 24][Stage 525:> (0 + 0) / 24]

[Stage 524:> (9 + 8) / 24][Stage 525:> (0 + 0) / 24][Stage 526:> (0 + 0) / 24][Stage 525:>(14 + 8) / 24][Stage 526:> (0 + 0) / 24][Stage 527:> (0 + 0) / 24]

[Stage 526:>(17 + 7) / 24][Stage 527:> (0 + 1) / 24][Stage 528:> (0 + 0) / 24][Stage 527:>(20 + 4) / 24][Stage 528:> (0 + 5) / 24][Stage 529:> (0 + 0) / 24]

[Stage 529:> (2 + 8) / 24][Stage 530:> (0 + 0) / 24][Stage 531:> (0 + 0) / 24][Stage 530:> (8 + 8) / 24][Stage 531:> (0 + 0) / 24][Stage 532:> (0 + 0) / 24]

[Stage 531:>(16 + 8) / 24][Stage 532:> (0 + 0) / 24][Stage 533:> (0 + 0) / 24][Stage 532:>(16 + 8) / 24][Stage 533:> (0 + 0) / 24][Stage 534:> (0 + 0) / 24]

[Stage 533:>(23 + 1) / 24][Stage 534:> (0 + 7) / 24][Stage 535:> (0 + 0) / 24][Stage 535:> (5 + 8) / 24][Stage 536:> (0 + 0) / 24][Stage 537:> (0 + 0) / 24]

[Stage 536:>(11 + 8) / 24][Stage 537:> (0 + 0) / 24][Stage 538:> (0 + 0) / 24][Stage 537:>(15 + 8) / 24][Stage 538:> (0 + 0) / 24][Stage 539:> (0 + 0) / 24]

[Stage 538:>(22 + 2) / 24][Stage 539:> (0 + 6) / 24][Stage 540:> (0 + 0) / 24][Stage 540:> (4 + 8) / 24][Stage 541:> (0 + 0) / 24][Stage 542:> (0 + 0) / 24]

[Stage 541:> (6 + 8) / 24][Stage 542:> (0 + 0) / 24][Stage 543:> (0 + 0) / 24][Stage 542:>(14 + 8) / 24][Stage 543:> (0 + 0) / 24][Stage 544:> (0 + 0) / 24]

[Stage 543:>(21 + 3) / 24][Stage 544:> (0 + 5) / 24][Stage 545:> (0 + 0) / 24][Stage 545:> (4 + 8) / 24][Stage 546:> (0 + 0) / 24][Stage 547:> (0 + 0) / 24]

[Stage 546:>(12 + 8) / 24][Stage 547:> (0 + 0) / 24][Stage 548:> (0 + 0) / 24][Stage 547:>(15 + 8) / 24][Stage 548:> (0 + 0) / 24][Stage 549:> (0 + 0) / 24]

[Stage 548:>(21 + 3) / 24][Stage 549:> (0 + 5) / 24][Stage 550:> (0 + 0) / 24][Stage 550:> (5 + 9) / 24][Stage 551:> (0 + 0) / 24][Stage 552:> (0 + 0) / 24]

[Stage 551:> (7 + 8) / 24][Stage 552:> (0 + 0) / 24][Stage 553:> (0 + 0) / 24][Stage 552:>(14 + 8) / 24][Stage 553:> (0 + 0) / 24][Stage 554:> (0 + 0) / 24]

[Stage 553:(14 + 10) / 24][Stage 554:> (0 + 1) / 24][Stage 555:> (0 + 0) / 24][Stage 554:>(22 + 2) / 24][Stage 555:> (0 + 6) / 24][Stage 556:> (0 + 0) / 24]

[Stage 556:> (6 + 8) / 24][Stage 557:> (0 + 0) / 24][Stage 558:> (0 + 0) / 24][Stage 557:>(12 + 8) / 24][Stage 558:> (0 + 0) / 24][Stage 559:> (0 + 0) / 24]

[Stage 558:>(16 + 8) / 24][Stage 559:> (0 + 0) / 24][Stage 560:> (0 + 0) / 24][Stage 559:>(19 + 5) / 24][Stage 560:> (0 + 3) / 24][Stage 561:> (0 + 0) / 24]

[Stage 561:> (1 + 8) / 24][Stage 562:> (0 + 0) / 24][Stage 563:> (0 + 0) / 24][Stage 562:> (6 + 8) / 24][Stage 563:> (0 + 0) / 24][Stage 564:> (0 + 0) / 24]

[Stage 563:> (8 + 8) / 24][Stage 564:> (0 + 0) / 24][Stage 565:> (0 + 0) / 24][Stage 564:>(14 + 8) / 24][Stage 565:> (0 + 0) / 24][Stage 566:> (0 + 0) / 24]

[Stage 565:>(18 + 6) / 24][Stage 566:> (0 + 2) / 24][Stage 567:> (0 + 0) / 24][Stage 567:> (0 + 8) / 24][Stage 568:> (0 + 0) / 24][Stage 569:> (0 + 0) / 24]

[Stage 568:> (2 + 8) / 24][Stage 569:> (0 + 0) / 24][Stage 570:> (0 + 0) / 24][Stage 569:> (8 + 8) / 24][Stage 570:> (0 + 0) / 24][Stage 571:> (0 + 0) / 24]

[Stage 570:>(11 + 8) / 24][Stage 571:> (0 + 0) / 24][Stage 572:> (0 + 0) / 24][Stage 571:>(16 + 8) / 24][Stage 572:> (0 + 0) / 24][Stage 573:> (0 + 0) / 24]

[Stage 573:> (0 + 8) / 24][Stage 574:> (0 + 0) / 24][Stage 575:> (0 + 0) / 24][Stage 574:> (5 + 8) / 24][Stage 575:> (0 + 0) / 24][Stage 576:> (0 + 0) / 24]

[Stage 575:>(10 + 8) / 24][Stage 576:> (0 + 0) / 24][Stage 577:> (0 + 0) / 24][Stage 576:>(14 + 8) / 24][Stage 577:> (0 + 0) / 24][Stage 578:> (0 + 0) / 24]

[Stage 577:>(18 + 6) / 24][Stage 578:> (0 + 2) / 24][Stage 579:> (0 + 0) / 24][Stage 579:> (1 + 8) / 24][Stage 580:> (0 + 0) / 24][Stage 581:> (0 + 0) / 24]

[Stage 580:> (5 + 8) / 24][Stage 581:> (0 + 0) / 24][Stage 582:> (0 + 0) / 24][Stage 581:>(10 + 8) / 24][Stage 582:> (0 + 0) / 24][Stage 583:> (0 + 0) / 24]

[Stage 582:>(16 + 8) / 24][Stage 583:> (0 + 0) / 24][Stage 584:> (0 + 0) / 24][Stage 583:>(19 + 5) / 24][Stage 584:> (0 + 3) / 24][Stage 585:> (0 + 0) / 24]

[Stage 585:> (0 + 8) / 24][Stage 586:> (0 + 0) / 24][Stage 587:> (0 + 0) / 24][Stage 586:> (7 + 8) / 24][Stage 587:> (0 + 0) / 24][Stage 588:> (0 + 0) / 24]

[Stage 587:>(12 + 8) / 24][Stage 588:> (0 + 0) / 24][Stage 589:> (0 + 0) / 24][Stage 588:>(19 + 5) / 24][Stage 589:> (0 + 3) / 24][Stage 590:> (0 + 0) / 24]

[Stage 590:> (0 + 8) / 24][Stage 591:> (0 + 0) / 24][Stage 592:> (0 + 0) / 24][Stage 591:> (8 + 8) / 24][Stage 592:> (0 + 0) / 24][Stage 593:> (0 + 0) / 24]

[Stage 592:>(13 + 8) / 24][Stage 593:> (0 + 0) / 24][Stage 594:> (0 + 0) / 24][Stage 593:>(14 + 8) / 24][Stage 594:> (0 + 0) / 24][Stage 595:> (0 + 0) / 24]

[Stage 594:>(17 + 7) / 24][Stage 595:> (0 + 1) / 24][Stage 596:> (0 + 0) / 24][Stage 595:>(23 + 1) / 24][Stage 596:> (0 + 7) / 24][Stage 597:> (0 + 0) / 24]

[Stage 597:> (6 + 8) / 24][Stage 598:> (0 + 0) / 24][Stage 599:> (0 + 0) / 24][Stage 598:>(11 + 8) / 24][Stage 599:> (0 + 0) / 24][Stage 600:> (0 + 0) / 24]

[Stage 599:>(15 + 8) / 24][Stage 600:> (0 + 0) / 24][Stage 601:> (0 + 0) / 24][Stage 600:>(21 + 3) / 24][Stage 601:> (0 + 5) / 24][Stage 602:> (0 + 0) / 24]

[Stage 602:> (1 + 8) / 24][Stage 603:> (0 + 0) / 24][Stage 604:> (0 + 0) / 24][Stage 603:> (4 + 8) / 24][Stage 604:> (0 + 0) / 24][Stage 605:> (0 + 0) / 24]

[Stage 604:>(11 + 8) / 24][Stage 605:> (0 + 0) / 24][Stage 606:> (0 + 0) / 24][Stage 605:>(18 + 6) / 24][Stage 606:> (0 + 2) / 24][Stage 607:> (0 + 0) / 24]

[Stage 606:>(20 + 4) / 24][Stage 607:> (0 + 4) / 24][Stage 608:> (0 + 0) / 24][Stage 608:> (3 + 8) / 24][Stage 609:> (0 + 0) / 24][Stage 610:> (0 + 0) / 24]

[Stage 609:> (9 + 8) / 24][Stage 610:> (0 + 0) / 24][Stage 611:> (0 + 0) / 24][Stage 610:>(15 + 8) / 24][Stage 611:> (0 + 0) / 24][Stage 612:> (0 + 0) / 24]

[Stage 611:>(19 + 5) / 24][Stage 612:> (0 + 3) / 24][Stage 613:> (0 + 0) / 24][Stage 613:> (1 + 8) / 24][Stage 614:> (0 + 0) / 24][Stage 615:> (0 + 0) / 24]

[Stage 614:> (7 + 8) / 24][Stage 615:> (0 + 0) / 24][Stage 616:> (0 + 0) / 24]



25/04/06 15:14:06 WARN DAGScheduler: Broadcasting large task binary with size 3.8 MiB


[Stage 817:=====>                                                (22 + 8) / 200]

















In [20]:
def upsert_data(table_name, temp_table_name, key_columns):
    try:
        # Conectar ao PostgreSQL
        connection = psycopg2.connect(**conn_params)
        cursor = connection.cursor()
        
        # Definir o fuso horário da sessão para UTC-3
        cursor.execute("SET TIME ZONE 'America/Sao_Paulo';")

        # Definir esquemas
        schema_fact = "paxpe"
        schema_stg = "paxpestg"
        
        # Obter todas as colunas da tabela
        cursor.execute(f"""
        SELECT column_name
        FROM information_schema.columns
        WHERE table_schema = '{schema_fact}' AND table_name = '{table_name}'
        """)
        all_columns = [row[0] for row in cursor.fetchall()]

        # Identificar colunas não chave e colunas de referência
        non_key_columns = [col for col in all_columns if col not in key_columns and col not in ['dt_ptcao', 'dthr_igtao']]
        reference_columns = ['dthr_igtao']

        # Construir a declaração SQL de atualização col1, col2
        key_columns_str = ', '.join(key_columns)
        update_set_non_key = ', '.join([
            f"{col} = EXCLUDED.{col}" 
            for col in non_key_columns
        ])
        
        # Atualizar apenas as colunas não-chave se houver uma mudança
        where_condition = ' OR '.join([
            f"target.{col} IS DISTINCT FROM EXCLUDED.{col}" 
            for col in non_key_columns
        ])
        
        # Para atualizar as colunas de referência se houver uma atualização nas colunas não-chave
        update_set_reference = ', '.join([
            f"{col} = EXCLUDED.{col}" 
            for col in reference_columns
        ])
        
        sql = f"""
        INSERT INTO {schema_fact}.{table_name} 
        (SELECT * FROM {schema_stg}.{temp_table_name})
        ON CONFLICT ({key_columns_str}) 
        DO UPDATE SET
        {update_set_non_key},
        {update_set_reference}
        WHERE EXISTS (
            SELECT 1
            FROM {schema_fact}.{table_name} AS target
            WHERE { ' AND '.join([f"target.{key} = EXCLUDED.{key}" for key in key_columns]) }
            AND ({where_condition})
        );
        """
        
        # Executar a declaração SQL de upsert
        cursor.execute(sql)
        connection.commit()
        print(f"Upsert de {schema_stg}.{temp_table_name} realizado para {schema_fact}.{table_name}.")
    
    except Exception as e:
        print(f"Error: {e}")
    
    finally:
        if connection:
            cursor.close()
            connection.close()

In [21]:
# colunas chave para cada tabela
key_columns_retorno_mensal = ["ticker", "data"]
key_columns_cadastro = ["ticker"]
key_columns_cap_mercado = ["ticker"]
key_columns_financas = ["ticker"]
key_columns_mercado = ["ticker"]
key_columns_dividendos = ["ticker","data_exdividendo"]
key_columns_valuation = ["ticker"]
key_columns_indices = ["indice","data"]

In [22]:
# realiza o upsert
upsert_data("retorno_mensal", "temp_retorno_mensal", key_columns_retorno_mensal)
upsert_data("cadastro", "temp_cadastro", key_columns_cadastro)
upsert_data("captacao_mercado", "temp_captacao_mercado", key_columns_cap_mercado)
upsert_data("financas", "temp_financas", key_columns_financas)
upsert_data("mercado", "temp_mercado", key_columns_mercado)
upsert_data("dividendos", "temp_dividendos", key_columns_dividendos)
upsert_data("valuation", "temp_valuation", key_columns_valuation)
upsert_data("indices", "temp_indices", key_columns_indices)

Upsert de paxpestg.temp_retorno_mensal realizado para paxpe.retorno_mensal.
Upsert de paxpestg.temp_cadastro realizado para paxpe.cadastro.
Upsert de paxpestg.temp_captacao_mercado realizado para paxpe.captacao_mercado.
Upsert de paxpestg.temp_financas realizado para paxpe.financas.
Upsert de paxpestg.temp_mercado realizado para paxpe.mercado.
Upsert de paxpestg.temp_dividendos realizado para paxpe.dividendos.
Upsert de paxpestg.temp_valuation realizado para paxpe.valuation.
Upsert de paxpestg.temp_indices realizado para paxpe.indices.
