# 📘 Resumo do Dataset — Indicadores Técnicos e Cambiais para Bitcoin

Este dataset foi construído com PySpark e contém:

- 📅 Dados diários da Bitcoin: preços (`open`, `close`, `high`, `low`), volume e recompensa de bloco (`block_reward`)
- 📈 Indicadores técnicos calculados com janelas móveis:
  - `SMA_20_btc`, `RSI_btc`, `EMA_12_btc`, `EMA_26_btc`, `MACD_btc`, `MACD_signal_btc`
  - `BB_upper_btc`, `BB_lower_btc` (Bandas de Bollinger)
- 📊 Dados económicos adicionais:
  - `CPI`, `S&P500` (`open`, `close`, `volume`)
- 💱 Dados cambiais:
  - `USDCHF`, `EURCHF`

⚙️ Este dataset está pronto para análise de séries temporais, estudos de impacto de halvings e modelos de previsão do preço da Bitcoin.


In [93]:
import os
import time
import pandas as pd
import numpy as np
#import yfinance as yf
import matplotlib.pyplot as plt
from pyspark.sql.window import Window
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DateType, IntegerType, DoubleType, ArrayType
from pyspark.sql.functions import pandas_udf, udf, col

In [94]:
os.environ['JAVA_HOME'] = "/nix/store/8drvwqmcxh2rvasgr7visxrgfjxnd3an-openjdk-11.0.19+7"
print('JAVA_HOME:', os.environ.get('JAVA_HOME'))

JAVA_HOME: /nix/store/8drvwqmcxh2rvasgr7visxrgfjxnd3an-openjdk-11.0.19+7


In [95]:
# Crear Spark Session y medir tiempo
start_time = time.time()

spark = SparkSession.builder \
.appName('btcproject') \
.config('spark.driver.memory', '8g') \
.config('spark.executor.memory', '8g') \
    .getOrCreate()

#.config('spark.local.dir', temp_path) \


#spark.sparkContext.setLogLevel('ERROR')

end_time = time.time()
print('Spark Version:', spark.version)
print(f'Tiempo total en crear SparkSession: {round(end_time - start_time, 2)} segundos')

Spark Version: 3.5.0
Tiempo total en crear SparkSession: 0.57 segundos


# Data Cleaning


## Daily Bitcoin
https://coincodex.com/crypto/bitcoin/historical-data/

In [96]:
df_btc = spark.read \
    .option("header", True) \
    .option("sep", ",") \
    .option("inferSchema", True) \
    .csv('../data/04/bitcoin.csv')

In [97]:
df_btc.show()
df_btc.printSchema()

+----------+----------+--------+--------+--------+--------+--------------------+--------------------+
|     Start|       End|    Open|    High|     Low|   Close|              Volume|          Market Cap|
+----------+----------+--------+--------+--------+--------+--------------------+--------------------+
|2025-04-23|2025-04-24|93642.39|94383.81| 92155.5|93565.02|7.992915306502438E10|1.856407285385874...|
|2025-04-22|2025-04-23|87556.88|93729.04|87306.04|93375.16|6.139408800816724...|1.780730400337749E12|
|2025-04-21|2025-04-22|85190.58|88358.48|85190.58|87525.71|4.610174416478397E10|1.733086941755306...|
|2025-04-20|2025-04-21|85059.77|85320.15|84018.53|85095.42|2.363965210136934E10|1.683010284160163...|
|2025-04-19|2025-04-20|84484.98|85510.04| 84368.6|85127.56|2.200324311390941E10|1.689659242621345E12|
|2025-04-18|2025-04-19|84892.79|85109.28|84344.03|84464.64|3.094039408019164E10|1.679968878910571...|
|2025-04-17|2025-04-18|84081.07|85426.94|83814.62|84931.22|4.330348236714634E10|1.

In [98]:
df_btc = df_btc.select(
    F.col("Start").alias("Date"),
    F.col("Open").alias("open_btc"),
    F.col("High").alias("high_btc"),
    F.col("Low").alias("low_btc"),
    F.col("Close").alias("close_btc"),
    F.col("Volume").alias("volume_btc")
)

In [99]:
df_btc.select(
    F.min("Date").alias("Fecha Minima"),
    F.max("Date").alias("Fecha Maxima")
).show()

+------------+------------+
|Fecha Minima|Fecha Maxima|
+------------+------------+
|  2010-07-17|  2025-04-23|
+------------+------------+



In [100]:
df_btc.show()

+----------+--------+--------+--------+---------+--------------------+
|      Date|open_btc|high_btc| low_btc|close_btc|          volume_btc|
+----------+--------+--------+--------+---------+--------------------+
|2025-04-23|93642.39|94383.81| 92155.5| 93565.02|7.992915306502438E10|
|2025-04-22|87556.88|93729.04|87306.04| 93375.16|6.139408800816724...|
|2025-04-21|85190.58|88358.48|85190.58| 87525.71|4.610174416478397E10|
|2025-04-20|85059.77|85320.15|84018.53| 85095.42|2.363965210136934E10|
|2025-04-19|84484.98|85510.04| 84368.6| 85127.56|2.200324311390941E10|
|2025-04-18|84892.79|85109.28|84344.03| 84464.64|3.094039408019164E10|
|2025-04-17|84081.07|85426.94|83814.62| 84931.22|4.330348236714634E10|
|2025-04-16|83652.14|85332.95| 83163.1| 84051.96|4.534236508933449...|
|2025-04-15|84521.75|86413.41|83632.78| 83677.65|5.796099945000349E10|
|2025-04-14| 83779.1| 85756.0| 83773.4| 84495.07|7.025542211409059E10|
|2025-04-13|85285.14|85780.08| 83171.6|  83693.1|6.052662212396167E10|
|2025-

In [101]:
df_nulls = df_btc.filter(
    F.col("open_btc").isNull() |
    F.col("high_btc").isNull() |
    F.col("low_btc").isNull() |
    F.col("close_btc").isNull() |
    F.col("volume_btc").isNull()
)
df_nulls.show()

+----+--------+--------+-------+---------+----------+
|Date|open_btc|high_btc|low_btc|close_btc|volume_btc|
+----+--------+--------+-------+---------+----------+
+----+--------+--------+-------+---------+----------+



In [102]:
df_btc = df_btc.withColumn(
    "block_reward",
    F.when(F.col("Date") <= F.lit("2012-11-28"), 50)      # 1er halving: 28/11/2012
     .when((F.col("Date") > F.lit("2012-11-28")) & 
           (F.col("Date") <= F.lit("2016-07-09")), 25)    # 2do halving: 09/07/2016
     .when((F.col("Date") > F.lit("2016-07-09")) & 
           (F.col("Date") <= F.lit("2020-05-11")), 12.5)  # 3er halving: 11/05/2020
     .when((F.col("Date") > F.lit("2020-05-11")) & 
           (F.col("Date") <= F.lit("2024-04-20")), 6.25)  # 4to halving: 20/04/2024
     .otherwise(3.125)                                    # Post 4to halving
)
df_btc.show()

+----------+--------+--------+--------+---------+--------------------+------------+
|      Date|open_btc|high_btc| low_btc|close_btc|          volume_btc|block_reward|
+----------+--------+--------+--------+---------+--------------------+------------+
|2025-04-23|93642.39|94383.81| 92155.5| 93565.02|7.992915306502438E10|       3.125|
|2025-04-22|87556.88|93729.04|87306.04| 93375.16|6.139408800816724...|       3.125|
|2025-04-21|85190.58|88358.48|85190.58| 87525.71|4.610174416478397E10|       3.125|
|2025-04-20|85059.77|85320.15|84018.53| 85095.42|2.363965210136934E10|       3.125|
|2025-04-19|84484.98|85510.04| 84368.6| 85127.56|2.200324311390941E10|       3.125|
|2025-04-18|84892.79|85109.28|84344.03| 84464.64|3.094039408019164E10|       3.125|
|2025-04-17|84081.07|85426.94|83814.62| 84931.22|4.330348236714634E10|       3.125|
|2025-04-16|83652.14|85332.95| 83163.1| 84051.96|4.534236508933449...|       3.125|
|2025-04-15|84521.75|86413.41|83632.78| 83677.65|5.796099945000349E10|      

## Daily SP500
https://stooq.com/q/d/?f=20100101&t=20250424&s=%5Espx&c=0

In [103]:
df_sp500 = spark.read \
    .option("header", True) \
    .option("sep", ",") \
    .option("inferSchema", True) \
    .csv('../data/04/sp500.csv')

In [104]:
df_sp500.show()
df_sp500.printSchema()

+----------+-------+-------+-------+-------+-------------+
|      Date|   Open|   High|    Low|  Close|       Volume|
+----------+-------+-------+-------+-------+-------------+
|2010-01-04|1116.56|1133.87|1116.56|1132.99|2.217444444E9|
|2010-01-05|1132.66|1136.63|1129.66|1136.52|     1.3839E9|
|2010-01-06|1135.71|1139.19|1133.95|1137.14|2.762588889E9|
|2010-01-07|1136.27|1142.46|1131.32|1141.69|2.928155556E9|
|2010-01-08|1140.52|1145.39|1136.22|1144.98|2.438661111E9|
|2010-01-11|1145.96|1149.74|1142.02|1146.98|2.364322222E9|
|2010-01-12|1143.81|1143.81|1131.77|1136.22|2.620088889E9|
|2010-01-13|1137.31| 1148.4|1133.18|1145.68|2.316866667E9|
|2010-01-14|1145.68|1150.41| 1143.8|1148.46|2.175111111E9|
|2010-01-15|1147.72|1147.77|1131.39|1136.03|2.643738889E9|
|2010-01-19|1136.03|1150.45|1135.77|1150.23|2.624905556E9|
|2010-01-20|1147.95|1147.95|1129.25|1138.04|2.672533333E9|
|2010-01-21|1138.68|1141.58|1114.84|1116.48|    3.81905E9|
|2010-01-22|1115.49|1115.49|1090.18|1091.76|    3.44925E

In [105]:
df_sp500.select(
    F.min("Date").alias("Fecha Minima"),
    F.max("Date").alias("Fecha Maxima")
).show()

+------------+------------+
|Fecha Minima|Fecha Maxima|
+------------+------------+
|  2010-01-04|  2025-04-23|
+------------+------------+



In [106]:
df_sp500 = df_sp500.select(
    F.col("Date"),
    F.col("Open").alias("open_sp500"),
    F.col("High").alias("high_sp500"),
    F.col("Low").alias("low_sp500"),
    F.col("Close").alias("close_sp500"),
    F.col("Volume").alias("volume_sp500")
)


In [107]:
df_sp500.show()

+----------+----------+----------+---------+-----------+-------------+
|      Date|open_sp500|high_sp500|low_sp500|close_sp500| volume_sp500|
+----------+----------+----------+---------+-----------+-------------+
|2010-01-04|   1116.56|   1133.87|  1116.56|    1132.99|2.217444444E9|
|2010-01-05|   1132.66|   1136.63|  1129.66|    1136.52|     1.3839E9|
|2010-01-06|   1135.71|   1139.19|  1133.95|    1137.14|2.762588889E9|
|2010-01-07|   1136.27|   1142.46|  1131.32|    1141.69|2.928155556E9|
|2010-01-08|   1140.52|   1145.39|  1136.22|    1144.98|2.438661111E9|
|2010-01-11|   1145.96|   1149.74|  1142.02|    1146.98|2.364322222E9|
|2010-01-12|   1143.81|   1143.81|  1131.77|    1136.22|2.620088889E9|
|2010-01-13|   1137.31|    1148.4|  1133.18|    1145.68|2.316866667E9|
|2010-01-14|   1145.68|   1150.41|   1143.8|    1148.46|2.175111111E9|
|2010-01-15|   1147.72|   1147.77|  1131.39|    1136.03|2.643738889E9|
|2010-01-19|   1136.03|   1150.45|  1135.77|    1150.23|2.624905556E9|
|2010-

In [108]:
df_nulls = df_sp500.filter(
    F.col("open_sp500").isNull() |
    F.col("high_sp500").isNull() |
    F.col("low_sp500").isNull() |
    F.col("close_sp500").isNull() |
    F.col("volume_sp500").isNull()
)
df_nulls.show()

+----+----------+----------+---------+-----------+------------+
|Date|open_sp500|high_sp500|low_sp500|close_sp500|volume_sp500|
+----+----------+----------+---------+-----------+------------+
+----+----------+----------+---------+-----------+------------+



## CPI 
## 100 = 2010-01-01
https://fred.stlouisfed.org/series/CPIAUCSL#

In [109]:
df_cpi = spark.read \
    .option("header", True) \
    .option("sep", ",") \
    .option("inferSchema", True) \
    .csv('../data/04/CPIAUCSL_NBD20100101.csv')

In [110]:
df_cpi.show()
df_cpi.printSchema()

+----------------+--------------------+
|observation_date|CPIAUCSL_NBD20100101|
+----------------+--------------------+
|      2010-01-01|               100.0|
|      2010-02-01|            99.90482|
|      2010-03-01|            99.93793|
|      2010-04-01|            99.96092|
|      2010-05-01|            99.90896|
|      2010-06-01|            99.86712|
|      2010-07-01|            100.0538|
|      2010-08-01|           100.20001|
|      2010-09-01|           100.36186|
|      2010-10-01|            100.7113|
|      2010-11-01|           100.96649|
|      2010-12-01|           101.37203|
|      2011-01-01|           101.70078|
|      2011-02-01|            102.0277|
|      2011-03-01|           102.55554|
|      2011-04-01|           103.03695|
|      2011-05-01|           103.36478|
|      2011-06-01|           103.36478|
|      2011-07-01|            103.6356|
|      2011-08-01|           103.96252|
+----------------+--------------------+
only showing top 20 rows

root
 |-- obse

In [111]:
df_cpi = df_cpi.select(
    F.col("observation_date").alias("Date"),
    F.col("CPIAUCSL_NBD20100101").alias("cpi")
)

In [112]:
df_cpi.show()

+----------+---------+
|      Date|      cpi|
+----------+---------+
|2010-01-01|    100.0|
|2010-02-01| 99.90482|
|2010-03-01| 99.93793|
|2010-04-01| 99.96092|
|2010-05-01| 99.90896|
|2010-06-01| 99.86712|
|2010-07-01| 100.0538|
|2010-08-01|100.20001|
|2010-09-01|100.36186|
|2010-10-01| 100.7113|
|2010-11-01|100.96649|
|2010-12-01|101.37203|
|2011-01-01|101.70078|
|2011-02-01| 102.0277|
|2011-03-01|102.55554|
|2011-04-01|103.03695|
|2011-05-01|103.36478|
|2011-06-01|103.36478|
|2011-07-01| 103.6356|
|2011-08-01|103.96252|
+----------+---------+
only showing top 20 rows



## Join

In [113]:
# 1. Crear columnas de año y mes en ambos datasets
df_btc_month = df_btc.withColumn("year", F.year("Date")) \
                    .withColumn("month", F.month("Date"))

df_btc_month.show(5)
df_cpi_month = df_cpi.withColumn("year", F.year("Date")) \
                    .withColumn("month", F.month("Date"))

# 2. Hacer join usando las columnas de año y mes
df_combined = df_btc_month.join(
    df_cpi_month.drop("Date"),
    ["year", "month"],
    "left"
).drop("year", "month")

# 3. Ordenar por fecha
df_combined = df_combined.orderBy(F.col("Date").desc())

df_combined.show(30)


+----------+--------+--------+--------+---------+--------------------+------------+----+-----+
|      Date|open_btc|high_btc| low_btc|close_btc|          volume_btc|block_reward|year|month|
+----------+--------+--------+--------+---------+--------------------+------------+----+-----+
|2025-04-23|93642.39|94383.81| 92155.5| 93565.02|7.992915306502438E10|       3.125|2025|    4|
|2025-04-22|87556.88|93729.04|87306.04| 93375.16|6.139408800816724...|       3.125|2025|    4|
|2025-04-21|85190.58|88358.48|85190.58| 87525.71|4.610174416478397E10|       3.125|2025|    4|
|2025-04-20|85059.77|85320.15|84018.53| 85095.42|2.363965210136934E10|       3.125|2025|    4|
|2025-04-19|84484.98|85510.04| 84368.6| 85127.56|2.200324311390941E10|       3.125|2025|    4|
+----------+--------+--------+--------+---------+--------------------+------------+----+-----+
only showing top 5 rows

+----------+--------+--------+--------+---------+--------------------+------------+---------+
|      Date|open_btc|high_

                                                                                

In [114]:
df_combined = df_combined.join(
    df_sp500,
    on = "Date",
    how = "left"
)

df_combined = df_combined.orderBy(F.col("Date").desc())

df_combined.orderBy(F.col("Date").desc()).toPandas().head(30)

                                                                                

Unnamed: 0,Date,open_btc,high_btc,low_btc,close_btc,volume_btc,block_reward,cpi,open_sp500,high_sp500,low_sp500,close_sp500,volume_sp500
0,2025-04-23,93642.39,94383.81,92155.5,93565.02,79929150000.0,3.125,,5395.92,5469.69,5356.17,5375.86,3552094000.0
1,2025-04-22,87556.88,93729.04,87306.04,93375.16,61394090000.0,3.125,,5207.67,5309.61,5207.67,5287.76,3110056000.0
2,2025-04-21,85190.58,88358.48,85190.58,87525.71,46101740000.0,3.125,,5232.94,5232.94,5101.63,5158.2,2968349000.0
3,2025-04-20,85059.77,85320.15,84018.53,85095.42,23639650000.0,3.125,,,,,,
4,2025-04-19,84484.98,85510.04,84368.6,85127.56,22003240000.0,3.125,,,,,,
5,2025-04-18,84892.79,85109.28,84344.03,84464.64,30940390000.0,3.125,,,,,,
6,2025-04-17,84081.07,85426.94,83814.62,84931.22,43303480000.0,3.125,,5305.45,5328.31,5255.58,5282.7,3153700000.0
7,2025-04-16,83652.14,85332.95,83163.1,84051.96,45342370000.0,3.125,,5335.75,5367.24,5220.79,5275.7,3352866000.0
8,2025-04-15,84521.75,86413.41,83632.78,83677.65,57961000000.0,3.125,,5411.99,5450.41,5386.44,5396.63,2856819000.0
9,2025-04-14,83779.1,85756.0,83773.4,84495.07,70255420000.0,3.125,,5441.96,5459.46,5358.02,5405.97,3287278000.0


In [115]:
window_ffill = Window.orderBy("Date").rowsBetween(Window.unboundedPreceding, 0)

for colname in ["open_sp500", "high_sp500", "low_sp500", "close_sp500", "volume_sp500"]:
    df_combined = df_combined.withColumn(
        colname,
        F.last(colname, ignorenulls=True).over(window_ffill)
    )

In [116]:
df_combined.orderBy(F.col("Date").desc()).toPandas().head(30)

25/04/29 11:36:48 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:36:48 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:36:49 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:36:49 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:36:49 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:36:49 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 1

Unnamed: 0,Date,open_btc,high_btc,low_btc,close_btc,volume_btc,block_reward,cpi,open_sp500,high_sp500,low_sp500,close_sp500,volume_sp500
0,2025-04-23,93642.39,94383.81,92155.5,93565.02,79929150000.0,3.125,,5395.92,5469.69,5356.17,5375.86,3552094000.0
1,2025-04-22,87556.88,93729.04,87306.04,93375.16,61394090000.0,3.125,,5207.67,5309.61,5207.67,5287.76,3110056000.0
2,2025-04-21,85190.58,88358.48,85190.58,87525.71,46101740000.0,3.125,,5232.94,5232.94,5101.63,5158.2,2968349000.0
3,2025-04-20,85059.77,85320.15,84018.53,85095.42,23639650000.0,3.125,,5305.45,5328.31,5255.58,5282.7,3153700000.0
4,2025-04-19,84484.98,85510.04,84368.6,85127.56,22003240000.0,3.125,,5305.45,5328.31,5255.58,5282.7,3153700000.0
5,2025-04-18,84892.79,85109.28,84344.03,84464.64,30940390000.0,3.125,,5305.45,5328.31,5255.58,5282.7,3153700000.0
6,2025-04-17,84081.07,85426.94,83814.62,84931.22,43303480000.0,3.125,,5305.45,5328.31,5255.58,5282.7,3153700000.0
7,2025-04-16,83652.14,85332.95,83163.1,84051.96,45342370000.0,3.125,,5335.75,5367.24,5220.79,5275.7,3352866000.0
8,2025-04-15,84521.75,86413.41,83632.78,83677.65,57961000000.0,3.125,,5411.99,5450.41,5386.44,5396.63,2856819000.0
9,2025-04-14,83779.1,85756.0,83773.4,84495.07,70255420000.0,3.125,,5441.96,5459.46,5358.02,5405.97,3287278000.0


In [117]:
df_nulls = df_combined.filter(
    F.col("cpi").isNull() )
df_nulls.show(25)

25/04/29 11:36:52 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:36:53 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:36:53 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:36:53 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:36:53 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:36:53 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 1

+----------+--------+--------+--------+---------+--------------------+------------+----+----------+----------+---------+-----------+-------------+
|      Date|open_btc|high_btc| low_btc|close_btc|          volume_btc|block_reward| cpi|open_sp500|high_sp500|low_sp500|close_sp500| volume_sp500|
+----------+--------+--------+--------+---------+--------------------+------------+----+----------+----------+---------+-----------+-------------+
|2025-04-01|82612.64|85417.94|82470.01| 85216.81|4.768397020478049E10|       3.125|NULL|   5597.53|   5650.57|  5558.52|    5633.07|2.806321225E9|
|2025-04-02|85170.68|87898.01| 82487.4| 82548.31|5.237611157841463...|       3.125|NULL|   5580.76|   5695.31|  5571.48|    5670.97|2.785811866E9|
|2025-04-03|82259.03| 83781.7|81307.75| 83199.95|7.766843260005576E10|       3.125|NULL|   5492.74|   5499.53|  5390.83|    5396.52|5.005321892E9|
|2025-04-04|83259.08|84676.27|81767.53| 83879.86|6.263226227224738E10|       3.125|NULL|   5292.14|   5292.14|   5069.

                                                                                

In [118]:
# Definir ventana para forward fill
window_ffill = Window.orderBy("Date").rowsBetween(Window.unboundedPreceding, 0)

# Rellenar valores NULL en la columna 'cpi' con el último valor no nulo anterior
df_combined = df_combined.withColumn(
    "cpi",
    F.last("cpi", ignorenulls=True).over(window_ffill)
)

# (Opcional) Verifica el resultado
df_combined.select("Date", "cpi").orderBy(F.col("Date").desc()).show(25)

25/04/29 11:36:57 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:36:57 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:36:57 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:36:57 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:36:57 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:36:57 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 1

+----------+---------+
|      Date|      cpi|
+----------+---------+
|2025-04-23|146.95753|
|2025-04-22|146.95753|
|2025-04-21|146.95753|
|2025-04-20|146.95753|
|2025-04-19|146.95753|
|2025-04-18|146.95753|
|2025-04-17|146.95753|
|2025-04-16|146.95753|
|2025-04-15|146.95753|
|2025-04-14|146.95753|
|2025-04-13|146.95753|
|2025-04-12|146.95753|
|2025-04-11|146.95753|
|2025-04-10|146.95753|
|2025-04-09|146.95753|
|2025-04-08|146.95753|
|2025-04-07|146.95753|
|2025-04-06|146.95753|
|2025-04-05|146.95753|
|2025-04-04|146.95753|
|2025-04-03|146.95753|
|2025-04-02|146.95753|
|2025-04-01|146.95753|
|2025-03-31|146.95753|
|2025-03-30|146.95753|
+----------+---------+
only showing top 25 rows



In [119]:
df_combined.orderBy(F.col("Date").desc()).toPandas().head(30)

25/04/29 11:37:00 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:00 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:01 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:01 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:01 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:01 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 1

Unnamed: 0,Date,open_btc,high_btc,low_btc,close_btc,volume_btc,block_reward,cpi,open_sp500,high_sp500,low_sp500,close_sp500,volume_sp500
0,2025-04-23,93642.39,94383.81,92155.5,93565.02,79929150000.0,3.125,146.95753,5395.92,5469.69,5356.17,5375.86,3552094000.0
1,2025-04-22,87556.88,93729.04,87306.04,93375.16,61394090000.0,3.125,146.95753,5207.67,5309.61,5207.67,5287.76,3110056000.0
2,2025-04-21,85190.58,88358.48,85190.58,87525.71,46101740000.0,3.125,146.95753,5232.94,5232.94,5101.63,5158.2,2968349000.0
3,2025-04-20,85059.77,85320.15,84018.53,85095.42,23639650000.0,3.125,146.95753,5305.45,5328.31,5255.58,5282.7,3153700000.0
4,2025-04-19,84484.98,85510.04,84368.6,85127.56,22003240000.0,3.125,146.95753,5305.45,5328.31,5255.58,5282.7,3153700000.0
5,2025-04-18,84892.79,85109.28,84344.03,84464.64,30940390000.0,3.125,146.95753,5305.45,5328.31,5255.58,5282.7,3153700000.0
6,2025-04-17,84081.07,85426.94,83814.62,84931.22,43303480000.0,3.125,146.95753,5305.45,5328.31,5255.58,5282.7,3153700000.0
7,2025-04-16,83652.14,85332.95,83163.1,84051.96,45342370000.0,3.125,146.95753,5335.75,5367.24,5220.79,5275.7,3352866000.0
8,2025-04-15,84521.75,86413.41,83632.78,83677.65,57961000000.0,3.125,146.95753,5411.99,5450.41,5386.44,5396.63,2856819000.0
9,2025-04-14,83779.1,85756.0,83773.4,84495.07,70255420000.0,3.125,146.95753,5441.96,5459.46,5358.02,5405.97,3287278000.0


In [120]:
df_combined.filter(
    F.col("open_sp500").isNull()
).show()

25/04/29 11:37:05 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:05 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:05 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:05 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:05 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:05 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 1

+----------+--------+--------+-------+---------+----------+------------+--------+----------+----------+---------+-----------+------------+
|      Date|open_btc|high_btc|low_btc|close_btc|volume_btc|block_reward|     cpi|open_sp500|high_sp500|low_sp500|close_sp500|volume_sp500|
+----------+--------+--------+-------+---------+----------+------------+--------+----------+----------+---------+-----------+------------+
|2010-07-17|    0.05|    0.05|   0.05|     0.05|       0.0|        50.0|100.0538|      NULL|      NULL|     NULL|       NULL|        NULL|
|2010-07-18|  0.0858|  0.0858| 0.0858|   0.0858|       0.0|        50.0|100.0538|      NULL|      NULL|     NULL|       NULL|        NULL|
+----------+--------+--------+-------+---------+----------+------------+--------+----------+----------+---------+-----------+------------+



                                                                                

In [121]:
df_combined = df_combined.filter(F.col("open_sp500").isNotNull())

In [122]:
df_combined.orderBy(F.col("Date").asc()).show()

25/04/29 11:37:09 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:09 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:09 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:09 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:09 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:09 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 1

+----------+--------+--------+-------+---------+----------+------------+---------+----------+----------+---------+-----------+-------------+
|      Date|open_btc|high_btc|low_btc|close_btc|volume_btc|block_reward|      cpi|open_sp500|high_sp500|low_sp500|close_sp500| volume_sp500|
+----------+--------+--------+-------+---------+----------+------------+---------+----------+----------+---------+-----------+-------------+
|2010-07-19|  0.0808|  0.0808| 0.0808|   0.0808|       0.0|        50.0| 100.0538|   1066.85|    1074.7|  1061.11|    1071.25|2.271944444E9|
|2010-07-20|  0.0747|  0.0747| 0.0747|   0.0747|       0.0|        50.0| 100.0538|   1064.53|   1083.94|  1056.88|    1083.48|2.618488889E9|
|2010-07-21|  0.0792|  0.0792| 0.0792|   0.0792|       0.0|        50.0| 100.0538|   1086.67|   1088.96|  1065.25|    1069.59|2.637322222E9|
|2010-07-22|  0.0505|  0.0505| 0.0505|   0.0505|       0.0|        50.0| 100.0538|   1072.14|    1097.5|  1072.14|    1093.67|2.681611111E9|
|2010-07-23| 

In [123]:
df_combined.select(
    F.min("Date").alias("Fecha Minima"),
    F.max("Date").alias("Fecha Maxima")
).show()

25/04/29 11:37:12 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:12 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:13 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:13 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:13 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:13 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 1

+------------+------------+
|Fecha Minima|Fecha Maxima|
+------------+------------+
|  2010-07-19|  2025-04-23|
+------------+------------+



## 📊 Indicadores Técnicos para Bitcoin (PySpark)

Este conjunto de indicadores técnicos ajuda a capturar tendências, momentum e volatilidade do preço da Bitcoin com base na coluna `close_btc`.

---

### ✅ 1. SMA (Simple Moving Average)
A média simples de `n` dias. Ajuda a identificar a tendência geral do mercado.

### ✅ 2. EMA (Exponential Moving Average)
Semelhante à SMA, mas dá mais peso aos preços recentes. Usamos uma função UDF baseada em Pandas para o cálculo real.

### ✅ 3. MACD (Moving Average Convergence Divergence)
Calculado como `EMA_12 - EMA_26`. A "signal line" é a média móvel de 9 dias do MACD. Mostra mudanças de direção na tendência.

### ✅ 4. RSI (Relative Strength Index)
Oscilador que mede a força do movimento recente. RSI > 70 → sobrecompra; < 30 → sobrevenda.

### ✅ 5. Bollinger Bands
Faixas de volatilidade baseadas na `SMA_20` ± 2 desvios padrão. Quando o preço toca as bandas extremas, pode indicar inversão de tendência.

---

In [124]:
df_combined.write.option("header", True).mode("overwrite").csv("../data/04/final_btc_ml_dataset.csv")

25/04/29 11:37:16 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:16 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:16 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:16 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:16 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:16 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 1

In [125]:
df_combined = spark.read.option("header", True).option("inferSchema", True).csv("../data/04/final_btc_ml_dataset.csv")
df_combined.printSchema()

root
 |-- Date: date (nullable = true)
 |-- open_btc: double (nullable = true)
 |-- high_btc: double (nullable = true)
 |-- low_btc: double (nullable = true)
 |-- close_btc: double (nullable = true)
 |-- volume_btc: double (nullable = true)
 |-- block_reward: double (nullable = true)
 |-- cpi: double (nullable = true)
 |-- open_sp500: double (nullable = true)
 |-- high_sp500: double (nullable = true)
 |-- low_sp500: double (nullable = true)
 |-- close_sp500: double (nullable = true)
 |-- volume_sp500: double (nullable = true)



In [126]:
# ==================== 1. SMA_20 - Média móvel simples de 20 dias ====================
w = Window.orderBy("Date")
window_20 = w.rowsBetween(-19, 0)

df_combined = df_combined.withColumn("SMA_20", F.avg("close_btc").over(window_20))

# ==================== 2. Bollinger Bands - com base na SMA_20 ====================
df_combined = df_combined.withColumn("BB_std", F.stddev("close_btc").over(window_20))
df_combined = df_combined.withColumn("BB_upper", F.col("SMA_20") + 2 * F.col("BB_std"))
df_combined = df_combined.withColumn("BB_lower", F.col("SMA_20") - 2 * F.col("BB_std"))

# ==================== 3. RSI - Índice de força relativa (14 dias) ====================
window_14 = w.rowsBetween(-13, 0)
delta = F.col("close_btc") - F.lag("close_btc", 1).over(w)
gain = F.when(delta > 0, delta).otherwise(0)
loss = F.when(delta < 0, -delta).otherwise(0)

df_combined = df_combined.withColumn("gain", gain).withColumn("loss", loss)
df_combined = df_combined.withColumn("avg_gain", F.avg("gain").over(window_14))
df_combined = df_combined.withColumn("avg_loss", F.avg("loss").over(window_14))
df_combined = df_combined.withColumn("RS", F.col("avg_gain") / F.col("avg_loss"))
df_combined = df_combined.withColumn("RSI", 100 - (100 / (1 + F.col("RS"))))

# ==================== 4. EMA_12 - Aproximação com pesos exponenciais ====================
def add_ema_column(df, col_name, new_col, n):
    alpha = 2 / (n + 1)
    weights = [((1 - alpha) ** i) for i in reversed(range(n))]
    norm_weights = [w / sum(weights) for w in weights]
    for i, peso in enumerate(norm_weights):
        df = df.withColumn(f"{new_col}_lag_{i}", F.lag(col_name, i).over(w) * F.lit(peso))
    df = df.withColumn(new_col, sum(F.col(f"{new_col}_lag_{i}") for i in range(n)))
    for i in range(n):
        df = df.drop(f"{new_col}_lag_{i}")
    return df

df_combined = add_ema_column(df_combined, "close_btc", "EMA_12", 12)
df_combined = add_ema_column(df_combined, "close_btc", "EMA_26", 26)

# ==================== 5. MACD - Diferença entre EMAs ====================
df_combined = df_combined.withColumn("MACD", F.col("EMA_12") - F.col("EMA_26"))

# ==================== 6. MACD_signal - Média móvel de 9 dias do MACD ====================
window_9 = w.rowsBetween(-8, 0)
df_combined = df_combined.withColumn("MACD_signal", F.avg("MACD").over(window_9))

# ==================== Mostrar as variáveis calculadas ====================
df_combined.select(
    "Date", "close_btc",
    "SMA_20", "BB_upper", "BB_lower",
    "RSI", "EMA_12", "EMA_26", "MACD", "MACD_signal"
).orderBy("Date") \
 .limit(40) \
 .coalesce(1) \
 .show()



25/04/29 11:37:40 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:40 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:40 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:40 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:40 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:40 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 1

+----------+---------+--------------------+-------------------+--------------------+------------------+--------------------+------+----+-----------+
|      Date|close_btc|              SMA_20|           BB_upper|            BB_lower|               RSI|              EMA_12|EMA_26|MACD|MACD_signal|
+----------+---------+--------------------+-------------------+--------------------+------------------+--------------------+------+----+-----------+
|2010-07-19|   0.0808|              0.0808|               NULL|                NULL|              NULL|                NULL|  NULL|NULL|       NULL|
|2010-07-20|   0.0747|             0.07775|0.08637670273047587| 0.06912329726952413|               0.0|                NULL|  NULL|NULL|       NULL|
|2010-07-21|   0.0792| 0.07823333333333334|0.08455894265839707|  0.0719077240082696| 42.45283018867929|                NULL|  NULL|NULL|       NULL|
|2010-07-22|   0.0505|              0.0713|0.09951016365307605| 0.04308983634692395|11.450381679389324|   

                                                                                

# 📊 O que significam os valores calculados?

A tabela mostra vários **indicadores técnicos** baseados no preço da Bitcoin. Aqui explicamos **de forma simples** o que cada um quer dizer, e como ler os valores (como por exemplo um `RSI = 42`).

---

## ✅ `SMA_20` — Média Móvel Simples (20 dias)

- **É a média dos últimos 20 preços de fecho** da Bitcoin.
- Ajuda a perceber a **tendência geral**:
  - Se o preço estiver sempre acima da média → tendência de subida
  - Se estiver abaixo → tendência de descida

💡 *Exemplo:*  
Se `SMA_20 = 0.07`, e o preço atual for `0.08`, isso pode indicar uma subida recente.

---

## ✅ `BB_upper` e `BB_lower` — Bandas de Bollinger

- São dois limites com base na média e na **volatilidade** (variação dos preços).
- Mostram se o preço está a sair fora do "normal":
  - `BB_upper` → limite superior (pode indicar sobrecompra)
  - `BB_lower` → limite inferior (pode indicar sobrevenda)

💡 *Exemplo:*  
Se o preço ultrapassar `BB_upper`, pode estar "alto demais" e vir a descer.

---

## ✅ `RSI` — Índice de Força Relativa

- Mede **a força das subidas e descidas recentes**, entre 0 e 100.
- Interpretação típica:
  - **RSI > 70** → Bitcoin pode estar "cara demais" (sobrecompra)
  - **RSI < 30** → Pode estar "barata demais" (sobrevenda)
  - **RSI entre 40 e 60** → Sem tendência forte

💡 *Exemplo:*  
Se o `RSI = 42`, como no início do ficheiro, quer dizer:
> A força das subidas e descidas está **moderada**. O mercado não está nem muito quente nem muito frio — está estável.

---

## ✅ `EMA_12` e `EMA_26` — Médias Móveis Exponenciais

- São parecidas com a `SMA`, mas **dão mais peso aos preços mais recentes**.
- Usadas para reações mais rápidas às mudanças do mercado:
  - `EMA_12` → segue tendência de curto prazo
  - `EMA_26` → segue tendência mais lenta (longo prazo)

💡 *Se `EMA_12 > EMA_26`, tendência de subida*  
💡 *Se `EMA_12 < EMA_26`, tendência de descida*

---

## ✅ `MACD` e `MACD_signal`

- `MACD` = diferença entre `EMA_12` e `EMA_26`
- `MACD_signal` = média do MACD dos últimos 9 dias
- Usados para **sinais de compra ou venda**:
  - Quando `MACD` cruza **para cima** do `signal` → possível compra
  - Quando cruza **para baixo** → possível venda

---

## ❓ E os `NULL`?

Significa apenas que **não havia dados suficientes antes** para calcular a média. Por exemplo:
- A `EMA_26` só aparece a partir da **linha 26**
- O `MACD_signal` só aparece a partir da **linha 35**

---

## ✅ Conclusão

Estes indicadores ajudam a:
- Ver se o preço está a subir ou a descer
- Saber se está "caro demais" ou "barato demais"
- Detetar possíveis pontos de compra ou venda

Tudo isto foi calculado com **PySpark puro** a partir dos preços históricos da Bitcoin.


In [127]:
df_combined.printSchema()

root
 |-- Date: date (nullable = true)
 |-- open_btc: double (nullable = true)
 |-- high_btc: double (nullable = true)
 |-- low_btc: double (nullable = true)
 |-- close_btc: double (nullable = true)
 |-- volume_btc: double (nullable = true)
 |-- block_reward: double (nullable = true)
 |-- cpi: double (nullable = true)
 |-- open_sp500: double (nullable = true)
 |-- high_sp500: double (nullable = true)
 |-- low_sp500: double (nullable = true)
 |-- close_sp500: double (nullable = true)
 |-- volume_sp500: double (nullable = true)
 |-- SMA_20: double (nullable = true)
 |-- BB_std: double (nullable = true)
 |-- BB_upper: double (nullable = true)
 |-- BB_lower: double (nullable = true)
 |-- gain: double (nullable = true)
 |-- loss: double (nullable = true)
 |-- avg_gain: double (nullable = true)
 |-- avg_loss: double (nullable = true)
 |-- RS: double (nullable = true)
 |-- RSI: double (nullable = true)
 |-- EMA_12: double (nullable = true)
 |-- EMA_26: double (nullable = true)
 |-- MACD: doub

In [128]:
df_combined = df_combined.drop("gain", "loss", "avg_gain", "avg_loss", "RS")

df_combined = df_combined \
    .withColumnRenamed("SMA_20", "SMA_20_btc") \
    .withColumnRenamed("BB_std", "BB_std_btc") \
    .withColumnRenamed("BB_upper", "BB_upper_btc") \
    .withColumnRenamed("BB_lower", "BB_lower_btc") \
    .withColumnRenamed("RSI", "RSI_btc") \
    .withColumnRenamed("EMA_12", "EMA_12_btc") \
    .withColumnRenamed("EMA_26", "EMA_26_btc") \
    .withColumnRenamed("MACD", "MACD_btc") \
    .withColumnRenamed("MACD_signal", "MACD_signal_btc")


In [129]:
df_combined.select("Date", "close_btc", "SMA_20_btc", "BB_std_btc", "BB_upper_btc", "BB_lower_btc", 
                   "RSI_btc", "EMA_12_btc", "EMA_26_btc", "MACD_btc", "MACD_signal_btc").orderBy(F.col("Date").desc()).toPandas().head(10)

25/04/29 11:37:50 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:50 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:50 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:50 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:50 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:37:50 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 1

Unnamed: 0,Date,close_btc,SMA_20_btc,BB_std_btc,BB_upper_btc,BB_lower_btc,RSI_btc,EMA_12_btc,EMA_26_btc,MACD_btc,MACD_signal_btc
0,2025-04-23,93565.02,84122.6965,4178.050327,92478.797155,75766.595845,74.014549,85155.263748,82976.819485,2178.444262,-1043.961254
1,2025-04-22,93375.16,83604.443,3539.166126,90682.775251,76526.110749,79.466042,84663.106156,82978.502355,1684.603801,-1546.72868
2,2025-04-21,87525.71,83063.1005,2692.833692,88448.767885,77677.433115,65.972927,83555.9632,83222.585081,333.378119,-2012.236797
3,2025-04-20,85095.42,82947.6555,2536.395523,88020.446545,77874.864455,63.909006,83294.453008,83491.629625,-197.176617,-2252.612708
4,2025-04-19,85127.56,82819.949,2486.371318,87792.691636,77847.206364,52.608004,82021.944677,83814.941612,-1792.996936,-2436.545597
5,2025-04-18,84464.64,82680.862,2427.600398,87536.062796,77825.661204,50.999097,81456.710322,84103.903903,-2647.193581,-2432.260908
6,2025-04-17,84931.22,82590.5855,2391.071577,87372.728654,77808.442346,52.936471,80825.254398,84240.723994,-3415.469596,-2328.123712
7,2025-04-16,84051.96,82565.425,2367.678636,87300.782272,77830.067728,52.570243,81217.059861,84194.817309,-2977.757448,-2100.373959
8,2025-04-15,83677.65,82724.515,2570.987043,87866.489087,77582.540913,47.560399,81622.599529,84184.08282,-2561.483292,-1798.113555
9,2025-04-14,84495.07,82887.903,2733.448648,88354.800295,77421.005705,52.924521,81853.827648,84200.290222,-2346.462575,-1472.138772


# 📊 Resumo Técnico — Bitcoin (10 dias finais)

Este é um resumo dos indicadores técnicos para os últimos 10 dias da Bitcoin, com foco na tendência, força de mercado e possíveis sinais de compra/venda.

---

## 📈 Tendência Geral
- O **preço está acima da média móvel de 20 dias (`SMA_20_btc`)** em todos os dias → indica **tendência de alta**.
- O **preço aproxima-se das bandas superiores (`BB_upper_btc`)**, sugerindo possível **zona de sobrecompra**.

---

## 💪 Força do Movimento (RSI)
- `RSI_btc` varia entre **47 e 66**
  - Acima de 50 → tendência **positiva**
  - Mais próximo de 70 (ex: `65.97` no dia 21/04) → **força compradora alta**

---

## 📉 MACD (Tendência e impulso)
- `MACD_btc` sobe de **-3415 (17/04)** para **+333 (21/04)**  
  → clara **inversão de tendência para alta**
- `MACD_signal_btc` continua negativo, mas o `MACD` ultrapassa-o no dia 21 → **sinal de compra**

---

## 🧠 Interpretação
| Data       | RSI   | MACD     | Sinal Técnico        |
|------------|-------|----------|-----------------------|
| 17–19 Abr  | 52-53 | -3400 a -1793 | Tendência fraca, mas a recuperar |
| 20 Abr     | 63.91 | -197     | Reversão a acontecer  |
| 21 Abr     | 65.97 | +333     | Tendência de subida forte ✔️ |

---

## ✅ Conclusão
Os dados mostram uma **recuperação clara da Bitcoin** com sinais técnicos positivos, principalmente a partir de 19/04. O cruzamento do MACD com o sinal em 21/04 reforça a ideia de entrada numa nova fase de alta.


# Inclusão EUR e Dolar vs CHF
Data: 
https://www.wsj.com/market-data/quotes/fx/EURCHF/historical-prices
https://www.wsj.com/market-data/quotes/fx/USDCHF/historical-prices

In [130]:
df_usdchf = spark.read.option("header", True).option("inferSchema", True).csv("../data/04/USD_CHF.csv")

df_usdchf.select(
    F.min("Date").alias("min"),
    F.max("Date").alias("max")) \
    .show()

df_usdchf.printSchema()

df_usdchf = df_usdchf.withColumn("Date", F.to_date("Date", "MM/dd/yy"))
df_usdchf = df_usdchf.select(
    F.col("Date"),
    F.col(" Close").alias("USDCHF")
)
df_usdchf.show(1)

+--------+--------+
|     min|     max|
+--------+--------+
|01/01/10|12/31/24|
+--------+--------+

root
 |-- Date: string (nullable = true)
 |--  Open: double (nullable = true)
 |--  High: double (nullable = true)
 |--  Low: double (nullable = true)
 |--  Close: double (nullable = true)

+----------+------+
|      Date|USDCHF|
+----------+------+
|2025-04-23|0.8309|
+----------+------+
only showing top 1 row



In [131]:
df_eurchf = spark.read.option("header", True).option("inferSchema", True).csv("../data/04/EUR_CHF.csv")

df_eurchf.select(
    F.min("Date").alias("min"),
    F.max("Date").alias("max")) \
    .show()

df_eurchf.printSchema()

df_eurchf = df_eurchf.withColumn("Date", F.to_date("Date", "MM/dd/yy"))
df_eurchf = df_eurchf.select(
    F.col("Date"),
    F.col(" Close").alias("EURCHF")
)
df_eurchf.show(1)

+--------+--------+
|     min|     max|
+--------+--------+
|01/01/10|12/31/24|
+--------+--------+

root
 |-- Date: string (nullable = true)
 |--  Open: double (nullable = true)
 |--  High: double (nullable = true)
 |--  Low: double (nullable = true)
 |--  Close: double (nullable = true)

+----------+------+
|      Date|EURCHF|
+----------+------+
|2025-04-23|0.9403|
+----------+------+
only showing top 1 row



In [132]:
df_combined = df_combined \
    .join(df_usdchf, on="Date", how="left") \
    .join(df_eurchf, on="Date", how="left")

In [133]:
df_combined.printSchema()

root
 |-- Date: date (nullable = true)
 |-- open_btc: double (nullable = true)
 |-- high_btc: double (nullable = true)
 |-- low_btc: double (nullable = true)
 |-- close_btc: double (nullable = true)
 |-- volume_btc: double (nullable = true)
 |-- block_reward: double (nullable = true)
 |-- cpi: double (nullable = true)
 |-- open_sp500: double (nullable = true)
 |-- high_sp500: double (nullable = true)
 |-- low_sp500: double (nullable = true)
 |-- close_sp500: double (nullable = true)
 |-- volume_sp500: double (nullable = true)
 |-- SMA_20_btc: double (nullable = true)
 |-- BB_std_btc: double (nullable = true)
 |-- BB_upper_btc: double (nullable = true)
 |-- BB_lower_btc: double (nullable = true)
 |-- RSI_btc: double (nullable = true)
 |-- EMA_12_btc: double (nullable = true)
 |-- EMA_26_btc: double (nullable = true)
 |-- MACD_btc: double (nullable = true)
 |-- MACD_signal_btc: double (nullable = true)
 |-- USDCHF: double (nullable = true)
 |-- EURCHF: double (nullable = true)



In [134]:
df_combined.write.option("header", True).mode("overwrite").csv("../data/04/final_btc_ml_dataset.csv")

25/04/29 11:38:02 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:38:02 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:38:02 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:38:02 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:38:02 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 11:38:02 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
25/04/29 1

In [135]:
spark.stop()