# 📡 Data Extraction — Open-Meteo API

This notebook **extracts weather data** from the public [Open-Meteo](https://open-meteo.com/) API.  
The data are daily minimum and maximum temperature forecasts for up to the **next 90 days** for cities in **South America**. In practice, the window is capped at the 16 days that the forecast endpoint serves.

📌 Stage: Extraction  
💾 Destination: Bronze layer of the Data Lake (Delta format)  
🌎 Cities: Cabedelo, João Pessoa, São Paulo, Buenos Aires, Lima, Santiago, Bogotá


In [1]:
# Main imports: HTTP client, pandas, PySpark and Delta Lake
import requests  # For HTTP calls
import pandas as pd  # For tabular manipulation with DataFrames
from pyspark.sql import SparkSession  # To start the SparkSession
from delta import configure_spark_with_delta_pip  # To add the Delta Lake jars to the session
from datetime import datetime, timedelta, date  # For working with dates



In [2]:
# Create the SparkSession with Delta Lake support (delta-spark)


builder = SparkSession.builder \
    .appName("DeltaLakeIngestion") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")

spark = configure_spark_with_delta_pip(builder).getOrCreate()




25/04/14 20:19:40 WARN Utils: Your hostname, obi-wan-kenote resolves to a loopback address: 127.0.1.1; using 10.255.255.254 instead (on interface lo)
25/04/14 20:19:40 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address


:: loading settings :: url = jar:file:/home/kenote_ubuntu/projetos/Airflow/.venv/lib/python3.10/site-packages/pyspark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml


Ivy Default Cache set to: /home/kenote_ubuntu/.ivy2/cache
The jars for the packages stored in: /home/kenote_ubuntu/.ivy2/jars
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-a4c697e0-ca59-441c-a089-1d9c3439ae3c;1.0
	confs: [default]
	found io.delta#delta-core_2.12;2.3.0 in central
	found io.delta#delta-storage;2.3.0 in central
	found org.antlr#antlr4-runtime;4.8 in central
:: resolution report :: resolve 159ms :: artifacts dl 6ms
	:: modules in use:
	io.delta#delta-core_2.12;2.3.0 from central in [default]
	io.delta#delta-storage;2.3.0 from central in [default]
	org.antlr#antlr4-runtime;4.8 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   3  

25/04/14 20:19:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


In [3]:
# List of cities with their geographic coordinates
# Source: Google Maps or official API data
locations = [
    {"city": "Cabedelo", "lat": -6.97, "lon": -34.83},
    {"city": "João Pessoa", "lat": -7.12, "lon": -34.87},
    {"city": "São Paulo", "lat": -23.55, "lon": -46.63},
    {"city": "Buenos Aires", "lat": -34.60, "lon": -58.38},
    {"city": "Lima", "lat": -12.04, "lon": -77.03},
    {"city": "Santiago", "lat": -33.45, "lon": -70.66},
    {"city": "Bogotá", "lat": 4.71, "lon": -74.07}
]
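The request loop in the next cell assembles each URL by string concatenation. As a minimal sketch, the same URL can be built with the standard library's `urllib.parse.urlencode`, assuming the endpoint and parameters used in this notebook. `build_forecast_url` and `FORECAST_URL` are hypothetical names, not part of any library. `safe=","` keeps the comma in the `daily` list unescaped, so the result matches the URLs printed by the loop.

```python
from datetime import date
from urllib.parse import urlencode

# Forecast endpoint used throughout this notebook
FORECAST_URL = "https://api.open-meteo.com/v1/forecast"


def build_forecast_url(loc: dict, start: date, end: date) -> str:
    """Hypothetical helper: build the daily min/max temperature URL for one location."""
    params = {
        "latitude": loc["lat"],
        "longitude": loc["lon"],
        "daily": "temperature_2m_max,temperature_2m_min",
        "timezone": "auto",
        "start_date": start.isoformat(),
        "end_date": end.isoformat(),
    }
    # urlencode applies str() to the numeric coordinates; safe="," leaves the comma as-is
    return f"{FORECAST_URL}?{urlencode(params, safe=',')}"


url = build_forecast_url(
    {"city": "Cabedelo", "lat": -6.97, "lon": -34.83},
    date(2025, 4, 14),
    date(2025, 4, 29),
)
print(url)
```

Inside the loop, `url = build_forecast_url(loc, start_date, end_date)` would replace the f-string. The same `params` dict could also be passed to `requests.get(FORECAST_URL, params=params)`, which performs the encoding itself.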


In [4]:
from datetime import datetime, timedelta
import requests

# 1. Define the valid date range
start_date = datetime.today().date()
# The forecast endpoint serves at most 16 days ahead (today + 15 days);
# deriving the cap from today avoids a hard-coded date that goes stale
max_forecast_date = start_date + timedelta(days=15)
end_date = min(start_date + timedelta(days=89), max_forecast_date)

# 2. Convert to YYYY-MM-DD strings
start_date_str = start_date.isoformat()
end_date_str = end_date.isoformat()

print(f"📅 Collecting data from {start_date_str} to {end_date_str}\n")

# 3. List that accumulates one record per city per day
raw_data = []

# 4. Request the forecast for each city
for loc in locations:
    url = (
        f"https://api.open-meteo.com/v1/forecast"
        f"?latitude={loc['lat']}&longitude={loc['lon']}"
        f"&daily=temperature_2m_max,temperature_2m_min"
        f"&timezone=auto"
        f"&start_date={start_date_str}&end_date={end_date_str}"
    )

    print(f"🔍 Testing URL for {loc['city']}:\n{url}\n")

    # A timeout keeps a stalled request from blocking the notebook indefinitely
    response = requests.get(url, timeout=30)
    print(f"📨 API response ({response.status_code}) for {loc['city']}:\n{response.text[:200]}...\n")

    if response.status_code == 200:
        payload = response.json()  # named payload to avoid shadowing the json module
        if "daily" in payload and "time" in payload["daily"]:
            for i in range(len(payload["daily"]["time"])):
                raw_data.append({
                    "city": loc["city"],
                    "date": payload["daily"]["time"][i],
                    "temp_min": payload["daily"]["temperature_2m_min"][i],
                    "temp_max": payload["daily"]["temperature_2m_max"][i],
                })
            print(f"✅ Data collected successfully for: {loc['city']}\n")
        else:
            print(f"⚠️ Missing data in the response body for: {loc['city']}\n")
    else:
        print(f"❌ Error accessing the API for: {loc['city']}, status: {response.status_code}\n")


📅 Collecting data from 2025-04-14 to 2025-04-29

🔍 Testing URL for Cabedelo:
https://api.open-meteo.com/v1/forecast?latitude=-6.97&longitude=-34.83&daily=temperature_2m_max,temperature_2m_min&timezone=auto&start_date=2025-04-14&end_date=2025-04-29

📨 API response (200) for Cabedelo:
{"latitude":-7.0,"longitude":-34.875,"generationtime_ms":0.03612041473388672,"utc_offset_seconds":-10800,"timezone":"America/Fortaleza","timezone_abbreviation":"GMT-3","elevation":6.0,"daily_units":{"...

✅ Data collected successfully for: Cabedelo

🔍 Testing URL for João Pessoa:
https://api.open-meteo.com/v1/forecast?latitude=-7.12&longitude=-34.87&daily=temperature_2m_max,temperature_2m_min&timezone=auto&start_date=2025-04-14&end_date=2025-04-29

📨 API response (200) for João Pessoa:
{"latitude":-7.125,"longitude":-34.875,"generationtime_ms":0.04100799560546875,"utc_offset_seconds":-10800,"timezone":"America/Fortaleza","timezone_abbreviation":"GMT-3","elevation":43.0,"daily_units"...

✅ Data collected successfully for: João Pessoa

🔍 Testing URL for São Paulo:
https://api.open-meteo.com/v1/forecast?latitude=-23.55&longitude=-46.63&daily=temperature_2m_max,temperature_2m_min&timezone=auto&start_date=2025-04-14&end_date=2025-04-29

📨 API response (200) for São Paulo:
{"latitude":-23.5,"longitude":-46.5,"generationtime_ms":0.036597251892089844,"utc_offset_seconds":-10800,"timezone":"America/Sao_Paulo","timezone_abbreviation":"GMT-3","elevation":737.0,"daily_units":...

✅ Data collected successfully for: São Paulo

🔍 Testing URL for Buenos Aires:
https://api.open-meteo.com/v1/forecast?latitude=-34.6&longitude=-58.38&daily=temperature_2m_max,temperature_2m_min&timezone=auto&start_date=2025-04-14&end_date=2025-04-29

📨 API response (200) for Buenos Aires:
{"latitude":-34.625,"longitude":-58.5,"generationtime_ms":0.035643577575683594,"utc_offset_seconds":-10800,"timezone":"America/Argentina/Buenos_Aires","timezone_abbreviation":"GMT-3","elevation":23.0,...

✅ Data collected successfully for: Buenos Aires

🔍 Testing URL for Lima:
https://api.open-meteo.com/v1/forecast?latitude=-12.04&longitude=-77.03&daily=temperature_2m_max,temperature_2m_min&timezone=auto&start_date=2025-04-14&end_date=2025-04-29

📨 API response (200) for Lima:
{"latitude":-11.875,"longitude":-77.125,"generationtime_ms":0.04017353057861328,"utc_offset_seconds":-18000,"timezone":"America/Lima","timezone_abbreviation":"GMT-5","elevation":152.0,"daily_units":{"...

✅ Data collected successfully for: Lima

🔍 Testing URL for Santiago:
https://api.open-meteo.com/v1/forecast?latitude=-33.45&longitude=-70.66&daily=temperature_2m_max,temperature_2m_min&timezone=auto&start_date=2025-04-14&end_date=2025-04-29

📨 API response (200) for Santiago:
{"latitude":-33.5,"longitude":-70.625,"generationtime_ms":0.031948089599609375,"utc_offset_seconds":-14400,"timezone":"America/Santiago","timezone_abbreviation":"GMT-4","elevation":541.0,"daily_units"...

✅ Data collected successfully for: Santiago

🔍 Testing URL for Bogotá:
https://api.open-meteo.com/v1/forecast?latitude=4.71&longitude=-74.07&daily=temperature_2m_max,temperature_2m_min&timezone=auto&start_date=2025-04-14&end_date=2025-04-29

📨 API response (200) for Bogotá:
{"latitude":4.875,"longitude":-74.25,"generationtime_ms":0.052809715270996094,"utc_offset_seconds":-18000,"timezone":"America/Bogota","timezone_abbreviation":"GMT-5","elevation":2558.0,"daily_units":{...

✅ Data collected successfully for: Bogotá



In [5]:
# Convert the collected records to a pandas DataFrame
df_pd = pd.DataFrame(raw_data)

# Convert the pandas DataFrame to a Spark DataFrame
df_spark = spark.createDataFrame(df_pd)


  for column, series in pdf.iteritems():


In [6]:
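# Path of the Bronze layer
bronze_path = "/home/kenote_ubuntu/projetos/Airflow/data/bronze/open_meteo"

# Save the raw data in Delta format; "overwrite" replaces the table contents
# as a new version while earlier versions remain in the Delta transaction log
df_spark.write.format("delta") \
    .mode("overwrite") \
    .save(bronze_path)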


25/04/14 20:20:04 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 95.00% for 8 writers
25/04/14 20:20:04 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 84.44% for 9 writers
25/04/14 20:20:04 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 76.00% for 10 writers
25/04/14 20:20:04 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 69.09% for 11 writers
25/04/14 20:20:04 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 63.33% for 12 writers

25/04/14 20:20:04 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 69.09% for 11 writers
25/04/14 20:20:04 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 76.00% for 10 writers
25/04/14 20:20:04 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 84.44% for 9 writers
25/04/14 20:20:04 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 95.00% for 8 writers

25/04/14 20:20:05 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.

In [7]:
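# Read the saved table back to validate persistence in Delta
df_read = spark.read.format("delta").load(bronze_path)

# Final preview to confirm everything looks right.
# Row order after a read is not guaranteed, which is why the preview does not
# start at the first city and date; add .orderBy("city", "date") for a deterministic preview
df_read.toPandas().head(10)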


|   | city | date | temp_min | temp_max |
|---|------|------|----------|----------|
| 0 | Bogotá | 2025-04-17 | 11.6 | 20.6 |
| 1 | Bogotá | 2025-04-18 | 13.4 | 21.3 |
| 2 | Bogotá | 2025-04-19 | 13.1 | 21.6 |
| 3 | Bogotá | 2025-04-20 | 13.3 | 19.3 |
| 4 | Bogotá | 2025-04-21 | 11.5 | 19.0 |
| 5 | Bogotá | 2025-04-22 | 10.6 | 18.1 |
| 6 | Bogotá | 2025-04-23 | 11.0 | 18.0 |
| 7 | Bogotá | 2025-04-24 | 10.6 | 16.3 |
| 8 | Bogotá | 2025-04-25 | 9.8 | 14.9 |
| 9 | Bogotá | 2025-04-26 | 9.2 | 18.5 |
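`head(10)` only shows ten Bogotá rows, so it says little about whether every city and day was persisted. Here is a minimal sketch of a completeness check over the collected records, assuming the bronze table should hold exactly one row per city per requested day. `check_completeness` is a hypothetical helper that works on plain dicts shaped like `raw_data`, so it runs without Spark. The sample records are illustrative only.

```python
from datetime import date, timedelta


def check_completeness(records: list[dict], cities: list[str], start: date, end: date) -> dict:
    """Hypothetical helper: report (city, date) pairs missing from or duplicated in the records."""
    n_days = (end - start).days + 1  # inclusive date range
    expected = {
        (city, (start + timedelta(days=i)).isoformat())
        for city in cities
        for i in range(n_days)
    }
    seen = [(r["city"], r["date"]) for r in records]
    return {
        "expected_rows": len(expected),
        "actual_rows": len(seen),
        "missing": sorted(expected - set(seen)),
        "duplicates": len(seen) - len(set(seen)),
    }


# Illustrative records: Bogotá is missing its second day
records = [
    {"city": "Lima", "date": "2025-04-14"},
    {"city": "Lima", "date": "2025-04-15"},
    {"city": "Bogotá", "date": "2025-04-14"},
]
report = check_completeness(records, ["Lima", "Bogotá"], date(2025, 4, 14), date(2025, 4, 15))
print(report)
```

With the notebook's variables, this would be `check_completeness(raw_data, [loc["city"] for loc in locations], start_date, end_date)`. On the Spark side, `df_read.count()` gives the persisted row count to compare against `expected_rows`.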


# Conclusion

✅ Weather data was extracted from the Open-Meteo API and saved raw to the Bronze layer of the Data Lake, keeping:

📦 Untransformed data

✅ A versioned Delta table

🧪 Previews before and after saving (the printed API responses and a read-back of the Delta table)

✨ Clear organization and documentation, suited to professional projects