# 02 - Data Enrichment
Author: Fernanda Baptista de Siqueira  
Program: MBA in Technology for Business – AI, Data Science and Big Data  
Topic: Analysis of Traffic Accidents in Porto Alegre (2020–2024)  
DataFrame source: Equipe Armazém de Dados de Mobilidade - EAMOB/CIET  
https://dadosabertos.poa.br/dataset/acidentes-de-transito-acidentes (11/05/2025)  

### 1. Import libraries and load the dataset

In [18]:
from config import (
    pd, np, os, salvar_parquet,
    resumo_df, checar_nulos, COORD,
    ANOS, PATH_CHUVA, URL, PATH_CLEAN
)

import openmeteo_requests
import requests_cache
from retry_requests import retry


### 2. Configure the Open-Meteo API client with caching and retry on error

In [19]:
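# Cache responses on disk; expire_after = -1 means cached entries never expire
cache_session = requests_cache.CachedSession('.cache', expire_after = -1)
# Retry failed requests up to 5 times, waiting longer between attempts
retry_session = retry(cache_session, retries = 5, backoff_factor = 0.2)
openmeteo = openmeteo_requests.Client(session = retry_session)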
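
To give a feel for what `retries = 5` and `backoff_factor = 0.2` imply, the sketch below computes a textbook exponential backoff schedule in which the wait doubles on each attempt. The helper `backoff_delays` is hypothetical and written only for this illustration; the exact schedule used by the underlying HTTP library may differ between versions (some skip the wait before the first retry), so treat the numbers as an approximation.

```python
def backoff_delays(backoff_factor: float, retries: int) -> list[float]:
    """Illustrative exponential backoff: the wait doubles on each retry."""
    return [backoff_factor * (2 ** attempt) for attempt in range(retries)]


delays = backoff_delays(backoff_factor=0.2, retries=5)
print(delays)
print(f"approximate worst-case total wait: {sum(delays):.1f} s")
```

With these settings the retries add only a few seconds in the worst case, which keeps a failed yearly request from stalling the notebook for long.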

### 3. Create a function to call the API

*Function that collects hourly precipitation data for a point (lat/lon),
making one request per year in `ANOS` and saving each year as a parquet file.*

*Args:*  
* *lat (float): Latitude*
* *lon (float): Longitude*
* *nome (str): Region name (e.g. NORTE)*

*Returns:*  
* *The DataFrame for the last year processed*


In [20]:
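def dados_chuva(lat: float, lon: float, nome: str):
    # Make sure the output folder exists before saving anything
    os.makedirs(PATH_CHUVA, exist_ok=True)

    for ano in ANOS:
        # One request per calendar year, hourly precipitation only
        params = {
            "latitude": lat,
            "longitude": lon,
            "start_date": f"{ano}-01-01",
            "end_date": f"{ano}-12-31",
            "hourly": 'precipitation',
            "timezone": 'America/Sao_Paulo'
        }

        # A single location is requested, so only the first response is used
        responses = openmeteo.weather_api(URL, params=params)
        response = responses[0]
        hourly = response.Hourly()

        # "precipitation" is the only hourly variable requested, hence index 0
        chuva_hora = hourly.Variables(0).ValuesAsNumpy()

        # Rebuild the hourly timestamps (kept in UTC) from start, end and interval
        df_chuva = pd.DataFrame({
            "data": pd.date_range(
                start=pd.to_datetime(hourly.Time(), unit="s", utc=True),
                end=pd.to_datetime(hourly.TimeEnd(), unit="s", utc=True),
                freq=pd.Timedelta(seconds=hourly.Interval()),
                inclusive="left"
            ),
            "chuva": chuva_hora
        })

        # One parquet file per region and year, e.g. norte_2021.parquet
        nome_arquivo = f"{PATH_CHUVA}{nome.lower()}_{ano}.parquet"
        salvar_parquet(df_chuva, nome_arquivo)

    # Note: only the DataFrame of the last year in ANOS is returned
    return df_chuva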
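
The timestamp reconstruction inside the function can be tried without calling the API. The sketch below uses hard-coded values as hypothetical stand-ins for `hourly.Time()`, `hourly.TimeEnd()` and `hourly.Interval()`, covering one non-leap year that starts at local midnight. It also adds a `data_local` column, which is not part of the original function, to show one way the UTC timestamps could be converted if they later need to be matched against records kept in local time (an assumption about the downstream use).

```python
import pandas as pd

# Hypothetical stand-ins for hourly.Time(), hourly.TimeEnd(), hourly.Interval()
time_start = 1609470000              # 2021-01-01 03:00:00 UTC
time_end = time_start + 8760 * 3600  # one non-leap year later
interval = 3600                      # seconds between observations

idx = pd.date_range(
    start=pd.to_datetime(time_start, unit="s", utc=True),
    end=pd.to_datetime(time_end, unit="s", utc=True),
    freq=pd.Timedelta(seconds=interval),
    inclusive="left",  # drop the end point, which belongs to the next year
)

# Placeholder precipitation values; the real ones come from the API
df = pd.DataFrame({"data": idx, "chuva": 0.0})

# Illustrative extra column: the same instants expressed in local time
df["data_local"] = df["data"].dt.tz_convert("America/Sao_Paulo")

print(len(df))
print(df["data"].iloc[0], "->", df["data_local"].iloc[0])
```

The first row starts at 03:00 UTC because the request asks for days in the `America/Sao_Paulo` timezone, so local midnight falls three hours later in UTC.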

### 4. Call the API for each region  
1) NORTE region

In [21]:
lat, lon = COORD["NORTE"]
dados_chuva(lat, lon, "NORTE")

# Check the saved DataFrame
df_teste = pd.read_parquet(f"{PATH_CHUVA}norte_2021.parquet")
resumo_df(df_teste)
checar_nulos(df_teste)

Salvo: ../dados/intermediarios/clima/norte_2020.parquet
Salvo: ../dados/intermediarios/clima/norte_2021.parquet
Salvo: ../dados/intermediarios/clima/norte_2022.parquet
Salvo: ../dados/intermediarios/clima/norte_2023.parquet
Salvo: ../dados/intermediarios/clima/norte_2024.parquet
Dimensões: (8760, 2)

Tipos de dados:
data     datetime64[ns, UTC]
chuva                float32
dtype: object

Nulos por coluna:
data     0
chuva    0
dtype: int64


| | data | chuva |
|---|---|---|
| 0 | 2021-01-01 03:00:00+00:00 | 0.0 |
| 1 | 2021-01-01 04:00:00+00:00 | 0.0 |
| 2 | 2021-01-01 05:00:00+00:00 | 0.0 |
| 3 | 2021-01-01 06:00:00+00:00 | 0.0 |
| 4 | 2021-01-01 07:00:00+00:00 | 0.0 |


Percentual de valores nulos por coluna (%):


data    0.00
chuva   0.00
dtype: float64
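
The 8760 rows reported for 2021 match a full non-leap year of hourly data (365 × 24). A similar check could be run on every saved file; the sketch below only computes the expected counts, and both the helper name `expected_hours` and the year range 2020 to 2024 (inferred from the saved file names) are assumptions for illustration.

```python
import calendar


def expected_hours(year: int) -> int:
    """Number of hourly observations in a complete calendar year."""
    days = 366 if calendar.isleap(year) else 365
    return days * 24


for year in range(2020, 2025):
    print(year, expected_hours(year))
```

Comparing `len(df)` of each yearly file against this value would flag a truncated or incomplete download.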

2) LESTE region

In [14]:
lat, lon = COORD["LESTE"]
dados_chuva(lat, lon, "LESTE")

# Check the saved DataFrame
# df_teste = pd.read_parquet(f"{PATH_CHUVA}leste_2021.parquet")
# resumo_df(df_teste)
# checar_nulos(df_teste)

Salvo: ../dados/intermediarios/clima/leste_2020.parquet
Salvo: ../dados/intermediarios/clima/leste_2021.parquet
Salvo: ../dados/intermediarios/clima/leste_2022.parquet
Salvo: ../dados/intermediarios/clima/leste_2023.parquet
Salvo: ../dados/intermediarios/clima/leste_2024.parquet


| | data | chuva |
|---|---|---|
| 0 | 2024-01-01 03:00:00+00:00 | 0.00 |
| 1 | 2024-01-01 04:00:00+00:00 | 0.00 |
| 2 | 2024-01-01 05:00:00+00:00 | 0.00 |
| 3 | 2024-01-01 06:00:00+00:00 | 0.00 |
| 4 | 2024-01-01 07:00:00+00:00 | 0.00 |
| ... | ... | ... |
| 8779 | 2024-12-31 22:00:00+00:00 | 0.00 |
| 8780 | 2024-12-31 23:00:00+00:00 | 0.00 |
| 8781 | 2025-01-01 00:00:00+00:00 | 0.00 |
| 8782 | 2025-01-01 01:00:00+00:00 | 0.00 |


3) CENTRO region

In [15]:
lat, lon = COORD["CENTRO"]
dados_chuva(lat, lon, "CENTRO")

# Check the saved DataFrame
# df_teste = pd.read_parquet(f"{PATH_CHUVA}centro_2021.parquet")
# resumo_df(df_teste)
# checar_nulos(df_teste)

Salvo: ../dados/intermediarios/clima/centro_2020.parquet
Salvo: ../dados/intermediarios/clima/centro_2021.parquet
Salvo: ../dados/intermediarios/clima/centro_2022.parquet
Salvo: ../dados/intermediarios/clima/centro_2023.parquet
Salvo: ../dados/intermediarios/clima/centro_2024.parquet


| | data | chuva |
|---|---|---|
| 0 | 2024-01-01 03:00:00+00:00 | 0.00 |
| 1 | 2024-01-01 04:00:00+00:00 | 0.00 |
| 2 | 2024-01-01 05:00:00+00:00 | 0.00 |
| 3 | 2024-01-01 06:00:00+00:00 | 0.00 |
| 4 | 2024-01-01 07:00:00+00:00 | 0.00 |
| ... | ... | ... |
| 8779 | 2024-12-31 22:00:00+00:00 | 0.00 |
| 8780 | 2024-12-31 23:00:00+00:00 | 0.00 |
| 8781 | 2025-01-01 00:00:00+00:00 | 0.00 |
| 8782 | 2025-01-01 01:00:00+00:00 | 0.00 |


4) SUL region

In [16]:
lat, lon = COORD["SUL"]
dados_chuva(lat, lon, "SUL")

# Check the saved DataFrame
# df_teste = pd.read_parquet(f"{PATH_CHUVA}sul_2021.parquet")
# resumo_df(df_teste)
# checar_nulos(df_teste)

Salvo: ../dados/intermediarios/clima/sul_2020.parquet
Salvo: ../dados/intermediarios/clima/sul_2021.parquet
Salvo: ../dados/intermediarios/clima/sul_2022.parquet
Salvo: ../dados/intermediarios/clima/sul_2023.parquet
Salvo: ../dados/intermediarios/clima/sul_2024.parquet


| | data | chuva |
|---|---|---|
| 0 | 2024-01-01 03:00:00+00:00 | 0.00 |
| 1 | 2024-01-01 04:00:00+00:00 | 0.00 |
| 2 | 2024-01-01 05:00:00+00:00 | 0.00 |
| 3 | 2024-01-01 06:00:00+00:00 | 0.00 |
| 4 | 2024-01-01 07:00:00+00:00 | 0.00 |
| ... | ... | ... |
| 8779 | 2024-12-31 22:00:00+00:00 | 0.00 |
| 8780 | 2024-12-31 23:00:00+00:00 | 0.00 |
| 8781 | 2025-01-01 00:00:00+00:00 | 0.00 |
| 8782 | 2025-01-01 01:00:00+00:00 | 0.00 |
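
The four region cells above repeat the same two lines with a different key. One possible way to collapse them is a loop over the coordinate dictionary, sketched below. The coordinates in `COORD_EXEMPLO` are hypothetical placeholders (the real values live in `COORD` in `config.py` and are not shown here), and `coleta_stub` is a stand-in for `dados_chuva` so the sketch runs without network access.

```python
# Hypothetical coordinates, for illustration only; not the values in config.py
COORD_EXEMPLO = {
    "NORTE": (-30.00, -51.15),
    "LESTE": (-30.05, -51.12),
    "CENTRO": (-30.03, -51.23),
    "SUL": (-30.15, -51.20),
}


def coletar_todas(coord, coletar):
    """Call coletar(lat, lon, nome) once per region and keep each result."""
    return {nome: coletar(lat, lon, nome) for nome, (lat, lon) in coord.items()}


def coleta_stub(lat, lon, nome):
    """Stand-in for dados_chuva: returns a label instead of calling the API."""
    return f"{nome.lower()}: ({lat}, {lon})"


resultados = coletar_todas(COORD_EXEMPLO, coleta_stub)
print(list(resultados))
```

In the notebook itself, the call would presumably become `coletar_todas(COORD, dados_chuva)`, assuming `COORD` maps each region name to a `(lat, lon)` pair as the cells above suggest.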
