# Final Consolidation and Essential Data Cleaning (Cabo Frio)

After filtering the national data to keep only the records for Cabo Frio (as performed in Section 2), we have one processed CSV file per year in the `../data/processed/dengue/` directory.

This section consolidates these annual files into a single master DataFrame and performs essential data type cleaning, focusing primarily on date columns.

---

### Importing Required Libraries

```python
import pandas as pd                     # Data manipulation library
import os                               # Operating system interfaces (path handling)
from typing import List                 # Type hints for better code clarity
```

## Setup and Variables

We define the paths and the list of annual files to be read and merged.

```python
# Define directories (consistent with Notebook 2)
PROCESSED_DIR: str = '../data/processed/dengue'
FINAL_FILE: str = 'DENGCF10y.csv'
final_filepath: str = os.path.join(PROCESSED_DIR, FINAL_FILE)

# Annual files produced by the filtering step, newest first (2024 back to 2015)
PROCESSED_FILES: List[str] = [
    'DENGBR24_processed.csv',
    'DENGBR23_processed.csv',
    'DENGBR22_processed.csv',
    'DENGBR21_processed.csv',
    'DENGBR20_processed.csv',
    'DENGBR19_processed.csv',
    'DENGBR18_processed.csv',
    'DENGBR17_processed.csv',
    'DENGBR16_processed.csv',
    'DENGBR15_processed.csv'
]
```
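The file names follow a fixed `DENGBR<YY>_processed.csv` pattern, so the same list could be generated from the year range instead of typed by hand. The following is a minimal sketch, assuming the pattern holds for every year from 2015 to 2024. `build_processed_file_list` is a hypothetical helper and is not part of the original notebook:

```python
from typing import List


def build_processed_file_list(first_year: int, last_year: int) -> List[str]:
    """Hypothetical helper: build DENGBR<YY>_processed.csv names, newest year first."""
    return [
        f'DENGBR{year % 100:02d}_processed.csv'  # two-digit year suffix, e.g. 2024 -> '24'
        for year in range(last_year, first_year - 1, -1)
    ]


files = build_processed_file_list(2015, 2024)
print(files[0], files[-1], len(files))
```

With this approach, extending the analysis to another year only requires changing the range bounds. The newest-first order matches the hand-written list.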

## Concatenate and Inspect the Master DataFrame

```python
all_cabo_frio_dfs: List[pd.DataFrame] = []

# Load each annual file and append it to the list
for file in PROCESSED_FILES:
    filepath = os.path.join(PROCESSED_DIR, file)
    df_temp = pd.read_csv(filepath, sep=';', encoding='utf-8')
    all_cabo_frio_dfs.append(df_temp)

# Concatenate all DataFrames into a single master DataFrame with a fresh index
df_cabo_frio = pd.concat(all_cabo_frio_dfs, ignore_index=True)

# Quick inspection of the master DataFrame
print(f'Master DataFrame: {df_cabo_frio.shape[0]} rows x {df_cabo_frio.shape[1]} columns')
```
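To inspect the result file by file, `pd.concat` can record where each row came from through its `keys` and `names` arguments. The sketch below also checks that all annual files share the same columns before stacking them. Two small hypothetical frames stand in for the annual CSVs. The `source_file` column is an addition for illustration and does not exist in the original pipeline:

```python
from typing import Dict

import pandas as pd

# Hypothetical stand-ins for two annual processed files (the real ones are read from PROCESSED_DIR)
annual_frames: Dict[str, pd.DataFrame] = {
    'DENGBR24_processed.csv': pd.DataFrame({'DT_NOTIFIC': ['2024-01-08', '2024-02-10'],
                                            'DT_SIN_PRI': ['2024-01-05', '2024-02-07']}),
    'DENGBR23_processed.csv': pd.DataFrame({'DT_NOTIFIC': ['2023-03-15'],
                                            'DT_SIN_PRI': ['2023-03-12']}),
}

# Check that every annual file has the same set of columns as the first one
column_sets = {name: set(frame.columns) for name, frame in annual_frames.items()}
reference = next(iter(column_sets.values()))
mismatched = [name for name, cols in column_sets.items() if cols != reference]

# keys= adds an outer index level naming the source file; reset_index turns it into a column
master = pd.concat(
    list(annual_frames.values()),
    keys=list(annual_frames.keys()),
    names=['source_file', 'row'],
)
master = master.reset_index(level='source_file').reset_index(drop=True)

# Rows contributed by each annual file
rows_per_file = master['source_file'].value_counts()
print(mismatched)
print(rows_per_file)
```

An empty `mismatched` list means the files can be stacked without introducing columns that are entirely `NaN` for some years.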

## Essential Cleaning: Date Conversion

* SINAN dates are often stored in DDMMYYYY format (day, month, year). Here the format is left for pandas to infer, and any value it cannot parse is coerced to `NaT` (missing).
* Date columns: `DT_SIN_PRI` (date of first symptoms), `DT_OBITO` (date of death) and `DT_NOTIFIC` (date of notification).
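Format inference is convenient, but day-first strings can be misread when pandas guesses the layout. When the layout is known, passing it explicitly is safer. The following is a minimal sketch, assuming the dates are stored as 8-digit `DDMMYYYY` text. The inline CSV is a hypothetical extract, not real SINAN data. Reading the columns as `str` matters here: otherwise `read_csv` would turn `05012024` into the integer `5012024` and drop the leading zero.

```python
import io

import pandas as pd

# Hypothetical extract; the last DT_SIN_PRI value is missing
csv_text = (
    "DT_SIN_PRI;DT_NOTIFIC\n"
    "05012024;08012024\n"
    "31122023;02012024\n"
    ";03012024\n"
)

date_cols = ['DT_SIN_PRI', 'DT_NOTIFIC']

# Read the date columns as text so leading zeros survive
df = pd.read_csv(io.StringIO(csv_text), sep=';', dtype={col: str for col in date_cols})

for col in date_cols:
    # Explicit day-month-year layout without separators; missing or unparseable values become NaT
    df[col] = pd.to_datetime(df[col], format='%d%m%Y', errors='coerce')

print(df)
```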

```python
date_cols: List[str] = ['DT_SIN_PRI', 'DT_OBITO', 'DT_NOTIFIC']

for col in date_cols:
    # Convert each date column to datetime; values that cannot be parsed become NaT
    df_cabo_frio[col] = pd.to_datetime(
        df_cabo_frio[col],
        errors='coerce'
    )
```
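With `errors='coerce'`, unparseable values become `NaT` without any warning. It can therefore be worth counting how many previously non-missing values were lost in each column. The sketch below does this. `to_datetime_with_report` is a hypothetical helper, not part of pandas or of the original notebook:

```python
from typing import Dict, List, Tuple

import pandas as pd


def to_datetime_with_report(df: pd.DataFrame, cols: List[str]) -> Tuple[pd.DataFrame, Dict[str, int]]:
    """Hypothetical helper: parse date columns on a copy and count values lost to NaT."""
    out = df.copy()
    lost: Dict[str, int] = {}
    for col in cols:
        before = int(out[col].notna().sum())
        out[col] = pd.to_datetime(out[col], errors='coerce')
        after = int(out[col].notna().sum())
        lost[col] = before - after  # present before parsing, NaT afterwards
    return out, lost


sample = pd.DataFrame({'DT_NOTIFIC': ['2024-01-08', 'not a date', None]})
parsed, lost = to_datetime_with_report(sample, ['DT_NOTIFIC'])
print(lost)  # {'DT_NOTIFIC': 1}
```

In this notebook the equivalent call would be `df_cabo_frio, lost = to_datetime_with_report(df_cabo_frio, date_cols)`. A large count for a column suggests that the inferred format does not match how that column is stored.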

## Save the Final Clean and Consolidated Dataset

```python
# Write the consolidated dataset with the same separator and encoding as the annual files
df_cabo_frio.to_csv(final_filepath, sep=';', index=False, encoding='utf-8')
```
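A CSV file stores the parsed dates as plain text, so the datetime dtype does not survive the round trip on its own. The following is a minimal sketch of how a later notebook might reload the file, assuming the same `;` separator. The in-memory string stands in for `DENGCF10y.csv`:

```python
import io

import pandas as pd

df = pd.DataFrame({'DT_NOTIFIC': pd.to_datetime(['2024-01-08', '2023-12-31'])})

# With no path argument, to_csv returns the CSV content as a string
csv_text = df.to_csv(sep=';', index=False)

# Without parse_dates the column comes back as text
plain = pd.read_csv(io.StringIO(csv_text), sep=';')

# parse_dates restores the datetime dtype while reading
restored = pd.read_csv(io.StringIO(csv_text), sep=';', parse_dates=['DT_NOTIFIC'])

print(plain['DT_NOTIFIC'].dtype, restored['DT_NOTIFIC'].dtype)
```

In a later notebook, the equivalent call would be `pd.read_csv(final_filepath, sep=';', parse_dates=['DT_SIN_PRI', 'DT_OBITO', 'DT_NOTIFIC'])`.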

---
### Result

After running this notebook, `DENGCF10y.csv` is written to the `../data/processed/dengue/` directory. It contains the consolidated dataset for Cabo Frio from 2015 to 2024, with the date columns converted to a standard datetime format. This dataset serves as the empirical basis for the subsequent descriptive analyses and narrative interpretation.
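The dataset is meant to cover 2015 to 2024, so one quick sanity check is to count records per notification year and list any expected year that has none. The following is a minimal sketch on a small hypothetical sample, not on the real file:

```python
import pandas as pd

# Hypothetical stand-in for the consolidated dataset reloaded with parsed dates
df = pd.DataFrame({'DT_NOTIFIC': pd.to_datetime(['2015-03-02', '2015-04-10', '2024-11-20', None])})

# Records per notification year; value_counts drops the NaT row
records_per_year = df['DT_NOTIFIC'].dt.year.value_counts().sort_index()
print(records_per_year)

# Expected years with no records at all
expected_years = set(range(2015, 2025))
missing_years = sorted(expected_years - set(records_per_year.index.astype(int)))
print(missing_years)
```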