# Gold Data Aggregation

**Purpose:**  
Transform the curated [SILVER_DF](./2_silver.ipynb) dataset into information-ready assets optimized for reporting, dashboards, and analytics consumption. Gold tables and views are structured to answer specific questions efficiently, with pre-computed metrics and clear dimensional relationships.

**Transformations Applied:**
- **Aggregate** key metrics over animal shelter dimensions (monthly intake counts, outcome rates by breed)  
- **Compute** derived KPIs (average age at intake, percent adoption rate)  
- **Build** visualizations (charts, dashboards) on top of these Gold-layer models to surface insights

These Gold-layer assets deliver high performance and an intuitive schema for end users, supporting ad-hoc exploration, charting, and executive summaries.
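As a rough illustration of the aggregate and compute steps listed above, the sketch below builds monthly intake counts and per-breed KPIs with pandas. It runs on a tiny synthetic frame rather than the real `SILVER_DF`. The column names match the Silver schema, but the sample values, the `"adoption"` label in `outcome_type`, and the reading of `age` as age at intake are assumptions made for the example.

```python
import pandas as pd

# Tiny synthetic stand-in for SILVER_DF (illustrative values, not real shelter data)
silver = pd.DataFrame(
    {
        "animal_id": ["A1", "A2", "A3", "A4"],
        "breed": ["pit_bull", "pit_bull", "mixed", "mixed"],
        # Assumption: `age` reflects the animal's age at intake
        "age": [2.0, 4.0, 1.0, 3.0],
        "intake_date": pd.to_datetime(
            ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-17"]
        ),
        # Assumption: adoptions are labeled "adoption" in outcome_type
        "outcome_type": ["adoption", "transfer", "adoption", "adoption"],
    }
)

# Aggregate: number of intakes per calendar month
monthly_intakes = (
    silver.groupby(silver["intake_date"].dt.to_period("M"))
    .size()
    .rename("intake_count")
)

# Compute: average age at intake and percent adoption rate, per breed
breed_kpis = (
    silver.assign(is_adopted=silver["outcome_type"].eq("adoption"))
    .groupby("breed")
    .agg(
        avg_age_at_intake=("age", "mean"),
        adoption_rate_pct=("is_adopted", "mean"),
    )
)
# The mean of a boolean column is a fraction; scale it to a percentage
breed_kpis["adoption_rate_pct"] *= 100

print(monthly_intakes)
print(breed_kpis)
```

Using named aggregation keeps the output columns self-describing, which suits Gold tables that feed directly into dashboards.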

For more on Medallion Architecture, see [Databricks Glossary: Medallion Architecture](https://www.databricks.com/glossary/medallion-architecture) (Databricks, n.d.).

---

### References  
Databricks. (n.d.). *Medallion Architecture*. Retrieved May 10, 2025, from https://www.databricks.com/glossary/medallion-architecture


## Table of Contents

1. [Setup](#1-setup)  
   Install required packages and import libraries.

2. [Configuration & Data Loading](#2-configuration-and-data-loading)  
   Centralize file paths, API parameters, and date-column lists, then load the curated Silver dataset into pandas.

## 1. Setup

**Purpose:**  Ensure the environment has all necessary libraries installed and imported.  
```python
# Install project-wide dependencies
%pip install -r ../../requirements.txt
``` 

> **Note:** We use a project-wide `requirements.txt` for consistency.

In [18]:
%pip install -r ../../requirements.txt

Note: you may need to restart the kernel to use updated packages.


In [29]:
import pandas as pd

In [30]:
# Data source configurations
SILVER_FILE_PATH = "../../data-assets/silver/silver.parquet"
SILVER_DF = pd.read_parquet(SILVER_FILE_PATH)
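
Before aggregating, it can help to confirm that the loaded frame contains the columns the Gold metrics depend on. The sketch below is one possible check, not part of the original notebook. `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names introduced for illustration, and the example runs on a small in-memory frame instead of the parquet file.

```python
import pandas as pd

# Hypothetical list of columns the Gold aggregations rely on
REQUIRED_COLUMNS = ["animal_id", "breed", "age", "intake_date", "outcome_type"]


def missing_columns(df: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]


# Small in-memory stand-in for the loaded Silver frame (illustrative only)
example_df = pd.DataFrame({"animal_id": ["A1"], "breed": ["mixed"], "age": [2.0]})

missing = missing_columns(example_df, REQUIRED_COLUMNS)
print(missing)
```

In the notebook, the same helper could be called on `SILVER_DF` and made to raise an error when the returned list is non-empty.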

In [40]:
# Inspect the available columns and preview the first few rows
print("Available columns in SILVER_DF:")
print(list(SILVER_DF.columns))
print("\nFirst few rows:")
print(SILVER_DF.head())

Available columns in SILVER_DF:
['animal_id', 'animal_type', 'breed', 'primary_color', 'age', 'date_of_birth', 'sex', 'intake_type', 'intake_condition', 'intake_reason', 'intake_date', 'outcome_type', 'outcome_date', 'region', 'age_stage', 'season']

First few rows:
  animal_id animal_type       breed primary_color       age date_of_birth  \
0  A0011910         dog    pit_bull         brown  2.494182           NaT   
1  A0011910         dog    pit_bull         white  2.494182           NaT   
2  A0178985         dog  rottweiler         other  2.494182           NaT   
3  A0180810         dog       mixed         other  2.494182           NaT   
4  A0180810         dog       mixed         black  2.494182           NaT   

      sex intake_type intake_condition    intake_reason intake_date  \
0    male       stray          healthy            other  2023-12-21   
1  female   treatment          healthy          medical  2024-02-19   
2  female       stray          medical            other  