# <b> <span style="color:white">Electricity Sector Data Streaming & Analysis</span></b>


# <b> <span style="color:white">GROUP 04</span></b>


| Name                   | SID       | Unikey   |
| ---------------------- | --------- | -------- |
| Putu Eka Udiyani Putri | 550067302 | pput0940 |
| Rengga Firmandika      | 550126632 | rfir0117 |
| Vincentius Ansel Suppa | 550206406 | vsup0468 |


## <b> <span style="color:orange">0. Configuration and Import Required Libraries</span></b>


This notebook acquires, integrates, augments, and stores Australian electricity/emissions datasets:

- [NGER](https://data.cer.gov.au/datasets/NGER/ID0243) (emissions & generation, 2014–2024)
- [CER](https://cer.gov.au/markets/reports-and-data/large-scale-renewable-energy-data) (approved/committed/probable projects)
- [ABS](https://www.abs.gov.au/methodologies/data-region-methodology/2011-24#data-downloads) (population & industry by state)
- Geocoding augmentation via OpenStreetMap Nominatim

Outputs are loaded into a DuckDB database.

**Quick start:**
1. Project structure:
   
   <pre>
   Assignment2_Tut07_G04/
   ├── Assignment_2.ipynb      # main notebook
   ├── requirements.txt        # list of required libraries to run the notebook
   └── AUGMENTED/              # geocoded data
   </pre>

   Ensure your working directory is writable.

2. Create venv & install exact dependencies<br/>
   `python -m venv .venv`<br/>
   Windows: `.\.venv\Scripts\activate` | macOS/Linux: `source .venv/bin/activate`<br/>
   `python -m pip install --upgrade pip`<br/>
   `pip install -r requirements.txt`

3. Replace `your_api_key` with your actual API key. 

4. Run the full pipeline (extract -> clean -> augment -> transform -> load).
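
As a rough illustration of step 3, the sketch below reads an API key from an environment variable and refuses to continue while the `your_api_key` placeholder is still in place. The variable name `GEOCODER_API_KEY` and the helper `read_api_key` are hypothetical and not taken from the notebook. The notebook imports `load_dotenv`, so calling `load_dotenv()` beforehand would populate the environment from a `.env` file; that call is left out here so the sketch runs with the standard library alone.

```python
import os

PLACEHOLDER = "your_api_key"  # placeholder string named in step 3


def read_api_key(env_var: str = "GEOCODER_API_KEY") -> str:
    """Return the key stored in env_var (a hypothetical variable name)."""
    key = os.environ.get(env_var, "").strip()
    # Fail early if the key is missing or the placeholder was never replaced.
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"Set {env_var} to a real API key before running the pipeline.")
    return key
```

Failing early like this avoids starting the long augmentation step with a key that the API would reject.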

**Notes:**

1. The geocoder fallback level used for each record is flagged in `geo_resolution` (exact -> approximated by postcode -> approximated by state).
2. The augmentation step can take a considerable amount of time. Previous API calls are cached in `augmented_dataset.txt`, so reruns do not hit the API again. To redo the geocoding from the start, delete `augmented_dataset.txt` from the folder.
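
The two notes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual implementation: the function names, the query strings, and the JSON cache layout are all hypothetical, and the real geocoder call is replaced by an injected `lookup` function so the example runs without network access.

```python
import json
from pathlib import Path
from typing import Callable, Optional, Tuple

Coords = Tuple[float, float]
Lookup = Callable[[str], Optional[Coords]]  # query string -> (lat, lon) or None


def geocode_with_fallback(name: str, postcode: str, state: str, lookup: Lookup) -> dict:
    """Try the facility name first, then the postcode, then the state."""
    attempts = [
        ("exact", f"{name}, {state}, Australia"),
        ("approximated by postcode", f"{postcode}, {state}, Australia"),
        ("approximated by state", f"{state}, Australia"),
    ]
    for resolution, query in attempts:
        coords = lookup(query)
        if coords is not None:
            # Record which fallback level produced the coordinates.
            return {"lat": coords[0], "lon": coords[1], "geo_resolution": resolution}
    return {"lat": None, "lon": None, "geo_resolution": None}


def cached_geocode(key: str, cache_path: Path, compute: Callable[[], dict]) -> dict:
    """Return the cached result for key, calling compute() only on a cache miss."""
    # Assumed cache layout: one JSON object mapping keys to results.
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    if key not in cache:
        cache[key] = compute()
        cache_path.write_text(json.dumps(cache))
    return cache[key]
```

With this layout, deleting the cache file forces every record to be geocoded again, which matches the behaviour described for `augmented_dataset.txt`.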

Import all the required libraries first.


In [None]:
from dotenv import load_dotenv
import os


## <b> <span style="color:orange">1. Data Acquisition</span></b>


### <b> <span style="color:pink">1.1 National Greenhouse and Energy Reporting (NGER)</span></b>


This dataset consists of 10 annual CSV files released by the Clean Energy Regulator, covering the years 2014 to 2024.  
The files contain information on electricity generation and emissions intensity from facilities that are connected to major electricity networks in Australia.
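
One way to handle a set of annual files like this is to stack them into a single table and tag each row with its year. The sketch below assumes the CSVs are already on disk and share a header; the function name `combine_annual_csvs`, the `reporting_year` column, and the year labels are hypothetical and not taken from the NGER files.

```python
import csv
from pathlib import Path
from typing import Dict, List


def combine_annual_csvs(files_by_year: Dict[str, Path]) -> List[dict]:
    """Read each annual CSV and tag every row with the year it came from."""
    rows: List[dict] = []
    for year, path in sorted(files_by_year.items()):
        # utf-8-sig tolerates a byte-order mark at the start of the file.
        with path.open(newline="", encoding="utf-8-sig") as handle:
            for row in csv.DictReader(handle):
                row["reporting_year"] = year  # hypothetical column name
                rows.append(row)
    return rows
```

Keeping the year as an explicit column makes it possible to filter or group by year after the files are merged.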
