# Cloud AI Project Final Documentation

## Team Yunus
**Members:**
* **Marin Janushaj** - Dataset 1: UK Housing Price Prediction
* **Yunus Eren Ertaş** - Dataset 2: UK Electricity Consumption Analysis

**Course:** Cloud & AI

**Year:** 2025

---

This notebook serves as the final documentation and index for our project, mapping our work to the project assignment requirements.

## 1. Prepare

We have established our team identity as **Team Yunus** and set up our GitHub repository with the required structure.

* **Repository Structure:**
    * `dataset1_uk_housing/`: Contains all notebooks for the housing dataset.
    * `dataset2_uk_electricity/`: Contains all notebooks for the electricity dataset.
    * `notebooks/`: Contains this documentation notebook.
    * `app.py` & `electricity_app.py`: Streamlit frontend applications.
    * `README.md`: Comprehensive project overview.

## 2. Dataset 1: UK Housing

**Goal:** Predict housing prices in England and Wales using a dataset of 22.4 million transactions (1995-2017).

### Workflow & Notebooks

1. **Data Loading**: [1_load.ipynb](../dataset1_uk_housing/1_load.ipynb)
   * Loaded the 2GB+ dataset efficiently.
   * Initial exploration of structure and memory usage.

2. **Cleaning**: [2_clean.ipynb](../dataset1_uk_housing/2_clean.ipynb)
   * Renamed columns to snake_case.
   * Handled missing values and outliers.
   * Prepared categorical variables.
   * **Output:** Cleaned parquet file for efficient processing.

3. **Exploratory Data Analysis (EDA)**: [3_eda.ipynb](../dataset1_uk_housing/3_eda.ipynb)
   * Visualized price trends over time (1995-2017).
   * Analyzed geographic distribution (London vs. others).
   * Investigated impact of property type and tenure.

4. **Modeling**: [4_model.ipynb](../dataset1_uk_housing/4_model.ipynb)
   * Trained Linear Regression, Random Forest, XGBoost, and LightGBM.
   * **Best Model:** LightGBM (R² = 0.446).
   * Feature importance analysis.

5. **AutoML Comparison**: [4.5_pycaret_comparison.ipynb](../dataset1_uk_housing/4.5_pycaret_comparison.ipynb)
   * Used PyCaret to compare 15+ models.
   * Found that manual training on the full dataset outperformed AutoML on sampled data.

6. **Cloud Training (AWS)**: [4.7_sagemaker_ready.ipynb](../dataset1_uk_housing/4.7_sagemaker_ready.ipynb)
   * Implemented training on AWS SageMaker.
   * Used hyperparameter tuning jobs.

7. **Model Comparison**: [6_model_comparison.ipynb](../dataset1_uk_housing/6_model_comparison.ipynb)
   * Comprehensive comparison of all approaches (Manual vs. AutoML vs. Cloud).
   * **Conclusion:** LightGBM trained on the full dataset was the superior approach.
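
To help interpret the reported score: R² (the coefficient of determination) compares a model's squared error with the squared error of always predicting the mean price. The sketch below computes it in plain Python. The function name `r_squared` and the toy prices are illustrative assumptions and are not taken from `4_model.ipynb`, which may well rely on a library implementation instead.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Toy prices in GBP, made up purely for illustration.
actual = [100_000.0, 150_000.0, 200_000.0, 250_000.0]
predicted = [110_000.0, 140_000.0, 210_000.0, 240_000.0]
score = r_squared(actual, predicted)
```

Under this standard definition, an R² of 0.446 would mean the model's squared error is roughly 55% of what a predict-the-mean baseline achieves on the same evaluation data.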

## 3. Dataset 2: UK Historic Electricity Demand

**Goal:** Analyze and predict electricity demand in England and Wales (2001-2025).

### Workflow & Notebooks

1. **Data Combination**: [1_combine.ipynb](../dataset2_uk_electricity/1_combine.ipynb)
   * Combined multiple CSV files (one per year) into a single dataset.
   * Standardized column names and timestamps.

2. **Cleaning**: [2_clean.ipynb](../dataset2_uk_electricity/2_clean.ipynb)
   * Handled missing timestamps and demand values.
   * Addressed outliers in demand data.

3. **Exploratory Data Analysis**: [3_eda.ipynb](../dataset2_uk_electricity/3_eda.ipynb)
   * Analyzed daily, weekly, and seasonal patterns.
   * Visualized long-term demand trends.

4. **Modeling**: [4_model.ipynb](../dataset2_uk_electricity/4_model.ipynb)
   * Built time-series forecasting models.
   * Evaluated model performance.

5. **Cloud/Advanced Modeling**: [6_aws.ipynb](../dataset2_uk_electricity/6_aws.ipynb)
   * Explored cloud-based or advanced modeling techniques.
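
The combination step in item 1 (one CSV per year, with standardized column names and timestamps) can be sketched as follows. This is a minimal, dependency-free illustration rather than the code in `1_combine.ipynb`: the column names (`Timestamp`, `Demand MW`), the timestamp format, and the helper names `to_snake_case` and `combine_yearly_csvs` are all assumptions, and in-memory strings stand in for the real yearly files.

```python
import csv
import io
import re
from datetime import datetime
from typing import Dict, Iterable, List, TextIO


def to_snake_case(name: str) -> str:
    """Convert headers such as 'Demand MW' or 'demandMW' to snake_case."""
    # Split camelCase boundaries, then collapse non-alphanumerics to "_".
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name.strip())
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


def combine_yearly_csvs(
    files: Iterable[TextIO],
    timestamp_column: str = "timestamp",       # hypothetical column name
    timestamp_format: str = "%Y-%m-%d %H:%M",  # hypothetical format
) -> List[Dict[str, object]]:
    """Read every file, standardize headers, parse timestamps, sort by time."""
    rows: List[Dict[str, object]] = []
    for handle in files:
        for raw in csv.DictReader(handle):
            row: Dict[str, object] = {to_snake_case(k): v for k, v in raw.items()}
            row[timestamp_column] = datetime.strptime(
                str(row[timestamp_column]), timestamp_format
            )
            rows.append(row)
    rows.sort(key=lambda r: r[timestamp_column])
    return rows


# Two tiny made-up "yearly files" with inconsistent headers and ordering.
csv_2001 = io.StringIO(
    "Timestamp,Demand MW\n2001-01-01 00:30,31000\n2001-01-01 00:00,30500\n"
)
csv_2002 = io.StringIO("timestamp,demandMW\n2002-01-01 00:00,32000\n")
combined = combine_yearly_csvs([csv_2002, csv_2001])
```

Sorting after concatenation means the result does not depend on the order in which the yearly files are read. Demand values are left as strings here; type conversion would belong to the cleaning step.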

## 4. Deployment

We built Streamlit frontends for both models, supported by a Flask API and a Docker configuration so that the applications can be hosted.

### Applications
* **Housing Price Predictor**: `app.py`
    * Interactive UI to input property details.
    * Real-time price prediction with confidence intervals.
* **Electricity Demand Dashboard**: `electricity_app.py`
    * Visualization of consumption patterns.
    * Demand forecasting interface.
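
This document does not state how `app.py` derives the intervals shown next to a price prediction. One simple option, sketched below purely as an assumption, is to shift the empirical quantiles of hold-out residuals onto a new point prediction. The function names, the 80% coverage level, and the residual values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def empirical_quantile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of already-sorted values, 0 <= q <= 1."""
    if not sorted_values:
        raise ValueError("need at least one value")
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def residual_interval(
    prediction: float,
    residuals: Sequence[float],
    coverage: float = 0.8,  # hypothetical coverage level
) -> Tuple[float, float]:
    """Interval around a prediction from hold-out residuals (actual - predicted)."""
    if not 0 < coverage < 1:
        raise ValueError("coverage must be strictly between 0 and 1")
    ordered = sorted(residuals)
    tail = (1 - coverage) / 2
    low = prediction + empirical_quantile(ordered, tail)
    high = prediction + empirical_quantile(ordered, 1 - tail)
    return low, high


# Made-up hold-out residuals in GBP (actual price minus predicted price).
holdout_residuals = [
    -30_000.0, -20_000.0, -15_000.0, -10_000.0, -5_000.0, 0.0,
    5_000.0, 10_000.0, 15_000.0, 20_000.0, 30_000.0,
]
low, high = residual_interval(250_000.0, holdout_residuals)
```

This approach assumes the residual distribution is similar for every property, which rarely holds exactly for house prices; quantile regression would be a more flexible alternative.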

### Infrastructure
* **Docker**: Containerized the application for consistent deployment (`Dockerfile`, `docker-compose.yml`).
* **API**: Implemented a Flask API for backend predictions (`api/`).
* **Cloud**: Ready for deployment on platforms like AWS or Streamlit Cloud.

## 5. Conclusion

We have met all the requirements of the project assignment:
1.  **Preparation**: Organized team and repo structure.
2.  **Cleaning & Exploration**: Thorough EDA and cleaning pipelines for both datasets.
3.  **Modeling**: Implemented manual, AutoML, and cloud-based models, compared the results, and selected the best performers.
4.  **Deployment**: Created functional web applications and prepared for hosting.

We are ready for the final presentation on **November 28th**.