# Machine Learning Models for Energy Analytics

## Overview
This notebook implements three types of machine learning models to analyse energy consumption and CO₂ emissions:
1. **Linear Regression** - Predicting energy consumption
2. **K-Means Clustering** - Grouping countries by energy profiles
3. **Decision Tree Classification** - Classifying energy categories

## Learning Objectives
- Understand supervised vs unsupervised learning
- Implement regression, clustering, and classification models
- Evaluate model performance using appropriate metrics
- Interpret model results for business insights
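
The objectives above can be previewed with a minimal sketch before the real data is loaded. It runs on synthetic data (an assumption made purely for illustration; the arrays `X`, `y_continuous`, and `y_class` are hypothetical and not columns of the energy dataset). It pairs each model type with a metric that suits it: R² for regression, silhouette score for clustering, and accuracy for classification.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, r2_score, silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Synthetic stand-in data (NOT the energy dataset): 200 rows, 2 features
X = rng.normal(size=(200, 2))
y_continuous = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
y_class = (X[:, 0] > 0).astype(int)

# Hold out 25% of the rows so supervised models are scored on unseen data
X_train, X_test, yc_train, yc_test, yk_train, yk_test = train_test_split(
    X, y_continuous, y_class, test_size=0.25, random_state=42
)

# Supervised regression: learns from a continuous target, scored with R²
reg = LinearRegression().fit(X_train, yc_train)
r2 = r2_score(yc_test, reg.predict(X_test))

# Unsupervised clustering: no target at all, scored with the silhouette score
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
sil = silhouette_score(X, km.labels_)

# Supervised classification: learns from a categorical target, scored with accuracy
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, yk_train)
acc = accuracy_score(yk_test, clf.predict(X_test))

print(f"Regression R²: {r2:.3f}")
print(f"Clustering silhouette: {sil:.3f}")
print(f"Classification accuracy: {acc:.3f}")
```

Only the two supervised models use a train/test split, because only they have labels to check predictions against. The clustering step is fitted and scored on the features alone.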

## Libraries Used
- **scikit-learn**: Machine learning algorithms and tools
- **pandas/numpy**: Data manipulation
- **matplotlib/seaborn**: Visualisation

In [1]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import (mean_squared_error, r2_score, mean_absolute_error,
                             silhouette_score, classification_report, confusion_matrix,
                             accuracy_score)
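import warnings
# Suppress all library warnings to keep the notebook output readable
# (note: this also hides deprecation notices)
warnings.filterwarnings('ignore')

# Set visualisation style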
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

## 1. Load Cleaned Data

In [None]:
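from pathlib import Path
import os

# Move up to the project root so relative paths resolve from there.
# The check keeps the cell idempotent: re-running it will not climb another level.
if not (Path.cwd() / "dataset").exists():
    os.chdir(Path.cwd().parent)
current_dir = str(Path.cwd())   # update the variable so future code is consistent
print("New current directory:", current_dir)

# Build the path with pathlib so it works on Windows, macOS, and Linux
processed_file_path = Path(current_dir) / "dataset" / "processed" / "cleaned_energy_data.csv"
df = pd.read_csv(processed_file_path)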