### Group ID:
### Group Members Name with Student ID:
1. Student 1
2. Student 2
3. Student 3
4. Student 4


# Problem Statement

The objective is to implement an Actor-Critic reinforcement learning algorithm that optimizes energy consumption in a building. The agent should learn to adjust the temperature settings dynamically to minimize energy usage while maintaining comfortable indoor conditions.

#### Dataset Details
Dataset: https://archive.ics.uci.edu/dataset/374/appliances+energy+prediction

This dataset contains energy consumption data for a residential building, along with various environmental and operational factors.

Data Dictionary:
* Appliances:       Energy use in Wh
* lights:           Energy use of light fixtures in the house in Wh
* T1 to T9:         Temperatures in various rooms and outside
* RH_1 to RH_9:     Humidity measurements in various rooms and outside
* Visibility:       Visibility in km
* Tdewpoint:        Dew point temperature
* Press_mm_hg:      Pressure in mm Hg
* Windspeed:        Wind speed in m/s

#### Environment Details
**State Space:**
The state space consists of various features from the dataset that impact energy consumption and comfort levels.

* Current Temperature (T1 to T9): Temperatures in various rooms and outside.
* Current Humidity (RH_1 to RH_9): Humidity measurements in different locations.
* Visibility (Visibility): Visibility in km.
* Dew Point (Tdewpoint): Dew point temperature.
* Pressure (Press_mm_hg): Atmospheric pressure in mm Hg.
* Windspeed (Windspeed): Wind speed in m/s.

Total State Vector Dimension: Number of features = 9 (temperature) + 9 (humidity) + 1 (visibility) + 1 (dew point) + 1 (pressure) + 1 (windspeed) = 22 features
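
As a quick sanity check on the count above, the state columns can be assembled programmatically. This is a minimal sketch; the constant names (`TEMP_COLS`, `HUMIDITY_COLS`, `WEATHER_COLS`, `STATE_COLS`) are hypothetical, and the column names are assumed to match the CSV header as listed in the data dictionary.

```python
# Column names follow the data dictionary above; the constant names are illustrative.
TEMP_COLS = [f"T{i}" for i in range(1, 10)]        # T1 .. T9
HUMIDITY_COLS = [f"RH_{i}" for i in range(1, 10)]  # RH_1 .. RH_9
WEATHER_COLS = ["Visibility", "Tdewpoint", "Press_mm_hg", "Windspeed"]

# Temperatures come first so that they are easy to slice out later.
STATE_COLS = TEMP_COLS + HUMIDITY_COLS + WEATHER_COLS

print(len(STATE_COLS))  # 22
```

Keeping the temperature columns at the front of the list is a design choice, not a requirement; it only makes it simpler to apply an action to the first nine entries of a state vector.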

**Target Variable:** Appliances (energy consumption in Wh).

**Action Space:**
The action space consists of discrete temperature adjustments:
* Action 0: Decrease temperature by 1°C
* Action 1: Maintain current temperature
* Action 2: Increase temperature by 1°C


- If the action is to decrease the temperature by 1°C, you'll adjust each temperature feature (T1 to T9) down by 1°C.
- If the action is to increase the temperature by 1°C, you'll adjust each temperature feature (T1 to T9) up by 1°C.
- Other features remain unchanged.
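
The action rules above can be sketched as a small helper. The names `apply_action`, `N_TEMPS`, and `ACTION_DELTAS` are hypothetical, and the sketch assumes the state is a flat list whose first nine entries are T1 to T9 in raw °C.

```python
# Assumption: the first 9 entries of the state vector are T1..T9 in raw degrees Celsius.
N_TEMPS = 9
ACTION_DELTAS = {0: -1.0, 1: 0.0, 2: 1.0}  # decrease, maintain, increase

def apply_action(state, action):
    """Return a new state with every temperature shifted by the action's delta."""
    delta = ACTION_DELTAS[action]
    # Only the temperature entries change; all other features are copied as-is.
    return [x + delta if i < N_TEMPS else x for i, x in enumerate(state)]

state = [23.0, 22.0, 21.0] + [22.0] * 6 + [40.0] * 13  # 9 temperatures + 13 other features
next_state = apply_action(state, 0)                    # decrease by 1°C
print(next_state[:3])  # [22.0, 21.0, 20.0]
```

If the features have been standardized, a 1°C change no longer equals one unit in scaled space; it corresponds to 1/σ for each column, so the shift would have to be applied before scaling or converted accordingly.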

**Policy (Actor):** A neural network that outputs a probability distribution over possible temperature adjustments.
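
To make "probability distribution over actions" concrete, the sketch below turns three raw network outputs (logits) into probabilities with a softmax and draws one action. It uses only the standard library rather than TensorFlow; `softmax` and `sample_action` are hypothetical helper names and the logits are made up. Sampling is one common choice for exploration; taking the highest-probability action is the greedy alternative.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(probs, rng=random):
    """Draw an action index in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Illustrative logits for [decrease, maintain, increase]
probs = softmax([0.2, 1.5, -0.3])
action = sample_action(probs)              # stochastic choice
greedy = max(range(3), key=lambda a: probs[a])  # highest-probability action
```

In a Keras model the same effect is usually obtained by giving the final layer three units and a softmax activation.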

**Value function (Critic):** A neural network that estimates the expected cumulative reward (energy savings) from a given state.
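
The critic's estimate is typically used to form a one-step target and an advantage signal for the actor. The following is a minimal sketch of that arithmetic; the function names, the `done` flag, and the default discount factor `gamma=0.99` are assumptions rather than values given in this assignment.

```python
def td_target(reward, next_value, gamma=0.99, done=False):
    """One-step critic target: r + gamma * V(s'), without bootstrapping at episode end."""
    return reward + (0.0 if done else gamma * next_value)

def advantage(reward, value, next_value, gamma=0.99, done=False):
    """Target minus the critic's current estimate V(s)."""
    return td_target(reward, next_value, gamma, done) - value

# Illustrative numbers, not taken from the dataset
adv = advantage(reward=-6.0, value=-5.0, next_value=-4.0, gamma=0.9)
# target = -6 + 0.9 * (-4) = -9.6, so adv is approximately -4.6
```

A negative advantage means the outcome was worse than the critic expected, so the actor update lowers the probability of the action that was taken.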

**Reward function:**
The reward function should reflect both comfort and energy efficiency across all temperature readings, i.e., it should balance minimizing temperature deviations against minimizing energy consumption.

* Calculate the penalty based on the deviation of each temperature from the target temperature and then aggregate these penalties.
* Measure the change in energy consumption before and after applying the RL action.
* Combine the comfort penalty and energy savings to get the final reward.

*Example:*

Target temperature=22°C

Initial Temperatures: T1=23, T2=22, T3=21, T4=23, T5=22, T6=21, T7=24, T8=22, T9=23

Action Taken: Decrease temperature by 1°C for each room

Resulting Temperatures: T1 = 22, T2 = 21, T3 = 20, T4 = 22, T5 = 21, T6 = 20, T7 = 23, T8 = 21, T9 = 22

Energy Consumption: 50 Wh (before RL adjustment) and 48 Wh (after RL adjustment)
* Energy Before (50 Wh): Use the energy consumption from the dataset at the current time step.
* Energy After (48 Wh): Use the energy consumption from the dataset at the next time step (if available).
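
The "current step / next step" lookup can be sketched as follows. The helper name `energy_pair` is hypothetical, and returning `None` at the last row is an assumption about how to handle the "if available" case (for example, by ending the episode there).

```python
def energy_pair(appliances, t):
    """Return (energy_before, energy_after) for step t, or None if there is no next row."""
    if t + 1 >= len(appliances):
        return None  # last row: no next time step available
    return appliances[t], appliances[t + 1]

appliances = [50, 48, 60]          # illustrative Wh readings
print(energy_pair(appliances, 0))  # (50, 48)
print(energy_pair(appliances, 2))  # None
```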

Consider only temperature features for deviation calculation.

Deviation = abs(Ti − Ttarget)

Deviations=[ abs(22−22), abs(21−22), abs(20−22), abs(22−22),  abs(21−22), abs(20−22), abs(23−22), abs(21−22), abs(22−22) ]

Deviations = [0, 1, 2, 0, 1, 2, 1, 1, 0], Sum of deviations = 8

Energy Savings = Energy Before − Energy After = 50 − 48 = 2 Wh

Reward = −Sum of Deviations + Energy Savings = −8 + 2 = −6
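
The calculation in this example can be reproduced with a few lines of code. This is a sketch under the equal-weight formula stated above; `compute_reward` and its parameter names are hypothetical.

```python
def compute_reward(temps_after, energy_before, energy_after, target=22.0):
    """Reward = -(sum of |T_i - target|) + (energy_before - energy_after)."""
    comfort_penalty = sum(abs(t - target) for t in temps_after)
    energy_savings = energy_before - energy_after
    return -comfort_penalty + energy_savings

# Resulting temperatures and energy readings from the example
temps_after = [22, 21, 20, 22, 21, 20, 23, 21, 22]
reward = compute_reward(temps_after, energy_before=50, energy_after=48)
print(reward)  # -6.0
```

Here one degree of deviation and one Wh of savings carry equal weight; if one term should dominate, each could be multiplied by its own coefficient, which would be an extension beyond what is specified here.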

#### Expected Outcomes
1. Pre-process the dataset to handle any missing values and create training and testing sets.
2. Implement the Actor-Critic algorithm using TensorFlow.
3. Train the model over 500 episodes to minimize energy consumption while maintaining an indoor temperature of 22°C.
4. Plot the total reward obtained in each episode to evaluate the learning progress.
5. Evaluate the trained model on the test set.
6. Provide graphs showing the convergence of the Actor and Critic losses.
7. Plot the learned policy by showing the action probabilities across different state values (e.g., temperature settings).
8. Analyze and compare the energy consumption before and after applying the reinforcement learning algorithm.


#### Code Execution

In [None]:
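import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

#### Load the dataset
data = pd.read_csv(file_path)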

# Check and replace missing values
# Pre process the data set to get the features and target and scale them

features = [  ]
target=[  ]

X=data[features]
y=data[target]

# Normalize them with Standard Scaler

# Split the data to training and testing sets (80% for training, 20% for testing)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

#### Defining Actor-Critic Model using TensorFlow (1 M)

In [None]:
### Define Actor Model

def build_actor_model():

    # define the NN model to give probability distribution over actions


    return model

### Define Critic Model

def build_critic_model():

    # define the NN model for value function estimation

    return model

state_space = 22
action = 3  # Decrease, Maintain, Increase

actor_model = build_actor_model()
critic_model = build_critic_model()

### Reward Function (0.5 M)

In [None]:
### Calculate Reward Function

def calculate_reward():

    # consider only temperature features for deviation calculation with target temperature as 22C
    # calculate energy savings by taking difference between energy before and after
    # calculate and return the reward

    return reward

#### Environment Simulation (0.5 M)


In [None]:
### Environment Simulation

def simulate_environment():

    temp_adjustment = 1
    # Increase or decrease each temperature by 1C

    # get the energy before from current index
    # get the energy after from next index

    # get the respective reward

    return next_state,reward

#### Implementation of Training Function (2 M)

In [None]:
# Train the Actor-Critic models

def train_function(episodes=500):

# for each episode:
    # get the action probabilities
    # choose the action with highest probability

    # simulate the environment with the chosen action

    # store results

    # update the state

# Compute critic target values with discount factor and rewards and the next values obtained from critic model for next state

# Update Critic model, capture critic loss

# Calculate advantages

# Update Actor Model, capture actor loss

# Print the mean reward of all states for each episode

#### Evaluate the performance of the model on test set (0.5 M)

In [None]:
# Evaluate the model on the test set

def evaluate_model():

    # predict the action and simulate the environment accordingly and get the respective next state

    # calculate rewards for test set



# Print the total reward obtained on the test set

### Plot the convergence of Actor and Critic losses (1 M)

In [None]:
# Plot the convergence of Actor and Critic losses

### Plot the learned policy - by showing the action probabilities across different state values (1 M)

In [None]:
# Plot the learned policy - by showing the action probabilities across different state values

# From the trained actor model, for each state in training set,
# plot the probability of each action (increasing/decreasing/maintaining) the temperature

#### Conclusion (0.5 M)

In [None]:
# Analyze and compare the energy consumption
# before and after applying the reinforcement learning algorithm.