# 1. Package imports

In [1]:
# Import packages
from modules import preprocess
from modules import training
import pandas as pd

# 2. Preprocessing the *dataset*: [Room Occupancy Estimation Data Set](https://www.kaggle.com/ananthr1/room-occupancy-estimation-data-set)

## Step 1: Loading the *dataset* and inspecting the data

In [2]:
df = preprocess.load_csv_data('Occupancy_Estimation.csv')
df.shape

(10129, 19)

In [3]:
df.head()

Unnamed: 0,Date,Time,S1_Temp,S2_Temp,S3_Temp,S4_Temp,S1_Light,S2_Light,S3_Light,S4_Light,S1_Sound,S2_Sound,S3_Sound,S4_Sound,S5_CO2,S5_CO2_Slope,S6_PIR,S7_PIR,Room_Occupancy_Count
0,2017/12/22,10:49:41,24.94,24.75,24.56,25.38,121,34,53,40,0.08,0.19,0.06,0.06,390,0.769231,0,0,1
1,2017/12/22,10:50:12,24.94,24.75,24.56,25.44,121,33,53,40,0.93,0.05,0.06,0.06,390,0.646154,0,0,1
2,2017/12/22,10:50:42,25.0,24.75,24.5,25.44,121,34,53,40,0.43,0.11,0.08,0.06,390,0.519231,0,0,1
3,2017/12/22,10:51:13,25.0,24.75,24.56,25.44,121,34,53,40,0.41,0.1,0.1,0.09,390,0.388462,0,0,1
4,2017/12/22,10:51:44,25.0,24.75,24.56,25.44,121,34,54,40,0.18,0.06,0.06,0.06,390,0.253846,0,0,1


## Step 2: Recoding the *Room_Occupancy_Count* column

The output variable (*Room_Occupancy_Count*) holds the number of people in the room. To simplify the problem, our goal is to detect whether the room is empty or occupied, not the exact number of people.

Therefore, **any value greater than 0 in that column is recoded as 1**, so that the output variable is binary:
- 0 -> absence.
- 1 -> presence.

In [4]:
# Show "Room_Occupancy_Count" before processing
df["Room_Occupancy_Count"]

0        1
1        1
2        1
3        1
4        1
        ..
10124    0
10125    0
10126    0
10127    0
10128    0
Name: Room_Occupancy_Count, Length: 10129, dtype: int64

In [5]:
# Process "Room_Occupancy_Count"
preprocess.recode_dataset_output(df)

# Show "Room_Occupancy_Count" after processing
df["Room_Occupancy_Count"]

0        1
1        1
2        1
3        1
4        1
        ..
10124    0
10125    0
10126    0
10127    0
10128    0
Name: Room_Occupancy_Count, Length: 10129, dtype: int64
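
The head and tail printed above contain only 0s and 1s, so the effect of the recoding is not visible in the output. A minimal sketch of what such a recoding could look like on toy data, assuming an in-place update; `recode_output_sketch` is a hypothetical name and not the actual implementation of `preprocess.recode_dataset_output`:

```python
import pandas as pd

def recode_output_sketch(df: pd.DataFrame, column: str = "Room_Occupancy_Count") -> None:
    """Hypothetical helper: map any count greater than 0 to 1, in place."""
    df[column] = (df[column] > 0).astype(int)

# Toy data with counts above 1, so the change is visible
toy = pd.DataFrame({"Room_Occupancy_Count": [0, 1, 2, 3, 0]})
recode_output_sketch(toy)
recoded = toy["Room_Occupancy_Count"].tolist()
print(recoded)
```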

## Step 3: Removing the temporal columns *Date* and *Time*

We filter out the *Date* and *Time* columns and leave them out of the process. Even if we included them (for example, by merging both into a single column and converting it to epoch format, as a long int), the values of successive samples in that column would be strongly correlated with each other, which would hurt most of the algorithms we use later.

Note that dropping the temporal information does not mean we believe each sample (table row) is independent of the others. We know that is not the case; rather, we assume the temporal information adds no value for predicting whether the room is occupied or empty. Otherwise, we would need considerably more complex models to account for the fact that samples in adjacent rows are consecutive in time.
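
To illustrate the point about correlation, the following sketch builds the discarded epoch column from the first rows of the dataset. The column name `epoch` and the choice to treat the timestamps as UTC are illustrative assumptions, not part of the original pipeline:

```python
import pandas as pd

sample = pd.DataFrame({
    "Date": ["2017/12/22", "2017/12/22", "2017/12/22"],
    "Time": ["10:49:41", "10:50:12", "10:50:42"],
})

# Combine both columns and parse them as a single timestamp
stamps = pd.to_datetime(sample["Date"] + " " + sample["Time"],
                        format="%Y/%m/%d %H:%M:%S")

# Whole seconds since 1970-01-01 (treated as UTC here, which is an assumption)
sample["epoch"] = (stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)

# Difference between consecutive samples, in seconds
steps = sample["epoch"].diff().dropna().tolist()
print(steps)
```

Each value is the previous one plus the sampling interval, so the column carries little information beyond the row order.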

In [6]:
# Filter Date and Time columns
preprocess.remove_time_columns(df)

# Show the result
df.head()

Unnamed: 0,S1_Temp,S2_Temp,S3_Temp,S4_Temp,S1_Light,S2_Light,S3_Light,S4_Light,S1_Sound,S2_Sound,S3_Sound,S4_Sound,S5_CO2,S5_CO2_Slope,S6_PIR,S7_PIR,Room_Occupancy_Count
0,24.94,24.75,24.56,25.38,121,34,53,40,0.08,0.19,0.06,0.06,390,0.769231,0,0,1
1,24.94,24.75,24.56,25.44,121,33,53,40,0.93,0.05,0.06,0.06,390,0.646154,0,0,1
2,25.0,24.75,24.5,25.44,121,34,53,40,0.43,0.11,0.08,0.06,390,0.519231,0,0,1
3,25.0,24.75,24.56,25.44,121,34,53,40,0.41,0.1,0.1,0.09,390,0.388462,0,0,1
4,25.0,24.75,24.56,25.44,121,34,54,40,0.18,0.06,0.06,0.06,390,0.253846,0,0,1
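
The `preprocess` module is not shown in this notebook. A minimal sketch of how the two columns could be dropped in place with pandas; `remove_time_columns_sketch` is a hypothetical name, and the real `preprocess.remove_time_columns` may differ:

```python
import pandas as pd

def remove_time_columns_sketch(df: pd.DataFrame) -> None:
    """Hypothetical helper: drop the Date and Time columns in place."""
    df.drop(columns=["Date", "Time"], inplace=True)

# One-row toy frame with a subset of the dataset's columns
toy = pd.DataFrame({
    "Date": ["2017/12/22"],
    "Time": ["10:49:41"],
    "S1_Temp": [24.94],
    "Room_Occupancy_Count": [1],
})
remove_time_columns_sketch(toy)
remaining = list(toy.columns)
print(remaining)
```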


# 3. Training and carbon footprint tracking

### Codecarbon:

In [7]:
# Logistic Regression
training.train_LR_codecarbon(df,0.25)

[codecarbon INFO @ 11:48:57] [setup] RAM Tracking...
[codecarbon INFO @ 11:48:57] [setup] GPU Tracking...
[codecarbon INFO @ 11:48:57] No GPU found.
[codecarbon INFO @ 11:48:57] [setup] CPU Tracking...
[codecarbon INFO @ 11:49:00] CPU Model on constant consumption mode: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
[codecarbon INFO @ 11:49:00] >>> Tracker's metadata:
[codecarbon INFO @ 11:49:00]   Platform system: Linux-5.15.0-46-generic-x86_64-with-glibc2.35
[codecarbon INFO @ 11:49:00]   Python version: 3.10.4
[codecarbon INFO @ 11:49:00]   Available RAM : 1.930 GB
[codecarbon INFO @ 11:49:00]   CPU count: 1
[codecarbon INFO @ 11:49:00]   CPU model: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
[codecarbon INFO @ 11:49:00]   GPU count: None
[codecarbon INFO @ 11:49:00]   GPU model: None
[codecarbon INFO @ 11:49:03] Energy consumed for RAM : 0.000000 kWh. RAM Power : 0.7236471176147461 W
[codecarbon INFO @ 11:49:03] Energy consumed for all CPUs : 0.000001 kWh. All CPUs Power : 32.5 W
[codecarbo

Emissions: 1.076942502693517e-07 kg
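
The `training` module is not shown in this notebook. A minimal sketch of how a training call could be wrapped with CodeCarbon, assuming its documented `EmissionsTracker` interface with `start()` and `stop()`, where `stop()` returns the estimated emissions in kg. `track_emissions` and `make_codecarbon_tracker` are hypothetical names, not functions of the original module:

```python
from typing import Any, Callable, Tuple

def track_emissions(train_fn: Callable[[], Any], tracker: Any) -> Tuple[Any, float]:
    """Run train_fn between tracker.start() and tracker.stop().

    `tracker` is any object exposing start() and stop(), where stop()
    returns the estimated emissions in kg.
    """
    tracker.start()
    try:
        result = train_fn()
    finally:
        # Always stop the tracker, even if training raises
        emissions_kg = tracker.stop()
    return result, emissions_kg

def make_codecarbon_tracker(project_name: str = "codecarbon"):
    """Build a CodeCarbon tracker; imported lazily so the sketch loads without it."""
    from codecarbon import EmissionsTracker
    return EmissionsTracker(project_name=project_name)

# Usage (requires codecarbon; fit_model is a placeholder for the training step):
# model, kg = track_emissions(lambda: fit_model(X_train, y_train),
#                             make_codecarbon_tracker("LogisticRegression"))
# print(f"Emissions: {kg} kg")
```

Passing a distinct `project_name` per model would make the rows of `emissions.csv` easier to tell apart.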


In [8]:
# Random Forest
training.train_RF_codecarbon(df,0.25)

[codecarbon INFO @ 11:49:03] [setup] RAM Tracking...
[codecarbon INFO @ 11:49:03] [setup] GPU Tracking...
[codecarbon INFO @ 11:49:03] No GPU found.
[codecarbon INFO @ 11:49:03] [setup] CPU Tracking...
[codecarbon INFO @ 11:49:06] CPU Model on constant consumption mode: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
[codecarbon INFO @ 11:49:06] >>> Tracker's metadata:
[codecarbon INFO @ 11:49:06]   Platform system: Linux-5.15.0-46-generic-x86_64-with-glibc2.35
[codecarbon INFO @ 11:49:06]   Python version: 3.10.4
[codecarbon INFO @ 11:49:06]   Available RAM : 1.930 GB
[codecarbon INFO @ 11:49:06]   CPU count: 1
[codecarbon INFO @ 11:49:06]   CPU model: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
[codecarbon INFO @ 11:49:06]   GPU count: None
[codecarbon INFO @ 11:49:06]   GPU model: None
[codecarbon INFO @ 11:49:09] Energy consumed for RAM : 0.000000 kWh. RAM Power : 0.7236471176147461 W
[codecarbon INFO @ 11:49:09] Energy consumed for all CPUs : 0.000002 kWh. All CPUs Power : 32.5 W
[codecarbo

Emissions: 3.4383364130684536e-07 kg


In [9]:
# Linear SVC (Support Vector Machines)
training.train_SVC_codecarbon(df,0.25)

[codecarbon INFO @ 11:49:09] [setup] RAM Tracking...
[codecarbon INFO @ 11:49:09] [setup] GPU Tracking...
[codecarbon INFO @ 11:49:09] No GPU found.
[codecarbon INFO @ 11:49:09] [setup] CPU Tracking...
[codecarbon INFO @ 11:49:11] CPU Model on constant consumption mode: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
[codecarbon INFO @ 11:49:11] >>> Tracker's metadata:
[codecarbon INFO @ 11:49:11]   Platform system: Linux-5.15.0-46-generic-x86_64-with-glibc2.35
[codecarbon INFO @ 11:49:11]   Python version: 3.10.4
[codecarbon INFO @ 11:49:11]   Available RAM : 1.930 GB
[codecarbon INFO @ 11:49:11]   CPU count: 1
[codecarbon INFO @ 11:49:11]   CPU model: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
[codecarbon INFO @ 11:49:11]   GPU count: None
[codecarbon INFO @ 11:49:11]   GPU model: None
[codecarbon INFO @ 11:49:14] Energy consumed for RAM : 0.000000 kWh. RAM Power : 0.7236471176147461 W
[codecarbon INFO @ 11:49:14] Energy consumed for all CPUs : 0.000000 kWh. All CPUs Power : 32.5 W
[codecarbo

Emissions: 2.9016005969000015e-08 kg


In [10]:
# Show results
codecarbon_emissions = pd.read_csv("emissions.csv",sep=",")
codecarbon_emissions

Unnamed: 0,timestamp,project_name,run_id,duration,emissions,emissions_rate,cpu_power,gpu_power,ram_power,cpu_energy,...,python_version,cpu_count,cpu_model,gpu_count,gpu_model,longitude,latitude,ram_total_size,tracking_mode,on_cloud
0,2022-08-28T11:22:25,codecarbon,6c51acf7-babd-40dd-9c55-f5783ba513d5,0.05066,8.637366e-08,0.001705,32.5,0.0,0.723647,4.459752e-07,...,3.10.4,1,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz,,,-3.8661,40.3192,1.929726,machine,N
1,2022-08-28T11:22:30,codecarbon,454248c0-e227-44c6-b057-06ca16acc51b,0.290324,4.998027e-07,0.001722,32.5,0.0,0.723647,2.574127e-06,...,3.10.4,1,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz,,,-3.8661,40.3192,1.929726,machine,N
2,2022-08-28T11:22:35,codecarbon,a055224c-422b-40b8-aef6-ec355f8a68f5,0.017249,2.831957e-08,0.001642,32.5,0.0,0.723647,1.460138e-07,...,3.10.4,1,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz,,,-3.8661,40.3192,1.929726,machine,N
3,2022-08-28T11:24:52,codecarbon,ba7db359-927f-4460-9921-356fbf8e0b95,0.037277,6.284057e-08,0.001686,32.5,0.0,0.723647,3.245998e-07,...,3.10.4,1,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz,,,-3.8661,40.3192,1.929726,machine,N
4,2022-08-28T11:24:57,codecarbon,45a3c187-388b-457d-8c5e-5bd455eb2323,0.229514,3.947145e-07,0.00172,32.5,0.0,0.723647,2.03287e-06,...,3.10.4,1,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz,,,-3.8661,40.3192,1.929726,machine,N
5,2022-08-28T11:25:02,codecarbon,35b951c8-c210-4a62-8f84-1bb53d3415f1,0.023467,3.3676e-08,0.001435,32.5,0.0,0.723647,1.741068e-07,...,3.10.4,1,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz,,,-3.8661,40.3192,1.929726,machine,N
6,2022-08-28T11:29:01,codecarbon,cebf61fe-9831-4862-bd45-471a4979cb63,0.024377,4.013155e-08,0.001646,32.5,0.0,0.723647,2.084568e-07,...,3.10.4,1,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz,,,-3.8661,40.3192,1.929726,machine,N
7,2022-08-28T11:29:06,codecarbon,68c4a4d1-b954-4b48-b944-99f3d3ccddad,0.266339,4.646352e-07,0.001745,32.5,0.0,0.723647,2.393552e-06,...,3.10.4,1,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz,,,-3.8661,40.3192,1.929726,machine,N
8,2022-08-28T11:29:11,codecarbon,c3dffb6f-68bf-4d05-89b9-1a931afa1ef8,0.033513,5.688293e-08,0.001697,32.5,0.0,0.723647,2.93035e-07,...,3.10.4,1,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz,,,-3.8661,40.3192,1.929726,machine,N
9,2022-08-28T11:41:37,codecarbon,61af641e-09ac-4abc-9426-a96fc9f6f687,0.083925,1.45145e-07,0.001729,32.5,0.0,0.723647,7.474949e-07,...,3.10.4,1,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz,,,-3.8661,40.3192,1.929726,machine,N


### Eco2ai:

In [11]:
# Logistic Regression
training.train_LR_eco2ai(df,0.25)

    If you use a VPN, you may have problems with identifying your country by IP.
    It is recommended to disable VPN or
    manually install the ISO-Alpha-2 code of your country during initialization of the Tracker() class.
    You can find the ISO-Alpha-2 code of your country here: https://www.iban.com/country-codes
    

There is no any available GPU devices or your gpu is not supported by Nvidia library!
The thacker will consider CPU usage only
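
The warning above suggests passing the country code explicitly. A sketch of how an Eco2AI tracker could be configured, assuming that `eco2ai.Tracker` accepts `project_name`, `experiment_description`, `file_name` and `alpha_2_code` as described in the library's README. The argument values mirror the results file shown later in this section; `build_eco2ai_kwargs` and `train_with_eco2ai` are hypothetical helpers:

```python
def build_eco2ai_kwargs(model_name: str, country_code: str = "ES") -> dict:
    """Hypothetical helper: keyword arguments for eco2ai.Tracker."""
    return {
        "project_name": "TFG_Project",
        "experiment_description": f"training {model_name} model",
        "file_name": "eco2ai_emissions.csv",
        # Assumed parameter name, taken from the Eco2AI README
        "alpha_2_code": country_code,
    }

def train_with_eco2ai(train_fn, model_name: str):
    """Run train_fn while an Eco2AI tracker is active (requires eco2ai)."""
    import eco2ai  # imported lazily so the sketch loads without the package
    tracker = eco2ai.Tracker(**build_eco2ai_kwargs(model_name))
    tracker.start()
    try:
        return train_fn()
    finally:
        # Stopping the tracker writes the run to the results file
        tracker.stop()

kwargs = build_eco2ai_kwargs("LinearSVC")
print(kwargs["experiment_description"])
```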



In [12]:
# Random Forest
training.train_RF_eco2ai(df,0.25)

    If you use a VPN, you may have problems with identifying your country by IP.
    It is recommended to disable VPN or
    manually install the ISO-Alpha-2 code of your country during initialization of the Tracker() class.
    You can find the ISO-Alpha-2 code of your country here: https://www.iban.com/country-codes
    

There is no any available GPU devices or your gpu is not supported by Nvidia library!
The thacker will consider CPU usage only



In [13]:
# Linear SVC (Support Vector Machines)
training.train_SVC_eco2ai(df,0.25)

    If you use a VPN, you may have problems with identifying your country by IP.
    It is recommended to disable VPN or
    manually install the ISO-Alpha-2 code of your country during initialization of the Tracker() class.
    You can find the ISO-Alpha-2 code of your country here: https://www.iban.com/country-codes
    

There is no any available GPU devices or your gpu is not supported by Nvidia library!
The thacker will consider CPU usage only



In [14]:
# Show results
eco2ai_emissions = pd.read_csv("eco2ai_emissions.csv",sep=",")
eco2ai_emissions

Unnamed: 0,project_name,experiment_description(model type etc.),start_time,duration(s),power_consumption(kWTh),CO2_emissions(kg),CPU_name,GPU_name,OS,region/country
0,TFG_Project,training LogisticRegression model,2022-08-28 11:29:13,0.39286,4.948666e-09,9.873331e-10,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz/1 dev...,0 device(s),Linux,ES/Aragon
1,TFG_Project,training RandomForest model,2022-08-28 11:29:15,0.457908,5.902811e-09,1.177699e-09,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz/1 dev...,0 device(s),Linux,ES/Aragon
2,TFG_Project,training LogisticRegression model,2022-08-28 11:29:17,0.198576,2.4205e-09,4.829261e-10,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz/1 dev...,0 device(s),Linux,ES/Aragon
3,TFG_Project,training LogisticRegression model,2022-08-28 11:41:49,0.41671,5.583236e-09,1.113939e-09,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz/1 dev...,0 device(s),Linux,ES/Aragon
4,TFG_Project,training RandomForest model,2022-08-28 11:41:51,0.575514,8.07255e-09,1.610595e-09,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz/1 dev...,0 device(s),Linux,ES/Aragon
5,TFG_Project,training LogisticRegression model,2022-08-28 11:41:53,0.221485,2.930131e-09,5.846051e-10,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz/1 dev...,0 device(s),Linux,ES/Aragon
6,TFG_Project,training LogisticRegression model,2022-08-28 11:42:48,0.236932,3.138547e-09,6.261873e-10,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz/1 dev...,0 device(s),Linux,ES/Aragon
7,TFG_Project,training RandomForest model,2022-08-28 11:42:50,0.578525,8.438694e-09,1.683646e-09,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz/1 dev...,0 device(s),Linux,ES/Aragon
8,TFG_Project,training LogisticRegression model,2022-08-28 11:42:52,0.232657,3.025994e-09,6.037313e-10,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz/1 dev...,0 device(s),Linux,ES/Aragon
9,TFG_Project,training LogisticRegression model,2022-08-28 11:44:43,0.325646,2.455719e-09,4.899528e-10,Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz/1 dev...,0 device(s),Linux,ES/Aragon
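
Because the results file accumulates one row per run, a per-model summary can be easier to compare than the raw table. A sketch using the first three rows above, with the column names exactly as they appear in this output:

```python
import pandas as pd

label_col = "experiment_description(model type etc.)"
runs = pd.DataFrame({
    label_col: [
        "training LogisticRegression model",
        "training RandomForest model",
        "training LogisticRegression model",
    ],
    "duration(s)": [0.39286, 0.457908, 0.198576],
    "CO2_emissions(kg)": [9.873331e-10, 1.177699e-09, 4.829261e-10],
})

# Number of runs and mean emissions for each experiment description
summary = runs.groupby(label_col).agg(
    n_runs=("CO2_emissions(kg)", "size"),
    mean_kg=("CO2_emissions(kg)", "mean"),
)
print(summary)
```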


### Carbontracker

In [15]:
training.train_LR_carbontracker(df,0.25,1)

CarbonTracker: The following components were found: CPU with device(s) .


TypeError: 'NoneType' object is not callable
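
The cause of this error is not visible from the output. For reference, Carbontracker is organized around training epochs. The sketch below shows that call pattern, assuming the `CarbonTracker(epochs=...)`, `epoch_start()`, `epoch_end()` and `stop()` interface from the library's documentation. `run_epochs` and `make_carbontracker` are hypothetical helpers, and reading the third argument (`1`) of `train_LR_carbontracker` as the number of epochs is an assumption:

```python
from typing import Any, Callable

def run_epochs(train_one_epoch: Callable[[int], None], tracker: Any, epochs: int) -> None:
    """Wrap each training epoch with epoch_start() and epoch_end()."""
    for epoch in range(epochs):
        tracker.epoch_start()
        train_one_epoch(epoch)
        tracker.epoch_end()
    # Release the tracker once all epochs are done
    tracker.stop()

def make_carbontracker(epochs: int):
    """Build a Carbontracker instance (requires carbontracker)."""
    from carbontracker.tracker import CarbonTracker
    return CarbonTracker(epochs=epochs)

# Usage (requires carbontracker; fit_once is a placeholder for one training pass):
# run_epochs(lambda epoch: fit_once(), make_carbontracker(epochs=1), epochs=1)
```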