# Analysis of Energy Consumption in the United States (1973-2024)

**Author:** Juan Manuel Martínez Estrada  
**Date:** 2025-05-02  
**Version:** 1.0  

---



## Table of Contents

1. [Data Loading and Initial Understanding - U.S. Renewable Energy Consumption](#1-data-loading-and-initial-understanding---us-renewable-energy-consumption)

    1. [Environment Setup](#11-environment-setup)
    2. [Notebook Objectives](#12-notebook-objectives)


# 1. Data Loading and Initial Understanding - U.S. Renewable Energy Consumption

---


## 1.1 Environment Setup

* Imports

In [14]:
# Imports
import os
import sys
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

* Definition of constants (paths to raw data files).

In [15]:
# Path to the original dataset
data_path = os.path.join('..', 'data', '01_raw', 'dataset.csv')


## 1.2 Notebook Objectives
* Import libraries.

* Load the raw dataset (`US_Renewable_Energy_Consumption.csv`).

In [16]:
# Check that the file exists before trying to read it
if not os.path.exists(data_path):
    print(f"The file {data_path} does not exist.")
    sys.exit(1)

# Load the dataset
df = pd.read_csv(data_path, sep=',', encoding='utf-8')

* Perform an initial inspection of the structure and contents.

In [17]:
df.shape  # Check the shape of the dataset

(3065, 17)
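
As a rough sanity check, the row count can be compared with the number of month-sector combinations the data would contain if every sector had exactly one row per month. The sketch below assumes the data runs from January 1973 through January 2024 and that `Sector` takes five distinct values; `months`, `n_sectors`, and `expected_rows` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Assumption: one row per month and sector, January 1973 through January 2024.
months = pd.period_range(start="1973-01", end="2024-01", freq="M")
n_months = len(months)

n_sectors = 5  # assumption: five distinct values in 'Sector'
expected_rows = n_months * n_sectors

print(n_months, expected_rows)
```

If `expected_rows` matches `df.shape[0]`, that is at least consistent with there being no missing or duplicated month-sector rows, although it does not prove it.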

In [18]:
df.head(10)  # Show the first 10 rows of the dataset

,Year,Month,Sector,Hydroelectric Power,Geothermal Energy,Solar Energy,Wind Energy,Wood Energy,Waste Energy,"Fuel Ethanol, Excluding Denaturant",Biomass Losses and Co-products,Biomass Energy,Total Renewable Energy,Renewable Diesel Fuel,Other Biofuels,Conventional Hydroelectric Power,Biodiesel
0,1973,1,Commerical,0.0,0.0,0.0,0.0,0.57,0.0,0.0,0.0,0.57,0.57,0.0,0.0,0.0,0.0
1,1973,1,Electric Power,0.0,0.49,0.0,0.0,0.054,0.157,0.0,0.0,0.211,89.223,0.0,0.0,88.522,0.0
2,1973,1,Industrial,1.04,0.0,0.0,0.0,98.933,0.0,0.0,0.0,98.933,99.973,0.0,0.0,0.0,0.0
3,1973,1,Residential,0.0,0.0,0.0,0.0,30.074,0.0,0.0,0.0,0.0,30.074,0.0,0.0,0.0,0.0
4,1973,1,Transportation,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,1973,2,Commerical,0.0,0.0,0.0,0.0,0.515,0.0,0.0,0.0,0.515,0.515,0.0,0.0,0.0,0.0
6,1973,2,Electric Power,0.0,0.448,0.0,0.0,0.157,0.144,0.0,0.0,0.301,79.331,0.0,0.0,78.582,0.0
7,1973,2,Industrial,0.962,0.0,0.0,0.0,89.359,0.0,0.0,0.0,89.359,90.32,0.0,0.0,0.0,0.0
8,1973,2,Residential,0.0,0.0,0.0,0.0,27.164,0.0,0.0,0.0,0.0,27.164,0.0,0.0,0.0,0.0
9,1973,2,Transportation,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [19]:
df.tail(10)  # Show the last 10 rows of the dataset

,Year,Month,Sector,Hydroelectric Power,Geothermal Energy,Solar Energy,Wind Energy,Wood Energy,Waste Energy,"Fuel Ethanol, Excluding Denaturant",Biomass Losses and Co-products,Biomass Energy,Total Renewable Energy,Renewable Diesel Fuel,Other Biofuels,Conventional Hydroelectric Power,Biodiesel
3055,2023,12,Commerical,0.0,1.673,3.913,0.037,7.003,6.396,2.653,0.0,16.051,21.739,0.0,0.0,0.0,0.0
3056,2023,12,Electric Power,0.0,4.821,31.162,130.846,11.91,15.135,0.0,0.0,27.044,259.513,0.0,0.0,65.64,0.0
3057,2023,12,Industrial,0.269,0.357,0.942,0.037,104.598,14.357,1.666,74.073,194.694,196.299,0.0,0.0,0.0,0.0
3058,2023,12,Residential,0.0,3.363,14.658,0.0,38.247,0.0,0.0,0.0,0.0,56.268,0.0,0.0,0.0,0.0
3059,2023,12,Transportation,0.0,0.0,0.0,0.0,0.0,0.0,93.57,0.0,156.234,0.0,38.344,4.101,0.0,20.219
3060,2024,1,Commerical,0.073,1.669,4.267,0.036,7.053,6.233,2.441,0.0,15.728,21.773,0.0,0.0,0.0,0.0
3061,2024,1,Electric Power,0.0,4.667,32.707,119.265,15.071,13.873,0.0,0.0,28.944,257.661,0.0,0.0,72.078,0.0
3062,2024,1,Industrial,0.308,0.356,0.987,0.035,104.878,14.171,1.533,67.742,188.325,190.011,0.0,0.0,0.0,0.0
3063,2024,1,Residential,0.0,3.354,14.897,0.0,34.065,0.0,0.0,0.0,0.0,52.316,0.0,0.0,0.0,0.0
3064,2024,1,Transportation,0.0,0.0,0.0,0.0,0.0,0.0,86.098,0.0,140.188,0.0,30.78,3.442,0.0,19.867


In [20]:
df.info()  # General information about the dataset

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3065 entries, 0 to 3064
Data columns (total 17 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   Year                                3065 non-null   int64  
 1   Month                               3065 non-null   int64  
 2   Sector                              3065 non-null   object 
 3   Hydroelectric Power                 3065 non-null   float64
 4   Geothermal Energy                   3065 non-null   float64
 5   Solar Energy                        3065 non-null   float64
 6   Wind Energy                         3065 non-null   float64
 7   Wood Energy                         3065 non-null   float64
 8   Waste Energy                        3065 non-null   float64
 9   Fuel Ethanol, Excluding Denaturant  3065 non-null   float64
 10  Biomass Losses and Co-products      3065 non-null   float64
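 11  Biomass Energy                      3065 non-null   float64
 12  Total Renewable Energy              3065 non-null   float64
 13  Renewable Diesel Fuel               3065 non-null   float64
 14  Other Biofuels                      3065 non-null   float64
 15  Conventional Hydroelectric Power    3065 non-null   float64
 16  Biodiesel                           3065 non-null   float64
dtypes: float64(14), int64(2), object(1)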

* Identify preliminary data types and possible problems.

In [21]:
df.dtypes  # Data type of each column

Year                                    int64
Month                                   int64
Sector                                 object
Hydroelectric Power                   float64
Geothermal Energy                     float64
Solar Energy                          float64
Wind Energy                           float64
Wood Energy                           float64
Waste Energy                          float64
Fuel Ethanol, Excluding Denaturant    float64
Biomass Losses and Co-products        float64
Biomass Energy                        float64
Total Renewable Energy                float64
Renewable Diesel Fuel                 float64
Other Biofuels                        float64
Conventional Hydroelectric Power      float64
Biodiesel                             float64
dtype: object

In [22]:
df.isna().sum()  # Count missing values in each column

Year                                  0
Month                                 0
Sector                                0
Hydroelectric Power                   0
Geothermal Energy                     0
Solar Energy                          0
Wind Energy                           0
Wood Energy                           0
Waste Energy                          0
Fuel Ethanol, Excluding Denaturant    0
Biomass Losses and Co-products        0
Biomass Energy                        0
Total Renewable Energy                0
Renewable Diesel Fuel                 0
Other Biofuels                        0
Conventional Hydroelectric Power      0
Biodiesel                             0
dtype: int64

In [23]:
df.isnull().sum()  # isnull() is an alias of isna(), so this repeats the previous check

Year                                  0
Month                                 0
Sector                                0
Hydroelectric Power                   0
Geothermal Energy                     0
Solar Energy                          0
Wind Energy                           0
Wood Energy                           0
Waste Energy                          0
Fuel Ethanol, Excluding Denaturant    0
Biomass Losses and Co-products        0
Biomass Energy                        0
Total Renewable Energy                0
Renewable Diesel Fuel                 0
Other Biofuels                        0
Conventional Hydroelectric Power      0
Biodiesel                             0
dtype: int64

* Obtain an initial basic statistical summary.

In [24]:
df.describe()  # Summary statistics for the numeric columns

,Year,Month,Hydroelectric Power,Geothermal Energy,Solar Energy,Wind Energy,Wood Energy,Waste Energy,"Fuel Ethanol, Excluding Denaturant",Biomass Losses and Co-products,Biomass Energy,Total Renewable Energy,Renewable Diesel Fuel,Other Biofuels,Conventional Hydroelectric Power,Biodiesel
count,3065.0,3065.0,3065.0,3065.0,3065.0,3065.0,3065.0,3065.0,3065.0,3065.0,3065.0,3065.0,3065.0,3065.0,3065.0,3065.0
mean,1998.042414,6.491028,0.169759,1.146369,2.015008,4.282404,36.644408,5.820124,6.976648,4.834706,46.285969,70.872209,0.428949,0.031752,15.757374,0.95372
std,14.747378,3.456934,0.373819,1.550857,5.774511,18.124793,46.900639,8.247359,21.91192,15.601717,64.24152,71.197761,2.68785,0.258149,32.134059,3.985003
min,1973.0,1.0,-0.002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,1985.0,3.0,0.0,0.0,0.0,0.0,0.483,0.0,0.0,0.0,0.258,2.07,0.0,0.0,0.0,0.0
50%,1998.0,6.0,0.0,0.357,0.004,0.0,12.062,0.108,0.007,0.0,9.716,50.984,0.0,0.0,0.0,0.0
75%,2011.0,9.0,0.036,1.673,0.774,0.001,51.808,12.764,1.283,0.0,89.359,126.982,0.0,0.0,0.0,0.0
max,2024.0,12.0,2.047,5.951,64.04,157.409,183.628,32.875,104.42,75.373,233.2,308.175,38.344,4.101,117.453,27.871
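
The summary reports a minimum of -0.002 for `Hydroelectric Power`, while every other consumption column has a minimum of 0, so negative entries may be worth a closer look. A minimal sketch of how they could be counted per column, shown on a small made-up frame (`demo` and its values are illustrative and not taken from the dataset):

```python
import pandas as pd

# Illustrative data only; in the notebook the same steps would be applied to df.
demo = pd.DataFrame({
    "Sector": ["Industrial", "Industrial", "Residential"],
    "Hydroelectric Power": [1.04, -0.002, 0.0],
    "Wood Energy": [98.933, 89.359, 30.074],
})

numeric = demo.select_dtypes(include="number")     # keep only the numeric columns
negative_counts = (numeric < 0).sum()              # number of negative entries per column
negative_rows = demo[(numeric < 0).any(axis=1)]    # rows with at least one negative value

print(negative_counts)
print(negative_rows)
```

Whether such values are measurement artifacts or legitimate adjustments cannot be decided from the summary alone; the check only locates them.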


* Review of data types (do `Year`/`Month` need to be combined into a datetime?).

In [25]:
# Build a 'Date' column from 'Year' and 'Month' (first day of each month) and use it as the index
if 'Date' not in df.columns and {'Year', 'Month'}.issubset(df.columns):
    # to_datetime assembles dates from year/month/day columns; the day is fixed to 1
    df['Date'] = pd.to_datetime(df[['Year', 'Month']].assign(Day=1))
    df = df.drop(columns=['Year', 'Month'])  # drop the columns now encoded in 'Date'
    df = df.set_index('Date')  # set 'Date' as the index

# Check the result
df.head(10)  # Show the first 10 rows of the dataset


Date,Sector,Hydroelectric Power,Geothermal Energy,Solar Energy,Wind Energy,Wood Energy,Waste Energy,"Fuel Ethanol, Excluding Denaturant",Biomass Losses and Co-products,Biomass Energy,Total Renewable Energy,Renewable Diesel Fuel,Other Biofuels,Conventional Hydroelectric Power,Biodiesel
1973-01-01,Commerical,0.0,0.0,0.0,0.0,0.57,0.0,0.0,0.0,0.57,0.57,0.0,0.0,0.0,0.0
1973-01-01,Electric Power,0.0,0.49,0.0,0.0,0.054,0.157,0.0,0.0,0.211,89.223,0.0,0.0,88.522,0.0
1973-01-01,Industrial,1.04,0.0,0.0,0.0,98.933,0.0,0.0,0.0,98.933,99.973,0.0,0.0,0.0,0.0
1973-01-01,Residential,0.0,0.0,0.0,0.0,30.074,0.0,0.0,0.0,0.0,30.074,0.0,0.0,0.0,0.0
1973-01-01,Transportation,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1973-02-01,Commerical,0.0,0.0,0.0,0.0,0.515,0.0,0.0,0.0,0.515,0.515,0.0,0.0,0.0,0.0
1973-02-01,Electric Power,0.0,0.448,0.0,0.0,0.157,0.144,0.0,0.0,0.301,79.331,0.0,0.0,78.582,0.0
1973-02-01,Industrial,0.962,0.0,0.0,0.0,89.359,0.0,0.0,0.0,89.359,90.32,0.0,0.0,0.0,0.0
1973-02-01,Residential,0.0,0.0,0.0,0.0,27.164,0.0,0.0,0.0,0.0,27.164,0.0,0.0,0.0,0.0
1973-02-01,Transportation,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
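
With a `DatetimeIndex` in place, rows can be selected with partial date strings and aggregated by calendar period. A minimal sketch on a made-up frame (`demo`, `rows_1973`, `yearly_total`, and the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Illustrative data only, mimicking the Year/Month layout of the raw file.
demo = pd.DataFrame({
    "Year": [1973, 1973, 1974],
    "Month": [1, 2, 1],
    "Total Renewable Energy": [10.0, 20.0, 5.0],
})

# Assemble the date from year/month plus a fixed day, then index by it.
demo["Date"] = pd.to_datetime(demo[["Year", "Month"]].assign(Day=1))
demo = demo.drop(columns=["Year", "Month"]).set_index("Date")

rows_1973 = demo.loc["1973"]  # partial-string indexing: every row dated in 1973
yearly_total = demo.groupby(demo.index.year)["Total Renewable Energy"].sum()

print(rows_1973)
print(yearly_total)
```

In the real dataset each date appears once per sector, so an aggregation like this would normally also group by `Sector`.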


* Identification of categorical columns (`Sector`).

In [26]:
df['Sector'].unique()  # Unique values of the 'Sector' column

array(['Commerical', 'Electric Power', 'Industrial', 'Residential',
       'Transportation'], dtype=object)
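
The output includes the label `Commerical`, which appears to be a misspelling of `Commercial`. A minimal sketch of how the label could be normalized and the column stored as a categorical type, shown on a made-up series (`sectors` and `cleaned` are illustrative names; whether to rename the label at this stage is a choice for the later cleaning steps):

```python
import pandas as pd

# Illustrative data only; in the notebook this would be df['Sector'].
sectors = pd.Series(
    ["Commerical", "Electric Power", "Industrial", "Commerical"],
    name="Sector",
)

# Fix the apparent misspelling, then store the column as a categorical.
cleaned = sectors.replace({"Commerical": "Commercial"}).astype("category")

print(cleaned.cat.categories.tolist())
```

A categorical dtype makes the small, fixed set of sector labels explicit and can reduce memory use compared with plain `object` strings.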