# Global Renewable Energy Prediction Project

This dataset contains renewable energy indicators for countries worldwide from 2000 to 2020. It includes the following features:

- **Entity**: The name of the country or region for which the data are reported.
- **Year**: The year for which the data are reported, from 2000 to 2020.
- **Access to electricity (% of population)**: The percentage of the population with access to electricity.
- **Access to clean fuels for cooking (% of population)**: The percentage of the population that relies primarily on clean fuels for cooking.
- **Renewable-electricity-generating-capacity-per-capita**: Installed renewable energy capacity per person.
- **Financial flows to developing countries (US $)**: Aid and assistance from developed countries for clean energy projects.
- **Renewable energy share in total final energy consumption (%)**: The percentage of renewable energy in total final energy consumption.
- **Electricity from fossil fuels (TWh)**: Electricity generated from fossil fuels (coal, oil, gas), in terawatt-hours.
- **Electricity from nuclear (TWh)**: Electricity generated from nuclear power, in terawatt-hours.
- **Electricity from renewables (TWh)**: Electricity generated from renewable sources (hydro, solar, wind, etc.), in terawatt-hours.
- **Low-carbon electricity (% electricity)**: The percentage of electricity that comes from low-carbon sources (nuclear and renewables).
- **Primary energy consumption per capita (kWh/person)**: Energy consumption per person, in kilowatt-hours.
- **Energy intensity level of primary energy (MJ/$2017 PPP GDP)**: Energy use per unit of GDP at purchasing power parity.
- **Value_co2_emissions_kt_by_country**: Carbon dioxide emissions per country, in kilotons.
- **Renewables (% equivalent primary energy)**: The share of equivalent primary energy derived from renewable sources.
- **GDP growth (annual %)**: Annual GDP growth rate, based on constant local currency.
- **GDP per capita**: Gross domestic product per person.
- **Density (P/Km2)**: Population density, in people per square kilometer.
- **Land Area (Km2)**: Total land area, in square kilometers.
- **Latitude**: Latitude of the country's centroid, in decimal degrees.
- **Longitude**: Longitude of the country's centroid, in decimal degrees.
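The **Low-carbon electricity (% electricity)** feature can be related to the three generation columns in TWh. The following is a minimal sketch, assuming total generation is approximated as fossil + nuclear + renewables; the dataset's own figure may be computed from a broader total, so small differences are expected. The helper name `low_carbon_share` is hypothetical, and the example rows are toy values rather than data from the file.

```python
import pandas as pd

def low_carbon_share(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: estimate the low-carbon share (%) from the three TWh columns."""
    fossil = df["Electricity from fossil fuels (TWh)"]
    nuclear = df["Electricity from nuclear (TWh)"]
    renewables = df["Electricity from renewables (TWh)"]
    # Assumption: total generation = fossil + nuclear + renewables
    total = fossil + nuclear + renewables
    # Rows with zero total generation would divide by zero, so they become NaN instead
    share = 100 * (nuclear + renewables) / total.where(total > 0)
    return share.rename("low_carbon_pct_estimate")

# Toy values for illustration only (not taken from the dataset)
example = pd.DataFrame({
    "Electricity from fossil fuels (TWh)": [30.0, 10.0, 0.0],
    "Electricity from nuclear (TWh)": [0.0, 5.0, 0.0],
    "Electricity from renewables (TWh)": [70.0, 5.0, 0.0],
})
print(low_carbon_share(example))
```

Comparing this estimate with the reported column is a quick consistency check on the three generation columns.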


## Libraries used


In [1]:
# Data preprocessing and plotting
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Model building, metrics, and data processing
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Interactive plots
import plotly.graph_objects as go
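These scikit-learn imports suggest a workflow of imputing, scaling, and encoding features before fitting a linear regression. This section of the notebook does not build the model yet, so the following is only a minimal sketch on synthetic data, assuming a numeric target. The columns `x1`, `x2`, `region`, and `target` are hypothetical, and `Pipeline` (from `sklearn.pipeline`) is an extra import that the original cell does not include.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data standing in for the real dataset (hypothetical columns)
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "region": rng.choice(["A", "B", "C"], size=n),
})
data["target"] = (3 * data["x1"] - 2 * data["x2"]
                  + (data["region"] == "B") * 1.5
                  + rng.normal(scale=0.1, size=n))
data.loc[::10, "x2"] = np.nan  # inject missing values after building the target

X = data.drop(columns="target")
y = data["target"]

preprocess = ColumnTransformer([
    # Numeric columns: fill gaps with the median, then standardize
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["x1", "x2"]),
    # Categorical column: one-hot encode, ignoring categories unseen during fit
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["region"]),
])
model = Pipeline([("preprocess", preprocess), ("regress", LinearRegression())])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

mse = mean_squared_error(y_test, pred)
r2 = r2_score(y_test, pred)
print(f"MSE: {mse:.4f}  R^2: {r2:.4f}")
```

Wrapping preprocessing and the regressor in one `Pipeline` means imputation medians and scaling statistics are learned from the training split only. This avoids leaking information from the test set.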

## Load the dataset


In [2]:
# Load the dataset
df = pd.read_csv("global-data-on-sustainable-energy (1).csv")
df.head()

Unnamed: 0,Entity,Year,Access to electricity (% of population),Access to clean fuels for cooking,Renewable-electricity-generating-capacity-per-capita,Financial flows to developing countries (US $),Renewable energy share in the total final energy consumption (%),Electricity from fossil fuels (TWh),Electricity from nuclear (TWh),Electricity from renewables (TWh),...,Primary energy consumption per capita (kWh/person),Energy intensity level of primary energy (MJ/$2017 PPP GDP),Value_co2_emissions_kt_by_country,Renewables (% equivalent primary energy),gdp_growth,gdp_per_capita,Density\n(P/Km2),Land Area(Km2),Latitude,Longitude
0,Afghanistan,2000,1.613591,6.2,9.22,20000.0,44.99,0.16,0.0,0.31,...,302.59482,1.64,760.0,,,,60,652230.0,33.93911,67.709953
1,Afghanistan,2001,4.074574,7.2,8.86,130000.0,45.6,0.09,0.0,0.5,...,236.89185,1.74,730.0,,,,60,652230.0,33.93911,67.709953
2,Afghanistan,2002,9.409158,8.2,8.47,3950000.0,37.83,0.13,0.0,0.56,...,210.86215,1.4,1029.999971,,,179.426579,60,652230.0,33.93911,67.709953
3,Afghanistan,2003,14.738506,9.5,8.09,25970000.0,36.66,0.31,0.0,0.63,...,229.96822,1.4,1220.000029,,8.832278,190.683814,60,652230.0,33.93911,67.709953
4,Afghanistan,2004,20.064968,10.9,7.75,,44.24,0.33,0.0,0.56,...,204.23125,1.2,1029.999971,,1.414118,211.382074,60,652230.0,33.93911,67.709953
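Some column names are awkward to reference, such as `Density\n(P/Km2)`, which contains an embedded line break. The following is a minimal sketch, assuming single-line names are preferred. It runs on a toy frame that reuses two of the dataset's column names with made-up values. Later cells in this notebook refer to the original `"Density\n(P/Km2)"` name, so they would need the new name if you apply this to `df`.

```python
import pandas as pd

# Toy frame reusing two of the dataset's column names (values are made up)
df_demo = pd.DataFrame({
    "Density\n(P/Km2)": ["60", "2,239"],
    "Land Area(Km2)": [652230.0, 1000.0],
})

# Replace embedded line breaks with a space and trim surrounding whitespace
df_demo.columns = df_demo.columns.str.replace("\n", " ", regex=False).str.strip()
print(list(df_demo.columns))
```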


## Descriptive statistics


In [3]:
# Show general information about the dataset (column types, non-null counts)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3649 entries, 0 to 3648
Data columns (total 21 columns):
 #   Column                                                            Non-Null Count  Dtype  
---  ------                                                            --------------  -----  
 0   Entity                                                            3649 non-null   object 
 1   Year                                                              3649 non-null   int64  
 2   Access to electricity (% of population)                           3639 non-null   float64
 3   Access to clean fuels for cooking                                 3480 non-null   float64
 4   Renewable-electricity-generating-capacity-per-capita              2718 non-null   float64
 5   Financial flows to developing countries (US $)                    1560 non-null   float64
 6   Renewable energy share in the total final energy consumption (%)  3455 non-null   float64
 7   Electricity from fossil fuels (TW

In [4]:
# Number of non-null values per column
df.count()

Entity                                                              3649
Year                                                                3649
Access to electricity (% of population)                             3639
Access to clean fuels for cooking                                   3480
Renewable-electricity-generating-capacity-per-capita                2718
Financial flows to developing countries (US $)                      1560
Renewable energy share in the total final energy consumption (%)    3455
Electricity from fossil fuels (TWh)                                 3628
Electricity from nuclear (TWh)                                      3523
Electricity from renewables (TWh)                                   3628
Low-carbon electricity (% electricity)                              3607
Primary energy consumption per capita (kWh/person)                  3649
Energy intensity level of primary energy (MJ/$2017 PPP GDP)         3442
Value_co2_emissions_kt_by_country                  

In [5]:
# Summary statistics, transposed so that each row describes one column
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Year,3649.0,2010.038,6.054228,2000.0,2005.0,2010.0,2015.0,2020.0
Access to electricity (% of population),3639.0,78.9337,30.27554,1.252269,59.80089,98.36157,100.0,100.0
Access to clean fuels for cooking,3480.0,63.25529,39.04366,0.0,23.175,83.15,100.0,100.0
Renewable-electricity-generating-capacity-per-capita,2718.0,113.1375,244.1673,0.0,3.54,32.91,112.21,3060.19
Financial flows to developing countries (US $),1560.0,94224000.0,298154400.0,0.0,260000.0,5665000.0,55347500.0,5202310000.0
Renewable energy share in the total final energy consumption (%),3455.0,32.63816,29.8949,0.0,6.515,23.3,55.245,96.04
Electricity from fossil fuels (TWh),3628.0,70.365,348.0519,0.0,0.29,2.97,26.8375,5184.13
Electricity from nuclear (TWh),3523.0,13.45019,73.00662,0.0,0.0,0.0,0.0,809.41
Electricity from renewables (TWh),3628.0,23.96801,104.4311,0.0,0.04,1.47,9.6,2184.94
Low-carbon electricity (% electricity),3607.0,36.80118,34.31488,0.0,2.877847,27.86507,64.40379,100.0


In [6]:
# Count missing (null) values per column
df.isnull().sum()

Entity                                                                 0
Year                                                                   0
Access to electricity (% of population)                               10
Access to clean fuels for cooking                                    169
Renewable-electricity-generating-capacity-per-capita                 931
Financial flows to developing countries (US $)                      2089
Renewable energy share in the total final energy consumption (%)     194
Electricity from fossil fuels (TWh)                                   21
Electricity from nuclear (TWh)                                       126
Electricity from renewables (TWh)                                     21
Low-carbon electricity (% electricity)                                42
Primary energy consumption per capita (kWh/person)                     0
Energy intensity level of primary energy (MJ/$2017 PPP GDP)          207
Value_co2_emissions_kt_by_country                  
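Raw counts are easier to compare as percentages of the 3649 rows. For example, **Financial flows to developing countries (US $)** is missing 2089 values (about 57.2%), and **Renewable-electricity-generating-capacity-per-capita** is missing 931 (about 25.5%). The following is a minimal sketch of a per-column report that flags columns above a threshold. The helper `missing_report` and the 30% default threshold are hypothetical choices, not part of the original analysis.

```python
import pandas as pd

def missing_report(df: pd.DataFrame, threshold: float = 30.0) -> pd.DataFrame:
    """Hypothetical helper: missing count and percentage per column, flagged above `threshold` (%)."""
    counts = df.isnull().sum()
    pct = df.isnull().mean() * 100  # mean of a boolean mask = fraction missing
    report = pd.DataFrame({"missing": counts, "missing_pct": pct.round(1)})
    report["above_threshold"] = report["missing_pct"] > threshold
    return report.sort_values("missing_pct", ascending=False)

# Toy frame for illustration: column "a" is half missing, "b" is complete
toy = pd.DataFrame({"a": [1, None, 3, None], "b": [1, 2, 3, 4]})
print(missing_report(toy))
```

With the real data, `missing_report(df)` would help decide which columns to drop and which to impute.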

In [None]:
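# Work on a copy and drop the non-numeric columns: the country name and the
# density column, whose values are stored as text (e.g. '2,239')
df_2 = df.copy()
df_2 = df_2.drop(columns=["Entity", "Density\n(P/Km2)"])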
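Instead of listing the columns to drop by name, numeric columns can be selected by dtype. This keeps every text column out of the correlation step, even ones not anticipated. The following is a minimal sketch on a toy frame with made-up values. A numeric quantity stored as text, such as the density values, is excluded rather than converted by this approach, so it drops out of the analysis unless it is converted first.

```python
import pandas as pd

# Toy frame with a text column, a number-as-text column, and numeric columns (made-up values)
df_demo = pd.DataFrame({
    "Entity": ["Country A", "Country B", "Country C"],
    "Year": [2000, 2000, 2000],
    "Density\n(P/Km2)": ["60", "2,239", "18"],
    "Land Area(Km2)": [652230.0, 1000.0, 50000.0],
})

# Keep only columns with a numeric dtype (ints and floats); object columns are dropped
numeric_df = df_demo.select_dtypes(include="number")
print(list(numeric_df.columns))
```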

In [18]:
df_2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3649 entries, 0 to 3648
Data columns (total 20 columns):
 #   Column                                                            Non-Null Count  Dtype  
---  ------                                                            --------------  -----  
 0   Year                                                              3649 non-null   int64  
 1   Access to electricity (% of population)                           3639 non-null   float64
 2   Access to clean fuels for cooking                                 3480 non-null   float64
 3   Renewable-electricity-generating-capacity-per-capita              2718 non-null   float64
 4   Financial flows to developing countries (US $)                    1560 non-null   float64
 5   Renewable energy share in the total final energy consumption (%)  3455 non-null   float64
 6   Electricity from fossil fuels (TWh)                               3628 non-null   float64
 7   Electricity from nuclear (TWh)   

In [19]:
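# Correlate numeric columns only: without numeric_only=True, any text column left
# in df_2 (such as "Density\n(P/Km2)", stored with thousands separators) raises
# ValueError: could not convert string to float: '2,239'
matrix_correlation = df_2.corr(numeric_only=True)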
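The value `'2,239'` is a density stored as text with a thousands separator. To keep the density column in the analysis instead of dropping it, the separator can be removed and the text parsed as numbers. The following is a minimal sketch on a toy Series with made-up values. `errors="coerce"` turns anything that still cannot be parsed into `NaN` instead of raising. Alternatively, passing `thousands=","` to `pd.read_csv` asks pandas to parse such values while reading. If you try that, check the resulting dtype with `df.dtypes`.

```python
import pandas as pd

# Toy Series mimicking density values stored as text with a thousands separator
density_text = pd.Series(["60", "2,239", "105", None], name="Density\n(P/Km2)")

# Remove the thousands separator, then parse; unparseable or missing entries become NaN
density = pd.to_numeric(density_text.str.replace(",", "", regex=False), errors="coerce")
print(density.tolist())
```

On the real data, the same conversion would apply to `df_2["Density\n(P/Km2)"]` while that column still has `object` dtype, in place of dropping it.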