# SPAIN and its ENERGY SOURCES<a id='0'></a>
![FuentesEnergia](img_energia.webp)

## *INDEX:* 
---
- [ 1. EXPLORATORY DATA ANALYSIS (EDA):](#1)
    - [1.1. Hypotheses](#11)
    - [1.2. Data preparation](#12)
    - [1.3. Visual analysis](#13)
    - [1.4. Statistical analysis](#14)
- [2. CONCLUSIONS](#2)
---

### *Contact:*
___
* Email: ***carla.glezz@gmail.com***
* LinkedIn: ***https://www.linkedin.com/in/mariacarlagonzalezgonzalez/***
---

 
# **1. Exploratory Data Analysis (EDA)**<a id='1'></a>

##  **1.1. Hypotheses**<a id='11'></a>

### **Which energy source should Spain invest in?** 🤔

1. **How have Spain's energy sources evolved over recent years?** 
    * Hypothesis: renewables are expected to have increased.
    * Which source or sources generate the most energy? 


2. **Self-sufficiency?**
    * Does the energy Spain consumes match the energy it generates?
        + A surplus could indicate exports and a deficit could indicate imports. 
        + Another possibility is that data is missing from one of the two series. 
        (Energy storage is being worked on, but for now it is not a technology in use.)
    * Given Spain's annual energy demand/consumption, how far is it from being self-sufficient?    


3. **Efficiency vs adverse effects?** (choosing the 2 energy sources most likely to be our final choice)
    * Can a balance be found? How does it affect us?


4. **Influence of the weather?**
    * The answer is expected to be yes, especially for renewables.
        + The data is a sample, so it must be verified that it is large enough and representative of the population. 
    * Is this influence significant? 
        + Other factors may be more significant, such as installed capacity, the technology used, etc.


5. **Economic analysis** *(to be studied later)*
    * Which sources are the cheapest?
    * Which are the most profitable?
    * Are the sources that generate the most also the most profitable?
    * Can a balance be found, if there is not one already, between energy efficiency and economic profitability?
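
Hypothesis 2 (self-sufficiency) comes down to a balance between what is generated and what is consumed. The sketch below shows one way to compute it. The numbers are invented, and the column names are assumed to follow the energy CSV (`generation ...` columns and `total load actual`); `toy` is a hypothetical stand-in for the real data.

```python
import pandas as pd

# Toy hourly data (invented values); column names are assumed to match the energy CSV
toy = pd.DataFrame({
    'generation solar': [100.0, 300.0, 50.0],
    'generation nuclear': [700.0, 700.0, 700.0],
    'total load actual': [900.0, 950.0, 800.0],
})

# Total generation per hour: sum of every 'generation ...' column
gen_cols = [c for c in toy.columns if c.startswith('generation')]
total_generation = toy[gen_cols].sum(axis=1)

# Positive balance -> surplus (possible export); negative -> deficit (possible import)
balance = total_generation - toy['total load actual']

# Share of the demand covered by domestic generation over the whole period
coverage_pct = 100 * total_generation.sum() / toy['total load actual'].sum()
```

A coverage below 100 % over a full year would point to a deficit, although missing generation data could produce the same result.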


    




##  **1.2. Data preparation**<a id='12'></a>

In [None]:
# Libraries (the wildcard import is expected to expose pd, plt and sns, which later cells use)
from utils.utilsEDA import *

: 

In [None]:
# Read all CSVs

# IMPORTANT: set preroot to the path of your local src folder (the only change needed to run the notebook)
preroot='/Users/mcgg/Documents/TheBridge_DataScience_py_local/02_DATA_ANALISIS/EDA_MC/EDA-energia/energy-data/src'

df_spn=pd.read_csv(preroot+'/data/spn_energy_data.csv')
df_wheather_spn=pd.read_csv(preroot+'/data/spn_weather_features.csv')

: 

### *Initial inspection of the CSVs*

In [None]:
print('* Spain energy data: *')
df_spn.info()

: 

In [None]:
print('* Spain weather data: *')
df_wheather_spn.info()

: 

### *Data cleaning:*


#### Spain energy generation CSV

```df_spn```: DataFrame containing Spain's energy data recorded from 1 January 2015 to 31 December 2018

|column	|description	|
|-------|---------------|
|time||
|generation biomass||
|generation fossil brown coal/lignite||
|generation fossil coal-derived gas||
|generation fossil gas||
|generation fossil hard coal||
|generation fossil oil||
|generation fossil oil shale||
|generation fossil peat||
|generation geothermal||
|generation hydro pumped storage aggregated||
|generation hydro pumped storage consumption||
|generation hydro run-of-river and poundage||
|generation hydro water reservoir||
|generation marine||
|generation nuclear||
|generation other||
|generation other renewable||
|generation solar||
|generation waste||
|generation wind offshore||
|generation wind onshore||
|forecast solar day ahead||
|forecast wind offshore eday ahead||
|forecast wind onshore day ahead||
|total load forecast||
|total load actual||
|price day ahead||
|price actual||


In [None]:
print('Number of columns: ',len(df_spn.columns))
df_spn.columns

: 

In [None]:
df_spn.tail()

: 

In [None]:
# Strip the UTC offset (+01:00, +02:00, +03:00) from 'time'; a raw string keeps the regex escapes intact
df_spn['time']=df_spn['time'].replace({r'\+01:00|\+02:00|\+03:00':''},regex=True)

# Convert the 'time' column to datetime
df_spn['time']=pd.to_datetime(df_spn['time'])
df_spn.columns

: 
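
Stripping the offset keeps the local wall-clock time but discards the time-zone information. As a hedged alternative, the timestamps can be parsed as UTC and converted to Spain's time zone. The sketch below compares both options on two invented timestamps; `raw`, `naive` and `aware` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Invented timestamps with mixed offsets (winter +01:00, summer +02:00)
raw = pd.Series(['2015-01-01 00:00:00+01:00', '2015-07-01 00:00:00+02:00'])

# Option 1: drop the offset and keep the local wall-clock time (timezone-naive)
naive = pd.to_datetime(raw.str.replace(r'\+0[123]:00$', '', regex=True))

# Option 2: parse as UTC, then convert to Spain's time zone (timezone-aware)
aware = pd.to_datetime(raw, utc=True).dt.tz_convert('Europe/Madrid')
```

Both options give the same local hour for each record. The timezone-aware version also keeps the hours around daylight-saving changes unambiguous.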

In [None]:
# Drop the rows and columns whose elements are all NaN (the 'time' column is excluded from the check)
comp_colna_filna(df_spn,['time'])
df_spn=del_colna_filna(df_spn,['time'])
len(df_spn.columns)

: 
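
The helpers `comp_colna_filna` and `del_colna_filna` are not shown in this notebook. Assuming they drop all-NaN columns and rows while ignoring `time`, plain pandas can do roughly the same. This is a sketch on invented data, not the helpers' actual implementation.

```python
import numpy as np
import pandas as pd

# Invented data: one column is empty, and one row has no values besides 'time'
toy = pd.DataFrame({
    'time': ['t0', 't1', 't2'],
    'generation solar': [1.0, np.nan, 3.0],
    'generation marine': [np.nan, np.nan, np.nan],
})

# 1) Drop the columns that are NaN in every row
cleaned = toy.dropna(axis=1, how='all')

# 2) Drop the rows where every column other than 'time' is NaN
value_cols = list(cleaned.columns.difference(['time']))
cleaned = cleaned.dropna(axis=0, how='all', subset=value_cols)
```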

In [None]:
# Count the NaN values remaining in each column
df_spn.isna().sum()

: 

In [None]:
# Forward-fill: each NaN takes the previous valid value in its column
df_spn=df_spn.ffill()
# The result should be a dataset free of NaN
df_spn.isna().sum()

: 
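
Forward filling is a simple choice for short gaps in hourly data, but it repeats the last value for as long as the gap lasts. The sketch below compares it, on an invented series, with a capped fill and with linear interpolation, two alternatives that may be worth checking if the gaps are long.

```python
import numpy as np
import pandas as pd

# Invented series with a two-value gap and a trailing gap
s = pd.Series([10.0, np.nan, np.nan, 13.0, np.nan])

filled = s.ffill()              # every NaN takes the last valid value
limited = s.ffill(limit=1)      # fill at most one consecutive NaN per gap
interpolated = s.interpolate()  # linear interpolation between valid values
```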

In [None]:
# Select the columns to work with: 'time' and every 'generation ...' column
df_spn=df_spn.loc[:,df_spn.columns.str.startswith('generation')|df_spn.columns.str.startswith('time')]
# Drop the columns whose values are all 0, since that indicates the variable was not recorded
zero_cols=[col for col in df_spn.columns if (df_spn[col]==0).all()]
df_spn=df_spn.drop(columns=zero_cols)
df_spn.head()

: 

---
*Groupings*: 

---

In [None]:
#df_spn_day=df_spn.copy()
#df_spn_day=df_spn_day.groupby(df_spn_day['time'].dt.date).mean()
#df_spn_day.groupby(df_spn_day['time'].dt.year).mean()
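df_spn_day = pd.read_csv(preroot+'/data/spn_energy_data.csv', index_col=0)
# 'time' mixes offsets (+01:00/+02:00), so parse it as UTC to get a proper DatetimeIndex
df_spn_day.index = pd.to_datetime(df_spn_day.index, utc=True)
df_spn_day.index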

: 

In [None]:
df_spn_day.loc['2015']

: 

In [None]:
df_spn_day=df_spn.copy()
# 'day' holds the calendar date of each hourly record as a 'YYYY-MM-DD' string
df_spn_day['day']=df_spn_day['time'].dt.strftime('%Y-%m-%d')

df_spn_year=df_spn_day.copy()

# 'day_dt' is the datetime equivalent of the 'day' string
df_spn_year['day_dt']=pd.to_datetime(df_spn_year['day'],format='%Y-%m-%d')

# numeric_only=True skips the text and datetime columns when averaging
df_spn_month=df_spn_year.groupby(df_spn_year['day_dt'].dt.month).mean(numeric_only=True)
df_spn_year=df_spn_year.groupby(df_spn_year['day_dt'].dt.year).mean(numeric_only=True)

# Hourly records of each year (.copy() so later cells can add columns safely)
df_spn_day_2015=df_spn_day.loc[df_spn_day['day'].str.startswith('2015'),:].copy()
df_spn_day_2016=df_spn_day.loc[df_spn_day['day'].str.startswith('2016'),:].copy()
df_spn_day_2017=df_spn_day.loc[df_spn_day['day'].str.startswith('2017'),:].copy()
df_spn_day_2018=df_spn_day.loc[df_spn_day['day'].str.startswith('2018'),:].copy()

# Group by date: average all the hours of each day.
df_spn_day=df_spn_day.groupby('day').mean(numeric_only=True)


: 
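
The same daily, monthly and yearly averages can also be obtained with `resample` once the timestamps are the index. The sketch below shows that alternative on invented hourly values; `hourly`, `daily_mean`, `monthly_mean` and `month_profile` are illustrative names.

```python
import pandas as pd

# Invented hourly values spanning two days (0..23 repeated)
toy = pd.DataFrame({
    'time': pd.date_range('2015-01-01', periods=48, freq='h'),
    'generation solar': [float(h % 24) for h in range(48)],
})

hourly = toy.set_index('time')

daily_mean = hourly.resample('D').mean()     # one row per calendar day
monthly_mean = hourly.resample('MS').mean()  # one row per month, labelled by month start

# Month-of-year profile across all years (January = 1), similar in spirit to df_spn_month
month_profile = hourly.groupby(hourly.index.month).mean()
```

`resample` keeps a datetime index, so the results can be sliced by date or plotted over time without rebuilding date columns.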

In [None]:
df_spn_year

: 

In [None]:
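df_spn_day_i=[df_spn_day_2015,df_spn_day_2016,df_spn_day_2017,df_spn_day_2018]

# Add a datetime column 'month_dt' to each yearly subset (transf_dt is expected to come from the utils import)
for i in df_spn_day_i:
    i['month_dt']=transf_dt(i,'day',change='%Y-%m-%d')

# Monthly mean within each year; numeric_only=True skips the text column 'day'
df_spn_month_2015=df_spn_day_2015.groupby(df_spn_day_2015['month_dt'].dt.month).mean(numeric_only=True)
df_spn_month_2016=df_spn_day_2016.groupby(df_spn_day_2016['month_dt'].dt.month).mean(numeric_only=True)
df_spn_month_2017=df_spn_day_2017.groupby(df_spn_day_2017['month_dt'].dt.month).mean(numeric_only=True)
df_spn_month_2018=df_spn_day_2018.groupby(df_spn_day_2018['month_dt'].dt.month).mean(numeric_only=True)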


: 

In [None]:
df_spn_day.columns

: 

In [None]:
# Split into renewable and non-renewable sources:
rnw=['generation biomass', 'generation hydro pumped storage consumption',
       'generation hydro run-of-river and poundage',
       'generation hydro water reservoir',
       'generation other renewable', 'generation solar',
       'generation wind onshore']
       
nrnw=['generation fossil brown coal/lignite',
       'generation fossil gas', 'generation fossil hard coal',
       'generation fossil oil','generation nuclear',
       'generation other','generation waste']

: 
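
With the two lists defined, the renewable share of total generation is a direct way to test the first hypothesis (growth in renewables). The sketch below uses invented daily means and shortened lists (`rnw_toy`, `nrnw_toy`) that stand in for `rnw` and `nrnw`.

```python
import pandas as pd

# Invented daily means; only two sources per group to keep the example short
toy = pd.DataFrame({
    'generation solar': [100.0, 200.0],
    'generation wind onshore': [300.0, 400.0],
    'generation nuclear': [400.0, 400.0],
    'generation fossil gas': [200.0, 200.0],
}, index=['2015-01-01', '2015-01-02'])

rnw_toy = ['generation solar', 'generation wind onshore']
nrnw_toy = ['generation nuclear', 'generation fossil gas']

renewable = toy[rnw_toy].sum(axis=1)
non_renewable = toy[nrnw_toy].sum(axis=1)

# Percentage of total generation that comes from renewable sources, per day
renewable_share_pct = 100 * renewable / (renewable + non_renewable)
```

Applied to the yearly means, a rising share from 2015 to 2018 would support the hypothesis.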

In [None]:
df_spn_day_rnw=df_spn_day[rnw]
df_spn_day_nrnw=df_spn_day[nrnw]
df_spn_month_rnw=df_spn_month[rnw]
df_spn_month_nrnw=df_spn_month[nrnw]
df_spn_year_rnw=df_spn_year[rnw]
df_spn_year_nrnw=df_spn_year[nrnw]

df_spn_day_2015_rnw=df_spn_day_2015[rnw]
df_spn_day_2015_nrnw=df_spn_day_2015[nrnw]
df_spn_day_2016_rnw=df_spn_day_2016[rnw]
df_spn_day_2016_nrnw=df_spn_day_2016[nrnw]
df_spn_day_2017_rnw=df_spn_day_2017[rnw]
df_spn_day_2017_nrnw=df_spn_day_2017[nrnw]
df_spn_day_2018_rnw=df_spn_day_2018[rnw]
df_spn_day_2018_nrnw=df_spn_day_2018[nrnw]


: 

#### Spain weather CSV

```df_wheather_spn```: DataFrame containing Spain's weather data recorded from 1 January 2015 to 31 December 2018

|column	|description	|
|-------|---------------|
|dt_iso||
|city_name||
|temp||
|temp_min||
|temp_max||
|pressure||
|humidity||
|wind_speed||
|wind_deg||
|rain_1h||
|rain_3h||
|snow_3h||
|clouds_all||
|weather_id||
|weather_main||
|weather_description||
|weather_icon||


In [None]:
df_wheather_spn.city_name.unique()

: 

In [None]:
df_wheather_spn.columns

: 

In [None]:

[df_wheather_spn[col].unique() for col in ['clouds_all', 'weather_id', 'weather_main', 'weather_description',
       'weather_icon']]

: 

In [None]:
df_wheather_spn.isna().sum()

: 

---
Groupings:

---
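
One possible grouping for the weather data, given that it has one row per city and timestamp, is a per-hour average across cities that can then be joined to the energy data. The sketch below assumes that layout and uses invented values; the column names `dt_iso`, `city_name`, `temp` and `wind_speed` come from the table above, while the DataFrames `weather`, `energy` and `merged` are hypothetical.

```python
import pandas as pd

# Invented weather rows: two cities per timestamp
weather = pd.DataFrame({
    'dt_iso': ['2015-01-01 00:00:00', '2015-01-01 00:00:00',
               '2015-01-01 01:00:00', '2015-01-01 01:00:00'],
    'city_name': ['Madrid', 'Valencia', 'Madrid', 'Valencia'],
    'temp': [270.0, 280.0, 272.0, 282.0],
    'wind_speed': [2.0, 4.0, 1.0, 3.0],
})
weather['dt_iso'] = pd.to_datetime(weather['dt_iso'])

# Average over cities for each hour (numeric columns only)
weather_hourly = weather.groupby('dt_iso')[['temp', 'wind_speed']].mean()

# Invented energy rows on the same hourly grid
energy = pd.DataFrame({
    'time': pd.to_datetime(['2015-01-01 00:00:00', '2015-01-01 01:00:00']),
    'generation wind onshore': [5000.0, 4000.0],
})

# Join both sources on the timestamp
merged = energy.merge(weather_hourly, left_on='time', right_index=True, how='inner')
```

An unweighted mean treats every city equally. Weighting by installed capacity near each city may be more representative, but that information is not in these files.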

##  **1.3. Visual analysis**<a id='13'></a>

### Visualizations, first analyses.

In [None]:
plt.style.use('ggplot')

: 

In [None]:
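# Show only the correlations above 0.55 (other cells become NaN); the matrix is computed once
corr_2017=df_spn_day_2017.corr(numeric_only=True)
corr_2017[corr_2017>0.55]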

: 
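
A masked correlation matrix shows every pair twice and always includes the diagonal. A compact alternative is to list each strongly correlated pair once. The sketch below does this on invented columns `a`, `b` and `c`, with the same 0.55 threshold.

```python
import numpy as np
import pandas as pd

# Invented data: 'a' and 'b' move together, 'c' does not
toy = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0],
    'b': [2.0, 4.0, 6.0, 8.1],
    'c': [4.0, 1.0, 3.0, 2.0],
})
corr = toy.corr()

# Keep each pair once: upper triangle without the diagonal
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Long format (pair -> correlation), filtered by the threshold and sorted
strong_pairs = upper.stack()
strong_pairs = strong_pairs[strong_pairs > 0.55].sort_values(ascending=False)
```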

In [None]:
plt.figure(figsize=(15,10))
sns.heatmap(df_spn_month_2016.corr(),annot=True);


: 

In [None]:
df_spn_day.boxplot(rot=90,figsize=(15,10))

: 

In [None]:
df_spn_day_2015.boxplot(rot=90,figsize=(15,10))

: 

In [None]:
# Pie chart: share of each source in total generation (slices are plotted counter-clockwise)
x1=df_spn_day.loc[:,df_spn_day.columns.str.startswith('generation')].sum()
labels = x1.index
sizes = (x1/x1.sum())*100
# Offset of each slice from the centre: one value per source, so 14 sources are expected here
explode = (0.1, 0.1,0.1,0.1,0.1,0.3,0.3,0.3,0.1,0.1,0.3,0.3,0.1,0.3)
colors=['burlywood','tan','gray','darkgray','lemonchiffon','cadetblue','powderblue',
'slategray','lightcoral','lightgray','seagreen',
'khaki','lightseagreen','mediumaquamarine']

fig1, ax1 = plt.subplots(figsize=(20,10))
ax1.pie(sizes, explode=explode, autopct='%1.1f%%',
        shadow=True, startangle=90,colors=colors,pctdistance=1.1)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.legend(loc='best',labels=labels)
plt.title('Energy sources in Spain 2015-2018')

plt.show()

: 
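
With many small slices, the percentage labels of a pie chart tend to overlap. One option is to merge the smallest sources into a single slice before plotting. The sketch below computes the shares on invented totals; the 5 % threshold and the label `small sources (combined)` are arbitrary choices for illustration.

```python
import pandas as pd

# Invented generation totals per source
totals = pd.Series({
    'generation nuclear': 500.0,
    'generation wind onshore': 300.0,
    'generation solar': 150.0,
    'generation biomass': 30.0,
    'generation waste': 20.0,
})

# Share of each source as a percentage of the total
shares_pct = 100 * totals / totals.sum()

# Merge the slices below the threshold into one combined slice
threshold = 5.0
small = shares_pct[shares_pct < threshold]
pie_data = shares_pct[shares_pct >= threshold].copy()
pie_data['small sources (combined)'] = small.sum()
```

`pie_data` can then be passed to `ax1.pie` in place of `sizes`, with `pie_data.index` as the labels.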

In [None]:
sns.pairplot(df_spn_month.corr(),kind='kde');

: 

In [None]:
sns.pairplot(df_spn_month.corr());

: 

In [None]:
df_spn_month_2018.boxplot(rot=90,figsize=(20,10))

: 

In [None]:
df_spn_day_rnw.boxplot(rot=90,figsize=(20,10))

: 

In [None]:
df_spn_day_nrnw.boxplot(rot=90,figsize=(15,10))

: 

In [None]:
sns.heatmap(df_spn_day_rnw.corr(),annot=True);

: 

In [None]:
# Yearly mean generation per source (df_spn_year is already grouped by year)
df_spn_year.plot(figsize=(15,10))

: 

In [None]:
# Pie chart of the renewable sources (slices are plotted counter-clockwise)
x1=df_spn_day_rnw.sum()
labels = df_spn_day_rnw.columns
sizes = (x1/x1.sum())*100
explode = (0.1,)*df_spn_day_rnw.shape[1] # same offset from the centre for every slice
colorsrwn=['burlywood','cadetblue','powderblue',
'slategray','seagreen',
'khaki','mediumaquamarine']
colorsnrwn=['tan','gray','darkgray','lemonchiffon','lightcoral','lightgray','seagreen',
'lightseagreen']

fig1, ax1 = plt.subplots(figsize=(20,10))
ax1.pie(sizes, explode=explode, colors=colorsrwn,autopct='%1.1f%%',
        shadow=True, startangle=90,pctdistance=1.1)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.legend(loc='best',labels=labels)
plt.title('Renewable energy sources in Spain 2015-2018')

plt.show()

: 

In [None]:
# Pie chart of the non-renewable sources
x1=df_spn_day_nrnw.sum()
labels = df_spn_day_nrnw.columns
sizes = (x1/x1.sum())*100
explode = (0.1,)*df_spn_day_nrnw.shape[1] # same offset from the centre for every slice
fig1, ax1 = plt.subplots(figsize=(20,10))
ax1.pie(sizes, explode=explode,colors=colorsnrwn, autopct='%1.1f%%',
        shadow=True, startangle=90,pctdistance=1.1)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.legend(loc='best',labels=labels)
plt.title('Non-renewable energy sources in Spain 2015-2018')

: 

##  **1.4. Statistical analysis**<a id='14'></a>

#  **2. Conclusions**<a id='2'></a>