# SEIN peak demand data collected from meters

Peak demand of the Peruvian national electric system (SEIN) collected from meters in 15-minute periods and reported monthly by the COES (Peru's system operator). This data is published in Excel format.

Source: https://www.coes.org.pe/Portal/portalinformacion/demanda?indicador=maxima  

Description of raw data extracted from daily reports:

- "Fecha/Hora": Date and time in "dd/mm/yyyy hh:mm" format.  
- "Total (MW)": Total power dispatched by COES's generators in MW.  
- "Importación (MW)": Imports from Ecuador in MW.  
- "Exportación (MW)": Exports to Ecuador in MW.  
- "Máxima Demanda (MW)": Power demand of the national electric system (SEIN) in MW. 
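
The columns above arrive as text and floats, so the timestamp usually needs an explicit parse before any time-series work. The following is a minimal sketch, assuming the timestamps look like the rows displayed later in this notebook (for example `27/10/2011 19:00`); the `sample` frame and the `timestamp` column name are illustrative and not part of the original notebook.

```python
import pandas as pd

# Two sample rows copied from the df.head() output shown later in the notebook.
sample = pd.DataFrame({
    "Fecha/Hora": ["27/10/2011 19:00", "26/10/2011 19:15"],
    "Máxima Demanda (MW)": [4787.86208, 4777.58864],
})

# An explicit day-first format avoids silently reading 05/10 as May 10.
sample["timestamp"] = pd.to_datetime(sample["Fecha/Hora"], format="%d/%m/%Y %H:%M")
print(sample.dtypes)
```

Passing `format` explicitly also makes the parse fail loudly if a file ever uses a different layout.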

In [4]:
import glob
import os
import time
from datetime import datetime, timedelta

import pandas as pd
import requests
import sagemaker
from sagemaker import get_execution_role

## Download data

In [3]:
start_time = time.time()

form_data = {
    "fecha": "02 2023",
    "tiposEmpresa": "1,2,3,4,5",
    "empresas": "69,10422,11772,12439,10481,12056,13196,12896,10420,12708,180,13165,11146,10901,12584,11095,4,5,17,11153,58,9,19,76,10684,30,27,40,11228,23,24,2,11563,11389,11129,11509,14173,11412,206,10636,11544,11058,12096,12097,11395,13783,11429,10552,18,48,10725,11527,10647,11840,11841,10974,11258,11644,10582,11064,11444,11940,12364,11149,11528,12634,47,12479,11185,13,10916,11100,13965,10755,67,108,11217,12190,12480,11102,13120,11053,11218,10984,11063,149,11323,11101,13966,11486,61,10913,10587,8,138,6,11567,12758,11103,10767,10,14342,11894",
    "tiposGeneracion": "4,1,3,2",
    "central": "1"
}

export_url = "https://www.coes.org.pe/Portal/Medidores/Ranking/exportar"
download_url = "https://www.coes.org.pe/Portal/Medidores/Ranking/descargar"

post_url = export_url
get_url = download_url

start_date = datetime(2003, 1, 1)
end_date = datetime(2023, 3, 1)

payload = form_data.copy()
payload['fecha'] = start_date.strftime('%m %Y')

session = requests.Session()

# Create the output folder if it does not exist yet.
os.makedirs("./peak_demand", exist_ok=True)

count_files = 0

while start_date <= end_date:
    filename = f"ranking_demand_{start_date.strftime('%Y-%m')}.xlsx"
    file_path = os.path.join("peak_demand", filename)

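    # POST the export request for this month, then GET the generated file.
    response = session.post(post_url, data=payload)

    if response.status_code == 200:
        download = session.get(get_url, stream=True)
        # Only write the file when the download itself succeeded.
        if download.status_code == 200:
            with open(file_path, 'wb') as f:
                for chunk in download.iter_content(chunk_size=8192):
                    f.write(chunk)
            print(f'File downloaded successfully: {filename}')
            count_files += 1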

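    # Jump to some day in the next month; only the month and year are used
    # in the request and the filename. Because the day drifts past the 1st,
    # the month of end_date itself (2023-03) is not downloaded.
    start_date = start_date.replace(day=1) + timedelta(days=32)
    payload['fecha'] = start_date.strftime('%m %Y')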

print(f'Downloaded files: {count_files} files')

end_time = time.time()
time_taken = end_time - start_time
print(f'Time taken: {time_taken} seconds')

File downloaded successfully: ranking_demand_2003-01.xlsx
File downloaded successfully: ranking_demand_2003-02.xlsx
File downloaded successfully: ranking_demand_2003-03.xlsx
File downloaded successfully: ranking_demand_2003-04.xlsx
File downloaded successfully: ranking_demand_2003-05.xlsx
File downloaded successfully: ranking_demand_2003-06.xlsx
File downloaded successfully: ranking_demand_2003-07.xlsx
File downloaded successfully: ranking_demand_2003-08.xlsx
File downloaded successfully: ranking_demand_2003-09.xlsx
File downloaded successfully: ranking_demand_2003-10.xlsx
File downloaded successfully: ranking_demand_2003-11.xlsx
File downloaded successfully: ranking_demand_2003-12.xlsx
File downloaded successfully: ranking_demand_2004-01.xlsx
File downloaded successfully: ranking_demand_2004-02.xlsx
File downloaded successfully: ranking_demand_2004-03.xlsx
File downloaded successfully: ranking_demand_2004-04.xlsx
File downloaded successfully: ranking_demand_2004-05.xlsx
[output truncated]
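
The download loop steps through months by adding 32 days to the first of the month, which works but lets the day of month drift. As an alternative, here is a minimal sketch of an explicit month iterator; `month_starts` is a hypothetical helper, not part of the original notebook, and the end month is passed as 2023-02 to match the months the loop actually covers.

```python
from datetime import datetime

def month_starts(start, end):
    """Yield the first day of each month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield datetime(year, month, 1)
        # Advance one calendar month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

months = list(month_starts(datetime(2003, 1, 1), datetime(2023, 2, 1)))
print(len(months), months[0].strftime("%m %Y"), months[-1].strftime("%m %Y"))
```

Each yielded value can be formatted with `strftime('%m %Y')` for the `fecha` field and with `strftime('%Y-%m')` for the filename, as the loop above does.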

## Merge and save data into a CSV file

In [None]:
%pip install openpyxl

In [5]:
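frames = []

for file_path in glob.glob('./peak_demand/*.xlsx'):
    # The column headers sit on row 10 of the "Ordenamiento MD" sheet.
    df = pd.read_excel(file_path, sheet_name="Ordenamiento MD", header=9)
    # Record the source file so each row can be traced back to its month.
    df['filename'] = os.path.basename(file_path)
    frames.append(df)

# Concatenate once instead of growing a DataFrame inside the loop.
merged_data = pd.concat(frames, ignore_index=True)

# Drop the first column (presumably an empty leading column in the sheet).
merged_data = merged_data.iloc[:, 1:]

merged_data.to_csv('peak_demand.csv', index=False, encoding='utf-8-sig')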

In [6]:
df = pd.read_csv('peak_demand.csv')
df.head()

|   | N° de Registos/MES | Fecha/Hora | Total (MW) | Importación (MW) | Exportación (MW) | Máxima Demanda (MW) | filename |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 27/10/2011 19:00 | 4787.86208 | 0.0 | 0.0 | 4787.86208 | ranking_demand_2011-10.xlsx |
| 1 | 2 | 24/10/2011 19:00 | 4785.41497 | 0.0 | 0.0 | 4785.41497 | ranking_demand_2011-10.xlsx |
| 2 | 3 | 26/10/2011 19:15 | 4777.58864 | 0.0 | 0.0 | 4777.58864 | ranking_demand_2011-10.xlsx |
| 3 | 4 | 26/10/2011 19:30 | 4777.26644 | 0.0 | 0.0 | 4777.26644 | ranking_demand_2011-10.xlsx |
| 4 | 5 | 20/10/2011 19:00 | 4774.50047 | 0.0 | 0.0 | 4774.50047 | ranking_demand_2011-10.xlsx |
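
Since every row carries its source filename, the month can be recovered from it and used to pick out each month's peak. This is a minimal sketch on three rows copied from the output above; the `period` column and the `monthly_peak` name are illustrative, and the approach assumes filenames follow the `ranking_demand_YYYY-MM.xlsx` pattern used by the download step.

```python
import pandas as pd

rows = pd.DataFrame({
    "Fecha/Hora": ["27/10/2011 19:00", "24/10/2011 19:00", "26/10/2011 19:15"],
    "Máxima Demanda (MW)": [4787.86208, 4785.41497, 4777.58864],
    "filename": ["ranking_demand_2011-10.xlsx"] * 3,
})

# Pull the YYYY-MM period out of names like ranking_demand_2011-10.xlsx.
rows["period"] = rows["filename"].str.extract(r"(\d{4}-\d{2})", expand=False)

# Keep the row with the highest demand in each period.
peak_idx = rows.groupby("period")["Máxima Demanda (MW)"].idxmax()
monthly_peak = rows.loc[peak_idx, ["period", "Fecha/Hora", "Máxima Demanda (MW)"]]
print(monthly_peak)
```

On the full `peak_demand.csv` the same two steps would give one row per downloaded month.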


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 706944 entries, 0 to 706943
Data columns (total 7 columns):
 #   Column               Non-Null Count   Dtype  
---  ------               --------------   -----  
 0   N° de Registos/MES   706944 non-null  int64  
 1   Fecha/Hora           706944 non-null  object 
 2   Total (MW)           706944 non-null  float64
 3   Importación (MW)     706944 non-null  float64
 4   Exportación (MW)     706944 non-null  float64
 5   Máxima Demanda (MW)  706944 non-null  float64
 6   filename             706944 non-null  object 
dtypes: float64(4), int64(1), object(2)
memory usage: 37.8+ MB
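
The row count reported above can be cross-checked against the 15-minute resolution. The sketch below assumes each monthly file holds exactly one row per 15-minute interval and that the data runs from 2003-01-01 through 2023-02-28; both are assumptions rather than facts stated by the source.

```python
from datetime import date

INTERVALS_PER_DAY = 24 * 4  # one reading every 15 minutes

first_day = date(2003, 1, 1)
day_after_last = date(2023, 3, 1)  # assumes data runs through 2023-02-28
n_days = (day_after_last - first_day).days
expected_rows = n_days * INTERVALS_PER_DAY
print(n_days, expected_rows)
```

Under those assumptions the expected count is 7364 days times 96 intervals, which matches the 706944 entries shown by `df.info()`.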


## Load data into S3

In [None]:
os.environ['PROJECT_BUCKET'] = ""

In [12]:
session = sagemaker.Session()
bucket = os.getenv("PROJECT_BUCKET")
region = session.boto_region_name
role = get_execution_role()

print("AWS Region: {}".format(region))

AWS Region: us-east-1


In [None]:
!aws s3 sync ./peak_demand s3://{bucket}/data/peak_demand/

In [None]:
!aws s3 cp ./peak_demand.csv s3://{bucket}/data/