# Total power dispatch (total demand)

Total power dispatch data, retrieved from SCADA systems and published at 30-minute intervals by COES (Peru's power system operator). The data covers dispatch from both COES and non-COES generators.  
Source: https://www.coes.org.pe/Portal/portalinformacion/demanda  

Description of raw data:  
- "FECHA": Date and time in "dd/mm/yyyy hh:mm" format.  
- "EJECUTADO": Total power dispatch of COES and non-COES generators in MW.  Data retrieved from SCADA systems.
- "PROG. DIARIA": Scheduled power dispatch from daily plan in MW.  
- "PROG. SEMANAL": Scheduled power dispatch from weekly plan in MW.
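
As a minimal sketch of how these columns might be used once loaded, the snippet below parses `FECHA` with an explicit day-first format and computes how far the executed dispatch deviated from the daily schedule. The two sample rows are the first two records of the dataset; the column name `daily_dev_mw` is a hypothetical choice for illustration, not part of the raw data.

```python
import pandas as pd

# Two sample rows in the raw layout described above
sample = pd.DataFrame({
    "FECHA": ["01/01/2003 00:30", "01/01/2003 01:00"],
    "EJECUTADO": [2306.76635, 2237.36810],
    "PROG. DIARIA": [2304.71554, 2208.29031],
    "PROG. SEMANAL": [2292.96587, 2225.53541],
})

# An explicit format avoids day/month ambiguity (02/01 is 2 January here)
sample["FECHA"] = pd.to_datetime(sample["FECHA"], format="%d/%m/%Y %H:%M")

# Hypothetical derived column: executed minus daily schedule, in MW
sample["daily_dev_mw"] = sample["EJECUTADO"] - sample["PROG. DIARIA"]

print(sample[["FECHA", "daily_dev_mw"]])
```

A positive value means more power was dispatched than the daily plan scheduled for that half hour.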

In [3]:
import os
import time
from datetime import datetime

import pandas as pd
import requests
import sagemaker
from sagemaker import get_execution_role

## Download data

In [4]:
start_time = time.time()

# Date range to request
start_date = datetime(2003, 1, 1)
end_date = datetime(2023, 2, 28)

# The export form takes both dates in dd/mm/yyyy format
payload = {
    "fechaInicial": start_date.strftime("%d/%m/%Y"),
    "fechaFinal": end_date.strftime("%d/%m/%Y"),
}

export_url = "https://www.coes.org.pe/Portal/portalinformacion/exportardemanda"
download_url = "https://www.coes.org.pe/Portal/portalinformacion/descargardemanda"

output_dir = "./total_demand"
os.makedirs(output_dir, exist_ok=True)

filename = "total_demand.xlsx"
file_path = os.path.join(output_dir, filename)

# One session for both requests, so cookies set by the POST carry over to the GET
session = requests.Session()

# Step 1: submit the date range to the export endpoint
response = session.post(export_url, data=payload)
response.raise_for_status()

# Step 2: download the exported file in chunks
response = session.get(download_url, stream=True)
response.raise_for_status()
with open(file_path, 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
print(f'File downloaded successfully: {filename}')

end_time = time.time()
time_taken = end_time - start_time
print(f'Time taken: {time_taken} seconds')

File downloaded successfully: total_demand.xlsx
Time taken: 58.45266890525818 seconds
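
A successful status code does not by itself guarantee that the saved body is a spreadsheet, since a server can also answer with an HTML page. Because `.xlsx` files are ZIP archives that contain a `[Content_Types].xml` entry, one inexpensive sanity check is sketched below. `looks_like_xlsx` is a hypothetical helper name, and the demonstration runs on throwaway files rather than the real download.

```python
import os
import tempfile
import zipfile


def looks_like_xlsx(path):
    """Return True if `path` is a ZIP archive with an OOXML content-types entry."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return "[Content_Types].xml" in archive.namelist()


# Demonstration on two throwaway files (not the real download)
tmp_dir = tempfile.mkdtemp()

# A minimal archive that carries the expected entry
fake_xlsx = os.path.join(tmp_dir, "ok.xlsx")
with zipfile.ZipFile(fake_xlsx, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")

# An HTML page saved under an .xlsx name
html_error = os.path.join(tmp_dir, "error.xlsx")
with open(html_error, "w") as f:
    f.write("<html><body>Error</body></html>")

print(looks_like_xlsx(fake_xlsx))
print(looks_like_xlsx(html_error))
```

In this notebook the check could be run as `looks_like_xlsx(file_path)` before the file is read with `pd.read_excel`.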


## Save data as a csv file

In [5]:
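# header=3: the column names are on the fourth row of the sheet (0-indexed row 3)
df = pd.read_excel(file_path, header=3)
# utf-8-sig writes a byte-order mark at the start of the CSV
df.to_csv('./total_demand.csv', index=False, encoding='utf-8-sig')
df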

| | FECHA | EJECUTADO | PROG. DIARIA | PROG. SEMANAL |
|---|---|---|---|---|
| 0 | 01/01/2003 00:30 | 2306.76635 | 2304.71554 | 2292.96587 |
| 1 | 01/01/2003 01:00 | 2237.36810 | 2208.29031 | 2225.53541 |
| 2 | 01/01/2003 01:30 | 2150.86425 | 2125.24247 | 2147.93319 |
| 3 | 01/01/2003 02:00 | 2063.83669 | 2090.81638 | 2092.68125 |
| 4 | 01/01/2003 02:30 | 1990.51493 | 2035.67195 | 2015.89887 |
| ... | ... | ... | ... | ... |
| 350587 | 28/02/2023 22:00 | 7275.16939 | 7317.62767 | 7443.17706 |
| 350588 | 28/02/2023 22:30 | 7246.85625 | 7287.74214 | 7335.39421 |
| 350589 | 28/02/2023 23:00 | 7012.36879 | 7122.78297 | 7289.98850 |
| 350590 | 28/02/2023 23:30 | 6864.95446 | 6903.65765 | 6844.23093 |
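
With 48 half-hour periods per day, it may be worth confirming that no period is absent before using the series. The sketch below compares the timestamps against an expected 30-minute index, using a small synthetic example with one period left out. `find_missing_periods` is a hypothetical helper, and it assumes every period between the first and last timestamp should be present.

```python
import pandas as pd


def find_missing_periods(timestamps, freq="30min"):
    """Return the expected timestamps between min and max that are absent."""
    timestamps = pd.DatetimeIndex(timestamps)
    expected = pd.date_range(timestamps.min(), timestamps.max(), freq=freq)
    return expected.difference(timestamps)


# Synthetic example: four half-hour periods with 01:30 left out
fecha = pd.to_datetime(
    ["01/01/2003 00:30", "01/01/2003 01:00", "01/01/2003 02:00", "01/01/2003 02:30"],
    format="%d/%m/%Y %H:%M",
)
missing = find_missing_periods(fecha)
print(missing)
```

On the full frame the same helper could be applied to `pd.to_datetime(df["FECHA"], format="%d/%m/%Y %H:%M")`.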


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 350592 entries, 0 to 350591
Data columns (total 4 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   FECHA          350592 non-null  object 
 1   EJECUTADO      350592 non-null  float64
 2   PROG. DIARIA   349728 non-null  float64
 3   PROG. SEMANAL  350592 non-null  float64
dtypes: float64(3), object(1)
memory usage: 10.7+ MB
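
The summary shows 349,728 non-null values in `PROG. DIARIA` against 350,592 rows, so 864 daily-schedule values are missing. A minimal sketch for locating them by calendar day follows, shown on synthetic rows rather than the real data; `missing_per_day` is a hypothetical variable name.

```python
import pandas as pd

# Synthetic rows in the same layout; two daily-schedule values are missing
demo = pd.DataFrame({
    "FECHA": ["01/01/2003 00:30", "01/01/2003 01:00", "02/01/2003 00:30"],
    "EJECUTADO": [2306.8, 2237.4, 2250.0],
    "PROG. DIARIA": [2304.7, None, None],
})
demo["FECHA"] = pd.to_datetime(demo["FECHA"], format="%d/%m/%Y %H:%M")

# Count missing daily-schedule values per calendar day
is_missing = demo["PROG. DIARIA"].isna()
missing_per_day = is_missing.groupby(demo["FECHA"].dt.date).sum()

# Keep only the days with at least one gap
print(missing_per_day[missing_per_day > 0])
```

Whether such gaps should be dropped, interpolated, or filled from `PROG. SEMANAL` depends on the downstream use and is not decided here.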


## Load data into S3

In [None]:
# Set this to the name of the project's S3 bucket before running the cells below
os.environ['PROJECT_BUCKET'] = ""

In [8]:
session = sagemaker.Session()
bucket = os.getenv("PROJECT_BUCKET")
region = session.boto_region_name
role = get_execution_role()

print("AWS Region: {}".format(region))

AWS Region: us-east-1


In [None]:
!aws s3 sync ./total_demand s3://{bucket}/data/total_demand/

In [None]:
!aws s3 cp ./total_demand.csv s3://{bucket}/data/
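
Because `PROJECT_BUCKET` is initialised to an empty string above, the `aws s3` commands would receive a malformed `s3:///...` URI if the variable is never filled in. A small guard that fails early is sketched below; `s3_uri` is a hypothetical helper and `example-bucket` is a placeholder, not a real bucket name.

```python
def s3_uri(bucket, key):
    """Build an s3:// URI, refusing empty bucket names."""
    if not bucket or not bucket.strip():
        raise ValueError("PROJECT_BUCKET is empty; set it before uploading")
    return f"s3://{bucket.strip()}/{key.lstrip('/')}"


# "example-bucket" is a placeholder used only for illustration
print(s3_uri("example-bucket", "data/total_demand.csv"))

# An empty bucket name is rejected instead of producing "s3:///data/"
try:
    s3_uri("", "data/")
except ValueError as err:
    print(f"Refusing to upload: {err}")
```

Calling `s3_uri(bucket, "data/")` before the upload cells would surface a missing bucket name before any `aws s3` command runs.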