# TFM - CREATING BID DATASETS FROM OMIE RAW DATA

## 1. INTRODUCTION

The aim of this Notebook is to create datasets from the OMIE web page and store them locally on a monthly basis as .csv files, so that other Notebooks can use this information to create models, plots, filters, etc.

First of all, the data are downloaded locally from the OMIE web page:

* "Cabeceras" (headers): https://www.omie.es/en/file-access-list?parents%5B0%5D=/&parents%5B1%5D=Day-ahead%20Market&parents%5B2%5D=4.%20Bids&dir=Header%20of%20bids%20for%20Day-ahead%20Market&realdir=cab

* "Detalles" (details): https://www.omie.es/en/file-access-list?parents%5B0%5D=/&parents%5B1%5D=Day-ahead%20Market&parents%5B2%5D=4.%20Bids&dir=Day-ahead%20market%20bids%20detail&realdir=det

Daily bid information in OMIE is organized in "cabeceras" (headers) and "detalles" (details), distributed as monthly .zip files. Each monthly file contains one text file per day (with a ".1" extension). "Cabeceras" files have approx. 1,500 lines each and "Detalles" files approx. 50,000.

After unzipping one file, a directory is created with the name "CAB_yyyymm" or "DET_yyyymm" for "Cabeceras" and "Detalles" respectively, where "yyyy" and "mm" are the corresponding year and month. Each directory contains daily text files named "CAB_yyyymmdd.1" or "DET_yyyymmdd.1" (where "yyyy", "mm" and "dd" are the year, month and day of each file).
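
The naming convention above is enough to recover the date of a daily file from its name. The following is a minimal sketch, assuming the names always follow the "CAB_yyyymmdd.1" / "DET_yyyymmdd.1" pattern; the helper `parse_daily_name` is hypothetical and is not used elsewhere in this Notebook.

```python
from datetime import datetime

def parse_daily_name(file_name):
    """Split a daily file name such as 'CAB_20200901.1' into (kind, date)."""
    stem, _, _extension = file_name.partition('.')   # 'CAB_20200901', '1'
    kind, _, stamp = stem.partition('_')             # 'CAB', '20200901'
    return kind, datetime.strptime(stamp, '%Y%m%d').date()

kind, day = parse_daily_name('CAB_20200901.1')
print(kind, day.isoformat())
```

Parsing the date with `strptime` also rejects malformed names (for example an invalid day number) instead of silently slicing them.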

In the following sections, "cabeceras" and "detalles" files are explored, and monthly dataframes are created and merged. Once these monthly merged dataframes are created, they are stored locally as .csv files. Note that each monthly file has approx. 1.5M lines, so it is advisable to run this Notebook only when it is necessary to retrieve information from OMIE.

In [278]:
import pandas as pd
import numpy as np

## 2. EXPLORING "CABECERAS" FILES

First of all, "Cabeceras" files will be explored, to figure out how the information is laid out in the daily text files. The monthly file for September 2020 is used as an example.

In [222]:
### CHANGE PATH TO './RawData/OMIE/BIDS/CAB/cab_202009/' ###

cab_path = '/home/dsc/Documents/TFM/Data/OMIE/CAB/cab_202009/'

In [223]:
#Exploring a directory downloaded from OMIE web page, after unzipping it.

cab_list = !ls -1 $cab_path

In [224]:
cab_list

['CAB_20200901.1',
 'CAB_20200902.1',
 'CAB_20200903.1',
 'CAB_20200904.1',
 'CAB_20200905.1',
 'CAB_20200906.1',
 'CAB_20200907.1',
 'CAB_20200908.1',
 'CAB_20200909.1',
 'CAB_20200910.1',
 'CAB_20200911.1',
 'CAB_20200912.1',
 'CAB_20200913.1',
 'CAB_20200914.1',
 'CAB_20200915.1',
 'CAB_20200916.1',
 'CAB_20200917.1',
 'CAB_20200918.1',
 'CAB_20200919.1',
 'CAB_20200920.1',
 'CAB_20200921.1',
 'CAB_20200922.1',
 'CAB_20200923.1',
 'CAB_20200924.1',
 'CAB_20200925.1',
 'CAB_20200926.1',
 'CAB_20200927.1',
 'CAB_20200928.1',
 'CAB_20200929.1',
 'CAB_20200930.1']
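
The `!ls` syntax only works inside IPython and needs a Unix-like shell. As an alternative, the same sorted list can be built with the standard library. This is a sketch under assumptions: `list_daily_files` is a hypothetical helper, and the demonstration runs on a temporary directory with empty stand-in files rather than on the real OMIE data.

```python
import tempfile
from pathlib import Path

def list_daily_files(directory, prefix):
    """Return the sorted names of the daily files (e.g. 'CAB_20200901.1') in a directory."""
    return sorted(p.name for p in Path(directory).glob(prefix + '_*.1'))

# Demonstration on a temporary directory with made-up, empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('CAB_20200902.1', 'CAB_20200901.1', 'notes.txt'):
        (Path(tmp) / name).touch()
    demo_list = list_daily_files(tmp, 'CAB')

print(demo_list)
```

Sorting explicitly matters here because the date of each record is later taken from the file name at the same position in the list.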

In [225]:
#Reading all lines from each of the daily files in the directory.

cab_09 = []

for archive in cab_list:
    #errors='replace' must be added because there are special characters inside the description
    #(mostly ñ and Spanish accents)
    with open(cab_path + archive, errors='replace') as f: 
        lines = f.readlines()
        cab_09.append(lines)
        print(len(lines))

1483
1488
1465
1488
1470
1450
1470
1474
1475
1477
1484
1467
1452
1478
1474
1463
1461
1463
1465
1443
1481
1478
1481
1474
1484
1471
1454
1480
1482
1477
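
With `errors='replace'`, every byte that cannot be decoded is swapped for the replacement character, so accented letters in the descriptions are lost. If the files are encoded as Latin-1 (ISO-8859-1), which is an assumption that this Notebook does not confirm, passing the encoding explicitly keeps those characters. The sketch below illustrates the difference on a made-up temporary file.

```python
import os
import tempfile

# Write a made-up description as Latin-1 bytes (assumed encoding, not confirmed for OMIE files).
sample = 'NEXUS ENERGÍA COMPRA (ESP)\n'
with tempfile.NamedTemporaryFile(delete=False, suffix='.1') as tmp:
    tmp.write(sample.encode('latin-1'))
    tmp_path = tmp.name

# Decoding as UTF-8 with errors='replace' turns the accented byte into U+FFFD.
with open(tmp_path, encoding='utf-8', errors='replace') as f:
    replaced = f.readline()

# Decoding with the assumed encoding keeps the original character.
with open(tmp_path, encoding='latin-1') as f:
    decoded = f.readline()

os.remove(tmp_path)
print(repr(replaced))
print(repr(decoded))
```

In both cases one byte maps to one character, so the fixed column positions used in the rest of the Notebook are not affected.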


In [226]:
#Checking how to obtain the date of each file from its name.
len(archive)

14

In [227]:
archive[4:12]

'20200930'

In [228]:
archive[8:12]

'0930'

In [229]:
type(archive[8:12])

str

In [230]:
#Checking the way to obtain the information from each file.
#cab_09 is a list of lists. The outer list holds one item per daily file; each inner list holds the lines
#of that daily text file.
type(cab_09) , len(cab_09)

(list, 30)

In [231]:
type(cab_09[0]) ,  len(cab_09[0])

(list, 1483)

In [232]:
type(cab_09[29]) ,  len(cab_09[29])

(list, 1477)

In [233]:
#Each line is retrieved as a string of length 170 (including the trailing newline).
type(cab_09[0][0]) , len(cab_09[0][0])

(str, 170)

In [234]:
#Checking that all lines' length is 170

for x in range(len(cab_09)):
    for i in cab_09[x]:
        if len(i) != 170:
            print(len(i))
    print('OK')

OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
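
Note that the loop above prints 'OK' for every file, even when a line of a different length is found. A more compact check is to count how many lines have each length. This is only a sketch: `line_length_counts` is a hypothetical helper and the input is a made-up stand-in for `cab_09`.

```python
from collections import Counter

def line_length_counts(files):
    """Count line lengths over a list of files, each given as a list of lines."""
    return Counter(len(line) for lines in files for line in lines)

# Made-up stand-in for cab_09: two "files" with short lines, one line being longer.
demo_files = [['abc\n', 'def\n'], ['ghi\n', 'jklm\n']]
counts = line_length_counts(demo_files)
print(counts)
```

Applied to `cab_09`, a result with the single key 170 would confirm that every line has the expected length.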


In [235]:
#Checking the structure of each line
x = cab_09[0][0]
x

'1696149  6EDPC2  EDP COMERCIAL COMPRA (PORT)   CNO            0.000            0.000    0.0    0.0            0.000            0.000 6000.0    0.0    0.0 220100531155344\n'

In [236]:
len(x)

170

In [237]:
#Bid code: this code identifies the bid within a day and will be used to find the bid in the "Detalles" file.
#It is important to be aware that the same bid code can appear on different days.
x[0:7]

'1696149'

In [238]:
#Bid version.
x[7:10]

'  6'

In [239]:
#Bid unit. This is a special code for each bid agent (or unit).
x[10:17]

'EDPC2  '

In [240]:
#Unit description.
x[17:47]

'EDP COMERCIAL COMPRA (PORT)   '

In [241]:
#Sell or Buy indicator for each bid.
x[47:50]

'CNO'

In [242]:
#N/A.
x[50:67]

'            0.000'

In [243]:
#N/A.
x[67:84]

'            0.000'

In [244]:
#N/A.
x[84:91]

'    0.0'

In [245]:
#N/A.
x[91:98]

'    0.0'

In [246]:
#Maximum power increasing
x[98:115]

'            0.000'

In [247]:
#Maximum power decreasing
x[115:132]

'            0.000'

In [248]:
#Maximum power (MW)
x[132:139]

' 6000.0'

In [249]:
#Maximum start-up power
x[139:146]

'    0.0'

In [250]:
#Maximum shut-off power
x[146:153]

'    0.0'

In [251]:
#Interconnection code
x[153:170]

' 220100531155344\n'

In [252]:
[x[0:7] + '^' + x[7:10] +'^'+ x[10:17] +'^'+ x[17:47] +'^'+ x[47:50] +'^'+ x[132:139]]

['1696149^  6^EDPC2  ^EDP COMERCIAL COMPRA (PORT)   ^CNO^ 6000.0']
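
The column positions explored above can also be kept in one place as a mapping from field name to slice. The sketch below assumes the column names used later in this Notebook; `CAB_FIELDS` and `parse_fixed_width` are hypothetical names, and the example line is rebuilt field by field so that the widths stay explicit.

```python
# Field name -> (start, end) character positions, copied from the slices explored above.
CAB_FIELDS = {
    'Bid_Code': (0, 7),
    'Num_Version': (7, 10),
    'Bid_Unit': (10, 17),
    'Unit_Description': (17, 47),
    'Sell_Buy': (47, 50),
    'Pot_max': (132, 139),
}

def parse_fixed_width(line, fields):
    """Slice one fixed-width line into a dict, stripping the padding spaces."""
    return {name: line[start:end].strip() for name, (start, end) in fields.items()}

# The 170-character example line, rebuilt from its padded fields.
example = (
    '1696149' + '6'.rjust(3) + 'EDPC2'.ljust(7)
    + 'EDP COMERCIAL COMPRA (PORT)'.ljust(30) + 'CNO'
    + '0.000'.rjust(17) * 2 + '0.0'.rjust(7) * 2
    + '0.000'.rjust(17) * 2 + '6000.0'.rjust(7) + '0.0'.rjust(7) * 2
    + '220100531155344'.rjust(16) + '\n'
)
record = parse_fixed_width(example, CAB_FIELDS)
print(len(example), record)
```

Unlike the cells in this Notebook, this helper strips the padding, so the values differ from the raw slices by their leading and trailing spaces.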

In [253]:
#Creating a single list of records, keeping only the needed info:
#[Bid code, Version num, Bid unit, Unit description, Buy/Sell indicator, Maximum power]
#[[0:7],[7:10],[10:17],[17:47],[47:50],[132:139]]
#Year, month and day (taken from the file name) are also included.
#First of all the fields are joined with "^" (to avoid problems with spaces in the description) and
#then the "split" method is used to turn each joined string into a list.

cab_total_09 = []

for day in range(len(cab_09)):
    for count in range(len(cab_09[day])):
        cab_total_09.append((cab_09[day][count][0:7] +'^'+ 
                            cab_09[day][count][7:10] +'^'+ 
                            cab_09[day][count][10:17] +'^'+ 
                            cab_09[day][count][17:47] +'^'+ 
                            cab_09[day][count][47:50] +'^'+ 
                            cab_09[day][count][132:139] +'^'+ 
                            cab_list[day][4:8] + '^' + 
                            cab_list[day][8:10] + '^' + 
                            cab_list[day][10:12]).split('^'))

In [254]:
#List with all the info

len(cab_total_09)

44152

In [255]:
cab_total_09[0]

['1696149',
 '  6',
 'EDPC2  ',
 'EDP COMERCIAL COMPRA (PORT)   ',
 'CNO',
 ' 6000.0',
 '2020',
 '09',
 '01']

In [256]:
type(cab_total_09)

list

In [257]:
cab_total_09[29390]

['1696149',
 '  6',
 'EDPC2  ',
 'EDP COMERCIAL COMPRA (PORT)   ',
 'CNO',
 ' 6000.0',
 '2020',
 '09',
 '21']

In [258]:
len(cab_total_09[29390])

9

In [259]:
type(cab_total_09[29390])

list

In [260]:
#Creating a dataframe with the right column names
#columns=['Bid_Code', 'Num_Version', 'Bid_Unit', 'Unit_Description', 'Sell_Buy', 'Pot_max', 'Year','Month','Day']

df_cab_09 = pd.DataFrame(cab_total_09,
                        columns=['Bid_Code', 
                                 'Num_Version', 
                                 'Bid_Unit', 
                                 'Unit_Description', 
                                 'Sell_Buy', 
                                 'Pot_max', 
                                 'Year',
                                 'Month',
                                 'Day'])

In [261]:
df_cab_09.head()

Unnamed: 0,Bid_Code,Num_Version,Bid_Unit,Unit_Description,Sell_Buy,Pot_max,Year,Month,Day
0,1696149,6,EDPC2,EDP COMERCIAL COMPRA (PORT),CNO,6000.0,2020,9,1
1,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,9,1
2,1811311,7,IPG,C.H. IP GENERACION,VNO,84.0,2020,9,1
3,426609,12,IPB,C.H.B.IP BOMBEO,CNO,99.0,2020,9,1
4,2532852,28,NRENVD1,NRENO-VENTA,VNO,1.7,2020,9,1


In [262]:
#An example with spaces in the description is checked
df_cab_09.iloc[549]

Bid_Code                                   6469718
Num_Version                                      0
Bid_Unit                                   VISAC01
Unit_Description    SAMOYEDO, S.L.                
Sell_Buy                                       CNO
Pot_max                                        1.2
Year                                          2020
Month                                           09
Day                                             01
Name: 549, dtype: object
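
As the row above shows, the text fields keep their padding spaces (for example in Unit_Description). If exact comparisons are needed later, the padding can be stripped once in the dataframe. The sketch below works on two made-up rows that mimic `df_cab_09`; it is an illustration, not a step that this Notebook performs.

```python
import pandas as pd

# Two made-up rows that mimic the padded strings kept in df_cab_09.
demo = pd.DataFrame({
    'Bid_Unit': ['EDPC2  ', 'PALOS1 '],
    'Unit_Description': ['EDP COMERCIAL COMPRA (PORT)   ', 'C.C. PALOS 1                  '],
    'Pot_max': [' 6000.0', '  394.1'],
})

# Strip the padding in every (string) column.
clean = demo.apply(lambda col: col.str.strip())

# An exact comparison now works without the trailing space.
print(clean.loc[clean['Bid_Unit'] == 'PALOS1'])
```

If the padding is stripped in the headers, the key columns of the details must be stripped in the same way before merging, otherwise the keys no longer match.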

## 3. EXPLORING "DETALLES" FILES

In this section, "Detalles" files are explored in the same way as "Cabeceras" files in the previous section.

"Detalles" files hold much more information than "Cabeceras" (approx. 50,000 lines per daily file), so a single daily file is explored first as an example.

In [266]:
### CHANGE PATH TO './RawData/OMIE/BIDS/DET/det_202009/' ###

det_path_09 = '/home/dsc/Documents/TFM/Data/OMIE/DET/det_202009/'
det_list = !ls -1 $det_path_09

In [267]:
det_path_0901 = det_path_09 + 'DET_20200901.1'

In [268]:
det_0901 = []

with open(det_path_0901) as det:
    det_lines = det.readlines()
    det_0901.append(det_lines)
    print(len(det_lines))

50676


In [269]:
#Only one file
len(det_0901)

1

In [270]:
len(det_0901[0])

50676

In [271]:
len(det_0901[0][0])

58

In [176]:
#Checking that all lines' length is 58

for x in range(len(det_0901)):
    for i in det_0901[x]:
        if len(i) != 58:
            print(len(i))
    print('OK')

OK


In [177]:
y = det_0901[0][0]
y

'1696149  622 1            0.000            0.010    0.1SS\n'

In [178]:
#Bid code
y[0:7]

'1696149'

In [179]:
#Version number
y[7:10]

'  6'

In [180]:
#Bid period (from 1 to 24 - it is the bid hour)
y[10:12]

'22'

In [181]:
#Block number. Each hour can be split into several blocks to offer different prices for different
#amounts of energy (the sum of the energy for all the blocks cannot be higher than the maximum power for this hour)
y[12:14]

' 1'

In [182]:
#Bid price per energy (€/MWh)
y[31:48]

'            0.010'

In [183]:
#Bid energy (MWh)
y[48:55]

'    0.1'

In [184]:
#Creating a single list of records, keeping only the needed info:
#[Bid code, Version num, Bid period, Block, Price, Energy]
#[[0:7],[7:10],[10:12],[12:14],[31:48],[48:55]]
#Year, month and day (taken from the file name) are also included.
#First of all the fields are joined with "^" and then the "split" method is used
#to turn each joined string into a list.

det_total_0901 = []

for day in range(len(det_0901)):
    for count in range(len(det_0901[day])):
        det_total_0901.append((det_0901[day][count][0:7] +'^'+ 
                            det_0901[day][count][7:10] +'^'+ 
                            det_0901[day][count][10:12] +'^'+ 
                            det_0901[day][count][12:14] +'^'+ 
                            det_0901[day][count][31:48] +'^'+ 
                            det_0901[day][count][48:55] +'^'+ 
                            det_list[day][4:8] + '^' + 
                            det_list[day][8:10] + '^' + 
                            det_list[day][10:12]).split('^'))


In [185]:
#List with all the info for one day.

len(det_total_0901)

50676

In [186]:
det_total_0901[0]

['1696149',
 '  6',
 '22',
 ' 1',
 '            0.010',
 '    0.1',
 '2020',
 '09',
 '01']

In [187]:
type(det_total_0901)

list

In [188]:
len(det_total_0901[29390])

9

In [189]:
#Creating a dataframe with the right name of the columns
#columns=[Bid code, Version num, Bid period, Block, Price, Energy, Year, Month, Day]
df_det_0901 = pd.DataFrame(det_total_0901,
                        columns=['Bid_Code', 
                                 'Num_Version', 
                                 'Period', 
                                 'Block', 
                                 'Price', 
                                 'Energy', 
                                 'Year',
                                 'Month',
                                 'Day'])

In [190]:
df_det_0901.head()

Unnamed: 0,Bid_Code,Num_Version,Period,Block,Price,Energy,Year,Month,Day
0,1696149,6,22,1,0.01,0.1,2020,9,1
1,1717319,3,1,1,0.0,1.0,2020,9,1
2,1717319,3,2,1,0.0,1.0,2020,9,1
3,1717319,3,3,1,0.0,1.0,2020,9,1
4,1717319,3,4,1,0.0,1.0,2020,9,1


In [191]:
df_det_0901.dtypes

Bid_Code       object
Num_Version    object
Period         object
Block          object
Price          object
Energy         object
Year           object
Month          object
Day            object
dtype: object
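
All columns are stored as strings (dtype object), so prices and energies cannot be summed or compared numerically yet. A possible conversion is sketched below on made-up rows shaped like `df_det_0901`; it is an illustration of `pd.to_numeric`, not a step taken in this Notebook.

```python
import pandas as pd

# Made-up rows shaped like df_det_0901, where every value is a padded string.
demo = pd.DataFrame({
    'Period': ['22', ' 1'],
    'Block': [' 1', ' 1'],
    'Price': ['            0.010', '            0.000'],
    'Energy': ['    0.1', '    1.0'],
})

# Strip the padding first, then let pandas pick an integer or float type per column.
numeric = demo.apply(lambda col: pd.to_numeric(col.str.strip()))
print(numeric.dtypes)
```

Key columns such as Bid_Code and Num_Version are left out on purpose: if they are converted, the same conversion has to be applied to both dataframes before merging.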

In [192]:
#Example of one bid: bid for each hour (only one block)
df_det_0901.loc[df_det_0901['Bid_Code'] == '1717319']

Unnamed: 0,Bid_Code,Num_Version,Period,Block,Price,Energy,Year,Month,Day
1,1717319,3,1,1,0.0,1.0,2020,9,1
2,1717319,3,2,1,0.0,1.0,2020,9,1
3,1717319,3,3,1,0.0,1.0,2020,9,1
4,1717319,3,4,1,0.0,1.0,2020,9,1
5,1717319,3,5,1,0.0,1.0,2020,9,1
6,1717319,3,6,1,0.0,1.0,2020,9,1
7,1717319,3,7,1,0.0,1.0,2020,9,1
8,1717319,3,8,1,0.0,1.0,2020,9,1
9,1717319,3,9,1,0.0,1.0,2020,9,1
10,1717319,3,10,1,0.0,1.0,2020,9,1


In [193]:
#Example of merging the monthly dataframe from "cabeceras" (September 2020) with the daily dataframe from "detalles"
#(September 1, 2020). The merge is "inner", so only rows with matching keys in both dataframes are kept. As no "on"
#argument is given, the keys are all the columns that have the same name: Bid_Code, Num_Version, Year, Month, Day

df_cab_09.merge(df_det_0901,how = 'inner')

Unnamed: 0,Bid_Code,Num_Version,Bid_Unit,Unit_Description,Sell_Buy,Pot_max,Year,Month,Day,Period,Block,Price,Energy
0,1696149,6,EDPC2,EDP COMERCIAL COMPRA (PORT),CNO,6000.0,2020,09,01,22,1,0.010,0.1
1,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,09,01,1,1,0.000,1.0
2,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,09,01,2,1,0.000,1.0
3,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,09,01,3,1,0.000,1.0
4,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,09,01,4,1,0.000,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
50671,6469400,4,NEXUC01,NEXUS ENERG�A COMPRA (ESP),CNO,900.0,2020,09,01,24,21,25.720,0.1
50672,6469400,4,NEXUC01,NEXUS ENERG�A COMPRA (ESP),CNO,900.0,2020,09,01,24,22,25.220,0.1
50673,6469400,4,NEXUC01,NEXUS ENERG�A COMPRA (ESP),CNO,900.0,2020,09,01,24,23,24.720,0.1
50674,6469400,4,NEXUC01,NEXUS ENERG�A COMPRA (ESP),CNO,900.0,2020,09,01,24,24,24.220,0.1
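
The merge above relies on pandas picking the shared column names as keys. The keys can also be stated explicitly, and the `validate` argument of `merge` can check that each header matches many detail rows but never the other way round. The sketch below uses made-up miniature dataframes, not the real OMIE data.

```python
import pandas as pd

# Made-up miniature versions of the header and detail dataframes.
cab = pd.DataFrame({
    'Bid_Code': ['1696149', '1717319'],
    'Num_Version': ['  6', '  3'],
    'Bid_Unit': ['EDPC2  ', 'EONUC01'],
    'Year': ['2020', '2020'], 'Month': ['09', '09'], 'Day': ['01', '01'],
})
det = pd.DataFrame({
    'Bid_Code': ['1696149', '1717319', '1717319'],
    'Num_Version': ['  6', '  3', '  3'],
    'Period': ['22', ' 1', ' 2'],
    'Year': ['2020'] * 3, 'Month': ['09'] * 3, 'Day': ['01'] * 3,
})

keys = ['Bid_Code', 'Num_Version', 'Year', 'Month', 'Day']

# validate='one_to_many' raises a MergeError if a header key appears more than once.
merged = cab.merge(det, how='inner', on=keys, validate='one_to_many')
print(merged)
```

Naming the keys protects the merge against a column that is later added to both dataframes and would otherwise silently become part of the key.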


In [194]:
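#Example for one Bid_Code. The merged dataframe is stored first so that it is filtered with its own column
#(a mask built from df_det_0901 is aligned by index label and is only right if both dataframes share the row order).
df_merge_0901 = df_cab_09.merge(df_det_0901,how = 'inner')
df_merge_0901.loc[df_merge_0901['Bid_Code'] == '1717319']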

Unnamed: 0,Bid_Code,Num_Version,Bid_Unit,Unit_Description,Sell_Buy,Pot_max,Year,Month,Day,Period,Block,Price,Energy
1,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,9,1,1,1,0.0,1.0
2,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,9,1,2,1,0.0,1.0
3,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,9,1,3,1,0.0,1.0
4,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,9,1,4,1,0.0,1.0
5,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,9,1,5,1,0.0,1.0
6,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,9,1,6,1,0.0,1.0
7,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,9,1,7,1,0.0,1.0
8,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,9,1,8,1,0.0,1.0
9,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,9,1,9,1,0.0,1.0
10,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,9,1,10,1,0.0,1.0


Now it is time to deal with a whole month. September 2020 is chosen as an example.

In [195]:
det_09 = []

for archive in det_list:
    with open(det_path_09 + archive, errors='replace') as f:
        lines = f.readlines()
        det_09.append(lines)
        print(len(lines))

50676
50276
49509
50102
48885
48385
48974
49975
49167
49163
49939
47797
48843
49202
48848
48730
48658
50117
49236
48829
50030
49036
49580
49526
50182
49290
49451
49594
48059
48770


In [196]:
#Checking that all lines' length is 58

for x in range(len(det_09)):
    for i in det_09[x]:
        if len(i) != 58:
            print(len(i))
    print('OK')

OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK
OK


In [197]:
det_total_09 = []

for day in range(len(det_09)):
    for count in range(len(det_09[day])):
        det_total_09.append((det_09[day][count][0:7] +'^'+ 
                            det_09[day][count][7:10] +'^'+ 
                            det_09[day][count][10:12] +'^'+ 
                            det_09[day][count][12:14] +'^'+ 
                            det_09[day][count][31:48] +'^'+ 
                            det_09[day][count][48:55] +'^'+ 
                            det_list[day][4:8] + '^' + 
                            det_list[day][8:10] + '^' + 
                            det_list[day][10:12]).split('^'))

In [198]:
#List with all the monthly info.

len(det_total_09)

1478829

In [199]:
det_total_09[0]

['1696149',
 '  6',
 '22',
 ' 1',
 '            0.010',
 '    0.1',
 '2020',
 '09',
 '01']

In [200]:
type(det_total_09)

list

In [201]:
len(det_total_09[29390])

9

In [202]:
#Creating a dataframe with the right name of the columns
#columns=[Bid code, Version num, Bid period, Block, Price, Energy, Year, Month, Day]

df_det_09 = pd.DataFrame(det_total_09,
                        columns=['Bid_Code', 
                                 'Num_Version', 
                                 'Period', 
                                 'Block', 
                                 'Price', 
                                 'Energy', 
                                 'Year',
                                 'Month',
                                 'Day'])

In [203]:
df_det_09.head()

Unnamed: 0,Bid_Code,Num_Version,Period,Block,Price,Energy,Year,Month,Day
0,1696149,6,22,1,0.01,0.1,2020,9,1
1,1717319,3,1,1,0.0,1.0,2020,9,1
2,1717319,3,2,1,0.0,1.0,2020,9,1
3,1717319,3,3,1,0.0,1.0,2020,9,1
4,1717319,3,4,1,0.0,1.0,2020,9,1


In [204]:
df_det_09.shape

(1478829, 9)

In [205]:
#Creating the whole September 2020 dataframe with "cabeceras" and "detalles" merged.

df_merge_09 = df_cab_09.merge(df_det_09,how = 'inner')

In [206]:
df_merge_09.head()

Unnamed: 0,Bid_Code,Num_Version,Bid_Unit,Unit_Description,Sell_Buy,Pot_max,Year,Month,Day,Period,Block,Price,Energy
0,1696149,6,EDPC2,EDP COMERCIAL COMPRA (PORT),CNO,6000.0,2020,9,1,22,1,0.01,0.1
1,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,9,1,1,1,0.0,1.0
2,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,9,1,2,1,0.0,1.0
3,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,9,1,3,1,0.0,1.0
4,1717319,3,EONUC01,EONUR CONSUMO CLIENTES TUR,CNO,400.0,2020,9,1,4,1,0.0,1.0


In [207]:
df_merge_09.shape

(1478829, 13)

In [208]:
#Counting the number of Sell and Buy bid lines
df_merge_09['Sell_Buy'].value_counts()

VNO    1072350
CNO     405759
VNP        720
Name: Sell_Buy, dtype: int64

In [209]:
#Dataframe with Sell bids
df_merge_09_V = df_merge_09.loc[df_merge_09['Sell_Buy'] != 'CNO']

In [210]:
df_merge_09_V.shape

(1073070, 13)
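
The filter above keeps every row whose code is not 'CNO', so any unexpected code would also be counted as a sell bid. A stricter selection is sketched below, under the assumption (implied by the filter, but not stated by OMIE in this Notebook) that codes starting with 'V' are sell bids. The rows are made up.

```python
import pandas as pd

# Made-up rows with the three codes seen in September 2020.
demo = pd.DataFrame({'Sell_Buy': ['VNO', 'CNO', 'VNP', 'VNO'],
                     'Energy': ['    0.1', '    1.0', '    0.4', '    0.7']})

# Assumption: codes beginning with 'V' are sell bids.
is_sell = demo['Sell_Buy'].str.startswith('V')
sell_bids = demo.loc[is_sell]
print(sell_bids['Sell_Buy'].value_counts())
```

For the three codes seen in September 2020 (VNO, CNO and VNP), both filters select the same rows.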

In [211]:
df_merge_09_V.head()

Unnamed: 0,Bid_Code,Num_Version,Bid_Unit,Unit_Description,Sell_Buy,Pot_max,Year,Month,Day,Period,Block,Price,Energy
25,1811311,7,IPG,C.H. IP GENERACION,VNO,84.0,2020,9,1,23,1,150.0,0.1
29,2532852,28,NRENVD1,NRENO-VENTA,VNO,1.7,2020,9,1,9,1,0.0,0.1
30,2532852,28,NRENVD1,NRENO-VENTA,VNO,1.7,2020,9,1,10,1,0.0,0.4
31,2532852,28,NRENVD1,NRENO-VENTA,VNO,1.7,2020,9,1,11,1,0.0,0.7
32,2532852,28,NRENVD1,NRENO-VENTA,VNO,1.7,2020,9,1,12,1,0.0,1.0


In [212]:
#Looking for the September 2020 bids of PALOS1
df_merge_09_PALOS1 = df_merge_09_V.loc[df_merge_09_V['Bid_Unit'].str.contains('PALOS1')]

In [213]:
df_merge_09_PALOS1.head()

Unnamed: 0,Bid_Code,Num_Version,Bid_Unit,Unit_Description,Sell_Buy,Pot_max,Year,Month,Day,Period,Block,Price,Energy
18724,6468171,3,PALOS1,C.C. PALOS 1,VNO,394.1,2020,9,1,1,12,180.3,394.1
18725,6468171,3,PALOS1,C.C. PALOS 1,VNO,394.1,2020,9,1,2,12,180.3,394.1
18726,6468171,3,PALOS1,C.C. PALOS 1,VNO,394.1,2020,9,1,3,12,180.3,394.1
18727,6468171,3,PALOS1,C.C. PALOS 1,VNO,394.1,2020,9,1,4,1,1.13,75.0
18728,6468171,3,PALOS1,C.C. PALOS 1,VNO,394.1,2020,9,1,4,12,180.3,319.1


In [214]:
df_merge_09_PALOS1.iloc[0]

Bid_Code                                   6468171
Num_Version                                      3
Bid_Unit                                   PALOS1 
Unit_Description    C.C. PALOS 1                  
Sell_Buy                                       VNO
Pot_max                                      394.1
Year                                          2020
Month                                           09
Day                                             01
Period                                           1
Block                                           12
Price                                      180.300
Energy                                       394.1
Name: 18724, dtype: object

In [215]:
df_merge_09_PALOS1.shape

(5970, 13)

## 4. CREATING .csv FILES TO STORE MONTHLY MERGED DATAFRAMES

In this section, a function is defined that builds a monthly dataframe with the merged information from "Cabeceras" and "Detalles". This function is then used to store the information locally in .csv files, one per month because of the size of the files.

In [273]:
def OMIE_merge_month(month, year):
    
    #In this part of the code "Cabeceras" (CAB) files will be read
    
    #Path where CAB files are located
    
    ### CHANGE PATH TO './RawData/OMIE/BIDS/' ###
    path = '/home/dsc/Documents/TFM/Data/OMIE/'
    
    cab_path = path + 'CAB/cab_' + year + month + '/'
    cab_list = !ls -1 $cab_path
    
    #Files from the chosen month will be stored in cab_month as a list of lists
    #(num. files x num. lines)
    cab_month = []
    for archive in cab_list:
        with open(cab_path + archive, errors='replace') as f:
            lines = f.readlines()
            cab_month.append(lines)
    
    #All lines are collected in a single list, adding day, month and year (read from the file name)
    cab_total_month = []
    for day in range(len(cab_month)):
        for count in range(len(cab_month[day])):
            cab_total_month.append((cab_month[day][count][0:7] +'^'+ 
                            cab_month[day][count][7:10] +'^'+ 
                            cab_month[day][count][10:17] +'^'+ 
                            cab_month[day][count][17:47] +'^'+ 
                            cab_month[day][count][47:50] +'^'+ 
                            cab_month[day][count][132:139] +'^'+ 
                            cab_list[day][4:8] + '^' + 
                            cab_list[day][8:10] + '^' + 
                            cab_list[day][10:12]).split('^'))
            
    #The list of lists is transformed into a dataframe with the corresponding column names
    df_cab_month = pd.DataFrame(cab_total_month,
                        columns=['Bid_Code', 
                                 'Num_Version', 
                                 'Bid_Unit', 
                                 'Unit_Description', 
                                 'Sell_Buy', 
                                 'Pot_max', 
                                 'Year',
                                 'Month',
                                 'Day'])
    
    #A similar process is followed for "Detalles" (DET) files
    
    #Path where DET files are located (path already ends with '/')
    det_path = path + 'DET/det_' + year + month + '/'
    det_list = !ls -1 $det_path
    
    det_month = []
    for archive in det_list:
        with open(det_path + archive) as f:
            lines = f.readlines()
            det_month.append(lines)  

    det_total_month = []
    for day in range(len(det_month)):
        for count in range(len(det_month[day])):
            det_total_month.append((det_month[day][count][0:7] +'^'+ 
                            det_month[day][count][7:10] +'^'+ 
                            det_month[day][count][10:12] +'^'+ 
                            det_month[day][count][12:14] +'^'+ 
                            det_month[day][count][31:48] +'^'+ 
                            det_month[day][count][48:55] +'^'+ 
                            det_list[day][4:8] + '^' + 
                            det_list[day][8:10] + '^' + 
                            det_list[day][10:12]).split('^'))
    
    df_det_month = pd.DataFrame(det_total_month,
                        columns=['Bid_Code', 
                                 'Num_Version', 
                                 'Period', 
                                 'Block', 
                                 'Price', 
                                 'Energy', 
                                 'Year',
                                 'Month',
                                 'Day'])
    
    #A new dataframe is created with both dataframes merged
    df_merge_month = df_cab_month.merge(df_det_month,how = 'inner')
    
    return df_merge_month
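
The function above depends on IPython (`!ls`) and on a hard-coded path. The following sketch shows one way the same steps could be written with the standard library and the base path as an argument. The names `read_month`, `merge_month`, `CAB_FIELDS` and `DET_FIELDS` are hypothetical, the directory layout ('CAB/cab_yyyymm', 'DET/det_yyyymm') is copied from the function above, and the demonstration uses made-up files in a temporary directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Field name -> (start, end) positions, taken from the slices used in OMIE_merge_month.
CAB_FIELDS = {'Bid_Code': (0, 7), 'Num_Version': (7, 10), 'Bid_Unit': (10, 17),
              'Unit_Description': (17, 47), 'Sell_Buy': (47, 50), 'Pot_max': (132, 139)}
DET_FIELDS = {'Bid_Code': (0, 7), 'Num_Version': (7, 10), 'Period': (10, 12),
              'Block': (12, 14), 'Price': (31, 48), 'Energy': (48, 55)}


def read_month(directory, fields):
    """Read every daily '.1' file of a directory into one dataframe of strings."""
    rows = []
    for path in sorted(Path(directory).glob('*.1')):
        # 'CAB_20200901.1' -> '2020', '09', '01'
        date_parts = [path.name[4:8], path.name[8:10], path.name[10:12]]
        with open(path, errors='replace') as f:
            for line in f:
                rows.append([line[a:b] for a, b in fields.values()] + date_parts)
    return pd.DataFrame(rows, columns=list(fields) + ['Year', 'Month', 'Day'])


def merge_month(base_dir, month, year):
    """Hypothetical counterpart of OMIE_merge_month that takes the base path as an argument."""
    base = Path(base_dir)
    df_cab = read_month(base / 'CAB' / ('cab_' + year + month), CAB_FIELDS)
    df_det = read_month(base / 'DET' / ('det_' + year + month), DET_FIELDS)
    return df_cab.merge(df_det, how='inner')


# Demonstration on made-up files: one header line and two detail lines for the same bid.
cab_line = ('1696149' + '6'.rjust(3) + 'EDPC2'.ljust(7)
            + 'EDP COMERCIAL COMPRA (PORT)'.ljust(30) + 'CNO'
            + ' ' * 82 + '6000.0'.rjust(7) + ' ' * 30 + '\n')
det_lines = ''.join(
    '1696149' + '6'.rjust(3) + period + '1'.rjust(2)
    + '0.000'.rjust(17) + '0.010'.rjust(17) + '0.1'.rjust(7) + 'SS\n'
    for period in ('22', '23'))

with tempfile.TemporaryDirectory() as tmp:
    cab_dir = Path(tmp) / 'CAB' / 'cab_202009'
    det_dir = Path(tmp) / 'DET' / 'det_202009'
    cab_dir.mkdir(parents=True)
    det_dir.mkdir(parents=True)
    (cab_dir / 'CAB_20200901.1').write_text(cab_line)
    (det_dir / 'DET_20200901.1').write_text(det_lines)
    demo = merge_month(tmp, '09', '2020')

print(demo.shape)
```

Sharing one reader between both file types keeps the two field layouts as the only place where "Cabeceras" and "Detalles" differ.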

Now we will create 12 dataframes for one year (from November 2019 to October 2020), and they will be stored locally as ".csv" files, to be used in other notebooks where yearly bid dataframes for different units will be created.

In [277]:
### CHANGE PATH TO './RawData/OMIE/BIDS/CAB_DET_merged/' ###
output_path = '/home/dsc/Documents/TFM/Data/OMIE/CAB_DET/'

In [3]:
OMIE_112019 = OMIE_merge_month('11', '2019')

In [4]:
OMIE_112019.to_csv(output_path + 'OMIE_112019.csv')

In [7]:
OMIE_122019 = OMIE_merge_month('12', '2019')

In [8]:
OMIE_122019.to_csv(output_path + 'OMIE_122019.csv')

In [5]:
OMIE_012020 = OMIE_merge_month('01', '2020')

In [6]:
OMIE_012020.to_csv(output_path + 'OMIE_012020.csv')

In [9]:
OMIE_022020 = OMIE_merge_month('02', '2020')

In [10]:
OMIE_022020.to_csv(output_path + 'OMIE_022020.csv')

In [3]:
OMIE_032020 = OMIE_merge_month('03', '2020')

In [4]:
OMIE_032020.to_csv(output_path + 'OMIE_032020.csv')

In [5]:
OMIE_042020 = OMIE_merge_month('04', '2020')

In [6]:
OMIE_042020.to_csv(output_path + 'OMIE_042020.csv')

In [7]:
OMIE_052020 = OMIE_merge_month('05', '2020')

In [8]:
OMIE_052020.to_csv(output_path + 'OMIE_052020.csv')

In [9]:
OMIE_062020 = OMIE_merge_month('06', '2020')

In [10]:
OMIE_062020.to_csv(output_path + 'OMIE_062020.csv')

In [3]:
OMIE_072020 = OMIE_merge_month('07', '2020')

In [4]:
OMIE_072020.to_csv(output_path + 'OMIE_072020.csv')

In [5]:
OMIE_082020 = OMIE_merge_month('08', '2020')

In [6]:
OMIE_082020.to_csv(output_path + 'OMIE_082020.csv')

In [7]:
OMIE_092020 = OMIE_merge_month('09', '2020')

In [8]:
OMIE_092020.to_csv(output_path + 'OMIE_092020.csv')

In [9]:
OMIE_102020 = OMIE_merge_month('10', '2020')

In [10]:
OMIE_102020.to_csv(output_path + 'OMIE_102020.csv')
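
The twelve pairs of cells above repeat the same two steps, so they could also be driven by a loop over the months. The sketch below only builds the list of (month, year) pairs and the file names; `month_range` is a hypothetical helper, and the call to `OMIE_merge_month` is left as a comment because it needs the raw data.

```python
def month_range(start_year, start_month, n_months):
    """Return n_months consecutive (month, year) pairs as zero-padded strings."""
    pairs = []
    year, month = start_year, start_month
    for _ in range(n_months):
        pairs.append((f'{month:02d}', str(year)))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return pairs

periods = month_range(2019, 11, 12)
file_names = ['OMIE_' + month + year + '.csv' for month, year in periods]
print(file_names[0], file_names[-1])

# In the Notebook, each pair could then be processed in turn:
# for month, year in periods:
#     OMIE_merge_month(month, year).to_csv(output_path + 'OMIE_' + month + year + '.csv')
```

Processing one month per iteration keeps only one merged dataframe in memory at a time. Passing `index=False` to `to_csv` would avoid writing the row index as an extra unnamed column, if the downstream Notebooks do not rely on it.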