# Data Processing

This notebook prepares cross-border electricity trading data between Switzerland and Germany for analysis. It does so by:

- Fetching auction data from the Joint Allocation Office (JAO) API.
- Loading day-ahead price data for bidding zones (e.g., CH, DE-LU).
- Combining auction and price data into a single DataFrame.
- Ensuring the final dataset is sorted by timestamp.
- Saving the processed data for further analysis.

### Key Data Sources:
1. JAO API: Auction prices and capacities for cross-border electricity trading.
2. ENTSO-E Transparency Platform: Day-ahead market prices for relevant bidding zones.

### Outputs:
The processed data is saved as a CSV file (`merged_data.csv`) in the `data/processed` folder for downstream analysis.

### API Key:
Define your API key by creating the file `src/config.py` with the line `KEY = 'your_api_key'`, using the key provided by JAO.
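
A minimal sketch of what `src/config.py` could look like. Reading the key from an environment variable is an assumption rather than part of the project, and `JAO_API_KEY` is a hypothetical variable name; a plain `KEY = 'your_api_key'` line works just as well.

```python
# src/config.py (sketch)
import os

# JAO_API_KEY is a hypothetical environment variable name, not defined by the project.
# The placeholder fallback keeps `from src.config import KEY` importable.
KEY = os.environ.get("JAO_API_KEY", "your_api_key")
```

Keeping the key out of the source file this way avoids committing it to version control by accident.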

### Import Required Packages
The necessary Python packages for data processing and API interaction are imported in this step.

In [1]:
import sys; sys.path.append("..")

import pandas as pd
import requests
import ast
from datetime import datetime, timedelta
from calendar import monthrange
import os

### Import Project-Specific Functions
These include custom functions for fetching auction data, loading price series, and accessing API keys stored in configuration files.

In [2]:
# Import the project helpers: JAO auction download, day-ahead price loading, and the API key
from src.datafeed.upstream import fetch_auction_data
from src.datafeed.downstream import load_price_series
from src.config import KEY

### Define Constants
- API key for accessing JAO data.
- Bidding zones (e.g., CH, DE-LU, DE-AT-LU) to analyze.
- Cross-border trading corridors for analysis.
- Start and end years for data fetching.

Make sure `KEY` is defined in the `src/config.py` file as described in the project's README.

In [3]:

api_key = KEY

# Set the start and end years
start_year = 2016
end_year = 2023

# Define the bidding zones to analyze. Their day-ahead price files must be downloaded from the ENTSO-E Transparency Platform (see README.md for more information)
bidding_zones = ['CH', 'DE-LU', 'DE-AT-LU']

# Cross-border trading corridors (downloaded from JAO using the API key in config.py)
corridors = ["de-ch", "ch-de"]  # other corridor pairs can be used; see jao.eu for more information

# For the mock data, use the code below and comment out the respective code above.
#bidding_zones = ['TEST1', 'TEST2']
#corridors = ["TEST1-TEST2"]

### Fetch Auction Data from JAO API
This step downloads cross-border electricity trading auction data for the specified corridors and years from the JAO API.

In [4]:
# Check if actual corridors are being used
if corridors != ["TEST1-TEST2"]:
    # api_key was set from src/config.py above; fail early if it is empty
    if not api_key:
        raise ValueError("No JAO API key found. Define KEY in src/config.py.")

    for corridor in corridors:
        # Fetch and process the auction data for this corridor
        auction_data = fetch_auction_data(start_year, end_year, api_key, corridor)

        # Display the first few rows for verification
        print(auction_data.head())
else:
    print("Using mock data. No API calls will be made.")




         date  productHour  de-ch_auctionPrice  de-ch_requestedCapacity  \
0  2016-01-02  00:00-01:00               14.88                   1389.0   
1  2016-01-02  01:00-02:00               13.96                   1388.0   
2  2016-01-02  02:00-03:00               13.04                   1388.0   
3  2016-01-02  03:00-04:00               13.54                   1388.0   
4  2016-01-02  04:00-05:00               13.73                   1398.0   

   de-ch_offeredCapacity  
0                  313.0  
1                  313.0  
2                  313.0  
3                  313.0  
4                  313.0  




         date  productHour  ch-de_auctionPrice  ch-de_requestedCapacity  \
0  2016-01-02  00:00-01:00                 0.0                   3016.0   
1  2016-01-02  01:00-02:00                 0.0                   3015.0   
2  2016-01-02  02:00-03:00                 0.0                   3015.0   
3  2016-01-02  03:00-04:00                 0.0                   3015.0   
4  2016-01-02  04:00-05:00                 0.0                   3015.0   

   ch-de_offeredCapacity  
0                 4273.0  
1                 4273.0  
2                 4273.0  
3                 4273.0  
4                 4273.0  
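
Note that the loop in this cell rebinds `auction_data` on every iteration, so only the last corridor's frame stays in memory; the merge step further down reads the per-corridor CSV files instead. If the frames are needed in memory, one option is to keep them in a dictionary. A minimal sketch, in which the hypothetical `fake_fetch` stands in for `fetch_auction_data` so that it runs without an API key:

```python
import pandas as pd

def fake_fetch(start_year, end_year, api_key, corridor):
    """Hypothetical stand-in for fetch_auction_data; returns a one-row toy frame."""
    return pd.DataFrame({
        "date": [f"{start_year}-01-02"],
        "productHour": ["00:00-01:00"],
        f"{corridor}_auctionPrice": [0.0],  # toy value, not real auction data
    })

corridors = ["de-ch", "ch-de"]

# Keep one frame per corridor instead of overwriting a single variable
auction_frames = {
    corridor: fake_fetch(2016, 2023, "dummy-key", corridor)
    for corridor in corridors
}

for corridor, frame in auction_frames.items():
    print(corridor, list(frame.columns))
```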


### Load Day-Ahead Prices and Combine Data
This step:
- Loads price series for each bidding zone (CH, DE-LU, DE-AT-LU).
- Ensures timestamps from all zones are unified.
- Combines auction and price data into a single DataFrame.

Note that `Missing ...` messages are expected: price files do not exist for every zone and year. For example, there is no DE-AT-LU data after 2018 and no DE-LU data before 2018.
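
The alignment step can be illustrated on a toy example. The zone names `A` and `B` and all values below are made up; the sketch only shows that assigning a Series to a column of a DataFrame built on the union of timestamps leaves `NaN` wherever a zone has no data.

```python
import pandas as pd

# Toy price series (made-up values) with partly overlapping hourly timestamps
price_series_dict = {
    "A": pd.Series([10.0, 11.0],
                   index=pd.to_datetime(["2020-01-01 00:00", "2020-01-01 01:00"])),
    "B": pd.Series([20.0, 21.0],
                   index=pd.to_datetime(["2020-01-01 01:00", "2020-01-01 02:00"])),
}

# Union of all timestamps, sorted chronologically
all_timestamps = sorted(set().union(*(s.index for s in price_series_dict.values())))

# Assigning a Series to a column aligns it on the index; gaps become NaN
df = pd.DataFrame(index=all_timestamps)
for zone, series in price_series_dict.items():
    df[zone] = series

print(df)
```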

In [None]:
# Define a range of years from start_year to end_year (inclusive)
years = range(start_year, end_year + 1)

# Dictionary to store price series for each zone
price_series_dict = {}

# Load price series for each bidding zone
for zone in bidding_zones:
    price_series_dict[zone] = load_price_series(zone, years)

# Get unified set of timestamps from all zones
all_timestamps = sorted(set().union(*(
    series.index for series in price_series_dict.values()
)))

# Create and populate final DataFrame
df_final = pd.DataFrame(index=all_timestamps)
for zone, series in price_series_dict.items():
    df_final[zone] = series

# Loop through each corridor and add auction data to the final DataFrame
for corridor in corridors:
    # Load auction data for the specific corridor
    auction_data = pd.read_csv(f"../data/external/jao_{corridor}.csv")
    # Create a datetime column by combining the date and the start of the productHour range
    auction_data['datetime'] = pd.to_datetime(
        auction_data['date'] + ' ' + auction_data['productHour'].str.split('-').str[0]
    )
    # Drop duplicate rows based on the datetime column
    auction_data = auction_data.drop_duplicates(subset=['datetime'])
    # Set the datetime column as the index
    auction_data = auction_data.set_index('datetime')

    # Add auction-related columns to the final DataFrame
    df_final[f"{corridor}_auctionPrice"] = auction_data[f"{corridor}_auctionPrice"]
    df_final[f"{corridor}_requestedCapacity"] = auction_data[f"{corridor}_requestedCapacity"]
    df_final[f"{corridor}_offeredCapacity"] = auction_data[f"{corridor}_offeredCapacity"]

# Sort the final DataFrame by index (datetime)
df_final.sort_index(inplace=True)

# Create the output folder if it does not exist yet
processed_folder = '../data/processed'
os.makedirs(processed_folder, exist_ok=True)

# Save the final merged DataFrame to a CSV file for further analysis
df_final.to_csv(os.path.join(processed_folder, 'merged_data.csv'))

# Display the final DataFrame
df_final

Missing ../data/raw/DE-LU_Day-ahead Prices_2016.csv: [Errno 2] No such file or directory: '../data/raw/DE-LU_Day-ahead Prices_2016.csv'
Missing ../data/raw/DE-LU_Day-ahead Prices_2017.csv: [Errno 2] No such file or directory: '../data/raw/DE-LU_Day-ahead Prices_2017.csv'
Missing ../data/raw/DE-AT-LU_Day-ahead Prices_2019.csv: [Errno 2] No such file or directory: '../data/raw/DE-AT-LU_Day-ahead Prices_2019.csv'
Missing ../data/raw/DE-AT-LU_Day-ahead Prices_2020.csv: [Errno 2] No such file or directory: '../data/raw/DE-AT-LU_Day-ahead Prices_2020.csv'
Missing ../data/raw/DE-AT-LU_Day-ahead Prices_2021.csv: [Errno 2] No such file or directory: '../data/raw/DE-AT-LU_Day-ahead Prices_2021.csv'
Missing ../data/raw/DE-AT-LU_Day-ahead Prices_2022.csv: [Errno 2] No such file or directory: '../data/raw/DE-AT-LU_Day-ahead Prices_2022.csv'
Missing ../data/raw/DE-AT-LU_Day-ahead Prices_2023.csv: [Errno 2] No such file or directory: '../data/raw/DE-AT-LU_Day-ahead Prices_2023.csv'
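
The messages above suggest that the loader skips years whose file is absent rather than failing. The project's `load_price_series` is not shown here, so the following is only a sketch of that pattern: the function name `load_price_series_sketch`, the `raw_dir` parameter, and the `timestamp` and `price` column names are all assumptions, and only the file naming follows the paths printed above.

```python
import os
import tempfile

import pandas as pd

def load_price_series_sketch(zone, years, raw_dir):
    """Illustrative loader; NOT the project's load_price_series."""
    parts = []
    for year in years:
        path = os.path.join(raw_dir, f"{zone}_Day-ahead Prices_{year}.csv")
        try:
            # Assumed layout: a 'timestamp' column and a 'price' column
            frame = pd.read_csv(path, parse_dates=["timestamp"], index_col="timestamp")
        except FileNotFoundError as exc:
            # Report the gap and continue with the remaining years
            print(f"Missing {path}: {exc}")
            continue
        parts.append(frame["price"])
    if not parts:
        return pd.Series(dtype=float)
    return pd.concat(parts).sort_index()

# Usage with a temporary folder that holds only a 2018 file
with tempfile.TemporaryDirectory() as raw_dir:
    pd.DataFrame({
        "timestamp": ["2018-01-01 00:00", "2018-01-01 01:00"],
        "price": [1.0, 2.0],  # toy values
    }).to_csv(os.path.join(raw_dir, "ZONE_Day-ahead Prices_2018.csv"), index=False)

    prices = load_price_series_sketch("ZONE", range(2017, 2019), raw_dir)

print(prices)
```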


| | CH | DE-LU | DE-AT-LU | de-ch_auctionPrice | de-ch_requestedCapacity | de-ch_offeredCapacity | ch-de_auctionPrice | ch-de_requestedCapacity | ch-de_offeredCapacity |
|---|---|---|---|---|---|---|---|---|---|
| 2016-01-01 00:00:00 | 41.09 | | 23.86 | 9.26 | 1663.0 | 392.0 | 0.00 | 2542.0 | 4194.0 |
| 2016-01-01 01:00:00 | 40.16 | | 22.39 | 8.50 | 1678.0 | 392.0 | 0.00 | 2542.0 | 4194.0 |
| 2016-01-01 02:00:00 | 36.03 | | 20.59 | 8.87 | 1678.0 | 392.0 | 0.00 | 2542.0 | 4194.0 |
| 2016-01-01 03:00:00 | 33.59 | | 16.81 | 7.50 | 1678.0 | 392.0 | 0.00 | 2542.0 | 4194.0 |
| 2016-01-01 04:00:00 | 32.92 | | 17.41 | 10.02 | 1668.0 | 392.0 | 0.00 | 2542.0 | 4194.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-12-31 19:00:00 | 23.40 | 9.0 | | 2.34 | 3552.0 | 530.0 | 0.05 | 11596.0 | 4270.0 |
| 2023-12-31 20:00:00 | 20.65 | 7.95 | | 2.34 | 3701.0 | 530.0 | 0.00 | 11381.0 | 4270.0 |
| 2023-12-31 21:00:00 | 9.58 | 6.0 | | 3.45 | 3701.0 | 530.0 | 0.00 | 11381.0 | 4270.0 |
| 2023-12-31 22:00:00 | 16.78 | 10.68 | | 3.54 | 3681.0 | 530.0 | 0.00 | 11381.0 | 4270.0 |
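
Because zones such as DE-LU and DE-AT-LU cover only part of the period, the merged frame contains gaps by design. A quick sanity check before downstream use might look like the sketch below. It runs on a toy frame with made-up values; on the real data, `df_final` would be the frame built in the cell above.

```python
import pandas as pd

# Toy frame mimicking the merged layout (values are made up)
df_final = pd.DataFrame(
    {"CH": [41.0, 40.0, 36.0], "DE-LU": [None, None, 9.0]},
    index=pd.to_datetime(["2016-01-01 00:00", "2016-01-01 01:00", "2016-01-01 02:00"]),
)

# The index should be sorted and free of duplicate timestamps
assert df_final.index.is_monotonic_increasing
assert df_final.index.is_unique

# Share of missing values per column
missing_share = df_final.isna().mean()
print(missing_share)
```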
