# Data Processing

This notebook prepares cross-border electricity trading data between Switzerland and Germany for analysis. It:

- Fetches auction data from the Joint Allocation Office (JAO) API.
- Loads day-ahead price data for the bidding zones (e.g., CH, DE-LU).
- Combines both sources into a single DataFrame indexed by timestamp.
- Sorts the final dataset by timestamp.
- Saves the processed data as a CSV file.

### Key Data Sources:
1. JAO API: Auction prices and capacities for cross-border electricity trading.
2. ENTSO-E Transparency Platform: Day-ahead market prices for relevant bidding zones.

### Outputs:
The processed data is saved as a CSV file (`merged_data.csv`) in the `data/processed` folder for downstream analysis.
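
For downstream use, the saved file can be read back with its timestamp index restored. The following is a minimal sketch, assuming the timestamps sit in the first CSV column (where `DataFrame.to_csv` writes the index by default). The toy values and the temporary path are illustrative only and are not the project's data.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the merged dataset (values are illustrative only)
toy = pd.DataFrame(
    {"CH": [40.0, 42.5], "DE-LU": [38.0, 39.5]},
    index=pd.to_datetime(["2023-01-01 00:00", "2023-01-01 01:00"]),
)

# Write to a temporary location instead of data/processed
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "merged_data.csv")
toy.to_csv(csv_path)

# Read it back, parsing the first column as a datetime index
reloaded = pd.read_csv(csv_path, index_col=0, parse_dates=True)
```

Passing `index_col=0` together with `parse_dates=True` asks pandas to parse the index column as dates, so time-based slicing and resampling work without further conversion.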

### Import Required Packages
The necessary Python packages for data processing and API interaction are imported in this step.

In [None]:
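# Make the project root importable so that the `src` package can be found
import sys
sys.path.append("..")

# Standard library
import ast
import os
from calendar import monthrange
from datetime import datetime, timedelta

# Third-party packages
import pandas as pd
import requests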

### Import Project-Specific Functions
These include custom functions for fetching auction data, loading price series, and accessing API keys stored in configuration files.

In [None]:
# Project helpers: JAO auction download, price-series loader, and the API key
from src.datafeed.upstream import fetch_auction_data
from src.datafeed.downstream import load_price_series
from src.config import KEY

### Define Constants
- API key for accessing JAO data.
- Bidding zones to analyze (e.g., CH, DE-LU, DE-AT-LU).
- Cross-border trading corridors to analyze.
- Start and end years for data fetching.

Make sure `KEY` is defined in `src/config.py`, as described in the project's README.

In [None]:
# API key provided by JAO (create src/config.py containing KEY = 'your_api_key')
api_key = KEY

# Set the start and end years
start_year = 2016
end_year = 2023

# Bidding zones to analyze. Their day-ahead prices must be downloaded from the
# ENTSO-E Transparency Platform beforehand (see README.md for more information).
bidding_zones = ['CH', 'DE-LU', 'DE-AT-LU']

# Cross-border trading corridors to download from JAO (requires the API key in config.py).
# Other pairs can be substituted; visit jao.eu for more information.
corridors = ["de-ch", "ch-de"]

### Fetch Auction Data from JAO API
This step downloads cross-border electricity trading auction data for the specified corridors and years.

In [None]:
for corridor in corridors:

    # Call the function to fetch and process the data for this corridor
    auction_data = fetch_auction_data(start_year, end_year, api_key, corridor)

    # Show the first few rows; an explicit print is needed because a bare
    # expression inside a loop is not displayed by the notebook
    print(corridor)
    print(auction_data.head())


### Load Day-Ahead Prices and Combine Data
This step:
- Loads the price series for each bidding zone (CH, DE-LU, DE-AT-LU).
- Builds a common index from the union of timestamps across all zones.
- Combines auction and price data into a single DataFrame.
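
The alignment performed in this step can be sketched on toy data. All values below are made up, and the `HH:MM-HH:MM` form of `productHour` is an assumption for illustration; only the column names `date` and `productHour` are taken from the notebook's own code.

```python
import pandas as pd

# Two toy price series with partially overlapping hourly timestamps
ch = pd.Series([40.0, 41.0],
               index=pd.to_datetime(["2023-01-01 00:00", "2023-01-01 01:00"]))
de = pd.Series([38.0, 39.0],
               index=pd.to_datetime(["2023-01-01 01:00", "2023-01-01 02:00"]))
prices = {"CH": ch, "DE-LU": de}

# Union of all timestamps; a zone with no value at a timestamp gets NaN
all_ts = sorted(set().union(*(s.index for s in prices.values())))
df = pd.DataFrame(index=all_ts)
for zone, s in prices.items():
    df[zone] = s

# Toy auction rows: date plus a "start-end" product hour (assumed format)
auction = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-01"],
    "productHour": ["00:00-01:00", "01:00-02:00"],
    "auctionPrice": [1.5, 2.0],
})

# Use the start of the product hour as the timestamp of each auction row
auction["datetime"] = pd.to_datetime(
    auction["date"] + " " + auction["productHour"].str.split("-").str[0]
)
auction = auction.drop_duplicates(subset=["datetime"]).set_index("datetime")

# Column assignment aligns on the index, so unmatched timestamps become NaN
df["de-ch_auctionPrice"] = auction["auctionPrice"]
```

Because pandas aligns on the index during column assignment, no explicit join is needed. The trade-off is that timestamps missing from a source silently become `NaN`, so it can be worth checking `df.isna().sum()` after merging.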

In [None]:
# Define a range of years from start_year to end_year (inclusive)
years = range(start_year, end_year + 1)

# Dictionary to store price series for each zone
price_series_dict = {}

# Load price series for each bidding zone
for zone in bidding_zones:
    price_series_dict[zone] = load_price_series(zone, years)

# Get unified set of timestamps from all zones
all_timestamps = sorted(set().union(*(
    series.index for series in price_series_dict.values()
)))

# Create and populate final DataFrame
df_final = pd.DataFrame(index=all_timestamps)
for zone, series in price_series_dict.items():
    df_final[zone] = series

# Loop through each corridor and add auction data to the final DataFrame
for corridor in corridors:
    # Load auction data for the specific corridor
    auction_data = pd.read_csv(f"../data/external/jao_{corridor}.csv")
    # Create a datetime column by combining the date and the start of the productHour range
    auction_data['datetime'] = pd.to_datetime(
        auction_data['date'] + ' ' + auction_data['productHour'].str.split('-').str[0]
    )
    # Drop duplicate rows based on the datetime column
    auction_data = auction_data.drop_duplicates(subset=['datetime'])
    # Set the datetime column as the index
    auction_data = auction_data.set_index('datetime')

    # Add auction-related columns to the final DataFrame
    df_final[f"{corridor}_auctionPrice"] = auction_data[f"{corridor}_auctionPrice"]
    df_final[f"{corridor}_requestedCapacity"] = auction_data[f"{corridor}_requestedCapacity"]
    df_final[f"{corridor}_offeredCapacity"] = auction_data[f"{corridor}_offeredCapacity"]

# Sort the final DataFrame by index (datetime)
df_final.sort_index(inplace=True)

# Create the output folder if it does not already exist
processed_folder = '../data/processed'
os.makedirs(processed_folder, exist_ok=True)

# Save the final merged DataFrame to a CSV file for further analysis
df_final.to_csv(os.path.join(processed_folder, 'merged_data.csv'))

# Display the final DataFrame
df_final