## Preprocessing of radiation data
In this Jupyter notebook we preprocess the data on global radiation, which is used as a descriptive feature. Two datasets form the basis for this feature:
- The "yearly average global radiation on an optimally inclined surface [$\frac{W}{m^2}$]", part of the "SARAH Solar Radiation Data" provided by the Joint Research Centre of the European Commission (Link: https://joint-research-centre.ec.europa.eu/pvgis-photovoltaic-geographical-information-system/pvgis-data-download/sarah-solar-radiation-data_en).
- A registry of all municipalities (GV-ISys) in 2019 provided by the Federal Statistical Office and the Statistical Regional Offices (Link: https://www.destatis.de/DE/Themen/Laender-Regionen/Regionales/Gemeindeverzeichnis/Administrativ/Archiv/GVAuszugJ/31122019_Auszug_GV.html).

The ASCII grid file provides the global radiation at a resolution of 0.05° for a large area that includes Germany. Longitude and latitude coordinates characterize (the corners of) the grid cells.
The registry includes the longitude and latitude of each municipality. We use these to look up the corresponding global radiation value for every municipality. To obtain global radiation at the level of municipality associations, we take the average of the global radiation over all municipalities belonging to a municipality association.

In [1]:
import os
import pandas as pd
import numpy as np

os.chdir("../../..")
from xai_green_tech_adoption.utils.utils import *

pd.options.mode.chained_assignment = None

### 1. Read registry of municipalities and deduce AGS and RS

In [2]:
# import municipality registry
df_m_registry_raw = pd.read_csv(
    "data/raw_data/descriptive_features/Data_Radiation/Verzeichnis_selbstaendige_Gemeinden_2019.csv",
    sep=";",
    skiprows=3,
    usecols=["Land", "RB", "Kreis", "VB", "Gem", "Längengrad", "Breitengrad"],
    dtype={"Land": "str", "RB": "str", "Kreis": "str", "VB": "str", "Gem": "str"},
    decimal=",",
)
display(df_m_registry_raw.head(10))
df_m_registry_raw.rename(
    {
        "Land": col_state_code,
        "RB": col_nuts2_code,
        "Kreis": col_county_code,
        "VB": col_ma_code,
        "Gem": col_m_code,
        "Längengrad": col_longi,
        "Breitengrad": col_lati,
    },
    inplace=True,
    axis=1,
)

Unnamed: 0,Land,RB,Kreis,VB,Gem,Längengrad,Breitengrad
0,Gebietsstand am 31.12.2019 (Jahr),,,,,Zuordnungsstand am 31.12.2019,
1,,,,,,,
2,01,,,,,,
3,01,0.0,1.0,,,,
4,01,0.0,1.0,0.0,,,
5,01,0.0,1.0,0.0,0.0,943751,54.78252
6,01,0.0,2.0,,,,
7,01,0.0,2.0,0.0,,,
8,01,0.0,2.0,0.0,0.0,1013727,54.321775
9,01,0.0,3.0,,,,


In [3]:
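# drop the two leading rows, which hold file notes rather than registry entries
df_m_registry_raw = df_m_registry_raw[2:]
# keep only municipality rows; rows of higher administrative levels have no municipality code
df_m_registry_raw = df_m_registry_raw[df_m_registry_raw[col_m_code].notna()]
# longitude is still stored as text with a decimal comma: switch to a decimal point and cast to float
df_m_registry_raw[col_longi] = df_m_registry_raw[col_longi].replace(
    {",": "."}, regex=True
)
df_m_registry_raw[col_longi] = df_m_registry_raw[col_longi].astype(float)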
# derive ags of municipalities
df_m_registry_raw[col_id_m] = (
    df_m_registry_raw[col_state_code]
    + df_m_registry_raw[col_nuts2_code]
    + df_m_registry_raw[col_county_code]
    + df_m_registry_raw[col_m_code]
)
df_m_registry_raw[col_id_m] = df_m_registry_raw[col_id_m].astype(int)
# derive rs of ma the m belongs to
df_m_registry_raw[col_id_ma] = (
    df_m_registry_raw[col_state_code]
    + df_m_registry_raw[col_nuts2_code]
    + df_m_registry_raw[col_county_code]
    + df_m_registry_raw[col_ma_code]
)
df_m_registry_raw[col_id_ma] = df_m_registry_raw[col_id_ma].astype(int)
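The key construction above can be tried out on a toy table. The following is a minimal sketch: the column names (`state`, `nuts2`, `county`, `ma`, `m`) are placeholders for the `col_*` constants imported from `utils`, and the two rows are illustrative code combinations, not rows read from the registry file.

```python
import pandas as pd

# Toy registry rows; all codes are kept as zero-padded strings.
# Column names are placeholders, not the col_* constants of the notebook.
toy = pd.DataFrame(
    {
        "state": ["01", "09"],
        "nuts2": ["0", "1"],
        "county": ["01", "62"],
        "ma": ["0000", "0000"],
        "m": ["000", "000"],
    }
)

# Municipality key: state + district (RB) + county + municipality
toy["id_m_str"] = toy["state"] + toy["nuts2"] + toy["county"] + toy["m"]
# Municipality association key: state + district (RB) + county + association
toy["id_ma_str"] = toy["state"] + toy["nuts2"] + toy["county"] + toy["ma"]

# Casting to int drops the leading zero of state codes 01 to 09
toy["id_m"] = toy["id_m_str"].astype(int)
toy["id_ma"] = toy["id_ma_str"].astype(int)

print(toy[["id_m_str", "id_m", "id_ma_str", "id_ma"]])
```

Because the integer cast removes leading zeros, any dataset joined on these keys has to store them as integers as well, or the string form has to be kept.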

In [4]:
# Check whether all ma's contained in the INKAR dataset are also considered in computations based on registry
df_inkar_ma_id = pd.read_csv(
    "data/intermediate_data/preprocessed_inkar_data.csv", sep=";", usecols=[col_id_ma]
)
missing_ma_ids = set(df_inkar_ma_id[col_id_ma]) - set(df_m_registry_raw[col_id_ma])
assert (
    not missing_ma_ids
), "Attention: Irradiation data is missing for some municipality associations contained in the INKAR dataset."
# only keep rows if INKAR contains ma
print(
    f"There are {df_m_registry_raw[~df_m_registry_raw[col_id_ma].isin(list(df_inkar_ma_id[col_id_ma]))].shape[0]} municipalities which belong to municipality associations (according to their RS) that are not covered by INKAR. Since the global radiation is only approximated for municipality associations covered by INKAR, these municipalities are dropped."
)
df_m_registry = df_m_registry_raw[
    df_m_registry_raw[col_id_ma].isin(list(df_inkar_ma_id[col_id_ma]))
]

There are 215 municipalities which belong to municipality associations (according to their RS) that are not covered by INKAR. Since the global radiation is only approximated for municipality associations covered by INKAR, these municipalities are dropped.


### 2. Read and match global irradiance

In [5]:
# import grid-data on global irradiance
radiation_grid_raw = np.loadtxt(
    "data/raw_data/descriptive_features/Data_Radiation/gh_opt_year.asc", skiprows=6
)

# meta data of dataset

# range of latitude: 40°S, 62°30' N
# positive values -> N, negative values -> S
latitude_south_end = -40

# range of longitude: 65°W, 128°E
# positive values -> East, negative values -> West
longitude_west_end = -65

# cell grid size: 3', corresponds to 0.05°
cell_grid_size = 0.05

# coordinates in Germany are north and east -> positive values
# example: Munich: 48°N, 11°E
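The grid origin and cell size are hard-coded above. As an alternative, they could be read from the file header. The sketch below assumes the file starts with the usual six `key value` header lines of an Esri ASCII grid (`ncols`, `nrows`, `xllcorner`, `yllcorner`, `cellsize`, `NODATA_value`), which is consistent with `skiprows=6` but has not been checked against `gh_opt_year.asc`. The helper `read_asc_header` and the in-memory toy file are hypothetical.

```python
import io
import numpy as np

# Tiny stand-in for an Esri ASCII grid file (the data values are made up).
asc_text = """ncols 3
nrows 2
xllcorner -65.0
yllcorner -40.0
cellsize 0.05
NODATA_value -9999
1.0 2.0 3.0
4.0 -9999 6.0
"""


def read_asc_header(lines, n_header=6):
    """Parse the 'key value' header lines into a dict of floats (hypothetical helper)."""
    header = {}
    for line in lines[:n_header]:
        key, value = line.split()
        header[key.lower()] = float(value)
    return header


header = read_asc_header(asc_text.splitlines())

# Grid origin and resolution taken from the header instead of being hard-coded
longitude_west_end = header["xllcorner"]
latitude_south_end = header["yllcorner"]
cell_grid_size = header["cellsize"]

# Load the data rows and replace the NODATA marker with NaN
grid = np.loadtxt(io.StringIO(asc_text), skiprows=6)
grid = np.where(grid == header["nodata_value"], np.nan, grid)
print(header)
print(grid)
```

For the real file, the lines would come from `open(path).readlines()` rather than from a string.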

In [6]:
# NaNs are encoded as -9999
radiation_grid_raw = np.where(radiation_grid_raw == -9999, np.nan, radiation_grid_raw)

In [7]:
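def determine_radiation(m):
    # The ceil of the fraction yields the 1-based position of the grid cell that contains the coordinate.
    # Columns are counted from the west edge; subtracting one converts to a zero-based index.
    longi_col = int(np.ceil((m[col_longi] - longitude_west_end) / cell_grid_size) - 1)
    # Rows are stored from north to south, so the position counted from the south edge is flipped.
    # shape[0] - position is already a zero-based index, so no further subtraction is needed.
    lati_pos_from_south = int(np.ceil((m[col_lati] - latitude_south_end) / cell_grid_size))
    lati_row = radiation_grid_raw.shape[0] - lati_pos_from_south
    return radiation_grid_raw[lati_row, longi_col]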
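The mapping from coordinates to array indices is easiest to verify on a small grid. This is a sketch under two assumptions: row 0 of the array is the northernmost band (as in an ASCII grid read top to bottom), and the west and south bounds denote cell corners. The bounds, cell size and the function `lookup` are made up for the illustration.

```python
import numpy as np

# Toy grid: 3 rows x 4 columns; row 0 is assumed to be the northernmost band.
toy_grid = np.arange(12, dtype=float).reshape(3, 4)
west_end, south_end, cell = 0.0, 0.0, 1.0  # hypothetical bounds and cell size


def lookup(grid, longi, lati):
    # 1-based cell positions counted from the west and south edges
    col_pos_from_west = int(np.ceil((longi - west_end) / cell))
    row_pos_from_south = int(np.ceil((lati - south_end) / cell))
    # columns: subtract one for zero-based indexing
    col = col_pos_from_west - 1
    # rows: flipping from south-based to north-based counting
    # already yields a zero-based index
    row = grid.shape[0] - row_pos_from_south
    return grid[row, col]


south_west = lookup(toy_grid, 0.5, 0.5)  # bottom-left cell of the grid
north_east = lookup(toy_grid, 3.5, 2.5)  # top-right cell of the grid
print(south_west, north_east)
```

A point in the south-west corner cell resolves to the last row and first column, and a point in the north-east corner cell to the first row and last column.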

In [8]:
df_irradiance_m = df_m_registry.copy()
# get values of global radiation for all municipalities
df_irradiance_m[col_radiation] = df_irradiance_m.apply(determine_radiation, axis=1)
assert (
    df_irradiance_m[df_irradiance_m[col_radiation].isna()].shape[0] == 0
), "Attention: Could not find global irradiance values for all municipalities."

In [9]:
# Take mean of global radiation over all municipalities of an ma to derive radiation values for municipality associations
df_irradiance_ma = pd.DataFrame(
    df_irradiance_m[[col_id_ma, col_radiation]]
    .groupby(by=col_id_ma, as_index=False)
    .mean()
)
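The aggregation is an unweighted mean, so every municipality counts equally regardless of its area or population. A toy example of the same step follows, with the number of municipalities per association added as a sanity check. The column names `id_ma` and `radiation` are placeholders for the notebook's constants and the values are made up.

```python
import pandas as pd

# Three toy municipalities in two municipality associations (values are made up)
toy_m = pd.DataFrame(
    {
        "id_ma": [10, 10, 20],
        "radiation": [140.0, 150.0, 160.0],
    }
)

# Mean radiation per association plus the number of municipalities it is based on
toy_ma = toy_m.groupby("id_ma", as_index=False).agg(
    radiation=("radiation", "mean"),
    n_municipalities=("radiation", "size"),
)
print(toy_ma)
```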

In [11]:
df_irradiance_ma.to_csv(
    "data/intermediate_data/ma_global_radiation.csv", index=False, sep=";"
)