## <u>2. Data Preparation: Bureau</u>

In this document, variables that are irrelevant to the analysis are removed from the bureau dataset. The categorical variables (nominal and ordinal) are examined first, followed by the metric variables. Unlike the application data, the bureau dataset has a 1:N relationship, because a borrower may have had several loans in the past. The historical data must therefore be grouped per borrower.
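To illustrate why the 1:N relationship calls for grouping, here is a minimal sketch on invented toy data. The frame `toy_bureau` and the output names `n_loans` and `total_credit` are assumptions for illustration only and do not appear in the notebook.

```python
import pandas as pd

# Toy stand-in for bureau.csv: several historical loans per borrower (1:N).
toy_bureau = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 1, 2],
    "SK_ID_BUREAU": [10, 11, 12, 20],
    "AMT_CREDIT_SUM": [100.0, 200.0, 300.0, 50.0],
})

# Collapse to one row per borrower so the result can be joined 1:1
# to the application data via SK_ID_CURR.
per_borrower = toy_bureau.groupby("SK_ID_CURR").agg(
    n_loans=("SK_ID_BUREAU", "count"),
    total_credit=("AMT_CREDIT_SUM", "sum"),
)
print(per_borrower)
```

After the aggregation each `SK_ID_CURR` occurs exactly once in the index, which is the shape the later merges rely on.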

*Procedure for categorical variables:*
- Grouping of the variables
- Removal of variables with more than 60% missing data
- Removal of nominal variables with less than 5 percentage points (pp) difference in relative share between paybacks and defaults
- Formation of correlation clusters (contingency coefficient for nominal data)
- Removal of variables without a causal influence on the creditworthiness assessment
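The contingency coefficient named in the list is not computed anywhere in this notebook. As a hedged sketch, Pearson's contingency coefficient $C = \sqrt{\chi^2 / (\chi^2 + n)}$ could be obtained as follows; the helper name `contingency_coefficient` and the toy series are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def contingency_coefficient(x: pd.Series, y: pd.Series) -> float:
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    table = pd.crosstab(x, y)
    # correction=False disables the Yates correction for 2x2 tables.
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.to_numpy().sum()
    return float(np.sqrt(chi2 / (chi2 + n)))

# Toy data: y_same mirrors x exactly, y_indep is unrelated to x.
x = pd.Series(["a", "a", "b", "b"] * 25)
y_same = x.copy()
y_indep = pd.Series(["c", "d", "c", "d"] * 25)

c_same = contingency_coefficient(x, y_same)
c_indep = contingency_coefficient(x, y_indep)
print(c_same, c_indep)
```

Note that C is bounded below 1 (for a 2x2 table its maximum is $\sqrt{1/2}$), so a fixed threshold is harder to interpret than with the Pearson correlation coefficient.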

*Procedure for metric variables:*
- Grouping of the variables
- Removal of variables with more than 60% missing data
- Formation of correlation clusters (Pearson correlation coefficient)
- Removal of variables without a causal influence on the creditworthiness assessment

## Initialization

In [1]:
from pathlib import Path
from scipy import stats

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

np.set_printoptions(suppress=True)

pd.options.display.max_columns = None
pd.options.display.max_rows = None
pd.options.display.max_colwidth = None

from sklearn.linear_model import LogisticRegression

from IPython.display import display, Markdown

In [2]:
path1 = Path(r"A:\Workspace\Python\Masterarbeit\Kaggle Home Credit Datensatz")
path2 = Path(r"C:\Users\rober\Documents\Workspace\Python\Masterarbeit\Kaggle Home Credit Datensatz")

if path1.is_dir():
    DATASET_DIR = path1
else:
    DATASET_DIR = path2

In [3]:
app_train = pd.read_csv(DATASET_DIR / "application_train.csv")
bureau = pd.read_csv(DATASET_DIR / "bureau.csv")
description = pd.read_csv(DATASET_DIR / "HomeCredit_columns_description.csv", encoding="latin", index_col=0)

In [4]:
des = description.loc[description['Table']=="bureau.csv", "Row":"Special"]

In [5]:
bureau = pd.merge(bureau, app_train[["SK_ID_CURR","TARGET"]] ,on="SK_ID_CURR")

In [6]:
bureau.head()

Unnamed: 0,SK_ID_CURR,SK_ID_BUREAU,CREDIT_ACTIVE,CREDIT_CURRENCY,DAYS_CREDIT,CREDIT_DAY_OVERDUE,DAYS_CREDIT_ENDDATE,DAYS_ENDDATE_FACT,AMT_CREDIT_MAX_OVERDUE,CNT_CREDIT_PROLONG,AMT_CREDIT_SUM,AMT_CREDIT_SUM_DEBT,AMT_CREDIT_SUM_LIMIT,AMT_CREDIT_SUM_OVERDUE,CREDIT_TYPE,DAYS_CREDIT_UPDATE,AMT_ANNUITY,TARGET
0,215354,5714462,Closed,currency 1,-497,0,-153.0,-153.0,,0,91323.0,0.0,,0.0,Consumer credit,-131,,0
1,215354,5714463,Active,currency 1,-208,0,1075.0,,,0,225000.0,171342.0,,0.0,Credit card,-20,,0
2,215354,5714464,Active,currency 1,-203,0,528.0,,,0,464323.5,,,0.0,Consumer credit,-16,,0
3,215354,5714465,Active,currency 1,-203,0,,,,0,90000.0,,,0.0,Credit card,-16,,0
4,215354,5714466,Active,currency 1,-629,0,1197.0,,77674.5,0,2700000.0,,,0.0,Consumer credit,-21,,0


In [7]:
# Columns that must not be modified during preparation
skip = ["TARGET", "SK_ID_CURR", "SK_ID_BUREAU"]

In [8]:
# Nominal and metric columns
n_heads = [element for element in bureau.columns if bureau[element].dtype.name == "object"]
m_heads = [element for element in bureau.columns if bureau[element].dtype.name != "object"]

## <u>Categorical variables</u>

In [9]:
df = bureau[["SK_ID_BUREAU", "SK_ID_CURR", "TARGET"] + n_heads].copy()

In [10]:
des[des["Row"] == "FLAG_EMP_PHONE"]

Unnamed: 0,Row,Description,Special


In [11]:
df.head()

Unnamed: 0,SK_ID_BUREAU,SK_ID_CURR,TARGET,CREDIT_ACTIVE,CREDIT_CURRENCY,CREDIT_TYPE
0,5714462,215354,0,Closed,currency 1,Consumer credit
1,5714463,215354,0,Active,currency 1,Credit card
2,5714464,215354,0,Active,currency 1,Consumer credit
3,5714465,215354,0,Active,currency 1,Credit card
4,5714466,215354,0,Active,currency 1,Consumer credit


# Information content:
- Number of loans per borrower
- Credit status of the loans
- Credit type

In [12]:
# Number of loans

cnt = df[["SK_ID_CURR", "SK_ID_BUREAU"]].groupby(by=["SK_ID_CURR"]).count()
cnt.columns = ["CNT_BURAEU"]
cnt.head()

SK_ID_CURR,CNT_BURAEU
100002,8
100003,4
100004,2
100007,1
100008,3


Borrower 100002 has taken out 8 loans with external lenders over the course of their credit history.

In [13]:
# Credit status

status = df[["SK_ID_CURR", "CREDIT_ACTIVE"]].groupby(by=["SK_ID_CURR", "CREDIT_ACTIVE"]).size().unstack(fill_value=0)
status.head()

SK_ID_CURR,Active,Bad debt,Closed,Sold
100002,2,0,6,0
100003,1,0,3,0
100004,0,0,2,0
100007,0,0,1,0
100008,1,0,2,0


At the time the loan in the application dataset was granted, borrower 100002 has 2 active loans with external lenders.
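The count table above is built with `groupby(...).size().unstack(fill_value=0)`. As a sketch on invented toy data, `pd.crosstab` yields the same per-borrower counts in one call; the frame `toy` and the variable names are assumptions for illustration.

```python
import pandas as pd

# Toy data: borrower 1 has three loans, borrower 2 has two.
toy = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 1, 2, 2],
    "CREDIT_ACTIVE": ["Active", "Closed", "Closed", "Closed", "Sold"],
})

# Variant used in the notebook: count rows per (borrower, status) pair
# and pivot the status level into columns.
via_groupby = (
    toy.groupby(["SK_ID_CURR", "CREDIT_ACTIVE"]).size().unstack(fill_value=0)
)

# Equivalent frequency table in a single call.
via_crosstab = pd.crosstab(toy["SK_ID_CURR"], toy["CREDIT_ACTIVE"])
print(via_crosstab)
```

Both variants fill status values that a borrower never had with 0, so a 0 means "no loan with this status" rather than missing data.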

In [14]:
# Credit type

typ = df[["SK_ID_CURR", "CREDIT_TYPE"]].groupby(by=["SK_ID_CURR", "CREDIT_TYPE"]).size().unstack(fill_value=0)
typ.head()

SK_ID_CURR,Another type of loan,Car loan,Cash loan (non-earmarked),Consumer credit,Credit card,Interbank credit,Loan for business development,Loan for purchase of shares (margin lending),Loan for the purchase of equipment,Loan for working capital replenishment,Microloan,Mobile operator loan,Mortgage,Real estate loan,Unknown type of loan
100002,0,0,0,4,4,0,0,0,0,0,0,0,0,0,0
100003,0,0,0,2,2,0,0,0,0,0,0,0,0,0,0
100004,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0
100007,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
100008,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0


Over the course of their credit history, borrower 100002 has taken out 4 consumer credits and 4 credit cards.

In [15]:
result = pd.DataFrame(index=bureau.SK_ID_CURR.unique())
result.index.name = "SK_ID_CURR"

In [16]:
result = pd.merge(result, cnt, how="left", left_index=True, right_index=True)
result = pd.merge(result, status, how="left", left_index=True, right_index=True)
result = pd.merge(result, typ, how="left", left_index=True, right_index=True)

In [17]:
df = result
df.head()

SK_ID_CURR,CNT_BURAEU,Active,Bad debt,Closed,Sold,Another type of loan,Car loan,Cash loan (non-earmarked),Consumer credit,Credit card,Interbank credit,Loan for business development,Loan for purchase of shares (margin lending),Loan for the purchase of equipment,Loan for working capital replenishment,Microloan,Mobile operator loan,Mortgage,Real estate loan,Unknown type of loan
215354,11,6,0,5,0,0,1,0,7,3,0,0,0,0,0,0,0,0,0,0
162297,6,3,0,3,0,0,0,0,3,2,0,0,0,0,0,0,0,1,0,0
402440,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
238881,8,3,0,5,0,0,0,0,5,3,0,0,0,0,0,0,0,0,0,0
222183,8,5,0,3,0,0,1,0,4,3,0,0,0,0,0,0,0,0,0,0


In [18]:
target = app_train[["SK_ID_CURR", "TARGET"]]
target = target.set_index("SK_ID_CURR")

In [19]:
df = pd.merge(df, target, left_index=True, right_index=True)

### Removal of columns with less than 40% populated data

In [20]:
result = {
          "header":[],
          "rate":[],
          "des":[]
         }
for key in df.keys():
    if key in skip:
        continue
    rate = df[key].isna().sum() / len(df[key]) * 100
    if rate > 60:
        result["header"].append(key)
        result["rate"].append(rate)
        result["des"].append(des[des["Row"] == key]["Description"])

result = pd.DataFrame(result)
result

Unnamed: 0,header,rate,des


No categorical variable has more than 60% missing data.
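The missing-data check above loops over the columns one by one. A compact sketch of the same 60% rule, assuming an invented toy frame `toy` with the hypothetical columns `mostly_missing` and `complete`, could look like this:

```python
import numpy as np
import pandas as pd

# Toy frame: one column is 75% missing, the other is fully populated.
toy = pd.DataFrame({
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],
    "complete": [1, 2, 3, 4],
})

# isna() gives a boolean frame; the column-wise mean is the missing share.
missing_rate = toy.isna().mean() * 100

# Columns above the 60% threshold are candidates for removal.
to_drop = missing_rate[missing_rate > 60].index.tolist()
reduced = toy.drop(columns=to_drop)
print(missing_rate)
```

Because the aggregated count columns are filled with 0 by `unstack(fill_value=0)`, it is plausible that this check finds nothing to remove for the categorical variables.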

In [21]:
df = df.drop(result.header.values, axis=1)

### Categories that differ by at least 5 percentage points

In [22]:
ID_Payback = df[df["TARGET"] == 0].index.values
ID_Default = df[df["TARGET"] == 1].index.values

In [23]:
payback = df.loc[ID_Payback]
default = df.loc[ID_Default]

In [24]:
result = {
    "head" : [],
    "cat" : [],
    "payback" : [],
    "default" : [],
    "diff" : []
}

for head in df.columns.values:
    df1 = payback[head].value_counts().rename_axis(head).reset_index(name='payback')
    df2 = default[head].value_counts().rename_axis(head).reset_index(name='default')
    
    df1["payback"] = df1["payback"]/df1["payback"].sum()*100
    df2["default"] = df2["default"]/df2["default"].sum()*100
    
    df_ = df1.merge(df2, how="outer", on=head)
    
    df_["diff"] = (df_["default"]-df_["payback"])
    
    df_ = df_.sort_values("diff", ascending=False)
    
    for diff in df_["diff"]:
        if np.isnan(diff):
            continue
        if diff > 5 or diff < -5:
            row = df_.loc[df_["diff"] == diff]
            cat = row[head][row[head].index[0]]
            
            result["head"].append(head)
            result["cat"].append(cat)
            result["payback"].append(round(row["payback"].values[0],2))
            result["default"].append(round(row["default"].values[0],2))
            result["diff"].append(round(diff,2))

result = pd.DataFrame(result)
result.sort_values("diff", ascending=False)

Unnamed: 0,head,cat,payback,default,diff
1,Closed,0,12.19,18.14,5.95
0,Active,0,17.98,12.87,-5.11


Two distinguishing features emerge from the nominal variables. Defaults are more often borrowers without a closed loan in their history with external lenders (18.14% vs. 12.19%). In addition, they more often have one or more external loans that they are repaying in parallel when the new loan starts.
Paybacks, by contrast, more often have no active loan with external lenders that they are repaying in parallel (17.98% vs. 12.87%). They also more often have a history of closed loans with external lenders.
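The 5 pp criterion compares, per category, the relative share among defaults with the share among paybacks. The following is a minimal sketch on invented toy data, not the notebook's implementation; the frame `toy` and its values are assumptions for illustration.

```python
import pandas as pd

# Toy data: 10 paybacks (TARGET 0) and 10 defaults (TARGET 1).
# "Closed" holds the number of closed loans per borrower.
toy = pd.DataFrame({
    "TARGET": [0] * 10 + [1] * 10,
    "Closed": [0] * 1 + [1] * 9 + [0] * 3 + [1] * 7,
})

# normalize="columns" turns counts into shares within each TARGET group.
shares = pd.crosstab(toy["Closed"], toy["TARGET"], normalize="columns") * 100
shares.columns = ["payback", "default"]

# Difference in percentage points; keep categories beyond +/- 5 pp.
shares["diff"] = shares["default"] - shares["payback"]
flagged = shares[shares["diff"].abs() > 5]
print(flagged)
```

Here 30% of the toy defaults but only 10% of the toy paybacks have no closed loan, a difference of 20 pp. Because the table is indexed by category, each flagged row is identified directly and not looked up again by its difference value.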

In [25]:
remove = [head for head in df.columns.values if head not in list(result["head"].unique()) + skip]

In [26]:
remove

['CNT_BURAEU',
 'Bad debt',
 'Sold',
 'Another type of loan',
 'Car loan',
 'Cash loan (non-earmarked)',
 'Consumer credit',
 'Credit card',
 'Interbank credit',
 'Loan for business development',
 'Loan for purchase of shares (margin lending)',
 'Loan for the purchase of equipment',
 'Loan for working capital replenishment',
 'Microloan',
 'Mobile operator loan',
 'Mortgage',
 'Real estate loan',
 'Unknown type of loan']

In [27]:
df = df.drop(remove, axis=1)

In [28]:
df = df.drop(["TARGET"], axis=1)

In [29]:
df = df.add_prefix("B_")

In [30]:
df.head()

SK_ID_CURR,B_Active,B_Closed
215354,6,5
162297,3,3
402440,1,0
238881,3,5
222183,5,3


In [31]:
des[des["Row"] == "CREDIT_ACTIVE"]

Unnamed: 0,Row,Description,Special
127,CREDIT_ACTIVE,Status of the Credit Bureau (CB) reported credits,


### Storing the categorical values

In [32]:
cats = df

## <u>Metric variables</u>

In [33]:
df = bureau[m_heads].copy()

In [34]:
df[df["SK_ID_CURR"] == 298038].head()

Unnamed: 0,SK_ID_CURR,SK_ID_BUREAU,DAYS_CREDIT,CREDIT_DAY_OVERDUE,DAYS_CREDIT_ENDDATE,DAYS_ENDDATE_FACT,AMT_CREDIT_MAX_OVERDUE,CNT_CREDIT_PROLONG,AMT_CREDIT_SUM,AMT_CREDIT_SUM_DEBT,AMT_CREDIT_SUM_LIMIT,AMT_CREDIT_SUM_OVERDUE,DAYS_CREDIT_UPDATE,AMT_ANNUITY,TARGET
958,298038,5715506,-573,0,-24.0,-406.0,,0,540000.0,0.0,0.0,0.0,-404,,0
959,298038,5715507,-291,0,785.0,,,0,441000.0,415494.0,0.0,0.0,-19,,0
960,298038,5715508,-320,0,45.0,,,0,675000.0,197154.0,,0.0,-39,26550.0,0
961,298038,5715509,-760,0,4778.0,-451.0,0.0,0,3600000.0,,,0.0,-451,67995.0,0
962,298038,5715510,-586,0,-339.0,-553.0,0.0,0,40791.33,0.0,0.0,0.0,-553,67995.0,0


In [35]:
df["DAYS_CREDIT_ENDDATE"].isna().sum()

89098

# Information content:
(Only loans whose scheduled end date lies at most half a year, i.e. 180 days, in the past are considered. Loans without a DAYS_CREDIT_ENDDATE are dropped by the same filter.)
- Sums: CREDIT_DAY_OVERDUE (days overdue), AMT_CREDIT_SUM (credit amount), AMT_CREDIT_SUM_DEBT (outstanding debt), AMT_CREDIT_SUM_OVERDUE (amount overdue), AMT_ANNUITY (annuity payment)
- Mean: DAYS_CREDIT_ENDDATE (remaining term)
- Listed but not computed in the cells below: DEBT_PER_LIMIT (debt ratio)
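The filter and the per-borrower aggregation can be sketched on invented toy data. This version uses named aggregation instead of the loop-and-merge pattern, so it is an alternative formulation under stated assumptions, not the notebook's code; `toy`, `recent` and `agg` are illustrative names.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 1, 2],
    "DAYS_CREDIT_ENDDATE": [-400.0, 100.0, 300.0, np.nan],
    "AMT_CREDIT_SUM": [1000.0, 2000.0, np.nan, 500.0],
})

# Comparisons with NaN evaluate to False, so loans without an end date
# are removed together with loans that ended more than 180 days ago.
recent = toy[toy["DAYS_CREDIT_ENDDATE"] > -180]

# Mean of the remaining term, sum of the credit amounts (NaN is skipped).
agg = recent.groupby("SK_ID_CURR").agg(
    DAYS_CREDIT_ENDDATE=("DAYS_CREDIT_ENDDATE", "mean"),
    AMT_CREDIT_SUM=("AMT_CREDIT_SUM", "sum"),
)

# Re-align to all borrowers; those without a remaining loan become NaN.
agg = agg.reindex(toy["SK_ID_CURR"].unique())
print(agg)
```

Borrower 2 loses the only loan to the filter and therefore ends up with NaN in every metric column, which is the kind of row that also appears in the notebook's final table.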

In [36]:
df = df[df["DAYS_CREDIT_ENDDATE"] > -180]

In [37]:
result = pd.DataFrame(index=bureau.SK_ID_CURR.unique())
result.index.name = "SK_ID_CURR"

In [38]:
# Means
mean_heads = ["DAYS_CREDIT_ENDDATE"]

for head in mean_heads:
    A = df[["SK_ID_CURR", head]]
    A = A.groupby(by=["SK_ID_CURR"]).mean()
    result = pd.merge(result, A, how="left", left_index=True, right_index=True)

In [39]:
# Sums
sum_heads = ["CREDIT_DAY_OVERDUE", "AMT_CREDIT_SUM", "AMT_CREDIT_SUM_DEBT", "AMT_CREDIT_SUM_OVERDUE", "AMT_ANNUITY"]

for head in sum_heads:
    A = df[["SK_ID_CURR", head]]
    A = A.fillna(0)
    A = A.groupby(by=["SK_ID_CURR"]).sum()
    result = pd.merge(result, A, how="left", left_index=True, right_index=True)

In [40]:
df = result
df.head()

SK_ID_CURR,DAYS_CREDIT_ENDDATE,CREDIT_DAY_OVERDUE,AMT_CREDIT_SUM,AMT_CREDIT_SUM_DEBT,AMT_CREDIT_SUM_OVERDUE,AMT_ANNUITY
215354,5031.0,0.0,3702750.3,284463.18,0.0,0.0
162297,5261.0,0.0,7033500.0,0.0,0.0,0.0
402440,269.0,0.0,89910.0,76905.0,0.0,0.0
238881,821.5,0.0,174684.06,8131.5,0.0,0.0
222183,929.5,0.0,5862393.0,1185081.84,0.0,0.0


### Removal of columns with less than 40% populated data

In [41]:
result = {
          "header":[],
          "rate":[],
          "des":[]
         }
for key in df.keys():
    if key in skip:
        continue
    rate = df[key].isna().sum() / len(df[key]) * 100
    if rate > 60:
        result["header"].append(key)
        result["rate"].append(rate)
        result["des"].append(des[des["Row"] == key]["Description"])

result = pd.DataFrame(result)
result

Unnamed: 0,header,rate,des


In [42]:
df = df.drop(result.header.values, axis=1)

No variable has more than 60% missing data.

### Formation of correlation clusters

In [43]:
c = df.corr(method='pearson') * 100

In [44]:
c

Unnamed: 0,DAYS_CREDIT_ENDDATE,CREDIT_DAY_OVERDUE,AMT_CREDIT_SUM,AMT_CREDIT_SUM_DEBT,AMT_CREDIT_SUM_OVERDUE,AMT_ANNUITY
DAYS_CREDIT_ENDDATE,100.0,0.362271,0.08388,1.155067,0.349455,-0.069126
CREDIT_DAY_OVERDUE,0.362271,100.0,0.538082,0.430028,23.735029,-0.040109
AMT_CREDIT_SUM,0.08388,0.538082,100.0,69.713015,1.380309,8.592225
AMT_CREDIT_SUM_DEBT,1.155067,0.430028,69.713015,100.0,1.415717,4.300374
AMT_CREDIT_SUM_OVERDUE,0.349455,23.735029,1.380309,1.415717,100.0,0.080189
AMT_ANNUITY,-0.069126,-0.040109,8.592225,4.300374,0.080189,100.0


In [45]:
families = []
for i, row in c.iterrows():
    r = row[row > 80]
    if len(r) > 1 and set(r.index) not in families:
        families.append(set(r.index))

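# Keep only maximal families. A new list is built instead of removing
# elements from the list that is currently being iterated over.
families = [A for A in families if not any(A < B for B in families)]
families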

[]

In [46]:
result = {
          "family":[],
          "head":[],
          "r2":[],
          "na":[],
          "rate":[]
         }

for i, family in enumerate(families):
    headers = list(family)
    
    result["family"].append("")
    result["head"].append("")
    result["r2"].append("")
    result["na"].append("")
    result["rate"].append("")
    
    for head in headers:
        d = df[["TARGET"] + [head]]
        na = d[head].isna().sum() / len(d) * 100
        d = d.dropna()
        x = d[[head]]
        y = d[["TARGET"]]
        model = LogisticRegression().fit(x, y.values.ravel())
        r2 = round(model.score(x,y),5)
        
        result["family"].append(i)
        result["head"].append(head)
        result["r2"].append(round(r2,5))
        result["na"].append(round(na,2))
        result["rate"].append(r2/na)
    
result = pd.DataFrame(result)
result       

Unnamed: 0,family,head,r2,na,rate


In [47]:
df.head()

SK_ID_CURR,DAYS_CREDIT_ENDDATE,CREDIT_DAY_OVERDUE,AMT_CREDIT_SUM,AMT_CREDIT_SUM_DEBT,AMT_CREDIT_SUM_OVERDUE,AMT_ANNUITY
215354,5031.0,0.0,3702750.3,284463.18,0.0,0.0
162297,5261.0,0.0,7033500.0,0.0,0.0,0.0
402440,269.0,0.0,89910.0,76905.0,0.0,0.0
238881,821.5,0.0,174684.06,8131.5,0.0,0.0
222183,929.5,0.0,5862393.0,1185081.84,0.0,0.0


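No correlation clusters form: the highest pairwise correlation (AMT_CREDIT_SUM and AMT_CREDIT_SUM_DEBT, about 69.7%) stays below the 80% threshold.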
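For comparison, a correlation cluster can also be read as a connected component of the graph that links every pair of variables whose absolute Pearson correlation exceeds the threshold. The sketch below follows that reading on synthetic data. It is a swapped-in technique, not the row-wise set logic of the notebook, and the function name `correlation_clusters` is an assumption.

```python
import numpy as np
import pandas as pd

def correlation_clusters(frame: pd.DataFrame, threshold: float = 0.8) -> list:
    """Group columns linked by |Pearson r| > threshold (connected components)."""
    corr = frame.corr(method="pearson").abs()
    unvisited = set(corr.columns)
    clusters = []
    while unvisited:
        start = unvisited.pop()
        cluster, stack = {start}, [start]
        while stack:
            current = stack.pop()
            # Columns still unassigned that correlate strongly with `current`.
            linked = {c for c in unvisited if corr.loc[current, c] > threshold}
            unvisited -= linked
            cluster |= linked
            stack.extend(linked)
        if len(cluster) > 1:  # single variables are not a cluster
            clusters.append(cluster)
    return clusters

# Synthetic data: "b" is a noisy copy of "a", "c" is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})
clusters = correlation_clusters(toy)
print(clusters)
```

Taking the absolute value also catches strongly negative correlations, which a plain `row > 80` comparison would miss.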

### Assessment of causality

In [48]:
result = {
    "head":[],
    "des":[]
}

for head in df.columns.values:
    if head in skip:
        continue
    result["head"].append(head)
    # Store the description text itself, not the one-element Series
    result["des"].append(des.loc[des["Row"] == head, "Description"].iloc[0])
    
result = pd.DataFrame(result)
result

Unnamed: 0,head,des
0,DAYS_CREDIT_ENDDATE,Remaining duration of CB credit (in days) at the time of application in Home Credit
1,CREDIT_DAY_OVERDUE,Number of days past due on CB credit at the time of application for related loan in our sample
2,AMT_CREDIT_SUM,Current credit amount for the Credit Bureau credit
3,AMT_CREDIT_SUM_DEBT,Current debt on Credit Bureau credit
4,AMT_CREDIT_SUM_OVERDUE,Current amount overdue on Credit Bureau credit
5,AMT_ANNUITY,Annuity of the Credit Bureau credit


For none of the variables can it be assumed that it has no effect on creditworthiness, so all of them are retained.

### Result

In [49]:
df = df.add_prefix("B_")

### Storing the metric values

Merging the loan count with the nominal variables

In [50]:
cats = pd.merge(cnt, cats, left_index=True, right_index=True)

Merging the metric and categorical variables

In [51]:
df = pd.merge(cats, df, left_index=True, right_index=True)

In [52]:
df.head()

SK_ID_CURR,CNT_BURAEU,B_Active,B_Closed,B_DAYS_CREDIT_ENDDATE,B_CREDIT_DAY_OVERDUE,B_AMT_CREDIT_SUM,B_AMT_CREDIT_SUM_DEBT,B_AMT_CREDIT_SUM_OVERDUE,B_AMT_ANNUITY
100002,8,2,6,309.0,0.0,638235.0,245781.0,0.0,0.0
100003,4,1,3,1216.0,0.0,810000.0,0.0,0.0,0.0
100004,2,0,2,,,,,,
100007,1,0,1,,,,,,
100008,3,1,2,471.0,0.0,267606.0,240057.0,0.0,0.0


In [53]:
df.to_csv(DATASET_DIR / "2. Datenaufbereitung" / "bureau.csv")