#### Reduce bridge data to relevant columns

This notebook restricts the dataset to the columns used in the subsequent analysis. We keep the following columns:
- `Bauwerksname`
- `Baujahr Überbau`
- `Baujahr Unterbau`
- `Zustandsnote`
- `Baustoff Überbau`
- `Baustoffklasse`
- `Länge (m)`
- `Zugeordneter Sachverhalt vereinfacht`
- `Traglastindex`
- `Teilbauwerksstadium`
- `Teilbauwerksart`
- `Kreis`
- `Bundeslandname`
- `x2` (renamed to `X`)
- `y2` (renamed to `Y`)

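Furthermore, this notebook replaces the decimal comma `,` with a decimal point `.` in the columns `Zustandsnote`, `Länge (m)`, `X` and `Y` and converts them to floats. The columns `Baujahr Überbau` and `Baujahr Unterbau` are cast to nullable integers. Finally, `Traglastindex` is mapped to integers as follows: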
- `I` -> 1
- `II` -> 2
- `III` -> 3
- `IV` -> 4
- `V` -> 5
- anything else (missing values and codes such as `-`, `kZN`, `GR`, `*`, `>GR`) -> 0
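
The two conversions described above can be sketched on a few toy rows. The values below are made up for illustration and are not taken from the bridge dataset; the name `traglast_map` is also just a placeholder. `regex=False` is passed so that the comma is treated as a literal string regardless of the pandas version.

```python
import pandas as pd

# Toy rows (made up for illustration; not taken from the bridge dataset)
toy = pd.DataFrame({
    'Zustandsnote': ['2,3', '1,8', '3,0'],
    'Traglastindex': ['II', 'kZN', None],
})

# German decimal comma -> decimal point, then parse the strings as floats
toy['Zustandsnote'] = (
    toy['Zustandsnote'].str.replace(',', '.', regex=False).astype(float)
)

# Roman numerals -> integers; unmapped codes and missing values become 0
traglast_map = {'I': 1, 'II': 2, 'III': 3, 'IV': 4, 'V': 5}
toy['Traglastindex'] = (
    toy['Traglastindex'].map(traglast_map).fillna(0).astype(int)
)

print(toy)
```

Note that mapping everything else to 0 merges missing values with the non-numeric codes, so the two can no longer be told apart afterwards.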


In [22]:
# load libraries
import pandas as pd

In [55]:
# read original data
data = pd.read_csv('../data/bridge_statistic_germany.csv', sep=';')
#print(data.head())

# select relevant columns
data = data[['Bauwerksname', 'Baujahr Überbau', 'Baujahr Unterbau', 'Baustoff Überbau', 'Baustoffklasse', 'Länge (m)', 
             'Zugeordneter Sachverhalt vereinfacht', 'Zustandsnote', 'Traglastindex', 'Teilbauwerksstadium', 
             'Teilbauwerksart', 'Kreis', 'Bundeslandname', 'x2', 'y2']]

# rename 'x2' and 'y2' columns
data = data.rename(columns={'x2': 'X', 'y2': 'Y'})

#print(data.head())

In [56]:
# columns with missing values
cols_with_missing = [col for col in data.columns if data[col].isnull().any()]
print("Columns with missing values:", cols_with_missing)

# print counts of unique values in Traglastindex
print(data['Traglastindex'].value_counts())

Columns with missing values: ['Baujahr Unterbau', 'Traglastindex', 'Kreis', 'Bundeslandname', 'X', 'Y']
Traglastindex
II     17546
III    12941
I      10996
IV      3521
V       2384
-       2367
kZN     1533
GR       817
*        357
>GR       95
Name: count, dtype: int64


In [57]:
data_modified = data.copy()

# Baujahr Überbau and Baujahr Unterbau -> Integer, no decimals
data_modified['Baujahr Überbau'] = data_modified['Baujahr Überbau'].astype('Int64')
data_modified['Baujahr Unterbau'] = data_modified['Baujahr Unterbau'].astype('Int64')

# Länge (m), Zustandsnote, X, Y -> "," to "."
data_modified['Länge (m)'] = data_modified['Länge (m)'].str.replace(',', '.').astype(float)
data_modified['Zustandsnote'] = data_modified['Zustandsnote'].str.replace(',', '.').astype(float)
data_modified['X'] = data_modified['X'].str.replace(',', '.').astype(float)
data_modified['Y'] = data_modified['Y'].str.replace(',', '.').astype(float)

# Traglastindex: I -> 1, II -> 2, III -> 3, IV -> 4, V -> 5, anything else (incl. NaN) -> 0
traglast_map = {'I': 1, 'II': 2, 'III': 3, 'IV': 4, 'V': 5}
data_modified['Traglastindex'] = data_modified['Traglastindex'].map(traglast_map).fillna(0).astype(int)

#print(data_modified.head())

In [58]:
# save reduced data
data_modified.to_csv('../data/reduced_bridge_statistic_germany.csv', sep=';')
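
Because `to_csv` is called with its default `index=True`, the saved file starts with an unnamed column holding the row index. A minimal sketch of how such a file can be read back, using an in-memory buffer and made-up toy values instead of the real file:

```python
import io
import pandas as pd

# Toy frame (made-up values) standing in for the reduced bridge data
toy = pd.DataFrame({'Zustandsnote': [2.3, 1.8], 'Traglastindex': [2, 0]})

# Write with the same settings as above: ';' separator, default index=True
buffer = io.StringIO()
toy.to_csv(buffer, sep=';')

# The header line begins with an empty field for the unnamed index column
header = buffer.getvalue().splitlines()[0]
print(header)

# Read back, telling pandas that the first column is the index
buffer.seek(0)
restored = pd.read_csv(buffer, sep=';', index_col=0)
print(restored)
```

Without `index_col=0`, the index would come back as an extra data column named `Unnamed: 0`.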