# Clean Local Subsidy Data – Gemeente Zwolle

This script cleans local subsidy data for the municipality of Zwolle. The data covers home-insulation subsidies and the insulation measures residents carried out. The script standardizes column names and values and removes empty, duplicate, and undated rows so the data is ready for analysis or reporting.

## About the Raw Data:

The subsidy data was obtained as a CSV file describing the insulation measures carried out in homes across Zwolle. It contains the following columns:

- Postal code
- Place
- Bad isolated parts of the house
- Date
- Amount of money
- Measures
- Do-it-yourself?
- Ventilation measure?
- buurt-naam
- Disctrict CBS
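The script renames these headers with `clean_column_names`, which trims them, lowercases them, and turns spaces and dots into underscores. The minimal sketch below assumes the CSV header row uses exactly the spellings listed above and shows the resulting names. Hyphens and question marks are left untouched, and the `Disctrict` spelling carries through to `disctrict_cbs`. This is why later steps refer to names such as `do-it-yourself?` and `ventilation_measure?`.

```python
import pandas as pd

# Raw headers as listed above (assumed to match the CSV header row exactly)
raw_headers = [
    "Postal code", "Place", "Bad isolated parts of the house", "Date",
    "Amount of money", "Measures", "Do-it-yourself?", "Ventilation measure?",
    "buurt-naam", "Disctrict CBS",
]


def clean_column_names(df):
    # Same normalization as the script: trim, lowercase, spaces and dots -> underscores
    df.columns = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(" ", "_", regex=False)
        .str.replace(".", "_", regex=False)
    )
    return df


df = clean_column_names(pd.DataFrame(columns=raw_headers))
print(list(df.columns))
```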



## Processing and Output:

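The script reads the raw CSV and normalizes the column names. It strips surrounding whitespace from text values and drops fully empty and duplicate rows. It then converts `amount_of_money` to a number (decimal comma to decimal point) and `date` to a datetime, discarding rows whose date cannot be parsed. Finally, it lowercases the do-it-yourself and ventilation columns and writes the result to a new CSV file.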
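You can check the cleaning steps on a few rows before running the full script. The sketch below uses a hypothetical, compact `clean` function that mirrors the script's steps on invented sample values (not real subsidy records). It shows the following:

- `03-04-2023` is read day-first as 3 April 2023.
- `1250,50` becomes `1250.5`.
- Rows that are empty, duplicated after whitespace stripping, or undated are removed.

```python
import pandas as pd

# Invented sample rows for illustration only (not real subsidy records)
raw = pd.DataFrame({
    "Date": ["03-04-2023", "03-04-2023", "15-01-2024", "not a date", None],
    "Amount of money": ["1250,50", "1250,50", "300", "10", None],
    "Do-it-yourself?": [" Yes", "Yes ", "NO", "no", None],
})


def clean(df):
    """Hypothetical compact version of the script's cleaning steps."""
    df = df.copy()
    # Column names: trim, lowercase, spaces -> underscores
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    # Strip surrounding whitespace from every string cell
    df = df.apply(lambda col: col.map(lambda x: x.strip() if isinstance(x, str) else x))
    # Drop fully empty rows, then exact duplicates
    df = df.dropna(how="all").drop_duplicates().copy()
    # Decimal comma -> decimal point, then float
    df["amount_of_money"] = df["amount_of_money"].str.replace(",", ".", regex=False).astype(float)
    # Day-first dates; unparseable values become NaT and their rows are dropped
    df["date"] = pd.to_datetime(df["date"], dayfirst=True, errors="coerce")
    df = df[df["date"].notna()].copy()
    # Lowercase the yes/no answers
    df["do-it-yourself?"] = df["do-it-yourself?"].str.lower()
    return df


print(clean(raw))
```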
## Before Running:

1. Update directories and file names as necessary under the CONFIGURATION section of the script.
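The configured paths are relative (`../raw_data/`), so they resolve against the notebook's working directory. The configured input filename also contains a space before `.csv`. A small pre-flight check can report the absolute path that was tried if the file is missing. It is sketched here with a hypothetical `check_input` helper that is not part of the script.

```python
import os

# Same settings as the CONFIGURATION section; adjust to your folder layout
RAW_DATA_DIR = "../raw_data/"
INPUT_NAME = "Local_Subsidies gemeente Zwolle .csv"  # note the space before ".csv"


def check_input(path):
    """Hypothetical helper: fail early, showing the absolute path that was tried."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Input CSV not found: {os.path.abspath(path)}")
    return path


try:
    check_input(os.path.join(RAW_DATA_DIR, INPUT_NAME))
    print("Input file found")
except FileNotFoundError as err:
    print(err)
```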


In [1]:
import pandas as pd
import os

# -------------------------------
# CONFIGURATION
# -------------------------------
RAW_DATA_DIR = "../raw_data/"
OUTPUT_DIR = "../clean_data/"
os.makedirs(OUTPUT_DIR, exist_ok=True)

input_file = os.path.join(RAW_DATA_DIR, "Local_Subsidies gemeente Zwolle .csv")
output_file = os.path.join(OUTPUT_DIR, "Local_subsidies_clean.csv")

# -------------------------------
# FUNCTIONS
# -------------------------------
def clean_column_names(df):
    df.columns = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(" ", "_", regex=False)
        .str.replace(".", "_", regex=False)
    )
    return df

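def strip_whitespace(df):
    def strip(x):
        return x.strip() if isinstance(x, str) else x
    # DataFrame.applymap is deprecated since pandas 2.1 in favour of DataFrame.map;
    # fall back to applymap on older pandas versions
    if hasattr(df, "map"):
        return df.map(strip)
    return df.applymap(strip)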

def clean_data(df):
    df = clean_column_names(df)
    df = strip_whitespace(df)
    df = df.dropna(how='all')  # Drop entirely empty rows
    df = df.drop_duplicates()
    return df

def validate_and_clean_columns(df):
    # Work on a copy so column assignments don't trigger SettingWithCopyWarning
    df = df.copy()

    # amount_of_money: decimal comma -> decimal point, then convert to float
    # (values with a thousands separator, e.g. '1.250,50', would fail here)
    if 'amount_of_money' in df.columns:
        df['amount_of_money'] = (
            df['amount_of_money']
            .replace(",", ".", regex=True)
            .astype(float)
        )

    # date: parse day-first; unparseable values become NaT and those rows are dropped
    if 'date' in df.columns:
        df['date'] = pd.to_datetime(df['date'], dayfirst=True, errors='coerce')
        df = df[df['date'].notna()].copy()

    # normalize do-it-yourself and ventilation columns to lowercase text
    for col in ['do-it-yourself?', 'ventilation_measure?']:
        if col in df.columns:
            df[col] = df[col].str.strip().str.lower()

    return df

# -------------------------------
# PROCESSING
# -------------------------------
df = pd.read_csv(input_file, sep=",")
df = clean_data(df)
df = validate_and_clean_columns(df)

# -------------------------------
# SUMMARY
# -------------------------------
print(f"✅ Total rows after full cleaning and validation: {len(df)}")

# -------------------------------
# SAVE OUTPUT
# -------------------------------
df.to_csv(output_file, index=False)
print(f"📁 Cleaned file saved to: {output_file}")


✅ Total rows after full cleaning and validation: 237
📁 Cleaned file saved to: ../clean_data/Local_subsidies_clean.csv
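The summary reports only the final row count. `validate_and_clean_columns` drops rows whose `date` could not be parsed without reporting how many it removed. The sketch below shows how that step could log what it removes. The `drop_invalid_dates` helper is hypothetical and not part of the script.

```python
import pandas as pd


def drop_invalid_dates(df, col="date"):
    """Hypothetical helper: parse day-first dates and report how many rows are dropped."""
    parsed = pd.to_datetime(df[col], dayfirst=True, errors="coerce")
    invalid = parsed.isna()
    print(f"Dropping {int(invalid.sum())} row(s) with a missing or unparseable '{col}'")
    # Replace the column with parsed dates and keep only the valid rows
    return df.assign(**{col: parsed})[~invalid].copy()


# Invented example values
example = pd.DataFrame({"date": ["01-02-2023", "unknown", None, "31-12-2022"]})
kept = drop_invalid_dates(example)
print(kept)
```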




## Minimize Local Subsidy Data – Gemeente Zwolle

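The following code writes the cleaned data to a separate "minimized" CSV file. As written, it applies no column filter (`df_minimized = df.copy()`), so the output also keeps `bad_isolated_parts_of_the_house`. The target columns for the minimized file are: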
- postal_code
- place
- date
- amount_of_money
- measures
- do-it-yourself?
- ventilation_measure?
- buurt-naam
- disctrict_cbs

## Before Running:
1. Ensure the correct file name and path are specified under CONFIGURATION.
2. Confirm the selected columns are present in your input CSV.


In [2]:
import pandas as pd
import os

# -------------------------------
# CONFIGURATION
# -------------------------------
CLEANED_FILE = "../clean_data/Local_subsidies_clean.csv"
MINIMIZED_DIR = "../minimized_data/"
OUTPUT_FILENAME = "local_subsidies_minimized.csv"

# Create output directory if it doesn't exist
os.makedirs(MINIMIZED_DIR, exist_ok=True)

# -------------------------------
# LOAD DATA
# -------------------------------
df = pd.read_csv(CLEANED_FILE)

# No filtering – keep all columns
df_minimized = df.copy()

# -------------------------------
# SAVE OUTPUT
# -------------------------------
df_minimized.to_csv(os.path.join(MINIMIZED_DIR, OUTPUT_FILENAME), index=False)

print(f"✅ Full dataset saved to minimized folder:\n📁 {os.path.join(MINIMIZED_DIR, OUTPUT_FILENAME)}")



✅ Full dataset saved to minimized folder:
📁 ../minimized_data/local_subsidies_minimized.csv
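If the minimized file should contain only the columns listed at the top of this section, the copy step can be replaced by an explicit selection. The minimal sketch below uses hypothetical names (`KEEP_COLUMNS` and `minimize`). It raises an error naming every listed column that is missing from the input, which automates the column check described before the code cell.

```python
import pandas as pd

# Hypothetical constant mirroring the column list at the top of this section
KEEP_COLUMNS = (
    "postal_code", "place", "date", "amount_of_money", "measures",
    "do-it-yourself?", "ventilation_measure?", "buurt-naam", "disctrict_cbs",
)


def minimize(df, keep=KEEP_COLUMNS):
    """Return only the listed columns, in the listed order."""
    missing = [c for c in keep if c not in df.columns]
    if missing:
        # Report every absent column at once instead of failing on the first
        raise KeyError(f"Columns not found in input: {missing}")
    return df.loc[:, list(keep)].copy()


# In the cell above, this would replace `df_minimized = df.copy()`:
# df_minimized = minimize(df)
```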
