In [90]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv("zitbanken0.csv", sep=";")

print(df.columns)


Index(['OBJECTID', 'ID', 'STADSDEEL', 'BUURT_CODE', 'STRAAT', 'EIGENAAR',
       'BEHEERDER', 'ONDERHOUDER', 'SOORT', 'VOET', 'ZITOBJECT', 'AFWERKING',
       'KLEUR', 'AANLEG_JAAR', 'STANDPLAATS', 'LOCATIE', 'LEVERANCIER',
       'geo_shape', 'geo_point_2d'],
      dtype='object')


What is the exact value of geo_point_2d for the first row of your dataset?

In [8]:
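# .loc[...] is the label-based indexer ("loc" = label-based location):
# it looks up data by the index label (row) and the column name.
# With the default RangeIndex, label 0 is the first row.

# Only the last expression in a cell is displayed, so print the answer explicitly.
print(df.loc[0, "geo_point_2d"])

# Label 1 is the SECOND row; its full record is shown below.
df.loc[1]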

OBJECTID                                                     1146
ID                                                          10619
STADSDEEL                                                Tongelre
BUURT_CODE                                                   1335
STRAAT                                        Keizersmantelstraat
EIGENAAR                                                      BOR
BEHEERDER                                           BOR Stedelijk
ONDERHOUDER                                         BOR Stedelijk
SOORT                                                 Picknickset
VOET                                                        Beton
ZITOBJECT                                                    Hout
AFWERKING                                                   Beits
KLEUR                                                 Licht bruin
AANLEG_JAAR                                                2008.0
STANDPLAATS                                            Verharding
LOCATIE   
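`.loc` selects by index label and `.iloc` by integer position. They only coincide while the DataFrame keeps its default RangeIndex. A minimal sketch on a made-up three-row frame (illustrative values, not the bench data) shows where they diverge:

```python
import pandas as pd

# Toy frame with a non-default index (made-up values, not the bench data).
toy = pd.DataFrame({"STRAAT": ["A-straat", "B-laan", "C-plein"]}, index=[10, 0, 5])

first_by_position = toy.iloc[0]["STRAAT"]  # first row in the frame
row_labelled_0 = toy.loc[0, "STRAAT"]      # row whose index LABEL is 0

print(first_by_position)  # A-straat
print(row_labelled_0)     # B-laan

# After sorting or filtering, labels no longer match positions,
# so use .iloc when you mean "the first row".
toy_sorted = toy.sort_values("STRAAT", ascending=False)
print(toy_sorted.iloc[0]["STRAAT"])  # C-plein
```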

How many unique streets (STRAAT) appear in the dataset?

In [10]:
no_unique_st = df["STRAAT"].nunique()  # distinct street names; NaN is not counted
print(no_unique_st)

532
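`nunique()` skips missing values by default, so 532 counts distinct non-empty street names. If rows without a STRAAT should count as their own category, pass `dropna=False`. A small sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

# Made-up street names, one of them missing.
streets = pd.Series(["Markt", "Markt", np.nan, "Stratumseind"])

print(streets.nunique())              # 2: NaN is ignored
print(streets.nunique(dropna=False))  # 3: NaN counted as one extra value
print(streets.isna().sum())           # 1 row has no street name
```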

Which street has the highest number of benches?

In [36]:
# Count benches (non-null IDs) per street, then sort from most to fewest.
max_benches_grouped = df.groupby("STRAAT")["ID"].count()

max_benches_grouped = max_benches_grouped.sort_values(ascending=False)

print(max_benches_grouped)


STRAAT
STD.P. Philips de Jongh      94
STD.P. Stadswandelpark       73
Waterfront                   60
STD.P. Philips van Lennep    52
Buitengebied 50              45
                             ..
Thomaslaan                    1
Blondeelpad                   1
Toulousehof                   1
Schalmstraat                  1
't Hofke                      1
Name: ID, Length: 532, dtype: int64
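Two shortcuts answer "which street has the most benches" directly, sketched here on toy data: `value_counts()` counts rows per street and sorts descending in one step, and `idxmax()` returns the label of the largest count. Note that `groupby(...)["ID"].count()` counts non-null IDs while `value_counts()` counts rows, so the two agree only when ID has no missing values.

```python
import pandas as pd

# Made-up rows, not the bench data.
toy = pd.DataFrame({
    "STRAAT": ["Park A", "Park A", "Laan B", "Park A", "Laan B", "Hof C"],
    "ID": [1, 2, 3, 4, 5, 6],
})

per_street = toy.groupby("STRAAT")["ID"].count().sort_values(ascending=False)
per_street_vc = toy["STRAAT"].value_counts()  # already sorted, descending

print(per_street.idxmax())  # Park A
print(per_street.max())     # 3
```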


What are the top 5 streets by bench count?

In [28]:
top5_max_benches = max_benches_grouped.head(5)

print(top5_max_benches)

STRAAT
STD.P. Philips de Jongh      94
STD.P. Stadswandelpark       73
Waterfront                   60
STD.P. Philips van Lennep    52
Buitengebied 50              45
Name: ID, dtype: int64


Looking at those top 5 entries, what does this tell you about where benches tend to be located in Eindhoven?

In [None]:
# Four of those five are parks or recreational areas, not ordinary streets.
# Benches in Eindhoven cluster overwhelmingly in parks and green areas, not along regular streets.

Find the next 10 streets (ranks 6–15) and check whether they are also parks or something else.

In [33]:
# Top 15 by bench count; ranks 6–15 are the last ten rows.
max_15 = max_benches_grouped.head(15)

print(max_15)

STRAAT
STD.P. Philips de Jongh      94
STD.P. Stadswandelpark       73
Waterfront                   60
STD.P. Philips van Lennep    52
Buitengebied 50              45
B.P. Karpendonck             42
Severijnpark                 38
Amandelpark                  35
STD.P. Henry Dunant          34
Winkelcentrum Woensel        33
18 Septemberplein            31
B.P. Engelsbergen            28
B.P. 't Ven                  18
Buitengebied 732             18
PL.. Romantica               17
Name: ID, dtype: int64
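`head(15)` shows ranks 1–15. To isolate ranks 6–15 only, slice by position with `iloc[5:15]` (zero-based, end-exclusive). A toy sketch with a made-up, already sorted Series:

```python
import pandas as pd

# Hypothetical counts, already sorted in descending order.
counts = pd.Series(range(20, 0, -1), index=[f"place_{i}" for i in range(1, 21)])

ranks_6_to_15 = counts.iloc[5:15]  # positions 5..14 are ranks 6..15

print(len(ranks_6_to_15))       # 10
print(ranks_6_to_15.index[0])   # place_6
print(ranks_6_to_15.index[-1])  # place_15
```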


Parks dominate the top. Why does this happen?

In [None]:
# People pass through streets rather than linger on them, while parks are designed for staying, so benches concentrate where people actually stop.

What percentage of benches in the entire dataset are located in just the top 5 places you listed earlier?

In [64]:
total_bench = len(df)  # len(df) is the number of rows, i.e. all benches

sum_max_benches_5 = max_benches_grouped.head(5).sum()

percent_max_5 = (sum_max_benches_5 / total_bench) * 100

print(f"{percent_max_5:.2f}% of all benches are located in the top 5 places listed earlier.")

7.96% of all benches are located in the top 5 places listed earlier.
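The single top-5 figure generalizes into a cumulative-share curve: sort the counts descending, take the running total, and divide by the overall total. The sketch below uses made-up counts; the helper name `top_k_share` is hypothetical. On the notebook's data, `top_k_share(max_benches_grouped, total=len(df)).iloc[4]` should reproduce the 7.96% above, since that figure divides by `len(df)`.

```python
from typing import Optional

import pandas as pd


def top_k_share(counts: pd.Series, total: Optional[int] = None) -> pd.Series:
    """Cumulative share (%) of all items held by the top-k groups, for every k."""
    ordered = counts.sort_values(ascending=False)
    denom = ordered.sum() if total is None else total
    # Multiply before dividing to reduce floating-point noise.
    return ordered.cumsum() * 100 / denom


# Made-up counts, not the bench data.
toy_counts = pd.Series({"park_1": 50, "park_2": 30, "street_1": 10, "street_2": 5, "street_3": 5})
shares = top_k_share(toy_counts)

print(shares.iloc[1])  # 80.0: the top 2 groups hold 80% of all items
```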


In [None]:
# Does that number support your earlier assumption that benches are “overwhelmingly clustered in parks”?
# No, it weakens it.
# The top five places stand out individually, but together they hold only about 8% of all benches;
# the other ~92% are spread across the remaining 527 street names (and any rows without a street name).

What would you need to calculate to understand how unevenly benches are distributed across locations?

In [65]:
# Regrouping benches by LOCATIE tells you where they are, not how concentrated the distribution is.
# To understand the shape of the distribution you need a statistical measure of inequality
# across groups, not another grouping.
# Hint: economists use it for income inequality, ecologists for species distribution,
# and data analysts for concentration across categories. It starts with a G.
#
# Answer: the Gini coefficient.
# It measures how unequal a distribution is, which shows whether benches are spread evenly
# across locations or concentrated in just a few.

If the Gini coefficient for benches-per-street is high, what does that tell you about how benches are distributed across Eindhoven?

In [100]:
# A high Gini means a small number of locations hold a large share of all benches,
# while most locations have very few.
# Calculate the Gini coefficient:

bench_count_per_street = df.groupby("STRAAT")["ID"].count()

number_of_streets = df["STRAAT"].nunique()

mean_bench_count = bench_count_per_street.sum() / number_of_streets

# print(f"There are {mean_bench_count:.2f} benches per street on average.")

# Build the pairwise absolute-differences matrix (n x n) via NumPy broadcasting.

# Convert your counts to a NumPy array:
x = bench_count_per_street.values

# Compute the pairwise absolute differences:
diffs = np.abs(x[:, None] - x[None, :])

# Sum all differences:

total_diffs = diffs.sum()

# Gini = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean)
gini = total_diffs / (2 * number_of_streets**2 * mean_bench_count)

g = gini  # short alias; the bands below are rule-of-thumb labels, not a standard scale

if g < 0.20:
    interpretation = "very evenly distributed"
elif g < 0.40:
    interpretation = "moderately evenly distributed"
elif g < 0.60:
    interpretation = "unevenly distributed with noticeable clustering"
elif g < 0.80:
    interpretation = "highly concentrated in a small number of locations"
else:
    interpretation = "extremely concentrated"

print(f"Bench distribution: {interpretation} (Gini = {g:.3f})")
# Gini coefficient:
# 0 means perfect equality: every street has the same number of benches.
# Values near 1 mean extreme inequality: one street has all benches and the rest have zero
# (with n streets, this formula reaches at most (n - 1) / n).
# 0.57 means the distribution is clearly skewed:
# a small set of places holds a disproportionate share of benches.



Bench distribution: unevenly distributed with noticeable clustering (Gini = 0.569)
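The pairwise matrix holds n² entries (about 283,000 for 532 streets), which is fine at this size but grows quickly. A standard equivalent form works on the sorted counts: with values sorted ascending as x_1 ≤ … ≤ x_n and ranks i = 1..n, G = Σ (2i − n − 1)·x_i / (n · Σ x_i). The sketch below (function names are hypothetical, not from the original notebook) implements both and checks that they agree on a toy array.

```python
import numpy as np


def gini_pairwise(x) -> float:
    """Gini via mean absolute difference: sum|xi - xj| / (2 n^2 mean). O(n^2) memory."""
    x = np.asarray(x, dtype=float)
    n = x.size
    diffs = np.abs(x[:, None] - x[None, :])
    return diffs.sum() / (2 * n**2 * x.mean())


def gini_sorted(x) -> float:
    """Same value from the sorted-values identity. O(n log n) time, O(n) memory."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)  # 1-based ranks of the sorted values
    return np.sum((2 * i - n - 1) * x) / (n * x.sum())


toy = np.array([1, 1, 2, 3, 10])
print(round(gini_pairwise(toy), 4), round(gini_sorted(toy), 4))  # 0.4706 0.4706
```

On the notebook's data, `gini_sorted(bench_count_per_street.values)` should give the same 0.569 as the pairwise version.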


If you group by LOCATIE, what’s the very first problem you must solve before that column is usable for analysis?

In [None]:
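# Check whether LOCATIE is standardized. If it is free text, it must be cleaned and
# normalized before it becomes a meaningful category, which means dealing with:
# inconsistent naming
# typos
# different wording for the same place
# overly long descriptions
# In every case, missing values must be handled too: groupby() and value_counts()
# drop NaN silently, which changes the denominator of any percentage.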

What’s the simplest first cleaning step you can apply to LOCATIE to make it more consistent before grouping?

In [None]:
# For a free-text categorical field, the simplest first step is to convert all text
# to a consistent case (lowercase), usually together with trimming surrounding whitespace.
# Removing nulls is also necessary, but nulls do not cause inconsistent labels; messy text does.
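If LOCATIE were free text, a first normalization pass could look like the sketch below (toy values; the `normalize_label` helper is hypothetical): strip surrounding whitespace, collapse internal runs of whitespace, and lowercase, while leaving missing values as NaN.

```python
import numpy as np
import pandas as pd


def normalize_label(s: pd.Series) -> pd.Series:
    """Trim, collapse repeated whitespace, and lowercase; NaN stays NaN."""
    return (
        s.str.strip()
         .str.replace(r"\s+", " ", regex=True)
         .str.lower()
    )


# Made-up messy labels, not the real LOCATIE values.
raw = pd.Series(["Park", " park ", "PARK", "Speel  tuin", np.nan])
clean = normalize_label(raw)

print(clean.value_counts(dropna=False))
```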

Show me the unique values in LOCATIE, exactly as they appear in the dataset.

In [101]:
df["LOCATIE"].unique()

array(['Overig', 'Park', nan, 'Plein', 'School', 'Abri', 'Winkel',
       'Cafetaria', 'Begraafplaatsen'], dtype=object)

Which LOCATIE type has the most benches?

In [113]:
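# LOCATIE is already a fixed set of eight labels, so no text cleaning is needed here.
# groupby() drops NaN keys, so these counts cover only benches with a recorded LOCATIE.
benches_per_locatie = df.groupby("LOCATIE")["ID"].count()
benches_per_locatie = benches_per_locatie.sort_values(ascending=False)

print(benches_per_locatie)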

LOCATIE
Park               1737
Plein               529
Overig              321
Winkel              115
School               67
Abri                 19
Begraafplaatsen       4
Cafetaria             2
Name: ID, dtype: int64
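`groupby()` drops rows whose key is NaN by default, so the counts above leave out benches without a LOCATIE. Passing `dropna=False` (available since pandas 1.1) keeps missing values as their own group, which makes the gap visible. A toy sketch with made-up rows:

```python
import numpy as np
import pandas as pd

# Made-up rows, two of them without a LOCATIE.
toy = pd.DataFrame({
    "LOCATIE": ["Park", "Park", np.nan, "Plein", np.nan],
    "ID": [1, 2, 3, 4, 5],
})

default_counts = toy.groupby("LOCATIE")["ID"].count()
with_missing = toy.groupby("LOCATIE", dropna=False)["ID"].count()

print(default_counts.sum())  # 3: NaN rows are silently dropped
print(with_missing.sum())    # 5: every row is counted
```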


What is the combined percentage of benches located in Park + Plein?

In [126]:
benches_per_locatie = df["LOCATIE"].value_counts()  # NaN excluded by default
# Share of Park + Plein among benches with a recorded LOCATIE
percentage = (benches_per_locatie["Park"] + benches_per_locatie["Plein"]) / benches_per_locatie.sum() * 100

print(f"{percentage:.2f}%")


81.10%
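The same share can be computed with a boolean mask, which makes the denominator explicit. `isin()` returns False for NaN, so `.mean()` over all rows gives the share of all benches, while restricting to non-missing rows reproduces the `value_counts()` figure. A toy sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up location types, two of them missing.
locatie = pd.Series(["Park", "Plein", "Park", "School", np.nan, np.nan])

in_park_or_plein = locatie.isin(["Park", "Plein"])

share_all = in_park_or_plein.mean() * 100                     # denominator: all rows
share_known = in_park_or_plein[locatie.notna()].mean() * 100  # denominator: rows with a LOCATIE

print(f"{share_all:.1f}% of all rows, {share_known:.1f}% of rows with a LOCATIE")
# 50.0% of all rows, 75.0% of rows with a LOCATIE
```

On the bench data the all-rows version would come out lower than 81%, since the LOCATIE counts above sum to 2,794, fewer than the total number of benches implied by the 7.96% figure (roughly 4,070).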


What does it tell you about Eindhoven’s public-space strategy when 81% of benches with a recorded location type are placed in parks and squares instead of along streets or at schools, shops, or other urban locations?

In [128]:
# Eindhoven places benches mainly where people are meant to spend time (parks and squares) and far less in everyday movement spaces such as streets.

Do parks not only have more benches, but also more bench variety (types, materials, seating style) than other locations?

In [139]:
# How many unique bench types exist in the entire dataset, and how many appear specifically in parks?

unique_bench_types = df["SOORT"].nunique()

unique_bench_types_park = df.loc[df["LOCATIE"] == "Park", "SOORT"].nunique()

print(f"Total bench types: {unique_bench_types}, park bench types: {unique_bench_types_park}")

Total bench types: 20, park bench types: 18


What does the fact that 18 out of 20 bench types appear in parks tell you about Eindhoven’s approach to designing public seating?

In [140]:
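# Parks get almost the full range of bench types, which suggests Eindhoven treats them as its main social spaces and gives them more varied, designed seating.
# Whether the rest of the city relies on fewer, simpler bench types still has to be checked against the types used outside parks.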
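To test whether other location types really use fewer bench types, `groupby(...).nunique()` gives the number of distinct SOORT values per location type in one call, and set operations show which types never appear in parks. A toy sketch with made-up rows (only "Picknickset" appears in the real data; the other type names are invented):

```python
import pandas as pd

# Made-up rows, not the bench data.
toy = pd.DataFrame({
    "LOCATIE": ["Park", "Park", "Park", "Plein", "Plein", "School"],
    "SOORT": ["Bank", "Picknickset", "Zitelement", "Bank", "Leunbank", "Bank"],
})

types_per_locatie = toy.groupby("LOCATIE")["SOORT"].nunique().sort_values(ascending=False)

park_types = set(toy.loc[toy["LOCATIE"] == "Park", "SOORT"])
# Note: != "Park" also keeps rows where LOCATIE is NaN.
non_park_types = set(toy.loc[toy["LOCATIE"] != "Park", "SOORT"])

print(types_per_locatie)
print(sorted(non_park_types - park_types))  # types found outside parks only
```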

Check whether parks also have more materials/finishes than other places.

In [146]:
unique_afwerking_total = df["AFWERKING"].nunique()

unique_afwerking_park = df.loc[df["LOCATIE"] == "Park", "AFWERKING"].nunique()
print(unique_afwerking_total, unique_afwerking_park)

# What does it tell you that parks have almost all the bench types but no extra finishes?

# The dataset has only four finishes and parks use all of them, so finish variety is
# limited city-wide rather than park-specific: Eindhoven varies bench types in parks
# but keeps finishes standardized, possibly to simplify maintenance.

4 4
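Counting distinct finishes only shows that parks use all four. A cross-tabulation shows how each finish is distributed across location types, which is what "standardized city-wide" actually requires. `pd.crosstab` builds that table directly; a toy sketch with made-up rows ("Beits" appears in the real data, "Gecoat" is invented):

```python
import pandas as pd

# Made-up rows, not the bench data.
toy = pd.DataFrame({
    "LOCATIE": ["Park", "Park", "Park", "Plein", "Plein", "School"],
    "AFWERKING": ["Beits", "Beits", "Gecoat", "Beits", "Gecoat", "Beits"],
})

# Rows: location type; columns: finish; values: share of that row's benches.
finish_mix = pd.crosstab(toy["LOCATIE"], toy["AFWERKING"], normalize="index")

print(finish_mix.round(2))
```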


When were the benches installed?

In [156]:
# 1. Count how many benches were installed each year (value_counts() skips missing years).

df["AANLEG_JAAR"].value_counts().sort_index()

# 2006 is by far the largest year: 1,291 of the 2,728 benches with a recorded year (about 47%).
# After 2006: 3 in 2007, 124 to 247 per year in 2008–2013, and 10 to 63 per year from 2014 onward.


AANLEG_JAAR
2001.0       3
2006.0    1291
2007.0       3
2008.0     247
2009.0     145
2010.0     172
2011.0     189
2012.0     161
2013.0     124
2014.0      42
2015.0      21
2016.0      24
2017.0      26
2018.0      10
2019.0      24
2020.0      50
2021.0      31
2022.0      34
2023.0      63
2024.0      50
2025.0      18
Name: count, dtype: int64
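The year column prints as float (2006.0), most likely because it contains missing values. Converting to pandas' nullable `Int64` dtype keeps the missing entries while printing whole years, and `normalize=True` turns counts into shares, which makes the size of the 2006 spike directly comparable. A toy sketch with made-up years:

```python
import numpy as np
import pandas as pd

# Made-up installation years, one missing.
years = pd.Series([2006.0, 2006.0, 2006.0, 2008.0, 2010.0, np.nan])

years_int = years.astype("Int64")  # 2006 instead of 2006.0; NaN becomes <NA>

# Shares among non-missing years, in chronological order.
share_per_year = years_int.value_counts(normalize=True).sort_index()
missing = years_int.isna().sum()

print(share_per_year)
print(f"missing years: {missing}")
```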

What does the installation spike in 2006 suggest about Eindhoven’s urban-planning strategy that year?

In [None]:
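# The spike suggests a one-off, city-wide push (a coordinated upgrade of public space) rather than a gradual policy shift.
# It could also reflect how the inventory was recorded (e.g. existing benches all registered in one year), which this dataset alone cannot rule out.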

Write one concise paragraph (3–4 sentences) summarizing Eindhoven’s bench-placement strategy based on everything you’ve discovered.



In [None]:
# It must include:
# spatial pattern (parks vs streets)
# design pattern (diverse types, standardized materials)
# temporal pattern (2006 installation surge)
#
# And it must be:
# direct
# analytical
# free of soft language
# free of speculated motives
# not “trying to sound smart”
# This is your “executive summary.”

In [158]:
print(
    "Benches in Eindhoven are concentrated in parks and squares: 81% of benches with a recorded "
    "location type stand in one of the two, and most of the places with the highest bench counts "
    "are parks rather than ordinary streets. Parks contain 18 of the 20 bench types, while the "
    "whole city uses only four finishes, so variety comes from bench type rather than finish. "
    "Installation peaks sharply in 2006 (1,291 benches, about 47% of those with a recorded year), "
    "followed by 124 to 247 per year in 2008–2013 and smaller batches since 2014. Overall, seating "
    "is placed in designated spaces for staying rather than spread evenly across the street network "
    "(Gini = 0.57 across street names)."
)

Benches in Eindhoven are concentrated in parks and squares: 81% of benches with a recorded location type stand in one of the two, and most of the places with the highest bench counts are parks rather than ordinary streets. Parks contain 18 of the 20 bench types, while the whole city uses only four finishes, so variety comes from bench type rather than finish. Installation peaks sharply in 2006 (1,291 benches, about 47% of those with a recorded year), followed by 124 to 247 per year in 2008–2013 and smaller batches since 2014. Overall, seating is placed in designated spaces for staying rather than spread evenly across the street network (Gini = 0.57 across street names).
