# Lim Zhi Chao (A0252895N)

Link to GitHub repository: https://github.com/lzc88/DSA4262

NOTE: To run the entire notebook, you will need an API key from data.gov.sg, stored as `DATA_GOV_SG_API_KEY` in the `.env` file that the notebook loads from `../.env`.
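
The data.gov.sg cells read the key from the `DATA_GOV_SG_API_KEY` environment variable. A minimal sketch of an early check is shown here; `require_api_key` is a hypothetical helper, not part of the notebook, and it only verifies that the variable is set, not that the key is valid.

```python
import os


def require_api_key(var_name="DATA_GOV_SG_API_KEY"):
    """Return the API key from the environment, or fail with a clear message."""
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Add it to the .env file or export it "
            "before running the data.gov.sg cells."
        )
    return key
```

Calling this once after `load_dotenv` would surface a missing key immediately instead of as an HTTP error partway through the notebook.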

# Setting Up

### Import Dependencies

In [34]:
import os
import json
import time
import requests
import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio
from dotenv import load_dotenv

pio.renderers.default = "notebook_connected"
load_dotenv("../.env")

True

### Helper Functions

In [35]:
def gho_data_with_details(code):
    """Print the title of a WHO GHO indicator and return its Singapore records."""

    url = (
        "https://ghoapi.azureedge.net/api/DIMENSION/GHO/DimensionValues"
        f"?$filter=Code eq '{code}'"
    )

    r = requests.get(url, timeout=30)
    r.raise_for_status()

    data = r.json()["value"][0]
    print(f"Dataset {data['Code']}: {data['Title']}")

    url = f"https://ghoapi.azureedge.net/api/{code}?$filter=SpatialDim eq 'SGP'"

    r = requests.get(url, timeout=30)
    r.raise_for_status()

    data = r.json()["value"]

    # Uncomment to view sample data point

    # print(f"{len(data)} data points")

    # sample = data[0]

    # print("Sample data with the following fields:")

    # for idx, k in enumerate(sample):
    #     print(f"[{idx+1:02d}] {k}: {sample[k]}")

    return data


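def fetch_with_retry(url, headers, max_retries=3):
    """GET a JSON payload, backing off exponentially on HTTP 429 responses."""

    for attempt in range(max_retries):

        # A timeout keeps a stalled connection from hanging the notebook
        r = requests.get(url, headers=headers, timeout=30)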

        if r.status_code == 429:
            wait = min(60, 2**attempt)
            print(f"429 rate limit. Sleep {wait}s then retry...")
            time.sleep(wait)
            continue

        r.raise_for_status()

        return r.json()

    raise RuntimeError("Too many 429s; try again later.")


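def _flatten_coords(coords):
    """Yield (lon, lat) pairs from Polygon or MultiPolygon coordinates."""

    if not coords:
        return

    # Polygon: coords[0][0][0] is a number; MultiPolygon: it is a position list
    if isinstance(coords[0][0][0], (int, float)):
        polys = [coords]
    else:
        polys = coords

    for poly in polys:
        for ring in poly:
            # Positions may carry an optional third (altitude) value, so index
            # explicitly instead of unpacking exactly two items
            for point in ring:
                yield point[0], point[1]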


def plot_geojson(geo_data, df, zoom=10.5, title="Insert Title"):
    """Draw a choropleth of df["number"] keyed on area_key, centred on the GeoJSON bounds."""

    fig = go.Figure(
        go.Choroplethmap(
            geojson=geo_data,
            locations=df["area_key"],
            z=df["number"],
            featureidkey="properties.area_key",
            colorscale="Reds",
            marker_line_width=0.5,
            colorbar_title="Population",
        )
    )

    lons, lats = [], []
    for feat in geo_data["features"]:
        geom = feat["geometry"]
        coords = geom["coordinates"]
        for lon, lat in _flatten_coords(coords):
            lons.append(lon)
            lats.append(lat)

    min_lon, min_lat, max_lon, max_lat = min(lons), min(lats), max(lons), max(lats)

    fig.update_layout(
        map=dict(
            style="carto-positron",
            center={"lat": (min_lat + max_lat) / 2, "lon": (min_lon + max_lon) / 2},
            zoom=zoom,
        ),
        margin={"r": 0, "t": 40, "l": 0, "b": 0},
        title=title,
    )

    fig.show()

# Topic: Suicide Risks in Singapore

While Singapore’s overall suicide rate has declined over the past two decades, this aggregate trend masks persistent and uneven risks across demographic groups.

This motivates an actionable question:

**If resources are limited, where and for whom should prevention and mental-health services be prioritised to reduce risk most effectively?**

### Datasets

* WHO GHO API
    * MH_12 - Age-standardised suicide rates (per 100 000 population)
    * SDGSUICIDE - Crude suicide rates (per 100 000 population)
* data.gov.sg API
    * HDB - Elderly and Future Elderly Resident Population by Geographical Distribution
    * URA - Master Plan 2019 Subzone Boundary (No Sea) (GEOJSON)

# Macro Plots

### Age-standardised suicide rates (per 100 000 population)

In [36]:
data = gho_data_with_details(code="MH_12")

required_fields = ["TimeDim", "Dim1", "NumericValue", "Low", "High"]

df = pd.DataFrame(data=data)
df = df[required_fields]
# Filter to only show data for both sexes
df = df[df["Dim1"] == "SEX_BTSX"]
# Sort by year
df = df.sort_values(["TimeDim"])
df = df.reset_index(drop=True)

# print("Filter to only show data where Dim1=SEX_BTSX (both sexes)")
# print(f"{len(df)}/{len(data)} data points (showing first 10)")
# df.head(10)

Dataset MH_12: Age-standardized suicide rates (per 100 000 population)


In [37]:
fig = go.Figure()

# Uncertainty band
fig.add_trace(
    go.Scatter(
        x=df["TimeDim"],
        y=df["High"],
        mode="lines",
        line=dict(width=0),
        showlegend=False,
        hoverinfo="skip",
    )
)

fig.add_trace(
    go.Scatter(
        x=df["TimeDim"],
        y=df["Low"],
        mode="lines",
        line=dict(width=0),
        fill="tonexty",
        fillcolor="rgba(144, 238, 144, 0.4)",
        name="Uncertainty (Low–High)",
    )
)

# Main line
fig.add_trace(
    go.Scatter(
        x=df["TimeDim"],
        y=df["NumericValue"],
        mode="lines+markers",
        name="Population Average (Both Sexes)",
    )
)

# Only show years with data points
years = sorted(df["TimeDim"].unique())
fig.update_xaxes(tickmode="array", tickvals=years)

fig.update_layout(
    title="Singapore: Age-standardised Suicide Rate (Both Sexes)",
    xaxis_title="Year",
    yaxis_title="Suicide rate per 100,000 population",
    template="simple_white",
    hovermode="x unified",
)

fig.show()

A line chart with an uncertainty band was chosen because it clearly communicates long-term trends while acknowledging uncertainty in the estimates. This allows sustained movements to be distinguished from short-term fluctuations.
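
One rough way to read the band is to check whether the Low–High intervals of two years overlap. The sketch below illustrates that heuristic only; `intervals_overlap` is a hypothetical helper, the numbers are made up rather than taken from the WHO data, and non-overlap is a visual rule of thumb, not a formal statistical test.

```python
def intervals_overlap(low_a, high_a, low_b, high_b):
    """Return True if the closed intervals [low_a, high_a] and [low_b, high_b] overlap."""
    return low_a <= high_b and low_b <= high_a


# Illustrative values only (not from the MH_12 dataset)
earlier = {"Low": 9.0, "High": 12.0}
later_small_drop = {"Low": 8.5, "High": 11.0}
later_large_drop = {"Low": 6.0, "High": 8.0}

# Overlapping bands: the change could plausibly be a short-term fluctuation
print(intervals_overlap(earlier["Low"], earlier["High"],
                        later_small_drop["Low"], later_small_drop["High"]))

# Non-overlapping bands: the change looks more like a sustained movement
print(intervals_overlap(earlier["Low"], earlier["High"],
                        later_large_drop["Low"], later_large_drop["High"]))
```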

This macro-level trend is important because it provides a national context for understanding suicide as a public health issue and helps assess whether population-wide interventions have coincided with changes over time.

The declining age-standardised suicide rate suggests overall improvement in mental health outcomes at the population level, which may reflect advances in healthcare access, economic stability, and public awareness. 

For policymakers, this indicates that broad strategies may be having an effect, but it does not identify which demographic groups continue to face elevated risk.

A key limitation of this dataset is that **age-standardisation masks heterogeneity across age and sex groups**, meaning that **subgroup vulnerabilities may be hidden** despite favourable national averages.
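
To make the masking effect concrete, the sketch below applies direct standardisation, which is a weighted sum of age-specific rates using the age shares of a reference population. The three age groups, weights, and rates are all hypothetical, and `age_standardised_rate` is an illustrative helper, not the WHO's actual estimation procedure.

```python
import math


def age_standardised_rate(age_specific_rates, standard_weights):
    """Directly standardised rate: weighted sum of age-specific rates.

    standard_weights are the shares of each age group in a reference
    population and must sum to 1.
    """
    if len(age_specific_rates) != len(standard_weights):
        raise ValueError("rates and weights must have the same length")
    if not math.isclose(sum(standard_weights), 1.0):
        raise ValueError("standard weights must sum to 1")
    return sum(r * w for r, w in zip(age_specific_rates, standard_weights))


# Hypothetical groups: young, middle-aged, older (rates per 100 000)
weights = [0.5, 0.3, 0.2]
flat_profile = [10.0, 10.0, 10.0]   # same rate in every age group
steep_profile = [4.0, 10.0, 25.0]   # risk concentrated in the oldest group

print(age_standardised_rate(flat_profile, weights))
print(age_standardised_rate(steep_profile, weights))
```

Both profiles come out at about 10 per 100 000, even though the second has an oldest-group rate several times that of its youngest group. This is why the age- and sex-specific breakdown is examined next.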

# Micro Plots

### Crude suicide rates (per 100 000 population)

In [38]:
data = gho_data_with_details(code="SDGSUICIDE")

required_fields = ["TimeDim", "Dim1", "Dim2", "NumericValue", "Low", "High"]

valid_age_groups = [
    "AGEGROUP_YEARS10-19",
    "AGEGROUP_YEARS20-29",
    "AGEGROUP_YEARS30-39",
    "AGEGROUP_YEARS40-49",
    "AGEGROUP_YEARS50-59",
    "AGEGROUP_YEARS60-69",
    "AGEGROUP_YEARS70PLUS",
]

df = pd.DataFrame(data=data)
df = df[required_fields]
df = df[df["Dim2"].isin(valid_age_groups)]
# Filter to only show data from 2021 (the only year that has data across age groups)
df = df[df["TimeDim"] == 2021]
# Sort by age groups
df = df.sort_values(["Dim1", "Dim2"])
df["Sex"] = (
    df["Dim1"]
    .str.replace("SEX_BTSX", "Both Sexes")
    .str.replace("SEX_MLE", "Male")
    .str.replace("SEX_FMLE", "Female")
)
df["AgeGroupLabel"] = (
    df["Dim2"].str.replace("AGEGROUP_YEARS", "").str.replace("PLUS", "+")
)
# Drop unnecessary columns
df = df.drop(columns=["Dim1", "Dim2", "Low", "High"])
df = df.reset_index(drop=True)

# print("Filter to only show data from 2021")
# print(f"{len(df)}/{len(data)} data points (showing first 10)")
# df.head(10)

Dataset SDGSUICIDE: Crude suicide rates (per 100 000 population)


In [39]:
fig = go.Figure()

subset = df[df["Sex"] == "Male"]
fig.add_trace(
    go.Bar(
        x=subset["AgeGroupLabel"],
        y=subset["NumericValue"],
        name="Male",
        marker_color="steelblue",
    )
)

subset = df[df["Sex"] == "Female"]
fig.add_trace(
    go.Bar(
        x=subset["AgeGroupLabel"],
        y=subset["NumericValue"],
        name="Female",
        marker_color="indianred",
    )
)

fig.update_layout(
    title="Singapore: Suicide Rates by Age Group and Sex (2021)",
    xaxis_title="Age group",
    yaxis_title="Suicide rate per 100,000 population",
    barmode="group",
    template="simple_white",
    hovermode="x unified",
)

fig.show()

A grouped bar chart was selected as it enables direct comparison of suicide rates across age groups while highlighting differences between males and females within each group.

This micro-level analysis is important because suicide risk is known to vary substantially across age and sex, and identifying high-risk groups is essential for targeted prevention.

The chart shows that suicide rates increase with age and are consistently higher among males, with older men exhibiting the highest rates, indicating a concentrated vulnerability among elderly populations.
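
The sex gap within each age group can be summarised as a male-to-female rate ratio. The sketch below assumes rows shaped like the notebook's frame (`Sex`, `AgeGroupLabel`, `NumericValue`); `rate_ratios` is a hypothetical helper, and the sample rates are illustrative rather than the 2021 WHO figures.

```python
def rate_ratios(rows):
    """Map each age group to male rate / female rate.

    Groups missing either value, or with a zero female rate, are skipped.
    """
    by_group = {}
    for row in rows:
        by_group.setdefault(row["AgeGroupLabel"], {})[row["Sex"]] = row["NumericValue"]

    ratios = {}
    for group, rates in by_group.items():
        male, female = rates.get("Male"), rates.get("Female")
        if male is None or not female:
            continue
        ratios[group] = male / female
    return ratios


# Illustrative rows only (not the actual SDGSUICIDE values)
sample_rows = [
    {"Sex": "Male", "AgeGroupLabel": "20-29", "NumericValue": 12.0},
    {"Sex": "Female", "AgeGroupLabel": "20-29", "NumericValue": 6.0},
    {"Sex": "Male", "AgeGroupLabel": "70+", "NumericValue": 30.0},
    {"Sex": "Female", "AgeGroupLabel": "70+", "NumericValue": 10.0},
    {"Sex": "Both Sexes", "AgeGroupLabel": "70+", "NumericValue": 19.0},
]

print(rate_ratios(sample_rows))
```

With the notebook's data, the rows could be supplied as `df.to_dict("records")`.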

For clinicians and policymakers, this **highlights the need for age- and sex-specific interventions**, particularly those focused on older adults who may face compounding risks such as social isolation, chronic illness, and reduced mobility.

A limitation of this dataset is that it reflects a single year of data and does not capture longitudinal changes or causal factors underlying these observed differences.

# Actionable Plots

### HDB Elderly and Future Elderly Resident Population by Geographical Distribution


In [40]:
dataset_id = "d_4180067b350bc9839a4cea487841d5d1"
headers = {"x-api-key": os.getenv("DATA_GOV_SG_API_KEY")}
base_url = "https://data.gov.sg/api/action/datastore_search"

limit = 100
offset = 0

url0 = f"{base_url}?resource_id={dataset_id}&limit={limit}&offset={offset}"
j0 = fetch_with_retry(url0, headers=headers)

total = j0["result"]["total"]
data = j0["result"]["records"]

print("Total rows:", total)

offset += limit
while offset < total:

    url = f"{base_url}?resource_id={dataset_id}&limit={limit}&offset={offset}"

    j = fetch_with_retry(url, headers=headers)

    batch = j["result"]["records"]

    if not batch:
        break

    data.extend(batch)
    offset += limit

    time.sleep(2)

print("Fetched rows:", len(data))

df = pd.DataFrame(data=data)
# Normalise town estate names
df["area_key"] = df["town_estate"].str.upper().str.strip()
# Filter to only show data from 2018 (the latest year)
df = df[df["shs_year"] == "2018"]
# Sort by town estate and elderly type
df = df.sort_values(["town_estate", "elderly_pop"])
df = df.reset_index(drop=True)

# print("Filter to only show data from 2018")
# print(f"{len(df)}/{len(data)} data points (showing first 10)")
# df.head(10)

Total rows: 208
Fetched rows: 208


### Master Plan 2019 Subzone Boundary (No Sea) (GEOJSON)

In [41]:
dataset_id = "d_8594ae9ff96d0c708bc2af633048edfb"
headers = {"x-api-key": os.getenv("DATA_GOV_SG_API_KEY")}
url = f"https://api-open.data.gov.sg/v1/public/api/datasets/{dataset_id}/poll-download"

r = requests.get(url=url, headers=headers, timeout=30)
r.raise_for_status()

data = r.json()

url = data["data"]["url"]

r = requests.get(url=url, headers=headers, timeout=30)
r.raise_for_status()

geo_data = json.loads(r.text)

# Normalise GeoJSON town estate names
for feature in geo_data["features"]:
    feature["properties"]["area_key"] = (
        feature["properties"]["PLN_AREA_N"].upper().strip()
    )

In [42]:
df_areas = set(df["area_key"])
geo_areas = {feature["properties"]["area_key"] for feature in geo_data["features"]}

print("In data but not map:", df_areas - geo_areas)
print("In map but not data:", geo_areas - df_areas)

print(
    "\nAfter inspecting both datasets, WHAMPOA does not exist in the GeoJSON.\nTherefore, we replace KALLANG/WHAMPOA with KALLANG for better representation.\n"
)

df["area_key"] = df["area_key"].replace({"KALLANG/WHAMPOA": "KALLANG"})

df_areas = set(df["area_key"])
geo_areas = {feature["properties"]["area_key"] for feature in geo_data["features"]}

print("In data but not map:", df_areas - geo_areas)
print("In map but not data:", geo_areas - df_areas)

In data but not map: {'KALLANG/WHAMPOA', 'CENTRAL AREA'}
In map but not data: {'WOODLANDS', 'LIM CHU KANG', 'SINGAPORE RIVER', 'TUAS', 'CHANGI', 'SOUTHERN ISLANDS', 'NEWTON', 'SUNGEI KADUT', 'CHANGI BAY', 'DOWNTOWN CORE', 'MANDAI', 'KALLANG', 'MARINA EAST', 'RIVER VALLEY', 'PIONEER', 'MARINA SOUTH', 'SELETAR', 'BOON LAY', 'WESTERN ISLANDS', 'ORCHARD', 'NORTH-EASTERN ISLANDS', 'NOVENA', 'OUTRAM', 'STRAITS VIEW', 'CENTRAL WATER CATCHMENT', 'ROCHOR', 'TANGLIN', 'WESTERN WATER CATCHMENT', 'TENGAH', 'SIMPANG', 'PAYA LEBAR', 'MUSEUM'}

After inspecting both datasets, WHAMPOA does not exist in the GeoJSON.
Therefore, we replace KALLANG/WHAMPOA with KALLANG for better representation.

In data but not map: {'CENTRAL AREA'}
In map but not data: {'WOODLANDS', 'LIM CHU KANG', 'SINGAPORE RIVER', 'TUAS', 'CHANGI', 'SOUTHERN ISLANDS', 'NEWTON', 'SUNGEI KADUT', 'CHANGI BAY', 'DOWNTOWN CORE', 'MANDAI', 'MARINA EAST', 'RIVER VALLEY', 'PIONEER', 'MARINA SOUTH', 'SELETAR', 'BOON LAY', 'WESTERN ISLANDS', '
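
After the replacement, `CENTRAL AREA` still has no matching planning area in the GeoJSON. This likely means its rows are left off the maps, which is an assumption about how the choropleth trace treats unmatched locations rather than something the output above confirms. A minimal sketch for making that split explicit before plotting follows; `split_by_map_coverage` is a hypothetical helper and the keys are a small sample.

```python
def split_by_map_coverage(area_keys, map_keys):
    """Split area keys into those with a matching map feature and those without."""
    map_keys = set(map_keys)
    unique_keys = set(area_keys)
    matched = sorted(k for k in unique_keys if k in map_keys)
    unmatched = sorted(k for k in unique_keys if k not in map_keys)
    return matched, unmatched


# Small sample of keys for illustration
sample_area_keys = ["BEDOK", "KALLANG", "CENTRAL AREA", "BEDOK"]
sample_map_keys = {"BEDOK", "KALLANG", "DOWNTOWN CORE"}

matched, unmatched = split_by_map_coverage(sample_area_keys, sample_map_keys)
print("Plotted:", matched)
print("Not plotted:", unmatched)
```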

### Elderly Population (65+ Years Old) by Town Estate

In [43]:
df_elderly = df.copy()
df_elderly = df_elderly[df_elderly["elderly_pop"] == "Elderly"]
df_elderly["number"] = df_elderly["number"].astype(int)
df_elderly = df_elderly.sort_values(["number"], ascending=False)
df_elderly = df_elderly.reset_index(drop=True)

print(f"Elderly population across {len(df_elderly)} town estates")

plot_geojson(
    geo_data=geo_data,
    df=df_elderly,
    title="Elderly Population (65+ Years Old) by Town Estate",
)

df_elderly.head(10)

Elderly population across 26 town estates


|  | _id | shs_year | elderly_pop | town_estate | number | area_key |
|---|---|---|---|---|---|---|
| 0 | 173 | 2018 | Elderly | Bedok | 38200 | BEDOK |
| 1 | 174 | 2018 | Elderly | Bukit Merah | 34700 | BUKIT MERAH |
| 2 | 160 | 2018 | Elderly | Jurong West | 33400 | JURONG WEST |
| 3 | 175 | 2018 | Elderly | Ang Mo Kio | 31000 | ANG MO KIO |
| 4 | 161 | 2018 | Elderly | Tampines | 31000 | TAMPINES |
| 5 | 162 | 2018 | Elderly | Hougang | 27000 | HOUGANG |
| 6 | 176 | 2018 | Elderly | Toa Payoh | 25300 | TOA PAYOH |
| 7 | 163 | 2018 | Elderly | Yishun | 24800 | YISHUN |
| 8 | 177 | 2018 | Elderly | Kallang/Whampoa | 24100 | KALLANG |
| 9 | 164 | 2018 | Elderly | Choa Chu Kang | 24000 | CHOA CHU KANG |


### Future Elderly Population (55-64 Years Old) by Town Estate

In [44]:
df_f_elderly = df.copy()
df_f_elderly = df_f_elderly[df_f_elderly["elderly_pop"] == "Future Elderly"]
df_f_elderly["number"] = df_f_elderly["number"].astype(int)
df_f_elderly = df_f_elderly.sort_values(["number"], ascending=False)
df_f_elderly = df_f_elderly.reset_index(drop=True)

print(f"Future elderly population across {len(df_f_elderly)} town estates")

plot_geojson(
    geo_data=geo_data,
    df=df_f_elderly,
    title="Future Elderly Population (55-64 Years Old) by Town Estate",
)

df_f_elderly.head(10)

Future elderly population across 26 town estates


|  | _id | shs_year | elderly_pop | town_estate | number | area_key |
|---|---|---|---|---|---|---|
| 0 | 187 | 2018 | Future Elderly | Tampines | 41000 | TAMPINES |
| 1 | 186 | 2018 | Future Elderly | Jurong West | 35800 | JURONG WEST |
| 2 | 199 | 2018 | Future Elderly | Bedok | 32300 | BEDOK |
| 3 | 189 | 2018 | Future Elderly | Yishun | 30000 | YISHUN |
| 4 | 190 | 2018 | Future Elderly | Choa Chu Kang | 28800 | CHOA CHU KANG |
| 5 | 188 | 2018 | Future Elderly | Hougang | 27200 | HOUGANG |
| 6 | 191 | 2018 | Future Elderly | Hougang | 26600 | HOUGANG |
| 7 | 201 | 2018 | Future Elderly | Ang Mo Kio | 22600 | ANG MO KIO |
| 8 | 200 | 2018 | Future Elderly | Bukit Merah | 22500 | BUKIT MERAH |
| 9 | 183 | 2018 | Future Elderly | Sengkang | 21200 | SENGKANG |


A choropleth map is an appropriate design choice as it allows resident counts to be visualised geographically, making it easier to identify town estates where large numbers of elderly and future elderly residents are clustered.

This spatial analysis is important because suicide risk is highest among older adults, yet mental health resources are often planned uniformly rather than in response to where ageing populations are concentrated. 

When interpreted alongside age-specific suicide risk, the maps highlight estates such as Tampines, Jurong West, Bedok, Yishun, and Hougang as areas where both current (65+) and future (55–64 transitioning into 65+) demand for mental health services is likely to be greatest.
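
One simple way to combine the two maps is to add the elderly and future elderly counts per estate and rank the totals. The sketch below uses the 2018 figures listed above for four of the named estates; summing the two groups is an illustrative heuristic, not a method used in the notebook, and `rank_by_combined` is a hypothetical helper.

```python
# 2018 counts taken from the tables above (four estates only)
elderly = {"Bedok": 38200, "Jurong West": 33400, "Tampines": 31000, "Yishun": 24800}
future_elderly = {"Tampines": 41000, "Jurong West": 35800, "Bedok": 32300, "Yishun": 30000}


def rank_by_combined(elderly_counts, future_counts):
    """Rank towns by elderly + future elderly residents, largest first."""
    towns = set(elderly_counts) | set(future_counts)
    combined = {
        town: elderly_counts.get(town, 0) + future_counts.get(town, 0)
        for town in towns
    }
    # Sort by descending total, then by name so ties are ordered deterministically
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


for town, total in rank_by_combined(elderly, future_elderly):
    print(f"{town}: {total}")
```

On these four estates, Tampines ranks first (72000), followed by Bedok (70500), Jurong West (69200), and Yishun (54800).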

For policymakers and clinicians, this supports targeted deployment of mental health services, community outreach, and early screening programmes in specific locations rather than broad nationwide expansion.

A key limitation of this dataset is that **population size does not directly measure mental health need or increased suicide risk**. It does not account for other factors such as social support, existing service capacity, or individual-level risk. Therefore, it should be interpreted as an indicator of **potential service demand rather than actual clinical burden**.

# Conclusion

This analysis demonstrates that while Singapore’s overall age-standardised suicide rate has declined over time, the aggregated improvement conceals substantial demographic and spatial heterogeneity in risk.

Disaggregation by age and sex reveals that older adults, particularly elderly males, remain disproportionately vulnerable, underscoring the importance of targeted rather than general interventions.

By mapping the spatial concentration of future elderly and elderly populations, we can translate these demographic insights into actionable guidance and identify town estates where demand for mental health services is likely to be highest both currently and in the near future.

Together, the macro, micro, and spatial perspectives highlight the value of integrating epidemiological trends with demographic and geographic data to support proactive mental health planning.

A key limitation of this study is data timeliness. The suicide rate estimates are only available up to 2021, while population data are available up to 2018. **More recent datasets would enable more accurate and responsive analysis, particularly in light of recent social and economic changes**.

### AI Declaration

I used ChatGPT to improve the phrasing of sentences in this assignment. I am responsible for the content and quality of the submitted work.