# Comparing Conservation Status Distribution by Park

## Purpose

This analysis compares the **distribution of conservation status classifications across parks**, using normalized proportions to evaluate whether observed patterns differ meaningfully by site.

The goal is to assess **relative composition**, not absolute volume, in order to identify potential structural differences in conservation status reporting across parks.

---

## Context and Scope

This notebook builds on two prior steps:

1. A **methodological validation** that established structural consistency in observation composition across parks
2. A **descriptive baseline** summarizing overall conservation status distribution across the full dataset

Given these foundations, conservation status comparisons by park are treated as **methodologically comparable** at the chosen level of aggregation.

---

## Analytical Question

> Does the distribution of conservation status classifications differ across parks?

---

## Method Overview

1. Aggregate observations by park and conservation status
2. Normalize conservation status totals within each park to proportions
3. Compare distributions using normalized visualizations (e.g., stacked bar charts or heatmaps)

Normalization removes differences in total observation volume, allowing comparison of **relative conservation status composition** across parks.
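As a minimal sketch of steps 1 and 2, the aggregation and within-park normalization can be illustrated on a small invented frame. The column names mirror those used in this notebook, but the park names and counts below are hypothetical and are not drawn from the dataset.

```python
import pandas as pd

# Hypothetical toy data: the values are invented purely for illustration.
toy = pd.DataFrame({
    "park_name": ["Park A", "Park A", "Park A", "Park B", "Park B"],
    "conservation_status": ["Endangered", "Threatened", "Threatened",
                            "Endangered", "Threatened"],
    "observations": [10, 20, 10, 30, 30],
})

# Step 1: aggregate observations by park and conservation status,
# then spread statuses into columns (missing combinations become 0).
counts = (
    toy.groupby(["park_name", "conservation_status"])["observations"]
    .sum()
    .unstack(fill_value=0)
)

# Step 2: divide each row by its park total so every row sums to 1.
props = counts.div(counts.sum(axis=1), axis=0)

print(props)
```

In this toy case Park B has more observations in total than Park A, yet after normalization only the within-park shares remain, which is the quantity the comparison relies on.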

---

## Interpretive Boundaries

* Observation counts reflect **classification frequency**, not species abundance or population health
* Differences may reflect reporting practices, ecological context, or taxonomic composition
* This analysis does not assess conservation outcomes or causal drivers

Findings are interpreted as **comparative structure**, not ecological inference.


In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from util import summarize

sns.set_theme(style='whitegrid')

# load merged / cleaned data frame `df_merged.feather`
df = pd.read_feather('df_merged.feather')

# All Observations by Category and Park

In [None]:
df_total_stack = (
    df.groupby(['park_name', 'category'], observed=True)['observations']
    .sum()
    .reset_index()
)

In [None]:
df_total_wide = df_total_stack.pivot(
    index='park_name',
    columns='category',
    values='observations'
)

In [None]:
ax = df_total_wide.plot(
    kind='bar',
    stacked=True,
    figsize=(10,6)
)

ax.set_ylabel('Observations')
ax.set_xlabel('')
ax.set_title('Total Observations by Park')

# Move legend outside
ax.legend(
    title='Category',
    bbox_to_anchor=(1.02, 1),
    loc='upper left',
    borderaxespad=0
)

plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.savefig('fig1a_total_observations_by_park.png')

In [None]:
df_total_norm = df_total_stack.copy()
# Total observations per park, broadcast back to every row of that park
df_total_norm['park_total'] = (
    df_total_norm.groupby('park_name', observed=True)['observations']
    .transform('sum')
)
# Proportion = category observations / park total
df_total_norm['prop'] = df_total_norm['observations'] / df_total_norm['park_total']

In [None]:
# Sanity check: proportions within each park should sum to 1
df_total_norm.groupby('park_name', observed=True)['prop'].sum()
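The same check can be made explicit with an assertion, so that a normalization error stops the notebook instead of relying on visual inspection of the output. The sketch below assumes a long-format frame shaped like `df_total_norm`; the frame `toy_norm` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format frame shaped like `df_total_norm`; values are invented.
toy_norm = pd.DataFrame({
    "park_name": ["Park A", "Park A", "Park B", "Park B"],
    "category": ["Bird", "Mammal", "Bird", "Mammal"],
    "observations": [30, 10, 5, 15],
})
toy_norm["park_total"] = (
    toy_norm.groupby("park_name")["observations"].transform("sum")
)
toy_norm["prop"] = toy_norm["observations"] / toy_norm["park_total"]

# Each park's proportions should sum to 1 within floating-point tolerance.
park_sums = toy_norm.groupby("park_name")["prop"].sum()
assert np.allclose(park_sums, 1.0), f"Proportions do not sum to 1:\n{park_sums}"
```

`np.allclose` is used rather than an exact equality test because floating-point division can leave sums that differ from 1 by a tiny rounding error.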

In [None]:
df_total_plot = df_total_norm.pivot(
    index='park_name',
    columns='category',
    values='prop',
).fillna(0)

In [None]:
fig, ax = plt.subplots(figsize=(10, 6))

bottom = np.zeros(len(df_total_plot))  # running stack height per park, starting at 0

for col in df_total_plot.columns:
    ax.bar(
        df_total_plot.index,
        df_total_plot[col],
        bottom=bottom,
        label=col
    )
    bottom += df_total_plot[col].values

ax.set_ylabel('Proportion of Observations')
ax.set_title('Normalized Biodiversity Composition by Park (All Categories)')
ax.legend(
    title='Category',
    bbox_to_anchor=(1.02, 1),
    loc='upper left'
)

plt.xticks(rotation=30)
plt.tight_layout()

# This figure was redundant, so it is not saved or included.

# Conservation Subset of Total Observations

In [None]:
# Keep only rows whose conservation status is something other than 'No Concern'
df_cons_stat = df[df['conservation_status'] != 'No Concern']

In [None]:
df_stack = (
    df_cons_stat
        .groupby(['park_name', 'category'], observed=True)['observations']
        .sum()
        .reset_index()
)


In [None]:
df_wide = df_stack.pivot(
    index='park_name',
    columns='category',
    values='observations'
)


In [None]:
ax = df_wide.plot(
    kind='bar',
    stacked=True,
    figsize=(10, 6)
)

ax.set_ylabel('Observations')
ax.set_xlabel('')
ax.set_title('Conservation Observations by Park')

# Move legend outside
ax.legend(
    title='Category',
    bbox_to_anchor=(1.02, 1),
    loc='upper left',
    borderaxespad=0
)

plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.savefig('fig3_conservation_observations_by_park.png')

In [None]:
# Row-normalize: divide each park's category counts by that park's total
df_prop = df_wide.div(df_wide.sum(axis=1), axis=0)

ax = df_prop.plot(
    kind='bar',
    stacked=True,
    figsize=(10, 6)
)

ax.set_ylabel('Proportion')
ax.set_xlabel('')
ax.set_title('Proportion of Conservation Observations by Park')

# Move legend outside
ax.legend(
    title='Category',
    bbox_to_anchor=(1.02, 1),
    loc='upper left',
    borderaxespad=0
)

plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.savefig('fig3a_conservation_proportions_by_park.png')
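The Method Overview lists heatmaps as an alternative to stacked bars for comparing normalized distributions. A rough sketch of that option follows, assuming a wide table of proportions shaped like `df_prop` (parks as rows, categories as columns). The table `toy_prop` and its values are hypothetical, and the colormap and figure size are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical proportions table shaped like `df_prop`; values are invented.
toy_prop = pd.DataFrame(
    {"Bird": [0.50, 0.40], "Mammal": [0.30, 0.35], "Reptile": [0.20, 0.25]},
    index=pd.Index(["Park A", "Park B"], name="park_name"),
)

fig, ax = plt.subplots(figsize=(6, 3))

# Fix the color scale to [0, 1] so cells are comparable across parks,
# and annotate each cell with its proportion.
sns.heatmap(toy_prop, annot=True, fmt=".2f", vmin=0, vmax=1, cmap="viridis", ax=ax)

ax.set_xlabel("Category")
ax.set_ylabel("")
ax.set_title("Proportion of Observations by Park (toy data)")
fig.tight_layout()
```

A heatmap makes small differences between parks in the same category easier to read than a stacked bar, where only the bottom segment shares a common baseline.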
