# Spatial variation in species richness and acoustic activity

This notebook looks at the sample data provided by the wildlife trust. The aim is to explore how the data vary between sites.

## Set up the system path and load the data

In [None]:
import sys
import os
from pathlib import Path
import pandas as pd


# Go up one level to .../audiomoth
PROJECT_ROOT = Path(os.getcwd()).resolve().parent

# Add project root to sys.path so `src` is importable (skip if already added on a re-run)
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

# PROJECT_ROOT is already a Path, so the data file path can be built directly
PROCESSED_DATA_PATH = PROJECT_ROOT / "data_processed" / "analysis_df.parquet"
analysis_df = pd.read_parquet(PROCESSED_DATA_PATH)

# Show more columns and use a wider display while exploring
pd.set_option("display.max_columns", 50)
pd.set_option("display.width", 120)

## Species richness compared with activity per site


In [None]:
# One row per site: habitat label, number of distinct species, and total detections
summary = (
    analysis_df.groupby("site")
    .agg(
        habitat=("habitat", "first"),
        species_richness=("scientific_name", "nunique"),
        detections=("scientific_name", "size"),
    )
    .sort_values("species_richness", ascending=False)
)

summary

### Summary
Species richness varies considerably between sites and does not scale directly with total detection counts.

Creney Farm shows the highest overall species richness despite a moderate number of detections, which suggests high diversity and may indicate a relatively even spread of detections across species. Sites associated with wet or edge habitats (e.g. beaver wetland margins and heathland) also show relatively high richness.

Dense scrub habitats show moderate species richness and detection counts but do not consistently produce the highest values, suggesting that habitat structure alone does not determine acoustic activity.

Lowland deciduous woodland supports relatively high species richness, contradicting patterns suggested by the initial sample dataset.

Differences in sampling effort and calling intensity likely influence detection totals, and further analysis accounting for deployment duration and effort is required before drawing strong conclusions.
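One way to compare richness across sites with different detection totals is individual-based rarefaction, which estimates how many species each site would show if every site had the same number of detections. This technique is an addition and not part of the original analysis. The sketch below assumes only the `site` and `scientific_name` columns used above; `rarefied_richness` and `rarefy_by_site` are hypothetical helpers. It addresses unequal detection totals only and does not account for deployment duration, which would need recording-effort metadata.

```python
from math import exp, lgamma
from typing import Optional

import pandas as pd


def _log_comb(a: int, b: int) -> float:
    # Log of the binomial coefficient C(a, b), via log-gamma to avoid huge integers
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def rarefied_richness(counts, n: int) -> float:
    """Expected number of species in a random subsample of n detections
    (drawn without replacement), given per-species detection counts."""
    counts = [int(c) for c in counts]
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of detections")
    expected = 0.0
    for c in counts:
        # P(species absent from subsample) = C(total - c, n) / C(total, n),
        # which is zero when fewer than n detections belong to other species
        if total - c < n:
            p_absent = 0.0
        else:
            p_absent = exp(_log_comb(total - c, n) - _log_comb(total, n))
        expected += 1.0 - p_absent
    return expected


def rarefy_by_site(df: pd.DataFrame, site_col: str = "site",
                   species_col: str = "scientific_name",
                   n: Optional[int] = None) -> pd.Series:
    # Detections per (site, species); observed=True skips unused category pairs
    counts = df.groupby([site_col, species_col], observed=True).size()
    per_site = counts.groupby(level=site_col)
    if n is None:
        # Default: rarefy every site down to the smallest site's detection total
        n = int(per_site.sum().min())
    return per_site.apply(lambda c: rarefied_richness(c, n)).rename("rarefied_richness")


# Toy example (not project data): site B has more detections than site A
toy = pd.DataFrame({
    "site": ["A"] * 4 + ["B"] * 6,
    "scientific_name": ["x", "x", "y", "y", "x", "x", "x", "y", "z", "w"],
})
rarefied = rarefy_by_site(toy)
print(rarefied)
```

With the project data this would be called as `rarefy_by_site(analysis_df)`. Rarefying a site to its own full detection total returns its observed richness (site A in the toy example), so the comparison only changes for sites with more detections than the smallest one.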

In [None]:
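# Detection counts per species at each site, most-detected species first
top_species_by_site = (
    analysis_df.groupby(["site", "common_name"])
    .size()
    .reset_index(name="detections")
    .sort_values(["site", "detections"], ascending=[True, False])
)

# Show the five most-detected species at each site
top_species_by_site.groupby("site").head(5)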

Differences in acoustic activity between sites appear to be driven largely by a small number of highly vocal species, particularly Common Chiffchaff in dense scrub habitats, whereas species-rich sites show a more even distribution of detections across species. The plot below illustrates this by comparing the dominance of each site's most-detected species with its species richness.
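The evenness described above can also be quantified with standard diversity indices. The sketch below computes the Shannon index $H$ and Pielou's evenness $J = H / \ln S$ for each site. These indices are an addition and not part of the original notebook; `evenness_by_site` is a hypothetical helper that assumes the `site` and `scientific_name` columns used above. $J$ close to 1 means detections are spread evenly across species, while lower values mean a few species account for most detections.

```python
import numpy as np
import pandas as pd


def evenness_by_site(df: pd.DataFrame, site_col: str = "site",
                     species_col: str = "scientific_name") -> pd.DataFrame:
    # Detections per (site, species); observed=True skips unused category pairs
    counts = df.groupby([site_col, species_col], observed=True).size()
    # Proportion of each site's detections attributed to each species
    p = counts / counts.groupby(level=site_col).transform("sum")
    # Shannon index H = -sum(p * ln p) per site
    shannon = (-p * np.log(p)).groupby(level=site_col).sum()
    richness = counts.groupby(level=site_col).size()
    # Pielou's J = H / ln(S); left as NaN where only one species was detected
    pielou = shannon / np.log(richness.where(richness > 1))
    return pd.DataFrame(
        {"species_richness": richness, "shannon_H": shannon, "pielou_J": pielou}
    )


# Toy example (not project data): A is perfectly even, B is dominated by "x"
toy = pd.DataFrame({
    "site": ["A"] * 4 + ["B"] * 4 + ["C"],
    "scientific_name": ["x", "x", "y", "y", "x", "x", "x", "y", "x"],
})
evenness = evenness_by_site(toy)
print(evenness)
```

Applied to the project data as `evenness_by_site(analysis_df)`, this would give a single evenness value per site that can be compared directly with the dominance measure.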

In [None]:
import matplotlib.pyplot as plt

# Per-site richness (distinct species) and total number of detections
summary = analysis_df.groupby("site").agg(
    species_richness=("scientific_name", "nunique"),
    total_detections=("scientific_name", "size"),
)

# Dominance: share of a site's detections attributed to its most-detected species.
# Counted on scientific_name so it uses the same species definition as richness.
counts_by_site = (
    analysis_df.groupby(["site", "scientific_name"]).size().groupby(level="site")
)
dominance = (counts_by_site.max() / counts_by_site.sum()).rename("dominance")

summary = summary.join(dominance)

fig, ax = plt.subplots()
ax.scatter(summary["species_richness"], summary["dominance"])

# Label each point with its site name
for site, row in summary.iterrows():
    ax.text(
        row["species_richness"],
        row["dominance"],
        str(site),
        fontsize=9,
        ha="left",
        va="bottom",
    )

ax.set_xlabel("Species richness")
ax.set_ylabel("Dominance (fraction of detections)")
ax.set_title("Dominance vs species richness by site")
plt.show()

The plot above shows, for each site, the dominance of the most frequently detected bird species (its share of all detections at that site) against species richness (the number of species detected at that site).
Creney Farm combines relatively high species richness with lower dominance, which points to a more even distribution of detections across species and may reflect greater habitat diversity.
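The dominance measure plotted above corresponds to the Berger–Parker index. Writing $n_{s,i}$ for the number of detections of species $i$ at site $s$, and $S_s$ for that site's species richness:

$$
d_s = \frac{\max_i n_{s,i}}{\sum_{i=1}^{S_s} n_{s,i}}, \qquad \frac{1}{S_s} \le d_s \le 1
$$

The lower bound is reached when every species is detected equally often and the upper bound when a single species accounts for all detections. Because the lower bound falls as $S_s$ grows, sites with more species can reach lower dominance values, so any association between the two axes partly reflects this bound rather than habitat effects alone.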