# Device Detections Exploration

This notebook explores detection patterns across AudioMoth devices and sites.
We summarise detection counts at the device level, standardising for deployment effort where appropriate.

## This will cover:

- Number of detections by device, with overall % and site-level %

- Detections per active device day (effort-standardised comparison)

- Each device's percentage contribution within its site
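
As a rough illustration of why the effort-standardised figure matters, the sketch below builds a tiny, invented detections table (one row per detection, which is assumed to match the layout of `analysis_df`) and computes the three quantities listed above. The device and site labels, the dates, and the `toy` / `toy_summary` names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: one row per detection (values invented for illustration)
toy = pd.DataFrame(
    {
        "device": ["A", "A", "A", "A", "B", "B"],
        "site": ["S1", "S1", "S1", "S1", "S1", "S1"],
        "date": [
            "2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02",
            "2024-01-01", "2024-01-01",
        ],
    }
)

# Raw volume: number of detections per device
detections = toy.groupby(["device", "site"]).size().rename("detections")

# Effort: number of distinct dates on which each device logged a detection
active_days = (
    toy.groupby(["device", "site"])["date"].nunique().rename("active_device_days")
)

toy_summary = pd.concat([detections, active_days], axis=1).reset_index()

# Effort-standardised rate
toy_summary["detections_per_device_day"] = (
    toy_summary["detections"] / toy_summary["active_device_days"]
)

# Each device's percentage contribution within its site
toy_summary["% within site"] = (
    toy_summary["detections"]
    / toy_summary.groupby("site")["detections"].transform("sum")
    * 100
)

print(toy_summary)
```

Here device A contributes two thirds of the site's detections, yet both devices record 2 detections per active device day, so the difference in raw volume comes entirely from A being active on more days. Note that a day only counts as active if at least one detection was logged on it.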


## Set up system path and get data

In [None]:
import sys
import os
from pathlib import Path
import pandas as pd


# Go up one level to .../audiomoth
PROJECT_ROOT = Path(os.getcwd()).resolve().parent

# Add project root to sys.path so `src` is importable
sys.path.insert(0, str(PROJECT_ROOT))

PROCESSED_DATA_PATH = Path(PROJECT_ROOT) / "data_processed" / "analysis_df.parquet"
analysis_df = pd.read_parquet(PROCESSED_DATA_PATH)

# Make pandas show more columns/rows while exploring
pd.set_option("display.max_columns", 50)
pd.set_option("display.width", 120)

## Detections by device

### Volume

In [None]:
DEVICE_COL = "device"

# Count detections (rows of analysis_df) per device, keeping the site for later grouping
device_detections_summary = (
    analysis_df.groupby([DEVICE_COL, "site"])
    .size()
    .rename("detections")
    .reset_index()
    .sort_values("site", ascending=True)
)

total = len(analysis_df)
device_detections_summary["%"] = (
    device_detections_summary["detections"] / total * 100
).round(2)

device_detections_summary["% within site"] = (
    device_detections_summary["detections"]
    / device_detections_summary.groupby("site")["detections"].transform("sum")
    * 100
).round(2)


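# Reset the index after sorting so the displayed and saved tables share a clean index
device_detections_summary = device_detections_summary.reset_index(drop=True)
device_detections_summary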

### Save

In [None]:
import src.data_store as data_store

data_store.save_dataframe_to_csv(
    device_detections_summary,
    Path(PROJECT_ROOT) / "outputs",
    "device_detections_summary",
)

### Effort

In [None]:
DEVICE_COL = "device"

# Detections per device (same counts as in the Volume section)
device_intensity = (
    analysis_df.groupby([DEVICE_COL, "site"])
    .size()
    .rename("detections")
    .reset_index()
    .sort_values("site", ascending=True)
)


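# Active device days: distinct dates present in analysis_df for each device,
# so a day is only counted if the device has at least one row on that date
device_active_days = (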
    analysis_df[["device", "site", "date"]]
    .drop_duplicates()
    .groupby(["device", "site"])
    .size()
    .rename("active_device_days")
    .reset_index()
)

device_intensity = device_intensity.merge(
    device_active_days, on=["device", "site"], how="left"
)

device_intensity["detections_per_device_day"] = (
    device_intensity["detections"] / device_intensity["active_device_days"]
).round(2)


# Each device's share (%) of the summed per-device-day rates within its site
device_intensity["effort_share_within_site"] = (
    device_intensity["detections_per_device_day"]
    / device_intensity.groupby("site")["detections_per_device_day"].transform("sum")
    * 100
).round(2)

In [None]:
import matplotlib.pyplot as plt

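# Assign the result: reset_index returns a new DataFrame rather than modifying in place
device_intensity = device_intensity.reset_index(drop=True)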

# Create a colour map for sites
sites = device_intensity["site"].unique()

# Generate colours automatically (no manual colour picking).
# tab10 has 10 colours, so wrap around if there are more than 10 sites.
cmap = plt.colormaps["tab10"]
site_color_map = {site: cmap(i % cmap.N) for i, site in enumerate(sites)}


def highlight_by_site(row):
    color = site_color_map[row["site"]]
    return [
        f"background-color: rgba({int(color[0]*255)}, {int(color[1]*255)}, {int(color[2]*255)}, 0.15)"
    ] * len(row)


device_intensity.style.format(
    {
        "detections_per_device_day": "{:.2f}",
        "effort_share_within_site": "{:.2f}",
    }
).apply(highlight_by_site, axis=1)

Within most sites, detection rates were broadly consistent across devices, though several sites showed clear spatial variation. Breney Common and Higher Trevilmick had broadly comparable detection rates across devices, with one lower-performing unit at each site suggesting possible microhabitat or placement effects. Creney Farm and Red Moor were more heterogeneous, with one device at each site substantially underperforming relative to the others, while Red Moor also contained the single highest-performing device overall (CWT12; 564 detections per device day). Breney Farm and Lowertown, each with a single device, had high detection rates, whereas Helman Tor showed comparatively low overall activity. Because detection rates were standardised per active device day, the remaining variation appears to be driven more by within-site spatial differences than by deployment effort.
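
One way to make the within-site heterogeneity described above concrete is to compare the lowest and highest per-device-day rates at each site. The sketch below uses invented numbers and hypothetical site and device labels, not values from this dataset; `min_to_max_ratio` is an illustrative name rather than a column produced elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical per-device rates, invented for illustration only
rates = pd.DataFrame(
    {
        "site": ["S1", "S1", "S1", "S2", "S2"],
        "device": ["A", "B", "C", "D", "E"],
        "detections_per_device_day": [100.0, 110.0, 40.0, 50.0, 50.0],
    }
)

# Summarise the spread of rates across devices within each site
spread = rates.groupby("site")["detections_per_device_day"].agg(
    ["min", "max", "mean"]
)

# A ratio near 1 means devices at the site perform similarly;
# a ratio near 0 means at least one device lags far behind the best one
spread["min_to_max_ratio"] = spread["min"] / spread["max"]

print(spread.round(2))
```

Applied to `device_intensity`, the same grouping would flag sites with one clearly underperforming device through a low ratio. Single-device sites always give a ratio of 1, so they say nothing about within-site variation.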