# Site and Device Detections Exploration

This notebook explores detection patterns across AudioMoth devices and sites.
We summarise detection counts at the device and site level, examine temporal patterns (week, month, hour of day), and compare device-level detections with aggregated site-level activity.

## This will cover:

- No. of detections by device, with overall % and site level %

- No. of detections by month combined

- No. of detections by month per device

- No. of detections by week combined

- No. of detections by week per device

- Overall daily detection patterns

- Daily detection patterns per device


## Set Up System Path And Load Data

In [None]:
import sys
import os
from pathlib import Path
import pandas as pd


# Go up one level to .../audiomoth
PROJECT_ROOT = Path(os.getcwd()).resolve().parent

# Add project root to sys.path so `src` is importable
sys.path.insert(0, str(PROJECT_ROOT))

PROCESSED_DATA_PATH = Path(PROJECT_ROOT) / "data_processed" / "analysis_df.parquet"
analysis_df = pd.read_parquet(PROCESSED_DATA_PATH)

# Make pandas show more columns/rows while exploring
pd.set_option("display.max_columns", 50)
pd.set_option("display.width", 120)

## Detections By Device

In [None]:
DEVICE_COL = "device"


device_detections_summary = (
    analysis_df.groupby(["device", "site"])
    .size()
    .rename("detections")
    .reset_index()
    .sort_values("site", ascending=True)
)


total = len(analysis_df)
device_detections_summary["%"] = (
    device_detections_summary["detections"] / total * 100
).round(2)

device_detections_summary["% within site"] = (
    device_detections_summary["detections"]
    / device_detections_summary.groupby("site")["detections"].transform("sum")
    * 100
).round(2)


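# Persist the clean 0..n-1 index (sort_values leaves it shuffled), then display
device_detections_summary = device_detections_summary.reset_index(drop=True)
device_detections_summary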

In [None]:
import src.data_store as data_store

data_store.save_dataframe_to_csv(
    device_detections_summary,
    Path(PROJECT_ROOT) / "outputs",
    "device_detections_summary",
)

## Detections By Month

### Overall

In [None]:
# monthly detection totals, combined across all sites and devices
monthly_det = analysis_df.groupby(["month"]).size().rename("detections").reset_index()


# active device-days per month (count unique device+date pairs)
monthly_active_device_days = (
    analysis_df[["month", "device", "date"]]
    .drop_duplicates()
    .groupby(["month"])
    .size()
    .rename("active_device_days")
    .reset_index()
)

monthly_summary = monthly_det.merge(
    monthly_active_device_days, on=["month"], how="left"
)

monthly_summary["detections_per_device_day"] = (
    monthly_summary["detections"] / monthly_summary["active_device_days"]
).round(2)

monthly_summary

The table above compares monthly detection totals with the cumulative number of active device-days within each month.

Normalising monthly detections by the cumulative number of active device-days reveals substantial differences in detection intensity between months, with April showing the highest per-device-day activity.
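To make the normalisation concrete, here is a minimal sketch on made-up data. The toy values and the helper name `detections_per_device_day` are illustrative assumptions; only the column names (`month`, `device`, `date`) mirror `analysis_df`. It also assumes each row is one detection, so a device-day with recordings but no detections would not be counted as active.

```python
import pandas as pd

# Toy detections table (made-up values); one row per detection.
toy = pd.DataFrame(
    {
        "month": [2, 2, 2, 3, 3, 3, 3],
        "device": ["A", "A", "B", "A", "A", "A", "A"],
        "date": pd.to_datetime(
            [
                "2025-02-01", "2025-02-01", "2025-02-02",
                "2025-03-01", "2025-03-01", "2025-03-01", "2025-03-01",
            ]
        ),
    }
)


def detections_per_device_day(df: pd.DataFrame) -> pd.DataFrame:
    """Monthly detections divided by the count of unique (device, date) pairs."""
    detections = df.groupby("month").size().rename("detections")
    device_days = (
        df[["month", "device", "date"]]
        .drop_duplicates()  # one row per device-day
        .groupby("month")
        .size()
        .rename("active_device_days")
    )
    out = pd.concat([detections, device_days], axis=1).reset_index()
    out["detections_per_device_day"] = (
        out["detections"] / out["active_device_days"]
    ).round(2)
    return out


toy_monthly = detections_per_device_day(toy)
print(toy_monthly)
```

In this toy data February has 3 detections over 2 device-days (1.5 per device-day), while March has 4 detections on a single device-day (4.0), so similar raw totals can hide quite different intensities.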


### Per Device

In [None]:
monthly_detections_per_device = (
    analysis_df.groupby(["site", "device", "month"])
    .size()
    .rename("detections")
    .reset_index()
    .sort_values(["site", "device", "month"])
    .reset_index(drop=True)
)

# active device-days per device per month (count unique dates for each device)
monthly_active_device_days = (
    analysis_df[["device", "month", "date"]]
    .drop_duplicates()
    .groupby(["device", "month"])
    .size()
    .rename("active_device_days")
    .reset_index()
)

monthly_detections_per_device = monthly_detections_per_device.merge(
    monthly_active_device_days, on=["device", "month"], how="left"
)
monthly_detections_per_device["detections_per_device_day"] = (
    monthly_detections_per_device["detections"]
    / monthly_detections_per_device["active_device_days"]
).round(2)

monthly_detections_per_device

When adjusted for daily recording effort, detection intensity increases markedly from February to March across most devices, indicating a strong seasonal signal. April shows particularly high per-device-day activity at several sites, although these values are based on fewer active days and may reflect concentrated peak activity periods. Device-level differences persist within sites, suggesting that local habitat characteristics or recorder placement influence acoustic intensity.
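The caveat about rates that rest on few active days can be made explicit by flagging low-effort rows before interpreting them. The sketch below is one possible approach, not part of the original analysis: the threshold `MIN_ACTIVE_DAYS`, the helper `flag_low_effort`, and the toy values are all hypothetical.

```python
import pandas as pd

# Hypothetical cut-off: rows with fewer active days than this are flagged.
MIN_ACTIVE_DAYS = 7


def flag_low_effort(
    summary: pd.DataFrame, min_days: int = MIN_ACTIVE_DAYS
) -> pd.DataFrame:
    """Return a copy with a boolean `low_effort` column for sparse device-months."""
    out = summary.copy()
    out["low_effort"] = out["active_device_days"] < min_days
    return out


# Toy per-device monthly summary (made-up values, same columns as
# monthly_detections_per_device).
toy_summary = pd.DataFrame(
    {
        "site": ["Site 1", "Site 1", "Site 1"],
        "device": ["A", "A", "A"],
        "month": [2, 3, 4],
        "detections": [300, 900, 250],
        "active_device_days": [20, 30, 5],
    }
)
toy_summary["detections_per_device_day"] = (
    toy_summary["detections"] / toy_summary["active_device_days"]
).round(2)

flagged = flag_low_effort(toy_summary)
print(flagged)
```

Here the month-4 row has the highest rate but only 5 active days, so it is the only row flagged. Applied to `monthly_detections_per_device`, the same flag would mark which April values deserve extra caution.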

### Key takeaways:

- Seasonal increase from February → March.

- April shows strong intensification at certain sites.

- Device-level variation persists even after effort correction.

- Helman Tor shows consistently lower intensity.

- Higher Trevilmick and Lowertown show particularly high spring intensity.

## Detections By Week

### Overall

In [None]:
weekly_summary = (
    analysis_df.groupby("week")
    .agg(detections=("device", "size"), active_devices=("device", "nunique"))
    .reset_index()
)

weekly_summary["detections_per_device"] = (
    weekly_summary["detections"] / weekly_summary["active_devices"]
).round(0)


weekly_summary

In [None]:
# Check daily detection counts within a specific week, for comparison
# with known battery loss/change dates.

analysis_df.loc[analysis_df["week"] == 15, "date"].value_counts().sort_index()

### Save

In [None]:
data_store.save_dataframe_to_csv(
    weekly_summary,
    Path(PROJECT_ROOT) / "outputs",
    "overall_detections_weekly_summary",
)

Weekly detection totals varied substantially across the study period. However, when normalised by the number of active recording devices, detections per device increased clearly from late February into late March, peaking in weeks 13–14. This pattern is consistent with increased vocal activity during the spring breeding period. Weeks 10, 11, and 15 probably represent partial recording periods with reduced device availability, most likely caused by battery drop-outs.
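One way to check which weeks are partial is to count the distinct dates and devices that contribute detections in each week. The sketch below uses made-up data; the helper `weekly_recording_coverage` and the 7-day criterion are assumptions for illustration, and a date with recordings but no detections would not be counted.

```python
import pandas as pd


def weekly_recording_coverage(df: pd.DataFrame) -> pd.DataFrame:
    """Per week: distinct dates and distinct devices with at least one detection."""
    out = (
        df.groupby("week")
        .agg(
            active_dates=("date", "nunique"),
            active_devices=("device", "nunique"),
        )
        .reset_index()
    )
    # Assumption: a week with fewer than 7 active dates is treated as partial.
    out["partial_week"] = out["active_dates"] < 7
    return out


# Toy data: week 13 is fully covered by two devices, week 15 has two days
# from a single device (made-up values).
full_week = pd.date_range("2025-03-24", periods=7, freq="D")
partial_week = pd.date_range("2025-04-07", periods=2, freq="D")

toy = pd.DataFrame(
    {
        "week": [13] * 14 + [15] * 2,
        "device": ["A", "B"] * 7 + ["A", "A"],
        "date": list(full_week.repeat(2)) + list(partial_week),
    }
)

coverage = weekly_recording_coverage(toy)
print(coverage)
```

Run against `analysis_df`, a table like this would show directly whether weeks 10, 11, and 15 have fewer active dates or devices than the surrounding weeks.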

### Per Device

In [None]:
weekly_detections_per_device = (
    analysis_df.groupby(["site", "device", "week"])
    .size()
    .rename("detections")
    .reset_index()
    .sort_values(["site", "device", "week"])
    .reset_index(drop=True)
)

# Add relative detections per device: each week divided by that device's peak week (peak = 1.0)
weekly_detections_per_device["detections_relative"] = (
    weekly_detections_per_device.groupby(
        "device"
    )["detections"].transform(lambda x: x / x.max())
)

weekly_detections_per_device

### Save

In [None]:
data_store.save_dataframe_to_csv(
    weekly_detections_per_device,
    Path(PROJECT_ROOT) / "outputs",
    "device_detections_weekly_summary",
)

### Improved Visual Format

In [None]:
# Pivot to wide format (devices as rows, weeks as columns); device-weeks with no detections show as 0.
weekly_detections_wide = weekly_detections_per_device.pivot_table(
    index="device", columns="week", values="detections", fill_value=0
)

weekly_detections_wide

## Weekly Site Detections Per Active Device Plot

In [None]:
# Add columns for total detections and number of active devices per site-week
weekly_site_effort = (
    weekly_detections_per_device.groupby(["site", "week"])
    .agg(
        total_detections=("detections", "sum"),
        active_devices=("device", "nunique"),
    )
    .reset_index()
)

# Add a detection column that takes into account the number of active devices.
weekly_site_effort["detections_per_device"] = (
    weekly_site_effort["total_detections"] / weekly_site_effort["active_devices"]
)

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))

for site, g in weekly_site_effort.groupby("site"):
    plt.plot(g["week"], g["detections_per_device"], label=site)

    # label at end of line
    plt.text(
        g["week"].iloc[-1],
        g["detections_per_device"].iloc[-1],
        site,
        fontsize=9,
        va="center",
        ha="left",
        rotation=350,
    )

plt.xlabel("Week")
plt.ylabel("Detections per active device")
plt.title("Weekly detections per active device (effort-adjusted)")
plt.tight_layout()
plt.show()

Weekly detection patterns vary between sites but show broadly similar temporal trends when adjusted for the number of active devices. Normalising detections by the number of devices recording each week reduces bias introduced by differences in deployment size and periods of device downtime, allowing more meaningful comparison between sites.

Despite this adjustment, variation in detection levels persists, reflecting differences in local habitat, calling intensity, and temporal availability of recording effort. It should be noted that this approach accounts for whether devices were active during a given week, but does not capture partial-week downtime or variation in daily recording effort, which may still influence weekly detection rates.
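A finer-grained alternative, sketched here under assumptions, is to normalise each site-week by active device-days rather than by active devices, which partly accounts for partial-week downtime. The helper name and toy values are hypothetical, and "active" still means "at least one detection that day", so silent recording days remain invisible.

```python
import pandas as pd


def weekly_site_detections_per_device_day(df: pd.DataFrame) -> pd.DataFrame:
    """Site-week detections divided by unique (device, date) pairs in that site-week."""
    detections = df.groupby(["site", "week"]).size().rename("total_detections")
    device_days = (
        df[["site", "week", "device", "date"]]
        .drop_duplicates()  # one row per device-day
        .groupby(["site", "week"])
        .size()
        .rename("active_device_days")
    )
    out = pd.concat([detections, device_days], axis=1).reset_index()
    out["detections_per_device_day"] = (
        out["total_detections"] / out["active_device_days"]
    )
    return out


# Toy data (made-up): at Site X, device A is active on two days and
# device B on one; at Site Y, device C is active on a single day.
toy = pd.DataFrame(
    {
        "site": ["Site X"] * 6 + ["Site Y"] * 5,
        "week": [12] * 11,
        "device": ["A", "A", "A", "A", "B", "B"] + ["C"] * 5,
        "date": pd.to_datetime(
            ["2025-03-17", "2025-03-17", "2025-03-18", "2025-03-18",
             "2025-03-17", "2025-03-17"]
            + ["2025-03-19"] * 5
        ),
    }
)

site_week = weekly_site_detections_per_device_day(toy)
print(site_week)
```

In the toy data Site X has 6 detections over 3 device-days (2.0 per device-day), whereas dividing by its 2 active devices would give 3.0; the two normalisations diverge whenever devices are active for different numbers of days.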