# Spatial variation in species richness and acoustic activity

This notebook looks at the sample data provided by the wildlife trust. The aim is to explore how species richness and acoustic activity vary between sites.

# Set up system path

In [None]:
import sys
import os
from pathlib import Path
import pandas as pd


# Go up one level to .../audiomoth
PROJECT_ROOT = Path(os.getcwd()).resolve().parent

# Add project root to sys.path so `src` is importable
sys.path.insert(0, str(PROJECT_ROOT))

EXCEL_PATH = PROJECT_ROOT / "data_raw" / "audiomoth_sample.xlsx"

# Make pandas show more columns/rows while exploring
pd.set_option("display.max_columns", 50)
pd.set_option("display.width", 120)

## Basic normalisation
Standardise column names and parse timestamps if present.


In [None]:
import src.audio_moth_schema as audio_moth_schema
import src.normaliser as normaliser

# Get all the Excel sheets available in the AudioMoth sample file
sheets = normaliser.get_excel_sheets(EXCEL_PATH)

# Flatten all the sheets into a single DataFrame
sample_df = normaliser.flatten_data(sheets)


# Combine the separate date and time columns into a single "time" column
sample_df = normaliser.combine_date_and_time(
    sample_df, date_col="date", time_col="time", output_col="time"
)


# Validate and convert types according to AudioMoth schema
sample_df = audio_moth_schema.AudioMothSchema.validate(sample_df)

# sample_df.head()
sample_df.shape
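
The normalisation steps named above (standardising column names and parsing timestamps) can be sketched in plain pandas. This is an illustration only: `standardise_columns` and `combine_date_time_sketch` are hypothetical helpers, not the functions in `src.normaliser`, and the sketch assumes the date and time columns hold string-like values. The toy row is made up for the example.

```python
import pandas as pd


def standardise_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lowercase, underscore-separated column names."""
    out = df.copy()
    out.columns = (
        out.columns.str.strip().str.lower().str.replace(r"\s+", "_", regex=True)
    )
    return out


def combine_date_time_sketch(
    df: pd.DataFrame,
    date_col: str = "date",
    time_col: str = "time",
    output_col: str = "time",
) -> pd.DataFrame:
    """Return a copy where output_col holds the parsed date + time.

    Assumption: both columns can be rendered as strings that
    pd.to_datetime understands. Unparseable rows become NaT.
    """
    out = df.copy()
    combined = out[date_col].astype(str) + " " + out[time_col].astype(str)
    out[output_col] = pd.to_datetime(combined, errors="coerce")
    return out


# Toy row for illustration only (not the trust's data)
toy_raw = pd.DataFrame(
    {"Date": ["2024-05-01"], " Time": ["05:30:00"], "Common Name": ["x"]}
)
toy_out = combine_date_time_sketch(standardise_columns(toy_raw))
print(toy_out.dtypes)
```

Coercing unparseable values to `NaT` keeps the sketch from failing on a single bad row; a stricter pipeline might prefer to raise instead.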

## Species richness compared with activity per site


In [None]:
summary = (
    sample_df.groupby("site")
    .agg(
        habitat=("habitat", "first"),
        species_richness=("scientific_name", "nunique"),
        detections=("scientific_name", "size"),
    )
    .sort_values("species_richness", ascending=False)
)

summary

# Summary
Red Moor has high species richness relative to its number of detections,  
whereas Lowertown and Breney Common have low species richness relative to their detections.  
Creney Farm has the highest species richness.  
Dense scrub appears to be associated with high numbers of detections,  
whereas lowland deciduous habitat appears to have lower species richness.  
More data is required to support these conclusions.
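
One way to make the comparison of richness against detections explicit is to compute the ratio per site. The sketch below is a minimal example, assuming the same `site` and `scientific_name` columns used in the summary table; `richness_per_detection` is a hypothetical helper and the toy rows are placeholders, not the trust's data.

```python
import pandas as pd


def richness_per_detection(df: pd.DataFrame) -> pd.DataFrame:
    """Per-site richness, detection count, and richness per detection."""
    out = df.groupby("site").agg(
        species_richness=("scientific_name", "nunique"),
        detections=("scientific_name", "size"),
    )
    out["richness_per_detection"] = out["species_richness"] / out["detections"]
    return out.sort_values("richness_per_detection", ascending=False)


# Toy rows for illustration only (not the trust's data)
toy_df = pd.DataFrame(
    {
        "site": ["A", "A", "A", "A", "B", "B"],
        "scientific_name": ["sp1", "sp1", "sp1", "sp2", "sp1", "sp2"],
    }
)
toy_ratio = richness_per_detection(toy_df)
print(toy_ratio)
```

In the notebook this could be called as `richness_per_detection(sample_df)`. The ratio tends to fall as detections accumulate, so it is only a rough guide when recording effort differs between sites.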

In [None]:
top_species_by_site = (
    sample_df.groupby(["site", "common_name"])
    .size()
    .reset_index(name="detections")
    .sort_values(["site", "detections"], ascending=[True, False])
)

top_species_by_site.groupby("site").head(10)

Differences in acoustic activity between sites were driven largely by a small number of highly vocal species, particularly Common Chiffchaff in dense scrub habitats, whereas species-rich sites showed more even distributions of detections across species.
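
The observation about dominance and evenness can be quantified. The sketch below computes, per site, the share of detections from the most-detected species and Pielou's evenness (Shannon entropy divided by the log of the number of species, where 1 means perfectly even). It assumes the `site` and `common_name` columns used above; `pielou_evenness` and `dominance_summary` are hypothetical helpers, and the toy counts are made up for the example.

```python
import numpy as np
import pandas as pd


def pielou_evenness(counts: pd.Series) -> float:
    """Pielou's J = H / ln(S) for a Series of per-species counts."""
    counts = counts[counts > 0]
    n_species = len(counts)
    if n_species < 2:
        # Evenness is undefined when only one species is present
        return float("nan")
    p = counts / counts.sum()
    shannon = -(p * np.log(p)).sum()
    return float(shannon / np.log(n_species))


def dominance_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-site evenness and share of detections from the top species."""
    counts = df.groupby(["site", "common_name"]).size()
    by_site = counts.groupby(level="site")
    return pd.DataFrame(
        {
            "evenness": by_site.apply(pielou_evenness),
            "top_species_share": by_site.max() / by_site.sum(),
        }
    )


# Toy counts for illustration only: site A is dominated by one species,
# site B is split evenly between two.
toy_df = pd.DataFrame(
    {
        "site": ["A"] * 10 + ["B"] * 10,
        "common_name": ["x"] * 9 + ["y"] + ["x"] * 5 + ["y"] * 5,
    }
)
toy_summary = dominance_summary(toy_df)
print(toy_summary)
```

Applied as `dominance_summary(sample_df)`, a site driven by a few highly vocal species would show a high `top_species_share` and a low `evenness`.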