This model uses a banding/recovery dataset from the paper, Disentangling data discrepancies with integrated population models (Saunders, S. P., et al. 2019), which analyses American Woodcock populations, to estimate population abundance. Abundance estimates are valuable for management and conservation because they tell managers directly how large a population is. Abundance is also one of the most intuitive metrics for policymakers and wildlife managers to use.

Inputs (x variables): 
-   Temporal data
    -   Banding year (B Year), recovery year (R Year)
    -   Banding/recovery month/day (B Month, R Month, etc.)
    -   Hunting season indicator (Hunt. Season)
-   Spatial data
    -   Banding/recovery flyway (B Flyway, R Flyway)
    -   Banding/recovery region/state (B Region, R Region)
    -   Latitude/longitude (B Lat, B Long, R Lat, R Long)
    -   Coordinate precision (accuracy descriptors)
-   Biological and demographic information
    -   Species (Species Game Birds::SPEC)
    -   Age (Age, Age::VAGE)
    -   Sex (Sex, Sex::VSEX)
    -   Condition (Condition::VCondition, Condition::VBandStatus)
-   Banding and recovery metadata
    -   Band type (original/current)
    -   Status (Status::VStatus)
    -   How obtained (How Obt::VHow)
    -   Who reported (Who::VWho)
    -   Permittee (Permits::Permittee)
    -   Report method (Rept Mthd::VRept Mthd)

These features capture when, where, and how birds were banded/recovered, plus their age/sex/condition.

Output (y variable):
-   Population abundance estimate
    -   Derived from banding/recovery counts aggregated by year, region, or flyway.
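
The aggregation step mentioned above could be sketched as follows. This is a minimal illustration on invented toy records, not the method used in the paper: the `recovered` flag and the output column names (`n_banded`, `n_recovered`, `recovery_rate`) are hypothetical, and the real dataset may encode recoveries differently.

```python
import pandas as pd

# Toy records, invented purely for illustration (not real woodcock data).
records = pd.DataFrame({
    "B Year": [2001, 2001, 2001, 2002, 2002],
    "B Flyway": ["Atlantic", "Atlantic", "Mississippi", "Atlantic", "Mississippi"],
    # Hypothetical flag: 1 if the band was later recovered, 0 otherwise.
    "recovered": [1, 0, 1, 0, 0],
})

# Count banded and recovered birds for each year/flyway combination.
counts = (
    records.groupby(["B Year", "B Flyway"])
    .agg(n_banded=("recovered", "size"), n_recovered=("recovered", "sum"))
    .reset_index()
)

# Share of banded birds that were recovered in each group.
counts["recovery_rate"] = counts["n_recovered"] / counts["n_banded"]

print(counts)
```

These counts are raw inputs rather than an abundance estimate in their own right; how they are converted into the target column is left to the analyst.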



 

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
import numpy as np
import matplotlib.pyplot as plt

# Load your dataset
df = pd.read_csv("woodcock_band_data.csv")

# Example feature set
features = [
    "B Year", "B Month", "B Flyway", "B Region",
    "R Year", "R Month", "R Flyway", "R Region",
    "Age", "Sex", "Condition", "Status"
]

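# Assumption: categorical columns (e.g. flyway, region, sex) may be stored as text.
# Random forests need numeric input, so map each category to an integer code.
X = df[features].copy()
for col in X.select_dtypes(exclude="number").columns:
    X[col] = X[col].astype("category").cat.codes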
y = df["Population_Estimate"]  # <-- target variable you define

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluation
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print(f"R² Score: {r2:.3f}")
print(f"MAE: {mae:.3f}")
print(f"RMSE: {rmse:.3f}")

# Feature importance visualization
importances = model.feature_importances_
indices = np.argsort(importances)[::-1]

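plt.figure(figsize=(10, 6))
plt.bar(range(len(features)), importances[indices], align="center")
plt.xticks(range(len(features)), [features[i] for i in indices], rotation=45, ha="right")
plt.ylabel("Importance (mean decrease in impurity)")
plt.title("Feature Importance")
plt.tight_layout()  # keep rotated tick labels inside the figure
plt.show()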
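
To make the three evaluation metrics printed above easier to interpret, the sketch below recomputes them by hand with NumPy. The numbers are invented for illustration, and the names `y_true` and `y_hat` are chosen here so they do not overwrite the notebook's own variables.

```python
import numpy as np

# Toy values, invented purely for illustration (not model output).
y_true = np.array([10.0, 12.0, 14.0, 16.0])
y_hat = np.array([11.0, 12.0, 13.0, 18.0])

residuals = y_true - y_hat

# MAE: average absolute error, in the same units as the target.
mae_manual = np.mean(np.abs(residuals))

# RMSE: square root of the mean squared error; penalises large errors more.
rmse_manual = np.sqrt(np.mean(residuals ** 2))

# R²: 1 minus the ratio of residual to total sum of squares.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"MAE: {mae_manual:.3f}, RMSE: {rmse_manual:.3f}, R²: {r2_manual:.3f}")
```

The same formulas underlie `mean_absolute_error`, `mean_squared_error`, and `r2_score` in scikit-learn, so the two approaches should agree on the same inputs.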
