# 00 – Environment Setup

This notebook sets up the Python environment for the **Cardiff AI Talk Runbook**.

It installs (if needed) and imports the core libraries used across the notebooks:

- Data handling: `pandas`, `numpy`
- Visualization: `matplotlib`
- Geospatial (optional for real wildfire data): `geopandas`, `rasterio`
- Machine learning: `scikit-learn`

For the talk, you can run only the import cell and skip library installation if you're in a prepared environment.

In [1]:
# If running locally, you can uncomment the following lines to install packages.
# In many managed environments (e.g. university labs, Colab), these may already be available.
# !pip install numpy pandas matplotlib scikit-learn geopandas rasterio shap shapely

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)

print("Environment ready.")

Environment ready.


# What the code does (quickly)

The (commented) pip install line lets students install all required packages if they’re on a clean machine.

Then we import the main libraries used across the notebooks:

- `numpy` and `pandas` for data handling
- `matplotlib.pyplot` for plotting
- `sklearn.model_selection.train_test_split` for splitting data into training and test sets
- `sklearn.metrics` functions for evaluating model performance

Finally, it prints "Environment ready." so you know the imports ran successfully.
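Before deciding whether to uncomment the install line, it can help to see which packages are already present. The following is a minimal sketch, not part of the original cell: the `PACKAGES` list and the `find_missing` helper are hypothetical names introduced here, and the list simply mirrors the packages on the commented `pip install` line (using import names, so `scikit-learn` becomes `sklearn`).

```python
import importlib.util

# Import names for the packages on the commented install line.
# Assumption: scikit-learn is checked under its import name, "sklearn".
PACKAGES = [
    "numpy", "pandas", "matplotlib", "sklearn",
    "geopandas", "rasterio", "shap", "shapely",
]


def find_missing(names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing(PACKAGES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages found.")
```

If only the optional geospatial or explainability packages are reported as missing, the import cell above should still run, since it does not import them.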

# Libraries and their benefits

## NumPy (`numpy`)
Provides fast operations on arrays and matrices. ML models rely heavily on numeric computation, and NumPy makes it efficient and easy to write.
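As a small illustration of array operations, the sketch below converts a handful of made-up temperature readings in one expression. The values and variable names are hypothetical and chosen only for the example.

```python
import numpy as np

# Hypothetical temperature readings in degrees Celsius.
temps_c = np.array([18.5, 21.0, 25.5, 30.0])

# One vectorised expression converts every element; no Python loop is needed.
temps_f = temps_c * 9 / 5 + 32

print(temps_f)
print(temps_c.mean())
```

The same arithmetic written as a Python `for` loop would give the same numbers, but the array form is shorter and is typically much faster on large inputs.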

## Pandas (`pandas`)
Built on top of NumPy, pandas adds the DataFrame, a table-like structure similar to a spreadsheet. It is well suited to loading CSV files, cleaning data, selecting columns, filtering rows, and computing quick summary statistics.
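The operations mentioned above can be sketched on a toy table. The column names and values here are hypothetical; a real notebook would more likely start from `pd.read_csv` on an actual file.

```python
import pandas as pd

# Hypothetical toy table; column names are illustrative only.
df = pd.DataFrame({
    "region": ["north", "south", "north", "west"],
    "temperature_c": [31.0, 24.5, 35.5, 28.0],
    "fire": [1, 0, 1, 0],
})

hot = df[df["temperature_c"] > 30]             # filter rows
subset = hot[["region", "temperature_c"]]      # select columns
mean_by_region = df.groupby("region")["temperature_c"].mean()  # quick statistics

print(subset)
print(mean_by_region)
```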

## Matplotlib (`matplotlib.pyplot`)
A plotting library for creating charts: line plots, histograms, scatter plots, etc. It helps you visually explore data and inspect model results (e.g., distributions, relationships, ROC curves).
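A minimal sketch of one such chart, a histogram, is shown below. The data is synthetic (drawn from a normal distribution with an arbitrary mean and spread), and the axis labels are placeholders rather than anything taken from the talk's datasets.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic values for illustration only.
rng = np.random.default_rng(0)
values = rng.normal(loc=25, scale=5, size=200)

fig, ax = plt.subplots(figsize=(5, 3))
ax.hist(values, bins=20)
ax.set_xlabel("Temperature (degrees C)")
ax.set_ylabel("Count")
ax.set_title("Distribution of synthetic temperatures")
fig.tight_layout()
```

In a notebook the figure renders when the cell finishes; in a plain script, add `plt.show()` at the end.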

## Scikit-learn – Model Selection (`sklearn.model_selection.train_test_split`)
Utility that splits a dataset into training and test sets. Holding out a test set is essential for estimating how well a model generalizes to unseen data and for detecting overfitting; the split itself does not prevent it.
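A typical call might look like the sketch below. The data is synthetic, and the 80/20 ratio, the `stratify` argument, and the seed value are illustrative choices rather than settings taken from the other notebooks.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 samples, 3 features, balanced binary labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = np.array([0, 1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,     # hold out 20% of the rows for testing
    stratify=y,        # keep the class ratio the same in both splits
    random_state=42,   # make the split reproducible
)

print(X_train.shape, X_test.shape)
```

Fixing `random_state` means everyone following along gets the same split, which makes results easier to compare during a live session.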

## Scikit-learn – Metrics (`sklearn.metrics`)
Provides standard evaluation metrics:

- Classification: `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`
- Regression: `mean_absolute_error`, `mean_squared_error`, `r2_score`

These metrics let you compare different models and decide which performs best on your wildfire and mental-health tasks.
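To make the two groups concrete, here is a small sketch that calls each imported metric on hand-written labels and predictions. All of the numbers are made up for illustration; none come from the wildfire or mental-health data.

```python
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score,
    mean_absolute_error, mean_squared_error, r2_score,
)

# Classification: hypothetical true labels, hard predictions, and scores.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
y_score = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2]  # predicted probability of class 1

results = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    # ROC AUC is computed from scores or probabilities, not hard labels.
    "roc_auc": roc_auc_score(y_true, y_score),
}

# Regression: hypothetical true and predicted values.
y_true_reg = [3.0, 5.0, 2.5, 7.0]
y_pred_reg = [2.5, 5.0, 3.0, 8.0]

results["mae"] = mean_absolute_error(y_true_reg, y_pred_reg)
results["mse"] = mean_squared_error(y_true_reg, y_pred_reg)
results["r2"] = r2_score(y_true_reg, y_pred_reg)

for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Note that `roc_auc_score` receives `y_score` rather than `y_pred`: it measures how well the scores rank positives above negatives, so passing hard 0/1 predictions would discard that information.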

(`geopandas`, `rasterio`, `shap`, and `shapely` appear only in the commented install line; they are useful later for geospatial data and explainability, but are not imported in this setup cell.)