# 01 — Download a Public Smoke Dataset

Choose one of the download options below: Roboflow, Kaggle, or DFS / D-Fire / Boreal.

## Setup & installs

In [None]:
# If needed, install dependencies here
# !pip install -q ultralytics roboflow kaggle opencv-python lxml PyYAML tqdm
import os, pathlib, yaml, json

## Option A) Roboflow (YOLO-ready)
- Get an API key: https://docs.roboflow.com/api-reference/authentication
- Set the workspace, project, and version of the dataset to download.

In [None]:
# Set your API key in the environment (restart kernel after setting if needed)
# os.environ['ROBOFLOW_API_KEY'] = 'YOUR_KEY'

# Then download using the helper script in src/
# Example for Wildfire Smoke dataset (workspace='public-datasets', project='wildfire-smoke', version=1)
# !python src/download_roboflow.py --workspace public-datasets --project wildfire-smoke --version 1 --out data/datasets/roboflow_smoke
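
The helper script is not shown here, so the following is only a minimal sketch of what such a download could look like when done directly with the `roboflow` package. The function name `download_roboflow_dataset` and the default export format `"yolov8"` are assumptions for illustration; the environment variable, workspace, project, and version come from the cell above.

```python
import os


def download_roboflow_dataset(workspace, project, version, out_dir, fmt="yolov8"):
    """Download one dataset version with the `roboflow` package (sketch)."""
    api_key = os.environ.get("ROBOFLOW_API_KEY")
    if not api_key:
        raise RuntimeError("Set ROBOFLOW_API_KEY before downloading.")

    # Imported lazily so the key check still works if the package is missing.
    from roboflow import Roboflow

    rf = Roboflow(api_key=api_key)
    dataset = (
        rf.workspace(workspace)
        .project(project)
        .version(version)
        .download(fmt, location=out_dir)  # fmt is an assumed export format
    )
    return dataset.location


# Example call (needs network access and a valid key):
# download_roboflow_dataset("public-datasets", "wildfire-smoke", 1,
#                           "data/datasets/roboflow_smoke")
```

Checking the key before importing the package keeps the failure message clear when the environment variable was set after the kernel started.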

## Option B) Kaggle (YOLO datasets)
- Place your API token at `~/.kaggle/kaggle.json`, or set the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables.
- Pick a dataset slug (examples: `ahemateja19bec1025/wildfiresmokedatasetyolo` or `sayedgamal99/smoke-fire-detection-yolo`).
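
Before running the download, it may help to confirm that one of the two credential sources listed above is actually present. The sketch below uses only the standard library; the function name `kaggle_credentials_found` and its `env`/`home` parameters are hypothetical and exist only to make the check easy to try out.

```python
import os
from pathlib import Path


def kaggle_credentials_found(env=None, home=None):
    """Return True if either credential source described above is present."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    # Both variables must be set for the environment route to count.
    has_env = bool(env.get("KAGGLE_USERNAME")) and bool(env.get("KAGGLE_KEY"))
    has_file = (home / ".kaggle" / "kaggle.json").is_file()
    return has_env or has_file


print("Kaggle credentials found:", kaggle_credentials_found())
```

This only checks that credentials exist, not that they are valid.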

In [None]:
# Example Kaggle download
# !python scripts/kaggle_download.py --dataset ahemateja19bec1025/wildfiresmokedatasetyolo --out data/datasets/kaggle_smoke

## Option C) DFS / D-Fire / Boreal
DFS uses Pascal VOC annotations: download the XML files and images, then convert the annotations to YOLO format with the converter below.

In [None]:
# Convert Pascal VOC -> YOLO .txt (adjust paths)
# !python src/convert_voc2yolo.py --xml_dir data/datasets/DFS/Annotations --out_dir data/datasets/DFS/labels --classes smoke,fire,other
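
To show what a VOC-to-YOLO conversion involves, here is a minimal sketch for a single annotation. It is not the code of `src/convert_voc2yolo.py`, which is not shown here; the function name `voc_to_yolo_lines` and the sample XML are made up for illustration, and the sketch assumes each annotation carries a `<size>` element with the image width and height.

```python
import xml.etree.ElementTree as ET


def voc_to_yolo_lines(xml_text, classes):
    """Convert one Pascal VOC annotation (XML string) to YOLO label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.findtext("width"))
    img_h = float(size.findtext("height"))

    lines = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name not in classes:
            continue  # skip labels outside the chosen class list
        box = obj.find("bndbox")
        xmin = float(box.findtext("xmin"))
        ymin = float(box.findtext("ymin"))
        xmax = float(box.findtext("xmax"))
        ymax = float(box.findtext("ymax"))
        # YOLO stores the box centre and size, normalised by image dimensions.
        x_c = (xmin + xmax) / 2 / img_w
        y_c = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{classes.index(name)} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines


# Made-up 200x100 image with one smoke box.
sample = """<annotation>
  <size><width>200</width><height>100</height></size>
  <object>
    <name>smoke</name>
    <bndbox><xmin>50</xmin><ymin>20</ymin><xmax>150</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>"""

print(voc_to_yolo_lines(sample, ["smoke", "fire", "other"]))
# → ['0 0.500000 0.400000 0.500000 0.400000']
```

The class index is the position of the name in the `--classes` list, so the same order must be used later in `data.yaml`.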

## Write `data.yaml`
Set your dataset root (`path`) and classes:

In [None]:
from pathlib import Path

import yaml

# Split paths are relative to the dataset root given in `path`.
data = {
    'path': 'data/datasets/roboflow_smoke',  # change to your dataset root
    'train': 'images/train',
    'val': 'images/valid',
    'test': 'images/test',  # optional
    'nc': 1,  # must equal len(names)
    'names': ['smoke'],  # class order must match the label files
}
Path('data').mkdir(exist_ok=True)
with open('data/data.yaml', 'w') as f:
    yaml.safe_dump(data, f, sort_keys=False)
print('Wrote data/data.yaml\n' + yaml.safe_dump(data, sort_keys=False))
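
A quick sanity check on the config can catch path or class-count mistakes before training. The sketch below is one possible check, assuming the split directories are relative to `path` as in the cell above; the function name `check_dataset_config` and the example dict are hypothetical.

```python
from pathlib import Path


def check_dataset_config(cfg):
    """Return a list of problems found in a data.yaml-style dict."""
    problems = []
    if cfg.get("nc") != len(cfg.get("names", [])):
        problems.append("nc does not match the number of class names")
    root = Path(cfg.get("path", "."))
    for split in ("train", "val", "test"):
        rel = cfg.get(split)
        if rel is None:
            continue  # split not configured (for example, no test set)
        if not (root / rel).is_dir():
            problems.append(f"{split} directory not found: {root / rel}")
    return problems


# Example with the same keys as the config written above.
example = {
    "path": "data/datasets/roboflow_smoke",
    "train": "images/train",
    "val": "images/valid",
    "nc": 1,
    "names": ["smoke"],
}
for problem in check_dataset_config(example):
    print("WARNING:", problem)
```

An empty result means the class count is consistent and every configured split directory exists; it says nothing about the label files themselves.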