
# 2.2 - From data requirements to data collection

**Data requirements (minimum schema)**
- `date` (YYYY-MM-DD)
- `temp_max`, `temp_min` (daily maximum/minimum temperature, °C)
- `precipitation_sum` (total daily precipitation, mm)
- `wind_speed_10m_max` (maximum daily wind speed at 10 m, km/h)
- `city`, `lat`, `lon` (city name and coordinates)
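
A minimal sketch of how the schema above could be checked after collection. The names `REQUIRED_FIELDS` and `validate_records` are hypothetical and not part of the notebook. The helper works on plain dictionaries, so a DataFrame would first be converted with `df.to_dict("records")`.

```python
from datetime import datetime
from typing import Dict, List

# Hypothetical constant: the minimum schema listed above.
REQUIRED_FIELDS = (
    "date", "temp_max", "temp_min",
    "precipitation_sum", "wind_speed_10m_max",
    "city", "lat", "lon",
)

def validate_records(records: List[Dict]) -> List[str]:
    """Return a list of problems; an empty list means every row matches the schema."""
    problems = []
    for i, row in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if f not in row]
        if missing:
            problems.append(f"row {i}: missing fields {missing}")
            continue
        # The date must parse with the %Y-%m-%d format.
        try:
            datetime.strptime(str(row["date"]), "%Y-%m-%d")
        except ValueError:
            problems.append(f"row {i}: date {row['date']!r} is not YYYY-MM-DD")
        # Sanity check: the daily maximum should not be below the daily minimum.
        t_max, t_min = row["temp_max"], row["temp_min"]
        if t_max is not None and t_min is not None and t_max < t_min:
            problems.append(f"row {i}: temp_max is below temp_min")
    return problems
```

In the notebook this could be called as `validate_records(df_raw.to_dict("records"))` once `df_raw` exists. Returning a list of messages rather than raising lets one run report every problem at once.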

After each raw data collection, the team saves the result under `data/raw/` so that each run can be traced later.


In [24]:
# Configurable parameters
CITY = "London"  # City name
DAYS = 7          # Forecast horizon in days
RAW_DIR = "../data/raw"  # Directory for raw data

# Derive a filesystem-safe city_slug and the per-city RAW_CITY_DIR
import re, pathlib
city_slug = re.sub(r"[^a-z0-9]+", "-", CITY.lower()).strip("-")
RAW_CITY_DIR = f"{RAW_DIR}/{city_slug}"
pathlib.Path(RAW_CITY_DIR).mkdir(parents=True, exist_ok=True)

print("City:", CITY, "| Horizon (days):", DAYS)
print("Raw city dir:", RAW_CITY_DIR)

City: London | Horizon (days): 7
Raw city dir: ../data/raw/london
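
The slug rule above keeps only `a-z` and `0-9`, so accented letters are treated as separators, which mangles city names such as "Hồ Chí Minh". The sketch below shows one possible way to strip diacritics before applying the same rule. The function name `slugify_city` is hypothetical. Letters with no Unicode decomposition, such as "đ", are simply dropped by this approach.

```python
import re
import unicodedata

def slugify_city(city: str) -> str:
    """Hypothetical helper: strip diacritics, then apply the notebook's slug rule."""
    # NFKD splits an accented letter into a base letter plus combining marks;
    # encoding to ASCII with errors="ignore" then discards the marks.
    ascii_name = (
        unicodedata.normalize("NFKD", city)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")

print(slugify_city("London"))       # london
print(slugify_city("Hồ Chí Minh"))  # ho-chi-minh
```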


In [25]:
# --- Persist config for other notebooks ---
import json, os
CONFIG_PATH = "../.lab2_config.json"

cfg = {
    "CITY": CITY,
    "city_slug": city_slug,
    "RAW_DIR": RAW_DIR,
    "RAW_CITY_DIR": RAW_CITY_DIR,
    "DAYS": DAYS
}
with open(CONFIG_PATH, "w", encoding="utf-8") as f:
    json.dump(cfg, f, ensure_ascii=False, indent=2)

print("Saved active config ->", os.path.abspath(CONFIG_PATH))


Saved active config -> c:\Users\Letrongvangggg\Desktop\Lab2_ADY201m\.lab2_config.json
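
As a sketch of how another notebook might read this file back, the loader below parses the JSON and fails early if a key is missing. The function name `load_lab2_config` and the constant `EXPECTED_KEYS` are hypothetical. The default path assumes the other notebook sits in the same folder as this one.

```python
import json
from pathlib import Path

# Keys written by the cell above.
EXPECTED_KEYS = {"CITY", "city_slug", "RAW_DIR", "RAW_CITY_DIR", "DAYS"}

def load_lab2_config(path: str = "../.lab2_config.json") -> dict:
    """Hypothetical loader: read the saved config and check that all keys are present."""
    cfg = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = EXPECTED_KEYS - set(cfg)
    if missing:
        raise KeyError(f"config is missing keys: {sorted(missing)}")
    return cfg
```

A downstream notebook could then start with `cfg = load_lab2_config()` and use `cfg["RAW_CITY_DIR"]` instead of redefining the parameters.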


In [26]:
import pandas as pd
import requests

# RAW_CITY_DIR was already created in a previous cell

def fetch_open_meteo(city: str, days: int) -> pd.DataFrame:
    """Geocode `city`, then fetch its daily forecast for `days` days from Open-Meteo."""
    try:
        # Step 1: resolve the city name to coordinates
        geo_resp = requests.get(
            "https://geocoding-api.open-meteo.com/v1/search",
            params={"name": city, "count": 1, "language": "en", "format": "json"},
            timeout=10,
        )
        geo_resp.raise_for_status()  # surface HTTP errors before parsing the body
        geo = geo_resp.json()
        if not geo.get("results"):
            raise RuntimeError("City not found in geocoding API")
        lat, lon = geo["results"][0]["latitude"], geo["results"][0]["longitude"]

        # Step 2: request the daily variables listed in the data requirements
        fc_resp = requests.get(
            "https://api.open-meteo.com/v1/forecast",
            params={
                "latitude": lat,
                "longitude": lon,
                "daily": [
                    "temperature_2m_max", "temperature_2m_min",
                    "precipitation_sum", "wind_speed_10m_max",
                ],
                "forecast_days": days,
                "timezone": "auto",
            },
            timeout=10,
        )
        fc_resp.raise_for_status()
        daily = fc_resp.json()["daily"]

        # Step 3: rename the API fields to the schema column names
        df = pd.DataFrame({
            "date": daily["time"],
            "temp_max": daily["temperature_2m_max"],
            "temp_min": daily["temperature_2m_min"],
            "precipitation_sum": daily["precipitation_sum"],
            "wind_speed_10m_max": daily["wind_speed_10m_max"],
        })
        df["city"], df["lat"], df["lon"], df["source"] = city, lat, lon, "open-meteo"
        return df
    except Exception as e:
        # Re-raise rather than calling sys.exit(1), so the notebook shows the full traceback
        print("Error fetching data:", e)
        raise

# Fetch
df_raw = fetch_open_meteo(CITY, DAYS)
df_raw.head()

| | date | temp_max | temp_min | precipitation_sum | wind_speed_10m_max | city | lat | lon | source |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-09-20 | 20.5 | 13.7 | 0.3 | 25.9 | London | 51.50853 | -0.12574 | open-meteo |
| 1 | 2025-09-21 | 15.7 | 10.3 | 0.0 | 21.6 | London | 51.50853 | -0.12574 | open-meteo |
| 2 | 2025-09-22 | 16.9 | 9.3 | 0.0 | 17.9 | London | 51.50853 | -0.12574 | open-meteo |
| 3 | 2025-09-23 | 15.5 | 7.3 | 0.9 | 13.9 | London | 51.50853 | -0.12574 | open-meteo |
| 4 | 2025-09-24 | 17.5 | 9.8 | 0.0 | 13.4 | London | 51.50853 | -0.12574 | open-meteo |


In [27]:
from datetime import datetime
from pathlib import Path

Path(RAW_CITY_DIR).mkdir(parents=True, exist_ok=True)

dt = datetime.now().strftime("%Y%m%d_%H%M")
out_path = f"{RAW_CITY_DIR}/forecast_{city_slug}_{DAYS}d_{dt}.csv"  

df_raw.to_csv(out_path, index=False)
print("Saved snapshot ->", out_path)

Saved snapshot -> ../data/raw/london/forecast_london_7d_20250921_0312.csv



**Notes.**
- Always include a timestamp in the file name so that each snapshot can be located later.
- A fallback for runs without an internet connection will be written in the future.
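
One possible shape for such a fallback, sketched under the assumption that snapshots keep the `forecast_<slug>_<days>d_<YYYYMMDD>_<HHMM>.csv` naming used above: pick the newest snapshot from the file-name timestamp and load it when the fetch fails. The helper name `latest_snapshot` is hypothetical and is not the team's planned implementation.

```python
from pathlib import Path
from typing import List, Optional

def _timestamp_key(path: Path) -> List[str]:
    # The last two underscore-separated parts of the stem are the date and time,
    # e.g. ["20250921", "0312"]; in this format they sort chronologically as text.
    return path.stem.rsplit("_", 2)[-2:]

def latest_snapshot(raw_city_dir: str) -> Optional[Path]:
    """Hypothetical helper: return the newest forecast_*.csv in the directory, or None."""
    files = sorted(Path(raw_city_dir).glob("forecast_*.csv"), key=_timestamp_key)
    return files[-1] if files else None
```

If the fetch fails, the notebook could call `latest_snapshot(RAW_CITY_DIR)` and, when a path is returned, read it with `pd.read_csv`. Sorting on the name rather than the file modification time keeps the result stable when files are copied between machines.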
