# LSTMSeq2SeqAttn 60d — Ablation A

### Purpose: restore cumulative rainfall (v1) while keeping dropout (v2) → test whether the v2 rainfall differencing caused the performance drop

| Changed item | v1 (original) | v2 | This experiment (A) |
|----------|:---------:|:--:|:---------:|
| Rainfall | Cumulative | Differenced | **Cumulative (reverted to v1)** |
| Decoder Dropout | None | 0.2 | 0.2 (kept from v2) |
| LayerNorm order | LN→Drop→FC | Drop→LN→FC | Drop→LN→FC (v2) |
| EarlyStopping patience | 5 | 10 | 10 |
| Multi-run (N=3) | ✗ | ✓ | ✓ |
| pin_memory | ✗ | ✓ | ✓ |
| Evaluation metrics | MAPE only | + SMAPE, macro MAPE | + SMAPE, macro MAPE |
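The metric row above can be made concrete. This sketch reuses the masking convention of `calc_mape` / `calc_smape` from the evaluation cell and adds a macro average (the mean of per-group MAPEs, as `macro_mape` does over the four seasons). The two groups and their values are made up for illustration.

```python
import numpy as np

def calc_mape(y_true, y_pred):
    mask = y_true != 0  # skip zero actuals to avoid division by zero
    return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100

def calc_smape(y_true, y_pred):
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    mask = denom > 0
    return np.mean(np.abs(y_true[mask] - y_pred[mask]) / denom[mask]) * 100

# Two toy "seasons" of different sizes (made-up values)
groups = {
    'A': (np.array([100.0]), np.array([110.0])),                              # APE: 10%
    'B': (np.array([200.0, 200.0, 200.0]), np.array([200.0, 200.0, 100.0])),  # APE: 0, 0, 50%
}
all_true = np.concatenate([t for t, _ in groups.values()])
all_pred = np.concatenate([p for _, p in groups.values()])

micro = calc_mape(all_true, all_pred)                           # (10 + 0 + 0 + 50) / 4 = 15.0
macro = np.mean([calc_mape(t, p) for t, p in groups.values()])  # (10 + 16.67) / 2 = 13.33
smape_a = calc_smape(*groups['A'])                              # 10 / 105 * 100 = 9.52
print(f"micro MAPE={micro:.2f}%  macro MAPE={macro:.2f}%  SMAPE(A)={smape_a:.2f}%")
```

Group B has three times as many points but counts only once in the macro average, which is why the two MAPE values differ.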

0. Flow preprocessing (IQR, Savgol)
1. Weather preprocessing (**rainfall cumulative, reverted to v1**)
2. Flow & weather merge
3-5. Sliding window → split → normalization
6. Model (decoder dropout 0.2, kept from v2)
7-8. Training (patience=10, N=3) → evaluation

In [1]:
from pathlib import Path
import numpy as np
import pandas as pd
import random, os, copy

N_RUNS = 3
SEEDS = [42, 123, 7]

def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    import torch
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    os.environ['PYTHONHASHSEED'] = str(seed)  # only affects child processes; this interpreter's hash seed is fixed at startup

print(f"Multi-run: {N_RUNS} runs, seeds={SEEDS}")

Multi-run: 3 runs, seeds=[42, 123, 7]


### Data Split Structure (4-season block sampling, 60 days × 4)

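Each 60-day block is split chronologically into 70 / 15 / 15. Within a block, an 87-window gap (input 72 + output 15 steps) is left between Train/Val and between Val/Test so that no window shares raw rows with a window from the neighboring split (data-leak prevention).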

| Season | Train (~42 days) | Val (~9 days) | Test (~9 days) |
|------|--------------|-------------|--------------|
| Winter | 23/12/01 ~ 24/01/11 | 24/01/11 ~ 24/01/20 | 24/01/20 ~ 24/01/29 |
| Spring | 24/03/15 ~ 24/04/25 | 24/04/25 ~ 24/05/04 | 24/05/04 ~ 24/05/13 |
| Summer | 23/07/01 ~ 23/08/11 | 23/08/11 ~ 23/08/20 | 23/08/20 ~ 23/08/29 |
| Fall | 23/09/15 ~ 23/10/26 | 23/10/26 ~ 23/11/04 | 23/11/04 ~ 23/11/13 |

**Prediction setup:** 72-min input (10 features) → 15-min forecast (flow value) × 4 rolling = 1 hour

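**Model:** LSTM Seq2Seq + **Bahdanau Attention**, attending over the full encoder hidden sequence
- **Why the previous cross-attention failed:**
  - Query = h₇₂ (fixed) → every encoder step received the same weight (1/72) → attention had no effect
- **How Bahdanau attention fixes this:**
  - Query = **the decoder's hidden state at the current step** (different at every step)
  - Query at step 1 ≠ query at step 15 → each step can attend to different encoder positions
- **Difference from the failed Model B (autoregressive):**
  - Model B: feeds prev_pred back as input → error accumulation, requires Scheduled Sampling
  - **This model: step embedding as input → predictions are never fed back, so errors do not compound through the inputs and teacher forcing is unnecessary**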
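The argument above can be checked with a minimal NumPy sketch of additive (Bahdanau) scoring: score_i = vᵀ tanh(W_h q + W_s h_i), weights = softmax(score), context = Σ weights_i h_i. Random matrices stand in for the learned `attn_Wh`, `attn_Ws`, `attn_V`, and the sizes (T=72, H=8) are illustrative assumptions. With one fixed query every decoder step gets the same weight vector; step-dependent queries produce different weights.

```python
import numpy as np

def additive_attention(query, enc_outputs, W_h, W_s, v):
    """Bahdanau-style additive attention for a single example.

    query:       (H,)   decoder hidden state used as the query
    enc_outputs: (T, H) encoder hidden states (keys and values)
    Returns (weights of shape (T,), context of shape (H,)).
    """
    energy = np.tanh(query @ W_h.T + enc_outputs @ W_s.T)  # (T, H)
    scores = energy @ v                                    # (T,)
    scores = scores - scores.max()                         # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()        # softmax over T
    context = weights @ enc_outputs                        # (H,)
    return weights, context

rng = np.random.default_rng(0)
T, H, n_steps = 72, 8, 15
enc = rng.normal(size=(T, H))
W_h, W_s = rng.normal(size=(H, H)), rng.normal(size=(H, H))
v = rng.normal(size=H)

# Fixed query (the last encoder state reused at every decoder step)
fixed = np.stack([additive_attention(enc[-1], enc, W_h, W_s, v)[0] for _ in range(n_steps)])
# Step-dependent queries (stand-ins for the decoder hidden state at each step)
queries = rng.normal(size=(n_steps, H))
varying = np.stack([additive_attention(q, enc, W_h, W_s, v)[0] for q in queries])

print(np.allclose(fixed[0], fixed[-1]))      # identical weights at every step
print(np.allclose(varying[0], varying[-1]))  # weights change with the query
```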

**Loss:** Step-weighted MSE (weights rise linearly from 1.0 at step 1 to 2.0 at step 15)
**LR Scheduler:** ReduceLROnPlateau (factor=0.5, patience=3)
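The step weighting can be checked in isolation. This NumPy sketch mirrors `torch.linspace(1.0, 2.0, output_time)` and the `.mean()` reduction of `weighted_mse_loss` in the training cell (NumPy is used only so the snippet runs without PyTorch). Weights grow by 1/14 per step, so an error at step 15 costs twice as much as the same error at step 1.

```python
import numpy as np

output_time = 15
step_weights = np.linspace(1.0, 2.0, output_time)  # 1.0, 1.0714..., ..., 2.0

def weighted_mse(pred, target):
    # Broadcast the (output_time,) weights over a (batch, output_time) error matrix,
    # then average over all elements (same reduction as the PyTorch version)
    return float(np.mean(step_weights * (pred - target) ** 2))

target = np.zeros((4, output_time))

# A constant error of 1.0 at every step gives the mean weight (1.5)
print(weighted_mse(np.ones_like(target), target))

# The same error placed only at the last step costs twice as much as at the first
err_first = np.zeros_like(target)
err_first[:, 0] = 1.0
err_last = np.zeros_like(target)
err_last[:, -1] = 1.0
print(weighted_mse(err_last, target) / weighted_mse(err_first, target))
```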

0. Load flow and weather data

In [2]:
BASE_DIR = Path.cwd().parent

# Load raw flow data (reservoir J, reservoir/10.csv)
flow_file = BASE_DIR / "data" / "rawdata" / "reservoir" / "10.csv"
df_flow = pd.read_csv(
    flow_file,
    header=None,
    usecols=[1, 2],
    names=['time', 'value']
)

# Parse timestamps before sorting: sorting raw strings is only chronological
# for zero-padded ISO-style formats
df_flow['time'] = pd.to_datetime(df_flow['time'], errors='coerce')
df_flow = df_flow.dropna(subset=['time']).sort_values('time').reset_index(drop=True)
print(f"Flow raw: {len(df_flow):,} rows ({df_flow['time'].min()} ~ {df_flow['time'].max()})")

# Weather CSV directory (a folder, despite the variable name)
weather_file = BASE_DIR / "data" / "rawdata" / "weather"

Flow raw: 943,434 rows (2023-01-01 00:01:00 ~ 2024-10-17 17:19:00)


0-1. Flow preprocessing
- Set IQR outliers, negative values, and sudden spikes (|Δ| above the 99.9th percentile) to NaN
- Linear interpolation (both directions)
- Savitzky-Golay filter (window=51, polyorder=2)

In [3]:
from scipy.signal import savgol_filter

# 1. Set IQR outliers, negative values, and sudden spikes to NaN
Q1 = df_flow['value'].quantile(0.25)
Q3 = df_flow['value'].quantile(0.75)
IQR = Q3 - Q1

iqr_mask = (df_flow['value'] < Q1 - 1.5 * IQR) | (df_flow['value'] > Q3 + 1.5 * IQR)
negative_mask = df_flow['value'] < 0
diff = df_flow['value'].diff().abs()
spike_threshold = diff.quantile(0.999)  # largest 0.1% of consecutive-row changes
spike_mask = diff > spike_threshold

total_mask = iqr_mask | negative_mask | spike_mask
df_flow.loc[total_mask, 'value'] = np.nan

print(f"Outliers removed: IQR={iqr_mask.sum()}, negative={negative_mask.sum()}, "
      f"spikes(>{spike_threshold:.2f})={spike_mask.sum()}, total={total_mask.sum()}")

# 2. Linear interpolation (both directions) + clip
df_flow['value'] = df_flow['value'].interpolate(method='linear', limit_direction='both')
df_flow['value'] = df_flow['value'].clip(lower=0)

# 3. Savitzky-Golay filter (denoising, window=51, polyorder=2)
df_flow['value'] = savgol_filter(df_flow['value'], window_length=51, polyorder=2)
df_flow['value'] = df_flow['value'].clip(lower=0)

print(f"Preprocessing done: {len(df_flow):,} rows, "
      f"range [{df_flow['value'].min():.2f}, {df_flow['value'].max():.2f}]")

Outliers removed: IQR=21, negative=0, spikes(>91.06)=944, total=962
Preprocessing done: 943,434 rows, range [0.00, 325.52]


In [4]:
# Save preprocessing output (keeps the raw file untouched; reproducibility)
save_path = BASE_DIR / "data" / "processed" / "flow_preprocessed.csv"
df_flow.to_csv(save_path, index=False)
print(f"Saved: {save_path} ({len(df_flow):,} rows)")

Saved: /home/kp/web/work/pro/data/processed/flow_preprocessed.csv (943,434 rows)


- Weather data: detect the encoding, then concatenate

In [5]:
def read_weather_csv(f):
    # utf-8-sig also reads plain UTF-8 and strips a BOM from the first header;
    # cp949 extends euc-kr, which stays as a last fallback
    for enc in ["utf-8-sig", "cp949", "euc-kr"]:
        try:
            return pd.read_csv(f, encoding=enc)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"Could not decode {f.name} with any known encoding")

files = sorted(weather_file.glob("*.csv"))
print(f"Files to read: {len(files)}")

weather = pd.concat(
    [read_weather_csv(f) for f in files],
    ignore_index=True
)
print(f"Rows after concat: {len(weather):,}")

Files to read: 23
Rows after concat: 1,885,020


- Save data

In [6]:
# Save the full concatenated weather data (keeps the raw input untouched)
weather.to_csv(BASE_DIR / "data" / "processed" / "weather_total_raw.csv", index=False, encoding="utf-8-sig")

1. Weather preprocessing (★ Rainfall: original cumulative values kept, same as v1)

In [7]:
weather['일시'] = pd.to_datetime(weather['일시'])

# Missing values: rainfall → 0; temperature/humidity → linear interpolation.
# Note: limit=60 fills at most 60 consecutive NaNs per gap (rows, i.e. minutes at 1-min
# resolution). In a longer gap the first 60 NaNs are still filled; only the rest is dropped below.
# Caution: in a daily cumulative series, a 0 filled mid-day looks like a reset.
weather['0.5mm 일 누적 강수량(mm)'] = weather['0.5mm 일 누적 강수량(mm)'].fillna(0)
weather['기온(℃)'] = weather['기온(℃)'].interpolate(method='linear', limit=60)
weather['상대습도(%)'] = weather['상대습도(%)'].interpolate(method='linear', limit=60)

# Drop rows still missing temperature/humidity (long gaps)
before = len(weather)
weather = weather.dropna(subset=['기온(℃)', '상대습도(%)'])
print(f'Long-gap rows dropped: {before:,} -> {len(weather):,} ({len(weather)/before*100:.1f}% kept)')

weather = weather.rename(columns={
    '일시': 'datetime',
    '기온(℃)': 'temperature',
    '0.5mm 일 누적 강수량(mm)': 'rainfall',
    '상대습도(%)': 'humidity',
})

# Remove duplicate timestamps
weather = weather.drop_duplicates(subset='datetime', keep='first')
weather = weather.sort_values('datetime').reset_index(drop=True)

# Mark time discontinuities: a gap > 1 min starts a new segment
time_diff = weather['datetime'].diff()
seg_boundary = time_diff > pd.Timedelta(minutes=1)
weather['segment_id'] = seg_boundary.cumsum()
print(f'Contiguous segments: {weather["segment_id"].nunique()}')
print(f"Rainfall: original cumulative rainfall kept (same as v1)")
print(weather.shape)
print(weather.head())

Long-gap rows dropped: 1,885,020 -> 1,788,078 (94.9% kept)
Contiguous segments: 3564
Rainfall: original cumulative rainfall kept (same as v1)
(894039, 5)
             datetime  temperature  rainfall  humidity  segment_id
0 2023-01-01 00:01:00         -3.3       0.0      91.6           0
1 2023-01-01 00:02:00         -3.3       0.0      91.6           0
2 2023-01-01 00:03:00         -3.3       0.0      91.6           0
3 2023-01-01 00:04:00         -3.2       0.0      91.6           0
4 2023-01-01 00:05:00         -3.2       0.0      91.6           0
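If the goal is to interpolate only gaps of at most 60 minutes, note that pandas' `limit=60` instead fills the first 60 NaNs of every gap, long or short. A minimal sketch of a gap-length-aware alternative, assuming a 1-row-per-minute series already sorted by time; `interpolate_short_gaps` is a hypothetical helper, not part of the notebook.

```python
import numpy as np
import pandas as pd

def interpolate_short_gaps(s: pd.Series, max_gap: int = 60) -> pd.Series:
    """Linearly interpolate only NaN runs of length <= max_gap; longer runs stay NaN."""
    is_na = s.isna()
    # Every NaN shares a run id with the last valid value before it
    run_id = (~is_na).cumsum()
    run_len = is_na.groupby(run_id).transform('sum')  # NaN count of that run
    filled = s.interpolate(method='linear', limit_area='inside')
    keep = ~is_na | (run_len <= max_gap)
    return filled.where(keep)

s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, np.nan, 8.0])

# The 1-NaN gap is filled; the 4-NaN gap stays NaN
print(interpolate_short_gaps(s, max_gap=3).tolist())
# For comparison, limit=3 still fills the first 3 NaNs of the 4-NaN gap
print(s.interpolate(method='linear', limit=3).tolist())
```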


In [8]:
weather.describe()

|  | datetime | temperature | rainfall | humidity | segment_id |
|---|---|---|---|---|---|
| count | 894039 | 894039.0 | 894039.0 | 894039.0 | 894039.0 |
| mean | 2023-12-09 07:18:45.749011 | 15.358957 | 1.922265 | 71.807508 | 1596.019969 |
| min | 2023-01-01 00:01:00 | -16.5 | 0.0 | 2.9 | 0.0 |
| 25% | 2023-07-03 09:58:30 | 6.3 | 0.0 | 57.2 | 465.0 |
| 50% | 2023-12-13 20:09:00 | 16.5 | 0.0 | 76.0 | 1066.0 |
| 75% | 2024-05-25 11:14:30 | 24.7 | 0.0 | 90.3 | 3267.0 |
| max | 2024-11-01 00:00:00 | 37.7 | 160.0 | 99.9 | 3563.0 |
| std |  | 10.908432 | 8.737307 | 21.188111 | 1320.07592 |


In [9]:
df = pd.DataFrame(weather)
df.isnull().sum()

datetime       0
temperature    0
rainfall       0
humidity       0
segment_id     0
dtype: int64

- rainfall: daily cumulative rainfall, kept as-is (same as v1); only MinMaxScaler is applied

In [10]:
df_weather = df[['datetime', 'temperature', 'rainfall', 'humidity', 'segment_id']].copy()
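Because the source column is a daily cumulative total, a plain first difference goes negative at every midnight reset. The sketch below contrasts the cumulative series kept in this run with one way a per-minute increment could be derived by differencing within each calendar day. This is an assumption for illustration only; the exact v2 transform is not shown in this notebook.

```python
import pandas as pd

# Toy 1-minute cumulative rainfall spanning midnight (mm, 0.5 mm resolution)
times = pd.date_range('2023-07-01 23:57', periods=6, freq='min')
cum = pd.Series([1.0, 1.5, 1.5, 0.0, 0.5, 0.5], index=times)

naive_diff = cum.diff()  # goes negative at the 00:00 reset
# Difference within each day; the first minute of a day keeps its own cumulative value
per_minute = cum.groupby(cum.index.date).diff().fillna(cum)

print(naive_diff.tolist())   # [nan, 0.5, 0.0, -1.5, 0.5, 0.0]
print(per_minute.tolist())   # [1.0, 0.5, 0.0, 0.0, 0.5, 0.0]
```

Summing `per_minute` within a day recovers that day's cumulative total, so the two representations carry the same information; they differ only in how the model sees it.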

2-1. Merge weather data
2-2. Add weather and time features

In [11]:
df_merged = pd.merge(df_flow, df_weather, how='inner', left_on='time', right_on="datetime")

In [12]:
df_merged = df_merged.drop(columns=['datetime'])

# Keep 1-minute resolution (no resampling)
df_merged = df_merged.sort_values('time').reset_index(drop=True)
df_merged = df_merged.dropna()
print(f"Data: {len(df_merged):,} rows (1-min resolution)")

# ★ Recompute time-discontinuity boundaries after the merge (replaces the weather segment_id)
# The weather segment_id only reflects gaps in the weather data, so after the
# flow+weather inner merge the actual discontinuities must be recomputed
time_diff_merged = df_merged['time'].diff()
seg_boundary_merged = time_diff_merged > pd.Timedelta(minutes=1)
df_merged['segment_id'] = seg_boundary_merged.cumsum()
print(f"Contiguous segments after merge: {df_merged['segment_id'].nunique()}")

# Cyclical temporal features
t: pd.Series = df_merged['time']

# int -> float cast for arithmetic with np.pi
hour = t.dt.hour.astype(np.float64)
minute = t.dt.minute.astype(np.float64)
dow = t.dt.dayofweek.astype(np.float64)
doy = t.dt.dayofyear.astype(np.float64)

# Time of day (daily cycle in minutes, T=1440)
minute_of_day = hour * 60 + minute
df_merged['time_sin'] = 0.5 * np.sin(2 * np.pi * minute_of_day / 1440) + 0.5
df_merged['time_cos'] = 0.5 * np.cos(2 * np.pi * minute_of_day / 1440) + 0.5

# Day of week (weekly cycle, T=7)
df_merged['dow_sin'] = 0.5 * np.sin(2 * np.pi * dow / 7) + 0.5
df_merged['dow_cos'] = 0.5 * np.cos(2 * np.pi * dow / 7) + 0.5

# Season (annual cycle, T=365.25)
df_merged['season_sin'] = 0.5 * np.sin(2 * np.pi * doy / 365.25) + 0.5
df_merged['season_cos'] = 0.5 * np.cos(2 * np.pi * doy / 365.25) + 0.5

print(f"Temporal features added: {df_merged.shape}")
print(df_merged.head())

Data: 872,723 rows (1-min resolution)
Contiguous segments after merge: 4158
Temporal features added: (872723, 12)
                 time      value  temperature  rainfall  humidity  segment_id  \
0 2023-01-01 00:01:00  96.577302         -3.3       0.0      91.6           0   
1 2023-01-01 00:02:00  96.739744         -3.3       0.0      91.6           0   
2 2023-01-01 00:03:00  96.895855         -3.3       0.0      91.6           0   
3 2023-01-01 00:04:00  97.045636         -3.2       0.0      91.6           0   
4 2023-01-01 00:05:00  97.189086         -3.2       0.0      91.6           0   

   time_sin  time_cos   dow_sin   dow_cos  season_sin  season_cos  
0  0.502182  0.999995  0.109084  0.811745    0.508601    0.999926  
1  0.504363  0.999981  0.109084  0.811745    0.508601    0.999926  
2  0.506545  0.999957  0.109084  0.811745    0.508601    0.999926  
3  0.508726  0.999924  0.109084  0.811745    0.508601    0.999926  
4  0.510907  0.999881  0.109084  0.811745    0.508601    0.999926  
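A quick check of the time-of-day encoding above (`0.5 * sin/cos(2πm/1440) + 0.5`, with m the minute of the day). Minute 1 (00:01) reproduces the first output row (0.502182, 0.999995); 00:00 and 23:59 land next to each other, while 00:00 and 12:00 sit on opposite sides of the circle. The helper names are illustrative.

```python
import numpy as np

def encode_minute_of_day(m):
    """Map minute-of-day m (0..1439) to the [0, 1]-shifted (sin, cos) pair used above."""
    angle = 2 * np.pi * np.asarray(m, dtype=np.float64) / 1440
    return 0.5 * np.sin(angle) + 0.5, 0.5 * np.cos(angle) + 0.5

def encoded_distance(a, b):
    """Euclidean distance between two minutes of the day in the encoded plane."""
    sa, ca = encode_minute_of_day(a)
    sb, cb = encode_minute_of_day(b)
    return float(np.hypot(sa - sb, ca - cb))

for label, m in [('00:00', 0), ('00:01', 1), ('06:00', 360), ('12:00', 720), ('23:59', 1439)]:
    s, c = encode_minute_of_day(m)
    print(f"{label}: time_sin={s:.6f} time_cos={c:.6f}")

print(encoded_distance(0, 1439))  # tiny: minutes on either side of midnight stay close
print(encoded_distance(0, 720))   # about 1.0: opposite points on a circle of radius 0.5
```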


In [13]:
df_merged.shape, df_merged.info()

<class 'pandas.DataFrame'>
RangeIndex: 872723 entries, 0 to 872722
Data columns (total 12 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   time         872723 non-null  datetime64[us]
 1   value        872723 non-null  float64       
 2   temperature  872723 non-null  float64       
 3   rainfall     872723 non-null  float64       
 4   humidity     872723 non-null  float64       
 5   segment_id   872723 non-null  int64         
 6   time_sin     872723 non-null  float64       
 7   time_cos     872723 non-null  float64       
 8   dow_sin      872723 non-null  float64       
 9   dow_cos      872723 non-null  float64       
 10  season_sin   872723 non-null  float64       
 11  season_cos   872723 non-null  float64       
dtypes: datetime64[us](1), float64(10), int64(1)
memory usage: 79.9 MB


((872723, 12), None)

In [14]:
df_merged.describe()

|  | time | value | temperature | rainfall | humidity | segment_id | time_sin | time_cos | dow_sin | dow_cos | season_sin | season_cos |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 872723 | 872723.0 | 872723.0 | 872723.0 | 872723.0 | 872723.0 | 872723.0 | 872723.0 | 872723.0 | 872723.0 | 872723.0 | 872723.0 |
| mean | 2023-12-01 18:39:20.141650 | 151.78657 | 15.354982 | 1.932336 | 71.595445 | 1853.432576 | 0.500568 | 0.498633 | 0.49676 | 0.498728 | 0.5197144 | 0.470201 |
| min | 2023-01-01 00:01:00 | 0.0 | -16.5 | 0.0 | 2.9 | 0.0 | 0.0 | 0.0 | 0.012536 | 0.049516 | 2.889877e-07 | 1e-05 |
| 25% | 2023-06-27 11:00:30 | 109.030874 | 6.1 | 0.0 | 56.9 | 613.0 | 0.146447 | 0.144907 | 0.109084 | 0.049516 | 0.1605898 | 0.1285 |
| 50% | 2023-12-06 15:46:00 | 144.881767 | 16.5 | 0.0 | 75.7 | 1311.0 | 0.502182 | 0.497818 | 0.5 | 0.38874 | 0.5600631 | 0.457583 |
| 75% | 2024-05-14 15:26:30 | 192.717293 | 24.8 | 0.0 | 90.0 | 3509.0 | 0.853553 | 0.852007 | 0.890916 | 0.811745 | 0.8732941 | 0.812944 |
| max | 2024-10-17 17:19:00 | 325.516638 | 37.7 | 160.0 | 99.9 | 4157.0 | 1.0 | 0.999995 | 0.987464 | 1.0 | 0.9999928 | 0.999926 |
| std |  | 56.898401 | 11.013743 | 8.825355 | 21.235732 | 1468.650283 | 0.353499 | 0.353606 | 0.352849 | 0.354239 | 0.3565895 | 0.348665 |


In [15]:
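# 2-3. Seasonal block sampling (60 days × 4 seasons, spanning 2023-2024)
# Tests the effect of more data vs. the 45-day setup (single variable: only the period changes)
# Note: `<= end` with a date-only string stops at 00:00 on the end date, so that calendar
# day is excluded; `< pd.Timestamp(end) + pd.Timedelta(days=1)` would include it.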

season_ranges = [
    ('2023-12-01', '2024-01-29', 'Winter'),
    ('2024-03-15', '2024-05-13', 'Spring'),
    ('2023-07-01', '2023-08-29', 'Summer'),
    ('2023-09-15', '2023-11-13', 'Fall'),
]

blocks = []
for i, (start, end, name) in enumerate(season_ranges):
    mask = (df_merged['time'] >= start) & (df_merged['time'] <= end)
    block = df_merged[mask].copy()
    block['block_id'] = i
    blocks.append(block)
    print(f"[{name}] {start} ~ {end}: {len(block):,} rows")

df_merged = pd.concat(blocks, ignore_index=True)
print(f"\nTotal: {len(df_merged):,} rows (4-season block sampling, 60 days × 4)")

[Winter] 2023-12-01 ~ 2024-01-29: 80,828 rows
[Spring] 2024-03-15 ~ 2024-05-13: 78,794 rows
[Summer] 2023-07-01 ~ 2023-08-29: 83,256 rows
[Fall] 2023-09-15 ~ 2023-11-13: 80,430 rows

Total: 323,308 rows (4-season block sampling, 60 days × 4)


3. Create sliding windows
- X: (n_samples, input_time, 10): value, temperature, rainfall, humidity + 6 cyclical features
- y: (n_samples, output_time): value only
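For a single contiguous segment, the windowing loop in the split cell can be sketched with NumPy's `sliding_window_view`, which returns strided views instead of copying each window. This is an alternative illustration, not the notebook's code; `make_windows` is a hypothetical helper and the sizes are toy values.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(features, target, input_time, output_time):
    """features: (n, F), target: (n,).
    Returns X (N, input_time, F) and y (N, output_time), N = n - input_time - output_time + 1."""
    total = input_time + output_time
    # Windows along axis 0 -> shape (N, F, total); move the window axis in front of F
    win = sliding_window_view(features, window_shape=total, axis=0)
    X = win[:, :, :input_time].transpose(0, 2, 1)
    y = sliding_window_view(target, window_shape=total)[:, input_time:]
    return X, y

rng = np.random.default_rng(0)
n, F, inp, out = 100, 10, 72, 15
feats = rng.normal(size=(n, F)).astype(np.float32)
targ = feats[:, 0].copy()

X, y = make_windows(feats, targ, inp, out)
print(X.shape, y.shape)  # (14, 72, 10) (14, 15)
```

The results are read-only views; `np.concatenate` (as in the split cell) makes the one copy that training needs, so memory is spent only once.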

In [16]:
feature_cols = ['value', 'temperature', 'rainfall', 'humidity',
                'time_sin', 'time_cos', 'dow_sin', 'dow_cos',
                'season_sin', 'season_cos']
scale_cols = ['value', 'temperature', 'rainfall', 'humidity']  # MinMaxScaler target (sin/cos excluded)
target_col = 'value'
input_time = 72    # 72 steps × 1min = 72min
output_time = 15   # 15 steps × 1min = 15min (× 4 rolling = 1h)

print(f"=== Settings ===")
print(f"Input: {input_time} steps ({input_time}min = {input_time/60:.1f}h), {len(feature_cols)} features")
print(f"Output: {output_time} steps ({output_time}min) × 4 rolling = {output_time*4}min")


=== Settings ===
Input: 72 steps (72min = 1.2h), 10 features
Output: 15 steps (15min) × 4 rolling = 60min


4. Train / Val / Test Split (0.7 / 0.15 / 0.15)
- A gap at each split boundary prevents sliding-window overlap (data leakage)
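The index arithmetic behind the gap can be checked on its own. A window starting at sample index i covers raw rows i to i + 86 (72 input + 15 output steps), so two windows overlap whenever their start indices differ by less than 87. The sketch below uses the same formulas as the split cell (the helper names are hypothetical) and assumes consecutive sample indices map to consecutive rows, which holds inside one segment. It reproduces the Block 0 sizes and confirms that no split shares a row with the next.

```python
def split_with_gap(n_samples, train_ratio=0.7, val_ratio=0.15, gap=87):
    """Return (train, val, test) index ranges with `gap` samples dropped at each boundary."""
    t_end = int(n_samples * train_ratio)
    v_start = t_end + gap
    v_end = v_start + int(n_samples * val_ratio)
    test_start = v_end + gap
    return range(0, t_end), range(v_start, v_end), range(test_start, n_samples)

def last_row_covered(r, window_len=87):
    # The last raw row touched by the last window in r
    return r[-1] + window_len - 1

train, val, test = split_with_gap(41_514)
print(len(train), len(val), len(test))  # 29059 6227 6054 (Block 0)
print(last_row_covered(train) < val.start, last_row_covered(val) < test.start)  # True True
```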

In [17]:
# Sliding-window creation + Train/Val/Test split (each block × segment handled independently)
# ★ Fix: never create windows that cross a segment_id boundary
# A gap at each split boundary prevents sliding-window overlap (data leakage)

train_ratio, val_ratio = 0.7, 0.15
split_gap = input_time + output_time  # 87-window gap (prevents data leakage)
min_segment_len = input_time + output_time  # minimum segment length

X_train_list, y_train_list = [], []
X_val_list, y_val_list = [], []
X_test_list, y_test_list = [], []
test_times_list = []

skipped_segments = 0
used_segments = 0

for block_id in sorted(df_merged['block_id'].unique()):
    block = df_merged[df_merged['block_id'] == block_id]
    
    # ★ Build sliding windows segment by segment within the block
    seg_X, seg_y, seg_times = [], [], []
    
    for seg_id in sorted(block['segment_id'].unique()):
        segment = block[block['segment_id'] == seg_id].reset_index(drop=True)
        
        if len(segment) < min_segment_len:
            skipped_segments += 1
            continue  # skip segments that are too short
        
        used_segments += 1
        seg_features = segment[feature_cols].values.astype(np.float32)
        seg_target = segment[target_col].values.astype(np.float32)
        seg_time = segment['time'].values
        
        n_samples = len(segment) - input_time - output_time + 1
        for i in range(n_samples):
            seg_X.append(seg_features[i : i + input_time])
            seg_y.append(seg_target[i + input_time : i + input_time + output_time])
            seg_times.append(seg_time[i + input_time])
    
    if len(seg_X) == 0:
        print(f"Block {block_id}: no valid windows (skipped)")
        continue
    
    X_block = np.array(seg_X)
    y_block = np.array(seg_y)
    times_block = np.array(seg_times)
    n_samples = len(X_block)

    # Leave gaps so Train/Val/Test windows do not overlap
    t_end = int(n_samples * train_ratio)
    v_start = t_end + split_gap
    v_end = v_start + int(n_samples * val_ratio)
    test_start = v_end + split_gap

    X_train_list.append(X_block[:t_end])
    y_train_list.append(y_block[:t_end])
    X_val_list.append(X_block[v_start:v_end])
    y_val_list.append(y_block[v_start:v_end])
    X_test_list.append(X_block[test_start:])
    y_test_list.append(y_block[test_start:])

    for i in range(test_start, n_samples):
        test_times_list.append(times_block[i])

    n_train = t_end
    n_val = v_end - v_start
    n_test = n_samples - test_start
    print(f"Block {block_id}: {n_samples:,} samples "
          f"(train={n_train:,} / val={n_val:,} / test={n_test:,}) "
          f"gap={split_gap}")

X_train = np.concatenate(X_train_list)
y_train = np.concatenate(y_train_list)
X_val = np.concatenate(X_val_list)
y_val = np.concatenate(y_val_list)
X_test = np.concatenate(X_test_list)
y_test = np.concatenate(y_test_list)
test_times = np.array(test_times_list)

n_total = len(X_train) + len(X_val) + len(X_test)
print(f"\n=== Sliding Window + Split (Seasonal Blocks × Segments, gap={split_gap}) ===")
print(f"★ Segment-aware: {used_segments} segments used, {skipped_segments} skipped (< {min_segment_len} steps)")
print(f"X shape: ({input_time}, {len(feature_cols)}), y shape: ({output_time},)")
print(f"Train: {len(X_train):,} | Val: {len(X_val):,} | Test: {len(X_test):,} | Total: {n_total:,}")
print(f"Memory: X_train {X_train.nbytes/1e6:.1f} MB, y_train {y_train.nbytes/1e6:.1f} MB")

Block 0: 41,514 samples (train=29,059 / val=6,227 / test=6,054) gap=87
Block 1: 55,304 samples (train=38,712 / val=8,295 / test=8,123) gap=87
Block 2: 76,315 samples (train=53,420 / val=11,447 / test=11,274) gap=87
Block 3: 67,289 samples (train=47,102 / val=10,093 / test=9,920) gap=87

=== Sliding Window + Split (Seasonal Blocks × Segments, gap=87) ===
★ Segment-aware: 603 segments used, 1557 skipped (< 87 steps)
X shape: (72, 10), y shape: (15,)
Train: 168,293 | Val: 36,062 | Test: 35,371 | Total: 239,726
Memory: X_train 484.7 MB, y_train 10.1 MB


5. Normalization (fit on Train only)
- X: a separate MinMaxScaler per feature (value, temperature, rainfall, humidity); sin/cos features are left as-is
- y: normalized with the value scaler
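Fitting on Train only means Val/Test values outside the Train range map outside [0, 1]; scikit-learn's `MinMaxScaler` does not clip unless `clip=True`. A NumPy sketch of the same formula `normalize_X` applies, with made-up values: the 55.5 mm maximum mirrors the Train rainfall range printed in the next cell, while the 60.0 mm Val value is hypothetical.

```python
import numpy as np

def fit_minmax(train):
    # Same statistics MinMaxScaler stores as data_min_ / data_max_
    return float(np.min(train)), float(np.max(train))

def apply_minmax(x, d_min, d_max):
    # Same formula as normalize_X / normalize_y; no clipping
    return (np.asarray(x, dtype=np.float64) - d_min) / (d_max - d_min)

train_rain = np.array([0.0, 2.5, 10.0, 55.5])  # toy Train values (mm)
val_rain = np.array([0.0, 30.0, 60.0])         # 60.0 exceeds the Train max

d_min, d_max = fit_minmax(train_rain)
print(apply_minmax(val_rain, d_min, d_max))     # last value is about 1.08, i.e. above 1
```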

In [18]:
from sklearn.preprocessing import MinMaxScaler

n_features = len(feature_cols)

# Fit scalers on Train data only (scale_cols only)
scalers = {}
for col in scale_cols:
    i = feature_cols.index(col)
    scaler = MinMaxScaler()
    scaler.fit(X_train[:, :, i].reshape(-1, 1))
    scalers[col] = scaler

# Normalize X (scale_cols only; sin/cos excluded)
def normalize_X(arr):
    arr = arr.copy()
    for col in scale_cols:
        i = feature_cols.index(col)
        s = scalers[col]
        d_min, d_max = np.float32(s.data_min_[0]), np.float32(s.data_max_[0])
        arr[:, :, i] = (arr[:, :, i] - d_min) / (d_max - d_min)
    return arr

X_train_scaled = normalize_X(X_train)
X_val_scaled = normalize_X(X_val)
X_test_scaled = normalize_X(X_test)

# Normalize y (using the value scaler)
val_min = np.float32(scalers['value'].data_min_[0])
val_max = np.float32(scalers['value'].data_max_[0])

def normalize_y(arr):
    return (arr - val_min) / (val_max - val_min)

def denormalize_y(arr):
    return arr * (val_max - val_min) + val_min

y_train_scaled = normalize_y(y_train)
y_val_scaled = normalize_y(y_val)
y_test_scaled = normalize_y(y_test)

print(f"Scaler ranges fit on Train (scale_cols only):")
for col in scale_cols:
    s = scalers[col]
    print(f"  {col:>12s}: [{s.data_min_[0]:.2f}, {s.data_max_[0]:.2f}]")
print(f"\nsin/cos features ({n_features - len(scale_cols)}): kept as-is in [0, 1], no scaling")

Scaler ranges fit on Train (scale_cols only):
         value: [0.00, 310.26]
   temperature: [-11.70, 37.70]
      rainfall: [0.00, 55.50]
      humidity: [4.80, 99.90]

sin/cos features (6): kept as-is in [0, 1], no scaling


- Tensor conversion & DataLoader (pin_memory)

In [19]:
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Device: {device}")

X_train_tensor = torch.FloatTensor(X_train_scaled)
y_train_tensor = torch.FloatTensor(y_train_scaled)
X_val_tensor = torch.FloatTensor(X_val_scaled)
y_val_tensor = torch.FloatTensor(y_val_scaled)
X_test_tensor = torch.FloatTensor(X_test_scaled)
y_test_tensor = torch.FloatTensor(y_test_scaled)

train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
val_dataset = TensorDataset(X_val_tensor, y_val_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

batch_size = 256
use_pin = (device.type == 'cuda')

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True,
                          drop_last=True, pin_memory=use_pin, num_workers=0)
val_loader = DataLoader(val_dataset, batch_size=512, shuffle=False,
                        pin_memory=use_pin, num_workers=0)
test_loader = DataLoader(test_dataset, batch_size=512, shuffle=False,
                         pin_memory=use_pin, num_workers=0)

print(f"Train: {len(train_loader)} batches | Val: {len(val_loader)} | Test: {len(test_loader)}")

Device: cuda
Train: 657 batches | Val: 71 | Test: 70


## 6. Model (A: dropout=0.2 (kept from v2), LN order = Drop→LN→FC (v2))

In [20]:
import torch.nn as nn

class LSTMSeq2SeqAttnModel(nn.Module):
    """v2: decoder dropout=0.2, Drop→LN→FC order"""
    def __init__(self, input_size=10, hidden_size=128, num_layers=2,
                 output_size=15, embed_dim=16, dropout=0.2):
        super().__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers

        self.encoder = nn.LSTM(
            input_size=input_size, hidden_size=hidden_size,
            num_layers=num_layers, batch_first=True,
            dropout=dropout if num_layers > 1 else 0
        )
        self.step_embedding = nn.Embedding(output_size, embed_dim)

        self.attn_Wh = nn.Linear(hidden_size, hidden_size, bias=False)
        self.attn_Ws = nn.Linear(hidden_size, hidden_size, bias=False)
        self.attn_V = nn.Linear(hidden_size, 1, bias=False)

        self.decoder = nn.LSTMCell(
            input_size=hidden_size + embed_dim, hidden_size=hidden_size
        )
        # v2: Dropout → LayerNorm → FC
        self.dec_dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(hidden_size)
        self.fc_out = nn.Linear(hidden_size, 1)

    def forward(self, x):
        batch_size = x.size(0)
        enc_outputs, (h_n, c_n) = self.encoder(x)
        enc_keys_projected = self.attn_Ws(enc_outputs)

        h_dec = h_n[-1]
        c_dec = c_n[-1]

        predictions = []
        self.attn_weights_all = []
        step_ids = torch.arange(self.output_size, device=x.device)
        step_embs = self.step_embedding(step_ids)

        for t in range(self.output_size):
            query = self.attn_Wh(h_dec).unsqueeze(1)
            energy = torch.tanh(query + enc_keys_projected)
            score = self.attn_V(energy).squeeze(-1)
            attn_weights = torch.softmax(score, dim=1)
            context = torch.bmm(attn_weights.unsqueeze(1), enc_outputs).squeeze(1)
            self.attn_weights_all.append(attn_weights.detach())

            step_emb = step_embs[t].unsqueeze(0).expand(batch_size, -1)
            dec_input = torch.cat([context, step_emb], dim=1)
            h_dec, c_dec = self.decoder(dec_input, (h_dec, c_dec))

            # v2 order: Dropout → LayerNorm → FC
            h_out = self.dec_dropout(h_dec)
            h_out = self.layer_norm(h_out)
            pred_t = self.fc_out(h_out)
            predictions.append(pred_t)

        predictions = torch.cat(predictions, dim=1)
        self.attn_weights_all = torch.stack(self.attn_weights_all, dim=1)
        return predictions

In [21]:
model_name = "Ablation_A"
model_check = LSTMSeq2SeqAttnModel(
    input_size=n_features, hidden_size=128, num_layers=2,
    output_size=output_time, embed_dim=16, dropout=0.2,
)
total_p = sum(p.numel() for p in model_check.parameters())
print(f"Parameters: {total_p:,}")
del model_check

Parameters: 377,585


7. Early Stopping

In [22]:
class EarlyStopping:
    def __init__(self, patience=5, min_delta=1e-5, verbose=True):
        self.patience = patience
        self.min_delta = min_delta
        self.verbose = verbose
        self.counter = 0
        self.best_loss: float | None = None
        self.early_stop = False
        self.best_model: dict[str, torch.Tensor] | None = None

    def __call__(self, val_loss, model):
        if self.best_loss is None:
            self.best_loss = val_loss
            self.save_checkpoint(model)

        elif val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.save_checkpoint(model)
            self.counter = 0
            
        else:
            self.counter += 1
            if self.verbose:
                print(f'EarlyStopping counter: {self.counter}/{self.patience}')
            if self.counter >= self.patience:
                self.early_stop = True

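    def save_checkpoint(self, model):
        if self.verbose:
            print(f'Validation loss decreased ({self.best_loss:.6f}). Saving model...')
        # Clone each tensor: state_dict() tensors share storage with the live parameters,
        # so a shallow dict .copy() would keep following later optimizer updates
        self.best_model = {k: v.detach().clone() for k, v in model.state_dict().items()}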
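A best-weights snapshot has to copy the tensors themselves, not just the dict: `model.state_dict()` returns tensors that share storage with the live parameters, and `optimizer.step()` updates those parameters in place, so a dict-level `.copy()` only copies references. The NumPy analogue below (arrays standing in for tensors, an in-place subtraction standing in for an optimizer step) shows the difference without needing PyTorch.

```python
import numpy as np

# `state` plays the role of state_dict(); the array plays the role of a parameter tensor
weights = np.array([1.0, 2.0])
state = {'fc.weight': weights}

shallow = state.copy()                          # new dict, same array objects
deep = {k: v.copy() for k, v in state.items()}  # new arrays (like tensor.clone())

weights -= 0.5  # in-place update, as optimizer.step() does to parameters

print(shallow['fc.weight'])  # reflects the update: the "best" snapshot silently changed
print(deep['fc.weight'])     # still the original values
```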

## 8. Training (Multi-Run, patience=10)

In [23]:
import torch.optim as optim
import time

num_epochs = 100
learning_rate = 0.001
es_patience = 10

step_weights = torch.linspace(1.0, 2.0, output_time).to(device)

def weighted_mse_loss(pred, target):
    return (step_weights * (pred - target) ** 2).mean()

criterion = weighted_mse_loss

In [24]:
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def calc_mape(y_true, y_pred):
    mask = y_true != 0
    return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100

def calc_smape(y_true, y_pred):
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    mask = denom > 0
    return np.mean(np.abs(y_true[mask] - y_pred[mask]) / denom[mask]) * 100

all_run_results = []
all_run_models = []
all_run_losses = []

for run_idx, seed in enumerate(SEEDS[:N_RUNS]):
    set_seed(seed)
    print(f"\n{'='*60}")
    print(f"Run {run_idx+1}/{N_RUNS} (seed={seed})")
    print(f"{'='*60}")
    
    model = LSTMSeq2SeqAttnModel(
        input_size=n_features, hidden_size=128, num_layers=2,
        output_size=output_time, embed_dim=16, dropout=0.2,
    ).to(device)
    
    optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.5, patience=3
    )
    early_stopping = EarlyStopping(patience=es_patience, verbose=False)
    
    train_losses, val_losses = [], []
    
    for epoch in range(num_epochs):
        model.train()
        train_loss_epoch = 0.0
        for batch_X, targets in train_loader:
            batch_X, targets = batch_X.to(device), targets.to(device)
            outputs = model(batch_X)
            loss = criterion(outputs, targets)
            optimizer.zero_grad()
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
            train_loss_epoch += loss.item()
        
        avg_train_loss = train_loss_epoch / len(train_loader)
        train_losses.append(avg_train_loss)
        
        model.eval()
        val_loss_epoch = 0.0
        with torch.no_grad():
            for batch_X, targets in val_loader:
                batch_X, targets = batch_X.to(device), targets.to(device)
                outputs = model(batch_X)
                loss = criterion(outputs, targets)
                val_loss_epoch += loss.item()
        
        avg_val_loss = val_loss_epoch / len(val_loader)
        val_losses.append(avg_val_loss)
        
        scheduler.step(avg_val_loss)
        current_lr = optimizer.param_groups[0]['lr']
        
        if (epoch + 1) % 10 == 0:
            print(f'  Epoch [{epoch+1:3d}/{num_epochs}] '
                  f'Train: {avg_train_loss:.6f} '
                  f'Val: {avg_val_loss:.6f} '
                  f'LR: {current_lr:.6f}')
        
        early_stopping(avg_val_loss, model)
        if early_stopping.early_stop:
            print(f"  Early Stopping at Epoch {epoch+1} "
                  f"(Best Val Loss: {early_stopping.best_loss:.6f})")
            break
    
    if early_stopping.best_model is not None:
        model.load_state_dict(early_stopping.best_model)
    
    all_run_models.append(copy.deepcopy(model.state_dict()))
    all_run_losses.append({'train': train_losses, 'val': val_losses})
    
    model.eval()
    test_preds_list, test_actuals_list = [], []
    with torch.no_grad():
        for batch_X, batch_y in test_loader:
            batch_X = batch_X.to(device)
            outputs = model(batch_X)
            test_preds_list.append(outputs.cpu().numpy())
            test_actuals_list.append(batch_y.numpy())
    
    run_preds = denormalize_y(np.vstack(test_preds_list))
    run_actuals = denormalize_y(np.vstack(test_actuals_list))
    
    run_result = {
        'seed': seed,
        'epoch': len(train_losses),
        'mape': calc_mape(run_actuals.flatten(), run_preds.flatten()),
        'smape': calc_smape(run_actuals.flatten(), run_preds.flatten()),
        'rmse': np.sqrt(mean_squared_error(run_actuals, run_preds)),
        'mae': mean_absolute_error(run_actuals, run_preds),
        'r2': r2_score(run_actuals.flatten(), run_preds.flatten()),
        'bias': float(np.mean(run_actuals - run_preds)),
        'predictions': run_preds,
        'actuals': run_actuals,
    }
    
    step_mapes = []
    for s in range(output_time):
        step_mapes.append(calc_mape(run_actuals[:, s], run_preds[:, s]))
    run_result['step_mapes'] = step_mapes
    
    season_mapes = []
    offset = 0
    for size in [len(arr) for arr in X_test_list]:
        block_a = run_actuals[offset:offset+size]
        block_p = run_preds[offset:offset+size]
        season_mapes.append(calc_mape(block_a.flatten(), block_p.flatten()))
        offset += size
    run_result['season_mapes'] = season_mapes
    run_result['macro_mape'] = np.mean(season_mapes)
    
    all_run_results.append(run_result)
    
    print(f"  → MAPE: {run_result['mape']:.2f}% | SMAPE: {run_result['smape']:.2f}% "
          f"| R²: {run_result['r2']:.4f} | Bias: {run_result['bias']:+.2f} "
          f"| Epoch: {run_result['epoch']}")

print(f"\n{'='*60}")
print("All runs completed")
print(f"{'='*60}")


Run 1/3 (seed=42)
  Epoch [ 10/100] Train: 0.001518 Val: 0.002994 LR: 0.001000
  Epoch [ 20/100] Train: 0.001220 Val: 0.002109 LR: 0.001000
  Early Stopping at Epoch 28 (Best Val Loss: 0.001377)
  → MAPE: 5.85% | SMAPE: 5.97% | R²: 0.9673 | Bias: -1.26 | Epoch: 28

Run 2/3 (seed=123)
  Epoch [ 10/100] Train: 0.001638 Val: 0.002680 LR: 0.001000
  Epoch [ 20/100] Train: 0.001327 Val: 0.001864 LR: 0.000500
  Epoch [ 30/100] Train: 0.001088 Val: 0.001422 LR: 0.000250
  Epoch [ 40/100] Train: 0.000993 Val: 0.001460 LR: 0.000063
  Early Stopping at Epoch 40 (Best Val Loss: 0.001422)
  → MAPE: 5.91% | SMAPE: 6.16% | R²: 0.9725 | Bias: +1.44 | Epoch: 40

Run 3/3 (seed=7)
  Epoch [ 10/100] Train: 0.001820 Val: 0.002894 LR: 0.001000
  Epoch [ 20/100] Train: 0.001182 Val: 0.001386 LR: 0.000500
  Epoch [ 30/100] Train: 0.000962 Val: 0.001391 LR: 0.000125
  Early Stopping at Epoch 34 (Best Val Loss: 0.001334)
  → MAPE: 6.02% | SMAPE: 6.26% | R²: 0.9716 | Bias: +1.07 | Epoch: 34

All runs completed

### A: Results Summary

In [25]:
metric_labels = {
    'mape': 'MAPE (%)', 'smape': 'SMAPE (%)', 'rmse': 'RMSE',
    'mae': 'MAE', 'r2': 'R²', 'bias': 'Bias', 'macro_mape': 'Macro MAPE (%)'
}

print("=" * 70)
print(f"Multi-Run Summary ({N_RUNS} runs)")
print("=" * 70)

metrics = ['mape', 'smape', 'rmse', 'mae', 'r2', 'bias', 'macro_mape']
print(f"\n{'Metric':<16s}  {'Mean':>8s}  {'Std':>8s}  {'Min':>8s}  {'Max':>8s}")
print("-" * 55)
for m in metrics:
    vals = [r[m] for r in all_run_results]
    print(f"{metric_labels[m]:<16s}  {np.mean(vals):8.4f}  {np.std(vals):8.4f}  "
          f"{np.min(vals):8.4f}  {np.max(vals):8.4f}")

print(f"\nPer-run details:")
for r in all_run_results:
    print(f"  seed={r['seed']:3d}: MAPE={r['mape']:.2f}% SMAPE={r['smape']:.2f}% "
          f"R²={r['r2']:.4f} Bias={r['bias']:+.2f} Epoch={r['epoch']}")

print(f"\nPer-step MAPE (mean ± std):")
for s in range(output_time):
    vals = [r['step_mapes'][s] for r in all_run_results]
    print(f"  Step {s+1:2d}: {np.mean(vals):.2f}% ± {np.std(vals):.2f}%")

season_names_list = ['Winter', 'Spring', 'Summer', 'Fall']
print(f"\nPer-season MAPE (mean ± std):")
for i, name in enumerate(season_names_list):
    vals = [r['season_mapes'][i] for r in all_run_results]
    print(f"  {name:8s}: {np.mean(vals):.2f}% ± {np.std(vals):.2f}%")

best_idx = np.argmin([r['mape'] for r in all_run_results])
best_run = all_run_results[best_idx]
print(f"\n★ Best Run: seed={best_run['seed']} (MAPE={best_run['mape']:.2f}%)")

Multi-Run Summary (3 runs)

Metric                Mean       Std       Min       Max
-------------------------------------------------------
MAPE (%)            5.9236    0.0717    5.8453    6.0185
SMAPE (%)           6.1301    0.1170    5.9747    6.2572
RMSE                9.7197    0.3665    9.3905   10.2310
MAE                 7.6494    0.2662    7.3586    8.0018
R²                  0.9705    0.0023    0.9673    0.9725
Bias                0.4182    1.1990   -1.2639    1.4447
Macro MAPE (%)      5.8722    0.0634    5.8090    5.9589

Per-run details:
  seed= 42: MAPE=5.85% SMAPE=5.97% R²=0.9673 Bias=-1.26 Epoch=28
  seed=123: MAPE=5.91% SMAPE=6.16% R²=0.9725 Bias=+1.44 Epoch=40
  seed=  7: MAPE=6.02% SMAPE=6.26% R²=0.9716 Bias=+1.07 Epoch=34

Per-step MAPE (mean ± std):
  Step  1: 4.57% ± 0.26%
  Step  2: 4.87% ± 0.28%
  Step  3: 5.06% ± 0.18%
  Step  4: 4.99% ± 0.19%
  Step  5: 5.06% ± 0.15%
  Step  6: 5.21% ± 0.10%
  Step  7: 5.41% ± 0.05%
  Step  8: 5.65% ± 0.01%
  Step  9: 5.92% ± 0.04%
  S