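- Pick the three nearest AWS weather stations for each PM station and take a distance-weighted average of their readings
- Convert the wind information into X, Y vector components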

In [7]:
from collections import defaultdict
from haversine import haversine
import pandas as pd
import numpy as np
import os

In [8]:
df_pm = pd.read_csv("../dataset/META/pmmap.csv")
df_aws = pd.read_csv("../dataset/META/awsmap.csv")

In [9]:
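# For each PM station, collect the distance to every AWS station,
# then keep only the three nearest as (AWS name, distance in km) pairs.
pm_dict = defaultdict(list)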
for _, pm in df_pm.iterrows():
  pm_loc = (pm.Latitude, pm.Longitude)
  for _, aws in df_aws.iterrows():
    aws_loc = (aws.Latitude, aws.Longitude)
    dist = haversine(pm_loc, aws_loc, unit='km')
    pm_dict[pm.Location].append((aws.Location, dist))
  pm_dict[pm.Location] = sorted(pm_dict[pm.Location], key=lambda x: x[1])[:3]
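
The nearest-station lookup above can also be sketched without external packages. This is a minimal illustration, not the notebook's own code: `haversine_km`, `nearest_stations`, the Earth-radius constant, and the station coordinates are all hypothetical names and values chosen for the example.

```python
import heapq
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_stations(target, stations, k=3):
    """Return the k (name, distance_km) pairs closest to target, nearest first."""
    dists = ((name, haversine_km(target, loc)) for name, loc in stations.items())
    return heapq.nsmallest(k, dists, key=lambda pair: pair[1])


# Hypothetical coordinates, for illustration only.
stations = {
    "A": (36.50, 127.25),
    "B": (36.35, 127.38),
    "C": (36.60, 127.30),
    "D": (35.00, 126.00),
}
nearest = nearest_stations((36.51, 127.26), stations)
```

`heapq.nsmallest` avoids sorting the full list when only a few stations are needed, which matters little for a handful of stations but keeps the intent explicit.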

In [10]:
# Normalized inverse-distance weights: w_i = (1 / d_i) / sum_j (1 / d_j)
for k, v in pm_dict.items():
  s = sum(1 / d for _, d in v)
  w = [(name, 1 / (d * s)) for name, d in v]
  print(k, w)

아름동 [('세종고운', 0.5959398623781422), ('세종금남', 0.20935575535139192), ('세종연서', 0.19470438227046594)]
신흥동 [('세종연서', 0.623771911782061), ('세종고운', 0.23437985962111915), ('세종전의', 0.14184822859681984)]
노은동 [('계룡', 0.37215720734940244), ('세종금남', 0.31442215528240247), ('오월드', 0.313420637368195)]
문창동 [('오월드', 0.43792606175001325), ('세천', 0.36840926563842336), ('장동', 0.19366467261156328)]
읍내동 [('장동', 0.4600092574298802), ('세천', 0.29439745524012173), ('오월드', 0.2455932873299979)]
정림동 [('오월드', 0.6611945451443386), ('계룡', 0.17462776824421944), ('세천', 0.16417768661144194)]
공주 [('공주', 0.595096232672347), ('정안', 0.21560028832884157), ('세종금남', 0.18930347899881153)]
논산 [('논산', 0.8106894830885594), ('계룡', 0.10161496038210217), ('양화', 0.0876955565293385)]
대천2동 [('대천항', 0.5364383656912503), ('청양', 0.24341489240638425), ('춘장대', 0.22014674190236547)]
독곶리 [('대산', 0.805639758960593), ('안도', 0.10337714448943511), ('당진', 0.090983096549972)]
동문동 [('태안', 0.42383826452678564), ('당진', 0.3223343628472369), ('홍북', 0.25382
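
The weights printed above follow the normalized inverse-distance formula, so they sum to 1 for each PM station and the nearest AWS station gets the largest share. A minimal sketch with made-up distances (the function name `inverse_distance_weights` is hypothetical):

```python
def inverse_distance_weights(dists):
    """Normalize 1/d so the weights sum to 1; closer stations weigh more.

    Raises ZeroDivisionError if any distance is 0, a case this sketch
    does not handle.
    """
    inv = [1.0 / d for d in dists]
    total = sum(inv)
    return [v / total for v in inv]


# Made-up distances in km, for illustration only.
weights = inverse_distance_weights([1.0, 2.0, 4.0])
```

With distances 1, 2 and 4 km the inverse distances are 1, 0.5 and 0.25, so the nearest station receives 1 / 1.75 = 4/7 of the total weight.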

In [13]:
os.makedirs("../dataset/CUSTOM_v1", exist_ok=True)
for pm, aws_list in pm_dict.items():
    # Load the PM station series and fill gaps: interpolate PM2.5, then back-fill the rest.
    df_pm = pd.read_csv(f"../dataset/TRAIN/{pm}.csv")
    df_pm["PM2.5"] = df_pm["PM2.5"].interpolate()
    df_pm.bfill(inplace=True)
    
    aws_dfs = []
    aws_dists = []
    # pm_dict stores (AWS name, distance in km) pairs, so the second item is a distance.
    for aws_name, aws_dist in aws_list:
        df = pd.read_csv(f"../dataset/TRAIN_AWS/{aws_name}.csv")
        df.interpolate(inplace=True)
        df.bfill(inplace=True)
        aws_dfs.append(df)
        aws_dists.append(aws_dist)

    # Normalized inverse-distance weights (same formula as In [10]); they sum to 1.
    inv_dists = [1 / d for d in aws_dists]
    s = sum(inv_dists)
    aws_weights = [round(inv / s, 5) for inv in inv_dists]
    
    cols = [np.zeros(len(df_pm)) for _ in range(5)]
    for aws_df, w in zip(aws_dfs, aws_weights):
        cols[0] += np.array(aws_df["기온(°C)"]) * w
        cols[1] += np.array(aws_df["풍향(deg)"]) * w
        cols[2] += np.array(aws_df["풍속(m/s)"]) * w
        cols[3] += np.array(aws_df["강수량(mm)"]) * w
        cols[4] += np.array(aws_df["습도(%)"]) * w

    df_pm["기온(°C)"] = cols[0]
    df_pm["풍향(deg)"] = cols[1]
    df_pm["풍속(m/s)"] = cols[2]
    df_pm["강수량(mm)"] = cols[3]
    df_pm["습도(%)"] = cols[4]
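    # Convert wind speed and direction into x/y vector components.
    # The factor 359 rescales the direction column to degrees; this presumes
    # the source data stores direction normalized to [0, 1].
    df_pm["풍향(deg)"] = df_pm["풍향(deg)"] * 359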
    wv = df_pm['풍속(m/s)'].values
    wd_rad = df_pm['풍향(deg)'].values * np.pi / 180
    df_pm['Wx'] = wv*np.cos(wd_rad)
    df_pm['Wy'] = wv*np.sin(wd_rad)
    df_pm.drop(columns=["풍향(deg)", "풍속(m/s)"], inplace=True)
    df_pm.to_csv(f"../dataset/CUSTOM_v1/{pm}.csv", index=False)
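
One caveat with the cell above: the wind direction is averaged as a plain number before being turned into a vector, and angles wrap around (the arithmetic mean of 350° and 10° is 180°, pointing the opposite way). A possible alternative, sketched here with made-up values, converts each station's wind to components first and averages those. The function name `weighted_wind_vector` is hypothetical, and the `cos`/`sin` convention simply mirrors the notebook's `Wx`/`Wy` definition.

```python
from math import atan2, cos, degrees, radians, sin


def weighted_wind_vector(speeds, directions_deg, weights):
    """Weighted average of per-station wind vectors.

    Uses Wx = v * cos(theta), Wy = v * sin(theta), matching the notebook.
    """
    wx = sum(w * v * cos(radians(d)) for v, d, w in zip(speeds, directions_deg, weights))
    wy = sum(w * v * sin(radians(d)) for v, d, w in zip(speeds, directions_deg, weights))
    return wx, wy


# Two made-up stations with equal weight and directions on either side of 0 degrees.
wx, wy = weighted_wind_vector([2.0, 2.0], [350.0, 10.0], [0.5, 0.5])
mean_dir = degrees(atan2(wy, wx)) % 360  # close to 0 (mod 360), not 180
```

In the notebook this would mean computing `Wx` and `Wy` per AWS dataframe inside the weighting loop instead of after it; whether that improves the downstream model is not something this sketch establishes.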