# **Predictive Analysis of Seoul's Mosquito Index Using Weather Data**
20231439 정유정
20231459 황혜린
20231462 김수인

In [61]:
# Load packages
install.packages(c("dplyr", "caret", "readr"))
library(dplyr)
library(caret)
library(readr)

Installing packages into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

“installation of package ‘dplyr’ had non-zero exit status”
“installation of package ‘caret’ had non-zero exit status”


In [62]:
# Load the data
seoul_weather <- read.csv("서울시 자동기상 관측자료.csv", fileEncoding = "euc-kr")
seoul_mosquito <- read.csv("서울시 모기예보제 정보.csv", fileEncoding = "euc-kr")
seoul_mosquito <- seoul_mosquito[583:32, ] # Match the date range (rows 583 down to 32, which also reverses the row order)
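
The row range in this cell is tied to the current layout of the CSV file. As an alternative, the overlapping dates could be selected by value instead of by position. The following is only a minimal sketch: the two toy data frames, their values, and the column names (`date`, `index`, `temp`) are hypothetical stand-ins, not the actual Seoul files.

```r
# Hypothetical toy tables standing in for the two CSV files (values are made up)
mosquito_toy <- data.frame(
  date  = c("2021.10.3", "2021.10.2", "2021.10.1", "2021.9.30"),
  index = c(50, 60, 70, 80),
  stringsAsFactors = FALSE
)
weather_toy <- data.frame(
  date = c("2021.10.1", "2021.10.2", "2021.10.3"),
  temp = c(21.1, 20.4, 19.8),
  stringsAsFactors = FALSE
)

# Keep only the mosquito rows whose date also appears in the weather table
aligned <- mosquito_toy[mosquito_toy$date %in% weather_toy$date, ]

# Reverse the row order so the oldest date comes first (the toy table is newest-first)
aligned <- aligned[rev(seq_len(nrow(aligned))), ]
print(aligned)
```

Selecting by value keeps working if rows are later added to or removed from either file, whereas fixed row numbers would need to be updated by hand.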

In [63]:
# Handle missing values (only precipitation has missing values; these are days without rain, so replace them with 0)
seoul_weather[is.na(seoul_weather)] <- 0
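
The assignment in this cell replaces every `NA` in the data frame, which is fine as long as precipitation is the only column with gaps. If another column ever had a missing reading, it would also become 0. A minimal sketch of checking the `NA` counts per column and limiting the replacement to precipitation, using a hypothetical toy data frame with made-up column names (`temp_c`, `precip_mm`):

```r
# Hypothetical toy weather table (values and column names are made up)
weather_toy <- data.frame(
  temp_c    = c(21.1, NA, 15.2),
  precip_mm = c(13.3, NA, NA)
)

# Count missing values per column to confirm where the gaps are
print(colSums(is.na(weather_toy)))

# Replace NA with 0 in the precipitation column only
weather_toy$precip_mm[is.na(weather_toy$precip_mm)] <- 0
print(weather_toy)
```

In this toy example the missing temperature stays `NA`, so it remains visible instead of being treated as 0 degrees.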

In [64]:
# Merge the data
names(seoul_mosquito)[1] <- "일시"  # rename the key column so it matches the weather table
seoul_weather <- seoul_weather[,c(3:9)]
seoul_merge <- merge(seoul_mosquito, seoul_weather, by="일시")
seoul_merge

일시,모기지수.수변부.,모기지수.주거지.,모기지수.공원.,평균기온..C.,최저기온..C.,최고기온..C.,일강수량.mm.,평균.상대습도...,합계.일조시간.hr.
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
2021.10.1,100.0,44.3,47.6,21.1,16.9,27.5,13.3,75.8,9.3
2021.10.10,100.0,41.2,45.0,19.9,14.9,24.3,14.2,89.0,0.0
2021.10.11,100.0,43.7,50.1,15.2,12.6,17.8,0.3,78.6,1.7
2021.10.12,94.8,40.1,38.4,17.9,15.8,21.3,0.0,67.1,3.0
2021.10.13,82.1,31.5,27.9,19.6,14.9,25.2,0.0,66.6,9.0
2021.10.14,99.8,36.7,40.0,19.7,17.4,23.9,0.0,73.6,6.1
2021.10.15,94.8,38.1,38.8,19.0,17.0,21.4,0.0,74.4,0.7
2021.10.16,100.0,39.9,52.0,10.7,4.2,18.1,0.0,47.5,1.5
2021.10.17,100.0,40.3,44.4,5.6,1.3,10.8,0.0,37.4,10.3
2021.10.18,35.4,16.1,0.4,9.6,2.8,16.1,5.6,60.4,4.3
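
The merged rows are ordered by `일시` as text, which is why `2021.10.10` comes directly after `2021.10.1` in the output. If chronological order is needed (for plotting or a time-based split, for example), one option is to parse the key into a `Date` and sort on it. A minimal sketch on a hypothetical toy frame; `merged_toy`, `date_parsed`, and the values are illustrative only:

```r
# Hypothetical toy table in the same text-sorted order as the merged output
merged_toy <- data.frame(
  date  = c("2021.10.1", "2021.10.10", "2021.10.11", "2021.10.2"),
  index = c(100, 100, 100, 90),
  stringsAsFactors = FALSE
)

# Parse "YYYY.M.D" strings into Date values, then sort chronologically
merged_toy$date_parsed <- as.Date(merged_toy$date, format = "%Y.%m.%d")
merged_toy <- merged_toy[order(merged_toy$date_parsed), ]
print(merged_toy$date)
```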


# Building the Prediction Model

In [65]:
# Check the column names
colnames(seoul_merge)

# Rename the columns (positional, matching the 10 columns of the merged table)
names(seoul_merge) <- c(
  "Date",
  "MosquitoIndex_Water", "MosquitoIndex_Residential", "MosquitoIndex_Park",
  "Temperature", "MinTemperature", "MaxTemperature",
  "Precipitation", "Humidity", "Sunshine"
)

# Select only the columns needed for modeling
data <- seoul_merge %>%
  select(Temperature, Precipitation, Humidity, Sunshine, MosquitoIndex_Water, MosquitoIndex_Residential, MosquitoIndex_Park)

# Inspect the data
head(data)

,Temperature,Precipitation,Humidity,Sunshine,MosquitoIndex_Water,MosquitoIndex_Residential,MosquitoIndex_Park
,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,21.1,13.3,75.8,9.3,100.0,44.3,47.6
2,19.9,14.2,89.0,0.0,100.0,41.2,45.0
3,15.2,0.3,78.6,1.7,100.0,43.7,50.1
4,17.9,0.0,67.1,3.0,94.8,40.1,38.4
5,19.6,0.0,66.6,9.0,82.1,31.5,27.9
6,19.7,0.0,73.6,6.1,99.8,36.7,40.0


In [74]:
# Split into train and test data (70% train, 30% test)
set.seed(123)  # Set a seed so the results are reproducible
train_index <- createDataPartition(data$MosquitoIndex_Water, p = 0.7, list = FALSE)
train_data <- data[train_index, ]
test_data <- data[-train_index, ]

# Fit the GLM model
glm_model <- glm(MosquitoIndex_Water ~ Temperature + Precipitation + Humidity + Sunshine, data = train_data, family = gaussian())
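
With `family = gaussian()` and its default identity link, `glm()` fits ordinary least squares, so its coefficients should match those from `lm()`. A minimal sketch illustrating this on R's built-in `mtcars` data, used here only as a stand-in because the Seoul data is not reproduced in this document:

```r
# Fit the same linear predictor two ways on a built-in dataset
glm_fit <- glm(mpg ~ wt + hp, data = mtcars, family = gaussian())
lm_fit  <- lm(mpg ~ wt + hp, data = mtcars)

# Compare the estimated coefficients (equal up to numerical tolerance)
print(all.equal(coef(glm_fit), coef(lm_fit)))
```

One practical consequence is that `summary()` on the `lm()` fit reports R-squared directly, which `summary()` on a GLM does not.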

In [75]:
# Evaluate model performance
predictions <- predict(glm_model, test_data)
actuals <- test_data$MosquitoIndex_Water

# Compute RMSE
rmse <- sqrt(mean((predictions - actuals)^2))
print(paste("RMSE: ", rmse))

# Model summary
summary(glm_model)

[1] "RMSE:  13.3596237660246"



Call:
glm(formula = MosquitoIndex_Water ~ Temperature + Precipitation + 
    Humidity + Sunshine, family = gaussian(), data = train_data)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   46.46204    7.27160   6.390 4.85e-10 ***
Temperature    2.39938    0.14184  16.916  < 2e-16 ***
Precipitation  0.02546    0.03801   0.670  0.50330    
Humidity      -0.06554    0.09008  -0.728  0.46728    
Sunshine      -0.73385    0.23759  -3.089  0.00216 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 170.1234)

    Null deviance: 124236  on 386  degrees of freedom
Residual deviance:  64987  on 382  degrees of freedom
AIC: 3093.1

Number of Fisher Scoring iterations: 2
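
The deviance figures in the summary can be turned into more familiar fit measures. For a Gaussian model the deviance is the residual sum of squares, so the share of variance explained on the training set is 1 minus the ratio of residual to null deviance. A worked sketch using only the numbers printed in the summary (the variable names are illustrative):

```r
# Values copied from the summary output
null_dev  <- 124236  # null deviance, on 386 degrees of freedom
resid_dev <- 64987   # residual deviance, on 382 degrees of freedom
resid_df  <- 382
n_train   <- 386 + 1 # null df + 1 = number of training rows

# Share of training-set variance explained by the model
r2 <- 1 - resid_dev / null_dev

# Dispersion estimate: residual deviance divided by residual df
dispersion <- resid_dev / resid_df

# Training RMSE implied by the residual deviance
train_rmse <- sqrt(resid_dev / n_train)

print(round(c(r2 = r2, dispersion = dispersion, train_rmse = train_rmse), 3))
```

This gives an R-squared of about 0.48 and a dispersion of about 170.12, consistent with the value in the summary. The implied training RMSE of about 12.96 is close to the test RMSE of 13.36, which suggests the model is not overfitting much, although roughly half of the variance in the waterside index remains unexplained.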
