# Chapter 8: Nonlinear Regression Functions (Exercises)

This chapter practices polynomial regression, logarithmic transformations, and interaction terms.

In [None]:
library(AER)        # provides the CASchools data set
data("CASchools")
CASchools$STR <- CASchools$students / CASchools$teachers    # student-teacher ratio
CASchools$score <- (CASchools$read + CASchools$math) / 2    # average of reading and math scores

---
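## Problem 1: Polynomial Regression (Quadratic)

Check whether the relationship between STR and score is nonlinear.

1. Estimate the quadratic regression score ~ STR + I(STR^2)
2. Test whether the coefficient on the quadratic term is significant
3. Plot the regression curve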

In [None]:
# Your answer to Problem 1



### Sample solution

In [None]:
# Sample solution
# 1. Quadratic regression
model_quad <- lm(score ~ STR + I(STR^2), data = CASchools)
summary(model_quad)

# 2. t-test on the quadratic term (summary() uses homoskedasticity-only standard errors)
cat("\nCoefficient on STR^2:", coef(model_quad)["I(STR^2)"], "\n")
cat("p-value:", summary(model_quad)$coefficients["I(STR^2)", "Pr(>|t|)"], "\n")

# 3. Plot the fitted curve over the observed range of STR
STR_range <- seq(min(CASchools$STR), max(CASchools$STR), length.out = 100)
pred_quad <- predict(model_quad, newdata = data.frame(STR = STR_range))

plot(CASchools$STR, CASchools$score,
     main = "Quadratic regression",
     xlab = "STR", ylab = "Score",
     pch = 19, col = rgb(0, 0, 1, 0.5))
lines(STR_range, pred_quad, col = "red", lwd = 2)

# Compare with the linear model
model_linear <- lm(score ~ STR, data = CASchools)
abline(model_linear, col = "blue", lwd = 2, lty = 2)
legend("topright", legend = c("Quadratic", "Linear"),
       col = c("red", "blue"), lty = c(1, 2), lwd = 2)

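The p-value from summary() in step 2 relies on homoskedasticity-only standard errors. Applied econometrics usually tests the quadratic term with heteroskedasticity-robust standard errors instead. The block below is a minimal sketch that computes HC1 robust standard errors by hand. It uses simulated data, with an assumed data-generating process where x stands in for STR. With AER loaded, `coeftest(model_quad, vcov = vcovHC(model_quad, type = "HC1"))` from the lmtest and sandwich packages should give the same HC1 standard errors for the real model. Its p-values use the t distribution rather than the normal approximation used here.

```r
# Simulated data (assumption): quadratic relationship with heteroskedastic errors
set.seed(1)
n <- 400
x <- runif(n, 14, 26)
y <- 700 - 5 * x + 0.1 * x^2 + rnorm(n, sd = 2 + 0.5 * abs(x - 20))
fit <- lm(y ~ x + I(x^2))

# HC1 sandwich estimator: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1} * n / (n - k)
X <- model.matrix(fit)
e <- resid(fit)
k <- ncol(X)
XtX_inv <- solve(crossprod(X))
meat <- crossprod(X * e)                  # each row of X scaled by e_i, so this is X' diag(e^2) X
vcov_hc1 <- XtX_inv %*% meat %*% XtX_inv * n / (n - k)
se_hc1 <- sqrt(diag(vcov_hc1))
names(se_hc1) <- colnames(X)

# Robust t-statistic and normal-approximation p-value for the quadratic term
t_quad <- unname(coef(fit)["I(x^2)"] / se_hc1["I(x^2)"])
p_quad <- 2 * pnorm(-abs(t_quad))
print(cbind(estimate = coef(fit), se_hc1 = se_hc1))
cat("Robust t for I(x^2):", t_quad, " p-value:", p_quad, "\n")
```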
---
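## Problem 2: Log Transformation (Log-Linear Model)

Estimate a model with a log-transformed dependent variable.

1. Estimate the model log(score) ~ STR
2. Interpret the coefficient (a semi-elasticity, not an elasticity)
3. Compare it with the original (linear) model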

In [None]:
# Your answer to Problem 2



### Sample solution

In [None]:
# Sample solution
# 1. Log-linear model
model_loglin <- lm(log(score) ~ STR, data = CASchools)
summary(model_loglin)

# 2. Interpreting the coefficient (semi-elasticity)
cat("\n=== Interpreting the coefficient ===", "\n")
cat("Coefficient on STR:", coef(model_loglin)[2], "\n")
# 100 * beta approximates the percentage change in score per one-unit increase in STR
cat("A one-unit increase in STR changes score by approximately",
    round(coef(model_loglin)[2] * 100, 2), "%\n")

# 3. Comparison with the original model
model_linear <- lm(score ~ STR, data = CASchools)

par(mfrow = c(1, 2))
# Linear model
plot(CASchools$STR, CASchools$score,
     main = "Linear model: score ~ STR",
     xlab = "STR", ylab = "Score", pch = 19, col = rgb(0, 0, 1, 0.5))
abline(model_linear, col = "red", lwd = 2)

# Log-linear model
plot(CASchools$STR, log(CASchools$score),
     main = "Log-linear: log(score) ~ STR",
     xlab = "STR", ylab = "log(Score)", pch = 19, col = rgb(0, 0, 1, 0.5))
abline(model_loglin, col = "red", lwd = 2)
par(mfrow = c(1, 1))

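The 100 × β reading in step 2 is a first-order approximation. It is accurate only when β × ΔX is small. The exact percentage change implied by a log-linear model is 100 × (exp(β × ΔX) - 1). The minimal sketch below compares the two. The coefficient values passed in are hypothetical and are not estimates from CASchools.

```r
# Exact vs. approximate percentage change in Y implied by log(Y) = b0 + b1 * X
pct_change_approx <- function(b1, dx = 1) 100 * b1 * dx
pct_change_exact  <- function(b1, dx = 1) 100 * (exp(b1 * dx) - 1)

# Hypothetical coefficients: the approximation deteriorates as b1 * dx grows
b_values <- c(-0.0035, 0.05, 0.2)
pct_table <- data.frame(
  b1     = b_values,
  approx = pct_change_approx(b_values),
  exact  = pct_change_exact(b_values)
)
print(pct_table, digits = 4)
```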
---
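## Problem 3: Log-Log Model

Estimate a log-log model, whose slope coefficient is an elasticity.

1. Estimate the model log(score) ~ log(income)
2. Interpret the elasticity
3. Draw a scatter plot with the regression line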

In [None]:
# Your answer to Problem 3



### Sample solution

In [None]:
# Sample solution
# 1. Log-log model
model_loglog <- lm(log(score) ~ log(income), data = CASchools)
summary(model_loglog)

# 2. Interpreting the elasticity
cat("\n=== Interpreting the elasticity ===", "\n")
cat("Coefficient on log(income) (elasticity):", coef(model_loglog)[2], "\n")
cat("A 1% increase in income changes score by approximately",
    round(coef(model_loglog)[2], 3), "%\n")

# 3. Scatter plots
par(mfrow = c(1, 2))
# Original scale
plot(CASchools$income, CASchools$score,
     main = "Original scale",
     xlab = "Income", ylab = "Score",
     pch = 19, col = rgb(0, 0, 1, 0.5))

# Log scale, with the fitted log-log regression line
plot(log(CASchools$income), log(CASchools$score),
     main = "Log scale",
     xlab = "log(Income)", ylab = "log(Score)",
     pch = 19, col = rgb(0, 0, 1, 0.5))
abline(model_loglog, col = "red", lwd = 2)
par(mfrow = c(1, 1))

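The sketch below shows why the log-log slope is an elasticity. It simulates data from a constant-elasticity relationship, Y = 600 × X^0.5 × e^u. This data-generating process is made up for illustration. The sketch then checks that lm(log(y) ~ log(x)) recovers the exponent. It also puts the exact effect of a 10% increase in X, 100 × (1.1^β - 1), next to the approximation 10 × β.

```r
# Simulated constant-elasticity data (assumption): y = 600 * x^0.5 * exp(u)
set.seed(42)
n <- 2000
x <- runif(n, 5, 55)                 # an income-like regressor
u <- rnorm(n, sd = 0.1)
y <- 600 * x^0.5 * exp(u)

fit_loglog <- lm(log(y) ~ log(x))
elasticity <- unname(coef(fit_loglog)[2])
cat("Estimated elasticity:", round(elasticity, 3), "\n")

# Effect of a 10% increase in x on y: exact vs. approximation
cat("Exact % change: ", round(100 * (1.1^elasticity - 1), 3), "\n")
cat("Approx % change:", round(10 * elasticity, 3), "\n")
```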
---
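## Problem 4: Interaction Terms

Check whether the effect of STR depends on the level of english.

1. Estimate the interaction model score ~ STR * HiEL (HiEL: a dummy equal to 1 if english > 10)
2. Interpret each coefficient
3. Compute the effect of STR separately for districts with a low and a high share of English learners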

In [None]:
# Your answer to Problem 4



### Sample solution

In [None]:
# Sample solution
# Create the dummy variable: HiEL = 1 if the share of English learners exceeds 10%
CASchools$HiEL <- ifelse(CASchools$english > 10, 1, 0)

# 1. Interaction model (STR * HiEL expands to STR + HiEL + STR:HiEL)
model_inter <- lm(score ~ STR * HiEL, data = CASchools)
summary(model_inter)

# 2. Interpreting the coefficients
cat("\n=== Interpreting the coefficients ===", "\n")
cat("(Intercept):", coef(model_inter)[1], "\n")
cat("  → intercept when HiEL = 0 (low share of English learners)\n\n")
cat("STR:", coef(model_inter)[2], "\n")
cat("  → effect of STR when HiEL = 0\n\n")
cat("HiEL:", coef(model_inter)[3], "\n")
cat("  → high vs. low EL difference at STR = 0 (hard to interpret: STR = 0 is far outside the data)\n\n")
cat("STR:HiEL:", coef(model_inter)[4], "\n")
cat("  → how much the effect of STR differs when HiEL = 1\n\n")

# 3. Effect of STR in each group
cat("=== Effect of STR by group ===", "\n")
effect_low <- coef(model_inter)[2]
effect_high <- coef(model_inter)[2] + coef(model_inter)[4]
cat("Effect of STR, low EL share (HiEL = 0):", effect_low, "\n")
cat("Effect of STR, high EL share (HiEL = 1):", effect_high, "\n")

# Visualization
plot(CASchools$STR, CASchools$score,
     main = "STR vs. score by HiEL",
     xlab = "STR", ylab = "Score",
     col = ifelse(CASchools$HiEL == 1, "red", "blue"),
     pch = 19)
# Group-specific regression lines
abline(a = coef(model_inter)[1], b = coef(model_inter)[2],
       col = "blue", lwd = 2)
abline(a = coef(model_inter)[1] + coef(model_inter)[3],
       b = coef(model_inter)[2] + coef(model_inter)[4],
       col = "red", lwd = 2)
legend("topright", legend = c("HiEL=0", "HiEL=1"), col = c("blue", "red"), pch = 19)

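Step 3 gives the point estimate β_STR + β_STR:HiEL for high-EL districts but not its standard error. Because the effect is a sum of two coefficients, its variance is Var(β_STR) + Var(β_STR:HiEL) + 2 Cov(β_STR, β_STR:HiEL). The minimal sketch below computes this from vcov() on simulated data, with x and d as stand-ins for STR and HiEL. It then checks the result against an equivalent trick: re-estimating with the dummy reversed, so that the x coefficient itself becomes the slope for d = 1. With the car package, `linearHypothesis(model_inter, c(0, 1, 0, 1))` should give the corresponding F-test, where F = t².

```r
# Simulated data (assumption): the slope on x differs between d = 0 and d = 1
set.seed(7)
n <- 420
x <- runif(n, 14, 26)
d <- rbinom(n, 1, 0.5)
y <- 680 - 1 * x + 20 * d - 1.5 * x * d + rnorm(n, sd = 15)

fit <- lm(y ~ x * d)
b <- coef(fit)
V <- vcov(fit)

# Slope of x when d = 1, and its standard error from the coefficient covariance matrix
effect_d1 <- unname(b["x"] + b["x:d"])
se_d1 <- sqrt(V["x", "x"] + V["x:d", "x:d"] + 2 * V["x", "x:d"])
cat("Slope for d = 1:", effect_d1, " SE:", se_d1, " t:", effect_d1 / se_d1, "\n")

# Check: with d = 1 as the reference group, the x coefficient equals that slope
d_rev <- 1 - d
fit_rev <- lm(y ~ x * d_rev)
cat("Reparameterized:", coef(fit_rev)["x"],
    " SE:", summary(fit_rev)$coefficients["x", "Std. Error"], "\n")
```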
---
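## Problem 5: Interaction Between Two Continuous Variables

Examine the interaction between STR and income.

1. Estimate the model score ~ STR * income
2. Compute the effect of STR at low income (income = 10) and at high income (income = 25); income is the district average income in thousands of dollars
3. Check whether the interaction term is significant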

In [None]:
# Your answer to Problem 5



### Sample solution

In [None]:
# Sample solution
# 1. Interaction model
model_inter_cont <- lm(score ~ STR * income, data = CASchools)
summary(model_inter_cont)

# 2. Effect of STR at different income levels
# ∂score/∂STR = β_STR + β_STR:income × income
beta_STR <- coef(model_inter_cont)["STR"]
beta_inter <- coef(model_inter_cont)["STR:income"]

income_low <- 10    # income is measured in thousands of dollars
income_high <- 25

effect_low_income <- beta_STR + beta_inter * income_low
effect_high_income <- beta_STR + beta_inter * income_high

cat("\n=== Effect of STR by income level ===", "\n")
cat("Effect of STR at income = 10:", effect_low_income, "\n")
cat("Effect of STR at income = 25:", effect_high_income, "\n")

# 3. Significance of the interaction term
cat("\np-value of the interaction term:",
    summary(model_inter_cont)$coefficients["STR:income", "Pr(>|t|)"], "\n")

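Centering the moderator is an equivalent way to get the effect of STR at a chosen income level together with its standard error. If z_c = z - z0, then in y ~ x * z_c the coefficient on x is exactly β_x + β_xz × z0. The minimal sketch below uses simulated data, with x and z as stand-ins for STR and income. It checks that the centered coefficient matches the formula used in step 2 at z0 = 25.

```r
# Simulated data (assumption): the effect of x depends linearly on z
set.seed(123)
n <- 420
x <- runif(n, 14, 26)
z <- runif(n, 5, 55)
y <- 650 - 2 * x + 2 * z + 0.03 * x * z + rnorm(n, sd = 10)

fit <- lm(y ~ x * z)
z0 <- 25

# Effect of x at z = z0 from the uncentered model: b_x + b_xz * z0
effect_formula <- unname(coef(fit)["x"] + coef(fit)["x:z"] * z0)

# Same quantity as a single coefficient after centering z at z0
z_c <- z - z0
fit_c <- lm(y ~ x * z_c)
effect_centered <- unname(coef(fit_c)["x"])
se_centered <- summary(fit_c)$coefficients["x", "Std. Error"]

cat("Formula:", effect_formula, " Centered:", effect_centered,
    " SE:", se_centered, "\n")
```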
---
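## Problem 6: Cubic Polynomial Regression

Consider a more flexible functional form.

1. Estimate linear, quadratic, and cubic polynomial models
2. Compare the models' AIC
3. Choose the best model and explain your choice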

In [None]:
# Your answer to Problem 6



### Sample solution

In [None]:
# Sample solution
# 1. Estimate the polynomial models
model_1 <- lm(score ~ STR, data = CASchools)
model_2 <- lm(score ~ STR + I(STR^2), data = CASchools)
model_3 <- lm(score ~ STR + I(STR^2) + I(STR^3), data = CASchools)

# 2. Model comparison
cat("=== Model comparison ===", "\n")
comparison <- data.frame(
  Model = c("Linear", "Quadratic", "Cubic"),
  R2 = c(summary(model_1)$r.squared, summary(model_2)$r.squared,
         summary(model_3)$r.squared),
  Adj_R2 = c(summary(model_1)$adj.r.squared, summary(model_2)$adj.r.squared,
             summary(model_3)$adj.r.squared),
  AIC = c(AIC(model_1), AIC(model_2), AIC(model_3)),
  BIC = c(BIC(model_1), BIC(model_2), BIC(model_3))
)
# round() fails on a data frame with a character column, so round only the numeric columns
comparison[, -1] <- round(comparison[, -1], 4)
print(comparison)

# F-tests for the nested models
cat("\n=== F-tests ===", "\n")
cat("Linear vs. quadratic:\n")
print(anova(model_1, model_2))
cat("\nQuadratic vs. cubic:\n")
print(anova(model_2, model_3))

# 3. Model choice: lower AIC/BIC is preferred; weigh this together with the F-tests
cat("\nLowest AIC:", comparison$Model[which.min(comparison$AIC)], "\n")
cat("Lowest BIC:", comparison$Model[which.min(comparison$BIC)], "\n")

# Visualization
STR_range <- seq(min(CASchools$STR), max(CASchools$STR), length.out = 100)
plot(CASchools$STR, CASchools$score, pch = 19, col = rgb(0, 0, 0, 0.3),
     main = "Comparison of polynomial regressions", xlab = "STR", ylab = "Score")
lines(STR_range, predict(model_1, data.frame(STR = STR_range)), col = "blue", lwd = 2)
lines(STR_range, predict(model_2, data.frame(STR = STR_range)), col = "red", lwd = 2)
lines(STR_range, predict(model_3, data.frame(STR = STR_range)), col = "green", lwd = 2)
legend("topright", legend = c("Linear", "Quadratic", "Cubic"), col = c("blue", "red", "green"), lwd = 2)

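AIC and BIC trade off fit against the number of parameters: AIC = -2 log L + 2k and BIC = -2 log L + k log(n). For lm() fits, R counts the error variance as a parameter, so k is the number of coefficients plus one. The sketch below rebuilds both criteria from the Gaussian log-likelihood on simulated data, whose quadratic data-generating process is assumed for illustration. It then compares them with AIC() and BIC().

```r
# Simulated data (assumption): the true relationship is quadratic
set.seed(2024)
n <- 420
x <- runif(n, 14, 26)
y <- 700 - 6 * x + 0.12 * x^2 + rnorm(n, sd = 18)

fits <- list(
  linear    = lm(y ~ x),
  quadratic = lm(y ~ x + I(x^2)),
  cubic     = lm(y ~ x + I(x^2) + I(x^3))
)

# Gaussian log-likelihood evaluated at the MLE sigma^2_hat = RSS / n
manual_ic <- function(fit) {
  n_obs <- length(resid(fit))
  rss <- sum(resid(fit)^2)
  loglik <- -n_obs / 2 * (log(2 * pi) + log(rss / n_obs) + 1)
  k <- length(coef(fit)) + 1            # coefficients + error variance
  c(AIC = -2 * loglik + 2 * k, BIC = -2 * loglik + k * log(n_obs))
}

ic_table <- t(sapply(fits, manual_ic))
ic_table <- cbind(ic_table, AIC_R = sapply(fits, AIC), BIC_R = sapply(fits, BIC))
print(round(ic_table, 3))
```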
---
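## Problem 7: Computing Marginal Effects

Compute marginal effects in the quadratic model.

1. Estimate the model score ~ STR + I(STR^2)
2. Compute the marginal effect of STR, ∂score/∂STR = β₁ + 2β₂×STR
3. Compute the marginal effect at STR = 15, 20, and 25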

In [None]:
# Your answer to Problem 7



### Sample solution

In [None]:
# Sample solution
# 1. Estimate the model
model_quad <- lm(score ~ STR + I(STR^2), data = CASchools)
beta1 <- coef(model_quad)[2]
beta2 <- coef(model_quad)[3]

cat("Coefficients:\n")
cat("β₁ (STR) =", beta1, "\n")
cat("β₂ (STR²) =", beta2, "\n\n")

# 2-3. Marginal effects: ∂score/∂STR = β₁ + 2β₂ × STR (uses beta1 and beta2 from above)
marginal_effect <- function(STR) {
  beta1 + 2 * beta2 * STR
}

STR_values <- c(15, 20, 25)
cat("=== Marginal effect ∂score/∂STR ===", "\n")
for (s in STR_values) {
  cat("STR =", s, ": marginal effect =", marginal_effect(s), "\n")
}

# Marginal effect at the sample mean of STR
mean_STR <- mean(CASchools$STR)
cat("\nMarginal effect at mean STR (", round(mean_STR, 2), "):",
    marginal_effect(mean_STR), "\n")

# Plot of the marginal effect
STR_range <- seq(14, 26, length.out = 100)
plot(STR_range, marginal_effect(STR_range), type = "l",
     main = "Marginal effect of STR",
     xlab = "STR", ylab = "∂score/∂STR",
     col = "blue", lwd = 2)
abline(h = 0, col = "red", lty = 2)

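The marginal effect β₁ + 2β₂ × STR is a linear combination of coefficients. Its standard error therefore follows from Var(β₁) + 4 STR² Var(β₂) + 4 STR Cov(β₁, β₂). The sketch below computes it at 15, 20, and 25 on simulated data, with an assumed data-generating process where x stands in for STR. It cross-checks one value by re-centering the polynomial at x0, where the linear coefficient equals the marginal effect at x0.

```r
# Simulated data (assumption): quadratic relationship
set.seed(99)
n <- 420
x <- runif(n, 14, 26)
y <- 720 - 8 * x + 0.15 * x^2 + rnorm(n, sd = 18)

fit <- lm(y ~ x + I(x^2))
b <- coef(fit)
V <- vcov(fit)

# Marginal effect dy/dx = b1 + 2 * b2 * x0 and its standard error (vectorized in x0)
me <- function(x0) unname(b["x"] + 2 * b["I(x^2)"] * x0)
me_se <- function(x0) {
  sqrt(V["x", "x"] + 4 * x0^2 * V["I(x^2)", "I(x^2)"] + 4 * x0 * V["x", "I(x^2)"])
}

x_values <- c(15, 20, 25)
print(data.frame(x = x_values, effect = me(x_values), se = me_se(x_values)))

# Cross-check at x0 = 20: after centering, the coefficient on (x - x0) is the marginal effect
x0 <- 20
xc <- x - x0
fit_c <- lm(y ~ xc + I(xc^2))
cat("Centered fit:", coef(fit_c)["xc"],
    " SE:", summary(fit_c)$coefficients["xc", "Std. Error"], "\n")
```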
---
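## Problem 8: Choosing a Functional Form

Compare several functional forms and choose the most appropriate one.

1. Estimate the following models
   - A: score ~ income
   - B: score ~ log(income)
   - C: log(score) ~ income
   - D: log(score) ~ log(income)

2. Check whether each functional form is appropriate using residual plots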

In [None]:
# Your answer to Problem 8



### Sample solution

In [None]:
# Sample solution
# 1. Estimate the models
modelA <- lm(score ~ income, data = CASchools)
modelB <- lm(score ~ log(income), data = CASchools)
modelC <- lm(log(score) ~ income, data = CASchools)
modelD <- lm(log(score) ~ log(income), data = CASchools)

# Compare R² (caution: R² is not directly comparable across different dependent variables)
cat("=== Comparing R² ===", "\n")
cat("Model A (score ~ income): R² =", summary(modelA)$r.squared, "\n")
cat("Model B (score ~ log(income)): R² =", summary(modelB)$r.squared, "\n")
cat("Model C (log(score) ~ income): R² =", summary(modelC)$r.squared, "\n")
cat("Model D (log(score) ~ log(income)): R² =", summary(modelD)$r.squared, "\n\n")
cat("Note: R² cannot be compared directly between models with different dependent variables (score vs. log(score))\n\n")

# 2. Residual plots (fitted values vs. residuals)
par(mfrow = c(2, 2))
plot(fitted(modelA), resid(modelA), main = "A: score ~ income",
     xlab = "Fitted", ylab = "Residuals", pch = 19, col = rgb(0, 0, 1, 0.5))
abline(h = 0, col = "red")

plot(fitted(modelB), resid(modelB), main = "B: score ~ log(income)",
     xlab = "Fitted", ylab = "Residuals", pch = 19, col = rgb(0, 0, 1, 0.5))
abline(h = 0, col = "red")

plot(fitted(modelC), resid(modelC), main = "C: log(score) ~ income",
     xlab = "Fitted", ylab = "Residuals", pch = 19, col = rgb(0, 0, 1, 0.5))
abline(h = 0, col = "red")

plot(fitted(modelD), resid(modelD), main = "D: log(score) ~ log(income)",
     xlab = "Fitted", ylab = "Residuals", pch = 19, col = rgb(0, 0, 1, 0.5))
abline(h = 0, col = "red")
par(mfrow = c(1, 1))

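One common heuristic puts models C and D on the same footing as A and B. It moves the log models' predictions back to the score scale with exp() and takes the squared correlation between score and those predictions. For a linear model with an intercept, this squared correlation equals the usual R². The heuristic is not a formal test, and it ignores the retransformation bias of exp(). The sketch below uses simulated data from an assumed process, with y and z as stand-ins for score and income.

```r
# Simulated data (assumption): y rises with log(z)
set.seed(11)
n <- 420
z <- runif(n, 5, 55)
y <- 560 + 35 * log(z) + rnorm(n, sd = 12)

fits <- list(
  A = lm(y ~ z),
  B = lm(y ~ log(z)),
  C = lm(log(y) ~ z),
  D = lm(log(y) ~ log(z))
)

# Predictions on the original y scale: exp() back-transform for the log-y models
pred_y <- list(
  A = fitted(fits$A),
  B = fitted(fits$B),
  C = exp(fitted(fits$C)),
  D = exp(fitted(fits$D))
)

# Squared correlation between y and its prediction, computed on the same scale for all four models
r2_y_scale <- sapply(pred_y, function(p) cor(y, p)^2)
print(round(r2_y_scale, 4))
```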
---
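## Problem 9: Spline Regression

Estimate a piecewise linear model (linear spline).

1. Estimate a piecewise linear model with a knot at STR = 20
2. Model: score = β₀ + β₁STR + β₂(STR - 20)·1(STR > 20) + u, where (STR - 20)·1(STR > 20) = max(STR - 20, 0); in R this is score ~ STR + pmax(STR - 20, 0)
3. Visualize the results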

In [None]:
# Your answer to Problem 9



### Sample solution

In [None]:
# Sample solution
# 1. Create the hinge (spline) variable: max(STR - 20, 0)
knot <- 20
CASchools$STR_spline <- pmax(CASchools$STR - knot, 0)

# 2. Piecewise linear model
model_spline <- lm(score ~ STR + STR_spline, data = CASchools)
summary(model_spline)

# Interpreting the coefficients
cat("\n=== Interpreting the coefficients ===", "\n")
cat("Slope for STR <= 20:", coef(model_spline)[2], "\n")
cat("Slope for STR > 20:", coef(model_spline)[2] + coef(model_spline)[3], "\n")

# 3. Visualization
plot(CASchools$STR, CASchools$score, pch = 19, col = rgb(0, 0, 1, 0.5),
     main = "Piecewise linear model (knot = 20)",
     xlab = "STR", ylab = "Score")

# Fitted line (the hinge variable must be rebuilt for the new STR values)
STR_range <- seq(min(CASchools$STR), max(CASchools$STR), length.out = 100)
new_data <- data.frame(STR = STR_range, STR_spline = pmax(STR_range - knot, 0))
pred <- predict(model_spline, newdata = new_data)
lines(STR_range, pred, col = "red", lwd = 2)
abline(v = knot, col = "gray", lty = 2)

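The hinge term can also be written inside the formula instead of being stored as a separate STR_spline column. predict() then rebuilds it automatically for new data. The minimal sketch below uses simulated data from an assumed process, with x as a stand-in for STR and the knot at 20. It also checks that the fitted line is continuous at the knot, with slope b1 to the left and b1 + b2 to the right.

```r
# Simulated data (assumption): slope changes from -1 to -4 at x = 20
set.seed(5)
n <- 420
knot <- 20
x <- runif(n, 14, 26)
y <- 680 - 1 * x - 3 * pmax(x - knot, 0) + rnorm(n, sd = 10)

# Hinge term written directly in the formula
fit <- lm(y ~ x + pmax(x - knot, 0))
b <- unname(coef(fit))

# predict() re-evaluates pmax(x - knot, 0) for the new x values
grid <- data.frame(x = c(knot - 1, knot, knot + 1))
p <- unname(predict(fit, newdata = grid))
cat("Slope left of knot: ", p[2] - p[1], " (b1 =", b[2], ")\n")
cat("Slope right of knot:", p[3] - p[2], " (b1 + b2 =", b[2] + b[3], ")\n")
```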
---
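## Problem 10: Prediction with a Nonlinear Model

Compute predictions and prediction intervals for a nonlinear model.

1. Estimate the quadratic model score ~ STR + I(STR^2) + english
2. Predict the score for a district with STR = 18 and english = 15
3. Compute the 95% prediction interval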

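A prediction interval for a single new observation is wider than a confidence interval for the conditional mean, because it also includes the error variance: ŷ ± t(0.975, n - k) × sqrt(SE(ŷ)² + σ̂²). The sketch below rebuilds both intervals from `predict(..., se.fit = TRUE)`. It then compares them with `interval = "confidence"` and `interval = "prediction"`. The data are simulated from an assumed process, with x and w as stand-ins for STR and english.

```r
# Simulated data (assumption): quadratic in x plus a linear control w
set.seed(8)
n <- 420
x <- runif(n, 14, 26)
w <- runif(n, 0, 80)
y <- 720 - 6 * x + 0.1 * x^2 - 0.6 * w + rnorm(n, sd = 14)

fit <- lm(y ~ x + I(x^2) + w)
new_point <- data.frame(x = 18, w = 15)

p <- predict(fit, newdata = new_point, se.fit = TRUE)
t_crit <- qt(0.975, df = p$df)          # residual degrees of freedom, n - number of coefficients
sigma_hat <- p$residual.scale           # residual standard error

ci_manual <- p$fit + c(-1, 1) * t_crit * p$se.fit
pi_manual <- p$fit + c(-1, 1) * t_crit * sqrt(p$se.fit^2 + sigma_hat^2)

ci_R <- predict(fit, newdata = new_point, interval = "confidence")
pi_R <- predict(fit, newdata = new_point, interval = "prediction")
print(rbind(manual_CI = ci_manual, R_CI = ci_R[1, c("lwr", "upr")],
            manual_PI = pi_manual, R_PI = pi_R[1, c("lwr", "upr")]))
```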
In [None]:
# Your answer to Problem 10



### Sample solution

In [None]:
# Sample solution
# 1. Estimate the model
model_full <- lm(score ~ STR + I(STR^2) + english, data = CASchools)
summary(model_full)

# 2-3. Prediction and intervals
new_data <- data.frame(STR = 18, english = 15)

# Point prediction
point_pred <- predict(model_full, newdata = new_data)
cat("\n=== Prediction ===", "\n")
cat("Point prediction:", point_pred, "\n")

# Confidence interval (for the conditional mean)
conf_int <- predict(model_full, newdata = new_data, interval = "confidence")
cat("\n95% confidence interval for the mean:\n")
print(conf_int)

# Prediction interval (for an individual district)
pred_int <- predict(model_full, newdata = new_data, interval = "prediction")
cat("\n95% prediction interval for an individual district:\n")
print(pred_int)

# Visualization (predictions as STR varies, holding english = 15)
STR_range <- seq(14, 26, length.out = 100)
new_data_range <- data.frame(STR = STR_range, english = 15)
pred_bands <- predict(model_full, newdata = new_data_range, interval = "prediction")

plot(CASchools$STR, CASchools$score, pch = 19, col = rgb(0, 0, 0, 0.2),
     main = "Predictions and 95% prediction interval (english = 15)",
     xlab = "STR", ylab = "Score")
lines(STR_range, pred_bands[, "fit"], col = "blue", lwd = 2)
lines(STR_range, pred_bands[, "lwr"], col = "red", lty = 2)
lines(STR_range, pred_bands[, "upr"], col = "red", lty = 2)
points(18, point_pred, col = "green", pch = 19, cex = 2)
legend("topright", legend = c("Predicted value", "95% prediction interval", "New prediction point"),
       col = c("blue", "red", "green"), lty = c(1, 2, NA), pch = c(NA, NA, 19))