# Multiple Linear Regression




### Air Pollution Data Preparation for Modeling

In [13]:
# Read the whitespace-separated air-pollution data; the first row holds column names
my_data <- read.table("zanieczyszczenia.txt", header = TRUE)
head(my_data)


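| | month | day | hour | PM2.5 | PM10 | SO2 | NO2 | CO | O3 | TEMP |
|---|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` |
| 1 | 3 | 1 | 12 | 8 | 8 | 3 | 12 | 300 | 64 | 2.8 |
| 2 | 3 | 2 | 12 | 19 | 22 | 17 | 45 | 600 | 37 | 2.1 |
| 3 | 3 | 3 | 12 | 95 | 106 | 60 | 73 | 1700 | 28 | 11.8 |
| 4 | 3 | 4 | 12 | 3 | 7 | 5 | 15 | 500 | 83 | 13.4 |
| 5 | 3 | 5 | 12 | 127 | 147 | 76 | 68 | 1800 | 33 | 12.3 |
| 6 | 3 | 6 | 12 | 160 | 174 | 103 | 87 | 2700 | 61 | 9.7 |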


In [14]:
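library(dplyr)  # provides the %>% pipe and select()

dane <- na.omit(my_data)          # drop rows with any missing value
dane2 <- dane %>% 
  select(-month, -day, -hour)     # keep only the pollutant and temperature columns
head(dane2)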

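| | PM2.5 | PM10 | SO2 | NO2 | CO | O3 | TEMP |
|---|---|---|---|---|---|---|---|
| | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` |
| 1 | 8 | 8 | 3 | 12 | 300 | 64 | 2.8 |
| 2 | 19 | 22 | 17 | 45 | 600 | 37 | 2.1 |
| 3 | 95 | 106 | 60 | 73 | 1700 | 28 | 11.8 |
| 4 | 3 | 7 | 5 | 15 | 500 | 83 | 13.4 |
| 5 | 127 | 147 | 76 | 68 | 1800 | 33 | 12.3 |
| 6 | 160 | 174 | 103 | 87 | 2700 | 61 | 9.7 |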


### Linear Regression Model for Air Temperature Prediction

This step computes the correlation matrix of the air-pollution variables and uses it to choose the predictors for a linear regression model of air temperature (TEMP). SO2, NO2, CO and O3 are retained: their correlations with TEMP range from weak and negative (CO, r ≈ -0.19; NO2, r ≈ -0.24; SO2, r ≈ -0.29) to fairly strong and positive (O3, r ≈ 0.68). PM2.5 and PM10 are excluded because they are very strongly correlated with each other (r ≈ 0.93) and with CO (r ≈ 0.77 to 0.81), so they would mainly add collinearity, while being almost uncorrelated with TEMP (|r| < 0.03). The model is then fitted to assess how the retained predictors relate to air temperature, and its fit is evaluated through the coefficient estimates, the Residual Sum of Squares (RSS), the Residual Standard Error (RSE) and the coefficient of determination (R²).
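The screening rule described above can be sketched in R. This is a minimal sketch, assuming the correlation values printed by the next cell (typed in here, rounded to four decimals) and an illustrative cutoff of |r| > 0.9 for "highly correlated"; the cutoff is our assumption, not a value stated in the original analysis.

```r
# Correlations among the pollutants and TEMP, typed in from the printed
# matrix (rounded to 4 decimals).
vars <- c("PM2.5", "PM10", "SO2", "NO2", "CO", "O3", "TEMP")
r <- matrix(c(
   1.0000,  0.9292,  0.5427,  0.7414,  0.8081, -0.0950, -0.0287,
   0.9292,  1.0000,  0.5597,  0.7452,  0.7694, -0.1013, -0.0074,
   0.5427,  0.5597,  1.0000,  0.6574,  0.6038, -0.2956, -0.2866,
   0.7414,  0.7452,  0.6574,  1.0000,  0.7596, -0.4359, -0.2379,
   0.8081,  0.7694,  0.6038,  0.7596,  1.0000, -0.2961, -0.1948,
  -0.0950, -0.1013, -0.2956, -0.4359, -0.2961,  1.0000,  0.6812,
  -0.0287, -0.0074, -0.2866, -0.2379, -0.1948,  0.6812,  1.0000
), nrow = 7, byrow = TRUE, dimnames = list(vars, vars))

predictors <- setdiff(vars, "TEMP")
cutoff <- 0.9  # illustrative threshold for "highly correlated" (assumption)

# Predictor pairs whose absolute correlation exceeds the cutoff
# (upper triangle only, so each pair is listed once and the diagonal is skipped)
pr <- r[predictors, predictors]
idx <- which(abs(pr) > cutoff & upper.tri(pr), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(pr)[idx[, "row"]],
  var2 = colnames(pr)[idx[, "col"]],
  r    = pr[idx]
)
print(high_pairs)

# Rank the candidate predictors by the strength of their correlation with TEMP
temp_strength <- sort(abs(r[predictors, "TEMP"]), decreasing = TRUE)
print(temp_strength)
```

Only the PM2.5/PM10 pair passes the cutoff, and the two PM variables also come last in the ranking against TEMP, which is consistent with dropping them.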

In [15]:
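# Correlation matrix (macierz_korelacji) of all remaining variables
macierz_korelacji <- cor(dane2, use = "complete.obs")
print(macierz_korelacji)

# Keep the selected predictors and the response
dane_model <- dane2 %>%
  select(SO2, NO2, CO, O3, TEMP)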


| | PM2.5 | PM10 | SO2 | NO2 | CO | O3 | TEMP |
|---|---|---|---|---|---|---|---|
| PM2.5 | 1.00000000 | 0.929209653 | 0.5427403 | 0.7414345 | 0.8080544 | -0.09498647 | -0.028707098 |
| PM10 | 0.92920965 | 1.000000000 | 0.5596753 | 0.7451827 | 0.7693860 | -0.10128914 | -0.007391401 |
| SO2 | 0.54274034 | 0.559675287 | 1.0000000 | 0.6573867 | 0.6038148 | -0.29558754 | -0.286556128 |
| NO2 | 0.74143450 | 0.745182665 | 0.6573867 | 1.0000000 | 0.7596070 | -0.43586431 | -0.237908000 |
| CO | 0.80805436 | 0.769386013 | 0.6038148 | 0.7596070 | 1.0000000 | -0.29613393 | -0.194807142 |
| O3 | -0.09498647 | -0.101289135 | -0.2955875 | -0.4358643 | -0.2961339 | 1.00000000 | 0.681249890 |
| TEMP | -0.02870710 | -0.007391401 | -0.2865561 | -0.2379080 | -0.1948071 | 0.68124989 | 1.000000000 |
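Dropping PM2.5 and PM10 does not remove all collinearity: NO2 and CO still correlate at about 0.76. One way to quantify what remains, not part of the original analysis, is the variance inflation factor VIF_j = 1 / (1 - R_j²), where R_j² is the squared multiple correlation of predictor j with the other predictors. VIF_j also equals the j-th diagonal element of the inverse correlation matrix of the predictors, so it can be computed from the printed values alone. This is a minimal sketch on the rounded correlations; `car::vif()` applied to the fitted model is a common packaged alternative.

```r
# Correlations among the four retained predictors, typed in from the
# printed matrix (rounded to 4 decimals).
preds <- c("SO2", "NO2", "CO", "O3")
rp <- matrix(c(
   1.0000,  0.6574,  0.6038, -0.2956,
   0.6574,  1.0000,  0.7596, -0.4359,
   0.6038,  0.7596,  1.0000, -0.2961,
  -0.2956, -0.4359, -0.2961,  1.0000
), nrow = 4, byrow = TRUE, dimnames = list(preds, preds))

# VIF_j is the j-th diagonal element of the inverse correlation matrix
vif <- diag(solve(rp))
print(round(vif, 2))

# Same quantity from the definition VIF_j = 1 / (1 - R_j^2), where
# R_j^2 = r_j' R_{-j}^{-1} r_j uses the correlations of predictor j with
# the others (r_j) and the correlations among the others (R_{-j})
r2_j <- sapply(preds, function(j) {
  others <- setdiff(preds, j)
  rj <- rp[others, j]
  drop(t(rj) %*% solve(rp[others, others]) %*% rj)
})
print(round(1 / (1 - r2_j), 2))
```

For exact values, the same call can be run on the unrounded matrix, e.g. `diag(solve(macierz_korelacji[preds, preds]))`.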


In [16]:
# Fit TEMP on the four selected predictors and store the model summary
model_2 <- lm(TEMP ~ SO2 + NO2 + CO + O3, data = dane_model)
sum_model <- summary(model_2)


In [20]:

# Coefficient table (wspolczynniki): estimates, standard errors, t values, p-values
wspolczynniki <- sum_model$coefficients
print(wspolczynniki)

# Calculate metrics for the linear regression model
rss <- sum(sum_model$residuals^2)  # Residual Sum of Squares
rse <- sum_model$sigma            # Residual Standard Error
r2 <- sum_model$r.squared         # R-squared

# Display the metrics in a well-formatted table
model_metrics <- data.frame(
  Metric = c("Residual Sum of Squares (RSS)", 
             "Residual Standard Error (RSE)", 
             "R-squared (R²)"),
  Value = round(c(rss, rse, r2), 4)
)

print(model_metrics)



| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 2.5948689246 | 0.6359765607 | 4.080133 | 4.781307e-05 |
| SO2 | -0.1041748470 | 0.0133928542 | -7.778390 | 1.510629e-14 |
| NO2 | 0.1027068109 | 0.0148171406 | 6.931622 | 6.599259e-12 |
| CO | -0.0004575303 | 0.0003573789 | -1.280239 | 2.006948e-01 |
| O3 | 0.1641486729 | 0.0051090617 | 32.128928 | 2.742992e-166 |

| | Metric | Value |
|---|---|---|
| 1 | Residual Sum of Squares (RSS) | 79926.923 |
| 2 | Residual Standard Error (RSE) | 7.930 |
| 3 | R-squared (R²) | 0.496 |
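The three metrics follow standard definitions: RSS is the sum of squared residuals, RSE = √(RSS / (n − p − 1)) for n observations and p predictors, and R² = 1 − RSS/TSS, where TSS is the total sum of squares of the response around its mean. As a consistency check on the table above, RSS / RSE² = 79926.923 / 7.930² ≈ 1271, which should equal the residual degrees of freedom n − p − 1, implying roughly 1276 complete observations. The sketch below recomputes the metrics by hand and compares them with `summary()`. It uses simulated data (an assumption, since the pollution file is not reproduced here), so its numbers are unrelated to the table above.

```r
set.seed(1)  # simulated data for illustration only
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - 0.5 * sim$x2 + rnorm(n, sd = 1.5)

fit <- lm(y ~ x1 + x2 + x3, data = sim)
s <- summary(fit)
p <- length(coef(fit)) - 1                # number of predictors (intercept excluded)

res <- residuals(fit)
rss <- sum(res^2)                         # residual sum of squares
rse <- sqrt(rss / (n - p - 1))            # residual standard error
tss <- sum((sim$y - mean(sim$y))^2)       # total sum of squares
r2  <- 1 - rss / tss                      # coefficient of determination

# Hand-computed values next to the ones reported by summary()
print(c(rss = rss, rse = rse, rse_summary = s$sigma,
        r2 = r2, r2_summary = s$r.squared))
```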


### Predicting Temperature Using a Linear Regression Model

In [22]:
# New predictor values for which a temperature prediction is wanted
predictions <- data.frame(
  SO2 = 25,
  NO2 = 90,
  CO = 2000,
  O3 = 50
)

predicted_temp <- predict(model_2, newdata = predictions)

print(paste("Predicted Temperature:", round(predicted_temp, 2)))


[1] "Predicted Temperature: 16.53"
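The prediction is the fitted linear combination ŷ = b₀ + b₁·SO2 + b₂·NO2 + b₃·CO + b₄·O3. Plugging the printed estimates in by hand reproduces the value above, which is a quick sanity check on `predict()`. This sketch uses only the coefficients and inputs shown in this notebook.

```r
# Coefficient estimates copied from the model output above
b <- c("(Intercept)" = 2.5948689246, SO2 = -0.1041748470,
       NO2 = 0.1027068109, CO = -0.0004575303, O3 = 0.1641486729)

# New observation (same values as the `predictions` data frame)
x <- c(SO2 = 25, NO2 = 90, CO = 2000, O3 = 50)

# Intercept plus the sum of coefficient * value over the predictors
y_hat <- b[["(Intercept)"]] + sum(b[names(x)] * x)
print(round(y_hat, 2))  # prints [1] 16.53
```

To attach uncertainty to this point estimate, `predict(model_2, newdata = predictions, interval = "prediction")` returns the fit together with lower and upper prediction bounds (95% by default).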
