-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement theilSen()
#4
Comments
Hi @jack-davison and @davidcarslaw , I have done some comparison with two packages I have found that allow for theilSen calculations: library(mblm)
library(openair)
library(ggplot2)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
mydata <- openair::mydata
# define the function
sen <- function(..., weights = NULL) {
#RobustLinearReg::theil_sen_regression(...)
mblm::mblm(..., repeated = FALSE)
}
p1 <- mydata |>
openair::timeAverage(avg.time = "month") |>
ggplot2::ggplot(aes(date, nox)) +
ggplot2::geom_point() +
ggplot2::geom_line() +
ggplot2::geom_smooth(method = sen) +
ggplot2::theme_minimal()
p1
#> `geom_smooth()` using formula = 'y ~ x' # Do the openair plot:
theilsen_data <- TheilSen(mydata,
pollutant = "nox",
avg.time = "month",
statistic = "mean",
deseason = FALSE)
#> Taking bootstrap samples. Please wait. # Combine the data into one plot
p2 <- p1 +
geom_abline(slope = theilsen_data$data$res2$slope/(60*60*24*365.2422),
intercept = theilsen_data$data$res2$intercept,
color = "red",
linewidth = 1) +
geom_abline(slope = theilsen_data$data$res2$lower/(60*60*24*365.2422),
intercept = theilsen_data$data$res2$intercept.lower,
color = "red",
linewidth = 0.5,
linetype = "dashed") +
geom_abline(slope = theilsen_data$data$res2$upper/(60*60*24*365.2422),
intercept = theilsen_data$data$res2$intercept.upper,
color = "red",
linewidth = 0.5,
linetype = "dashed")
p2
#> `geom_smooth()` using formula = 'y ~ x' ###############################################
# check the outcomes of slope and intercepts:
###############################################
# Calculate the model directly
mblm_data <- mydata |>
dplyr::select(date, nox) |>
openair::timeAverage(avg.time = "month") |>
na.omit() |> # needed as mblm cannot handle NA
dplyr::mutate(date = as.numeric(date)) # get the seconds
model <- mblm::mblm(formula = nox~date,
mblm_data,
repeated = FALSE)
# slope from theilSen openair:
theilsen_data$data$res2$slope
#> [1] -8.44159
# slope from mblm
model$coefficients[[2]] * (60*60*24*365.2422)
#> [1] -8.447192
# confidence intervals of the slope
c(theilsen_data$data$res2$lower, theilsen_data$data$res2$upper)
#> [1] -11.671282 -5.439937
mblm::confint.mblm(model)[2,] * (60*60*24*365.2422)
#> 0.025 0.975
#> -1325.990 1364.582
confint.default(model)[2,] * (60*60*24*365.2422)
#> 2.5 % 97.5 %
#> -11.95284 -4.94154
# The latter looks more like the ones from theilSen() in openair. The first
# CI's are weird to say the least.
# intercept from theilSen openair:
theilsen_data$data$res2$intercept
#> [1] 441.5602
# intercept from mblm
model$coefficients[[1]]
#> [1] 439.9073 Created on 2023-12-20 with reprex v2.0.2 In general, we can see that the slope is very similar, and there might be a difference as I might not have used the correct correction factor (number of seconds in a year). There is a slight offset in the intercept which I cannot explain fully. Some other observations from my testing:
|
This is the one
openair
function that doesn't have aggopenair
equivalent, in part because it is the one that feels most like it should be ageom
orstat
.One could imagine doing the below, for example:
Need to check if this already exists somewhere else or, if not, need to confirm how to implement it.
If it is the case that
geom_smooth()
can achieve this straightforwardly, then there may need to be no need to add this at all (in the same way thatsmoothTrend()
doesn't need migrating.The text was updated successfully, but these errors were encountered: