## Polynomial Regression

Polynomial regression adds flexibility to the linear regression model so that it can capture non-linear trends. Note that the data modeled here are not longitudinal. They are collected at a single time point, but the form of the relationship is non-linear.

There are non-linear regression models that can be specified directly. Those are more challenging to specify, difficult to interpret, and difficult to estimate. I also do not personally use those types of methods; therefore, the focus in this section of the notes is to show that non-linear trends can be modeled using linear regression while maintaining the additivity of the model.

The key to modeling non-linear trends using a linear regression model is to add polynomial terms. For example, suppose we believe there is a non-linear trend. The simplest model would look like the following:

$$
Y = \beta_{0} + \beta_{1} X + \beta_{2} X^{2} + \epsilon
$$

This model includes a single attribute/predictor, $X$. It includes the linear association and also a quadratic association, which allows the regression line to be curvilinear rather than straight. Further polynomial terms (e.g., cubic, quartic) could be added. These terms are challenging to interpret; therefore, visualizing the relationship often helps to identify the effect of the non-linearity.
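
To make the quadratic term concrete, the sketch below simulates data from a model of this form and fits it with `lm()`. The data, the coefficient values (2, 0.5, and -0.03), and the object names are illustrative assumptions, not values from any study. It also shows two equivalent ways to specify the squared term in base R, `I(x^2)` and `poly(x, degree = 2, raw = TRUE)`, and computes the slope of the fitted curve at a few values of the predictor.

```r
set.seed(2024)  # arbitrary seed so the sketch is reproducible

# Hypothetical data: a single predictor with a curved relationship to y
n <- 500
x <- rnorm(n, mean = 0, sd = 10)
y <- 2 + 0.5 * x - 0.03 * x^2 + rnorm(n, mean = 0, sd = 5)
quad_data <- data.frame(x = x, y = y)

# Two equivalent specifications of the quadratic model
fit_I    <- lm(y ~ 1 + x + I(x^2), data = quad_data)
fit_poly <- lm(y ~ 1 + poly(x, degree = 2, raw = TRUE), data = quad_data)

# Same estimates; only the coefficient labels differ
coef(fit_I)
coef(fit_poly)

# The slope of the fitted curve is not constant: it equals b1 + 2 * b2 * x
b <- coef(fit_I)
slope_at <- function(x_value) unname(b[2] + 2 * b[3] * x_value)
slope_at(c(-10, 0, 10))
```

Because the slope depends on the value of $X$, no single coefficient summarizes the association, which is one reason plotting the fitted curve is usually more informative than reading the coefficient table.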

### Example

We will again use simulated data to show the model, but will base this on a real research paper:

Glomb, T. M., & Welsh, E. T. (2005). Can opposites attract? Personality heterogeneity in supervisor-subordinate dyads as a predictor of subordinate outcomes. Journal of Applied Psychology, 90(4), 749.

This study explored the following hypothesis: "Hypothesis 1: Differences between the supervisor and the subordinate in control traits (with the supervisor being higher) will be related to higher subordinate satisfaction with the supervisor."

In [None]:
library(tidyverse)
library(ggformula)
library(mosaic)
library(simglm)

theme_set(theme_bw(base_size = 16))

sim_args <- list(
    formula = satisfaction ~ 1 + poly(sup_control, degree = 2, raw = TRUE) + poly(sub_control, degree = 2, raw = TRUE),
    fixed = list(
        sup_control = list(var_type = 'continuous', 
        mean = 0, sd = 10.93),
        sub_control = list(var_type = 'continuous', 
        mean = 0, sd = 9.81)
    ),
    error = list(variance = 50),
    sample_size = 200,
    # Weights in formula order: intercept, sup_control, sup_control^2, sub_control, sub_control^2
    reg_weights = c(0, .109, -.109, 0, -0.01)
)

satis_data <- simulate_fixed(data = NULL, sim_args) |> 
  simulate_error(sim_args) |>
  generate_response(sim_args)

head(satis_data)

In [None]:
gf_point(satisfaction ~ sup_control, data = satis_data, size = 4) |>
  gf_smooth(method = 'lm', linewidth = 2) |>
  gf_smooth(method = 'loess', linewidth = 2)

In [None]:
gf_point(satisfaction ~ sub_control, data = satis_data, size = 4) |>
  gf_smooth(method = 'lm', linewidth = 2) |>
  gf_smooth(method = 'loess', linewidth = 2)

In [None]:
lm(satisfaction ~ 1 + sup_control + sub_control, data = satis_data) |> 
   broom::tidy()

In [None]:
lm(satisfaction ~ 1 + sup_control + sub_control + I(sup_control^2) + I(sub_control^2), data = satis_data) |> 
   broom::tidy()
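
One way to judge whether the squared terms are needed is to compare the two nested models directly, and then tabulate or plot predictions, since the individual coefficients are hard to read on their own. The sketch below does this on data simulated with base R so that it runs on its own. The variable names mirror the example above, but the generating coefficients and the object names (`demo_data`, `pred_grid`, and so on) are illustrative assumptions.

```r
set.seed(1234)  # arbitrary seed for reproducibility

n <- 200
demo_data <- data.frame(
  sup_control = rnorm(n, mean = 0, sd = 10.93),
  sub_control = rnorm(n, mean = 0, sd = 9.81)
)
# Assumed generating model with curvature in sub_control only
demo_data$satisfaction <- 0.1 * demo_data$sup_control -
  0.01 * demo_data$sub_control^2 + rnorm(n, mean = 0, sd = sqrt(50))

linear_fit <- lm(satisfaction ~ 1 + sup_control + sub_control, data = demo_data)
quad_fit <- lm(satisfaction ~ 1 + sup_control + sub_control +
                 I(sup_control^2) + I(sub_control^2), data = demo_data)

# F test for the two added squared terms (nested model comparison)
model_comparison <- anova(linear_fit, quad_fit)
model_comparison

# Predicted satisfaction across sub_control, holding sup_control at 0
pred_grid <- data.frame(sup_control = 0, sub_control = seq(-20, 20, by = 10))
pred_grid$predicted <- predict(quad_fit, newdata = pred_grid)
pred_grid
```

The prediction grid holds one predictor fixed and varies the other, which is the tabular counterpart of plotting the fitted curve.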

## Spline Models

Spline models are another type of model that can add flexibility to the linear regression model. The simplest type of spline model is the linear-linear spline model. These models are piecewise linear: each segment is linear, but the slope does not need to be the same across the entire range of the attribute. These can be of interest when something occurs at a specific value of an attribute.

For example (this example is completely hypothetical), imagine a situation where you are predicting anxiety symptomatology from the age of the individual. If we are interested in knowing whether the association differs across age spans, spline models can aid in this type of estimation.

In [None]:
sim_args <- list(
    formula = anxiety ~ 1 + age + age_group_post + age:age_group_post,
    fixed = list(
        age = list(var_type = 'ordinal', 
        levels = 20:80)
    ),
    post = list(age_group_post = list(variable = 'age', 
                               fun = 'ifelse',
                               condition = '>= 50',
                               yes = 1,
                               no = 0)),
    error = list(variance = 50),
    sample_size = 1000,
    # Weights in formula order: intercept, age, age_group_post, age:age_group_post
    reg_weights = c(5, 0.05, -0.8, -0.1)
)

anxiety_data <- simulate_fixed(data = NULL, sim_args) |> 
  simulate_error(sim_args) |>
  generate_response(sim_args)

head(anxiety_data)

In [None]:
gf_point(anxiety ~ age, data = anxiety_data, size = 2, alpha = 0.5) |> 
  gf_smooth(method = 'lm', linewidth = 2)

In [None]:
gf_point(anxiety ~ age, data = anxiety_data, color = ~factor(age_group_post), size = 2, alpha = 0.5) |> 
  gf_smooth(method = 'lm', linewidth = 2) |> 
  gf_labs(color = "Age Group")

In [None]:
lm(anxiety ~ 1 + age + age_group_post, data = anxiety_data) |> 
  broom::tidy()

In [None]:
lm(anxiety ~ 1 + age + age_group_post + age:age_group_post, data = anxiety_data) |> 
  broom::tidy()
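
Written out, the interaction model implies a separate line on each side of age 50. As a worked example, take the `reg_weights` from the first simulation as the population values, with $\text{post}$ standing in for `age_group_post` (1 when age is 50 or older, 0 otherwise):

$$
\text{anxiety} = \beta_{0} + \beta_{1} \text{age} + \beta_{2} \text{post} + \beta_{3} (\text{age} \times \text{post}) + \epsilon
$$

For ages below 50 ($\text{post} = 0$), the expected value is

$$
5 + 0.05 \, \text{age}
$$

and for ages 50 and above ($\text{post} = 1$), it is

$$
(5 - 0.8) + (0.05 - 0.1) \, \text{age} = 4.2 - 0.05 \, \text{age}
$$

So $\beta_{3}$ is the change in slope and $\beta_{1} + \beta_{3}$ is the slope in the older group. The two lines do not necessarily meet at age 50: the size of the jump there is $\beta_{2} + 50 \beta_{3} = -0.8 + 50(-0.1) = -5.8$.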

In [None]:
sim_args <- list(
    formula = anxiety ~ 1 + age + age_group_post + age:age_group_post,
    fixed = list(
        age = list(var_type = 'ordinal', 
        levels = 20:80)
    ),
    post = list(age_group_post = list(variable = 'age', 
                               fun = 'ifelse',
                               condition = '>= 50',
                               yes = 1,
                               no = 0)),
    error = list(variance = 50),
    sample_size = 1000,
    # Second scenario: no main effect of age group (0) and a larger slope change (-0.15)
    reg_weights = c(5, 0.05, 0, -0.15)
)

anxiety_data <- simulate_fixed(data = NULL, sim_args) |> 
  simulate_error(sim_args) |>
  generate_response(sim_args)

head(anxiety_data)

In [None]:
gf_point(anxiety ~ age, data = anxiety_data, size = 2, alpha = 0.5) |> 
  gf_smooth(method = 'lm', linewidth = 2)

In [None]:
gf_point(anxiety ~ age, data = anxiety_data, color = ~factor(age_group_post), size = 2, alpha = 0.5) |> 
  gf_smooth(method = 'lm', linewidth = 2) |> 
  gf_labs(color = "Age Group")

In [None]:
lm(anxiety ~ 1 + age + age_group_post, data = anxiety_data) |> 
  broom::tidy()

In [None]:
lm(anxiety ~ 1 + age + age_group_post + age:age_group_post, data = anxiety_data) |> 
  broom::tidy()
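
If the two segments should join at the knot, a common alternative to the indicator-by-age interaction is a single "hinge" term, `pmax(age - 50, 0)`, which is 0 up to the knot and then increases linearly. The sketch below uses base R only. The simulated data, the knot at 50, the coefficient values, and the names `spline_data` and `age_post` are illustrative assumptions, not part of the simulation above.

```r
set.seed(50)  # arbitrary seed for reproducibility

n <- 1000
knot <- 50
spline_data <- data.frame(age = sample(20:80, size = n, replace = TRUE))
# Hinge term: 0 up to the knot, then the number of years past the knot
spline_data$age_post <- pmax(spline_data$age - knot, 0)
# Assumed generating model: slope 0.05 before the knot, 0.05 - 0.15 after it
spline_data$anxiety <- 5 + 0.05 * spline_data$age -
  0.15 * spline_data$age_post + rnorm(n, mean = 0, sd = sqrt(50))

spline_fit <- lm(anxiety ~ 1 + age + age_post, data = spline_data)
coef(spline_fit)

# Estimated slope before the knot and after the knot
b <- coef(spline_fit)
c(before = unname(b["age"]), after = unname(b["age"] + b["age_post"]))

# Predictions just below and just above the knot are nearly identical,
# because the fitted line has no jump at age 50
near_knot <- data.frame(age = c(49.999, 50.001))
near_knot$age_post <- pmax(near_knot$age - knot, 0)
knot_predictions <- predict(spline_fit, newdata = near_knot)
knot_predictions
```

In this parameterization the coefficient on `age_post` is the change in slope at the knot, and there is no separate intercept shift to estimate.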