## <div align="center"> <h1 align="center"> PART IV: Significance testing in GAMMs </h1> </div>

## Visualization

First, let's make sure we have the correct input files. As you can see, I have loaded our raw data again (i.e., data from the [data_pup.csv file](https://www.kaggle.com/datasets/priscilalpezbeltrn/pupillometry-sample)) and subset it one more time (i.e., target_NV). In addition, I created a [utility script](https://www.kaggle.com/code/priscilalpezbeltrn/gamms-utility-script) that saves the output of model1 as an [.rds file](https://www.kaggle.com/datasets/priscilalpezbeltrn/gamm-model1) to make it easier to access its results.

In [None]:
list.files(path = "../input")

In [None]:
data <- read.csv("../input/pupillometry-sample/data_pup.csv")
head(data)

In [None]:
target_NV <- droplevels(data[(data$condition == "NVS") | (data$condition == "NVI"), ])
head(target_NV)

In [None]:
target_NV$participant <- as.factor(target_NV$participant)
target_NV$session <- as.factor(target_NV$session)
target_NV$condition <- as.factor(target_NV$condition)
target_NV$item <- as.factor(target_NV$item)
target_NV$regularity <- as.factor(target_NV$regularity)

#sanity check
class(target_NV$participant)

In [None]:
library(readr)
model1 <- readr::read_rds("../input/gamm-model1/model1.rds")
summary(model1)
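
For reference, the save-and-reload round trip can be sketched with base R's `saveRDS()` and `readRDS()` (`readr::read_rds()` is a thin wrapper around the latter). This is only an illustration, not the contents of the utility script: the `lm` fit on the built-in `cars` data and the temporary file path are stand-ins for model1 and its Kaggle dataset path.

```r
# Stand-in model (assumption: any fitted model object is saved the same way)
toy_model <- lm(dist ~ speed, data = cars)

# Save the fitted object to disk; a temporary file is used here for illustration
rds_path <- tempfile(fileext = ".rds")
saveRDS(toy_model, rds_path)

# Reload it later without refitting the model
reloaded <- readRDS(rds_path)
all.equal(coef(toy_model), coef(reloaded))
```

Saving the fitted object this way means a slow model only has to be run once.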

In Part II, we dissected the output of model1 in detail. However, significant p-values for the smooth terms don't tell us much about the comparison we care about. Each p-value only indicates that the smooth for a given condition is significantly different from zero (i.e., from a flat line); it does not tell us whether the two conditions differ from each other, or when. At best, we can loosely infer which condition departs more from zero. In order to actually assess the difference between conditions, we will need to visualize the results.
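
To make this point concrete, here is a minimal sketch on simulated data (not our pupillometry data). The conditions "A" and "B", the variable names, and the simplified model formula are all made up for illustration. Both conditions follow exactly the same curve, so there is no true difference between them, and yet both smooth terms come out as significant because each curve differs from a flat line.

```r
library(mgcv)

set.seed(3)
# Hypothetical toy data: 2 conditions x 50 time bins x 20 repetitions
toy <- expand.grid(bin = 1:50, rep = 1:20, condition = c("A", "B"))

# Both conditions follow the SAME curve, so the true difference is zero
toy$pupil <- sin(pi * toy$bin / 50) + rnorm(nrow(toy), sd = 0.5)

# One smooth over time per condition
fit <- gam(pupil ~ condition + s(bin, by = condition),
           data = toy, method = "REML")

# Table of smooth terms: one row (and one p-value) per condition
s_tab <- summary(fit)$s.table
print(s_tab)
```

Each row of the table tests one smooth against zero, which is why it cannot answer the question of whether A and B differ.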

For visualization purposes, we will use the R package [itsadug](https://www.google.com/search?q=itsadug+package&rlz=1C5CHFA_enUS906US906&oq=its&aqs=chrome.0.69i59j69i57j0i67i131i433j46i10i512j0i512j0i433i512j69i60l2.1195j0j7&sourceid=chrome&ie=UTF-8).

First, we visualize the fitted smooths for NVI and NVS. On the Y axis we have pupillary dilation and on the X axis we have time. We see that NVI elicited higher dilation than NVS starting around time bin 5 (~100 ms into the target period).

In [None]:
install.packages("itsadug")
library(itsadug)

In [None]:
plot_smooth(model1, view = "bin", plot_all = c("condition"), 
            rm.ranef = TRUE, rug = FALSE, shade = FALSE, se = 0, lwd = 8,
            main = "Pupillary Response Smooths based on Fitted Values \nNon-variable Governors",
            xlab = "Time Bin (20ms per bin)",
            ylab = "Corrected Pupil Size",
            hide.label = TRUE,
            family = "serif")

Next, we visualize the difference between the fitted smooths for NVI and NVS. The blocks of time marked between red dashed lines indicate when during the target period there was a significant difference between conditions, that is, where the confidence band around the difference curve excludes zero. Because we are subtracting NVS from NVI (the first level listed in `comp` minus the second) and NVI elicited higher pupillary dilation, this difference shows as positive.

In [None]:
plot_diff(model1, view = "bin", comp = list(condition = c("NVI", "NVS")), rm.ranef = TRUE)
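
If you are curious about the kind of calculation behind a difference plot, it can be sketched by hand with mgcv alone. This is a simplified illustration on simulated data, not itsadug's actual implementation: the toy data, the variable names, and the plain 1.96 multiplier for the 95% band are all assumptions.

```r
library(mgcv)

set.seed(1)
n_bins <- 50
# Hypothetical toy data: NVI peaks twice as high as NVS
toy <- expand.grid(bin = 1:n_bins, rep = 1:20, condition = c("NVI", "NVS"))
toy$condition <- factor(toy$condition)
peak <- ifelse(toy$condition == "NVI", 2, 1)
toy$pupil <- peak * sin(pi * toy$bin / n_bins) + rnorm(nrow(toy), sd = 0.5)

fit <- gam(pupil ~ condition + s(bin, by = condition),
           data = toy, method = "REML")

# Same grid of time bins for each condition
lv <- levels(toy$condition)
grid_nvi <- data.frame(bin = 1:n_bins, condition = factor("NVI", levels = lv))
grid_nvs <- data.frame(bin = 1:n_bins, condition = factor("NVS", levels = lv))

# Difference of the two prediction matrices: NVI minus NVS
X_diff <- predict(fit, grid_nvi, type = "lpmatrix") -
  predict(fit, grid_nvs, type = "lpmatrix")

# Estimated difference and its standard error at each bin
diff_est <- as.vector(X_diff %*% coef(fit))
diff_se <- sqrt(rowSums((X_diff %*% vcov(fit)) * X_diff))

# Bins where the approximate 95% band excludes zero
lower <- diff_est - 1.96 * diff_se
upper <- diff_est + 1.96 * diff_se
sig_bins <- which(lower > 0 | upper < 0)
range(sig_bins)
```

Because the subtraction is NVI minus NVS, positive values of `diff_est` mean larger dilation for NVI.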

## Binary difference smooths

Besides visualization, another possibility for significance testing is building a model with *binary difference smooths*, which model the difference between the fitted smooths for each condition. This method allows us to determine significance from the model output based on the p-value associated with the difference smooth.

To fit a model with binary difference smooths, we first have to create a new, binary variable which is equal to 0 for one level of the nominal variable and 1 for the other level (i.e. a dummy coded variable). 

Below, we create the variable *IsInd* ("is indicative") where the reference value 0 is subjunctive and the alternative value 1 is indicative.

In [None]:
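# Dummy-code condition: 1 for indicative (NVI), 0 for subjunctive (NVS)
target_NV$IsInd <- (target_NV$condition == "NVI") * 1
# Keep IsInd numeric: with a factor `by` variable, mgcv fits one centered smooth
# per level rather than a single binary difference smooth
class(target_NV$IsInd)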
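
To see what this dummy coding produces, here is a quick check on a made-up four-element vector (the toy `condition` and `is_ind` objects are for illustration only). The comparison returns TRUE/FALSE, and multiplying by 1 turns that into the numbers 1 and 0.

```r
# Hypothetical toy vector standing in for target_NV$condition
condition <- factor(c("NVS", "NVI", "NVI", "NVS"))

# TRUE/FALSE becomes 1/0 after multiplying by 1
is_ind <- (condition == "NVI") * 1

# Cross-tabulate to confirm the mapping: NVS -> 0, NVI -> 1
table(condition, is_ind)
class(is_ind)
```

mgcv treats a numeric `by` variable as a multiplier on the smooth, which is what lets the 0/1 coding act as an on/off switch for the difference smooth.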

In [None]:
model_bin <- bam(corrected_pupil_size ~
                             + s(bin) # reference level = subjunctive
                             + s(bin, by = IsInd) # difference smooth = indicative - subjunctive
                             + s(gaze_x, gaze_y)
                             + s(bin, participant, bs = 'fs')
                             + s(bin, item, bs = 'fs')
                             , family = "scat"
                             , data = target_NV
                             , method = "fREML"
                             , discrete = TRUE)
summary(model_bin)

# If you keep getting the error: "'names' attribute [2] must be the same length as the vector [1]", remember that the variables participant and item need to be converted into factors.

# This model will also take 5+ minutes to run
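
If you want to try the idea without waiting for the full model, here is a small sketch on simulated data that fits in a few seconds. Everything in it is an assumption made for illustration: the toy data, the Gaussian `gam()` in place of our `bam()` with the "scat" family, and the absence of random effects.

```r
library(mgcv)

set.seed(2)
# Hypothetical toy data: 2 conditions x 50 time bins x 20 repetitions
toy <- expand.grid(bin = 1:50, rep = 1:20, condition = c("NVS", "NVI"))

# Numeric dummy: 0 = subjunctive (reference), 1 = indicative
toy$IsInd <- (toy$condition == "NVI") * 1

# The indicative curve is twice as high as the subjunctive one
toy$pupil <- (1 + toy$IsInd) * sin(pi * toy$bin / 50) +
  rnorm(nrow(toy), sd = 0.5)

fit_bin <- gam(pupil ~ s(bin)      # reference level = subjunctive
               + s(bin, by = IsInd), # difference smooth = indicative - subjunctive
               data = toy, method = "REML")

# The row labelled s(bin):IsInd is the difference smooth
s_tab <- summary(fit_bin)$s.table
print(s_tab)
```

The p-value in the `s(bin):IsInd` row is the one that speaks to whether the two conditions differ.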

### In Part V, we will learn how to improve the structure of our models in order to improve their predictive power and goodness of fit
### To be continued!