# Data Science in Psychology & Neuroscience (DSPN)

## Lecture 17: Data Modeling (Part 1)

### Date: October 26, 2023

### To-Dos From Last Class:

* Download Assignment 4 starter kit and get visualizing!
    
### Today:

* So you want to model some data...
* Examples

### Homework

* Assignment 4: <a href="https://www.dropbox.com/request/wbFD7Yj0MNUdqUjmblYf">Data Visualization</a> (before 11/3, 23:00 MDT)

In [1]:
# plot theme stuff
# Many palettes available online, can customize
# these are from: https://colorbrewer2.org/#type=qualitative&scheme=Set1&n=9
my_palette <- c("#e41a1c","#377eb8","#4daf4a","#984ea3","#ff7f00")

# This is the basic function I use for all the ggplots I create. 
# Modified from this black themed ggplot function: https://gist.github.com/jslefche/eff85ef06b4705e6efbc
my_theme = function(base_size = 24, base_family = "") {
  
  theme_grey(base_size = base_size, base_family = base_family) %+replace%
    
    theme(
      # Specify axis options
      axis.line = element_blank(),  
      axis.text.x = element_text(size = base_size*0.8, color = "black", lineheight = 0.9),  
      axis.text.y = element_text(size = base_size*0.8, color = "black", lineheight = 0.9),  
      axis.ticks = element_line(color = "black", linewidth = 0.2),  # `size` is deprecated for lines as of ggplot2 3.4.0
      axis.title.x = element_text(size = base_size, color = "black", margin = margin(10, 0, 0, 0)),
      axis.title.y = element_text(size = base_size, color = "black", angle = 90, margin = margin(0, 10, 0, 0)),  
      axis.ticks.length = unit(0.3, "lines"),   
      # Specify legend options
      legend.background = element_rect(color = NA, fill = "#ffffff"),  
      legend.key = element_rect(color = "black",  fill = "#ffffff"),  
      legend.key.size = unit(2, "lines"),  
      legend.key.height = NULL,  
      legend.key.width = NULL,      
      legend.text = element_text(size = base_size*0.8, color = "black"),  
      legend.title = element_text(size = base_size*0.8, face = "bold", hjust = 0, color = "black"),
      legend.position = "right",  
      legend.text.align = NULL,  
      legend.title.align = NULL,  
      legend.direction = "vertical",  
      legend.box = NULL, 
      # Specify panel options
      panel.background = element_rect(fill = "#ffffff", color  =  NA),  
      panel.border = element_rect(fill = NA, color = "black"),  
      panel.grid.major = element_line(color = "#ffffff"),  
      panel.grid.minor = element_line(color = "#ffffff"),  
      panel.spacing = unit(2, "lines"),
      # Specify faceting options
      strip.background = element_rect(fill = "grey30", color = "grey10"),  
      strip.text.x = element_text(size = base_size*0.8, color = "black"),  
      strip.text.y = element_text(size = base_size*0.8, color = "black",angle = -90),  
      # Specify plot options
      plot.background = element_rect(color = "#ffffff", fill = "#ffffff"),  
      plot.title = element_text(size = base_size*1.2, color = "black"),  
      plot.margin = unit(rep(1, 4), "lines")
    ) 
}

## Section 1: So you want to model some data?

<img src="img/what-does-that-mean-david.gif" width=300>

### What did you find?
* Let's say you have...
    1. Acquired some data
    2. Wrangled the data
    3. Computed descriptive statistics (central tendency, variance, etc.)
    4. Generated some visualizations
* Remaining question: What did you find?

__Inferential (or predictive) modeling is used to generate an interim answer to your pre-defined research questions / hypotheses.__

### Which model is appropriate?

<img src="img/decision_tree.png" width=500>

__Key questions:__
1. Are the main outcome measures you're interested in continuous or categorical?
2. If continuous, is your research question about associations or differences?
3. If continuous + associations...
    - Is there a clear independent and dependent variable?
4. If continuous + differences...
    - Differences between what?

## Exercise 1

Gebotys and Roberts (1989) were interested in examining the effects of several variables on the seriousness rating of a crime (`crime_seriousness`). The variables examined in this example are age in years (`age`), the amount of television news watched in hours per week (`tv_news`), and whether or not the respondent had previously been a victim of crime (`experience`).

In [2]:
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


In [3]:
df <- tibble(pid  = c(1,2,3,4,5,6,7,8,9,10,11),
             age = c(10,25,26,25,30,34,40,40,40,25,80),
             tv_news = c(4.0,5.0,5.0,4.5,6.0,7.0,5.5,6.0,7.0,8.5,9.0),
             experience = as_factor(c(0,0,0,0,0,1,0,1,1,1,1)),
             crime_seriousness = c(21,28,27,26,33,36,31,35,41,80,95))
df

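| pid | age | tv_news | experience | crime_seriousness |
|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<fct>` | `<dbl>` |
| 1 | 10 | 4.0 | 0 | 21 |
| 2 | 25 | 5.0 | 0 | 28 |
| 3 | 26 | 5.0 | 0 | 27 |
| 4 | 25 | 4.5 | 0 | 26 |
| 5 | 30 | 6.0 | 0 | 33 |
| 6 | 34 | 7.0 | 1 | 36 |
| 7 | 40 | 5.5 | 0 | 31 |
| 8 | 40 | 6.0 | 1 | 35 |
| 9 | 40 | 7.0 | 1 | 41 |
| 10 | 25 | 8.5 | 1 | 80 |
| 11 | 80 | 9.0 | 1 | 95 |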


### Exercise 1a: Is there an association between age and crime seriousness ratings?

__Linear relationship: the variables change together at a constant rate.__

__Monotonic relationship: the variables change together, but not _necessarily_ at a constant rate.__
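
One way to probe the two definitions above is to compare Pearson's r (which measures linear association) with Spearman's rho (which measures monotonic association). The following is a minimal sketch, not necessarily the analysis used in class. It rebuilds the `age` and `crime_seriousness` columns in a base R data frame so that it runs on its own; the name `ex1a` is hypothetical.

```r
# Same values as the age and crime_seriousness columns of df,
# rebuilt in base R so this chunk is self-contained (ex1a is a made-up name)
ex1a <- data.frame(
  age = c(10, 25, 26, 25, 30, 34, 40, 40, 40, 25, 80),
  crime_seriousness = c(21, 28, 27, 26, 33, 36, 31, 35, 41, 80, 95)
)

# Pearson's r: strength of the *linear* association
pearson <- cor.test(ex1a$age, ex1a$crime_seriousness, method = "pearson")

# Spearman's rho: strength of the *monotonic* association (computed on ranks).
# exact = FALSE requests the asymptotic p-value, since age contains ties.
spearman <- cor.test(ex1a$age, ex1a$crime_seriousness,
                     method = "spearman", exact = FALSE)

pearson$estimate   # r
spearman$estimate  # rho
pearson$p.value
spearman$p.value
```

If the two coefficients differ noticeably, a scatterplot of the raw data is a sensible next step, since a single extreme point can shift r more than it shifts rho.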

### Exercise 1b: Does TV news viewing predict crime seriousness ratings?

__Linear regression: Does $y$ change at a constant rate as a function of $x$?__

__What does the "estimate" mean?__
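
In `lm()` output, the "Estimate" column holds the fitted intercept and slope. Below is a minimal sketch, assuming a simple regression of crime seriousness on weekly TV news hours. The data frame name `ex1b` and the comparison at 5 versus 6 hours are illustrative choices, not taken from the lecture.

```r
# Same values as the tv_news and crime_seriousness columns of df,
# rebuilt in base R so this chunk is self-contained (ex1b is a made-up name)
ex1b <- data.frame(
  tv_news = c(4.0, 5.0, 5.0, 4.5, 6.0, 7.0, 5.5, 6.0, 7.0, 8.5, 9.0),
  crime_seriousness = c(21, 28, 27, 26, 33, 36, 31, 35, 41, 80, 95)
)

# Simple linear regression: crime_seriousness = b0 + b1 * tv_news + error
fit <- lm(crime_seriousness ~ tv_news, data = ex1b)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
coef(summary(fit))

# The tv_news estimate (b1) is the predicted change in crime_seriousness
# for each additional hour of TV news watched per week
slope <- unname(coef(fit)["tv_news"])

# Check: predictions one hour apart differ by exactly the slope
pred <- predict(fit, newdata = data.frame(tv_news = c(5, 6)))
unname(diff(pred))
slope
```

The intercept estimate is read the same way: it is the predicted rating when `tv_news` equals 0, which lies outside the observed range here and so should be interpreted with caution.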

### Exercise 1c: Does the effect of TV watching on crime seriousness ratings vary as a function of age?
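
A question of the form "does the effect of X vary as a function of Z?" is commonly tested with an interaction term. The sketch below assumes a linear model with a `tv_news` by `age` interaction; whether this is the model used in class is an assumption, and `ex1c`, `fit_int`, and `fit_add` are hypothetical names.

```r
# Same values as the corresponding columns of df,
# rebuilt in base R so this chunk is self-contained (ex1c is a made-up name)
ex1c <- data.frame(
  age = c(10, 25, 26, 25, 30, 34, 40, 40, 40, 25, 80),
  tv_news = c(4.0, 5.0, 5.0, 4.5, 6.0, 7.0, 5.5, 6.0, 7.0, 8.5, 9.0),
  crime_seriousness = c(21, 28, 27, 26, 33, 36, 31, 35, 41, 80, 95)
)

# tv_news * age expands to tv_news + age + tv_news:age
fit_int <- lm(crime_seriousness ~ tv_news * age, data = ex1c)

# The tv_news:age row is the interaction: how much the tv_news slope
# changes for each additional year of age
coef(summary(fit_int))

# Compare against the additive (no-interaction) model with an F-test
fit_add <- lm(crime_seriousness ~ tv_news + age, data = ex1c)
anova(fit_add, fit_int)
```

With only 11 observations and four coefficients, the interaction estimate will be imprecise, so treat the result as illustrative.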