/
test_one_voxel.Rmd
195 lines (156 loc) · 5.08 KB
/
test_one_voxel.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
---
jupyter:
jupytext:
notebook_metadata_filter: all,-language_info
split_at_heading: true
text_representation:
extension: .Rmd
format_name: rmarkdown
format_version: '1.2'
jupytext_version: 1.11.5
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
---
# Testing a single voxel
A short ago — [Modeling a single voxel](model_one_voxel.Rmd), we were modeling
a single voxel time course.
Let’s get that same voxel time course back again:
```{python}
import numpy as np
import matplotlib.pyplot as plt
# Print array values to 4 decimal places
np.set_printoptions(precision=4)
```
```{python}
import nibabel as nib
import nipraxis
# Fetch the data file
bold_fname = nipraxis.fetch_file('ds114_sub009_t2r1.nii')
img = nib.load(bold_fname)
data = img.get_fdata()
data = data[..., 4:]
```
The voxel coordinate (3D coordinate) that we were looking at in Voxel time
courses was at (42, 32, 19):
```{python}
voxel_time_course = data[42, 32, 19]
plt.plot(voxel_time_course)
```
We then compiled a design for this time-course and estimated it.
We used the `convolved regressor` from [Convolving with the hemodyamic response
function](convolution_background.Rmd) in a simple regression.
```{python}
conv_reg_fname = nipraxis.fetch_file('ds114_sub009_t2r1_conv.txt')
conv_reg_fname
```
```{python}
convolved = np.loadtxt(conv_reg_fname)
# Knock off first 4 elements to match data
convolved = convolved[4:]
N = len(convolved)
X = np.ones((N, 2))
X[:, 0] = convolved
plt.imshow(X, interpolation='nearest', cmap='gray', aspect=0.1)
```
$\newcommand{\yvec}{\vec{y}}$
$\newcommand{\xvec}{\vec{x}}$
$\newcommand{\evec}{\vec{\varepsilon}}$
$\newcommand{Xmat}{\boldsymbol X} \newcommand{\bvec}{\vec{\beta}}$
$\newcommand{\bhat}{\hat{\bvec}} \newcommand{\yhat}{\hat{\yvec}}$
As you will remember from [introduction to the general linear
model](https://matthew-brett.github.io/teaching/glm_intro.html), our model is:
$$
\yvec = \Xmat \bvec + \evec
$$
We can get our least squares parameter *estimates* for $\bvec$ with:
$$
\bhat = \Xmat^+y
$$
where $\Xmat^+$ is the *pseudoinverse* of $\Xmat$. When $(\Xmat^T \Xmat)$ is
invertible, the pseudoinverse is given by:
$$
\Xmat^+ = (\Xmat^T \Xmat)^{-1} \Xmat^T
$$
We find the $\bhat$ for our data and design:
```{python}
import numpy.linalg as npl
Xp = npl.pinv(X)
beta_hat = Xp.dot(voxel_time_course)
beta_hat
```
Our plan now is to do an hypothesis test on our $\bhat$ values.
The $\bhat$ values are sample estimates of the unobservable true $\bvec$
parameters.
Because the $\bhat$ values are sample estimates, the values we have depend on
the particular sample we have, and the particular instantiation of the random
noise (residuals). If we were to take another set of data from the same
voxel during the same task, we would get another estimate, because there would
be different instantiation of the random noise. It’s possible to show that
the variance / covariance of the $\hat\beta$ estimates is:
$$
\text{Cov}(\hat\beta) = \sigma^2 \left(X^T X\right)^{-1}.
$$
where $\sigma^2$ is the true unknown variance of the errors. See [wikipedia
proof](https://en.wikipedia.org/wiki/Proofs_involving_ordinary_least_squares#Unbiasedness_of_.CE.B2.CC.82),
and [stackoverflow
proof](http://stats.stackexchange.com/questions/72940/covariance-matrix-of-least-squares-estimator-hat-beta).
We can use an estimate $\hat\sigma^2$ of $\sigma^2$ to give us estimated
standard errors of the variance covariance (see: Unbiased estimate of
population variance):
```{python}
y = voxel_time_course
y_hat = X.dot(beta_hat)
residuals = y - y_hat
# Residual sum of squares
RSS = np.sum(residuals ** 2)
# Degrees of freedom: n - no independent columns in X
df = X.shape[0] - npl.matrix_rank(X)
# Mean residual sum of squares
MRSS = RSS / df
# This is our s^2
s2_hat = MRSS
print(s2_hat)
print(np.sqrt(s2_hat))
```
We now have an standard estimate of the variance / covariance of the $\bhat$:
```{python}
v_cov = s2_hat * npl.inv(X.T.dot(X))
```
In particular, I can now divide my estimate for the first parameter, by the
standard error of that estimate:
```{python}
numerator = beta_hat[0]
denominator = np.sqrt(v_cov[0, 0])
t_stat = numerator / denominator
print(t_stat)
```
I can look up the probability of this t statistic using `scipy.stats`:
```{python}
from scipy.stats import t as t_dist
# Get p value for t value using cumulative density dunction
# (CDF) of t distribution
ltp = t_dist.cdf(t_stat, df) # lower tail p
p = 1 - ltp # upper tail p
p
```
# Compare our manual estimation to R
Finally let’s save the voxel time course for us to compare this analysis to
the `lm` estimation in R:
```{python}
np.savetxt('voxel_time_course.txt', voxel_time_course)
```
Here are the commands to run the same analysis in R:
```
# Simple regression model in R
# Load the voxel time course
voxels = read.table('voxel_time_course.txt')$V1
# Load the convolved regressor
convolved = read.table('ds114_sub009_t2r1_conv.txt')$V1
# Drop the first four values to correspond to the data
convolved = convolved[-(1:4)]
# Fit linear model
res = lm(voxels ~ convolved)
print(summary(res))
```