---
jupyter:
jupytext:
text_representation:
extension: .Rmd
format_name: rmarkdown
format_version: '1.2'
jupytext_version: 1.11.5
---
$\newcommand{L}[1]{\| #1 \|}\newcommand{VL}[1]{\L{ \vec{#1} }}\newcommand{R}[1]{\operatorname{Re}\,(#1)}\newcommand{I}[1]{\operatorname{Im}\, (#1)}$
## Validating the GLM against scipy
```{python}
import numpy as np
import numpy.linalg as npl
import matplotlib.pyplot as plt
# Print array values to 4 decimal places
np.set_printoptions(precision=4)
```
Make some random but predictable data with `numpy.random`:
```{python}
# Make random number generation predictable
np.random.seed(1966)
# Make a fake regressor and data.
n = 20
x = np.random.normal(10, 2, size=n)
y = np.random.normal(20, 1, size=n)
plt.plot(x, y, '+')
```
Do a simple linear regression with the GLM:
$$
\newcommand{\yvec}{\vec{y}}
\newcommand{\xvec}{\vec{x}}
\newcommand{\evec}{\vec{\varepsilon}}
\newcommand{\Xmat}{\boldsymbol X}
\newcommand{\bvec}{\vec{\beta}}
\newcommand{\bhat}{\hat{\bvec}}
\newcommand{\yhat}{\hat{\yvec}}
\newcommand{\ehat}{\hat{\evec}}
\newcommand{\cvec}{\vec{c}}
\newcommand{\rank}{\textrm{rank}}
y_i = c + b x_i + e_i \implies \\
\yvec = \Xmat \bvec + \evec
$$
```{python}
X = np.ones((n, 2))
X[:, 1] = x
B = npl.pinv(X).dot(y)
B
E = y - X.dot(B)
```
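As a cross-check (an addition, not in the original notes), the pseudoinverse fit should agree with the explicit least-squares solution from the normal equations, $\bhat = (\Xmat^T \Xmat)^{-1} \Xmat^T \yvec$, because this design matrix has full column rank. Rebuilding the data so the cell runs standalone:

```{python}
import numpy as np
import numpy.linalg as npl

np.random.seed(1966)
n = 20
x = np.random.normal(10, 2, size=n)
y = np.random.normal(20, 1, size=n)
X = np.ones((n, 2))
X[:, 1] = x
# Fit via the pseudoinverse, as in the text.
B_pinv = npl.pinv(X).dot(y)
# Normal equations solution; valid here because X has full column rank.
B_normal = npl.inv(X.T.dot(X)).dot(X.T).dot(y)
# npl.lstsq solves the same least-squares problem directly.
B_lstsq, _, _, _ = npl.lstsq(X, y, rcond=None)
print(np.allclose(B_pinv, B_normal), np.allclose(B_pinv, B_lstsq))
```

The pseudoinverse form is the one that generalizes: it still gives a (minimum-norm) solution when the design is rank deficient, where `npl.inv` would fail.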
Build the t statistic:
$$
\hat{\sigma}^2 = \frac{1}{n - \rank(\Xmat)} \sum_i \hat{e}_i^2 \\
t = \frac{\cvec^T \bhat}
{\sqrt{\hat{\sigma}^2 \cvec^T (\Xmat^T \Xmat)^+ \cvec}}
$$
```{python}
# Contrast vector selects slope parameter
c = np.array([0, 1])
df = n - npl.matrix_rank(X)
sigma_2 = np.sum(E ** 2) / df
c_b_cov = c.dot(npl.pinv(X.T.dot(X))).dot(c)
t = c.dot(B) / np.sqrt(sigma_2 * c_b_cov)
t
```
Test the t statistic against a t distribution with `df` degrees of freedom:
```{python}
import scipy.stats
t_dist = scipy.stats.t(df=df)
# One-tailed p value (t here is positive)
p_value = 1 - t_dist.cdf(t)
p_value
# Two-tailed p value is 2 * the one-tailed value, because the
# distribution is symmetric about zero
2 * p_value
```
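An aside, not needed for the validation: `1 - t_dist.cdf(t)` loses precision when `t` is large, because the CDF is then very close to 1. The `scipy.stats` distributions also provide the survival function `sf`, which computes $1 - \textrm{cdf}$ directly; it agrees with the subtraction for moderate `t` and is the safer idiom in general:

```{python}
import numpy as np
import scipy.stats

t_dist = scipy.stats.t(df=18)
# sf is the survival function: 1 - cdf, computed directly rather than
# by subtracting from 1, so it keeps precision far into the tail.
p_sf = t_dist.sf(2.0)
p_naive = 1 - t_dist.cdf(2.0)
print(p_sf, p_naive)
```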
Now do the same test with `scipy.stats.linregress`:
```{python}
res = scipy.stats.linregress(x, y)
res.slope
res.intercept
# This is the same as the manual GLM fit
np.allclose(B, [res.intercept, res.slope])
# p value is always two-tailed
res.pvalue
np.allclose(p_value * 2, res.pvalue)
```
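A further check we can add: `linregress` also returns `stderr`, the standard error of the slope, and the t statistic is just the slope estimate divided by its standard error. Recomputing the two-tailed p value that way (standalone cell, data rebuilt from the same seed):

```{python}
import numpy as np
import scipy.stats

np.random.seed(1966)
n = 20
x = np.random.normal(10, 2, size=n)
y = np.random.normal(20, 1, size=n)
res = scipy.stats.linregress(x, y)
# t statistic for the slope = estimate / standard error.
t_from_res = res.slope / res.stderr
# Two-tailed p value from the t distribution with n - 2 degrees of freedom.
p_from_t = 2 * scipy.stats.t(df=n - 2).sf(np.abs(t_from_res))
print(np.isclose(p_from_t, res.pvalue))
```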
Now run the same procedure for a two-sample t-test, splitting the 20 observations into two groups of 10:
```{python}
X2 = np.zeros((n, 2))
X2[:10, 0] = 1
X2[10:, 1] = 1
X2
B2 = npl.pinv(X2).dot(y)
E2 = y - X2.dot(B2)
c2 = np.array([-1, 1])
df = n - npl.matrix_rank(X2)
sigma_2 = np.sum(E2 ** 2) / df
c_b_cov = c2.dot(npl.pinv(X2.T.dot(X2))).dot(c2)
t = c2.dot(B2) / np.sqrt(sigma_2 * c_b_cov)
t
t_dist = scipy.stats.t(df=df)
# One-tailed p value; t is negative, so take the lower tail
p_value_2 = t_dist.cdf(t)
p_value_2
# Two-tailed p value
p_value_2 * 2
```
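An equivalent parameterization, added here for comparison: an intercept column plus a single group indicator. The indicator's coefficient is then directly the difference in group means, and the `[0, 1]` contrast gives the same t statistic as the two-indicator design above (standalone cell, data rebuilt from the same seed):

```{python}
import numpy as np
import numpy.linalg as npl

np.random.seed(1966)
n = 20
_ = np.random.normal(10, 2, size=n)  # keep the random stream aligned
y = np.random.normal(20, 1, size=n)
# Intercept plus indicator for membership in the second group.
X3 = np.ones((n, 2))
X3[:10, 1] = 0
B3 = npl.pinv(X3).dot(y)
E3 = y - X3.dot(B3)
c3 = np.array([0, 1])  # selects the group-difference parameter
df = n - npl.matrix_rank(X3)
sigma_2 = np.sum(E3 ** 2) / df
c_b_cov = c3.dot(npl.pinv(X3.T.dot(X3))).dot(c3)
t3 = c3.dot(B3) / np.sqrt(sigma_2 * c_b_cov)
print(t3)
```

The fitted values, residuals and degrees of freedom are identical in the two designs; only the meaning of the parameters changes.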
The same test using `scipy.stats.ttest_ind`, the t-test for two independent
samples. Note that the sign of the returned statistic is opposite to our t,
because `ttest_ind(a, b)` tests `mean(a) - mean(b)`, the reverse of our
`[-1, 1]` contrast:
```{python}
scipy.stats.ttest_ind(y[:10], y[10:])
```
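To confirm the agreement numerically (a check added here), compare the `ttest_ind` result against the GLM contrast t, with the argument order swapped to match the `[-1, 1]` contrast (group 2 minus group 1); the cell rebuilds the data so it runs standalone:

```{python}
import numpy as np
import numpy.linalg as npl
import scipy.stats

np.random.seed(1966)
n = 20
_ = np.random.normal(10, 2, size=n)  # keep the random stream aligned
y = np.random.normal(20, 1, size=n)
# Two-sample design: one indicator column per group.
X2 = np.zeros((n, 2))
X2[:10, 0] = 1
X2[10:, 1] = 1
B2 = npl.pinv(X2).dot(y)
E2 = y - X2.dot(B2)
c2 = np.array([-1, 1])
df = n - npl.matrix_rank(X2)
sigma_2 = np.sum(E2 ** 2) / df
c_b_cov = c2.dot(npl.pinv(X2.T.dot(X2))).dot(c2)
t = c2.dot(B2) / np.sqrt(sigma_2 * c_b_cov)
# Argument order matched to the contrast: group 2 minus group 1.
res = scipy.stats.ttest_ind(y[10:], y[:10])
print(np.isclose(t, res.statistic))
```

`ttest_ind` assumes equal group variances by default (`equal_var=True`), which is exactly the pooled-variance assumption the GLM makes here.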