---
title: "Project Home"
site: workflowr::wflow_site
# output:
#   workflowr::wflow_html:
#     toc: false
output:
  bookdown::html_document2:
    toc: true
editor_options:
  chunk_output_type: console
  markdown:
    wrap: 72
---

This is the website for the research project "Frequency-Aware Similarity
Calibration".

If you have cloned the project to a local computer, this website is
rendered in the `docs` subdirectory of the project directory. If you are
using `workflowr` to publish the research website, it will also be
rendered online to GitHub Pages.

This page acts as a table of contents for the website. There are links
to the web pages generated from the analysis notebooks and to the
rendered versions of manuscripts/documents/presentations.

------------------------------------------------------------------------

## [Proposal](proposal.html) {.unnumbered}

This notebook explains the central ideas behind the project.

## [Notes](notes.html) {.unnumbered}

This notebook is for keeping notes on any points that may be useful for
later project or manuscript development and that are either not covered
in the analysis notebooks or at risk of getting lost in them.

## [Workflow](workflow.html) {.unnumbered}

This project uses the [`targets`](https://wlandau.github.io/targets/)
and [`workflowr`](https://github.com/jdblischak/workflowr) packages to
manage the project workflow (making sure that the dependencies between
computational steps are satisfied). When this work was started, there
were no easily found examples of using `targets` and `workflowr`
together. This notebook contains notes on the proposed workflow for
using the two packages together.

------------------------------------------------------------------------

# Manuscripts {.unnumbered}

Links to rendered manuscripts and presentations will go here.

------------------------------------------------------------------------

# Analysis Notebooks {.unnumbered}

## 01 Read, check, and standardise the entity data {.unnumbered}

Initial data preparation of imported entity records.

### [01-1 Get, subset, check, and save data](01-1_get_data.html) {.unnumbered}

Import the raw data, reduce it to the subset of rows and columns that
may be useful, sanity-check it, and save it in an R-friendly format.

### [01-2 Check administrative variables](01-2_check_admin.html) {.unnumbered}

Check the "administrative" variables. These are data relating to the
administration of voter registration.

### [01-3 Check residence variables](01-3_check_resid.html) {.unnumbered}

Check the residence variables: residential address and phone number.

### [01-4 Check demographic variables](01-4_check_demog.html) {.unnumbered}

Check the demographic variables: sex, age, and birth place.

### [01-5 Check name variables](01-5_check_name.html) {.unnumbered}

Check the name variables.

### [01-6 Clean variables](01-6_clean_vars.html) {.unnumbered}

Clean all the variables.

------------------------------------------------------------------------

## 02 Blocking variables {.unnumbered}

Examine the distributions of potential blocking variables.

### [02-1 Characterise blocking variables](02-1_char_block_vars.html) {.unnumbered}

Characterise the potential blocking variables and combinations of
variables.

### [02-2 Make blocking variables](02-2_mk_block_vars.html) {.unnumbered}

Construct the most promising of the potential combination blocking
variables.

------------------------------------------------------------------------

## 03 Name frequency (equality) {.unnumbered}

Detailed examination of the distributions of name frequencies induced by
the string equality relation.
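
As a rough illustration of what "frequency induced by string equality"
could mean, the sketch below counts, for each record, how many records
carry exactly the same name. The vector `names_toy` and its values are
invented for this example and are not project data; the notebooks may
define frequency differently.

```r
# Toy names, invented purely for illustration (not project data).
names_toy <- c("smith", "smith", "smyth", "jones", "jones", "jones", "lee")

# Under string equality, a name's frequency is the number of records
# whose name is exactly identical to it (the record itself included).
freq_by_name <- table(names_toy)

# Attach that frequency to every record.
freq_equal <- as.integer(freq_by_name[names_toy])
```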

------------------------------------------------------------------------

## 04 Name frequency (similarity) {.unnumbered}

Detailed examination of the distributions of name frequencies induced by
a string similarity relation.
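
A minimal sketch of a similarity-induced frequency, under assumptions
that are not taken from the project: names count as "similar" when
their Levenshtein edit distance is at most `max_dist`, and a record's
frequency is the number of records with a similar name. The toy vector
`names_toy` and the threshold are hypothetical; the project may use a
different similarity measure.

```r
# Toy names, invented purely for illustration (not project data).
names_toy <- c("smith", "smith", "smyth", "jones", "jones", "jones", "lee")

# Pairwise Levenshtein edit distances, using base R's utils::adist().
d <- adist(names_toy)

# Assumed similarity relation: edit distance of at most max_dist.
max_dist <- 1

# Frequency of each record = number of records with a similar name
# (the record itself included, since its distance to itself is 0).
freq_similar <- as.integer(rowSums(d <= max_dist))
```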

------------------------------------------------------------------------

## 05 Similarity calibration {.unnumbered}

Detailed examination of the calibration from similarity to probability
of identity match, both unconditionally and as a function of name
frequency.
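
One simple way to picture such a calibration is to bin record pairs by
similarity and take the share of true matches in each bin, first
overall and then separately by name frequency. The sketch below does
this on made-up data. The data frame `pairs_toy`, its columns, the bin
boundaries, and the two-level "rare"/"common" frequency coding are all
assumptions for illustration, not the project's method.

```r
# Toy record pairs, invented for illustration: a similarity score in
# [0, 1], whether the pair is a true identity match, and a coarse
# (hypothetical) name-frequency class.
pairs_toy <- data.frame(
  sim      = c(0.95, 0.92, 0.90, 0.65, 0.60, 0.55, 0.30, 0.20),
  is_match = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE),
  freq     = c("rare", "rare", "common", "rare",
               "common", "common", "rare", "common")
)

# Bin the similarity scores (bin boundaries chosen arbitrarily).
pairs_toy$sim_bin <- cut(pairs_toy$sim, breaks = c(0, 0.5, 0.8, 1))

# Unconditional calibration: share of true matches per similarity bin.
calib <- tapply(pairs_toy$is_match, pairs_toy$sim_bin, mean)

# Calibration as a function of name frequency: the same share, computed
# separately for each frequency class.
calib_by_freq <- tapply(
  pairs_toy$is_match,
  list(pairs_toy$sim_bin, pairs_toy$freq),
  mean
)
```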

------------------------------------------------------------------------

## 06 Compatibility models {.unnumbered}

Estimate multivariate compatibility models.