---
title: "Project Home"
site: workflowr::wflow_site
# output:
#   workflowr::wflow_html:
#     toc: false
output:
  bookdown::html_document2:
    toc: true
editor_options:
  chunk_output_type: console
  markdown:
    wrap: 72
---
This is the website for the research project "Frequency-Aware Similarity
Calibration".
If you have cloned the project to a local computer, this website is
rendered in the `docs` subdirectory of the project directory.
If you are using `workflowr` to publish the research website, it will
also be published online via GitHub Pages.
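
As a rough sketch of how that rendering is usually triggered with
`workflowr` (the file path and commit message below are illustrative
assumptions, not taken from this project):

```r
# Sketch only: the path and message are assumed for illustration.
# Build the site locally; the HTML is written to the docs/ subdirectory.
workflowr::wflow_build()

# Commit the source file, rebuild it, and commit the rendered HTML.
workflowr::wflow_publish("analysis/index.Rmd",
                         message = "Update project home page")

# Push to GitHub so that GitHub Pages can serve the contents of docs/.
workflowr::wflow_git_push()
```
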
This page acts as a table of contents for the website. There are links
to the web pages generated from the analysis notebooks and to the
rendered versions of manuscripts/documents/presentations.
------------------------------------------------------------------------
## [Project Workflow Status](m_00_status.html) {.unnumbered}
------------------------------------------------------------------------
# Overview documents {.unnumbered}
## [Proposal](proposal.html) {.unnumbered}
This notebook explains the central ideas behind the project.
## [Notes](notes.html) {.unnumbered}
This notebook is for keeping notes on any points that may be useful for
later project or manuscript development and that are either not covered
in the analysis notebooks or at risk of getting lost in them.
## [Workflow management](workflow.html) {.unnumbered}
This project uses the [`targets`](https://wlandau.github.io/targets/)
and [`workflowr`](https://github.com/jdblischak/workflowr) packages to
manage the workflow of the project (making sure that the dependencies
between computational steps are satisfied). When this work was started,
there were no easily found examples of using `targets` and `workflowr`
together. This notebook contains notes on the proposed workflow for
using the two packages together.
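
To give a flavour of how the two packages can fit together, here is a
minimal sketch. It is not this project's actual pipeline: the file
paths, target names, and the functions `get_raw_entity_data()` and
`clean_entity_vars()` are hypothetical placeholders.

```r
# _targets.R (sketch; all names and paths below are hypothetical)
library(targets)

# Functions developed in the notebooks are assumed to live in this file.
source("R/functions.R")

list(
  # Track the raw input file so that downstream targets rerun if it changes.
  tar_target(raw_entity_file, "data/raw_entity_data.csv", format = "file"),
  # Each later target depends on the one before it.
  tar_target(raw_entity_data, get_raw_entity_data(raw_entity_file)),
  tar_target(clean_entity_data, clean_entity_vars(raw_entity_data))
)
```

Under these assumptions, `targets::tar_make()` brings the pipeline up to
date, a notebook can load a result with
`targets::tar_read(clean_entity_data)`, and `workflowr::wflow_build()`
then renders the notebooks into the website.
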
------------------------------------------------------------------------
# Publications {.unnumbered}
Links to rendered manuscripts and presentations will go here.
------------------------------------------------------------------------
# META Notebooks {.unnumbered}
These notebooks capture the analyses that were carried out to develop
the code of the core processing pipeline. They are organised as
side-chains to the core processing pipeline.
Typically, a meta notebook will analyse the data available at one stage
of the core pipeline, to guide the writing of the functions required to
get to the next stage of the core pipeline. These meta notebooks
generally conclude with the definition of a function that will be used
in the core pipeline.
Sometimes the analyses are more diffuse: they characterise the data in a
way that may be helpful for guiding the development of future core
stages, but do not immediately result in functions for the core
pipeline.
## Read, check, and standardise the entity data {.unnumbered}
Determine the initial data preparation of the imported entity records.
### [m_01_get_raw_entity_data](m_01_get_raw_entity_data.html) {.unnumbered}
Import the raw data, cut it back to the subset of rows and columns that
are possibly useful, sanity check the data, and save the data in an
R-friendly format.
### [m_02_check_entity_data](m_02_check_entity_data.html) {.unnumbered}
Check the imported entity data.
### [01-2 Check administrative variables](01-2_check_admin.html) {.unnumbered}
Check the "administrative" variables. These are the data relating to
the administration of voter registration.
### [01-3 Check residence variables](01-3_check_resid.html) {.unnumbered}
Check the residence variables - residential address and phone number.
### [01-4 Check demographic variables](01-4_check_demog.html) {.unnumbered}
Check the demographic variables - sex, age, and birth place.
### [01-5 Check name variables](01-5_check_name.html) {.unnumbered}
Check the name variables.
### [01-6 Clean variables](01-6_clean_vars.html) {.unnumbered}
Clean all the variables.
------------------------------------------------------------------------
## 02 Blocking variables {.unnumbered}
Examine the distributions of potential blocking variables.
### [02-1 Characterise blocking variables](02-1_char_block_vars.html) {.unnumbered}
Characterise the potential blocking variables and combinations of
variables.
### [02-2 Make blocking variables](02-2_mk_block_vars.html) {.unnumbered}
Construct the most promising potential combination blocking variables.
------------------------------------------------------------------------
## 03 Name frequency (equality) {.unnumbered}
Detailed examination of the distributions of name frequencies induced by
the string equality relation.
------------------------------------------------------------------------
## 04 Name frequency (similarity) {.unnumbered}
Detailed examination of the distributions of name frequencies induced by
a string similarity relation.
------------------------------------------------------------------------
## 05 Similarity calibration {.unnumbered}
Detailed examination of the calibration from similarity to probability
of identity match, both unconditionally and as a function of name
frequency.
------------------------------------------------------------------------
## 06 Compatibility models {.unnumbered}
Estimate multivariate compatibility models.